Registration

Overview

Teaching: 40 min
Exercises: 10 min
Questions
  • What is a data repository?

  • What are types of data repositories?

  • Why should you upload your data to a data repository?

  • How to choose the right database for your dataset?

Objectives
  • Define what a data repository is.

  • Illustrate the importance of an indexed data repository.

  • Summarize the steps of data indexing in a searchable repository.

What is a data repository?

A data repository is a general term for any storage space where you deposit data, metadata and any associated research. Note that a database is a more specific term: it is mainly for the storage of the data itself.

Types of data repository

Data repositories are classified by purpose into:

A) Controlled-access repository for sensitive data: this type is explained in detail in the data sharing lesson of RDMkit, and we will cover it in the next episode.

B) Discipline-specific repository: there are well-known repositories for different data types, e.g. ArrayExpress for high-throughput functional genomics experiments.

C) Institutional repository: if you cannot find a suitable repository for your dataset, some universities have their own general-purpose repositories. For instance, the University of Reading Research Data Archive is a general-purpose repository with features similar to those of other databases, e.g. controlled access … etc. It can be used by students and researchers.

D) General data repository: these are usually for data types that have no dedicated public repository, e.g. Zenodo.

Figure 1 summarizes these types with different examples.

Figure 1. Types of data repository with different examples, CC BY, from re3data.org

Why should you upload your data to a data repository?

To ensure data findability, your data should be uploaded to a public repository where it can be searched and found. This makes it comply with the fourth findability principle (F4), which states that (meta)data are registered or indexed in a searchable resource. One example of such a database is ArrayExpress, for high-throughput functional genomics experiments. These databases have a set of rules in place to make sure that your data will be FAIR. After you upload your data to such a database, it is assigned an ID and indexed. Indexing helps researchers find your data by using persistent identifiers, keywords or even the name of the researcher.

Take a look at the ArrayExpress database, where all datasets are indexed and you can find any dataset using the search tools. Because the data are indexed, you can retrieve a dataset using keywords as well as its PID. For example, if you want to locate human NSCL cell lines, you can just type this into the search box; likewise, different keywords such as cartilage, stem cells and osteoarthritis will all lead you to the same dataset. Indexing and registering datasets also means that they are curated in such a way that you can discover them using different keywords.

For example, you can find the same dataset by using its identifiers or by using keywords chosen by the dataset’s authors to describe it.

When you upload your dataset to a database, it can be curated and easily found using different keywords

By indexing your dataset, you can retrieve it using its PID
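To make the idea of indexing more concrete, here is a minimal sketch of how a searchable resource might map several search terms to the same dataset. It is not how ArrayExpress is implemented; the records, PIDs (`EX-0001`, `EX-0002`), author names and the helper functions `build_index` and `search` are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical metadata records: the PIDs, titles, keywords and authors are
# invented for this sketch and do not describe real repository entries.
RECORDS = [
    {"pid": "EX-0001",
     "title": "Cartilage stem cell expression profiling",
     "keywords": ["cartilage", "stem cells", "osteoarthritis"],
     "author": "A. Researcher"},
    {"pid": "EX-0002",
     "title": "Leaf transcriptome under drought",
     "keywords": ["plant", "drought"],
     "author": "B. Scientist"},
]


def build_index(records):
    """Map every lower-cased search term to the set of PIDs it describes."""
    index = defaultdict(set)
    for record in records:
        # A dataset is registered under its PID, its author, its keywords
        # and each word of its title.
        terms = [record["pid"], record["author"], *record["keywords"]]
        terms += record["title"].split()
        for term in terms:
            index[term.lower()].add(record["pid"])
    return index


def search(index, term):
    """Return the sorted PIDs registered under a term (empty list if none)."""
    return sorted(index.get(term.lower(), set()))


INDEX = build_index(RECORDS)
print(search(INDEX, "osteoarthritis"))  # lookup by keyword
print(search(INDEX, "EX-0001"))         # lookup by identifier
```

Both lookups return the same record, which is the point of indexing: the identifier, the keywords and the author name are all entry points to one dataset.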

Exercise 1. How to index your dataset?

One way to index your dataset is to upload it to Zenodo. Can you use one of the resources we recommended earlier (RDMkit, FAIRcookbook, FAIRsharing) to find out how to do this?

Solution

Since you want a technical guideline, FAIRcookbook and RDMkit are the best places to start. We will start with FAIRcookbook. First of all, let’s understand the structure of the FAIRcookbook. For a quick overview, you can watch our RDMBites on FAIRcookbook.

Video: FAIRcookbook RDMBites

The building unit of the FAIRcookbook is called a recipe. A recipe is a set of instructions for how to FAIRify your data. As you can see in Figure 2, each recipe includes these main items:

1- Graphical overview, which is the mind map for the recipe

2- Ingredients, which give you an idea of the skills needed and the tools you can use to apply the recipe

3- The steps and the process

4- Recommendations on what to read next and references for further reading

Figure 2. FAIRcookbook recipes structure

Now that we have explained the structure of a recipe, let’s look for a suitable recipe in the FAIRcookbook. As you navigate the FAIRcookbook homepage, you will find different tabs that cover each of the FAIR principles. For instance, if you want recipes on the Accessibility principle, the corresponding tab lists all the recipes that can help you make your data accessible.

  • Follow these steps to find the recipe:

1- In this exercise, we are looking for a recipe on indexing or registering a dataset in a searchable resource, which you can find in the findability tab. Can you find it in this picture?

Figure 3. Recipes of FAIRcookbook, where you will find different recipes for FAIR, infrastructure, assessment and maturity models

2- Click on the findability tab

3- On the left side, you will find a navigation bar listing the recipes that help make your data findable.

4- As you can see here, you will find a recipe on registering datasets with Wikidata and another one on depositing to generic repositories (Zenodo use case). Once you click on one of these recipes, you will find the following:

A) Requirements to apply the recipe to your dataset

B) The instructions

C) References and further readings

D) Authors and licence

Figure 4. Zenodo use case, where you will get a step-by-step guideline on how to deposit your data to Zenodo
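As a complement to the Zenodo recipe, the sketch below assembles the kind of descriptive metadata a deposit needs before it can be indexed. It makes no network call. The field names (`title`, `upload_type`, `description`, `creators`, `keywords`) follow Zenodo's deposit metadata as we understand it, so please check them against the recipe and Zenodo's developer documentation; the helper `build_deposit_metadata` and all values are hypothetical.

```python
import json


def build_deposit_metadata(title, description, creators, keywords=()):
    """Assemble a metadata payload for a dataset deposit (illustrative helper).

    Assumption: field names mirror Zenodo's deposit metadata; verify them
    against the current Zenodo documentation before relying on them.
    """
    if not title or not description or not creators:
        raise ValueError(
            "title, description and at least one creator are required"
        )
    metadata = {
        "title": title,
        "upload_type": "dataset",
        "description": description,
        "creators": [{"name": name} for name in creators],
    }
    if keywords:
        # Keywords are what later allow the record to be found by search.
        metadata["keywords"] = list(keywords)
    return {"metadata": metadata}


# Placeholder values only; replace them with your own dataset description.
payload = build_deposit_metadata(
    title="Example dataset (placeholder)",
    description="Placeholder description used only for this sketch.",
    creators=["Doe, Jane"],
    keywords=["cartilage", "stem cells"],
)
print(json.dumps(payload, indent=2))
```

Preparing the metadata separately from the upload makes it easy to review titles, creators and keywords before anything is published, since these fields determine how the dataset will be found.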

In our specialized courses, we will give you examples of how to upload your data to discipline-specific repositories.

Uploading your data to a database will make your data visible through the following:

1- Databases assign a unique persistent identifier to your data.

2- Your data will be indexed and curated, making it easier to find.

3- Some databases make it simple to connect your dataset to other datasets and to link its metadata to the metadata of other datasets.

4- Databases support dataset licensing, and some offer controlled or limited access to protect your data.

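By uploading data to a database, you comply with the following FAIR principles: (meta)data are registered or indexed in a searchable resource (F4), and (meta)data are released with a clear and accessible data usage licence (R1.1).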

How to choose the right database for your dataset?

The University of Reading provides an overview of the criteria needed to choose a data repository. We can summarize them in the following points:

A) Accessibility options

B) Licence
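The two criteria above can be applied as a simple checklist. The following is a minimal sketch under stated assumptions: the `Repository` class, the `shortlist` function and the candidates "Repo A" and "Repo B" are hypothetical and do not describe real repositories.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Repository:
    """A candidate repository described by the two criteria in the list above."""
    name: str
    access_options: frozenset
    licences: frozenset


# Hypothetical candidates, invented for illustration only.
CANDIDATES = [
    Repository("Repo A", frozenset({"open"}),
               frozenset({"CC BY 4.0", "CC0"})),
    Repository("Repo B", frozenset({"open", "controlled"}),
               frozenset({"CC BY 4.0"})),
]


def shortlist(candidates, access_needed, licence_needed):
    """Return names of candidates offering both the access option and licence."""
    return [
        repo.name
        for repo in candidates
        if access_needed in repo.access_options
        and licence_needed in repo.licences
    ]


# A dataset that needs controlled access and a CC BY 4.0 licence.
print(shortlist(CANDIDATES, "controlled", "CC BY 4.0"))
```

In practice you would fill in the candidates from a registry such as FAIRsharing or re3data rather than typing them by hand.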

Exercise 2. How to choose the right database?

You are a researcher in plant sciences and want to know which databases are available for plant genomes.

Solution

It is time to introduce you to FAIRsharing, an important resource for metadata standards, databases and policies. FAIRsharing helps researchers identify suitable repositories, standards and databases for their data. It also contains the latest policies from governments, funders and publishers for FAIRer data. In the following short video, you can see that Ensembl Plants is the database you can use for plant genes.

Screen recording showing the search process in FAIRsharing

Resources

Our resources provide an overview of data repositories and examples

The FAIR cookbook and RDMkit both provide excellent instructions for uploading your data into databases:

Key Points

  • This episode covers the following FAIR principles:

  • (Meta)data are registered or indexed in a searchable resource (F4)

  • (Meta)data are released with a clear and accessible data usage licence (R1.1)