By: Nancy J. Hoebelheinrich, Knowledge Motifs LLC
You have undoubtedly heard about the “FAIR Data Principles” (FAIR) as you’re seeking funding, connecting with colleagues, and doing your research; however, you may not have had time to find out much about what these Principles are or why they matter for your research. Indeed, you may well be asking what’s the big deal, and why should I care about FAIR when I’ve got so much to do right now to manage my data?
This piece describes a few of the advantages of incorporating FAIR Principles into your work, especially in the earlier stages of the research lifecycle. I describe what FAIR is about; who is applying pressure on researchers to pay attention to FAIR; why to pay attention to FAIR (i.e., what the value proposition is); how to incorporate at least one important component of FAIR into your research workflow; and where you can find help to do this.
What: FAIR stands for data that are Findable, Accessible, Interoperable, and Reusable. Findable data can be discovered via commonly used search engines such as those of catalogs, web portals, and data repositories. Accessible data are downloadable because researchers have made the choice to store their data in repositories that make them available for download over time and offer services to provide security and access restrictions when necessary. Interoperable data have standardized descriptions identifying the important components of a data set; example descriptions could include entity or variable names, units of measurement, key discipline-oriented terms, and/or geographic locations, so that data common to a specific domain can be discovered and reused by those within that domain as well as researchers in inter- or trans-disciplinary research areas. Reusable data provide important information that others need to know in order to incorporate your data into their research to advance the science in a specific research area. Reusable information might include descriptions of data quality, traceability (e.g., provenance), rights to reuse (e.g., licensing), and any tools, code, or technical requirements necessary to use the data effectively.
Who: Increasingly, funding sources and the repositories that store, manage, and archive your data have requirements or strong recommendations for making your data FAIR. Meeting them really does help you now, and it helps future you, whether you are reusing your own data or being asked to provide information about them to others, especially for the reasons described in the Why section below.
Why – the value proposition: Rather than go through a litany of ways that making your data FAIR can justify the return on investment for this effort (and there is effort involved), I focus on one strategy that can be used by all members of your research team. Using community-approved formats, standards, and controlled vocabularies to describe the entities or variables within a dataset and their units of measurement, along with community-based naming conventions and other structured descriptions, is key to both understanding and sharing your data, especially for reuse. When registered publicly, and therefore openly shared and reusable, these types of standards facilitate interoperability of your data. Such standards will:
Make it easier to add information about your data into searchable catalogs and portals;
Greatly improve the indexing of the information about your data into the catalogs and portals, thus making your data much more findable;
Elevate your data to a first-class object that is citable on its own merits rather than being a reference within a published paper;
Provide additional impact to other sources that have used the data by generating more citations of the entire dataset and/or its parts; and
Offer a greater ability to maintain your data in the long run by supporting both backward compatibility (of software especially) and forward compatibility, so that older data can still be processed in new ways.[1]
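To make the idea of structured variable descriptions a little more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `APPROVED_UNITS` set stands in for whatever unit list your own community has agreed on, and the variable names and labels are invented. It simply flags variables whose units fall outside that list, the kind of check a research team could run before depositing data.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a community-approved list of unit symbols.
# In practice this would come from the standard your community uses.
APPROVED_UNITS = {"m", "kg", "s", "degC", "count"}


@dataclass(frozen=True)
class VariableDescription:
    """A structured description of one variable in a dataset."""
    name: str   # short, machine-friendly variable name
    label: str  # human-readable description
    unit: str   # unit of measurement


def nonstandard_units(variables, approved_units=APPROVED_UNITS):
    """Return the names of variables whose unit is not in the approved list."""
    return [v.name for v in variables if v.unit not in approved_units]


# Invented example variables for an imaginary field survey.
variables = [
    VariableDescription("air_temp", "Air temperature at 2 m", "degC"),
    VariableDescription("nest_count", "Number of nests observed", "count"),
    VariableDescription("grass_height", "Mean grass height", "inches"),
]

flagged = nonstandard_units(variables)
print(flagged)
```

A check like this does not make data FAIR by itself, but it shows how agreeing on names and units up front turns "follow the standard" into something the whole team can verify.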
How?
As a fairly simple example, imagine that you are conducting ornithological research on the features of an ecosystem needed to support the types of birds living in pastures such as the Curlew National Grassland in Idaho. By agreement with your funding source, you are being asked to use the most widely accepted community-approved term for that type of land and for the name of the geographical location. You'll need descriptive, human-readable labels for the dataset you are creating, and a permanent identifier for each term, if available. In this case, you can go to the National Agricultural Library's Agricultural Thesaurus and Glossary at https://agclass.nal.usda.gov/ and search the Glossary to find that the proper term to use is "permanent grasslands" for the type of land at this location. You can then search the Thesaurus for "Idaho" to find that there is a "related term" for the Curlew National Grassland with a persistent identifier (i.e., a Uniform Resource Identifier, or URI) that points to the term in this authoritative, government-maintained resource.
In a more complex scenario, suppose you are a researcher working with model data and simulation outputs, and you need to determine which outputs should be deposited in a FAIR-aligned community repository in order to share them. You could go to the Descriptor-Classifications Worksheet created by the community-developed EarthCube Research Coordination Network (RCN), "What About Model Data? Determining Best Practices for Preservation and Replicability" at https://modeldatarcn.github.io/. This template encompasses three associated use cases and can help you solve this problem quickly, rather than starting from scratch to figure it out.
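As a rough illustration of how the terms from the grassland example might be recorded alongside a dataset, the following Python sketch pairs each human-readable label with its source vocabulary and, where one exists, a persistent identifier. The record layout, the dataset title, and `PLACEHOLDER_URI` are all assumptions made for this sketch; the real URI should be copied from the Thesaurus entry itself rather than guessed.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional

# Placeholder only: replace with the URI shown on the actual Thesaurus entry.
PLACEHOLDER_URI = "https://example.org/placeholder-term-uri"

VOCABULARY = "NAL Agricultural Thesaurus and Glossary"


@dataclass
class VocabularyTerm:
    """A controlled-vocabulary term used to describe a dataset."""
    label: str                 # human-readable label
    vocabulary: str            # where the term comes from
    uri: Optional[str] = None  # persistent identifier, if available


def terms_missing_uri(terms):
    """Return labels of terms that have no persistent identifier recorded."""
    return [t.label for t in terms if not t.uri]


land_type = VocabularyTerm(label="permanent grasslands", vocabulary=VOCABULARY)
location = VocabularyTerm(
    label="Curlew National Grassland",
    vocabulary=VOCABULARY,
    uri=PLACEHOLDER_URI,
)

# A hypothetical, minimal metadata record for the dataset.
record = {
    "title": "Bird habitat survey (hypothetical example)",
    "keywords": [asdict(t) for t in (land_type, location)],
}

print(json.dumps(record, indent=2))
print("Terms without a URI:", terms_missing_uri([land_type, location]))
```

Keeping the label and the identifier together means a person can read the record while a catalog or portal can index the term unambiguously.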
Where can you find help with an Interoperability strategy?
More and more research communities are working together to build, identify, recommend, and maintain controlled vocabularies that are designed to help researchers meet these interoperability requirements. You can check with those who support you in doing your research, whether at the repository where you plan to store your data, at professional societies and/or at your institutional library or data center. In addition, the EarthCube Office offers other FAIR training materials, webinars and a Data Help Desk under the Resources tab of the EarthCube website as does GO FAIR US. Kudos to you when you take this very significant step toward an important FAIR milestone for yourself and for the greater research community!
[1] For more information, see Romain David, Laurence Mabile, Alison Specht, Sarah Stryeck, Mogens Thomsen, et al. FAIRness Literacy: The Achilles’ Heel of Applying FAIR Principles. CODATA Data Science Journal, Committee on Data for Science and Technology (CODATA), 2020, 19 (32), pp. 1-11. ⟨10.5334/dsj-2020-032⟩. ⟨hal-02483307v2⟩