A data catalog is a valuable and versatile resource. It allows many types of datasets and sources to be cataloged in place, with their metadata gathered and organized into a single repository. That makes it useful for a wide range of applications and analytics initiatives across every industry sector. This article discusses how a data catalog supports FAIR medical research data.

The National Institutes of Health (NIH) created the New Models of Data Stewardship (NMDS) program in the hope of building a single, unified ecosystem for biomedical research data. The program called for a cloud-based solution that supports findable, accessible, interoperable and reusable data sources, and an advanced data catalog is particularly well suited to these needs.

Data catalogs enable teams across businesses of all kinds to access critical data more easily.

New Models of Data Stewardship: About the project

As the biomedical and technology industries change, researchers need advanced tools to make the most of their data assets. As NIH pointed out, “data-driven innovations … may yield transformative changes for biomedical research,” but disconnected, incompatible and inaccessible datasets stand in the way of key research insights.

To address these and related issues, NIH and the Office of Strategic Coordination’s Common Fund launched the NMDS program in 2017, with plans running through 2020. One important part of this initiative is NIH’s Data Commons Pilot Phase, in which researchers investigate potential solutions for their specific data needs.

What does it mean to be F.A.I.R?

NIH’s Pilot Phase aimed to test and identify a robust, cloud-based tool for addressing current biomedical research data challenges. Researchers sought the best solution for supporting findable, accessible, interoperable and reusable (F.A.I.R) datasets.

This pursuit isn’t unique to this project: F.A.I.R data is a beneficial goal both inside and outside the biomedical research industry.

While there are many cloud-based data tools, an innovative data catalog like the one Unifi provides fits the bill particularly well. Such a solution, especially when used as part of a comprehensive, self-service data platform, can bring a range of advantages to data and analytics initiatives.

The Unifi Data Catalog offers a wealth of useful business intelligence features.

Through the lens of NIH’s F.A.I.R requirements, let’s examine how a data catalog supports each of these critical needs:

  • Findable: The Unifi Data Catalog offers two complementary ways to search the system. The first, called Solar Search, provides a Google-like experience for finding specific publications and assets; for example, medical researchers can enter questions to locate cataloged publications on a particular disease or condition. The second uses natural language processing to let users search the datasets, columns and metadata of cataloged assets. Together, these two search paths make publications and assets easy to explore and keep medical research data findable.
  • Accessible: Sensitive information must be protected, but it must also be accessible to those who need it for analysis and other initiatives. A data catalog like Unifi’s lets users connect to the platform through any web browser, so anyone with an internet connection can reach the data they need without downloading specific software or applications. The Explorer features further support accessibility by providing a catalog-like view of datasets and assets: the Dataset Explorer searches metadata specifically, while the BI Explorer lets users search and access business intelligence assets and the reports built on them.
  • Interoperable: Siloed datasets are the enemy of any data initiative, and breaking them down was one of the main challenges NIH set out to address with its NMDS program. A robust data catalog addresses this requirement by supporting collaboration and community-driven data sharing. By capturing metadata and cataloging assets within the platform, users help remove existing barriers and improve interoperability. Once assets are cataloged, researchers can use collaboration features such as Community Feed, which displays all operations taking place within datasets, including the changes being made and the users making them. Users can also rate datasets to signal good or poor quality. This supports interoperability on the current project as well as in future analyses.
  • Reusable: Finally, data must be cataloged and stored so that other parties can use it for additional or separate initiatives. An advanced catalog allows sources to be cataloged in place, leaving existing databases intact. Combined with the ability to modify and share cataloged metadata directly within the platform, this keeps sources free of silos so they can be reused across many research projects.
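To make the idea of cataloging "in place" more concrete, here is a minimal Python sketch of a generic metadata catalog with simple keyword search over dataset names, descriptions, columns and tags. It is a hypothetical illustration, not Unifi’s implementation or API: the class names, fields, dataset names and source URIs are all invented for this example, and a production catalog would add natural language search, access control and collaboration features on top.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical metadata record for one dataset; the data stays in its source."""
    name: str
    source_uri: str  # pointer to where the data lives (cataloged in place, not copied)
    columns: list[str]
    tags: list[str] = field(default_factory=list)
    description: str = ""


class DataCatalog:
    """Toy in-memory catalog: a single repository of metadata for many sources."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Only the metadata is stored; the source database is never moved.
        self._entries[entry.name] = entry

    def search(self, query: str) -> list[str]:
        """Return sorted names of entries whose metadata contains every query term."""
        terms = query.lower().split()
        hits = []
        for entry in self._entries.values():
            haystack = " ".join(
                [entry.name, entry.description, *entry.columns, *entry.tags]
            ).lower()
            if all(term in haystack for term in terms):
                hits.append(entry.name)
        return sorted(hits)


# Usage with invented example datasets.
catalog = DataCatalog()
catalog.register(CatalogEntry(
    name="cardiology_trials",
    source_uri="postgresql://research-db/trials",
    columns=["patient_id", "diagnosis", "ejection_fraction"],
    tags=["heart failure", "clinical trial"],
))
catalog.register(CatalogEntry(
    name="oncology_imaging",
    source_uri="s3://imaging-bucket/oncology/",
    columns=["scan_id", "tumor_type"],
    tags=["cancer"],
))
print(catalog.search("heart failure"))  # ['cardiology_trials']
```

Because each entry holds only a pointer to its source plus descriptive metadata, the same record can be searched, shared and reused by other teams without duplicating or restructuring the underlying database.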

The Unifi Data Catalog is an ideal solution for supporting findable, accessible, interoperable and reusable data. To find out more about our data catalog, connect with our experts today.