At Unifi, we continue to expand the use of AI to hide the technical complexity of finding, exploring and preparing data, in support of self-service data analytics. As part of the 2.7 release, our AI engine, OneMind™, will enable Data Catalog Managers and Data Engineers to build the catalog faster by showing similar datasets, identifying potential duplicate datasets, recommending tags and supporting additional data governance questions through our NLP interface.
When viewing a dataset in the Dataset Explorer, the user can display other datasets that are the same, or similar, by clicking Similar Datasets. The result shows a similarity percentage, calculated by comparing the properties of the primary dataset with those of each similar dataset. For Data Catalog Managers and Data Engineers, finding similar datasets makes it easy to discover duplicate datasets and then clean them up. As Data Catalog Managers build a catalog of their data, they can also move quickly from one dataset to another similar dataset, which helps them get valuable insights from data faster.
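To make the idea of a similarity percentage concrete, here is a minimal sketch in Python. It is not how OneMind computes similarity, which this post does not describe. It assumes each dataset is summarized as a mapping of column name to type and uses the Jaccard index over those pairs; the function name and the sample datasets are hypothetical.

```python
def similarity_percent(primary: dict, other: dict) -> float:
    """Share of (column, type) pairs the two datasets have in common.

    Assumption: Jaccard index over dataset properties. The measure used by
    the product may be different.
    """
    a = set(primary.items())
    b = set(other.items())
    if not a and not b:
        # Two empty property sets are treated as identical.
        return 100.0
    return 100.0 * len(a & b) / len(a | b)


# Hypothetical datasets described by column name -> column type.
orders_2023 = {"customer_id": "int", "order_date": "date",
               "amount": "decimal", "region": "string"}
orders_2024 = {"customer_id": "int", "order_date": "date",
               "amount": "decimal", "channel": "string"}

print(similarity_percent(orders_2023, orders_2024))  # 3 shared of 5 distinct pairs -> 60.0
```

A score of 100 would flag a likely duplicate, while a high but partial score suggests a related dataset worth exploring next.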
Another key new feature of the Unifi Data Platform is automatic Tag Recommendations. The same Unifi AI engine that detects similar datasets also recommends tags based on that similarity. As users explore data, they can assign a Tag to a dataset to indicate what type of information it holds, such as ‘Sales’ or ‘Finance.’ As the AI engine learns which datasets are of interest to a user, it serves up recommendations based on Tags of the same nature. With this functionality, Data Catalog Managers and Data Engineers can keep tag usage consistent with a common ontology or business glossary.
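One way to picture similarity-based tag recommendation is the sketch below. It is an illustration under stated assumptions, not the Unifi implementation: each similar dataset contributes its tags, weighted by its similarity score, and the highest-scoring tags not already on the dataset are suggested. The function `recommend_tags` and the sample data are hypothetical.

```python
from collections import defaultdict


def recommend_tags(neighbors, existing_tags=(), top_n=3):
    """Rank candidate tags by the summed similarity of the datasets carrying them.

    neighbors: iterable of (similarity between 0 and 1, set of tags) pairs,
    one per similar dataset. Assumed input shape, for illustration only.
    """
    scores = defaultdict(float)
    for similarity, tags in neighbors:
        for tag in tags:
            if tag not in existing_tags:
                scores[tag] += similarity
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:top_n]]


# Hypothetical similar datasets and the tags already assigned to them.
neighbors = [
    (0.9, {"Sales", "EMEA"}),
    (0.6, {"Sales", "Finance"}),
    (0.2, {"HR"}),
]

print(recommend_tags(neighbors, existing_tags={"EMEA"}, top_n=2))  # ['Sales', 'Finance']
```

Because suggestions are drawn from tags that already exist in the catalog, a scheme like this tends to reuse the shared vocabulary instead of creating near-duplicate tags.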
New NLP packages for Data Governance
As part of this release, Data Stewards can use our NLP interface to answer common data governance and compliance questions and to gather the stewardship information they need with little effort. Questions supported in this release include:
- Give me all the datasets containing Personally Identifiable Information (PII)
- Show me all columns in a dataset containing PII
- Show me users who have access to a dataset
- Show me users who have access to a column in a dataset
- Show me users who have access to a datasource
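The questions above fall into a small number of intents: finding PII at the dataset or column level, and listing access at the dataset, column or datasource level. The keyword-based sketch below shows that grouping. It is a deliberately simple stand-in and makes no claim about how the Unifi NLP interface parses questions; the function name and intent labels are hypothetical.

```python
def classify_question(question: str) -> str:
    """Map a governance question to a hypothetical intent label via keywords."""
    q = question.lower()
    if "pii" in q or "personally identifiable" in q:
        # Column-level questions mention columns; otherwise assume dataset level.
        return "pii_columns" if "columns" in q else "pii_datasets"
    if "users who have access" in q:
        if "column" in q:
            return "access_by_column"
        if "datasource" in q:
            return "access_by_datasource"
        return "access_by_dataset"
    return "unknown"


questions = [
    "Give me all the datasets containing Personally Identifiable Information (PII)",
    "Show me all columns in a dataset containing PII",
    "Show me users who have access to a dataset",
    "Show me users who have access to a column in a dataset",
    "Show me users who have access to a datasource",
]

for q in questions:
    print(classify_question(q), "<-", q)
```

A real natural language interface would also need to extract the dataset, column or datasource name from the question, which this sketch leaves out.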
Our journey of building the next-generation, AI-powered Data Catalog continues. Stay tuned for exciting new features planned for future releases.