We started our journey of building the next-generation Data Catalog, powered by Natural Language Understanding (NLU) and Artificial Intelligence (AI), two years ago as part of Unifi AI Labs. Knowledge graph-based catalog exploration is one of the key differentiators of our Data Catalog. Until now, these AI/NLU features were available as LAB features. Now that they have been hardened over several releases and adopted by a large number of users and customers, the time has come to make our OneMind AI/NLU engine generally available. As part of the Unifi Data Platform 3.0 release, our AI-based recommendation engine and NLU functionality will be available to our customers by default.
Over the last few releases, we added the following functionality:
- Similar Datasets: Using clustering algorithms, we identify clusters of datasets in the data catalog and present the result as a percentage similarity. For a given dataset, datasets that are more than 70% similar are shown in the UI. This also lets us identify potential duplicates, where the similarity is 100%.
- Related Datasets: Over time, the OneMind AI engine learns the associations and relationships between entities such as datasets, jobs, and workflows. These relationships are presented in the UI.
- Tag Recommendations: The OneMind AI engine learns how tags are added to datasets, jobs, and workflows, and then starts recommending tags.
- People also ask for: Based on the search query run by the user, our engine recommends relevant searches made by other users.
- Trending Information: The engine shows which objects are trending in search.
- Typeahead and Autocomplete for NLU search similar to Google search.
- Knowledge graph view for functions and filters built for a given column.
- 25+ recommended questions for trying out NLU-based search.
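To make the Similar Datasets thresholds above concrete, here is a minimal sketch. It is not the OneMind implementation: the post says clustering algorithms are used, whereas this example swaps in a plain Jaccard similarity over column names as a stand-in. The function names, the sample catalog, and the choice of column names as the comparison feature are all assumptions made for illustration.

```python
from itertools import combinations


def jaccard_percent(cols_a, cols_b):
    """Percentage overlap between two collections of column names."""
    a, b = set(cols_a), set(cols_b)
    if not a and not b:
        return 100.0
    return 100.0 * len(a & b) / len(a | b)


def similar_datasets(catalog, threshold=70.0):
    """Return (name1, name2, percent, is_duplicate) for pairs above the threshold.

    `catalog` is a hypothetical mapping of dataset name -> list of column names.
    A pair is flagged as a potential duplicate when the similarity is 100%.
    """
    results = []
    for (n1, c1), (n2, c2) in combinations(sorted(catalog.items()), 2):
        score = jaccard_percent(c1, c2)
        if score > threshold:
            results.append((n1, n2, round(score, 1), score == 100.0))
    return results


# Hypothetical sample catalog, used only to exercise the sketch.
catalog = {
    "orders": ["id", "customer_id", "amount", "created_at"],
    "orders_copy": ["id", "customer_id", "amount", "created_at"],
    "orders_v2": ["id", "customer_id", "amount", "created_at", "currency"],
    "employees": ["id", "name", "department"],
}

for pair in similar_datasets(catalog):
    print(pair)
```

In this sample, `orders` and `orders_copy` score 100% and are flagged as potential duplicates, `orders_v2` is 80% similar to both, and `employees` falls below the 70% cutoff.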
The key new NLU and AI features added in release 3.0 are outlined below:
PII and Masking Recommendations:
The system automatically discovers Personally Identifiable Information (PII) data types using built-in AI algorithms. Data stewards can also explicitly mark a data type as PII. Over time, the system learns which data types stewards have explicitly marked as PII and which masking function was used. As the system is trained, it starts recommending potential PII data types along with the masking that can be applied. Data stewards can then accept or reject each recommendation. This accept/reject feedback further improves the learning model for future PII and masking recommendations.
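The feedback loop described above can be sketched roughly as follows. This is a simplified illustration and not the actual learning model: it just counts steward decisions per data type and suggests the most frequently used masking function. The class name `PiiRecommender`, its parameters, and the sample data types and mask names are all hypothetical.

```python
from collections import Counter, defaultdict


class PiiRecommender:
    """Toy model that learns from data steward decisions (illustrative only)."""

    def __init__(self, min_votes=2, min_ratio=0.6):
        self.marked = Counter()            # times a data type was marked or accepted as PII
        self.rejected = Counter()          # times a PII recommendation was rejected
        self.masks = defaultdict(Counter)  # masking functions used per data type
        self.min_votes = min_votes
        self.min_ratio = min_ratio

    def record(self, data_type, is_pii, mask=None):
        """Store one steward decision: an explicit marking, an accept, or a reject."""
        if is_pii:
            self.marked[data_type] += 1
            if mask:
                self.masks[data_type][mask] += 1
        else:
            self.rejected[data_type] += 1

    def recommend(self, data_type):
        """Return a recommendation dict, or None if there is not enough evidence."""
        yes = self.marked[data_type]
        total = yes + self.rejected[data_type]
        if total < self.min_votes or yes / total < self.min_ratio:
            return None
        mask_counts = self.masks[data_type]
        mask = mask_counts.most_common(1)[0][0] if mask_counts else None
        return {"data_type": data_type, "pii": True, "mask": mask}


rec = PiiRecommender()
rec.record("email", True, "hash")
rec.record("email", True, "hash")
rec.record("email", True, "redact")
rec.record("zip_code", False)

print(rec.recommend("email"))
print(rec.recommend("zip_code"))
```

Each accept or reject is fed back through `record`, so later calls to `recommend` reflect the accumulated decisions. A real model would presumably also use column contents and metadata rather than counts alone.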
Tableau NLP Package
In our previous release, we added support for cataloging metadata from Tableau servers, along with search across Tableau projects, workbooks, worksheets, and dashboards on multiple Tableau servers and sites. In this release, we now support discovering Tableau metadata through an NLU interface. The system can answer the most commonly asked questions, such as “show me all the views” or “give me the most popular views.”
“Did you mean” recommendations for correcting NLP search questions
Sometimes a search doesn’t return the right results because the question was misspelled. To improve the accuracy of search results, we now make recommendations about search questions. If the intent (verb) or object (subject) doesn’t match any entity in our current knowledge graph, we recommend the closest possible match. This recommendation helps users correct spelling problems and obtain the desired search results.
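As a rough illustration of closest-match recommendations, the sketch below uses Python's standard `difflib.get_close_matches` to suggest a corrected question. It is only an approximation of the idea: the `INTENTS` and `OBJECTS` lists are hypothetical stand-ins for entities in the knowledge graph, and the matching technique actually used by the engine is not described in this post.

```python
import difflib

# Hypothetical vocabulary standing in for knowledge graph entities.
INTENTS = ["show", "list", "count", "find"]
OBJECTS = ["datasets", "jobs", "workflows", "views", "dashboards", "workbooks"]


def did_you_mean(question, cutoff=0.7):
    """Return a corrected question, or None if no word was corrected."""
    vocab = INTENTS + OBJECTS
    corrected, changed = [], False
    for word in question.lower().split():
        if word in vocab:
            corrected.append(word)
            continue
        # Closest known term by similarity ratio; empty list if nothing is close enough.
        match = difflib.get_close_matches(word, vocab, n=1, cutoff=cutoff)
        if match:
            corrected.append(match[0])
            changed = True
        else:
            corrected.append(word)  # leave unknown filler words untouched
    return " ".join(corrected) if changed else None


print(did_you_mean("shwo me all dashbords"))  # prints "show me all dashboards"
print(did_you_mean("show all views"))         # prints "None"
```

The `cutoff` value controls how aggressive the suggestions are: a lower cutoff corrects more misspellings but risks rewriting ordinary words into catalog terms.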
Stay tuned for the exciting new features planned for upcoming releases from Unifi AI Labs.