On February 15, Unifi Software and 451 Research spoke together on a webinar discussing why a data catalog is the first step on a journey to self-service data. Before an organization can benefit from a self-service data environment to reduce time to insight, an essential first step is to catalog your data so that business users can discover what’s important to them. Seamlessly wrapping other self-service data tools around a comprehensive data catalog is the remainder of the journey to faster business insights.

This webinar discussed approaches that enable self-service data for analytics, from discovery to insight, including making the most from a data lake and standing up new data processing pipelines for achieving business success.

You can watch a recording of the full webinar here. Continue reading for answers to questions asked during this webinar.

Our data isn’t just stored in a data lake – it’s highly dispersed. What’s the best approach to discover what data you have and to catalog it without having to move your data to explore it?


At Unifi, we take a broad view of data. Our recommendation is to leave data where it is and, using data virtualization, connect to and catalog that information so you get everything you need in a single, searchable view. This lets you look at all of your data without moving it. For example, you don’t end up populating your data lake with data you don’t need just to see what’s in there. Data virtualization with our embedded AI also lends itself well to Natural Language Query on top of the data, which we support, and you get much broader value out of your organization because you’re accessing all of your data.
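As a rough sketch of the "catalog without moving data" idea, the snippet below uses two in-memory SQLite databases as stand-ins for real remote sources (all names are illustrative, and this is not Unifi's API). It reads only schema metadata from each source and builds a single searchable view, copying no rows:

```python
import sqlite3

# Two illustrative "remote" sources (hypothetical stand-ins for real systems).
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT, region TEXT)")

sales = sqlite3.connect(":memory:")
sales.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)")

def catalog_source(name, conn):
    """Read only schema metadata from a source -- no rows are copied out."""
    entries = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
        entries.append({"source": name, "table": table, "columns": columns})
    return entries

# A single, searchable view over both sources, built from metadata alone.
catalog = catalog_source("crm", crm) + catalog_source("sales", sales)
```

The point of the design is that discovery needs only the schemas; the data itself stays in place until a user actually asks for it.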

Could you please speak to this comment from a recent BI and Analytics review:


“An analytics team could leverage a data lake to quickly compare data sets and create reports, but deploying this to the rest of an organization will not work. With the disparate nature of data between applications at your organization along with the various ways in which data needs to be joined, a data lake and the on-demand data model idea won’t hold up to the reporting and self-service needs of your organization.”

[Matt] You need to ensure that data is delivered to the right people, with the right access, for the right purpose. If you just put all your data in a data lake and give everyone access, that approach is not going to deliver success. The catalog, along with catalog-based governance, is the tool that enables you to identify what data you have, who should have access to it and for what purpose, and to deliver it accordingly.

[Andy] Defining a clear data governance policy of who gets to see what data and under what circumstances, and being able to audit who has accessed that data, when they accessed it, and what has been done with it, is a critical part of the governance and security aspect of a self-service data environment.

Are data quality rules and business rules managed in this tool? Do you integrate with engines like IDQ and QLik for BI reporting?


Data quality is absolutely core to the Unifi Data Platform. Understanding whether a data source has been certified or qualified by someone is very important. Crowd-sourced data quality is one of the key things that gets delivered when your data catalog has a collaboration and community aspect, because people share information there. You can highlight and recommend trusted data sets that people should use.

Unifi is independent of BI tools. Being able to use a wide variety of BI tools is important – we know that no company uses just one. For IT this is especially critical, because the outbound data delivery has to be structured in such a way that a BI tool can consume it. Being able to use a data platform that can feed any BI tool is really key.

Which BI tools have been successful in your business cases?


Almost every client we have on the Unifi Data Platform has more than one BI tool connected to it – exporting a Tableau workbook, a Qlik file, Excel, a Hive table, or whatever format is needed.

Our clients don’t always connect to a BI tool. One of our credit union clients runs Member 360 initiatives through Unifi on Azure and uses Power BI as a visualization tool, but the output of Unifi also feeds nightly directly into their Dynamics CRM system, which becomes their single source of truth for account data.

What metadata are you sending to a Data Catalog? All technical metadata from a Data Lake?


We connect to a data lake, and any data attributes stored there will be pulled into Unifi. We scrape all of the header data, derive additional insights around that data with our AI, and provide it to the business user; that data then becomes searchable. Any of those terms can be found with a Google-like search in Unifi. This is part of the shared knowledge experience: if you’ve searched for sales data, for example, we show all of it from any data source, irrespective of where it’s located.
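To illustrate the Google-like search idea, here is a small, hypothetical Python sketch. The catalog entries and field names are invented for illustration and are not Unifi's actual metadata model; the search simply matches a term against scraped table and column names across every source:

```python
# Hypothetical catalog entries: source, table name, and column headers
# as they might be scraped from a data lake and a warehouse.
catalog = [
    {"source": "lake/finance",   "table": "q1_sales",  "columns": ["region", "sales_total", "date"]},
    {"source": "warehouse",      "table": "customers", "columns": ["id", "name", "region"]},
    {"source": "lake/marketing", "table": "campaigns", "columns": ["campaign", "spend", "sales_lift"]},
]

def search(catalog, term):
    """Return every entry whose table or column names mention the term,
    irrespective of which source it lives in."""
    term = term.lower()
    hits = []
    for entry in catalog:
        haystack = [entry["table"], *entry["columns"]]
        if any(term in field.lower() for field in haystack):
            hits.append(entry)
    return hits

results = search(catalog, "sales")
```

Here a search for "sales" surfaces both the finance table (named `q1_sales`) and the marketing table (which has a `sales_lift` column), even though they live in different sources.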

Can you describe what catalog-based governance is?


The catalog is the focal point for identifying what data you have within your data lake and across your larger enterprise environment, along with the rules and policies that determine who has access to it and when it should be accessed.
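As an illustrative sketch of catalog-based governance (the policy tuples, function names, and audit fields below are assumptions for the example, not Unifi's implementation), access rules can live alongside the catalog entries, with every request recorded for audit:

```python
from datetime import datetime, timezone

# Hypothetical access policies attached to catalog entries:
# (role, dataset, purpose) combinations that are explicitly allowed.
POLICIES = {
    ("analyst",  "customer_accounts", "reporting"),
    ("engineer", "clickstream",       "pipeline-testing"),
}

audit_log = []  # who requested what, when, and for what purpose

def request_access(role, dataset, purpose):
    """Check the request against policy and record it either way."""
    allowed = (role, dataset, purpose) in POLICIES
    audit_log.append({
        "role": role,
        "dataset": dataset,
        "purpose": purpose,
        "allowed": allowed,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    return allowed

request_access("analyst", "customer_accounts", "reporting")   # allowed
request_access("analyst", "customer_accounts", "marketing")   # denied, but still audited
```

Note that denied requests are audited too, which is what makes the "who accessed what, when, and why" question answerable later.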

Isn’t buying best-of-breed products for catalog, data prep, and governance better? What are some of the most common challenges you hear about implementing best-of-breed products?


A single environment is more strategic for deploying across the organization for multiple use cases. The advantages of having those components tightly integrated and delivered as a single environment are likely to outweigh any advantages a company may gain from stitching together best-of-breed products.

Does your catalog integrate AI capabilities for searching for and suggesting business terms?


A business glossary, and being able to decipher it, is a critical aspect of getting intelligence out of your data catalog. Legacy datasets, for example, will often have obscure terms for data sources. The AI within our data catalog is extremely robust: we’re able to recognize types of data from existing glossaries or business terms while still maintaining the original attribute within the data set, and to bring that information into a unified view. You also have the ability to re-label or redefine a description, if you’ve been given permission to do so, while maintaining its original attribute.

Is any one pillar more important than the others?


[Matt] All the pillars are part of the virtuous circle of capabilities within these environments. The underlying governance provides the management of data throughout its lifecycle; the catalog provides an inventory of data in the estate and enables users to do self-service discovery and identification of data; then users do the transformation, data preparation, and automation of the data; and the AI capabilities improve these efficiencies. All of these pieces working together create a self-service managed environment.

[Andy] In many instances we’re seeing companies start with a data catalog, because it’s really hard to have a complete view of all your company data and to know what’s important, relevant, and accurate. That said, we agree: there isn’t any one aspect that rises above the others. The value of this seamless integration is that it can ease a lot of IT processes, such as moving data for discovery, setting up specific environments to do processing, and setting up environments to consume data using certain BI tools. Often referred to as the plumbing of data, it’s essential, and it’s what we are delivering.

Is the AI capability within the catalog to automatically scan the data landscape for terms? How much configuration is involved for setting up the catalog?


Unifi is now available directly from the Azure Marketplace and you can go and try that for free.

The environment to try Unifi is built automatically, and you end up with a login UI. We are a browser-based application, so you can log in from any browser. You start by selecting the data connectors you want, and if you have permission to connect to that data, our AI starts crawling it and pulling back the metadata. Once you see the data catalog being populated, you can click on those data sources and immediately start to interrogate the data.

For the Azure trial, how does Unifi access private, on premises data?


We can run on premises, in the cloud, and in hybrid environments. For the Azure trial, as with any other data access setup, you will need support from your IT team to set up tunnels to access on-premises data.