Intelligent Data Virtualization and Freedom of Choice for Data Scientists
Data scientists must deliver accurate, agile insights that enhance company performance. At the same time, they face a host of technical barriers involving legacy data silos, security, data governance and a growing variety of business intelligence (BI) tools. Oddly enough, the proliferation of BI tools itself presents organizations with a challenge: each business unit may have selected and implemented its own favorite. Indeed, according to Forbes, 60 to 70 percent of business functions already use two or more BI tools.
But forcing users onto a single, enterprise-wide BI solution is not the answer, either, since that would require retraining and almost certainly cause user adoption issues. The real answer is for companies to let end-users choose the BI tool they prefer, and then make it possible for all these tools to connect to the necessary data.
For this strategy to work, organizations need intelligent data virtualization — a new approach to consolidating enterprise data warehouses that improves performance, data quality and convenience for data scientists. Intelligent data virtualization eliminates the technical barriers associated with orchestrating data platforms and ensuring security and proper governance, not to mention the differences in query language from one BI tool to the next. By setting up an intermediary system between the BI tools used by data scientists and the data they must query, enterprises create a data experience that is tool-agnostic.
Virtualizing Shared Queries and Business Logic
Some companies try to achieve the same ends with software-defined data storage, but that is not enough. You still need translation and business logic tools that facilitate access to data. That’s what intelligent data virtualization gives you.
BI tools tend to have their own dialects of standard or proprietary query languages, such as SQL and MDX. When users working with the same data get different answers to their queries due to subtle differences in how their BI tools query data or how the calculations were written in the BI tools themselves, you have a serious problem. In fact, that slight difference in results could turn into significant losses or missed opportunities for the enterprise.
Intelligent data virtualization solves this problem by creating a translation interface that bridges the gap between BI tools and data. This interface standardizes each BI tool's queries before they are applied to the data. The result? Consistent answers across BI tools.
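To make the idea concrete, here is a minimal sketch of dialect standardization. The rewrite table and query strings are illustrative assumptions, not any vendor's actual implementation; real virtualization layers parse full SQL and MDX grammars rather than doing string substitution.

```python
# Hypothetical rewrite table: dialect-specific expressions -> one canonical form.
# (Illustrative only; a real translation layer parses the full query grammar.)
CANONICAL_REWRITES = {
    "DATEPART('year', ": "EXTRACT(YEAR FROM ",  # one tool's SQL dialect
    "YEAR(": "EXTRACT(YEAR FROM ",              # another tool's helper function
}

def normalize_query(query: str) -> str:
    """Rewrite dialect-specific expressions into one canonical dialect,
    so the same logical question produces the same physical query."""
    for dialect_expr, canonical in CANONICAL_REWRITES.items():
        query = query.replace(dialect_expr, canonical)
    return query

# Two BI tools phrase the same question differently...
q1 = normalize_query("SELECT DATEPART('year', order_date), SUM(sales) FROM orders")
q2 = normalize_query("SELECT YEAR(order_date), SUM(sales) FROM orders")
# ...but both reach the database as the identical canonical query.
assert q1 == q2 == "SELECT EXTRACT(YEAR FROM order_date), SUM(sales) FROM orders"
```

Because every tool's query is funneled through the same canonical form, subtle dialect differences can no longer produce different answers to the same question.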
Intelligent data virtualization’s advantages become even more apparent when joining disparate databases or when working with very large datasets.
In the case of joining databases, for example, you can run into trouble when the data are mismatched. Perhaps the tables contain similar columns that use different units or levels of granularity, or the columns reflect different time periods or fiscal quarters. How does the data scientist choose the correct way to reconcile the relevant data?
Intelligent data virtualization addresses that question directly by using a metadata overlay to automatically normalize data when disparate databases are joined from on-premises, cloud-based or hybrid sources. Rather than requiring data scientists to extract datasets and join them manually (a time-consuming and error-prone process), intelligent data virtualization manages joins and transformations automatically in the background.
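A minimal sketch of what such a metadata overlay might look like, under hypothetical assumptions: two revenue columns stored in different units, with the overlay recording each column's unit so the join can normalize automatically. The table names, units and conversion rules are illustrative, not any product's actual schema.

```python
# Hypothetical metadata overlay: one entry per physical column.
METADATA = {
    "us_sales.revenue": {"unit": "USD"},
    "eu_sales.revenue": {"unit": "kUSD"},  # stored in thousands of dollars
}

# Known unit conversions the overlay can apply during a join.
CONVERSIONS = {("kUSD", "USD"): lambda v: v * 1000}

def normalize(table: str, column: str, value: float, target_unit: str) -> float:
    """Convert a value to the target unit using the metadata overlay."""
    unit = METADATA[f"{table}.{column}"]["unit"]
    if unit == target_unit:
        return value
    return CONVERSIONS[(unit, target_unit)](value)

# Joining the two sources: every row lands in the same unit automatically,
# with no manual extract-and-convert step by the data scientist.
rows = [("us_sales", 2500.0), ("eu_sales", 3.2)]
normalized = [normalize(t, "revenue", v, "USD") for t, v in rows]
assert normalized == [2500.0, 3200.0]
```

The design point is that the conversion knowledge lives once, in shared metadata, rather than being re-derived (and possibly re-derived differently) by each analyst.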
Query Performance and Data Discovery
Beyond providing a universal interface between BI tools and data and allowing frictionless joining of disparate datasets, intelligent data virtualization delivers two more powerful benefits to BI users: it significantly increases query performance, and it makes enterprise data comprehensively discoverable by data scientists.
In a database with billions of rows of data, queries may take hours or days to return results. With query acceleration, data scientists can analyze more data more quickly and efficiently. Intelligent data virtualization accelerates queries by generating intelligent data aggregates featuring only the data relevant to the query. This speeds up query performance 5x to 100x, depending on the data involved.
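The acceleration technique described above can be sketched in miniature. The tiny in-memory "fact table" and routing logic are illustrative assumptions; a real system builds aggregates from observed query patterns over billions of rows, but the principle is the same: answer from a small precomputed summary instead of scanning every row.

```python
from collections import defaultdict

# Stand-in for a huge fact table: (region, product, sales).
fact_table = [
    ("EMEA", "widget", 100), ("EMEA", "gadget", 50),
    ("APAC", "widget", 75),  ("APAC", "gadget", 25),
]

# Build the aggregate once, keyed by the dimension queries actually use.
sales_by_region = defaultdict(int)
for region, _product, sales in fact_table:
    sales_by_region[region] += sales

def total_sales(region: str) -> int:
    """Answer from the small aggregate instead of scanning every fact row."""
    return sales_by_region[region]

assert total_sales("EMEA") == 150
assert total_sales("APAC") == 100
```

On four rows the saving is invisible; on billions of rows, routing a query to an aggregate that contains only the relevant slice of data is where the large speedups come from.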
Finally, because intelligent data virtualization connects data regardless of source — from on-premises data warehouses to cloud data platforms to unstructured data in a data lake — it creates a central, virtual space data scientists can use to discover datasets. Data scientists are thus given broad, secure, rights-managed access to the panoply of data within the organization’s enterprise data warehouse.
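One way to picture that central virtual space is as a unified catalog spanning every source. The source names and datasets below are hypothetical; the point is that discovery happens in one place, regardless of where the data physically lives.

```python
# Hypothetical catalog: each registered source lists its datasets.
CATALOG = {
    "warehouse": ["orders", "customers"],            # on-premises warehouse
    "cloud_dw": ["web_events", "ad_spend"],          # cloud data platform
    "data_lake": ["clickstream_raw", "orders_raw"],  # unstructured data lake
}

def discover(keyword: str) -> list:
    """Return fully qualified dataset names matching a keyword, any source."""
    return sorted(
        f"{source}.{name}"
        for source, datasets in CATALOG.items()
        for name in datasets
        if keyword in name
    )

# One search spans the warehouse, the cloud platform and the lake at once.
assert discover("orders") == ["data_lake.orders_raw", "warehouse.orders"]
```

In a real deployment, the catalog entries would also carry the security and rights-management metadata mentioned above, so discovery never exposes data a user is not entitled to see.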
Intelligent data virtualization closes the loop for data scientists by giving them clean, governed data with which to create feature recommendations and predictions, for example, and then enabling them to share those recommendations and predictions with hundreds or thousands of other BI users. Without virtualization, data scientists must rely on data extracts or CSV files, two options that are simply not scalable.
BI Platform Choice and Better Business Insights
The benefits of intelligent data virtualization make it an attractive choice for all BI users. At the same time, these benefits create an environment that not only encourages freedom of BI platform choice, but also produces a shared understanding of data among data scientists. Wide visibility into data, agile query performance, and reliably consistent query results enable data scientists to achieve a more nuanced understanding of their data relative to the company's business objectives and, ultimately, realize an ever greater impact on those objectives.
Dave Mariani is chief strategy officer of AtScale.