
How Data Virtualization is Changing Enterprise Data Architecture

Martino Corbelli on 23 April 2019
5 minute read
Introduction

Data virtualization allows an application to retrieve and manipulate data in order to create a single view of the overall data. It does this without requiring technical details about the data, such as how it is formatted at source or where it is physically located. Unlike ETL processes, the data remains in place and does not need to be moved or replicated, enabling real-time access to IT system and application databases. Data virtualization usually stores only metadata for the virtual views and the integration logic.

As a cost-effective, flexible and agile approach to sharing information, data virtualization compares favourably with traditional data processes and storage systems – such as data warehouses, data lakes, data vaults etc. It can efficiently bridge data across all these systems without having to create a whole new layer of physical infrastructure, making it complementary to all existing data sources and operations.

What this means is that enterprise data becomes more available, accessible and beneficial to all. As an alternative to ETL and data warehousing, data virtualization can produce quick and timely insights from multiple sources, without embarking on a major data project to produce extensive data flows and storage facilities. However, it can also be extended and adapted to serve both ETL and data warehousing.

Not to be confused with data visualization, which is the graphical display of data as charts, graphs, maps, reports and so on, data virtualization can be defined as middleware providing data services to data visualization tools or other applications. The extract, transform and integrate process therefore happens virtually, allowing distributed databases and data stores to be accessed and viewed as a single database. For example, bringing together data from your NetSuite ERP system, your Salesforce CRM and your Workday HR applications is not a new problem, but it is a simple example of a challenge most businesses face daily.
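
To make the idea concrete, here is a minimal sketch in Python rather than any particular product's API. The source names, fields and shared customer identifier are invented for illustration; the point is that the view holds only metadata and integration logic, and assembles a unified result from live sources only at query time.

from typing import Callable, Dict, List

Record = Dict[str, object]
Fetcher = Callable[[], List[Record]]  # stands in for a live connector (API or database call)

class VirtualView:
    """Holds only integration metadata; never stores the data itself."""

    def __init__(self, sources: Dict[str, Fetcher], join_key: str, fields: List[str]):
        self.sources = sources      # logical name -> live connector
        self.join_key = join_key    # shared identifier used to combine records
        self.fields = fields        # the columns the unified view exposes

    def query(self) -> List[Record]:
        # Fetch current records from every source at access time and merge them
        # on the shared key, so consumers always see up-to-date values.
        merged: Dict[object, Record] = {}
        for name, fetch in self.sources.items():
            for record in fetch():
                key = record[self.join_key]
                row = merged.setdefault(key, {self.join_key: key})
                row.update({f"{name}.{k}": v for k, v in record.items() if k != self.join_key})
        return [{f: row.get(f) for f in [self.join_key] + self.fields} for row in merged.values()]

# Hypothetical connectors standing in for NetSuite (ERP), Salesforce (CRM) and Workday (HR).
erp = lambda: [{"customer_id": 1, "open_orders": 3}]
crm = lambda: [{"customer_id": 1, "account_owner": "J. Smith"}]
hr = lambda: [{"customer_id": 1, "assigned_team": "Enterprise"}]

view = VirtualView(
    sources={"netsuite": erp, "salesforce": crm, "workday": hr},
    join_key="customer_id",
    fields=["netsuite.open_orders", "salesforce.account_owner", "workday.assigned_team"],
)
print(view.query())  # one unified row, assembled on demand from three live sources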

Lowering the cost and complexity of data integration

Large infrastructure is not required to enable data virtualization, yet it satisfies internal customers who need to be served with data in many ways:

  • Real-Time Access: Data is immediately available when it is updated or changed at source
  • Zero Replication: Because data is not moved or copied but is connected to other sources, regardless of location, the speed at which users can access data is greatly improved
  • Abstraction: Non-technical business users can access data and need not be concerned where it resides
  • Agility: Promotes fast data across multiple consuming applications, where changes can be made without impacting the business

Gartner Research has predicted major savings with data virtualization in its report Predicts 2017: Data Distribution and Complexity Drive Information Infrastructure Modernization, which states: “Organizations with Data Virtualization capabilities will spend 40 percent less on building and managing data integration processes for connecting distributed data assets.”

Data virtualization will help reduce the cost of data management, and in doing so make it more accessible to smaller organizations. Gartner goes on to project that by 2020, 35 percent of enterprise organizations will implement data virtualization as an alternative to traditional methods of data integration.

Data virtualization can be considered a technology, whereas a data warehouse is an architecture. Both can work together: data virtualization may be used as part of a logical data warehouse architecture, for example, but it also has many more use cases. Because it abstracts away the peculiarities of specific data sources, advanced data federation across private and public clouds is one of those use cases.
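
As a rough illustration of that federation idea, the sketch below (Python, with invented source names, regions and connection strings) shows how a virtualization layer can keep a catalog of logical names that resolve to sources sitting in different clouds or on-premises systems. Real platforms add security, caching and query push-down on top of this, so treat it as a conceptual outline only.

from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class SourceEndpoint:
    location: str   # e.g. a public cloud region or an on-premises data centre
    uri: str        # connection string or API endpoint (illustrative placeholders only)

class FederatedCatalog:
    """Maps logical source names to wherever the data currently lives."""

    def __init__(self) -> None:
        self._sources: Dict[str, SourceEndpoint] = {}

    def register(self, logical_name: str, endpoint: SourceEndpoint) -> None:
        # Consumers only ever see the logical name; the physical location can
        # change without touching any downstream application.
        self._sources[logical_name] = endpoint

    def resolve(self, logical_name: str) -> SourceEndpoint:
        return self._sources[logical_name]

catalog = FederatedCatalog()
catalog.register("sales_orders", SourceEndpoint("aws-eu-west-1", "postgresql://sales-db.example/orders"))
catalog.register("customer_master", SourceEndpoint("on-prem-dc2", "oracle://crm.internal/customers"))
catalog.register("web_clickstream", SourceEndpoint("azure-uk-south", "abfss://events@lake.example/clicks"))

# An analyst's tool asks for "customer_master" and never needs to know that the
# answer currently comes from an on-premises Oracle database.
print(catalog.resolve("customer_master"))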

Other use case examples include:

  • Advanced-analytics resource
  • Ability to ingest information originating in any source, format and schema
  • Seamless interoperability between legacy back-end systems
  • Agile capability to aggregate and process any dynamic mix of at-rest and in-motion information

Big data – virtualized

Whatever your enterprise data architecture includes, in terms of your mix of storage and server platforms, geographical locations and multiple cloud environments, having an abstraction layer ensures unified access, modeling, deployment, optimization and management of big data as a heterogeneous resource. Abstracting your data from its underpinnings in this way gives any big data cloud storage the opportunity to realise its potential.

Further, the hybrid IT environments commonly deployed by organizations will logically lead them down the data virtualization path and help them establish common data models in the context of a complex business environment. Unless all your big data sits in a single public cloud service, it will be necessary to virtualize access across public, private and hybrid cloud architectures.

We need a new approach, not a new architecture

The exponential growth in data volume and diversity, data silos, inherent data latency and the high operating cost of delivering the right information to the right user at the right time will see data virtualization become a key part of the intelligent core described by IDC, which identifies unifying data and applications as the key to digital transformation.

The modern data landscape is being shaped by the growing number of sources, such as enterprise and consumer apps, the web, third parties, machines, social media and IoT (Internet of Things). This means organizations are faced with integrating and analysing a diverse mix of old and new data, existing in various structured, semi-structured and unstructured formats.
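
To illustrate what integrating that mix involves, here is a small Python sketch, with all field names and payload shapes invented, in which records arriving as CSV rows and as nested JSON documents are mapped onto one common schema before being exposed through the virtual layer.

import csv
import io
import json
from typing import Dict

COMMON_FIELDS = ("entity_id", "metric", "value")  # the single schema the virtual layer exposes

def from_csv_row(row: Dict[str, str]) -> Dict[str, object]:
    # Structured: a row from a legacy CSV export (column names are invented).
    return {"entity_id": row["id"], "metric": row["measure"], "value": float(row["val"])}

def from_json_doc(doc: str) -> Dict[str, object]:
    # Semi-structured: a nested JSON payload, e.g. from an IoT device (shape is invented).
    parsed = json.loads(doc)
    return {"entity_id": parsed["deviceId"],
            "metric": parsed["sensor"]["type"],
            "value": parsed["sensor"]["reading"]}

csv_source = io.StringIO("id,measure,val\nA17,temperature,21.5\n")
records = [from_csv_row(r) for r in csv.DictReader(csv_source)]
records.append(from_json_doc('{"deviceId": "A17", "sensor": {"type": "humidity", "reading": 0.46}}'))

for r in records:
    assert set(r) == set(COMMON_FIELDS)  # every record now fits the same virtual schema
print(records)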

Data is generated and integrated in batch as well as in real time, and all of it must be managed and put somewhere. Enterprise data architecture is struggling to keep up, and even the largest enterprises can’t spend time and resources rolling out new infrastructure every time a new data management concept comes along to solve one data problem. The challenge is multi-dimensional, and any solution has to be multifaceted in order to adapt.

Putting users first

Serving data to knowledge workers efficiently, allowing them to do their work effectively, is the key driver of the future innovations that will accelerate business success in every economy. Under the growing pressures of data volume and complexity, what businesses are crying out for is the agility to act on and adapt to the insights they garner from their information. To establish these ways of working, they need the ability to:

  • Immediately share data between company departments in real time
  • Easily join and view data from multiple sources
  • Transform and prepare data ready for the next stage
  • Avoid costly development and use of technical teams for every integration
  • Reduce the overhead of expensive storage systems, such as data warehouses, data lakes, data marts, data vaults etc.

All of the above business imperatives are difficult and expensive to deliver without data virtualization, if indeed possible at all. In the past we have focused so much on solving the data problem that the user was barely a consideration.

Now that data must flow and be touched by many people in an enterprise, we are seeing the need to provision self-service capabilities for data owners and their knowledge workers. By understanding both enterprise data architecture and the needs of business users, data virtualization puts their requirements ahead of the technology. This will help drive real change and extend the benefits and lifecycles of existing architectures.

A lot has been written about the data warehouse being dead, and yet many are still in operation and will remain so without being rendered obsolete, thanks in part to technologies like data virtualization. Enterprise data architecture, by its very nature, is an expensive and time-consuming endeavour that often results in rigid structures requiring lots of support and maintenance to run. Data virtualization, on the other hand, is a modernization opportunity in a world of constant change, where businesses demand the agility that lets large and small companies alike act, adapt and compete on a more level playing field.

Author
Martino Corbelli

Martino Corbelli, Chief Product Officer at SPINR, explores how integration-Platform-as-a-Service (iPaaS) technology can simplify the data complexities barring the way to digital transformation.

About SPINR

SPINR provides real-time data and insight to any organization. We make the process of integrating, cleansing and sharing data simple, efficient and accessible to all.