In a digital world, where companies of all sizes are processing more data than ever before, data operations (DataOps) can help ensure that data is managed in the most effective way possible, delivering accelerated value to the business. But what is DataOps?
Not to be confused with DevOps
You’re probably already familiar with DevOps – a set of practices combining software development and IT operations to make the development lifecycle as fast and efficient as possible. DevOps improves interaction between developers and operational IT staff, and works particularly well with agile communication techniques, which tighten interactions between developers and customers, two groups that do not usually work closely together.
Both DevOps and agile practices try to mitigate the risk of building the wrong thing. They help developers respond to customer feedback and deploy bug fixes as quickly as possible, among other things. As DevOps teams mature, they gravitate towards standardized platforms. Typically, organizations start off with a developer-led culture and siloed teams. As they integrate DevOps practices, they become more service-oriented, creating cross-discipline teams and standard operational definitions which are applied across the company.
Ultimately, DevOps can lead to organizations developing platform features for all stakeholders to work from, so that shared scalability services are in operation across the business.
Same philosophy, different context
DevOps helps organizations to accelerate the benefits of technology and data by harnessing all the necessary skills, supporting tools and working methodologies and focusing them on delivery. DataOps aims at the same outcomes – but where DevOps is focused on improving interaction between developers and operational IT, DataOps is focused on improving interaction between customers, analysts and engineers.
DataOps blends some DevOps and agile concepts to ensure analytical and operational data is a high-quality asset. It seeks to answer questions like:
- How do I know the data I’m using is correct?
- Is data or the algorithm/model stale?
- Can I see how an aggregation was computed? (e.g. why was a patient categorized as ‘High Risk’?)
- How do I even find the right datasets to work with?
- How can I ensure all data conforms to a standard or Common Data Model that can be accessed and utilized by all relevant stakeholders?
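To make a couple of these questions concrete, here is a minimal Python sketch of two automated checks: whether a record conforms to a shared model, and whether it is stale. Everything in it is an assumption made for illustration (the `PATIENT_MODEL` fields, the one-day freshness window and the sample records); it is not taken from any particular DataOps tool.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical common data model: field name -> expected Python type.
PATIENT_MODEL = {"patient_id": str, "risk_score": float, "updated_at": datetime}


def conforms(record, model):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for field, expected_type in model.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"wrong type for {field}")
    return problems


def is_stale(record, now, max_age=timedelta(days=1)):
    """A record is stale when its last update is older than max_age (assumed: one day)."""
    return now - record["updated_at"] > max_age


# Illustrative records only; "now" is fixed so the example is repeatable.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
good = {"patient_id": "p-001", "risk_score": 0.82, "updated_at": now - timedelta(hours=3)}
bad = {"patient_id": "p-002", "risk_score": "high", "updated_at": now - timedelta(days=5)}

for record in (good, bad):
    print(record["patient_id"], conforms(record, PATIENT_MODEL), is_stale(record, now))
```

In practice, checks like these would run automatically whenever data lands, so that the answer to “is this data correct and current?” is recorded rather than guessed.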
Making data enablement possible
DataOps can be a mechanism for self-service data innovation, enabling platforms for the entire data lifecycle and empowering users to interact with and enrich their organizational data. This means that practitioners can leapfrog the decades of DevOps introspection and create stable, dependable data platform services. There is even a DataOps Manifesto, which states that the concept aims to provide an environment for emergent data use and allow as many people as possible to use that environment. In other words, DataOps is about opening up the possibilities of harnessing data to all stakeholders within an organization.
The manifesto has 18 principles, including:
- Continually satisfy your customer
- Value working analytics
- Embrace change
- It’s a team sport
- Daily interactions
- Analytics is code
- Make it reproducible
Other important aspects include monitoring quality and performance as well as avoiding repetition of previous work. Repeatability or reuse is key here to improving cycle times. Let’s pick out one of the key manifesto principles to expand upon. “Orchestrate” is described as: “The beginning-to-end orchestration of data, tools, code, environments, and the analytic team’s work is a key driver of analytic success.” This encapsulates the broader themes of the manifesto into a single primary definition.
The manifesto also stipulates that the environment should be ‘observable’ – recording lineage, provenance and telemetry – and that governance and policy functions should use this instrumentation. Data should also be composable, meaning that users should be able to layer and mash up authoritative datasets and use common data models. Where possible, tools and automation should be used for deploying, updating and administering the environment, so as to reduce the effort required to run a DataOps function.
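As a rough illustration of how orchestration and observability fit together, the Python sketch below runs a list of steps in order and records a simple lineage entry for each one. The `run_pipeline` helper, the two step functions, the 0.8 risk threshold and the sample patients are all hypothetical; a real DataOps platform would capture far richer lineage and telemetry.

```python
import time


def run_pipeline(data, steps):
    """Run steps in order, returning the result plus one lineage entry per step."""
    lineage = []
    for step in steps:
        started = time.perf_counter()
        rows_in = len(data)
        data = step(data)
        lineage.append({
            "step": step.__name__,  # which transformation ran
            "rows_in": rows_in,     # how many rows it received
            "rows_out": len(data),  # how many rows it produced
            "seconds": time.perf_counter() - started,  # basic telemetry
        })
    return data, lineage


def drop_missing_scores(rows):
    """Remove rows that have no risk score."""
    return [r for r in rows if r.get("risk_score") is not None]


def flag_high_risk(rows, threshold=0.8):
    """Label each row; the 0.8 threshold is an assumed value for illustration."""
    return [
        {**r, "category": "High Risk" if r["risk_score"] >= threshold else "Standard"}
        for r in rows
    ]


# Illustrative data only.
patients = [
    {"patient_id": "p-001", "risk_score": 0.82},
    {"patient_id": "p-002", "risk_score": None},
    {"patient_id": "p-003", "risk_score": 0.40},
]

result, lineage = run_pipeline(patients, [drop_missing_scores, flag_high_risk])
for entry in lineage:
    print(entry["step"], entry["rows_in"], "->", entry["rows_out"])
```

With a log like this, a question such as “why was a patient categorized as ‘High Risk’?” can be traced back to the step that assigned the label and the data that step received.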
Developing a DataOps function
All this sounds very positive, right? So how do you get started on developing DataOps in your organization?
A typical DataOps team incorporates data analysts, who prepare and wrangle data for operations and analysis; data scientists, who find new and emergent patterns in data; and data engineers, who create stable, scalable data pipelines. You may already have some or all of these roles within your company, or you may be able to outsource some of these functions to a third party. Then there are customers, both internal and external, who focus on defining data requirements and consuming data.
People are just one part of the equation, and it is the convergence of people, process and technology that is needed to run your DataOps. To help the DataOps team succeed in their endeavours, there are specific tools available, which are generally more advanced technologies than those used in DevOps and agile methodologies. Self-service tooling for data discovery (search), data transformation and preparation, warehousing, data lake, AI/ML and API management can all play vital roles in developing your DataOps. This advanced tooling can lead to the blurring of roles and a more emergent data ecosystem.
Clicks and code
We developed SPINR to be one of these advanced self-service platforms: it can accelerate development time for technical teams, while also allowing business users to be citizen integrators, promoting the democratization of data. Our product design and future roadmap fully support the DataOps Manifesto and all its key principles.