What is a data infrastructure strategy?
I’m often asked to unpack ‘data’. There’s a lot of confusion and, often, people think it’s a function of ‘IT’ and just for the geeks.
A ‘data strategy’ can often be quite shallow in its thinking and tends not to focus on data sharing—which is a cultural shift in the way collaboration needs to work.
Collaboration in a data-enabled business needs to be about how humans and machines can combine their intelligence and systems to accelerate and optimise solutions to problems. At the core of this is the ability to easily access, share and use systems internally and externally.
Even where there’s an understanding that data should be used at a strategic level, it’s rare to find a good understanding of data strategy at C-level. Decisions are often deferred to the CIO or CTO when, in my view, they should be defined collaboratively by the whole team, aligned with business targets, and tied to explicit contributions to ROI.
For example, I would like to think that the age of “build a data lake” or “let’s build a portal” as a starting point is over. However, my observation is that we are still very far from that.
We need to begin by thinking of data as part of our infrastructure and, with this, thinking about how it can be shared between systems regardless of where those systems are.
The Open Data Institute (ODI) frames data infrastructure as the “systems, processes, and tools that enable the creation, management, and use of data.” This includes both physical infrastructure (e.g. servers and storage systems) and virtual infrastructure (e.g. software and networks).
The definition goes further to include: the people, policies, and practices that support the creation, management, processing and usage of data; roles and responsibilities for managing data; the standards and protocols for how data should be collected, stored, and shared.
A strong data infrastructure is the foundation for data management, analytics, visualisation, and value creation. And, in a networked world, the systems for management, analysis and visualisation may exist anywhere.
A strong strategy must, therefore, address interoperability at its core.
A data infrastructure strategy is a plan for building, managing, and maintaining systems, processes, and tools that enable data creation, management and use in a way that can be shared across systems.
It is important to understand the role of data sharing in supporting outcomes and impacts and, based on this, to create a framework for sharing data.
A well-designed strategy can save time and money by rapidly connecting data and systems, helping people make better, more informed decisions and use data to drive growth and innovation. It must take into account the types of data as well as the business processes (logistical, operational and legal), and understand the applications that rely on that data.
Critically, it must consider the organisation’s data governance policies and procedures, and its legal and regulatory requirements.
In summary, a robust data infrastructure strategy must include four core elements:
- Data governance: policies, procedures, and standards (that can be enforced) to ensure the usability, quality, security, and integrity of data, its usage and sharing (internally and externally to any organisation).
- Data access and security: how data will be accessed and shared, both within an organisation and externally, including protecting data from unauthorised access or misuse. This will combine technical and legal measures.
- Data analytics and visualisation: understanding the tools used to analyse, process, visualise and share outputs, and the processes and skills needed to use them effectively.
- Data storage and management: where and how data will be stored, the tools and processes needed to manage data throughout its lifecycle, regardless of where the data ‘physically’ resides. This will include time-based, jurisdictional and related policies.
Note that only point four (and part of point two) of these are functions of ‘IT’. Points one and three span the entire organisation, as managers and as users.
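To make ‘policies that can be enforced’ a little more concrete, here is a minimal sketch in Python of how a sharing policy might be checked by a machine before data moves. It is an illustration under my own assumptions, not a reference implementation: the names `SharingPolicy`, `ShareRequest` and `check_request`, and the three rules they encode (external sharing, jurisdiction, retention period), are hypothetical and chosen only to mirror the governance, access and storage points above.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class SharingPolicy:
    """Hypothetical policy record; the field names are illustrative only."""
    allowed_jurisdictions: frozenset  # where the data may be sent
    retention_days: int               # how long the data may be kept
    external_sharing: bool            # may it leave the organisation?


@dataclass(frozen=True)
class ShareRequest:
    """A hypothetical request to share one dataset with one recipient."""
    recipient_jurisdiction: str
    recipient_is_external: bool
    data_created: date


def check_request(policy, request, today):
    """Return a list of reasons to refuse; an empty list means 'allowed'."""
    reasons = []
    if request.recipient_is_external and not policy.external_sharing:
        reasons.append("external sharing is not permitted")
    if request.recipient_jurisdiction not in policy.allowed_jurisdictions:
        reasons.append("recipient jurisdiction is not permitted")
    if today > request.data_created + timedelta(days=policy.retention_days):
        reasons.append("data is past its retention period")
    return reasons


# Assumed example values, not taken from any real policy.
policy = SharingPolicy(frozenset({"UK", "EU"}), retention_days=365, external_sharing=True)
request = ShareRequest("US", recipient_is_external=True, data_created=date(2023, 1, 1))
print(check_request(policy, request, today=date(2023, 6, 1)))
```

The point of the sketch is that the rules themselves are set by the whole organisation (governance, legal, operations), while ‘IT’ only provides the mechanism that applies them consistently wherever the data resides.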
To focus attention, a data infrastructure strategy should directly support goals and outcomes, minimise risks, and unlock effective usage and sharing to drive growth and innovation.
Finally, it would be prudent to begin to include ‘algorithms’ (as opposed to code) in strategic development. While in many cases it may be the data that is ‘moving to the processing’, the algorithms may also have to ‘go where the data is’ for efficiency and/or compliance reasons. This may be the subject of a future post.