On data: where is it and what can we do with it?

Gavin Starks
May 5, 2023


This piece comes from a discussion with a non-technical board that I was helping recently.

There was confusion around ‘where is the data?’, ‘who owns what?’ and ‘what can be done with it?’.

My synopsis is that:

The ‘data gap’ is usually leadership
Data governance needs to be taken seriously in the C-suite, by management and in internal processes. Not doing so both creates risks and misses opportunities. This must be addressed ‘today’.

  1. Legal rules need to better balance risks and benefits
    There are likely areas of legal exposure on contracts that must be (re)thought through in a way that both protects the organisation and helps to unlock innovation through data sharing. Often the default is too restrictive and/or misses entire categories of risk.
  2. Company policies must be enforceable, monitored and applied
    Organisations should use procurement and contracting processes, ensure that they are fit for purpose, and have standard operating procedures for data governance. These should include clear definitions of who is responsible (e.g. ‘Data Controller’, Chief Data Officer), for what areas and purposes, and how data processing will be managed. All contracts for third-party services must contain clear definitions of IP, security, support, and maintenance (even if some services are offered for ‘free’).

One issue I heard a lot of confusion about was the ‘location’ of data.
Its location can be interpreted in two ways:

  1. Its physical location

Data can be stored on computers and in systems that are under the organisation’s direct ownership, or that it controls through the third parties it uses. These might include cloud computing providers like Google or Amazon, or third-party analysis systems. It is completely fine to have data in many physical places (and ‘the cloud’ is exactly that).

Having ‘many physical systems’ does create an operational burden, and the organisation’s data teams should aim to minimise this burden by storing data in as few systems as possible. But they must maintain direct contractual control (e.g. direct contracts with third parties) to ensure that the data is well managed and that those who hold it can be held to account.

  2. Its legal location

Regardless of physical location, the legal basis for data sharing must be clearly defined, contracted and enforced.

The organisation should aim to act as the primary data controller for all data that it collects (and in many jurisdictions it will be). Making this clear is especially important when dealing with countries where rules around data are ‘early stage’. The concept of data ‘ownership’ is highly complex (especially with personal data) and a focus on ‘rights’ (rather than just ‘ethics’) can often help people focus on what’s important.

The organisation can, and should, own the IP (intellectual property) on certain data it collects (e.g. raw data, such as non-personal data that its teams have collected), derivative data (e.g. aggregate statistics), reports, analyses, insights, visualisations, etc. Owning the IP allows the organisation to explicitly license it to others if it wishes to (whether under Open Data or Shared Data licences).

The organisation will not ‘own’ personal data about individuals (e.g. the rights of individuals in the EU are covered by GDPR). Instead, it has a role as a data controller or as a data processor. Depending on its contracts, it may have the right to do things with the data, including analysing it and sharing outputs with others.

If the organisation contracts a third party to do, for example, some data cleaning or analysis, it must (a) have the rights to do so; and (b) do so under a contractual agreement that allows data to be processed by that third party. The organisation should not assign any ownership rights to any ‘primary’ data, nor should it assign rights to any derivative outputs (e.g. analysis), without assessing the risks and benefits of doing so. If it does, such rights must be codified in a contract.

Sharing data with a third party may help create additional benefits (e.g. they can improve their analytics systems), in the same way that using data from others can create additional benefits for the organisation.

However, there are also new risks, such as using an organisation's data to train machine-learning / artificial intelligence systems in ways that cannot be predicted. These risks include commercial, competitive, legal, liability, IP, ethical and moral hazards.

Having ‘many legal contracts’ creates compound risks, so the organisation must be crystal clear about its approach to data governance, including protections, licensing, processing and security. Creating common legal frameworks with multiple parties can take time, but they can also create cohesion, reducing risk and unlocking permission for innovation.

One way to think about this is to imagine ‘technical systems’ as consultants. It doesn’t matter where they are: we don’t give a consultant the right to do anything other than what we need them to do. That doesn’t stop them from learning and building on their own experience. As we move forward with new ‘AI’ systems, this takes on materially different dimensions and creates new types of risk and new types of opportunity.

The purpose of this piece was to help non-technical people understand some of the basics of ‘where’ data is, who can use it, and for what purposes. I hope you found it useful. Please leave comments if you have further questions or feedback.

One personal note: I find it far more useful to think about data rights (can this data be used for that purpose in this way?) rather than data ‘ethics’ (which can mean very different things to different people).

As we continue on our journey to a data-enabled world, we must ensure that data governance processes are in place to help everyone understand what they can and can’t do, and what to do when things go wrong.
