This section will first explore different conceptual bases for governing data. It will then discuss modular approaches through which institutions and principles are being proposed to govern data, and finally make the case for a justice-oriented approach to data governance.
Conceptual bases for the governance of data
There are three main assumptions that underpin the governance of data. The first, prevalent at the state level, is that data is a strategic national asset: a key resource for population control and influence, for surveillance and monitoring both domestically and abroad, and for stimulating a domestic digital economy. In short, data is central to controlling citizens and to building digital power at the international level (
Pohle and Thiel, 2020). In China, for example, this manifests in a state-led data economy where population control and state security are both supported through data regulation (
Covington, 2021). Similarly, Russia's data governance model prioritises state and security interests, while also providing a framework for regulating the way businesses process data about people. India has also flirted with this model: a 2021 legislative report (
Lok Sabha Secretariat, 2021) noted that data was an asset of ‘national importance’ and argued for both data localisation and infrastructural investment to support it. In all these cases, the interests of domestic digital businesses and the state are placed above big tech, for instance by regulating social media platforms as publishers of online information so that they can be subjected to state control.
The second type of assumption identifiable amongst both state and private actors is that data is a proprietary asset that behaves like private property, and that can be more or less fully commodified (
World Economic Forum, 2011). This property-based assumption is underpinned by the adoption of the individual (or ‘legal person’, such as a company) lens for defining rights and claims, rather than, for example, groups or society as a whole. This claim is based on the notion that, if individuals (or legal persons) are properly compensated with rights over data and (sometimes) financial or other utility, data can constitute both a kind of capital and a commodity that can be freely traded. This in turn is the basis for digital markets around the world and is a key feature of data governance propositions on the global level (for example, by the G20 and World Trade Organisation (WTO)) due to its importance to economic growth. The US, Canada, Singapore and, to a lesser extent, the European Union are typical examples of international actors who lead with this assumption, focusing on building markets and international businesses around data. This in turn offers opportunities for states to serve as hubs for powerful market actors' data transactions, as in the case of Singapore. This asset-focused approach to data governance is supported by international diplomacy. The G20, WTO and other international organisations such as the United Nations Conference on Trade and Development align with the assumption that free international trade in data is a core priority for global data governance, and advocate for a light-touch approach to rights where data trade comes first, and data protection and other rights are a desirable extra (
Heseleva et al., 2020). They also align around treating AI technologies as products (
OECD, 2024;
Veale and Zuiderveen Borgesius, 2021), rather than systemic and public technologies. This perspective occludes the view that people may resist AI technology and infrastructures on the basis that they constitute a form of governmentality premised on certain values, with which many groups (as we explore below) do not align.
The last assumption we see motivating approaches to data and AI governance is the notion that there is something essentially different about personal and non-personal data – one so embedded in most data governance discussions that it tends to become invisible. It is worth surfacing, however, for two reasons. First, because it is the conceptual basis for many claims embedded in models and instruments of data governance, many of which become dysfunctional without it – for example that data should be governed in line with human rights which themselves are based on the liberal ideal of the rational, empowered individual. Second, because we are currently seeing a destabilisation of the personal/non-personal distinction just at the point where data governance has come to rely most heavily on it as a way to channel large quantities of deidentified data to developers of AI models.
Other approaches towards governing data
Beyond these mainstream assumptions, it is also important to make visible alternative visions which are frequently used to contest the notion that data constitutes a national security or market asset, and that people can only have rights over data that relates to them when that relation makes them identifiable. For instance, critical scholarship, civil society advocacy and labour organising have variously claimed that data can be classified as labour when people provide their attention and engagement to social media platforms (
Fuchs, 2014), become building blocks in the AI economy (
Pasquinelli and Joler, 2021) or directly engage as workers labelling data for AI models or content moderation (
Perrigo, 2022). Data has also been defined as embodied human social relations, community and by extension, a form of territory (
de Souza et al., 2024;
Lehuedé, 2022;
Tierra Común, 2021). Lastly, data is a source of claims to rights and equity, whether economic (
Gurumurthy and Chami, 2022a), political (
Wylie et al., 2018) or cultural (
Kukutai and Taylor, 2016), and thus an embodied claim to identity on the part of the marginalised and oppressed.
These critical analyses are often translated and operationalised into different governance models for data, namely public data trusts, data collaboratives, data commons (including semi-commons), data cooperatives and different notions of data sovereignty, that is, personal and indigenous.
In public data trusts, a public institution is formed, or tasked, to manage citizens' data collected from diverse actors, including commercial companies, on the basis of a trust relationship, with the aim of promoting innovation and policymaking (
Micheli et al., 2020;
Morozov and Bria, 2018). An example of a limited version of a data trust is the DECODE initiative, with pilots in Barcelona, where residents used environmental sensors to collect data and share it anonymously with their communities on their own terms, and in Amsterdam, where people could prove their age without sharing full identity information (
Bass and Old, 2020;
Kortlander and Espuny Contreras, 2019;
Sagarra et al., 2019). In such a model, the main agreement between data subjects and public actors is a trust relationship that depends on public engagement based on consultations, strong accountability mechanisms and collective benefits. The main value of this model is the public interest that could translate into the use of public data for policymaking, social innovation and to address social challenges (
McDonald, 2019;
Micheli et al., 2020;
Wylie et al., 2018).
Data collaboratives are based on the idea that private data collected by companies can be pooled with public data through an independent third party to generate public value (
GovLab, 2024;
Mozilla Insights et al., 2020). The data collaboration could take the form of a multi-stakeholder partnership that governs access to, sharing and use of data (
Mozilla Insights et al., 2020). In these cases data can be effectively siloed by producers to keep it from other commercial uses, although it will still flow through, and thus profit, larger proprietary infrastructures at various points in its journey. One example of this is the recently proposed Intelligence Community Data Co-op (
OSINT Foundation, 2024) in the US, which aims to collect and share commercially available and open source data for the entire US intelligence community. Data collaboratives are more developed than the other models covered here. The NYU GovLab has created a database with more than 200 cases of data collaboratives, mainly from the health, transportation and humanitarian sectors (
GovLab, 2024).
Another model that can interact congruently with that of the free data market is the data commons, a powerful imaginary in technology policy and data governance (
Purtova and van Maanen, 2023), where various parties aim to create data commons in relation to health, energy, financial and other domains of data. The commons vision involves making data available through shared infrastructures in ways which make it less excludable by economic interests. This potentially offers alternatives that could work in parallel to the neoliberal market model, still allowing for markets for data to exist and for innovation to be prioritised. An example of a data commons in formation comes from the New Hanse project based in Germany, which has produced a ‘blueprint’ aimed at enabling ‘cities and communities to access and use urban data to gain better democratic control of urban space and provide more effective public services’ (
New Hanse Project, 2023: 2). The idea of the data commons has its roots in the fundamental work done by Ostrom and others (
Ostrom and Hess, 2007) on how to keep public goods available to the public. Ostrom, however, warns anyone interested in institutionalising commons-oriented approaches that such an approach needs to be combined with contextual and domain-relevant instruments to make sure that the resource behaves in the interests of the expected beneficiaries (
Ostrom and Hess, 2007: 1). Applying this work to the digital realm has mainly resulted in the idea that some kinds of data can usefully be defined as ‘common pool resources’ – owned by a particular community, where that community can restrict access by outsiders.
Gurumurthy and Chami (2022b), for instance, argue for a ‘semi-commons’ approach where individuals, public agencies and legal persons such as corporations or data altruism organisations all have different and conditional rights over data, within the parameters of privacy rights. They examine the notion of the commons in the Indian proposal for non-personal data governance (
Government of India, 2023), which states that such data ‘are a nation's or community's collective resources as arising from their natural and/or social spaces, and should be governed as such’. The authors argue in line with Ostrom, however, that a commons-based data stewardship model, if shaped by corporate interests, could both legitimise corporate claims to ownership of data with public value, and reinforce inequality of access to – and power over – data by centering guardianship with the most powerful interests.
In contrast to the larger-scale commons, data cooperatives are models in which data subjects, or organisations of data holders, voluntarily contribute data to a pool that is used, shared and accessed collectively by particular groups or communities. The main value of this model is the public interest, which translates into collective protection and, if organised in the right ways, empowerment of the community. As such, cooperatives may be platforms that provide collective services in relation to data (such as the Dutch BO Akkerbouw project (
BO Akkerbouw, 2024), aimed at providing farmers with individual data spaces), or data cooperatives where datasets (or data derivatives) can be shared by the group. Cooperatives emerged as a response to the problems caused by defining data as a market commodity, and the top-down, platformised and opaque nature of other governance models (
Micheli et al., 2020;
Mozilla Insights et al., 2020;
Mulgan and Straub, 2019). Such cooperatives tend to be diverse, with interests ranging from privacy and labour to social aims such as autonomy and sustainability (
Micheli et al., 2020;
Mozilla Insights et al., 2020).
The model of personal data sovereignty fits with the overall market approach to data in which data subjects are defined as market agents who aim to control the access to, use and sharing of their data (
Micheli et al., 2020). The model depends on new intermediaries within the data economy. It is increasingly being translated from a set of aims, rooted in the libertarian political ideology common in communities working on decentralised technologies, into a set of technical tools. These tools aim to realise greater interoperability between consumer services and individualised, rather than collective, forms of data governance, both of which form business models in which the developers of those tools and infrastructures can provide them commercially. The advent of cryptographic technologies and the libertarian values of the crypto community have been influential for this model, but so far the vision of ‘trustless trust’ has not been widely adopted by those producing data in their everyday lives. The provision of these personal data stores runs in parallel with, rather than as an alternative to, the use of data outside of them by for-profit or state actors. The most prominent example of personal data sovereignty comes from the proposal of ‘digital identity wallets’ by the European Commission (
European Commission, 2024), where the self-sovereign identity model is clearly influential, despite the projected end product being a very large-scale identity management system.
Lastly, work on indigenous data sovereignty is not like the other governance models on this list. Rather than being a model per se, it is more a principle used to advance a different proposed basis for governing data. The CARE principles, which advocate for collective benefit, authority to control, responsibility and ethics, are designed to centre indigenous peoples' interests, world views and rights. They are designed to address lacunae in the FAIR principles (findability, accessibility, interoperability and reusability) which, as indigenous data sovereignty experts argue, do not address challenges of power, including those centred on historical contexts (
Carroll et al., 2020). Data from indigenous communities is one of many resources that have historically been used by colonial powers to exercise control over the lives of indigenous people. Therefore, the indigenous data sovereignty movement promotes the idea of managing information about their peoples, territories, lifeways and natural resources according to their laws, practices and values. Its claims are legally based on Articles 18 and 19 of the United Nations Declaration on the Rights of Indigenous Peoples, which state that indigenous peoples have the right to participate in the matters that affect them through their means and procedures (
Taylor and Kukutai, 2016).
Why we need a justice-oriented approach to governing data and AI
So far, we have described governance approaches oriented toward state or business power, toward building public-sector resources or toward limiting access to particular actors, as well as different institutional and principle-based structures for operationalising data governance. Overall, any model for data governance is a statement about who should benefit from data and how. Hence, if we wish to think about data governance in relation to the power to automate, optimise and scale up interventions through AI systems, perhaps the most important issue becomes data's lifecycle. Datasets determine the practical capacities and effects of AI models, and their continual repurposing is of central relevance to the developing AI economy. To get a grip on data's ever-evolving lifecycle, however, it is necessary to go beyond data protection's focus on individual identifiability and thus bypass one of the core tools of data governance worldwide. This is, however, both a political and a practical challenge given the centrality of data protection – and thus the personal/non-personal binary – to current thinking about what data may flow, and for what purposes.
In practice, what creates data's economic value is the ability to transform and reuse it. This is both of central importance to understanding how data enters AI assemblages and how it fuels their impacts on people. For example, all over the majority world, the fintech industry's ‘alternative credit models’ ingest a huge variety of non-financial data about people, from mobile device usage data to the characteristics of their social networks (
Aitken, 2017). Meanwhile in the US, Customs and Border Protection and Immigration and Customs Enforcement officials have demanded telematics data from the companies that collectively track the movements of tens of millions of vehicles on a daily basis (
Brewster, 2021).
These cases demonstrate just a few of the myriad ways in which apparently innocuous metadata from people's everyday devices or deidentified data becomes potentially highly sensitive as soon as it is subjected to particular research practices. Ironically, these practices usually begin with the deidentification and aggregation of the data, removing it from the purview of data protection. The examples also highlight how, when data flows between institutions that have different functions and different modes of independence and accountability to the public, the capacity of those affected to interrogate the data changes. Data collected from people in one context may be used against their interests in another. Once the overarching rubric of data protection is no longer present, the formal connection between data and risk or vulnerability is entirely lost, and data is free to use and redeploy.
Although data protection frames vulnerability as rooted in the innate characteristics and attributes of people, it therefore makes sense to think of data and data technologies, including AI, as potentially creating new vulnerabilities rather than just exploiting existing ones. If, as we argue, data now changes categories in terms of its effects and risks depending on who has access to it, what their purpose is, and what analytical methods are used, then there is no way to cover this kind of fluidity without declaring all social data potentially sensitive and remaking data governance to deal with this risk – something that has already been argued by data protection scholars (
Purtova, 2018).
Our intent is not to argue that data is too complex to govern effectively, or that it will always escape from measures taken to control it. Instead, these examples demonstrate that data has a lifecycle during which it is likely to transform multiple times with diverse meanings, purposes and effects, that it may travel between jurisdictions while doing so, and that these attributes matter for attempts to regulate and shape both data flows and ensuing uses.
We therefore advocate for justice-oriented data governance. This approach has a temporal dimension: data that is controllable at one point in its trajectory, or for a particular group involved in generating it, may quickly escape control as soon as it travels onto large infrastructures for purposes of interoperability, sharing or monetisation. Checks and controls are thus needed at points where data's use or users change, which may occur throughout the open-ended lifecycle of data, across its different sectoral uses, and in line with the different aims of the actors using it – for instance ways to opt in and out of different datafied communities (
de Souza and Bhardwaj, 2024). It also has a spatial dimension, in that as data flows between sectors, institutions or markets, there is a recalibration in terms of access, use and perceived ownership. Tackling this multidimensionality is therefore critical to ensure that data's mutability does not subvert or destabilise the conditions and rules designed to protect people from misuse.