Digital twins created by our data could be used to predict our future behaviour

An illustration of a digital twin
With sufficient information, AI can make many inferences about our personalities. Photo credit: Newshub

A digital twin is a copy of a person, product or process that is created using data. This might sound like science fiction, but some have claimed that you will likely have a digital double within the next decade.

As a copy of a person, a digital twin would - ideally - make the same decisions that you would make if you were presented with the same materials.

This might seem like yet another speculative claim by futurists. But it is much more possible than people might like to believe.

While we might tend to assume that we are special and unique, with a sufficient amount of information, artificial intelligence (AI) can make many inferences about our personalities, social behaviour and purchasing decisions.

The era of big data means that vast quantities of information (called “data lakes”) are collected about your overt attitudes and preferences as well as behavioural traces that you leave behind.

Equally jarring is the extent to which organisations collect our data. In 2019, the Walt Disney Company acquired Hulu, a company that journalists and advocates pointed out had a questionable record when it came to data collection.

Seemingly benign phone applications - like ones used for ordering coffee - can collect vast quantities of data from users every few minutes.

The Cambridge Analytica scandal illustrates these concerns, with users and regulators worried about the prospect of someone being able to identify, predict and shift their behaviour.

But how concerned should we be?

High vs. low fidelity

In simulation studies, fidelity refers to how closely a copy, or model, corresponds to its target. Simulator fidelity is the degree of realism a simulation has relative to its real-world reference.

For example, a racing video game provides an image that speeds up and slows down when we press keys on a keyboard or controller, whereas a driving simulator might have a windscreen, chassis, gear stick, and gas and brake pedals. The video game has a lower degree of fidelity than the driving simulator.

A digital twin requires a high degree of fidelity that would be able to incorporate real-time, real-world information: if it is raining outside now, it would be raining in the simulator.

In industry, digital twins can have radical implications. If we are able to model a system of human and machine interaction, we have the ability to allocate resources, anticipate shortages and breakdowns, and make projections.

A human digital twin would incorporate a vast quantity of data about a person’s preferences, biases and behaviours, and draw on information about a user’s immediate physical and social environment to make predictions.

These requirements mean that achieving a true digital twin is a remote possibility for the near future. The number of sensors required to accumulate the data and the processing capacity necessary to maintain a virtual model of the user would be vast. In the present, developers settle for a low-fidelity model.

Ethical issues

Producing a digital twin raises social and ethical issues concerning data integrity, a model’s prediction accuracy, the surveillance capacities required to create and update a digital twin, and ownership of and access to a digital twin.

Predictions about how we would behave in given situations rely on gathering and analysing statistics about our behaviours and habits.

British Prime Minister Benjamin Disraeli is frequently quoted as saying, “There are three kinds of lies: lies, damned lies and statistics,” implying that numbers cannot be trusted.

This sentiment reflects a misunderstanding about how statisticians gather and interpret data, but it does raise an important concern.

One of the most important ethical issues with a digital twin relates to the quantitative fallacy, which assumes that numbers have an objective meaning divorced from their context.

When we look at numbers, we often forget that they have specific meanings that come from the measurement instruments used to collect them. And a measurement instrument might work in one context but not another.

When collecting and using data, we must acknowledge that the selection includes certain features and not others. Often, this selection is done out of convenience or due to the practical limitations of technology.

We must be critical of any claims based on data and artificial intelligence because the design decisions are not available to us. We must understand how the data were collected, processed, used and presented.

Power imbalances

The imbalance of power is a growing topic of public discussion concerning data, privacy and surveillance. At smaller scales, this can produce or increase digital divides - the gap between those who do and those who do not have access to digital technologies.

At larger scales, this threatens a new colonialism premised on access to and control of information and technology.

Even the creation of low-fidelity digital twins provides opportunities to monitor users, make inferences about their behaviour, attempt to influence them, and represent them to others.

While this can help in health-care or education settings, a failure to give users the ability to access and assess their data can threaten individual autonomy and the collective good of society.

Data subjects do not have access to the same resources as large corporations and governments. They lack the time, training, and perhaps the motivation. There is a need for consistent and independent oversight to ensure that our digital rights are preserved.

The Conversation