Synthetic faces could solve algorithmic bias and ethical data quandary: IDVerse

October 18, 2023

For years, many of the AI systems used for facial recognition and identity verification have had a race and gender bias problem. Systems have shown higher misidentification rates for people with darker skin, and have sometimes contributed to wrongful arrests. The main reason behind this is the data AI models have been trained on, which disproportionately includes white and male faces.

To counter this, some companies are turning to fake faces to enrich their data sets. Generative AI can create a vast amount of data, capturing the uniqueness of humans in different scenarios and environments, says IDVerse Co-founder and Chief Technology Officer Matthew Adams.

“We’re able to generate so many conditions and so many different variants of certain faces that far exceed what you would ever get in reality,” he says. “You inherently get better results, you get a lot less bias.”

IDVerse uses generative AI to train the deep neural network systems used in its “Zero Bias” AI technology. The Australian company, previously known as OCR Labs, says it helps minimize discrimination on the basis of race, age and gender and claims 99.99 percent accuracy in testing.

Synthetic faces are also being used to solve other problems for biometrics developers like IDVerse.

While governments around the world have been issuing warnings about deepfakes’ potential to undermine democracies by sowing confusion, businesses are already battling an increasing number of scams and frauds that rely on generative AI. The same generative AI tools, however, can also be used to detect deepfakes during attacks.

“Synthetic faces, when they’re used badly in the industry, are called deepfakes. That’s when they are used to try to fool protection systems,” says Loc Nguyen, IDVerse’s CMO. “We create, for lack of a better name, ‘goodfakes.’ Goodfakes are used to train our algorithms to spot the deepfakes. Same tool, different outcomes.”

The use of generative AI may also help get around increasing privacy regulations on biometric data use and eventually lead to more ethical sourcing of facial data.

The traditional way of training good AI algorithms has been gathering as much data as possible. But that sometimes leads to unethical practices, including obtaining facial data from people who have not consented to sharing it. Adams believes there is a need for a “nutrition stamp” for AI models that could ensure that faces were sourced ethically with the consent of the owners.

“We don’t know the ingredients of those models. We don’t know whether it was ethically sourced or how they were trained,” he says.

These days, the industry is moving toward a different approach to training facial recognition systems, according to IDVerse.

Many companies have trained their algorithms on facial data from one specific region. When the time comes to enrich their datasets, they usually try to obtain as much diverse facial data as they can get their hands on. But the algorithms are sometimes unable to adapt the way they process faces.

IDVerse executives say they are “big believers” in training models on as much synthetic data as possible rather than on real data.

The Australian company was founded in 2014 as a research firm and launched commercially in 2018 as OCR Labs. It became the first private Australian company to win accreditation as an identity provider for operations outside of the government’s digital ID system. In May this year, it changed its name and focused its business on biometric verification, liveness detection, document verification and video KYC.

Its reliance on synthetic data came out of difficulties in gaining access to real faces.

“Ethically, you just can’t get massive data sets from certain companies or countries,” says Adams. “From my understanding, we are the only ones actually using a fully generated face model within our stack.”

Generative AI is not a silver bullet. It can also create synthetic media that carries the old bias, Nguyen says. But deepfakes have opened many doors. With today’s computers, it is already possible to gather the number of faces and permutations that might take 200 to 300 years to collect in a real-life scenario. And as computer processing power increases, so will the number of fake faces. If all goes according to plan for IDVerse, this will mean the end of algorithmic bias in facial recognition.