Go beyond overall biometrics accuracy testing, early and broadly
Biometric systems with high overall accuracy can harbor hidden performance problems in the form of differentials in error rates between different groups of people, or arising from various other factors. How those differentials are found and addressed in a laboratory setting was explored by Fime Laboratory Biometric Service Line Manager Joel Di Manno in a presentation at the FIDO Alliance’s Authenticate 2023 conference.
In his presentation, “How can bias influence the usage of biometrics?”, Di Manno recounted an experience with a customer seeking quality assurance testing for face biometrics liveness detection.
One presentation attack instrument in particular was found to succeed at a rate far out of line with the system’s otherwise strong results. A demographic analysis showed that the successful mask attacks clustered around a single skin tone.
There are many possible reasons for this performance disparity, Di Manno explained, including but not limited to the balance of training data. Addressing the errors by improving the performance for the given demographic is a better approach in this case than, for instance, training the system on more masks.
Confirming the suspicion with live subjects, however, would require hiring dozens of people from the affected demographic, a costly and time-consuming process, Di Manno points out. Fime’s solution is to use synthetic data, not for an official evaluation, but to test the hypothesis of a demographic weakness in the algorithm.
Once the hypothesis was confirmed, the developer analyzed its training data and found that the affected skin tone was underrepresented.
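In rough terms, the kind of per-group breakdown that surfaces such a disparity can be sketched in a few lines of Python. The group labels, counts, and flagging threshold below are invented for illustration and do not represent Fime’s tooling or data.

```python
# Hypothetical sketch: compare presentation attack success rates per
# demographic group against the overall rate to spot a differential
# like the one described above. All figures are made up.
attack_trials = {
    # group: (successful_attacks, total_attempts)
    "skin_tone_group_1": (2, 120),
    "skin_tone_group_2": (3, 115),
    "skin_tone_group_3": (21, 118),  # anomalously high success rate
}

total_success = sum(s for s, _ in attack_trials.values())
total_attempts = sum(n for _, n in attack_trials.values())
overall_rate = total_success / total_attempts

print(f"Overall attack success rate: {overall_rate:.1%}")
for group, (successes, attempts) in attack_trials.items():
    rate = successes / attempts
    marker = "  <-- well above overall rate, investigate" if rate > 2 * overall_rate else ""
    print(f"{group}: {rate:.1%} ({successes}/{attempts}){marker}")
```

A group whose attack success rate sits far above the overall figure is the signal that prompted the deeper look at training data in the case Di Manno described.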
Di Manno proceeded to outline a wide range of factors that can introduce errors, from bias and personal details to environmental challenges and capture device quality. Different factors will create differentials in different modalities, he points out.
An example of an environmental factor introducing errors into fingerprint biometrics comes from a research paper and subsequent experiment that Di Manno summarized. In a low-humidity environment, the error rate was significantly higher for female subjects than for male subjects. In medium- and high-humidity environments, the same difference was not present.
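Stratifying error rates by condition and by group is conceptually simple; a minimal sketch of the idea follows, with all counts invented for illustration rather than taken from the experiment Di Manno cited.

```python
# Hypothetical sketch: false non-match rate (FNMR) broken down by
# humidity condition and subject sex. Numbers are illustrative only.
results = {
    # (humidity, sex): (false_non_matches, genuine_attempts)
    ("low", "female"):    (42, 1000),
    ("low", "male"):      (11, 1000),
    ("medium", "female"): (12, 1000),
    ("medium", "male"):   (10, 1000),
    ("high", "female"):   (9, 1000),
    ("high", "male"):     (8, 1000),
}

for humidity in ("low", "medium", "high"):
    fnmr = {
        sex: errors / attempts
        for (h, sex), (errors, attempts) in results.items()
        if h == humidity
    }
    gap = fnmr["female"] - fnmr["male"]
    print(f"{humidity:>6} humidity: FNMR female {fnmr['female']:.1%}, "
          f"male {fnmr['male']:.1%}, gap {gap:+.1%}")
```

A gap that appears only under one condition, as in the low-humidity case described above, is exactly the sort of interaction that an aggregate accuracy figure hides.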
A paper published by Fime in October delves into differentials observed in fingerprint algorithms, finding that the performance of one of the two algorithms tested was not stable across different environments. This shows, according to Di Manno, the benefit of drilling down into the specific scenarios that can introduce differentials prior to deployment.
He urges developers to take this kind of close look at their products as early as possible in the development process, to minimize the real-world impact that differing match conditions, from the subject’s skin tone to the humidity level, can have on biometric performance.