Evaluation of Forensic Methodology for Automatic Comparison of Speakers in Synthesized Voices
DOI: https://doi.org/10.70365/2764-0779.2025.164
Keywords: Synthesized voice, forensic speaker comparison, voice cloning, automatic speaker recognition system, deepfake
Abstract
Oral communication carries identifying information beyond the transmitted message, which has enabled the development of vocal biometric systems and scientific protocols for forensic speaker comparison (FSC). As artificial intelligence (AI) voice synthesis evolves, concerns arise about security and about humans' ability to detect synthetic speech. Despite the strong performance of Automatic Speaker Recognition Systems (ASRS), these systems still need to improve to cope with AI synthesis technologies, especially in the Brazilian Portuguese context. This work compares the performance of an ASRS applied with FSC methodologies on AI-synthesized voices, asking how an ASRS based on ECAPA-TDNN, as implemented in SpeechBrain, responds when comparing cloned voices. The quantitative, exploratory methodology used the Brazilian Portuguese Forensic Corpus (CFPB) for calibration and the CEFALA-1 corpus for experimentation, employing the SpeechBrain ECAPA-TDNN model and the ElevenLabs® and Coqui-TTS® cloning services. Results showed that the framework performed very well on natural voices (balanced accuracy > 95%) but was vulnerable to synthesized voices: all cloned voices were classified as the same speaker. Given this result, specific protocols should be developed for forensic analyses in which voice cloning is suspected.
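To make the reported metric and failure mode concrete, here is a minimal sketch of a score-based same/different-speaker decision. It assumes that each recording has already been converted into a fixed-size speaker embedding (for example by an ECAPA-TDNN model) and that embeddings are compared by cosine similarity, a common back-end for such embeddings. The threshold is chosen on a calibration set by maximizing balanced accuracy, the mean of the true-positive and true-negative rates. This is a simplification, since the abstract does not describe the article's actual calibration procedure. All function names and numbers below are hypothetical illustrations, not code or data from the study.

```python
import math
from typing import Sequence

# Hypothetical helpers: embeddings and scores are toy values, not study data.


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two speaker embeddings (range -1..1)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def balanced_accuracy(scores: Sequence[float], same_speaker: Sequence[bool],
                      threshold: float) -> float:
    """Mean of the true-positive rate (same-speaker trials accepted)
    and the true-negative rate (different-speaker trials rejected)."""
    tp = sum(1 for s, y in zip(scores, same_speaker) if y and s >= threshold)
    tn = sum(1 for s, y in zip(scores, same_speaker) if not y and s < threshold)
    pos = sum(1 for y in same_speaker if y)
    neg = len(same_speaker) - pos
    return (tp / pos + tn / neg) / 2


def choose_threshold(scores: Sequence[float], same_speaker: Sequence[bool]) -> float:
    """Pick the observed score that maximizes balanced accuracy on a
    calibration set (assumed stand-in for a proper calibration step)."""
    return max(sorted(scores),
               key=lambda t: balanced_accuracy(scores, same_speaker, t))


def decide(score: float, threshold: float) -> str:
    """Accept the trial as 'same speaker' if the score clears the threshold."""
    return "same speaker" if score >= threshold else "different speaker"


# Toy calibration trials: True marks a genuine same-speaker pair.
calib_scores = [0.82, 0.75, 0.68, 0.30, 0.22, 0.41]
calib_labels = [True, True, True, False, False, False]
threshold = choose_threshold(calib_scores, calib_labels)
print(threshold, balanced_accuracy(calib_scores, calib_labels, threshold))  # 0.68 1.0

# Toy cloned-vs-genuine trial: a clone whose embedding lies close to the genuine one.
clone_score = cosine_similarity([0.9, 0.1, 0.4], [0.85, 0.15, 0.42])
print(decide(clone_score, threshold))  # same speaker
```

Because the decision depends only on whether the score clears the threshold, a clone whose embedding lies close to the genuine speaker's is accepted exactly like a genuine same-speaker trial. This is consistent with the vulnerability the abstract reports for cloned voices.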
Data Availability Statement
Data related to this research can be requested from the authors via email.
License
Copyright (c) 2025 Avante: Academic Journal of the Police of Minas Gerais

This work is licensed under a Creative Commons Attribution 4.0 International License.

