Evaluation of Forensic Methodology for Automatic Comparison of Speakers in Synthesized Voices

Authors

DOI:

https://doi.org/10.70365/2764-0779.2025.164

Keywords:

Synthesized voice, forensic speaker comparison, voice cloning, automatic speaker recognition system, deepfake

Abstract

Oral communication carries identifying information beyond the transmitted message, which has enabled the development of vocal biometric systems and scientific protocols for forensic speaker comparison (FSC). The evolution of artificial intelligence (AI) voice synthesis raises concerns about security and about the human ability to detect synthetic speech. Despite their performance, Automatic Speaker Recognition Systems (ASRS) still need to evolve to cope with AI synthesis technologies, especially in the Brazilian Portuguese context. This work evaluates the performance of an ASRS applied under FSC methodology to AI-synthesized voices, asking how an ASRS based on ECAPA-TDNN, as implemented in SpeechBrain, responds to comparisons involving cloned voices. The quantitative, exploratory methodology used the Brazilian Portuguese Forensic Corpus (CFPB) for calibration and the CEFALA-1 corpus for experimentation, employing the SpeechBrain ECAPA-TDNN model and the ElevenLabs® and Coqui-TTS® cloning services. Results showed that the framework performed strongly on natural voices (balanced accuracy > 95%) but was vulnerable to synthesized voices, with all cloned voices classified as the same speaker. Given this result, the development of specific protocols for forensic analyses involving suspected voice cloning is recommended.
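The balanced accuracy figure quoted in the abstract can be illustrated with a minimal sketch. It assumes that each comparison yields a single similarity score and that a fixed decision threshold separates "same speaker" from "different speaker"; the function name, the threshold, and all scores below are hypothetical and are not taken from the study.

```python
from typing import Sequence


def balanced_accuracy(
    scores: Sequence[float],
    same_speaker: Sequence[bool],
    threshold: float,
) -> float:
    """Mean of the true-positive and true-negative rates at a fixed threshold.

    scores       -- one similarity score per comparison (hypothetical values)
    same_speaker -- ground truth: True if both recordings share a speaker
    threshold    -- scores at or above this value are decided "same speaker"
    """
    tp = fn = tn = fp = 0
    for score, is_same in zip(scores, same_speaker):
        predicted_same = score >= threshold
        if is_same and predicted_same:
            tp += 1
        elif is_same:
            fn += 1
        elif predicted_same:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one trial of each ground-truth class")
    true_positive_rate = tp / (tp + fn)
    true_negative_rate = tn / (tn + fp)
    return (true_positive_rate + true_negative_rate) / 2


# Illustrative natural-voice trials (invented scores): every trial is
# decided correctly at the hypothetical threshold of 0.5.
natural = balanced_accuracy(
    scores=[0.9, 0.8, 0.2, 0.1],
    same_speaker=[True, True, False, False],
    threshold=0.5,
)

# Illustrative trials where the cloned recordings (ground truth: not the
# natural speaker) all score above the threshold, so the true-negative
# rate falls to zero.
cloned = balanced_accuracy(
    scores=[0.9, 0.8, 0.85, 0.7],
    same_speaker=[True, True, False, False],
    threshold=0.5,
)
```

Balanced accuracy weights the two error types equally, so a system that accepts every cloned voice as the target speaker cannot exceed 50% on such trials, however well it handles genuine same-speaker pairs.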

Author Biographies

  • Adelino Silva, Academia de Polícia Civil de Minas Gerais

    Adelino Pinheiro Silva holds a bachelor's degree (2004), a master's degree (2007), and a doctorate (2020) in Electrical Engineering from the Universidade Federal de Minas Gerais, and completed training in Forensic Phonetics (2009) with the Secretaria Nacional de Segurança Pública. He is an editor of the Revista Brasileira de Criminalística, the Revista Criminalística e Medicina Legal, and the Revista Avante. He is a member of the faculty and coordinating staff of the Curso de Gestão em Segurança Pública e Inteligência Aplicada (GESPIN) and works in the Audio and Video Forensics Section of the Instituto de Criminalística de Minas Gerais, where he conducts technical examinations and research.

  • Gerson Albuquerque da Silva, Superintendência da Polícia Técnico-Científica de São Paulo

    He holds a degree in Physics, Linguistics and Portuguese Language and Literature. He studied Forensic Speech Sciences at the Department of Language and Linguistic Science at York University (Yorkshire, UK). He also holds a master's degree in Information Engineering and Multimedia. From 2013 to 2016 he was a member of the Group of Audio and Voice Signals for Reconstruction and Recognition of the Polytechnic School of the University of São Paulo [Process FAPESP 2012 / 24789-0] under the coordination of Professor Miguel Arjona Ramírez (http://www.bv.fapesp.br/pt/pesquisador/8134/miguel-arjona-ramirez). Since 2019, he has been a partner in the project "Development of an open source forensic voice comparison system for research and practice" (Aston University). He has experience in Defense, Forensic Phonetics, Forensic Acoustics, Forensic Audio and Forensic Metrology. He has produced hundreds of reports related to Forensic Linguistics, Forensic Phonetics, Forensic Acoustics and Forensic Audio Analysis. (Text provided by the author)

  • Rafaello Virgilli, Superintendence of the Technical-Scientific Police of Goiás

    He holds a degree in Physics from the University of São Paulo (2004) and a master's degree in Computer Science from the Federal University of Goiás (2022). He is currently a Criminal Expert at the Superintendence of the Technical-Scientific Police of Goiás. He has experience in Computer Science, with an emphasis on deep learning, mainly addressing the following topics: deep learning and voice recognition.

  • Ronaldo Rodrigues da Silva, Federal Police

    He holds a master's degree in Electrical Engineering, specializing in Forensic Computing and Information Security, from the University of Brasília; an Executive MBA in Project Management from the Getúlio Vargas Foundation; a specialization in Occupational Safety Engineering from the Federal Center for Technological Education of Paraná; and a degree in Electrical Engineering, with a concentration in Electronics and Telecommunications, from the Federal Center for Technological Education of Paraná. He worked as a Telecommunications Engineer for fixed-line telephony and data service providers and as a Telecommunications Specialist for the National Telecommunications Agency. He has been a Federal Criminal Expert since 2007, assigned to the National Institute of Criminalistics of the Federal Police in Brasília, working in the area of audiovisual and electronic equipment forensics.

Published

2025-12-16

Data Availability Statement

Data related to this research can be requested from the authors via email.

How to Cite

SILVA, Adelino; SILVA, Gerson; VIRGILLI, Rafaello; SILVA, Ronaldo. Evaluation of Forensic Methodology for Automatic Comparison of Speakers in Synthesized Voices. Avante: Academic Journal of the Police of Minas Gerais, [S. l.], v. 1, n. 9, 2025. DOI: 10.70365/2764-0779.2025.164. Available at: https://revistaavante.policiacivil.mg.gov.br/index.php/avante/article/view/164. Accessed on: 19 Dec. 2025.