Microsoft has developed an artificial intelligence (AI) model called VALL-E, which can imitate a voice from a sample of just three seconds. The demonstrations of VALL-E’s abilities are quite convincing.

However, the company is aware of the potential dangers of such a tool in the hands of malicious individuals. To learn more about this development, listen to the Vitamine Tech audio chronicle, where Emma Hollen explains how VALL-E works in detail.

The emergence of “deepfake” technology in images and videos has raised concerns about potential misuse. Now, with the unveiling of Microsoft’s new speech synthesis model, VALL-E, audio “deepfakes” may also become a reality. This AI model can imitate a person’s voice from just a three-second audio sample. Once it has learned a specific voice, VALL-E can synthesize speech in that voice while preserving its unique characteristics and emotions.

Microsoft suggests that VALL-E could be used for speech synthesis applications but also, and this is obviously more worrying, for editing speech in a recording. It would be possible to edit and modify the audio of a speech from its text transcription.

Imagine a speech by a politician modified by this artificial intelligence. A sample text, a three-second recording, and the AI does the rest.