Google's AI Clones Your Voice After Listening for 5 Seconds! 🤐

  • Published on Nov 11, 2019
  • ❤️ Check out Weights & Biases here and sign up for a free demo:
    The shown blog post is available here:
    📝 The paper "Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis" and audio samples are available here:
    An unofficial implementation of this paper is available here. Note that it was not made by the authors of the original paper and may deviate from the described technique, so please judge its results accordingly!
    🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
    Alex Haro, Anastasia Marchenkova, Andrew Melnychuk, Angelos Evripiotis, Anthony Vdovitchenko, Benji Rabhan, Brian Gilman, Bryan Learn, Christian Ahlin, Claudio Fernandes, Daniel Hasegan, Dennis Abts, Eric Haddad, Eric Martel, Evan Breznyik, Geronimo Moralez, James Watt, Javier Bustamante, John De Witt, Kaiesh Vohra, Kasia Hayden, Kjartan Olason, Levente Szabo, Lorin Atzberger, Lukas Biewald, Marcin Dukaczewski, Marten Rauschenberg, Matthias Jost, Maurits van Mastrigt, Michael Albrecht, Michael Jensen, Nader Shakerin, Owen Campbell-Moore, Owen Skarpness, Raul Araújo da Silva, Rob Rowe, Robin Graham, Ryan Monsurate, Shawn Azman, Steef, Steve Messina, Sunil Kim, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil.
    Splash screen/thumbnail design: Felícia Fehér
    Károly Zsolnai-Fehér's links:
    Instagram: twominutepa...
    Twitter: karoly_zsolnai

Comments • 2 263

  • Flo Rolf 2 years ago +965

    Imagine going to an audition for voice acting, and after 5 seconds the judges kick you out, but a year later you hear yourself in the movie.

  • Addsomehappy 2 years ago +3

    Now you don't even have to read your scripts, just put them through this thing

  • Max Xu 2 years ago +1

    Pretty sure this episode is synthesised by an AI.

  • Andy 2 years ago +1

    "What's your credentials?"

  • ItsTonyCO 2 years ago +681

    This model was trained with ~20K voice recordings. Imagine Facebook training it with 2 Billion.

  • Gin san 2 years ago +522

    This paper: We nailed it!

  • Moorlachs 2 years ago +9

    Imagine getting a call from an AI claiming to be you.

  • Matt Rommel

    I've been inspired to make a voice encoding/decoding workflow from watching your previous videos, so this one was very exciting.

  • omarcortes88 2 years ago +973

    So, basically, this is how Terminators mimic other people's voices.

  • AllisonJuno 2 years ago +201

    Imagine if at the end of the video he revealed the entire voiceover was an AI.

  • Jacen Solo 2 years ago +313

    Indie and visual novel developers could really use this to improve their games without having to hire voice actors.

  • Zansi 2 years ago

    Could we use this to make voice command software work nearly perfectly? As in, instead of having to program it to understand every individual word, or gradually get better over time as you use it, could you just speak a 5-second phrase and have it synthesize your voice, checking what you say against its own synthesized version in order to confirm accuracy?

  • Wail Rimouche 2 years ago +2

    RIP voice recognition security commands.

  • Hernando Malinche 2 years ago +172

    This would be really useful for audiobooks :D

  • Marouane 2 years ago +3

    This kind of tech would be really useful for translating movies into other languages while keeping the original actor's voice. I hope to see some of this soon. Thanks for the great video.

  • P8qzxnxfP85xZ2H3wDRV 2 years ago +5

    This episode would have blown my mind if you had revealed at the end that the voice-over was actually synthesized from a 5-second sample of your voice.

  • Ryan Denziloe 2 years ago +4

    May I say that, in addition to the "wow factor" of the final results at the start of the video, your more detailed expositions of the technical details of the papers are very much appreciated.

  • jack tringoli

    I feel like this could be used to bypass some sort of voice recognition security lol

  • RoliTheOne 2 years ago +2

    Missed opportunity: Revealing at the end that your voice was synthesized throughout the video.

  • WoofCaptain 2 years ago +57

    Wtf, the synthesis is so good that I might not even have suspected it was synthesized if I hadn't been told.