This AI Makes "Audio Deepfakes"!

  • Published on Apr 7, 2020
  • ❤️ Check out Weights & Biases and sign up for a free demo here: www.wandb.com/papers
    Their blog post is available here:
    www.wandb.com/articles/improv...
    📝 The paper "Neural Voice Puppetry: Audio-driven Facial Reenactment" and its online demo are available here:
    Paper: justusthies.github.io/posts/n...
    Demo - **Update: seems to have been disabled in the meantime, apologies!** : kaldir.vc.in.tum.de:9000/
    ❤️ Watch these videos in early access on our Patreon page or join us here on TheXvid:
    - www.patreon.com/TwoMinutePapers
    - thexvid.com/channel/UCbfY...
    🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
    Alex Haro, Alex Paden, Andrew Melnychuk, Angelos Evripiotis, Anthony Vdovitchenko, Benji Rabhan, Brian Gilman, Bryan Learn, Daniel Hasegan, Dennis Abts, Eric Haddad, Eric Martel, Evan Breznyik, Geronimo Moralez, James Watt, Javier Bustamante, Kaiesh Vohra, Kasia Hayden, Kjartan Olason, Levente Szabo, Lorin Atzberger, Lukas Biewald, Marcin Dukaczewski, Marten Rauschenberg, Maurits van Mastrigt, Michael Albrecht, Michael Jensen, Nader Shakerin, Owen Campbell-Moore, Owen Skarpness, Raul Araújo da Silva, Rob Rowe, Robin Graham, Ryan Monsurate, Shawn Azman, Steef, Steve Messina, Sunil Kim, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil, Tybie Fitzhugh.
    www.patreon.com/TwoMinutePapers
    Meet and discuss your ideas with other Fellow Scholars on the Two Minute Papers Discord: discordapp.com/invite/hbcTJu2
    Károly Zsolnai-Fehér's links:
    Instagram: twominutepa...
    Twitter: karoly_zsolnai
    Web: cg.tuwien.ac.at/~zsolnai/
  • Science & Technology

Comments • 1 018

  • Alex Paden
    Alex Paden 2 years ago +1992

    (cough) custom audiobook character voices (cough) (cough) :D

    • Alex Paden
      Alex Paden 2 months ago +1

      @Allison Hart I see less of a desire to replace narrators and more of the ability to make narration possible for anyone without recording or knowing how to manipulate their own voice.
      Multiple character voices is sort of just a side effect of being able to create a unique voice automatically.
      If I wanted the best quality of narration I would choose a human actor, possibly multiple human actors. Perhaps this is a way of selling enough audiobooks to fund a voice actor.

    • Allison Hart
      Allison Hart 2 months ago

      it's a cool idea, but speaking as an audiobook narrator myself, i had a ton of fun playing with pitch and formant for different character voices. (formant is an effect that mimics changing size/shape of the human vocal tract to be able to change a male to female or female to male voice more convincingly). i narrated a book where there were different demon voices, and depending on the "rank" of the demon character i made the voice deeper and more intimidating, and also had to act out the whole thing. voice acting is acting. this is something an AI could never do, at least not for 50+ more years or so.
      so while i think it's cool, coming from the creator perspective i'll hate a bit to see this happen. technology is awesome but this discussion is kinda similar to the "AI generated art" discussion. AI art is cool, but there's an argument to be made that art is meant to come from a human being and what stands to be lost if more and more art becomes AI made. does it lose the human side?

    • James
      James 3 months ago

      I smell money

    • Lifestyle
      Lifestyle  11 months ago

      I want to use it as my software, how do I do it?

    • Alex Paden
      Alex Paden 11 months ago

      @-Postapokalypso - If it is possible to train AI-generated voices to follow the intonation of a speaker and sync them to the script, then yeah, I think it seems reasonable that it is possible. I don't know that much about current voice synthesis techniques.

  • Nowtronix
    Nowtronix 2 years ago +1891

    To pronounce "Károly Zsolnai-Fehér" is the Turing test of audio deepfakes ...

    • M3RC3B0SS
      M3RC3B0SS 7 months ago

      @Слава всей Руси !和平 Not all of them, but we have a lot of free-time philosophers and so on, and that is the problem with some Hungarian people: they can be full of emotions while their face is like: 😐
      I don't really feel like a different creature, but with these things we are like a weird A.I. that is really machine-like on the outside but has a ton of things on the inside.
      I don't disagree with your theory; maybe our ancestors fell in love with non-human creatures, we will never know :/

    • Ivory AS
      Ivory AS Year ago

      Nowtronix
      Ummm... yeah...
      How do I tell _anything_ to say that, _much less_ a computer!

    • Markus Müller
      Markus Müller Year ago

      Technology+Science+Skill, the pros and cons thexvid.com/p/PL7tB6qL0r-AYne4REmG9Axwv3gfNRBUjN

    • Roman
      Roman Year ago +1

      I failed.

    • Sophia Cristina
      Sophia Cristina Year ago +1

      @Teutonic Gaming YES!

  • Tyson Mccartney-Rundle
    Tyson Mccartney-Rundle 2 years ago +2133

    Bout to use my dads face and have him say he’s proud

    • zubby
      zubby 17 days ago

      F

    • Rey
      Rey 4 months ago

      Nnnnnnnnnn

    • Enrico Sanchez
      Enrico Sanchez 5 months ago

      He'd be proud of you even if you lived in a van, down by the river!

    • An Indian
      An Indian 9 months ago

      I dont need so

  • Taka
    Taka 2 years ago +916

    With this, I can create hundreds of NPCs for games and add their own custom dialogue with unique voices. Modders are going to love it.
    Edit: it actually happened. Check out xvaSynth mod for Skyrim.

    • Harlequin Comics Full
      Harlequin Comics Full 25 days ago

      Can you please say how you did that?

    • Judess 69er
      Judess 69er Year ago +1

      you wouldn't need even an eighth of the programming required here for skyrim models... even in VR.... you just basically use the lip sync algorithm like they do in games like GTA Online you speak in mic, avatar mouth moves....

    • Stephen Queen
      Stephen Queen Year ago +2

      I'd be really keen to do this for tales of symphonia 2, and fix the audio for Lloyd to make it the original VA's

    • Scott Johnny Helgemo Aune
      Scott Johnny Helgemo Aune Year ago

      That’s..........
      ACTUALLY GENIUS!!

    • Green Delta17
      Green Delta17 Year ago

      Never thought about that. I guess I get to work on my NPCs too.

  • Regah Productions
    Regah Productions 2 years ago +362

    Now pair this with the ThisPersonDoesNotExist system and now you can create people, give them faces, give them voices, and have them talk coherently! This is getting a bit scary, not gonna lie lmao

  • VR Wizard
    VR Wizard 2 years ago +690

    Having a face time meeting with multiple friends where everyone looks and sounds like Obama might be something funny to do in the future. 😅

    • Sara S
      Sara S Year ago +2

      @Amanda Zeller or more

    • Mike LaRose
      Mike LaRose Year ago

      Why not Jeffrey Dahmer?

    • George Moten
      George Moten Year ago +2

      Somebody doesn't realize the what ifs going on here 🤔

    • Amanda Zeller
      Amanda Zeller Year ago +2

      @Kim Jong-Un There probably is. I heard once that Government tech is 10-20 years Ahead of what the Average citizen uses.

    • Leslie Pinner
      Leslie Pinner Year ago +1

      Anybody but obummer!

  • Sheo
    Sheo 2 years ago +27

    This is incredibly dangerous technology.

  • Loony Luna
    Loony Luna 2 years ago +126

    I wonder if this "puppetry" technique could be run in reverse to do lip reading.

    • lordchickenhawk
      lordchickenhawk Year ago +3

      @Nikolai Megdanov The NSA already knows all our locations...

    • Nikolai Megdanov
      Nikolai Megdanov Year ago

      NSA wants to know your location!

    • Heath Mitchell
      Heath Mitchell Year ago

      @PurpleGhost Or just use voice recognition...

    • Slumberland.mp4
      Slumberland.mp4 Year ago

      How to break the code use eminem😂

    • PurpleGhost
      PurpleGhost 2 years ago +1

      Yes! This could be very useful for disabled people-- if a deaf or HoH person can use an app set to lip-read they might be able to have more access to face time conversations with people who don't know sign language (not that people shouldn't also learn sign language)

  • Hedi Sparay
    Hedi Sparay Year ago +4

    Remember that image experts can detect all of these techniques by analyzing them. If you pass fakes off as real and spread them without specifying that they are fake, you risk serious legal consequences.

  • Marcelo Prado
    Marcelo Prado 2 years ago +10

    This is super scary! Imagine all the fake news implications

  • J. F. D. Smit
    J. F. D. Smit Year ago +25

    Something that told me the "justice system" is highly biased towards whomever they want to win the case, was when I learned that video evidence was not allowed as testimony until it was possible to manipulate the audio of what people said in the videos.

  • Ray Romanov
    Ray Romanov 2 years ago +6

    I'm simultaneously terrified and extremely intrigued for what lies in the future for this.

  • The Potato Of 42
    The Potato Of 42 2 years ago +148

    Just be sure to keep the AntiDeepfake development ahead of the main thing.

    • Jennifer Jones
      Jennifer Jones Year ago +1

      Too late

    • CaptainMisery86
      CaptainMisery86 Year ago

      @Jimmy De'Souza dude we can see the candidates now and our choices are Trump or Biden

    • Gus
      Gus 2 years ago

      @Zeta Hurley or we are extinct.....

    • Zeta Hurley
      Zeta Hurley 2 years ago

      @Gus ah but you see by then we could have artificial intelligences that can think and feel on a scale hundreds of times more than humans- a sentience like that would just be the next step of evolution, humans don't bother even trying to dismantle the small monkey tribes in the woods, and I could imagine a similar situation would occur between AIs and humans, just you know, probably they have a space empire and the Earth is essentially an animal sanctuary to them

    • Gus
      Gus 2 years ago +1

      @Zeta Hurley and when that happens, we probably have VR societies already or even a full VR dive, which means deepfakes won't be a problem anymore, that is if everyone chose to live there and make it more important than the real world. Also we would need some sort of unique ID that cannot be replicated by anything other than its creator, which is of course the government or some company....

  • Nyachi
    Nyachi Year ago +122

    If this technology doesn't concern and frighten you then you haven't been paying attention

    • Dylan Ting
      Dylan Ting 10 months ago

      Though most people watching this video are probably using this for entertainment and not for blackmail purposes or something like that, so please don't bring that up here dude

    • 【h】【a】【r】【i】【s】
      【h】【a】【r】【i】【s】 Year ago

      Any website you recommend for AI voice deepfakes?

    • connflicts
      connflicts Year ago +4

      @Eavy Eavy you do realize these can affect everyone, right? Imagine you're in court and someone uses a deepfake of you as video evidence against you. I bet you wouldn't be calling it "paranoia" then. This is a serious issue.

    • Eavy Eavy
      Eavy Eavy Year ago +4

      Being wary about something that doesn't affect you isn't being aware; it is called paranoid at best, pretentious at worst.

  • p0rt3r
    p0rt3r 2 years ago +81

    Now, when a movie gets dubbed into a different language, they can actually lip-sync the video to the audio. :-D

    • John Sormani
      John Sormani 2 months ago

      Well actually the actor can just be a voice actor that is in the editing studio to lipsync all of the spoken footage. The lipsync of other languages can be done with the exact timbre of the original actor, so it sounds closer to the original voice. The stunt double can be the actor as well.

    • Tord Iversen
      Tord Iversen Year ago

      @digimaks Right now yes, but in maybe 6 years we might have fixed this problem as the technology gets better!

    • digimaks
      digimaks Year ago +1

      Won't always work, since in movies faces and heads turn a lot, a lot of things overlap in front, etc.

    • Noah Camacho
      Noah Camacho Year ago +3

      thats true!! Didn't think about that.

    • Vineeth K
      Vineeth K Year ago +8

      now that's a great use of this technology

  • Patriot Watchdog 1776
    Patriot Watchdog 1776 Year ago +11

    You could do some serious damage with this. They probably already have. Impressive none the less

  • G E
    G E 2 years ago +61

    woudld be interesting to see if it can handle a rastic change of tone, like shouting and wispering even if the reference audio is just talking.

    • Darth Malice
      Darth Malice 2 years ago +3

      Love how people get so pressed by those who correct grammar so the other can learn. Damn. XD

    • Deivison Carvalho
      Deivison Carvalho 2 years ago

      @Jorge C. M. Your name should be Jorgen like the Swedish one

    • Tori Ko
      Tori Ko 2 years ago +1

      *drastic

  • MD FM 28
    MD FM 28 2 years ago +14

    Could be useful for game industry, it'll be nice to hear npc calling out your name.
    But will it kill the voice acting industry, or will the industry evolve with this tool?

    • Nemo X
      Nemo X Year ago +1

      It might. Or at least voice acting will become more expensive per hour.
      But it does mean that you can take an hour of audio each from 1000 voice actors and create as much dialogue as you can come up with, instead of having all that be focused on two or three main characters and everyone else repeating the same three lines.

  • AWFULJ
    AWFULJ 2 years ago

    This could have a huge impact on the music production field. Imagine coming up with some melodic content of some sort and feeding it to a network trained to make it sound like something else. I guess pitch information could be an "issue" but this would be such a great sound design tool.

  • Morty Rickerson
    Morty Rickerson 2 years ago +2

    I can see this being used in the future for VR games, such as MMORPGs where you would insert your face for the avatar and this would replicate you talking to the other players in the VR world with you.

  • Toby Siu
    Toby Siu 2 years ago +1

    I was doing the exact same thing in my project, guess I could save some time now. Thank you for introducing this to us!

  • Eric Steele
    Eric Steele Year ago +32

    Thank you, that explains so much why we're seeing people talk to us on the TV that are already dead LOL.

  • Q Nawiat
    Q Nawiat Year ago +8

    Saw a “queen of England” video showing a fake queen; I believe the technology is capable of it.

  • odomobo
    odomobo 2 years ago +12

    Nobody's concerned that in a few years, powerful entities may use this technology such that we can no longer discern what is true?

  • J C
    J C Year ago +20

    This is how they rewrite history.

  • Icewind007
    Icewind007 2 years ago +9

    Finally, we can just have Attenborough narrate everything.

  • TheNitroG1
    TheNitroG1 Year ago +36

    Ladies and gentlemen, WELCOME...to the death of objective reality.

    • G R
      G R Year ago

      Objective reality already died

    • Revil K
      Revil K Year ago +1

      Objective reality has never existed in the virtual realm, but still exists in the real tangible world

    • Eavy Eavy
      Eavy Eavy Year ago +1

      Conspiracy paranoid: omg nothing is real
      Already since we have more than 1 religion

  • hr rüben
    hr rüben 2 years ago +30

    For me it's kind of creepy to see the newsreader of Germany's Tagesschau in a video about "Audio Deepfakes".
    Especially with this quality.

  • Lyle Willis
    Lyle Willis Year ago +1

    Blows me away. There’s nothing now that can’t be imitated or assumed to be genuine & truly the real person that we are witnessing

  • Giorgio Mazza
    Giorgio Mazza 2 years ago

    All of this is amazing. Congrats to all the people involved

  • no philza dont jump
    no philza dont jump 2 years ago

    If this works perfectly, we could make new episodes of shows where one or more of the voice actors passed away or other stuff, which would be really great! :D

  • John Smith
    John Smith 2 years ago +42

    A missed opportunity to make all of the presidents say “What a time to be alive!” in unison.

  • Craig Davidson
    Craig Davidson Year ago

    Oh I can see great possibilities here. Imagine, being able to combine this with Deep Fake technologies to recreate the lost episodes of The Quatermass Experiment.

  • Kolli Wanne
    Kolli Wanne 2 years ago

    Honestly the voice program alone is awesome. So many creative things we can do. Idc about all the illegal stuff, the rest is already good enough :D

  • Aleixo Alonso
    Aleixo Alonso 2 years ago +4

    Just imagine how useful this would be in the process of remastering old records or repairing them. You take the voice of maybe Margaret Whiting, "deepfake" it and put new instruments over it :o Or something even older than that! 1920s singers' voices crystal clear and in 2020 Hi-Fi stereo :o

  • newnews brooklyn
    newnews brooklyn Year ago +36

    This is scary and I suspect it's being used now

    • Wanda Williams
      Wanda Williams Year ago +4

      You can bet they are.

    • Bob Hope
      Bob Hope Year ago

      Yes and Biden said Kamala was on her knees under his desk slobbering on his giblets. In his own voice on video... Lol

  • jerry t
    jerry t 2 years ago

    I have a humble question to ask you, regarding the future use of this technology:
    Assuming deepfake technology keeps getting better, people of importance will use higher-definition audio/video recording to preserve authenticity. But will this last? Will AI learning one day exceed mankind's ability to increase the authenticity of video and audio?

  • luchinazo
    luchinazo 2 years ago

    damn! imagine the future! when we will be able to communicate telepathically, you just have to record your voice once and then every thought gets synthesized so the other person can hear your voice in their heads or a full dive vr game

    • Anonymous
      Anonymous 2 years ago

      Could be used as a torture tool, using your past loved one's voice to make you confess to a crime

  • dThineni
    dThineni 2 years ago +2

    Your content, Dr. Zsolnai-Fehér, is by far my favorite media-wide. Thank you

  • çağatay odabaşı
    çağatay odabaşı 2 years ago

    You are the best channel on this website! Thanks for your work.

  • mozakimon
    mozakimon 2 years ago

    Is the 3D mask auto-generated by the AI? This is amazing

  • DeepSpace12
    DeepSpace12 2 years ago +5

    I think we need to hear the audio of each puppet too to feel the power of the technique...

  • Nixel
    Nixel 2 years ago +1

    So, I just started reading the 6th part in the Hitchhiker's Guide trilogy, and a VERY similar technique was just introduced in the story. This is literally Science Fiction becoming real.

    • Nixel
      Nixel 2 years ago +1

      @Generic Schlub Yup, that's the one.

  • CubixThree
    CubixThree 2 years ago +29

    Finally. We can now create the Ultimate Mario Voice.
    With the power of every single thing Charles Martinet has said voicing Mario, we can create a Mario Deepfake.

    • Almondatchy
      Almondatchy 2 years ago +4

      CubixThree Oh yeah

    • UncleBibby
      UncleBibby 2 years ago +9

      "So long, gay Bowsette!!"

    • Wolfram Stahl
      Wolfram Stahl 2 years ago +18

      So with what I learned from this channel, 3-4 papers down the line and we should have a browser plugin that changes the voice to Waluigi whenever Trump speaks.

  • MtTrips
    MtTrips Year ago +24

    WOW. MAKES ME BELIEVE MORE THAT WHAT WE ARE WATCHING WITH THIS ADMINISTRATION IS A MOVIE, LIKE WE ARE BEING TOLD.

  • After Arrival
    After Arrival 2 years ago +1

    I want a vst plugin of this voice synth so much!
    I could make a whole choir of different people with my voice.

  • Me
    Me Year ago +1

    Moral of story: You can only trust that someone is in a video if they verified it themselves. **Videos are no longer reliable evidence**

  • bread
    bread 2 years ago +10

    Standing by my previous statement, we're all in danger from this. Just remember this: Shitposting beats shitposting.

  • Jasc Tomm
    Jasc Tomm 2 years ago

    This could be useful for video narration. The artificial narration can sound much better and natural than today’s options. In essence, I can swap my bad voice with a pro voice for my videos without sounding artificial.

  • Pedro H
    Pedro H Year ago +1

    imagine you put ai to "imagine" the description from book and create a movie based on words..... that would make a lot of people to "read" more

  • Tasty Not tasty
    Tasty Not tasty Year ago

    This could be awesome for the film making industry.

  • Thedeepseanomad
    Thedeepseanomad 2 years ago +2

    All those poor de-aging solutions for actors playing supposedly younger versions of their characters will soon be a thing of the past.
    Data in star trek Picard anyone? Fixing him with deepfake tech should be a piece of cake today.

  • SteeVeeDee
    SteeVeeDee Year ago +2

    Basically we're going to have to be very skeptical in future of any video 'evidence' or reports of what people are supposed to have said.
    Having said that it leaves the way for some very funny content creation...

  • Капбэтут Александр

    I can't fully imagine what will be possible! Audio actors will have it easier, the text-to-speech industry will be more realistic; great to be young while science is entering its "Golden Age"!

  • thisisntsergio
    thisisntsergio 2 years ago

    I want my Google assistant or Siri or Cortana to speak in my voice. That would be amazing.
    Are there any apps out right now that can synthesize voices?

  • Daisy L
    Daisy L Year ago +39

    This is one of many reasons why we can't believe anything we see/hear.

    • M K
      M K Year ago +1

      @Eavy Eavy can you be more clear ?

    • Eavy Eavy
      Eavy Eavy Year ago

      Being wary about something that doesn't affect you isn't being aware; it is called paranoid at best, pretentious at worst.

    • Daisy L
      Daisy L Year ago

      @M K I didn't say that one has to see God to believe in him. I'm not sure what you are talking about. I am not an atheist.

    • M K
      M K Year ago +1

      This can be said to all Atheists that need to see God to believe in him

  • Warp Zone
    Warp Zone 2 years ago

    I used to be scared about the day when this technology would influence our politics. Now I see it as a relief. At least people will have evidence for the lies they believe, instead of believing it "just because."

  • Gerben van Soesjes

    Next time:
    This AI creates an android that looks identical to you with just 5 seconds of camera input
    What a time to be alive!

  • Heinrich Wonders
    Heinrich Wonders 2 years ago +4

    I love the first female voice. Could listen to her for hours, just reading laundry lists.

  • wizardchris90
    wizardchris90 Year ago

    Makes me wow how far some clever people have come.

  • LoVeLoVe
    LoVeLoVe 2 years ago +260

    So dope. Finally I can pretend to be a camgirl on MFC and make bank $$$.

    • JayImHere
      JayImHere 7 months ago +1

      This guy is going places!

    • Sara Mes
      Sara Mes Year ago

      MFC?

    • Zed
      Zed Year ago

      I laughed out loud to this!

    • Brian Kirkpatrick
      Brian Kirkpatrick 2 years ago +3

      @KuraIthys Well, if you can just get it right for 5 seconds, you can feed the result into this sucker and get the AI to output your girl voice on demand.

    • KuraIthys
      KuraIthys 2 years ago +2

      @randomguy9777 That's been well known in certain circles for decades.
      The issue is, not everyone can do it reliably.
      But overall, with enough practice, yeah...
      The key realisations are:
      1. All children essentially sound female
      2. When a boy's voice breaks, it happens in a way that can't purely be due to physical changes in the vocal tract. (if it were it would be a far more subtle transition)
      It follows then, that to a certain degree with careful control over your vocal chords, you can replicate your former 'child' voice, more or less. Which just so happens to be a rough approximation of what your voice would sound like now if you had been a woman...
      Pretty difficult technique to master properly though. Especially since figuring out the 'trick' to it the first time is not easy, and even then it's likely to seem unnatural, and thus it's easy to strain your voice.
      Plus you need to combine about 7 different things at once, (relating not just to the tone of your voice but also how you speak) to seem convincing, and slip up even momentarily and it's going to be noticable...

  • YourSkyliner
    YourSkyliner Month ago

    I kind of expected him to say that this video was voiced by an AI in the end..

  • diarykeeper
    diarykeeper Year ago

    Creepy.. but also useful,
    for deceased synchro voices.

  • Tallywort
    Tallywort 2 years ago

    I know you ask not to judge the audio at 2:28, but the synthesised voice makes it extremely hard for me to judge the matching video output.

  • Julian Sahne
    Julian Sahne 5 months ago

    Imagine the film industry for translation. You can have the same actor say different words

  • Just Thomas
    Just Thomas 2 years ago +4

    I've been using a google colab notebook to send friends audio of themselves saying quirky things.
    For some reason I couldn't get it to work properly with Cave Johnson's voice... Never was able to troubleshoot that one...

    • Deivison Carvalho
      Deivison Carvalho 2 years ago

      How did you do this in laymans terms I want to transform my best friend in a gay camboy

  • Yanami
    Yanami Year ago +1

    Stuff like this should be illegal. We are more and more reaching the point where science is going from fascinating to scary. This is a paradise for criminals and abusers.

  • victoria & Johnny rodriguez

    Great work! Thank you

  • Rando
    Rando Year ago

    Tech's great in entertainment, but too dangerous for informative purposes

  • PhotonicSauce
    PhotonicSauce 2 years ago

    This is so similar to the mission impossible voice copying technology!!!!!

  • Jason
    Jason 2 years ago

    I want this as a VST to play in Ableton. OMG

  • Shui Kheng
    Shui Kheng Year ago

    Frightening !! What AI has produced

  • elvancor
    elvancor Year ago

    Is the voice synthesis not available yet? I was waiting for an example where someone appears to say something he never said, in his own voice.

  • Dissonance Paradiddle
    Dissonance Paradiddle 2 years ago

    Here's hoping you can take your own audio to copy inflection before long instead of text to speech

  • Leonardo SA
    Leonardo SA 2 years ago +13

    VR Chat will become great with deepfake voices

  • Sam Chen
    Sam Chen 2 years ago

    I wonder what this does to copyright/trademark law. As a public figure, would you be able to protect your identity by trademarking your face and voice? Would using TheXvid clips of someone to voice characters in movies be stealing? And would different video/audio readings of the same book created using Neural Puppetry be transformative enough to have its own copyright?

  • quaver
    quaver 2 years ago

    Wow... Deep learning techniques are borderline black magic now.

  • LacaMenDRY
    LacaMenDRY Year ago

    This really helps to add to the sounds that I can produce
    for my full animated movie.

  • station
    station 2 years ago

    had a chance to meet Justus Thies at SIGGRAPH back in 2017, each year he keeps making great headway in this area. Great stuff.

  • jwod
    jwod 2 years ago

    This is awesome for text to speech users! imagine getting obama to read novels to you out loud!

    • L
      L Year ago

      Obama? Never. Traitor in Chief is what he was.

  • camelCased
    camelCased Year ago

    Is there a project that supports cloning voice timbre without TTS? I've tried this one: github.com/CorentinJ/Real-Time-Voice-Cloning but it seems to generate only TTS text to the target voice. I would like to record my own phrase with my voice to keep emotions and inflections, and then encode it as if spoken by the target person's voice.

  • vix86
    vix86 2 years ago

    I wonder how far are we from making models take in an audio voice sample and using it for cues on intonation in the output. Once we can do that in near real time, we'll complete the circle of men impersonating women online lol.

  • Samer Kadoura
    Samer Kadoura 2 years ago

    I wish a similar method could be used to make music from basic MIDI notes.

  • Age richesse
    Age richesse 2 years ago

    Could you please make a tutorial to make those deepfakes videos ? I registered on wandb, downloaded python3, read the documentation on wandb but I can't do it

  • neatpolygons
    neatpolygons 2 years ago +3

    best part of every video you make is when you go "WHOAW"

  • تلفزيون العائله

    I wonder if I can simulate someone's voice (with free software available online)?

  • Random Guy
    Random Guy 2 years ago +3

    By seeing ur videos, its like i am seeing some science fiction movie 😍

  • Matthew Voke
    Matthew Voke 2 years ago

    Good stuff, love it!

  • Noorquacker
    Noorquacker 2 years ago

    We can literally just craft Zoom voice chats at this point holy frick

  • koolerpure
    koolerpure 2 years ago +2

    this technology is both amazing and scary at the same time just depends the intent for using it. imagine being able to help grieving families by letting them facetime one last time with their passed loved one

  • Enweave
    Enweave 2 years ago +1

    I gradually become more anxious every time 2MPapers uploads something new. Next week what? A 4k realtime video/audio deepfake based on text description?)))

    • UncleBibby
      UncleBibby 2 years ago +1

      "oracle machine, our biological human eyes have limitations, right? and we're stuck between the 3rd and 4th dimension... what does reality ACTUALLY look like?"

  • BreakMaker
    BreakMaker Year ago

    Károly, this is truly fantastic, congrats!

  • Bread
    Bread Month ago

    This seems Scary and Fun at the same time

  • Cloud
    Cloud Year ago +5

    I’m seeing these videos everywhere; it almost looks like something is about to be shown and then covered up by saying it is fake 🤔

  • Credible Threat
    Credible Threat Year ago +1

    Interesting.
    Would it be possible for somebody to make a voice recording of, for example, Donald Trump reading from a McDonald's menu or Boris Johnson singing The Prodigy's 'Firestarter'?

    • Credible Threat
      Credible Threat Year ago

      @S K I wouldn't mind finding out how, if you or anybody else has a link.

    • S K
      S K Year ago +1

      Yeah

  • Amanda Zeller
    Amanda Zeller Year ago

    Thank You! ! ! This is some scary stuff!

  • China Life
    China Life Year ago

    My friend said that A.I. can translate Stephen Hawking's facial expressions into text. I think it's not possible.

  • Harry Gaia
    Harry Gaia Year ago

    While this may seem cool or even entertaining, this can be used for evil as well. It could create a person as yourself and place you somewhere that you were not physically at or put words in your mouth that you have not spoken......"framed" by your enemies.

  • Prophet of the Singularity

    I have been using a singing synth for vocals in some of my songs. I just got the new version, and this technology is improving rapidly; within 10 years I think there will be lots of songs with synthesized vocals. I have 2 songs on my page that have no human vocals in them at all. You can still tell that it is not real, at least in some parts. What I did not anticipate is that there are things you can do with this that are actually better than a real human singer, or at least different, because you can do things a human voice cannot normally do: hold incredibly long notes (days if needed, since computers do not have to breathe), cover a complete and full range, and switch from extremely low to high in microseconds.

  • GreatMCGamer
    GreatMCGamer 2 years ago

    I feel like the people working on these neural networks don't understand how the internet works.
    Otherwise they would not need to "disable" something for "misuse"