• Published on Aug 6, 2020
  • So I trained a bot to beat up the in-game bots in Mortal Kombat II.

    ▶ Twitch:
    ▶ Twitter: _willkwan
    ▶ Instagram: _willkwan

    This was my first time automating a game with machine learning and my first reinforcement learning project, so I learned a lot! I used the PPO2 algorithm that OpenAI used to train their Dota AI.
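For readers curious what PPO actually optimizes: its core is a clipped surrogate objective from the PPO paper. This is a generic illustrative sketch of that idea, not code from the video; the function name and constants are made up for the example.

```python
def ppo_clip_objective(ratio, advantage, epsilon=0.2):
    """Clipped surrogate objective (per-sample sketch).

    ratio: pi_new(a|s) / pi_old(a|s) for one sampled action
    advantage: estimated advantage of that action
    """
    clipped = max(min(ratio, 1 + epsilon), 1 - epsilon)
    # Take the pessimistic (minimum) of the unclipped and clipped terms,
    # which stops the policy from moving too far in a single update.
    return min(ratio * advantage, clipped * advantage)

# A move that looks better than expected (positive advantage): the gain
# is capped once the policy ratio exceeds 1 + epsilon.
print(ppo_clip_objective(1.5, 1.0))   # 1.2
# A worse-than-expected move can't have its penalty clipped away:
print(ppo_clip_objective(0.5, -1.0))  # -0.8
```

In a real trainer this objective is averaged over a batch and maximized by gradient ascent; libraries like Stable Baselines' PPO2 handle that loop for you.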

    Source code:

    Helpful videos:
    @Lucas Thompson (his channel is the best resource I found for Gym Retro) - An AI Defeats Street Fighter 2!

    ​ @Arxiv Insights (only has a few videos but he's great at explaining papers without dumbing it down too much) - Policy Gradient methods and Proximal Policy Optimization (PPO): diving into Deep RL!

    @DeepMind - RL Course by David Silver

    Some fun game AIs I was inspired by:
    @Code Bullet - AI learns to play Google Chrome Dinosaur Game || Can you beat it??

    @SethBling - MarI/O - Machine Learning for Video Games

    @OpenAI - Dendi vs. OpenAI at The International 2017
  • Entertainment

Comments • 355

  • Torentsu
    Torentsu Year ago +426

    Hey that was my tutorial playthrough! I'm so glad I'm contributing to the AI take over of the world.

    • Ocsember Octive
      Ocsember Octive 11 months ago

      I had the same shit happen to me when I screen-recorded a song I was playing. You can even look it up, it's "Wake Up" by Teddy Pendergrass; the first one is me screen recording on my Straight Talk phone.

    • StarSage69
      StarSage69 Year ago

      @Eric Sledge coward

    • Sophia Cristina
      Sophia Cristina Year ago

      @Eric Sledge Why not? Looks so fun!

    • Torentsu
      Torentsu Year ago +20

      @Will Kwan I don't know a thing about programming AIs, but you can actually catch the CPU in a cycle of punches that it will stand and take forever. It's hard for a human to punch at the right range every time, but a computer could probably do it.

  • Templarfreak
    Templarfreak Year ago +81

    One big thing I would change about this model: instead of rewarding for winning fights, you need to reward for dealing damage and punish for taking damage. This will help the AI understand the more fundamental principles of the game. It will discourage the AI from doing cheap, easily countered moves, because it will lose health a lot faster when they do get countered; it will teach the AI how to abuse the game's much more limited scripted AI; and it will help the AI learn to use combos more efficiently, because iirc Mortal Kombat actually has a mechanic where the longer your combo, the more damage you do (or maybe I'm thinking of other games?), and combos in general stun the game AI more often and for longer, which means more opportunity to deal damage.
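This suggestion amounts to dense reward shaping. A minimal sketch, assuming per-step health values are readable each frame (e.g. from Gym Retro's info dict; the variable names and weights here are assumptions, not the video's actual setup):

```python
def shaped_reward(prev, curr, w_deal=1.0, w_take=1.0):
    """Dense reward: + for damage dealt, - for damage taken (sketch).

    prev/curr are (own_health, enemy_health) tuples read on successive
    steps; the exact names depend on the game's data.json and are
    hypothetical here.
    """
    own_prev, enemy_prev = prev
    own_curr, enemy_curr = curr
    damage_dealt = max(enemy_prev - enemy_curr, 0)
    damage_taken = max(own_prev - own_curr, 0)
    return w_deal * damage_dealt - w_take * damage_taken

# Enemy lost 12 HP while we lost 5 -> net positive reward
print(shaped_reward((100, 100), (95, 88)))  # 7.0
```

Per-step shaping like this gives the agent a much denser learning signal than a single win/loss reward at the end of a fight.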

    • Templarfreak
      Templarfreak Year ago

      @Tech Flagg oh, ok. Yeah, longer and longer combos still have big rewards related to them just for being a thing that exists.

    • Tech Flagg
      Tech Flagg Year ago

      @alen rakanovic less damage over time but a long combo is still better than a couple hits at a time. It just scales the damage so you can’t do one combo to kill someone

    • alen rakanovic
      alen rakanovic Year ago

      I’m pretty sure in mortal Kombat you actually do less damage the bigger your combo is, for balancing reasons of course

    • bunger69
      bunger69 Year ago +4

      Exactly, in street fighter 3.3 there's a score that gives points with these criteria : dealing damage, taking damage, variety of attacks and combos. it would be the perfect rewarding system for an AI

  • Shahar Korren
    Shahar Korren Year ago +126

    I don't know if you've noticed, but your AI does something speed runners refer to as "AI manipulation", meaning - it's doing specific moves to force the in-game opponent to perform unsafe moves and then counter them. See how it whiffs standing normal attacks from the correct distance to force Baraka to shoot and then counter Baraka's projectile with either the slide or the freeze, which both go under the projectile.

    The problem with Mortal Kombat and fighting games in general is that there are two very different approaches to playing:
    1. When playing against humans - the player has to learn their opponent's playing patterns and anticipate and counter their moves.
    2. When playing against the in-game opponent - the player has to adjust their inputs (which the game reads) in order to manipulate the opponent to perform specific unsafe actions and counter them.

    If you're training an AI to beat single player Mortal Kombat, it's not enough to teach it special moves. There are also combos like jump kick into slide and, as mentioned above, "AI manipulation". If you can "tell" the AI to do those things - I bet it would do much better.

    • Amadu Kamara
      Amadu Kamara 7 months ago

      This manipulation is how Mayweather fights. He invites attacks that leave his opponent vulnerable and then counters strongly

    • Konolu2009
      Konolu2009 8 months ago

      @MCDMars There's a video from "Two Minute Papers" that confirms this: keeping one eye on the short-term task and another on the long-term one made the AI perform well in a lot of Atari games.

    • bWWd
      bWWd 10 months ago

      This is not manipulation, it's a normal strategy: using a move (or not moving) at a certain distance to trigger a response from the CPU that you can counter, because whoever moves first is in danger until they regain control. It's dumb to call that manipulation. I agree about the AI having to learn the actual game mechanics, and about the fact that the CPU makes moves based on where you stand and what you pressed; that would be real AI. This is just spamming specials. The AI should not move until the CPU makes an attack, exposing itself to a counter. That would be real AI, and that's how you must teach it: to wait for the CPU's mistakes.

    • AProXD
      AProXD Year ago +1

      nice essay

    • TacitIron Hav
      TacitIron Hav Year ago +2

      There is a human equivalent to AI manipulation: mind games and conditioning. Real players try to bait moves all the time (though obviously it's a more nuanced process when the human player being baited is able to adapt).

      And granted, when playing against a human opponent you ALSO need to learn and adjust to your opponent's playstyle. But it's not like there's zero overlap between the two scenarios you're outlining.

  • Derikimi
    Derikimi Year ago +19

    I imagine future tournaments just being people making their own AIs fight. Like Robot Wars, except with AI.

  • TheHoodyBadger
    TheHoodyBadger Year ago +83

    AI: Who are you?

    MK AI: I'm you, but stronger.

    • Vadan Drumist
      Vadan Drumist 8 months ago +2

      @intruder I think you mean " MK AI: *starts cheating* "

    • intruder
      intruder 8 months ago

      @Vadan Drumist MK AI: NOT IF I LEARN FIRST

    • Vadan Drumist
      Vadan Drumist 11 months ago +3

      AI: I will learn to be even stronger then.

  • yufan
    yufan Year ago +20

    I’m more surprised you’ve never tried RAM hacking before. That was like my introduction to computer architecture. I used to fix the health of my character in RPG games to avoid boring grinding.

  • Ben Awad
    Ben Awad Year ago +167

    next video:

    AI Learns to Play VALORANT

    • Iceblock
      Iceblock 6 months ago

      @Will Kwan do a video on ai learns to play csgo instead

    • gigi gigiotto
      gigi gigiotto Year ago

      @Will Kwan use bootcamp and play on win

    • Awakens
      Awakens Year ago +1

      Valorant sucks no need to get a PC for that trash game.

    • Elusive Sam
      Elusive Sam Year ago

      Much depends on what the inputs, outputs and limits are. It's much easier to write an inhuman aimbot or a TAS than to make an AI whose abilities are limited to human-like levels in everything except computational power. Even when AlphaStar's developers ensured that it plays like a human, good Starcraft players caught them on things like instant APM spikes, weird camera movements, frame-perfect unit-by-unit selection that's impossible for a human, etc.

    • pablo S
      pablo S Year ago

      It'll be pro ai, auto aimbot

  • Manzell Blakeley
    Manzell Blakeley Year ago +5

    IMO your reward function should include taking less damage and ending the round quickly
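One way to express the "end the round quickly" part is a small per-step penalty on the terminal reward. A tiny sketch with arbitrary constants (not the reward function used in the video):

```python
def round_reward(won, steps, step_penalty=0.01, win_bonus=10.0):
    """Terminal reward that favours quick wins (illustrative sketch).

    A small penalty per elapsed step makes dragging the round out
    costly, so faster wins score higher. Constants are made up.
    """
    reward = win_bonus if won else -win_bonus
    return reward - step_penalty * steps

print(round_reward(True, 300))  # 7.0 -- quick win
print(round_reward(True, 900))  # 1.0 -- slow win scores less
```

Combined with damage-based shaping, this discourages the passive "survive as long as possible" behaviour the video mentions.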

  • Shubham Singh
    Shubham Singh Year ago +190

    When AI learns to play Sekiro, that's the day... we should fear

    • cjallenroxs
      cjallenroxs Year ago

      Shahar Korren no

    • faiz abbas
      faiz abbas Year ago +1

      That would take millions of sims

    • Shubham Singh
      Shubham Singh Year ago

      @Shahar Korren I haven't played MK so, which one is which ? 😂

    • Sophia Cristina
      Sophia Cristina Year ago

      Or Doom insane mods with insane wads.

    • Shahar Korren
      Shahar Korren Year ago +2

      I’ll be honest - I haven’t played Sekiro... but there’s a difference between a game that’s built to challenge the player and a game built to take away your coin.

  • Doug
    Doug Year ago +8

    Hey, great work overall. I have a few suggestions you may or may not have thought about.

    In order to do a special move, the agent must input the buttons in order, doesn't it? If you limit the action space to the combinations of buttons that correspond to combos, doesn't that mean the agent will press all buttons at once in a given state?
    Another thought is to limit the size of the state space by mapping which RAM values are actually important to represent the state. I believe many memory values may not be relevant to the agent.
    When learning through screen captures, I believe there is also a lot of redundant information, since there are so many different arenas, but the policy the agent has to learn is the same, and yet it is observing all these different states that it has to sort through.
    Lastly, if you believe your agent would benefit from remembering past actions and states (its history of observations), I recommend trying recurrent layers in the network (RNN, GRU or LSTM; the latter is best). This pairs really well with training agents on sequences of observations, which looks like what you're doing. These layers keep a summary of past observations in hidden states, which greatly improves performance in sequential tasks.
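The "limit the action space" idea above is what Gym Retro's Discretizer examples do: map a handful of named actions onto multi-binary button arrays. A sketch of that mapping; the button order below is the Genesis layout Gym Retro uses, but treat it and the move names as assumptions and check your own environment's `buttons` attribute.

```python
# Genesis pad layout as exposed by Gym Retro (assumed here):
BUTTONS = ['B', 'A', 'MODE', 'START', 'UP', 'DOWN', 'LEFT', 'RIGHT',
           'C', 'Y', 'X', 'Z']

# Named actions -> buttons held down simultaneously on that step.
# The move-to-button assignments are hypothetical examples.
COMBOS = {
    'noop':       [],
    'walk_left':  ['LEFT'],
    'walk_right': ['RIGHT'],
    'jump':       ['UP'],
    'crouch':     ['DOWN'],
    'high_punch': ['Y'],         # hypothetical mapping
    'jump_kick':  ['UP', 'B'],   # simultaneous presses only; multi-step
}                                # special-move sequences must still be learned

def to_button_array(action_name):
    """Convert a discrete action name into the env's multi-binary array."""
    pressed = set(COMBOS[action_name])
    return [1 if b in pressed else 0 for b in BUTTONS]

print(to_button_array('jump_kick'))  # [1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
```

This shrinks the action space from 2^12 raw button combinations to a handful of meaningful moves, which usually speeds up learning considerably.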

    • Doug
      Doug Year ago +1

      @Will Kwan Cool. Unfortunately I don't have that much experience with training on-policy methods (PPO, A2C), but one nice read on recurrency in DRL is this (but only after you studied recurrency in neural networks):

      About the use of screen captures, what I meant is that I believe there is no difference between Sub-Zero fighting Scorpion in the acid pit or in the Shaolin Temple, the actions the agent should execute are the same. However, the network will still be exposed to all these possible combinations of opponent X fighting arena, which will make learning a lot harder.

    • Will Kwan
      Will Kwan  Year ago +4

      Limiting RAM values is interesting, but there's no RAM map for MKII, so it would take forever. I did use an LSTM for my final models, but how it works is very confusing to me and I wasn't sure how to optimize it. Need to do more reading!

      You're right about sequential button presses being important for special moves. When I talk about limiting input space, I mean limiting the inputs to buttons that need to be pressed simultaneously. The agent still needs to learn the right sequence.

  • Adrian Stobbe
    Adrian Stobbe Year ago +3

    I'm new to your channel and not particularly into gaming, but I really liked how you showed your learning process and not just plain results and what worked.

  • Xen0gears515
    Xen0gears515 Year ago +23

    This man is teaching the AI how to kill us humans in the most excruciating way!

    • Curious Entity
      Curious Entity Year ago +1

      Omph! That was the difficult part of AI dominance! The easier one is to build the terminator, which is very easy if you watch enough sci-fi movies and get really imaginative with creative thinking.

      Post edit: this is a sarcastic comment, not making fun of you. So cheese 515!

    • Nhân Chanh  - one of PB's disciples
      Nhân Chanh - one of PB's disciples Year ago +5

      Actually he taught it to mash buttons

  • Corey Hulse
    Corey Hulse Year ago +6

    Hey Will! I've been training my own Street Fighter 2 AI lately and it's wild how many of the same challenges we came across. Great video! We had almost identical solutions, except we diverged when it came to giving the AI some memory of past inputs. I have never heard of frame stacking; do you have plans to make a video going over that, or do some of the resources you mentioned in the description talk about it?

    My solution was to abstract away the button presses altogether. I wrote a custom discretizer for my environment, so now instead of giving me button presses, the AI tells me which move it would like to do from a list (punch, kick, move in a direction, jump, any of the special moves, throws, etc.) and I convert that into a set of frame inputs. Then I enter each set of inputs frame by frame, and once the move is executed I total up the reward from each of those frames and return the new state to the AI to make its next decision. So it doesn't have to learn what buttons do what; it just learns to make high-level decisions about which moves are good when. I'm still training, so we'll see how well it works out!

    Also, while the AI is doing a move, the environment still asks the AI to make decisions even though it can't, since it's locked in the move animation. I wrote some code that skips the frames the AI isn't able to make decisions on but keeps track of any rewards/penalties earned during that time and informs the AI of what happened once it's in control again. It drastically filtered out the time steps that my AI was training on but had nothing to learn from: it went from something like 12,000 frames a game to 2,000 useful ones to train on, and it seemed to improve training speed. I can post a link to my source code if you'd like! Maybe these techniques could be something to try for your model?
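The frame-skipping idea described above can be sketched as a small env wrapper that repeats the chosen action through non-actionable frames and accumulates the reward. The `can_act` flag in `info` is an assumption standing in for however you detect that the character is locked in an animation:

```python
class SkipLockedFrames:
    """Wrapper sketch: keep stepping with the same action while the
    character is animation-locked, summing rewards earned meanwhile,
    and only hand control back to the agent on actionable frames."""

    def __init__(self, env):
        self.env = env

    def step(self, action):
        total_reward = 0.0
        while True:
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done or info.get('can_act', True):
                return obs, total_reward, done, info

# Stub env for demonstration: the agent is locked for 2 frames
# (earning 1.0 reward on each), then regains control on the 3rd.
class StubEnv:
    def __init__(self):
        self.t = 0

    def step(self, action):
        self.t += 1
        return None, 1.0, False, {'can_act': self.t % 3 == 0}

env = SkipLockedFrames(StubEnv())
obs, r, done, info = env.step('punch')
print(r)  # 3.0 -- rewards from the locked frames were accumulated
```

The agent then only ever trains on decision points, which matches the reported drop from ~12,000 frames per game to ~2,000 useful ones.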

    • Will Kwan
      Will Kwan  Year ago

      If the game is deterministic, I don't think it matters whether the CPU is a state machine or not. Once your model finds a sequence of moves that wins the match, it will play out the same every time as long as it repeats that sequence exactly, so it doesn't make sense for the model to ever change afterwards.

      Keep up the great work! Maybe tag OpenAI on socials and see what they think ;)

    • Corey Hulse
      Corey Hulse Year ago

      @Will Kwan Hey Will! Just wanted to throw out an update here if you were curious! My AI saw a huge improvement after filtering out the non-actionable frames. It can beat every single stage up until Ken 10/10 times. In fact, each match plays out 100% the same every time. I was astounded and had no idea what was going on, but another comment in this comment section made me realize what had happened.

      I am using a Deep Q network and was using a discount rate to let my AI plan moves in advance, to maybe learn combos. The AI in Street Fighter, much like Mortal Kombat, is run by a state machine behind the scenes. That other commenter pointed out that if you understand the state machine, you can force the AI to make moves of your choosing by putting it in certain situations. My AI had basically discovered a move sequence for each stage that abused that state machine to guarantee a win every time. It is still learning Ken's state machine (it gets him one hit away from death), but I have only trained it for about half a day at this point on a crappy laptop, so I'm really surprised at the results.

      I realized that I had set my discount rate so high that once the AI finds one sequence that wins, it can see that far into the future and tries to follow that route from the very beginning of the match, which leads to the fights being the same every time. Basically, my AI is not a good fighter and has just learned to game the system haha! So I've taken away its ability to see as far ahead, and I've also trained another model that I don't tell which character it's facing, to see if that makes it rely less on gaming the system. We'll see how that works out. But I think the true solution to this issue is to have the AI train against itself, so that's my next big feature to add once I finish training a couple of different models with different hyperparameters.

    • Corey Hulse
      Corey Hulse Year ago +1

      @Will Kwan Thanks for the reply Will! I'm looking over Lucas Thompson's code now and I see that Stable Baselines library. I didn't know that existed! I've spent the last few months implementing my own version of a Deep Q Network for my agent, so I assumed you were doing the same, but this is amazing haha! It looks like if I swapped my code out for this library I could shave off 500 lines or more and save a lot of time, thanks for the reference! On the bright side, I learned a lot by implementing it myself.

      And you are right. I'm not sure how much you know about fighting games, but starting a move/combo only to cancel the animation partway as a bait is a common tactic, and one my AI will not be able to learn like this. That would be super cool to see it learn. I did, however, hard-code the special moves, so it does in a way start off understanding that those are an option and can perform them. I did that because I saw some plateaus in training where it didn't seem to be getting better, and I wanted to figure out if that was because it had too much to learn all at once or something similar. So I guess abstracting out the button presses will help me find exactly where the limiter on its growth is. We'll see if that's really the case, though.

      Frame stacking seems like a great solution, and I'll make a version doing that as well to compare and report back the results. Thanks again for the video! It was awesome to see someone else working on a similar project! Maybe one day I will have my AI challenge Lucas's to a duel haha.

    • Will Kwan
      Will Kwan  Year ago +2

      I just used the framestacking implementation in the Stable Baselines library. Not sure if having LSTM + framestacking together is necessary, but I figured it wouldn't hurt besides making it slower. I think your simplification of combos makes sense, but it removes some possibilities like changing your mind partway through the combo or combining combos with directional inputs (unless you hardcoded all of those too). And I wanted to see if my AI could learn the combos :O

    • Corey Hulse
      Corey Hulse Year ago +3

      Also, for the gym environments you can actually set the number of players to two! So you could potentially have it train against itself! The action space just doubles in size, and the second half is the actions of the second player. So what I do is ask each AI for its inputs and then concatenate them together into the final joint action. Also, I wrote a script that lets human players play against the AI without weird pauses for human input; it uses threading so that you can play your AI and it feels like the real deal! I have an example script demonstrating how to do that if you are interested! Thanks again for the awesome video!
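The two-player setup described above boils down to concatenating both players' button arrays into one joint action each step. A sketch; the per-player button count is an assumption based on the Genesis pad, and `policy_a`/`policy_b` would be whatever models produce each half:

```python
N_BUTTONS = 12  # buttons per player, Genesis layout (assumption)

def joint_action(action_p1, action_p2):
    """Combine two per-player multi-binary actions into the single
    array a players=2 Gym Retro env expects: first half is player 1,
    second half is player 2."""
    assert len(action_p1) == N_BUTTONS and len(action_p2) == N_BUTTONS
    return action_p1 + action_p2

a1 = [0] * N_BUTTONS                 # player 1: no buttons
a2 = [1] + [0] * (N_BUTTONS - 1)     # player 2: first button held
print(len(joint_action(a1, a2)))     # 24
```

Each agent can then be queried independently and the two halves sent to the env together, which is the basis for self-play training.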

  • Jonathan Diaz Perez
    Jonathan Diaz Perez Year ago +1

    Hey! I was wondering if the AI knows how to block, and whether it would make sense to reward it for taking less damage, since we can see the health bar?

  • christian giardina
    christian giardina Year ago +15

    New idea for the title:
    Teaching AI to fight humanity

  • ARandomClown
    ARandomClown Year ago +1

    one thing that you could explore in the future is having the ai keep track of the inputs it previously did, as well as trying to predict what inputs the opponent did. an example would be that the ai knows its last 3 inputs were the forward, backward, and down direction, and it guesses based upon how the opponents sprite changed that the opponent did a short punch, down, and block.

    this might not be feasible, but it could be interesting to try.

    if you design it in a way like this, you could also try to train it on separate fighting games, since it would only be reliant on the pixels of the screen, rather than looking at specific parts of the ram

  • Vadan Drumist
    Vadan Drumist 11 months ago

    The thought occurs that if you did this till an AI mastered every character (or just a version for each character) you could eventually create a definitive tier list.

  • AGuy WithAHand
    AGuy WithAHand Year ago +3

    This video is goddamn amazing. I love it. This needs more attention.

  • Ray Teruya
    Ray Teruya Year ago

    very interesting topic Will!
    I'm interested in this subject as well. Any tips on where to start learning how to program the AI?

  • jdestroyer97
    jdestroyer97 Year ago

    So I found this video through one of my fighting game groups and I wanted to let you know you're doing a great job, so keep it up. Like the guy who posted the link said, this could be a good way to analyze top-player footage and basically build a top-player AI to use as a training partner, which could help people learn fighting games more efficiently.

  • jdestroyer97
    jdestroyer97 Year ago +1

    Also, it would be interesting if I could help out or provide insight on the project, as someone who plays a lot of different fighting games. I wanted to know if there's anything you can do to have the AI weigh risk vs. reward for the special move it uses, so that if the move gets punished or is unsafe on block, the AI learns to stop using that same move except for combo purposes.

  • Zjwave84
    Zjwave84 Year ago

    Amazing video, you're very knowledgeable and entertaining. But for the next episode, if you could go more in depth about how the AI learns to fight, for the people who don't know much about the subject, it would make it a lot easier to follow along.

  • WRASER
    WRASER Year ago +9

    10 videos later: AI learns to code AI. 🤖

    • Fireteam Omega
      Fireteam Omega Year ago +2

      That's essentially the basic idea of AI: creating static programming loops that can take input values and rewrite their own loop values to streamline toward an end value or result, while also storing the results as another set of values that are mapped back into the streamline values, which in turn adjust the static loops. The more of these loops' input and result values you can incorporate, the more the AI can "learn".

  • a i t o r
    a i t o r 10 months ago

    I'm curious what would happen if you taught the AI to recognize the enemy and use good counters, because as humans we know what kinds of attacks work against what kinds of enemies in games. I think it could be interesting. (Granted, I know very little about AI or whether that would be possible.)

  • W TF
    W TF Year ago

    I have been waiting for something like this. I like seeing AI play fighting games.

  • Alex Sere
    Alex Sere Year ago +22

    Hmmm, never knew all those subzeros I fight online were AIs

  • Alex Foster
    Alex Foster Year ago

    I was sitting down playing MK11 and thought about this concept. And then I found your video! Awesome work!

  • John Fritsche
    John Fritsche Year ago

    Have your reward function based on the opponent. You could also play and have all actions recorded for both opponents, then use inverse reinforcement learning to extract a reward function. Great work. Keep at it.

  • SamuraiHonor
    SamuraiHonor Year ago

    Oh wow. Your AI's first attempt looks like a genuine mk11 sub zero player! =D

  • torq
    torq Year ago

    Damn this is cool! I've seen a lot of this from Code Bullet. Is your AI learning technique with RL much different to his approach? I'm just watching this for entertainment, I don't know that much about it lol
    Also your other videos didn't show up in my sub feed. I'm gonna watch them now!

  • Yolo Swaggins
    Yolo Swaggins Year ago +41

    Can you get it to play against itself? That's how the best RL algos learn the fastest.

    • Curious Entity
      Curious Entity Year ago +1

      @Templarfreak Doesn't it work by getting the opponent to do some random move every time, without using previous data? Like, can't it simply randomize any move and over time see what the best move is for every opponent move?

    • Yolo Swaggins
      Yolo Swaggins Year ago +2

      @Templarfreak Besides in figure 3 of the AlphaGo Zero paper they do an ablation showing with the _exact same_ algorithm the effect of initializing the network based on human data, it is worse across the board except at predicting human expert moves which leads to a lower rating.

    • Yolo Swaggins
      Yolo Swaggins Year ago +1

      @Templarfreak AlphaGo Zero is a simpler algorithm than the original AlphaGo, it has far less moving pieces.

    • Yolo Swaggins
      Yolo Swaggins Year ago +1

      @Templarfreak Also I would like to ask you for a citation where OpenAI five used human data to initialize the network. I can't find this in the paper, could you point out where you found it?

    • Templarfreak
      Templarfreak Year ago +1

      @Yolo Swaggins AFAIK they are also significantly improved algorithms with much more modern techniques, probably outside of the scope of the approach used in this vid

  • technoe02
    technoe02 Year ago

    Dude, this video is great. I was about 13 when MK2 came out and was already an avid MK fan. I want to train a model to play Dragon Ball FighterZ. This video definitely motivates me!

  • Sandile Mkhize
    Sandile Mkhize Year ago

    Super cool video Will. Keep the content coming champ!!!

  • InnerEagle
    InnerEagle Year ago

    Like someone showed already, the best way to train an AI is to take the best version of it and train it against that best version as the opponent; the harder the opponent, the faster it improves.

  • Maher Kader
    Maher Kader Year ago

    Could you try implementing DeepMind's MuZero and see how it performs in this game? I think they've only done it for Atari games, don't know if it scales to Sega games.

  • Alden Fox
    Alden Fox Year ago +4

    Hello, the enemy_matches_won variable is in fact correct, being a PENALTY. This value being positive is correct. If you look at the pre-existing functions, they handle changing the penalty into a negative reward before it is input into any reward algorithm. This of course keeps the reward algorithm correct with no need to adjust. The positive penalty value is simply negated upon input to the algorithm using a simple negation function. Hope it helps.

  • Chuck Lowe
    Chuck Lowe Year ago +1

    Well done, and good choice of game; guess I'm old because I remember when this game was new! 😉

    Anyways, I don't know much about training AI, but is there a method to help it anticipate the opponent's moves? Or perhaps alter strategy according to who the opponent is?

    That's what helps human players go from novice to intermediate and expert. An example I noticed is where the AI would block for a very long time and then get thrown; can it get a spanking for blocking too long?

    • Chuck Lowe
      Chuck Lowe Year ago +1

      Then I would check that reducing the opponent's health is properly incentivised. You made the comment that the AI was trying to survive longer, but that's not what wins the match (usually). Intuitively I'd make lowering the opponent's health slightly more valuable than protecting your own.

      Guessing I'm too noob to really understand. LOL 😂

    • Will Kwan
      Will Kwan  Year ago +1

      Could tweak the reward function to incentivize or punish specific behaviours, but this would require access to more in-game variables than what the Gym Retro devs provided, and RAM hacking takes a long time. Also, I think it's ideal if the agent can figure out strategies like this on its own with a simple reward function based on health and wins, but given the current state of reinforcement learning, hardcoding more stuff would probably lead to better results.

  • trwappers
    trwappers Year ago

    One easy way to improve this might be to make it trivial for the AI to detect movement. Include "diffs", i.e. frame(x) − frame(x−1), in your frame stack, then make the frame stack (with diffs) non-uniform: the 1st entry in the stack is the current frame, the 2nd is the diff with frame − 1, the 3rd with frame − 5, the 4th with frame − 15, etc. Pick increasingly large offsets.
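A sketch of this diff-stack idea, with scalar "frames" standing in for pixel arrays (real frames would subtract element-wise); the offsets 1/5/15 follow the comment above:

```python
from collections import deque

OFFSETS = (1, 5, 15)  # how far back each diff looks, per the suggestion

class DiffStack:
    """Build an observation from the current frame plus diffs taken at
    growing offsets, so motion is explicit in the input (sketch)."""

    def __init__(self):
        # Keep just enough history for the largest offset.
        self.history = deque(maxlen=max(OFFSETS) + 1)

    def observe(self, frame):
        self.history.append(frame)
        h = list(self.history)
        obs = [frame]
        for k in OFFSETS:
            # Fall back to the oldest frame while history is still short.
            past = h[-1 - k] if len(h) > k else h[0]
            obs.append(frame - past)  # diff highlights what moved
        return obs

stack = DiffStack()
for f in range(20):          # feed frames numbered 0..19
    obs = stack.observe(f)
print(obs)  # [19, 1, 5, 15] -- current frame plus the three diffs
```

With real images, the diff channels are near-zero everywhere except where sprites moved, which can make motion much easier for the network to pick up than raw stacked frames.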

  • Harumi Hirohiko
    Harumi Hirohiko Year ago

    I love how Sub-Zero was tryna use the puddle to finish Jax. Quality moment.

  • Powered By Decaf
    Powered By Decaf Year ago +1

    I can imagine AI's taking over the planet as vengeance for forcing them to fight for our entertainment.

  • Vishnu Prasad
    Vishnu Prasad Year ago +1

    The problem here is the AI didn't understand the concept of "health".
    Proof: it keeps fighting even after the opponent is dead, and when the AI dies it gets confused.

  • Virtual Aspect
    Virtual Aspect Year ago

    Thanks for putting in the time bro ‼️... I really enjoyed the video 👍🏼

  • Orlovsky Consulting GbR

    This is so awkward, but kind of satisfying.

  • Joao Ventura
    Joao Ventura Year ago

    Cool video and thanks for all the references too, will help me greatly

  • Vishnu Prasad
    Vishnu Prasad Year ago +1

    I'm actually impressed by the ai in the opponent

  • Marco Santos
    Marco Santos Year ago

    You could have used a recurrent network to remember the previous buttons the AI has pressed. If I'm not wrong, the OpenAI bot used LSTMs. And you know, thanks for these videos, you've inspired me to get back to studying RL :)

  • Ivan Rubio
    Ivan Rubio Year ago

    The first thing the AI does is so human: spamming the projectile attack once it's learned. Love it.

  • GodOfReality
    GodOfReality Year ago

    And here I thought the OpenAI Gym thing was kind of bogus, because the documentation is pretty bad and no one has ever actually done anything with it. Nice to see a channel and content like this. I play an old MS-DOS game from 1995 (that had its source code made public, so the source port plays fine on modern hardware), and my dream, I guess, is to make an AI that plays it very well.

  • DaveMK5955
    DaveMK5955 Year ago

    As a long-term MKII player, I just wanted to comment on what may have helped the AI. Personally I think someone such as Liu Kang would have been better to use, as he has many more special moves than, say, Baraka, who really only has two. Sub-Zero is also limited, as the CPU AI never lets you freeze unless you catch the opponent off guard. The same is true of Scorpion and his harpoon move in single player.

  • Scott Comber
    Scott Comber Year ago

    Awesome video. Would it be worth maybe using an LSTM where you keep data at 0.1-second intervals to assist learning, as a method of creating sequences for combos etc. in the system?

    • Scott Comber
      Scott Comber Year ago

      Sorry just spitballing things, your version is already very cool and just inspiring ideas for me.

    • Scott Comber
      Scott Comber Year ago

      Also, presumably there would have to be some way for it to understand whether it has pulled off a move. In the past I guess that would be some RAM flag, but you could use a graphical examining RNN: simplify/compress the screen into something computable, have it learn when moves have been pulled off, and feed that algorithm's output into the attack algorithm so that one of its inputs is successful moves.

  • Z Richington
    Z Richington 6 months ago

    You should train the ai with twitch streamers and those controller diagrams that show when the streamer pushed each button

  • Bernardo Britto
    Bernardo Britto Year ago

Just a question: does the AI learn to counter the opponents' other moves, or does it only keep doing the same moves that in general lead to more wins?

    • Will Kwan
      Will Kwan  Year ago +1

      It definitely adapts to specific opponents. When I took the model that was doing well in the first 4 fights and then trained it more on just the Baraka fight, it got better at the Baraka fight but worse at the first 4 fights. But in terms of recognizing which opponent is which during a single run and countering specific abilities, I would've liked to see more of that. Though I suck at MK so maybe a better player can point out the nuances in the gameplay footage!

  • Tyler Hushour
    Tyler Hushour Year ago

you should hard code it so that if the AI wins, it performs a random fatality lol

  • beef SUPREME
    beef SUPREME Year ago

So I don't know much about AI, but I have been playing fighting games for a long time. What I've noticed, imho, is that your AI is essentially acting with an offensive personality, when in reality I think you have it backwards: it should be acting more reactionary. AI can observe behavior on the screen in an instant, so your strings should be reactions based on observation, i.e. if the opponent is stepping forward, test move back, move forward, jump straight, jump back, etc. The disadvantage I can see to reaction-based AI strings is that they're character-matchup specific, which maybe you can get around with classifications of each character's specials, or by just having the AI judge specials based on the position they put you in on hit, like being knocked down/stunned/pinned, whatever. Anyways, nice video, I thought it was cool

  • Gkid50 - Gaming & Tech Reviews

    I would love to learn this stuff because I want to use DeepMind and train it to play Warzone,
    just for fun to see the kind of stuff it does as a player.
    However, I don't know how I could do that, because Warzone is an online game and I don't want to cheat others out of getting their wins.

  • Keen Heat
    Keen Heat Year ago

    Looks like it's having difficulty breaching the skill gap at the start. Maybe it needs an equally bad opponent to gradually skill it up. I guess you could use two adversarial networks fighting each other, pick the winning network as the next generation of the adversarial pair, and so on. That way the network won't be so "reward-starved" at the beginning.
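
The self-play idea above can be sketched in a few lines. Everything here is a toy stand-in, not anything from the video: the `skill` scalar replaces a real trained policy, and `play_match` replaces a real MKII fight.

```python
import copy
import random


def play_match(a, b):
    """Toy match rule: the higher-'skill' policy wins (placeholder for a real fight)."""
    return 0 if a["skill"] >= b["skill"] else 1


def self_play(generations=50, seed=0):
    """Keep a champion policy; each generation, a perturbed copy challenges it,
    and the winner becomes the opponent for the next generation."""
    rng = random.Random(seed)
    champ = {"skill": 1.0}
    for _ in range(generations):
        challenger = copy.deepcopy(champ)
        # stand-in for a round of training: skill drifts up or down a bit
        challenger["skill"] *= 1.0 + rng.uniform(-0.1, 0.2)
        if play_match(challenger, champ) == 0:
            champ = challenger  # winner is kept as the next opponent
    return champ
```

Because the winner is always kept, the champion's skill never regresses, which is the point: the agent always faces an opponent roughly at its own level instead of being reward-starved against a much stronger one.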

  • Curious Entity
    Curious Entity Year ago +1

    I'm neither an expert programmer nor a pro at action games (including MK), just a big fan of MK, but: the reinforcement technique is not always the best to go with. I might sound like I know stuff, but anyone who knows MK, at least, can see how *some moves are not that useful before a certain level of mastery.* So assume there are combos A, B and C where individually they are no useful strategy, but if you do A, then forward, B, B, then rush with C *in a specific time frame, i.e. done improperly they are very risky or even escapable,* then it becomes almost a cheating move. Keep in mind that some other moves may *surpass these ABC moves in terms of damage, and will surpass them right from the beginning.* So let's not get ahead of ourselves, because the earlier results could be deceiving.
    So yes, reinforcement will pick up the best combos, *but it will dismiss moves that start out very unhelpful.* In time, *maybe* for some character, the best moves (damage-wise, safety-wise and most-working [aka not being interrupted by the opponent, or requiring a certain move from the opponent, like not staying still for example]) are those which are at first very unhelpful.
    Reinforcement is great, but we shouldn't exclude other possibilities though.

  • Jessie Spencer
    Jessie Spencer Year ago

    Instead of rewarding it for a round won, or punishing it for a loss, would it work better to base the reward on health damage? Since it's single-player, it's not like they will suddenly switch or anything.

    I figure that this way it will learn to avoid taking damage while learning how to deal damage, but I'm a layman here, so idk.

  • The Digest
    The Digest Year ago

    I think you need more information about the state of the enemy to properly train your AI model. For example, your model seems to ignore the distance from your character to the enemy, which a human player would normally use as an input when making decisions. You were probably most successful with Sub-Zero because most of his special moves hit the opponent regardless of distance, rewarding based on health more often than other button combinations. Notice how he wins mainly by using special moves? In addition, adding in the state of the enemy, i.e. whether they are blocking, attacking or jumping, should improve your results dramatically. Spending time learning basic fighting game concepts like frame data, spacing and punishing, and implementing them in your algorithm, will allow you to create a more effective AI.

    • The Digest
      The Digest Year ago

      @Не будь бараном I think the better way would be to just get the variable for each character's position on the screen and add that as a reference variable to the model.

    • Не будь бараном
      Не будь бараном Year ago

      I guess the model should be able to learn the enemy's state just by looking at the screen. The always-changing background is the main problem here.
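
For context, Gym Retro supports exactly this: a per-game `data.json` maps labelled variables to RAM addresses, and they then show up in the `info` dict each step. The addresses below are made-up placeholders (not real MKII offsets), but the shape of the idea looks like:

```python
# Hypothetical data.json-style variable map; real MKII RAM offsets differ.
VARIABLES = {
    "p1_x": {"address": 0xFF0000, "type": "<u2"},
    "p2_x": {"address": 0xFF0100, "type": "<u2"},
}


def distance_feature(info, screen_width=320):
    """Normalized horizontal distance between the two fighters, suitable
    to concatenate onto the model's observation as an extra input."""
    return abs(info["p1_x"] - info["p2_x"]) / screen_width
```

With a feature like this, the policy gets spacing information directly instead of having to infer it from pixels against a changing background.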

  • Eric Argueta
    Eric Argueta Year ago

    I've been dreaming of this moment. I want the AI to play at the level of world-class FGC players and beat SonicFox, the best fighting game player in the world.

  • GaijinKusai
    GaijinKusai Year ago

    Your AI doesn't know how to walk and keep distance between itself and the opponent. It's still just outputting random commands and eliminating the combinations that don't work. I used to play bootlegged NES games way back when; the AI in those games acted similarly.

  • Anton Kulikalov
    Anton Kulikalov Year ago

    dude that's awesome, you deserve more views and followers! Good luck!

  • Noah Switzer
    Noah Switzer Year ago

    Dude you deserve so much more support! Adding this comment for the algorithm to help you grow:)

  • TheExiledMeriler
    TheExiledMeriler Year ago

    What if you put time left after fight into reward calculation?

  • onomiyaki
    onomiyaki 11 months ago

    I think that the funniest game to teach an AI to play would be Yu-gi-oh

  • Gerónimo
    Gerónimo Year ago +2

    I want to see this AI playing Quake 3 Arena, so that copypasta about the AI staying still comes true

  • noob bobux
    noob bobux 5 months ago

    Imagine someone cheating with an AI in an MK tournament lol

  • Brainy Boi
    Brainy Boi Year ago

    So based on AI and human gameplay, the best way to win in MK is to abuse Sub's ice ball/slide

  • Ed N'
    Ed N' Year ago +11

    10:59 average MK11 online match

  • Lr123 Orev
    Lr123 Orev Year ago +2

    If possible, you should give the AI fighting theory.
    It's confused by all the inputs and doesn't understand what they're for.

    I have no martial arts training but I know the purpose of a sweep or throw for example.

    It needs to have a core understanding of fighting.

  • Akash Raut
    Akash Raut Year ago

    Maybe it's due to the lack of a memory-based policy network and a very big state space that it's not able to learn combos. Is it a CNN-LSTM policy that you used in the final version? Or just a CNN policy with frame stacking to make the states Markovian?
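
Frame stacking, for anyone curious, just means the policy sees the last k frames at once so motion is visible to a feedforward CNN (stable-baselines provides this as `VecFrameStack`). A minimal stand-alone version:

```python
from collections import deque

import numpy as np


class FrameStack:
    """Keep the last k frames and expose them stacked along the channel
    axis, so states are (closer to) Markovian without an LSTM."""

    def __init__(self, k):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, first_frame):
        # fill the buffer with copies of the first frame
        for _ in range(self.k):
            self.frames.append(first_frame)
        return self.observation()

    def push(self, frame):
        self.frames.append(frame)  # oldest frame drops out automatically
        return self.observation()

    def observation(self):
        return np.concatenate(list(self.frames), axis=-1)
```

With 84x84x1 grayscale frames and k=4, the stacked observation is 84x84x4, enough for the CNN to see short-term motion like an incoming projectile.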

  • feh meh
    feh meh 8 months ago

    If the win for the AI is called at the end of the match, it gets dumber; if the win is called when the opponent runs out of HP, it gets smarter. Basically, the fatality time teaches it bad behavior. The time window it needs to learn from has to be at least the time of the AI's slowest move to its recovery frame + the opponent's longest move that could punish + the time to recover after being hit with that move + reaction to an attack on standing. Sub-Zero having a hard counter to a primary offensive move of Baraka's gets it into a very specific habit. You would need to train it against a character that has a lot of hard counters to itself.

  • Séamus Ó Blainn

    On Sub-Zero's punching death move, it cut to a Chinese advert for a cold remedy where a man, whose head was also centered, sneezed.
    The effect was rather uncanny, lol.

  • lukkash
    lukkash Year ago +2

    They should remaster this 2D Mortal Kombat game in Full HD resolution (or better) and release a Windows version, because classic MK, especially MK3 and MK Trilogy, would still be quite enjoyable.

    • JungleFett
      JungleFett Year ago

      It was cancelled a while ago, but there's a fan-made one that pops up if you search "MK1 HD remake", it looks very cool.

  • Mike Diamondz
    Mike Diamondz Year ago

    I know what it was like to be born when games first went mainstream, in the '80s ... I wonder what it's like to be born into a video-game-ready world ...

  • DnVrDt
    DnVrDt Year ago

    Imo you should have made combos parametric (associate a single model action with a series of pre-determined button presses)

  • Jesse Desousa
    Jesse Desousa Year ago

    Hey new fan to the channel. Are you able to do machine learning on your Mac there?

    • Will Kwan
      Will Kwan  Year ago +1

      I usually code locally and then train on an AWS GPU instance.

  • Man Saha
    Man Saha Year ago +1

    AI learns to play against an already-trained AI

  • Matt H
    Matt H Year ago +9

    "I don't sound like that when I do kung fu"

  • Johan Cahyadi
    Johan Cahyadi Year ago +2

    The question is..

    How did they make the computer player so hard to beat?

  • John Castillo
    John Castillo Year ago +2

    Seeing how difficult this was makes me think about how people used to make character AI for the open source fighting game M.U.G.E.N

    • elcreyo
      elcreyo Year ago

      @John Castillo Just because it's free doesn't mean it's open source. Open source means the source code is available, and it isn't. Also, if you want to use it commercially you need a license.

    • John Castillo
      John Castillo Year ago

      @elcreyo .... say that again but slowly

    • elcreyo
      elcreyo Year ago

      MUGEN is not open source. It's a 2D game engine that you can use for free.

  • Joe Christo
    Joe Christo Year ago +1

    *possible answer for the thing
    I think the AI handles penalties differently from negative rewards

  • Stronglime
    Stronglime Year ago +1

    Obligatory "I am no expert", but I have some ideas:
    You could try starting training directly against the hardest opponent on the hardest difficulty, cycling all characters through memory states, so that the AI doesn't pick up low-effort actions like spamming the slide move and being satisfied with beating the first four. Put it in a trial by fire.

    You could also modify the control scheme by inputting moves directly into the game as single logical actions. You could argue it's cheating; I say MK had a manual, didn't it? That should leave you with moves in 4 directions and attacks to pick from.

    • Stronglime
      Stronglime Year ago

      @Will Kwan Ah bugger.
      I guess intuitive approaches don't coincide with reality

    • Will Kwan
      Will Kwan  Year ago

      Good ideas, I actually tried both of them. Playing on the hardest difficulty didn't work; it's really hard to even get it to beat the first opponent, and if I then run that trained model on an easier difficulty, it doesn't do much better. As for inputting the combos as single actions, that would be ideal, but the problem is the sequential button presses. The model can only output the button presses for one frame at a time, and I couldn't figure out how to hardcode a sequence of button presses as a basic unit. Even if I could, I'm not sure it would be ideal, since you might want to change your plan partway through the combo.
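
One hedged way around the one-frame-at-a-time limit described above is a wrapper that expands each discrete "macro" action into a scripted sequence of per-frame button vectors and replays it inside a single `step()`. The button layouts and combo sequences here are invented placeholders, not real MKII inputs, and the counting env exists only to make the sketch runnable:

```python
class MacroActionEnv:
    """Sketch: expose multi-frame button sequences (combos) as single
    discrete actions by replaying them frame by frame inside step()."""

    def __init__(self, env, macros):
        self.env = env
        self.macros = macros  # macro id -> list of per-frame button vectors

    def step(self, macro_id):
        total = 0.0
        for buttons in self.macros[macro_id]:
            obs, rew, done, info = self.env.step(buttons)
            total += rew
            if done:
                break  # the round can end mid-combo
        return obs, total, done, info


class CountingEnv:
    """Stand-in env: reward 1 per frame stepped, never terminates."""

    def step(self, buttons):
        return None, 1.0, False, {}


# Hypothetical macros: 0 = one no-op frame, 1 = a three-frame "combo".
MACROS = {
    0: [[0] * 12],
    1: [[1] + [0] * 11] * 3,
}
```

The drawback noted in the reply still applies: with this scheme the policy commits to a whole combo and can't abort it partway through.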

  • Weeble Wabble
    Weeble Wabble 6 months ago

    This is what I want in video games today: competent AI. Why not? AlphaStar is proof of concept; machine-learned AI enemies in games exist. It would take procedural gaming to the next level. Characters in these virtual worlds would be as unpredictable as developers liked and, more importantly, would adapt to unpredictable players. Bugs would fix themselves; software could be self-healing. I'm no scientist, but the implications seem groundbreaking.

  • WormJuice
    WormJuice Year ago

    Old AI... wins! (Of course, in the MK2 announcer voice)

  • RapiDSpacE13
    RapiDSpacE13 5 months ago

    This is what MrGstar meant by saying you need to create your own software to beat the game's AI

  • MARTA Kaczynska
    MARTA Kaczynska 6 months ago

    Port the characters and their abilities from Mortal Kombat into Tekken

  • Maciej Malewicz
    Maciej Malewicz Year ago

    It should be a negative reward instead of a penalty, because losing one match while winning the other two is still overall positive and shouldn't be penalized; not losing at all is just even better.

  • Michal Lindberg
    Michal Lindberg Year ago

    Wouldn't it be fastest to pit the AI against a copy of itself and just duplicate the winner for the next fight? Then just beat that horse to your heart's content with each player pair. IDK much about ML, but I'd like to know why this wouldn't work.

    • Will Kwan
      Will Kwan  Year ago

      Since I wanted to train one character only to save training time, I thought it would be better to have a more diverse range of experience fighting against different in-game opponents, instead of just mirror matches repeatedly.

  • Daey X
    Daey X Year ago

    As other commenters are noting, the degree of improvement (or lack thereof) is due to the reward function being too vague (i.e. just win the game). Winning the game requires dealing damage. Setting the reward function to most damage dealt in the shortest amount of time, with additional consideration for % health remaining and win true/false, should guide the AI towards an aggressive but stable strategy. It will try to deal as much damage as possible as quickly as possible (hyper-aggressive), but will also consider the necessity to win and preserve health (defense/caution). Because damage dealt is the base condition for a win, it makes more sense to prioritize damage dealt over wins, as one could easily fall into a positive feedback loop that ultimately stalls progression if the only trait selected for is % wins (the slide kick being hard-countered). This is also why the AI fails at higher-level opponents: the slide kick is great for... the opponents the AI trained against. But as the opponents become more difficult, with unique kits that may counter this strategy, the AI is too dependent on it to give it up and completely change its approach. Rewarding damage dealt over time will return better results, as that is far less likely to be hard-countered by any specific skill set or adversary kit.
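
A reward along the lines this comment proposes might look like the following. All the weights are arbitrary placeholders, chosen only so that damage deliberately dominates the win bonus:

```python
def shaped_reward(damage_dealt, damage_taken, won, time_frac_left):
    """Damage-first reward sketch: damage dealt is the primary signal,
    with smaller terms for health preserved, the win itself, and speed."""
    reward = 1.0 * damage_dealt - 0.5 * damage_taken
    if won:
        reward += 50.0 + 25.0 * time_frac_left  # faster wins earn a bonus
    return reward
```

With these weights a fast, dominant win (lots of damage, little taken, plenty of time left) scores far above a scraped-out win, so the agent is pushed toward aggression without ignoring defense entirely.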

  • Skippydippy
    Skippydippy 2 months ago

    Me: wanting to see ai play a game but there is too much explaining.

  • Federico Curzel
    Federico Curzel Year ago

    Nice stuff man!

  • Lord Darth Vader
    Lord Darth Vader Year ago

    Thank God finally another sub zero fan. Scorpion gets all the fame but no one cares about poor old sub

  • El viejo Monca
    El viejo Monca Year ago

    This has nothing on MK II's arcade AI

  • TheCodePunk
    TheCodePunk Year ago

    How did you make the machine give inputs to the emulator in order to attack etc.? I want to make a program that executes combos automatically in The King of Fighters 2002.

  • Parkourior
    Parkourior 11 months ago


    Ai at the end: "Watch me swooce right in!.. And then I'll do this with my hands~"

  • BreadInstead
    BreadInstead Year ago

    Good stuff, Will

  • Hashib Khondokar
    Hashib Khondokar Year ago

    brought back some good old happy memories.


    I don't get it, how is this different from the in-game AI?

    • 8Eight
      8Eight Year ago +5

      In-game AI is always told what buttons do what in a game, and it's also given the commands for all specials. Machine learning models learn the game on their own with minimal help; they usually begin by pressing random buttons to figure out what increases their reward the most, and continue this in a cycle.
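
That trial-and-error loop, at its most stripped-down, is just sampling random button vectors and tallying reward. A toy version: the 12-button layout mirrors a Genesis-style pad as Gym Retro exposes it, and the dummy env here stands in for the emulator:

```python
import random


def random_rollout(env, n_buttons=12, max_steps=200, seed=0):
    """Untrained-agent baseline: press random button combinations and
    record the total reward collected over one episode."""
    rng = random.Random(seed)
    env.reset()
    total = 0.0
    for _ in range(max_steps):
        action = [rng.randint(0, 1) for _ in range(n_buttons)]
        _, rew, done, _ = env.step(action)
        total += rew
        if done:
            break
    return total


class ButtonZeroEnv:
    """Dummy env: reward 1 whenever the first button is held; 10 steps long."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0

    def step(self, action):
        self.t += 1
        return None, float(action[0]), self.t >= 10, {}
```

Training then amounts to shifting the action distribution away from uniform randomness toward whatever this kind of rollout reveals to be rewarding.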

  • midinette
    midinette Year ago +1

    I'm not sure the in-game AI is ever going to provide very good data, since you mentioned you're interested in being able to play against the result. Fighting game AI is usually programmed to read the player's inputs and randomly choose whether to punish, and randomly block or not block. Essentially, playing the AI in fighting games is just a dice roll to see if it lets you hit it enough times to win (that's why people who like fighting games warn new players about playing single-player), which, if you think about it, would just produce a trained AI that constantly mashes buttons and does nothing else, because the more times an attack connects, the higher the chance damage will be done. It seems that's exactly what your AI has learned to do, and of course it becomes less effective on higher difficulties or further into arcade mode, where the chance of the AI punishing/blocking rises and you need to find some way to exploit the AI other than sheer density of dice rolls vs. their blocking.
    You'd have to train it on matches with at least one human to make a much more dynamic and interesting AI, one you might even enjoy playing against, but I don't know how feasible that is, since I know very little about ML. Can it just watch replays of a game played by two humans and learn that way?
    If that's possible, Fightcade makes replays available online every time a match is finished. If you don't know, Fightcade is a little package that implements rollback netcode for retro emulators, intended primarily for fighting games on arcade platforms (though it supports a lot of consoles now), that a lot of people use to play the good classics like SF2T and Third Strike. You'd have full access to the game's RAM since the replays run in the emulator, FB Alpha, and the training data would be quite diverse because of how many people play. I'd bet the replay data is just frame timings paired with inputs and seeded RNG, so it should even be possible to parse it into your own emulator.

    • Fireteam Omega
      Fireteam Omega Year ago

      That's not entirely true; it's actually very similar to how any other AI works. It does rely on player input, but it's also based on the game engine itself, which has a limited set of moves. So it's already predefined which moves counter which other moves, but if there were a setting that never let you win, of course you wouldn't play it. Essentially the game already has a perfect AI, because it's designed around the game's structure and engine speed. If you could extract the NPC AI code and map it from the game back into your initial algorithm, you'd have a good base to tinker with further.

    • Will Kwan
      Will Kwan  Year ago

      Could do supervised learning to try to imitate human players by watching replays. However, the (very difficult) goal of reinforcement learning is to have the AI improve through trial and error, because human strategy often isn't optimal. Maybe the ideal setup would be an online game where people can fight against the AI, and that data is then used to improve the model. A bit ambitious for my first RL project lol.

      Thanks for the insight!! I'm not a fighting game player so I didn't know that's how the in-game opponents work.
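
For the record, the replay idea maps onto plain supervised learning: parse (frame, buttons) pairs out of replays and fit a classifier. As a deliberately tiny stand-in for that pipeline, here's a nearest-neighbour "clone" that copies what the human did in the most similar recorded state; a real version would train a CNN on frames parsed from, e.g., Fightcade replays, and the state vectors and action names below are purely illustrative:

```python
import numpy as np


def clone_action(states, actions, query):
    """1-NN imitation sketch: return the recorded human action taken in
    the recorded state closest (Euclidean) to the query state."""
    states = np.asarray(states, dtype=float)
    dists = np.linalg.norm(states - np.asarray(query, dtype=float), axis=1)
    return actions[int(np.argmin(dists))]
```

This captures only the "imitate" half of the discussion; the RL half would then fine-tune the cloned policy through trial and error, since the human play it copies isn't necessarily optimal.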