AI Self-Learned To Play Thousands Of Games, Winning Over Human Players

Dhir Acharya - Jun 04, 2019



Though this was previously considered too complex even for algorithms, an AI agent has been developed that is capable of beating human players at online multiplayer games.


In particular, a group of scientists at DeepMind, the Google-owned AI research company, managed to produce AI agents that play Capture the Flag, a variant of Quake III Arena. In this game, two teams face off in randomly generated environments, and each team must find and capture the enemy's flag somewhere on the map.

The team, led by Max Jaderberg, used a reinforcement learning technique with parallel gameplay to build the agents.

After being trained through 450,000 games, the bots could beat professional human players.

A scene from the game in which AI agents rule supreme

There are three machine-learning paradigms: unsupervised, supervised, and reinforcement learning. The last one doesn't rely on labeled input-output pairs, and sub-optimal actions don't need to be explicitly corrected.

Instead, it balances exploring an unknown domain against exploiting the knowledge already gathered about it, which suits conditions that are constantly changing across many agents, like those in an online multiplayer game.
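The exploration-exploitation trade-off described above is often illustrated with an epsilon-greedy policy. The sketch below is a toy example on a three-armed bandit, not DeepMind's method; the arm means, epsilon value, and step count are all illustrative assumptions:

```python
import random

def epsilon_greedy_action(q_values, epsilon=0.1):
    """Pick a random action with probability epsilon (explore),
    otherwise the best-known action (exploit)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Toy setup: three actions with unknown true reward means.
true_means = [0.2, 0.5, 0.8]
q = [0.0, 0.0, 0.0]       # estimated value of each action
counts = [0, 0, 0]        # how often each action was tried
random.seed(0)

for _ in range(5000):
    a = epsilon_greedy_action(q, epsilon=0.1)
    reward = random.gauss(true_means[a], 0.1)  # noisy reward signal
    counts[a] += 1
    q[a] += (reward - q[a]) / counts[a]        # incremental mean update

best_arm = max(range(3), key=lambda a: q[a])   # converges to the best action
```

Even though the agent starts with no knowledge of the reward means, the occasional random exploration lets it discover the best action, which it then exploits most of the time.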

DeepMind aimed to create agents that learned on their own from the same starting information as a human player: no ability to communicate or share notes outside the game, and no built-in policy knowledge. Previous iterations, by contrast, gave the software models direct access to the environment's state or the state of other players.

The company optimized the process by putting agents into many games at the same time, then pooling the results to get an overview of the tricks and tactics each agent had picked up. That knowledge was then passed on to the next generation of agents.

CTF task and computational training framework

Like human players, the agents accumulate experience with strategies they can apply on a new map, even though they don't know its layout or topology, or the positions and intentions of the other players.

The reinforcement learning process operates on two levels: an inner loop optimizes each agent's behavior for in-game rewards, while an outer loop tunes hyperparameters across the whole population. Underperforming agents are replaced with mutated offspring of stronger ones that internalize the lessons learned across the board, a scheme known as population-based training.
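The population-based training idea above can be sketched in a few lines. This is a minimal illustration, not DeepMind's implementation: the fitness function, the single "learning_rate" gene, the population size, and the mutation scale are all invented for the example:

```python
import random

def evaluate(agent):
    """Stand-in fitness score. In the real setup this would be a win
    rate measured over many parallel Capture the Flag matches."""
    # Toy objective: fitness peaks when the hyperparameter is near 0.3.
    return -abs(agent["learning_rate"] - 0.3)

def pbt_step(population, mutate_scale=0.05):
    """One generation: rank agents, keep the top half unchanged, and
    replace the bottom half with mutated copies of the top half."""
    ranked = sorted(population, key=evaluate, reverse=True)
    half = len(ranked) // 2
    survivors = ranked[:half]
    offspring = []
    for parent in survivors:
        child = dict(parent)                                  # inherit from a strong agent
        child["learning_rate"] += random.gauss(0, mutate_scale)  # mutate its hyperparameter
        offspring.append(child)
    return survivors + offspring

random.seed(1)
population = [{"learning_rate": random.uniform(0.0, 1.0)} for _ in range(8)]
for _ in range(30):
    population = pbt_step(population)
best = max(population, key=evaluate)
```

Because the strongest agents are copied and perturbed each generation, good settings spread through the population while weak ones are weeded out, without any central search over hyperparameters.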

The results were remarkable. The agents outperformed humans in the games even when the system slowed their reaction times down to average human levels. After hours of practice, human gamers could beat the AI agents in no more than 25 percent of their attempts. Notably, the agents learned and applied the same tactics that human players commonly use.

However, the key may lie in the parallel, multi-game methodology. In earlier self-learning systems, an AI agent trained against its own policy in a single exercise; in other words, it played against itself.

And what the team is really excited about is not just gaming, but applications beyond it: multi-agent systems that require stable learning.
