Self-play reinforcement learning has enabled AI agents to surpass expert-level human performance in the popular computer game Dota and in board games such as chess and Go. Despite this impressive performance, recent studies have suggested that self-play may not be as robust as previously thought. A question naturally arises: are these self-play agents vulnerable to adversarial attacks?
In the new paper Adversarial Policies Beat Professional-Level Go AIs, a research team from MIT, UC Berkeley, and FAR AI uses a novel adversarial policy to attack KataGo, the state-of-the-art Go AI system. The team believes this is the first successful end-to-end attack against a Go AI system playing at the level of a human professional.
The team summarizes its main contributions as follows:
- We propose a new attack method, hybridizing the adversarial policy attack of Gleave et al. (2020) with AlphaZero-style training (Silver et al., 2018).
- We demonstrate the existence of adversarial policies against the state-of-the-art Go AI system KataGo.
- We find that the adversary pursues a simple strategy that tricks the victim into predicting a win, causing it to pass prematurely.

This work focuses on exploiting professional-level Go AI policies, a setting with a large, discrete action space. The team attacks KataGo, the most powerful publicly available Go AI system, though not at its full strength. Unlike KataGo, which is trained via self-play, the team trained its agent on games played against a frozen victim agent, using only data from the turns where it is the adversary's move. This "victim play" training approach encourages the agent to exploit the victim rather than imitate it.
The team also introduces two distinct families of Adversarial Monte Carlo Tree Search (A-MCTS) – Sample (A-MCTS-S) and Recursive (A-MCTS-R) – to avoid the agent modeling its opponent's moves with its own policy network. Rather than starting from random initialization against the full-strength victim, the team uses a curriculum that trains the agent against successively stronger versions of the victim.
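To make the "victim play" idea concrete, here is a minimal sketch of the data-collection loop it implies. All names and types below (Transition, ReplayBuffer, play_game) are hypothetical illustrations, not the authors' actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple

@dataclass
class Transition:
    state: object    # a board position
    move: object     # the move played from that position
    outcome: float   # final game result from the adversary's perspective

@dataclass
class ReplayBuffer:
    data: List[Transition] = field(default_factory=list)

    def add(self, t: Transition) -> None:
        self.data.append(t)

def collect_victim_play_data(
    play_game: Callable[[], Iterable[Tuple[object, object, float, bool]]],
    buffer: ReplayBuffer,
    num_games: int,
) -> None:
    """Gather training data from games against a frozen victim.

    `play_game` plays one full game against the fixed victim and yields
    (state, move, outcome, adversary_to_move) tuples. Unlike self-play,
    only positions where the adversary is to move are kept, so the agent
    learns to exploit the victim rather than to imitate it.
    """
    for _ in range(num_games):
        for state, move, outcome, adversary_to_move in play_game():
            if adversary_to_move:
                buffer.add(Transition(state, move, outcome))
```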
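The difference between the two A-MCTS variants can be sketched as a choice of how to model the player to move at each search node. The outline below is only an illustration under the assumptions stated in its comments (the `victim_to_move` flag and the three callables are invented names); the authors' real implementation is in their GitHub repository.

```python
from typing import Callable, Optional

def evaluate_node(
    state,
    adversary_net: Callable,
    victim_net: Callable,
    victim_search: Optional[Callable] = None,
):
    """Pick how to model the player to move at a search node.

    A-MCTS-S: at victim-to-move nodes, sample the victim's raw policy
    network (no search), matching how a no-search victim actually plays.
    A-MCTS-R: pass `victim_search` to instead run the victim's full search
    at those nodes, which is more faithful but far more expensive.
    At adversary-to-move nodes, the adversary's own network is used, as in
    standard AlphaZero-style MCTS.
    """
    if state.victim_to_move:
        if victim_search is not None:   # A-MCTS-R variant
            return victim_search(state)
        return victim_net(state)        # A-MCTS-S variant
    return adversary_net(state)
```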


In their empirical studies, the team used its adversarial policy to attack KataGo both without search (roughly the level of a top-100 European player) and with 64 visits per move ("near-superhuman level"). The proposed policy achieved a win rate of over 99% against KataGo without search and over 50% against KataGo at 64 visits.
Although this work suggests that learning via self-play is not as robust as expected and that adversarial policies can be used to beat top Go AI systems, the results have been questioned by researchers in the machine learning and Go communities. Reddit discussions involving the paper's authors and KataGo developers have focused on the particulars of the Tromp-Taylor scoring rules used in the experiments – while the proposed agent secures its wins by "tricking KataGo into ending the game prematurely," it is argued that this tactic would lead to devastating losses under more common Go rules.
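For readers unfamiliar with the rules at issue: under Tromp-Taylor scoring, two consecutive passes end the game and the board is scored exactly as it stands, with no agreement phase for removing dead stones. The self-contained sketch below computes a Tromp-Taylor area score for a square board; it is included only to illustrate why an early pass can flip the result and is not taken from the paper.

```python
from collections import deque

def tromp_taylor_score(board):
    """Tromp-Taylor area scoring for a square grid of 'B', 'W', or None.

    Each player scores one point per stone on the board plus one point per
    empty point that reaches only stones of their colour. Dead stones are
    not removed, which is why ending the game prematurely can change who
    wins relative to scoring under more common rule sets.
    """
    n = len(board)
    score = {'B': 0, 'W': 0}
    seen = set()
    for r in range(n):
        for c in range(n):
            stone = board[r][c]
            if stone is not None:
                score[stone] += 1
            elif (r, c) not in seen:
                # Flood-fill this empty region, recording bordering colours.
                region, borders = [], set()
                queue = deque([(r, c)])
                seen.add((r, c))
                while queue:
                    y, x = queue.popleft()
                    region.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < n and 0 <= nx < n:
                            neighbour = board[ny][nx]
                            if neighbour is None:
                                if (ny, nx) not in seen:
                                    seen.add((ny, nx))
                                    queue.append((ny, nx))
                            else:
                                borders.add(neighbour)
                if len(borders) == 1:  # empty region touching one colour only
                    score[borders.pop()] += len(region)
    return score
```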
The open-source implementation is on GitHub, and sample games are available on the project's webpage. The paper Adversarial Policies Beat Professional-Level Go AIs is on arXiv.
Author: Hecate He | Editor: Michael Sarazen

We know you don’t want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.