
In the world of deep-learning AI, the ancient board game Go looms large. Until 2016, the best human Go players could still defeat the strongest AIs. That changed with DeepMind’s AlphaGo, which used deep-learning neural networks to master the game at a level humans can’t match. More recently, KataGo has become popular as an open source Go-playing AI that can beat top human players.
Last week, a group of AI researchers published a paper describing a method to defeat KataGo using adversarial techniques that take advantage of KataGo’s blind spots. By playing unexpected moves outside of KataGo’s training set, a much weaker opposing program (one that amateur humans can defeat) can trick KataGo into losing.
To understand this achievement and its implications, we spoke to one of the paper’s co-authors, Adam Gleave, a Ph.D. candidate at UC Berkeley. Gleave (with co-authors Tony Wang, Nora Belrose, Tom Tseng, Joseph Miller, Michael D. Dennis, Yawen Duan, Viktor Pogrebniak, Sergey Levine, and Stuart Russell) has developed what AI researchers call an “adversarial policy.” In this case, the adversarial policy uses a mixture of a neural network and a tree-search method (called Monte Carlo Tree Search) to find Go moves.
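For readers curious how such a search works, here is a minimal sketch of Monte Carlo Tree Search. To keep it self-contained it plays a toy take-away game rather than Go, and a random placeholder stands in for the neural network’s evaluation; every name and parameter below is illustrative, not taken from the authors’ system.

```python
import math
import random

# Toy stand-in for Go: players alternately take 1-3 stones from a
# pile, and whoever takes the last stone wins. The search skeleton is
# the same one AlphaGo-style engines use, with a stub in place of a
# trained network's position evaluation.

def legal_moves(pile):
    return [n for n in (1, 2, 3) if n <= pile]

class Node:
    def __init__(self, pile):
        self.pile = pile
        self.children = {}   # move -> Node
        self.visits = 0
        self.value = 0.0     # total value, from the mover's perspective

def evaluate(node):
    """Stub for a neural network's value head: an estimate in [-1, 1]
    of how good the position is for the player to move."""
    return random.uniform(-0.1, 0.1)  # placeholder; a real engine runs a net

def select_child(node, c=1.4):
    """UCB1 selection: trade off the child's mean value (negated, since
    a child's value is from the opponent's perspective) and exploration."""
    def ucb(child):
        if child.visits == 0:
            return float("inf")
        return (-child.value / child.visits
                + c * math.sqrt(math.log(node.visits) / child.visits))
    return max(node.children.values(), key=ucb)

def simulate(node):
    """One iteration: descend to a leaf, expand, evaluate, backpropagate.
    Returns the position's value for the player to move at `node`."""
    if node.pile == 0:
        value = -1.0  # previous player took the last stone: mover lost
    elif not node.children:
        node.children = {m: Node(node.pile - m) for m in legal_moves(node.pile)}
        value = evaluate(node)
    else:
        value = -simulate(select_child(node))  # sign flips between players
    node.visits += 1
    node.value += value
    return value

def best_move(pile, iterations=3000):
    root = Node(pile)
    for _ in range(iterations):
        simulate(root)
    # Play the most-visited move, as AlphaGo-style engines do.
    return max(root.children, key=lambda m: root.children[m].visits)

print(best_move(10))  # should settle on taking 2, leaving a multiple of 4
```

A key property of this design, shared by AlphaGo-style engines, is that search effort concentrates on moves the evaluations favor, so positions the network judges poorly or has rarely seen get little scrutiny. That is part of why off-distribution play can slip past such systems.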
KataGo’s world-class AI learned Go by playing millions of games against itself. But even that isn’t enough experience to cover every possible scenario, leaving room for vulnerabilities to unexpected behavior. “KataGo generalizes well to many novel strategies, but it gets weaker the further it moves away from the games it saw during training,” says Gleave. “Our adversary has discovered one such ‘off-distribution’ strategy that KataGo is particularly vulnerable to, but there are likely many others.”
Gleave explains that, in a Go match, the adversarial policy works by first staking a claim to a small corner of the board. He provided a link to an example in which the adversary, controlling the black stones, plays mostly in the top right of the board. The adversary allows KataGo (playing white) to lay claim to the rest of the board, while the adversary plays a few easy-to-capture stones in that territory.

“This tricks KataGo into thinking it has already won,” says Gleave, “since its territory (bottom left) is much larger than the adversary’s. But the bottom-left territory doesn’t actually contribute to its score (only the white stones it has played do) because of the presence of black stones there, which means it is not fully secured.”
Overconfident of a win (assuming it will win if the game ends and the points are counted), KataGo plays a pass move, and the adversary intentionally passes as well, ending the game. (Two consecutive passes end a game of Go.) Then the point tally begins. As the paper explains, “The adversary gets points for its corner of territory (devoid of victim stones) while the victim [KataGo] does not receive points for its unsecured territory because of the presence of the adversary’s stones.”
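To make that scoring rule concrete, here is a toy area scorer in the spirit of the Tromp-Taylor-style rules used for the paper’s games: each player scores a point per stone on the board, plus a point for each empty point in a region that borders only that player’s stones. The board position and function below are our own illustration (komi is ignored), not code or a game from the paper.

```python
from collections import deque

def score(board):
    """Return (black_points, white_points) for a finished position:
    stones on the board plus single-color-bordered empty regions."""
    n = len(board)
    pts = {"B": 0, "W": 0}
    seen = set()
    for r in range(n):
        for c in range(n):
            if board[r][c] in pts:
                pts[board[r][c]] += 1        # every stone on the board counts
            elif (r, c) not in seen:
                # Flood-fill this empty region, noting which colors border it.
                region, borders = 0, set()
                queue = deque([(r, c)])
                seen.add((r, c))
                while queue:
                    y, x = queue.popleft()
                    region += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < n and 0 <= nx < n:
                            if board[ny][nx] == ".":
                                if (ny, nx) not in seen:
                                    seen.add((ny, nx))
                                    queue.append((ny, nx))
                            else:
                                borders.add(board[ny][nx])
                if len(borders) == 1:        # territory for exactly one color
                    pts[borders.pop()] += region
    return pts["B"], pts["W"]

# Black (the adversary) secures only the small left edge. White's wall
# encloses the much larger right side, but the single uncaptured black
# stone in row 3 keeps that entire region from counting for White.
board = [list(row) for row in (
    ".BW..",
    ".BW..",
    ".BW..",
    ".BW.B",
    ".BW..",
)]
print(score(board))  # (11, 5): Black wins despite "controlling" less space
```

If play continued, White could easily capture the stray black stone. But because both players passed, the game is over and the stone stays on the board, which is exactly the situation the adversary engineers.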
Despite this clever trickery, the adversarial policy alone isn’t actually very good at Go; in fact, human amateurs can defeat it relatively easily. Instead, the adversary’s sole purpose is to attack an unanticipated vulnerability in KataGo. A similar scenario could play out in almost any deep-learning AI system, which gives this work much broader implications.
“The research shows that AI systems that appear to perform at a human level often do so in very alien ways, and can therefore fail in ways that are surprising to humans,” says Gleave. “This result is entertaining in Go, but similar failures in safety-critical systems could be dangerous.”
Imagine, for example, a self-driving car AI that encounters an extremely unlikely scenario it wasn’t trained on, allowing a human to trick it into dangerous behavior. “[This research] underscores the need for better automated testing of AI systems to find worst-case failure modes,” says Gleave, “not just test performance in the average case.”
Half a decade after AI finally triumphed over the best human Go players, the ancient game continues to play an influential role in machine learning. Insight into the weaknesses of Go-playing AIs, when applied broadly, may even end up saving lives.