Three days ago, the Google DeepMind team quietly uploaded a paper to arXiv, the scientific pre-print server hosted by Cornell University (you cannot find it on the DeepMind website). Nonetheless, it was rapidly picked up by mainstream media – the BBC among others – and by chess players all around the globe.
Given only the basic rules of the games, and by playing solely against itself, DeepMind’s AlphaZero agent was able to master chess and shogi (Japanese chess) to the extent that it could outperform the 2016 world computer chess champion Stockfish and the 2017 computer shogi champion Elmo, after just 4 and 2 hours of training respectively. In some ways, this result is not unexpected, and the only question was whether DeepMind would bother to do this after an earlier version, AlphaGo Zero, successfully defeated AlphaGo (its precursor, which had dominated the top human Go players) in 40 days. But as the authors indicated in their paper, chess and shogi have more rules than Go (pieces move differently, there is stalemate, pawns can move 1 or 2 squares at the start, etc.), and games can be drawn. Also, one of the DeepMind founders and the last author on the arXiv paper is himself a very strong chess player – in fact, FM Ong Chong Ghee and I played in the same chess tournament as Demis Hassabis (Oakham Junior 1992), but alas, our paths never crossed.
In “tournament mode”, AlphaZero defeated Stockfish with 28 wins (25 of them with the white pieces, with the first-move advantage), 72 draws and no losses. The margin was even greater in shogi, with 90 wins, 8 losses and 2 draws. A number of objections have been raised on forums as to whether this was a “fair contest” (the Stockfish program apparently had no opening book or endgame tablebases, the computing system used could perhaps have been better tuned for performance, etc.), but I personally think these comments do not detract from the fact that this was a completely amazing performance.
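As an aside, the headline numbers translate into conventional match scores under the usual convention of one point per win and half a point per draw. A quick sketch (the `match_score` helper is purely illustrative; the win/draw/loss figures are those reported in the paper):

```python
# Convert win/draw/loss tallies into a conventional match score:
# 1 point per win, 0.5 per draw, 0 per loss.
def match_score(wins, draws, losses):
    games = wins + draws + losses
    points = wins + 0.5 * draws
    return points, games

# AlphaZero's reported results against Stockfish (chess) and Elmo (shogi).
chess_points, chess_games = match_score(wins=28, draws=72, losses=0)
shogi_points, shogi_games = match_score(wins=90, draws=2, losses=8)

print(f"Chess: {chess_points}/{chess_games}")  # Chess: 64.0/100
print(f"Shogi: {shogi_points}/{shogi_games}")  # Shogi: 91.0/100
```

A 64–36 score over 100 games is an overwhelming margin at this level of computer chess, where draws normally dominate.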
The games available for public viewing – only 10 of AlphaZero’s chess wins were released – were astonishing. Overall, the impression is that the AlphaZero agent – with just under 24 hours of “self-training”, no opening book, no endgame tablebases and no other insights – evaluated the positions far better than Stockfish (and therefore any human). There were moments of real genius (if such a human term can be used) in several of the games:
In the above position, AlphaZero had played rook to e1, leaving the knight on h6 en prise. The Stockfish program on my phone instantly downgraded the assessment, giving Black a superior position (-0.8). It failed (and I am not sure many grandmasters in the world would do differently) to understand that this position was actually very difficult for Black to defend, even after the next 14 moves were played. And then, suddenly, Stockfish realised that the Black position could not be held.
This was the other amazing example, also picked up by many other chess commentators. It is White to play; White has a space advantage, but Black appears to be guarding all the possible invasion squares successfully. In fact, the Stockfish program on my laptop rates this only as a slight advantage for White (+0.5). It does not find the fabulous sequence that was played: 30.Bxg6! Bxg5 31.Qxg5 fxg6 32.f5! Rg8 33.Qh6 Qf7 34.f6, and Black is completely tied up. It is only after a further 4 moves, when the position below is reached, that Stockfish (on the phone – far weaker than the actual Stockfish program that played AlphaZero) realised that Black was in fact losing, even with the extra piece, for which White has only one pawn and space in exchange.
Where does that leave us humans? While we wait for DeepMind to reveal its agent’s results against humans in StarCraft, there may be much we can learn about chess beyond what current chess engines provide once all the training and tournament games are released. The paper already provides a tantalising hint: certain openings that are still very popular in human chess tournaments were rejected by AlphaZero as it progressively improved. As one can see from the screenshot below, it rejected the popular King’s Indian Defense right from the beginning. It also stopped playing the French Defense after a while, having recorded an incredible 39 wins and 11 draws against Stockfish when playing training games on the White side of that opening.
Addendum: Chess historian and player Olimpiu Urcan pointed out that one of the authors of the paper – Dharshan Kumaran – is a chess grandmaster. He, too, played in the Oakham Junior chess tournament in 1992.