Making a chess engine - Part 2

Links

Slides: Here
Code: Here (all the code is in the attempt_02 directory).
Model on HF: Here

Quick details on model and training

Input

The dataset consists of 4M training positions and 400K test positions.
90% of the positions come from real Lichess games, chosen so that different eval ranges are equally represented. The remaining 10% come from the Lichess puzzle database.
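The post doesn't say exactly how positions were balanced across eval ranges. One simple way is stratified sampling: bucket positions by eval and draw the same number from each bucket. The sketch below assumes positions arrive as hypothetical `(fen, eval_cp)` pairs with evals in centipawns, and that the bucket edges are chosen by hand.

```python
import bisect
import random
from collections import defaultdict


def balance_by_eval(positions, bucket_edges, per_bucket, seed=0):
    """Draw up to `per_bucket` positions from each eval range.

    positions    -- list of (fen, eval_cp) pairs (hypothetical format)
    bucket_edges -- sorted eval boundaries in centipawns, e.g. [-300, 0, 300]
    """
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for fen, ev in positions:
        # bisect_right maps an eval to a bucket index in 0..len(bucket_edges)
        buckets[bisect.bisect_right(bucket_edges, ev)].append((fen, ev))

    sample = []
    for idx in sorted(buckets):
        items = buckets[idx]
        # Sparse buckets contribute everything they have
        sample.extend(rng.sample(items, min(per_bucket, len(items))))
    rng.shuffle(sample)
    return sample


# Usage: a pool heavily skewed toward near-equal positions
positions = [(f"fen{i}", ev) for i, ev in enumerate([-500] * 10 + [50] * 100 + [400] * 3)]
balanced = balance_by_eval(positions, [-300, 0, 300], per_bucket=5)
print(len(balanced))  # 13: 5 + 5 + all 3 from the sparse bucket
```

A real pipeline would pick the edges (and possibly cap rather than equalize sparse buckets) based on the actual eval distribution of the Lichess data.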
Each position comes with:

the FEN representation of the board
the eval of the position, from Stockfish
the top 5 moves and their scores (how much each move changes the eval), from Stockfish

Board representation

18-plane board encoding: 12 planes for pieces, 4 for castling rights, 1 for the side to move and 1 for the en passant pawn.
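A minimal plain-Python sketch of the 18-plane encoding. The plane order (white P, N, B, R, Q, K, then the black pieces, then castling K, Q, k, q, then side to move, then en passant) is an assumption, as is marking the pawn that just made a two-square move on the en passant plane. The author's actual conventions (for example, whether the board is flipped for Black) are in the repository and may differ.

```python
PIECES = "PNBRQKpnbrqk"  # planes 0-5: white pieces, 6-11: black pieces


def full_plane(value):
    return [[value] * 8 for _ in range(8)]


def fen_to_planes(fen):
    """Encode a FEN string as 18 planes of 8x8, indexed planes[p][rank][file]."""
    board, side, castling, ep = fen.split()[:4]
    planes = [full_plane(0) for _ in range(18)]

    # Piece placement: FEN lists rank 8 first, so row i is rank index 7 - i
    for i, row in enumerate(board.split("/")):
        rank, file = 7 - i, 0
        for ch in row:
            if ch.isdigit():
                file += int(ch)  # run of empty squares
            else:
                planes[PIECES.index(ch)][rank][file] = 1
                file += 1

    # Planes 12-15: one constant plane per castling right
    for j, right in enumerate("KQkq"):
        if right in castling:
            planes[12 + j] = full_plane(1)

    # Plane 16: all ones when White is to move
    if side == "w":
        planes[16] = full_plane(1)

    # Plane 17: the pawn that can be captured en passant.
    # FEN gives the square the pawn skipped over (rank 3 or 6);
    # the pawn itself stands one rank further along (rank 4 or 5).
    if ep != "-":
        file = ord(ep[0]) - ord("a")
        pawn_rank = 3 if ep[1] == "3" else 4
        planes[17][pawn_rank][file] = 1

    return planes


planes = fen_to_planes("rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1")
print(planes[17][3][4])  # 1: the white pawn on e4
```

In a training pipeline these nested lists would typically be converted to an 18 x 8 x 8 float tensor before being fed to the first convolution.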

Model

The model has around 10M parameters in total (conv and ResNet layers).

The backbone consists of a convolution layer followed by 8 ResNet blocks.

The value head consists of 2 fully connected layers.

The policy head consists of a conv layer producing 64 x 73 logits, followed by a softmax.
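A back-of-the-envelope check of the ~10M figure. The post doesn't give layer widths, so everything here is an assumption for illustration: 256 channels, 3x3 kernels, batch norm after each backbone conv (with conv biases dropped), a single 3x3 policy conv, and a value head whose two fully connected layers take the flattened 8x8 feature maps through a hypothetical 64-unit hidden layer.

```python
def conv_params(c_in, c_out, k=3, bias=False):
    """Weights of a k x k convolution, plus biases if enabled."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def bn_params(c):
    """Learnable scale (gamma) and shift (beta) of a batch-norm layer."""
    return 2 * c


def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def count_params(channels=256, blocks=8, planes_in=18, moves_per_square=73, value_hidden=64):
    # Stem: one conv from the 18 input planes to `channels` feature maps
    stem = conv_params(planes_in, channels) + bn_params(channels)
    # Residual block: conv-BN-ReLU-conv-BN; the skip connection has no parameters
    tower = blocks * 2 * (conv_params(channels, channels) + bn_params(channels))
    # Policy head: one conv giving 73 logits per square, i.e. 64 x 73 in total
    policy = conv_params(channels, moves_per_square, bias=True)
    # Value head: two fully connected layers on the flattened 8x8 feature maps
    value = linear_params(channels * 64, value_hidden) + linear_params(value_hidden, 1)
    return {"stem": stem, "tower": tower, "policy": policy, "value": value,
            "total": stem + tower + policy + value}


for name, n in count_params().items():
    print(f"{name:>6}: {n:,}")
```

With these defaults the residual tower accounts for 9,445,376 of 10,704,330 parameters, so the channel width of the ResNet blocks is the main lever on model size.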

Moves are represented by a 64 x 73 probability tensor (more details in the slides).
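The exact 64 x 73 layout is left to the slides. The sketch below follows the AlphaZero-style scheme this shape usually denotes: for each from-square, 56 planes for queen-like moves (8 directions x 7 distances), 8 for knight moves and 9 for underpromotions (knight, bishop or rook, times three directions), with queen promotions reusing the queen-move planes. The plane order, the square numbering (a1 = 0 ... h8 = 63) and the absence of board flipping for Black are assumptions.

```python
# (dfile, drank) unit steps for the 8 queen directions: N, NE, E, SE, S, SW, W, NW
QUEEN_DIRS = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]
KNIGHT_JUMPS = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]
UNDERPROMOTIONS = ("n", "b", "r")


def square(name):
    """'e2' -> 12, with a1 = 0 and h8 = 63."""
    return (int(name[1]) - 1) * 8 + ord(name[0]) - ord("a")


def move_to_index(from_sq, to_sq, promotion=None):
    """Flat index into the 64 x 73 policy output: from_sq * 73 + plane."""
    df = to_sq % 8 - from_sq % 8
    dr = to_sq // 8 - from_sq // 8
    if promotion in UNDERPROMOTIONS:
        # Planes 64-72: piece x (capture left, straight, capture right)
        if df not in (-1, 0, 1):
            raise ValueError("bad promotion move")
        plane = 64 + UNDERPROMOTIONS.index(promotion) * 3 + (df + 1)
    elif (df, dr) in KNIGHT_JUMPS:
        plane = 56 + KNIGHT_JUMPS.index((df, dr))  # planes 56-63
    else:
        dist = max(abs(df), abs(dr))
        if dist == 0 or (df and dr and abs(df) != abs(dr)):
            raise ValueError("not a queen-like move")
        # Planes 0-55: direction x distance (1-7); queen promotions land here too
        plane = QUEEN_DIRS.index((df // dist, dr // dist)) * 7 + (dist - 1)
    return from_sq * 73 + plane


print(move_to_index(square("e2"), square("e4")))  # 877
print(move_to_index(square("g1"), square("f3")))  # 501
```

Many of the 4672 slots can never be legal (for example, moving north from the eighth rank), which is exactly why the policy loss only looks at the logits of legal moves.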

Loss function

Huber loss with beta = 0.2 is used for the value head.

Cross-entropy loss is used for the policy head. Only the logits corresponding to legal moves participate in the cross-entropy.

The policy loss has a weight of 0.1.
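A plain-Python sketch of the combined loss for a single position. It assumes that "beta" refers to the smooth-L1 form used by PyTorch's `SmoothL1Loss` (quadratic below beta, linear above), that the total loss is the value loss plus 0.1 times the policy loss, and that a target distribution over legal moves has already been built from the Stockfish top-5 data (the post doesn't say how). Masking is done by taking the log-softmax over legal indices only, which is equivalent to setting illegal logits to minus infinity.

```python
import math


def smooth_l1(pred, target, beta=0.2):
    """Huber-style loss in the smooth-L1 form (PyTorch SmoothL1Loss convention)."""
    diff = abs(pred - target)
    if diff < beta:
        return 0.5 * diff * diff / beta  # quadratic near the target
    return diff - 0.5 * beta  # linear for large errors, so outliers pull less


def masked_cross_entropy(logits, target, legal):
    """Cross-entropy where only legal-move logits enter the softmax.

    logits -- flat list of 64 * 73 scores
    target -- dict {move_index: probability}, covering legal moves only
    legal  -- indices of the legal moves in this position
    """
    m = max(logits[i] for i in legal)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(logits[i] - m) for i in legal))
    return -sum(p * (logits[i] - log_z) for i, p in target.items())


def total_loss(v_pred, v_target, logits, target, legal, policy_weight=0.1):
    """Value loss plus weighted policy loss (the 0.1 weight is from the post)."""
    return smooth_l1(v_pred, v_target) + policy_weight * masked_cross_entropy(logits, target, legal)


# Four legal moves with equal logits; an illegal move has a huge logit but is ignored
logits = [0.0] * (64 * 73)
logits[100] = 50.0
legal = [1, 2, 3, 4]
print(masked_cross_entropy(logits, {1: 1.0}, legal))  # log(4) ≈ 1.386
```

In a batched PyTorch version, the same masking is usually done by filling illegal logits with a large negative value before calling the cross-entropy.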

Performance

With the value head, the search could reach depth 8 within reasonable move times (compared with depth 3 without a value head). The engine is competitive with Stockfish at skill level 6, which we estimate corresponds to an Elo rating of roughly 2300, about FIDE Master level. Wow!
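The post doesn't name the search algorithm. A common choice for fixed-depth search with a learned evaluator is negamax with alpha-beta pruning, where the value head scores the leaf positions and ordering moves by policy-head probability tends to trigger cutoffs earlier. The sketch below is generic over hypothetical `children` and `evaluate` callbacks; real chess code would also need mate/stalemate scoring and a time limit.

```python
import math


def alphabeta(state, depth, alpha, beta, children, evaluate):
    """Negamax search with alpha-beta pruning.

    children(state) -> successor states, ideally best-first (e.g. by policy prob)
    evaluate(state) -> score from the point of view of the side to move
    """
    succ = children(state)
    if depth == 0 or not succ:
        return evaluate(state)
    best = -math.inf
    for child in succ:
        # The child's score is from the opponent's view, so negate it
        score = -alphabeta(child, depth - 1, -beta, -alpha, children, evaluate)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # the opponent already has a better option elsewhere: prune
    return best


# Toy tree: lists are internal nodes, numbers are leaf evaluations
calls = []


def children(s):
    return s if isinstance(s, list) else []


def evaluate(s):
    calls.append(s)
    return s


tree = [[3, 5], [2, 9]]
print(alphabeta(tree, 2, -math.inf, math.inf, children, evaluate))  # 3
print(calls)  # [3, 5, 2]: leaf 9 is pruned
```

Because each extra ply multiplies the tree size, the number of positions that must be evaluated (and hence the cost of each network call) largely decides how deep the search can go in a given move time.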

Sample games

Game 1: NN vs Stockfish (skill level 4) 1-0 (NN wins)
Game 2: NN vs Stockfish (skill level 16) 0-1 (Stockfish wins)