18-plane board encoding: 12 planes for pieces, 4 for castling rights, 1 for side to move, and 1 for the en passant pawn. The model has ~10M weights in total (convolutional and ResNet layers) and is trained with Huber loss. Only the value head is trained. Games are played with a full 3-ply search.
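As a rough illustration of the encoding, here is a minimal sketch that builds 18 planes from a FEN string using only the standard library. The plane ordering, the choice to fill a whole plane with ones for the castling and side-to-move flags, and marking the en passant target square rather than the pawn itself are all assumptions, and `encode_fen` and `PIECES` are hypothetical names, not the ones used in the actual model code.

```python
# Assumed plane order: white P, N, B, R, Q, K, then black p, n, b, r, q, k.
PIECES = "PNBRQKpnbrqk"


def encode_fen(fen):
    """Return an 18 x 8 x 8 nested list of 0/1 values for a FEN string."""
    placement, side, castling, ep = fen.split()[:4]
    planes = [[[0] * 8 for _ in range(8)] for _ in range(18)]

    # Planes 0-11: one plane per piece type and colour.
    for row, rank in enumerate(placement.split("/")):  # row 0 is rank 8
        col = 0
        for ch in rank:
            if ch.isdigit():
                col += int(ch)  # a digit means that many empty squares
            else:
                planes[PIECES.index(ch)][row][col] = 1
                col += 1

    # Planes 12-15: castling rights (assumed: whole plane set to 1).
    for i, flag in enumerate("KQkq"):
        if flag in castling:
            planes[12 + i] = [[1] * 8 for _ in range(8)]

    # Plane 16: side to move (assumed: all ones when White is to move).
    if side == "w":
        planes[16] = [[1] * 8 for _ in range(8)]

    # Plane 17: en passant (assumed: the FEN target square is marked).
    if ep != "-":
        col = "abcdefgh".index(ep[0])
        row = 8 - int(ep[1])
        planes[17][row][col] = 1

    return planes
```

For example, `encode_fen("rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1")` sets a single square in plane 17 (e3) and leaves plane 16 all zeros because Black is to move.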
Trained on 3.6M Stockfish evaluations (from all types of games, weak and strong), with 400K evaluations in each of the ranges $(-\infty, -10), [-10, -5], [-5, -3], [-3, -1], [-1, +1], [+1, +3], [+3, +5], [+5, +10], [+10, +\infty)$. The validation set has 360K evaluations, split equally across the same nine ranges. The final "within 1" accuracy (the fraction of validation points where the predicted value is less than 1 away from the ground truth) is ~55%.
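The range balancing and the "within 1" metric could be sketched roughly as follows. This is an illustration only: the helper names (`bucket`, `balanced_sample`, `within_one_accuracy`) are hypothetical, and assigning an evaluation that falls exactly on a boundary to the higher range is an assumption, since the ranges as listed share their endpoints.

```python
import bisect
import random

# Interior boundaries of the nine evaluation ranges.
EDGES = [-10, -5, -3, -1, 1, 3, 5, 10]


def bucket(ev):
    """Index 0..8 of the range containing ev.

    Assumption: a value exactly on a boundary goes to the higher range.
    """
    return bisect.bisect_right(EDGES, ev)


def balanced_sample(evals, per_bucket, seed=0):
    """Draw up to per_bucket evaluations from each of the nine ranges."""
    groups = [[] for _ in range(len(EDGES) + 1)]
    for ev in evals:
        groups[bucket(ev)].append(ev)
    rng = random.Random(seed)
    sample = []
    for group in groups:
        sample.extend(rng.sample(group, min(per_bucket, len(group))))
    return sample


def within_one_accuracy(pred, truth, tol=1.0):
    """Fraction of points whose prediction is less than tol from the truth."""
    hits = sum(abs(p - t) < tol for p, t in zip(pred, truth))
    return hits / len(truth)
```

With `per_bucket=400_000` for training and `per_bucket=40_000` for validation, this reproduces the 3.6M and 360K totals, provided every range has enough positions to draw from.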
Wins against Stockfish skill levels 0 and 1 (5-0 and 3-1 respectively) and loses against level 2. Implied Elo rating: 1450.