Monte Carlo Tree Search

Instead of casting out the tree, we try full assignment of variables and iterate it
- like Iterative Deepening
Combines 2 ideas:
- Evaluation by rollouts:
  - play multiple games to termination from a state s (using simple, fast or random policy) and count nb of wins/ nb of games explored
  - as law of large number, the fraction of wins correlates the true value of the position
- Selective search: explore parts of the tree that help improve the decision at the root, regardless of depth
  - Should allocate rollouts to more promising (more win rate) and uncertain nodes (less rollouts,ie less nodes explored)
  - so we defined new heuristic, Upper Confidence Bound (UCB1), that combines promising and uncertain idea
    - $U CB 1 (n) = \frac{U ( n )}{N ( n )} + C \times \frac{lo g ( N ( parent ( n )))}{N ( n )}$ where:
      - $N (n)$ is the number of rollouts of node $n$
      - $U (n)$ is the total utility (wins) for Player at parent of node $n$

Using the ideas above
Repeat until out of time:
- Selection: recursively apply UCB to choose a path down to a leaf node $n$
  - child → grandchild → … → leaf node
- Expansion: add a new child $c$ to $n$
- Simulation: run a rollout from $c$
- Backpropagation: update $U$ and $N$ function counts from $c$ back up to the root
Choose the action leading to the child with highest $N$
The value of a node, $U (n) / N (n)$ , is a weighted sum of child values
As $n \to \infty$ , the vast majority of rollouts are concentrated in the best child(ren), so the weighted average tends to max/min
No optimality is guaranteed

StrixTheKiet Notes