10 October 2022

Damien Busatto-Gaston (LACL)

In this talk, we consider the online computation of a strategy that aims at optimizing the expected average reward in a Markov decision process. The strategy is computed with a receding horizon, using Monte Carlo tree search (MCTS), a method known to scale to large state spaces. We will present the theoretical guarantees of this approach: the formal analysis of MCTS is notably challenging, and we will discuss recent results in that area. Moreover, we augment the MCTS algorithm with a notion of symbolic advice and show that its theoretical guarantees are maintained. Symbolic advice is used to bias the selection and simulation strategies of MCTS. We illustrate our techniques on a scheduling problem and the popular game Pac-Man.
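
For illustration, the minimal sketch below shows one way symbolic advice could bias the selection and simulation phases of an MCTS loop under a receding horizon. The toy MDP, the advice predicate, and all names in it are assumptions made for this example only; they are not the implementation presented in the talk.

# Minimal illustrative sketch (assumed toy MDP and advice, not the talk's implementation).
import math
import random

ACTIONS = [-1, +1]

def step(state, action):
    # Hypothetical transition: noisy move on 0..10, reward equal to the new state.
    next_state = max(0, min(10, state + action + random.choice([-1, 0, 1])))
    return next_state, next_state

def advice(state, action):
    # Hypothetical symbolic advice: never move below state 2.
    return state + action >= 2

class Node:
    def __init__(self, state):
        self.state = state
        self.children = {}      # action -> Node
        self.visits = 0
        self.total_reward = 0.0

def select_action(node, c=1.4):
    # UCT selection restricted to advice-compliant actions (selection bias).
    allowed = [a for a in ACTIONS if advice(node.state, a)] or ACTIONS
    def uct(a):
        child = node.children.get(a)
        if child is None or child.visits == 0:
            return float("inf")
        return (child.total_reward / child.visits
                + c * math.sqrt(math.log(node.visits + 1) / child.visits))
    return max(allowed, key=uct)

def rollout(state, horizon=10):
    # Random simulation restricted to advice-compliant actions (simulation bias).
    total = 0.0
    for _ in range(horizon):
        allowed = [a for a in ACTIONS if advice(state, a)] or ACTIONS
        state, reward = step(state, random.choice(allowed))
        total += reward
    return total / horizon  # average reward over the simulated horizon

def mcts(root_state, iterations=500, horizon=10):
    root = Node(root_state)
    for _ in range(iterations):
        node, path = root, [root]
        # Selection / expansion, kept one level deep for brevity.
        action = select_action(node)
        child = node.children.setdefault(action, Node(step(node.state, action)[0]))
        path.append(child)
        # Advice-biased simulation, then backpropagation.
        value = rollout(child.state, horizon)
        for n in path:
            n.visits += 1
            n.total_reward += value
    # Receding horizon: only the first action is returned and applied.
    return max(root.children,
               key=lambda a: root.children[a].total_reward / root.children[a].visits)

if __name__ == "__main__":
    print("recommended action from state 5:", mcts(5))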