Overview
Exploration and Exploitation
Markov Process
Markov Property
Markov Chain
Markov Decision Processes (MDPs)
Monte-Carlo and Temporal-Difference Learning
Monte-Carlo Evaluation
TD Learning
Q-Learning