Description: The agent has a fixed policy that determines its behavior. The transition function and the reward function are unknown, so actions that maximize reward have to be found from experience rather than computed directly.
Goal: learn state values
Approaches: Model-based Reinforcement Learning, Model-free Reinforcement Learning
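The two approaches can be sketched on a toy problem. This is only an illustration under stated assumptions, not a definitive implementation: the states A, B, C, the recorded episodes, the discount GAMMA and the learning rate alpha are all hypothetical. The model-based version first estimates the transition and reward functions from counts and then evaluates the fixed policy on that estimated model; the model-free version (temporal-difference learning here, as one possible choice) updates state values directly from each observed transition.

```python
from collections import defaultdict

GAMMA = 0.9  # discount factor (assumed value, not from the notes)

# Hypothetical experience gathered while following the fixed policy.
# Each episode is a list of (state, reward, next_state) transitions.
EPISODES = [
    [("A", 0.0, "B"), ("B", 0.0, "C"), ("C", 10.0, "end")],
    [("A", 0.0, "B"), ("B", 0.0, "C"), ("C", 10.0, "end")],
    [("A", 0.0, "C"), ("C", 10.0, "end")],
]


def model_based_values(episodes, gamma=GAMMA, sweeps=100):
    """Estimate T and R from counts, then evaluate the fixed policy."""
    counts = defaultdict(lambda: defaultdict(int))
    reward_sums = defaultdict(lambda: defaultdict(float))
    for episode in episodes:
        for s, r, s_next in episode:
            counts[s][s_next] += 1
            reward_sums[s][s_next] += r

    values = defaultdict(float)  # states never left (terminal) stay at 0
    for _ in range(sweeps):
        for s, successors in counts.items():
            total = sum(successors.values())
            # n / total is the estimated T(s, s'),
            # reward_sums / n is the estimated R(s, s').
            values[s] = sum(
                (n / total) * (reward_sums[s][s_next] / n + gamma * values[s_next])
                for s_next, n in successors.items()
            )
    return dict(values)


def td_values(episodes, gamma=GAMMA, alpha=0.5, passes=1):
    """Model-free: nudge V(s) toward each observed sample, no T or R needed."""
    values = defaultdict(float)
    for _ in range(passes):
        for episode in episodes:
            for s, r, s_next in episode:
                sample = r + gamma * values[s_next]
                values[s] += alpha * (sample - values[s])
    return dict(values)


if __name__ == "__main__":
    print("model-based:", model_based_values(EPISODES))
    print("model-free (TD):", td_values(EPISODES, passes=50))
```

Both functions only learn state values for the policy that generated the episodes; neither one changes the policy. With a constant alpha the TD estimates keep fluctuating around their targets, so a decaying learning rate is a common alternative.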