RL Visualizer

Controls: Reward Shaping, Episodes, Steps

It's fine to interrupt training: progress is preserved, but resuming overwrites the previously saved model.

Environment Config

Q-Learning

$$Q(s, a) \leftarrow Q(s, a) + \alpha\left[r + \gamma \max_{a'} Q(s', a') - Q(s, a)\right]$$

The bracketed term is the TD error: the difference between the bootstrapped target $r + \gamma \max_{a'} Q(s', a')$ and the current estimate $Q(s, a)$.
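To make the update concrete, here is a minimal tabular sketch in Python. The state/action counts and the values of `alpha` and `gamma` are illustrative assumptions, not values taken from the visualizer.

```python
import numpy as np

# Illustrative sizes and hyperparameters (assumptions, not the
# visualizer's actual defaults).
n_states, n_actions = 16, 4
alpha, gamma = 0.1, 0.99
Q = np.zeros((n_states, n_actions))

def q_update(s, a, r, s_next):
    """Apply one Q-learning update for the transition (s, a, r, s_next)."""
    td_target = r + gamma * np.max(Q[s_next])  # bootstrapped return
    td_error = td_target - Q[s, a]             # the bracketed TD error above
    Q[s, a] += alpha * td_error
```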

DQN Loss

$$L(\theta) = \mathbb{E}\left[\left(y - Q(s, a; \theta)\right)^2\right], \qquad y = r + \gamma \max_{a'} Q(s', a'; \theta^-)$$

Here $\theta^-$ denotes the parameters of a periodically updated target network, held fixed when computing $y$.
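As one way the loss could be computed, here is a short PyTorch sketch. The names `q_net` and `target_net` and the batch tensors are hypothetical; it assumes a replay-buffer batch where `done` is a 0/1 float mask and `a` holds integer action indices.

```python
import torch
import torch.nn.functional as F

def dqn_loss(q_net, target_net, s, a, r, s_next, done, gamma=0.99):
    """DQN loss for one batch; q_net and target_net are Q-networks
    mapping states to per-action values (hypothetical modules)."""
    # Q(s, a; theta): value of the action actually taken (a is int64)
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    # y = r + gamma * max_a' Q(s', a'; theta^-), target net held fixed;
    # (1 - done) zeroes the bootstrap term at terminal states
    with torch.no_grad():
        y = r + gamma * target_net(s_next).max(dim=1).values * (1 - done)
    return F.mse_loss(q_sa, y)
```

Stopping gradients through the target (via `torch.no_grad()` and the separate `target_net`) is what distinguishes this from a naive squared TD error on a single network.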