Deep Q-Learning Tutorial: minDQN

A Practical Guide to Deep Q-Networks

Mike Wang
Figure 1: Balancing a pole in the CartPole Environment (Image by Author)

The Q-Learning Algorithm

Figure 2: The Q-Learning Algorithm (Image by Author)
Figure 3: An example Q-table mapping states and actions to their corresponding Q-value (Image by Author)
Figure 4: The Bellman Equation describes how to update our Q-table (Image by Author)
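Figure 4's update rule is the standard Q-learning form of the Bellman equation: the Q-value for a (state, action) pair is nudged toward the immediate reward plus the discounted value of the best action in the next state. Below is a minimal sketch of that update on a toy NumPy Q-table; the table size, learning rate, and discount factor are illustrative values, not ones taken from the article.

import numpy as np

# Toy Q-table: one row per state, one column per action (sizes are illustrative).
n_states, n_actions = 4, 2
q_table = np.zeros((n_states, n_actions))

alpha = 0.1    # learning rate (assumed value)
gamma = 0.99   # discount factor (assumed value)

def q_learning_update(state, action, reward, next_state, done):
    # Bellman target: immediate reward plus the discounted best Q-value of the next state.
    target = reward if done else reward + gamma * np.max(q_table[next_state])
    # Move the current estimate a step of size alpha toward that target.
    q_table[state, action] += alpha * (target - q_table[state, action])

# Example transition: action 1 in state 0 gives reward 1.0 and lands in state 2.
q_learning_update(state=0, action=1, reward=1.0, next_state=2, done=False)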

The Deep Q-Network Algorithm

Figure 5: The Deep Q-Network Algorithm (Image by Author)
Figure 6: A neural network mapping an input state to its corresponding (action, q-value) pair (Image by Author)
import tensorflow as tf
from tensorflow import keras

def agent(state_shape, action_shape):
    """Build a network that maps an input state to one Q-value per action."""
    learning_rate = 0.001
    init = tf.keras.initializers.HeUniform()
    model = keras.Sequential()
    model.add(keras.layers.Dense(24, input_shape=state_shape, activation='relu', kernel_initializer=init))
    model.add(keras.layers.Dense(12, activation='relu', kernel_initializer=init))
    # Linear output layer: one Q-value per action.
    model.add(keras.layers.Dense(action_shape, activation='linear', kernel_initializer=init))
    model.compile(loss=tf.keras.losses.Huber(), optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate), metrics=['accuracy'])
    return model
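As a quick sanity check, the network can be built directly from the environment's state and action shapes and queried for a state's Q-values. The snippet below is a sketch assuming the classic OpenAI Gym CartPole-v1 API (where env.reset() returns only the observation); the gym import and these variable names are assumptions, not code from the article.

import gym
import numpy as np

env = gym.make('CartPole-v1')
model = agent(env.observation_space.shape, env.action_space.n)

state = env.reset()
# The network outputs one Q-value per action; the greedy policy picks the argmax.
q_values = model.predict(state.reshape(1, -1))
greedy_action = np.argmax(q_values[0])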
Figure 7: Updating the neural network with the new Temporal Difference target using the Bellman equation (Image by Author)
Figure 8: The Temporal Difference target we want to replicate using our neural network (Image by Author)
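Putting Figures 7 and 8 together, a training step computes a Temporal Difference target for each sampled transition and fits the network so its output for the taken action moves toward that target. The sketch below assumes a replay sample of (state, action, reward, next_state, done) tuples and a separate target network, a common DQN ingredient not shown in this section; the blend factor and discount factor are illustrative values.

import numpy as np

learning_rate = 0.7     # how far to move toward the TD target (assumed value)
discount_factor = 0.99  # gamma (assumed value)

def train_step(model, target_model, batch):
    states = np.array([s for (s, a, r, s2, d) in batch])
    next_states = np.array([s2 for (s, a, r, s2, d) in batch])

    current_qs = model.predict(states)             # Q(s, .) from the online network
    future_qs = target_model.predict(next_states)  # Q(s', .) from the target network

    for i, (state, action, reward, next_state, done) in enumerate(batch):
        # Temporal Difference target: r + gamma * max_a' Q(s', a'), or just r on terminal steps.
        td_target = reward if done else reward + discount_factor * np.max(future_qs[i])
        # Only the taken action's Q-value is moved toward the TD target; the others stay as predicted.
        current_qs[i][action] = (1 - learning_rate) * current_qs[i][action] + learning_rate * td_target

    # Fit the network so its outputs replicate the updated targets.
    model.fit(states, current_qs, batch_size=len(batch), verbose=0)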