Emmanuel Braboke — Backend Software Engineer

Implementation

Implemented Q-learning from scratch to train an agent that navigates the Taxi-v3 grid environment. The core challenge was balancing exploration versus exploitation: the agent starts out acting completely at random and, guided by an epsilon-greedy strategy whose exploration rate decays over time, gradually converges toward an optimal policy over 2,000 episodes. Hyperparameters such as the learning rate, discount factor, and epsilon decay rate are all configurable via CLI arguments.
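The exploration schedule and CLI configuration described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the flag names (`--alpha`, `--gamma`, `--epsilon`, `--min-epsilon`, `--decay-rate`, `--episodes`), their defaults, and the exponential decay formula are all hypothetical choices.

```python
import argparse

import numpy as np


def parse_args(argv=None):
    """Hypothetical CLI: flag names and defaults are illustrative assumptions."""
    parser = argparse.ArgumentParser(description="Tabular Q-learning on Taxi-v3")
    parser.add_argument("--alpha", type=float, default=0.1, help="learning rate")
    parser.add_argument("--gamma", type=float, default=0.6, help="discount factor")
    parser.add_argument("--epsilon", type=float, default=1.0, help="initial exploration rate")
    parser.add_argument("--min-epsilon", type=float, default=0.01, help="exploration floor")
    parser.add_argument("--decay-rate", type=float, default=0.005,
                        help="exponential epsilon decay per episode")
    parser.add_argument("--episodes", type=int, default=2000, help="training episodes")
    return parser.parse_args(argv)


def choose_action(q_table, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return int(rng.integers(q_table.shape[1]))
    # Exploit: pick the action with the highest current Q-value.
    return int(np.argmax(q_table[state]))


def epsilon_for_episode(episode, start, floor, decay_rate):
    """Assumed schedule: decay exponentially from `start` toward `floor`."""
    return floor + (start - floor) * np.exp(-decay_rate * episode)
```

With `q_table = np.zeros((500, 6))` (Taxi-v3 has 500 discrete states and 6 actions), a training loop would call `epsilon_for_episode` once per episode and `choose_action` once per step. Exponential decay is only one reasonable choice; a linear schedule or a per-episode `epsilon *= decay` would fit the description equally well.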

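import numpy as np

# q_table, ALPHA (learning rate) and GAMMA (discount factor) are module-level globals
def update_q_table(state, action, reward, next_state):
    old_value = q_table[state, action]
    # Best estimated value obtainable from the next state
    next_max = np.max(q_table[next_state])

    # Blend the old estimate with the new target, weighted by ALPHA
    q_table[state, action] = (
        (1 - ALPHA) * old_value
        + ALPHA * (reward + GAMMA * next_max)
    )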
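The function above reads `q_table`, `ALPHA`, and `GAMMA` from module scope and always bootstraps from `next_state`. A self-contained variant, sketched under the assumption that the caller passes a `terminated` flag (Gymnasium's `env.step()` returns one), makes those dependencies explicit and stops bootstrapping once an episode has actually ended. The name `q_update` and its parameter order are hypothetical.

```python
import numpy as np


def q_update(q_table, state, action, reward, next_state, terminated, alpha, gamma):
    """Hypothetical helper: one tabular Q-learning step, returning the TD error."""
    # A terminal state has no future reward, so do not bootstrap from it.
    next_max = 0.0 if terminated else np.max(q_table[next_state])
    td_target = reward + gamma * next_max
    td_error = td_target - q_table[state, action]
    # Algebraically the same as (1 - alpha) * old_value + alpha * td_target.
    q_table[state, action] += alpha * td_error
    return td_error
```

A time-limit truncation is not a true terminal state, so it should still bootstrap; only `terminated` zeroes out the future value. Returning the TD error is optional but convenient for logging how quickly the table settles.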

Taxi Route Optimization

Implementation