Taxi Route Optimization

Machine Learning

A reinforcement learning agent that learns taxi routing with Q-learning. It trains for 2,000 episodes in the Taxi-v3 environment from Gymnasium (formerly OpenAI Gym) to learn the best pickup and dropoff policy.

Training episodes: 2,000

Algorithm: Q-Learning

Environment: Taxi-v3

Output: Animated GIF

Implementation

Q-learning is implemented from scratch to train an agent that navigates the Taxi-v3 grid environment. The core challenge was balancing exploration against exploitation: with an epsilon-greedy strategy, the agent starts out acting completely at random and gradually converges toward an optimal policy as epsilon decays over the 2,000 training episodes. The learning rate, discount factor, and epsilon decay rate are all configurable via CLI arguments.
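The epsilon-greedy strategy described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names choose_action and decay_epsilon, the multiplicative decay schedule, and the default values (decay_rate=0.995, epsilon_min=0.01) are all assumptions made for the example.

```python
import numpy as np

# Seeded only so the sketch is reproducible; the seed value is arbitrary.
rng = np.random.default_rng(seed=0)


def choose_action(q_table, state, epsilon):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        # Explore: pick any action index uniformly at random.
        return int(rng.integers(q_table.shape[1]))
    # Exploit: pick the action with the highest current Q-value.
    return int(np.argmax(q_table[state]))


def decay_epsilon(epsilon, decay_rate=0.995, epsilon_min=0.01):
    """Shrink epsilon multiplicatively, never dropping below epsilon_min.

    decay_rate and epsilon_min are hypothetical defaults for illustration.
    """
    return max(epsilon_min, epsilon * decay_rate)
```

Starting from epsilon = 1.0 and calling decay_epsilon once per episode, these assumed defaults would reach the 0.01 floor after roughly 920 episodes, leaving the rest of the 2,000-episode run mostly greedy. A slower decay keeps the agent exploring for longer at the cost of slower convergence.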

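import numpy as np

ALPHA = 0.1   # learning rate (illustrative value; set via CLI in the project)
GAMMA = 0.99  # discount factor (illustrative value; set via CLI in the project)
q_table = np.zeros((500, 6))  # Taxi-v3 has 500 discrete states and 6 actions

def update_q_table(state, action, reward, next_state):
    """Apply one tabular Q-learning update for a single transition."""
    old_value = q_table[state, action]
    next_max = np.max(q_table[next_state])  # best estimated value from next_state

    # Blend the old estimate with the target: reward + discounted best next value
    q_table[state, action] = (
        (1 - ALPHA) * old_value
        + ALPHA * (reward + GAMMA * next_max)
    )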
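
To make the update concrete, here is a small worked example with made-up numbers. It also shows that the weighted-average form used above is algebraically the same as the more common "old value plus learning rate times TD error" form. The function names q_update and q_update_td and the values alpha=0.1, gamma=0.9 are chosen purely for illustration.

```python
def q_update(old_value, reward, next_max, alpha, gamma):
    """Weighted-average form of the Q-learning update."""
    return (1 - alpha) * old_value + alpha * (reward + gamma * next_max)


def q_update_td(old_value, reward, next_max, alpha, gamma):
    """Equivalent TD-error form: move old_value toward the target by alpha."""
    td_target = reward + gamma * next_max
    return old_value + alpha * (td_target - old_value)


# One step with illustrative numbers. Taxi-v3 gives a reward of -1 per move,
# so the target is -1 + 0.9 * 2.0 = 0.8, and the estimate moves 10% of the
# way from 0.0 toward it, giving approximately 0.08.
new_value = q_update(old_value=0.0, reward=-1, next_max=2.0, alpha=0.1, gamma=0.9)
same_value = q_update_td(old_value=0.0, reward=-1, next_max=2.0, alpha=0.1, gamma=0.9)
```

Both calls produce the same result up to floating-point rounding. A larger alpha would move the estimate further toward the target on each step, which speeds up early learning but makes the values noisier.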