AI Spring 2024

Repository containing notebooks and code examples from my AI course journey.

**Google Drive:** [sahroush/AI-Spring2024](https://drive.google.com/drive/folders/1a0ZbTlm558Jumq8FnEsuEqVV8XDq2n4b?usp=sharing)

Artificial Intelligence is no longer a futuristic fantasy; it's a rapidly evolving field shaping our present and future. As someone deeply immersed in this world, I've recently completed a journey through some fascinating corners of AI, tackling projects that spanned reinforcement learning, recurrent neural networks, search algorithms, and even a touch of quantum inspiration. Here's a glimpse into that journey, complete with code explorations, mathematical dives, and a few hard-won lessons.

**Part 1: Reinforcement Learning - Teaching a Drunk Agent**

My adventure started with a deep dive into reinforcement learning (RL). RL is all about training agents to make optimal decisions in an environment so as to maximize reward. I took on a fun, slightly absurd challenge: training a *drunk* agent to navigate a simple two-door scenario, one door leading to gold and the other to silver, with a twist of probabilistic (mis)direction. This task let me explore **Policy Iteration**, an algorithm that alternates between evaluating and improving an agent's decision-making strategy. The agent's actions are governed by a fun rule: with a certain probability, aiming for the yellow door lands the agent at the blue door instead, and vice versa. Through carefully designed rewards, my goal was for the agent to learn which door to choose to maximize long-term reward despite the "drunk" probability. The optimal policy turned out to be somewhat counter-intuitive in this instance, but what really impressed me was how a relatively simple RL algorithm could solve a deceptively tricky decision-making problem. A minimal sketch of the idea appears after this section.

**Part 2: Code Optimization and Hyperparameter Tuning**

Then came the inevitable task of code optimization and hyperparameter tuning. In Deep Q-Learning, the process involves building a neural network that estimates the value of each action in each state. This network is trained on experiences sampled from the agent's replay memory, and one crucial step was making that training loop efficient. A key aspect of the algorithm is the trade-off between exploration (randomly selecting actions) and exploitation (leveraging learned knowledge to maximize reward); striking that balance comes down to carefully tuning hyperparameters such as the learning rate, discount factor (gamma), and exploration rate (epsilon). I found myself tweaking these values and observing how they affected the agent's learning speed and overall performance (see the epsilon-greedy sketch below).

**Part 3: Frozen Lakes and One-Hot Encoding**

The next challenge was implementing Deep Q-Learning and applying it to the classic "Frozen Lake" environment from OpenAI Gym. The code feeds the state of the frozen lake to the neural network and uses the network's output to decide which action to take. The agent learns through trial and error, adjusting its policy to maximize reward and minimize punishment. One key ingredient here was **one-hot encoding**. Since neural networks work best with numerical vectors and the environment's states are discrete, I transformed each state into a one-hot encoded vector for the Q-learning agent to process: a binary vector in which one element is 1 (the active state) and the rest are 0 (sketched below).
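To make the Part 1 idea concrete, here is a minimal policy-iteration sketch over a toy two-door MDP. The reward values, slip probability, and state layout below are illustrative assumptions, not the actual assignment's numbers.

```python
import numpy as np

# Toy MDP for the "drunk agent" two-door problem (illustrative numbers only).
# States: 0 = start, 1 = gold room (terminal), 2 = silver room (terminal).
# Actions from the start state: 0 = aim for the yellow (gold) door,
#                               1 = aim for the blue (silver) door.
# With probability `slip` the agent stumbles through the other door.
slip, gamma = 0.3, 0.9
n_states, n_actions = 3, 2

# P[s, a, s'] = transition probability, R[s, a] = expected immediate reward.
P = np.zeros((n_states, n_actions, n_states))
P[0, 0, 1], P[0, 0, 2] = 1 - slip, slip      # aim for the yellow door
P[0, 1, 2], P[0, 1, 1] = 1 - slip, slip      # aim for the blue door
P[1, :, 1] = 1.0                             # terminal states loop on themselves
P[2, :, 2] = 1.0
R = np.zeros((n_states, n_actions))
R[0, 0] = (1 - slip) * 10 + slip * 2         # assumed gold/silver payoffs
R[0, 1] = (1 - slip) * 2 + slip * 10

def policy_iteration(P, R, gamma):
    n_s, n_a = R.shape
    policy = np.zeros(n_s, dtype=int)
    while True:
        # Policy evaluation: solve V = R_pi + gamma * P_pi @ V exactly.
        P_pi = P[np.arange(n_s), policy]
        R_pi = R[np.arange(n_s), policy]
        V = np.linalg.solve(np.eye(n_s) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to the evaluated values.
        Q = R + gamma * (P @ V)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy

policy, V = policy_iteration(P, R, gamma)
print("Best action from the start state:", policy[0], "state values:", V)
```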
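For the exploration/exploitation balance discussed in Part 2, this is roughly what epsilon-greedy action selection with a decaying epsilon looks like; the hyperparameter values here are placeholders, not the tuned ones.

```python
import random
import numpy as np

# Illustrative DQN-style hyperparameters (assumed values, not the tuned ones).
epsilon, epsilon_min, epsilon_decay = 1.0, 0.01, 0.995
gamma, learning_rate = 0.99, 1e-3   # the other knobs mentioned in the text

def select_action(q_values, epsilon, n_actions):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if random.random() < epsilon:
        return random.randrange(n_actions)   # exploration: random action
    return int(np.argmax(q_values))          # exploitation: best known action

def decay_epsilon(epsilon):
    """Typical schedule applied once per episode: shrink epsilon toward a floor."""
    return max(epsilon_min, epsilon * epsilon_decay)
```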
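And here is the one-hot trick from Part 3 in isolation; `n_states = 16` assumes the standard 4x4 FrozenLake grid.

```python
import numpy as np

def one_hot(state, n_states=16):
    """Encode a discrete FrozenLake state (0..15 on the 4x4 map) as a one-hot vector."""
    vec = np.zeros(n_states, dtype=np.float32)
    vec[state] = 1.0
    return vec

# Example: state 6 becomes a 16-dimensional vector with a single 1.
print(one_hot(6))
```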
**Part 4: RNNs and the Vanishing Gradient Problem**

Moving on from RL, I delved into the world of Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks, and explored the infamous "vanishing gradient problem" that limits plain RNNs. The core idea behind RNNs is to process sequential data, such as text, by maintaining a "memory" of past inputs. During training, however, the gradients used to update the network's weights can shrink exponentially as they propagate back through time, making it difficult for the network to learn long-range dependencies. LSTMs, with their gating mechanisms (forget gate, input gate, output gate), address this problem: the gates let the network selectively retain or discard information in its memory cells, allowing gradients to flow more easily and enabling the learning of long-range dependencies in sequential data.

**Part 5: Hyperparameter Tuning with RNNs**

This project involved training a simple RNN on temperature data from Jena, Germany, with the goal of predicting future temperatures from past patterns. After establishing a baseline, I optimized the model by experimenting with different layer structures and activation functions. By building a deeper RNN, and later a GRU network, I aimed to capture more complex temporal dependencies and improve prediction accuracy. The models were evaluated using the R² score and plots of training and validation loss (a model sketch follows after this section).

**Part 6: Search Algorithms and the Quest for the Optimal Path**

The next area I explored was the fascinating world of search algorithms. The core idea is that almost any problem admits multiple solutions; the goal is to find the optimal one. This involved implementing and understanding fundamental algorithms:

* **A\* Search:** A powerful pathfinding algorithm that uses heuristics to efficiently navigate a graph and find the shortest path between two nodes.
* **Minimax Search:** A decision-making algorithm from game theory that finds the optimal move for a player, assuming the opponent also plays optimally.
* **Breadth-First Search (BFS) and Depth-First Search (DFS):** Classic graph traversal algorithms that systematically explore a graph's nodes and edges.
* **Uniform Cost Search (UCS):** A pathfinding algorithm that finds the least-cost path from a start node to a goal node in a weighted graph.

A fun task involved using A\* Search to find the shortest route from a source node to a target node; with a properly defined heuristic, A\* significantly outperformed the other search methods on more complex graphs (see the sketch below).

**Part 7: Conquering Map Coloring with Constraint Satisfaction**

Another interesting problem was applying search to map coloring: assign colors to the regions of a map so that no two adjacent regions share the same color, using only a limited palette (the four-color theorem guarantees that four colors suffice for any planar map). I implemented a constraint satisfaction problem (CSP) solver to tackle this challenge; a backtracking sketch appears below.
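As a rough illustration of the Part 5 setup, here is a minimal Keras-style GRU regressor for windowed temperature data. The window length, feature count, and layer sizes are assumptions for illustration, not the notebook's actual configuration.

```python
import numpy as np
from tensorflow import keras
from sklearn.metrics import r2_score  # used in the commented-out evaluation below

# Assumed input shape: 240 past time steps, 14 weather features per step.
window, n_features = 240, 14

model = keras.Sequential([
    keras.Input(shape=(window, n_features)),
    keras.layers.GRU(32, return_sequences=True),   # stacked recurrent layers
    keras.layers.GRU(16),
    keras.layers.Dense(1),                         # next-step temperature
])
model.compile(optimizer="adam", loss="mse")

# x_train / y_train would come from sliding windows over the Jena dataset:
# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=10)
# preds = model.predict(x_val)
# print("R^2:", r2_score(y_val, preds))
```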
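For Part 6, a compact A\* over an adjacency-list graph; the toy graph and the zero heuristic below are placeholders, since the actual assignment graph isn't shown here.

```python
import heapq

def a_star(graph, start, goal, heuristic):
    """A* search. `graph[u]` yields (neighbor, edge_cost) pairs;
    `heuristic(n)` must never overestimate the true cost from n to `goal`."""
    frontier = [(heuristic(start), 0, start, [start])]   # (f, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        for neighbor, cost in graph.get(node, []):
            new_g = g + cost
            if new_g < best_g.get(neighbor, float("inf")):
                best_g[neighbor] = new_g
                f_new = new_g + heuristic(neighbor)
                heapq.heappush(frontier, (f_new, new_g, neighbor, path + [neighbor]))
    return None, float("inf")

# Toy usage with a made-up graph; a zero heuristic reduces A* to uniform cost search.
graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 1), ("D", 5)], "C": [("D", 1)], "D": []}
print(a_star(graph, "A", "D", heuristic=lambda n: 0))
```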
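And for Part 7, a small backtracking CSP solver for map coloring. The region adjacency below is the standard Australia textbook example, not necessarily the map used in the assignment.

```python
def color_map(neighbors, colors):
    """Backtracking CSP: assign a color to every region so that no two
    adjacent regions share a color. Returns a dict or None if unsolvable."""
    assignment = {}

    def consistent(region, color):
        return all(assignment.get(n) != color for n in neighbors[region])

    def backtrack():
        if len(assignment) == len(neighbors):
            return True
        region = next(r for r in neighbors if r not in assignment)
        for color in colors:
            if consistent(region, color):
                assignment[region] = color
                if backtrack():
                    return True
                del assignment[region]   # undo and try the next color
        return False

    return assignment if backtrack() else None

# Classic textbook example: mainland Australian regions with three colors.
neighbors = {
    "WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"], "SA": ["WA", "NT", "Q", "NSW", "V"],
    "Q": ["NT", "SA", "NSW"], "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"],
}
print(color_map(neighbors, ["red", "green", "blue"]))
```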
**Part 8: Optimizers - The Mathematical Dance of Learning**

Next, I reviewed the mathematical ideas behind some of the most widely used optimizers in deep learning:

* **Stochastic Gradient Descent with Momentum:** Accelerates learning by keeping a running memory of past gradients, helping the model traverse the loss landscape more efficiently and avoid getting stuck in shallow local minima.
* **AdaGrad:** An adaptive method that adjusts the learning rate for each parameter individually based on its historical gradients. AdaGrad excels on sparse data and adapts to differing feature scales.
* **RMSprop:** An improvement over AdaGrad that addresses its ever-shrinking learning rates by using a decaying average of squared gradients, so that only recent iterations count and learning does not stall as training progresses.

**Part 9: Classification and the Power of SMOTE**

Lastly, I worked on a classification project. The first step was to clean the data, drop missing values, and make sure everything was ready for the model. I then noticed an uneven class distribution in the target variable, so I applied SMOTE to balance the classes, which improved the model's performance as expected.

**Key Takeaways**

This series of projects was an incredible journey through the diverse landscape of AI. I gained a deeper understanding of:

* The role of Policy Iteration in finding optimal decision-making policies in RL.
* The challenges of training RNNs and how LSTMs address them.
* How to understand and implement essential search algorithms.
* How a model's hyperparameters can be tuned to get the best performance.
* The value of resampling techniques like SMOTE for balancing data and improving model robustness.

As I continue my exploration of AI, I'm excited to leverage these insights to build even more sophisticated and impactful applications. This is just the beginning of a long and fascinating adventure.