Abstract
This is the Multi-Armed Bandit Outreach project. It was built to provide outreach to students who are interested in machine learning and are between the 11th grade and college sophomore levels.
The application uses a real-world scenario of selecting a restaurant to eat at. Each restaurant is given a reward distribution, and the goal of the system is to find which restaurant is the optimal choice on each iteration. The participant is shown a simulation of the problem both without context and with it. This helps them build an understanding of the purpose of adding context to multi-armed bandits, as well as to see how context affects everyday scenarios. The participant can also change the scale of the system by changing the number of restaurants or the number of iterations, or by selecting a different bandit model.
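To give a concrete sense of the underlying problem, the short sketch below models restaurants as bandit arms with hidden reward distributions, first without context and then with a single context feature that shifts each restaurant's average reward. This is an illustrative sketch only; the means, weights, and `pull` functions are assumptions and are not taken from the project code.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Each "restaurant" (arm) has a hidden reward distribution the bandit must learn.
restaurant_means = [3.0, 4.5, 2.0]   # hypothetical average enjoyment per visit

def pull(arm):
    """Sample a reward for visiting a restaurant (no context)."""
    return rng.normal(restaurant_means[arm], 1.0)

# With context, the best restaurant can change from one iteration to the next.
# Here the context is a single feature, e.g. 1.0 on a weekend and 0.0 on a weekday.
context_weights = [0.5, -1.0, 2.5]   # hypothetical effect of the context on each arm

def pull_with_context(arm, context):
    """Sample a reward whose mean shifts with the observed context."""
    return rng.normal(restaurant_means[arm] + context_weights[arm] * context, 1.0)
```

In this toy setup, the arm with base mean 4.5 is the best choice without context, but the last arm becomes the better choice when the context feature is 1.0; this is exactly the kind of shift the contextual simulation is meant to illustrate.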
The user is able to select from Epsilon Greedy, Thompson Sampling, Upper Confidence Bound, and Random Selection models. The participant may also choose to run the application in a mode that compares these models and shows the results of each bandit model. This functionality helps the participant understand that not all multi-armed bandits operate in the same way and that there are different solutions to the same problem: even though they all fall under the category of multi-armed bandits, each model approaches the problem differently.
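As one example of how these models differ in their approach, here is a minimal, generic Epsilon Greedy loop. It is a textbook-style sketch rather than the application's implementation; the `pull` function and parameter values are carried over from the hypothetical example above.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def run_epsilon_greedy(pull, n_arms=3, n_iterations=500, epsilon=0.1):
    """Explore with probability epsilon, otherwise exploit the best-looking arm."""
    counts = np.zeros(n_arms)        # how many times each restaurant was chosen
    estimates = np.zeros(n_arms)     # running mean reward per restaurant
    for _ in range(n_iterations):
        if rng.random() < epsilon:
            arm = int(rng.integers(n_arms))      # explore: try a random restaurant
        else:
            arm = int(np.argmax(estimates))      # exploit: go to the current favorite
        reward = pull(arm)
        counts[arm] += 1
        # Incremental mean update of the reward estimate for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts
```

Calling `run_epsilon_greedy(pull)` returns the learned reward estimates and visit counts per restaurant; Thompson Sampling and Upper Confidence Bound would replace the explore/exploit rule with their own selection strategies, while Random Selection simply ignores the estimates.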
Download
This project can be accessed at the OutreachMAB GitHub repository.
Setup
Python 3.10 is required for this application. You can then clone the repository to access the project.
- Clone: `git clone https://github.com/iCMAB/OutreachMAB.git`
- Install dependencies: `pip3 install -r requirements.txt`
- Run: `python3 main.py`
Application Usage
When running the program, there are three important screens to pay attention to:
1. Settings Selection

Here you can change the bandit model, the number of arms (restaurants, in the context of the problem), and the number of iterations.
2. Simulation

The simulation consists of three major parts. The first is the control center in the top left, where the participant can step through the iterations of the simulation. Next are the reward and regret graphs, one cumulative and one per iteration (a sketch of how these quantities are typically computed appears at the end of this section). The last part is the set of graphs along the right side of the screen, which show the current distribution of rewards that the bandit has collected from each restaurant.
3. Results

At the conclusion of the simulation, the final graphs are shown.
The two larger graphs show reward and regret in two different ways: the first graph shows cumulative reward and regret, while the second shows average reward and regret over each iteration. Each of these graphs has a description beneath it explaining what the graph represents. On the right, the final distribution found by the bandit for each restaurant is shown.
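For readers who want to connect the plots to numbers, the following sketch shows one common way cumulative and per-iteration (average) reward and regret can be computed from a run. It is illustrative only and does not claim to match the project's exact definitions; regret here is simply the gap between the best available reward and the reward actually received on each iteration.

```python
import numpy as np

def track_reward_and_regret(chosen_rewards, optimal_rewards):
    """Given the reward received each iteration and the best possible reward
    for that iteration, return cumulative and per-iteration averages."""
    chosen = np.asarray(chosen_rewards, dtype=float)
    optimal = np.asarray(optimal_rewards, dtype=float)
    regret = optimal - chosen                        # loss from not picking the best restaurant
    iterations = np.arange(1, len(chosen) + 1)
    cumulative_reward = np.cumsum(chosen)            # first graph: totals so far
    cumulative_regret = np.cumsum(regret)
    average_reward = cumulative_reward / iterations  # second graph: averages per iteration
    average_regret = cumulative_regret / iterations
    return cumulative_reward, cumulative_regret, average_reward, average_regret
```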
The Team
| Carter Vail | Dante Falardeau |
|---|---|
| Fourth Year Software Engineering student. Interested in software development. | Fifth Year Software Engineer interested in integrating automation into existing workflows. |
| Devroop Kar | Dr. Daniel Krutz |
| Incoming PhD Student in Computing and Information Sciences. Data Engineer and AI Enthusiast. | Director of the AWARE LAB and assistant professor. Interested in Self Adaptive Systems, Strategic Reasoning, and Computing Education. |