---
license: apache-2.0
pipeline_tag: object-detection
---
# Marathon Bib Detector
During a marathon, participants are usually required to wear a bib displaying their race number. This lets organizers identify participants easily, even in races with thousands of runners. This project aims to automatically detect and read those numbers, so that participants, their progress, their estimated arrival times, and other data can later be tracked.
This repository archives a school project from 2025, in which we trained an AI to recognize the bibs worn by participants. It explains how our model was created and how it works.
## Data Sources
The base of our dataset was created from a list of YouTube videos: hours of footage of people running in various races and countries, with different lighting, camera angles, cameras, and bib designs or formats, to make the model as generic as we could and to avoid overfitting.
- https://youtube.com/watch?v=338nQ8d1Z3Y
- https://youtube.com/watch?v=61H0GRdgvEQ
- https://youtube.com/watch?v=8yBDOP9luc8
- https://youtube.com/watch?v=eRuxPDOyizs
- https://youtube.com/watch?v=iirIYmQiPHY
- https://youtube.com/watch?v=jzy7a-oNQ28
- https://youtube.com/watch?v=KFcUYlUqQlI
- https://youtube.com/watch?v=KtSRynTrlRA
- https://youtube.com/watch?v=MWNRColAEao
- https://youtube.com/watch?v=Ng0rIYMGbb0
- https://youtube.com/watch?v=olZnm5h-zoE
- https://youtube.com/watch?v=PsMX1DYN-tQ
- https://youtube.com/watch?v=QkHM7wpy00s
- https://youtube.com/watch?v=rDcgpzJWnkM
- https://youtube.com/watch?v=sRfT140lD_4
- https://youtube.com/watch?v=SVkN5Vb6kvs
- https://youtube.com/watch?v=TL74NNlj4OU
- https://youtube.com/watch?v=TPeJU858RLU
- https://youtube.com/watch?v=uaf9uifcAa8
- https://youtube.com/watch?v=uztqWOms5bo
- https://youtube.com/watch?v=Wjj3EEWrvzo
- https://youtube.com/watch?v=WLdahHBiQgk
## Analysis
The goal of our model was to detect the bibs worn by the runners, so we needed a model trained on images annotated with every bib present in them. To build a decent model, we would need tens of thousands of annotated images, which would be impossible to annotate manually for a small team of students.
Instead, we decided to automate the annotation process using larger, more powerful tools, and then use the resulting annotated dataset to train a lightweight model capable of doing the same.
A bib contains text, and models like PaddleOCR can recognize text and its position in an image, but we absolutely wanted to avoid detecting unrelated text such as ads on the barriers or walls, text in the video overlay, and any other noise that would pollute the OCR output.
### 0. Original videos
We start the pipeline with the original videos listed above, filmed during the races.
### 1. Depth estimation of the videos
First, we ran depth-map estimation on all the videos using the Depth-Anything-V2-Small-hf model.
This gave us an estimated depth for each pixel of the video, which let us select the areas of each frame where we expect bibs to appear before running the OCR.
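As an illustration, here is a minimal sketch of this step using the Hugging Face `transformers` depth-estimation pipeline; the frame path is a placeholder, not a file from this repository.

```python
# Minimal depth-estimation sketch, assuming transformers and PIL are installed.
from transformers import pipeline
from PIL import Image

# Load the small Depth-Anything-V2 checkpoint from the Hugging Face Hub.
depth_estimator = pipeline(
    task="depth-estimation",
    model="depth-anything/Depth-Anything-V2-Small-hf",
)

# Estimate depth for a single extracted video frame ("frame.jpg" is a placeholder).
frame = Image.open("frame.jpg")
result = depth_estimator(frame)

# result["depth"] is a grayscale PIL image; brighter pixels are closer to the camera.
result["depth"].save("frame_depth.png")
```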
### 2. Create a depth mask for the videos
Assuming the camera is fixed, we can create a mask image indicating the minimal depth to consider for the OCR. This threshold lets us ignore the background, the barriers, and the video overlay.
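As a sketch, turning a depth map into a binary mask could look like the following; the cut-off value is an assumption and would be tuned per camera position. Since the camera is fixed, the same threshold can be reused for every frame of a given video.

```python
# Hypothetical mask-building sketch, assuming 8-bit depth maps where
# brighter pixels are closer to the camera.
import numpy as np
from PIL import Image

MIN_DEPTH = 120  # illustrative cut-off, tuned per camera position

depth = np.array(Image.open("frame_depth.png").convert("L"))

# Keep only pixels close enough to the camera to belong to a runner;
# the background, barriers, and overlay fall below the cut-off.
mask = (depth >= MIN_DEPTH).astype(np.uint8) * 255
Image.fromarray(mask).save("frame_mask.png")
```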
### 3. Apply the mask to the videos
We end up with videos showing only the people running, with no noise left for our OCR: the only readable text remaining is the runners' bib numbers.
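The masking itself is straightforward; here is a minimal sketch with OpenCV, assuming the mask built in the previous step.

```python
# Minimal masking sketch, assuming OpenCV and the binary mask from step 2.
import cv2

frame = cv2.imread("frame.jpg")
mask = cv2.imread("frame_mask.png", cv2.IMREAD_GRAYSCALE)

# Black out everything outside the mask so only the runners remain visible.
masked = cv2.bitwise_and(frame, frame, mask=mask)
cv2.imwrite("frame_masked.jpg", masked)
```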
### 4. Annotate the videos
We were now able to create a new dataset indicating, for each image, which bib numbers were visible and their bounding boxes.
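A hedged sketch of this step with PaddleOCR follows; it is not our exact annotation script, and the result structure can differ between PaddleOCR versions.

```python
# Hypothetical annotation sketch with PaddleOCR 2.x.
from paddleocr import PaddleOCR

ocr = PaddleOCR(lang="en")  # loads both text detection and recognition models

# Run OCR on a masked frame; with PaddleOCR 2.x, result[0] is a list of
# (quadrilateral box, (text, confidence)) pairs for the first image.
result = ocr.ocr("frame_masked.jpg")

# Keep only purely numeric strings, which on masked frames should be bib numbers.
for box, (text, confidence) in result[0]:
    if text.isdigit():
        print(box, text, confidence)
```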
### 5. Manual cleaning
OCR is not a perfect technology, especially on low-quality video, so we manually removed the frames where the detections performed poorly, improving the quality of our dataset.
## Training
Finally, with our new dataset, we trained a model specialized in bib detection: we fine-tuned a model based on the YOLOv11 framework on our dataset to achieve the best results.
The training took about a day on an NVIDIA RTX 6000 Ada.
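For reference, fine-tuning with the Ultralytics package looks roughly like this; the dataset file, checkpoint, and hyperparameters are illustrative, not our exact settings.

```python
# Illustrative fine-tuning sketch with the Ultralytics package.
from ultralytics import YOLO

# Start from a pretrained YOLOv11 checkpoint and fine-tune on the bib dataset
# ("bibs.yaml" is a placeholder for a dataset description file).
model = YOLO("yolo11n.pt")
model.train(data="bibs.yaml", epochs=100, imgsz=640)
```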
## Results
The final model does not achieve perfect accuracy, but when applied frame by frame to a video, the results are much more convincing. Training on a larger and more rigorously built dataset would have improved the results further.
Despite its average accuracy, the model is fast enough to process video in real time on a standard GPU, which makes it suitable for live applications such as a marathon tracker.
The weights are available in the `weights` folder. They can be used directly with the YOLOv11 framework.
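For example, a minimal inference sketch (the weights file name below is an assumption; check the folder for the actual one):

```python
# Minimal inference sketch; "weights/best.pt" is an assumed file name.
from ultralytics import YOLO

model = YOLO("weights/best.pt")

# Run detection on an image (or a video path); each result holds the
# bounding boxes of the detected bibs.
results = model("race_photo.jpg")
results[0].show()
```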