Overview

The goal of this competition is to create algorithms and models that can solve tricky math problems written in LaTeX format. Your participation will help to advance AI models’ mathematical reasoning skills and push forward the frontier of knowledge.

Description

The ability to reason mathematically is a critical milestone for AI. Mathematical reasoning is the foundation for solving many complex problems, from engineering marvels to intricate financial models. However, current AI capabilities are limited in this area.

The AI Mathematical Olympiad (AIMO) Prize is a new $10mn prize fund to spur the open development of AI models capable of performing as well as top human participants in the International Mathematical Olympiad (IMO).

This competition includes 110 problems similar to an intermediate-level high school math challenge. The Gemma 7B benchmark for these problems is a score of 3/50 on both the public and private test sets.

The assessment of AI models’ mathematical reasoning skills faces a significant hurdle: train-test leakage. Models trained on Internet-scale datasets may inadvertently encounter test questions during training, skewing the evaluation process.

To address this challenge, and recognizing the need for a transparent and fair evaluation framework, this competition uses a dataset of 110 novel math problems created by an international team of problem solvers. The dataset encompasses a range of difficulty levels, from simple arithmetic to algebraic thinking and geometric reasoning. This will help to strengthen the benchmarks for assessing AI models’ mathematical reasoning skills, without the risk of contamination from training data.

This competition offers an exciting opportunity to benchmark open AI models against each other and foster healthy competition and innovation in the field. By addressing this initial benchmarking problem, you will contribute to advancing AI capabilities and help to ensure that its potential benefits outweigh the risks.

Join us as we work towards a future where AI models’ mathematical reasoning skills are accurately and reliably assessed, driving progress and innovation across industries.

Evaluation

Submissions are evaluated on the accuracy of their predicted labels against the ground-truth labels. In other words, submissions are ranked by the fraction of predicted labels that exactly match the ground-truth labels.

In this competition, every ground-truth label is an integer between 0 and 999, inclusive.
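
As an illustration of the metric described above, here is a minimal sketch in Python. The function name `accuracy` and the example answers are hypothetical and are not part of the competition's own scoring code; the sketch only assumes what is stated here, namely exact-match scoring and integer labels from 0 to 999.

```python
from typing import Sequence


def accuracy(predicted: Sequence[int], truth: Sequence[int]) -> float:
    """Return the fraction of predictions that exactly match the ground truth."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    if not truth:
        raise ValueError("cannot score an empty set of problems")
    for label in truth:
        # Every ground-truth label is an integer between 0 and 999, inclusive.
        if not 0 <= label <= 999:
            raise ValueError(f"ground-truth label out of range: {label}")
    matches = sum(1 for p, t in zip(predicted, truth) if p == t)
    return matches / len(truth)


# Hypothetical example: three of the four answers match exactly.
example_score = accuracy([52, 250, 702, 800], [52, 250, 702, 8])
print(example_score)  # 0.75
```

Because only exact matches count, an answer that is numerically close to the label earns no partial credit.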

Submitting

You must submit to this competition using the provided Python evaluation API, which serves test set instances one-by-one in random order. To use the API, follow the template in this notebook.
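
The serving pattern can be pictured with the sketch below. It does not use the real evaluation API, whose interface is defined in the template notebook; `serve_one_by_one`, `run_submission`, and the placeholder solver are hypothetical stand-ins that only mimic problems arriving one at a time in random order. The fallback to 0 for an invalid answer is likewise an assumption made for illustration.

```python
import random
from typing import Callable, Dict, Iterator, List, Tuple


def serve_one_by_one(problems: Dict[str, str], seed: int = 0) -> Iterator[Tuple[str, str]]:
    """Hypothetical stand-in for the evaluation API.

    Yields (problem_id, latex_text) pairs one at a time in random order.
    """
    ids = list(problems)
    random.Random(seed).shuffle(ids)
    for problem_id in ids:
        yield problem_id, problems[problem_id]


def run_submission(
    problems: Dict[str, str], solve: Callable[[str], int]
) -> List[Tuple[str, int]]:
    """Answer each served problem before moving on to the next one."""
    rows = []
    for problem_id, latex_text in serve_one_by_one(problems):
        answer = solve(latex_text)
        # Assumed fallback: replace anything outside 0..999 with 0.
        if not (isinstance(answer, int) and 0 <= answer <= 999):
            answer = 0
        rows.append((problem_id, answer))
    return rows


# Hypothetical usage with a placeholder solver that always answers 4.
toy_problems = {"p1": "What is $2 + 2$?", "p2": "What is $1 + 3$?"}
toy_rows = run_submission(toy_problems, lambda latex_text: 4)
print(sorted(toy_rows))
```

The point of the pattern is that a solver sees a single problem at a time and cannot look ahead at the rest of the test set.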

Timeline

  • April 1, 2024 – Start Date.
  • June 20, 2024 – Entry Deadline. You must accept the competition rules before this date in order to compete.
  • June 20, 2024 – Team Merger Deadline. This is the last day participants may join or merge teams.
  • June 27, 2024 – Final Submission Deadline.

All deadlines are at 11:59 PM UTC on the corresponding day unless otherwise noted. The competition organizers reserve the right to update the contest timeline if they deem it necessary.

Prizes

TOTAL FUND FOR PROGRESS PRIZE 1: $1,048,576

Prizes for Top-Ranking Teams in this Competition:
1st Place: $131,072
2nd Place: $65,536
3rd Place: $32,768
4th Place: $16,384
5th Place: $8,192

Threshold: In the event that any of the five top-ranking teams does not exceed the Gemma 7B benchmark of 3/50 on both the public and private test sets, that team’s prize amount shall be reduced by a factor of four. In that case, the reduced prize amounts would be:

1st Place: $32,768
2nd Place: $16,384
3rd Place: $8,192
4th Place: $4,096
5th Place: $2,048

Overall Progress Prize Winner: The Overall Progress Prize Winner shall be the highest ranking team that achieves a score of at least 47/50 on both public and private test sets. After any prizes for the five top-ranking teams have been awarded, the remainder of the total fund shall be awarded to the Overall Progress Prize Winner.

If a team is named the Overall Progress Prize Winner in this competition, the prize will be at least $794,624. If no team is named the Overall Progress Prize Winner in this competition, the remainder of the total fund shall roll over to the next competition, where the same prize allocation will apply.
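
The figures above can be checked with a few lines of arithmetic. This is only a sketch using the amounts listed in this section; the variable names are illustrative, and the Early Sharing Prize is left out because it is described separately as an additional prize.

```python
TOTAL_FUND = 1_048_576  # total fund for Progress Prize 1 (equal to 2**20)
TOP_PRIZES = [131_072, 65_536, 32_768, 16_384, 8_192]  # 1st to 5th place

# Remainder if all five top-ranking teams receive their full prizes.
# This is the smallest possible award for the Overall Progress Prize Winner.
minimum_overall_prize = TOTAL_FUND - sum(TOP_PRIZES)

# Prize for each place if that team does not exceed the Gemma 7B benchmark.
reduced_prizes = [prize // 4 for prize in TOP_PRIZES]

print(minimum_overall_prize)  # 794624
print(reduced_prizes)  # [32768, 16384, 8192, 4096, 2048]
```

Any reduction applied to a top-ranking team leaves more in the remainder, which is why the Overall Progress Prize is stated as "at least" $794,624.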

Early Sharing Prize: $10,000. An additional $10,000 cash prize will be awarded for sharing high-scoring public notebooks early in the competition to encourage participants to share information earlier and help the community make more progress over the course of the competition.

To be eligible for the Early Sharing Prize, you will need to:

  • Be the first to publish a public notebook scoring at least 20/50 on the leaderboard before April 22, 2024, 11:59 PM UTC.
  • Keep the notebook and any datasets it uses publicly available until the prize is awarded at the end of the competition.

Code Requirements

Submissions to this competition must be made through Notebooks. In order for the “Submit” button to be active after a commit, the following conditions must be met:

  • CPU Notebook <= 9 hours run-time
  • GPU Notebook <= 9 hours run-time
  • Internet access disabled
  • Freely & publicly available external data is allowed, including pre-trained models
  • Submission file must be named submission.csv and be generated by the API.

Submission runtimes have been obfuscated. If you repeat the exact same submission you will see up to 30 minutes of variance in the time before you receive your score.

Please see the Code Competition FAQ for more information on how to submit, and review the code debugging doc if you encounter submission errors.

About the Hosts

XTX Markets is a leading algorithmic trading company and has over 200 employees based in London, Paris, New York, Mumbai, Yerevan and Singapore. XTX provides liquidity in the Equity, FX, Fixed Income and Commodity markets and trades over $250bn a day across markets.

XTX Markets’ expansive research cluster contains 100,000 cores and 20,000 A/V100 GPUs and is growing. It also has 390 petabytes of usable storage and 7.5 petabytes of RAM. With rich datasets and advanced technological infrastructure, XTX Markets is at the forefront of the crossover of finance and technology.

XTX Markets’ philanthropy focuses on maths and science education and research, alongside other areas such as academic sanctuaries, carbon removal and an employee matching programme. Since 2017, XTX Markets has donated over £100mn to charities and good causes, establishing it as a major donor in the UK and globally.

Citation

XTX Investments. (2024). AI Mathematical Olympiad – Progress Prize 1. Kaggle. https://kaggle.com/competitions/ai-mathematical-olympiad-prize
