Academic researchers find a way to train an AI reasoning model for less than $50

Figure: Sequential and parallel test-time scaling. (a) Budget forcing shows clear scaling trends and extrapolates to some extent; for the three rightmost dots, the model is prevented from stopping its thinking 2/4/6 times, each time appending "Wait" to its current reasoning trace. (b) For Qwen2.5-32B-Instruct, each sample is evaluated 64 times at temperature 1, and performance is shown for majority voting across 2, 4, 8, 16, 32, and 64 of these. Credit: arXiv (2025). DOI: 10.48550/arxiv.2501.19393
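For readers curious about the parallel-scaling baseline in panel (b), a minimal sketch of majority voting follows; `generate_answer` is a hypothetical stand-in for a sampling call to the model, not part of the paper's code.

```python
from collections import Counter

def majority_vote(question, generate_answer, k=64):
    # Parallel test-time scaling: draw k samples at temperature 1
    # and return the most common final answer (majority voting).
    answers = [generate_answer(question, temperature=1.0) for _ in range(k)]
    return Counter(answers).most_common(1)[0][0]
```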

A small team of AI researchers from Stanford University and the University of Washington has found a way to train an AI reasoning model for a fraction of the price paid by big corporations that produce widely known products such as ChatGPT. The group has posted a paper on the arXiv preprint server describing their efforts to inexpensively train chatbots and other AI reasoning models.

Corporations such as Google and Microsoft have made clear their intentions to be leaders in the development of chatbots with ever-improving skills. These efforts are notoriously expensive and tend to involve the use of energy-intensive server farms.

More recently, a Chinese company called DeepSeek released an LLM comparable in capability to those produced by Western companies, but developed at far lower cost. The announcement sent the stock prices of many tech companies into a nosedive.

In this new study, the researchers claim that it is possible to train an LLM with capabilities similar to the models made by OpenAI or DeepSeek for less than $50. The catch is that the team used a distillation process to extract those capabilities from an existing AI model.

To train an AI so inexpensively, the research team began with an off-the-shelf model that the Chinese company Alibaba has made freely available (Qwen2.5-32B-Instruct, the base model shown in the figure above). The researchers fine-tuned this model and called the result s1.

Preliminary training involved 1,000 question-and-answer pairs the team had carefully curated to give their model a leg up on learning. Each answer was paired with the "thinking process" behind it, taken from Gemini 2.0, a freely available experimental Google model. They then trained s1 in just 26 minutes using 16 Nvidia H100 GPUs.
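As a rough illustration of that fine-tuning step, the training amounts to ordinary supervised fine-tuning on sequences that concatenate each question, the teacher's reasoning trace, and the answer. This is a sketch under assumptions, not the authors' exact recipe: the `load_1000_examples` loader, the `<think>` delimiters, and the hyperparameters below are all illustrative.

```python
# Sketch of the distillation-style fine-tuning step, assuming ~1,000
# (question, reasoning trace, answer) triples. Loader, delimiters, and
# hyperparameters are illustrative assumptions.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

base = "Qwen/Qwen2.5-32B-Instruct"  # base model shown in the figure
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

examples = load_1000_examples()  # hypothetical loader for the curated pairs

def to_text(ex):
    # One training sequence: question, teacher "thinking" trace, answer.
    return {"text": f"{ex['question']}\n<think>\n{ex['trace']}\n</think>\n{ex['answer']}"}

ds = (Dataset.from_list(examples)
      .map(to_text)
      .map(lambda ex: tok(ex["text"], truncation=True, max_length=8192),
           remove_columns=["question", "trace", "answer", "text"]))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="s1-sft", num_train_epochs=5,
                           per_device_train_batch_size=1, bf16=True),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()  # the article reports ~26 minutes on 16 H100s at this scale
```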

The team also tacked on what they call a little trick, referred to in the paper as "budget forcing": a "thinking" step that runs before the model provides an answer. Whenever the model tries to stop reasoning prematurely, "Wait" is appended to its reasoning trace, giving it time to double-check its work. The result, the researchers claim, is an AI model on par with much better-known products, built at a fraction of the cost.
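Based on the figure caption's description of budget forcing, the decoding loop might look roughly like the sketch below; `model_generate` and the `</think>` marker are illustrative assumptions, not the released implementation.

```python
# Minimal sketch of budget forcing as the caption describes it: whenever
# the model tries to end its reasoning, suppress the stop and append
# "Wait" so it keeps thinking. model_generate is a hypothetical helper
# that continues `text` and halts at the given stop marker.
def answer_with_budget_forcing(question, model_generate, num_waits=2):
    text = question + "\n<think>\n"
    for _ in range(num_waits):
        text = model_generate(text, stop="</think>")
        text += "Wait"  # force the model to extend and double-check its trace
    text = model_generate(text, stop="</think>")
    return model_generate(text + "</think>\n")  # finally emit the answer
```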

More information:
Niklas Muennighoff et al, s1: Simple test-time scaling, arXiv (2025). DOI: 10.48550/arxiv.2501.19393

Model: github.com/simplescaling/s1

Journal information:
arXiv


© 2025 Science X Network

Citation:
Academic researchers find a way to train an AI reasoning model for less than $50 (2025, February 6), retrieved 6 February 2025.

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.
