
A group of computer scientists at Microsoft Research, working with a colleague from the University of Chinese Academy of Sciences, has introduced a new AI model that runs on a regular CPU instead of a GPU. The researchers have posted a paper on the arXiv preprint server outlining how the model was built, its characteristics, and how it has performed in testing so far.
Over the past several years, LLMs have become all the rage. Models such as the one behind ChatGPT have been made available to users around the globe, popularizing the idea of intelligent chatbots. One thing most of them have in common is that they are trained and run on GPUs, because of the enormous computing power required to train them on massive amounts of data.
In more recent times, concerns have been raised about the huge amounts of energy being used by data centers to support all the chatbots being used for various purposes. In this new effort, the team has found what it describes as a smarter way to process this data, and they have built a model to prove it.
One of the most energy-intensive parts of running AI models is the way weights are stored and used, typically as 8- or 16-bit floating-point numbers. Such an approach demands a lot of memory and processing, which in turn requires a lot of energy. In their new approach, the researchers do away with floating-point numbers altogether and instead propose what they describe as a 1-bit architecture.
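A back-of-envelope comparison illustrates the memory savings. This is an illustrative sketch, not the authors' exact encoding: it assumes a roughly 2-billion-parameter model (as the name BitNet b1.58 2B4T suggests) and one simple packing scheme in which five ternary values fit in one byte, since 3^5 = 243 fits in 256. Each ternary weight carries log2(3) ≈ 1.58 bits of information, which is where the "1.58" in the model name comes from.

```python
import math

params = 2_000_000_000        # assumed: roughly 2B parameters

# 16-bit floating-point weights: 2 bytes per weight.
fp16_bytes = params * 2

# Ternary weights (-1, 0, 1): pack 5 values per byte, since
# 3**5 = 243 <= 256. One possible encoding, for illustration only.
ternary_bytes = math.ceil(params / 5)

print(f"fp16 weights:    {fp16_bytes / 1e9:.1f} GB")
print(f"ternary weights: {ternary_bytes / 1e9:.1f} GB")
```

Under these assumptions, the weight storage shrinks from about 4 GB to about 0.4 GB, a roughly tenfold reduction before any further optimizations.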
In their design, weights are stored and processed using only three values: -1, 0 and 1. This reduces processing to simple addition and subtraction, operations that an ordinary CPU handles efficiently.
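The idea above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: with weights restricted to -1, 0 and 1, a dot product needs no multiplication at all, only adding or subtracting inputs (and skipping the zeros).

```python
def ternary_matvec(weights, x):
    """Multiply a ternary weight matrix by a vector using only
    addition and subtraction. Each weight must be -1, 0 or 1."""
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi      # +1 weight: add the input
            elif w == -1:
                acc -= xi      # -1 weight: subtract the input
            # 0 weight: skip entirely, giving sparsity for free
        out.append(acc)
    return out

W = [[1, 0, -1],
     [-1, 1, 0]]
x = [2.0, 3.0, 4.0]
print(ternary_matvec(W, x))  # [-2.0, 1.0]
```

A real implementation would operate on packed bit representations rather than Python lists, but the arithmetic reduction is the same: no floating-point multiplies in the weight layers.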
Testing showed the new model holds its own against GPU-based models of a similar size, and even outperforms some of them, all while using far less memory and, ultimately, much less energy.
To run the model, the team created a runtime environment called bitnet.cpp, designed to make the best use of the 1-bit architecture.
If the claims made by the team hold up, the development of BitNet b1.58 2B4T could be a game-changer. Instead of relying on massive data centers, users could soon run a chatbot on their computer, or perhaps their phone. In addition to reducing energy demands, localizing LLM processing would greatly improve privacy and allow for working without an Internet connection at all.
More information:
Shuming Ma et al., BitNet b1.58 2B4T Technical Report, arXiv (2025). DOI: 10.48550/arxiv.2504.12285
© 2025 Science X Network
Citation: Microsoft introduces an AI model that runs on regular CPUs (2025, April 22), retrieved 22 April 2025 from
This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.