The enormous potential of binarized and 1.58-bit neural networks

Quantization is a widely used method to reduce the memory and compute requirements of our machine learning models. During quantization, we represent the weights in less precise data formats, such as 8-bit integers (int8). This not only lowers the compute and memory requirements but also decreases energy consumption.
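
As a rough illustration, here is a minimal NumPy sketch of symmetric, per-tensor int8 weight quantization (the random weights and the scaling scheme are just one simple example, not the recipe of any particular framework):

```python
import numpy as np

# Minimal sketch of symmetric post-training int8 quantization of one weight tensor.
weights = np.random.randn(4, 4).astype(np.float32)

# One scale per tensor: map the largest absolute weight to 127.
scale = np.abs(weights).max() / 127.0
w_int8 = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

# At inference time the int8 values are used directly or dequantized on the fly.
w_dequant = w_int8.astype(np.float32) * scale
print("max quantization error:", np.abs(weights - w_dequant).max())
```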

The most extreme form of quantization is binarization, meaning the use of 1-bit weights. One might expect neural networks to become non-functional at such low precision, but in many cases this drastic simplification does not significantly degrade the quality of the network.
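
As a sketch, one common binarization recipe (similar in spirit to XNOR-Net) keeps only the sign of each weight together with a single per-tensor scaling factor:

```python
import numpy as np

# Minimal sketch of weight binarization: 1-bit weights in {-1, +1} plus one scale.
weights = np.random.randn(4, 4).astype(np.float32)

alpha = np.abs(weights).mean()              # per-tensor scaling factor
w_bin = np.where(weights >= 0, 1.0, -1.0)   # 1-bit weights

w_approx = alpha * w_bin                    # approximation used at inference
print("mean absolute error:", np.abs(weights - w_approx).mean())
```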

The following video demonstrates the possibilities of such 1-bit (binarized) neural networks:

A paper was recently published under the title “The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits”. The difference from binarization is that it uses three weight values, {-1, 0, 1}; since log2(3) ≈ 1.58, the authors call this architecture a 1.58-bit network. In the paper, the authors show that networks using these 1.58-bit weights perform nearly as well as their full-precision counterparts.
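
The paper describes an absmean-style quantization for these ternary weights; the sketch below is my rough reading of that idea (scale by the mean absolute weight, then round and clip to {-1, 0, 1}), not the authors' reference implementation:

```python
import numpy as np

# Sketch of ternary ("1.58-bit", since log2(3) ~ 1.58) weight quantization:
# scale each tensor by its mean absolute value, then round and clip to {-1, 0, 1}.
def ternarize(weights: np.ndarray, eps: float = 1e-8):
    gamma = np.abs(weights).mean()                        # per-tensor scale
    w_ternary = np.clip(np.round(weights / (gamma + eps)), -1, 1)
    return w_ternary.astype(np.int8), gamma

weights = np.random.randn(4, 4).astype(np.float32)
w_ternary, gamma = ternarize(weights)
print(np.unique(w_ternary))   # only -1, 0 and 1 remain
```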

Why is this such a big deal?

If these three values are sufficient to represent the weights, then multiplication, currently the most frequently used operation in neural networks, is no longer necessary: multiplying by -1, 0, or 1 amounts to a sign flip, skipping the input, or adding it as-is. GPU clusters are used for neural networks precisely because GPUs can perform multiplications very efficiently. Without the need for multiplications, there’s no need for GPUs: the models can run efficiently even on CPUs, or specialized hardware (an ASIC) can be built that runs these 1.58-bit networks, possibly even in an analog way.
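
To make this concrete, here is a small NumPy sketch showing that a matrix-vector product with ternary weights needs only additions and subtractions (the explicit loop is for clarity, not speed):

```python
import numpy as np

# Matrix-vector product with ternary weights, without multiplications:
# inputs with weight +1 are added, inputs with weight -1 are subtracted,
# and inputs with weight 0 are simply skipped.
def ternary_matvec(w_ternary: np.ndarray, x: np.ndarray) -> np.ndarray:
    out = np.zeros(w_ternary.shape[0], dtype=x.dtype)
    for i, row in enumerate(w_ternary):
        out[i] = x[row == 1].sum() - x[row == -1].sum()
    return out

w = np.array([[1, 0, -1], [0, 1, 1]], dtype=np.int8)
x = np.array([0.5, -2.0, 3.0], dtype=np.float32)
print(ternary_matvec(w, x))       # identical to the ordinary product below
print(w.astype(np.float32) @ x)
```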

One of the most hyped tech news stories recently was about the company Groq developing an ASIC (Application-Specific Integrated Circuit) that can run LLMs much more efficiently and with less energy than traditional GPUs. In the future, new types of circuits could be developed that, instead of the usual 2 voltage levels, work with 3, thus being able to natively handle the 3 values.

Quantization is typically a post-training process: the network is trained with high-precision numbers, and the quantized weights are computed in a separate phase afterwards. As a result, running the network becomes much more efficient, making it possible to deploy the model even on IoT devices with limited computational capacity.

Since gradient-based training does not work directly with 1-bit or binarized weights (discrete values have no useful gradient), gradient-free methods become relevant, such as genetic algorithms and other derivative-free optimization techniques. There are open-source libraries, like PyGAD for genetic algorithms or nevergrad for other gradient-free methods, that work well with the most popular machine learning frameworks.
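
As a toy illustration of the idea, here is a deliberately tiny, library-free (1+1) evolutionary loop that trains a small ternary network on XOR; a real experiment would use PyGAD, nevergrad, or similar, and the task, architecture, and hyperparameters here are made up for the example:

```python
import numpy as np

# Gradient-free training of a tiny two-layer network whose weights stay in {-1, 0, 1}.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
y = np.array([0, 1, 1, 0], dtype=np.float32)          # XOR targets

def forward(w1, w2, x):
    h = np.maximum(x @ w1, 0.0)                        # ReLU hidden layer
    return (h @ w2).ravel()

def loss(w1, w2):
    return np.mean((forward(w1, w2, X) - y) ** 2)

def mutate(w, rate=0.2):
    mask = rng.random(w.shape) < rate                  # resample ~20% of the weights
    return np.where(mask, rng.choice([-1, 0, 1], size=w.shape), w)

# (1+1) evolution: mutate, keep the offspring if it is at least as good.
w1 = rng.choice([-1, 0, 1], size=(2, 8)).astype(np.float32)
w2 = rng.choice([-1, 0, 1], size=(8, 1)).astype(np.float32)
best = loss(w1, w2)
for _ in range(4000):
    c1, c2 = mutate(w1), mutate(w2)
    if loss(c1, c2) <= best:
        w1, w2, best = c1, c2, loss(c1, c2)

print("final loss:", best)
print("predictions:", forward(w1, w2, X))
```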

A few years ago, I also wrote an article about how genetic algorithms can be effective in reinforcement learning:

Although in most cases backpropagation is much more efficient than gradient-free solutions, 1-bit networks can be run much more efficiently on specialized hardware than their floating-point counterparts. So, it might be that with backpropagation, we find the optimal network 10 times faster using floating-point numbers than with, say, genetic algorithms. But if the 1-bit network runs 20 times faster (on specialized hardware), then training will still be twice as fast using genetic algorithms. Investigating how effectively 1-bit networks can be trained with gradient-free methods could be a very interesting research topic.
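
The back-of-the-envelope arithmetic behind this, with the purely illustrative numbers from the paragraph above:

```python
# Illustrative numbers only: 10x more search work, 20x cheaper evaluations.
search_overhead = 10       # genetic algorithm needs ~10x more network evaluations
hardware_speedup = 20      # assumed speedup of 1-bit inference on specialized hardware
print(hardware_speedup / search_overhead)   # 2.0 -> overall training is still ~2x faster
```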

Another reason why this topic is so fascinating is that these networks more closely resemble the neural networks found in the natural brain (biologically plausible). Therefore, I believe that by choosing a good gradient-free training algorithm and applying these 1-bit networks, we can build systems that are much more similar to the human brain. Moreover, this opens up the possibility for technological solutions beyond ASICs that were previously not feasible, such as analog, light-based, or even biologically based processors.

This direction might turn out to be a dead end in the long run, but for now, its enormous potential is apparent, making it a very promising research avenue for anyone involved in the field of artificial intelligence.