Published: 18 May 2022. Updated: 25 Dec 2024

Transforming The Transformers: The GPT Family And Other Trends In AI and Natural Language Processing

At least four open-source natural language processing (NLP) projects built on enormous neural networks are currently challenging the only big commercial NLP project: OpenAI's GPT-3.

The open-source initiatives aim to democratise AI and boost its evolution. All of those projects are based on transformers: a special type of neural network that has proven to be the most efficient for working with human language structures.

What are transformers and why are the recent developments in their landscape so important?

What Are Transformers In Natural Language Processing?

On their long way to success, researchers have tried different neural networks for natural language processing. Eventually, they settled on two kinds of model, based on either convolutional neural networks (CNN) or recurrent neural networks (RNN) with attention.

A translation task illustrates the difference between the two. Both types take into account what they learned about a given sentence from translating its previous words, but they approach the next word differently.

A CNN processes every word in the sentence in parallel, whereas an RNN with attention weighs every previous word in the sentence according to its influence on the meaning of the next word, and therefore handles words one after another. A CNN does not always find the correct meaning for each word but works faster; an RNN produces more accurate results but works slowly.

In a nutshell, a transformer combines the strengths of both. It uses the attention technique to evaluate how the individual words in a sentence influence one another. At the same time, it works faster thanks to multiple “threads”: pairs of so-called encoders and decoders that help to learn, apply what was learned, and pass the obtained knowledge on to the next iteration.
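The attention idea can be sketched in a few lines of Python. This is a simplified illustration, not the full transformer: it assumes that each word is already represented by a small vector (the numbers below are made up) and that queries, keys and values are the word vectors themselves, whereas a real transformer derives them through learned projections. The function names are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Simplified self-attention: every word vector attends to all the others.

    Assumption: queries, keys and values are the word vectors themselves;
    a real transformer computes them with learned projection matrices.
    """
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Similarity of this word to every word in the sentence (scaled dot product).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        weights = softmax(scores)
        # New representation: weighted average of all word vectors.
        mixed = [sum(w * vec[i] for w, vec in zip(weights, vectors))
                 for i in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Toy 2-dimensional "embeddings" for a three-word sentence (made-up numbers).
sentence = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = self_attention(sentence)
print(weights[0])  # how strongly the first word attends to each word
```

Because the loop body for one word does not depend on the result for any other word, all words can be processed in parallel, which is where the speed advantage over an RNN comes from.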

What Can Transformers Do?

Apart from translation, transformers can predict which word to use next in a sentence, thus generating whole sentences of human-looking speech.
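The principle of generating text one predicted word at a time can be shown with a deliberately tiny stand-in. The sketch below is not a transformer: it assumes a simple word-pair (bigram) counter as the "model" and always picks the most frequent next word. The corpus and function names are invented for illustration; only the generation loop resembles what large language models do.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for every word, which words follow it in the training text."""
    followers = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers

def generate(followers, start, max_words=5):
    """Greedy generation: repeatedly append the most frequent next word."""
    sentence = [start]
    for _ in range(max_words - 1):
        candidates = followers.get(sentence[-1])
        if not candidates:
            break  # no known continuation for the last word
        sentence.append(candidates.most_common(1)[0][0])
    return " ".join(sentence)

corpus = "the cat sat on the mat and the cat sat on the sofa"
model = train_bigrams(corpus)
print(generate(model, "the", max_words=4))  # prints: the cat sat on
```

A transformer replaces the counting table with a neural network that looks at the whole preceding text rather than only the last word, but the loop of predicting, appending and predicting again stays the same.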

That allows us to use them for various purposes.

Transformers’ content-creation abilities can be used for designing better chatbots, writing web content, and freeing the hands of technical support staff. The last use case is coupled with the transformers’ skill for information search, which promises a wide range of real-life applications.

Apart from purely human languages, some transformers are able to handle programming languages and even create scripts for other deep learning models. Coding skills and the ability to understand human speech allow transformers to act as frontend developers: they can be briefed in the same manner as a human developer and will come up with a web layout.

As a part of another experiment, transformers have been integrated into Excel and managed to fill in empty cells in a spreadsheet predicting values based on the existing data in the same spreadsheet. That would allow us to replace bulky Excel functions with just one transformer formula that mimics the behaviour of a whole algorithm.

In the future, transformers may replace human DevOps engineers, as they should be able to configure systems and provision infrastructure on their own.

Sounds impressive! In fact, 2022 brought a few inspiring updates in the field.

Transformers’ Performance and Required Resources

Imitating the human art of language processing has become a very competitive field.

Measuring success is not straightforward. Of course, the winner is the fastest and most accurate model. But high speed and accuracy come from a combination of two main factors:

Your neural network architecture, although the transformer architecture currently dominates;
The number of parameters in your neural network.

By the latter, we mean the number of connections between the nodes in a network. This number does not necessarily have a linear relationship to the number of nodes, which would be the size of the network.
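The gap between node count and parameter count is easy to see with a little arithmetic. The sketch below assumes a plain fully connected network, in which every node of one layer is connected to every node of the next, and it also counts one bias value per node, a detail the text above does not mention. The layer sizes are arbitrary examples.

```python
def count_parameters(layer_sizes):
    """Parameters of a fully connected network with the given layer sizes.

    Assumption: every node connects to every node in the next layer
    (one weight per connection) and every non-input node has one bias.
    """
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    biases = sum(layer_sizes[1:])
    return weights + biases

small = [100, 100, 100]
large = [200, 200, 200]  # twice as many nodes in every layer

print(sum(small), count_parameters(small))  # 300 nodes, 20200 parameters
print(sum(large), count_parameters(large))  # 600 nodes, 80400 parameters
```

Doubling the number of nodes roughly quadruples the number of parameters in this setup, which is why parameter counts grow so quickly as networks get wider.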

More importantly, for companies, research groups, and individuals, the main factors that determine a model’s success are, apparently, the size of the investment at their disposal, the size of the training data, and access to the human talent needed to develop the model.

Most Powerful AI Projects In the World

Considering the factors mentioned above, let’s look at who leads the AI competition.

GPT-3

OpenAI’s GPT-3 (Generative Pre-Trained Transformer) used to be the leader in the race. It contains 175 billion parameters and can learn new language-related tasks on its own. It can do more than just translation: its important applications include answering questions and classifying information.

It was trained on 570 GB of clean data filtered from 45 TB of raw text, which is a lot. Its main drawback is that OpenAI gives free access neither to the model itself nor to its code. It only offers a commercial API for obtaining the model’s output. Consequently, only OpenAI’s researchers can contribute to it.

Like many others, GPT-3 only “speaks” English.

Wu Dao 2.0

In a quantitative sense, Wu Dao 2.0 beats GPT-3: it has been trained on 1.2 TB of Chinese text data, 2.5 TB of Chinese graphic data, and 1.2 TB of English text data. It also has 1.75 trillion parameters, ten times as many as GPT-3.

Wu Dao 2.0 can work in various media modes and even draft 3D structures. It was announced as open source but, for some reason, has still not arrived on GitHub.

Metaseq/OPT-175B

Meta, previously known as Facebook, has often been accused of withholding important research results that humanity could have benefited from. Its recent attempt to make transformer models more available may help to repair its damaged reputation.

As the name OPT-175B suggests, the transformer has 175 billion parameters. It was created as a replica of GPT-3, to match its performance and abilities.

Another advantage of Metaseq is that its GitHub repository also hosts models with fewer parameters, allowing scientists to fine-tune them for specific tasks and avoid the high maintenance and training costs associated with bigger transformer models.

However, it is not entirely open source: access to the full model is limited to research groups, who must request it and be approved by Meta on a case-by-case basis.

Open GPT-X

It is always a pity when a scientific project emerges out of the fear of missing out rather than out of genuine inspiration. That is the case with the GPT-X project: it is being developed in Europe and branded as a response to GPT-3 and a tool for establishing Europe’s “digital sovereignty”. The German Fraunhofer Institute is the main driving force behind its development, supported by its long-term cooperation partners from German and European industry and academia.

GPT-X started only recently, and there is not much information about its progress yet.

GPT-J and GPT-NEO

EleutherAI is an independent research group that pursues the goal of AI democratisation. They offer two smaller models: GPT-J with 6 billion parameters and GPT-Neo with up to 2.7 billion. Oddly enough, GPT-J reportedly outperforms GPT-3 in coding tasks and is about as good in storytelling, information retrieval, and translation, making it a strong candidate for chatbots.

Google Switch Transformer

It was difficult to decide which names should make this list and which should not, but Google certainly deserves a mention, for at least two reasons.

The first one is that the Internet giant made its transformer open-source.

The second is that the Switch Transformer was given a novel architecture. It has neural networks nested in the layers of its main neural network. That allows it to boost its performance without increasing the amount of computational power required.
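The nested networks are usually described as “experts”, with a small router deciding which single expert handles each input. The following is a minimal sketch under that assumption: the experts here are trivial stand-in functions, the router weights are made-up numbers, and all names are hypothetical rather than taken from Google’s code.

```python
import math

def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def make_expert(scale):
    """Hypothetical stand-in for an expert sub-network: it only scales its input."""
    return lambda vector: [scale * x for x in vector]

def switch_layer(token, router_weights, experts):
    """Route one token vector to a single expert (top-1 routing)."""
    # Router: one score per expert, the dot product of the token with that expert's row.
    scores = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    probs = softmax(scores)
    chosen = max(range(len(experts)), key=lambda i: probs[i])
    # Only the chosen expert runs; its output is scaled by the router probability.
    output = [probs[chosen] * y for y in experts[chosen](token)]
    return chosen, output

experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
router_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]  # made-up values
chosen, output = switch_layer([0.2, 0.9], router_weights, experts)
print(chosen)  # index of the one expert that processed this token
```

Adding more experts increases the total number of parameters, but each token still passes through only one of them, so the computation per token stays roughly constant.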

The Switch Transformer contains 1.6 trillion (1,600 billion) parameters. Nonetheless, this has not yet let it overtake GPT-3 in accuracy and flexibility, most probably because the Switch Transformer has been trained less extensively.

Conclusion

Training data is a pressing issue in the field: researchers have reportedly already used nearly all the English text available. They will probably need to follow Wu Dao’s example and switch to other languages soon.

Another problem is the one the Switch Transformer has already addressed: more network parameters with fewer computations. Running neural networks causes large emissions of carbon dioxide. Therefore, better performance must remain the main goal not only for commercial but also for environmental reasons.

And this becomes possible thanks to the open-source projects: they supply the research field with new (human) brains, new knowledge, and new ideas.

AI and natural language processing need inspiration from practice. At AI Superior, we are following the updates and looking forward to applying the findings of open-source projects to our industry customers’ needs. We invite you to tap into our expertise in AI and natural language processing for any use case, from online shops and marketing research to supporting engineering industries.
