GPT-J: GPT-3 Democratized

by | 04.07.2021 | Engineering, Investigation

Cloud native and Artificial Intelligence have a lot in common. Things move fast in both, and if you are the curious kind, you are always excited to explore new tech. The downside is that sometimes a new technology distracts you so much that you skip your work.

I am guilty of that 🙂

Anyway, let’s talk about one such exciting development in our cousin field. It’s called GPT-J, “the open source cousin of GPT-3 everyone can use”.

The Leap: GPT-3

What’s GPT-3?

GPT-3 is the 3rd generation of the GPT (Generative Pre-trained Transformer) language models, with 175 billion parameters. It’s far more powerful than GPT-2, which has only 1.5 billion parameters. Among all the other models, the one that comes closest to GPT-3 is Microsoft’s Turing NLG, with just 17 billion parameters.

Now you have some idea what the hype of the last few years is all about: a deep learning model trained on massive text datasets with hundreds of billions of words, capable of producing human-like text, with 10x the parameters of its nearest competitor.
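To sanity-check the "10x" figure, here is a small sketch that divides the parameter counts quoted above. The dictionary and the helper name `ratio_to` are just illustrative choices, not part of any library.

```python
# Parameter counts in billions, as quoted in this post.
PARAMS_BILLIONS = {"GPT-2": 1.5, "Turing NLG": 17, "GPT-3": 175}


def ratio_to(model: str, baseline: str) -> float:
    """Return how many times larger `model` is than `baseline`."""
    return PARAMS_BILLIONS[model] / PARAMS_BILLIONS[baseline]


for name in ("Turing NLG", "GPT-2"):
    print(f"GPT-3 is about {ratio_to('GPT-3', name):.1f}x the size of {name}")
# prints:
# GPT-3 is about 10.3x the size of Turing NLG
# GPT-3 is about 116.7x the size of GPT-2
```

So "10x" is a fair rounding for the nearest competitor, and the jump from GPT-2 is more than a hundredfold.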

Uses of GPT-3

Because it’s a language model, the applications are vast. We, too, got access and use GPT-3 in some of our blog posts; it’s pretty good, apart from a slightly bland tone.

Read this post by GPT-3:

The list goes on, and GPT-3 is entering new domains such as code completion/generation. GitHub launched its Copilot, which we describe as:

GitHub Copilot is an AI-assisted pair programmer that helps you write code more quickly and efficiently. GitHub Copilot extracts context from comments and code and provides quick suggestions for individual lines and entire functions. OpenAI Codex, a new AI system developed by OpenAI, powers GitHub Copilot.

The language model also powers applications such as chat, tweet classification, outline creation, keyword extraction, HTML element generation, and many more. Learn more here.

Commercial Uses of GPT-3

Copilot aside, Microsoft invested $1 billion in OpenAI and, a year later, obtained an exclusive licence to GPT-3. According to OpenAI, over 300 GPT-3 projects are in the works using a limited-access API. Among them are a tool for extracting insights from customer comments, a system that writes emails automatically from bullet points, and never-ending text-based adventure games.

GPT-3 describing itself. Cool isn’t it?

Now, the two big questions.

Is GPT-3 free?

NO, you need to pay, and it’s in beta preview. So, getting access is quite complicated and tiring. You need to wait months 🤷‍♂️

Our GPT-3 Davinci Engine Invoice! Learn about pricing here

Is GPT-3 open source?

NO, so all you can do is wait.

Open Source Triumphs: GPT-J

What’s GPT-J?

I have previously written a blog post on how CNCF keeps the innovation cycle rolling. It’s a big organisation with hundreds of sponsors, and they are democratising cutting-edge tech.

GPT-J, by contrast, is not backed by such an organisation. It doesn’t have that kind of financial backing, so it has 6 billion parameters, far fewer than GPT-3. But look at the other options available.

The publicly available GPT-2 has 1.5 billion parameters, significantly fewer than GPT-J’s 6 billion. Moreover, having that many parameters helps GPT-J perform on par with the 6.7-billion-parameter version of GPT-3. Together, these factors make it the most accurate publicly available model.

GPT models compared

Anyone can use it without having to wait for access. The quality of the content is quite good, and it is improving with every passing day.
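As a rough illustration of what "anyone can use it" could look like in practice, here is a minimal sketch. It assumes the Hugging Face `transformers` library (with PyTorch) and a checkpoint published under the name `EleutherAI/gpt-j-6B`; neither is mentioned in this post, and the helper name `generate` and its defaults are our own choices.

```python
# Assumption: checkpoint name on the Hugging Face Hub, not taken from this post.
MODEL_ID = "EleutherAI/gpt-j-6B"


def generate(prompt: str, max_new_tokens: int = 40, model_id: str = MODEL_ID) -> str:
    """Return `prompt` followed by a sampled continuation from the model."""
    # Imported lazily: transformers and torch are heavy, optional dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Downloads the weights on first use: about 24 GB at 32-bit precision
    # (6 billion parameters x 4 bytes each).
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,      # sample instead of always picking the top token
        temperature=0.8,     # illustrative value, tune to taste
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (commented out because it downloads the full model):
# print(generate("Kubernetes is"))
```

Loading the model inside the function keeps the sketch short; a real application would load it once and reuse it across calls.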

A few problems with GPT-3

GPT-3 doesn’t have long-term memory. Thus it doesn’t learn from long-term interactions as people do.

Lack of interpretability is another problem, one that affects large and complex models in general. GPT-3 is so large that its output is difficult to explain or analyse.

Lack of interpretability; more context needed

The limited input size is also a factor. Transformers have a maximum input size, so GPT-3 can only handle prompts up to a fixed length: its context window is 2,048 tokens, shared between the prompt and the generated text.
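To make the input-size limit concrete, here is a minimal sketch of how an application might trim a prompt to fit a fixed context window. It assumes whitespace-separated words as a stand-in for real model tokens (actual models use subword tokenizers, so counts will differ), and the function name `fit_to_context` and its parameters are hypothetical. The 2,048 default matches GPT-3's published context window.

```python
def fit_to_context(prompt: str, max_tokens: int = 2048, reserve_for_output: int = 256) -> str:
    """Trim `prompt` so that prompt plus completion fit in the context window."""
    if reserve_for_output >= max_tokens:
        raise ValueError("reserve_for_output must be smaller than max_tokens")

    # Tokens left for the prompt once room for the completion is set aside.
    budget = max_tokens - reserve_for_output

    # Assumption: whitespace-separated words approximate model tokens.
    tokens = prompt.split()
    if len(tokens) <= budget:
        return prompt

    # Keep the most recent tokens; the oldest context is dropped first.
    return " ".join(tokens[-budget:])


print(fit_to_context("a b c d e", max_tokens=5, reserve_for_output=2))
# prints: c d e
```

Dropping the oldest text first is only one possible policy; summarising earlier context is another, at the cost of an extra model call.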

The model takes longer to provide predictions since GPT-3 is so huge.

GPT-3 is no exception to the rule that all models are only as good as the data that was used to train them. This paper, for example, shows that anti-Muslim bias exists in GPT-3 and other big language models.

A problem GPT-J sets out to solve

As you might expect, GPT-J can’t solve all of the problems above; its creators at Eleuther lack the hardware resources. But one thing Eleuther has done well is to try to reduce, in GPT-J, the bias present in GPT-3.

Eleuther’s dataset is more diversified than GPT-3’s, and it avoids some sites like Reddit, which are more likely to include questionable content. Eleuther has “gone to tremendous pains over months to select this data set, making sure that it was both well filtered and diversified, and record its faults and biases,” according to Connor Leahy, an independent AI researcher and cofounder of Eleuther.

Eleuther’s work could also make it easy to develop tools similar to GitHub Copilot without access to the GPT-3 API. People can create without being limited by API access.

Final Thoughts

Talk about any language model, and it will have a considerable number of weaknesses, biases and silly mistakes, which is entirely normal. Work is in progress, and we are not even close to AIs taking over everything, as sceptics imagine. Sure, AI will change the world, but we are still pretty far from that.

GPT-J is a small step towards democratising transformers, and GPT-3 is a small step towards human-like language models. Try GPT-J out here.

Feeling exploratory? Want to learn about the technology that helped train GPT-3? It’s Kubernetes, a container orchestration technology, and you can read more about it here:

Subscribe to our newsletter for excellent posts on cloud native technology and exploratory posts like this delivered to you weekly.

Happy Learning!
