
GPT-J: GPT-3 Democratized

by | 04.07.2021 | Engineering, Investigation

There’s quite a similarity between cloud native and Artificial Intelligence: things move pretty fast, and if you are the curious kind, you are always excited to explore new tech. That can be a bad thing, as sometimes a new technology distracts you so much that you skip your actual work.

I am guilty of that 🙂

Anyway, let’s talk about one such exciting development in our cousin field. It’s called GPT-J, “the open-source cousin of GPT-3 everyone can use”.

The Leap: GPT-3

What’s GPT-3?

GPT-3 is the 3rd generation of GPT (Generative Pre-trained Transformer) language models, with 175 billion parameters. That’s a big jump from GPT-2, which used only 1.5 billion parameters. Compare GPT-3 to all the remaining models and the one coming closest is Microsoft’s Turing NLG, with just 17 billion parameters.

Now you get some idea of what the hype of the last few years is all about: a deep learning model trained on massive text datasets containing hundreds of billions of words, capable of producing human-like text, with 10x the parameters of its nearest competitor.

Uses of GPT-3

Now, as it’s a language model, the applications are vast. We, too, got access and are using GPT-3 in some of our blog posts; it’s pretty good, apart from a slightly pale tone.

Read this post by GPT-3:

The list goes on, and GPT-3 is entering new domains such as code completion and generation. GitHub launched its Copilot, which we describe as:

GitHub Copilot is an AI-assisted pair programmer that helps you write code more quickly and efficiently. GitHub Copilot extracts context from comments and code and provides quick suggestions for individual lines and entire functions. OpenAI Codex, a new AI system developed by OpenAI, powers GitHub Copilot.

The language model also powers applications such as chat, a tweet classifier, an outline creator, a keyword extractor, an HTML element generator, and many more. Learn more here.
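Most of these applications boil down to a prompt sent to the Completions API. Here is a minimal sketch of a keyword extractor, assuming you have paid API access and the `openai` Python package; the prompt and parameters are illustrative, not an official recipe.

```python
# A minimal sketch of a "keyword extractor" built on the GPT-3 Completions API.
# Assumes paid API access and the `openai` Python package; prompt and
# parameters below are illustrative, not an official recipe.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

prompt = (
    "Extract keywords from this text:\n\n"
    "GPT-J is an open-source, 6-billion-parameter language model "
    "released by EleutherAI as an alternative to GPT-3.\n\n"
    "Keywords:"
)

response = openai.Completion.create(
    engine="davinci",   # the Davinci engine
    prompt=prompt,
    max_tokens=40,      # keep the completion (and the bill) small
    temperature=0.3,    # low temperature for a factual, list-like answer
    stop=["\n\n"],
)

print(response.choices[0].text.strip())
```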

Commercial Uses of GPT-3

Apart from Copilot, Microsoft invested $1 billion in OpenAI and, a year later, received an exclusive licence to GPT-3. Over 300 GPT-3 projects are in the works, according to OpenAI, built on a limited-access API. Among them: a tool for extracting insights from customer comments, a system that writes emails automatically from bullet points, and never-ending text-based adventure games.

GPT-3 describing itself. Cool, isn’t it?

Now, the two big questions.

Is GPT-3 free?

NO, you need to pay, and it’s in beta preview, so getting access is quite complicated and tiring. You may need to wait months 🤷‍♂️

Our GPT-3 Davinci Engine Invoice! Learn about pricing here

Is GPT-3 open source?

NO, so all you can do is wait.

Open Source Triumphs: GPT-J

What’s GPT-J?

I have previously written a blog post on how the CNCF keeps the innovation cycle rolling. It’s a big organisation with hundreds of sponsors, and they are democratising cutting-edge tech.

GPT-J, on the other hand, is not backed by an organisation like that. Without that level of financial backing, it has 6 billion parameters, a lot less than GPT-3, but look at the other options available.

The publicly available GPT-2 has 1.5 billion parameters, significantly fewer than GPT-J’s 6 billion. Having that many parameters helps GPT-J perform on par with the 6.7B variant of GPT-3. These factors make it the most accurate publicly available model.

GPT models compared

Anyone can use it without waiting for access. The quality of the output is quite good, and it’s improving with every passing day.
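If you’d rather run it yourself than use the web demo, here’s a minimal sketch using Hugging Face transformers, assuming the EleutherAI/gpt-j-6B checkpoint and a transformers release with GPT-J support; note that the full-precision model needs roughly 24 GB of RAM.

```python
# A minimal sketch of running GPT-J locally with Hugging Face transformers.
# Assumes the EleutherAI/gpt-j-6B checkpoint and a transformers release with
# GPT-J support; the full fp32 model needs roughly 24 GB of RAM.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-j-6B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "The open-source cousin of GPT-3 is"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample a short continuation; the generation settings are illustrative.
output_ids = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.8,
    max_length=60,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```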

A few problems with GPT-3

GPT-3 doesn’t have long-term memory, so it doesn’t learn from ongoing interactions the way people do.

Lack of interpretability is another problem, one that affects large and complex models in general. GPT-3’s dataset is so extensive, and the model so large, that its output is difficult to understand or analyse.

Lack of interpretability; more context needed

The limited input size is also a factor: transformers have a maximum input size, so GPT-3 can only handle prompts that are a few sentences long (a sketch of what that fixed token window means in practice follows this list of problems).

Because GPT-3 is so huge, it also takes longer to produce predictions.

GPT-3 is no exception to the rule that models are only as good as the data they were trained on. This paper, for example, shows that anti-Muslim bias exists in GPT-3 and other big language models.
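To make the input-size and memory limitations concrete, here’s a rough sketch that counts tokens with GPT-2’s BPE tokenizer (GPT-3 uses a similar vocabulary) and trims a prompt so that prompt plus completion fit into an assumed 2048-token window; the window size and the trimming strategy are illustrative assumptions, not part of the API.

```python
# A rough illustration of the fixed context window: count tokens with GPT-2's
# BPE tokenizer (GPT-3 uses a similar vocabulary) and truncate the prompt so
# that prompt + completion fit in an assumed 2048-token window.
from transformers import GPT2TokenizerFast

CONTEXT_WINDOW = 2048   # assumed window size, for illustration only
MAX_COMPLETION = 256    # tokens we want to leave free for the answer

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

def fit_prompt(prompt: str) -> str:
    """Trim the prompt so prompt + completion stay inside the window."""
    ids = tokenizer.encode(prompt)
    budget = CONTEXT_WINDOW - MAX_COMPLETION
    if len(ids) <= budget:
        return prompt
    # Keep the most recent tokens; anything earlier is simply dropped,
    # which is the "no long-term memory" problem in practice.
    return tokenizer.decode(ids[-budget:])

long_prompt = "Some very long document... " * 500
print(len(tokenizer.encode(long_prompt)), "tokens before trimming")
print(len(tokenizer.encode(fit_prompt(long_prompt))), "tokens after trimming")
```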

A problem GPT-J aims to solve

GPT-J can’t be expected to solve all of the above limitations; Eleuther lacks the hardware resources. But one thing Eleuther has done well is to try to remove from GPT-J the bias present in GPT-3.

Eleuther’s dataset is more diversified than GPT-3’s, and it avoids sites like Reddit, which are more likely to include questionable content. Eleuther has “gone to tremendous pains over months to select this data set, making sure that it was both well filtered and diversified, and record its faults and biases,” according to Connor Leahy, an independent AI researcher and cofounder of Eleuther.

GPT-J could also make it easy to develop tools like GitHub Copilot without access to the GPT-3 API. People can create without being limited by API access.

Final Thoughts

Talk about any language model and it will have a considerable number of weaknesses, biases and silly mistakes, which is entirely normal. Work is in progress, and we are not even close to AIs taking over everything, as sceptics imagine. Sure, AI is about to change the world, but we are pretty far from that.

GPT-J is a small step towards democratising transformers, and GPT-3 is a small step towards human-like language models. Try GPT-J out here.

Feeling exploratory? Want to learn about the technology that helped train GPT-3? It’s Kubernetes, a container orchestration technology, and you can read more about it here:

Subscribe to our newsletter for excellent posts on cloud native technology and exploratory posts like this delivered to you weekly.

Happy Learning!
