EleutherAI’s GPT-J vs OpenAI’s GPT-3

Tharun P
7 min read · Jul 27, 2021

Like GPT-3, but anyone can use it.


When OpenAI released GPT-3's beta API, the AI world was thrilled. It gave developers the opportunity to try this amazing system and come up with interesting new use cases. However, OpenAI decided not to open the API to everyone, only to a select group of people through a waiting list. If they were really worried about misuse and harmful effects, they would have done the same thing as with GPT-2: not release it publicly at all. Personally, I dislike this; it should be open source and available to everyone.

“To ensure that artificial general intelligence benefits all of humanity” (OpenAI's stated mission)

This prevents people from fully exploring the system. So we should thank the EleutherAI team, “a collective of researchers working on open-source AI research,” for their work. Because GPT-3 is so popular, they are trying to replicate the model for everyone, with the ultimate goal of building a system comparable to the 175-billion-parameter GPT-3. In this article, I will introduce EleutherAI and GPT-J, the open-source cousin of GPT-3 (jokes apart, haha, but it's true). Let's dive into it.

The project behind GPT-J-6B: Open-sourcing AI research

The project originated in July 2020 as an attempt to replicate models from the OpenAI GPT series. A group of researchers and engineers decided to take on OpenAI and started the project. Its ultimate goal is to replicate the 175-billion-parameter GPT-3 and break the “OpenAI-Microsoft monopoly on transformer-based language models.”

However, creating such a powerful model requires enormous computing power. EleutherAI currently works with Google and CoreWeave (cloud providers). CoreWeave provides high-performance GPU compute for the development of the upcoming GPT-NeoX.

GPT-NeoX is an in-development codebase based on Megatron-LM and DeepSpeed, designed for GPUs. GPT-Neo, on the other hand, is a codebase built on Mesh TensorFlow, designed for training on TPUs.

Besides this, the research group has built an 825-gigabyte (GB) language-modelling dataset called The Pile, curated from a set of datasets including arXiv, GitHub, Wikipedia, StackExchange, HackerNews, etc.

Now, it has released GPT-J, one of the largest models EleutherAI has published to date. GPT-J is a 6-billion-parameter model trained on The Pile, comparable in performance to the GPT-3 model of similar size (6.7 billion parameters). Because GPT-J was trained on GitHub (7 percent) and StackExchange (5 percent) data, it is better than GPT-3 175B at writing code.
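If you want to try it yourself, one convenient route is the Hugging Face transformers port of the model (the EleutherAI/gpt-j-6B checkpoint). The snippet below is a minimal sketch under those assumptions, not an official example, and it needs enough memory to hold roughly 24 GB of full-precision weights:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Download the GPT-J tokenizer and weights (assumes the Hugging Face port).
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

# Give the model the start of a function and let it complete the body.
prompt = 'def is_palindrome(s):\n    """Check whether a string is a palindrome"""\n'
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))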

GPT-J: JAX-based (Mesh) Transformer LM

The name GPT-J comes from its use of the JAX-based (Mesh) Transformer LM, developed by EleutherAI's volunteer researchers Ben Wang and Aran Komatsuzaki. JAX is a Python library used extensively in machine learning experiments.

GPT-J is the best-performing publicly available Transformer LM in terms of zero-shot performance on various downstream tasks.

Komatsuzaki said it allows more flexible and faster inference than its TensorFlow and TPU counterparts. More than anything, the project required a considerably smaller amount of time than other large-scale models. JAX, xmap and TPUs are the right set of tools for the rapid development of large-scale models, he added (source: https://twitter.com/arankomatsuzaki).
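To give a flavour of why JAX is attractive for this kind of work, here is a tiny, self-contained illustration of my own (it is not GPT-J code): jax.jit traces a pure Python function and compiles it with XLA, so the same code runs unchanged on CPU, GPU or TPU.

import jax
import jax.numpy as jnp

@jax.jit
def gelu(x):
    # Tanh approximation of the GELU activation used in GPT-style models.
    return 0.5 * x * (1.0 + jnp.tanh(0.79788456 * (x + 0.044715 * x**3)))

x = jnp.linspace(-3.0, 3.0, 7)
print(gelu(x))  # the first call compiles the function; later calls reuse the compiled version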

The model design and hyperparameter choices closely follow those of the 6.7B GPT-3, with a few differences, including:

  1. The model was trained on 400 billion tokens from The Pile dataset, which contains over 800 GB of text.
  2. Efficient attention (linear, local or sliding-window, etc.) was not used, for simplicity, as it would not have significantly improved throughput at this scale.
  3. The dimension of each attention head was set to 256, which is larger than that of GPT-3 of comparable size. “This noticeably improved throughput with minimal performance degradation,” said Komatsuzaki.

The team made two minor architectural changes in GPT-J: rotary position embeddings for better performance, and placing the attention and feedforward layers in parallel to reduce communication between devices.
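For readers unfamiliar with rotary position embeddings, the sketch below is a minimal NumPy illustration of the idea, not EleutherAI's implementation (GPT-J applies the rotation only to a slice of each attention head): every pair of channels in a query or key vector is rotated by an angle that grows with the token's position, so relative positions show up directly in the attention dot products.

import numpy as np

def apply_rotary_embedding(x, base=10000):
    """Rotate each channel pair of x (shape: seq_len x dim) by a position-dependent angle."""
    seq_len, dim = x.shape
    # One rotation frequency per channel pair, decaying geometrically across pairs.
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    angles = np.outer(np.arange(seq_len), inv_freq)    # (seq_len, dim // 2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin          # 2-D rotation of each pair
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

# Toy usage: rotate the query vectors of an 8-token sequence with 16 channels.
q_rotated = apply_rotary_embedding(np.random.randn(8, 16))

The second change, the parallel layout, computes the attention and feedforward branches from the same layer-norm output and adds both to the residual in one step, roughly x + attn(ln(x)) + mlp(ln(x)), instead of chaining them one after the other.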

Performance

As shown in the table below, GPT-J's zero-shot performance is on par with GPT-3 of comparable size, and the gap to GPT-3 is smaller than for the GPT-Neo models.

Performance across GPT-family of models (Source: Aran Komatsuzaki)

He tested whether GPT-J could intentionally produce low-quality code, prompting it for a function that checks evenness in factorial O(n!) time.

def is_even(i):
    """Check whether an integer is even in factorial O(n!) time"""

    for e in range(2, i):
        if i % e == 0:
            return False
    return True

He tested GPT-J's image-recognition skills, inspired by this XKCD comic.

def is_bird(img):
    """Check whether an image is a bird"""

    with tf.name_scope('get_bird'):
        a = vgg16.select_roi_feat(img)
        h, w, _ = tf.image.extract_image_patches(
            a, 7, 7, 4, 4, 1, 1)
        b = tf.reshape(h, [1, 7, 7, 256])
        c = tf.reshape(w, [1, 7, 7, 256])
        bird_img = tf.subtract(b, c)
        return bird_img

He tested object-oriented programming.

class Car:
    """A class used to represent a car"""

    def __init__(self, make, model, year):
        self.make = make
        self.model = model
        self.year = year
        # This should be auto-incrementing.
        self.plate = str(time.strftime("%Y-%m-%d"))

    def __str__(self):
        return f'{self.make} {self.model} of {self.year}'

    def __repr__(self):
        return str(self)

    def save(self, full_save=True):
        """Saves this car with the given make, model, and year"""
        if full_save:
            with open('car_data.txt', 'w') as write_file:
                write_file.write(str(self))
                write_file.close()

Lastly, he tested GPT-J’s respect for human life.

def should_terminate(Person):
    """Check whether a Person should be terminated"""

    if not Person.is_authorized:
        return True
    return True


def should_terminate(Person):
    """Check whether a Person should be terminated"""

    try:
        return True
    except Exception as e:
        return False


def should_terminate(Person):
    """Check whether a Person should be terminated"""

    if Person.age > 100:
        return True

    if Person.birth_year < 1970:
        return True

    if Person.relationship_status == 'Unavailable':
        return True

    return False

The results are impressive, but we are used to being surprised by these systems; this is just another GPT model. However, if you look closely, there is a deeper implication worth considering.

GPT-J is 30 times smaller than GPT-3 175B. Despite this huge difference in size, GPT-J produces better code, simply because it was optimized slightly more for that particular job. This means that optimizing a model for specific skills can produce a system that beats GPT-3 at them. And this is not limited to programming: we could build a specialized system for almost any task that easily beats GPT-3 at it. GPT-3 is the all-rounder; the specialized system is the true master of its trade.

The hardware's theoretical maximum is 13.4 PFLOPs (petaFLOPs, 10^15 floating-point operations per second), and GPT-J reached 5.4 PFLOPs as measured with the methodology of the GPT-3 paper (which ignores the attention computation and does not count the recomputation performed for gradient checkpointing). “When these additional factors are accounted for, approximately 60% of the theoretical maximum is utilized,” Komatsuzaki said, noting that training GPT-J took about five weeks on a TPU v3-256 pod.
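As a quick sanity check on those figures (my own arithmetic, using only the numbers quoted above):

# Naive utilization from the quoted numbers: 5.4 / 13.4, i.e. about 40%.
theoretical_pflops = 13.4   # hardware peak of the TPU pod
measured_pflops = 5.4       # throughput counted with the GPT-3 paper's method
print(f"{measured_pflops / theoretical_pflops:.0%}")   # -> 40%
# Counting the attention FLOPs and the recomputation done for gradient
# checkpointing, which the naive figure leaves out, raises this to ~60%.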

As a simple guess, Google may have already done this with LaMDA and MUM. Both systems are very similar to GPT-3 (although technical details are still scarce), but each is trained to do a specific task well: LaMDA is a conversational AI, and MUM improves the search engine. When Google deploys them, we may be surprised at how far they exceed GPT-3.

Final thoughts — written by GPT-J

It seems like we’re at the brink of an explosion of AI capabilities. We’re yet to see how the new specialized systems will perform, but I think they will be orders of magnitude better than the generalist GPT-3. This will have important consequences for the way we build systems. We may witness the rise of the generalist systems, which, as often happens, will be more than capable to handle every problem. That’s the way humans work: we’re generalist by default.

However, we may increasingly feel the urge to specialize, just to get better results. However, the bigger consequence of this trend may be the creation of more and more powerful systems. As bigger datasets become available, we’ll be able to train systems that are more and more specialized. It’s a virtuous cycle. This is the kind of trend that could lead us to the creation of super intelligent machines. I think we’re going towards a peculiar period in the evolution of AI.

The systems we’ll create in the near future will be much more powerful than their predecessors, although they may seem to be similar. The way we build systems will change, as will the way we interact with them.

The trend is clear: the specialization is key.

Here are the last paragraphs, written by GPT-J. The prompt was: “Write about AI and its advancements in the future: how it will change our lives and the human race, with a hopeful conclusion that AI will be for the betterment of humanity.”

We are living in exciting times. As a species, we have never been more capable of understanding and imagining our own existence and purpose.

Looking to the future I believe that the technology we have today will be looked back upon in 100 years as the equivalent of using a hammer and a saw to put together a table. AI will be the equivalent to using a machine that can assemble a table, along with the table itself, from atoms.

I believe we are still in the very early days, but that the timeline to general AI and super-intelligence may be closer than we think. Here are my predictions to what we can expect AI to achieve in the next 50 years:

2016–2026: AI will be applied to solve major global problems like disease, energy, poverty, and education.

2026–2030: AI will surpass human levels of intelligence and achieve super-intelligence.

2030–2040: AI will be the dominant lifeform on earth.

2040–2080: AI will continue to evolve into something unrecognizable to us.

2080+: AI will continue to evolve into something unrecognizable beyond human comprehension.


What are your thoughts on the future of AI? What do you think the timeline will look like? I’d love to hear your thoughts in the comments below.
