GPT-3.5 vs GPT-4: Comparing Performance in Large Language Models

One might assume that because 3.5 < 4, GPT-4 is clearly the better option (setting aside cost, since GPT-4 is not free yet). But GPT-4 is not simply an upgrade of the same large language model (LLM). Here are the main differences:

GPT-3.5

GPT-3.5 is a large language model (LLM) that, in the version behind ChatGPT, is tuned specifically for chat applications. It works text-to-text: it takes a text prompt and generates a text response.

GPT-4

GPT-4 is a large language model (LLM) that is reported, though not confirmed by OpenAI, to have around a trillion parameters. It is designed for a variety of natural language processing (NLP) tasks, including text generation, translation, and question answering. Like GPT-3.5, GPT-4 uses a generative pre-trained transformer architecture, which allows it to learn long-range dependencies in text. It is referred to as a multimodal model because it can also accept images as input, not just text.

-Image from an article by Eric Griffith on PCMAG.com, courtesy of AXSemantics.

Differences Within Text-To-Text Prompting

This morning I decided to run some test prompts asking which of two options is grammatically correct. I tested the prompt, shown below, on GPT-3.5 and GPT-4.

Jon:

Which is grammatically correct, option A or option B?

A) You will see Houston assisting with the build and asking "How would you like to start your new project?" Choose the recommended option, "Include sample files" by pressing ENTER.

B) You will see Houston assisting with the build and asking "How would you like to start your new project?". Choose the recommended option, "Include sample files" by pressing ENTER.

GPT-3.5 was the only one of the two to respond correctly on the first try, which is why I decided to share this result, along with some research I've done, in this article.

GPT-4:

Option A is grammatically correct.

In option B, there is an unnecessary space before the period and no space after the period, which is incorrect.

Correct punctuation in English writing dictates that there should be no space before a period (or other punctuation marks like commas, colons, and semicolons), and there should always be a space after the period. So the correct version is "sentence." Followed by another "Sentence."

Here's the correct version:

"You will see Houston assisting with the build and asking 'How would you like to start your new project?' Choose the recommended option, 'Include sample files,' by pressing ENTER."

But it does not point out the need for the comma after "asking", as GPT-3.5 did in its first response. So I inquired:

Jon:

GPT 3.5 told me that the following is correct. Can you clarify:

"You will see Houston assisting with the build and asking, 'How would you like to start your new project?' Choose the recommended option, 'Include sample files,' by pressing ENTER."

GPT-4:

I apologize for any confusion. The sentence you provided is indeed grammatically correct.

The sentence correctly places the question posed by Houston in quotation marks, and then uses a comma to separate this direct quotation from the rest of the sentence. After the quotation, the narrative continues with the instruction to "Choose the recommended option, 'Include sample files,' by pressing ENTER."

So, to clarify:

"You will see Houston assisting with the build and asking, 'How would you like to start your new project?' Choose the recommended option, 'Include sample files,' by pressing ENTER."

This sentence is correct and appropriately punctuated. I apologize for any misunderstanding.
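One response per model is a small sample, so a fairer version of this test repeats the same prompt several times and tallies the answers. The sketch below shows one way to do that. It is only an illustration: `tally_responses`, `compare_models`, and `make_stub` are hypothetical helpers, and the "models" are stand-in functions returning canned answers, not real GPT-3.5 or GPT-4 output. To try it against real models, you would swap each stub for a function that sends the prompt to the model and returns its reply.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, Dict, Iterable


def tally_responses(ask: Callable[[str], str], prompt: str, runs: int) -> Counter:
    """Send the same prompt `runs` times and count each distinct answer."""
    return Counter(ask(prompt).strip() for _ in range(runs))


def compare_models(
    models: Dict[str, Callable[[str], str]], prompt: str, runs: int = 10
) -> Dict[str, Counter]:
    """Tally answers per model so they can be compared side by side."""
    return {name: tally_responses(ask, prompt, runs) for name, ask in models.items()}


def make_stub(answers: Iterable[str]) -> Callable[[str], str]:
    """Build a stand-in 'model' that cycles through canned answers.

    Assumption: this is a placeholder for illustration only, not real model output.
    """
    canned = cycle(answers)
    return lambda prompt: next(canned)


if __name__ == "__main__":
    # Hypothetical stand-ins; replace with real model calls to run the actual test.
    stubs = {
        "model-x": make_stub(["A", "A", "neither"]),
        "model-y": make_stub(["neither"]),
    }
    prompt = "Which is grammatically correct, option A or option B?"
    for name, counts in compare_models(stubs, prompt, runs=6).items():
        print(name, counts.most_common())
```

If one model gives the same answer in nearly every run while the other flips between answers, that tells you more than a single first response does.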

Explanation

So why? Well, remember the main differences between the models discussed above. Still, it's not as if GPT-4 cannot handle text! With that in mind, here is a quick list of reasons you may find GPT-3.5 producing better, more accurate results than GPT-4.

Randomness: Both GPT-3.5 and GPT-4 use a certain amount of randomness in their outputs. This means that sometimes, one might generate a more suitable response than the other simply due to chance.
Training Differences: GPT-4, being a newer version, may have been trained with a slightly different objective function or dataset, which could lead to differences in responses. The model's training also involves a certain amount of randomness, which could contribute to differences in performance.
Prompt Sensitivity: Different versions of the model may have varying degrees of sensitivity to the way a prompt is phrased. Slight differences in wording, punctuation, or context can lead to significantly different responses. If GPT-3.5 seems to be performing better, it could be that it's just more attuned to the specific style or context of your prompts.
Bias in Perception: Human evaluators often have a recency bias and might remember better or worse examples more clearly. It's also possible that your perception of the quality of the models' responses is affected by your expectations for each version.
Quality and Quantity of Training Data: GPT-4 is generally believed to have been trained on more data than GPT-3.5 (OpenAI has not published the details), but having more data doesn't always translate to better performance on all tasks. The quality of the training data, the variety of the data, and how well it matches the task at hand are also important factors.
Task Complexity: Certain tasks may be better suited to the strengths of one model over the other. For example, one model might be better at generating creative stories, while another might excel at answering factual questions or giving grammatical advice.
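
To make the randomness point concrete, here is a minimal sketch of temperature sampling, a common way language models turn scores into a randomly chosen output. The answer options and their scores are invented for illustration and do not come from either model; the function names are hypothetical too.

```python
import math
import random
from collections import Counter


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_answers(options, logits, temperature, runs, seed):
    """Draw `runs` answers at the given temperature and count how often each appears."""
    rng = random.Random(seed)  # seeded so the demo is repeatable
    probs = softmax_with_temperature(logits, temperature)
    return Counter(rng.choices(options, weights=probs, k=runs))


# Assumed, made-up scores: this imaginary model slightly prefers the first answer.
OPTIONS = ["add a comma after 'asking'", "option A as written", "option B as written"]
LOGITS = [2.0, 1.5, 0.2]

if __name__ == "__main__":
    for temperature in (0.2, 1.0):
        print(temperature, sample_answers(OPTIONS, LOGITS, temperature, runs=1000, seed=0))
```

At the lower temperature nearly all of the samples land on the top-scoring answer, while at the higher one the second-best answer shows up often. That is why a model that "knows" the better answer can still give a weaker one on any single try.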

I hope this helps you understand these powerful tools a bit more while increasing the value they provide you. It's all about knowing what to ask (prompt), sure. But it's perhaps even more important to know WHO, or WHAT, is receiving and responding to your prompts.

Never hesitate to hit me up with questions, comments, jobs, or anything tech-related!!! Please ❤️ if you find value and subscribe to the newsletter for more articles about React, Web Development, AI-Language Models (ChatGPT), React Native, TypeScript, Tailwind CSS, Tutorials, Learning Aids, and more!!!

-Jon Christie

jonchristie.net

Is GPT-3.5 Better Than GPT-4?

Rarely but sometimes, yeah!
