ChatGPT has become the most common tool for everyday work for all of us, in a very short period of time.
ChatGPT has become our new search engine, our Wikipedia, our dictionary, and even our colleague who helps us with our regular tasks.
I am thinking about the day when these tools will turn into machines that look like humans and have the advantage of knowing everything 😐
The more we learn about AI, the better we understand it and use it. Let’s understand ChatGPT, how perplexity is calculated to identify whether content was written by AI or by humans, and the technology behind how ChatGPT works.
What is ChatGPT?
ChatGPT is an advanced language model that understands natural language and is trained on many different resources. It was developed by OpenAI and is based on the GPT-4 architecture.
GPT-4 stands for Generative Pre-trained Transformer 4; it is a powerful AI model that can generate human-like content.
ChatGPT is a conversational chatbot that was launched as a prototype on November 30, 2022. You can use it for writing content, drafting emails, debugging code, and writing a thesis.
Technologies behind ChatGPT
Most of us already know what ChatGPT is and how to use it for our productivity, but only a few of us know how ChatGPT actually works and what technologies work behind it.
Let’s start with Artificial Intelligence
Artificial Intelligence has two main fields, Machine Learning and Deep Learning (itself a subfield of Machine Learning); both are considered weak AI.
Machine Learning typically handles structured data like tables, while Deep Learning handles unstructured data like documents and images.
You must be thinking: if these two are weak AI, then what is strong AI?
Artificial General Intelligence (AGI) and Artificial Super Intelligence (ASI) are considered Strong AI.
Deep Learning has multiple types of models. Over time, these models evolved, and now we use one of them, the transformer, which ChatGPT is based on.
Type of Deep Learning models
- Artificial Neural Network (ANN)
- Convolutional Neural Network (CNN)
- Recurrent Neural Network (RNN)
- Transformers (ChatGPT is based on these)
A convolutional neural network (CNN) is a regularized type of feed-forward neural network that learns feature engineering by itself via filter (or kernel) optimization.
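To see what a filter (or kernel) does in practice, here is a minimal sketch of a single 2D convolution in plain NumPy. The image and kernel values below are made up for illustration; in a real CNN the kernel values are not hand-written but learned during training.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a kernel over an image and record its response at each position."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (image[i:i + kh, j:j + kw] * kernel).sum()
    return out

image = np.random.default_rng(0).normal(size=(5, 5))
edge_kernel = np.array([[1.0, -1.0]])  # hand-written here; a CNN learns these values
print(convolve2d(image, edge_kernel).shape)  # (5, 4)
```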
Difference between feed-forward neural network and Recurrent Neural Network
| Feed-Forward Neural Network | Recurrent Neural Network |
| --- | --- |
| Information moves only in the forward direction | Information cycles back through a loop |
| Not suited to predicting the next element in a sequence | Good at predicting the next element in a sequence |
| Looks only at the current input | Looks at the current input and learns from past inputs |
| Cannot remember past inputs | Remembers past inputs through its internal memory: it copies its output and loops it back into the network |
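To make the “internal memory” row concrete, here is a minimal sketch in NumPy (weights, sizes, and inputs are all invented for illustration): the feed-forward network sees each input in isolation, while the recurrent network loops its hidden state back in at every step.

```python
import numpy as np

rng = np.random.default_rng(0)
W_in = rng.normal(size=(4, 3))  # input-to-hidden weights
W_h = rng.normal(size=(4, 4))   # hidden-to-hidden (recurrent) weights

def feed_forward(x):
    # Each input is processed independently; nothing is remembered.
    return np.tanh(W_in @ x)

def recurrent(sequence):
    # The hidden state h is looped back in at every time step,
    # so earlier inputs influence later outputs.
    h = np.zeros(4)
    for x in sequence:
        h = np.tanh(W_in @ x + W_h @ h)
    return h

sequence = [rng.normal(size=3) for _ in range(5)]
print(feed_forward(sequence[0]))  # depends only on sequence[0]
print(recurrent(sequence))        # depends on the whole sequence
```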
Now, let’s compare Recurrent Neural Networks and Transformers.
Difference between Recurrent Neural Networks and Transformers
| Recurrent Neural Network | Transformer |
| --- | --- |
| Good for short sequence tasks | Good for extended sequences |
| Loses information as additional elements are added to the sequence | Loses no information, because the encoder keeps a hidden state for each step |
| Maintains an internal state that is updated at each time step and combined with the current input | Uses a self-attention mechanism to weigh the importance of different parts of the input at each position |
| This allows it to capture temporal dependencies in sequential data | This allows it to process sequential data of varying lengths effectively, without explicitly maintaining an internal state |
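Here is a rough sketch of the self-attention idea from the table, written as minimal scaled dot-product attention in NumPy. The projection matrices and dimensions are invented for the example; real transformers add multiple attention heads, masking, positional encodings, and learned embeddings.

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention over a sequence X of shape (n, d)."""
    n, d = X.shape
    rng = np.random.default_rng(1)
    # Toy "learned" projections for queries, keys, and values.
    W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    # Each position attends to every position at once, so long-range
    # dependencies don't have to survive a step-by-step loop.
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    return weights @ V  # shape (n, d)

X = np.random.default_rng(2).normal(size=(6, 8))  # 6 tokens, 8 dimensions
print(self_attention(X).shape)  # (6, 8)
```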
Transformers are designed for language generation tasks such as translation, summarization, and text completion.
After ChatGPT, article writing became very easy, so some people just copy and paste ChatGPT content and call themselves professional bloggers, without knowing that Google and other search engines can detect whether an article was written by a human or by an AI.
So, how do these search engines detect whether the content of a blog post was written by an AI or a human? Let’s find out.
What is perplexity, and how is it calculated for a language model to check whether content was written by an AI or a human?
Perplexity is a measure of how well a probability model predicts a sample; it is a standard way to evaluate a language model. Perplexity measures how well a language model can predict a given sequence of words.
A lower perplexity means the model finds the text predictable, which suggests the content is AI-generated, while a higher perplexity indicates more uncertainty and less predictable text, which is more characteristic of human writing.
Mathematically, perplexity is calculated as follows for a language model:
Perplexity(W) = 2^H(W)
Where:
- W represents a sequence of words or tokens.
- H(W) is the entropy of the sequence, which measures the average number of bits needed to represent each token in the sequence according to the model’s predictions.
Suppose we have a very basic language model that predicts the next word in a sentence based on the previous word. Here’s our text corpus:
Corpus: "I am happy. I am excited."
We want to calculate the perplexity of the sentence “I am excited.” according to our language model. To do this, we’ll follow these steps:
Step 1: Tokenize the sentence:
Tokens: ["I", "am", "excited", "."]
Step 2: Calculate the probability of each token given the previous token. Let’s say our simple language model estimates the probabilities as follows:
- P(“am” | “I”) = 0.8
- P(“excited” | “am”) = 0.6
- P(“.” | “excited”) = 0.9
Step 3: Calculate the entropy: Entropy (H) measures the average number of bits needed to represent each token in the sequence. It’s calculated using the formula:
H(W) = - (1/N) * Σ[log2(P(wi | wi-1))]
Where N is the number of predicted tokens in the sequence, and wi-1 and wi are the previous and current tokens, respectively. (Here N = 3, because the first token “I” has no previous token to condition on.)
For our example, the entropy calculation is:
H("I am excited.") = - (1/3) * [log2(0.8) + log2(0.6) + log2(0.9)]
Step 4: Calculate perplexity: Perplexity (PP) is calculated as 2 raised to the power of the entropy:
Perplexity(W) = 2^H(W)
For our example:
Perplexity("I am excited.") = 2^H("I am excited.")
Now, let’s plug in the values and calculate:
H("I am excited.") ≈ - (1/3) * [log2(0.8) + log2(0.6) + log2(0.9)] ≈ 1.361
Perplexity("I am excited.") ≈ 2^1.361 ≈ 2.355
So, the perplexity of the sentence “I am excited.” according to our simple language model is approximately 2.355.
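You can reproduce the whole calculation in a few lines of Python, using the bigram probabilities we assumed in Step 2:

```python
import math

# Bigram probabilities from Step 2 (invented for our toy model).
probs = {("I", "am"): 0.8, ("am", "excited"): 0.6, ("excited", "."): 0.9}

# Entropy: average negative log2-probability per predicted token.
H = -sum(math.log2(p) for p in probs.values()) / len(probs)

perplexity = 2 ** H
print(f"H = {H:.3f} bits, perplexity = {perplexity:.3f}")
# H = 0.404 bits, perplexity = 1.323
```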
The lower the value, the more likely the text is AI-generated; the higher the value, the more likely it is human-written.
Thank you for reading this. Happy Learning 🙂
Check out our React JS articles