AI has been a great icebreaker for any formal or informal conversation during the last two years. Everyone knows what AI is and what it can do. But it has been difficult to find out, let alone understand, how AI as we know it today actually works.
This is a deep dive into Large Language Models or LLMs — the tech that makes AI work.
Unless you have been living under a rock, one of ChatGPT, Bard, Bing Chat, or a similar tool is probably your go-to search or research companion. These new tools have altered how people seek information from the “world wide web.”
For a long time, AI models lived largely in research labs; they broke out to a general audience only when ChatGPT broke the internet in November 2022. That’s not the whole truth, though. Over the years, we’ve used AI in various shapes and forms, sometimes without realizing we’re using AI. Here are some great examples of AI that have formed an essential part of our lives, but we didn’t care for them as much as we do for ChatGPT:
- Google Assistant, Siri, Amazon Alexa
- Machine-learning-based spam filtering in Gmail and other email apps
- Amazon’s shopping recommendations
- Spotify’s music recommendations
Many more are hiding in plain sight.
So, what is so different about AIs such as ChatGPT, Bard, and Bing Chat? The answer lies in their models and the capabilities these models provide. These text-based AI systems are built on “Large Language Models” or “LLMs,” which make them potent and help them share (write) information the way a human would. So, let’s look at what LLMs are, and we’ll also take a look at the most prominent models of the recent past.
A Brief History of AI
AI was not always the AI we know today. One of the first AI systems, called Theseus, was just a maze-solving mouse created by Claude Shannon in 1950. Theseus, the mouse, could learn the path to solve the maze, and when presented with a different maze, he would be able to learn from his surroundings and experience of solving mazes to find the solution. Below you can see the “mighty mouse” in action.
Theseus was the first system of its kind to show machine learning in action. Training it took less than 0.01 petaFLOP of computation, insignificant compared to today’s AI systems. OpenAI’s latest AI model, GPT-4, is estimated to have been trained with over 21 billion petaFLOP of computation, at least 2,100 billion times more than Theseus. (FLOP counts measure the computation spent on training, not the amount of data.) The amount of data and the complexity of logic used by recent AI models are almost incomprehensible.
What are Language Models?
Language Models are an outgrowth of Natural Language Processing (NLP), a field of AI focused on the interaction between computers and human language. But what exactly are language models? Let’s break it down.
In simple terms: A language model is a mathematical representation of language. It takes text (or transcribed audio) as input and generates sequences of words in response to the prompts it receives. Each piece of language fed to a language model is first converted into a numerical representation, and the numbers the model produces are then converted back into human language.
A language model learns the statistical structure of language from example text (input) and then predicts a sequence of words or suggests which word is likely to come next in a sentence (output).
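To make the "language as numbers" idea concrete, here is a minimal sketch. It assumes the simplest possible scheme, where every distinct word gets its own integer ID; real LLMs use sub-word tokenizers and far larger vocabularies. The function names (build_vocab, encode, decode) are made up for this illustration.

```python
def build_vocab(corpus):
    """Assign each distinct word an integer ID, in order of first appearance."""
    vocab = {}
    for sentence in corpus:
        for word in sentence.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def encode(text, vocab):
    """Turn a sentence into the list of integer IDs a model would work with."""
    return [vocab[word] for word in text.lower().split()]


def decode(ids, vocab):
    """Turn a list of integer IDs back into human-readable text."""
    id_to_word = {i: word for word, i in vocab.items()}
    return " ".join(id_to_word[i] for i in ids)


# Toy corpus, assumed for illustration only.
corpus = ["the cat sat on the mat", "the dog sat on the rug"]
vocab = build_vocab(corpus)

ids = encode("the dog sat on the mat", vocab)
print(ids)                 # the numbers the model sees
print(decode(ids, vocab))  # the text a human sees
```

Everything a language model does happens on the numeric side; the text only appears at the two ends.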
Types of Language Models
Statistical Language Models: Claude Shannon’s landmark paper, “A Mathematical Theory of Communication,” published in 1948, laid the groundwork for understanding language structure from a probabilistic perspective. This meant computers could now use statistics to analyze and generate language based on large collections of existing text. These models typically generate language using the probability of the next word given the previous word or words.
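As a rough illustration of "the probability of the next word given a previous word," here is a tiny bigram model. It is only a sketch: the three-sentence corpus and the function names are assumptions made for this example, and real statistical models are trained on vastly more text and use smoothing for unseen word pairs.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev):
    """Estimate P(next word | previous word) from the raw counts."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in followers.items()}


# Toy corpus, assumed for illustration only.
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]
counts = train_bigram(corpus)
probs = next_word_probs(counts, "the")

print(probs)                          # probability of each word that followed "the"
print(max(probs, key=probs.get))      # the single most likely next word
```

In this corpus "the" is followed by "cat" in two of its six occurrences, so the model gives "cat" a probability of one third and picks it as the most likely next word.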
Neural Network Language Models: Statistical language models were a great example of early LMs; they could produce language reliably in simple cases but could go horribly wrong as complexity rose. In the 2000s, deep learning began to take off. Instead of relying on fixed tables of word statistics, computers could now rely on “learning mechanisms” that adjust themselves during training. These mechanisms are loosely modeled on the neurons of the human brain; hence, the models were called Neural Language Models.
These models use “artificial neural networks” to understand and generate text, which allows them to capture longer dependencies and create more coherent text. They were the first language models that could learn their own internal representation of language from examples and keep improving as training continued, rather than depending on hand-built statistical tables.
The neural network language models of this period had a significant drawback, which arose from the sequential way they processed language: each word had to be handled after the previous one. As a result, these models were slow to train and required significant resources for every computation.
Transformers: To counter this limitation, Google Brain (Google’s AI team) introduced a new kind of deep neural network-based language model in the 2017 paper “Attention Is All You Need.” It has since revolutionized language modeling and is the basis for many Large Language Models we know today. The Transformer is best known for its attention mechanism, which weighs the significance of different parts of the input data to create contextually relevant language outputs.
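Here is a minimal sketch of that attention idea in plain Python, using the scaled dot-product form: score every pair of tokens, turn the scores into weights with a softmax, and mix the token vectors using those weights. It makes several simplifying assumptions: the three two-dimensional token vectors are invented, the same vectors serve as queries, keys, and values, and the learned projection matrices and multiple heads of a real Transformer are left out.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # How relevant is each key to this query?
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        row = softmax(scores)
        # The output is a weighted mix of the value vectors.
        mixed = [
            sum(w * v[j] for w, v in zip(row, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
        all_weights.append(row)
    return outputs, all_weights


# Three made-up token vectors (assumed for illustration only).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)

for row in weights:
    print([round(w, 3) for w in row])  # how much each token attends to the others
```

Because every token's weights can be computed independently of the others, this calculation parallelizes well, which is what removes the sequential bottleneck described above.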
Evolution of AI-Language Models
Classifying language models into three neat segments might understate how far language models have come since the late 1940s. As you probably know by now, ChatGPT took the world by storm, DALL-E and Stable Diffusion took center stage amid an array of deepfake controversies, and the likes of Jasper and Character.AI raised money rapidly to become some of the fastest-growing tech companies of our times.
Let’s take a quick look at the most significant models from the most significant years in AI:
1970s-1980s: N-gram models and Speech Recognition
Yes, Google Assistant and Facebook were not the first to eavesdrop on your conversations. Speech recognition goes a long way back: in 1971, IBM created the “Automatic Call Identification System,” which enabled engineers all over the US to talk to and receive “spoken” answers from a computer system located in Raleigh, North Carolina.
N-gram models, on the other hand, pre-dated speech recognition systems and were the first applications of the statistical language models.
2000s-2010s: Recurrent Neural Networks (RNNs)
Unlike the previous language models, RNNs had connections that could loop back to themselves, allowing them to retain a memory of prior inputs. This feature made RNNs more suitable for long sequences of data such as text. The same period produced another exciting development: systems that learn dense numerical representations of words, called “word embeddings.” In 2013, Google introduced Word2Vec, one of the most influential word embedding models.
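The "loop back" is easier to see in code. The sketch below is a deliberately tiny recurrent cell with a single number as its hidden state; the weights (0.5 and 0.8) are hand-picked assumptions rather than learned values, and real RNNs use vectors and weight matrices trained on data.

```python
import math


def rnn_step(x, h_prev, w_x, w_h, b):
    """One recurrent step: the new state depends on the input AND the old state."""
    return math.tanh(w_x * x + w_h * h_prev + b)


def run_rnn(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Feed a sequence through the cell one item at a time, carrying the state along."""
    h = 0.0  # the "memory" starts out empty
    states = []
    for x in inputs:
        h = rnn_step(x, h, w_x, w_h, b)
        states.append(h)
    return states


# A single signal followed by silence: the state keeps an echo of the first input.
states = run_rnn([1.0, 0.0, 0.0])
print([round(s, 4) for s in states])
```

Even though the second and third inputs are zero, the hidden state stays above zero, because each step feeds the previous state back in. The loop also shows why these models are sequential: step three cannot start until step two has finished.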
2018: BERT and GPT
Now, we’re venturing into the transformer era brought about by, again, AI innovation at Google. In 2018, the world saw the introduction of two of the most important language models of this century: BERT (Bidirectional Encoder Representations from Transformers) by Google and GPT (Generative Pre-trained Transformer) by OpenAI. BERT is designed to understand the context of a word from the words on both sides of it (Google later applied it to search queries), and GPT aims to generate human-like text (we all know how good it is).
BERT and GPT marked the beginning of the large language model era or, as you would read in most places, the era of LLMs.
2020-2023: GPT-4, Bard, and Beyond
In 2020, OpenAI released GPT-3, which contains 175 billion parameters, making it the largest language model at the time of its release. GPT-3 set new standards in size and capabilities, demonstrating remarkable performance in a wide range of NLP tasks with minimal fine-tuning, such as writing whole blog articles and carrying on surprisingly human-like conversations.
Then in November 2022, OpenAI released ChatGPT, a general-purpose chatbot that could write about anything and everything. ChatGPT also became the fastest consumer application to reach 10 million users, doing so in 40 days; the previous best was Instagram, which took 355 days. This brought into focus the practicality and functionality of AI for use in daily life at scale.
ChatGPT’s popularity also gave birth to a new tech ecosystem that enabled the general public to write better, faster, and more about anything. Many startups built their own front ends and added unique features on top of GPT’s capabilities for particular use cases such as finance, legal, and marketing.
Then in March 2023, OpenAI released the much more complex and advanced GPT-4, which was soon followed by Google’s release of Bard, its general-purpose chatbot, and then the language model PaLM 2 (May 2023).
GPT-4 (reportedly around 1 trillion parameters, OpenAI) and PaLM 2 (reportedly 340 billion parameters, Google) are the most complex language models of our time, closely followed by the likes of LLaMA (65 billion parameters, Meta). Neither OpenAI nor Google has officially confirmed these figures. These models are bringing innovation and disruption across all industries in one way or another. The most common applications are text generation, followed by innovation across various functions that leverage these models’ analytical and creative capabilities. Let’s take a quick look at what these modern language models are capable of:
Applications of Large Language Models
Cybersecurity, an already laborious job with a critical lack of talent, has the most to gain from LLMs. These AI tools can be trained to learn context about a particular organization and thus make highly relevant predictions. For enterprises, this AI can prove instrumental when applied correctly to essential use cases such as:
Threat Intelligence Analysis: LLM-enabled tools can analyze vast amounts of data to identify patterns and generate actionable threat intelligence reports.
Phishing Detection: By analyzing email content and metadata, AI tools can identify phishing attempts and help mitigate email-based threats.
Security Chatbots: Intelligent chatbots can help security and technical teams by answering queries, assisting in incident analysis, and even automating responses to common security incidents. These incidents consume a lot of time, as security analysts can receive over 10,000 alerts daily.
AI will play a crucial role in workflow and process automation —especially in enterprises with vast data streams and many endpoints.
Content Creation and Writing Assistance
Article and Blog Writing: ChatGPT and other products built on LLMs have gained much traction for these use cases. The tools can assist amateur or professional writers by generating drafts, suggesting content ideas, or even writing entire blog posts from scratch.
Copywriting: This use case has been a viral, runaway success. Copywriting is purely text-based; thus, LLMs can perform all the underlying tasks quite well, sometimes better than their human counterparts. Some tools advancing this application are Jasper, CopyAI, etc.
Natural Language Understanding
Sentiment Analysis: Apart from writing text from given prompts, these LLMs are also quite good at understanding the sentiment behind textual data, such as emotions, quality, tone, etc. This has been of great help in improving my writing to be more objective without making it too personal.
Text Classification and Tagging: LLM tools, ChatGPT in particular, have proved useful for data analysis and segmentation, either natively or through plug-ins. They can automatically categorize and tag text data, which helps with organizing content, spam detection, and more.
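To show what "categorize and tag text data" means as a task, here is a small sketch of a spam tagger. Note the swap: this is a classical Naive Bayes classifier, a much older and simpler technique than an LLM, used here only because it fits in a few lines. The four training messages and the labels "spam" and "ham" are invented for the example.

```python
import math
from collections import Counter


def train(examples):
    """Count words per label and how many examples each label has."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Pick the label with the highest Naive Bayes log-probability."""
    vocab = {word for counts in word_counts.values() for word in counts}
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, counts in word_counts.items():
        score = math.log(label_counts[label] / total_docs)  # prior
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Tiny hand-made training set, assumed for illustration only.
examples = [
    ("win a free prize now", "spam"),
    ("claim your free reward", "spam"),
    ("meeting notes for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
word_counts, label_counts = train(examples)

print(classify("free prize inside", word_counts, label_counts))
print(classify("notes for the team monday", word_counts, label_counts))
```

An LLM handles the same task very differently: instead of counting words from labeled examples, it can often tag text from a plain-language instruction alone.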
Education and Learning
AI-Assisted Learning: Google recently announced its AI-learning tool called Project Tailwind. While it takes an entirely new approach to AI-assisted learning, similar objectives can be achieved by using AI as a research companion that digs into a report or a topic to help people learn better in less time.
Further, AI has already been implemented extensively in linguistics to help people learn new languages in a personalized manner.
Code Review and Analysis: GitHub Copilot, a product that helps developers write, review, and analyze code, is an excellent example of an “AI Dev Companion.” Copilot and similar tools can assist human developers by identifying potential bugs and security vulnerabilities and suggesting improvements. They can also provide natural language explanations of complex code.
Automated Documentation Generation: Generating documentation is an important yet time-consuming task. Since LLM tools (even ChatGPT) can understand code, they can automate this process by generating relevant documentation based on code and comments. We do suggest a human review of the document before sharing it anywhere. 😉
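As a rough sketch of how such automation might start, the snippet below uses Python’s built-in ast module to find functions that have no docstring and assembles a prompt asking for documentation. The helper names and the prompt wording are assumptions for illustration, and the step of actually sending the prompt to an LLM is left out because it depends on the provider.

```python
import ast


def undocumented_functions(source):
    """Return signatures of functions in the source that lack a docstring."""
    tree = ast.parse(source)
    found = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and ast.get_docstring(node) is None:
            args = ", ".join(arg.arg for arg in node.args.args)
            found.append(f"{node.name}({args})")
    return found


def build_doc_prompt(source):
    """Build a prompt for an LLM; returns an empty string if nothing needs docs."""
    targets = undocumented_functions(source)
    if not targets:
        return ""
    listing = "\n".join(f"- {target}" for target in targets)
    return (
        "Write a one-paragraph docstring for each function below.\n"
        f"{listing}\n\nSource:\n{source}"
    )


# Sample code to document (assumed for illustration only).
sample = '''
def add(a, b):
    return a + b

def greet(name):
    """Return a greeting."""
    return "Hello, " + name
'''

print(undocumented_functions(sample))
print(build_doc_prompt(sample))
```

Only the function without a docstring ends up in the prompt, which keeps the request small and makes the human review step quicker.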
The Future of LLMs and AI
Biases: It’s quite easy for an AI system to develop biases. An LLM is trained on an enormous body of data, and depending on the presumptions and assumptions embedded in that data, the model itself can develop characteristics that bias its answers. For example, GPT-3 was trained to be largely apolitical and avoided responding to controversial political statements. Similarly, LLMs can also pick up on stereotypes and social conventions.
Lack of Common Sense Reasoning: Although both GPT-4 and Bard are extremely capable, they still lack common sense reasoning skills, which limits their capabilities and, in some extreme cases, can lead them to confidently state things that are false. For example, the early versions of Bard were reportedly described as “worse than a pathological liar.”
Interpretability and Transparency: OpenAI (recently) and Google are for-profit organizations with no obligation to share the logic that powers PaLM or GPT. This makes the LLMs a “black box” for the general user base. There are notable exceptions, such as Hugging Face, which builds language models through open-source contributions.
Resource Intensive: Training LLMs requires enormous computational resources, which has implications for the environment and limits access to this technology to organizations with significant financial resources.
Misinformation and Deepfakes: The ability of LLMs to generate human-like text (and images) can be exploited for creating misinformation or “deep fake” content, which can be difficult to distinguish from human-written content.
Data Privacy: ChatGPT was recently in the news for being temporarily banned in Italy over data privacy concerns. It comes as no surprise, though: because these LLMs are trained on publicly available data, which can include copyrighted material and personal user data, there will always be instances of data privacy issues.
Common Sense Reasoning: In their current state, LLMs can often seem stupid in their reasoning because they lack common sense. Infusing future AI with common sense reasoning will bring new applications.
Continual and Lifelong Learning: Allowing LLMs to learn and adapt to new data without extensive retraining could be a game-changing advancement in AI. For example, ChatGPT currently carries information only until 2021 and is of little use if the user seeks information beyond that time horizon.
There is no doubt that AI will play a crucial role in disruption, irrespective of the industry and skill set. It is unlikely that AI will replace jobs altogether, but even in its current state AI is augmenting and enhancing existing jobs. The next decade will be the decade of AI, and it will play a critical role in driving growth across industries.
As we look towards the future, AI and LLMs are not just tools but our partners in innovation. With a focus on integrating common sense reasoning and a greater emphasis on ethical considerations, future LLMs will reshape how AI-human interaction looks.
In a forever-digital world, AI holds the potential to break down linguistic barriers, democratize knowledge, and catalyze skill-based work across domains.