Lesson 6.1: Basics of Large Language Models
What is a Large Language Model (LLM)?
A Large Language Model (LLM) is an advanced artificial intelligence (AI) system designed to process and generate human-like text. It is trained on vast amounts of text data to understand and produce natural language, enabling it to perform a wide range of language-related tasks.
Key Features of Large Language Models:
- Trained on a Large Amount of Text Data
  - LLMs learn from massive datasets, including books, articles, websites, and other textual sources.
  - This training allows them to generate coherent and contextually relevant text or understand the meaning of language.
- Handles Various Natural Language Tasks
  - LLMs can perform multiple language-related tasks, such as:
    - Text classification (e.g., sentiment analysis, topic labeling)
    - Question answering (providing answers based on given context)
    - Dialogue systems (chatbots, virtual assistants)
  - Their versatility makes them a crucial step toward achieving Artificial General Intelligence (AGI).
- Uses "Prompting" to Generate Outputs
- Users interact with LLMs by providing natural language instructions (prompts).
- The model processes these prompts to generate specific outputs, such as summaries, translations, or creative writing.
Examples of LLMs
- GPT (Generative Pre-trained Transformer) – Developed by OpenAI (e.g., ChatGPT)
- BERT (Bidirectional Encoder Representations from Transformers) – Developed by Google
- Llama – Open-weight models by Meta
Techniques used to train LLMs
1. Autoregressive Language Modeling (AR)
Autoregressive models predict the next word in a sequence based on previous words, generating text left-to-right (or sometimes right-to-left).
How It Works
- Given a sequence of words (x₁, x₂, ..., xₜ₋₁), the model predicts the next word xₜ.
- Training maximizes the likelihood of the correct next word.
- Used in GPT (Generative Pre-trained Transformer) models.
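A minimal sketch of this next-word loop, assuming the Hugging Face "transformers" package and the public "gpt2" checkpoint (assumptions made only to keep the example runnable; they are not part of this lesson):

```python
# Minimal autoregressive generation sketch (assumes the Hugging Face
# "transformers" package and the public "gpt2" checkpoint are installed).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# The model repeatedly predicts the most likely next token given all
# previous tokens, appending one token at a time (left-to-right).
result = generator("The cat sat on the", max_new_tokens=10, do_sample=False)
print(result[0]["generated_text"])
```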
2. Masked Language Modeling (MLM)
Masked language models predict missing (masked) words in a sentence by looking at both left and right context.
How It Works
- Random words in a sentence are replaced with a [MASK] token.
- The model predicts the masked word using bidirectional context.
- Used in BERT (Bidirectional Encoder Representations from Transformers).
Example
- Original: "The cat sat on the mat."
- Masked: "The [MASK] sat on the mat."
- Model predicts: "cat"
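The same example as a small code sketch, assuming the "transformers" package and the public "bert-base-uncased" checkpoint (both assumptions made only for illustration):

```python
# Masked-word prediction sketch (assumes the "transformers" package and
# the public "bert-base-uncased" checkpoint).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT looks at both the left and right context around [MASK] and
# proposes the most plausible tokens for the gap, with scores.
for prediction in fill_mask("The [MASK] sat on the mat."):
    print(prediction["token_str"], round(prediction["score"], 3))
```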
Zero-Shot Learning: From Supervised to Self-Supervised Learning
1. Traditional Supervised Learning (Old Way)
Models are trained on labeled datasets where each input (e.g., an image or text) has a corresponding output label.
How it works:
- Requires a large amount of manually labeled data.
- Learns a mapping from inputs to predefined categories.
- Example: A cat vs. dog classifier trained on thousands of labeled images.
Limitations:
- Needs explicit labels for every task.
- Struggles with unseen categories (cannot classify a "zebra" if only trained on cats/dogs).
- Expensive and time-consuming to collect labeled data.
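For contrast, here is a toy supervised classifier, assuming scikit-learn is installed (an assumption for illustration only). Note that every training example needs a hand-provided label, and the model can only ever predict the labels it was trained on:

```python
# Toy supervised text classifier (assumes scikit-learn is installed).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["a small cat purring", "the dog barked loudly",
         "my cat chased a mouse", "dogs love to fetch"]
labels = ["cat", "dog", "cat", "dog"]          # manually labeled data

clf = make_pipeline(CountVectorizer(), LogisticRegression())
clf.fit(texts, labels)

# The classifier can only answer "cat" or "dog" -- a zebra is out of reach.
print(clf.predict(["a striped animal grazing in the grasslands"]))
```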
2. Self-Supervised Learning (New Way)
Models learn from unlabeled data by creating their own "supervision" from the input structure.
How it works:
- Uses pretext tasks (e.g., predicting missing words in a sentence or predicting how an image has been rotated).
- The model learns general representations without human-labeled data.
- Example: BERT (masked language modeling) or GPT (predicting next word).
Advantages:
- Reduces dependency on labeled data.
- Learns universal representations transferable to multiple tasks.
- Scales well with large datasets (e.g., training on all of Wikipedia).
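The key trick is that the "labels" come from the data itself. Here is a tiny plain-Python sketch of how a masked-word pretext task manufactures training pairs from unlabeled sentences (a real MLM does this at the token level with random masking rules):

```python
# Sketch of how a pretext task creates its own supervision from raw text.
import random

def make_masked_example(sentence):
    """Hide one word; the hidden word itself becomes the training target."""
    words = sentence.split()
    i = random.randrange(len(words))
    target = words[i]
    words[i] = "[MASK]"
    return " ".join(words), target

raw_text = ["The cat sat on the mat.", "LLMs learn from unlabeled text."]
pairs = [make_masked_example(s) for s in raw_text]
print(pairs)  # e.g. [('The [MASK] sat on the mat.', 'cat'), ...]
```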
3. Zero-Shot Learning (Enabled by Self-Supervision)
Zero-shot learning (ZSL) allows a model to identify and classify new concepts without any labeled examples during training, and to handle tasks it wasn't specifically trained for.
Suppose a model is trained to recognize animals but was never shown a zebra during training. With ZSL, the model can still figure out what a zebra is by using a description. But how? Here is a simplified picture (the more technical details come later), with a small code sketch after the list:
- The model knows about animals and their features, like "has four legs," "lives in the savanna," or "has stripes."
- It’s given a new description: "A horse-like animal with black and white stripes living in African grasslands."
- Using its understanding of animal attributes and the provided description, the model can deduce that the image likely represents a zebra, even though it has never seen one before.
- The model makes this inference by connecting the dots between known animal characteristics and the new description.
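A toy sketch of this "connecting the dots" step in plain Python. It is illustrative only: real systems use learned embeddings, not hand-written attribute sets, and the class descriptions below stand in for textual knowledge supplied to the model (no labeled zebra images are ever needed):

```python
# Toy attribute-matching sketch of zero-shot recognition (illustrative only).
class_descriptions = {
    "horse": {"four legs", "mane", "grasslands"},
    "tiger": {"four legs", "stripes", "carnivore"},
    "zebra": {"four legs", "stripes", "grasslands", "horse-like"},
}

# Attributes the model recognizes in a new, never-before-seen image.
detected = {"four legs", "stripes", "grasslands", "horse-like"}

# Choose the class whose description overlaps most with what was detected.
best_match = max(class_descriptions,
                 key=lambda c: len(class_descriptions[c] & detected))
print(best_match)  # zebra
```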
How it works:
- Leverages pre-trained knowledge from self-supervised learning.
- Uses natural language prompts (instead of labeled data) to guide predictions.
- Example: Asking an LLM "Is this movie review positive or negative?" without fine-tuning on sentiment data.
Key Techniques:
- Prompt Engineering: Framing tasks as natural language questions (e.g., "Translate 'hello' to French: ___").
- Semantic Embeddings: Mapping inputs and labels to a shared space (e.g., matching "cat" to an image of a cat using CLIP).
Example Models:
- GPT-3 / ChatGPT (performs tasks via prompts).
- CLIP (classifies images using text descriptions).
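As a concrete illustration of prompt-style zero-shot prediction, here is a sketch using the Hugging Face zero-shot-classification pipeline, assuming the "transformers" package and the public "facebook/bart-large-mnli" checkpoint (both assumptions, not requirements of this lesson):

```python
# Zero-shot sentiment classification without any sentiment-labeled training data.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

review = "The plot dragged and the acting felt flat."
# The candidate labels are supplied at inference time as plain text.
result = classifier(review, candidate_labels=["positive", "negative"])
print(result["labels"][0], round(result["scores"][0], 3))
```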
3 Levels of Using LLMs
1. Prompt Engineering (No Coding Needed)
Use natural language instructions ("prompts") to guide LLMs (e.g., ChatGPT, Gemini).
- Adjust prompts for better outputs (e.g., "Explain like I’m 5").
- Use techniques like few-shot examples or chain-of-thought prompting.
- Tools: ChatGPT, Claude, OpenAI Playground.
- Best for: Quick tasks (Q&A, drafting emails).
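A sketch of what prompt-level usage looks like in code, assuming the "openai" Python package, an API key in the OPENAI_API_KEY environment variable, and an illustrative model name such as "gpt-4o-mini" (all assumptions, not part of this lesson):

```python
# Few-shot prompting sketch: the "training" happens entirely in the prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    "Classify the sentiment of each review.\n"
    "Review: 'Loved every minute.' -> positive\n"
    "Review: 'Waste of money.' -> negative\n"
    "Review: 'The plot dragged and the acting felt flat.' ->"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```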
2. Model Fine-Tuning (Moderate Technical Skill)
Customize a pre-trained LLM (e.g., GPT-3.5, LLaMA) with your data.
- Train on domain-specific data (e.g., medical reports, legal contracts).
- Requires labeled datasets and cloud GPUs.
- Tools: OpenAI API, Hugging Face Transformers, LoRA.
- Best for: Specialized tasks (customer support bots, niche content generation).
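A rough sketch of what a LoRA fine-tuning setup looks like, assuming the "transformers" and "peft" packages and a small causal model such as "gpt2"; the module names and hyperparameters here are illustrative assumptions, not a prescribed recipe:

```python
# LoRA setup sketch: wrap a pre-trained model so only small adapter
# matrices are trained, then fine-tune on domain-specific text.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")

# LoRA injects small trainable matrices into the attention layers.
lora_config = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"],
                         task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the full model

# From here, train as usual (e.g., with transformers.Trainer) on your data.
```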
3. Build Your Own LLM (Advanced/Research)
Train an LLM from scratch (e.g., like GPT or Gemini).
- Collect massive datasets (e.g., internet text).
- Use frameworks like PyTorch/TensorFlow and thousands of GPUs/TPUs.
- Tools: NVIDIA Megatron, Meta’s LLaMA codebase.
- Best for: Companies/researchers pushing AI boundaries (e.g., OpenAI, Google DeepMind).
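For a sense of what "from scratch" means, the core training step is conceptually simple. Below is an extremely reduced sketch in PyTorch (an assumption; any deep learning framework works): real LLMs replace the toy model with deep Transformer stacks that attend over context, trained on trillions of tokens across thousands of GPUs/TPUs.

```python
# Extremely reduced next-token training step (assumes PyTorch).
import torch
import torch.nn as nn

vocab_size, embed_dim = 1000, 64

# Toy model: token ids -> vectors -> next-token logits.
# (Unlike a real Transformer, it ignores the surrounding context.)
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Linear(embed_dim, vocab_size),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Fake batch: each position's "label" is simply the next token id.
tokens = torch.randint(0, vocab_size, (8, 32))        # (batch, sequence)
inputs, targets = tokens[:, :-1], tokens[:, 1:]

logits = model(inputs)                                 # (8, 31, vocab_size)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
optimizer.step()
print(float(loss))
```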