Lesson 6.1: Basics of Large Language Models
What is a Large Language Model (LLM)?
A Large Language Model (LLM) is an advanced artificial intelligence (AI) system designed to process and generate human-like text. It is trained on vast amounts of text data to understand and produce natural language, enabling it to perform a wide range of language-related tasks.
Key Features of Large Language Models:
- Trained on a Large Amount of Text Data
  - LLMs learn from massive datasets, including books, articles, websites, and other textual sources.
  - This training allows them to generate coherent and contextually relevant text or understand the meaning of language.
- Handles Various Natural Language Tasks
  - LLMs can perform multiple language-related tasks, such as:
    - Text classification (e.g., sentiment analysis, topic labeling)
    - Question answering (providing answers based on given context)
    - Dialogue systems (chatbots, virtual assistants)
  - Their versatility makes them a crucial step toward achieving Artificial General Intelligence (AGI).
- Uses "Prompting" to Generate Outputs
- Users interact with LLMs by providing natural language instructions (prompts).
- The model processes these prompts to generate specific outputs, such as summaries, translations, or creative writing.
Examples of LLMs
- GPT (Generative Pre-trained Transformer) – Developed by OpenAI (e.g., ChatGPT)
- BERT (Bidirectional Encoder Representations from Transformers) – Developed by Google
- Llama – Open-weight models by Meta
Techniques used to train LLMs
1. Autoregressive Language Modeling (AR)
Autoregressive models predict the next word in a sequence based on previous words, generating text left-to-right (or sometimes right-to-left).
How It Works
- Given a sequence of words (x₁, x₂, ..., xₜ₋₁), the model predicts the next word xₜ.
- Training maximizes the likelihood of the correct next word.
- Used in GPT (Generative Pre-trained Transformer) models.
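A minimal sketch of this next-word loop, assuming the Hugging Face "transformers" package and the public "gpt2" checkpoint (assumptions made only to keep the example runnable; they are not part of this lesson):

```python
# Minimal autoregressive generation sketch (assumes the Hugging Face
# "transformers" package and the public "gpt2" checkpoint are installed).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# The model repeatedly predicts the most likely next token given all
# previous tokens, appending one token at a time (left-to-right).
result = generator("The cat sat on the", max_new_tokens=10, do_sample=False)
print(result[0]["generated_text"])
```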
2. Masked Language Modeling (MLM)
Masked language models predict missing (masked) words in a sentence by looking at both left and right context.
How It Works
- Random words in a sentence are replaced with a [MASK] token.
- The model predicts the masked word using bidirectional context.
- Used in BERT (Bidirectional Encoder Representations from Transformers).
Example
- Original: "The cat sat on the mat."
- Masked: "The [MASK] sat on the mat."
- Model predicts: "cat"
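The same example as a small code sketch, assuming the "transformers" package and the public "bert-base-uncased" checkpoint (both assumptions made only for illustration):

```python
# Masked-word prediction sketch (assumes the "transformers" package and
# the public "bert-base-uncased" checkpoint).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT looks at both the left and right context around [MASK] and
# proposes the most plausible tokens for the gap, with scores.
for prediction in fill_mask("The [MASK] sat on the mat."):
    print(prediction["token_str"], round(prediction["score"], 3))
```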
Zero-Shot Learning: From Supervised to Self-Supervised Learning
1. Traditional Supervised Learning (Old Way)
Models are trained on labeled datasets where each input (e.g., an image or text) has a corresponding output label.
How it works:
- Requires a large amount of manually labeled data.
- Learns a mapping from inputs to predefined categories.
- Example: A cat vs. dog classifier trained on thousands of labeled images.
Limitations:
- Needs explicit labels for every task.
- Struggles with unseen categories (cannot classify a "zebra" if only trained on cats/dogs).
- Expensive and time-consuming to collect labeled data.
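For contrast, here is a toy supervised classifier, assuming scikit-learn is installed (an assumption for illustration only). Note that every training example needs a hand-provided label, and the model can only ever predict the labels it was trained on:

```python
# Toy supervised text classifier (assumes scikit-learn is installed).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["a small cat purring", "the dog barked loudly",
         "my cat chased a mouse", "dogs love to fetch"]
labels = ["cat", "dog", "cat", "dog"]          # manually labeled data

clf = make_pipeline(CountVectorizer(), LogisticRegression())
clf.fit(texts, labels)

# The classifier can only answer "cat" or "dog" -- a zebra is out of reach.
print(clf.predict(["a striped animal grazing in the grasslands"]))
```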
2. Self-Supervised Learning (New Way)
Models learn from unlabeled data by creating their own "supervision" from the input structure.
How it works:
- Uses pretext tasks (e.g., predicting missing words in a sentence or predicting how an image has been rotated).
- The model learns general representations without human-labeled data.
- Example: BERT (masked language modeling) or GPT (predicting next word).
Advantages:
- Reduces dependency on labeled data.
- Learns universal representations transferable to multiple tasks.
- Scales well with large datasets (e.g., training on all of Wikipedia).
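The key trick is that the "labels" come from the data itself. Here is a tiny plain-Python sketch of how a masked-word pretext task manufactures training pairs from unlabeled sentences (a real MLM does this at the token level with random masking rules):

```python
# Sketch of how a pretext task creates its own supervision from raw text.
import random

def make_masked_example(sentence):
    """Hide one word; the hidden word itself becomes the training target."""
    words = sentence.split()
    i = random.randrange(len(words))
    target = words[i]
    words[i] = "[MASK]"
    return " ".join(words), target

raw_text = ["The cat sat on the mat.", "LLMs learn from unlabeled text."]
pairs = [make_masked_example(s) for s in raw_text]
print(pairs)  # e.g. [('The [MASK] sat on the mat.', 'cat'), ...]
```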
3. Zero-Shot Learning (Enabled by Self-Supervision)
Zero-shot learning (ZSL) allows a model to identify and classify new concepts without any labeled examples during training, and to handle tasks it wasn't specifically trained for.
Suppose a model is trained to recognize animals but was never shown a zebra during training. With ZSL, the model can still figure out what a zebra is by using a description. But how? Here is a simplified picture (the more technical details come later), with a small code sketch after the list:
- The model knows about animals and their features, like "has four legs," "lives in the savanna," or "has stripes."
- It’s given a new description: "A horse-like animal with black and white stripes living in African grasslands."
- Using its understanding of animal attributes and the provided description, the model can deduce that the image likely represents a zebra, even though it has never seen one before.
- The model makes this inference by connecting the dots between known animal characteristics and the new description.
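A toy sketch of this "connecting the dots" step in plain Python. It is illustrative only: real systems use learned embeddings, not hand-written attribute sets, and the class descriptions below stand in for textual knowledge supplied to the model (no labeled zebra images are ever needed):

```python
# Toy attribute-matching sketch of zero-shot recognition (illustrative only).
class_descriptions = {
    "horse": {"four legs", "mane", "grasslands"},
    "tiger": {"four legs", "stripes", "carnivore"},
    "zebra": {"four legs", "stripes", "grasslands", "horse-like"},
}

# Attributes the model recognizes in a new, never-before-seen image.
detected = {"four legs", "stripes", "grasslands", "horse-like"}

# Choose the class whose description overlaps most with what was detected.
best_match = max(class_descriptions,
                 key=lambda c: len(class_descriptions[c] & detected))
print(best_match)  # zebra
```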
How it works:
- Leverages pre-trained knowledge from self-supervised learning.
- Uses natural language prompts (instead of labeled data) to guide predictions.
- Example: Asking an LLM "Is this movie review positive or negative?" without fine-tuning on sentiment data.
Key Techniques:
- Prompt Engineering: Framing tasks as natural language questions (e.g., "Translate 'hello' to French: ___").
- Semantic Embeddings: Mapping inputs and labels to a shared space (e.g., matching "cat" to an image of a cat using CLIP).
Example Models:
- GPT-3 / ChatGPT (performs tasks via prompts).
- CLIP (classifies images using text descriptions).
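As a concrete illustration of prompt-style zero-shot prediction, here is a sketch using the Hugging Face zero-shot-classification pipeline, assuming the "transformers" package and the public "facebook/bart-large-mnli" checkpoint (both assumptions, not requirements of this lesson):

```python
# Zero-shot sentiment classification without any sentiment-labeled training data.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

review = "The plot dragged and the acting felt flat."
# The candidate labels are supplied at inference time as plain text.
result = classifier(review, candidate_labels=["positive", "negative"])
print(result["labels"][0], round(result["scores"][0], 3))
```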
3 Levels of Using LLMs
1. Prompt Engineering (No Coding Needed)
Use natural language instructions ("prompts") to guide LLMs (e.g., ChatGPT, Gemini).
- Adjust prompts for better outputs (e.g., "Explain like I’m 5").
- Use techniques like few-shot examples or chain-of-thought prompting.
- Tools: ChatGPT, Claude, OpenAI Playground.
- Best for: Quick tasks (Q&A, drafting emails).
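A sketch of what prompt-level usage looks like in code, assuming the "openai" Python package, an API key in the OPENAI_API_KEY environment variable, and an illustrative model name such as "gpt-4o-mini" (all assumptions, not part of this lesson):

```python
# Few-shot prompting sketch: the "training" happens entirely in the prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    "Classify the sentiment of each review.\n"
    "Review: 'Loved every minute.' -> positive\n"
    "Review: 'Waste of money.' -> negative\n"
    "Review: 'The plot dragged and the acting felt flat.' ->"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```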
2. Model Fine-Tuning (Moderate Technical Skill)
Customize a pre-trained LLM (e.g., GPT-3.5, LLaMA) with your data.
- Train on domain-specific data (e.g., medical reports, legal contracts).
- Requires labeled datasets and cloud GPUs.
- Tools: OpenAI API, Hugging Face Transformers, LoRA.
- Best for: Specialized tasks (customer support bots, niche content generation).
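A rough sketch of what a LoRA fine-tuning setup looks like, assuming the "transformers" and "peft" packages and a small causal model such as "gpt2"; the module names and hyperparameters here are illustrative assumptions, not a prescribed recipe:

```python
# LoRA setup sketch: wrap a pre-trained model so only small adapter
# matrices are trained, then fine-tune on domain-specific text.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")

# LoRA injects small trainable matrices into the attention layers.
lora_config = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"],
                         task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the full model

# From here, train as usual (e.g., with transformers.Trainer) on your data.
```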
3. Build Your Own LLM (Advanced/Research)
Train an LLM from scratch (e.g., like GPT or Gemini).
- Collect massive datasets (e.g., internet text).
- Use frameworks like PyTorch/TensorFlow and thousands of GPUs/TPUs.
- Tools: NVIDIA Megatron, Meta’s LLaMA codebase.
- Best for: Companies/researchers pushing AI boundaries (e.g., OpenAI, Google DeepMind).
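For a sense of what "from scratch" means, the core training step is conceptually simple. Below is an extremely reduced sketch in PyTorch (an assumption; any deep learning framework works): real LLMs replace the toy model with deep Transformer stacks that attend over context, trained on trillions of tokens across thousands of GPUs/TPUs.

```python
# Extremely reduced next-token training step (assumes PyTorch).
import torch
import torch.nn as nn

vocab_size, embed_dim = 1000, 64

# Toy model: token ids -> vectors -> next-token logits.
# (Unlike a real Transformer, it ignores the surrounding context.)
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Linear(embed_dim, vocab_size),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Fake batch: each position's "label" is simply the next token id.
tokens = torch.randint(0, vocab_size, (8, 32))        # (batch, sequence)
inputs, targets = tokens[:, :-1], tokens[:, 1:]

logits = model(inputs)                                 # (8, 31, vocab_size)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
optimizer.step()
print(float(loss))
```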