Lesson 6.1: Basics of Large Language Models


What is a Large Language Model (LLM)?

A Large Language Model (LLM) is an advanced artificial intelligence (AI) system designed to process and generate human-like text. It is trained on vast amounts of text data to understand and produce natural language, enabling it to perform a wide range of language-related tasks.

Key Features of Large Language Models:

  • Trained on a Large Amount of Text Data
    • LLMs learn from massive datasets, including books, articles, websites, and other textual sources.
    • This training allows them to generate coherent and contextually relevant text or understand the meaning of language.
  • Handles Various Natural Language Tasks
    • LLMs can perform multiple language-related tasks, such as:
      • Text classification (e.g., sentiment analysis, topic labeling)
      • Question answering (providing answers based on given context)
      • Dialogue systems (chatbots, virtual assistants)
    • Their versatility is widely viewed as a step toward Artificial General Intelligence (AGI).
  • Uses "Prompting" to Generate Outputs
    • Users interact with LLMs by providing natural language instructions (prompts).
    • The model processes these prompts to generate specific outputs, such as summaries, translations, or creative writing.

Examples of LLMs

  • GPT (Generative Pre-trained Transformer) – Developed by OpenAI (e.g., ChatGPT)
  • BERT (Bidirectional Encoder Representations from Transformers) – Developed by Google
  • Llama – Open-weight models by Meta

Techniques used to train LLMs

1. Autoregressive Language Modeling (AR)

Autoregressive models predict the next word in a sequence based on previous words, generating text left-to-right (or sometimes right-to-left).

How It Works

  • Given a sequence of words (x₁, x₂, ..., xₜ₋₁), the model predicts the next word xₜ.
  • Training maximizes the likelihood of the correct next word.
  • Used in GPT (Generative Pre-trained Transformer) models.
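The idea can be sketched with a toy bigram model: it is not a neural network, but it makes the autoregressive objective concrete. The model estimates P(xₜ | xₜ₋₁) from word-pair counts, and the quantity training would maximize is the sum of log-probabilities of each next word.

```python
import math

# Toy corpus: the "model" learns next-word counts (a bigram language model).
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count left-to-right transitions word -> next word (the autoregressive direction).
counts = {}
for prev, nxt in zip(corpus, corpus[1:]):
    counts.setdefault(prev, {}).setdefault(nxt, 0)
    counts[prev][nxt] += 1

def next_word_probs(prev):
    """Estimate P(x_t | x_{t-1}) from counts."""
    total = sum(counts[prev].values())
    return {word: c / total for word, c in counts[prev].items()}

def sequence_log_likelihood(words):
    """Sum of log P(x_t | x_{t-1}) -- the quantity AR training maximizes."""
    return sum(math.log(next_word_probs(p)[n]) for p, n in zip(words, words[1:]))

# After "the", each of cat/mat/dog/rug appeared once, so each gets 0.25.
print(next_word_probs("the"))
print(sequence_log_likelihood(["the", "cat", "sat"]))
```

A real LLM replaces the count table with a transformer conditioned on the entire prefix, but the training objective is the same next-word likelihood.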

2. Masked Language Modeling (MLM)

Masked language models predict missing (masked) words in a sentence by looking at both left and right context.

How It Works

  • Random words in a sentence are replaced with a [MASK] token.
  • The model predicts the masked word using bidirectional context.
  • Used in BERT (Bidirectional Encoder Representations from Transformers).

Example

  • Original: "The cat sat on the mat."
  • Masked: "The [MASK] sat on the mat."
  • Model predicts: "cat"
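The example above can be mimicked with a deliberately simple stand-in for BERT's bidirectional attention: fill the `[MASK]` by looking at the words on both sides and picking the word most often seen between that pair in a toy corpus.

```python
# Toy "masked language model": fill [MASK] using both left and right context.
corpus = "the cat sat on the mat . the cat ran on the rug .".split()

# Count which word appears between each (left, right) neighbor pair.
counts = {}
for prev, word, nxt in zip(corpus, corpus[1:], corpus[2:]):
    counts.setdefault((prev, nxt), {}).setdefault(word, 0)
    counts[(prev, nxt)][word] += 1

def fill_mask(tokens):
    """Predict the masked word from its bidirectional (left + right) context."""
    i = tokens.index("[MASK]")
    candidates = counts.get((tokens[i - 1], tokens[i + 1]), {})
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

print(fill_mask("the [MASK] sat on the mat .".split()))  # → 'cat'
```

BERT does the same thing in spirit, except the "context" is the whole sentence processed by transformer layers rather than just the two neighboring words.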

Zero-Shot Learning: From Supervised to Self-Supervised Learning

1. Traditional Supervised Learning (Old Way)

Models are trained on labeled datasets where each input (e.g., an image or text) has a corresponding output label.

How it works:

  • Requires a large amount of manually labeled data.
  • Learns a mapping from inputs to predefined categories.
  • Example: A cat vs. dog classifier trained on thousands of labeled images.

Limitations:

  • Needs explicit labels for every task.
  • Struggles with unseen categories (cannot classify a "zebra" if only trained on cats/dogs).
  • Expensive and time-consuming to collect labeled data.

2. Self-Supervised Learning (New Way)

Models learn from unlabeled data by creating their own "supervision" from the input structure.

How it works:

  • Uses pretext tasks (e.g., predicting missing words in a sentence or rotating images to correct orientation).
  • The model learns general representations without human-labeled data.
  • Example: BERT (masked language modeling) or GPT (predicting next word).

Advantages:

  • Reduces dependency on labeled data.
  • Learns universal representations transferable to multiple tasks.
  • Scales well with large datasets (e.g., training on all of Wikipedia).

3. Zero-Shot Learning (Enabled by Self-Supervision)

Zero-shot learning (ZSL) is a technique that allows models to identify and classify new concepts without any labeled examples during training, and to handle tasks they weren't specifically trained for.

Suppose a model is trained to recognize animals but has never been taught about zebras. With ZSL, the model could still figure out what a zebra is from a description. But how? Simplifying a bit (the technical details come later):

  • The model knows about animals and their features, like "has four legs," "lives in the savanna," or "has stripes."
  • It’s given a new description: "A horse-like animal with black and white stripes living in African grasslands."
  • Using its understanding of animal attributes and the provided description, the model can deduce that the image likely represents a zebra, even though it has never seen one before.
  • The model makes this inference by connecting the dots between known animal characteristics and the new description.

How it works:

  • Leverages pre-trained knowledge from self-supervised learning.
  • Uses natural language prompts (instead of labeled data) to guide predictions.
  • Example: Asking an LLM "Is this movie review positive or negative?" without fine-tuning on sentiment data.
  • Key Techniques:
    • Prompt Engineering: Framing tasks as natural language questions (e.g., "Translate 'hello' to French: ___").
    • Semantic Embeddings: Mapping inputs and labels to a shared space (e.g., matching "cat" to an image of a cat using CLIP).
  • Example Models:
    • GPT-3 / ChatGPT (performs tasks via prompts).
    • CLIP (classifies images using text descriptions).
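The semantic-embedding idea behind the zebra example can be sketched in a few lines. This is a hand-built toy, not CLIP: the attribute vectors and keyword "encoder" are invented for illustration, but the mechanism is the same: map the input and every candidate label into a shared space, then pick the nearest label, with no labeled training examples.

```python
import math

# Hand-written label embeddings over four invented attributes:
# (has_stripes, horse_like, lives_in_savanna, domestic_pet)
label_embeddings = {
    "zebra": [1.0, 1.0, 1.0, 0.0],
    "horse": [0.0, 1.0, 0.0, 1.0],
    "cat":   [0.0, 0.0, 0.0, 1.0],
}

def embed_description(text):
    """Crude text 'encoder': map keywords into the same attribute space."""
    words = text.lower()
    return [
        1.0 if "stripes" in words else 0.0,
        1.0 if "horse" in words else 0.0,
        1.0 if "grassland" in words or "savanna" in words else 0.0,
        1.0 if "pet" in words else 0.0,
    ]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms

def zero_shot_classify(description):
    """Return the label whose embedding is closest to the description's."""
    query = embed_description(description)
    return max(label_embeddings, key=lambda lbl: cosine(query, label_embeddings[lbl]))

print(zero_shot_classify(
    "A horse-like animal with black and white stripes living in African grasslands"))
# → 'zebra'
```

CLIP replaces the hand-written attributes with embeddings learned from hundreds of millions of image–text pairs, which is why it generalizes to concepts it was never explicitly labeled with.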

Three Levels of Using LLMs

1. Prompt Engineering (No Coding Needed)

Use natural language instructions ("prompts") to guide LLMs (e.g., ChatGPT, Gemini).

  • Adjust prompts for better outputs (e.g., "Explain like I’m 5").
  • Use techniques like few-shot examples or chain-of-thought prompting.
  • Tools: ChatGPT, Claude, OpenAI Playground.
  • Best for: Quick tasks (Q&A, drafting emails).
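A few-shot prompt is just a string: the "training data" is a handful of examples pasted directly into the prompt, and the model infers the pattern. A minimal sketch of building one (the reviews and labels are invented for illustration):

```python
def build_sentiment_prompt(review):
    """Assemble a few-shot sentiment-classification prompt as plain text."""
    examples = [
        ("The plot was thrilling from start to finish.", "positive"),
        ("I walked out halfway through.", "negative"),
    ]
    lines = ["Classify the sentiment of each movie review."]
    for text, label in examples:
        lines.append(f"Review: {text}\nSentiment: {label}")
    # End with the new review and a trailing cue for the model to complete.
    lines.append(f"Review: {review}\nSentiment:")
    return "\n\n".join(lines)

print(build_sentiment_prompt("A beautiful, moving film."))
```

The resulting text would be sent to any chat or completion API; the ending "Sentiment:" cue nudges the model to answer with a single label.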

2. Model Fine-Tuning (Moderate Technical Skill)

Customize a pre-trained LLM (e.g., GPT-3.5, LLaMA) with your data.

  • Train on domain-specific data (e.g., medical reports, legal contracts).
  • Requires labeled datasets and cloud GPUs.
  • Tools: OpenAI API, Hugging Face Transformers, LoRA.
  • Best for: Specialized tasks (customer support bots, niche content generation).
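Why LoRA (listed above) makes fine-tuning affordable can be shown with a small NumPy sketch: instead of updating the full weight matrix W, it learns a low-rank update B·A, so only a small fraction of the parameters are trainable. The dimensions here are toy values for illustration.

```python
import numpy as np

d, r = 8, 2                      # model dimension, adapter rank (r << d)
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))      # frozen pre-trained weight (not updated)
A = rng.normal(size=(r, d))      # trainable down-projection
B = np.zeros((d, r))             # trainable up-projection (zero init => no change at start)

def adapted_forward(x):
    """Forward pass with the LoRA update applied: (W + B @ A) @ x."""
    return W @ x + B @ (A @ x)

x = rng.normal(size=d)
# Before any training, B is zero, so the adapted model matches the base model.
assert np.allclose(adapted_forward(x), W @ x)

full_params = d * d              # parameters a full fine-tune would update
lora_params = 2 * d * r          # parameters LoRA updates (A and B)
print(f"trainable params: {lora_params} vs full fine-tune: {full_params}")
```

At realistic scales (d in the thousands, r around 8–64) the savings are far larger, which is why LoRA fine-tuning fits on a single GPU where full fine-tuning would not.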

3. Build Your Own LLM (Advanced/Research)

Train an LLM from scratch (e.g., GPT or Gemini).

  • Collect massive datasets (e.g., internet text).
  • Use frameworks like PyTorch/TensorFlow and thousands of GPUs/TPUs.
  • Tools: NVIDIA Megatron, Meta’s LLaMA codebase.
  • Best for: Companies/researchers pushing AI boundaries (e.g., OpenAI, Google DeepMind).

© 2025 Sanjeeb KC. All rights reserved.