Lesson 6.3: LLM Prompting


LLM prompting refers to techniques that use structured input instructions (prompts) to guide large language models (LLMs) in generating desired outputs. By leveraging pretrained knowledge, prompts enable tasks like translation, reasoning, and classification.

LLM Prompting Methods

Large Language Models (LLMs) can be guided to perform tasks using different prompting techniques, each suited for varying levels of complexity and available examples.

1. Zero-Shot Prompting:

In zero-shot prompting, the model is given a task without any prior examples, relying solely on its pre-trained knowledge. The prompt is a direct instruction, and the model generates a response based on general understanding.

  • Example:
    • Prompt: "Translate 'Hello' to French."
    • Output: "Bonjour." Here, the model wasn’t shown any translation examples but still produces the correct output.

2. Few-Shot Prompting:

Few-shot prompting provides the model with a small number of examples (typically 2–5) to help it understand the task before asking for the desired output. This improves accuracy for niche or complex tasks.

  • Example:
    • Prompt: "Example 1: 'The movie was great!' → Positive
    • Example 2: 'I hated the book.' → Negative
    • Now classify: 'The concert was amazing!'"
    • Output: "Positive." The examples help the model recognize the sentiment analysis pattern.
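The few-shot format above can be assembled programmatically. Here is a minimal sketch of a prompt builder (the `few_shot_prompt` helper and its `Text: … -> label` layout are illustrative choices, not a standard API):

```python
def few_shot_prompt(examples, query):
    """Build a few-shot classification prompt from labeled examples."""
    lines = [f"Text: {text} -> {label}" for text, label in examples]
    # The unanswered final line cues the model to complete the pattern.
    lines.append(f"Text: {query} ->")
    return "\n".join(lines)

examples = [("The movie was great!", "Positive"),
            ("I hated the book.", "Negative")]
print(few_shot_prompt(examples, "The concert was amazing!"))
```

The resulting string is what gets sent to the model; the consistent `->` delimiter is what teaches it the expected output format.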

3. Chain-of-Thought (CoT) Prompting

Chain-of-Thought prompting encourages the model to break down problems step-by-step, mimicking human reasoning. This is especially useful for math, logic, or multi-step reasoning tasks.

  • Example:
    • Prompt: "If a pizza is cut into 8 slices and John eats 3, how many are left? Show your reasoning."
    • Output: "Total slices = 8. Slices eaten = 3. Remaining slices = 8 - 3 = 5." By explicitly requesting reasoning, the model provides a transparent and accurate solution.
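In its simplest form, CoT prompting just appends an explicit reasoning cue to the question. A minimal sketch (the cue phrase is one common choice; variants like "Show your reasoning." work similarly):

```python
def cot_prompt(question: str) -> str:
    # The trailing cue nudges the model to emit intermediate steps
    # instead of jumping straight to the final answer.
    return f"{question}\nLet's think step by step."

print(cot_prompt("If a pizza is cut into 8 slices and John eats 3, how many are left?"))
```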

4. Self-Consistency Prompting

Self-consistency improves upon Chain-of-Thought (CoT) by generating multiple reasoning paths and selecting the most consistent answer. Instead of relying on a single output, the model produces several reasoning chains (e.g., via sampling) and picks the answer that appears most frequently. This reduces errors from fluke responses.

  • Example (Math Problem):
    • Prompt: "A store has 15 apples. If 3 are sold and 5 more are delivered, how many apples are there? Show step-by-step reasoning."
    • Model Generates 3 Reasoning Paths:
      • "15 - 3 = 12; 12 + 5 = 17."
      • "15 + 5 = 20; 20 - 3 = 17."
      • "15 - 3 + 5 = 17."
      • Final Answer: "17" (chosen by majority vote).
    • Use Case: Math, logic puzzles, or any task where multiple valid reasoning paths exist.
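The majority-vote step of self-consistency is straightforward to implement once the sampled chains are in hand. A sketch, assuming each sampled chain has already been parsed into a final `answer` string:

```python
from collections import Counter

def self_consistent_answer(chains):
    """Pick the answer that the most reasoning chains agree on."""
    answers = [chain["answer"] for chain in chains]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes

# Three sampled reasoning paths for the apple problem above.
chains = [
    {"reasoning": "15 - 3 = 12; 12 + 5 = 17.", "answer": "17"},
    {"reasoning": "15 + 5 = 20; 20 - 3 = 17.", "answer": "17"},
    {"reasoning": "15 - 3 + 5 = 17.", "answer": "17"},
]
print(self_consistent_answer(chains))  # ('17', 3)
```

In practice the chains come from sampling the model several times at a nonzero temperature; only the vote-counting is deterministic.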

5. Generated Knowledge Prompting

This method separates knowledge retrieval from reasoning. First, the LLM generates relevant facts or context about the problem, then uses that knowledge to answer the question. It’s useful for tasks requiring external knowledge not explicitly in the prompt.

  • Example (Fact-Based QA):
    • Step 1: Knowledge Generation
      • Prompt: "Generate facts about the causes of the French Revolution."
    • Model Outputs:
      • "Economic inequality and high taxes."
      • "Influence of Enlightenment ideas."
      • "Poor harvests and food shortages."
    • Step 2: Answer Synthesis
      • Prompt: "Using the above facts, explain why the French Revolution began."
      • Final Answer: "The French Revolution began due to economic inequality, Enlightenment ideas challenging monarchy, and food shortages."
    • Use Case: Open-domain QA, essay writing, or complex explanations.
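The two-step pipeline can be sketched as two chained model calls. Here `stub_llm` is a hypothetical stand-in for a real LLM API, included only so the example runs:

```python
def generated_knowledge_answer(question, llm):
    """Step 1: elicit facts. Step 2: answer using those facts."""
    facts = llm(f"Generate facts about: {question}")
    return llm(f"Facts:\n{facts}\n\nUsing the facts above, answer: {question}")

# Stub model for illustration; swap in a real LLM API call.
def stub_llm(prompt):
    if prompt.startswith("Generate facts"):
        return ("- Economic inequality and high taxes\n"
                "- Influence of Enlightenment ideas\n"
                "- Poor harvests and food shortages")
    return ("The Revolution began due to economic inequality, "
            "Enlightenment ideas, and food shortages.")

print(generated_knowledge_answer("Why did the French Revolution begin?", stub_llm))
```

The key design point is that the second prompt contains the generated facts verbatim, so the answer is grounded in explicitly stated knowledge rather than left implicit.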

6. ReAct Prompting (Reasoning + Action)

ReAct (Reasoning + Action) is a prompting framework that enables Large Language Models (LLMs) to think step-by-step (reasoning) and interact with external tools (action) to solve complex tasks. It mimics human problem-solving by dynamically deciding when to "think" and when to "act" (e.g., search the web, call APIs, or fetch data).

  • How ReAct Works
    • Reasoning (Chain-of-Thought):
      • The model breaks down the problem into logical steps.
      • Example: "To answer this, I first need to find the population of Tokyo."
    • Action (Tool Use)
      • The model triggers external tools (e.g., Google Search, calculators, databases) to gather real-time information.
      • Example: [Search] "Current population of Tokyo 2024"
    • Observation (Process Results)
      • The tool returns data (e.g., "Tokyo population: 37.8 million"), which the model uses to refine its reasoning.
    • Final Answer
      • Combines reasoning + observed data into a coherent response.
  • Example: ReAct in Action
    • Task: "Who is older: Elon Musk or the CEO of Microsoft?"
    • ReAct Steps:
      • Reasoning: "To compare ages, I need Elon Musk's age and the current Microsoft CEO's age."
      • Action: [Search] "Elon Musk age" → "Born June 28, 1971 (age 52)"
      • Action: [Search] "Current Microsoft CEO" → "Satya Nadella, born August 19, 1967 (age 56)"
      • Observation: "Satya Nadella (56) is older than Elon Musk (52)."
      • Answer: "Satya Nadella is older."
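The Thought → Action → Observation loop can be sketched as a small controller. In real ReAct the model itself emits the thoughts and tool calls and a parser dispatches them; here the plan is scripted and `search` is a mock lookup table, so the example is runnable without any external API:

```python
def react_agent(question, tools, plan):
    """Alternate reasoning steps with tool calls, collecting observations."""
    trace = []
    for thought, tool_name, tool_input in plan:
        trace.append(("Thought", thought))
        observation = tools[tool_name](tool_input)  # act, then observe
        trace.append(("Observation", observation))
    return trace

# Mock search tool standing in for a real web-search API.
facts = {"Elon Musk age": "Born June 28, 1971",
         "Current Microsoft CEO": "Satya Nadella, born August 19, 1967"}
tools = {"search": lambda q: facts.get(q, "no result")}

plan = [("I need Elon Musk's birth date.", "search", "Elon Musk age"),
        ("I need the Microsoft CEO's birth date.", "search", "Current Microsoft CEO")]

for step, text in react_agent("Who is older?", tools, plan):
    print(f"{step}: {text}")
```

The trace of interleaved thoughts and observations is exactly what gets fed back into the model's context so it can produce the final answer.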

7. Tree of Thought (ToT) Prompting:

Tree of Thought (ToT) is an advanced prompting framework that enables LLMs to explore multiple reasoning paths in parallel—like branches of a tree—before selecting the best solution. Unlike Chain-of-Thought (linear reasoning) or ReAct (sequential actions), ToT mimics human brainstorming by:

  • Generating diverse hypotheses (branches).

  • Evaluating intermediate steps (e.g., via scoring or backtracking).

  • Pruning weak paths to focus on optimal solutions.

  • How ToT Works

    • Step 1: Thought Generation
      • The model creates multiple possible approaches to a problem.
      • Example Task: "Plan a 3-day trip to Paris for a history buff."
      • Branches:
        • Day 1: Louvre → Day 2: Versailles → Day 3: Catacombs.
        • Day 1: Musée d’Orsay → Day 2: Latin Quarter → Day 3: Arc de Triomphe.
    • Step 2: Thought Evaluation
      • Each branch is scored for feasibility, interest, or constraints (e.g., travel time).
        • Branch 1: "Versailles is far from central Paris; may waste time."
        • Branch 2: "Latin Quarter is walkable; higher overall score."
    • Step 3: Search Algorithm
      • The model uses breadth-first or depth-first search to:
      • Expand promising branches (e.g., add detailed activities).
      • Prune weak ones (e.g., drop inefficient routes).
    • Step 4: Final Output
      • The best path is synthesized into a coherent answer:
      • "Day 1: Musée d’Orsay → Day 2: Latin Quarter (bookstores + Panthéon) → Day 3: Arc de Triomphe + WWII museums."
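The generate–evaluate–prune loop above is essentially a beam search over partial plans. A toy sketch, where the hand-written `site_score` dictionary stands in for the LLM's branch evaluations:

```python
def tree_of_thought(root, expand, score, beam_width=2, depth=3):
    """Beam search over partial plans: expand, score, and prune each level."""
    frontier = [root]
    for _ in range(depth):
        candidates = [child for node in frontier for child in expand(node)]
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]  # prune weak branches
    return max(frontier, key=score)

# Toy interest scores standing in for an LLM's evaluations of each branch.
site_score = {"Musée d'Orsay": 3, "Louvre": 3, "Versailles": 1,
              "Latin Quarter": 3, "Catacombs": 2, "Arc de Triomphe": 2}

def expand(plan):
    # One new site per day, never repeating a site.
    return [plan + (site,) for site in site_score if site not in plan]

def score(plan):
    return sum(site_score[site] for site in plan)

best = tree_of_thought(tuple(), expand, score, beam_width=2, depth=3)
print(best, score(best))
```

In a real ToT system the `expand` and `score` functions are themselves LLM calls (propose next thoughts; rate each partial path); the search skeleton stays the same.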

LLM Prompting Comparison

| Method | Strengths | Weaknesses | Use Cases |
| --- | --- | --- | --- |
| Zero-Shot | Fast deployment, no data needed | Struggles with complexity | Simple Q&A, translation |
| Few-Shot | Clarifies task formats | Bias from limited examples | Domain-specific classification |
| CoT | Transparent reasoning, higher accuracy | Computationally intensive | Math, logic puzzles |
| Self-Consistency | Reduces random errors | Slower due to multiple paths | Scientific calculations |
| Multimodal CoT | Integrates visual/text data | Requires multimodal training data | Chart analysis, VQA |

Challenges of LLM Prompting & Their Solutions

1. Long Prompt Complexity

  • Problem: Overly lengthy or complex prompts confuse the model, leading to incoherent or incomplete responses.
  • Solution: Hierarchical Prompting
    • Break tasks into smaller subtasks with clear intermediate steps.
    • Example:
      • Instead of: "Write a detailed 500-word article about climate change impacts on agriculture, including case studies."
      • Use:
        • "List 3 key impacts of climate change on agriculture."
        • "For each impact, provide a real-world case study."
        • "Combine the above into a 500-word article."
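The subtask chain above can be sketched as a loop that feeds each step's output into the next prompt. `stub_llm` is a hypothetical stand-in for a real model call, included only so the example runs:

```python
def run_hierarchically(subtasks, llm):
    """Chain subtasks: each prompt includes the previous step's output."""
    result = ""
    for task in subtasks:
        prompt = f"{result}\n\nTask: {task}".strip()
        result = llm(prompt)
    return result

# Stub model for illustration; swap in a real LLM API call.
def stub_llm(prompt):
    return f"[response to: {prompt.splitlines()[-1]}]"

subtasks = ["List 3 key impacts of climate change on agriculture.",
            "For each impact, provide a real-world case study.",
            "Combine the above into a 500-word article."]
print(run_hierarchically(subtasks, stub_llm))
```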

2. Ambiguity in Prompts

  • Problem: Vague prompts result in off-target outputs (e.g., "Explain AI" could yield oversimplified or irrelevant answers).
  • Solution: Reinforcement Learning from Human Feedback (RLHF)
    • Fine-tune the model using human feedback to align outputs with intent.
  • Example:
    • Humans rank responses to "Explain AI to a 5-year-old" vs. "Explain AI to a CEO."
    • The model learns to adjust depth/tone based on implicit cues.

3. Overfitting to Prompts

  • Problem: Models memorize narrow patterns from training data, failing to generalize.
  • Solution: Prompt Diversification
    • Train with diverse, domain-mixed examples to improve adaptability.
  • Example:
    • For a medical QA model, mix prompts like:
    • "What causes fever?" (General)
    • "Describe malaria pathogenesis." (Specialized)
    • "Explain fever to a child." (Simplified)

4. Ethical Bias

  • Problem: Models amplify stereotypes (e.g., gender/racial biases in hiring-related prompts).
  • Solution: Debiasing Techniques
    • Apply fairness constraints (e.g., demographic parity) during training/inference.
  • Example:
    • For "Generate a list of scientists," the model is penalized if outputs skew >70% toward one gender/race.

5. Hallucinations (Fabrications)

  • Problem: Models generate false or unverified claims (e.g., fake citations).
  • Solution: Retrieval-Augmented Generation (RAG) + Fact-Checking
  • RAG: Fetch relevant documents (e.g., Wikipedia) to ground responses.
  • Fact-Checking: Cross-reference outputs against trusted sources.
  • Example:
    • Prompt: "When was the first moon landing?"
    • RAG retrieves NASA’s "July 20, 1969" record.
    • Model cites this instead of guessing.
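The retrieval half of RAG can be illustrated with a toy word-overlap retriever (real systems use vector embeddings, but the grounding idea is the same; all names here are illustrative):

```python
import re

def tokens(text):
    """Lowercase word set, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, k=1):
    """Rank documents by word overlap with the query (toy retriever)."""
    q = tokens(query)
    return sorted(documents, key=lambda d: len(q & tokens(d)), reverse=True)[:k]

docs = ["The first crewed Moon landing, Apollo 11, occurred on July 20, 1969.",
        "The Eiffel Tower was completed in 1889."]
question = "When was the first moon landing?"
context = retrieve(question, docs)[0]

# The retrieved passage is injected into the prompt so the model
# answers from the source instead of guessing from memory.
prompt = f"Context: {context}\nQuestion: {question}\nAnswer using only the context."
print(prompt)
```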

© 2025 Sanjeeb KC. All rights reserved.