Lesson 6.4.2: LoRA & QLoRA


LoRA (Low-Rank Adaptation) and QLoRA (Quantized LoRA) are parameter-efficient fine-tuning (PEFT) methods that leverage low-rank matrix decomposition to adapt large language models (LLMs) with minimal compute. Here’s a deep dive into their mechanics and advantages.

1. Core Mathematical Idea: Low-Rank Adaptation

Problem with Full Fine-Tuning

  • A pretrained LLM has weight matrices $W \in \mathbb{R}^{d \times k}$.
  • Full fine-tuning updates all $d \times k$ parameters → computationally expensive (a rough memory estimate follows this list).
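
To make "computationally expensive" concrete, here is a back-of-the-envelope memory estimate for full fine-tuning. The 7B-parameter model size and the Adam-with-mixed-precision assumptions are illustrative choices, not figures from this lesson:

```python
# Rough memory estimate for fully fine-tuning a 7B-parameter model with Adam.
# Illustrative assumptions: fp16 weights and gradients, fp32 optimizer states.
params = 7e9

weights_gb   = params * 2 / 1e9            # fp16 weights (2 bytes each)
grads_gb     = params * 2 / 1e9            # fp16 gradients
optimizer_gb = params * (4 + 4 + 4) / 1e9  # fp32 master weights + Adam m and v

print(f"weights:   {weights_gb:.0f} GB")
print(f"gradients: {grads_gb:.0f} GB")
print(f"optimizer: {optimizer_gb:.0f} GB")
print(f"total:     {weights_gb + grads_gb + optimizer_gb:.0f} GB (before activations)")
```

That is on the order of 110 GB before activations, far beyond a single consumer GPU, which is exactly the gap LoRA and QLoRA target.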

LoRA

  • Freeze the original weights $W$.
  • Introduce low-rank update matrices $A$ and $B$:
    • $W_{\text{updated}} = W + \Delta W = W + BA$
  • Where:
    • $W \in \mathbb{R}^{d \times k}$: original frozen weight matrix
    • $B \in \mathbb{R}^{d \times r}$: low-rank matrix (trainable)
    • $A \in \mathbb{R}^{r \times k}$: low-rank matrix (trainable)
    • $r$: rank hyperparameter ($r \ll \min(d, k)$)
  • Reduces memory footprint: LoRA applies a low-rank approximation to the weight update matrix $\Delta W$, representing it as the product of two much smaller matrices and drastically reducing the number of trainable parameters (a minimal PyTorch sketch of this decomposition follows the list).
  • Fast fine-tuning: because so few parameters are trained, LoRA offers much faster training than full fine-tuning.
  • Maintains performance: LoRA has been shown to stay close to full fine-tuning quality on many tasks.
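
As referenced above, here is a minimal PyTorch sketch of a LoRA-wrapped linear layer. The class name, the $\alpha / r$ scaling, and the zero-initialization of $B$ follow common practice; treat it as an illustration of the $W + BA$ decomposition, not as this lesson's reference implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear with a trainable low-rank update BA."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)           # freeze W
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)

        d, k = base.out_features, base.in_features
        # B (d x r) and A (r x k), so delta_W = B @ A has the same d x k shape as W
        self.A = nn.Parameter(torch.randn(r, k) * 0.01)  # small random init
        self.B = nn.Parameter(torch.zeros(d, r))         # zero init => delta_W = 0 at start
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # frozen path x W^T (+ b) plus trainable low-rank path x (BA)^T, scaled by alpha / r
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling

# One 4096 x 4096 projection with r = 8:
# full update = 4096 * 4096 ≈ 16.8M params; LoRA update = 8 * (4096 + 4096) = 65,536 params
layer = LoRALinear(nn.Linear(4096, 4096, bias=False), r=8)
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # 65536
```

Initializing $B$ to zero makes $\Delta W = BA = 0$ at the start of training, so the adapted model initially behaves exactly like the frozen base model.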

QLoRA

  • Enhances parameter efficiency: QLoRA takes LoRA a step further by quantizing the frozen base-model weights to lower precision (typically 4-bit NF4 rather than 16-bit), while the trainable LoRA adapter matrices stay in higher precision (e.g., bf16). This further reduces the memory footprint and storage requirements.
  • More memory efficient: QLoRA is even more memory efficient than LoRA, making it ideal for resource-constrained environments.
  • Similar effectiveness: QLoRA has been shown to match LoRA's downstream performance while offering significant memory advantages.
  • $W_{\text{QLoRA}} = \text{Dequantize}(W_{4\text{bit}}) + BA$: during the forward pass, the 4-bit base weights are dequantized on the fly and combined with the higher-precision adapter update $BA$.
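
In practice, QLoRA fine-tuning is commonly set up with the Hugging Face transformers, peft, and bitsandbytes libraries. The sketch below is a minimal, assumed configuration: the model name, rank, alpha, dropout, and target modules are placeholder choices, and exact APIs may differ across library versions.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder; any causal LM works

# 1. Quantize the frozen base model to 4-bit NF4 (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# 2. Attach trainable LoRA adapters (kept in bf16) to the attention projections.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```

Only the LoRA adapter parameters receive gradients; the 4-bit base weights stay frozen and are dequantized on the fly in each forward pass, which is what keeps the memory footprint small.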
