Lesson 3.4: Pre-trained Language Models - BERT


Pre-trained Model

A pre-trained model is a deep learning model (like BERT) that has already been trained on a large corpus of unlabeled text using self-supervised learning. The model learns general language representations by predicting masked words (masked language modeling) and by predicting whether one sentence actually follows another (next sentence prediction), so no human-labeled data is required. A minimal sketch of using a pre-trained BERT to fill in a masked word follows the list below.

  • Example: BERT is pre-trained on Wikipedia and BookCorpus.
  • Purpose: The model captures general language patterns (syntax, semantics, context).
  • Advantage: Saves time and resources compared to training from scratch.
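The snippet below is a minimal sketch of masked language modeling with a pre-trained BERT, assuming the Hugging Face transformers library and the publicly available "bert-base-uncased" checkpoint; the example sentence is made up for illustration.

```python
# Sketch: use a pre-trained BERT to predict a masked word from context.
# Assumes `transformers` and `torch` are installed.
from transformers import AutoTokenizer, AutoModelForMaskedLM
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

# Mask one word and let the pre-trained model fill it in.
text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and take the highest-scoring vocabulary token.
mask_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))  # likely prints something like "paris"
```

Because the model was pre-trained on large general-domain corpora (Wikipedia and BookCorpus), it can make this prediction without any task-specific training.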

Fine-tuned Model

A fine-tuned model takes a pre-trained model and further trains it on a specific downstream task (like text classification) using supervised learning (labeled data).

  • Process:
    • Start with the pre-trained weights (e.g., BERT’s weights).
    • Add a task-specific layer (e.g., a classifier head for sentiment analysis).
    • Train the entire model (or parts) on labeled data for the target task.
  • Example: Fine-tuning BERT on IMDb movie reviews for sentiment classification (a minimal sketch follows this list).
  • Why: The model adapts its general language knowledge to the specifics of the task, so it reaches good performance with far less labeled data than training from scratch.
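The sketch below illustrates the fine-tuning setup: pre-trained BERT weights plus a randomly initialized classification head, trained with supervised labels. It assumes the Hugging Face transformers library; the two sentences and labels are made up stand-ins for a real dataset such as IMDb.

```python
# Sketch: fine-tune a pre-trained BERT with a 2-class classification head.
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # pre-trained encoder + new classifier head
)

# Toy labeled batch (1 = positive, 0 = negative); a real run would use IMDb.
texts = ["A wonderful, moving film.", "Painfully dull from start to finish."]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few gradient steps on the toy batch
    outputs = model(**batch, labels=labels)  # loss is computed internally
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"loss: {outputs.loss.item():.4f}")
```

Note that all pre-trained weights are updated here; freezing part of the encoder and training only the head is a common lighter-weight alternative.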

© 2025 Sanjeeb KC. All rights reserved.