Lesson 3.4: Pre-trained Language Models - BERT
Pre-trained Model
A pre-trained model is a deep learning model (like BERT) that has already been trained on a large corpus of unlabeled text using self-supervised learning. The model learns general language representations by predicting masked words (masked language modeling) and by predicting whether one sentence follows another (next sentence prediction), so no human-labeled data is needed; see the masked-prediction sketch after the list below.
- Example: BERT is pre-trained on Wikipedia and BookCorpus.
- Purpose: The model captures general language patterns (syntax, semantics, context).
- Advantage: Saves time and resources compared to training from scratch.
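Below is a minimal sketch of what "predicting masked words" looks like in practice. It assumes the Hugging Face Transformers library and the bert-base-uncased checkpoint (neither is specified in the lesson text), and the example sentence is purely illustrative.

```python
# Masked-word prediction with a pre-trained BERT (sketch; assumes Hugging Face Transformers).
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

# Mask one word and let the pre-trained model fill it in.
text = f"The capital of France is {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and take the highest-scoring vocabulary entry there.
mask_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))  # typically prints "paris"
```

Note that no labels were provided anywhere: the "label" is simply the word that was masked out, which is why this kind of training scales to huge unlabeled corpora like Wikipedia and BookCorpus.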
Fine-tuned Model
A fine-tuned model takes a pre-trained model and further trains it on a specific downstream task (like text classification) using supervised learning (labeled data).
- Process (sketched in code at the end of this section):
  - Start with the pre-trained weights (e.g., BERT’s weights).
  - Add a task-specific layer (e.g., a classifier head for sentiment analysis).
  - Train the entire model (or only parts of it) on labeled data for the target task.
- Example: Fine-tuning BERT on IMDb movie reviews for sentiment classification.
- Why it helps: The model adapts its general knowledge to the specifics of the task, improving performance with far less labeled data than training from scratch.
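The sketch below shows the fine-tuning recipe from the list above: load the pre-trained encoder, attach a classification head, and train on labeled examples. It again assumes Hugging Face Transformers and bert-base-uncased; the two sentences, their labels, the learning rate, and the epoch count are made-up illustrations, not values from the lesson.

```python
# Fine-tuning sketch: pre-trained BERT + classification head on toy labeled data.
import torch
from torch.optim import AdamW
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# num_labels=2 attaches a randomly initialised classifier head on top of the pre-trained encoder.
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Toy labeled examples (a real setup would use a dataset such as IMDb).
texts = ["A wonderful, moving film.", "Dull plot and wooden acting."]
labels = torch.tensor([1, 0])  # 1 = positive, 0 = negative

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

optimizer = AdamW(model.parameters(), lr=2e-5)  # small LR is typical for fine-tuning
model.train()
for epoch in range(3):  # a real run would loop over many mini-batches per epoch
    outputs = model(**batch, labels=labels)
    loss = outputs.loss  # cross-entropy over the two sentiment classes
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```

In practice the loop would iterate over batches drawn from the labeled dataset (e.g., IMDb reviews via a DataLoader), and the fine-tuned model would then be evaluated on held-out reviews; the key point is that all encoder weights start from pre-training rather than random initialization.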