
Transformers

Transformers are language models

All transformer models are language models trained on large amounts of raw text in a self-supervised fashion. On its own, such a pretrained model is not very useful for specific practical tasks; instead, transfer learning is applied and the pretrained model is fine-tuned in a supervised way. Two common pretraining objectives, both illustrated in the short sketch below, are:

Causal language modeling: predicting the next word in a sequence given the words that precede it.

Masked language modeling: predicting a masked word in a sentence.
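
A minimal sketch of both objectives using the Hugging Face transformers pipeline API (gpt2 and bert-base-uncased are just example checkpoints of a causal and a masked language model, respectively):

```python
from transformers import pipeline

# Causal language modeling: generate a continuation by repeatedly
# predicting the next token (GPT-2 is a decoder-only causal LM).
generator = pipeline("text-generation", model="gpt2")
print(generator("Transformers are language models that", max_new_tokens=20))

# Masked language modeling: predict the token hidden behind [MASK]
# (BERT is an encoder-only masked LM).
unmasker = pipeline("fill-mask", model="bert-base-uncased")
print(unmasker("Transformers are [MASK] models trained on raw text."))
```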

Transfer Learning

Pretraining is the act of training a model from scratch: the weights are randomly initialized, and the training starts without any prior knowledge.

Fine-tuning, on the other hand, is the training done after a model has been pretrained. The fine-tuning will only require a limited amount of data: the knowledge the pretrained model has acquired is “transferred,” hence the term transfer learning.
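
A minimal fine-tuning sketch with the Trainer API (the checkpoint bert-base-uncased and the GLUE SST-2 dataset are illustrative choices, not prescribed ones):

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Start from pretrained weights instead of a random initialization;
# only the small classification head on top is newly initialized.
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# A limited amount of labeled data is enough because the language
# knowledge is transferred from pretraining.
dataset = load_dataset("glue", "sst2")
tokenized = dataset.map(lambda ex: tokenizer(ex["sentence"], truncation=True), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments("finetuned-sst2", num_train_epochs=1),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    tokenizer=tokenizer,
)
trainer.train()
```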

General architecture

Encoder (left): The encoder receives an input and builds a representation of it (its features). This means that the model is optimized to acquire understanding from the input.

Decoder (right): The decoder uses the encoder’s representation (features) along with other inputs to generate a target sequence. This means that the model is optimized for generating outputs.

Encoder-only models: Good for tasks that require understanding of the input, such as sentence classification and named entity recognition.

Decoder-only models: Good for generative tasks such as text generation.

Encoder-decoder (sequence-to-sequence) models: Good for generative tasks that require an input, such as translation or summarization.
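
In the transformers library these three families map onto different Auto classes; a rough sketch (the checkpoints named here are simply common examples of each family):

```python
from transformers import AutoModel, AutoModelForCausalLM, AutoModelForSeq2SeqLM

# Encoder-only: builds bidirectional representations of the input,
# suited for classification and token-level tasks.
encoder_only = AutoModel.from_pretrained("bert-base-uncased")

# Decoder-only: generates text left to right, one token at a time.
decoder_only = AutoModelForCausalLM.from_pretrained("gpt2")

# Encoder-decoder (sequence-to-sequence): encodes an input sequence,
# then generates a new target sequence, e.g. a translation or summary.
seq2seq = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
```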

The original architecture
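
The original Transformer was an encoder-decoder model designed for translation. A minimal sketch of that shape using PyTorch's nn.Transformer (token embeddings and positional encodings are omitted; the dimensions follow the usual 512-dimensional, 6-layer, 8-head configuration):

```python
import torch
import torch.nn as nn

# Encoder-decoder Transformer in the shape of the original model:
# 6 encoder layers, 6 decoder layers, model dimension 512, 8 attention heads.
model = nn.Transformer(d_model=512, nhead=8,
                       num_encoder_layers=6, num_decoder_layers=6)

# src: the input sequence fed to the encoder (e.g. the source sentence).
# tgt: the target sequence fed to the decoder while generating the output.
src = torch.rand(10, 32, 512)   # (source length, batch size, d_model)
tgt = torch.rand(20, 32, 512)   # (target length, batch size, d_model)
out = model(src, tgt)           # (target length, batch size, d_model)
print(out.shape)
```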

Task: Write an article in your own words on Transformer models
