Getting started with transformer models
The AutoModel class instantiates a model from an existing checkpoint. It is essentially a simple wrapper over the wide range of model classes in the library: it automatically infers the appropriate architecture for a given checkpoint and builds a model with that architecture.
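For example, a minimal sketch that loads a checkpoint through AutoModel (using bert-base-cased, the same checkpoint used later in this article):

from transformers import AutoModel
# AutoModel reads the checkpoint's configuration, picks the matching
# architecture (BERT here), and loads the pretrained weights
model = AutoModel.from_pretrained("bert-base-cased")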
Creating a Transformer
You can also use an architecture-specific class (for example, BertModel) directly instead of the AutoModel wrapper class.
Initialize the model
Initialize a BERT model by first building a configuration object:
from transformers import BertConfig, BertModel
# Building the config
config = BertConfig()
# Building the model from the config
model = BertModel(config)
The configuration for BERT contains attributes such as the following:
BertConfig {
[...]
"hidden_size": 768,
"intermediate_size": 3072,
"max_position_embeddings": 512,
"num_attention_heads": 12,
"num_hidden_layers": 12,
[...]
}
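These attributes can be overridden when building the config. As a sketch, the values below are purely illustrative and produce a smaller, randomly initialized BERT:

# Hypothetical smaller configuration; values chosen only for illustration
config = BertConfig(num_hidden_layers=6, hidden_size=384, num_attention_heads=6)
model = BertModel(config)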
Load the model
A. Using default configuration
Creating a model from the default configuration initializes it with random values:
from transformers import BertConfig, BertModel
config = BertConfig()
model = BertModel(config)
# Model is randomly initialized!
The model is usable in this state, but it will output gibberish: it needs to be trained first. We could train it from scratch on the task at hand, but that would require a lot of time and data. To avoid needless and redundant work, it is crucial to be able to share and reuse trained models.
B. Loading a pretrained model
Load a pretrained model using the from_pretrained() method:
from transformers import BertModel
model = BertModel.from_pretrained("bert-base-cased")
You can also replace BertModel with the equivalent AutoModel class to write checkpoint-agnostic code, as in the sketch near the start of this article.
The model is now initialized with all of the checkpoint’s weights. It can be used directly for inference on the tasks it was trained on, and it can also be fine-tuned on a new task. Starting from pretrained weights rather than from scratch lets us achieve good results quickly.
Save the model
A model can be saved using the save_pretrained() method as shown below:
model.save_pretrained("directory_on_my_computer")
Two files will be saved: config.json contains the attributes needed to rebuild the model architecture, while pytorch_model.bin contains the model’s weights.
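A model saved this way can later be reloaded from the same directory with from_pretrained(); a minimal sketch:

from transformers import BertModel
# Rebuilds the architecture from config.json and loads pytorch_model.bin
model = BertModel.from_pretrained("directory_on_my_computer")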
Making predictions using a transformer model
First, encode the input text with a tokenizer and convert the resulting token IDs to a tensor:
import torch
# Token IDs produced by a tokenizer; the values here are illustrative
encoded_sequences = [[101, 8667, 106, 102]]
model_inputs = torch.tensor(encoded_sequences)
Then pass the encoded input to the model to run inference:
output = model(model_inputs)
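The base model returns final hidden states rather than task-specific predictions. As a quick sanity check (a sketch; exact sizes depend on the input and checkpoint):

# Shape: (batch_size, sequence_length, hidden_size), e.g. (1, 4, 768) here
print(output.last_hidden_state.shape)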