Pipeline functions
Let’s see what happens when we run sentiment analysis using the pipeline() function.
```python
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier(
    [
        "I've been waiting for a HuggingFace course my whole life.",
        "I hate this so much!",
    ]
)
```
Stages of the pipeline function
A pipeline function has three stages: Tokenizer, Model, and Post-processing.
Tokenizer Stage
- Text is split into tokens.
- The tokenizer adds special tokens such as [CLS] and [SEP].
- The tokenizer maps each token to its unique ID in the vocabulary of the pre-trained model.
The AutoTokenizer class from Hugging Face is used here.
```python
from transformers import AutoTokenizer

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
```
- The tokenizer can apply padding and truncation to create tensors of the same length.
```python
raw_inputs = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")
print(inputs)
```
Model stage:
- Downloads the configuration of the model as well as its pre-trained weights.
- The AutoModel class loads a model without its pretraining head, which means it returns a high-dimensional tensor that is a representation of the sentences but is not directly useful for a classification task.
```python
from transformers import AutoModel

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModel.from_pretrained(checkpoint)
```
```python
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```
- Use AutoModelForSequenceClassification for the classification task. This returns the logits.
```python
from transformers import AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
outputs = model(**inputs)
```
Postprocessing stage:
- Apply a softmax to transform the logits into probabilities.
```python
import torch

predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
print(predictions)
```
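To see what the softmax is doing numerically, here is a minimal pure-Python version of the same transformation. The logit values below are illustrative, not actual model output:

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability, then normalize exponentials.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative logits for one sentence: a large positive second logit
# turns into a probability close to 1 for the second class.
probs = softmax([-4.3, 4.7])
```

The outputs always sum to 1, which is what lets us read them as class probabilities.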
- Use the model config's id2label mapping to convert the predicted class IDs into human-readable labels.
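A minimal sketch of that last step, assuming this checkpoint's id2label mapping is {0: "NEGATIVE", 1: "POSITIVE"} (the probability values below are made up for illustration):

```python
# Assumed mapping, mirroring what model.config.id2label would hold
# for a binary sentiment checkpoint.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

# Fabricated post-softmax probabilities for two sentences.
predictions = [[0.0002, 0.9998], [0.9995, 0.0005]]

# Take the argmax of each row, then look up the label name.
labels = [id2label[max(range(len(p)), key=p.__getitem__)] for p in predictions]
```

In real code the mapping comes from model.config.id2label rather than being written by hand.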
Reference
Hugging Face