Hugging Face Pipelines

Deep Learning with Hugging Face Pipelines

Natural Language Processing (NLP)

  • Text classification: rating reviews, detecting spam, judging whether a sentence is grammatically correct, deciding whether two sentences are logically related
  • Word classification within a sentence: identifying parts of speech (noun, verb, adjective) or named entities (person, place, organization)
  • Text generation: completing an input text with automatically generated text, filling in blanks
  • Information extraction from text: given a question and a context, extracting the answer to the question from the context
  • Text transformation: translating a text into another language, summarizing a text

Various NLP tasks can be performed with the pipelines provided by Hugging Face (https://huggingface.co/).

  • sentiment-analysis
  • zero-shot-classification
  • text-generation
  • fill-mask
  • ner (named entity recognition)
  • question-answering
  • summarization
  • translation

The basic usage is simple: pass one of the strings above to the task argument of pipeline, then call the resulting instance on a string.

from transformers import pipeline

Sentiment Analysis

Returns whether the given text is POSITIVE or NEGATIVE.

classifier = pipeline("sentiment-analysis")
classifier("We are very happy to show you the 🤗 Transformers library.")
No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
[{'label': 'POSITIVE', 'score': 0.9997795224189758}]
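
The classifier also accepts a list of strings and returns one result per input. A minimal sketch (not run here; the second sentence is an added example):

classifier([
    "We are very happy to show you the 🤗 Transformers library.",
    "We hope you don't hate it.",  # additional example sentence
])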

Zero-Shot Classification

Classifies the given text without having seen any labeled examples. The list of labels to classify against is passed via the candidate_labels argument.

classifier2 = pipeline("zero-shot-classification")
classifier2(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)
No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.
{'sequence': 'This is a course about the Transformers library',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.8445950150489807, 0.11197729408740997, 0.0434277318418026]}

Text Generation

Generates a continuation of the given text.

generator = pipeline("text-generation")
generator("In this course, we will teach you how to")
No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
[{'generated_text': 'In this course, we will teach you how to run a database with Nginx and PHP. We first take a look at how to run PHP and Nginx together. Then we will use an example MySQL database to create a database. In the same'}]

The model argument of pipeline can be used to specify which model to use. Choose a suitable model from https://huggingface.co/models.

You can also set the maximum number of tokens with max_length and the number of generated sequences with num_return_sequences.

generator = pipeline("text-generation", model="distilgpt2")
generator(
    "In this course, we will teach you how to",
    max_length=30,
    num_return_sequences=2,
)
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
[{'generated_text': 'In this course, we will teach you how to make mistakes as well as avoid them all because they cost you money, and why it makes good money'},
 {'generated_text': 'In this course, we will teach you how to understand the best, most effective and most effective ways to perform the work of the American people. These'}]

Fill-Mask

Fills in the <mask> token in the given text with a word to form a complete sentence. The top_k argument specifies how many candidate fillings to return.

unmasker = pipeline("fill-mask")
No model was supplied, defaulted to distilroberta-base and revision ec58a5b (https://huggingface.co/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
unmasker("This course will teach you all about <mask> models.", top_k=2)
[{'score': 0.1961977630853653,
  'token': 30412,
  'token_str': ' mathematical',
  'sequence': 'This course will teach you all about mathematical models.'},
 {'score': 0.04052729532122612,
  'token': 38163,
  'token_str': ' computational',
  'sequence': 'This course will teach you all about computational models.'}]

Named Entity Recognition

Named entity recognition (the ner task) extracts entities such as persons (PER), locations (LOC), and organizations (ORG) from text.

Setting the grouped_entities argument to True merges the tokens belonging to the same entity in the output.

ner = pipeline("ner", grouped_entities=True)
ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")
No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
[{'entity_group': 'PER',
  'score': 0.9981694,
  'word': 'Sylvain',
  'start': 11,
  'end': 18},
 {'entity_group': 'ORG',
  'score': 0.9796021,
  'word': 'Hugging Face',
  'start': 33,
  'end': 45},
 {'entity_group': 'LOC',
  'score': 0.9932106,
  'word': 'Brooklyn',
  'start': 49,
  'end': 57}]

Question Answering

Given a question via question and a passage via context, returns the answer extracted from the context together with its start and end positions.

question_answerer = pipeline("question-answering")
question_answerer(
    question="Where do I work?",
    context="My name is Sylvain and I work at Hugging Face in Brooklyn",
)
No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
{'score': 0.6949763894081116, 'start': 33, 'end': 45, 'answer': 'Hugging Face'}

Summarization

Returns a summary of the given text.

summarizer = pipeline("summarization")
summarizer(
    """
    America has changed dramatically during recent years. Not only has the number of
    graduates in traditional engineering disciplines such as mechanical, civil,
    electrical, chemical, and aeronautical engineering declined, but in most of
    the premier American universities engineering curricula now concentrate on
    and encourage largely the study of engineering science. As a result, there
    are declining offerings in engineering subjects dealing with infrastructure,
    the environment, and related issues, and greater concentration on high
    technology subjects, largely supporting increasingly complex scientific
    developments. While the latter is important, it should not be at the expense
    of more traditional engineering.

    Rapidly developing economies such as China and India, as well as other
    industrial countries in Europe and Asia, continue to encourage and advance
    the teaching of engineering. Both China and India, respectively, graduate
    six and eight times as many traditional engineers as does the United States.
    Other industrial countries at minimum maintain their output, while America
    suffers an increasingly serious decline in the number of engineering graduates
    and a lack of well-educated engineers.
"""
)
No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
[{'summary_text': ' America has changed dramatically during recent years . The number of engineering graduates in the U.S. has declined in traditional engineering disciplines such as mechanical, civil,    electrical, chemical, and aeronautical engineering . Rapidly developing economies such as China and India continue to encourage and advance the teaching of engineering .'}]

Translation

Returns the translated text. A translation model can be passed to the model argument of pipeline. In the example below, the task translation_en_to_fr selects a default English-to-French model. (For English-to-German translation, use translation_en_to_de as the task argument; see the sketch after the example below.)

translator = pipeline("translation_en_to_fr")
translator("This course is produced by Hugging Face.")
No model was supplied, defaulted to t5-base and revision 686f1db (https://huggingface.co/t5-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
[{'translation_text': 'Ce cours est produit par Hugging Face.'}]
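
As a minimal sketch (not run here), the same call with the translation_en_to_de task would translate into German using its default model; translator_de is an arbitrary variable name:

translator_de = pipeline("translation_en_to_de")  # default model for this task
translator_de("This course is produced by Hugging Face.")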

When translating with an explicitly specified model on Google Colab, you need to run the following to install sentencepiece and then restart the kernel.

 !pip install sentencepiece

The following code uses Helsinki-NLP models to translate between various languages. As an example, it translates from English to Japanese.

def create_translation_pipeline(source_lang, target_lang):
    model_name = f'Helsinki-NLP/opus-mt-{source_lang}-{target_lang}'
    translator = pipeline("translation", model=model_name)
    return translator

def translate_text(translator, text):
    result = translator(text, max_length=500)
    return result[0]['translation_text']

# Example usage:
source_lang_code = "en"  # English
target_lang_code = "jap"  # Japanese

translator = create_translation_pipeline(source_lang_code, target_lang_code)

english_text = "This is a pen."
translated_text = translate_text(translator, english_text)

print(f"{source_lang_code.capitalize()}: {english_text}")
print(f"{target_lang_code.capitalize()}: {translated_text}")
En: This is a pen.
Jap: これ は 筆 で あ る .

How the Pipeline Works

Internally, a pipeline breaks down into the following steps.

Text => Tokenizer => Model => Post-processing
from transformers import AutoTokenizer
from transformers import AutoModel
from pprint import pprint
from transformers import AutoModelForSequenceClassification
import torch

Tokenizer

First, the input string must be split into tokens (words, symbols, etc.), and each token converted to an integer. This is done with the from_pretrained method of the AutoTokenizer class. The argument is a model name (checkpoint) from https://huggingface.co/models.

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
pprint(tokenizer)
DistilBertTokenizerFast(name_or_path='distilbert-base-uncased-finetuned-sst-2-english', vocab_size=30522, model_max_length=512, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'unk_token': '[UNK]', 'sep_token': '[SEP]', 'pad_token': '[PAD]', 'cls_token': '[CLS]', 'mask_token': '[MASK]'}, clean_up_tokenization_spaces=True)

Feeding a string (or a list of strings) to the created tokenizer produces a dictionary containing the numerical representation. Its keys are attention_mask, which indicates which tokens to attend to, and input_ids, the multi-dimensional array of the inputs converted to integers.

You also need to specify which deep learning framework to use via return_tensors. Since we use PyTorch here, we pass "pt".

raw_inputs = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")
pprint(inputs)
{'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]]),
 'input_ids': tensor([[  101,  1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662, 12172,
          2607,  2026,  2878,  2166,  1012,   102],
        [  101,  1045,  5223,  2023,  2061,  2172,   999,   102,     0,     0,
             0,     0,     0,     0,     0,     0]])}
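
To see which token each integer corresponds to, the IDs can be mapped back with the tokenizer (a minimal sketch using the variables defined above):

# Map the first sentence's IDs back to token strings
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
print(tokens)
# Or reconstruct the text, including special tokens such as [CLS] and [SEP]
print(tokenizer.decode(inputs["input_ids"][0]))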

Model

Next, create an instance of a model class, here using the from_pretrained method of the AutoModel class.

The model created here contains only the base Transformer; its output is a multi-dimensional array (tensor) of features extracted from the input.

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModel.from_pretrained(checkpoint)
Some weights of the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english were not used when initializing DistilBertModel: ['classifier.bias', 'pre_classifier.bias', 'pre_classifier.weight', 'classifier.weight']
- This IS expected if you are initializing DistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
pprint(model)
DistilBertModel(
  (embeddings): Embeddings(
    (word_embeddings): Embedding(30522, 768, padding_idx=0)
    (position_embeddings): Embedding(512, 768)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (transformer): Transformer(
    (layer): ModuleList(
      (0-5): 6 x TransformerBlock(
        (attention): MultiHeadSelfAttention(
          (dropout): Dropout(p=0.1, inplace=False)
          (q_lin): Linear(in_features=768, out_features=768, bias=True)
          (k_lin): Linear(in_features=768, out_features=768, bias=True)
          (v_lin): Linear(in_features=768, out_features=768, bias=True)
          (out_lin): Linear(in_features=768, out_features=768, bias=True)
        )
        (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (ffn): FFN(
          (dropout): Dropout(p=0.1, inplace=False)
          (lin1): Linear(in_features=768, out_features=3072, bias=True)
          (lin2): Linear(in_features=3072, out_features=768, bias=True)
          (activation): GELUActivation()
        )
        (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      )
    )
  )
)

Unpacking the dictionary produced by the tokenizer into the model, we can confirm that the output is a PyTorch tensor.

outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
torch.Size([2, 16, 768])
outputs
BaseModelOutput(last_hidden_state=tensor([[[-0.1798,  0.2333,  0.6321,  ..., -0.3017,  0.5008,  0.1481],
         [ 0.2758,  0.6497,  0.3200,  ..., -0.0760,  0.5136,  0.1329],
         [ 0.9046,  0.0985,  0.2950,  ...,  0.3352, -0.1407, -0.6464],
         ...,
         [ 0.1466,  0.5661,  0.3235,  ..., -0.3376,  0.5100, -0.0561],
         [ 0.7500,  0.0487,  0.1738,  ...,  0.4684,  0.0030, -0.6084],
         [ 0.0519,  0.3729,  0.5223,  ...,  0.3584,  0.6500, -0.3883]],

        [[-0.2937,  0.7283, -0.1497,  ..., -0.1187, -1.0227, -0.0422],
         [-0.2206,  0.9384, -0.0951,  ..., -0.3643, -0.6605,  0.2407],
         [-0.1536,  0.8988, -0.0728,  ..., -0.2189, -0.8528,  0.0710],
         ...,
         [-0.3017,  0.9002, -0.0200,  ..., -0.1082, -0.8412, -0.0861],
         [-0.3338,  0.9674, -0.0729,  ..., -0.1952, -0.8181, -0.0634],
         [-0.3454,  0.8824, -0.0426,  ..., -0.0993, -0.8329, -0.1065]]],
       grad_fn=<NativeLayerNormBackward0>), hidden_states=None, attentions=None)

Next, we create a model that includes the layers needed to actually perform sentiment analysis, using the AutoModelForSequenceClassification class.

The resulting values are stored in the logits tensor of the output.

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model2 = AutoModelForSequenceClassification.from_pretrained(checkpoint)
outputs2 = model2(**inputs)
outputs2
SequenceClassifierOutput(loss=None, logits=tensor([[-1.5607,  1.6123],
        [ 4.1692, -3.3464]], grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)

Post-processing

The resulting tensor is converted into probabilities with the softmax function. These are the predictions.

predictions = torch.nn.functional.softmax(outputs2.logits, dim=-1)
print(predictions)
tensor([[4.0195e-02, 9.5981e-01],
        [9.9946e-01, 5.4418e-04]], grad_fn=<SoftmaxBackward0>)

The predictions for the first sentence are [0.0402, 0.9598] and for the second sentence [0.9995, 0.0005]. This means the first sentence most likely has label 1 and the second most likely has label 0.

To see which labels the model uses, check the model's config.id2label attribute.

model2.config.id2label
{0: 'NEGATIVE', 1: 'POSITIVE'}

Therefore, the first sentence is classified as POSITIVE and the second as NEGATIVE.
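
Putting this together, the same mapping can be done programmatically (a minimal sketch using the variables defined above):

pred_ids = predictions.argmax(dim=-1)            # index of the highest probability per sentence
for text, pred_id in zip(raw_inputs, pred_ids):
    print(text, "=>", model2.config.id2label[int(pred_id)])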

Computer Vision

Image Classification

The following image is used as an example.

image_example1 = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
vision_classifier = pipeline(task="image-classification")

preds = vision_classifier(images=image_example1)
preds = [{"score": round(pred["score"], 4), "label": pred["label"]} for pred in preds]
preds
No model was supplied, defaulted to google/vit-base-patch16-224 and revision 5dca96d (https://huggingface.co/google/vit-base-patch16-224).
Using a pipeline without specifying a model name and revision in production is not recommended.
[{'score': 0.4335, 'label': 'lynx, catamount'},
 {'score': 0.0348,
  'label': 'cougar, puma, catamount, mountain lion, painter, panther, Felis concolor'},
 {'score': 0.0324, 'label': 'snow leopard, ounce, Panthera uncia'},
 {'score': 0.0239, 'label': 'Egyptian cat'},
 {'score': 0.0229, 'label': 'tiger cat'}]

Object Detection

The following must be run to install an additional package.

!pip install timm
from transformers import pipeline

detector = pipeline(task="object-detection")
preds = detector(image_example1)
preds = [{"score": round(pred["score"], 4), "label": pred["label"], "box": pred["box"]} for pred in preds]
preds
No model was supplied, defaulted to facebook/detr-resnet-50 and revision 2729413 (https://huggingface.co/facebook/detr-resnet-50).
Using a pipeline without specifying a model name and revision in production is not recommended.
Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.
The `max_size` parameter is deprecated and will be removed in v4.26. Please specify in `size['longest_edge'] instead`.
[{'score': 0.9864,
  'label': 'cat',
  'box': {'xmin': 178, 'ymin': 154, 'xmax': 882, 'ymax': 598}}]

Image Segmentation

segmenter = pipeline(task="image-segmentation")
preds = segmenter(image_example1)
preds = [{"score": round(pred["score"], 4), "label": pred["label"]} for pred in preds]
print(*preds, sep="\n")
No model was supplied, defaulted to facebook/detr-resnet-50-panoptic and revision fc15262 (https://huggingface.co/facebook/detr-resnet-50-panoptic).
Using a pipeline without specifying a model name and revision in production is not recommended.
Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.
`label_ids_to_fuse` unset. No instance will be fused.
{'score': 0.9879, 'label': 'LABEL_184'}
{'score': 0.9973, 'label': 'snow'}
{'score': 0.9972, 'label': 'cat'}

Depth Estimation

estimator = pipeline(task="depth-estimation", model="Intel/dpt-large")
result = estimator(images=image_example1)
result
Some weights of DPTForDepthEstimation were not initialized from the model checkpoint at Intel/dpt-large and are newly initialized: ['neck.fusion_stage.layers.0.residual_layer1.convolution1.weight', 'neck.fusion_stage.layers.0.residual_layer1.convolution2.weight', 'neck.fusion_stage.layers.0.residual_layer1.convolution1.bias', 'neck.fusion_stage.layers.0.residual_layer1.convolution2.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.
{'predicted_depth': tensor([[[ 0.7999,  0.8382,  0.8483,  ...,  2.3091,  2.3669,  2.3291],
          [ 0.8054,  0.8101,  0.8106,  ...,  2.3390,  2.3357,  2.3307],
          [ 0.8580,  0.8359,  0.8457,  ...,  2.3557,  2.3509,  2.3599],
          ...,
          [26.3410, 26.4059, 26.3881,  ..., 17.5088, 17.4768, 17.4148],
          [26.4727, 26.4515, 26.5042,  ..., 17.4223, 17.3911, 17.4052],
          [26.5116, 26.5452, 26.5301,  ..., 17.4719, 17.4700, 17.4025]]]),
 'depth': <PIL.Image.Image image mode=L size=960x686>}
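
The 'depth' entry is a PIL image, so it can, for example, be saved directly (a minimal sketch; the output file name is arbitrary):

depth_image = result["depth"]      # grayscale (mode "L") PIL.Image
depth_image.save("depth_map.png")  # arbitrary output file name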

Audio

The following audio file of a speech is used.

audio_example1 = "https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac"

Audio Classification

With the model superb/hubert-base-superb-er, the pipeline classifies the emotion of the speech; with MIT/ast-finetuned-audioset-10-10-0.4593, it classifies the type of sound.

# classifier = pipeline(task="audio-classification", model="superb/hubert-base-superb-er")
classifier = pipeline(task="audio-classification", model="MIT/ast-finetuned-audioset-10-10-0.4593")
preds = classifier(audio_example1)
preds = [{"score": round(pred["score"], 4), "label": pred["label"]} for pred in preds]
preds
/usr/local/lib/python3.10/dist-packages/transformers/models/audio_spectrogram_transformer/feature_extraction_audio_spectrogram_transformer.py:96: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:206.)
  waveform = torch.from_numpy(waveform).unsqueeze(0)
[{'score': 0.4208, 'label': 'Speech'},
 {'score': 0.1793, 'label': 'Rain on surface'},
 {'score': 0.1301, 'label': 'Rain'},
 {'score': 0.096, 'label': 'Raindrop'},
 {'score': 0.0578, 'label': 'Music'}]

Automatic Speech Recognition

transcriber = pipeline(task="automatic-speech-recognition", model="openai/whisper-small")
transcriber(audio_example1)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
{'text': ' I have a dream that one day this nation will rise up and live out the true meaning of its creed.'}