Harnessing HuggingFace Transformers for NLP Tasks in Python
Internet usage is projected to keep rising in the coming years. Cisco forecasts that by 2023, 66% of the global population will have internet access (Cisco, 2022).
As a result, the volume of unstructured data generated through text files, emails, and similar formats is also expected to grow. This has fueled the growth of Natural Language Processing (NLP), the field focused on extracting valuable insights from such data.
A report by Mordor Intelligence (2021) anticipates that the NLP market will reach a valuation of USD 48.46 billion by 2026, experiencing a compound annual growth rate (CAGR) of 26.84% from 2021 to 2026.
In this article, we will delve into how to leverage a straightforward pre-trained transformer language model for various common NLP tasks using Python.
To begin, we need to install the Hugging Face Transformers package, along with a deep learning backend such as PyTorch, which the pipelines use under the hood. To avoid potential dependency conflicts, it is advisable to set this up in a virtual environment.
pip install transformers torch
After installation, we can explore some basic text classification techniques.
Text Classification
When utilizing Hugging Face Transformers, the first step is to create a pipeline for the task at hand. A pipeline bundles the tokenizer and model behind a single callable, so we can pass raw strings to the text classification transformer directly.
from transformers import pipeline
classifier = pipeline('text-classification')
You may see a warning indicating that the default model will be used since none was specified. This is acceptable, as the default model will be downloaded and cached on your machine for future use.
Now let's classify some text. The input can be a single string or a list of strings. For instance, we can analyze a quote from Nietzsche to observe how the pre-trained model assigns a high positive sentiment score.
import pandas as pd
text = 'Battle not with monsters, lest ye become a monster, and if you gaze into the abyss, the abyss gazes also into you.'
result = classifier(text)
df = pd.DataFrame(result)
Passing a list of strings works the same way; here we can observe that the model assigns strongly negative sentiment scores to the other two quotes.
text = [
'Battle not with monsters, lest ye become a monster, and if you gaze into the abyss, the abyss gazes also into you.',
'Many of life’s failures are people who did not realize how close they were to success when they gave up.',
'A million words would not bring you back, I know because I tried, neither would a million tears, I know because I cried.'
]
result = classifier(text)
df = pd.DataFrame(result)
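Each call returns one dict per input string, with a label and a score key. As a sketch (the scores below are made up for illustration, not actual model output), the label and score can be folded into a single signed value, which makes the three quotes easier to compare at a glance:

```python
import pandas as pd

# Hypothetical pipeline output for the three quotes; the dict shape
# matches what the text-classification pipeline returns.
results = [
    {'label': 'POSITIVE', 'score': 0.75},
    {'label': 'NEGATIVE', 'score': 0.98},
    {'label': 'NEGATIVE', 'score': 0.99},
]
df = pd.DataFrame(results)

# Fold label and score into one signed value:
# POSITIVE scores stay positive, NEGATIVE scores become negative.
df['signed_score'] = df.apply(
    lambda r: r['score'] if r['label'] == 'POSITIVE' else -r['score'], axis=1
)
print(df)
```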
Named Entity Recognition
Another powerful application of NLP with Hugging Face Transformers is Named Entity Recognition (NER), which identifies familiar entities in text, such as organizations, people, and locations.
Let's apply this to a customer review for a Cast Iron griddle from a well-known online marketplace.
review = "The pan is made well and heavy as you would expect from a cast iron cook ware. I got the silicone handle holder as well hoping that it would fit snug on the handle but no it doesn't. It is one of those generic silicone handle holder that you can by for much cheaper price separately. Thanks to Amazon for shipping this to Australia under prime. Hope they add more products selection under prime to ship from the US to AU. The pricing and collection of items is not yet good in AU site, and is far from being on par with the US site, hopefully gets better in the future with more prime offerings. Cheers."
ner_tag = pipeline('ner', aggregation_strategy='simple')
results = ner_tag(review)
df = pd.DataFrame(results)
The model successfully extracts several entities, including the organization (Amazon) and locations (Australia, AU, and the US).
The output provides confidence scores for the identified entities, along with their positions in the original review.
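Because the aggregated output is a list of dicts sharing the same keys (entity_group, score, word, start, end), pandas makes it easy to collect the words found for each entity type. A sketch with hypothetical scores and character offsets standing in for the real output:

```python
import pandas as pd

# Hypothetical NER output for the review; the keys match what the
# pipeline returns with aggregation_strategy='simple'.
results = [
    {'entity_group': 'ORG', 'score': 0.99, 'word': 'Amazon', 'start': 230, 'end': 236},
    {'entity_group': 'LOC', 'score': 0.99, 'word': 'Australia', 'start': 253, 'end': 262},
    {'entity_group': 'LOC', 'score': 0.98, 'word': 'US', 'start': 330, 'end': 332},
]
df = pd.DataFrame(results)

# Group the extracted words by entity type.
entities = df.groupby('entity_group')['word'].apply(list).to_dict()
print(entities)  # -> {'LOC': ['Australia', 'US'], 'ORG': ['Amazon']}
```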
Question Answering
We can also ask questions of the same review, for example to determine whether the customer was satisfied with the product.
qa = pipeline('question-answering')
question = 'Was the customer satisfied?'
result_qa = qa(question=question, context=review)
df = pd.DataFrame([result_qa])
The model's response and score will vary based on the question and the text provided. For instance, changing the question to inquire about improvements for the product yields different results.
question = 'How can we improve on the product?'
result_qa = qa(question=question, context=review)
df = pd.DataFrame([result_qa])
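A useful property of this extractive style of question answering is that the answer is always a literal span of the context, delimited by the start and end character offsets in the result. A sketch with a shortened context and a hypothetical result dict (the keys match what the pipeline returns):

```python
# A shortened context and a hypothetical QA result for illustration;
# the score and offsets are made up, not actual model output.
context = "The pricing and collection of items is not yet good in AU site."
result_qa = {'score': 0.42, 'start': 4, 'end': 11, 'answer': 'pricing'}

# The answer is always a literal slice of the context string.
span = context[result_qa['start']:result_qa['end']]
print(span)  # -> pricing
```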
The review indicates a pricing issue, which may provide valuable insights for the manufacturer.
Text Summarization
Finally, we can see how to summarize a lengthy text using the summarization pipeline. Below is the excerpt we will summarize:
excerpt = (
"One of the most beneficial skills you can learn in life is how to consistently put yourself in a good position. "
"The person who finds themselves in a strong position can take advantage of circumstances while others are forced into a series of poor choices. "
"Strong positions are not an accident. Weak positions aren’t bad luck."
)
text_summarizer = pipeline('summarization')
output = text_summarizer(excerpt, max_length=50, clean_up_tokenization_spaces=True)
print(output[0]['summary_text'])
The output shows that the text has been effectively condensed into a concise summary. The maximum length of the output can be adjusted as needed.
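The max_length argument caps the number of generated tokens, so choosing it relative to the input length avoids asking for a summary longer than the source. A rough heuristic of my own (not part of the library), sketched with the excerpt above:

```python
excerpt = (
    "One of the most beneficial skills you can learn in life is how to consistently put yourself in a good position. "
    "The person who finds themselves in a strong position can take advantage of circumstances while others are forced into a series of poor choices. "
    "Strong positions are not an accident. Weak positions aren't bad luck."
)

# Cap the summary at roughly half the input's word count,
# clamped to a sensible range before passing it as max_length.
max_len = min(50, max(10, len(excerpt.split()) // 2))
print(max_len)
```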
That concludes this article. I hope it has provided a clear understanding of how straightforward it is to implement HuggingFace transformer models for commonly encountered NLP tasks.
Are you a curious learner? Unlock the full potential of your knowledge on Medium and support writers like me for less than the cost of a cup of coffee.