Harnessing HuggingFace Transformers for NLP Tasks in Python

Internet usage is projected to keep rising over the coming years. Cisco forecasts that by 2023, 66% of the global population will have internet access (Cisco, 2022).

As a result, the volume of unstructured data generated through text files, emails, and similar formats is also expected to rise. This has led to the emergence of a specialized field known as Natural Language Processing (NLP), which focuses on extracting valuable insights from such data.

A report by Mordor Intelligence (2021) anticipates that the NLP market will reach a valuation of USD 48.46 billion by 2026, experiencing a compound annual growth rate (CAGR) of 26.84% from 2021 to 2026.

In this article, we will delve into how to leverage a straightforward pre-trained transformer language model for various common NLP tasks using Python.

To begin, we need to install the Hugging Face Transformers package. To avoid potential dependency conflicts, it is advisable to set this up in a virtual environment.

pip install transformers
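If you are starting from scratch, one common approach is to create and activate a virtual environment first (for example with Python's built-in venv module) and then run the command above inside it. Note that the pipelines used below also need a deep learning backend such as PyTorch or TensorFlow installed in the same environment.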

After installation, we can explore some basic text classification techniques.

Text Classification

When utilizing Hugging Face Transformers, the first step is to create a pipeline for the task at hand. This allows us to employ the text classification transformer effectively.

from transformers import pipeline

classifier = pipeline('text-classification')

You may see a warning indicating that the default model will be used since none was specified. This is acceptable, as the default model will be downloaded and cached on your machine for future use.
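If you prefer reproducible results and no warning, you can name a checkpoint explicitly when building the pipeline. A minimal sketch, assuming you are happy with distilbert-base-uncased-finetuned-sst-2-english, the sentiment checkpoint the library typically falls back to for this task:

from transformers import pipeline

# Pinning a specific model keeps behaviour stable across library versions
# and silences the "no model was supplied" warning.
classifier = pipeline(
    'text-classification',
    model='distilbert-base-uncased-finetuned-sst-2-english'
)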

Now let's classify some text. The input can be a single string or a list of strings. For instance, we can analyze a quote from Nietzsche to observe how the pre-trained model assigns a high positive sentiment score.

import pandas as pd

text = 'Battle not with monsters, lest ye become a monster, and if you gaze into the abyss, the abyss gazes also into you.'
result = classifier(text)
df = pd.DataFrame(result)
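The classifier returns one dictionary per input, containing a predicted label and a confidence score, which is what the DataFrame above holds. A quick way to inspect it:

# Each prediction has a 'label' (POSITIVE or NEGATIVE for this model) and a 'score'.
print(df)
print(result[0]['label'], result[0]['score'])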

The same call also works for a list of strings; here, the model assigns strongly negative sentiment scores to the other two quotes.

text = [
    'Battle not with monsters, lest ye become a monster, and if you gaze into the abyss, the abyss gazes also into you.',
    'Many of life’s failures are people who did not realize how close they were to success when they gave up.',
    'A million words would not bring you back, I know because I tried, neither would a million tears, I know because I cried.'
]

result = classifier(text)
df = pd.DataFrame(result)
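Keeping the original quotes next to their predictions makes the output easier to read; a small sketch, assuming the DataFrame built above:

# Attach each input text to its prediction and show everything together.
df['text'] = text
print(df[['text', 'label', 'score']])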

Named Entity Recognition

Another powerful application of NLP with Hugging Face Transformers is Named Entity Recognition (NER), which identifies familiar entities in text, such as organizations, people, and locations.

Let's apply this to a customer review of a cast-iron griddle from a well-known online marketplace.

review = "The pan is made well and heavy as you would expect from a cast iron cook ware. I got the silicone handle holder as well hoping that it would fit snug on the handle but no it doesn't. It is one of those generic silicone handle holder that you can by for much cheaper price separately. Thanks to Amazon for shipping this to Australia under prime. Hope they add more products selection under prime to ship from the US to AU. The pricing and collection of items is not yet good in AU site, and is far from being on par with the US site, hopefully gets better in the future with more prime offerings. Cheers."

ner_tag = pipeline('ner', aggregation_strategy='simple')
results = ner_tag(review)
df = pd.DataFrame(results)

The model successfully extracts several entities, including the organization (Amazon) and locations (Australia, AU, and the US).

The output provides confidence scores for the identified entities, along with their positions in the original review.
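With aggregation_strategy='simple', each entry in results typically carries an entity_group, the matched word, a score, and start/end character offsets. If you only need certain entity types, the list is easy to filter; a sketch under that assumption:

# Keep only organisations and locations the model is reasonably confident about.
confident = [e for e in results if e['entity_group'] in ('ORG', 'LOC') and e['score'] > 0.9]
for e in confident:
    print(e['entity_group'], e['word'], round(float(e['score']), 3))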

Question Answering

We can also use the same review as the context for question answering, posing specific questions such as whether the customer was satisfied with the product.

qa = pipeline('question-answering')
question = 'Was the customer satisfied?'
result_qa = qa(question=question, context=review)
df = pd.DataFrame([result_qa])
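For a single question, the pipeline returns a dictionary with the extracted answer span, its confidence score, and the character offsets of the answer within the context; a quick way to look at it:

# 'answer' is the extracted span; 'start'/'end' are its character offsets in the review.
print(result_qa['answer'], round(result_qa['score'], 3))
print(review[result_qa['start']:result_qa['end']])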

The model's response and score will vary based on the question and the text provided. For instance, changing the question to inquire about improvements for the product yields different results.

question = 'How can we improve on the product?'
result_qa = qa(question=question, context=review)
df = pd.DataFrame([result_qa])

The review indicates a pricing issue, which may provide valuable insights for the manufacturer.

Text Summarization

Finally, we can see how to summarize a lengthy text using the summarization pipeline. Below is the excerpt we will summarize:

excerpt = (
    "One of the most beneficial skills you can learn in life is how to consistently put yourself in a good position. "
    "The person who finds themselves in a strong position can take advantage of circumstances while others are forced into a series of poor choices. "
    "Strong positions are not an accident. Weak positions aren’t bad luck."
)

text_summarizer = pipeline('summarization')
output = text_summarizer(excerpt, max_length=50, clean_up_tokenization_spaces=True)
print(output[0]['summary_text'])

The output shows that the text has been effectively condensed into a concise summary. The maximum length of the output can be adjusted as needed.
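Generation behaviour can be tuned further; for instance, you can set a minimum length alongside the maximum and disable sampling for deterministic output. A sketch with those assumptions:

# min_length/max_length bound the summary size; do_sample=False gives deterministic decoding.
output = text_summarizer(
    excerpt,
    min_length=10,
    max_length=50,
    do_sample=False,
    clean_up_tokenization_spaces=True,
)
print(output[0]['summary_text'])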

That concludes this article. I hope it has shown how straightforward it is to apply Hugging Face Transformers models to commonly encountered NLP tasks.

Are you a curious learner? Unlock the full potential of your knowledge on Medium and support writers like me for less than the cost of a cup of coffee.

Join Medium with my referral link — Jason LZP

As a Medium member, a portion of your membership fee goes to writers you read, and you get full access to every story…

lzpdatascience.medium.com
