dxalxmur.com

Supercharged DataFrames: Why Polars Might Replace Pandas


Introduction to Polars

Polars has emerged as a fast alternative to pandas, particularly for large datasets. The library is written in Rust and built on the Apache Arrow columnar memory format, which is a large part of its speed and memory efficiency. Despite its Rust core, it ships as a Python package, so users already familiar with pandas can adopt it without leaving Python.

Before diving deeper, let’s explore the compelling reasons to consider Polars.

Advantages of Choosing Polars

Polars uses all available CPU cores and optimizes queries to avoid unnecessary work and memory use. Through its streaming engine, it can also process datasets larger than your system's RAM. In addition, it enforces a strict schema: the data type of every column must be known before a query runs.

To illustrate its capabilities, let’s take a look at some performance comparisons.

Performance comparison of Polars vs other libraries

Performance Metrics

Much of Polars' speed comes from lazy (and semi-lazy) execution. Instead of running each operation immediately, Polars can build a plan for the entire query and optimize it as a whole, for example by reading only the columns and rows the query actually needs. This both speeds up the query and reduces memory use. For users who prefer the pandas style, Polars also supports eager execution, where each operation runs as soon as it is called.

Getting Started with Polars

Installation Process

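To install Polars, run one of the following commands:

# pip

pip install polars

# conda

conda install -c conda-forge polars

Polars needed Python 3.7 or higher when this article was written. Newer releases require a more recent Python version, so check the installation page of the official documentation for the current minimum.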

Reading Data with Polars

Similar to pandas, Polars can read CSV files. Let’s import Polars and read a sample CSV file:

import polars as pl

df = pl.read_csv("StudentsPerformance.csv")

Once the file is loaded, you will notice that the DataFrame has no index. Polars deliberately avoids one to keep behavior simple and predictable. As a result, pandas methods such as .loc and .iloc are not needed: rows are selected by position or with filters instead.

Exploring DataFrame Structure

You can easily access column names with:

>>> df.columns

['id', 'gender', 'race/ethnicity', 'parental level of education', 'lunch', 'test preparation course', 'math score', 'reading score', 'writing score']

Now, let’s delve into how to manipulate data within Polars.

Selecting Columns

To select the "gender" column, use:

# Select 1 column

df.select(pl.col('gender'))

For multiple columns, simply include them in a list:

# Select 2+ columns

df.select(pl.col(['gender', 'math score']))

Or, to select all columns:

# Select all columns

df.select(pl.col('*'))

Creating New Columns

If you want to create a new column that sums 'math score' and 'reading score', you can do it as follows:

# polars: create "sum" column

df.with_columns(
    (pl.col('math score') + pl.col('reading score')).alias("sum")
)

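To calculate each student's average across the three scores, combine the columns row by row. Calling .mean() on the columns would instead average each column over all students, which is not what we want here.

# polars: create "average" column (row-wise mean of the three scores)

df.with_columns(
    ((pl.col('math score') + pl.col('reading score') + pl.col('writing score')) / 3).alias('average')
)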

Filtering Data

To filter for females, use:

# polars: simple filtering

df.filter(pl.col('gender')=='female')

For more complex conditions, such as filtering females from "group B":

# Multiple filtering

df.filter(
    (pl.col('gender')=='female') &
    (pl.col('race/ethnicity')=='group B')
)

Grouping and Joining Data

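Grouping works much as in pandas. Recent Polars versions spell the method group_by (older releases used groupby) and count the rows in each group with len():

# Group by and count rows per group

df.group_by("race/ethnicity").len()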

For joining dataframes, you will need a second CSV file named "LanguageScore.csv":

df2 = pl.read_csv("LanguageScore.csv")

# Join dataframes

df.join(df2, on='id')

You can specify the type of join using the how parameter (the default is 'inner'):

# Inner, left and full outer join

df.join(df2, on='id', how='inner')

df.join(df2, on='id', how='left')

df.join(df2, on='id', how='full')  # spelled how='outer' before Polars 1.0

Concatenating DataFrames

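To concatenate dataframes, use pl.concat and choose the direction with the how parameter. The default, "vertical", stacks rows, while "horizontal" places columns side by side: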

# Concatenate dataframes

pl.concat([df, df2], how="horizontal")

Horizontal concatenation raises an error when the dataframes share a column name, as df and df2 do with "id", so drop the duplicate column first:

# drop column "id" in df2

df2 = df2.drop("id")

# Concatenate dataframes

pl.concat([df, df2], how="horizontal")

Rows are matched by position, not by "id", so use a join when rows must line up by key. If the dataframes have different numbers of rows, the shorter one is padded with null values.

Congratulations! You’ve just learned the basics of using the Polars library. For further details, refer to the official documentation.


Video Insights

If you're interested in a visual overview of Polars, check out the following videos:

In this video, titled "Polars: The Super Fast Dataframe Library for Python... bye bye Pandas?", you'll discover the features that make Polars a compelling choice.

The second video, "Speeding Up Your DataFrames With Polars | Real Python Podcast #140," provides insights into optimizing your data operations with Polars.
