I know I’m late to the party, but let me talk about Polars! I want to discuss Polars in Rust because it matters: it greatly enhances Rust’s capabilities for data science, which is our main focus here.
This blog is still pretty new, which is why this post arrives a bit late. Nothing else stopped me!
Polars is a blazing-fast DataFrame library built in Rust and designed for efficient processing of large datasets. Think of it as pandas on steroids: much faster and more memory-efficient.
What Sets Polars Apart?
If you’re already used to pandas in Python, you might naturally wonder, “Why should I switch?” Well, it’s not just for fun: there are many good reasons to try Polars today. You don’t have to switch completely, but it’s definitely worth exploring!
Polars stands out from other DataFrame libraries mainly because Rust powers it, a language known for amazing speed and strong memory safety. This foundation lets Polars handle large datasets much faster and more efficiently than many traditional tools.
One of its coolest features is lazy evaluation, which means Polars builds an optimized query plan before running any operations. This cuts down on unnecessary work and speeds up complex data workflows.
On top of that, Polars automatically uses all the cores of your CPU by running tasks in parallel, no extra setup required. It also leverages the Apache Arrow columnar memory format, improving how your computer accesses and caches data, which matters a lot for high-performance processing.
Finally, Polars works smoothly with common data formats like CSV, Parquet, and JSON, so you can easily plug it into your existing data pipelines. All these features make Polars a uniquely fast, flexible, and reliable choice for data science with Rust.
Getting Started with Polars
For this article, I’ve included Rust examples since Polars works great here, but it’s also available for Python, Node.js, R, and SQL if you want to explore later. First, add Polars to your Cargo.toml:
[dependencies]
polars = { version = "0.38", features = ["lazy", "csv"] }
use polars::prelude::*;

fn main() -> PolarsResult<()> {
    let df = CsvReader::from_path("data.csv")?
        .infer_schema(None)
        .has_header(true)
        .finish()?;

    let filtered = df.lazy()
        .filter(col("age").gt(lit(30)))
        .collect()?;

    println!("{:?}", filtered);
    Ok(())
}
Now we’re working with Polars, and here’s a simple example. Most of the time everything begins with a CSV file, so here we load a CSV and apply a basic filter.
I’ll briefly explain the methods we used so you won’t have to guess why they’re there.
.infer_schema(None) lets CsvReader guess the column types automatically. .has_header(true) tells it the first row contains column names. And .finish()? actually reads the file and gives us the DataFrame. Simple as that!
Later, we switch to the lazy API with df.lazy(), which lets Polars optimize the plan before running it. Then we use .filter(col("age").gt(lit(30))) to keep only the rows where the age is greater than 30. Finally, .collect()? runs the whole plan and gives us the filtered DataFrame.
Actually, you don’t even need the filter; you can simply skip it and call .collect() to get your optimized DataFrame directly.
┌───────┬─────┐
│ name ┆ age │
│ --- ┆ --- │
│ str ┆ i64 │
╞═══════╪═════╡
│ Alice ┆ 42 │
│ Jack ┆ 35 │
│ Mark ┆ 34 │
│ Kira ┆ 39 │
└───────┴─────┘
And you can expect output like this.
Group by and Aggregate
We can play around more, and that’s exactly why we’re here! So let’s dive in: this time, we’ll load the same dataset, group the rows by age, and calculate the average score per group. (The datasets are made up, so feel free to create your own!)
use polars::prelude::*;

fn main() -> PolarsResult<()> {
    let lf = LazyCsvReader::new("data.csv").has_header(true).finish()?;

    let result = lf
        .group_by([col("age")])
        .agg([col("score").mean().alias("average_score")])
        .collect()?;

    println!("{:?}", result);
    Ok(())
}
┌─────┬───────────────┐
│ age ┆ average_score │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═════╪═══════════════╡
│ 28 ┆ 89.0 │
│ 34 ┆ 80.0 │
└─────┴───────────────┘
It’s really easy to work with Polars, and on top of that you get so many advantages, as if easy wasn’t already enough!
Now let’s talk about .group_by and .agg, the two key methods that do the real magic here. With .group_by, we tell Polars how to split the data, like grouping everything by age. Then .agg jumps in to say what we want to calculate for each group, like taking the average score.
Sorting and Selecting Columns
Let’s continue with other common operations: sorting and selecting columns. Actually, it’s mostly the same, just a few small changes.
use polars::prelude::*;

fn main() -> PolarsResult<()> {
    let lf = LazyCsvReader::new("data.csv").has_header(true).finish()?;

    let result = lf
        .select([col("name"), col("score")])
        .sort("score", Default::default())
        .collect()?;

    println!("{:?}", result);
    Ok(())
}
┌─────────┬───────┐
│ name ┆ score │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════════╪═══════╡
│ Charlie ┆ 75 │
│ Alice ┆ 85 │
│ Dave ┆ 88 │
│ Bob ┆ 90 │
└─────────┴───────┘
By default, sorting is ascending, as you can see in the output. Here we just select the name and score columns and sort by the score column.
Renaming and Dropping Columns
use polars::prelude::*;

fn main() -> PolarsResult<()> {
    let lf = LazyCsvReader::new("data.csv").has_header(true).finish()?;

    let result = lf
        .select([col("name"), col("score").alias("final_score"), col("age")])
        .drop(["age"])
        .collect()?;

    println!("{:?}", result);
    Ok(())
}
┌─────────┬─────────────┐
│ name ┆ final_score │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════════╪═════════════╡
│ Alice ┆ 85 │
│ Bob ┆ 90 │
│ Charlie ┆ 75 │
│ Dave ┆ 88 │
└─────────┴─────────────┘
Here, we pick the columns we want with .select(), renaming "score" to "final_score" on the fly using .alias(). Then we clean things up by dropping the "age" column with .drop().
It’s like saying: “Keep what I need, rename what’s clearer, and toss out the rest.” Simple, smooth, and exactly what you want when handling data.
Ah, and one more thing: if you want to drop rows, you use filtering.
use polars::prelude::*;

fn main() -> PolarsResult<()> {
    let lf = LazyCsvReader::new("data.csv").has_header(true).finish()?;

    // Drop rows where name == "Alice"
    let result = lf.filter(col("name").neq(lit("Alice"))).collect()?;

    println!("{:?}", result);
    Ok(())
}
Conclusion
This article will also be updated over time as I get feedback, but I think it’s a solid start for people who wanna look up Polars in Rust.
The essential topics we covered today: dropping, selecting, renaming, filtering, aggregating, and sorting, so you’re ready for your next dataset-processing job in Rust with Polars.
Give it a shot, because I think this crate has really huge potential for data science, and I believe it’s really good news for both Rust and Python programmers.
Pull up, and I rip it up like ballet.