Categories
Data Engineering

Polars: A High-Performance DataFrame Library for Rust

I know I’m late, but let me talk about Polars! I want to discuss Polars in Rust because it’s really important: it greatly enhances Rust’s capabilities for data science, which is our main focus here. This blog is still pretty new, so I’m writing this a bit late. Nothing else stopped me! Polars is a […]

Building Your First ETL Pipeline in Rust

Okay, we’re on a streak with Rust articles. This is my third Rust article, and now I’ll be giving a practical guide to complement my previous theoretical ETL article. I assume you already know what an ETL pipeline is, or at least have read my previous article on the topic, so I won’t go into […]

Securing Data Pipelines: Authentication & Authorization

Data pipelines are cool, but have you ever considered their security? The answer is probably yes, because we’ve been hearing news about unauthorized access to systems and data theft for a long time. So, if you’re in this field, you’ve likely thought about the security of data pipelines at least once. So here you are. Yeah, […]

Writing SQL For Data Engineering

SQL is still the most commonly used query language. Many people use it for analysis tasks such as searching for a specific user in a database, exporting rows to Excel, or grouping records by category. However, in data engineering, SQL is not just a tool for data analysis; it plays a key role in building […]

Don’t Build Models on Trash; Start with a Data Pipeline

Many people jump straight into building models, eager to extract insights or maximize accuracy. However, without a proper data pipeline to clean, structure, and process your data, your model will either fail or produce results that look good but are misleading. The Messy Truth About Real-World Data: In an ideal world, data would be clean, […]

Logic First, Data Later? Or the Other Way Around? ETL vs ELT

Data doesn’t just magically become useful. Whether you’re building dashboards, feeding machine learning models, or just trying to get a cleaner look at last quarter’s sales, you need data that’s structured, clean, and actually meaningful. And that’s where transformation comes in, particularly through ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) processes. But […]

Clarifying the Terms: DataFrame vs. Dataset

If you’ve worked with data, especially in Python, Spark, or R, you’ve probably come across the terms Dataset and DataFrame. They sound similar, but what each one means depends on the tool or framework you’re using. DataFrame: A DataFrame is a two-dimensional tabular data structure that resembles an Excel sheet or a database table, […]