Oct 3, 2023

Professional Pandas: Handling Missing Data With Pandas Dropna

This is the fifth in a series of blog posts that teach how to write professional-quality pandas code. We start by discussing pandas dropna generally and going over a simple example. Then we talk about identifying missing values, when to drop data, and how to drop entire rows that are missing....


Sep 19, 2023

How To Use pandas resample on a Database

In this article, we describe pandas resample, provide some examples, and then show how you can use it at scale in your database....


Sep 16, 2023

Why Are There So Many Python Dataframes?

As I floated in the slow, crystalline current of the Rhine in Basel in the middle of the 2023 Python Dataframe Summit, I found myself asking: “Why are there so many Python dataframe APIs?” (pandas/Modin, Polars, RAPIDS cuDF, Ibis, Snowpark Dataframes, Vaex, Dask, PySpark, Daft, BigQ...


Sep 7, 2023

How the Python Dataframe Interchange Protocol Makes Life Better

In this article, we answer three questions about the Python Dataframe Interchange Protocol: what it is and what problems it solves, how it works, and how extensively it has been adopted....


Aug 30, 2023

How to Use Snowflake write_pandas

In this article, we describe how to use the Snowflake write_pandas function to save a pandas DataFrame as a table in Snowflake....


Aug 24, 2023

Modin Now Supports Batch Loading with a PyTorch DataLoader

In this article, we discuss how Modin, the open-source, scalable drop-in replacement for pandas, now implements a version of the PyTorch DataLoader that supports batch loading, making it much faster to use PyTorch as a Modin user....


Aug 23, 2023

Snowpark ML + Ponder for Healthcare Data Analysis

In this post, we walk through an end-to-end machine learning workflow to show how you can use Ponder and Snowpark ML to analyze electronic health records directly in your data warehouse....


Aug 15, 2023

Professional Pandas: Indexing with Pandas Iloc

This is the fourth in a series of blog posts that teach how to write professional-quality pandas code. We start by giving a high-level description of pandas iloc. Then we discuss each of the possible input types, with examples. And finally, we talk about the dangers of magic numbers (numbers that ap...


Aug 9, 2023

How to Use pandas read_sql

In this article, we discuss the structure of pandas read_sql, its history, how it relates to SQLAlchemy, and how to use it....
