Top 5 Things to Know When Using the Snowflake Module in Python

Peter Boshard Olson

Jun 8, 2023 5 min read

Articles

The Snowflake Module in Python

Data practitioners use the Snowflake module in Python, also known as “snowflake-connector-python,” to interact with data in Snowflake from a Python interface — for example, to run Snowflake queries from inside a Jupyter notebook, fetch data from inside Snowflake and store it as a pandas dataframe in your local instance, etc. (Note that this is not the same thing as Snowpark, another Snowflake tool, which lets you query Snowflake using a non-SQL data API.)

In this post, we share the top five things you should know when using the Snowflake module in Python. Let’s dive in.

1. Database Connectors in Python Are Confusing

tl;dr: It’s not you.

In 1999, the Python community created guidelines for how Python should interact with databases. The Python developers made judgment calls about how to map SQL/database concepts to Python objects and methods. Because the two worlds treat different problems as first-order, doing this mapping wasn’t as simple as taking each SQL/database concept and creating a corresponding Python object.

For example, in the SQL/database world, people often want to work with the same tables concurrently, so the SQL/database world emphasizes the concept of a session, which isolates one user’s usage from another’s. But in Python, concurrent usage is uncommon, so when the Python community was deciding which database concepts to represent as Python objects, they mostly folded session functionality into what became the cursor object. Whew!

The upshot is the two most important objects in the Snowflake Module in Python are the Connection object and the Cursor object, but figuring out exactly what these do — and don’t do — can be thorny. We’ll cover that next.

Note that even the words “connector” and “connection” can get confusing here — “connector” refers to the module we import, and “connection” is one of the objects you can create using that module.

2. Connections and Cursors in the Snowflake Module in Python

Okay, so what is a connection in the snowflake-connector-python context? It’s an object you create that builds a bridge between your Python code and your database. You have to have the right credentials to create it, etc. You can also manage session details in your database.

The connection has a method that lets you create a cursor object, and the best practice is to use methods from this cursor object to execute queries in your database and fetch the results back into Python.

Why the name “cursor”? The answer is that there’s a thing called a cursor in database land that lets you iterate over rows to fetch data. The Snowflake developers assume you’ll know what this is. In fact, in the Snowflake Python Connector docs, they define their Python cursor object by saying it “represents a database cursor for execute and fetch operations.”

But the Python cursor and the database cursor are slightly different. In Python, the cursor object lets you execute queries — so run them in your warehouse, whether you pull back the results to Python or not — and also fetch the results back into Python.

3. Installing the Connector

Installing the connector is easy. All you need to do is:

pip install snowflake-connector-python

or install it via conda using:

conda install -c conda-forge snowflake-connector-python

To learn more, check out the snowflake-connector-python Github repo here, the Snowflake docs here, or the conda-forge package here.

But there are some nuances that are helpful to know. For example, if you’re a pandas user, you can specify pandas as an “extra,” so:

pip install "snowflake-connector-python[pandas]"

to get a version that comes pre-packaged with pandas and PyArrow, meaning you can do things like run:

snowflake_con.cursor().execute('SELECT COLUMN_NAME from TABLE_NAME').fetch_pandas_all()

and get back the result as a pandas dataframe. If you already have the appropriate PyArrow and pandas dependencies, fetch_pandas_all will work even if you just pip install with no extras.

4. Creating the Connection with the Snowflake Module in Python

Creating a connection is fairly straightforward. All you do is import the snowflake connector (import snowflake.connector), and then use the connect method with the necessary arguments to point to the right database and send over the right credentials.

import snowflake.connector
db_con = snowflake.connector.connect(user=****, password=****, account=****)

There are other ways to authenticate as well — Okta, OAuth, etc. — but here we’ll focus on username authentication.

And if you don’t know where to find your account name, username, and other connections information, don’t worry — we’ve got you covered! Check out our documentation which includes a detailed walkthrough on where to locate and how to format each of these pieces of information here. (This is a REALLY helpful, practical guide.)

Once you have all the info you need, one best practice for storing and accessing it is to create a credentials.py file and an empty __init__.py in the same directory as the notebook or .py file where you’re creating the connection. You can find an example of this setup here. Then you can populate your credentials.py file as follows:

params = {
  "account": "<account_identifier>"
  "user": "<username>",
  "password": "<password>",
  "role": "<role_name>",
  "warehouse": "<warehouse_name>",
  "database": "<database_name>",
  "schema": "<schema_name>"
}

Then to make the connection, all you need to do is import your credentials and run:

import credential
import snowflake.connector
snowflake_con = snowflake.connector.connect(
  user=credential.params["user"],
  password=credential.params["password"],
  account=credential.params["account"],
  role=credential.params["role"],
  database=credential.params["database"],
  schema=credential.params["schema"],
  warehouse=credential.params["warehouse"]
)

Now you can share or demo your code without having to worry that you’ll accidentally share secrets!

5. Working With Your Tables Using Pandas DataFrames

After you’ve created the connection, the most important method is probably .cursor(), which creates the cursor object:

cur = snowflake_con.cursor()

Then after you’ve created the cursor object, you’ll typically want to execute a query, as follows:

cur.execute('SELECT * from TEST_TABLE')

Executing the query will cause it to run in Snowflake, though on its own, that doesn’t pull the results back into Python — So beware, even if you’re not seeing output in Python, you still might be spending money on Snowflake compute!

If you want to pull the results right away (as a pandas dataframe, say), you can run:

df = cur.execute('SELECT * from TEST_TABLE').fetch_pandas_all()

And voila! Now you have a dataframe with data from a table in Snowflake! Note that this is pulling the data from the database into your machine’s local memory as a pandas dataframe, which is often not what you want. Ponder lets you use pandas to manipulate data directly in your database, so you benefit from the scale and security of your database. Your data never leaves the warehouse so you don’t have to worry about setting up additional infrastructure, data ingress / egress costs, and local memory constraints.

Conclusion

In this article, we discussed:

Why database connectors in Python (including the Snowflake connector) are confusing — the mapping between database concepts and Python objects isn’t 1:1!
What connections and cursors do in Python
How to install the Snowflake-Python connector
How to create a Snowflake connection
And how to work with your tables using pandas

But I’d be remiss if I didn’t note another best practice — Instead of executing Snowflake SQL queries from Python using the cursor method, you can just use Ponder to write pandas code that gets executed in Snowflake as SQL. The effect is that you get to use your favorite pandas API, but your data pipelines run on a data warehouse, one of the most battle-tested and heavily-optimized data infrastructures today!

Try Ponder Today

Start running your Python data science workflows in your database within minutes!

Get started

News

Oct 23, 2023

🐼 ❤️ ❄️

We are excited to announce Snowflake’s intent to acquire Ponder to bring Ponder’s Python data science innovations to its customers and to accelerate the growth of the Modin community.

Articles

Oct 3, 2023

Professional Pandas: Handling Missing Data With Pandas Dropna

This is the fifth in a series of blog posts that teach how to write professional-quality pandas code. We start by discussing pandas dropna generally and going over a simple example. Then we talk about identifying missing values, when to drop data, and how to drop entire rows that are missing.

Articles

Sep 19, 2023

How To Use pandas resample on a Database

In this article, we describe pandas resample + provide some examples, and then show how you can use it at scale in your database.

Ready to level up your Pandas game?

Try Ponder now

Top 5 Things to Know When Using the Snowflake Module in Python

The Snowflake Module in Python

1. Database Connectors in Python Are Confusing

2. Connections and Cursors in the Snowflake Module in Python

3. Installing the Connector

4. Creating the Connection with the Snowflake Module in Python

5. Working With Your Tables Using Pandas DataFrames

Conclusion

Try Ponder Today

Read more:

🐼 ❤️ ❄️

Professional Pandas: Handling Missing Data With Pandas Dropna

How To Use pandas resample on a Database

Ready to level up your Pandas game?

Ready to level up your Pandas game?