An API-First Approach to Data

Doris Lee

Jun 5, 2023 5 min read


Consistent interfaces simplify our interactions with complex systems. However, when it comes to data, we often find ourselves locked in by the infrastructure decisions we make. In this blog post, we explore why it is important to adopt an infrastructure-agnostic approach when picking data tools and how Ponder gives you the flexibility you need by letting you use your favorite data APIs, like pandas and NumPy, no matter what database your data is stored in. Whether you’re exploring a hybrid-cloud or multi-cloud data strategy, or looking to future-proof your work, this is for you.

Imagine standing in front of multiple doors, each representing a different data infrastructure. As a data professional, you are tasked with choosing the right door to meet your organization’s needs. However, behind each door lies a different outcome.

Pick a Door and Determine Your Fate

The problem with picking one door is that no one infrastructure is best for every use case.

Behind one door, compute costs may be cheaper; behind another, we may have a robust cloud data warehouse that is highly reliable for production uses. The challenge is that your data requirements are constantly evolving, and the wrong choice can saddle you with unnecessary constraints and limitations. The choices are not always clear.

Too often, the API becomes an afterthought in the decision-making process. When you opt for vendor-specific tooling, your code is tied to that vendor’s infrastructure. Using Snowpark? Your code will only work on Snowflake! Using PySpark? You’re stuck with Spark! Even SQL comes in many flavors and dialects specific to each database vendor. The lock-in runs both ways: your choice of API constrains which infrastructure you can use, and your choice of infrastructure constrains which APIs are available to you. Migrations are nearly impossible without heavy feats of engineering.

Why pick when you can hold the keys to different doors?

The solution to this conundrum? Invest in the API for your data.

Once you start from the choice of API, the path becomes clear: pandas is the most popular API for data. It is open source and the de facto standard, the go-to library for data practitioners. And it is here to stay.

By choosing Ponder, you embrace an API-first approach that allows you to remain infrastructure-agnostic. Switching infrastructure backends is as easy as swapping out your database connector. This means you can focus on leveraging the most popular API, pandas, and not worry about being tied down to a specific backend or infrastructure.

Look at how easy it is to move between different backends seamlessly!
Read more about it in the docs.
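As a rough illustration of the idea (not Ponder’s actual API), here is a minimal sketch using plain pandas and Python’s built-in sqlite3 module. The function `weekly_revenue`, the helper `make_backend`, and the `orders` table are hypothetical names invented for this example, and the two in-memory SQLite databases merely stand in for two different warehouses. Note one important difference: plain pandas pulls the rows into local memory, whereas Ponder is described as running the work in the database itself.

```python
import sqlite3

import pandas as pd


def weekly_revenue(con) -> pd.DataFrame:
    """Analysis written once against the pandas API.

    Only `con` knows where the data lives; nothing else in this
    function is specific to a backend.
    """
    orders = pd.read_sql_query("SELECT customer, amount FROM orders", con)
    return (
        orders.groupby("customer", as_index=False)["amount"]
        .sum()
        .sort_values("amount", ascending=False)
        .reset_index(drop=True)
    )


def make_backend(rows):
    """Hypothetical stand-in for a real warehouse: an in-memory SQLite database."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    con.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    con.commit()
    return con


rows = [("acme", 120.0), ("globex", 80.0), ("acme", 30.0)]
backend_a = make_backend(rows)  # pretend this is the cloud warehouse
backend_b = make_backend(rows)  # pretend this is a local database

# Switching backends means passing a different connection, nothing more.
result_a = weekly_revenue(backend_a)
result_b = weekly_revenue(backend_b)
```

The analysis code never changes between the two calls; only the connection object does, which is the property the post is arguing for.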

To make our point concrete, let’s look at three examples of how Ponder’s infrastructure-agnostic approach helps customers optimize for the objectives they care about.

Example 1: Optimizing for Cost Using a Hybrid On-Prem / Cloud Workflow

Jane is a data engineer who monitors the workloads running on Snowflake. She notices that the bulk of her team’s compute cost comes from a long-running data pipeline that generates a weekly report across their customer database on Snowflake. After talking to the project owner, she learns that while the job has to run on the entire dataset, it is not particularly time-sensitive or production-critical. With Ponder, she can easily swap out the Snowflake execution for Ponder running locally on DuckDB. This reduces their compute costs by 50% without having to rewrite any part of their workflow.

pip install ponder and try it yourself!

Example 2: Optimizing for Maintainability for Multi-Cloud Deployment

Alice is a data scientist at a consulting company whose clients bring data stored in a variety of backends, from on-prem databases to cloud data warehouses. Alice’s team has developed a general module for data cleaning and feature engineering that could be used across many of the customers they work with. In the past, however, they had to customize the code for each customer depending on that customer’s infrastructure. With Ponder, they can now build maintainable data pipelines that work across databases, freeing them up to deliver improved services for their customers.
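To make Alice’s scenario a little more tangible, here is a hedged sketch of what a shared cleaning and feature-engineering module might look like when it is written purely against the pandas API. Everything here is hypothetical: the function `clean_and_featurize`, the column names `customer_id` and `spend`, and the threshold of 100 are invented for illustration, and the two small inputs stand in for data arriving from two different client backends.

```python
import io

import pandas as pd


def clean_and_featurize(df: pd.DataFrame) -> pd.DataFrame:
    """Shared module: the same steps for every client, whatever the source."""
    out = df.dropna(subset=["customer_id"]).copy()  # drop rows with no ID
    out["customer_id"] = out["customer_id"].astype(int)
    out["spend"] = out["spend"].fillna(0.0)          # missing spend counts as zero
    out["high_value"] = out["spend"] > 100           # illustrative threshold
    return out.reset_index(drop=True)


# Client A: data arrives as a CSV export (simulated with an in-memory string).
client_a = pd.read_csv(io.StringIO("customer_id,spend\n1,250\n,40\n3,\n"))

# Client B: data arrives as records from some other system.
client_b = pd.DataFrame({"customer_id": [7.0, 8.0], "spend": [99.0, 101.0]})

features_a = clean_and_featurize(client_a)
features_b = clean_and_featurize(client_b)
```

Only the loading step differs per client; the module itself is written once and maintained in one place.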

Example 3: Optimizing for Productivity Using Adaptive Compute

John is a financial analyst developing a risk prediction model. Because the overall model takes more than six hours to run with pandas locally, his team can only make a few tweaks to the model in any single day. With Ponder, the team can spin up a large Snowflake warehouse to iterate on their workflows interactively. Each iteration completes within a few minutes instead of taking several hours. As a result, the project completes within a few days rather than the original month-long estimate. Once the project’s development phase is over, the team is free to pick the most cost-efficient way to run their workflows on a recurring basis.

With Ponder, you no longer have to pick. Whether you’re exploring a multi-cloud or hybrid-cloud strategy or simply looking to better leverage the compute resources available to you, Ponder gives you the freedom and flexibility to move across different backends. Pick the infrastructure that best suits your use case to optimize for cost, maintainability, or productivity, depending on the objectives that matter to your team.

Data in this fast-evolving world is about managing tradeoffs. Ponder lets you leverage the benefits of hybrid execution, so that you can avoid vendor lock-in and explore new possibilities with the infrastructure available to you when and where it makes sense.

Ponder: An API-Centric Approach to Your Data

In a world where APIs have become the backbone of our technology landscape, it’s time data got the same treatment. Ponder offers a powerful and flexible API-centric approach to data management, empowering you to make choices based on your organization’s needs rather than being constrained by infrastructure. By investing in Ponder, you unlock a world of possibilities and can seamlessly adapt to changing use cases. So, why pick one door when you can hold the keys to many?

 

Try Ponder Today

Start running your Python data science workflows in your database within minutes!
