
Ponder was founded on the belief that your data science code should not have to change when your data storage platform does. Your data could reside on your laptop, in a database on prem, or in a cloud database. You should be able to use the same code on all these environments and get the same results, while taking advantage of the scalability benefits of your data storage platform.
In keeping with this belief, we’re proud to announce support for DuckDB: Now you can run pandas and NumPy with DuckDB as a backend! Quack quack!
Why Ponder and DuckDB 🦆?
Ponder lets you run your data science workflows (pandas, NumPy) directly in your database, be it Snowflake, BigQuery, or DuckDB. With Ponder, you get the same Python experience you love, but with the power and scalability of data warehouses. Ponder is particularly focused on the pandas API, since that’s a ubiquitous and essential API for data analysis, data science, and AI, with about a quarter of all developers using it frequently. Learn more about Ponder here.
DuckDB is a new database that is proving itself to be a robust choice for single-machine, in-process workloads with lots of exciting features announced practically every week.
At Ponder we’re all about meeting the users where they are rather than making them come out of their comfort zone. So DuckDB, thanks to its seamless use as a Python library, was an obvious database for us to support the pandas API on.
What can you do with Ponder and DuckDB?
First, let’s import ponder:
import ponder
ponder.init()
If you have a DuckDB file with data – you can connect to it:
import duckdb
duckdb_con = duckdb.connect("PONDER_DEMO.duckdb"))
You’re off to the races! Now for some pandas. Let’s connect to the CUSTOMER table, which contains 15 million rows.
import modin.pandas as pd
df = pd.read_sql("CUSTOMER", duckdb_con)
time: 812 ms
With Ponder on DuckDB, you can print out a preview of the dataframe — this takes less than 150 milliseconds for our 15-million-row table.
df.head()
C_CUSTKEY C_NAME C_ADDRESS C_NATIONKEY C_PHONE C_ACCTBAL C_MKTSEGMENT C_COMMENT
0 13885025 Customer013885025 dI1vjji2UTIEg7a ROATLoYjhb2GtgTM3gw0 17 27-231-844-3541 6307.11 AUTOMOBILE ogged accounts sleep slyly above t
1 13885026 Customer013885026 90DSwDWWEKzw UIUusyEFsOAkz0E 20 30-235-590-2760 5608.46 BUILDING quickly final instructions are even, special ...
2 13885027 Customer013885027 a9D8I6SUAPYILYd 19 29-263-210-9815 423.35 BUILDING wake against the blithely even platelets. bli...
3 13885028 Customer013885028 i,RUQhWznIR3ur6ww,NnJtH3gGTJh 16 26-460-477-8749 7204.22 MACHINERY lphins sleep after the permanently even packag...
4 13885029 Customer013885029 Xa2A2gh2tbYkZZQl7JxHzL3lbib0e7 4 14-429-237-5917 2402.35 FURNITURE ial pinto beans. carefully even deposits acros...
time: 125 ms
df.tail()
C_CUSTKEY C_NAME C_ADDRESS C_NATIONKEY C_PHONE C_ACCTBAL C_MKTSEGMENT C_COMMENT
14999995 6735020 Customer006735020 vFwQuSJOkRomktiUnXahGp Pg77e3t SGJT20c 5 15-936-504-6793 9498.60 MACHINERY ular requests cajole slyly after the furiously...
14999996 6735021 Customer006735021 qnsn6jO2qRVJH 4 14-316-923-3675 8984.93 HOUSEHOLD unts. slyly bold theodolites for the quickly f...
14999997 6735022 Customer006735022 pwWapiNOg5bplf1Ej1WxiRbkigjqONoLN,qJyXEB 2 12-300-419-7326 -214.79 HOUSEHOLD . carefully ironic deposits along the bold
14999998 6735023 Customer006735023 3xH7YWkqWK,LGZqt bn3Z 3 13-412-507-8812 6904.76 BUILDING cajole blithely. unusual requests are quickly...
14999999 6735024 Customer006735024 RMJtYJmzCaepvQoUnk 18 28-599-151-2961 7336.20 FURNITURE even accounts sleep carefully furiously specia...
time: 141 ms
iloc is one of the bread-and-butter Pandas operations that relies on dataframe order. It works just fine, despite DuckDB not having an implicit notion of order — this is all handled by Ponder.
df.iloc[9:99]
C_CUSTKEY C_NAME C_ADDRESS C_NATIONKEY C_PHONE C_ACCTBAL C_MKTSEGMENT C_COMMENT
9 13885034 Customer013885034 klO6szIwiZQZocKxFoPu3X3 17 27-270-874-8214 4966.78 AUTOMOBILE ly express requests. regular, regular accounts...
10 13885035 Customer013885035 SsSgo0uVhQITphOknU vG2okivc Q 0 10-179-934-9269 6760.22 BUILDING inal foxes grow even, ironic accounts. fi
11 13885036 Customer013885036 C7yL4WlTfA52 20 30-754-512-8897 9097.36 FURNITURE ic requests. furiously bold foxes use slyly. p...
12 13885037 Customer013885037 79pAnnVAIslyYZ8D4ZheoZJ3hlLX GB 2 12-217-386-5781 8464.12 AUTOMOBILE sly unusual warthogs are slyly re
13 13885038 Customer013885038 rgXUxZQZZJ6NgEpG1mR8rCii 13 23-705-292-5162 2577.91 AUTOMOBILE the slyly regular packages. slyly pending acc...
... ... ... ... ... ... ... ... ...
94 13885119 Customer013885119 m9kM1sItxYHb51vR8YSks4soN 4 14-371-781-3754 6150.07 BUILDING pinto beans sleep around the quickly bold
95 13885120 Customer013885120 POdHO90vuwxSqvAr1B8u,jZTyo6KOvi3Llb 18 28-494-277-9975 6168.26 FURNITURE refully furiously ironic theodolites. foxes ca...
96 13885121 Customer013885121 BC7D,qGJD0zO4V 18 28-355-267-8882 3988.74 HOUSEHOLD requests according to the instructions wake a...
97 13885122 Customer013885122 tRzsmr KaBuU4if2ec4k6lS ou61ZTSx9jBO 14 24-573-283-5373 2823.08 AUTOMOBILE d pinto beans. quickly final ideas cajole care...
98 13885123 Customer013885123 68H0SW9,3mCb3ZLXftQRu,1kg9VZmX1amySgWv 0 10-312-865-3027 2442.52 MACHINERY refully pending deposits are slowly around the...
90 rows x 8 columns
time: 188 ms
One-hot encoding is a common operation in machine learning, used to transform categorical data to boolean data. One-hot encoding involves moving data to the columns, which isn’t easy in SQL, but is a single line in pandas. All done in less than 200 milliseconds:
pd.get_dummies(df, columns="C_MKTSEGMENT")
C_CUSTKEY C_NAME C_ADDRESS C_NATIONKEY C_PHONE C_ACCTBAL C_COMMENT C_MKTSEGMENT_AUTOMOBILE C_MKTSEGMENT_BUILDING C_MKTSEGMENT_FURNITURE C_MKTSEGMENT_HOUSEHOLD C_MKTSEGMENT_MACHINERY
0 13885025 Customer013885025 dI1vjji2UTIEg7a ROATLoYjhb2GtgTM3gw0 17 27-231-844-3541 6307.11 ogged accounts sleep slyly above t 1 0 0 0 0
1 13885026 Customer013885026 90DSwDWWEKzw UIUusyEFsOAkz0E 20 30-235-590-2760 5608.46 quickly final instructions are even, special ... 0 1 0 0 0
2 13885027 Customer013885027 a9D8I6SUAPYILYd 19 29-263-210-9815 423.35 wake against the blithely even platelets. bli... 0 1 0 0 0
3 13885028 Customer013885028 i,RUQhWznIR3ur6ww,NnJtH3gGTJh 16 26-460-477-8749 7204.22 lphins sleep after the permanently even packag... 0 0 0 0 1
4 13885029 Customer013885029 Xa2A2gh2tbYkZZQl7JxHzL3lbib0e7 4 14-429-237-5917 2402.35 ial pinto beans. carefully even deposits acros... 0 0 1 0 0
... ... ... ... ... ... ... ... ... ... ... ... ...
14999995 6735020 Customer006735020 vFwQuSJOkRomktiUnXahGp Pg77e3t SGJT20c 5 15-936-504-6793 9498.60 ular requests cajole slyly after the furiously... 0 0 0 0 1
14999996 6735021 Customer006735021 qnsn6jO2qRVJH 4 14-316-923-3675 8984.93 unts. slyly bold theodolites for the quickly f... 0 0 0 1 0
14999997 6735022 Customer006735022 pwWapiNOg5bplf1Ej1WxiRbkigjqONoLN,qJyXEB 2 12-300-419-7326 -214.79 . carefully ironic deposits along the bold 0 0 0 1 0
14999998 6735023 Customer006735023 3xH7YWkqWK,LGZqt bn3Z 3 13-412-507-8812 6904.76 cajole blithely. unusual requests are quickly... 0 1 0 0 0
14999999 6735024 Customer006735024 RMJtYJmzCaepvQoUnk 18 28-599-151-2961 7336.20 even accounts sleep carefully furiously specia... 0 0 1 0 0
15000000 rows x 12 columns
time: 187 ms
You have csv files you want to work with along with your data in DuckDB? No problem, just set the default connection as our DuckDB connection and we can now read the CSV as a staged table in DuckDB:
ponder.configure(default_connection=duckdb_con)
# read_csv - staging the table in duckdb
df_csv = pd.read_csv("orders_500k.csv", header=0)
time: 328 ms
Now you can work with this csv file in DuckDB.
df_csv
O_ORDERKEY O_CUSTKEY O_ORDERSTATUS O_TOTALPRICE O_ORDERDATE O_ORDERPRIORITY O_CLERK O_SHIPPRIORITY O_COMMENT
0 247085090 5053234 O 26421.99 1996-03-11 4-NOT SPECIFIED Clerk000022806 0 special, even accounts. slyly regular requests
1 246557251 5781565 O 150232.09 1996-03-11 2-HIGH Clerk000096944 0 gular accounts sleep furiously close theodolit...
2 247971940 1066733 O 274524.17 1996-03-11 3-MEDIUM Clerk000079674 0 to beans sleep above the furiously silent depe...
3 246548960 13399567 O 226992.85 1996-03-11 3-MEDIUM Clerk000079967 0 beans? idle foxes engage slyly even accounts. ...
4 247110821 11027485 O 211890.95 1996-03-11 2-HIGH Clerk000074934 0 sleep even excuses. furious
... ... ... ... ... ... ... ... ... ...
499995 184305477 6530809 O 186301.68 1995-08-19 4-NOT SPECIFIED Clerk000015395 0 l deposits. even, final asymptotes cajole
499996 185794656 10073543 O 138755.17 1995-08-19 4-NOT SPECIFIED Clerk000024626 0 ts. quickly regular accounts s
499997 184303879 1424971 O 233532.34 1995-08-19 2-HIGH Clerk000003942 0 pecial ideas. instructions sleep carefully acr...
499998 185796034 5808431 O 105537.88 1995-08-19 5-LOW Clerk000037482 0 ructions wake: carefully even packages serve b...
499999 184299360 2022085 O 305221.11 1995-08-19 4-NOT SPECIFIED Clerk000090622 0 ymptotes-- final, unusual accounts sleep slyly...
500000 rows x 9 columns
time: 31 ms
You can even mix-and match! Want to join the csv data with data in DuckDB?
joined = df.merge(df_csv, left_on=["C_CUSTKEY"], right_on=["O_CUSTKEY"], how="inner")
Let’s look at the results, shall we?
joined
C_CUSTKEY C_NAME C_ADDRESS C_NATIONKEY C_PHONE C_ACCTBAL C_MKTSEGMENT C_COMMENT O_ORDERKEY O_CUSTKEY O_ORDERSTATUS O_TOTALPRICE O_ORDERDATE O_ORDERPRIORITY O_CLERK O_SHIPPRIORITY O_COMMENT
0 13885025 Customer013885025 dI1vjji2UTIEg7a ROATLoYjhb2GtgTM3gw0 17 27-231-844-3541 6307.11 AUTOMOBILE ogged accounts sleep slyly above t 411537380 13885025 O 199527.43 1996-05-18 4-NOT SPECIFIED Clerk000033809 0 ronic requests sleep slyly.
1 13885039 Customer013885039 uME8pef0iTDauXB,MWl VK56Q9T 23 33-756-622-9157 6859.99 AUTOMOBILE ckages are along the carefully 356399233 13885039 F 125630.09 1994-03-04 1-URGENT Clerk000056446 0 instructions are ironic excuses. unusual, bold...
2 13885127 Customer013885127 SyUZUT9KUP4S7eBg3aN6oE 24 34-798-434-7839 7572.26 FURNITURE le slyly grouches. regular dependencies despit... 23341472 13885127 O 188930.45 1995-12-12 1-URGENT Clerk000027310 0 ages. carefully quick asym
3 13885162 Customer013885162 fSQLEymWgpYCm 0 10-412-546-1921 3522.57 BUILDING bold, special asymptotes boost unusual packag... 539010085 13885162 O 211823.20 1995-12-02 3-MEDIUM Clerk000026175 0 requests nag blithely silent pinto beans. slyly
4 13885192 Customer013885192 Tv7Off82RTuc0EQK18NyquXppnGhoHMXgaris4E 17 27-230-872-7428 8792.09 MACHINERY lly. blithely regular packages are always. flu... 334752419 13885192 F 164490.44 1994-11-05 2-HIGH Clerk000031240 0 ites integrate after the slyly unusual account...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
499995 6734917 Customer006734917 fZU2IYF36md 15 25-846-890-1785 1446.16 AUTOMOBILE kly about the foxes. slyly furious instruction... 120997858 6734917 O 194310.36 1996-08-30 3-MEDIUM Clerk000094421 0 he pending requests. pending dol
499996 6734950 Customer006734950 S 6MMlaUQl0S3RWvowKMmkWwNsZo 6 16-486-258-1685 7244.71 MACHINERY es. furiously bold asymptotes affix furiously ... 326034115 6734950 F 241537.75 1992-05-28 5-LOW Clerk000095347 0 requests. ideas are after the blithely silent d
499997 6734953 Customer006734953 JK98pi9hSGFHrSZi7H 21 31-422-460-9577 4066.42 HOUSEHOLD final, regular requests wake blithely careful... 346830726 6734953 P 189769.51 1995-06-02 2-HIGH Clerk000097403 0 o nag carefully ironically ironic excuses? car...
499998 6734975 Customer006734975 FDxHc31TP0WQmyYCgDs9z 3 13-250-121-7346 6673.06 BUILDING nts! furiously ironic accounts within the furio 84176453 6734975 O 167591.78 1995-11-28 2-HIGH Clerk000094241 0 platelets at the ironic ideas print
499999 6735002 Customer006735002 scG9vo6cJArfr 20 30-468-868-9026 9117.82 HOUSEHOLD es. slyly regular accounts integrate c 54334753 6735002 O 177607.25 1996-04-01 3-MEDIUM Clerk000049829 0 blithely. theodolites haggle furiously
500000 rows x 17 columns
time: 1.92 s
pandas’ support for datetime operations — especially since it was originally designed for use in financial time-series analysis — is exceptional. Now you can get the same support on DuckDB.
dt_index_df = joined.set_index("O_ORDERDATE")
dt_index_df
C_CUSTKEY C_NAME C_ADDRESS C_NATIONKEY C_PHONE C_ACCTBAL C_MKTSEGMENT C_COMMENT O_ORDERKEY O_CUSTKEY O_ORDERSTATUS O_TOTALPRICE O_ORDERPRIORITY O_CLERK O_SHIPPRIORITY O_COMMENT
O_ORDERDATE
1996-05-18 13885025 Customer013885025 dI1vjji2UTIEg7a ROATLoYjhb2GtgTM3gw0 17 27-231-844-3541 6307.11 AUTOMOBILE ogged accounts sleep slyly above t 411537380 13885025 O 199527.43 4-NOT SPECIFIED Clerk000033809 0 ronic requests sleep slyly.
1994-03-04 13885039 Customer013885039 uME8pef0iTDauXB,MWl VK56Q9T 23 33-756-622-9157 6859.99 AUTOMOBILE ckages are along the carefully 356399233 13885039 F 125630.09 1-URGENT Clerk000056446 0 instructions are ironic excuses. unusual, bold...
1995-12-12 13885127 Customer013885127 SyUZUT9KUP4S7eBg3aN6oE 24 34-798-434-7839 7572.26 FURNITURE le slyly grouches. regular dependencies despit... 23341472 13885127 O 188930.45 1-URGENT Clerk000027310 0 ages. carefully quick asym
1995-12-02 13885162 Customer013885162 fSQLEymWgpYCm 0 10-412-546-1921 3522.57 BUILDING bold, special asymptotes boost unusual packag... 539010085 13885162 O 211823.20 3-MEDIUM Clerk000026175 0 requests nag blithely silent pinto beans. slyly
1994-11-05 13885192 Customer013885192 Tv7Off82RTuc0EQK18NyquXppnGhoHMXgaris4E 17 27-230-872-7428 8792.09 MACHINERY lly. blithely regular packages are always. flu... 334752419 13885192 F 164490.44 2-HIGH Clerk000031240 0 ites integrate after the slyly unusual account...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1996-08-30 6734917 Customer006734917 fZU2IYF36md 15 25-846-890-1785 1446.16 AUTOMOBILE kly about the foxes. slyly furious instruction... 120997858 6734917 O 194310.36 3-MEDIUM Clerk000094421 0 he pending requests. pending dol
1992-05-28 6734950 Customer006734950 S 6MMlaUQl0S3RWvowKMmkWwNsZo 6 16-486-258-1685 7244.71 MACHINERY es. furiously bold asymptotes affix furiously ... 326034115 6734950 F 241537.75 5-LOW Clerk000095347 0 requests. ideas are after the blithely silent d
1995-06-02 6734953 Customer006734953 JK98pi9hSGFHrSZi7H 21 31-422-460-9577 4066.42 HOUSEHOLD final, regular requests wake blithely careful... 346830726 6734953 P 189769.51 2-HIGH Clerk000097403 0 o nag carefully ironically ironic excuses? car...
1995-11-28 6734975 Customer006734975 FDxHc31TP0WQmyYCgDs9z 3 13-250-121-7346 6673.06 BUILDING nts! furiously ironic accounts within the furio 84176453 6734975 O 167591.78 2-HIGH Clerk000094241 0 platelets at the ironic ideas print
1996-04-01 6735002 Customer006735002 scG9vo6cJArfr 20 30-468-868-9026 9117.82 HOUSEHOLD es. slyly regular accounts integrate c 54334753 6735002 O 177607.25 3-MEDIUM Clerk000049829 0 blithely. theodolites haggle furiously
500000 rows x 16 columns
time: 907 ms
How about some longer running operations? This next one allows for a rather involved datetime operation, resample, that standardizes the sampling frequency of the data.
dt_index_df.resample("4W")["O_TOTALPRICE"].mean()
O_ORDERDATE
1992-01-05 150849.248284
1992-04-26 151684.353106
1992-05-24 150648.910094
1994-02-27 151260.644243
1994-06-19 151104.287942
1994-07-17 150348.825665
1994-08-14 150693.089773
1994-09-11 151487.427604
1994-10-09 150433.473222
1994-11-06 151049.277947
1994-12-04 153190.038529
1995-01-01 150739.200809
1995-01-29 150783.800724
1995-02-26 151288.344218
1995-03-26 150063.333556
1995-05-21 151170.144389
1995-06-18 151102.866605
1995-07-16 150971.583738
1995-08-13 151443.335792
1995-11-05 151015.251253
1995-12-03 150644.367542
1996-01-28 151879.618669
1996-02-25 153108.216507
1996-03-24 150530.048388
1996-04-21 150544.570320
1996-05-19 152791.916291
1996-07-14 153617.900821
1996-08-11 151389.791442
1996-11-03 150459.415614
1996-12-29 152999.267568
1997-01-26 150501.075554
1997-02-23 150909.408596
Name: O_TOTALPRICE, dtype: float64
time: 3.41 s
We already saw that we can support linear algebra operations. We can also more explicitly support NumPy-style matrix operations too:
arr = pd.get_dummies(df, columns="C_MKTSEGMENT").select_dtypes("number").to_numpy()
time: 47 ms
If we check the type and shape, we can see that it’s a Modin NumPy array with 15 million rows:
type(arr)
modin.numpy.arr.array
arr.shape
(15000000, 8)
Ponder with DuckDB v. vanilla pandas
OK so great, you’ve seen how you can translate pandas and NumPy to DuckDB via Ponder. But why not just use vanilla pandas, and how does Ponder with DuckDB work anyway?
Why use Ponder with DuckDB?
Two reasons: Performance, and scale. We’ll talk about performance here, and explore scale in a later post.
Why does this improve performance? DuckDB is fast (you can read more about DuckDB’s performance v. pandas here), and through Ponder, you can use the pandas API on top of DuckDB.
For example, pandas takes 11.8 seconds to create a dataframe from a file with about 10 million rows:
import pandas
df = pandas.read_csv("orders_big.csv")
df
O_ORDERKEY O_CUSTKEY O_ORDERSTATUS O_TOTALPRICE O_ORDERDATE O_ORDERPRIORITY O_CLERK O_SHIPPRIORITY O_COMMENT
0 247085090 5053234 O 26421.99 1996-03-11 4-NOT SPECIFIED Clerk000022806 0 special, even accounts. slyly regular requests
1 246557251 5781565 O 150232.09 1996-03-11 2-HIGH Clerk000096944 0 gular accounts sleep furiously close theodolit...
2 247971940 1066733 O 274524.17 1996-03-11 3-MEDIUM Clerk000079674 0 to beans sleep above the furiously silent depe...
3 246548960 13399567 O 226992.85 1996-03-11 3-MEDIUM Clerk000079967 0 beans? idle foxes engage slyly even accounts. ...
4 247110821 11027485 O 211890.95 1996-03-11 2-HIGH Clerk000074934 0 sleep even excuses. furious
... ... ... ... ... ... ... ... ... ...
9999995 184305477 6530809 O 186301.68 1995-08-19 4-NOT SPECIFIED Clerk000015395 0 l deposits. even, final asymptotes cajole
9999996 185794656 10073543 O 138755.17 1995-08-19 4-NOT SPECIFIED Clerk000024626 0 ts. quickly regular accounts s
9999997 184303879 1424971 O 233532.34 1995-08-19 2-HIGH Clerk000003942 0 pecial ideas. instructions sleep carefully acr...
9999998 185796034 5808431 O 105537.88 1995-08-19 5-LOW Clerk000037482 0 ructions wake: carefully even packages serve b...
9999999 184299360 2022085 O 305221.11 1995-08-19 4-NOT SPECIFIED Clerk000090622 0 ymptotes-- final, unusual accounts sleep slyly...
10000000 rows × 9 columns
time: 11.8 s
Ponder on DuckDB gets the job done in 5.75 seconds. There’s similar speedups for other operations as well — DuckDB is able to make use of parallelism via multiple cores, sophisticated query optimization, and compression/vectorization to provide these benefits, all out of the box.
df = pd.read_csv("orders_big.csv")
df
O_ORDERKEY O_CUSTKEY O_ORDERSTATUS O_TOTALPRICE O_ORDERDATE O_ORDERPRIORITY O_CLERK O_SHIPPRIORITY O_COMMENT
0 247085090 5053234 O 26421.99 1996-03-11 4-NOT SPECIFIED Clerk000022806 0 special, even accounts. slyly regular requests
1 246557251 5781565 O 150232.09 1996-03-11 2-HIGH Clerk000096944 0 gular accounts sleep furiously close theodolit...
2 247971940 1066733 O 274524.17 1996-03-11 3-MEDIUM Clerk000079674 0 to beans sleep above the furiously silent depe...
3 246548960 13399567 O 226992.85 1996-03-11 3-MEDIUM Clerk000079967 0 beans? idle foxes engage slyly even accounts. ...
4 247110821 11027485 O 211890.95 1996-03-11 2-HIGH Clerk000074934 0 sleep even excuses. furious
... ... ... ... ... ... ... ... ... ...
9999995 184305477 6530809 O 186301.68 1995-08-19 4-NOT SPECIFIED Clerk000015395 0 l deposits. even, final asymptotes cajole
9999996 185794656 10073543 O 138755.17 1995-08-19 4-NOT SPECIFIED Clerk000024626 0 ts. quickly regular accounts s
9999997 184303879 1424971 O 233532.34 1995-08-19 2-HIGH Clerk000003942 0 pecial ideas. instructions sleep carefully acr...
9999998 185796034 5808431 O 105537.88 1995-08-19 5-LOW Clerk000037482 0 ructions wake: carefully even packages serve b...
9999999 184299360 2022085 O 305221.11 1995-08-19 4-NOT SPECIFIED Clerk000090622 0 ymptotes-- final, unusual accounts sleep slyly...
10000000 rows x 9 columns
time: 5.75 s
How does Ponder work with DuckDB under the hood?
To the end user, coding in pandas with Ponder on DuckDB might at first feel indistinguishable from normal pandas (with the exception that it’s faster), but under the hood, the two are completely different. Ponder on DuckDB relies on DuckDB for storage and compute. You can think of Ponder as letting users use a pandas API on top of data warehouses.

Moving between the pandas API and SQL
With Ponder on DuckDB, it’s also very easy to persist your dataframe as a DuckDB table and query it with SQL. To save the dataframe, all you need to do is use the pandas to_sql command:
df.to_sql("New Orders", index=False, con=duckdb_con)
time: 3.16 s
Now you can work with it using SQL directly, via DuckDB. When we pull all table names, you can see the “New Orders” table listed among the results:
duckdb_con.execute('SELECT * FROM duckdb_tables;').df()
database_name database_oid schema_name schema_oid table_name table_oid internal temporary has_primary_key estimated_size column_count index_count check_constraint_count sql
0 PONDER_DEMO 4 main 856 CUSTOMER 862 False False False 15000000 8 0 0 CREATE TABLE "CUSTOMER"("C_CUSTKEY" INTEGER, ...
1 PONDER_DEMO 4 main 856 ORDERS_1M 864 False False False 1000000 9 0 0 CREATE TABLE "ORDERS_1M"("O_ORDERKEY" INTEGER,...
2 PONDER_DEMO 4 main 856 CUSTOMER_TEMP 866 False False False 15000000 8 0 0 CREATE TABLE "CUSTOMER_TEMP"("C_CUSTKEY" INTEG...
3 PONDER_DEMO 4 main 856 ORDERS_1M_TEMP 868 False False False 1000000 9 0 0 CREATE TABLE "ORDERS_1M_TEMP"("O_ORDERKEY" INT...
4 PONDER_DEMO 4 main 856 New Orders 1435 False False False 10000000 9 0 0 CREATE TABLE "New Orders"("O_ORDERKEY" BIGINT,...
5 temp 1307 main 1308 PONDER_OJTYPFNERN 1318 False True False 10000000 11 0 0 CREATE TABLE "PONDER_OJTYPFNERN"("O_ORDERKEY" ...
time: 78 ms
And finally, you can work with it again using the pandas API through the read_sql command:
new_orders = pd.read_sql("New Orders", con=duckdb_con)
new_orders
O_ORDERKEY O_CUSTKEY O_ORDERSTATUS O_TOTALPRICE O_ORDERDATE O_ORDERPRIORITY O_CLERK O_SHIPPRIORITY O_COMMENT
0 247085090 5053234 O 26421.99 1996-03-11 4-NOT SPECIFIED Clerk000022806 0 special, even accounts. slyly regular requests
1 246557251 5781565 O 150232.09 1996-03-11 2-HIGH Clerk000096944 0 gular accounts sleep furiously close theodolit...
2 247971940 1066733 O 274524.17 1996-03-11 3-MEDIUM Clerk000079674 0 to beans sleep above the furiously silent depe...
3 246548960 13399567 O 226992.85 1996-03-11 3-MEDIUM Clerk000079967 0 beans? idle foxes engage slyly even accounts. ...
4 247110821 11027485 O 211890.95 1996-03-11 2-HIGH Clerk000074934 0 sleep even excuses. furious
... ... ... ... ... ... ... ... ... ...
9999995 184305477 6530809 O 186301.68 1995-08-19 4-NOT SPECIFIED Clerk000015395 0 l deposits. even, final asymptotes cajole
9999996 185794656 10073543 O 138755.17 1995-08-19 4-NOT SPECIFIED Clerk000024626 0 ts. quickly regular accounts s
9999997 184303879 1424971 O 233532.34 1995-08-19 2-HIGH Clerk000003942 0 pecial ideas. instructions sleep carefully acr...
9999998 185796034 5808431 O 105537.88 1995-08-19 5-LOW Clerk000037482 0 ructions wake: carefully even packages serve b...
9999999 184299360 2022085 O 305221.11 1995-08-19 4-NOT SPECIFIED Clerk000090622 0 ymptotes-- final, unusual accounts sleep slyly...
10000000 rows x 9 columns
time: 1.03 s
Conclusion
As you can see through the demo above, you can run pandas operations on DuckDB seamlessly, while leveraging the parallel execution of DuckDB.
The best part is, if you decide to move to some other database (on premise or on cloud), all your code remains EXACTLY the same. (The only thing that changes is the connection.) That’s the magic of Ponder: we speak both pandas and SQL so your databases (and you) don’t have to!
If you’re interested in trying Ponder, reach out here!