Modin 0.19: NumPy, Welcome to the Modin Family!

Mar 16, 2023


Modin, the scalable drop-in replacement for pandas, is in its final year as a teenager (0.19), and it’s decided to make this a year to remember by welcoming NumPy to the Modin fold. That’s right, Modin now supports many NumPy API functions, letting you seamlessly distribute your NumPy compute.

Modin NumPy: Numerical Computing at Scale by Changing a Single Line of Code

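Most NumPy operations run on a single CPU core; the main exception is linear algebra such as matrix multiplication, which can use a multithreaded BLAS library. But now, once you run pip install --upgrade modin, you can replace import numpy as np with import modin.numpy as np and run supported NumPy operations in parallel without any extra work on your part.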
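Here is a minimal sketch of that one-line switch. The try/except fallback to stock NumPy is our own addition, not something the release prescribes. It keeps the script runnable on machines without Modin installed. `BACKEND` is a hypothetical variable used only to report which library was imported.

```python
# The only Modin-specific change is the import line.
# Fallback (our addition): use stock NumPy if Modin is not installed.
try:
    import modin.numpy as np
    BACKEND = "modin.numpy"  # hypothetical label, for reporting only
except ImportError:
    import numpy as np
    BACKEND = "numpy"

# Everything below is ordinary NumPy-style code and stays the same
# whichever library was imported.
arr = np.array([[82, 15, 4],
                [95, 36, 32],
                [29, 18, 95]])

squared = arr * arr      # element-wise multiplication
total = np.sum(squared)  # reduce the whole array to a single number

print(BACKEND, total)
```

Run as a script, this prints the name of the backend that was picked up, followed by 28500, the sum of the squared entries.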

For more on why supporting both distributed pandas and NumPy is important, see here.

Modin NumPy lets you perform operations like:

  • Element-wise matrix operations such as addition, subtraction, multiplication, division, power
  • Axis-collapsing or reducing operations such as min, max, sum, product, mean
  • Multi-array operations such as maximum or minimum
  • And many others, such as where, ravel, and transpose
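The following short sketch exercises several of the operation types listed above, including maximum/minimum, where, ravel, and transpose, which the rest of this post doesn't walk through. It uses stock NumPy so it runs anywhere. Per the post, changing the import to import modin.numpy as np should give the same results, though coverage of individual function arguments may vary by Modin version. The small arrays `a` and `b` are made up for illustration.

```python
import numpy as np  # per the post, `import modin.numpy as np` should also work

a = np.array([[1, 8],
              [6, 3]])
b = np.array([[5, 2],
              [4, 7]])

# Element-wise operations combine values at matching positions
diff = a - b             # [[-4, 6], [2, -4]]
cubed = a ** 3           # [[1, 512], [216, 27]]

# Axis-collapsing reductions
col_max = a.max(axis=0)  # largest value in each column: [6, 8]
mean = a.mean()          # (1 + 8 + 6 + 3) / 4 = 4.5

# Multi-array operations: element-wise maximum / minimum of two arrays
hi = np.maximum(a, b)    # [[5, 8], [6, 7]]
lo = np.minimum(a, b)    # [[1, 2], [4, 3]]

# where: take from `a` where the condition holds, otherwise from `b`
picked = np.where(a > b, a, b)  # identical to np.maximum(a, b) here

# ravel flattens row by row; transpose swaps rows and columns
flat = a.ravel()         # [1, 8, 6, 3]
flipped = a.transpose()  # [[1, 6], [8, 3]]
```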
We won't cover the new functionality comprehensively here, but to show some of what modin.numpy can do, let's take a 10 x 3 array of random integers between 1 and 100, like the following:
arr
array([[82, 15,  4],
       [95, 36, 32],
       [29, 18, 95],
       [14, 87, 95],
       [70, 12, 76],
       [55,  5,  4],
       [12, 28, 30],
       [65, 78,  4],
       [72, 26, 92],
       [84, 90, 70]])

If we check the type, we see that this is a Modin NumPy array:

type(arr)
modin.numpy.arr.array

We can do element-wise multiplication:

arr * arr
array([[6724,  225,   16],
       [9025, 1296, 1024],
       [ 841,  324, 9025],
       [ 196, 7569, 9025],
       [4900,  144, 5776],
       [3025,   25,   16],
       [ 144,  784,  900],
       [4225, 6084,   16],
       [5184,  676, 8464],
       [7056, 8100, 4900]])

Or matrix multiplication:

arr.T @ arr
array([[41320, 22343, 26117],
       [22343, 25227, 21963],
       [26117, 21963, 39162]])
(We'd get the same result with arr.T.dot(arr).)
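Each entry of arr.T @ arr is the dot product of two columns of arr, which is why the result is 3 x 3 and symmetric. Writing A for arr (rows indexed by k, using NumPy's zero-based indices), the off-diagonal entry 22343 works out as:

```latex
\begin{aligned}
(A^\top A)_{ij} &= \sum_{k=0}^{9} A_{ki}\,A_{kj} \\
(A^\top A)_{01} &= 82 \cdot 15 + 95 \cdot 36 + 29 \cdot 18 + 14 \cdot 87 + 70 \cdot 12 \\
                &\quad + 55 \cdot 5 + 12 \cdot 28 + 65 \cdot 78 + 72 \cdot 26 + 84 \cdot 90 \\
                &= 22343
\end{aligned}
```

The diagonal entries are each column's sum of squares. For example, 41320 is the sum of the first column of arr * arr.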
We can also call functions like np.ones_like, which returns an array of ones with the same shape as its input:
np.ones_like(arr)
array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1],
       [1, 1, 1],
       [1, 1, 1],
       [1, 1, 1],
       [1, 1, 1],
       [1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]])
We can sum the entire array, or sum across either axis:
np.sum(arr)
1475
np.sum(arr, axis=1)
array([101, 163, 142, 196, 158,  64,  70, 147, 190, 244])
np.sum(arr, axis=0)
array([578, 395, 502])
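The axis argument names the axis that gets collapsed. With axis=0, the sum runs down the rows and leaves one total per column (3 values). With axis=1, it runs across the columns and leaves one total per row (10 values). The quick consistency check below is written against stock NumPy so it runs without Modin. Per the post, the same calls are available from modin.numpy.

```python
import numpy as np  # per the post, `import modin.numpy as np` supports these calls too

arr = np.array([[82, 15,  4],
                [95, 36, 32],
                [29, 18, 95],
                [14, 87, 95],
                [70, 12, 76],
                [55,  5,  4],
                [12, 28, 30],
                [65, 78,  4],
                [72, 26, 92],
                [84, 90, 70]])

total = np.sum(arr)             # no axis: collapse everything to one number
row_sums = np.sum(arr, axis=1)  # collapse axis 1 (columns): one sum per row
col_sums = np.sum(arr, axis=0)  # collapse axis 0 (rows): one sum per column

print(arr.shape, row_sums.shape, col_sums.shape)  # (10, 3) (10,) (3,)

# Summing either partial result again gives back the grand total.
print(row_sums.sum() == total, col_sums.sum() == total)  # True True
```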
We can even compute the exponential, e^x, of every element:
np.exp(arr)
array([[4.09399696e+35, 3.26901737e+06, 5.45981500e+01],
       [1.81123908e+41, 4.31123155e+15, 7.89629602e+13],
       [3.93133430e+12, 6.56599691e+07, 1.81123908e+41],
       [1.20260428e+06, 6.07603023e+37, 1.81123908e+41],
       [2.51543867e+30, 1.62754791e+05, 1.01480039e+33],
       [7.69478527e+23, 1.48413159e+02, 5.45981500e+01],
       [1.62754791e+05, 1.44625706e+12, 1.06864746e+13],
       [1.69488924e+28, 7.49841700e+33, 5.45981500e+01],
       [1.85867175e+31, 1.95729609e+11, 9.01762841e+39],
       [3.02507732e+36, 1.22040329e+39, 2.51543867e+30]])

Please try out the Modin NumPy API and give us feedback on GitHub: what works well, what additional APIs you’d love to see, and so on. We’re eager to hear from you!

And if you’d like to try it out, here’s a copy-pastable version of all the code above that you can use to get started!

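import modin.numpy as np  # the only change from stock NumPy

# The 10 x 3 example array used above
arr = np.array([[82, 15,  4],
                [95, 36, 32],
                [29, 18, 95],
                [14, 87, 95],
                [70, 12, 76],
                [55,  5,  4],
                [12, 28, 30],
                [65, 78,  4],
                [72, 26, 92],
                [84, 90, 70]])

# A notebook or REPL displays each expression's value automatically;
# print() makes the output visible when running this as a script.
print(type(arr))            # a Modin NumPy array
print(arr * arr)            # element-wise multiplication
print(arr.T @ arr)          # matrix multiplication
print(np.ones_like(arr))    # array of ones with the same shape
print(np.sum(arr))          # sum of all elements
print(np.sum(arr, axis=1))  # one sum per row
print(np.sum(arr, axis=0))  # one sum per column
print(np.exp(arr))          # e^x for every element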

Other Improvements in the 0.19 Release:

The 0.19 release included other improvements as well:

  • Over 50 stability and bug fixes
  • Over 25 performance enhancements
  • Many documentation updates, testing improvements, and other feature improvements

For more details, check out the release notes here. And thank you to the 14 Modin developers who contributed to this release!

Modin has been 4.5 years in the making, has been downloaded almost 6 million times, and has ~8.5K GitHub stars.

To stay up to date on all things pandas, Python, and data science, and to watch Modin enter its twenties (0.20), follow Modin on Twitter, join the Modin Slack community, and star Modin on GitHub!
