Modin, the scalable drop-in replacement for pandas, is in its final year as a teenager (0.19), and it’s decided to make this a year to remember by welcoming NumPy to the Modin fold. That’s right, Modin now supports many NumPy API functions, letting you seamlessly distribute your NumPy compute.
Modin NumPy: Numerical Computing at Scale by Changing a Single Line of Code
NumPy isn't natively multithreaded. But now, once you pip install --upgrade modin, you can just replace import numpy as np with import modin.numpy as np and run your NumPy code in parallel without any extra work on your part.
For more on why supporting both distributed pandas and NumPy is important, see here.
Modin NumPy lets you perform operations like:
- Element-wise matrix operations such as addition, subtraction, multiplication, division, power
- Axis-collapsing or reducing operations such as min, max, sum, product, mean
- Multi-array operations such as maximum or minimum
- And many others, such as where, ravel, and transpose
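As a rough illustration of the categories listed above, here is a minimal sketch on two small arrays. It is written against stock NumPy so that it runs anywhere. Per this post, the same calls should behave the same way after changing the import to modin.numpy, but that swap is not exercised here. The arrays a and b are made up for the example.

```python
# Stock NumPy is used so the sketch runs without Modin installed; the post's
# claim is that swapping in `import modin.numpy as np` keeps the calls working.
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([[6, 5, 4],
              [3, 2, 1]])

# Element-wise operations: applied entry by entry, shape is preserved.
elementwise_sum = a + b          # every entry is 7
elementwise_pow = a ** 2

# Reducing operations: collapse one axis, or all of them.
col_max = np.max(a, axis=0)      # one value per column
total = np.sum(a)                # a single scalar

# Multi-array operations: compare two arrays entry by entry.
larger = np.maximum(a, b)

# Others from the list: where, ravel, transpose.
clipped = np.where(a > 3, a, 0)  # keep entries above 3, zero the rest
flat = np.ravel(a)               # flattened to 1-D, row by row
flipped = np.transpose(a)        # shape (2, 3) becomes (3, 2)

print(elementwise_sum)
print(col_max, total)
print(larger)
print(clipped)
print(flat, flipped.shape)
```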
We won't demonstrate the new functionality in a comprehensive way here, but to show some of what modin.numpy can do, let's generate a 10 x 3 array of random integers between 1 and 100, like the following:
arr
array([[82, 15, 4],
[95, 36, 32],
[29, 18, 95],
[14, 87, 95],
[70, 12, 76],
[55, 5, 4],
[12, 28, 30],
[65, 78, 4],
[72, 26, 92],
[84, 90, 70]])
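The post does not show how the random integers were drawn, so the following is only one possible way to do it. It uses stock NumPy's random generator with an arbitrary seed, which means the values will differ from the array above. Whether modin.numpy offers its own random module is not covered here, so the hand-off to Modin is left as a comment and goes through a nested list, the same form used in the copy-pastable code at the end of this post.

```python
import numpy as np  # stock NumPy, used here only to draw the random integers

rng = np.random.default_rng(seed=0)  # the seed is arbitrary, for repeatability

# `high` is exclusive, so 101 makes the range 1..100 inclusive.
raw = rng.integers(low=1, high=101, size=(10, 3))

print(raw.shape)
print(raw.min() >= 1, raw.max() <= 100)

# To get a Modin NumPy array from these values (assumes Modin 0.19 or later
# is installed), they could be passed to Modin's constructor as a nested list:
#   import modin.numpy as mnp
#   arr = mnp.array(raw.tolist())
```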
If we check the type, we see that this is a Modin NumPy array:
type(arr)
modin.numpy.arr.array
We can do element-wise multiplication:
arr * arr
array([[6724, 225, 16],
[9025, 1296, 1024],
[ 841, 324, 9025],
[ 196, 7569, 9025],
[4900, 144, 5776],
[3025, 25, 16],
[ 144, 784, 900],
[4225, 6084, 16],
[5184, 676, 8464],
[7056, 8100, 4900]])
Or matrix multiplication:
arr.T @ arr
array([[41320, 22343, 26117],
[22343, 25227, 21963],
[26117, 21963, 39162]])
(We'd get the same results with arr.T.dot(arr).)
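The matrix product above can be cross-checked with stock NumPy, which is a handy way to confirm that a Modin result matches the reference behaviour. This sketch reuses the values of arr from this post and compares the operator form with the method form. It runs on plain NumPy only, so it says nothing about Modin's performance.

```python
import numpy as np  # stock NumPy, serving as the reference implementation

arr = np.array([[82, 15, 4], [95, 36, 32], [29, 18, 95], [14, 87, 95],
                [70, 12, 76], [55, 5, 4], [12, 28, 30], [65, 78, 4],
                [72, 26, 92], [84, 90, 70]])

gram = arr.T @ arr         # (3, 10) @ (10, 3) gives a (3, 3) result
gram_dot = arr.T.dot(arr)  # the same product through the method form

print(gram.shape)
print(np.array_equal(gram, gram_dot))  # both forms agree
print(np.array_equal(gram, gram.T))    # A^T A is always symmetric

# Entry [i, j] is the dot product of columns i and j of arr.
print(gram[0, 1] == np.sum(arr[:, 0] * arr[:, 1]))
```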
We can call functions like np.ones_like, which generates an array of ones with the same shape as the input array:
np.ones_like(arr)
array([[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]])
We can sum the entire array, or sum across either axis:
np.sum(arr)
1475
np.sum(arr, axis=1)
array([101, 163, 142, 196, 158, 64, 70, 147, 190, 244])
np.sum(arr, axis=0)
array([578, 395, 502])
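The axis argument is easy to get backwards: it names the axis that is collapsed, not the one that is kept. The sketch below, again on stock NumPy with the values of arr from this post, shows the resulting shapes and checks that both partial sums add up to the grand total.

```python
import numpy as np  # stock NumPy; the values mirror the array used in this post

arr = np.array([[82, 15, 4], [95, 36, 32], [29, 18, 95], [14, 87, 95],
                [70, 12, 76], [55, 5, 4], [12, 28, 30], [65, 78, 4],
                [72, 26, 92], [84, 90, 70]])

row_sums = np.sum(arr, axis=1)  # collapses the columns: one sum per row
col_sums = np.sum(arr, axis=0)  # collapses the rows: one sum per column
total = np.sum(arr)             # collapses everything to a scalar

print(row_sums.shape, col_sums.shape)  # (10,) (3,)
print(row_sums.sum() == total, col_sums.sum() == total)
```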
We can even take the exponential of every element:
np.exp(arr)
array([[4.09399696e+35, 3.26901737e+06, 5.45981500e+01],
[1.81123908e+41, 4.31123155e+15, 7.89629602e+13],
[3.93133430e+12, 6.56599691e+07, 1.81123908e+41],
[1.20260428e+06, 6.07603023e+37, 1.81123908e+41],
[2.51543867e+30, 1.62754791e+05, 1.01480039e+33],
[7.69478527e+23, 1.48413159e+02, 5.45981500e+01],
[1.62754791e+05, 1.44625706e+12, 1.06864746e+13],
[1.69488924e+28, 7.49841700e+33, 5.45981500e+01],
[1.85867175e+31, 1.95729609e+11, 9.01762841e+39],
[3.02507732e+36, 1.22040329e+39, 2.51543867e+30]])
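Note that the result above is a floating-point array even though the input holds integers. A small sketch on stock NumPy, using three values that appear in arr, shows the dtype promotion and checks the numbers against the output printed above.

```python
import numpy as np  # stock NumPy, used as the reference behaviour

x = np.array([4, 5, 12])  # three of the values that appear in arr
y = np.exp(x)             # e**x for each element

print(x.dtype.kind, y.dtype)      # integer input, float64 output
print(np.allclose(np.log(y), x))  # log undoes exp, up to rounding

# These are the corresponding entries from the output shown above.
print(np.allclose(y, [5.45981500e+01, 1.48413159e+02, 1.62754791e+05]))
```

With inputs capped at 100 the results stay well inside the float64 range. Much larger inputs (above roughly 709) would overflow to inf in stock NumPy.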
Please try out the Modin NumPy API and give us feedback on GitHub: what works well, which additional APIs you would love to see, and so on. We’re eager to hear from you!
And if you’d like to try it out, here’s a copy-pastable version of all the code above that you can use to get started!
import modin.numpy as np
arr = np.array([[82, 15, 4],
                [95, 36, 32],
                [29, 18, 95],
                [14, 87, 95],
                [70, 12, 76],
                [55, 5, 4],
                [12, 28, 30],
                [65, 78, 4],
                [72, 26, 92],
                [84, 90, 70]])
type(arr)
arr * arr
arr.T @ arr
np.ones_like(arr)
np.sum(arr)
np.sum(arr, axis=1)
np.sum(arr, axis=0)
np.exp(arr)
Other Improvements in the 0.19 Release:
Beyond the NumPy API, the 0.19 release also includes:
- Over 50 stability and bug fixes
- Over 25 performance enhancements
- Many documentation updates, testing improvements, and other feature improvements
For more details, check out the release notes here. And thank you to the 14 Modin developers who contributed to this release!
Modin is 4.5 years in the making, has been downloaded almost 6 million times, and has ~8.5K GitHub Stars.
To stay up-to-date on all things pandas / Python / Data Science, and to watch Modin enter its twenties (0.20), follow Modin on Twitter, join the Modin Slack community, and star Modin on GitHub!