Shuffling Large Data at Constant Memory in Dask | Dask Demo Day 2023-03
Abstract
Debugging is hard. Distributed debugging is hell.
Dask is a popular library for parallel and distributed computing in Python.
In this demo, I showcase the recent scalability and performance improvements in the dask.dataframe
API that were enabled by my work on the new P2P shuffling system.