
Hendrik Makait

Hendrik Makait
Data Systems | Open Source | Distributed Computing
I'm an OSS engineer focused on scalable data and ML systems and a core maintainer of Dask.

Shuffling Large Data at Constant Memory in Dask | Dask Demo Day 2023-03

Abstract

Debugging is hard. Distributed debugging is hell.

Dask is a popular library for parallel and distributed computing in Python. In this demo, we showcase the recent scalability and performance improvements in the dask.dataframe API that were enabled by my work on the new P2P shuffling system.


© Copyright Hendrik Makait.