Hendrik Makait

Data Systems | Open Source | Distributed Computing
I'm an OSS engineer focused on scalable data and ML systems and a core maintainer of Dask.
Posts tagged p2p

Shuffling large data at constant memory in Dask

  • 15 March 2023
  • Hendrik Makait
  • Post
  • p2p shuffling dask distributed

With release 2023.2.1, dask.dataframe introduces a new shuffling method called P2P that makes sorts, merges, and joins faster while using constant memory. Benchmarks show impressive improvements:

P2P shuffling uses constant memory, while the memory use of task-based shuffling scales linearly.


© Copyright Hendrik Makait.