Hendrik Makait

Site Navigation

  • About
  • Blog
  • Email
  • GitHub
  • LinkedIn
  • Twitter
  • Atom Feed

Recent Posts

  • 23 June - Dask performance benchmarking put to the test: Fixing a pandas bottleneck
  • 16 May - Observability for Distributed Computing with Dask
  • 15 March - Shuffling large data at constant memory in Dask
  • 17 October - Personalization versus ‘Filter Bubble’: The Influence of Personalization on the Quality of Search Queries
  • Posts...

Posts tagged shuffling

Shuffling large data at constant memory in Dask

  • 15 March 2023
  • Hendrik Makait
  • dask distributed shuffling p2p

With release 2023.2.1, dask.dataframe introduces a new shuffling method called P2P that makes sorts, merges, and joins faster while using constant memory. Benchmarks show impressive improvements:

P2P shuffling uses constant memory, while the memory use of task-based shuffling scales linearly.
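
For readers new to the term, here is a toy sketch of what a shuffle does: it regroups rows so that all rows sharing a key end up in the same output partition, which is what sorts, merges, and joins rely on. This is plain Python and not Dask's implementation; the function name `shuffle_partitions`, the row layout, and the hash-modulo routing are assumptions made purely for illustration.

```python
from collections import defaultdict


def shuffle_partitions(partitions, key, n_out):
    """Regroup rows so that rows with the same key share an output partition.

    Hypothetical helper for illustration only; it is not part of Dask.
    """
    outputs = defaultdict(list)
    for part in partitions:
        for row in part:
            # Route each row by hashing its key (assumed routing rule).
            outputs[hash(row[key]) % n_out].append(row)
    # Return a dense list so empty output partitions are still present.
    return [outputs[i] for i in range(n_out)]


# Two input partitions; id 1 appears in both of them.
partitions = [
    [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}],
    [{"id": 1, "v": "c"}, {"id": 3, "v": "d"}],
]
shuffled = shuffle_partitions(partitions, key="id", n_out=2)

# After the shuffle, both rows with id 1 sit in the same output partition.
for i, part in enumerate(shuffled):
    print(i, [row["id"] for row in part])
```

The sketch holds everything in one process and in memory, so it says nothing about how either shuffling method moves data between workers or how much memory each one needs.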


© Copyright 2018, Hendrik Makait.