Re: [PR] POC: Cascaded spill merge and re-spill [datafusion]

via GitHub Sat, 12 Apr 2025 03:50:01 -0700


2010YOUY01 commented on PR #15610:
URL: https://github.com/apache/datafusion/pull/15610#issuecomment-2798782989


   I didn't implement the parallel merge optimization for now, my major concern 
is: this optimization requires one extra configuration, and users have to learn 
and correctly set 2 configs for each individual query, to enable the most 
efficient cascaded spill merge execution (see the below `intended optimization` 
section for what's those 2 configs), which is not ideal.
   So I'd like to defer the implementation for a bit, to think about if there 
are any simpler approaches (or maybe by collecting stats internally and 
auto-tune those related configs)
   Also, I think the current implementation is good enough to cover common 
cases (I did a rough estimation, sorting TPCH-SF1000 lineitem table with 16GB 
of memory only requires one round of re-spill)
   
   Here is the optimization that I originally thought about, I'll put them into 
a separate issue if it makes sense.
   ### Example scenario
   For one partition's `SortExec`, 100 runs are spilled, and we set 
`spill_max_spill_merge_degree` to 4
   ### Current Implementation
   
![image](https://github.com/user-attachments/assets/2d4bba36-2b9c-4fc1-8eac-780900dc1fb8)
   Each time it merges 4 existing spills into one combined spill file, until 
there are <= 4 spills total, the final result can be produced.
   For each entry, the number of re-spill will be $floor(\log_4 100)$ = 3
   
   ### Intended optimization
   If the memory pool is enough to hold more buffers at a time (while 
`spill_max_spill_merge_degree` is still limited to 4, in case the merge-degree 
is too large and hurt performance in some cases)
   One additional config will be introduced `sort_buffer_batch_capacity`, and 
set to `16` in the above example, the execution will look like:
   
![image](https://github.com/user-attachments/assets/5ed8e884-4f45-4abd-b27c-bea6fce8f46d)
   Then, inside each merge step, 16 spill files will be combined and re-spill. 
Each entry only need to be re-spilled for $floor(\log_{16} 100)$ = 1 time.
   With this approach, we can achieve an optimal re-spill count, and also 
enable parallel merge.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Re: [PR] POC: Cascaded spill merge and re-spill [datafusion]

Reply via email to