Nachiket-Roy opened a new pull request, #19695:
URL: https://github.com/apache/datafusion/pull/19695

   ## Which issue does this PR close?
   - Closes #19679 
   
   ## Rationale for this change
   
   External sort previously assumed that, under memory pressure, there would 
always be buffered in-memory batches available to spill before sorting. This 
assumption breaks down when a single oversized `RecordBatch` arrives that cannot 
be fully sorted in memory and no other buffered batches exist to spill first. 
In this scenario, the sorter could fail with an out-of-memory error or emit 
output batches larger than the configured `batch_size`. This PR adds a safe 
fallback that lets external sort make progress without unbounded memory growth.
   
   ## What changes are included in this PR?
   This PR introduces a chunked spill fallback for oversized batches under 
memory pressure:
   - Adds a new helper `sort_and_spill_large_batch()` that:
     - Sorts a single oversized batch once
     - Splits the sorted output into `batch_size`-sized chunks
     - Incrementally appends these chunks to a single spill file
   
   - Integrates this helper at the memory-reservation boundary to handle the 
case where:
     - Memory cannot be reserved
     - No buffered batches are available to spill
     - The input batch exceeds the configured `batch_size`
   
   - Ensures:
     - Correct ordering is preserved
     - Output batches respect `batch_size`
     - Memory is released eagerly
     - No async recursion or API changes are introduced
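
The fallback condition and the sort-once-then-chunk strategy described above can be modelled with a small, self-contained Rust sketch. It rests on simplifying assumptions: a "batch" is a `Vec<i64>` of sort keys rather than an Arrow `RecordBatch`, and the spill file is any `std::io::Write` sink (an in-memory `Vec<u8>` works). The name `sort_and_spill_large_batch` is borrowed from the PR text, but this signature and the helper `needs_chunked_spill` are hypothetical; this is not DataFusion's implementation.

```rust
use std::io::{self, Write};

/// Simplified stand-in for an Arrow `RecordBatch`: a single i64 sort-key
/// column. (Assumption for illustration; the real sorter uses Arrow batches.)
type Batch = Vec<i64>;

/// Hypothetical predicate mirroring the three conditions listed above:
/// the memory reservation failed, nothing is buffered that could be spilled
/// first, and the incoming batch has more rows than `batch_size`.
fn needs_chunked_spill(
    reservation_failed: bool,
    buffered_batches: usize,
    num_rows: usize,
    batch_size: usize,
) -> bool {
    reservation_failed && buffered_batches == 0 && num_rows > batch_size
}

/// Hypothetical model of `sort_and_spill_large_batch()`: sort the batch once,
/// split the sorted rows into chunks of at most `batch_size` rows, and append
/// each chunk to a single spill sink. Returns the number of chunks written.
fn sort_and_spill_large_batch<W: Write>(
    mut batch: Batch,
    batch_size: usize,
    spill: &mut W,
) -> io::Result<usize> {
    assert!(batch_size > 0, "batch_size must be non-zero");
    // Sort the whole oversized batch exactly once, so the chunks written
    // below are globally ordered, not just ordered within each chunk.
    batch.sort_unstable();
    let mut written = 0;
    for chunk in batch.chunks(batch_size) {
        // Each chunk holds at most `batch_size` rows; write it as one
        // comma-separated line appended to the same spill sink.
        let row: Vec<String> = chunk.iter().map(|v| v.to_string()).collect();
        writeln!(spill, "{}", row.join(","))?;
        written += 1;
    }
    // `batch` is dropped when this function returns, releasing its memory
    // as soon as the spill is complete.
    Ok(written)
}
```

For example, calling `sort_and_spill_large_batch(vec![5, 3, 9, 1, 7, 2, 8], 3, &mut file)` with `file` a `Vec<u8>` writes three chunks (`1,2,3`, `5,7,8`, `9`) to the sink. Sorting before splitting is what keeps the single spill file in order; sorting each chunk independently would instead yield unordered runs that still need merging.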
   
   ## Are these changes tested?
   
   Yes, by existing tests. No new tests were added because the existing tests 
that exercise external sort spilling behavior already fully cover this change.
   
   ## Are there any user-facing changes?
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
