rluvaton commented on code in PR #19287:
URL: https://github.com/apache/datafusion/pull/19287#discussion_r2625034881
##########
datafusion/physical-plan/src/aggregates/row_hash.rs:
##########
@@ -1060,18 +1084,26 @@ impl GroupedHashAggregateStream {
         Ok(Some(batch))
     }
 
+    /// Determines if `spill_state_if_oom` can free up memory by spilling state to disk
+    fn can_spill_on_oom(&self) -> bool {
+        match self.oom_mode {
+            // For spill mode, only spill if we are not already reading back spilled state
+            OutOfMemoryMode::Spill => {
+                !self.group_values.is_empty() && !self.spill_state.is_stream_merging
+            }
+            // For emit early mode, never spill
+            OutOfMemoryMode::EmitEarly => false,
+        }
+    }
+
     /// Optimistically, [`Self::group_aggregate_batch`] allows to exceed the memory target slightly
     /// (~ 1 [`RecordBatch`]) for simplicity. In such cases, spill the data to disk and clear the
     /// memory. Currently only [`GroupOrdering::None`] is supported for spilling.
-    fn spill_previous_if_necessary(&mut self, batch: &RecordBatch) -> Result<()> {
-        // TODO: support group_ordering for spilling
-        if !self.group_values.is_empty()
+    fn spill_state_if_oom(&mut self, batch: &RecordBatch) -> Result<()> {
Review Comment:
I would expect the function to first check whether we have an OOM: if we have an OOM and can spill, then spill; if we have an OOM and can't spill, propagate the error; and if there is no OOM, do nothing.
But at the very least, given the function name, if we hit an OOM that we can't spill for, the error should be propagated.
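Something along these lines is what I have in mind (just a rough sketch, not the actual implementation: `try_grow_reservation` is a made-up placeholder for whatever memory check this path really does, and I'm assuming the existing `spill()` / `clear_shrink()` helpers from the current code):

```rust
fn spill_state_if_oom(&mut self, batch: &RecordBatch) -> Result<()> {
    // Hypothetical stand-in for the real memory/reservation check.
    match self.try_grow_reservation(batch) {
        // Enough memory: nothing to do.
        Ok(()) => Ok(()),
        // OOM, and spilling can free memory: spill state to disk and clear it.
        Err(_) if self.can_spill_on_oom() => {
            self.spill()?;
            self.clear_shrink(batch);
            Ok(())
        }
        // OOM, and spilling cannot help: propagate the original error.
        Err(e) => Err(e),
    }
}
```

That way the OOM check is the entry point, and the spill-vs-propagate decision is explicit in one place.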