pepijnve commented on code in PR #19287:
URL: https://github.com/apache/datafusion/pull/19287#discussion_r2625122633
##########
datafusion/physical-plan/src/aggregates/row_hash.rs:
##########
@@ -1060,18 +1084,26 @@ impl GroupedHashAggregateStream {
Ok(Some(batch))
}
+ /// Determines if `spill_state_if_oom` can free up memory by spilling
state to disk
+ fn can_spill_on_oom(&self) -> bool {
+ match self.oom_mode {
+ // For spill mode, only spill if we are not already reading back
spilled state
+ OutOfMemoryMode::Spill => {
+ !self.group_values.is_empty() &&
!self.spill_state.is_stream_merging
+ }
+ // For emit early mode, never spill
+ OutOfMemoryMode::EmitEarly => false,
+ }
+ }
+
/// Optimistically, [`Self::group_aggregate_batch`] allows to exceed the
memory target slightly
/// (~ 1 [`RecordBatch`]) for simplicity. In such cases, spill the data to
disk and clear the
/// memory. Currently only [`GroupOrdering::None`] is supported for
spilling.
- fn spill_previous_if_necessary(&mut self, batch: &RecordBatch) ->
Result<()> {
- // TODO: support group_ordering for spilling
- if !self.group_values.is_empty()
+ fn spill_state_if_oom(&mut self, batch: &RecordBatch) -> Result<()> {
Review Comment:
I had tried to retain the original structure of the code here. I agree with
you that it's not super clear. I'll see if I can improve this a bit further
tomorrow.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]