EmilyMatt opened a new issue, #14851: URL: https://github.com/apache/datafusion/issues/14851
### Describe the bug In the GroupedHashAggregateStream, update_memory_reservation() has a try_resize() call, an error returned from this function, is usually handled gracefully, as we are usually in the middle of emitting and can just emit again, or are accumulating and can spill(unless not even a single batch fits in memory, in which case this can be a justified panic). However, when creating a merge stream using the spill data, there is no viable error fallback, this early exits on error if the memory pool denied the reservation. Using a custom memory pool, it should be possible to implement a "burst" allocation or something similar, and avoid aborting a query, but this requires the aggregate a "declaration of intentions" of sorts, shouldn't resize() be used in such cases, instead of try_resize()? This is the problematic line: https://github.com/apache/datafusion/blob/6c5f214643fe81affcad5f3aa9031f02c4390bf0/datafusion/physical-plan/src/aggregates/row_hash.rs#L1050 I believe in line 910 as well, `match self.update_memory_reservation() { // Here we can ignore `insufficient_capacity_err` because we will spill later, // but at least one batch should fit in the memory Err(DataFusionError::ResourcesExhausted(_)) if self.group_values.len() >= self.batch_size => { Ok(()) } other => other, }` This should check preemptively if we have a possibility of spilling here, and if not, it should be a resize(), not a try_resize() as update_memory_reservation() does. But this one actually has a possibility of not aborting so seems more agreeable. It's possible that there are other design ideas taken into consideration here, but this is my belief at least. ### To Reproduce _No response_ ### Expected behavior _No response_ ### Additional context _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
