alamb commented on issue #14851: URL: https://github.com/apache/datafusion/issues/14851#issuecomment-2679009002
Thanks @EmilyMatt. I agree with what seems to be the goal of this ticket: when operating in a memory-constrained environment, it is best to avoid erroring during the spilling process itself.

> It's possible that there are other design ideas taken into consideration here, but this is my belief at least.

> Using a custom memory pool, it should be possible to implement a "burst" allocation or something similar, and avoid aborting a query, but this requires the aggregate a "declaration of intentions" of sorts, shouldn't resize() be used in such cases, instead of try_resize()?

I think another classic solution is to reduce the memory requirements (at the cost of some performance) by using a multi-pass merge, as described in https://github.com/apache/datafusion/issues/14692#issue-2855923565

There is some overlap with the discussion that @2010YOUY01 has been having on that ticket (though that one is about sorting):
- https://github.com/apache/datafusion/issues/14692

So it may be that your idea of improving the reservation calls works for, or at least improves, your use case.

However, solving the general problem of "being able to merge an arbitrary amount of data / spill files given limited memory" will require a multi-level merge.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
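
To make the multi-level merge idea concrete, here is a minimal sketch. It is not DataFusion code: the in-memory `Vec<i64>` runs stand in for sorted spill files, `fan_in` is a hypothetical knob modelling how many runs fit in the memory budget at once, and the names `merge_runs` and `multi_level_merge` are invented for illustration.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Merge a small group of sorted runs into one sorted run (k-way heap merge).
fn merge_runs(runs: &[Vec<i64>]) -> Vec<i64> {
    // Heap entries are (value, run index, offset in run); Reverse makes it a min-heap.
    let mut heap = BinaryHeap::new();
    for (run_idx, run) in runs.iter().enumerate() {
        if let Some(&first) = run.first() {
            heap.push(Reverse((first, run_idx, 0usize)));
        }
    }
    let mut out = Vec::with_capacity(runs.iter().map(Vec::len).sum::<usize>());
    while let Some(Reverse((value, run_idx, offset))) = heap.pop() {
        out.push(value);
        // Refill the heap from the run the smallest value came from.
        if let Some(&next) = runs[run_idx].get(offset + 1) {
            heap.push(Reverse((next, run_idx, offset + 1)));
        }
    }
    out
}

/// Hypothetical multi-level merge: never merges more than `fan_in` runs at once.
/// Returns the merged run and the number of passes that were needed.
fn multi_level_merge(mut runs: Vec<Vec<i64>>, fan_in: usize) -> (Vec<i64>, usize) {
    assert!(fan_in >= 2, "need a fan-in of at least 2 to make progress");
    let mut passes = 0;
    while runs.len() > 1 {
        // Each pass merges groups of at most `fan_in` runs into longer runs.
        runs = runs.chunks(fan_in).map(merge_runs).collect();
        passes += 1;
    }
    (runs.pop().unwrap_or_default(), passes)
}

fn main() {
    // Five sorted runs; a fan-in of 2 models a very tight memory budget.
    let runs = vec![vec![1, 9], vec![2, 8], vec![3, 7], vec![4, 6], vec![5]];
    let (merged, passes) = multi_level_merge(runs, 2);
    println!("{merged:?} after {passes} passes");
}
```

The trade-off is the one mentioned above: a smaller fan-in bounds how many runs are open at the same time, but every extra pass re-reads and re-writes the data, so performance drops as the memory budget shrinks.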
