Re: [I] Fixed blocks in aggregation state for better memory management [datafusion]

via GitHub Sat, 10 Jan 2026 02:13:36 -0800


Rachelint commented on issue #19649:
URL: https://github.com/apache/datafusion/issues/19649#issuecomment-3732331211


   As I see, there are actually 2 issues `support blocked aggr state` and 
`improve aggr code readability`.
   And maybe we should not mix `adding new feature` and  `refactoring` into one 
work, that may add complexity.
   
   Code changes of #15591 in `GroupedHashAggregateStream` are not large (176 
lines, and many lines are comments). I think we can just add it firstly.
   
   And secondly, what we need may be to refactor the 
`GroupedHashAggregateStream` totally. During working in aggr, I found the 
complexity of `GroupedHashAggregateStream` is due to we make `only one stream 
for the whole aggr`, rather than `one stream for partial aggr and another 
stream for final aggr`.  And it lead to that we are forced to mix codes of 
`partial aggr` and `final aggr` in `GroupedHashAggregateStream`, and finally 
many many code branches appear. 
   
   I think if we just fork a new stream rather than totally rewriting to split 
the `partial and final codes`, I am concerned it will become something like 
current `GroupedHashAggregateStream`
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [I] Fixed blocks in aggregation state for better memory management [datafusion]

Reply via email to