Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

via GitHub Mon, 19 May 2025 04:43:02 -0700


Dandandan commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2890700369


   > Yes, in the current implementation it only helps performance when there is
   > a large number of intermediate groups, because slice is then called many,
   > many times and the cost becomes unacceptable.
   > For queries with only a few groups, it makes almost no difference.
   
   Yeah I think that was expected.
   I think we should try to minimize the impact of this on low-cardinality 
cases (e.g. make sure they fit in one array, minimize the overhead of 
blocks)... 
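
To make the low-cardinality point concrete, here is a minimal sketch of how a flat group index could map onto fixed-size blocks. The names `locate` and `blocks_needed` and the block size of 8192 are assumptions for illustration only, not the PR's actual API:

```rust
/// Hypothetical helper: map a flat group index to
/// (block index, offset within that block).
fn locate(group_idx: usize, block_size: usize) -> (usize, usize) {
    (group_idx / block_size, group_idx % block_size)
}

/// Hypothetical helper: number of blocks needed to hold `num_groups` groups.
fn blocks_needed(num_groups: usize, block_size: usize) -> usize {
    (num_groups + block_size - 1) / block_size
}
```

Under these assumptions, a query with 100 groups stays in a single block, so every lookup lands in block 0 and the only extra cost over a flat array is the division and remainder.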
   
   
   > So after experimenting, I think a single vector + resizing is actually
   > efficient enough...
   
   Yeah, it is quite efficient, although it is problematic for large inputs:
   * Offsets go out of bounds for utf8 / binary data.
   * Overallocation due to the exponential allocation strategy.
   So even with roughly the same performance, I think we should still strive to 
make the change.
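
The two concerns above can be illustrated with a rough sketch. It models growth as plain doubling from one slot (real allocators and `Vec` may grow differently) and assumes 32-bit offsets, as in Arrow's Utf8 / Binary layouts. All function names here are hypothetical:

```rust
/// Capacity reached by a buffer that starts at 1 slot and doubles
/// until it can hold `n` items (a simplified model of exponential growth).
fn doubling_capacity(n: usize) -> usize {
    let mut cap = 1usize;
    while cap < n {
        cap *= 2;
    }
    cap
}

/// Capacity of blocked storage: whole blocks of `block_size` slots.
fn blocked_capacity(n: usize, block_size: usize) -> usize {
    (n + block_size - 1) / block_size * block_size
}

/// Whether `total_bytes` of string data can still be addressed
/// by signed 32-bit offsets.
fn fits_i32_offsets(total_bytes: u64) -> bool {
    total_bytes <= i32::MAX as u64
}
```

In this model, 1,100,000 groups end up with 2,097,152 slots under doubling but only 1,105,920 slots with 8,192-slot blocks, and 3,000,000,000 bytes of string data in one array no longer fit in 32-bit offsets, whereas splitting the data across blocks keeps each block's offsets small.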


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

