Dandandan commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2890700369
> Yes, in the current implementation it only helps performance when there is a large number of intermediate groups, because the slice is called many, many times and the cost becomes unacceptable. For queries with only a few groups, it makes nearly no difference.

Yeah, I think that was expected. I think we should try to minimize the impact of this on low-cardinality cases (e.g. make sure they fit in one array, minimize the overhead of blocks)...

> So after experimenting, I think single vector + resizing is actually efficient enough...

Yeah, it is quite efficient, although it is problematic for large inputs:

* Offset out of bounds for utf8 / binary data.
* Overallocation due to the exponential allocation strategy.

So even with roughly the same performance, I think we should still strive to make the change.

-- 
This is an automated message from the Apache Git Service.
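To make the two large-input problems concrete, here is a minimal, self-contained sketch. It is not the code from this PR: `doubling_capacity`, `blocked_capacity`, `next_i32_offset`, the initial capacity of 1024 and the block size of 8192 are all hypothetical. The doubling function models a generic exponential growth policy; Rust's `Vec` does not guarantee an exact growth factor. The offset check assumes 32-bit offsets, as used by Arrow's `Utf8` / `Binary` types, which cap a single array's value buffer at `i32::MAX` bytes.

```rust
// Hypothetical sketch: single growable buffer (doubling) vs fixed-size blocks,
// plus the i32 offset limit that applies to Utf8 / Binary style arrays.

/// Slots reserved by a hypothetical doubling policy once `len` values are
/// stored, starting from `initial` slots. (Models a policy, not `Vec` exactly.)
fn doubling_capacity(len: usize, initial: usize) -> usize {
    let mut cap = initial.max(1);
    while cap < len {
        cap *= 2; // grow exponentially until everything fits
    }
    cap
}

/// Slots reserved when values live in fixed-size blocks of `block_size` slots:
/// at most one partially filled block is wasted.
fn blocked_capacity(len: usize, block_size: usize) -> usize {
    let blocks = (len + block_size - 1) / block_size; // ceiling division
    blocks * block_size
}

/// End offset after appending a value of `value_len` bytes to a buffer whose
/// current end offset is `current`, or `None` if it no longer fits in an i32.
fn next_i32_offset(current: i32, value_len: usize) -> Option<i32> {
    let len = i32::try_from(value_len).ok()?;
    current.checked_add(len)
}

fn main() {
    let groups = 1_100_000;
    let single = doubling_capacity(groups, 1024);
    let blocked = blocked_capacity(groups, 8192);
    println!("groups: {groups}, doubling: {single} slots, blocked: {blocked} slots");

    // Near i32::MAX a single array cannot accept another value, so the
    // values would have to be split across several arrays (or blocks).
    println!("fits: {:?}", next_i32_offset(i32::MAX - 10, 100));
}
```

With these assumed numbers, the doubling policy reserves 2,097,152 slots for 1,100,000 groups (almost 2x), while 8192-slot blocks reserve 1,105,920. Blocks also give a natural point to start a new array before the offsets overflow.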