Rachelint commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2890663260
> Me too -- I looked at the flamegraph you provided and I agree it seems like almost half the allocation time is spent with pagefaults / zeroing memory. However, I can't tell if that is because there is slowness with the underlying Vec that wasn't initialized or if there is something else going on.

I think I nearly understand the reason. It is possibly caused by `lto`: `lto` seems to have found that the initialization is actually unnecessary, so it removes it (just like calling `set_len` manually).

> I suspect you already know this, but I think you can get back the original Vec from an array via

Got it! I think it may be possible in this situation?

- When the `input` for the `accumulator` and `group values` is consumed, we collect the arrays and transform them back to `Vec`s.
- Then we push them into the `accumulator` and `group values`.
- Finally we reuse them in the next round of computation of the `accumulator` and `group values`.

> From those numbers, is it a fair assessment that the blocked approach improves performance when there is a large number of intermediate groups, but does not when there is a small number of groups?

Yes, in the current implementation it only helps performance with a large number of intermediate groups, because `slice` will be called many, many times and its cost becomes unacceptable. For queries with only a small number of groups, it makes nearly no difference.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
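As a rough illustration of why repeated slicing may hurt with many intermediate groups, here is a minimal sketch using only the standard library. It is not DataFusion's actual code: `emit_first_flat`, `Blocked`, `push`, and `emit_block` are hypothetical names, and `u64` stands in for whatever the accumulator really stores. The flat version copies the remaining tail on every emit, while the blocked version hands out a whole block by moving it.

```rust
use std::collections::VecDeque;

/// Flat storage (hypothetical helper): emit the first `n` values.
/// `split_off` allocates a new Vec and copies the tail `values[n..]`,
/// so each emit costs time proportional to the values still held.
fn emit_first_flat(values: &mut Vec<u64>, n: usize) -> Vec<u64> {
    let rest = values.split_off(n);
    // Put the tail back in place and return the first `n` values.
    std::mem::replace(values, rest)
}

/// Blocked storage (hypothetical type): values live in fixed-size blocks.
struct Blocked {
    block_size: usize,
    blocks: VecDeque<Vec<u64>>,
}

impl Blocked {
    fn new(block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be positive");
        Self { block_size, blocks: VecDeque::new() }
    }

    /// Append a value, starting a new block when the last one is full.
    fn push(&mut self, v: u64) {
        let need_new = self
            .blocks
            .back()
            .map_or(true, |b| b.len() == self.block_size);
        if need_new {
            self.blocks.push_back(Vec::with_capacity(self.block_size));
        }
        self.blocks.back_mut().unwrap().push(v);
    }

    /// Emit the oldest block by moving it out; no values are copied.
    fn emit_block(&mut self) -> Option<Vec<u64>> {
        self.blocks.pop_front()
    }
}
```

With few groups the tail copied by the flat version is small, which would be consistent with the blocked layout making almost no difference in that case.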
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org