Rachelint commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2888827007
I made a dedicated benchmark for `accumulate`.

As in the overall flamegraph, there is actually no improvement for `blocked accumulate`, although the resizing cost is reduced (query performance did improve on my local machine; I guess that is due to removing the expensive array slice).

I guess the reasons may be:

- We perform a few large memory allocations in `flat`, but many more small memory allocations in `blocked`.
- The memory is no longer contiguous, which I think may be less friendly to the CPU (cache prefetching, for example?).

```
Flat accumulate         time:   [135.16 ms 135.20 ms 135.25 ms]
                        change: [-1.9077% -1.8671% -1.8244%] (p = 0.00 < 0.05)
                        Performance has improved.

Blocked accumulate      time:   [139.89 ms 139.91 ms 139.92 ms] (even a bit slower)
                        change: [+0.3130% +0.3246% +0.3359%] (p = 0.00 < 0.05)
                        Change within noise threshold.
```
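To make the two layouts concrete, here is a minimal sketch of the difference in allocation pattern. It is not the code in this PR: `FlatSums`, `BlockedSums`, and `block_size` are hypothetical names, and a plain `i64` sum stands in for the real accumulator state.

```rust
/// Flat layout (hypothetical): all per-group states live in one contiguous Vec.
/// Growing it may reallocate and copy every existing element, but reads and
/// writes always touch a single contiguous buffer.
struct FlatSums {
    values: Vec<i64>,
}

impl FlatSums {
    fn new() -> Self {
        Self { values: Vec::new() }
    }

    fn accumulate(&mut self, group_idx: usize, v: i64) {
        if group_idx >= self.values.len() {
            // A few large (re)allocations as the number of groups grows.
            self.values.resize(group_idx + 1, 0);
        }
        self.values[group_idx] += v;
    }

    fn get(&self, group_idx: usize) -> i64 {
        self.values[group_idx]
    }
}

/// Blocked layout (hypothetical): states are split into fixed-size blocks.
/// Existing blocks are never copied on growth, but there are many more small
/// allocations and the blocks are not contiguous with each other.
struct BlockedSums {
    block_size: usize,
    blocks: Vec<Vec<i64>>,
}

impl BlockedSums {
    fn new(block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be non-zero");
        Self { block_size, blocks: Vec::new() }
    }

    fn accumulate(&mut self, group_idx: usize, v: i64) {
        let block = group_idx / self.block_size;
        let offset = group_idx % self.block_size;
        // One small allocation per new block; old blocks stay where they are.
        while self.blocks.len() <= block {
            self.blocks.push(vec![0; self.block_size]);
        }
        self.blocks[block][offset] += v;
    }

    fn get(&self, group_idx: usize) -> i64 {
        self.blocks[group_idx / self.block_size][group_idx % self.block_size]
    }

    fn num_blocks(&self) -> usize {
        self.blocks.len()
    }
}
```

In this sketch the blocked version avoids copying on growth, but each update pays for a division, a modulo, and an extra pointer hop into a separately allocated block. That is consistent with the guess above, though the sketch alone does not prove it is the cause of the measured numbers.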