Rachelint commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2888827007

   I made a dedicated benchmark for `accumulate`.
   
   As the overall flamegraph suggested, there is actually no improvement for
   `blocked accumulate`, although the resizing cost is reduced (query performance
   did improve on my local machine; I guess that is due to removing the
   expensive array slice).
   
   I guess the reasons may be:
   
   - We perform a few large memory allocations in `flat`, but many more small
     memory allocations in `blocked`.
   - The memory is no longer contiguous, which I think may be less friendly to
     the CPU (cache prefetching, for example?).
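
To make the two guesses above concrete, here is a minimal sketch of the two storage layouts. It is not the code from this PR: `Flat`, `Blocked`, the `allocations` counters, and the block size of 1024 are all hypothetical names and values chosen for illustration. It only counts how often each layout asks for a new buffer and shows the extra indirection a blocked lookup needs.

```rust
/// Hypothetical flat storage: one contiguous buffer that grows by resizing.
struct Flat {
    values: Vec<u64>,
    /// How many times the backing buffer changed capacity (reallocated).
    allocations: usize,
}

impl Flat {
    fn new() -> Self {
        Flat { values: Vec::new(), allocations: 0 }
    }

    fn push(&mut self, v: u64) {
        let before = self.values.capacity();
        self.values.push(v);
        // A capacity change means the Vec grew into a new, larger buffer.
        if self.values.capacity() != before {
            self.allocations += 1;
        }
    }

    fn get(&self, idx: usize) -> u64 {
        self.values[idx]
    }
}

/// Hypothetical blocked storage: many fixed-size buffers, never resized.
struct Blocked {
    blocks: Vec<Vec<u64>>,
    block_size: usize,
    /// How many blocks were allocated (growth of the outer Vec is ignored).
    allocations: usize,
}

impl Blocked {
    fn new(block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be non-zero");
        Blocked { blocks: Vec::new(), block_size, allocations: 0 }
    }

    fn push(&mut self, v: u64) {
        let need_new_block = match self.blocks.last() {
            Some(block) => block.len() == self.block_size,
            None => true,
        };
        if need_new_block {
            // One small allocation per block instead of one large resize.
            self.blocks.push(Vec::with_capacity(self.block_size));
            self.allocations += 1;
        }
        self.blocks.last_mut().unwrap().push(v);
    }

    fn get(&self, idx: usize) -> u64 {
        // Two-step lookup: pick the block, then the offset inside it.
        self.blocks[idx / self.block_size][idx % self.block_size]
    }
}

fn main() {
    let n = 1_000_000usize;
    let mut flat = Flat::new();
    let mut blocked = Blocked::new(1024);
    for i in 0..n {
        flat.push(i as u64);
        blocked.push(i as u64);
    }
    assert_eq!(flat.get(n - 1), blocked.get(n - 1));
    println!("flat reallocations:  {}", flat.allocations);
    println!("blocked allocations: {}", blocked.allocations);
}
```

With these assumed sizes, the blocked layout performs one allocation per 1024 values, while the flat layout reallocates only when its capacity runs out. The blocked layout never copies existing values on growth, which would match the reduced resizing cost, but each lookup goes through two indexing steps and consecutive blocks are not guaranteed to be adjacent in memory.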
   
   ```
   Flat accumulate         time:   [135.16 ms 135.20 ms 135.25 ms]
                           change: [-1.9077% -1.8671% -1.8244%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   Blocked accumulate      time:   [139.89 ms 139.91 ms 139.92 ms] (even a bit slower)
                           change: [+0.3130% +0.3246% +0.3359%] (p = 0.00 < 0.05)
                           Change within noise threshold.
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

