Rachelint commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2850176179
Hi @alamb , I have addressed most points mentioned in code review. And the new added sql in `extended.sql` I think can show the improvement has been merged. I checked it in my local, and it got improvment as expected (see Q7). ``` -------------------- Benchmark clickbench_extended.json -------------------- ┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Query ┃ main ┃ intermeidate-result-blocked-approach ┃ Change ┃ ┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ QQuery 0 │ 1963.28ms │ 1947.08ms │ no change │ │ QQuery 1 │ 1180.94ms │ 1135.65ms │ no change │ │ QQuery 2 │ 2420.21ms │ 2372.42ms │ no change │ │ QQuery 3 │ 968.18ms │ 1091.04ms │ 1.13x slower │ │ QQuery 4 │ 2955.90ms │ 2975.36ms │ no change │ │ QQuery 5 │ 32623.33ms │ 32108.79ms │ no change │ │ QQuery 6 │ 5201.07ms │ 5205.19ms │ no change │ │ QQuery 7 │ 7865.80ms │ 7237.53ms │ +1.09x faster │ └──────────────┴────────────┴──────────────────────────────────────┴───────────────┘ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓ ┃ Benchmark Summary ┃ ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩ │ Total Time (main) │ 55178.69ms │ │ Total Time (intermeidate-result-blocked-approach) │ 54073.06ms │ │ Average Time (main) │ 6897.34ms │ │ Average Time (intermeidate-result-blocked-approach) │ 6759.13ms │ │ Queries Faster │ 1 │ │ Queries Slower │ 1 │ │ Queries with No Change │ 6 │ └─────────────────────────────────────────────────────┴────────────┘ ``` > My big concern is that we have a plan to eventually avoid both single and multi blocked management And here is my thoughts about how we remove `single block management` code path eventually: - The hard point is that `streaming aggreagtion` need `Emit::First(n)`, and as I see we should not support `Emit::First(n)` (mainly performance reason, it is expansive and complex now, and even more expansive and complex in `blocked management`). - However, I think it may be possible to remove the usage of `Emit::First(n)`, I am considering how to make it. And if we made it, the `single block management` path can be removed after we implement `blocked management` for all exists `GroupsAccumulator` and `GroupValues`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org