Rachelint commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2850176179

   Hi @alamb , I have addressed most points mentioned in code review.
   And the new added sql in `extended.sql` I think can show the improvement has 
been merged. I checked it in my local, and it got improvment as expected (see 
Q7).
   ```
   --------------------
   Benchmark clickbench_extended.json
   --------------------
   
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
   ┃ Query        ┃       main ┃ intermeidate-result-blocked-approach ┃        
Change ┃
   
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
   │ QQuery 0     │  1963.28ms │                            1947.08ms │     no 
change │
   │ QQuery 1     │  1180.94ms │                            1135.65ms │     no 
change │
   │ QQuery 2     │  2420.21ms │                            2372.42ms │     no 
change │
   │ QQuery 3     │   968.18ms │                            1091.04ms │  1.13x 
slower │
   │ QQuery 4     │  2955.90ms │                            2975.36ms │     no 
change │
   │ QQuery 5     │ 32623.33ms │                           32108.79ms │     no 
change │
   │ QQuery 6     │  5201.07ms │                            5205.19ms │     no 
change │
   │ QQuery 7     │  7865.80ms │                            7237.53ms │ +1.09x 
faster │
   
└──────────────┴────────────┴──────────────────────────────────────┴───────────────┘
   ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
   ┃ Benchmark Summary                                   ┃            ┃
   ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
   │ Total Time (main)                                   │ 55178.69ms │
   │ Total Time (intermeidate-result-blocked-approach)   │ 54073.06ms │
   │ Average Time (main)                                 │  6897.34ms │
   │ Average Time (intermeidate-result-blocked-approach) │  6759.13ms │
   │ Queries Faster                                      │          1 │
   │ Queries Slower                                      │          1 │
   │ Queries with No Change                              │          6 │
   └─────────────────────────────────────────────────────┴────────────┘
   ```
   
   
   > My big concern is that we have a plan to eventually avoid both single and 
multi blocked management
   
   And here is my thoughts about how we remove `single block management` code 
path eventually:
   - The hard point is that `streaming aggreagtion` need `Emit::First(n)`, and 
as I see we should not support `Emit::First(n)` (mainly performance reason, it 
is expansive and complex now, and even more expansive and complex in `blocked 
management`).
   - However, I think it may be possible to remove the usage of 
`Emit::First(n)`, I am considering how to make it. And if we made it, the 
`single block management` path can be removed after we implement `blocked 
management` for all exists `GroupsAccumulator` and `GroupValues`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to