Re: [I] Support zero copy hash repartitioning for Hash Aggregate [datafusion]

2025-04-20 Thread via GitHub
Rachelint commented on issue #15383: URL: https://github.com/apache/datafusion/issues/15383#issuecomment-2817566333 > However, I found the performance won't be better for Clickbench queries 4 and 7. I think it may be possible that the test queries can't reflect the improvement well.

Re: [I] Support zero copy hash repartitioning for Hash Aggregate [datafusion]

2025-04-20 Thread via GitHub
goldmedal commented on issue #15383: URL: https://github.com/apache/datafusion/issues/15383#issuecomment-2817168799 I have another implementation for this issue https://github.com/goldmedal/datafusion/pull/4 The concept is that getting the row according to indices in the selection vecto

Re: [I] Support zero copy hash repartitioning for Hash Aggregate [datafusion]

2025-04-05 Thread via GitHub
Dandandan commented on issue #15383: URL: https://github.com/apache/datafusion/issues/15383#issuecomment-2767253437 > I'm considering another approach. Maybe I shouldn't use filter_record_batch 🤔. It filters all the columns iteratively. I should filter the row when the accumulator merge_batch 🤔

Re: [I] Support zero copy hash repartitioning for Hash Aggregate [datafusion]

2025-04-01 Thread via GitHub
Rachelint commented on issue #15383: URL: https://github.com/apache/datafusion/issues/15383#issuecomment-2770996279 > I'm considering another approach. Maybe I shouldn't use filter_record_batch 🤔. It filters all the columns iteratively. I should filter the row when the accumulator merge_batch 🤔

Re: [I] Support zero copy hash repartitioning for Hash Aggregate [datafusion]

2025-03-31 Thread via GitHub
goldmedal commented on issue #15383: URL: https://github.com/apache/datafusion/issues/15383#issuecomment-2766751553 I'm considering another approach. Maybe I shouldn't use `filter_record_batch` 🤔. It filters all the columns iteratively. I should filter the row when the accumulator `update_batch
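
To make the trade-off in the comment above concrete, here is a minimal sketch in plain Rust (standard library only). It is not the code from the linked PRs and does not use DataFusion's or Arrow's real APIs: `SumAccumulator`, `update_batch_selected`, `filter_column`, and `selection_for_partition` are hypothetical names, and the "hash modulo partition count" rule is an assumption made for illustration. It contrasts copying the matching rows of a column before aggregation with letting the accumulator read the original column through a selection vector of row indices.

```rust
/// Hypothetical stand-in for a SUM accumulator.
/// This is NOT DataFusion's `Accumulator` trait, only an illustration.
#[derive(Default)]
struct SumAccumulator {
    sum: i64,
}

impl SumAccumulator {
    /// Eager approach: the caller has already copied the selected rows
    /// into a new, smaller column.
    fn update_batch(&mut self, values: &[i64]) {
        self.sum += values.iter().sum::<i64>();
    }

    /// Deferred approach: read the original column through a selection
    /// vector of row indices, so no filtered copy of the column is built.
    fn update_batch_selected(&mut self, values: &[i64], selection: &[usize]) {
        self.sum += selection.iter().map(|&i| values[i]).sum::<i64>();
    }
}

/// Copies the rows of ONE column whose hash maps to `partition`.
/// With eager filtering, this copy is repeated for every column in the batch.
fn filter_column(
    values: &[i64],
    hashes: &[u64],
    partition: u64,
    num_partitions: u64,
) -> Vec<i64> {
    values
        .iter()
        .zip(hashes)
        .filter(|(_, h)| **h % num_partitions == partition)
        .map(|(v, _)| *v)
        .collect()
}

/// Builds the selection vector once per batch; it can then be shared by
/// every accumulator that reads from that batch.
fn selection_for_partition(
    hashes: &[u64],
    partition: u64,
    num_partitions: u64,
) -> Vec<usize> {
    hashes
        .iter()
        .enumerate()
        .filter(|(_, h)| **h % num_partitions == partition)
        .map(|(i, _)| i)
        .collect()
}
```

Both paths produce the same aggregate. The difference is that the deferred path builds one index list per batch instead of one filtered copy per column. Whether that is faster in practice depends on the workload, which is what the benchmarks discussed in this thread are probing.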

Re: [I] Support zero copy hash repartitioning for Hash Aggregate [datafusion]

2025-03-31 Thread via GitHub
zebsme commented on issue #15383: URL: https://github.com/apache/datafusion/issues/15383#issuecomment-2766784298 > I'm considering another approach. Maybe I shouldn't use `filter_record_batch` 🤔. It filters all the columns iteratively. I should filter the row when the accumulator `merge_batch`

Re: [I] Support zero copy hash repartitioning for Hash Aggregate [datafusion]

2025-03-31 Thread via GitHub
goldmedal commented on issue #15383: URL: https://github.com/apache/datafusion/issues/15383#issuecomment-2766551662 Based on https://github.com/goldmedal/datafusion/pull/3, I ran some benchmarks (`clickbench_1`, `h2o_medium`) on it. `feat_zero-copy-hash-agg-false` is the branch that

Re: [I] Support zero copy hash repartitioning for Hash Aggregate [datafusion]

2025-03-29 Thread via GitHub
goldmedal commented on issue #15383: URL: https://github.com/apache/datafusion/issues/15383#issuecomment-2763293892 @Dandandan I have a draft https://github.com/goldmedal/datafusion/pull/3 based on #15423 for `HashAggregate`. Could you check if it's heading in the right direction?