Re: [PR] Optimize hash partitioning for cache friendliness [datafusion]

2025-07-22 Thread via GitHub
github-actions[bot] closed pull request #15981: Optimize hash partitioning for cache friendliness URL: https://github.com/apache/datafusion/pull/15981 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] Optimize hash partitioning for cache friendliness [datafusion]

2025-07-14 Thread via GitHub
github-actions[bot] commented on PR #15981: URL: https://github.com/apache/datafusion/pull/15981#issuecomment-3071664035 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment, or this will be closed in 7 days.

Re: [PR] Optimize hash partitioning for cache friendliness [datafusion]

2025-05-09 Thread via GitHub
alamb commented on PR #15981: URL: https://github.com/apache/datafusion/pull/15981#issuecomment-2867285850 > I think something like that is done already in the "convert to state" logic - it will dynamically decide to skip aggregating once it sees that the group vs input rows ratio is small.
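The dynamic skip described in this exchange can be illustrated with a small sketch. This is not DataFusion's actual "convert to state" implementation; the struct, field names, and thresholds below are invented for illustration, but the decision rule is the one described: observe the group-to-input-row ratio for a while, and stop aggregating in the partial stage once the ratio shows little reduction.

```rust
use std::collections::HashMap;

/// Illustrative partial aggregator that gives up when it is barely
/// reducing the input (hypothetical names; not DataFusion's code).
struct PartialAggregator {
    groups: HashMap<u64, i64>,
    rows_seen: usize,
    probe_rows: usize, // rows to observe before deciding
    skip_ratio: f64,   // skip once groups / rows exceeds this
    skipping: bool,
}

impl PartialAggregator {
    fn new(probe_rows: usize, skip_ratio: f64) -> Self {
        Self {
            groups: HashMap::new(),
            rows_seen: 0,
            probe_rows,
            skip_ratio,
            skipping: false,
        }
    }

    /// Returns true if the row was absorbed by the partial stage,
    /// false if it should be forwarded to the final stage unmodified.
    fn push(&mut self, key: u64, value: i64) -> bool {
        if self.skipping {
            return false;
        }
        *self.groups.entry(key).or_insert(0) += value;
        self.rows_seen += 1;
        if self.rows_seen >= self.probe_rows {
            let ratio = self.groups.len() as f64 / self.rows_seen as f64;
            if ratio > self.skip_ratio {
                // Nearly one group per row: aggregating here is wasted
                // work, so pass rows through from now on.
                self.skipping = true;
            }
        }
        true
    }
}

fn main() {
    // Nearly-unique keys: ratio reaches 1.0, so the partial stage gives up.
    let mut agg = PartialAggregator::new(1000, 0.8);
    for i in 0..1000u64 {
        agg.push(i, 1);
    }
    assert!(agg.skipping);
    assert!(!agg.push(9999, 1)); // subsequent rows pass straight through
    println!("skipping={}", agg.skipping);
}
```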

Re: [PR] Optimize hash partitioning for cache friendliness [datafusion]

2025-05-08 Thread via GitHub
Dandandan commented on PR #15981: URL: https://github.com/apache/datafusion/pull/15981#issuecomment-2863265862 I think something like that is done already in the "convert to state" logic - it will dynamically decide to skip aggregating once it sees that the group vs input rows ratio is small.

Re: [PR] Optimize hash partitioning for cache friendliness [datafusion]

2025-05-08 Thread via GitHub
ctsk commented on PR #15981: URL: https://github.com/apache/datafusion/pull/15981#issuecomment-2863005079 Since partitioning does not appear to be a limiting factor in aggregations, I wonder if it makes sense to investigate a lower-quality pre-aggregation (i.e. let more tuples pass to the final aggregation).
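One common shape for such a lower-quality pre-aggregation is a fixed-size, direct-mapped table: aggregate only when the slot is free or the key matches, and otherwise let the tuple pass to the final stage. No probing and no resizing keeps the table cache-resident and cheap per row, at the cost of leaking more rows downstream. The sketch below is an assumption about what "lower-quality" could look like, not anything from the PR; all names and sizes are illustrative.

```rust
/// Fixed-size, collision-lossy pre-aggregation table (illustrative).
struct LossyPreAgg {
    slots: Vec<Option<(u64, i64)>>,
    mask: usize,
}

impl LossyPreAgg {
    fn new(size_pow2: usize) -> Self {
        assert!(size_pow2.is_power_of_two());
        Self {
            slots: vec![None; size_pow2],
            mask: size_pow2 - 1,
        }
    }

    /// Absorbs the row if possible; returns Some((key, value)) when the
    /// row must instead go to the final aggregation (slot collision).
    fn push(&mut self, key: u64, value: i64) -> Option<(u64, i64)> {
        let idx = (key as usize) & self.mask;
        match self.slots[idx] {
            Some((k, v)) if k == key => {
                self.slots[idx] = Some((k, v + value)); // aggregate in place
                None
            }
            Some(_) => Some((key, value)), // collision: pass through
            None => {
                self.slots[idx] = Some((key, value)); // claim the free slot
                None
            }
        }
    }
}

fn main() {
    let mut pre = LossyPreAgg::new(4);
    assert_eq!(pre.push(1, 10), None);        // absorbed into slot 1
    assert_eq!(pre.push(1, 5), None);         // aggregated in place
    assert_eq!(pre.push(5, 7), Some((5, 7))); // 5 & 3 == 1 & 3: collision
    assert_eq!(pre.slots[1], Some((1, 15)));
    println!("ok");
}
```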

Re: [PR] Optimize hash partitioning for cache friendliness [datafusion]

2025-05-07 Thread via GitHub
alamb commented on PR #15981: URL: https://github.com/apache/datafusion/pull/15981#issuecomment-2860307673 🤖: Benchmark completed

Details:
```
Comparing HEAD and experiment_repartition-optimization
Benchmark clickbench_extended.json
--
```

Re: [PR] Optimize hash partitioning for cache friendliness [datafusion]

2025-05-07 Thread via GitHub
alamb commented on PR #15981: URL: https://github.com/apache/datafusion/pull/15981#issuecomment-2860166649 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] Optimize hash partitioning for cache friendliness [datafusion]

2025-05-07 Thread via GitHub
Dandandan commented on PR #15981: URL: https://github.com/apache/datafusion/pull/15981#issuecomment-2859776487 Nice, that seems like a great result! I think the main improvement after this would be using the `take_in` API you proposed in arrow-rs (mainly to avoid `concat`)
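The `take_in` idea referred to here is, roughly, a take/gather that writes into caller-provided storage, so successive per-partition gathers can accumulate into one buffer instead of producing separate arrays that must then be `concat`enated. The std-only sketch below on plain `Vec<i64>` shows only the shape of that idea; it is not the proposed arrow-rs API, whose signature (if merged) will differ.

```rust
/// Gather `src[indices]` by appending into `out` (hypothetical analogue
/// of a `take_in`-style kernel; operates on plain Vec rather than
/// Arrow arrays).
fn take_in(src: &[i64], indices: &[usize], out: &mut Vec<i64>) {
    out.reserve(indices.len());
    for &i in indices {
        out.push(src[i]); // append directly; no intermediate allocation
    }
}

fn main() {
    let batch = vec![10, 20, 30, 40, 50];
    // Row indices routed to one partition by the hash step.
    let first = [0, 2, 4];
    let second = [1];
    let mut out = Vec::new();
    // Two gathers accumulate into a single buffer; a take-then-concat
    // approach would have built two arrays and copied them again.
    take_in(&batch, &first, &mut out);
    take_in(&batch, &second, &mut out);
    assert_eq!(out, vec![10, 30, 50, 20]);
    println!("{:?}", out);
}
```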

Re: [PR] Optimize hash partitioning for cache friendliness [datafusion]

2025-05-07 Thread via GitHub
ctsk commented on PR #15981: URL: https://github.com/apache/datafusion/pull/15981#issuecomment-2859359952 I've run clickbench_partitioned and tpch_mem10 on a machine with 16 cores. The clickbench results are pretty much the same; tpch_mem10 ran significantly faster.

Re: [PR] Optimize hash partitioning for cache friendliness [datafusion]

2025-05-07 Thread via GitHub
Dandandan commented on PR #15981: URL: https://github.com/apache/datafusion/pull/15981#issuecomment-2859289524 nice, could you share some perf numbers of this approach?

Re: [PR] Optimize hash partitioning for cache friendliness [datafusion]

2025-05-07 Thread via GitHub
ctsk commented on PR #15981: URL: https://github.com/apache/datafusion/pull/15981#issuecomment-2859158487 Another tried-and-true strategy for this kind of problem is to partition in multiple stages: instead of a "wide" fanout partitioning to, for instance, 256 partitions in one pass, it is preferable to split the work into several passes with a smaller fanout each.
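This multi-stage (radix-style) strategy can be sketched as follows. With a single 256-way pass, 256 output buffers are live at once, which is unfriendly to caches and TLBs; two 16-way passes keep only 16 buffers hot per pass while producing the same 256 final partitions. The function names, bit choices, and fanout constants below are illustrative, not taken from the PR.

```rust
/// One partitioning pass: route each hash into one of `fanout` buckets
/// using `fanout.ilog2()`-many bits starting at `shift` (fanout must be
/// a power of two for the mask to work).
fn partition_one_pass(rows: &[u64], shift: u32, fanout: usize) -> Vec<Vec<u64>> {
    let mut parts = vec![Vec::new(); fanout];
    for &h in rows {
        parts[((h >> shift) as usize) & (fanout - 1)].push(h);
    }
    parts
}

/// Two 16-way passes yield 16 * 16 = 256 partitions, with at most 16
/// output buffers active during any single pass.
fn two_stage_partition(rows: &[u64]) -> Vec<Vec<u64>> {
    let mut out = Vec::with_capacity(256);
    // Stage 1: 16-way split on hash bits 4..8.
    for stage1_bucket in partition_one_pass(rows, 4, 16) {
        // Stage 2: each stage-1 bucket split 16 ways on bits 0..4.
        out.extend(partition_one_pass(&stage1_bucket, 0, 16));
    }
    out
}

fn main() {
    let rows: Vec<u64> = (0..1024).collect();
    let parts = two_stage_partition(&rows);
    assert_eq!(parts.len(), 256);
    // No rows lost, and every row in bucket i shares hash low byte i.
    assert_eq!(parts.iter().map(|p| p.len()).sum::<usize>(), rows.len());
    for (i, p) in parts.iter().enumerate() {
        for &h in p {
            assert_eq!((h & 0xff) as usize, i);
        }
    }
    println!("256 partitions, max fanout per pass: 16");
}
```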