Re: [PR] experiment: Selectively remove CoalesceBatchesExec [datafusion]

2025-05-05 Thread via GitHub
ctsk closed pull request #15479: experiment: Selectively remove CoalesceBatchesExec URL: https://github.com/apache/datafusion/pull/15479 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] experiment: Selectively remove CoalesceBatchesExec [datafusion]

2025-04-20 Thread via GitHub
ctsk commented on PR #15479: URL: https://github.com/apache/datafusion/pull/15479#issuecomment-2817209539 [benchmarks/hashagg-results.md](https://github.com/apache/datafusion/blob/05db68ca2e5efba9b76759178f50a0595ffd9939/benchmarks/hashagg-results.md) [benchmarks/join-results.md](https:/

Re: [PR] experiment: Selectively remove CoalesceBatchesExec [datafusion]

2025-04-20 Thread via GitHub
ctsk commented on PR #15479: URL: https://github.com/apache/datafusion/pull/15479#issuecomment-2817057508 Sorry for the silence. I've been extensively benchmarking this PR and the results have been fairly mixed. I've also tried different thresholds for coalescing. I plan on generating table

Re: [PR] experiment: Selectively remove CoalesceBatchesExec [datafusion]

2025-04-18 Thread via GitHub
Dandandan commented on PR #15479: URL: https://github.com/apache/datafusion/pull/15479#issuecomment-2815260149 I think this is a nice experiment. That said, I think we can better try changing the build side of the join to use `Vec`. I remember we (I) changed it to concatenate all buil

Re: [PR] experiment: Selectively remove CoalesceBatchesExec [datafusion]

2025-04-08 Thread via GitHub
berkaysynnada commented on PR #15479: URL: https://github.com/apache/datafusion/pull/15479#issuecomment-2786362233 > @berkaysynnada I had a look at `ExecutionPlanProperties::pipeline_behavior()`. I think it is not _quite_ what I want here: For the HashJoin, I want to remove the coalesce on

Re: [PR] experiment: Selectively remove CoalesceBatchesExec [datafusion]

2025-04-07 Thread via GitHub
ctsk commented on PR #15479: URL: https://github.com/apache/datafusion/pull/15479#issuecomment-2783511665 @berkaysynnada I had a look at `ExecutionPlanProperties::pipeline_behavior()`. I think it is not *quite* what I want here: For the HashJoin, I want to remove the coalesce on the build s

Re: [PR] experiment: Selectively remove CoalesceBatchesExec [datafusion]

2025-04-07 Thread via GitHub
alamb commented on PR #15479: URL: https://github.com/apache/datafusion/pull/15479#issuecomment-2782934771 > Thanks for the benchmark @Rachelint. Did you use the implementation in this PR or write your own? Overall the impact seems small (still good for how little it takes to implement!). I

Re: [PR] experiment: Selectively remove CoalesceBatchesExec [datafusion]

2025-04-07 Thread via GitHub
ctsk commented on PR #15479: URL: https://github.com/apache/datafusion/pull/15479#issuecomment-2782883247 *Marking as draft to signify the implementation is not finished yet* Thanks for the benchmark @Rachelint. Did you use the implementation in this PR or write your own? Overall the

Re: [PR] experiment: Selectively remove CoalesceBatchesExec [datafusion]

2025-04-06 Thread via GitHub
berkaysynnada commented on PR #15479: URL: https://github.com/apache/datafusion/pull/15479#issuecomment-2781639583 > I think you can generalize this logic by tracking the `ExecutionPlanProperties::pipeline_behavior()` of operators in the plan. I can give some guidance on that BTW, if

Re: [PR] experiment: Selectively remove CoalesceBatchesExec [datafusion]

2025-04-05 Thread via GitHub
Rachelint commented on PR #15479: URL: https://github.com/apache/datafusion/pull/15479#issuecomment-2775152403 Here is my result after removing `CoalesceBatchesExec` for `Aggregate`: ``` Benchmark clickbench_1.json ┏━━┳

Re: [PR] experiment: Selectively remove CoalesceBatchesExec [datafusion]

2025-03-30 Thread via GitHub
alamb commented on PR #15479: URL: https://github.com/apache/datafusion/pull/15479#issuecomment-2764497802 BTW thank you very much @ctsk -- it is really cool to see the joins get some careful love and attention ❤️ -- This is an automated message from the Apache Git Service. To respond t

Re: [PR] experiment: Selectively remove CoalesceBatchesExec [datafusion]

2025-03-30 Thread via GitHub
ctsk commented on code in PR #15479: URL: https://github.com/apache/datafusion/pull/15479#discussion_r2020104151 ## datafusion/physical-optimizer/src/coalesce_batches.rs: ## @@ -92,3 +92,73 @@ impl PhysicalOptimizerRule for CoalesceBatches { true } } + +/// Remove

Re: [PR] experiment: Selectively remove CoalesceBatchesExec [datafusion]

2025-03-29 Thread via GitHub
berkaysynnada commented on PR #15479: URL: https://github.com/apache/datafusion/pull/15479#issuecomment-2764267974 I think you can generalize this logic by tracking the `ExecutionPlanProperties::pipeline_behavior()` of operators in the plan. -- This is an automated message from the Apach

Re: [PR] experiment: Selectively remove CoalesceBatchesExec [datafusion]

2025-03-28 Thread via GitHub
Dandandan commented on code in PR #15479: URL: https://github.com/apache/datafusion/pull/15479#discussion_r2019008151 ## datafusion/physical-optimizer/src/coalesce_batches.rs: ## @@ -92,3 +92,73 @@ impl PhysicalOptimizerRule for CoalesceBatches { true } } + +/// R

Re: [PR] experiment: Selectively remove CoalesceBatchesExec [datafusion]

2025-03-28 Thread via GitHub
Dandandan commented on PR #15479: URL: https://github.com/apache/datafusion/pull/15479#issuecomment-2761889602 That makes a lot of sense! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the speci

[PR] experiment: Selectively remove CoalesceBatchesExec [datafusion]

2025-03-28 Thread via GitHub
ctsk opened a new pull request, #15479: URL: https://github.com/apache/datafusion/pull/15479 Relates to Issue: #15478 ## Rationale for this change The blocking operators (HJ buid side, Aggregation) are often planned on top of a RepartitionExec with a CoalesceBatchesExec in-betw