ctsk closed pull request #15479: experiment: Selectively remove
CoalesceBatchesExec
URL: https://github.com/apache/datafusion/pull/15479
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific
ctsk commented on PR #15479:
URL: https://github.com/apache/datafusion/pull/15479#issuecomment-2817209539
[benchmarks/hashagg-results.md](https://github.com/apache/datafusion/blob/05db68ca2e5efba9b76759178f50a0595ffd9939/benchmarks/hashagg-results.md)
[benchmarks/join-results.md](https:/
ctsk commented on PR #15479:
URL: https://github.com/apache/datafusion/pull/15479#issuecomment-2817057508
Sorry for the silence. I've been extensively benchmarking this PR and the
results have been fairly mixed. I've also tried different thresholds for
coalescing. I plan on generating table
Dandandan commented on PR #15479:
URL: https://github.com/apache/datafusion/pull/15479#issuecomment-2815260149
I think this is a nice experiment.
That said, I think we can better try changing the build side of the join to
use `Vec`.
I remember we (I) changed it to concatenate all buil
berkaysynnada commented on PR #15479:
URL: https://github.com/apache/datafusion/pull/15479#issuecomment-2786362233
> @berkaysynnada I had a look at
`ExecutionPlanProperties::pipeline_behavior()`. I think it is not _quite_ what
I want here: For the HashJoin, I want to remove the coalesce on
ctsk commented on PR #15479:
URL: https://github.com/apache/datafusion/pull/15479#issuecomment-2783511665
@berkaysynnada I had a look at
`ExecutionPlanProperties::pipeline_behavior()`. I think it is not *quite* what
I want here: For the HashJoin, I want to remove the coalesce on the build s
alamb commented on PR #15479:
URL: https://github.com/apache/datafusion/pull/15479#issuecomment-2782934771
> Thanks for the benchmark @Rachelint. Did you use the implementation in
this PR or write your own? Overall the impact seems small (still good for how
little it takes to implement!). I
ctsk commented on PR #15479:
URL: https://github.com/apache/datafusion/pull/15479#issuecomment-2782883247
*Marking as draft to signify the implementation is not finished yet*
Thanks for the benchmark @Rachelint. Did you use the implementation in this
PR or write your own? Overall the
berkaysynnada commented on PR #15479:
URL: https://github.com/apache/datafusion/pull/15479#issuecomment-2781639583
> I think you can generalize this logic by tracking the
`ExecutionPlanProperties::pipeline_behavior()` of operators in the plan.
I can give some guidance on that BTW, if
Rachelint commented on PR #15479:
URL: https://github.com/apache/datafusion/pull/15479#issuecomment-2775152403
Here is my result after removing `CoalesceBatchesExec` for `Aggregate`:
```
Benchmark clickbench_1.json
┏━━┳
alamb commented on PR #15479:
URL: https://github.com/apache/datafusion/pull/15479#issuecomment-2764497802
BTW thank you very much @ctsk -- it is really cool to see the joins get
some careful love and attention ❤️
--
This is an automated message from the Apache Git Service.
To respond t
ctsk commented on code in PR #15479:
URL: https://github.com/apache/datafusion/pull/15479#discussion_r2020104151
##
datafusion/physical-optimizer/src/coalesce_batches.rs:
##
@@ -92,3 +92,73 @@ impl PhysicalOptimizerRule for CoalesceBatches {
true
}
}
+
+/// Remove
berkaysynnada commented on PR #15479:
URL: https://github.com/apache/datafusion/pull/15479#issuecomment-2764267974
I think you can generalize this logic by tracking the
`ExecutionPlanProperties::pipeline_behavior()` of operators in the plan.
--
This is an automated message from the Apach
Dandandan commented on code in PR #15479:
URL: https://github.com/apache/datafusion/pull/15479#discussion_r2019008151
##
datafusion/physical-optimizer/src/coalesce_batches.rs:
##
@@ -92,3 +92,73 @@ impl PhysicalOptimizerRule for CoalesceBatches {
true
}
}
+
+/// R
Dandandan commented on PR #15479:
URL: https://github.com/apache/datafusion/pull/15479#issuecomment-2761889602
That makes a lot of sense!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the speci
ctsk opened a new pull request, #15479:
URL: https://github.com/apache/datafusion/pull/15479
Relates to Issue: #15478
## Rationale for this change
The blocking operators (HJ buid side, Aggregation) are often planned on top
of a RepartitionExec with a CoalesceBatchesExec in-betw
16 matches
Mail list logo