pepijnve commented on PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#issuecomment-2966838912
I don't think it's necessary TBH. I applied this patch (which I think is
what @berkaysynnada meant) and the test then fails in the way it's intended to.
```
Index: datafusi
alamb commented on PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#issuecomment-2966502775
Is there an additional test we should write perhaps, to add the coverage
@berkaysynnada suggests?
--
This is an automated message from the Apache Git Service.
To respond to the messa
alamb merged PR #16322:
URL: https://github.com/apache/datafusion/pull/16322
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscr...@datafusi
pepijnve commented on PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#issuecomment-2965650342
> if the pending rotation somehow breaks, since SortPreservingMergeStream
never yields
I'm not sure I understand what you mean @berkaysynnada. Looking at just the
initial pha
berkaysynnada commented on PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#issuecomment-2965238217
Sorry for the late reply. @pepijnve Your diagnosis is spot on, and the
proposed fix totally makes sense. I honestly can’t recall why I added the wake
there but not in Congested
pepijnve commented on PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#issuecomment-2965788732
If think I've got you covered. Try commenting out
https://github.com/apache/datafusion/blob/main/datafusion/physical-plan/src/sorts/sort_preserving_merge.rs#L1313
and running the te
berkaysynnada commented on PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#issuecomment-2965717965
> > if the pending rotation somehow breaks, since SortPreservingMergeStream
never yields
>
> I'm not sure I understand what you mean @berkaysynnada. Looking at just
the
alamb commented on PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#issuecomment-2965063028
Thanks again @pepijnve @Dandandan @ozankabak and @zhuqi-lucas
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use
alamb commented on PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#issuecomment-2965062758
> I just checked, and SortExec is also setting up RecordBatchReceiverStream.
The worst case scenario in terms of elapsed time in the poll_next call is that
all 10k streams are ready in
zhuqi-lucas commented on PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#issuecomment-2962001843
> > they are ultimately based on Tokio's channel mechanism, which relies on
Linux's epoll, so it's still very efficient
>
> I'm pretty sure that's not the case. There are n
pepijnve commented on PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#issuecomment-2961836884
> they are ultimately based on Tokio's channel mechanism, which relies on
Linux's epoll, so it's still very efficient
I'm pretty sure that's not the case. There are no file de
zhuqi-lucas commented on PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#issuecomment-2961694324
> I just checked, and SortExec is also setting up RecordBatchReceiverStream.
The worst case scenario in terms of elapsed time in the poll_next call is that
all 10k streams are re
zhuqi-lucas commented on PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#issuecomment-2961608771
> > Especially in high-parallelism scenarios where polling all streams on
every wake-up is unnecessarily expensive.
>
> @zhuqi-lucas my assumption is that in high-paralleli
Dandandan commented on PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#issuecomment-2961591627
Perhaps we can run the sorting benchmarks (tpch_sort) as well just to be
sure there are no regressions? AFAIU there shouldn't be but let's see
--
This is an automated message fro
pepijnve commented on PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#issuecomment-2961574986
> Especially in high-parallelism scenarios where polling all streams on
every wake-up is unnecessarily expensive.
@zhuqi-lucas my assumption is that in high-parallelism situat
zhuqi-lucas commented on PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#issuecomment-2961564872
> > Previously, as soon as one stream returned Pending, the merge would
short-circuit and return Pending, minimizing work per cycle. With the new
approach, we now poll all N stre
pepijnve commented on PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#issuecomment-2961538845
> Previously, as soon as one stream returned Pending, the merge would
short-circuit and return Pending, minimizing work per cycle. With the new
approach, we now poll all N streams o
Dandandan commented on code in PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#discussion_r2137449500
##
datafusion/physical-plan/src/sorts/merge.rs:
##
@@ -216,36 +212,50 @@ impl SortPreservingMergeStream {
// Once all partitions have set their correspo
pepijnve commented on code in PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#discussion_r2137478983
##
datafusion/physical-plan/src/sorts/merge.rs:
##
@@ -216,36 +212,50 @@ impl SortPreservingMergeStream {
// Once all partitions have set their correspon
pepijnve commented on code in PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#discussion_r2137446995
##
datafusion/physical-plan/src/sorts/merge.rs:
##
@@ -216,36 +212,50 @@ impl SortPreservingMergeStream {
// Once all partitions have set their correspon
Dandandan commented on code in PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#discussion_r2137523815
##
datafusion/physical-plan/src/sorts/merge.rs:
##
@@ -216,36 +212,50 @@ impl SortPreservingMergeStream {
// Once all partitions have set their correspo
pepijnve commented on code in PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#discussion_r2137438301
##
datafusion/physical-plan/src/sorts/merge.rs:
##
@@ -216,36 +212,50 @@ impl SortPreservingMergeStream {
// Once all partitions have set their correspon
Dandandan commented on code in PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#discussion_r2137423047
##
datafusion/physical-plan/src/sorts/merge.rs:
##
@@ -216,36 +212,50 @@ impl SortPreservingMergeStream {
// Once all partitions have set their correspo
pepijnve commented on code in PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#discussion_r2137478983
##
datafusion/physical-plan/src/sorts/merge.rs:
##
@@ -216,36 +212,50 @@ impl SortPreservingMergeStream {
// Once all partitions have set their correspon
Dandandan commented on code in PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#discussion_r2137474689
##
datafusion/physical-plan/src/sorts/merge.rs:
##
@@ -216,36 +212,50 @@ impl SortPreservingMergeStream {
// Once all partitions have set their correspo
pepijnve commented on code in PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#discussion_r2137438301
##
datafusion/physical-plan/src/sorts/merge.rs:
##
@@ -216,36 +212,50 @@ impl SortPreservingMergeStream {
// Once all partitions have set their correspon
Dandandan commented on code in PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#discussion_r2137439468
##
datafusion/physical-plan/src/sorts/merge.rs:
##
@@ -216,36 +212,50 @@ impl SortPreservingMergeStream {
// Once all partitions have set their correspo
pepijnve commented on code in PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#discussion_r2137438301
##
datafusion/physical-plan/src/sorts/merge.rs:
##
@@ -216,36 +212,50 @@ impl SortPreservingMergeStream {
// Once all partitions have set their correspon
Dandandan commented on code in PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#discussion_r2137384940
##
datafusion/physical-plan/src/sorts/merge.rs:
##
@@ -216,36 +212,50 @@ impl SortPreservingMergeStream {
// Once all partitions have set their correspo
pepijnve commented on code in PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#discussion_r2137392057
##
datafusion/physical-plan/src/sorts/merge.rs:
##
@@ -216,36 +212,50 @@ impl SortPreservingMergeStream {
// Once all partitions have set their correspon
pepijnve commented on code in PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#discussion_r2137392057
##
datafusion/physical-plan/src/sorts/merge.rs:
##
@@ -216,36 +212,50 @@ impl SortPreservingMergeStream {
// Once all partitions have set their correspon
pepijnve commented on PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#issuecomment-2958304164
Waiting for benchmarks results here so I have some time to write up my
assessment of what was happening and what has changed. This is just to assist
any reviewers, not to replace re
ozankabak commented on PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#issuecomment-2955839724
Since this changes the congestion behavior test, which I'm not deeply
familiar with, let's hear from @berkaysynnada on this to make sure we are not
losing anything. If he is OK wit
alamb commented on PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#issuecomment-2955730921
🤖: Benchmark completed
Details
```
Comparing HEAD and issue_16321
Benchmark clickbench_extended.json
┏━
alamb commented on code in PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#discussion_r2135636293
##
datafusion/physical-plan/src/sorts/merge.rs:
##
@@ -216,36 +212,50 @@ impl SortPreservingMergeStream {
// Once all partitions have set their correspondin
alamb commented on PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#issuecomment-2955655026
🤖 `./gh_compare_branch.sh` [Benchmark
Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh)
Running
Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun
Dandandan commented on PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#issuecomment-2955401426
I profiled some queries to verify it's no longer busy on the sorting thread:
Main (TPC-H query 1):
https://github.com/user-attachments/assets/a7236a88-c024-435b-8413-d98dcd
Dandandan commented on code in PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#discussion_r2135346103
##
datafusion/physical-plan/src/sorts/merge.rs:
##
@@ -216,36 +212,49 @@ impl SortPreservingMergeStream {
// Once all partitions have set their correspo
pepijnve commented on code in PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#discussion_r2134795452
##
datafusion/physical-plan/src/sorts/merge.rs:
##
@@ -216,36 +212,49 @@ impl SortPreservingMergeStream {
// Once all partitions have set their correspon
Dandandan commented on PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#issuecomment-2954193366
> @Dandandan project newbie question, my daily practice at work is to handle
code review comments using amend/force-push. Did so out of habit before
thinking to as ask. Is that ok
pepijnve commented on PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#issuecomment-2954189088
@Dandandan project newbie question, my daily practice at work is to handle
code review comments using amend/force-push. Did so out of habit before
thinking to as ask. Is that ok in
pepijnve commented on code in PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#discussion_r2134778315
##
datafusion/physical-plan/src/sorts/merge.rs:
##
@@ -216,36 +212,49 @@ impl SortPreservingMergeStream {
// Once all partitions have set their correspon
Dandandan commented on code in PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#discussion_r2134777847
##
datafusion/physical-plan/src/sorts/merge.rs:
##
@@ -216,36 +212,49 @@ impl SortPreservingMergeStream {
// Once all partitions have set their correspo
pepijnve commented on code in PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#discussion_r2134773918
##
datafusion/physical-plan/src/sorts/merge.rs:
##
@@ -216,36 +212,49 @@ impl SortPreservingMergeStream {
// Once all partitions have set their correspon
pepijnve commented on code in PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#discussion_r2134773638
##
datafusion/physical-plan/src/sorts/merge.rs:
##
@@ -216,36 +212,49 @@ impl SortPreservingMergeStream {
// Once all partitions have set their correspon
Dandandan commented on PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#issuecomment-2954177103
This seems really nice 🚀
On my machine I get roughly 10% improvement on queries with SPM - which I
think makes sense on a 10 core machine (with less cores it might busy-poll).
Dandandan commented on code in PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#discussion_r2134772014
##
datafusion/physical-plan/src/sorts/merge.rs:
##
@@ -216,36 +212,49 @@ impl SortPreservingMergeStream {
// Once all partitions have set their correspo
Dandandan commented on code in PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#discussion_r2134771715
##
datafusion/physical-plan/src/sorts/merge.rs:
##
@@ -216,36 +212,49 @@ impl SortPreservingMergeStream {
// Once all partitions have set their correspo
pepijnve commented on PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#issuecomment-2953693505
@berkaysynnada I hope it's ok that I ping you directly; reaching out because
I believe you are the other of the test case in question. I believe this PR
surfaced a mistake in the `C
pepijnve commented on PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#issuecomment-2953125280
A sort preserving merge specific test case started failing. I’ll dig deeper
to better understand what’s going on.
--
This is an automated message from the Apache Git Service.
To r
pepijnve opened a new pull request, #16322:
URL: https://github.com/apache/datafusion/pull/16322
## Which issue does this PR close?
- Closes #16321.
## Rationale for this change
`SortPreservingMergeStream` works in two phases. It first waits for each
input stream to be r
51 matches
Mail list logo