924060929 opened a new pull request, #64793:
URL: https://github.com/apache/doris/pull/64793

   ### What problem does this PR solve?
   
   Related PR: #63366 (FE local-shuffle planner, merged)
   
   Problem Summary:
   
   Two optimizations on top of the FE local-shuffle planner (#63366), 
addressing DORIS-24902 (bucket-shuffle serial bottleneck + parallelism upgrade).
   
   **Part 1: ExchangeNode serial decoupling + bucket-shuffle dest spreading**
   
   When the FE local-shuffle planner is enabled, 
`ExchangeNode.isSerialOperatorOnBe` no longer inherits 
`fragment.hasSerialScanNode()` — serial status comes from the node itself only. 
This decoupling allows `DistributePlanner` to spread bucket-shuffle 
destinations across all bucket-owning instances (via `assignedJoinBucketIndexes`) 
instead of funneling them to the first instance on each worker, which eliminates 
the serial bottleneck diagnosed in DORIS-24902.
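
   The difference between funneling and spreading can be illustrated with a 
small sketch. This is not the `DistributePlanner` code: the `Instance` class and 
both helper functions are hypothetical names, and only the idea of routing each 
bucket to its owning instance is taken from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Instance:
    # Hypothetical stand-in for one fragment instance on a worker (BE).
    instance_id: int
    worker: str
    assigned_join_bucket_indexes: List[int] = field(default_factory=list)


def funnel_destinations(instances: List[Instance]) -> Dict[int, int]:
    """Previous behavior: every bucket on a worker goes to that worker's first instance."""
    first_on_worker: Dict[str, int] = {}
    for inst in instances:
        first_on_worker.setdefault(inst.worker, inst.instance_id)
    return {
        bucket: first_on_worker[inst.worker]
        for inst in instances
        for bucket in inst.assigned_join_bucket_indexes
    }


def spread_destinations(instances: List[Instance]) -> Dict[int, int]:
    """New behavior: each bucket goes to the instance that owns it."""
    return {
        bucket: inst.instance_id
        for inst in instances
        for bucket in inst.assigned_join_bucket_indexes
    }


instances = [
    Instance(0, "be1", [0, 2]),
    Instance(1, "be1", [1, 3]),
    Instance(2, "be1", []),  # owns no bucket, so it receives nothing when spreading
]
```

   With these inputs, funneling maps all four buckets to instance 0, while 
spreading splits them between instances 0 and 1. Instance 2 owns no bucket and 
is never a destination.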
   
   This part also includes a BE-side orphan-instance fix. When dest spreading 
targets only bucket-owning instances, the non-owning instances become orphans 
whose receivers never get data. The fix creates orphan receivers with 
`num_senders=0` (ready immediately at EOS) and uses the per-BE local task index 
to detect orphans.
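
   As a rough illustration of the orphan handling, the sketch below assigns a 
sender count to each receiver on one BE. The function names, the 
`upstream_senders` parameter, and the completion check are assumptions made for 
this example. Only `num_senders=0` for orphans and detection by per-BE local 
task index come from the description above.

```python
from typing import Iterable, List


def receiver_sender_counts(
    num_local_tasks: int,
    dest_local_task_indexes: Iterable[int],
    upstream_senders: int,
) -> List[int]:
    """Return num_senders for each local task index on a single BE.

    A local task that is not a destination is treated as an orphan and
    gets num_senders=0, so it never waits for data.
    """
    dests = set(dest_local_task_indexes)
    return [upstream_senders if idx in dests else 0 for idx in range(num_local_tasks)]


def is_finished(num_senders: int, senders_at_eos: int) -> bool:
    # A receiver is done once every expected sender has reached EOS.
    # With num_senders=0 this holds immediately.
    return senders_at_eos >= num_senders
```

   For example, with four local tasks, destinations `[0, 1]`, and three 
upstream senders, tasks 2 and 3 are orphans and finish without receiving 
anything.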
   
   **Part 2: Bucket-to-hash parallelism upgrade + RF force_local_merge**
   
   When a pooled-scan fragment has significantly more instances than buckets, 
upgrade bucket-shuffle local exchanges to `LOCAL_EXECUTION_HASH_SHUFFLE` so the 
join runs at full instance parallelism instead of being capped at bucket count.
   
   - Cores-aware gate: `min(instances, executor_threads) / min(buckets, 
executor_threads) > ratio`
   - Whole-chain upgrade for stacked bucket joins
   - Tunable `bucket_shuffle_downgrade_ratio` (default 0.8)
   - RF `force_local_merge` fix: the bucket upgrade flips the scan from serial 
to parallel, which breaks the implicit RF merge signal. This PR adds 
`TRuntimeFilterDesc.force_local_merge`: the FE walks the builder→target path 
after `AddLocalExchange`, and if a `LocalExchangeNode` sits on that path, the 
target must merge partial RFs.
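
   The cores-aware gate from the first bullet can be sketched as follows. The 
function name and the input guards are hypothetical. Treating a ratio `<= 1` as 
"disabled" follows the release note for `local_shuffle_bucket_upgrade_ratio`, 
and applying it to this gate is an assumption of the sketch.

```python
def should_upgrade_to_hash(
    instances: int, buckets: int, executor_threads: int, ratio: float
) -> bool:
    """Sketch of: min(instances, executor_threads) / min(buckets, executor_threads) > ratio."""
    if ratio <= 1:
        # Assumed from the release note: a ratio <= 1 disables the upgrade.
        return False
    if instances <= 0 or buckets <= 0 or executor_threads <= 0:
        # Hypothetical guard against invalid input and division by zero.
        return False
    # Clamp both sides to the thread count: parallelism beyond the
    # available executor threads cannot yield any real gain.
    effective_instances = min(instances, executor_threads)
    effective_buckets = min(buckets, executor_threads)
    return effective_instances / effective_buckets > ratio
```

   With 16 instances, 4 buckets, and a ratio of 2.0, the gate passes on 16 
executor threads (16 / 4 = 4) but not on 4 executor threads (4 / 4 = 1), where 
the bucket count already saturates the available threads.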
   
   ### Release note
   
   Add session variable `local_shuffle_bucket_upgrade_ratio` (default 1.0; 
values <= 1 disable it) to control the bucket-to-hash parallelism upgrade in 
pooled-scan fragments. When it is enabled and the per-BE instance count 
significantly exceeds the bucket count, bucket-shuffle local exchanges are 
upgraded to hash exchanges for higher join parallelism.
   
   ### Check List (For Author)
   
   - Test
       - [x] Regression test
       - [x] Unit Test
       - [ ] Manual test
       - [ ] No need to test
   
   - Behavior changed:
       - [x] Yes. Bucket-shuffle join fragments may now use 
`LOCAL_EXECUTION_HASH_SHUFFLE` instead of `BUCKET_HASH_SHUFFLE` local exchanges 
when the upgrade ratio is exceeded. Setting `local_shuffle_bucket_upgrade_ratio` 
to 0 (or any value <= 1) restores the previous behavior.
   
   - Does this need documentation?
       - [x] Yes. Session variables `local_shuffle_bucket_upgrade_ratio` and 
`bucket_shuffle_downgrade_ratio` should be documented.
   
   ### Check List (For Reviewer who merge this PR)
   
   - [ ] Confirm the release note
   - [ ] Confirm test cases
   - [ ] Confirm document
   - [ ] Add branch pick label


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

