Re: [PR] feat: Support `PiecewiseMergeJoin` to speed up single range predicate joins [datafusion]

via GitHub Mon, 11 Aug 2025 22:26:12 -0700


2010YOUY01 commented on PR #16660:
URL: https://github.com/apache/datafusion/pull/16660#issuecomment-3177753494


   > > Support additional join predicates [#16660 
(comment)](https://github.com/apache/datafusion/pull/16660#discussion_r2265310584)
   > 
   > I dont think this should be done. I looked into it, and the overhead + 
complexity brought by checking selectivity just for a workload which is very 
unlikely (equijoin filters less than the single filter). DuckDB also doesnt 
support this if that is an indicator at all.
   
   Perhaps I didn't express it clearly -- the idea was not to check the 
selectivity and reorder the filters, it's always evaluate the primary join 
predicate first (equality for HJ, and single IE for PMJ), then support optional 
remaining ANDed filters:
   let's say we're doing
   ```sql
    SELECT *
             FROM generate_series(10000) AS t1(v1)
             JOIN generate_series(10000) AS t2(v1)
             ON ((t1.v1 + t2.v1) % 1000 = 0) AND (t1.v1 > t2.v1);
   ```
   Then it should be executed as a PMJ with IE predicate `t1.v1 > t2.v1`, after 
this IE predicate is evaluated, PMJ operator can support continue filtering 
with the remaining ANDed filter `(t1.v1 + t2.v1) % 1000 = 0)`
   
   The SMJ and HJ in DataFusion are all implemented this way, they're fusing 
the general join conditions into the operator, however DuckDB is breaking this 
post-filtering step into a separate filter. Probably DuckDB approach is a good 
idea to simplify the join operators.
   
   Example: DuckDB is evaluating the remaining join filter outside the join 
operator, however DF is evaluating it inside the join operator
   ```
   D explain SELECT *
             FROM generate_series(10000) AS t1(v1)
             JOIN generate_series(10000) AS t2(v1)
             ON ((t1.v1 + t2.v1) % 1000 = 0) AND (t1.v1 > t2.v1);
   
   ┌─────────────────────────────┐
   │┌───────────────────────────┐│
   ││       Physical Plan       ││
   │└───────────────────────────┘│
   └─────────────────────────────┘
   ┌───────────────────────────┐
   │           FILTER          │
   │    ────────────────────   │
   │  (((v1 + v1) % 1000) = 0) │
   │                           │
   │        ~215472 Rows       │
   └─────────────┬─────────────┘
   ┌─────────────┴─────────────┐
   │    PIECEWISE_MERGE_JOIN   │
   │    ────────────────────   │
   │      Join Type: INNER     │
   │    Conditions: v1 > v1    ├──────────────┐
   │                           │              │
   │        ~215472 Rows       │              │
   └─────────────┬─────────────┘              │
   ┌─────────────┴─────────────┐┌─────────────┴─────────────┐
   │      GENERATE_SERIES      ││      GENERATE_SERIES      │
   │    ────────────────────   ││    ────────────────────   │
   │         Function:         ││         Function:         │
   │      GENERATE_SERIES      ││      GENERATE_SERIES      │
   │                           ││                           │
   │        ~10001 Rows        ││        ~10001 Rows        │
   └───────────────────────────┘└───────────────────────────┘
   ```
   
   ```
   > explain SELECT *
             FROM generate_series(10000) AS t1(v1)
             JOIN generate_series(10000) AS t2(v1)
             ON (t1.v1=t2.v1) AND (t1.v1 > t2.v1);
   
+---------------+------------------------------------------------------------+
   | plan_type     | plan                                                       
|
   
+---------------+------------------------------------------------------------+
   | physical_plan | ┌───────────────────────────┐                              
|
   |               | │    CoalesceBatchesExec    │                              
|
   |               | │    --------------------   │                              
|
   |               | │     target_batch_size:    │                              
|
   |               | │            8192           │                              
|
   |               | └─────────────┬─────────────┘                              
|
   |               | ┌─────────────┴─────────────┐                              
|
   |               | │        HashJoinExec       │                              
|
   |               | │    --------------------   ├──────────────┐               
|
   |               | │       on: (v1 = v1)       │              │               
|
   |               | └─────────────┬─────────────┘              │               
|
   |               | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ 
|
   |               | │    CoalesceBatchesExec    ││    CoalesceBatchesExec    │ 
|
   |               | │    --------------------   ││    --------------------   │ 
|
   |               | │     target_batch_size:    ││     target_batch_size:    │ 
|
   |               | │            8192           ││            8192           │ 
|
   |               | └─────────────┬─────────────┘└─────────────┬─────────────┘ 
|
   |               | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ 
|
   |               | │      RepartitionExec      ││      RepartitionExec      │ 
|
   |               | │    --------------------   ││    --------------------   │ 
|
   |               | │ partition_count(in->out): ││ partition_count(in->out): │ 
|
   |               | │          14 -> 14         ││          14 -> 14         │ 
|
   |               | │                           ││                           │ 
|
   |               | │    partitioning_scheme:   ││    partitioning_scheme:   │ 
|
   |               | │      Hash([v1@0], 14)     ││      Hash([v1@0], 14)     │ 
|
   |               | └─────────────┬─────────────┘└─────────────┬─────────────┘ 
|
   |               | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ 
|
   |               | │      RepartitionExec      ││      RepartitionExec      │ 
|
   |               | │    --------------------   ││    --------------------   │ 
|
   |               | │ partition_count(in->out): ││ partition_count(in->out): │ 
|
   |               | │          1 -> 14          ││          1 -> 14          │ 
|
   |               | │                           ││                           │ 
|
   |               | │    partitioning_scheme:   ││    partitioning_scheme:   │ 
|
   |               | │    RoundRobinBatch(14)    ││    RoundRobinBatch(14)    │ 
|
   |               | └─────────────┬─────────────┘└─────────────┬─────────────┘ 
|
   |               | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ 
|
   |               | │       ProjectionExec      ││       ProjectionExec      │ 
|
   |               | │    --------------------   ││    --------------------   │ 
|
   |               | │         v1: value         ││         v1: value         │ 
|
   |               | └─────────────┬─────────────┘└─────────────┬─────────────┘ 
|
   |               | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ 
|
   |               | │       LazyMemoryExec      ││       LazyMemoryExec      │ 
|
   |               | │    --------------------   ││    --------------------   │ 
|
   |               | │     batch_generators:     ││     batch_generators:     │ 
|
   |               | │ generate_series: start=0, ││ generate_series: start=0, │ 
|
   |               | │    end=10000, batch_size  ││    end=10000, batch_size  │ 
|
   |               | │           =8192           ││           =8192           │ 
|
   |               | └───────────────────────────┘└───────────────────────────┘ 
|
   |               |                                                            
|
   
+---------------+------------------------------------------------------------+
   1 row(s) fetched.
   Elapsed 0.005 seconds.
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Re: [PR] feat: Support `PiecewiseMergeJoin` to speed up single range predicate joins [datafusion]

Reply via email to