2010YOUY01 commented on PR #16660: URL: https://github.com/apache/datafusion/pull/16660#issuecomment-3177753494
> > Support additional join predicates [#16660 (comment)](https://github.com/apache/datafusion/pull/16660#discussion_r2265310584) > > I dont think this should be done. I looked into it, and the overhead + complexity brought by checking selectivity just for a workload which is very unlikely (equijoin filters less than the single filter). DuckDB also doesnt support this if that is an indicator at all. Perhaps I didn't express it clearly -- the idea was not to check the selectivity and reorder the filters, it's always evaluate the primary join predicate first (equality for HJ, and single IE for PMJ), then support optional remaining ANDed filters: let's say we're doing ```sql SELECT * FROM generate_series(10000) AS t1(v1) JOIN generate_series(10000) AS t2(v1) ON ((t1.v1 + t2.v1) % 1000 = 0) AND (t1.v1 > t2.v1); ``` Then it should be executed as a PMJ with IE predicate `t1.v1 > t2.v1`, after this IE predicate is evaluated, PMJ operator can support continue filtering with the remaining ANDed filter `(t1.v1 + t2.v1) % 1000 = 0)` The SMJ and HJ in DataFusion are all implemented this way, they're fusing the general join conditions into the operator, however DuckDB is breaking this post-filtering step into a separate filter. Probably DuckDB approach is a good idea to simplify the join operators. Example: DuckDB is evaluating the remaining join filter outside the join operator, however DF is evaluating it inside the join operator ``` D explain SELECT * FROM generate_series(10000) AS t1(v1) JOIN generate_series(10000) AS t2(v1) ON ((t1.v1 + t2.v1) % 1000 = 0) AND (t1.v1 > t2.v1); ┌─────────────────────────────┐ │┌───────────────────────────┐│ ││ Physical Plan ││ │└───────────────────────────┘│ └─────────────────────────────┘ ┌───────────────────────────┐ │ FILTER │ │ ──────────────────── │ │ (((v1 + v1) % 1000) = 0) │ │ │ │ ~215472 Rows │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ PIECEWISE_MERGE_JOIN │ │ ──────────────────── │ │ Join Type: INNER │ │ Conditions: v1 > v1 ├──────────────┐ │ │ │ │ ~215472 Rows │ │ └─────────────┬─────────────┘ │ ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ │ GENERATE_SERIES ││ GENERATE_SERIES │ │ ──────────────────── ││ ──────────────────── │ │ Function: ││ Function: │ │ GENERATE_SERIES ││ GENERATE_SERIES │ │ ││ │ │ ~10001 Rows ││ ~10001 Rows │ └───────────────────────────┘└───────────────────────────┘ ``` ``` > explain SELECT * FROM generate_series(10000) AS t1(v1) JOIN generate_series(10000) AS t2(v1) ON (t1.v1=t2.v1) AND (t1.v1 > t2.v1); +---------------+------------------------------------------------------------+ | plan_type | plan | +---------------+------------------------------------------------------------+ | physical_plan | ┌───────────────────────────┐ | | | │ CoalesceBatchesExec │ | | | │ -------------------- │ | | | │ target_batch_size: │ | | | │ 8192 │ | | | └─────────────┬─────────────┘ | | | ┌─────────────┴─────────────┐ | | | │ HashJoinExec │ | | | │ -------------------- ├──────────────┐ | | | │ on: (v1 = v1) │ │ | | | └─────────────┬─────────────┘ │ | | | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ | | | │ CoalesceBatchesExec ││ CoalesceBatchesExec │ | | | │ -------------------- ││ -------------------- │ | | | │ target_batch_size: ││ target_batch_size: │ | | | │ 8192 ││ 8192 │ | | | └─────────────┬─────────────┘└─────────────┬─────────────┘ | | | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ | | | │ RepartitionExec ││ RepartitionExec │ | | | │ -------------------- ││ -------------------- │ | | | │ partition_count(in->out): ││ partition_count(in->out): │ | | | │ 14 -> 14 ││ 14 -> 14 │ | | | │ ││ │ | | | │ partitioning_scheme: ││ partitioning_scheme: │ | | | │ Hash([v1@0], 14) ││ Hash([v1@0], 14) │ | | | └─────────────┬─────────────┘└─────────────┬─────────────┘ | | | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ | | | │ RepartitionExec ││ RepartitionExec │ | | | │ -------------------- ││ -------------------- │ | | | │ partition_count(in->out): ││ partition_count(in->out): │ | | | │ 1 -> 14 ││ 1 -> 14 │ | | | │ ││ │ | | | │ partitioning_scheme: ││ partitioning_scheme: │ | | | │ RoundRobinBatch(14) ││ RoundRobinBatch(14) │ | | | └─────────────┬─────────────┘└─────────────┬─────────────┘ | | | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ | | | │ ProjectionExec ││ ProjectionExec │ | | | │ -------------------- ││ -------------------- │ | | | │ v1: value ││ v1: value │ | | | └─────────────┬─────────────┘└─────────────┬─────────────┘ | | | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ | | | │ LazyMemoryExec ││ LazyMemoryExec │ | | | │ -------------------- ││ -------------------- │ | | | │ batch_generators: ││ batch_generators: │ | | | │ generate_series: start=0, ││ generate_series: start=0, │ | | | │ end=10000, batch_size ││ end=10000, batch_size │ | | | │ =8192 ││ =8192 │ | | | └───────────────────────────┘└───────────────────────────┘ | | | | +---------------+------------------------------------------------------------+ 1 row(s) fetched. Elapsed 0.005 seconds. ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org