Zhen Chen created CALCITE-6927:
----------------------------------

             Summary: Join condition remove IS NOT DISTINCT FROM
                 Key: CALCITE-6927
                 URL: https://issues.apache.org/jira/browse/CALCITE-6927
             Project: Calcite
          Issue Type: Improvement
            Reporter: Zhen Chen
            Assignee: Zhen Chen


Following Spark's approach, a join condition `x IS NOT DISTINCT FROM y` can be 
rewritten as `(coalesce(x, '') = coalesce(y, '')) and (isnull(x) = isnull(y))` 
(shown here for a string column). Because the rewritten condition is a 
conjunction of plain equalities, a join on IS NOT DISTINCT FROM can be planned 
as a HashJoin instead of a NestedLoopJoin when the logical plan is converted 
to a physical plan.
The SQL is as follows:
{code:sql}
explain
select t1.age from user_profiles as t1
join user_profiles t2
on t1.user_id <=> t2.user_id;
{code}
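As a sanity check on the rewrite, the following minimal Python sketch compares the rewritten predicate against null-safe equality over a few string values, with Python's None standing in for SQL NULL. The helper names (`null_safe_eq`, `coalesce`, `rewritten`, `coalesce_only`) and the sample values are illustrative assumptions, not Calcite or Spark code.

```python
from itertools import product


def null_safe_eq(x, y):
    """Reference semantics of x IS NOT DISTINCT FROM y (None plays SQL NULL)."""
    if x is None or y is None:
        return x is None and y is None
    return x == y


def coalesce(value, default):
    """Return default when value is NULL (None), otherwise value."""
    return default if value is None else value


def rewritten(x, y):
    """(coalesce(x, '') = coalesce(y, '')) AND (isnull(x) = isnull(y))."""
    return coalesce(x, "") == coalesce(y, "") and (x is None) == (y is None)


def coalesce_only(x, y):
    """The first conjunct alone; it confuses NULL with the empty string."""
    return coalesce(x, "") == coalesce(y, "")


# Hypothetical sample values, including the empty string used as the default.
samples = [None, "", "a", "b"]
mismatches = [
    (x, y)
    for x, y in product(samples, repeat=2)
    if null_safe_eq(x, y) != rewritten(x, y)
]
print(mismatches)
```

The `isnull(x) = isnull(y)` conjunct is what keeps NULL and `''` apart: `coalesce_only(None, "")` is true, while `rewritten(None, "")` is false.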
The Spark plan is as follows:
{code:java}
AdaptiveSparkPlan isFinalPlan=false
+- Project [age#6]
   +- BroadcastHashJoin [coalesce(user_id#5, ), isnull(user_id#5)], 
[coalesce(user_id#29, ), isnull(user_id#29)], Inner, BuildRight, false
      :- FileScan orc default.user_profiles[user_id#5,age#6] Batched: true, 
Bucketed: false (disabled by query planner), DataFilters: [], Format: ORC, 
Location: InMemoryFileIndex(1 paths)[file:..., PartitionFilters: [], 
PushedFilters: [], ReadSchema: struct<user_id:string,age:int>
      +- BroadcastExchange HashedRelationBroadcastMode(List(coalesce(input[0, 
string, true], ), isnull(input[0, string, true])),false), [plan_id=72]
         +- FileScan orc default.user_profiles[user_id#29] Batched: true, 
Bucketed: false (disabled by query planner), DataFilters: [], Format: ORC, 
Location: InMemoryFileIndex(1 paths)[file:..., PartitionFilters: [], 
PushedFilters: [], ReadSchema: struct<user_id:string>{code}
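
The plan's two join keys, `coalesce(user_id, '')` and `isnull(user_id)`, form an ordinary composite equality key, which is what makes a hash join possible. A rough Python sketch of that idea follows; the `join_key` and `hash_join` helpers and the sample rows are hypothetical and only illustrate the build/probe pattern, not Spark's or Calcite's implementation.

```python
from collections import defaultdict


def join_key(value):
    # Mirrors the plan's key expressions: (coalesce(v, ''), isnull(v)).
    return ("" if value is None else value, value is None)


def hash_join(left, right, key):
    """Inner join of two lists of dicts on a null-safe composite key."""
    table = defaultdict(list)
    for row in right:  # build side
        table[join_key(row[key])].append(row)
    out = []
    for l_row in left:  # probe side
        for r_row in table.get(join_key(l_row[key]), []):
            out.append((l_row, r_row))
    return out


# Hypothetical rows: one regular id, one NULL id, one empty-string id.
rows = [
    {"user_id": "u1", "age": 30},
    {"user_id": None, "age": 41},
    {"user_id": "", "age": 25},
]
pairs = hash_join(rows, rows, "user_id")
ages = sorted(l_row["age"] for l_row, _ in pairs)
print(ages)
```

In this self-join each row matches only itself: the NULL row joins the NULL row, and it does not join the empty-string row because the second key component differs.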



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
