Zhen Chen created CALCITE-6927:
----------------------------------

    Summary: Join condition remove IS NOT DISTINCT FROM
        Key: CALCITE-6927
        URL: https://issues.apache.org/jira/browse/CALCITE-6927
    Project: Calcite
 Issue Type: Improvement
   Reporter: Zhen Chen
   Assignee: Zhen Chen

Following the conversion Spark uses, a join condition `x IS NOT DISTINCT FROM y` can be rewritten as `(coalesce(x, '') = coalesce(y, '')) and (isnull(x) = isnull(y))`. Both conjuncts are plain equalities, so a join with an IS NOT DISTINCT FROM condition can use HashJoin instead of NestedLoopJoin when the logical plan is converted to the physical plan. The `''` default shown here is for a string column, such as `user_id` in the example.

The SQL is as follows (in Spark SQL, `<=>` is the null-safe equality operator, equivalent to IS NOT DISTINCT FROM):
{code:sql}
explain select t1.age
from user_profiles as t1
join user_profiles t2
  on t1.user_id <=> t2.user_id;
{code}

The Spark plan is as follows:
{code}
AdaptiveSparkPlan isFinalPlan=false
+- Project [age#6]
   +- BroadcastHashJoin [coalesce(user_id#5, ), isnull(user_id#5)], [coalesce(user_id#29, ), isnull(user_id#29)], Inner, BuildRight, false
      :- FileScan orc default.user_profiles[user_id#5,age#6] Batched: true, Bucketed: false (disabled by query planner), DataFilters: [], Format: ORC, Location: InMemoryFileIndex(1 paths)[file:..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<user_id:string,age:int>
      +- BroadcastExchange HashedRelationBroadcastMode(List(coalesce(input[0, string, true], ), isnull(input[0, string, true])),false), [plan_id=72]
         +- FileScan orc default.user_profiles[user_id#29] Batched: true, Bucketed: false (disabled by query planner), DataFilters: [], Format: ORC, Location: InMemoryFileIndex(1 paths)[file:..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<user_id:string>
{code}
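
To illustrate why the pair `(coalesce(x, ''), isnull(x))` works as a hash key, here is a minimal Python sketch. It is neither Calcite nor Spark code: `None` stands in for SQL NULL, and the names `is_not_distinct_from`, `null_safe_key`, `hash_join` and `nested_loop_join` are hypothetical helpers introduced only for this example. It assumes string keys, so `''` is used as the coalesce default.

```python
from collections import defaultdict
from itertools import product


def is_not_distinct_from(x, y):
    """Reference semantics: two NULLs (None) match; NULL never matches a value."""
    if x is None or y is None:
        return x is None and y is None
    return x == y


def null_safe_key(x, default=""):
    """Hashable key mirroring the pair (coalesce(x, ''), isnull(x))."""
    return (default if x is None else x, x is None)


def hash_join(left, right):
    """Inner equi-join on null_safe_key: build on the right side, probe with the left."""
    table = defaultdict(list)
    for r in right:
        table[null_safe_key(r)].append(r)
    return [(l, r) for l in left for r in table.get(null_safe_key(l), [])]


def nested_loop_join(left, right):
    """Baseline: evaluate IS NOT DISTINCT FROM on every pair of rows."""
    return [(l, r) for l in left for r in right if is_not_distinct_from(l, r)]


# Key equality agrees with IS NOT DISTINCT FROM for every combination,
# including the tricky case of NULL versus the empty string.
values = ["a", "b", "", None]
assert all(
    (null_safe_key(x) == null_safe_key(y)) == is_not_distinct_from(x, y)
    for x, y in product(values, repeat=2)
)
```

The second key component is what keeps NULL and `''` apart: without `isnull(x) = isnull(y)`, coalescing alone would make a NULL key match an empty-string key.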