[ https://issues.apache.org/jira/browse/HIVE-9007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xuefu Zhang updated HIVE-9007:
------------------------------
    Fix Version/s:     (was: spark-branch)
                   1.1.0

> Hive may generate wrong plan for map join queries due to IdentityProjectRemover [Spark Branch]
> ----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-9007
>                 URL: https://issues.apache.org/jira/browse/HIVE-9007
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>    Affects Versions: spark-branch
>            Reporter: Chao Sun
>            Assignee: Szehon Ho
>             Fix For: 1.1.0
>
>         Attachments: HIVE-9007-spark.patch, HIVE-9007.2-spark.patch
>
>
> HIVE-8435 introduces a new logical optimizer called IdentityProjectRemover,
> which may cause map join in the Spark branch to generate a wrong plan.
> Currently, map join conversion in the Spark branch first goes through the
> method {{convertJoinMapJoin}}, which replaces a join op with a mapjoin op,
> removes the RS associated with the big table, and keeps the RSs for all
> small tables. Afterwards, {{SparkReduceSinkMapJoinProc}} replaces all parent
> RSs of the mapjoin op with HTSs (note that it doesn't check whether an RS
> belongs to a small table or to the big table).
> The issue arises when IdentityProjectRemover comes into play: by removing an
> identity projection, it may leave an operator tree with two consecutive RSs.
> Imagine the following example:
> {noformat}
>        Join                  MapJoin
>       /    \                 /     \
>     RS      RS     --->    RS       RS
>    /          \           /           \
>  TS            RS       TS             TS (big table)
>                  \    (small table)
>                   TS
> {noformat}
> In this case, all parents of the mapjoin op will be RSs, even the one on the
> branch for the big table! {{SparkReduceSinkMapJoinProc}} will replace all of
> them with HTSs, which is obviously incorrect.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
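The two-step conversion described in the issue, and why two consecutive RSs break it, can be illustrated with a toy Java model. This is a sketch under stated assumptions, not Hive code: `Op`, `Kind`, `toMapJoin`, `replaceAllRsParents` and `replaceSmallTableRsParents` are hypothetical stand-ins for Hive's operator classes and for the logic in {{convertJoinMapJoin}} and {{SparkReduceSinkMapJoinProc}}. RS, HTS and TS stand for the ReduceSink, HashTableSink and TableScan operators, the big table is assumed to sit at join position 1, and operators are mutated in place instead of being replaced, purely for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, heavily simplified model of the plan rewrite described in HIVE-9007.
// Op, Kind and every method below are illustrative only; they are NOT Hive classes.
public class MapJoinRsSketch {

    enum Kind { TS, RS, HTS, JOIN, MAPJOIN }

    static final class Op {
        Kind kind;                                  // mutated in place for brevity
        final String label;
        final List<Op> parents = new ArrayList<>();

        Op(Kind kind, String label, Op... parents) {
            this.kind = kind;
            this.label = label;
            this.parents.addAll(Arrays.asList(parents));
        }
    }

    static final int BIG_TABLE_POS = 1;             // assumption: second join input is the big table

    // Plan after IdentityProjectRemover: the big-table branch has two consecutive RSs.
    static Op buildPlan() {
        Op small = new Op(Kind.TS, "small");
        Op big = new Op(Kind.TS, "big");
        Op rsSmall = new Op(Kind.RS, "rs-small", small);
        Op rsBigInner = new Op(Kind.RS, "rs-big-inner", big);
        Op rsBigOuter = new Op(Kind.RS, "rs-big-outer", rsBigInner);
        return new Op(Kind.JOIN, "join", rsSmall, rsBigOuter);
    }

    // Modelled on the first step: turn the join into a map join and splice out
    // the RS directly above the big-table position.
    static void toMapJoin(Op join, int bigTablePos) {
        join.kind = Kind.MAPJOIN;
        Op bigRs = join.parents.get(bigTablePos);
        join.parents.set(bigTablePos, bigRs.parents.get(0));
    }

    // Modelled on the second step as the issue describes it: every RS parent
    // becomes an HTS, with no check of which input is the big table.
    static void replaceAllRsParents(Op mapJoin) {
        for (Op p : mapJoin.parents) {
            if (p.kind == Kind.RS) p.kind = Kind.HTS;
        }
    }

    // One possible guard (not necessarily the committed fix): skip the big-table position.
    static void replaceSmallTableRsParents(Op mapJoin, int bigTablePos) {
        for (int i = 0; i < mapJoin.parents.size(); i++) {
            Op p = mapJoin.parents.get(i);
            if (i != bigTablePos && p.kind == Kind.RS) p.kind = Kind.HTS;
        }
    }

    // Renders each map join parent as KIND<-source table label.
    static String describeParents(Op mapJoin) {
        List<String> parts = new ArrayList<>();
        for (Op p : mapJoin.parents) {
            parts.add(p.kind + "<-" + p.parents.get(0).label);
        }
        return String.join(", ", parts);
    }

    static String plan(boolean guarded) {
        Op op = buildPlan();
        toMapJoin(op, BIG_TABLE_POS);
        if (guarded) {
            replaceSmallTableRsParents(op, BIG_TABLE_POS);
        } else {
            replaceAllRsParents(op);
        }
        return describeParents(op);
    }

    public static void main(String[] args) {
        System.out.println("unchecked: " + plan(false)); // unchecked: HTS<-small, HTS<-big
        System.out.println("guarded:   " + plan(true));  // guarded:   HTS<-small, RS<-big
    }
}
```

In the unchecked run, the leftover RS on the big-table branch is turned into an HTS, which is the wrong plan described above; checking the big-table position leaves it as an RS. The position check is only one way to guard the replacement, and the sketch does not claim it is the change that resolved HIVE-9007.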