[ https://issues.apache.org/jira/browse/HIVE-8924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225724#comment-14225724 ]
Szehon Ho commented on HIVE-8924:
---------------------------------

In SparkMapJoinResolver.containsOp, it calls MapWork.getAllRootOperators(). That only returns results if there are entries in pathToAlias, which are not present in this case. With this change, it instead looks through all of aliasToWork to get the operators. That way it can correctly identify the HashTableSink and the MapJoin, and generate a valid plan. We tried for a while to remove the work completely in this situation, but that ran into many other issues, so we decided this was simpler for this corner case.

> Investigate test failure for join_empty.q [Spark Branch]
> --------------------------------------------------------
>
>                 Key: HIVE-8924
>                 URL: https://issues.apache.org/jira/browse/HIVE-8924
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>    Affects Versions: spark-branch
>            Reporter: Chao
>            Assignee: Szehon Ho
>         Attachments: HIVE-8924-spark.patch
>
>
> This query has an interesting case where the big table work is empty. Here's
> the MR plan:
> {noformat}
> STAGE DEPENDENCIES:
>   Stage-4 is a root stage
>   Stage-3 depends on stages: Stage-4
>   Stage-0 depends on stages: Stage-3
>
> STAGE PLANS:
>   Stage: Stage-4
>     Map Reduce Local Work
>       Alias -> Map Local Tables:
>         b
>           Fetch Operator
>             limit: -1
>       Alias -> Map Local Operator Tree:
>         b
>           TableScan
>             alias: b
>             Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
>             Filter Operator
>               predicate: UDFToDouble(key) is not null (type: boolean)
>               Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE
>               HashTable Sink Operator
>                 condition expressions:
>                   0 {key}
>                   1 {value}
>                 keys:
>                   0 UDFToDouble(key) (type: double)
>                   1 UDFToDouble(key) (type: double)
>
>   Stage: Stage-3
>     Map Reduce
>       Local Work:
>         Map Reduce Local Work
>
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {noformat}
> The plan for Spark is not correct. We need to investigate the issue.
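The fix described in the comment — walking every operator tree registered in aliasToWork instead of relying on pathToAlias-derived roots — can be sketched as follows. This is a hypothetical, simplified model: the Operator class, the child() helper, and this containsOp signature are illustrative stand-ins, not Hive's actual classes or APIs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ContainsOpSketch {

    // Minimal stand-in for Hive's Operator hierarchy (illustration only).
    static class Operator {
        final String name;
        final List<Operator> children = new ArrayList<>();
        Operator(String name) { this.name = name; }
        Operator child(Operator c) { children.add(c); return this; }
    }

    // Breadth-first search over every root in aliasToWork for an operator
    // of the target kind, mirroring the "look through all aliasToWork"
    // approach: it works even when pathToAlias is empty, because the
    // search does not go through getAllRootOperators().
    static boolean containsOp(Map<String, Operator> aliasToWork, String target) {
        Deque<Operator> queue = new ArrayDeque<>(aliasToWork.values());
        while (!queue.isEmpty()) {
            Operator op = queue.poll();
            if (op.name.equals(target)) {
                return true;
            }
            queue.addAll(op.children);
        }
        return false;
    }

    public static void main(String[] args) {
        // Small-table side tree registered under alias "b", shaped like the
        // quoted plan: TableScan -> Filter -> HashTableSink.
        Map<String, Operator> aliasToWork = new HashMap<>();
        aliasToWork.put("b",
            new Operator("TableScan").child(
                new Operator("Filter").child(
                    new Operator("HashTableSink"))));

        System.out.println(containsOp(aliasToWork, "HashTableSink")); // true
        System.out.println(containsOp(aliasToWork, "MapJoin"));       // false
    }
}
```

With this traversal, the resolver can detect the HashTableSink on the small-table side even though the big-table work contributes no pathToAlias entries.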
-- This message was sent by Atlassian JIRA (v6.3.4#6332)