[ https://issues.apache.org/jira/browse/HIVE-8924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225724#comment-14225724 ]

Szehon Ho commented on HIVE-8924:
---------------------------------

SparkMapJoinResolver.containsOp calls MapWork.getAllRootOperators(), which only 
returns results when there are entries in pathToAlias, and in this case there 
are none.  With this change, containsOp instead looks through all of 
aliasToWork to get the operators.  That way it can correctly identify the 
HashTableSink and the MapJoin, and generate a valid plan.
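
For illustration, here is a minimal sketch of that idea (assuming the Hive 
classes mentioned above; the helper class name and method shape are my own, 
not the actual patch): collect the operator-tree roots from 
MapWork.getAliasToWork() rather than getAllRootOperators(), and walk the 
children looking for the target operator.

{noformat}
import java.util.ArrayDeque;
import java.util.Deque;

import org.apache.hadoop.hive.ql.exec.Operator;
import org.apache.hadoop.hive.ql.plan.MapWork;
import org.apache.hadoop.hive.ql.plan.OperatorDesc;

// Illustrative sketch only, not the actual SparkMapJoinResolver change.
final class ContainsOpSketch {
  static boolean containsOp(MapWork work, Class<? extends Operator<?>> target) {
    // pathToAlias can be empty for this query, so getAllRootOperators()
    // returns nothing; take the operator-tree roots from aliasToWork instead.
    Deque<Operator<? extends OperatorDesc>> pending =
        new ArrayDeque<Operator<? extends OperatorDesc>>(
            work.getAliasToWork().values());
    // Walk each tree looking for an instance of the target operator class,
    // e.g. HashTableSinkOperator or MapJoinOperator.
    while (!pending.isEmpty()) {
      Operator<? extends OperatorDesc> op = pending.pop();
      if (target.isInstance(op)) {
        return true;
      }
      if (op.getChildOperators() != null) {
        pending.addAll(op.getChildOperators());
      }
    }
    return false;
  }
}
{noformat}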

We did try for a while to remove the work completely in this situation, but 
that ran into a lot of other issues, so we decided this was simpler for this 
corner case.

> Investigate test failure for join_empty.q [Spark Branch]
> --------------------------------------------------------
>
>                 Key: HIVE-8924
>                 URL: https://issues.apache.org/jira/browse/HIVE-8924
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>    Affects Versions: spark-branch
>            Reporter: Chao
>            Assignee: Szehon Ho
>         Attachments: HIVE-8924-spark.patch
>
>
> This query has an interesting case where the big table work is empty. Here's 
> the MR plan:
> {noformat}
> STAGE DEPENDENCIES:
>   Stage-4 is a root stage
>   Stage-3 depends on stages: Stage-4
>   Stage-0 depends on stages: Stage-3
> STAGE PLANS:
>   Stage: Stage-4
>     Map Reduce Local Work
>       Alias -> Map Local Tables:
>         b 
>           Fetch Operator
>             limit: -1
>       Alias -> Map Local Operator Tree:
>         b 
>           TableScan
>             alias: b
>             Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
>             Filter Operator
>               predicate: UDFToDouble(key) is not null (type: boolean)
>               Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE
>               HashTable Sink Operator
>                 condition expressions:
>                   0 {key}
>                   1 {value}
>                 keys:
>                   0 UDFToDouble(key) (type: double)
>                   1 UDFToDouble(key) (type: double)
>   Stage: Stage-3
>     Map Reduce
>       Local Work:
>         Map Reduce Local Work
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {noformat}
> The plan for Spark is not correct. We need to investigate the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
