[ https://issues.apache.org/jira/browse/HIVE-7767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103146#comment-14103146 ]
Na Yang commented on HIVE-7767:
-------------------------------

After looking into this issue, I found its cause. The "hive.optimize.union.remove=true" optimizer removes the union operator from the operator tree, which ends up generating two graphs in the Spark transformation graph. The current GraphTran execute API cannot handle multiple graphs properly. We need to change the execute implementation in GraphTran.java so that it handles multiple transformation graphs. I will upload a patch shortly after HIVE-7717's patch is committed.

> hive.optimize.union.remove does not work properly [Spark Branch]
> ----------------------------------------------------------------
>
>                 Key: HIVE-7767
>                 URL: https://issues.apache.org/jira/browse/HIVE-7767
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Na Yang
>            Assignee: Na Yang
>
> Turning on the hive.optimize.union.remove property generates a wrong UNION ALL
> result.
> For example:
> {noformat}
> create table inputTbl1(key string, val string) stored as textfile;
> load data local inpath '../../data/files/T1.txt' into table inputTbl1;
> SELECT *
> FROM (
> SELECT key, count(1) as values from inputTbl1 group by key
> UNION ALL
> SELECT key, count(1) as values from inputTbl1 group by key
> ) a;
> {noformat}
> When hive.optimize.union.remove is turned on, the query result is:
> {noformat}
> 1 1
> 2 1
> 3 1
> 7 1
> 8 2
> {noformat}
> When hive.optimize.union.remove is turned off, the query result is:
> {noformat}
> 7 1
> 2 1
> 8 2
> 3 1
> 1 1
> 7 1
> 2 1
> 8 2
> 3 1
> 1 1
> {noformat}
> The expected query result is:
> {noformat}
> 7 1
> 2 1
> 8 2
> 3 1
> 1 1
> 7 1
> 2 1
> 8 2
> 3 1
> 1 1
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
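The diagnosis in the comment (removing the union leaves two separate graphs, and execute does not handle more than one) can be illustrated with a small model. This is a hypothetical sketch, not Hive's GraphTran: the class `MultiGraphExecutor`, its methods, and the node names are invented for illustration. It assumes each graph left behind after union removal has its own root node, and contrasts an execute that walks only one root with one that walks every root. The "one root only" behaviour is consistent with the symptom in the report, where each key appears once instead of twice, but the actual GraphTran logic may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified transformation graph; NOT Hive's GraphTran API. */
public class MultiGraphExecutor {
    // Adjacency list: parent step -> child steps, kept in insertion order.
    private final Map<String, List<String>> children = new LinkedHashMap<>();
    // Nodes that some edge points to (i.e. nodes that are not roots).
    private final Set<String> hasParent = new LinkedHashSet<>();

    public void addNode(String name) {
        children.putIfAbsent(name, new ArrayList<>());
    }

    /** Adds an edge parent -> child, registering both nodes. */
    public void connect(String parent, String child) {
        addNode(parent);
        addNode(child);
        children.get(parent).add(child);
        hasParent.add(child);
    }

    /** Roots are nodes with no incoming edge; here, one per disconnected graph. */
    public List<String> roots() {
        List<String> result = new ArrayList<>();
        for (String node : children.keySet()) {
            if (!hasParent.contains(node)) {
                result.add(node);
            }
        }
        return result;
    }

    /** Buggy variant: walks only the first root, so any other graph is skipped. */
    public List<String> executeFirstGraphOnly() {
        List<String> visited = new ArrayList<>();
        List<String> roots = roots();
        if (!roots.isEmpty()) {
            walk(roots.get(0), visited, new LinkedHashSet<>());
        }
        return visited;
    }

    /** Fixed variant: walks every root, so each disconnected graph is executed. */
    public List<String> executeAllGraphs() {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new LinkedHashSet<>();
        for (String root : roots()) {
            walk(root, visited, seen);
        }
        return visited;
    }

    // Depth-first walk that records each node once, in visit order.
    private void walk(String node, List<String> visited, Set<String> seen) {
        if (!seen.add(node)) {
            return;
        }
        visited.add(node);
        for (String child : children.get(node)) {
            walk(child, visited, seen);
        }
    }

    public static void main(String[] args) {
        MultiGraphExecutor g = new MultiGraphExecutor();
        // Two independent branches, as might remain once the union operator is removed.
        g.connect("scan1", "groupBy1");
        g.connect("groupBy1", "fileSink1");
        g.connect("scan2", "groupBy2");
        g.connect("groupBy2", "fileSink2");
        // prints [scan1, groupBy1, fileSink1]
        System.out.println(g.executeFirstGraphOnly());
        // prints [scan1, groupBy1, fileSink1, scan2, groupBy2, fileSink2]
        System.out.println(g.executeAllGraphs());
    }
}
```

The design choice in the fixed variant is simply to treat the transformation as a forest rather than a single tree: collect every root and execute each one, sharing a `seen` set so that a node reachable from two roots is still executed only once.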