[ https://issues.apache.org/jira/browse/HIVE-7767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104629#comment-14104629 ]
Na Yang commented on HIVE-7767:
-------------------------------

For the test result union_remove_22.q.out, the only difference between the MR result and the Spark result is the row order. There are two queries in the test file union_remove_22.q:

select * from outputTbl1;
select * from outputTbl1 order by key, values;

The first query returns the same dataset in both MR and Spark, but in a different order. The second query returns the same dataset in the same order in both MR and Spark.

For the test result union_remove_10.q.out, the Spark result is different from the MR result. After experimenting with the queries, I think it is a bug related to the insert overwrite query and the union remove optimization. A JIRA, HIVE-7810, has been filed for this issue.

Can I remove the two test cases from the new patch?

Thanks,
Na

> hive.optimize.union.remove does not work properly [Spark Branch]
> ----------------------------------------------------------------
>
>                 Key: HIVE-7767
>                 URL: https://issues.apache.org/jira/browse/HIVE-7767
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Na Yang
>            Assignee: Na Yang
>         Attachments: HIVE-7767.1-spark.patch, HIVE-7767.2-spark.patch, HIVE-7767.2-spark.patch
>
>
> Turning on the hive.optimize.union.remove property generates a wrong union all result.
> For example:
> {noformat}
> create table inputTbl1(key string, val string) stored as textfile;
> load data local inpath '../../data/files/T1.txt' into table inputTbl1;
> SELECT *
> FROM (
>   SELECT key, count(1) as values from inputTbl1 group by key
>   UNION ALL
>   SELECT key, count(1) as values from inputTbl1 group by key
> ) a;
> {noformat}
> When hive.optimize.union.remove is turned on, the query result is:
> {noformat}
> 1 1
> 2 1
> 3 1
> 7 1
> 8 2
> {noformat}
> When hive.optimize.union.remove is turned off, the query result is:
> {noformat}
> 7 1
> 2 1
> 8 2
> 3 1
> 1 1
> 7 1
> 2 1
> 8 2
> 3 1
> 1 1
> {noformat}
> The expected query result is:
> {noformat}
> 7 1
> 2 1
> 8 2
> 3 1
> 1 1
> 7 1
> 2 1
> 8 2
> 3 1
> 1 1
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
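
This thread distinguishes between results that differ only in row order (union_remove_22) and results that are actually wrong (the union all example in the issue description). One way to check that distinction mechanically is to compare the two outputs as multisets of rows. The following is a minimal Python sketch under stated assumptions, not part of the Hive test framework: the helper name `same_rows_ignoring_order` is hypothetical, and the rows are copied by hand from the issue description.

```python
from collections import Counter


def same_rows_ignoring_order(a, b):
    """Return True when both results hold the same rows with the same multiplicities."""
    return Counter(a) == Counter(b)


# Rows copied from the issue description as (key, count) pairs.
expected = [("7", 1), ("2", 1), ("8", 2), ("3", 1), ("1", 1)] * 2
union_remove_off = list(expected)
union_remove_on = [("1", 1), ("2", 1), ("3", 1), ("7", 1), ("8", 2)]

# Reordering alone does not change the multiset of rows.
reordered = sorted(expected)

print(same_rows_ignoring_order(expected, reordered))         # True
print(same_rows_ignoring_order(expected, union_remove_off))  # True
print(same_rows_ignoring_order(expected, union_remove_on))   # False

# Rows present in the expected result but missing when the optimization is on.
missing = Counter(expected) - Counter(union_remove_on)
print(sorted(missing.elements()))
```

A comparison like this treats a pure ordering difference as a match, while the result produced with hive.optimize.union.remove turned on still fails because each row appears once instead of twice.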