[ https://issues.apache.org/jira/browse/HIVE-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15693243#comment-15693243 ]
Vikash Pareek commented on HIVE-15272:
--------------------------------------

I am only calculating the count of the records, and the result (count) does not depend on ordering. The result should be the same for each execution, as it is with MR.
I have around 30 million records in my_table1 (left) and 85 million records in my_table2 (right).

> "LEFT OUTER JOIN" Is not populating different records with Hive On Spark
> ------------------------------------------------------------------------
>
> Key: HIVE-15272
> URL: https://issues.apache.org/jira/browse/HIVE-15272
> Project: Hive
> Issue Type: Bug
> Components: Hive, Spark
> Affects Versions: 1.1.0
> Environment: Hive 1.1.0, CentOS, Cloudera 5.7.4
> Reporter: Vikash Pareek
>
> I ran the following Hive query multiple times, once with Hive on Spark and
> once with Hive on MapReduce as the execution engine.
> {code}
> SELECT COUNT(DISTINCT t1.region, t1.amount)
> FROM my_db.my_table1 t1
> LEFT OUTER JOIN my_db.my_table2 t2
>   ON (t1.id = t2.id AND t1.name = t2.name)
> {code}
> With Hive on Spark: the result (count) was different on every execution.
> With Hive on MapReduce: the result (count) was the same on every execution.
> It seems that Hive on Spark behaves differently on each execution and does
> not produce the correct result.
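The claim that the count cannot depend on ordering can be checked on a small scale. The sketch below is only an illustration and does not use Hive: it assumes an in-memory SQLite database as a stand-in, made-up sample rows, and a hypothetical helper named `distinct_counts`. Because SQLite does not accept a multi-column `COUNT(DISTINCT ...)`, the query is rewritten as `COUNT(*)` over a `SELECT DISTINCT` subquery, which is equivalent only when `region` and `amount` contain no NULLs. Since a LEFT OUTER JOIN keeps every row of the left table, the joined count should also equal the distinct count taken from `my_table1` alone, which gives a join-free reference value to compare against.

```python
import sqlite3


def distinct_counts(left_rows, right_rows):
    """Return (count over the LEFT OUTER JOIN, count over the left table alone).

    Hypothetical helper for illustration; runs on SQLite, not on Hive.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table1 (id INTEGER, name TEXT, region TEXT, amount INTEGER)"
    )
    conn.execute("CREATE TABLE my_table2 (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO my_table1 VALUES (?, ?, ?, ?)", left_rows)
    conn.executemany("INSERT INTO my_table2 VALUES (?, ?)", right_rows)

    # Same shape as the query in the report, with DISTINCT moved to a subquery.
    joined = conn.execute(
        """
        SELECT COUNT(*) FROM (
            SELECT DISTINCT t1.region, t1.amount
            FROM my_table1 t1
            LEFT OUTER JOIN my_table2 t2
              ON t1.id = t2.id AND t1.name = t2.name
        )
        """
    ).fetchone()[0]

    # Reference value: the join cannot add or remove (region, amount) pairs.
    left_only = conn.execute(
        "SELECT COUNT(*) FROM (SELECT DISTINCT region, amount FROM my_table1)"
    ).fetchone()[0]

    conn.close()
    return joined, left_only


# Made-up sample data: duplicate matches, unmatched left rows, unmatched right rows.
LEFT = [
    (1, "a", "east", 10),
    (2, "b", "east", 10),
    (3, "c", "west", 20),
    (4, "d", "west", 30),
]
RIGHT = [(1, "a"), (1, "a"), (3, "c"), (9, "z")]

if __name__ == "__main__":
    print(distinct_counts(LEFT, RIGHT))              # original insertion order
    print(distinct_counts(LEFT[::-1], RIGHT[::-1]))  # reversed insertion order
```

On the real tables, the analogous check would be to run `SELECT COUNT(DISTINCT region, amount) FROM my_db.my_table1` on both engines and compare it with the join result; a Spark run that disagrees with that value points at the join execution rather than at the data.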