[ https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712903#comment-13712903 ]
Hive QA commented on HIVE-4730:
-------------------------------

{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12592716/HIVE-4730.D11283.2.patch

{color:green}SUCCESS:{color} +1 all tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/81/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/81/console

Messages:
Executing org.apache.hive.ptest.execution.CleanupPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase

This message is automatically generated.

> Join on more than 2^31 records on single reducer failed (wrong results)
> -----------------------------------------------------------------------
>
>                 Key: HIVE-4730
>                 URL: https://issues.apache.org/jira/browse/HIVE-4730
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.1, 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0
>            Reporter: Gabi Kazav
>            Assignee: Navis
>            Priority: Blocker
>         Attachments: HIVE-4730.D11283.1.patch, HIVE-4730.D11283.2.patch
>
>
> join on more than 2^31 rows leads to wrong results. for example:
> Create table small_table (p1 string) ROW FORMAT DELIMITED LINES TERMINATED BY '\n';
> Create table big_table (p1 string) ROW FORMAT DELIMITED LINES TERMINATED BY '\n';
> Loading 1 row to small_table (the value 1).
> Loading 2149580800 rows to big_table with the same value (1 on this case).
> create table output as select a.p1 from big_table a join small_table b on (a.p1=b.p1);
> select count(*) from output ; will return only 1 row...
> the reducer syslog:
> ...
> 2013-06-13 17:20:59,254 INFO ExecReducer: ExecReducer: processing 2147000000 rows: used memory = 32925960
> 2013-06-13 17:21:00,745 INFO ExecReducer: ExecReducer: processing 2148000000 rows: used memory = 12815184
> 2013-06-13 17:21:02,205 INFO ExecReducer: ExecReducer: processing 2149000000 rows: used memory = 26684552 <-- looks like wrong value..
> ...
> 2013-06-13 17:21:04,062 INFO ExecReducer: ExecReducer: processed 2149580801 rows: used memory = 17715896
> 2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 finished. closing...
> 2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarded 1 rows
> 2013-06-13 17:21:05,791 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: SKEWJOINFOLLOWUPJOBS:0
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 finished. closing...
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarded 1 rows
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 6 finished. closing...
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 6 forwarded 0 rows
> 2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: TABLE_ID_1_ROWCOUNT:1
> 2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 Close done
> 2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 Close done

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
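
A note on the arithmetic in the quoted report: the row count in the example (2149580800) is just past the largest 32-bit signed value, 2^31 - 1 = 2147483647. The message does not state the root cause, so the following is only a minimal sketch of what happens to such a count if it is held in a Java `int`, assuming (and this is an assumption, not something the report confirms) that some counter on the join path is 32 bits wide. The class and method names are hypothetical and are not taken from Hive.

```java
public class RowCountOverflow {
    // Hypothetical helper: the value a 32-bit signed counter would hold after
    // being incremented `rows` times from zero. Narrowing a long to an int
    // keeps the low 32 bits, which matches wrap-around increments.
    public static int countWithInt(long rows) {
        return (int) rows;
    }

    // The same count held in a 64-bit counter stays exact at this scale.
    public static long countWithLong(long rows) {
        return rows;
    }

    public static void main(String[] args) {
        long rows = 2149580800L; // row count of big_table in the report

        System.out.println(Integer.MAX_VALUE);   // prints 2147483647
        System.out.println(countWithInt(rows));  // prints -2145386496
        System.out.println(countWithLong(rows)); // prints 2149580800
    }
}
```

A negative count of this kind would be one way for size-based logic to go wrong once a single reducer sees more than 2^31 rows; whether that is what the attached patch addresses is not stated in this message.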