[ https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708691#comment-13708691 ]
Phabricator commented on HIVE-4730:
-----------------------------------

brock has commented on the revision "HIVE-4730 [jira] Join on more than 2^31 records on single reducer failed (wrong results)".

  Hi Navis,

  Thanks for the patch! I noted a few style nits. Just curious, how long did the query take to complete? My guess is far too long to have a q-file test for this.

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java:286 Is it possible to move this up near the rest of the member variable definitions? Ideally it'd be nice to change the LHS to be List but it's possible that something in the class requires ArrayList.

REVISION DETAIL
  https://reviews.facebook.net/D11283

To: JIRA, navis
Cc: brock


> Join on more than 2^31 records on single reducer failed (wrong results)
> -----------------------------------------------------------------------
>
>                 Key: HIVE-4730
>                 URL: https://issues.apache.org/jira/browse/HIVE-4730
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.1, 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0
>            Reporter: Gabi Kazav
>            Assignee: Navis
>            Priority: Blocker
>         Attachments: HIVE-4730.D11283.1.patch
>
>
> join on more than 2^31 rows leads to wrong results. for example:
> Create table small_table (p1 string) ROW FORMAT DELIMITED LINES TERMINATED BY '\n';
> Create table big_table (p1 string) ROW FORMAT DELIMITED LINES TERMINATED BY '\n';
> Loading 1 row to small_table (the value 1).
> Loading 2149580800 rows to big_table with the same value (1 on this case).
> create table output as select a.p1 from big_table a join small_table b on (a.p1=b.p1);
> select count(*) from output ; will return only 1 row...
> the reducer syslog:
> ...
> 2013-06-13 17:20:59,254 INFO ExecReducer: ExecReducer: processing 2147000000 rows: used memory = 32925960
> 2013-06-13 17:21:00,745 INFO ExecReducer: ExecReducer: processing 2148000000 rows: used memory = 12815184
> 2013-06-13 17:21:02,205 INFO ExecReducer: ExecReducer: processing 2149000000 rows: used memory = 26684552 <-- looks like wrong value..
> ...
> 2013-06-13 17:21:04,062 INFO ExecReducer: ExecReducer: processed 2149580801 rows: used memory = 17715896
> 2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 finished. closing...
> 2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarded 1 rows
> 2013-06-13 17:21:05,791 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: SKEWJOINFOLLOWUPJOBS:0
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 finished. closing...
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarded 1 rows
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 6 finished. closing...
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 6 forwarded 0 rows
> 2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: TABLE_ID_1_ROWCOUNT:1
> 2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 Close done
> 2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 Close done

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
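
A note on the arithmetic in the report: the big_table row count it quotes, 2149580800, is larger than the biggest 32-bit signed value (2^31 - 1 = 2147483647). The sketch below only shows what Java does when a count of that size is narrowed to an int. It assumes, without confirmation from this thread, that some row counter in the join path is int-typed; the class and method names are hypothetical and are not taken from Hive.

```java
public class RowCountOverflow {
    // Hypothetical helper: narrows a 64-bit row count to 32 bits,
    // the way an int-typed counter would end up storing it.
    static int asIntCounter(long rows) {
        return (int) rows;
    }

    public static void main(String[] args) {
        long bigTableRows = 2149580800L; // row count quoted in the report

        System.out.println(Integer.MAX_VALUE);                 // 2147483647
        System.out.println(bigTableRows > Integer.MAX_VALUE);  // true
        System.out.println(asIntCounter(bigTableRows));        // -2145386496
    }
}
```

If a counter like this did wrap negative, any size check or index computed from it would be wrong. Whether that is the failure mode behind the single forwarded row would need to be checked against the attached patch.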