[ https://issues.apache.org/jira/browse/HIVE-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16409454#comment-16409454 ]
tartarus commented on HIVE-14303: --------------------------------- [~csun] [~zxu] I suggest adding some exception information in the {code:java} ExecReducer.close(){code} . We should find the reason for the first close failure. It is exactly what I have done. > CommonJoinOperator.checkAndGenObject should return directly to avoid NPE if > ExecReducer.close is called twice. > -------------------------------------------------------------------------------------------------------------- > > Key: HIVE-14303 > URL: https://issues.apache.org/jira/browse/HIVE-14303 > Project: Hive > Issue Type: Bug > Reporter: zhihai xu > Assignee: zhihai xu > Priority: Major > Fix For: 2.3.0 > > Attachments: HIVE-14303.0.patch, HIVE-14303.1.patch, > HIVE-14303.2.patch > > > CommonJoinOperator.checkAndGenObject should return directly (after > {{CommonJoinOperator.closeOp}} was called ) to avoid NPE if ExecReducer.close > is called twice. ExecReducer.close implements Closeable interface and > ExecReducer.close can be called multiple time. We saw the following NPE which > hide the real exception due to this bug. > {code:java} > Error: java.lang.RuntimeException: Hive Runtime Error while closing > operators: null > at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:296) > at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244) > at > org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:718) > at > org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256) > at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:284) > ... 8 more > {code} > The code from ReduceTask.runOldReducer: > {code:java} > reducer.close(); //line 453 > reducer = null; > > out.close(reporter); > out = null; > } finally { > IOUtils.cleanup(LOG, reducer);// line 459 > closeQuietly(out, reporter); > } > {code} > Based on the above stack trace and code, reducer.close() is called twice > because the exception happened when reducer.close() is called for the first > time at line 453, the code exit before reducer was set to null. > NullPointerException is triggered when reducer.close() is called for the > second time in IOUtils.cleanup at line 459. NullPointerException hide the > real exception which happened when reducer.close() is called for the first > time at line 453. > The reason for NPE is: > The first reducer.close called CommonJoinOperator.closeOp which clear > {{storage}} > {code:java} > Arrays.fill(storage, null); > {code} > the second reduce.close generated NPE due to null {{storage[alias]}} which is > set to null by first reducer.close. > The following reducer log can give more proof: > {code:java} > 2016-07-14 22:24:51,016 INFO [main] > org.apache.hadoop.hive.ql.exec.JoinOperator: 0 finished. closing... > 2016-07-14 22:24:51,016 INFO [main] > org.apache.hadoop.hive.ql.exec.JoinOperator: 0 finished. closing... > 2016-07-14 22:24:51,016 INFO [main] > org.apache.hadoop.hive.ql.exec.JoinOperator: SKEWJOINFOLLOWUPJOBS:0 > 2016-07-14 22:24:51,016 INFO [main] > org.apache.hadoop.hive.ql.exec.SelectOperator: 1 finished. closing... > 2016-07-14 22:24:51,016 INFO [main] > org.apache.hadoop.hive.ql.exec.SelectOperator: 2 finished. closing... > 2016-07-14 22:24:51,016 INFO [main] > org.apache.hadoop.hive.ql.exec.SelectOperator: 3 finished. closing... > 2016-07-14 22:24:51,016 INFO [main] > org.apache.hadoop.hive.ql.exec.FileSinkOperator: 4 finished. closing... > 2016-07-14 22:24:51,016 INFO [main] > org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[4]: records written - > 53466 > 2016-07-14 22:25:11,555 ERROR [main] ExecReducer: Hit error while closing > operators - failing tree > 2016-07-14 22:25:11,649 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : java.lang.RuntimeException: Hive Runtime Error > while closing operators: null > at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:296) > at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244) > at > org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:718) > at > org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256) > at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:284) > ... 8 more > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)