[ https://issues.apache.org/jira/browse/HIVE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523765#comment-14523765 ]
Peter Slawski commented on HIVE-10538:
--------------------------------------

Yes. Here is an explanation of how this transient is used. The transient is used to compute the row's hash code in [ReduceSinkOperator.java#L368|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java#L368]:
{code}
hashCode = computeHashCode(row, bucketNumber);
{code}
If the given bucket number is valid, i.e. non-negative (which it always is here, because the transient is initialized to a valid bucket number), the computed hash code is always multiplied by 31; see [ReduceSinkOperator.java#L488|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java#L488]:
{code}
private int computeHashCode(Object row, int buckNum) throws HiveException {
  ...
  } else {
    for (int i = 0; i < partitionEval.length; i++) {
      Object o = partitionEval[i].evaluate(row);
      keyHashCode = keyHashCode * 31
          + ObjectInspectorUtils.hashCode(o, partitionObjectInspectors[i]);
    }
  }
  int hashCode = buckNum < 0 ? keyHashCode : keyHashCode * 31 + buckNum;
  ...
  return hashCode;
}
{code}
FileSinkOperator recomputes the hash code in findWriterOffset(), but does not multiply by 31, so it computes a different bucket number than the one the row was routed with. Since bucketMap only contains mappings for the bucket numbers the current reducer is expected to receive, the lookup for the wrong bucket number returns null. From [FileSinkOperator.java#L811|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L811]:
{code}
private int findWriterOffset(Object row) throws HiveException {
  ...
  for (int i = 0; i < partitionEval.length; i++) {
    Object o = partitionEval[i].evaluate(row);
    keyHashCode = keyHashCode * 31
        + ObjectInspectorUtils.hashCode(o, partitionObjectInspectors[i]);
  }
  key.setHashCode(keyHashCode);
  int bucketNum = prtner.getBucket(key, null, totalFiles);
  return bucketMap.get(bucketNum);
}
{code}
The transient was introduced in [HIVE-8151], which refactored the bucket number from a local variable into a transient field. Originally, the local variable was initialized to -1; the refactor changed the code so that the transient field was used instead.
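To see the mismatch end to end, here is a minimal standalone sketch (illustrative only, not Hive source: the class name HashMismatchSketch is made up, buckNum = 0 stands in for whatever valid bucket number the transient holds, and the final modulo is a simplified stand-in for the partitioner's getBucket()). The ReduceSink side applies the extra * 31 + buckNum step while the FileSink side does not, so the two sides can disagree on the bucket, and the bucketMap lookup auto-unboxes a null Integer:
{code}
import java.util.HashMap;
import java.util.Map;

// Standalone illustration of the HIVE-10538 mismatch; not Hive code.
public class HashMismatchSketch {
  public static void main(String[] args) {
    int keyHashCode = "113".hashCode(); // hash of the partition key from the failing row
    int buckNum = 0;                    // assumed: the transient holds a valid (>= 0) bucket number

    // ReduceSinkOperator.computeHashCode(): buckNum >= 0, so the extra step applies.
    int reduceSinkHash = keyHashCode * 31 + buckNum;

    // FileSinkOperator.findWriterOffset(): recomputes the key hash without that step.
    int fileSinkHash = keyHashCode;

    // Simplified stand-in for the partitioner's bucket assignment.
    int totalFiles = 256;
    int sentTo   = (reduceSinkHash & Integer.MAX_VALUE) % totalFiles;
    int lookedUp = (fileSinkHash   & Integer.MAX_VALUE) % totalFiles;

    // bucketMap only has entries for buckets this reducer actually receives.
    Map<Integer, Integer> bucketMap = new HashMap<>();
    bucketMap.put(sentTo, 0);

    System.out.println("routed to bucket " + sentTo + ", looked up bucket " + lookedUp);

    // When sentTo != lookedUp, get() returns null and auto-unboxing it throws
    // a NullPointerException, matching the stack trace below.
    int writerOffset = bucketMap.get(lookedUp);
    System.out.println(writerOffset);
  }
}
{code}
For the key "113" from the failing row, the two buckets differ, so the last lookup throws, matching the NPE at FileSinkOperator.findWriterOffset() in the stack trace below.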
> Fix NPE in FileSinkOperator from hashcode mismatch
> --------------------------------------------------
>
>                 Key: HIVE-10538
>                 URL: https://issues.apache.org/jira/browse/HIVE-10538
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 1.0.0, 1.2.0
>            Reporter: Peter Slawski
>             Fix For: 1.3.0
>
>         Attachments: HIVE-10538.1.patch
>
>
> A NullPointerException occurs in FileSinkOperator when using bucketed tables and distribute by with multiFileSpray enabled. The following query reproduces this issue:
> {code}
> set hive.enforce.bucketing = true;
> set hive.exec.reducers.max = 20;
> create table bucket_a(key int, value_a string) clustered by (key) into 256 buckets;
> create table bucket_b(key int, value_b string) clustered by (key) into 256 buckets;
> create table bucket_ab(key int, value_a string, value_b string) clustered by (key) into 256 buckets;
> -- Insert data into bucket_a and bucket_b
> insert overwrite table bucket_ab
> select a.key, a.value_a, b.value_b from bucket_a a join bucket_b b on (a.key = b.key) distribute by key;
> {code}
> The following stack trace is logged:
> {code}
> 2015-04-29 12:54:12,841 FATAL [pool-110-thread-1]: ExecReducer (ExecReducer.java:reduce(255)) -
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":"113","_col1":"val_113"}}
>         at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
>         at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
>         at org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:819)
>         at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:747)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
>         at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
>         at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235)
>         ... 8 more
> {code}