[ https://issues.apache.org/jira/browse/HIVE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523765#comment-14523765 ]
Peter Slawski commented on HIVE-10538:
--------------------------------------

Yes. Here is an explanation of how this transient is used. The transient is used to compute the row's hash code in [ReduceSinkOperator.java#L368|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java#L368]:
{code}
hashCode = computeHashCode(row, bucketNumber);
{code}
If the given bucket number is valid, i.e. non-negative (which it always is here, because the transient is initialized to a valid bucket number), the computed hash code is always multiplied by 31; see [ReduceSinkOperator.java#L488|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java#L488]:
{code}
private int computeHashCode(Object row, int buckNum) throws HiveException {
  ...
  } else {
    for (int i = 0; i < partitionEval.length; i++) {
      Object o = partitionEval[i].evaluate(row);
      keyHashCode = keyHashCode * 31
          + ObjectInspectorUtils.hashCode(o, partitionObjectInspectors[i]);
    }
  }
  int hashCode = buckNum < 0 ? keyHashCode : keyHashCode * 31 + buckNum;
  ...
  return hashCode;
}
{code}
FileSinkOperator recomputes the hash code in findWriterOffset(), but does not multiply by 31, so it computes a different bucket number than the one the row was routed with. Since bucketMap only contains mappings for the bucket numbers the current reducer is expected to receive, the lookup for the wrong bucket number returns null. From [FileSinkOperator.java#L811|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L811]:
{code}
private int findWriterOffset(Object row) throws HiveException {
  ...
  for (int i = 0; i < partitionEval.length; i++) {
    Object o = partitionEval[i].evaluate(row);
    keyHashCode = keyHashCode * 31
        + ObjectInspectorUtils.hashCode(o, partitionObjectInspectors[i]);
  }
  key.setHashCode(keyHashCode);
  int bucketNum = prtner.getBucket(key, null, totalFiles);
  return bucketMap.get(bucketNum);
}
{code}
The transient was introduced in [HIVE-8151], which refactored the bucket number from a local variable into a transient field. Originally, the local variable was initialized to -1; the refactor changed the code so that the transient field was used instead.
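To see the mismatch end to end, here is a minimal standalone sketch (illustrative only, not Hive source: the class name HashMismatchSketch is made up, buckNum = 0 stands in for whatever valid bucket number the transient holds, and the final modulo is a simplified stand-in for the partitioner's getBucket()). The ReduceSink side applies the extra * 31 + buckNum step while the FileSink side does not, so the two sides can disagree on the bucket, and the bucketMap lookup auto-unboxes a null Integer:
{code}
import java.util.HashMap;
import java.util.Map;

// Standalone illustration of the HIVE-10538 mismatch; not Hive code.
public class HashMismatchSketch {
  public static void main(String[] args) {
    int keyHashCode = "113".hashCode(); // hash of the partition key from the failing row
    int buckNum = 0;                    // assumed: the transient holds a valid (>= 0) bucket number

    // ReduceSinkOperator.computeHashCode(): buckNum >= 0, so the extra step applies.
    int reduceSinkHash = keyHashCode * 31 + buckNum;

    // FileSinkOperator.findWriterOffset(): recomputes the key hash without that step.
    int fileSinkHash = keyHashCode;

    // Simplified stand-in for the partitioner's bucket assignment.
    int totalFiles = 256;
    int sentTo   = (reduceSinkHash & Integer.MAX_VALUE) % totalFiles;
    int lookedUp = (fileSinkHash   & Integer.MAX_VALUE) % totalFiles;

    // bucketMap only has entries for buckets this reducer actually receives.
    Map<Integer, Integer> bucketMap = new HashMap<>();
    bucketMap.put(sentTo, 0);

    System.out.println("routed to bucket " + sentTo + ", looked up bucket " + lookedUp);

    // When sentTo != lookedUp, get() returns null and auto-unboxing it throws
    // a NullPointerException, matching the stack trace below.
    int writerOffset = bucketMap.get(lookedUp);
    System.out.println(writerOffset);
  }
}
{code}
For the key "113" from the failing row, the two buckets differ, so the last lookup throws, matching the NPE at FileSinkOperator.findWriterOffset() in the stack trace below.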
> Fix NPE in FileSinkOperator from hashcode mismatch
> --------------------------------------------------
>
>                 Key: HIVE-10538
>                 URL: https://issues.apache.org/jira/browse/HIVE-10538
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 1.0.0, 1.2.0
>            Reporter: Peter Slawski
>             Fix For: 1.3.0
>
>         Attachments: HIVE-10538.1.patch
>
>
> A NullPointerException occurs in FileSinkOperator when using bucketed tables and distribute by with multiFileSpray enabled. The following query reproduces this issue:
> {code}
> set hive.enforce.bucketing = true;
> set hive.exec.reducers.max = 20;
> create table bucket_a(key int, value_a string) clustered by (key) into 256 buckets;
> create table bucket_b(key int, value_b string) clustered by (key) into 256 buckets;
> create table bucket_ab(key int, value_a string, value_b string) clustered by (key) into 256 buckets;
> -- Insert data into bucket_a and bucket_b
> insert overwrite table bucket_ab
> select a.key, a.value_a, b.value_b from bucket_a a join bucket_b b on (a.key = b.key) distribute by key;
> {code}
> The following stack trace is logged:
> {code}
> 2015-04-29 12:54:12,841 FATAL [pool-110-thread-1]: ExecReducer (ExecReducer.java:reduce(255)) -
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":"113","_col1":"val_113"}}
>         at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
>         at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
>         at org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:819)
>         at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:747)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
>         at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
>         at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235)
>         ... 8 more
> {code}