Hi.

I see that this piece of code is the source of the error:

final int maxSize =
    (vectorizedTestingReducerBatchSize > 0 ?
        Math.min(vectorizedTestingReducerBatchSize, batch.getMaxSize()) :
        batch.getMaxSize());
Preconditions.checkState(maxSize > 0);
int rowIdx = 0;
int batchBytes = keyBytes.length;
try {
  for (Object value : values) {
    if (rowIdx >= maxSize ||
        (rowIdx > 0 && batchBytes >= BATCH_BYTES)) {

      // Batch is full AND we have at least 1 more row...
      batch.size = rowIdx;
      if (handleGroupKey) {
        reducer.setNextVectorBatchGroupStatus(/* isLastGroupBatch */ false);
      }
      reducer.process(batch, tag);

      // Reset just the value columns and value buffer.
      for (int i = firstValueColumnOffset; i < batch.numCols; i++) {
        // Note that reset also resets the data buffer for bytes column vectors.
        batch.cols[i].reset();
      }
      rowIdx = 0;
      batchBytes = keyBytes.length;
    }
    if (valueLazyBinaryDeserializeToRow != null) {
      // Deserialize value into vector row columns.
      BytesWritable valueWritable = (BytesWritable) value;
      byte[] valueBytes = valueWritable.getBytes();
      int valueLength = valueWritable.getLength();
      batchBytes += valueLength;

      valueLazyBinaryDeserializeToRow.setBytes(valueBytes, 0, valueLength);
      valueLazyBinaryDeserializeToRow.deserialize(batch, rowIdx);
    }
    rowIdx++;
  }


`valueLazyBinaryDeserializeToRow.deserialize(batch, rowIdx)` throws an
exception because `rowIdx` has a value of 1024, when it should be 1023
at most.
But it seems to me that `maxSize` will always be at most 1024, and the loop
flushes the batch and resets `rowIdx` to 0 as soon as `rowIdx >= maxSize`.
So why would `rowIdx` in the expression
`valueLazyBinaryDeserializeToRow.deserialize(batch, rowIdx)` ever be >= 1024?
Am I missing something here?
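
To check that reasoning, here is a minimal sketch that replays only the index bookkeeping of the loop quoted above. It is not the Hive code: the class name `BatchIndexSketch`, the method `maxRowIdxPassedToDeserialize`, the fixed `valueLength`, and treating `keyBytes.length` as 0 are all assumptions made for illustration. It records the largest `rowIdx` that would reach the `deserialize(batch, rowIdx)` call.

```java
public class BatchIndexSketch {

    /**
     * Replays the rowIdx/batchBytes bookkeeping of the quoted loop and returns
     * the largest row index that would be handed to deserialize(batch, rowIdx).
     * Returns -1 when there are no values. All parameters are hypothetical
     * stand-ins; keyBytes.length is treated as 0 for simplicity.
     */
    static int maxRowIdxPassedToDeserialize(int numValues, int maxSize,
                                            int valueLength, int batchBytesLimit) {
        if (maxSize <= 0) {
            // Mirrors Preconditions.checkState(maxSize > 0) in the quoted code.
            throw new IllegalStateException("maxSize must be > 0");
        }
        int rowIdx = 0;
        int batchBytes = 0;
        int maxSeen = -1;
        for (int n = 0; n < numValues; n++) {
            if (rowIdx >= maxSize || (rowIdx > 0 && batchBytes >= batchBytesLimit)) {
                // Batch is full: the real code calls reducer.process(...) here,
                // then resets the value columns. Only the counters matter here.
                rowIdx = 0;
                batchBytes = 0;
            }
            batchBytes += valueLength;
            // This is the index the real code would pass to deserialize().
            maxSeen = Math.max(maxSeen, rowIdx);
            rowIdx++;
        }
        return maxSeen;
    }

    public static void main(String[] args) {
        // 5000 values, maxSize 1024, byte limit effectively disabled.
        System.out.println(
            maxRowIdxPassedToDeserialize(5000, 1024, 16, Integer.MAX_VALUE)); // prints 1023
    }
}
```

Under these assumptions the top-level `rowIdx` never exceeds `maxSize - 1`, so the out-of-range index 1024 would have to come from somewhere else. The stack trace below passes through `storeMapRowColumn` before reaching `BytesColumnVector.setVal`, so one place to look (a guess, not confirmed here) is the indexing inside the MAP column's nested vectors.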

Thanks,
Bernard

On Tue, Jul 14, 2020 at 5:44 PM Bernard Quizon <
bernard.qui...@cheetahdigital.com> wrote:

> Hi.
>
> I'm using Hive 3.1.0 (Tez execution engine) and I'm running into
> intermittent errors when running a Hive MERGE.
>
> To clarify, the MERGE query succeeds roughly 60% of the time
> when run against the same source and destination tables.
>
> By the way, both the source and destination tables have columns with complex
> data types such as ARRAY<STRING> and MAP<STRING, STRING>.
>
>
> Here's the error :
>
> TaskAttempt 0 failed, info=
> » Error: Error while running task ( failure ) :
> attempt_1594345704665_28139_1_06_000007_0:java.lang.RuntimeException:
> java.lang.RuntimeException:
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
> processing vector batch (tag=0) (vectorizedVertexNum 4)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
> at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
> at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
> at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
> at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) (vectorizedVertexNum 4)
> at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:396)
> at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:249)
> at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:318)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
> ... 16 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) (vectorizedVertexNum 4)
> at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:493)
> at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:387)
> ... 19 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
> at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setVal(BytesColumnVector.java:187)
> at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storePrimitiveRowColumn(VectorDeserializeRow.java:588)
> at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storeComplexFieldRowColumn(VectorDeserializeRow.java:778)
> at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storeMapRowColumn(VectorDeserializeRow.java:855)
> at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storeRowColumn(VectorDeserializeRow.java:941)
> at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.deserialize(VectorDeserializeRow.java:1360)
> at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:470)
> ... 20 more
>
> Would someone know a workaround for this?
>
> Thanks,
> Bernard
>
