Hi Aaron,

Thank you, your suggestion seems to have resolved the issue; so far I haven't seen a failure since turning off vectorization. That said, I don't think disabling it is the best long-term fix, since turning it off has performance implications.
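In case it's useful to anyone who hits this later: the setting can at least be scoped to the session that runs the merge rather than being disabled cluster-wide. A minimal sketch (the actual MERGE statement is omitted here):

    -- Turn vectorized execution off only for the current session,
    -- run the merge, then turn it back on.
    set hive.vectorized.execution.enabled=false;
    -- MERGE INTO <target> USING <source> ON ... ;
    set hive.vectorized.execution.enabled=true;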
Thanks,
Bernard

On Tue, Jul 14, 2020 at 10:06 PM Aaron Grubb <aa...@kaden.ai> wrote:

> This is just a suggestion, but I recently ran into an issue with vectorized
> query execution and a map column type, specifically when inserting into an
> HBase table with a map-to-column-family setup. Try using
> "set hive.vectorized.execution.enabled=false;"
>
> Thanks,
> Aaron
>
> From: Bernard Quizon <bernard.qui...@cheetahdigital.com>
> Sent: Tuesday, July 14, 2020 9:57 AM
> To: user@hive.apache.org
> Subject: Re: Intermittent ArrayIndexOutOfBoundsException on Hive Merge
>
> Hi.
>
> I see that this piece of code is the source of the error:
>
>     final int maxSize =
>         (vectorizedTestingReducerBatchSize > 0 ?
>             Math.min(vectorizedTestingReducerBatchSize, batch.getMaxSize()) :
>             batch.getMaxSize());
>     Preconditions.checkState(maxSize > 0);
>     int rowIdx = 0;
>     int batchBytes = keyBytes.length;
>     try {
>       for (Object value : values) {
>         if (rowIdx >= maxSize ||
>             (rowIdx > 0 && batchBytes >= BATCH_BYTES)) {
>
>           // Batch is full AND we have at least 1 more row...
>           batch.size = rowIdx;
>           if (handleGroupKey) {
>             reducer.setNextVectorBatchGroupStatus(/* isLastGroupBatch */ false);
>           }
>           reducer.process(batch, tag);
>
>           // Reset just the value columns and value buffer.
>           for (int i = firstValueColumnOffset; i < batch.numCols; i++) {
>             // Note that reset also resets the data buffer for bytes column vectors.
>             batch.cols[i].reset();
>           }
>           rowIdx = 0;
>           batchBytes = keyBytes.length;
>         }
>         if (valueLazyBinaryDeserializeToRow != null) {
>           // Deserialize value into vector row columns.
>           BytesWritable valueWritable = (BytesWritable) value;
>           byte[] valueBytes = valueWritable.getBytes();
>           int valueLength = valueWritable.getLength();
>           batchBytes += valueLength;
>
>           valueLazyBinaryDeserializeToRow.setBytes(valueBytes, 0, valueLength);
>           valueLazyBinaryDeserializeToRow.deserialize(batch, rowIdx);
>         }
>         rowIdx++;
>       }
>
> `valueLazyBinaryDeserializeToRow.deserialize(batch, rowIdx)` throws the
> exception because `rowIdx` is 1024, when it should be at most 1023.
>
> But it seems to me that `maxSize` can never exceed 1024, and `rowIdx` is
> reset to 0 whenever it reaches `maxSize`, so how can `rowIdx` ever be 1024
> or more when `valueLazyBinaryDeserializeToRow.deserialize(batch, rowIdx)`
> is called?
>
> Am I missing something here?
>
> Thanks,
> Bernard
>
> On Tue, Jul 14, 2020 at 5:44 PM Bernard Quizon <
> bernard.qui...@cheetahdigital.com> wrote:
>
> Hi.
>
> I'm using Hive 3.1.0 (Tez execution engine) and I'm running into
> intermittent errors when doing a Hive MERGE.
>
> Just to clarify, the MERGE query probably succeeds about 60% of the time
> using the same source and destination tables.
>
> By the way, both the source and destination tables have columns with
> complex data types such as ARRAY<STRING> and MAP<STRING, STRING>.
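>
> To give an idea of the shape involved, the tables and the merge look roughly
> like the sketch below. The table and column names are made up for
> illustration; only the complex-typed columns mirror the real schema.
>
>     -- Illustrative sketch only. MERGE requires the target to be a
>     -- transactional (ACID) table.
>     create table target_tbl (
>       id    string,
>       tags  array<string>,
>       attrs map<string, string>
>     ) stored as orc
>     tblproperties ('transactional'='true');
>
>     create table source_tbl (
>       id    string,
>       tags  array<string>,
>       attrs map<string, string>
>     ) stored as orc;
>
>     merge into target_tbl t
>     using source_tbl s
>     on t.id = s.id
>     when matched then update set tags = s.tags, attrs = s.attrs
>     when not matched then insert values (s.id, s.tags, s.attrs);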
>
> Here's the error:
>
>     TaskAttempt 0 failed, info=
>       Error: Error while running task ( failure ) :
>       attempt_1594345704665_28139_1_06_000007_0:java.lang.RuntimeException: java.lang.RuntimeException:
>       org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) (vectorizedVertexNum 4)
>       at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>       at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>       at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>       at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>       at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>       at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>       at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>       at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>       at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>       at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>       at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
>     Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException:
>       Hive Runtime Error while processing vector batch (tag=0) (vectorizedVertexNum 4)
>       at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:396)
>       at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:249)
>       at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:318)
>       at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
>       ... 16 more
>     Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
>       processing vector batch (tag=0) (vectorizedVertexNum 4)
>       at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:493)
>       at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:387)
>       ... 19 more
>     Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
>       at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setVal(BytesColumnVector.java:187)
>       at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storePrimitiveRowColumn(VectorDeserializeRow.java:588)
>       at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storeComplexFieldRowColumn(VectorDeserializeRow.java:778)
>       at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storeMapRowColumn(VectorDeserializeRow.java:855)
>       at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storeRowColumn(VectorDeserializeRow.java:941)
>       at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.deserialize(VectorDeserializeRow.java:1360)
>       at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:470)
>       ... 20 more
>
> Would someone know a workaround for this?
>
> Thanks,
> Bernard

--
Bernard Quizon
Staff Engineer
Web <https://cheetahdigital.com> | Blog <http://cheetahdigital.com/blog> |
Linkedin <http://www.linkedin.com/company/cheetahdigital/> |
Twitter <https://www.twitter.com/Cheetah_Digital/> |
Facebook <https://www.facebook.com/CheetahDigital/>