Hi. I see that this piece of code is the source of the error:

    final int maxSize =
        (vectorizedTestingReducerBatchSize > 0 ?
            Math.min(vectorizedTestingReducerBatchSize, batch.getMaxSize()) :
            batch.getMaxSize());
    Preconditions.checkState(maxSize > 0);
    int rowIdx = 0;
    int batchBytes = keyBytes.length;
    try {
      for (Object value : values) {
        if (rowIdx >= maxSize ||
            (rowIdx > 0 && batchBytes >= BATCH_BYTES)) {

          // Batch is full AND we have at least 1 more row...
          batch.size = rowIdx;
          if (handleGroupKey) {
            reducer.setNextVectorBatchGroupStatus(/* isLastGroupBatch */ false);
          }
          reducer.process(batch, tag);

          // Reset just the value columns and value buffer.
          for (int i = firstValueColumnOffset; i < batch.numCols; i++) {
            // Note that reset also resets the data buffer for bytes column vectors.
            batch.cols[i].reset();
          }
          rowIdx = 0;
          batchBytes = keyBytes.length;
        }
        if (valueLazyBinaryDeserializeToRow != null) {
          // Deserialize value into vector row columns.
          BytesWritable valueWritable = (BytesWritable) value;
          byte[] valueBytes = valueWritable.getBytes();
          int valueLength = valueWritable.getLength();
          batchBytes += valueLength;

          valueLazyBinaryDeserializeToRow.setBytes(valueBytes, 0, valueLength);
          valueLazyBinaryDeserializeToRow.deserialize(batch, rowIdx);
        }
        rowIdx++;
      }

`valueLazyBinaryDeserializeToRow.deserialize(batch, rowIdx)` throws an
exception, apparently because `rowIdx` has a value of 1024 when it should
be 1023 at most.

But it seems to me that `maxSize` will always be <= 1024, and `rowIdx` is
reset to 0 as soon as it reaches `maxSize`. So why would `rowIdx` in the
expression `valueLazyBinaryDeserializeToRow.deserialize(batch, rowIdx)`
ever be >= 1024?
Am I missing something here?

Thanks,
Bernard

On Tue, Jul 14, 2020 at 5:44 PM Bernard Quizon <
bernard.qui...@cheetahdigital.com> wrote:

> Hi.
>
> I'm using Hive 3.1.0 (Tez Execution Engine) and I'm running into
> intermittent errors when doing Hive Merge.
>
> Just to clarify, the Hive Merge query probably succeeds 60% of the time
> using the same source and destination table for the Hive Merge query.
>
> By the way, both the source and destination table has columns with complex
> data types such as ARRAY<STRING> and MAP<STRING, STRING>.
>
>
> Here's the error :
>
> TaskAttempt 0 failed, info=
> Error: Error while running task ( failure ) :
> attempt_1594345704665_28139_1_06_000007_0:java.lang.RuntimeException:
> java.lang.RuntimeException:
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
> processing vector batch (tag=0) (vectorizedVertexNum 4)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>     at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>     at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>     at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>     at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>     at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>     at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException:
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
> processing vector batch (tag=0) (vectorizedVertexNum 4)
>     at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:396)
>     at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:249)
>     at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:318)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
>     ... 16 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime
> Error while processing vector batch (tag=0) (vectorizedVertexNum 4)
>     at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:493)
>     at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:387)
>     ... 19 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
>     at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setVal(BytesColumnVector.java:187)
>     at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storePrimitiveRowColumn(VectorDeserializeRow.java:588)
>     at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storeComplexFieldRowColumn(VectorDeserializeRow.java:778)
>     at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storeMapRowColumn(VectorDeserializeRow.java:855)
>     at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storeRowColumn(VectorDeserializeRow.java:941)
>     at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.deserialize(VectorDeserializeRow.java:1360)
>     at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:470)
>     ... 20 more
>
> Would someone know a workaround for this?
>
> Thanks,
> Bernard
>
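
As a sanity check on the row-index question at the top of this message, here is a minimal standalone sketch that replays only the counter logic of the loop in ReduceRecordSource. It is not Hive code: the class BatchIndexSketch and the method maxIndexSeen are hypothetical names made up for illustration, the deserialize call is replaced by simply recording the index, and the byte-size flush condition is left out on the assumption that it can only flush earlier, never later. Under those assumptions, the largest index handed to the store step should be maxSize - 1.

```java
/** Standalone sketch of the row-index logic only; hypothetical, not Hive code. */
final class BatchIndexSketch {

    /**
     * Replays the flush-then-store loop for the given number of rows and
     * returns the largest row index that would be passed to the store step
     * (-1 if there are no rows).
     */
    static int maxIndexSeen(int rows, int maxSize) {
        if (maxSize <= 0) {
            throw new IllegalStateException("maxSize must be positive");
        }
        int rowIdx = 0;
        int maxSeen = -1;
        for (int n = 0; n < rows; n++) {
            if (rowIdx >= maxSize) {
                // Batch is full: the real loop calls reducer.process(...) and
                // resets the value columns here. Only the counter is modelled.
                rowIdx = 0;
            }
            // Stand-in for valueLazyBinaryDeserializeToRow.deserialize(batch, rowIdx).
            maxSeen = Math.max(maxSeen, rowIdx);
            rowIdx++;
        }
        return maxSeen;
    }

    public static void main(String[] args) {
        // 5000 rows pushed through batches of at most 1024 rows.
        System.out.println(maxIndexSeen(5000, 1024));
    }
}
```

If this model is faithful, the counter in this loop never reaches 1024 at the deserialize call, so an index of 1024 would have to come from somewhere other than this counter. That is only my reading of the excerpt; the sketch models nothing beyond the counter.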