Hi Aaron,

Thank you, your suggestion seems to have resolved the issue; so far I haven't seen a failure since turning off vectorization. That said, I don't think disabling it is the best long-term fix, since turning it off has performance implications.
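In case it's useful to anyone who hits this later: the setting can at least be scoped to the session that runs the merge rather than being disabled cluster-wide. A minimal sketch (the actual MERGE statement is omitted here):

    -- Turn vectorized execution off only for the current session,
    -- run the merge, then turn it back on.
    set hive.vectorized.execution.enabled=false;
    -- MERGE INTO <target> USING <source> ON ... ;
    set hive.vectorized.execution.enabled=true;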
Thanks,
Bernard

On Tue, Jul 14, 2020 at 10:06 PM Aaron Grubb <aa...@kaden.ai> wrote:

> This is just a suggestion, but I recently ran into an issue with vectorized
> query execution and a map column type, specifically when inserting into an
> HBase table with a map-to-column-family setup. Try using
> "set hive.vectorized.execution.enabled=false;"
>
> Thanks,
> Aaron
>
> From: Bernard Quizon <bernard.qui...@cheetahdigital.com>
> Sent: Tuesday, July 14, 2020 9:57 AM
> To: user@hive.apache.org
> Subject: Re: Intermittent ArrayIndexOutOfBoundsException on Hive Merge
>
> Hi.
>
> I see that this piece of code is the source of the error:
>
>     final int maxSize =
>         (vectorizedTestingReducerBatchSize > 0 ?
>             Math.min(vectorizedTestingReducerBatchSize, batch.getMaxSize()) :
>             batch.getMaxSize());
>     Preconditions.checkState(maxSize > 0);
>     int rowIdx = 0;
>     int batchBytes = keyBytes.length;
>     try {
>       for (Object value : values) {
>         if (rowIdx >= maxSize ||
>             (rowIdx > 0 && batchBytes >= BATCH_BYTES)) {
>
>           // Batch is full AND we have at least 1 more row...
>           batch.size = rowIdx;
>           if (handleGroupKey) {
>             reducer.setNextVectorBatchGroupStatus(/* isLastGroupBatch */ false);
>           }
>           reducer.process(batch, tag);
>
>           // Reset just the value columns and value buffer.
>           for (int i = firstValueColumnOffset; i < batch.numCols; i++) {
>             // Note that reset also resets the data buffer for bytes column vectors.
>             batch.cols[i].reset();
>           }
>           rowIdx = 0;
>           batchBytes = keyBytes.length;
>         }
>         if (valueLazyBinaryDeserializeToRow != null) {
>           // Deserialize value into vector row columns.
>           BytesWritable valueWritable = (BytesWritable) value;
>           byte[] valueBytes = valueWritable.getBytes();
>           int valueLength = valueWritable.getLength();
>           batchBytes += valueLength;
>
>           valueLazyBinaryDeserializeToRow.setBytes(valueBytes, 0, valueLength);
>           valueLazyBinaryDeserializeToRow.deserialize(batch, rowIdx);
>         }
>         rowIdx++;
>       }
>
> `valueLazyBinaryDeserializeToRow.deserialize(batch, rowIdx)` throws the
> exception because `rowIdx` is 1024, when it should be at most 1023.
>
> But it seems to me that `maxSize` can never exceed 1024, and `rowIdx` is
> reset to 0 whenever it reaches `maxSize`, so how can `rowIdx` ever be 1024
> or more when `valueLazyBinaryDeserializeToRow.deserialize(batch, rowIdx)`
> is called?
>
> Am I missing something here?
>
> Thanks,
> Bernard
>
> On Tue, Jul 14, 2020 at 5:44 PM Bernard Quizon <
> bernard.qui...@cheetahdigital.com> wrote:
>
> Hi.
>
> I'm using Hive 3.1.0 (Tez execution engine) and I'm running into
> intermittent errors when doing a Hive MERGE.
>
> Just to clarify, the MERGE query probably succeeds about 60% of the time
> using the same source and destination tables.
>
> By the way, both the source and destination tables have columns with
> complex data types such as ARRAY<STRING> and MAP<STRING, STRING>.
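>
> To give an idea of the shape involved, the tables and the merge look roughly
> like the sketch below. The table and column names are made up for
> illustration; only the complex-typed columns mirror the real schema.
>
>     -- Illustrative sketch only. MERGE requires the target to be a
>     -- transactional (ACID) table.
>     create table target_tbl (
>       id    string,
>       tags  array<string>,
>       attrs map<string, string>
>     ) stored as orc
>     tblproperties ('transactional'='true');
>
>     create table source_tbl (
>       id    string,
>       tags  array<string>,
>       attrs map<string, string>
>     ) stored as orc;
>
>     merge into target_tbl t
>     using source_tbl s
>     on t.id = s.id
>     when matched then update set tags = s.tags, attrs = s.attrs
>     when not matched then insert values (s.id, s.tags, s.attrs);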
>
> Here's the error:
>
>     TaskAttempt 0 failed, info=
>       Error: Error while running task ( failure ) :
>       attempt_1594345704665_28139_1_06_000007_0:java.lang.RuntimeException: java.lang.RuntimeException:
>       org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) (vectorizedVertexNum 4)
>       at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>       at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>       at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>       at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>       at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>       at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>       at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>       at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>       at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>       at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>       at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
>     Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException:
>       Hive Runtime Error while processing vector batch (tag=0) (vectorizedVertexNum 4)
>       at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:396)
>       at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:249)
>       at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:318)
>       at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
>       ... 16 more
>     Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
>       processing vector batch (tag=0) (vectorizedVertexNum 4)
>       at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:493)
>       at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:387)
>       ... 19 more
>     Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
>       at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setVal(BytesColumnVector.java:187)
>       at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storePrimitiveRowColumn(VectorDeserializeRow.java:588)
>       at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storeComplexFieldRowColumn(VectorDeserializeRow.java:778)
>       at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storeMapRowColumn(VectorDeserializeRow.java:855)
>       at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storeRowColumn(VectorDeserializeRow.java:941)
>       at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.deserialize(VectorDeserializeRow.java:1360)
>       at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:470)
>       ... 20 more
>
> Would someone know a workaround for this?
>
> Thanks,
> Bernard

--
Bernard Quizon
Staff Engineer
Web <https://cheetahdigital.com> | Blog <http://cheetahdigital.com/blog> |
Linkedin <http://www.linkedin.com/company/cheetahdigital/> |
Twitter <https://www.twitter.com/Cheetah_Digital/> |
Facebook <https://www.facebook.com/CheetahDigital/>