I've opened HIVE-18817 (https://issues.apache.org/jira/browse/HIVE-18817) for this.
________________________________
From: Aviral Agarwal <aviral12...@gmail.com>
Sent: Thursday, February 15, 2018 6:11 PM
To: user@hive.apache.org
Subject: Re: ORC ACID table returning Array Index Out of Bounds

Hive version is 1.2.1000.2.6.1.0-0129 (HDP 2.6.1.0).

For now I have mitigated the problem by recreating the table, so I don't have the relevant ORC files right now. Also, I am curious: how would "hive.acid.key.index" help in debugging this problem?

I was going through the source code, and it seems the problem is in the following method:

/**
 * Find the key range for bucket files.
 * @param reader the reader
 * @param options the options for reading with
 * @throws IOException
 */
private void discoverKeyBounds(Reader reader,
                               Reader.Options options) throws IOException {
  RecordIdentifier[] keyIndex = OrcRecordUpdater.parseKeyIndex(reader);
  long offset = options.getOffset();
  long maxOffset = options.getMaxOffset();
  int firstStripe = 0;
  int stripeCount = 0;
  boolean isTail = true;
  List<StripeInformation> stripes = reader.getStripes();
  for (StripeInformation stripe : stripes) {
    if (offset > stripe.getOffset()) {
      firstStripe += 1;
    } else if (maxOffset > stripe.getOffset()) {
      stripeCount += 1;
    } else {
      isTail = false;
      break;
    }
  }
  if (firstStripe != 0) {
    minKey = keyIndex[firstStripe - 1];
  }
  if (!isTail) {
    maxKey = keyIndex[firstStripe + stripeCount - 1];
  }
}

If this is still an open issue I would like to submit a patch for it. Let me know how I can further debug this issue.

Thanks,
Aviral Agarwal

On Feb 15, 2018 23:10, "Eugene Koifman" <ekoif...@hortonworks.com> wrote:

What version of Hive is this? Can you isolate this to a specific partition?

The table/partition you are reading should have a directory called base_x/ with several bucket_0000N files. (If you see more than one base_x, take the one with the highest x.) Each bucket_0000N should have a “hive.acid.key.index” property in the user metadata section of the ORC footer. Could you share the value of this property? You can use orcfiledump (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-ORCFileDumpUtility) for this, but it requires https://issues.apache.org/jira/browse/ORC-223.
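[If orcfiledump is not an option (e.g. without ORC-223), the same property can be read directly with the Hive ORC reader API. The following is only a minimal sketch, not part of the thread or of the HIVE-18817 work; the class name and command-line argument are made up for illustration, and it assumes the org.apache.hadoop.hive.ql.io.orc API that ships with Hive 1.2:

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.orc.OrcFile;
import org.apache.hadoop.hive.ql.io.orc.Reader;

// Prints the "hive.acid.key.index" user metadata and the stripe count of one
// bucket file.  Pass the path of a base_x/bucket_0000N file from the affected
// partition as the first argument (the path itself is whatever your table uses).
public class DumpAcidKeyIndex {
  public static void main(String[] args) throws Exception {
    Path bucketFile = new Path(args[0]);
    Reader reader = OrcFile.createReader(bucketFile,
        OrcFile.readerOptions(new Configuration()));
    System.out.println("stripes: " + reader.getStripes().size());
    if (reader.hasMetadataValue("hive.acid.key.index")) {
      ByteBuffer value = reader.getMetadataValue("hive.acid.key.index");
      System.out.println("hive.acid.key.index: "
          + StandardCharsets.UTF_8.decode(value));
    } else {
      System.out.println("no hive.acid.key.index in the footer");
    }
  }
}

Comparing the number of entries in the printed index with the stripe count is one way to check whether discoverKeyBounds() could index past the end of the array.]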
Thanks,
Eugene

From: Aviral Agarwal <aviral12...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Thursday, February 15, 2018 at 2:08 AM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: ORC ACID table returning Array Index Out of Bounds

Hi guys,

I am running into the following error when querying an ACID table:

Caused by: java.lang.RuntimeException: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 8
        at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:196)
        at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.<init>(TezGroupedSplitsInputFormat.java:135)
        at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:101)
        at org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:149)
        at org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:80)
        at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:674)
        at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:633)
        at org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145)
        at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:405)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:124)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:149)
        ... 14 more
Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 8
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:253)
        at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:193)
        ... 25 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 8
        at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.discoverKeyBounds(OrcRawRecordMerger.java:378)
        at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:447)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1436)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1323)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:251)
        ... 26 more

Any help would be appreciated.

Regards,
Aviral Agarwal
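[The top frame of the last cause is the discoverKeyBounds() method quoted earlier in the thread. One way such an exception can arise, assuming the parsed "hive.acid.key.index" has fewer entries than the file has stripes, is that firstStripe + stripeCount - 1 exceeds the index length. The following standalone snippet is only an illustration of that arithmetic, not Hive code and not the HIVE-18817 fix; the stripe offsets and index size are invented:

// Mimics the loop in discoverKeyBounds() with made-up numbers to show how
// the index used for maxKey can run past the end of the parsed key index.
public class KeyIndexBoundsDemo {
  public static void main(String[] args) {
    long[] stripeOffsets = {3, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000}; // 9 stripes
    int keyIndexEntries = 8;                      // hypothetical truncated "hive.acid.key.index"
    long offset = 0, maxOffset = Long.MAX_VALUE;  // split that covers the whole file

    int firstStripe = 0, stripeCount = 0;
    for (long stripeOffset : stripeOffsets) {
      if (offset > stripeOffset) {
        firstStripe += 1;       // stripe is before the split
      } else if (maxOffset > stripeOffset) {
        stripeCount += 1;       // stripe is inside the split
      } else {
        break;                  // stripe is after the split
      }
    }
    int maxKeyIndex = firstStripe + stripeCount - 1;  // 0 + 9 - 1 = 8
    System.out.println("would read keyIndex[" + maxKeyIndex
        + "] but the index only has " + keyIndexEntries + " entries");
  }
}

With these invented numbers the loop counts 9 stripes, so the merger would read keyIndex[8] from an 8-entry index, which is the same shape of failure as the "ArrayIndexOutOfBoundsException: 8" in the trace above.]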