I've opened HIVE-18817 (https://issues.apache.org/jira/browse/HIVE-18817) for this.
________________________________
From: Aviral Agarwal <aviral12...@gmail.com>
Sent: Thursday, February 15, 2018 6:11 PM
To: user@hive.apache.org
Subject: Re: ORC ACID table returning Array Index Out of Bounds

Hive version is 1.2.1000.2.6.1.0-0129 (HDP 2.6.1.0).

For now I have mitigated the problem by recreating the table, so I don't have the relevant ORC files right now. Also, I am curious: how would "hive.acid.key.index" help in debugging this problem?

I was going through the source code, and it seems the problem is in the following method:

/**
 * Find the key range for bucket files.
 * @param reader the reader
 * @param options the options for reading with
 * @throws IOException
 */
private void discoverKeyBounds(Reader reader,
                               Reader.Options options) throws IOException {
  RecordIdentifier[] keyIndex = OrcRecordUpdater.parseKeyIndex(reader);
  long offset = options.getOffset();
  long maxOffset = options.getMaxOffset();
  int firstStripe = 0;
  int stripeCount = 0;
  boolean isTail = true;
  List<StripeInformation> stripes = reader.getStripes();
  for (StripeInformation stripe : stripes) {
    if (offset > stripe.getOffset()) {
      firstStripe += 1;
    } else if (maxOffset > stripe.getOffset()) {
      stripeCount += 1;
    } else {
      isTail = false;
      break;
    }
  }
  if (firstStripe != 0) {
    minKey = keyIndex[firstStripe - 1];
  }
  if (!isTail) {
    maxKey = keyIndex[firstStripe + stripeCount - 1];
  }
}

If this is still an open issue I would like to submit a patch for it. Let me know how I can further debug this issue.

Thanks,
Aviral Agarwal

On Feb 15, 2018 23:10, "Eugene Koifman" <ekoif...@hortonworks.com> wrote:

What version of Hive is this? Can you isolate this to a specific partition?

The table/partition you are reading should have a directory called base_x/ with several bucket_0000N files. (If you see more than one base_x, take the one with the highest x.) Each bucket_0000N should have a “hive.acid.key.index” property in the user metadata section of the ORC footer. Could you share the value of this property? You can use orcfiledump (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-ORCFileDumpUtility) for this, but it requires https://issues.apache.org/jira/browse/ORC-223.
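[If orcfiledump is not an option (e.g. without ORC-223), the same property can be read directly with the Hive ORC reader API. The following is only a minimal sketch, not part of the thread or of the HIVE-18817 work; the class name and command-line argument are made up for illustration, and it assumes the org.apache.hadoop.hive.ql.io.orc API that ships with Hive 1.2:

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.orc.OrcFile;
import org.apache.hadoop.hive.ql.io.orc.Reader;

// Prints the "hive.acid.key.index" user metadata and the stripe count of one
// bucket file.  Pass the path of a base_x/bucket_0000N file from the affected
// partition as the first argument (the path itself is whatever your table uses).
public class DumpAcidKeyIndex {
  public static void main(String[] args) throws Exception {
    Path bucketFile = new Path(args[0]);
    Reader reader = OrcFile.createReader(bucketFile,
        OrcFile.readerOptions(new Configuration()));
    System.out.println("stripes: " + reader.getStripes().size());
    if (reader.hasMetadataValue("hive.acid.key.index")) {
      ByteBuffer value = reader.getMetadataValue("hive.acid.key.index");
      System.out.println("hive.acid.key.index: "
          + StandardCharsets.UTF_8.decode(value));
    } else {
      System.out.println("no hive.acid.key.index in the footer");
    }
  }
}

Comparing the number of entries in the printed index with the stripe count is one way to check whether discoverKeyBounds() could index past the end of the array.]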
Thanks,
Eugene

From: Aviral Agarwal <aviral12...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Thursday, February 15, 2018 at 2:08 AM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: ORC ACID table returning Array Index Out of Bounds

Hi guys,

I am running into the following error when querying an ACID table:

Caused by: java.lang.RuntimeException: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 8
        at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:196)
        at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.<init>(TezGroupedSplitsInputFormat.java:135)
        at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:101)
        at org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:149)
        at org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:80)
        at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:674)
        at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:633)
        at org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145)
        at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:405)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:124)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:149)
        ... 14 more
Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 8
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:253)
        at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:193)
        ... 25 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 8
        at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.discoverKeyBounds(OrcRawRecordMerger.java:378)
        at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:447)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1436)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1323)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:251)
        ... 26 more

Any help would be appreciated.

Regards,
Aviral Agarwal
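[The top frame of the last cause is the discoverKeyBounds() method quoted earlier in the thread. One way such an exception can arise, assuming the parsed "hive.acid.key.index" has fewer entries than the file has stripes, is that firstStripe + stripeCount - 1 exceeds the index length. The following standalone snippet is only an illustration of that arithmetic, not Hive code and not the HIVE-18817 fix; the stripe offsets and index size are invented:

// Mimics the loop in discoverKeyBounds() with made-up numbers to show how
// the index used for maxKey can run past the end of the parsed key index.
public class KeyIndexBoundsDemo {
  public static void main(String[] args) {
    long[] stripeOffsets = {3, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000}; // 9 stripes
    int keyIndexEntries = 8;                      // hypothetical truncated "hive.acid.key.index"
    long offset = 0, maxOffset = Long.MAX_VALUE;  // split that covers the whole file

    int firstStripe = 0, stripeCount = 0;
    for (long stripeOffset : stripeOffsets) {
      if (offset > stripeOffset) {
        firstStripe += 1;       // stripe is before the split
      } else if (maxOffset > stripeOffset) {
        stripeCount += 1;       // stripe is inside the split
      } else {
        break;                  // stripe is after the split
      }
    }
    int maxKeyIndex = firstStripe + stripeCount - 1;  // 0 + 9 - 1 = 8
    System.out.println("would read keyIndex[" + maxKeyIndex
        + "] but the index only has " + keyIndexEntries + " entries");
  }
}

With these invented numbers the loop counts 9 stripes, so the merger would read keyIndex[8] from an 8-entry index, which is the same shape of failure as the "ArrayIndexOutOfBoundsException: 8" in the trace above.]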