Saurabh Seth created HIVE-20664:
-----------------------------------

             Summary: Potential ArrayIndexOutOfBoundsException in 
VectorizedOrcAcidRowBatchReader.findMinMaxKeys
                 Key: HIVE-20664
                 URL: https://issues.apache.org/jira/browse/HIVE-20664
             Project: Hive
          Issue Type: Bug
          Components: Transactions
            Reporter: Saurabh Seth
            Assignee: Saurabh Seth


[~ekoifman], could you please confirm if my understanding is correct and if so, 
review the fix?

In the method {{VectorizedOrcAcidRowBatchReader.findMinMaxKeys}}, the code 
snippet that identifies the first and last stripe indices for the current split 
can throw an ArrayIndexOutOfBoundsException if the entire split falls within a 
single stripe:
{noformat}
    for(int i = 0; i < stripes.size(); i++) {
      StripeInformation stripe = stripes.get(i);
      long stripeEnd = stripe.getOffset() + stripe.getLength();
      if(firstStripeIndex == -1 && stripe.getOffset() >= splitStart) {
        firstStripeIndex = i;
      }
      if(lastStripeIndex == -1 && splitEnd <= stripeEnd &&
          stripes.get(firstStripeIndex).getOffset() <= stripe.getOffset() ) {
        //the last condition is for when both splitStart and splitEnd are in
        // the same stripe
        lastStripeIndex = i;
      }
    }
{noformat}
Consider the example where there are 2 stripes - 0-500 and 500-1000 and 
splitStart is 600 and splitEnd is 800.

In the first iteration of the loop, stripe.getOffset() is 0 and stripeEnd is 
500. Neither if condition is met in this iteration, so both firstStripeIndex 
and lastStripeIndex remain -1.

In the second iteration of the loop, stripe.getOffset() is 500 and stripeEnd is 
1000. The first if condition is not met because the stripe's offset (500) is 
not greater than or equal to splitStart (600). In the second if statement, 
however, splitEnd (800) is <= stripeEnd (1000), so the last condition, 
{{stripes.get(firstStripeIndex).getOffset() <= stripe.getOffset()}}, is 
evaluated. This throws an ArrayIndexOutOfBoundsException because 
firstStripeIndex is still -1.
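
The walk-through above can be reproduced with a small standalone sketch. It 
copies the loop from the snippet into a helper method; the class 
{{StripeIndexRepro}}, the {{Stripe}} stand-in for {{StripeInformation}}, and the 
helper names are hypothetical and exist only for illustration, they are not 
Hive or ORC code. The catch clause uses IndexOutOfBoundsException, the parent 
of ArrayIndexOutOfBoundsException, since the exact type thrown by 
{{ArrayList.get(-1)}} depends on the JDK version.

```java
import java.util.ArrayList;
import java.util.List;

public class StripeIndexRepro {
  /** Hypothetical stand-in for StripeInformation; models only offset and length. */
  static final class Stripe {
    private final long offset;
    private final long length;
    Stripe(long offset, long length) {
      this.offset = offset;
      this.length = length;
    }
    long getOffset() { return offset; }
    long getLength() { return length; }
  }

  /** Same loop as in the snippet; returns {firstStripeIndex, lastStripeIndex}. */
  static int[] findStripeIndices(List<Stripe> stripes, long splitStart, long splitEnd) {
    int firstStripeIndex = -1;
    int lastStripeIndex = -1;
    for (int i = 0; i < stripes.size(); i++) {
      Stripe stripe = stripes.get(i);
      long stripeEnd = stripe.getOffset() + stripe.getLength();
      if (firstStripeIndex == -1 && stripe.getOffset() >= splitStart) {
        firstStripeIndex = i;
      }
      // stripes.get(firstStripeIndex) is reached with firstStripeIndex == -1
      // when no stripe offset at or after splitStart has been seen yet.
      if (lastStripeIndex == -1 && splitEnd <= stripeEnd &&
          stripes.get(firstStripeIndex).getOffset() <= stripe.getOffset()) {
        lastStripeIndex = i;
      }
    }
    return new int[] {firstStripeIndex, lastStripeIndex};
  }

  /** Returns true if the loop throws for the given split boundaries. */
  static boolean throwsForSplit(List<Stripe> stripes, long splitStart, long splitEnd) {
    try {
      findStripeIndices(stripes, splitStart, splitEnd);
      return false;
    } catch (IndexOutOfBoundsException e) {
      return true;
    }
  }

  /** The two stripes from the example: 0-500 and 500-1000. */
  static List<Stripe> twoStripes() {
    List<Stripe> stripes = new ArrayList<>();
    stripes.add(new Stripe(0, 500));
    stripes.add(new Stripe(500, 500));
    return stripes;
  }

  public static void main(String[] args) {
    List<Stripe> stripes = twoStripes();
    // Split 600-800 lies entirely inside the second stripe.
    System.out.println(throwsForSplit(stripes, 600, 800));
    // Split 0-1000 starts on a stripe boundary.
    System.out.println(throwsForSplit(stripes, 0, 1000));
  }
}
```

With these inputs the 600-800 split hits the exception in the second 
iteration, while the 0-1000 split completes with indices 0 and 1.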

I'm not sure whether this scenario is possible at all, hence logging this as a 
low-priority issue. Perhaps block-based split generation using BISplitStrategy 
could trigger this?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
