Alex Nastetsky created HIVE-7186:
------------------------------------

             Summary: Unable to perform join on table
                 Key: HIVE-7186
                 URL: https://issues.apache.org/jira/browse/HIVE-7186
             Project: Hive
          Issue Type: Bug
    Affects Versions: 0.12.0
         Environment: Hortonworks Data Platform 2.0
            Reporter: Alex Nastetsky


Occasionally, a table will start exhibiting behavior that will prevent it from 
being used in a JOIN. 

When doing a map join, it will just stall at "Starting to launch local task to 
process map join; ".
When doing a regular join, it will make progress but then error out with an 
IndexOutOfBoundsException:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IndexOutOfBoundsException
        at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:365)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534)
        ... 9 more
Caused by: java.lang.IndexOutOfBoundsException
        at java.nio.Buffer.checkIndex(Buffer.java:532)
        at java.nio.ByteBufferAsIntBufferL.put(ByteBufferAsIntBufferL.java:131)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1153)
        at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:586)
        at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.collect(ReduceSinkOperator.java:372)
        at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:334)
        ... 15 more
        
Simple selects against this table work fine and do not show any apparent 
problems with the data.

Assume that the table in question is called tableA and was created by queryA.

Doing either of the following has helped resolve the issue in the past.

1) create table tableB as select * from tableA;

  Then just use tableB instead in the JOIN.

2) regenerate tableA using queryA

  Then use tableA in the JOIN again. It usually works the second time.
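
For anyone who wants to script workaround 1, here is a minimal Python sketch. 
It assumes the Hive CLI is on the PATH and is invoked as `hive -e "<statement>"`. 
The helper names (ctas_copy_statement, run_hive) are hypothetical and not part 
of Hive; tableA and tableB are the placeholder names used above.

```python
import re
import subprocess

# Conservative check so that only plain table names end up in the statement.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def ctas_copy_statement(source: str, copy: str) -> str:
    """Build the CREATE TABLE ... AS SELECT statement from workaround 1."""
    for name in (source, copy):
        if not _IDENT.match(name):
            raise ValueError(f"unexpected table name: {name!r}")
    return f"CREATE TABLE {copy} AS SELECT * FROM {source}"


def run_hive(statement: str) -> None:
    """Run one statement through the Hive CLI (assumes `hive` is on PATH)."""
    subprocess.run(["hive", "-e", statement], check=True)


# Example (not executed here, since it needs a working Hive installation):
#   run_hive(ctas_copy_statement("tableA", "tableB"))
```

This only automates the copy. It does not explain why the copy behaves 
differently from the original table.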
  

When doing a "describe formatted" on the tables, the totalSize will be 
different between the original tableA and tableB, and sometimes (but not 
always) between the original tableA and the regenerated tableA. The numRows 
will be the same across all versions of the tables.
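
The comparison can be automated with a small Python sketch. It assumes the 
"describe formatted" output has been saved as text and that numRows and 
totalSize each appear on a line of the form "<name> <integer>" separated by 
whitespace. The function names are hypothetical.

```python
def table_stats(describe_output: str, keys=("numRows", "totalSize")) -> dict:
    """Extract selected integer parameters from saved `describe formatted` text."""
    stats = {}
    for line in describe_output.splitlines():
        fields = line.split()
        # Keep only lines that look like "<key> <integer>".
        if len(fields) >= 2 and fields[0] in keys and fields[1].isdigit():
            stats[fields[0]] = int(fields[1])
    return stats


def differing_stats(a: dict, b: dict) -> dict:
    """Return {name: (value_in_a, value_in_b)} for parameters that differ."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}
```

Calling differing_stats(table_stats(text_a), table_stats(text_b)) on the saved 
output for tableA and tableB would then list only the parameters that differ, 
which per the description above should be totalSize but not numRows.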

We cannot reproduce the conditions that put a table into this state, but once 
a table is affected, the issue always happens when we try to use it in a JOIN.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
