Alex Nastetsky created HIVE-7186:
------------------------------------
Summary: Unable to perform join on table
Key: HIVE-7186
URL: https://issues.apache.org/jira/browse/HIVE-7186
Project: Hive
Issue Type: Bug
Affects Versions: 0.12.0
Environment: Hortonworks Data Platform 2.0
Reporter: Alex Nastetsky
Occasionally, a table gets into a state that prevents it from being used in a
JOIN.
When doing a map join, the query just stalls at "Starting to launch local task
to process map join; ".
When doing a regular join, it makes progress but then errors out with an
IndexOutOfBoundsException:
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IndexOutOfBoundsException
    at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:365)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534)
    ... 9 more
Caused by: java.lang.IndexOutOfBoundsException
    at java.nio.Buffer.checkIndex(Buffer.java:532)
    at java.nio.ByteBufferAsIntBufferL.put(ByteBufferAsIntBufferL.java:131)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1153)
    at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:586)
    at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.collect(ReduceSinkOperator.java:372)
    at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:334)
    ... 15 more
Simple selects against this table work fine and do not show any apparent
problems with the data.
Assume that the table in question is called tableA and was created by queryA.
Doing either of the following has helped resolve the issue in the past.
1) create table tableB as select * from tableA;
Then just use tableB instead in the JOIN.
2) regenerate tableA using queryA
Then use tableA in the JOIN again. It usually works the second time.
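The first workaround above can be sketched in HiveQL roughly as follows. The
second table (otherTable) and the join key (id) are hypothetical placeholders;
the report does not name them, and queryA itself is not shown in the report.

```sql
-- Workaround 1: copy tableA into a fresh table.
CREATE TABLE tableB AS SELECT * FROM tableA;

-- Then join against the copy instead of the affected table.
-- otherTable and id are placeholders, not taken from the report.
SELECT a.*
FROM tableB a
JOIN otherTable o ON (a.id = o.id);

-- Workaround 2 (not shown): re-run queryA to regenerate tableA,
-- then retry the original join against tableA.
```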
When doing a "describe formatted" on the tables, the totalSize will be
different between the original tableA and tableB, and sometimes (but not
always) between the original tableA and the regenerated tableA. The numRows
will be the same across all versions of the tables.
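For reference, a minimal sketch of the comparison described above, assuming
both tables exist in the current database. The totalSize and numRows values
are listed among the table parameters in the output of each statement.

```sql
-- Compare totalSize and numRows in the table parameters of each output.
DESCRIBE FORMATTED tableA;
DESCRIBE FORMATTED tableB;
```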
We cannot reproduce this problem on demand, but once a table is affected, the
failure happens every time we try to use that table in a JOIN.