Alex Nastetsky created HIVE-7186:
------------------------------------

             Summary: Unable to perform join on table
                 Key: HIVE-7186
                 URL: https://issues.apache.org/jira/browse/HIVE-7186
             Project: Hive
          Issue Type: Bug
    Affects Versions: 0.12.0
         Environment: Hortonworks Data Platform 2.0
            Reporter: Alex Nastetsky

Occasionally, a table starts behaving in a way that prevents it from being used in a JOIN.

When doing a map join, the query just stalls at "Starting to launch local task to process map join; ".

When doing a regular join, it makes progress but then errors out with an IndexOutOfBoundsException:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IndexOutOfBoundsException
        at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:365)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534)
        ... 9 more
Caused by: java.lang.IndexOutOfBoundsException
        at java.nio.Buffer.checkIndex(Buffer.java:532)
        at java.nio.ByteBufferAsIntBufferL.put(ByteBufferAsIntBufferL.java:131)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1153)
        at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:586)
        at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.collect(ReduceSinkOperator.java:372)
        at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:334)
        ... 15 more

Simple selects against this table work fine and do not show any apparent problems with the data.

Assume that the table in question is called tableA and was created by queryA. Doing either of the following has helped resolve the issue in the past.

1) create table tableB as select * from tableA;
Then just use tableB instead in the JOIN.

2) regenerate tableA using queryA
Then use tableA in the JOIN again. It usually works the second time.

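For illustration, workaround 1 might look like the following HiveQL. This is only a sketch: tableC and the column id are hypothetical placeholders for the other side of the join and its key, and are not taken from the actual queries.

```sql
-- Workaround 1: copy the affected table into a new one.
CREATE TABLE tableB AS SELECT * FROM tableA;

-- Then join against the copy instead of the original.
-- tableC and id are hypothetical placeholders, not names from the real job.
SELECT a.id, c.id
FROM tableB a
JOIN tableC c ON (a.id = c.id);
```

The copy has the same rows as tableA, so the join result should not change.
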
When doing a "describe formatted" on the tables, the totalSize differs between the original tableA and tableB, and sometimes (but not always) between the original tableA and the regenerated tableA. The numRows is the same across all versions of the tables.

We cannot reproduce the problem on demand, but once a table is affected, the issue happens every time we try to use that table in a JOIN.



--
This message was sent by Atlassian JIRA
(v6.2#6252)