[ https://issues.apache.org/jira/browse/HIVE-7186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038830#comment-14038830 ]
Alex Nastetsky commented on HIVE-7186:
--------------------------------------

I just saw a similar problem with a different stacktrace. This time, the join got to the very end of the job and failed as it finished:

{code}
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.EOFException: Premature EOF: no length prefix available
	at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:514)
	at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:332)
	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
	at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
	at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
	at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:159)
	at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.shutDownJob(MRAppMaster.java:548)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler$1.run(MRAppMaster.java:599)
Caused by: java.io.EOFException: Premature EOF: no length prefix available
	at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1492)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:962)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:930)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1031)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:823)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:475)
{code}

> Unable to perform join on table
> -------------------------------
>
>                 Key: HIVE-7186
>                 URL: https://issues.apache.org/jira/browse/HIVE-7186
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.12.0
>         Environment: Hortonworks Data Platform 2.0.6.0
>            Reporter: Alex Nastetsky
>
> Occasionally, a table will start exhibiting behavior that will prevent it
> from being used in a JOIN.
> When doing a map join, it will just stall at "Starting to launch local task
> to process map join; ".
> When doing a regular join, it will make progress but then error out with an
> IndexOutOfBoundsException:
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IndexOutOfBoundsException
>         at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:365)
>         at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
>         at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
>         at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
>         at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534)
>         ... 9 more
> Caused by: java.lang.IndexOutOfBoundsException
>         at java.nio.Buffer.checkIndex(Buffer.java:532)
>         at java.nio.ByteBufferAsIntBufferL.put(ByteBufferAsIntBufferL.java:131)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1153)
>         at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:586)
>         at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.collect(ReduceSinkOperator.java:372)
>         at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:334)
>         ... 15 more
>
> Simple selects against this table work fine and do not show any
> apparent problems with the data.
> Assume that the table in question is called tableA and was created by queryA.
> Doing either of the following has helped resolve the issue in the past.
> 1) create table tableB as select * from tableA;
> Then just use tableB instead in the JOIN.
> 2) regenerate tableA using queryA
> Then use tableA in the JOIN again. It usually works the second time.
>
> When doing a "describe formatted" on the tables, the totalSize will be
> different between the original tableA and tableB, and sometimes (but not
> always) between the original tableA and the regenerated tableA. The numRows
> will be the same across all versions of the tables.
> This problem cannot be reproduced consistently, but the issue always happens
> when we try to use an affected table in a JOIN.

--
This message was sent by Atlassian JIRA
(v6.2#6252)