Further findings:

Works:
    select deviceid from (select deviceid from test1) t1
    left outer join (select deviceid from test2) t2
    on t1.deviceid = t2.deviceid;

Fails:
    select deviceid from (select distinct deviceid from test1) t1
    left outer join (select deviceid from test2) t2
    on t1.deviceid = t2.deviceid;
Difference: the second query runs two MR jobs. The first computes the distinct deviceids from test1, and the second joins that result with test2. Job 1 writes its output as a SequenceFile, and the map task that reads job 1's output also has paths to test2's files in its combined split, yet the split carries only one input format, "SequenceFileInputFormat". So is there any change between Hadoop 2.0 and 2.4 in combine-split handling that could cause this? Thanks for the help.

From: Peng <[email protected]>
Date: Wed, 15 Oct 2014 23:17:51 +0800
To: <[email protected]>
Subject: Textfile table but some map tasks try to use the SequenceFile reader

We upgraded from hadoop-2.0.0 to hadoop-2.4.0 without upgrading Hive; we are still running Hive 0.9 (not recompiled against Hadoop 2.4). Normal queries such as counts and UDFs work well, but some queries with JOINs fail. I found that some map tasks fail because Hive gets the input type wrong: the input table is stored as textfile, but some map splits have SequenceFileInputFormat as their format type while others have TextInputFormat. I know Hive 0.9 is very old, but I can't figure out what difference between Hadoop 2.4 and 2.0 causes this weird result. Thanks for the help.

Failed map task log below:

2014-10-15 22:44:41,320 INFO [main] org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: 5 finished. closing...
2014-10-15 22:44:41,320 INFO [main] org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: 5 forwarded 0 rows
2014-10-15 22:44:41,320 INFO [main] org.apache.hadoop.hive.ql.exec.SelectOperator: 4 Close done
2014-10-15 22:44:41,320 INFO [main] org.apache.hadoop.hive.ql.exec.FilterOperator: 3 Close done
2014-10-15 22:44:41,320 INFO [main] org.apache.hadoop.hive.ql.exec.TableScanOperator: 2 Close done
2014-10-15 22:44:41,321 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0
2014-10-15 22:44:41,321 INFO [main] org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 finished. closing...
2014-10-15 22:44:41,321 INFO [main] org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarded 99481 rows
2014-10-15 22:44:41,321 INFO [main] org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: 1 finished. closing...
2014-10-15 22:44:41,321 INFO [main] org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: 1 forwarded 0 rows
2014-10-15 22:44:41,321 INFO [main] org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 Close done
2014-10-15 22:44:41,321 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: 10 Close done
2014-10-15 22:44:41,321 INFO [main] ExecMapper: ExecMapper: processed 99481 rows: used memory = 187570680
2014-10-15 22:44:41,327 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:350)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:229)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:198)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:184)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1589)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by:
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:336)
    ... 11 more
Caused by: java.io.IOException: hdfs://cluster/new_user/createdate=2013-05-21/2013-05-21_204 not a SequenceFile
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1854)
    at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1814)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1763)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1777)
    at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:51)
    at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:64)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
    ... 16 more
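
One thing worth trying as a workaround (this is a guess based on the stack trace going through CombineHiveRecordReader, not a confirmed fix): if CombineHiveInputFormat is merging paths with different storage formats into one split, switching back to the plain per-path HiveInputFormat for the failing query might avoid the mix-up, at the cost of losing small-file combining (more map tasks):

```sql
-- Assumption: with combined splits disabled, each path keeps its own
-- input format (TextInputFormat for the textfile table, SequenceFile
-- for the intermediate output of job 1).
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

select deviceid
from (select distinct deviceid from test1) t1
left outer join (select deviceid from test2) t2
  on t1.deviceid = t2.deviceid;
```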
