Hi All,
Hive MapReduce jobs fail with "java.io.IOException: Not a data file" if the table location in HDFS contains files other than Avro files.

I have created a Hive external table as shown below:

    CREATE EXTERNAL TABLE testTable
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    WITH SERDEPROPERTIES ('avro.schema.literal'='{ <schema json literal>')
    STORED AS
      INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
      OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    LOCATION '/testdata/';

Then I run:

    select count(*) from testTable;

When /testdata contains only Avro files, the query works and returns the correct results. If /testdata also contains files in some other format, for example /testdata/test.txt, the query fails with the following error:

    java.io.IOException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:341)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:220)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:215)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:200)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
        at org.apache.hadoop.mapred.Child.main(Child.java:264)
    Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:327)
        ... 11 more
    Caused by: java.io.IOException: Not a data file.
        at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
        at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
        at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.<init>(AvroGenericRecordReader.java:72)
        at org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat.getRecordReader(AvroContainerInputFormat.java:51)
        at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
        ... 16 more

Can anyone suggest a parameter or any other change that would let the query succeed? Basically, Hive should skip the files in other formats and load only the Avro files when processing data on HDFS.

Waiting for any suggestions to resolve this issue.

Regards,
Sathish Valluri
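
P.S. In case it helps with diagnosis: the innermost "Not a data file" exception comes from Avro's header check, which expects an Avro container file to begin with the four magic bytes 'O', 'b', 'j', 0x01. Below is a minimal sketch, not a fix for Hive itself, that lists which files in a directory would fail that check. It assumes the files have first been copied to a local directory (for example with hdfs dfs -get); the function names is_avro_container and find_non_avro_files are hypothetical helpers made up for this illustration.

```python
from pathlib import Path

# Magic header that begins every Avro object container file:
# ASCII "Obj" followed by the byte 0x01.
AVRO_MAGIC = b"Obj\x01"


def is_avro_container(path):
    """Return True if the file starts with the Avro container magic bytes."""
    with open(path, "rb") as f:
        return f.read(len(AVRO_MAGIC)) == AVRO_MAGIC


def find_non_avro_files(directory):
    """Return the sorted names of regular files in `directory` that do not
    start with the Avro magic bytes (hypothetical helper, local files only)."""
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and not is_avro_container(p)
    )
```

Calling find_non_avro_files on a local copy of /testdata should report test.txt and any other file that the Avro reader would reject, which at least shows which files would have to be moved out of the table location.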