Hi All,

Hive MapReduce jobs fail with "java.io.IOException: Not a data file" when
the table's HDFS location contains files that are not Avro files.

I have created a Hive external table as shown below:

CREATE EXTERNAL TABLE testTable
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES ('avro.schema.literal'='{ <schema json literal>')
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/testdata/';

I then run: select count(*) from testTable;

When /testdata contains only Avro files, the query works and returns the
correct result.

If /testdata also contains a file in some other format, say
/testdata/test.txt, the query fails with the following error:

java.io.IOException: java.lang.reflect.InvocationTargetException
  at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
  at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
  at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:341)
  at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:220)
  at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:215)
  at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:200)
  at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
  at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
  at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
  at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.reflect.InvocationTargetException
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
  at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:327)
  ... 11 more
Caused by: java.io.IOException: Not a data file.
  at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
  at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
  at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.<init>(AvroGenericRecordReader.java:72)
  at org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat.getRecordReader(AvroContainerInputFormat.java:51)
  at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
  ... 16 more
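
As far as I can tell, the innermost "Not a data file" exception is raised by
the Avro reader when a file does not begin with the 4-byte magic (the ASCII
letters "Obj" followed by the byte 1) that the Avro object container file
format requires. Here is a minimal Python sketch of that check, only to
illustrate the cause; looks_like_avro is a hypothetical helper name of mine,
not part of Hive or Avro:

```python
# 4-byte header that every Avro object container file starts with.
AVRO_MAGIC = b"Obj\x01"


def looks_like_avro(header: bytes) -> bool:
    """Return True if the leading bytes start with the Avro container magic.

    Hypothetical helper for illustration; it only inspects the first
    four bytes and does not validate the rest of the file.
    """
    return header[:4] == AVRO_MAGIC


print(looks_like_avro(b"Obj\x01rest-of-header"))  # True
print(looks_like_avro(b"plain text line\n"))      # False
```

A text file such as test.txt fails this kind of header check, which would
explain why the record reader cannot even be constructed for it.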

Can anyone suggest a parameter or other change that would make the query
succeed? Basically, Hive should skip files in other formats and read only
the Avro files when processing data in this HDFS directory.
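
To make the behaviour I am after concrete, here is a rough Python sketch of
the kind of filter I have in mind. It is not Hive code: it assumes a local
directory rather than HDFS, and split_by_avro_header is a hypothetical helper
name. It separates files that start with the Avro container magic from
everything else.

```python
import os
import tempfile

# 4-byte header that every Avro object container file starts with.
AVRO_MAGIC = b"Obj\x01"


def split_by_avro_header(directory):
    """Return (avro_names, other_names) for the regular files in `directory`.

    Hypothetical helper: a file counts as Avro if its first four bytes
    equal the Avro container magic. Subdirectories are ignored.
    """
    avro, other = [], []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            is_avro = f.read(4) == AVRO_MAGIC
        (avro if is_avro else other).append(name)
    return avro, other


# Small demonstration on a throwaway local directory (stand-in for /testdata).
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "part-0.avro"), "wb") as f:
    f.write(AVRO_MAGIC + b"fake header bytes")
with open(os.path.join(demo_dir, "test.txt"), "wb") as f:
    f.write(b"some text\n")

print(split_by_avro_header(demo_dir))
```

Ideally Hive would apply an equivalent filter itself, so that only the files
in the first list are handed to the Avro record reader.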

Waiting for any suggestions to resolve this issue.

Regards

Sathish Valluri
