Resending after disabling security signing.


From: Valluri, Sathish [mailto:sathish.vall...@emc.com]
Sent: Wednesday, October 30, 2013 2:17 PM
To: user@hive.apache.org
Subject: Any suggestions java.io.IOException: Not a data file error



Hi All,



Hive MapReduce jobs fail with the following java.io.IOException: Not a data 
file error when the table's HDFS location contains files other than Avro files.

I have created a Hive external table as shown below:



CREATE EXTERNAL TABLE testTable
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES ('avro.schema.literal'='{ <schema json literal>')
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/testdata/';



I then run: select count(*) from testTable;



When /testdata contains only Avro files, the query works fine and returns the 
correct result.

If /testdata also contains files in some other format, say /testdata/test.txt, 
the query fails with the following error.



java.io.IOException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:341)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:220)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:215)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:200)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:327)
    ... 11 more
Caused by: java.io.IOException: Not a data file.
    at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
    at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
    at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.<init>(AvroGenericRecordReader.java:72)
    at org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat.getRecordReader(AvroContainerInputFormat.java:51)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
    ... 16 more
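
For context, the innermost "Not a data file" exception comes from the Avro reader, which expects every file to start with the Avro container magic bytes ("Obj" followed by the byte 0x01). The following is only a rough sketch of that check, not a fix for the Hive query itself. It assumes the files have first been copied to a local directory (for example with hadoop fs -get), and the function names is_avro_container and non_avro_files are hypothetical helpers made up for illustration.

```python
import os

# Avro object container files begin with these four bytes: "Obj" then 0x01.
AVRO_MAGIC = b"Obj\x01"


def is_avro_container(path):
    """Return True if the local file at `path` starts with the Avro magic bytes."""
    with open(path, "rb") as f:
        return f.read(len(AVRO_MAGIC)) == AVRO_MAGIC


def non_avro_files(directory):
    """Return names of regular files directly under `directory` that lack the magic bytes.

    Hypothetical helper: it only inspects local files, not HDFS paths.
    """
    result = []
    for name in sorted(os.listdir(directory)):
        full = os.path.join(directory, name)
        if os.path.isfile(full) and not is_avro_container(full):
            result.append(name)
    return result
```

Running non_avro_files on a local copy of /testdata would list files such as test.txt that the Avro reader would reject.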





Can anyone suggest a parameter or other change that would let the query 
succeed? Basically, Hive should skip files in other formats and load only the 
Avro files when processing data on HDFS.



Waiting for any suggestions to resolve this issue.



Regards

Sathish Valluri
