Situation: I have an external Avro table in Hive.  Under certain circumstances 
zero-length files can end up in the top-level directory housing the external 
data.  This causes all Hive queries on the table to fail.  I am on Hive 0.14, 
but looking at the current code base I believe the same problem would occur 
today.  (A stack trace is below.)

The issue is that org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader 
creates a new org.apache.avro.file.DataFileReader, and DataFileReader throws an 
exception when trying to read an empty file (because the empty file lacks the 
magic number marking it as Avro).  It seems like it would be straightforward to 
modify AvroGenericRecordReader to detect an empty file and then behave 
sensibly: for example, next() would always return false, getPos() would return 
zero, etc.
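To make the idea concrete, here is a minimal sketch of the guard, stripped of the Hadoop/Hive dependencies for illustration.  The class and field names (EmptyAwareAvroReader, isEmptyInput) are hypothetical, not from the Hive source; the point is only the shape of the check: test the input length before handing the stream to DataFileReader, and short-circuit the reader methods when the flag is set.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical sketch of the empty-file guard; not the actual
// AvroGenericRecordReader code.
public class EmptyAwareAvroReader {
    private final boolean isEmptyInput;

    public EmptyAwareAvroReader(File file) throws IOException {
        // An empty file cannot contain the Avro magic header, so skip
        // constructing DataFileReader entirely instead of letting it
        // throw "Not a data file."
        this.isEmptyInput = file.length() == 0;
        if (!isEmptyInput) {
            // ... open org.apache.avro.file.DataFileReader here ...
        }
    }

    public boolean next() {
        // An empty input has no records, so next() is always false.
        return !isEmptyInput && hasMoreRecords();
    }

    public long getPos() {
        // Position within an empty file is always zero.
        return isEmptyInput ? 0L : currentPos();
    }

    // Placeholders standing in for the real DataFileReader-backed logic.
    private boolean hasMoreRecords() { return false; }
    private long currentPos() { return 0L; }

    public static void main(String[] args) throws IOException {
        File empty = File.createTempFile("empty", ".avro");
        empty.deleteOnExit();
        EmptyAwareAvroReader reader = new EmptyAwareAvroReader(empty);
        System.out.println(reader.next());   // false
        System.out.println(reader.getPos()); // 0
    }
}
```

The real patch would do the equivalent length check against the FileSplit before the DataFileReader constructor call in AvroGenericRecordReader.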

If that approach sounds sensible I will open a JIRA and take a stab at a patch. 
 Thank you in advance for any feedback!

-Aaron

Caused by: java.io.IOException: Not a data file.
at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:102)
at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.<init>(AvroGenericRecordReader.java:81)
at org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat.getRecordReader(AvroContainerInputFormat.java:51)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:246)
... 25 more
