[ https://issues.apache.org/jira/browse/HIVE-11977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941603#comment-14941603 ]
Aaron Dossett commented on HIVE-11977:
--------------------------------------

[~ashutoshc] Thank you for your response!

My thought is that any process for generating this data could have failure scenarios that result in zero-length files; this was the case when I initially ran into this issue. A file was opened on HDFS and "held" as a zero-length file before data was written to it, and the writing process crashed before any data could be written. The consequence of these cases, that the entire table is unreadable (based on my experience), seems disproportionate to the actual problem. Likewise, a process that deletes empty files could still leave small windows during which the table is unusable.

Would adding a warning and/or adding an option like {{hive.exec.orc.skip.corrupt.data}} be more appropriate than silently ignoring the files? This is my first foray into Hive internals, so perhaps that ORC option is not an exact comparison to this situation, but as a user it seems similar.

Thank you again for the response and your feedback!

> Hive should handle an external avro table with zero length files present
> ------------------------------------------------------------------------
>
>                 Key: HIVE-11977
>                 URL: https://issues.apache.org/jira/browse/HIVE-11977
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Aaron Dossett
>            Assignee: Aaron Dossett
>         Attachments: HIVE-11977-2.patch, HIVE-11977.patch
>
>
> If a zero length file is in the top level directory housing an external avro
> table, all hive queries on the table fail.
> The issue is that org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader
> creates a new org.apache.avro.file.DataFileReader and DataFileReader throws
> an exception when trying to read an empty file (because the empty file lacks
> the magic number marking it as avro).
> AvroGenericRecordReader should detect an empty file and then behave
> reasonably.
> Caused by: java.io.IOException: Not a data file.
>         at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:102)
>         at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
>         at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.<init>(AvroGenericRecordReader.java:81)
>         at org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat.getRecordReader(AvroContainerInputFormat.java:51)
>         at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:246)
>         ... 25 more

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
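
For illustration only, here is a minimal sketch of the kind of pre-check the issue description asks for: decide whether a file is empty, a plausible Avro container file, or something else before handing it to a reader. It is not the attached patch and does not use Hive or Hadoop APIs. The class name AvroFileCheck, the enum FileKind, and the method classify are hypothetical, and the sketch assumes a local file read through java.nio rather than a file on HDFS. The only format fact it relies on is that an Avro object container file starts with the four bytes 'O', 'b', 'j', 1.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, not part of Hive or Avro.
final class AvroFileCheck {
    // Avro object container files begin with these four bytes.
    private static final byte[] AVRO_MAGIC = {'O', 'b', 'j', 1};

    enum FileKind { EMPTY, AVRO, NOT_AVRO }

    // Classifies a local file without attempting to parse it as Avro.
    static FileKind classify(Path file) throws IOException {
        if (Files.size(file) == 0) {
            // A zero-length file: a caller could skip it or log a warning
            // instead of failing the whole query.
            return FileKind.EMPTY;
        }
        byte[] header = new byte[AVRO_MAGIC.length];
        int read = 0;
        try (InputStream in = Files.newInputStream(file)) {
            // Read up to four bytes; a single read() call may return fewer.
            while (read < header.length) {
                int n = in.read(header, read, header.length - read);
                if (n < 0) {
                    break; // file is shorter than the magic number
                }
                read += n;
            }
        }
        boolean hasMagic = read == header.length && Arrays.equals(header, AVRO_MAGIC);
        return hasMagic ? FileKind.AVRO : FileKind.NOT_AVRO;
    }
}
```

A caller could treat EMPTY as "no records" (optionally with a warning, as suggested in the comment above) and keep the existing failure for NOT_AVRO, so that genuinely corrupt files are still reported.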