[ https://issues.apache.org/jira/browse/HIVE-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13794727#comment-13794727 ]
Ankit Malhotra commented on HIVE-4175:
--------------------------------------

[~ashutoshc] Assuming partition '2013-10-14 11' has data and '2013-10-14 12' does not:

{code}
select count(*), object_type from test_proto where dh IN ('2013-10-14 11', '2013-10-14 12') group by object_type;
{code}

[ProtobufDeserializer.java|https://github.com/kevinweil/elephant-bird/blob/master/hive/src/main/java/com/twitter/elephantbird/hive/serde/ProtobufDeserializer.java?source=c#L56] throws a ClassCastException as seen above. On further digging, the blob passed to the deserializer is of type Text and is empty: blob.getLength() returned 0.

> Injection of emptyFile into input splits for empty partitions causes Deserializer to fail
> -----------------------------------------------------------------------------------------
>
>                 Key: HIVE-4175
>                 URL: https://issues.apache.org/jira/browse/HIVE-4175
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.10.0
>         Environment: CDH4.2, using MR1
>            Reporter: James Kebinger
>            Priority: Minor
>
> My deserializer is expecting to receive one of 2 different subclasses of
> Writable, but in certain circumstances it receives an empty instance of
> org.apache.hadoop.io.Text. This only happens for task attempts where I
> observe the file called "emptyFile" in the list of input splits.
>
> I'm doing queries over an external year/month/day partitioned table that I
> have eagerly created partitions for, so as of today, for example, I may do a
> query where year = 2013 and month = 3, which includes empty partitions.
>
> In the course of investigation I downloaded the sequence files to confirm
> they were ok. Once I realized that processing of empty partitions was to
> blame, I was able to work around the issue by bounding my queries to
> populated partitions.
>
> Can the need for the emptyFile be eliminated in the case where there's
> already a bunch of splits being processed? Failing that, can the mapper
> detect that the current input is from emptyFile and not call the deserializer?
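
For illustration only, the kind of guard asked about above might look like the following sketch. It assumes the placeholder record can be recognized as a zero-length Text, which matches the blob.getLength() == 0 observation in the comment. The Writable, Text, and BytesWritable types here are simplified stand-ins defined inside the example, not the Hadoop classes, and EmptySplitGuard, isEmptyPlaceholder, and deserialize are hypothetical names. It is not the code in ProtobufDeserializer.java or in Hive.

```java
public class EmptySplitGuard {
    // Hypothetical stand-ins for Hadoop's Writable types; NOT the real classes.
    interface Writable {}

    static final class Text implements Writable {
        private final byte[] bytes;
        Text(byte[] bytes) { this.bytes = bytes; }
        int getLength() { return bytes.length; }
    }

    static final class BytesWritable implements Writable {
        private final byte[] bytes;
        BytesWritable(byte[] bytes) { this.bytes = bytes; }
        byte[] getBytes() { return bytes; }
    }

    // True when the record looks like the empty Text seen for "emptyFile" splits.
    static boolean isEmptyPlaceholder(Writable blob) {
        return blob instanceof Text && ((Text) blob).getLength() == 0;
    }

    // Returns the payload bytes, or null when the record is the empty placeholder.
    // Any other unexpected type is reported explicitly instead of failing on a cast.
    static byte[] deserialize(Writable blob) {
        if (isEmptyPlaceholder(blob)) {
            return null; // nothing to decode; the caller should skip this record
        }
        if (blob instanceof BytesWritable) {
            return ((BytesWritable) blob).getBytes();
        }
        String type = (blob == null) ? "null" : blob.getClass().getName();
        throw new IllegalArgumentException("Unexpected record type: " + type);
    }

    public static void main(String[] args) {
        Writable placeholder = new Text(new byte[0]);
        Writable record = new BytesWritable(new byte[] {1, 2, 3});
        System.out.println("placeholder skipped: " + (deserialize(placeholder) == null));
        System.out.println("record length: " + deserialize(record).length);
    }
}
```

Whether returning null is acceptable depends on how the caller treats null rows, so this only shows the shape of the check, not a verified fix.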