[ https://issues.apache.org/jira/browse/HIVE-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13800689#comment-13800689 ]
Ankit Malhotra commented on HIVE-4175:
--------------------------------------

For now, the only workaround that I have successfully tested is creating a default empty SequenceFile for each partition.

> Injection of emptyFile into input splits for empty partitions causes
> Deserializer to fail
> -----------------------------------------------------------------------------------------
>
>                 Key: HIVE-4175
>                 URL: https://issues.apache.org/jira/browse/HIVE-4175
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.10.0
>         Environment: CDH4.2, using MR1
>            Reporter: James Kebinger
>            Priority: Minor
>
> My deserializer expects to receive one of two different subclasses of
> Writable, but in certain circumstances it receives an empty instance of
> org.apache.hadoop.io.Text. This only happens for task attempts where I
> observe the file called "emptyFile" in the list of input splits.
> I'm running queries over an external year/month/day partitioned table that I
> have eagerly created partitions for, so as of today, for example, I may run a
> query where year = 2013 and month = 3, which includes empty partitions.
> In the course of the investigation I downloaded the sequence files to confirm
> they were OK. Once I realized that processing of empty partitions was to
> blame, I was able to work around the issue by bounding my queries to
> populated partitions.
> Can the need for the emptyFile be eliminated in the case where there's
> already a bunch of splits being processed? Failing that, can the mapper
> detect that the current input is from emptyFile and not call the deserializer?

--
This message was sent by Atlassian JIRA
(v6.1#6144)
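
The last question in the report asks whether the mapper could detect input that comes from emptyFile and skip the deserializer. A minimal sketch of that check follows. It is plain Java with no Hive or Hadoop classes, and it is not Hive's actual code. It assumes the placeholder split can be recognized by its final path component, "emptyFile", as observed in the report. The class name `EmptySplitGuard`, its methods, and any paths passed to it are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hive. It only inspects path strings.
final class EmptySplitGuard {

    // File name reported in HIVE-4175 for the injected placeholder split.
    static final String PLACEHOLDER_NAME = "emptyFile";

    // Returns true when the last path component is exactly "emptyFile".
    // Assumption: the placeholder is identifiable by file name alone.
    static boolean isEmptyFileSplit(String splitPath) {
        if (splitPath == null || splitPath.isEmpty()) {
            return false;
        }
        int slash = splitPath.lastIndexOf('/');
        String name = slash >= 0 ? splitPath.substring(slash + 1) : splitPath;
        return PLACEHOLDER_NAME.equals(name);
    }

    // Keeps only the splits whose records should reach the deserializer.
    static List<String> realSplits(List<String> splitPaths) {
        List<String> kept = new ArrayList<>();
        for (String path : splitPaths) {
            if (!isEmptyFileSplit(path)) {
                kept.add(path);
            }
        }
        return kept;
    }
}
```

A mapper-side guard along these lines would call `isEmptyFileSplit` on the current split's path before handing any record to the deserializer. A file that merely contains "emptyFile" as part of a longer name is not treated as a placeholder.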