[ https://issues.apache.org/jira/browse/HIVE-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118836#comment-14118836 ]
Sushanth Sowmyan commented on HIVE-4329:
----------------------------------------

Hi David, making a change to PassThroughOutputFormat so that it gets used does, in theory, resolve my concern about why the previous patch wouldn't work. At this point, I'll retract my objections, and I'll tag [~ashutoshc], [~alangates] or [~mithun] to see if any of them want to review your patch and get it in, since it is useful and does introduce some much-needed functionality. (One note: we should add a simple test case, perhaps by making another DummyIF/DummyOF that composes a TextIF/TextOF, and add tests to verify that it works cleanly with your change.)

I will personally recuse myself from this, however, because while I completely agree that Hive and HCatalog should use the same code for I/O, I still disagree with the direction of the change. HivePassThroughOutputFormat itself was intended as a stopgap until we could fix Hive I/O to work off a generic M/R IF/OF and get rid of HIF/HOF.

[~ashutoshc], [~alangates], [~mithun]: Could any of you please have a look at this JIRA and take on reviewing it?
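As a rough illustration of that test-case suggestion, a dummy OutputFormat composing TextOutputFormat could look something like the sketch below. The class name, its package, and how a test would register it are all hypothetical; a matching DummyInputFormat composing TextInputFormat would follow the same pattern.

{code}
// Hypothetical test helper: a dummy OutputFormat that composes TextOutputFormat,
// so the pass-through path can be exercised against a plain M/R OutputFormat.
package org.apache.hive.hcatalog.mapreduce; // hypothetical package for the test

import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.OutputFormat;
import org.apache.hadoop.mapred.RecordWriter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.util.Progressable;

public class DummyTextOutputFormat implements OutputFormat<Text, Text> {

  private final TextOutputFormat<Text, Text> wrapped = new TextOutputFormat<Text, Text>();

  @Override
  public RecordWriter<Text, Text> getRecordWriter(FileSystem fs, JobConf job,
      String name, Progressable progress) throws IOException {
    // Delegate record writing to the composed TextOutputFormat.
    return wrapped.getRecordWriter(fs, job, name, progress);
  }

  @Override
  public void checkOutputSpecs(FileSystem fs, JobConf job) throws IOException {
    wrapped.checkOutputSpecs(fs, job);
  }
}
{code}

A test could then store through HCatalog with this format configured and assert that the written records round-trip, which would exercise the pass-through path end to end.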
> HCatalog should use getHiveRecordWriter rather than getRecordWriter
> -------------------------------------------------------------------
>
>                 Key: HIVE-4329
>                 URL: https://issues.apache.org/jira/browse/HIVE-4329
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog, Serializers/Deserializers
>    Affects Versions: 0.14.0
>         Environment: discovered in Pig, but it looks like the root cause impacts all non-Hive users
>            Reporter: Sean Busbey
>            Assignee: David Chen
>         Attachments: HIVE-4329.0.patch, HIVE-4329.1.patch, HIVE-4329.2.patch, HIVE-4329.3.patch
>
>
> Attempting to write to an HCatalog-defined table backed by the AvroSerde fails with the following stack trace:
> {code}
> java.lang.ClassCastException: org.apache.hadoop.io.NullWritable cannot be cast to org.apache.hadoop.io.LongWritable
>         at org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat$1.write(AvroContainerOutputFormat.java:84)
>         at org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:253)
>         at org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:53)
>         at org.apache.hcatalog.pig.HCatBaseStorer.putNext(HCatBaseStorer.java:242)
>         at org.apache.hcatalog.pig.HCatStorer.putNext(HCatStorer.java:52)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>         at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:559)
>         at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
> {code}
> The proximate cause of this failure is that AvroContainerOutputFormat's signature mandates a LongWritable key while HCat's FileRecordWriterContainer forces a NullWritable. I'm not sure of a general fix, other than redefining HiveOutputFormat to mandate a WritableComparable.
> It looks like accepting a WritableComparable is what the other Hive OutputFormats do, and there's no reason AvroContainerOutputFormat couldn't also be changed, since it ignores the key. That way, fixing things so that FileRecordWriterContainer can always use NullWritable could be spun off into a separate issue?
> The underlying cause of the failure to write to AvroSerde tables is that AvroContainerOutputFormat doesn't meaningfully implement getRecordWriter, so fixing the above will just push the failure into the placeholder RecordWriter.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
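To make the key-type mismatch described above concrete: the other Hive OutputFormats side-step it by accepting any WritableComparable key and simply ignoring it. The sketch below only illustrates that direction; it is not the actual Hive or HCatalog code, and the wrapper class name is made up.

{code}
// Illustrative only: a key-ignoring wrapper in the spirit of what the other
// Hive OutputFormats do. Class name is hypothetical, not actual Hive code.
import java.io.IOException;

import org.apache.hadoop.hive.ql.exec.FileSinkOperator;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.RecordWriter;
import org.apache.hadoop.mapred.Reporter;

class KeyIgnoringRecordWriter<V extends Writable>
    implements RecordWriter<WritableComparable<?>, V> {

  private final FileSinkOperator.RecordWriter hiveWriter;

  KeyIgnoringRecordWriter(FileSinkOperator.RecordWriter hiveWriter) {
    this.hiveWriter = hiveWriter;
  }

  @Override
  public void write(WritableComparable<?> key, V value) throws IOException {
    // The key is dropped entirely, so it makes no difference whether the
    // caller passes a NullWritable or a LongWritable.
    hiveWriter.write(value);
  }

  @Override
  public void close(Reporter reporter) throws IOException {
    hiveWriter.close(false);
  }
}
{code}

With a writer along these lines, the NullWritable key passed by FileRecordWriterContainer is irrelevant, which is the behavior the description suggests AvroContainerOutputFormat could adopt.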