[ https://issues.apache.org/jira/browse/HIVE-8687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192621#comment-14192621 ]

Sushanth Sowmyan commented on HIVE-8687:
----------------------------------------

Wow, thanks for the quick turnaround on the reviews, guys!

@Brock: yeah, that part was something I had to look at for a while to see if I 
could do a better job of it. When invoked from within 
DefaultFileRecordWriterContainer, it will be a part-0000 file, and at that 
point the isDirectory() call will fail, so the path will be treated as-is.

The problem that check is trying to solve is that Pig winds up doing some 
dummy initialization calls at the beginning of the job, and at that time, 
mapred.output.dir will point to a scratch dir, not the eventual file being 
written to. Any file written there should be temporary only, and gets removed 
on cleanup.
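
To make that concrete, here is a minimal sketch of the kind of check being 
described. This is illustrative only, not the HIVE-8687 patch; the class and 
method names are made up for the example.

{code}
// Illustrative sketch only (not the actual patch); names are made up.
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class OutputDirCheck {
  /**
   * Returns true when mapred.output.dir points at a scratch directory (the
   * Pig dummy-initialization case, where anything written is temporary and
   * removed on cleanup), and false when it already names the part-0000 file
   * itself (the DefaultFileRecordWriterContainer case, where the
   * isDirectory() check fails and the path is used as-is).
   */
  public static boolean isScratchDir(JobConf conf) throws IOException {
    Path out = new Path(conf.get("mapred.output.dir"));
    FileSystem fs = out.getFileSystem(conf);
    return fs.exists(out) && fs.isDirectory(out);
  }
}
{code}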

> Support Avro through HCatalog
> -----------------------------
>
>                 Key: HIVE-8687
>                 URL: https://issues.apache.org/jira/browse/HIVE-8687
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog, Serializers/Deserializers
>    Affects Versions: 0.14.0
>         Environment: discovered in Pig, but it looks like the root cause 
> impacts all non-Hive users
>            Reporter: Sushanth Sowmyan
>            Assignee: Sushanth Sowmyan
>            Priority: Critical
>             Fix For: 0.14.0
>
>         Attachments: HIVE-8687.branch-0.14.patch, HIVE-8687.patch
>
>
> Attempting to write to a HCatalog defined table backed by the AvroSerde fails 
> with the following stacktrace:
> {code}
> java.lang.ClassCastException: org.apache.hadoop.io.NullWritable cannot be cast to org.apache.hadoop.io.LongWritable
>       at org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat$1.write(AvroContainerOutputFormat.java:84)
>       at org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:253)
>       at org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:53)
>       at org.apache.hcatalog.pig.HCatBaseStorer.putNext(HCatBaseStorer.java:242)
>       at org.apache.hcatalog.pig.HCatStorer.putNext(HCatStorer.java:52)
>       at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>       at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>       at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:559)
>       at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
> {code}
> The proximal cause of this failure is that the AvroContainerOutputFormat's 
> signature mandates a LongWritable key and HCat's FileRecordWriterContainer 
> forces a NullWritable. I'm not sure of a general fix, other than redefining 
> HiveOutputFormat to mandate a WritableComparable.
> It looks like accepting WritableComparable is what's done in the other Hive 
> OutputFormats, and there's no reason AvroContainerOutputFormat couldn't also 
> be changed, since it ignores the key. That way, fixing things so that 
> FileRecordWriterContainer can always use NullWritable could be spun off into 
> a separate issue?
> The underlying cause for failure to write to AvroSerde tables is that 
> AvroContainerOutputFormat doesn't meaningfully implement getRecordWriter, so 
> fixing the above will just push the failure into the placeholder RecordWriter.
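
To illustrate the mismatch described above, here is a minimal standalone 
sketch (not actual Hive or HCatalog code): a writer whose signature mandates 
a LongWritable key receives the NullWritable that the HCat container forces, 
and the cast fails with the same ClassCastException seen in the trace.

{code}
// Minimal standalone illustration of the key-type mismatch; not Hive code.
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Writable;

public class KeyCastDemo {
  // Stand-in for a writer whose signature mandates LongWritable keys,
  // like AvroContainerOutputFormat's inner RecordWriter.
  static void write(Writable key) {
    LongWritable longKey = (LongWritable) key; // ClassCastException for NullWritable
    System.out.println("wrote key " + longKey);
  }

  public static void main(String[] args) {
    // The HCat container always passes NullWritable, so this fails.
    write(NullWritable.get());
  }
}
{code}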



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
