[jira] [Commented] (HIVE-11456) HCatStorer should honor mapreduce.output.basename

Mithun Radhakrishnan (JIRA) Wed, 05 Aug 2015 14:29:38 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14658981#comment-14658981
 ]


Mithun Radhakrishnan commented on HIVE-11456:
---------------------------------------------

Thanks for the heads-up, Sush. On the face of it:
# I'm not sure this would be a problem in the first place, since Pig assumes 
"insert-overwrite" semantics, as opposed to "insert into". But then again, this 
is dynamic-partitioning, so it's not like Pig could check the output directory 
_a priori_.
# Pig 0.14 uses prefixes (vertex_id + edge_id) to make sure the file name is 
unique. I don't foresee the suffix interfering with that.

Permit me to ruminate on this.




> HCatStorer should honor mapreduce.output.basename
> -------------------------------------------------
>
>                 Key: HIVE-11456
>                 URL: https://issues.apache.org/jira/browse/HIVE-11456
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Rohini Palaniswamy
>            Assignee: Mithun Radhakrishnan
>            Priority: Critical
>             Fix For: 1.3.0, 1.2.1, 2.0.0
>
>         Attachments: HIVE-11456.1.patch
>
>
> Pig on Tez scripts with union directly followed by HCatStorer have a problem 
> due to HCatStorer not honoring mapreduce.output.basename and always using 
> part. Tez sets mapreduce.output.basename to part-v000-o000 (vertex id 
> followed by output id). With union optimizer, Pig uses vertex groups to write 
> directly from both the vertices to the final output directory. Since hcat 
> ignores the mapreduce.output.basename, both the vertices produce 
> part-r-0000<n> and when they are moved from the temp location to the final 
> directory, they just overwrite each other. There is no failure and only one 
> of the files with that name makes it into the final directory.
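
The overwrite described above can be reproduced with a small stand-alone 
simulation. This is only a sketch under stated assumptions, not HCatalog or Tez 
code: the helper names (write_task_output, commit, run) are hypothetical, the 
file name pattern <basename>-r-<nnnnn> is modelled on the part-r-0000<n> names 
mentioned in the description, and the second vertex's basename 
(part-v001-o000) is assumed by analogy with part-v000-o000.

```python
import os
import tempfile


def write_task_output(temp_dir, basename, task_id, payload):
    """Write one task's output file as <basename>-r-<5-digit task id>."""
    name = f"{basename}-r-{task_id:05d}"
    path = os.path.join(temp_dir, name)
    with open(path, "w") as f:
        f.write(payload)
    return path


def commit(temp_dirs, final_dir):
    """Move every temp file into final_dir; a same-named file is silently replaced."""
    for temp_dir in temp_dirs:
        for name in sorted(os.listdir(temp_dir)):
            os.replace(os.path.join(temp_dir, name), os.path.join(final_dir, name))


def run(basenames):
    """Simulate one vertex per basename, each writing task 0, then committing."""
    with tempfile.TemporaryDirectory() as root:
        final_dir = os.path.join(root, "final")
        os.mkdir(final_dir)
        temp_dirs = []
        for vertex, basename in enumerate(basenames):
            temp_dir = os.path.join(root, f"tmp-vertex-{vertex}")
            os.mkdir(temp_dir)
            write_task_output(temp_dir, basename, 0, f"rows from vertex {vertex}")
            temp_dirs.append(temp_dir)
        commit(temp_dirs, final_dir)
        return sorted(os.listdir(final_dir))


# Basename ignored: both vertices fall back to "part" and collide on commit.
ignored = run(["part", "part"])

# Basename honored: each vertex keeps its own (assumed) Tez-style prefix.
honored = run(["part-v000-o000", "part-v001-o000"])

print(ignored)
print(honored)
```

In the first run only one file reaches the final directory and nothing fails, 
which matches the silent data loss reported here. In the second run both files 
survive because the per-vertex basename keeps the names distinct.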



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
