[ https://issues.apache.org/jira/browse/HIVE-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14658981#comment-14658981 ]
Mithun Radhakrishnan commented on HIVE-11456:
---------------------------------------------

Thanks for the heads-up, Sush.

On the face of it,
# In the first place, I'm not sure this would be a problem, since Pig assumes "insert-overwrite" semantics, as opposed to "insert into". But then again, this is dynamic partitioning, so it's not as if Pig could check the output directory _a priori_.
# Pig 0.14 uses prefixes (vertex_id + edge_id) to make sure the file is unique. I don't foresee the suffix interfering with that.

Permit me to ruminate on this.

> HCatStorer should honor mapreduce.output.basename
> -------------------------------------------------
>
>                 Key: HIVE-11456
>                 URL: https://issues.apache.org/jira/browse/HIVE-11456
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Rohini Palaniswamy
>            Assignee: Mithun Radhakrishnan
>            Priority: Critical
>             Fix For: 1.3.0, 1.2.1, 2.0.0
>
>         Attachments: HIVE-11456.1.patch
>
>
> Pig on Tez scripts with union directly followed by HCatStorer have a problem
> due to HCatStorer not honoring mapreduce.output.basename and always using
> part. Tez sets mapreduce.output.basename to part-v000-o000 (vertex id
> followed by output id). With union optimizer, Pig uses vertex groups to write
> directly from both the vertices to the final output directory. Since hcat
> ignores the mapreduce.output.basename, both the vertices produce
> part-r-0000<n> and when they are moved from the temp location to the final
> directory, they just overwrite each other. There is no failure and only one
> of the files with that name makes it into the final directory.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
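The silent overwrite described in the quoted report can be illustrated with a small, self-contained simulation. This is a hypothetical sketch, not HCatalog, Pig, or Hadoop code: the `uniqueFile` helper only mimics the `<name>-r-<5-digit partition>` naming pattern of reduce-side output files, a `Map` stands in for the final output directory, and the two basenames are illustrative values in the `part-v000-o000` style that Tez sets.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical simulation of the HIVE-11456 collision (not real HCatalog code). */
public class BasenameCollisionDemo {

    // Assumed naming pattern: "<baseName>-r-<partition as 5 digits>",
    // e.g. uniqueFile("part", 0) -> "part-r-00000".
    static String uniqueFile(String baseName, int partition) {
        return String.format("%s-r-%05d", baseName, partition);
    }

    // Moves task 0's output from each vertex into a shared "final directory".
    // The Map stands in for that directory: put() with an existing key silently
    // replaces the earlier entry, like the move that overwrites files in the bug.
    static Map<String, String> commit(List<String> vertexBaseNames, boolean honorBasename) {
        Map<String, String> finalDir = new LinkedHashMap<>();
        for (String vertexBase : vertexBaseNames) {
            // Ignoring the configured basename and hard-coding "part" models the bug.
            String name = honorBasename ? vertexBase : "part";
            finalDir.put(uniqueFile(name, 0), "data from " + vertexBase);
        }
        return finalDir;
    }

    public static void main(String[] args) {
        // Illustrative per-vertex basenames (vertex id followed by output id).
        List<String> vertices = List.of("part-v000-o000", "part-v001-o000");

        System.out.println("ignoring basename: " + commit(vertices, false));
        System.out.println("honoring basename: " + commit(vertices, true));
    }
}
```

With the hard-coded `part` name, both vertices map to `part-r-00000`, so the second write replaces the first and only one file survives, with no error raised. Honoring the per-vertex basename yields two distinct file names, so both outputs reach the final directory.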