[ https://issues.apache.org/jira/browse/HIVE-13756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Drome updated HIVE-13756:
-------------------------------
    Attachment: HIVE-13756-branch-1.patch
                HIVE-13756.patch

Attached patches for branch-1 and master.

> Map failure attempts to delete reducer _temporary directory on multi-query pig query
> ------------------------------------------------------------------------------------
>
>                 Key: HIVE-13756
>                 URL: https://issues.apache.org/jira/browse/HIVE-13756
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>    Affects Versions: 1.2.1, 2.0.0
>            Reporter: Chris Drome
>            Assignee: Chris Drome
>         Attachments: HIVE-13756-branch-1.patch, HIVE-13756.patch
>
>
> A Pig script, executed with multi-query enabled, that reads the source data
> and writes it as-is into TABLE_A, and that also performs a group-by on the
> data and writes the result into TABLE_B, can produce erroneous results if
> any map fails. Such a script runs as a single MR job that writes the map
> output to a scratch directory relative to TABLE_A and the reducer output to
> a scratch directory relative to TABLE_B.
> If one or more maps fail, the attempt data relative to TABLE_A is deleted,
> but so is the _temporary directory relative to TABLE_B. This has the
> unintended side effect of preventing subsequent maps from committing their
> data. As a result, maps that completed successfully before the first map
> failure have their data committed as expected, while the other maps do not,
> leaving an incomplete result set.
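
The failure sequence described in the report can be illustrated with a toy model on a local filesystem. This is only a sketch, not Hive or HCatalog code: the directory layout, the attempt names (m_0, m_1, m_2) and the functions make_layout, abort_map, commit_map and simulate are all hypothetical. The report does not say why the missing _temporary directory blocks later commits, so the sketch simply assumes that a commit fails once that directory is gone.

```python
import shutil
import tempfile
from pathlib import Path

TEMP = "_temporary"  # directory name taken from the report


def make_layout(root, attempts):
    """Create one output dir per map attempt under TABLE_A, plus TABLE_B's _temporary."""
    for attempt in attempts:
        d = root / "TABLE_A" / "scratch" / TEMP / attempt
        d.mkdir(parents=True)
        (d / "part").write_text("rows")
    (root / "TABLE_B" / "scratch" / TEMP).mkdir(parents=True)


def abort_map(root, attempt):
    """Cleanup after a failed map, including the side effect described in the report."""
    # Expected: drop only the failed attempt's data relative to TABLE_A.
    shutil.rmtree(root / "TABLE_A" / "scratch" / TEMP / attempt)
    # Reported side effect: TABLE_B's whole _temporary directory goes too.
    shutil.rmtree(root / "TABLE_B" / "scratch" / TEMP)


def commit_map(root, attempt):
    """Assumption: a commit only succeeds while TABLE_B's _temporary exists."""
    if not (root / "TABLE_B" / "scratch" / TEMP).is_dir():
        return False
    src = root / "TABLE_A" / "scratch" / TEMP / attempt
    dest = root / "TABLE_A" / "committed" / attempt
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dest))
    return True


def simulate():
    """Run three hypothetical map attempts; the middle one fails."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        make_layout(root, ["m_0", "m_1", "m_2"])
        first = commit_map(root, "m_0")   # completes before the failure
        abort_map(root, "m_1")            # the failing map
        later = commit_map(root, "m_2")   # completes after the failure
        return {"m_0": first, "m_2": later}


if __name__ == "__main__":
    print(simulate())
```

In this model the attempt that finishes before the failure commits, while the one that finishes afterwards does not, which matches the incomplete result set described in the report.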