Map join followed by multi-table insert will generate duplicated result
-----------------------------------------------------------------------

Key: HIVE-1997
URL: https://issues.apache.org/jira/browse/HIVE-1997
Project: Hive
Issue Type: Bug
Reporter: Ted Xu
Fix For: 0.7.0

A map join followed by a multi-table insert generates duplicated results if the insert targets include both a direct insert (FileSinkOperator logic) and a group by/distribute by (ReduceSinkOperator logic).

The following query reproduces the case:

{code}
FROM (
  SELECT /*+ MAPJOIN(x) */
         x.key AS key1, x.value AS value1,
         y.key AS key2, y.value AS value2
  FROM src1 x JOIN src y ON (x.key = y.key)
) subq
INSERT OVERWRITE TABLE destpart PARTITION (ds='2010-12-12')
  SELECT key1, value1
INSERT OVERWRITE TABLE destpart PARTITION (ds='2010-12-13')
  SELECT key2, value2 GROUP BY key2, value2;
{code}

In the query above, the records written to destpart (ds='2010-12-12') are duplicated.
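
One possible way to confirm the duplication is to compare the row count of the affected partition with the row count of the join itself. This is only a sketch: it reuses the table names from the query above, and it assumes that a plain join run outside the multi-table insert returns the correct number of rows.

{code}
-- Rows actually written to the direct-insert partition.
SELECT COUNT(1) FROM destpart WHERE ds = '2010-12-12';

-- Rows the join is expected to produce (assumption: a plain join,
-- without the MAPJOIN hint and outside the multi-table insert,
-- is not affected by this bug).
SELECT COUNT(1) FROM src1 x JOIN src y ON (x.key = y.key);
{code}

If the first count is larger than the second, the partition holds duplicated rows.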