Namit Jain created HIVE-3477:
--------------------------------

             Summary: Duplicate data possible with speculative execution for 
dynamic partitions
                 Key: HIVE-3477
                 URL: https://issues.apache.org/jira/browse/HIVE-3477
             Project: Hive
          Issue Type: Bug
          Components: Query Processor
            Reporter: Namit Jain
            Assignee: Namit Jain


Consider a query like:

insert overwrite table T partition (ds)
select * from
(mapreduce-subq1
  union all
mapreduce-subq2)x;

Once mapreduce-subq1 and mapreduce-subq2 are done, the task for the union
is invoked. At the end of the union task, jobClose is invoked.

Note that there are 2 tablescan operators. The tree is something like:


TABLESCAN1  --
              \
               UNION -- SELECT -- FILESINK
              /
TABLESCAN2  --


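In the current setup, jobClose is invoked twice for the FileSink operator,
once through each tablescan. With speculative execution, it is possible that
data is still being written to the tmp directory after the first jobClose has
finished. This is how duplicate data can end up in the dynamic partitions.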

The correct fix would be to make sure that jobClose is only invoked once.
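
As a rough illustration of the idea (not Hive's actual Operator code), the
sketch below models the tree above with a hypothetical Op class. It compares a
naive walk from both tablescans, which reaches FILESINK twice, with a guarded
walk that tracks already-closed operators in a shared set. All class and
method names here are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class JobCloseOnce {
    /** Hypothetical stand-in for a plan operator; not Hive's Operator class. */
    static class Op {
        final String name;
        final List<Op> children = new ArrayList<>();
        int jobCloseCalls = 0; // how many times jobClose reached this operator

        Op(String name) { this.name = name; }

        /** Adds a child edge and returns the child so edges can be chained. */
        Op to(Op child) { children.add(child); return child; }
    }

    /** Naive walk: each root closes its whole subtree, so shared operators are closed once per root. */
    static void jobCloseNaive(Op op) {
        op.jobCloseCalls++;
        for (Op child : op.children) {
            jobCloseNaive(child);
        }
    }

    /** Guarded walk: an operator already in 'closed' is skipped, along with its subtree. */
    static void jobCloseGuarded(Op op, Set<Op> closed) {
        if (!closed.add(op)) {
            return; // already closed via another parent
        }
        op.jobCloseCalls++;
        for (Op child : op.children) {
            jobCloseGuarded(child, closed);
        }
    }

    /** Builds TABLESCAN1/TABLESCAN2 -> UNION -> SELECT -> FILESINK; returns {ts1, ts2, fileSink}. */
    static Op[] buildPlan() {
        Op ts1 = new Op("TABLESCAN1");
        Op ts2 = new Op("TABLESCAN2");
        Op union = new Op("UNION");
        Op fileSink = new Op("FILESINK");
        ts1.to(union);
        ts2.to(union);
        union.to(new Op("SELECT")).to(fileSink);
        return new Op[] { ts1, ts2, fileSink };
    }

    public static void main(String[] args) {
        Op[] naive = buildPlan();
        jobCloseNaive(naive[0]);
        jobCloseNaive(naive[1]);
        System.out.println("naive:   " + naive[2].jobCloseCalls); // prints "naive:   2"

        Op[] guarded = buildPlan();
        Set<Op> closed = Collections.newSetFromMap(new IdentityHashMap<Op, Boolean>());
        jobCloseGuarded(guarded[0], closed);
        jobCloseGuarded(guarded[1], closed);
        System.out.println("guarded: " + guarded[2].jobCloseCalls); // prints "guarded: 1"
    }
}
```

A per-operator "already closed" flag would work equally well; the shared set
is used here only to keep the guard outside the operator class.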

