Namit Jain created HIVE-3477:
--------------------------------

             Summary: Duplicate data possible with speculative execution for dynamic partitions
                 Key: HIVE-3477
                 URL: https://issues.apache.org/jira/browse/HIVE-3477
             Project: Hive
          Issue Type: Bug
          Components: Query Processor
            Reporter: Namit Jain
            Assignee: Namit Jain

Consider a query like:

insert overwrite table T partition (ds) select * from (mapreduce-subq1 union all mapreduce-subq2) x;

Once mapreduce-subq1 and mapreduce-subq2 are done, the task for the union is invoked.
At the end of the union task, jobClose is invoked.
Note that there are 2 tablescan operators. The tree is something like:

TABLESCAN1 --
              \
               UNION -- SELECT -- FILESINK
              /
TABLESCAN2 --

In the current setup, jobClose is invoked twice for FileSink, once for each tablescan operator.
With speculative execution, data may still be being written to the tmp dir after jobClose has finished once.

The correct fix would be to make sure that jobClose is only invoked once.
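
The double invocation described in this report can be illustrated with a small, self-contained sketch. It does not use Hive's actual classes: `Op`, `jobCloseUnguarded`, `jobCloseGuarded`, and the `jobClosed` flag are hypothetical names chosen for illustration, and the per-node guard is only one assumed way of making sure jobClose runs once, not necessarily the fix applied in Hive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for an operator tree; these are NOT Hive's classes.
class JobCloseOnceSketch {

    /** Minimal operator node: each node forwards jobClose to its children. */
    static class Op {
        final String name;
        final List<Op> children = new ArrayList<>();
        private boolean jobClosed = false; // guard flag (an assumption of this sketch)
        int closeCount = 0;                // how many times the close work actually ran

        Op(String name) { this.name = name; }

        /** Adds a child and returns it, so calls can be chained down the tree. */
        Op addChild(Op child) { children.add(child); return child; }

        /** Unguarded traversal: a node reachable from two roots is closed twice. */
        void jobCloseUnguarded() {
            closeCount++;
            for (Op c : children) c.jobCloseUnguarded();
        }

        /** Guarded traversal: the close work runs at most once per node. */
        void jobCloseGuarded() {
            if (jobClosed) return;
            jobClosed = true;
            closeCount++;
            for (Op c : children) c.jobCloseGuarded();
        }
    }

    /** Builds TABLESCAN1/TABLESCAN2 -> UNION -> SELECT -> FILESINK; returns {ts1, ts2, fileSink}. */
    static Op[] buildTree() {
        Op ts1 = new Op("TABLESCAN1");
        Op ts2 = new Op("TABLESCAN2");
        Op union = new Op("UNION");
        ts1.addChild(union);
        ts2.addChild(union);
        Op fileSink = union.addChild(new Op("SELECT")).addChild(new Op("FILESINK"));
        return new Op[] { ts1, ts2, fileSink };
    }

    /** Invokes jobClose from both tablescan roots and reports how often FILESINK was closed. */
    static int fileSinkCloses(boolean guarded) {
        Op[] t = buildTree();
        for (int i = 0; i < 2; i++) {
            if (guarded) t[i].jobCloseGuarded(); else t[i].jobCloseUnguarded();
        }
        return t[2].closeCount;
    }

    public static void main(String[] args) {
        System.out.println("unguarded FILESINK closes: " + fileSinkCloses(false));
        System.out.println("guarded FILESINK closes:   " + fileSinkCloses(true));
    }
}
```

In this sketch the unguarded traversal reaches FILESINK once through each tablescan, while the guarded one stops at UNION on the second pass, so the close work for FILESINK runs a single time.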