[ https://issues.apache.org/jira/browse/HIVE-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13458844#comment-13458844 ]
Namit Jain commented on HIVE-3477:
----------------------------------

All the tests passed.

[~ashutoshc], we can definitely discuss it in a new jira.
We don't use OutputCommitter at all right now, since we write into temporary directories and then move those directories into the final directory.

> Duplicate data possible with speculative execution for dynamic partitions
> -------------------------------------------------------------------------
>
>                 Key: HIVE-3477
>                 URL: https://issues.apache.org/jira/browse/HIVE-3477
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>         Attachments: hive.3477.1.patch
>
>
> Consider a query like:
> insert overwrite T partition (ds)
> select * from
> (mapreduce-subq1
>  union all
>  mapreduce-subq2)x;
> Once mapreduce-subq1 and mapreduce-subq2 are done, the task for the union
> is invoked. At the end of the union task, jobClose is invoked.
> Note that there are 2 tablescan operators. The tree is something like:
> TABLESCAN1 --
>               \
>                 UNION -- SELECT -- FILESINK
>               /
> TABLESCAN2 --
> In the current setup, jobClose will be invoked twice for FileSink.
> In case of speculative execution, it is possible that data is still
> being written to the tmp dir after jobClose has finished once.
> The correct fix would be to make sure that jobClose is only invoked once.
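
To make the quoted description concrete, here is a minimal sketch of one way to ensure jobClose runs only once for an operator that is reachable from two table scans. It is an illustration only and not necessarily what hive.3477.1.patch does: SketchOperator, JobCloseSketch, jobCloseDone and closeCount are hypothetical names, not Hive's actual classes or fields. Each operator remembers whether its close work has already run, so the second call arriving through TABLESCAN2 stops at UNION and never reaches FILESINK again.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified operator; this is NOT Hive's real Operator class.
class SketchOperator {
    private final String name;
    private final List<SketchOperator> children = new ArrayList<>();
    private boolean jobCloseDone = false; // guard flag (assumed for this sketch)
    int closeCount = 0;                   // how many times the close work really ran

    SketchOperator(String name) {
        this.name = name;
    }

    // Adds a child and returns it, so a chain of operators can be built fluently.
    SketchOperator addChild(SketchOperator child) {
        children.add(child);
        return child;
    }

    // Propagates jobClose down the tree, running at most once per operator.
    void jobClose() {
        if (jobCloseDone) {
            return; // already closed through another parent
        }
        jobCloseDone = true;
        closeCount++;
        for (SketchOperator child : children) {
            child.jobClose();
        }
    }

    @Override
    public String toString() {
        return name + "(closed " + closeCount + "x)";
    }
}

class JobCloseSketch {
    // Builds the tree from the description and calls jobClose from both table scans.
    // Returns the number of times the FILESINK close work ran.
    static int closeFromBothRoots() {
        SketchOperator ts1 = new SketchOperator("TABLESCAN1");
        SketchOperator ts2 = new SketchOperator("TABLESCAN2");
        SketchOperator union = new SketchOperator("UNION");
        SketchOperator select = new SketchOperator("SELECT");
        SketchOperator fileSink = new SketchOperator("FILESINK");

        ts1.addChild(union);
        ts2.addChild(union);
        union.addChild(select).addChild(fileSink);

        ts1.jobClose(); // closes TABLESCAN1, UNION, SELECT, FILESINK
        ts2.jobClose(); // closes TABLESCAN2, then stops at the already-closed UNION
        return fileSink.closeCount;
    }
}
```

Without the jobCloseDone guard, the same two calls would run the FILESINK close work twice, which is the situation the description warns about. The flag here is per operator instance and not thread-safe; a real fix would have to decide where that state lives.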