[ https://issues.apache.org/jira/browse/HIVE-14535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15593084#comment-15593084 ]
Gopal V commented on HIVE-14535:
--------------------------------

> Do you think it would be reasonable to commit the changes to the
> FileSinkOperator without the rest of the MM tables support?

No. A direct output committer approach without query isolation has lost data for production customers before, by accidentally forcing multiple tasks to write to the same file name. Because of the way checksum safety works, the first writer is not the winner in failure-tolerance scenarios.

We want to prevent users from making such expensive mistakes again, so this patch isolates different queries from each other. Without that isolation, queries will stomp over each other's files.

> I know there are some concerns that this "direct output committer" approach
> could cause data corruption issues, is this something was considered
> explicitly in the design? If so, could you expand on why those data
> corruption issues would occur?

Without the isolation fix, the other parts are dangerous to use. With the isolation in place, the system moves away from the move model to a cleanup model (the cleanup code already exists; today it is only applied to the scratch dir).

> add micromanaged tables to Hive (metastore keeps track of the files)
> --------------------------------------------------------------------
>
>                 Key: HIVE-14535
>                 URL: https://issues.apache.org/jira/browse/HIVE-14535
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>
> Design doc:
> https://docs.google.com/document/d/1b3t1RywfyRb73-cdvkEzJUyOiekWwkMHdiQ-42zCllY
> Feel free to comment.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
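The isolation argument in Gopal's comment can be illustrated with a simplified, in-memory model. This is a hypothetical sketch, not Hive's FileSinkOperator or the MM tables implementation: `FakeFS`, `task_file_name`, the `q_<query_id>` directory layout, and the `committed` set standing in for the metastore's file tracking are all assumptions made for illustration. Plain overwrite semantics stand in for the real HDFS checksum and failure-tolerance behaviour, which this model does not attempt to reproduce.

```python
from dataclasses import dataclass, field


@dataclass
class FakeFS:
    """Hypothetical in-memory stand-in for a distributed filesystem: path -> rows."""
    files: dict = field(default_factory=dict)

    def write(self, path, rows):
        # Plain create-or-overwrite: whoever writes a path last wins,
        # so an earlier writer's data is silently lost.
        self.files[path] = rows

    def delete_prefix(self, prefix):
        # Remove every file under a directory prefix.
        for path in [p for p in self.files if p.startswith(prefix)]:
            del self.files[path]


def task_file_name(task_id):
    # Hypothetical naming scheme: the name depends only on the task number,
    # so two queries (or two task attempts) can produce the same file name.
    return f"{task_id:06d}_0"


def direct_write(fs, table_dir, task_id, rows):
    # Direct output committer WITHOUT isolation: every query writes into
    # the shared table directory.
    fs.write(f"{table_dir}/{task_file_name(task_id)}", rows)


def isolated_write(fs, table_dir, query_id, task_id, rows):
    # WITH isolation: each query writes only under its own subdirectory,
    # so equal task file names from different queries never collide.
    fs.write(f"{table_dir}/q_{query_id}/{task_file_name(task_id)}", rows)


def visible_rows(fs, table_dir, committed):
    # Readers only see directories of queries recorded as committed
    # (the `committed` set is a stand-in for metastore bookkeeping).
    out = []
    for path in sorted(fs.files):
        query_dir = path[len(table_dir) + 1:].split("/", 1)[0]
        if query_dir.startswith("q_") and query_dir[2:] in committed:
            out.extend(fs.files[path])
    return out


def cleanup(fs, table_dir, query_id):
    # Cleanup model: a failed or aborted query's directory is deleted in
    # place, instead of moving successful output out of a staging area.
    fs.delete_prefix(f"{table_dir}/q_{query_id}/")


if __name__ == "__main__":
    shared = FakeFS()
    direct_write(shared, "/warehouse/t", 0, ["a"])  # query 1
    direct_write(shared, "/warehouse/t", 0, ["b"])  # query 2 reuses the name
    print(shared.files)  # query 1's rows are gone: {'/warehouse/t/000000_0': ['b']}

    isolated = FakeFS()
    isolated_write(isolated, "/warehouse/t", "1", 0, ["a"])
    isolated_write(isolated, "/warehouse/t", "2", 0, ["b"])
    isolated_write(isolated, "/warehouse/t", "3", 0, ["c"])  # query 3 fails
    cleanup(isolated, "/warehouse/t", "3")
    print(visible_rows(isolated, "/warehouse/t", {"1", "2"}))  # ['a', 'b']
```

The design point the comment argues for shows up in `cleanup`: once each query writes only under its own directory, discarding a failed query's output is a delete of that directory rather than a move of successful output into the final location, and two queries can no longer overwrite each other's files.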