[ https://issues.apache.org/jira/browse/HIVE-21164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17071908#comment-17071908 ]
Marta Kuczora commented on HIVE-21164:
--------------------------------------

Hi [~glapark], sorry for the late answer, I've spent quite a lot of time analyzing this issue.

The implementation in this patch is built upon the implementation of insert-only (also called MM) tables. It seems that the original implementation doesn't handle well the use cases where multiple FileSinkOperators are present in one task and these FileSinkOperators write to the same table. The query you reproduced the issue with is exactly that type of query. This issue happens with multi-insert queries like the one you posted, and only if dynamic partitioning is involved. In that case we end up with two FileSinkOperators within one task, each writing to the same table.

The basic steps a FileSinkOperator performs are the following:
- It writes the data.
- When it's finished, in closeOp it creates a manifest file listing the successfully written data files.
- Finally, in jobCloseOp it reads the manifest file and cleans up all files which were written to the table but are not listed in the manifest.

There are multiple places where a problem can occur, depending on the order in which the closeOp and jobCloseOp methods of the FileSinkOperators are executed. It can cause a collision in the manifest file creation, since both FileSinkOperators try to create it with the same path. It can also happen that one FileSinkOperator deletes the data written by the other FileSinkOperator (most likely this is what happens in your setup). It really depends on the order of execution of the FileSinkOperators' methods.

To summarize, this is really a design problem with the original implementation. It was a great catch, thank you again for it. I created HIVE-23114 for the fix and I also uploaded the first version of a patch. If you have some time, would you mind running your tests with that patch?
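To illustrate the failure mode, here is a simplified sketch of the manifest-based commit described above. This is NOT Hive's actual code (the class, method names, and file layout are invented for illustration); it only shows how two writers that derive the same manifest path can delete each other's output:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;

// Hypothetical sketch of the closeOp/jobCloseOp flow for insert-only (MM)
// tables: each writer records its files in a manifest, then cleanup deletes
// anything in the table directory that the manifest does not list.
public class ManifestCollisionSketch {

    // closeOp(): record the files this writer produced. Both writers derive
    // the SAME manifest path, so the second write overwrites the first.
    static void writeManifest(Path manifest, List<String> dataFiles) throws IOException {
        Files.write(manifest, dataFiles);   // last writer wins
    }

    // jobCloseOp(): delete every data file in the table directory that is
    // not listed in the manifest.
    static void cleanup(Path tableDir, Path manifest) throws IOException {
        Set<String> keep = new HashSet<>(Files.readAllLines(manifest));
        try (DirectoryStream<Path> files = Files.newDirectoryStream(tableDir, "*.data")) {
            for (Path f : files) {
                if (!keep.contains(f.getFileName().toString())) {
                    Files.delete(f);        // removes the other writer's output
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path tableDir = Files.createTempDirectory("mm_table");
        Path manifest = tableDir.resolve("manifest");       // shared path: the bug

        // Two FileSinkOperators in one task write to the same table.
        Files.write(tableDir.resolve("fso1.data"), List.of("rows from insert 1"));
        Files.write(tableDir.resolve("fso2.data"), List.of("rows from insert 2"));

        writeManifest(manifest, List.of("fso1.data"));      // FSO1 closeOp
        writeManifest(manifest, List.of("fso2.data"));      // FSO2 closeOp overwrites it

        cleanup(tableDir, manifest);                        // jobCloseOp drops fso1.data

        System.out.println("fso1.data exists: " + Files.exists(tableDir.resolve("fso1.data")));
        System.out.println("fso2.data exists: " + Files.exists(tableDir.resolve("fso2.data")));
    }
}
```

Running this prints that fso1.data is gone while fso2.data survives: the second manifest silently replaced the first, so cleanup treated the first writer's data as orphaned.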
I would appreciate it and I am really interested in the test results.

> ACID: explore how we can avoid a move step during inserts/compaction
> --------------------------------------------------------------------
>
>                 Key: HIVE-21164
>                 URL: https://issues.apache.org/jira/browse/HIVE-21164
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 3.1.1
>            Reporter: Vaibhav Gumashta
>            Assignee: Marta Kuczora
>            Priority: Major
>             Fix For: 4.0.0
>
>         Attachments: HIVE-21164.1.patch, HIVE-21164.10.patch,
> HIVE-21164.11.patch, HIVE-21164.11.patch, HIVE-21164.12.patch,
> HIVE-21164.13.patch, HIVE-21164.14.patch, HIVE-21164.14.patch,
> HIVE-21164.15.patch, HIVE-21164.16.patch, HIVE-21164.17.patch,
> HIVE-21164.18.patch, HIVE-21164.19.patch, HIVE-21164.2.patch,
> HIVE-21164.20.patch, HIVE-21164.21.patch, HIVE-21164.22.patch,
> HIVE-21164.3.patch, HIVE-21164.4.patch, HIVE-21164.5.patch,
> HIVE-21164.6.patch, HIVE-21164.7.patch, HIVE-21164.8.patch, HIVE-21164.9.patch
>
> Currently, we write compacted data to a temporary location and then move the
> files to a final location, which is an expensive operation on some cloud file
> systems. Since HIVE-20823 is already in, it can control the visibility of
> compacted data for the readers. Therefore, we can perhaps avoid writing data
> to a temporary location and directly write compacted data to the intended
> final path.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)