[ 
https://issues.apache.org/jira/browse/HIVE-21164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17071908#comment-17071908
 ] 

Marta Kuczora commented on HIVE-21164:
--------------------------------------

Hi [~glapark],

sorry for the late answer, I've spent quite a lot of time analyzing this 
issue.
The implementation in this patch is built upon the implementation of 
insert-only (also called MM) tables. It seems that the original 
implementation doesn't handle well the use cases where multiple 
FileSinkOperators are present in one task and all of them write to the 
same table. The query you reproduced the issue with is exactly that type 
of query.
This issue happens with multi-insert queries like the one you posted, and 
only if dynamic partitioning is involved. In that case we end up with two 
FileSinkOperators within one task, each of them writing to the same table.
The basic steps a FileSinkOperator performs are the following:
- It writes the data.
- When it is finished, in closeOp it creates a manifest file which lists 
the successfully written data files.
- Then, at the end, in jobCloseOp it reads the manifest file and cleans up 
all files which were written to the table but are not in the manifest.
There are multiple places where a problem can occur; it depends on the 
order in which the closeOp and jobCloseOp methods of the FileSinkOperators 
are executed.
It can cause a collision in the manifest file creation, as both 
FileSinkOperators will try to create it with the same path. It can also 
happen that one FileSinkOperator deletes the data written by the other 
FileSinkOperator (most likely this is what happens in your setup). It 
really depends on the order of execution of the FileSinkOperators' methods.
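The failure mode can be illustrated with a small simulation (Python for brevity; this is a hypothetical simplification, not Hive's actual code, and the file names and `_manifest` path are invented). Two writers share one manifest path: the second closeOp silently overwrites the first one's manifest, so the cleanup pass deletes the first writer's perfectly valid data.

```python
import os
import tempfile

def close_op(table_dir, written_files):
    # closeOp: record this operator's successfully written files in a
    # manifest. Both operators use the SAME manifest path, so the second
    # call overwrites the first operator's manifest.
    with open(os.path.join(table_dir, "_manifest"), "w") as f:
        f.write("\n".join(written_files))

def job_close_op(table_dir):
    # jobCloseOp: delete every data file not listed in the manifest.
    with open(os.path.join(table_dir, "_manifest")) as f:
        keep = set(f.read().split())
    for name in os.listdir(table_dir):
        if name != "_manifest" and name not in keep:
            os.remove(os.path.join(table_dir, name))

table = tempfile.mkdtemp()
for name in ("fso1_000", "fso2_000"):          # data from two operators
    open(os.path.join(table, name), "w").close()

close_op(table, ["fso1_000"])   # first FileSinkOperator finishes
close_op(table, ["fso2_000"])   # second one overwrites the manifest
job_close_op(table)             # cleanup now deletes fso1's valid data

print(sorted(os.listdir(table)))  # -> ['_manifest', 'fso2_000']
```

Depending on which operator's jobCloseOp runs, either operator's output can be lost, which matches the nondeterministic data loss described above.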

So to summarize, this is really a design problem in the original 
implementation. It was a great catch, thank you again for it.
I created HIVE-23114 for the fix and also uploaded the first version of a 
patch. If you have some time, would you mind running your tests with that 
patch? I would appreciate it, and I am really interested in the test 
results.


> ACID: explore how we can avoid a move step during inserts/compaction
> --------------------------------------------------------------------
>
>                 Key: HIVE-21164
>                 URL: https://issues.apache.org/jira/browse/HIVE-21164
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 3.1.1
>            Reporter: Vaibhav Gumashta
>            Assignee: Marta Kuczora
>            Priority: Major
>             Fix For: 4.0.0
>
>         Attachments: HIVE-21164.1.patch, HIVE-21164.10.patch, 
> HIVE-21164.11.patch, HIVE-21164.11.patch, HIVE-21164.12.patch, 
> HIVE-21164.13.patch, HIVE-21164.14.patch, HIVE-21164.14.patch, 
> HIVE-21164.15.patch, HIVE-21164.16.patch, HIVE-21164.17.patch, 
> HIVE-21164.18.patch, HIVE-21164.19.patch, HIVE-21164.2.patch, 
> HIVE-21164.20.patch, HIVE-21164.21.patch, HIVE-21164.22.patch, 
> HIVE-21164.3.patch, HIVE-21164.4.patch, HIVE-21164.5.patch, 
> HIVE-21164.6.patch, HIVE-21164.7.patch, HIVE-21164.8.patch, HIVE-21164.9.patch
>
>
> Currently, we write compacted data to a temporary location and then move the 
> files to a final location, which is an expensive operation on some cloud file 
> systems. Since HIVE-20823 is already in, it can control the visibility of 
> compacted data for the readers. Therefore, we can perhaps avoid writing data 
> to a temporary location and directly write compacted data to the intended 
> final path.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
