[ 
https://issues.apache.org/jira/browse/HIVE-22413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-22413:
------------------------------
    Target Version/s:   (was: 2.3.7)

> Avoid dirty read when reading the ACID table while compaction is running
> ------------------------------------------------------------------------
>
>                 Key: HIVE-22413
>                 URL: https://issues.apache.org/jira/browse/HIVE-22413
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>            Reporter: Hocheol Park
>            Priority: Major
>         Attachments: HIVE-22413.1.patch
>
>
> A dirty read can occur when an ACID table is read while the compactor is
> creating base or delta directories. This is especially likely on S3
> storage, because a “move” on S3 is implemented as “copy and delete”, and
> the copy takes a long time when the files are large or the bucket count
> is high.
> Here is the logic to avoid the problem. While listing the child
> directories of a partition directory to read the ACID table, check whether
> any “_tmp”-prefixed directories exist. If so, compare each child directory
> name with the names of the directories inside the “_tmp” directory and
> skip any that match. The reader then reads the files as they were before
> the merge, so the query results are unchanged.
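
The skip logic described in the issue can be sketched roughly as follows. This is only an illustration, not the code in HIVE-22413.1.patch: it uses java.nio.file on a local directory instead of the Hadoop FileSystem API, and the class name TmpAwareLister, the method visibleChildren and the example directory names are hypothetical. It also assumes that the compactor stages its output inside a "_tmp"-prefixed directory under the partition directory, as the description suggests.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Hive.
final class TmpAwareLister {
    // Assumed prefix of the compactor's staging directories.
    static final String TMP_PREFIX = "_tmp";

    // Returns the names of the child directories a reader should use,
    // skipping any directory whose name also appears inside a "_tmp" directory.
    static List<String> visibleChildren(Path partitionDir) throws IOException {
        List<String> children = new ArrayList<>();
        Set<String> inFlight = new HashSet<>();

        try (DirectoryStream<Path> stream = Files.newDirectoryStream(partitionDir)) {
            for (Path child : stream) {
                if (!Files.isDirectory(child)) {
                    continue; // only directories matter for this sketch
                }
                String name = child.getFileName().toString();
                if (name.startsWith(TMP_PREFIX)) {
                    // Collect the names still staged inside the "_tmp" directory.
                    try (DirectoryStream<Path> staged = Files.newDirectoryStream(child)) {
                        for (Path pending : staged) {
                            inFlight.add(pending.getFileName().toString());
                        }
                    }
                } else {
                    children.add(name);
                }
            }
        }

        // A sibling with the same name as a staged directory may be only
        // partially copied, so the reader skips it.
        children.removeIf(inFlight::contains);
        Collections.sort(children);
        return children;
    }
}
```

With a partition that contains delta_0000001_0000001, delta_0000002_0000002, base_0000002 and _tmp_abc/base_0000002, the sketch returns only the two delta directories, so a reader would fall back to the pre-compaction files. A real implementation would also have to handle a "_tmp" directory disappearing between the two listings.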



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
