[ https://issues.apache.org/jira/browse/HIVE-22413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962817#comment-16962817 ]
Abhishek Somani commented on HIVE-22413:
----------------------------------------

[~pvary] an issue with HIVE-20823 is that it is in 4.0.0 (master) only. Backporting it to Hive 2/Hive 3 is not feasible, as it is a major design change. I think we need an interim solution for S3 and other blobstores in older Hive versions.

We solved this in a different way ourselves. At the end of compaction, we insert a _compaction_done file in the compacted directory, and the readers have been modified (in getAcidState()) to ignore base/delta directories until this file is visible.

> Avoid dirty read when reading the ACID table while compaction is running
> ------------------------------------------------------------------------
>
>         Key: HIVE-22413
>         URL: https://issues.apache.org/jira/browse/HIVE-22413
>     Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>    Reporter: Hocheol Park
>    Priority: Major
> Attachments: HIVE-22413.1.patch
>
>
> There is a problem where a dirty read occurs when reading an ACID table
> while the compactor is creating base or delta directories. It is especially
> likely on S3 storage, because a "move" on S3 is implemented as "copy and
> delete", and the copy takes a long time when the files are large or the
> bucket count is high.
> Here is the logic to avoid this problem. When listing the child directories
> of a partition while reading the ACID table, if any "_tmp"-prefixed
> directories exist, compare the names of the directories inside them with
> the partition's child directories and skip any child directory whose name
> matches. The reader then reads the files from before the merge, so the
> results are the same.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
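The "_tmp" skip rule from the issue description can be sketched as a pure function over a directory listing. This is a minimal illustration, not Hive's getAcidState() or the attached patch: the partition is modelled as a hypothetical in-memory map (each child directory mapped to the names of its own children) instead of a Hadoop FileSystem, and it assumes any child directory whose name starts with "_tmp" holds in-flight compactor output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Hypothetical sketch of the "_tmp" skip rule from HIVE-22413.
 * The partition listing is a plain map, not a Hadoop FileSystem.
 */
final class AcidDirFilter {
    // Assumption: in-flight compactor output sits under a child dir starting with "_tmp".
    static final String TMP_PREFIX = "_tmp";

    /**
     * @param partition each immediate child directory of the partition, mapped to
     *                  the names of its own children
     * @return the child directories a reader may use, sorted by name
     */
    static List<String> visibleDirs(Map<String, Set<String>> partition) {
        // 1. Collect every directory name still being written under a "_tmp" dir.
        Set<String> inFlight = new HashSet<>();
        for (Map.Entry<String, Set<String>> e : partition.entrySet()) {
            if (e.getKey().startsWith(TMP_PREFIX)) {
                inFlight.addAll(e.getValue());
            }
        }
        // 2. Keep everything else, skipping the "_tmp" dirs themselves and any
        //    base/delta dir whose name matches one that is still in flight.
        List<String> visible = new ArrayList<>();
        for (String dir : partition.keySet()) {
            if (dir.startsWith(TMP_PREFIX) || inFlight.contains(dir)) {
                continue;
            }
            visible.add(dir);
        }
        Collections.sort(visible);
        return visible;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> partition = new TreeMap<>();
        partition.put("base_0000005", new HashSet<>(Arrays.asList("bucket_00000")));
        partition.put("delta_0000006_0000010", new HashSet<>(Arrays.asList("bucket_00000")));
        // The compactor is still copying base_0000010 into place ("copy and delete").
        partition.put("base_0000010", new HashSet<>(Arrays.asList("bucket_00000")));
        partition.put("_tmp_0", new HashSet<>(Arrays.asList("base_0000010")));
        // The reader falls back to the pre-compaction base and delta.
        System.out.println(visibleDirs(partition)); // prints [base_0000005, delta_0000006_0000010]
    }
}
```

The rule only gives the same results because the pre-compaction base and deltas are still on disk while the new base is being copied. The _compaction_done approach from the comment trades this inference for an explicit commit signal: readers wait for a marker the compactor writes last, rather than guessing from the presence of a "_tmp" directory.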