[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497264&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497264 ]
ASF GitHub Bot logged work on HIVE-21052:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/Oct/20 11:53
            Start Date: 08/Oct/20 11:53
    Worklog Time Spent: 10m
      Work Description:
deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r501646861

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##########
@@ -97,9 +100,9 @@ public void run() {
           long minOpenTxnId = txnHandler.findMinOpenTxnIdForCleaner();
           LOG.info("Cleaning based on min open txn id: " + minOpenTxnId);
           List<CompletableFuture> cleanerList = new ArrayList<>();
-          for(CompactionInfo compactionInfo : txnHandler.findReadyToClean()) {
+          for (CompactionInfo compactionInfo : txnHandler.findReadyToClean()) {
             cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(() ->
-                    clean(compactionInfo, minOpenTxnId)), cleanerExecutor));
+                  clean(compactionInfo, minOpenTxnId)), cleanerExecutor));

Review comment:
1. In the original patch, Map<String, NonReentrantReadWriteLock> tableLock = new ConcurrentHashMap<>() was used to prevent a concurrent p-clean (where the whole table is scanned). I think that is resolved by grouping p-cleans and recording the list of writeIds that need to be removed:
https://github.com/apache/hive/pull/1548/files#diff-9cf3ae764b7a33b568a984d695aff837R328
@vpnvishv is that correct? Also, we do not allow concurrent Cleaners; their execution is mutexed.

2. This was related to the following issue, based on the Map<String, NonReentrantReadWriteLock> tableLock = new ConcurrentHashMap<>() design:
"Suppose you have p-type clean on table T that is running (i.e. has the Write lock) and you have 30 different partition clean requests (in T). The 30 per partition cleans will get blocked but they will tie up every thread in the pool while they are blocked, right? If so, no other clean (on any other table) will actually make progress until the p-type on T is done."
Yes, it is still the case that we have to wait for all tasks to complete, and if there is one long-running task, we won't be able to submit new ones. However, I am not sure it is a critical issue. I think we can address it in a separate Jira.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 497264)
    Time Spent: 7h 40m  (was: 7.5h)

> Make sure transactions get cleaned if they are aborted before addPartitions
> is called
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-21052
>                 URL: https://issues.apache.org/jira/browse/HIVE-21052
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 3.0.0, 3.1.1
>            Reporter: Jaume M
>            Assignee: Jaume M
>            Priority: Critical
>              Labels: pull-request-available
>         Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch,
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch,
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch,
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch,
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>          Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions, and data has
> been written on the table, the transaction manager will think it's an empty
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables.
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and,
> when addPartitions is called, removing this entry from TXN_COMPONENTS and adding
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that
> specifies that a transaction was opened and then aborted, it must generate
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
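
The marker-based proposal in the issue description can be illustrated with a minimal in-memory sketch. This is not Hive's code: the class `TxnComponentsSketch`, the marker value `__OPEN_TXN_MARKER__`, and the method `partitionsToClean` are hypothetical names chosen for illustration, and a plain list stands in for the TXN_COMPONENTS table. It only shows the idea that a marker row written at openTxn keeps an aborted transaction from looking empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory stand-in for TXN_COMPONENTS; not Hive's schema or API.
class TxnComponentsSketch {
    // Assumed marker value; the issue does not say what the real marker looks like.
    static final String OPEN_TXN_MARKER = "__OPEN_TXN_MARKER__";

    // One row: the partition (or the marker) recorded for a transaction.
    static final class Entry {
        final long txnId;
        final String partition;

        Entry(long txnId, String partition) {
            this.txnId = txnId;
            this.partition = partition;
        }
    }

    private final List<Entry> rows = new ArrayList<>();

    // At openTxn, write a marker row so the transaction never looks empty.
    public void openTxn(long txnId) {
        rows.add(new Entry(txnId, OPEN_TXN_MARKER));
    }

    // At addPartitions, replace the marker row with the real partition rows.
    public void addPartitions(long txnId, List<String> partitions) {
        rows.removeIf(e -> e.txnId == txnId && OPEN_TXN_MARKER.equals(e.partition));
        for (String p : partitions) {
            rows.add(new Entry(txnId, p));
        }
    }

    // For an aborted transaction: if the marker is still present, the written
    // partitions are unknown, so every partition must be considered; otherwise
    // only the recorded partitions are returned.
    public List<String> partitionsToClean(long abortedTxnId, List<String> allPartitions) {
        List<String> result = new ArrayList<>();
        for (Entry e : rows) {
            if (e.txnId != abortedTxnId) {
                continue;
            }
            if (OPEN_TXN_MARKER.equals(e.partition)) {
                return new ArrayList<>(allPartitions);
            }
            result.add(e.partition);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList("p=1", "p=2", "p=3");
        TxnComponentsSketch sketch = new TxnComponentsSketch();

        sketch.openTxn(1L); // aborted before addPartitions
        sketch.openTxn(2L);
        sketch.addPartitions(2L, Arrays.asList("p=2")); // aborted after addPartitions

        System.out.println("txn 1: " + sketch.partitionsToClean(1L, all));
        System.out.println("txn 2: " + sketch.partitionsToClean(2L, all));
    }
}
```

In this sketch, transaction 1 still carries the marker, so all partitions are returned for cleaning, while transaction 2 yields only the partition it registered. How the real cleaner enumerates partitions and schedules worker jobs is outside what this message describes.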