[ https://issues.apache.org/jira/browse/HIVE-21052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16749161#comment-16749161 ]
Eugene Koifman commented on HIVE-21052:
---------------------------------------

Not sure that this is enough. Suppose you have 2 p-type records in compaction_queue for the same table. One Cleaner picks up the first one and sets CLEANING_STATE. Suppose there is another Cleaner that can run concurrently. Will it start working on the other p-type request? If so, the 2 Cleaners (or CleanWork items) will both aggregate TXN_COMPONENTS entries and do overlapping work.

I think a simple model is to mutex Cleaner instances, as they are today, but inside the Cleaner instance maintain a collection of all active CleanWork items keyed by (db/table/partition), for example. Then, if you don't wait for the queue (inside Cleaner) to drain, the next time findReadyToClean() is called it can simply ignore any requests for tables/partitions that are already being cleaned. If it ends up with a non-empty list, it enqueues more CleanWork items; otherwise the outer run() goes to sleep. It's probably fine to leave this for a followup.

If you do allow concurrent Cleaner instances, you would have to synchronize via the DB, but then it gets more complicated. For example, what if a Cleaner sets CLEANING_STATE and dies? How does this clean ever get completed?
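To make the in-Cleaner bookkeeping above concrete, here is a minimal sketch of what the collection of active CleanWork items might look like. Everything in it is an assumption for illustration: ActiveCleanTracker, CleanTarget, claimNew() and markDone() are hypothetical names, not existing Hive classes or methods, and the sketch does not address a table-level request overlapping with partition-level requests on the same table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not part of Hive: remembers which targets
// currently have a CleanWork item in flight inside one Cleaner instance.
class ActiveCleanTracker {

    // Hypothetical key for one clean request (db/table/partition).
    // partition is null for a table-level request.
    static final class CleanTarget {
        final String db;
        final String table;
        final String partition;

        CleanTarget(String db, String table, String partition) {
            this.db = db;
            this.table = table;
            this.partition = partition;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof CleanTarget)) {
                return false;
            }
            CleanTarget other = (CleanTarget) o;
            return Objects.equals(db, other.db)
                && Objects.equals(table, other.table)
                && Objects.equals(partition, other.partition);
        }

        @Override
        public int hashCode() {
            return Objects.hash(db, table, partition);
        }
    }

    // Thread-safe set so worker threads can remove entries while run() adds them.
    private final Set<CleanTarget> active = ConcurrentHashMap.newKeySet();

    // Given the requests that are ready to clean, returns only those not
    // already being cleaned and records them as active.
    List<CleanTarget> claimNew(List<CleanTarget> ready) {
        List<CleanTarget> claimed = new ArrayList<>();
        for (CleanTarget target : ready) {
            if (active.add(target)) { // add() returns false if already present
                claimed.add(target);
            }
        }
        return claimed;
    }

    // Called when a CleanWork item finishes, whether it succeeded or failed.
    void markDone(CleanTarget target) {
        active.remove(target);
    }
}
```

In this sketch the outer run() would pass the result of findReadyToClean() through claimNew(), enqueue CleanWork for whatever comes back, and go to sleep if the list is empty. Because the set lives in memory, it only works while Cleaner instances are mutexed; it does not help with the "Cleaner sets CLEANING_STATE and dies" case.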
> Make sure transactions get cleaned if they are aborted before addPartitions is called
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-21052
>                 URL: https://issues.apache.org/jira/browse/HIVE-21052
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>   Affects Versions: 3.0.0
>            Reporter: Jaume M
>            Assignee: Jaume M
>            Priority: Critical
>         Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch
>
>
> If the transaction is aborted between openTxn and addPartitions and data has been written to the table, the transaction manager will think it's an empty transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and, when addPartitions is called, removing this entry from TXN_COMPONENTS and adding the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that specifies that a transaction was opened and then aborted, it must generate jobs for the worker for every possible partition available.
> cc [~ewohlstadter]