[ 
https://issues.apache.org/jira/browse/HIVE-21052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16744315#comment-16744315
 ] 

Jaume M commented on HIVE-21052:
--------------------------------

The flow through the database would be the following:
* Rows with operation type 'p' are added to TXN_COMPONENTS in 
enqueueLockWithRetry. It does not check whether the row already exists in 
TXN_COMPONENTS, but it makes sure no duplicate rows are added within a single 
call to enqueueLockWithRetry. Could different calls to enqueueLockWithRetry 
add the same row?
* A row per table per writeId is added to TXN_COMPONENTS.
* If addDynamicPartitions is called, the entry in TXN_COMPONENTS is cleaned. 
Should it also be cleaned in other circumstances, for example when commit is 
called?
* If the transaction is marked as aborted, the Initiator adds only one row per 
table to COMPACTION_QUEUE (with type 'p'), independently of how many rows there 
are for that table in TXN_COMPONENTS (these rows must have different writeIds).
* When the Cleaner sees an entry in COMPACTION_QUEUE with type 'p' for a 
particular table, it knows that there is at least one entry for this table in 
TXN_COMPONENTS, but it collects all the writeIds from TXN_COMPONENTS 
corresponding to the table in that COMPACTION_QUEUE entry.
* markCleaned deletes the entries in TXN_COMPONENTS corresponding to the 
writeIds cleaned.
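
To make the flow above easier to follow, here is a minimal in-memory sketch in 
Java. It is not the metastore code: the class AbortedTxnFlowSketch, its 
methods and the MarkerRow type are hypothetical names invented for 
illustration, the two tables are modelled as plain collections, and the 
Cleaner step assumes that only writeIds of aborted transactions are collected.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy in-memory model of the flow described above; not Hive's metastore code. */
class AbortedTxnFlowSketch {
    /** One TXN_COMPONENTS row with operation type 'p' (hypothetical type). */
    static final class MarkerRow {
        final long txnId;
        final String table;
        final long writeId;

        MarkerRow(long txnId, String table, long writeId) {
            this.txnId = txnId;
            this.table = table;
            this.writeId = writeId;
        }
    }

    final List<MarkerRow> txnComponents = new ArrayList<>();
    final Set<String> compactionQueue = new LinkedHashSet<>(); // tables queued with type 'p'
    final Set<Long> abortedTxns = new TreeSet<>();

    /** Stands in for enqueueLockWithRetry: adds one marker row per table per writeId. */
    void enqueueLock(long txnId, String table, long writeId) {
        txnComponents.add(new MarkerRow(txnId, table, writeId));
    }

    /** Stands in for addDynamicPartitions: the marker row is removed. */
    void addDynamicPartitions(long txnId, String table) {
        txnComponents.removeIf(r -> r.txnId == txnId && r.table.equals(table));
    }

    void abort(long txnId) {
        abortedTxns.add(txnId);
    }

    /** Initiator: one queue entry per table, however many marker rows it has. */
    void runInitiator() {
        for (MarkerRow r : txnComponents) {
            if (abortedTxns.contains(r.txnId)) {
                compactionQueue.add(r.table);
            }
        }
    }

    /** Cleaner: collects all writeIds for the queued table (assumption: aborted txns only). */
    Set<Long> collectWriteIds(String table) {
        Set<Long> ids = new TreeSet<>();
        for (MarkerRow r : txnComponents) {
            if (r.table.equals(table) && abortedTxns.contains(r.txnId)) {
                ids.add(r.writeId);
            }
        }
        return ids;
    }

    /** Stands in for markCleaned: deletes the rows for the cleaned writeIds. */
    void markCleaned(String table, Set<Long> writeIds) {
        txnComponents.removeIf(r -> r.table.equals(table) && writeIds.contains(r.writeId));
    }
}
```

For example, two aborted transactions that each wrote to the same table with 
different writeIds produce a single queue entry for that table, and the 
Cleaner step then collects both writeIds before markCleaned removes the rows.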

Some other notes:
* Hadoop23Shims.listLocatedHdfsStatusIterator is not tested because the 
filesystem used by the tests seems to fail ensureDfs, so the tests exercise 
HdfsUtils.listLocatedStatusIterator instead.

Can you review again, [~ekoifman]? I've updated reviewboard.


> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-21052
>                 URL: https://issues.apache.org/jira/browse/HIVE-21052
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 3.0.0
>            Reporter: Jaume M
>            Assignee: Jaume M
>            Priority: Critical
>         Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch
>
>
> If the transaction is aborted between openTxn and addPartitions and data has 
> been written to the table, the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and 
> when addPartitions is called remove this entry from TXN_COMPONENTS and add 
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and it was aborted, it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
