[ 
https://issues.apache.org/jira/browse/HIVE-21052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746903#comment-16746903
 ] 

Eugene Koifman commented on HIVE-21052:
---------------------------------------

We'd like to prevent 2 concurrent p-type cleans of the same table.

We'd like to prevent 2 concurrent cleans of the same partition (or the same 
unpartitioned table).

It may be ok to have a p-type clean concurrent with a normal partition clean 
(same table) if the markCleaned() method for each clean operation affects 
disjoint sets of TXN_COMPONENTS entries.

The map contains table level objects and partition level objects.  To work on a 
partition, you acquire a shared lock on the parent table and an exclusive lock 
on the partition.  To work on the table as a whole, you acquire a semi-shared 
lock on the table.  Semi-shared is compatible with shared but not with another 
semi-shared.  This gives the semantics where it's ok to run a table level clean 
in parallel with a partition level clean, but not 2 concurrent table level 
cleans.  It also allows 2 different partitions of the same table to be 
processed in parallel, but not the same partition twice at once.
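
A minimal sketch of the compatibility rules above, to make the semantics 
concrete.  The class, the method names and the "table/partition" key format 
are all hypothetical and are not taken from the Hive code base or from any 
attached patch; it only does non-blocking try-locks on an in-memory map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the Cleaner implementation.
class CleanerLockSketch {
    enum Mode { SHARED, SEMI_SHARED, EXCLUSIVE }

    // EXCLUSIVE conflicts with everything; SEMI_SHARED conflicts only with
    // another SEMI_SHARED; SHARED is compatible with SHARED and SEMI_SHARED.
    static boolean compatible(Mode held, Mode requested) {
        if (held == Mode.EXCLUSIVE || requested == Mode.EXCLUSIVE) {
            return false;
        }
        return !(held == Mode.SEMI_SHARED && requested == Mode.SEMI_SHARED);
    }

    // Lock key (table, or table/partition) -> modes currently held on it.
    private final Map<String, List<Mode>> held = new HashMap<>();

    synchronized boolean tryLock(String key, Mode mode) {
        List<Mode> holders = held.computeIfAbsent(key, k -> new ArrayList<>());
        for (Mode h : holders) {
            if (!compatible(h, mode)) {
                return false;
            }
        }
        holders.add(mode);
        return true;
    }

    synchronized void unlock(String key, Mode mode) {
        List<Mode> holders = held.get(key);
        if (holders != null) {
            holders.remove(mode); // removes one matching holder, if any
        }
    }

    // Partition clean: shared on the parent table, exclusive on the partition.
    boolean tryPartitionClean(String table, String partition) {
        if (!tryLock(table, Mode.SHARED)) {
            return false;
        }
        if (!tryLock(table + "/" + partition, Mode.EXCLUSIVE)) {
            unlock(table, Mode.SHARED); // back out the table lock on failure
            return false;
        }
        return true;
    }

    // Table level clean: semi-shared on the table.
    boolean tryTableClean(String table) {
        return tryLock(table, Mode.SEMI_SHARED);
    }
}
```

With this, two partition cleans of different partitions both succeed, a second 
clean of the same partition is refused, and a table level clean can start 
alongside partition cleans but not alongside another table level clean.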

Alternatively, you could acquire an exclusive lock on the table each time you 
start a table level clean.  That would prevent any other table level lock, 
including the shared locks taken by partition cleans, so a table clean would 
block any clean of the table's partitions.

 

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-21052
>                 URL: https://issues.apache.org/jira/browse/HIVE-21052
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 3.0.0
>            Reporter: Jaume M
>            Assignee: Jaume M
>            Priority: Critical
>         Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch
>
>
> If the transaction is aborted between openTxn and addPartitions and data has 
> been written to the table, the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and 
> when addPartitions is called remove this entry from TXN_COMPONENTS and add 
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and it was aborted, it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
