[ 
https://issues.apache.org/jira/browse/HIVE-13212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13212:
----------------------------------
    Description: 
create table acidTblPart (a int, b int) partitioned by (p string) clustered by 
(a) into " + BUCKET_COUNT + " buckets stored as orc TBLPROPERTIES 
('transactional'='true')

update acidTblPart set b = 17 where p = 1

This acquires a share_write lock on the whole table, although based on p = 1 
we should be able to figure out that only 1 partition is affected and lock 
only that partition.

The same should apply to DELETE.

The above is true when the table is empty.  If the table has data, and in 
particular has a p=1 partition, then only that partition is locked.

However, when the table is not empty, "update acidTblPart set b = 17 where 
b = 18" will lock every partition separately.
For a table with 100K partitions this will be a performance issue.
We need to look into acquiring a table-level lock instead, or building general 
lock promotion logic.

The logic in SemanticAnalyzer seems to be to take all known partitions of the 
table being read and create ReadEntity objects for those that match the WHERE 
clause.
A ReadEntity for the table is also created, but due to logic in 
UpdateDeleteSemanticAnalyzer we ignore it.
(We call setUpdateOrDelete() on it, but remove the corresponding WriteEntity 
and replace it with a WriteEntity for each partition.)

  was:
create table acidTblPart (a int, b int) partitioned by (p string) clustered by 
(a) into " + BUCKET_COUNT + " buckets stored as orc TBLPROPERTIES 
('transactional'='true')

update acidTblPart set b = 17 where p = 1

This acquires a share_write lock on the whole table, although based on p = 1 
we should be able to figure out that only 1 partition is affected and lock 
only that partition.

The same should apply to DELETE.

The above is true when the table is empty.  If the table has data, and in 
particular has a p=1 partition, then only that partition is locked.

However, when the table is not empty, "update acidTblPart set b = 17 where 
b = 18" will lock every partition separately.
For a table with 100K partitions this will be a performance issue.
We need to look into acquiring a table-level lock instead, or building general 
lock promotion logic.


> locking too coarse/broad for update/delete on a partition
> ---------------------------------------------------------
>
>                 Key: HIVE-13212
>                 URL: https://issues.apache.org/jira/browse/HIVE-13212
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 1.2.1
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>
> create table acidTblPart (a int, b int) partitioned by (p string) clustered 
> by (a) into " + BUCKET_COUNT + " buckets stored as orc TBLPROPERTIES 
> ('transactional'='true')
> update acidTblPart set b = 17 where p = 1
> This acquires a share_write lock on the whole table, although based on 
> p = 1 we should be able to figure out that only 1 partition is affected and 
> lock only that partition.
> The same should apply to DELETE.
> The above is true when the table is empty.  If the table has data, and in 
> particular has a p=1 partition, then only that partition is locked.
> However, when the table is not empty, "update acidTblPart set b = 17 where 
> b = 18" will lock every partition separately.
> For a table with 100K partitions this will be a performance issue.
> We need to look into acquiring a table-level lock instead, or building 
> general lock promotion logic.
> The logic in SemanticAnalyzer seems to be to take all known partitions of 
> the table being read and create ReadEntity objects for those that match the 
> WHERE clause.
> A ReadEntity for the table is also created, but due to logic in 
> UpdateDeleteSemanticAnalyzer we ignore it.
> (We call setUpdateOrDelete() on it, but remove the corresponding WriteEntity 
> and replace it with a WriteEntity for each partition.)
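
To make the "lock promotion" idea concrete, here is a minimal standalone 
sketch of the decision it implies: lock the matched partitions individually 
while there are few of them, and fall back to a single table-level lock once 
the count crosses a threshold. None of this is Hive code. The class name, the 
planLocks() helper, the threshold parameter and the "table/partition" naming 
are all hypothetical, and treating "no matching partition" as a table lock is 
only an assumption that mirrors the empty-table behaviour described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: hypothetical names, not part of Hive.
class LockPromotionSketch {

    /**
     * Decide which resources an UPDATE/DELETE should lock with share_write.
     *
     * @param table             table name
     * @param matchedPartitions partitions selected by the WHERE clause
     * @param threshold         hypothetical cutoff above which we promote
     * @return names of the resources to lock
     */
    static List<String> planLocks(String table,
                                  List<String> matchedPartitions,
                                  int threshold) {
        List<String> locks = new ArrayList<>();
        if (matchedPartitions.isEmpty() || matchedPartitions.size() > threshold) {
            // Nothing matched (assumed fallback) or too many partitions:
            // take one table-level lock instead of one lock per partition.
            locks.add(table);
        } else {
            // Few partitions: lock each one individually.
            for (String partition : matchedPartitions) {
                locks.add(table + "/" + partition);
            }
        }
        return locks;
    }

    public static void main(String[] args) {
        // "where p = 1" prunes to a single partition.
        System.out.println(planLocks("acidTblPart", Arrays.asList("p=1"), 2));
        // "where b = 18" cannot prune, so every partition matches.
        System.out.println(
            planLocks("acidTblPart", Arrays.asList("p=1", "p=2", "p=3"), 2));
    }
}
```

With a threshold of 2, the first call plans a lock on the single p=1 
partition and the second plans one lock on the table. Where such a cutoff 
would live, and what value is sensible for a table with 100K partitions, is 
left open here.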



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
