[ 
https://issues.apache.org/jira/browse/HIVE-28145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venugopal Reddy K updated HIVE-28145:
-------------------------------------
    Description: 
*Description:*

The getPartitionsByNames API returns partition objects with empty values in 
many fields when it runs concurrently with the dropPartition API.

org.apache.hadoop.hive.metastore.MetaStoreDirectSql#getPartitionsViaPartNames 
issues multiple queries to the backend database to populate the fields of each 
partition object. It first queries for the part ids by partition name, then 
joins the PARTITIONS, SDS and SERDES tables for those part ids and creates the 
partition objects. A further query against the PARTITION_KEY_VALS table then 
fetches the partition values for those part ids and sets them on the already 
created partition objects.

If a partition is deleted just before the PARTITION_KEY_VALS query runs, the 
corresponding partition object is returned with empty values. The same can 
happen to any other field that is populated by its own query, such as 
partition params, storage descriptor params, serde params, sort cols, bucket 
cols and skewed cols.
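
The interleaving described above can be sketched with a small in-memory model. 
This is an illustration only, not Hive code: the class PartitionReadRace, the 
Part type and the two maps standing in for the PARTITIONS and 
PARTITION_KEY_VALS tables are all hypothetical, and the drop is injected 
between the two lookups by a callback rather than by a real concurrent thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionReadRace {
    // Hypothetical stand-ins for the PARTITIONS and PARTITION_KEY_VALS tables,
    // both keyed by part id.
    static final Map<Long, String> partitions = new HashMap<>();
    static final Map<Long, List<String>> partitionKeyVals = new HashMap<>();

    /** Hypothetical partition object: a name plus its partition values. */
    static final class Part {
        final String name;
        final List<String> values = new ArrayList<>();
        Part(String name) { this.name = name; }
    }

    static void addPartition(long id, String name, List<String> vals) {
        partitions.put(id, name);
        partitionKeyVals.put(id, vals);
    }

    static void dropPartition(long id) {
        partitionKeyVals.remove(id);
        partitions.remove(id);
    }

    /** Two-step read; betweenQueries runs after the first lookup. */
    static Part readPartition(long id, Runnable betweenQueries) {
        // Query 1: create the partition object from the PARTITIONS row.
        String name = partitions.get(id);
        if (name == null) {
            return null;
        }
        Part part = new Part(name);
        betweenQueries.run();
        // Query 2: fetch the values; a drop committed in between is visible here.
        part.values.addAll(partitionKeyVals.getOrDefault(id, List.of()));
        return part;
    }

    public static void main(String[] args) {
        addPartition(1L, "ds=2024-01-01", List.of("2024-01-01"));
        Part clean = readPartition(1L, () -> { });
        System.out.println(clean.name + " -> " + clean.values);
        // Drop the partition between the two queries of the same read.
        Part raced = readPartition(1L, () -> dropPartition(1L));
        System.out.println(raced.name + " -> " + raced.values);
    }
}
```

The second read still returns a partition object, because the first lookup 
succeeded, but its values list is empty.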

*Note: The issue can be observed with both the directsql and the JDO based 
query paths. All APIs that issue multiple queries to the backend database 
within a transaction need to be checked.*

*Root Cause:*

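The transaction is opened with the default isolation level, which in 
DataNucleus is read-committed. At that level each query sees whatever is 
committed at the time it runs, so a drop committed between two queries of the 
same transaction is visible to the later query.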
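
To make the isolation-level point concrete, here is a toy contrast between a 
read that always consults the latest committed state and a read that consults 
a copy taken when the transaction began. It is a sketch under simplifying 
assumptions: the class IsolationSketch and its methods are hypothetical, a 
plain map copy stands in for a snapshot, and nothing here reflects how a real 
database implements isolation or how this issue should be fixed.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IsolationSketch {
    // Committed state of a toy store: part id -> partition values.
    static final Map<Long, List<String>> committed = new HashMap<>();

    /** Read-committed style: every read consults the latest committed state. */
    static List<String> readCommitted(long id) {
        return committed.getOrDefault(id, List.of());
    }

    /** Snapshot style: later reads consult a copy taken at transaction begin. */
    static Map<Long, List<String>> beginSnapshot() {
        return new HashMap<>(committed);
    }

    public static void main(String[] args) {
        committed.put(1L, List.of("2024-01-01"));

        // The reading transaction begins here.
        Map<Long, List<String>> snapshot = beginSnapshot();

        // A concurrent drop commits before the reader's second query.
        committed.remove(1L);

        System.out.println("read-committed: " + readCommitted(1L));
        System.out.println("snapshot: " + snapshot.getOrDefault(1L, List.of()));
    }
}
```

The read-committed lookup observes the drop and returns an empty list, while 
the snapshot lookup still returns the values that existed when the transaction 
began.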

*Steps to reproduce:*
 # Create a partitioned table and add 500 to 1000 dynamic partitions (dummy 
partition params, sd params and serde params can be added as well).
 # Create a thread pool of size 2 and submit 2 tasks: one task that calls 
getPartitionsByNames and another that calls dropPartition in a loop.
 # Verify the fields of the partition objects returned from 
getPartitionsByNames().
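
The shape of such a reproduction can be sketched as below. The harness runs 
against a hypothetical in-memory store instead of a real metastore: the class 
ConcurrentDropRepro and its addPartition, dropPartition and 
getPartitionsByNames methods are stand-ins written for this sketch and only 
borrow the API names. Each simulated query and each drop is atomic on its own, 
but the two queries of one read are not atomic together. Because the outcome 
depends on thread scheduling, the reported count varies from run to run and 
may be zero.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class ConcurrentDropRepro {
    private static final Object LOCK = new Object();
    // Hypothetical stand-ins for the PARTITIONS and PARTITION_KEY_VALS tables.
    private static final Set<String> partitions = new HashSet<>();
    private static final Map<String, List<String>> keyVals = new HashMap<>();

    static void addPartition(String name) {
        synchronized (LOCK) {
            partitions.add(name);
            keyVals.put(name, List.of(name));
        }
    }

    static void dropPartition(String name) {
        synchronized (LOCK) { // the drop itself is atomic
            keyVals.remove(name);
            partitions.remove(name);
        }
    }

    static boolean hasPartitions() {
        synchronized (LOCK) {
            return !partitions.isEmpty();
        }
    }

    /** Two separate queries; returns how many partitions came back without values. */
    static int getPartitionsByNames(List<String> names) {
        List<String> found = new ArrayList<>();
        synchronized (LOCK) { // query 1: which partitions exist
            for (String n : names) {
                if (partitions.contains(n)) {
                    found.add(n);
                }
            }
        }
        int empty = 0;
        synchronized (LOCK) { // query 2: fetch values for the partitions found
            for (String n : found) {
                if (!keyVals.containsKey(n)) {
                    empty++;
                }
            }
        }
        return empty;
    }

    /** Runs one reader task and one dropper task on a pool of size 2. */
    static int run(int partitionCount) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < partitionCount; i++) {
            names.add("p=" + i);
            addPartition("p=" + i);
        }
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<Integer> reader = pool.submit(() -> {
                int emptyCount = 0;
                while (hasPartitions()) {
                    emptyCount += getPartitionsByNames(names);
                }
                return emptyCount;
            });
            Future<?> dropper =
                pool.submit(() -> names.forEach(ConcurrentDropRepro::dropPartition));
            dropper.get(60, TimeUnit.SECONDS);
            return reader.get(60, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException("repro run failed", e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("partitions returned with empty values: " + run(1000));
    }
}
```

Against a real metastore the reader task would instead inspect the fields of 
the returned partition objects, as in the last step above.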


> getPartitionsByNames API returns partition objects with empty values in many 
> fields when it is executed concurrently with dropPartition API 
> --------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-28145
>                 URL: https://issues.apache.org/jira/browse/HIVE-28145
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Venugopal Reddy K
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)