[ 
https://issues.apache.org/jira/browse/HIVE-26860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venugopal Reddy K updated HIVE-26860:
-------------------------------------
    Attachment:     (was: image-2022-12-15-14-32-55-280.png)

> Appropriate rows in HMS datastore tables(SDS, SERDES, SERDE_PARAMS, 
> SKEWED_COL_NAMES, SKEWED_COL_VALUE_LOC_MAP, SKEWED_VALUES) are not deleted 
> upon drop partition table with skewed columns
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-26860
>                 URL: https://issues.apache.org/jira/browse/HIVE-26860
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Metastore
>            Reporter: Venugopal Reddy K
>            Priority: Major
>         Attachments: partdata3
>
>
> *[Description]*
> Appropriate rows in HMS backing datastore tables(SDS, SERDES, SERDE_PARAMS, 
> SKEWED_COL_NAMES, SKEWED_COL_VALUE_LOC_MAP, SKEWED_VALUES) are not deleted 
> upon drop partition table with skewed columns.
> *[Steps to reproduce]*
> 1. Create stage table, load some data into stage table, create partition 
> table with skewed columns and load data into that table from the stage table. 
> partdata3 file is attached. It has 2 partitions.
>  
> {code:java}
> create database mydb;
> use mydb;
> {code}
>  
> {code:java}
> create table stage(num int, name string, category string) row format 
> delimited fields terminated by ',' stored as textfile;
> {code}
> {code:java}
> load data local inpath 'partdata3' into table stage;{code}
> {code:java}
> create table skewpart(num int, name string) partitioned by (category string) 
> skewed by(num) on (3,4) stored as directories row format delimited fields 
> terminated by ',' stored as textfile;{code}
> {code:java}
> insert into skewpart select * from stage;{code}
>  
> 2. Verify warehouse directory table data is correct
> {code:java}
> kvenureddy@192 category=fruit % ls   
> HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME    num=4
> kvenureddy@192 category=fruit % pwd
> /private/tmp/warehouse/external/mydb.db/skewpart/category=fruit
> kvenureddy@192 category=fruit % cd num=4 
> kvenureddy@192 num=4 % pwd
> /private/tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4
> kvenureddy@192 num=4 % cat 000000_0 
> 4,cherry
> kvenureddy@192 num=4 % cd ../
> kvenureddy@192 category=fruit % cd HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME 
> kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % cat 000000_0 
> 1,apple
> 2,banana
> 6,mango
> kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % cd../../
> zsh: no such file or directory: cd../../
> kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % cd ../../
> kvenureddy@192 skewpart % pwd
> /private/tmp/warehouse/external/mydb.db/skewpart
> kvenureddy@192 skewpart % ls
> category=fruit        category=vegetable
> kvenureddy@192 skewpart % cd category=vegetable 
> kvenureddy@192 category=vegetable % pwd
> /private/tmp/warehouse/external/mydb.db/skewpart/category=vegetable
> kvenureddy@192 category=vegetable % ls
> HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME    num=3
> kvenureddy@192 category=vegetable % cd num=3 
> kvenureddy@192 num=3 % cat 000000_0 
> 3,carrot
> kvenureddy@192 num=3 % cd ../
> kvenureddy@192 category=vegetable % cd HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME 
> kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % cat 000000_0 
> 5,potato
> 7,tomato
> kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % {code}
>  
> 3. Verify HMS backing datastore tables after creating and loading into 
> partition+skewed table.
> Note: Tables having issue(SDS, SERDES, SERDE_PARAMS, SKEWED_COL_NAMES, 
> SKEWED_COL_VALUE_LOC_MAP, SKEWED_VALUES) are shown below.
> SD_ID=2 row is added during create table. SD_ID=3 and 4 rows were added 
> because there are 2 partitions during data load.
> ||SDS||
> ||SD_ID||CD_ID||INPUT_FORMAT||IS_COMPRESSED||IS_STOREDASSUBDIRECTORIES||LOCATION||NUM_BUCKETS||OUTPUT_FORMAT||SERDE_ID||
> |1|1|org.apache.hadoop.mapred.TextInputFormat|0|0|file:/tmp/warehouse/external/mydb.db/stage|-1|org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat|1|
> |{color:#4c9aff}2{color}|{color:#4c9aff}2{color}|{color:#4c9aff}org.apache.hadoop.mapred.TextInputFormat{color}|{color:#4c9aff}0{color}|{color:#4c9aff}1{color}|{color:#4c9aff}[file:/tmp/warehouse/external/mydb.db/skewpart|file:///tmp/warehouse/external/mydb.db/skewpart]{color}|{color:#4c9aff}-1{color}|{color:#4c9aff}org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat{color}|{color:#4c9aff}2{color}|
> |{color:#4c9aff}3{color}|{color:#4c9aff}2{color}|{color:#4c9aff}org.apache.hadoop.mapred.TextInputFormat{color}|{color:#4c9aff}0{color}|{color:#4c9aff}1{color}|{color:#4c9aff}[file:/tmp/warehouse/external/mydb.db/skewpart/category=vegetable|file:///tmp/warehouse/external/mydb.db/skewpart/category=vegetable]{color}|{color:#4c9aff}-1{color}|{color:#4c9aff}org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat{color}|{color:#4c9aff}3{color}|
> |{color:#4c9aff}4{color}|{color:#4c9aff}2{color}|{color:#4c9aff}org.apache.hadoop.mapred.TextInputFormat{color}|{color:#4c9aff}0{color}|{color:#4c9aff}1{color}|{color:#4c9aff}[file:/tmp/warehouse/external/mydb.db/skewpart/category=fruit|file:///tmp/warehouse/external/mydb.db/skewpart/category=fruit]{color}|{color:#4c9aff}-1{color}|{color:#4c9aff}org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat{color}|{color:#4c9aff}4{color}|
> ||SERDES||
> ||SERDE_ID||NAME||SLIB||DESCRIPTION||SERIALIZER_CLASS||DESERIALIZER_CLASS||SERDE_TYPE||
> |1| |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe| | | |0|
> |{color:#4c9aff}2{color}| 
> |{color:#4c9aff}org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe{color}| | 
> | |{color:#4c9aff}0{color}|
> |{color:#4c9aff}3{color}| 
> |{color:#4c9aff}org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe{color}| | 
> | |{color:#4c9aff}0{color}|
> |{color:#4c9aff}4{color}| 
> |{color:#4c9aff}org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe{color}| | 
> | |{color:#4c9aff}0{color}|
> ||SERDE_PARAMS||
> ||SERDE_ID||PARAM_KEY||PARAM_VALUE||
> |1|field.delim|,|
> |1|serialization.format|,|
> |{color:#4c9aff}2{color}|{color:#4c9aff}field.delim{color}|{color:#4c9aff},{color}|
> |{color:#4c9aff}2{color}|{color:#4c9aff}serialization.format{color}|{color:#4c9aff},{color}|
> |{color:#4c9aff}3{color}|{color:#4c9aff}field.delim{color}|{color:#4c9aff},{color}|
> |{color:#4c9aff}3{color}|{color:#4c9aff}serialization.format{color}|{color:#4c9aff},{color}|
> |{color:#4c9aff}4{color}|{color:#4c9aff}field.delim{color}|{color:#4c9aff},{color}|
> |{color:#4c9aff}4{color}|{color:#4c9aff}serialization.format{color}|{color:#4c9aff},{color}|
> ||SKEWED_COL_NAMES||
> ||SD_ID||SKEWED_COL_NAME||INTEGER_IDX||
> |{color:#4c9aff}2{color}|{color:#4c9aff}num{color}|{color:#4c9aff}0{color}|
> |{color:#4c9aff}3{color}|{color:#4c9aff}num{color}|{color:#4c9aff}0{color}|
> |{color:#4c9aff}4{color}|{color:#4c9aff}num{color}|{color:#4c9aff}0{color}|
> ||SKEWED_COL_VALUE_LOC_MAP||
> ||SD_ID||STRING_LIST_ID_KID||LOCATION||
> |{color:#4c9aff}3{color}|{color:#4c9aff}3{color}|{color:#4c9aff}[file:/tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3|file:///tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3]{color}|
> |{color:#4c9aff}3{color}|{color:#4c9aff}14{color}|{color:#4c9aff}[file:/tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3|file:///tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3]{color}|
> |{color:#4c9aff}4{color}|{color:#4c9aff}6{color}|{color:#4c9aff}[file:/tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4|file:///tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4]{color}|
> |{color:#4c9aff}4{color}|{color:#4c9aff}11{color}|{color:#4c9aff}[file:/tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4|file:///tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4]{color}|
> ||SKEWED_VALUES||
> ||SD_ID_OID||STRING_LIST_ID_EID||INTEGER_IDX||
> |{color:#4c9aff}2{color}|{color:#4c9aff}1{color}|{color:#4c9aff}0{color}|
> |{color:#4c9aff}2{color}|{color:#4c9aff}2{color}|{color:#4c9aff}1{color}|
> |{color:#4c9aff}4{color}|{color:#4c9aff}9{color}|{color:#4c9aff}0{color}|
> |{color:#4c9aff}4{color}|{color:#4c9aff}10{color}|{color:#4c9aff}1{color}|
> |{color:#4c9aff}3{color}|{color:#4c9aff}12{color}|{color:#4c9aff}0{color}|
> |{color:#4c9aff}3{color}|{color:#4c9aff}13{color}|{color:#4c9aff}1{color}|
>  
> 4. Drop *skewpart* table and verify HMS backing datastore tables again.
> *Row with SD_ID=2 is deleted. But rows with SD_ID 3 and 4 are not deleted. 
> They remain in the database forever.*
> ||SDS||
> ||SD_ID||CD_ID||INPUT_FORMAT||IS_COMPRESSED||IS_STOREDASSUBDIRECTORIES||LOCATION||NUM_BUCKETS||OUTPUT_FORMAT||SERDE_ID||
> |1|1|org.apache.hadoop.mapred.TextInputFormat|0|0|file:/tmp/warehouse/external/mydb.db/stage|-1|org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat|1|
> |{color:#de350b}3{color}|{color:#de350b}2{color}|{color:#de350b}org.apache.hadoop.mapred.TextInputFormat{color}|{color:#de350b}0{color}|{color:#de350b}1{color}|{color:#de350b}[file:/tmp/warehouse/external/mydb.db/skewpart/category=vegetable|file:///tmp/warehouse/external/mydb.db/skewpart/category=vegetable]{color}|{color:#de350b}-1{color}|{color:#de350b}org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat{color}|{color:#de350b}3{color}|
> |{color:#de350b}4{color}|{color:#de350b}2{color}|{color:#de350b}org.apache.hadoop.mapred.TextInputFormat{color}|{color:#de350b}0{color}|{color:#de350b}1{color}|{color:#de350b}[file:/tmp/warehouse/external/mydb.db/skewpart/category=fruit|file:///tmp/warehouse/external/mydb.db/skewpart/category=fruit]{color}|{color:#de350b}-1{color}|{color:#de350b}org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat{color}|{color:#de350b}4{color}|
> *Row with SERDE_ID=2 is deleted. But rows with SERDE_ID 3 and 4 are not 
> deleted since they are being reference from SDS table.*
> ||SERDES||
> ||SERDE_ID||NAME||SLIB||DESCRIPTION||SERIALIZER_CLASS||DESERIALIZER_CLASS||SERDE_TYPE||
> |1| |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe| | | |0|
> |{color:#de350b}3{color}| 
> |{color:#de350b}org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe{color}| | 
> | |{color:#de350b}0{color}|
> |{color:#de350b}4{color}| 
> |{color:#de350b}org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe{color}| | 
> | |{color:#de350b}0{color}|
> ||SERDE_PARAMS||
> ||SERDE_ID||PARAM_KEY||PARAM_VALUE||
> |1|field.delim|,|
> |1|serialization.format|,|
> |{color:#de350b}3{color}|{color:#de350b}field.delim{color}|{color:#de350b},{color}|
> |{color:#de350b}3{color}|{color:#de350b}serialization.format{color}|{color:#de350b},{color}|
> |{color:#de350b}4{color}|{color:#de350b}field.delim{color}|{color:#de350b},{color}|
> |{color:#de350b}4{color}|{color:#de350b}serialization.format{color}|{color:#de350b},{color}|
> Rows in SKEWED tables which has reference to SD_ID=3 and 4 remains forever.
> ||SKEWED_COL_NAMES||
> ||SD_ID||SKEWED_COL_NAME||INTEGER_IDX||
> |{color:#de350b}3{color}|{color:#de350b}num{color}|{color:#de350b}0{color}|
> |{color:#de350b}4{color}|{color:#de350b}num{color}|{color:#de350b}0{color}|
> ||SKEWED_COL_VALUE_LOC_MAP||
> ||SD_ID||STRING_LIST_ID_KID||LOCATION||
> |{color:#de350b}3{color}|{color:#de350b}3{color}|{color:#de350b}[file:/tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3|file:///tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3]{color}|
> |{color:#de350b}3{color}|{color:#de350b}14{color}|{color:#de350b}[file:/tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3|file:///tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3]{color}|
> |{color:#de350b}4{color}|{color:#de350b}6{color}|{color:#de350b}[file:/tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4|file:///tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4]{color}|
> |{color:#de350b}4{color}|{color:#de350b}11{color}|{color:#de350b}[file:/tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4|file:///tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4]{color}|
> ||SKEWED_VALUES||
> ||SD_ID_OID||STRING_LIST_ID_EID||INTEGER_IDX||
> |{color:#de350b}4{color}|{color:#de350b}9{color}|{color:#de350b}0{color}|
> |{color:#de350b}4{color}|{color:#de350b}10{color}|{color:#de350b}1{color}|
> |{color:#de350b}3{color}|{color:#de350b}12{color}|{color:#de350b}0{color}|
> |{color:#de350b}3{color}|{color:#de350b}13{color}|{color:#de350b}1{color}|



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to