[ https://issues.apache.org/jira/browse/HIVE-26860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HIVE-26860: ---------------------------------- Labels: pull-request-available (was: ) > Appropriate rows in HMS datastore tables(SDS, SERDES, SERDE_PARAMS, > SKEWED_COL_NAMES, SKEWED_COL_VALUE_LOC_MAP, SKEWED_VALUES) are not deleted > upon drop partition table with skewed columns > -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > > Key: HIVE-26860 > URL: https://issues.apache.org/jira/browse/HIVE-26860 > Project: Hive > Issue Type: Bug > Components: Hive, Metastore > Reporter: Venugopal Reddy K > Priority: Major > Labels: pull-request-available > Attachments: partdata3 > > Time Spent: 10m > Remaining Estimate: 0h > > *[Description]* > Appropriate rows in HMS backing datastore tables(SDS, SERDES, SERDE_PARAMS, > SKEWED_COL_NAMES, SKEWED_COL_VALUE_LOC_MAP, SKEWED_VALUES) are not deleted > upon drop partition table with skewed columns. > *[Steps to reproduce]* > 1. Create stage table, load some data into stage table, create partition > table with skewed columns and load data into that table from the stage table. > partdata3 file is attached. It has 2 partitions. > {code:java} > create database mydb; > use mydb; > {code} > {code:java} > create table stage(num int, name string, category string) row format > delimited fields terminated by ',' stored as textfile; > {code} > {code:java} > load data local inpath 'partdata3' into table stage;{code} > {code:java} > create table skewpart(num int, name string) partitioned by (category string) > skewed by(num) on (3,4) stored as directories row format delimited fields > terminated by ',' stored as textfile;{code} > {code:java} > insert into skewpart select * from stage;{code} > > 2. Verify warehouse directory table data is correct > {code:java} > kvenureddy@192 category=fruit % ls > HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME num=4 > kvenureddy@192 category=fruit % pwd > /private/tmp/warehouse/external/mydb.db/skewpart/category=fruit > kvenureddy@192 category=fruit % cd num=4 > kvenureddy@192 num=4 % pwd > /private/tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4 > kvenureddy@192 num=4 % cat 000000_0 > 4,cherry > kvenureddy@192 num=4 % cd ../ > kvenureddy@192 category=fruit % cd HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME > kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % cat 000000_0 > 1,apple > 2,banana > 6,mango > kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % cd../../ > zsh: no such file or directory: cd../../ > kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % cd ../../ > kvenureddy@192 skewpart % pwd > /private/tmp/warehouse/external/mydb.db/skewpart > kvenureddy@192 skewpart % ls > category=fruit category=vegetable > kvenureddy@192 skewpart % cd category=vegetable > kvenureddy@192 category=vegetable % pwd > /private/tmp/warehouse/external/mydb.db/skewpart/category=vegetable > kvenureddy@192 category=vegetable % ls > HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME num=3 > kvenureddy@192 category=vegetable % cd num=3 > kvenureddy@192 num=3 % cat 000000_0 > 3,carrot > kvenureddy@192 num=3 % cd ../ > kvenureddy@192 category=vegetable % cd HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME > kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % cat 000000_0 > 5,potato > 7,tomato > kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % {code} > > 3. Verify HMS backing datastore tables after creating and loading into > partition+skewed table. > Note: Tables having issue(SDS, SERDES, SERDE_PARAMS, SKEWED_COL_NAMES, > SKEWED_COL_VALUE_LOC_MAP, SKEWED_VALUES) are shown below. > SD_ID=2 row is added during create table. SD_ID=3 and 4 rows were added > because there are 2 partitions during data load. > ||SDS|| > ||SD_ID||CD_ID||INPUT_FORMAT||IS_COMPRESSED||IS_STOREDASSUBDIRECTORIES||LOCATION||NUM_BUCKETS||OUTPUT_FORMAT||SERDE_ID|| > |1|1|org.apache.hadoop.mapred.TextInputFormat|0|0|file:/tmp/warehouse/external/mydb.db/stage|-1|org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat|1| > |{color:#4c9aff}2{color}|{color:#4c9aff}2{color}|{color:#4c9aff}org.apache.hadoop.mapred.TextInputFormat{color}|{color:#4c9aff}0{color}|{color:#4c9aff}1{color}|{color:#4c9aff}[file:/tmp/warehouse/external/mydb.db/skewpart|file:///tmp/warehouse/external/mydb.db/skewpart]{color}|{color:#4c9aff}-1{color}|{color:#4c9aff}org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat{color}|{color:#4c9aff}2{color}| > |{color:#4c9aff}3{color}|{color:#4c9aff}2{color}|{color:#4c9aff}org.apache.hadoop.mapred.TextInputFormat{color}|{color:#4c9aff}0{color}|{color:#4c9aff}1{color}|{color:#4c9aff}[file:/tmp/warehouse/external/mydb.db/skewpart/category=vegetable|file:///tmp/warehouse/external/mydb.db/skewpart/category=vegetable]{color}|{color:#4c9aff}-1{color}|{color:#4c9aff}org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat{color}|{color:#4c9aff}3{color}| > |{color:#4c9aff}4{color}|{color:#4c9aff}2{color}|{color:#4c9aff}org.apache.hadoop.mapred.TextInputFormat{color}|{color:#4c9aff}0{color}|{color:#4c9aff}1{color}|{color:#4c9aff}[file:/tmp/warehouse/external/mydb.db/skewpart/category=fruit|file:///tmp/warehouse/external/mydb.db/skewpart/category=fruit]{color}|{color:#4c9aff}-1{color}|{color:#4c9aff}org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat{color}|{color:#4c9aff}4{color}| > ||SERDES|| > ||SERDE_ID||NAME||SLIB||DESCRIPTION||SERIALIZER_CLASS||DESERIALIZER_CLASS||SERDE_TYPE|| > |1| |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe| | | |0| > |{color:#4c9aff}2{color}| > |{color:#4c9aff}org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe{color}| | > | |{color:#4c9aff}0{color}| > |{color:#4c9aff}3{color}| > |{color:#4c9aff}org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe{color}| | > | |{color:#4c9aff}0{color}| > |{color:#4c9aff}4{color}| > |{color:#4c9aff}org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe{color}| | > | |{color:#4c9aff}0{color}| > ||SERDE_PARAMS|| > ||SERDE_ID||PARAM_KEY||PARAM_VALUE|| > |1|field.delim|,| > |1|serialization.format|,| > |{color:#4c9aff}2{color}|{color:#4c9aff}field.delim{color}|{color:#4c9aff},{color}| > |{color:#4c9aff}2{color}|{color:#4c9aff}serialization.format{color}|{color:#4c9aff},{color}| > |{color:#4c9aff}3{color}|{color:#4c9aff}field.delim{color}|{color:#4c9aff},{color}| > |{color:#4c9aff}3{color}|{color:#4c9aff}serialization.format{color}|{color:#4c9aff},{color}| > |{color:#4c9aff}4{color}|{color:#4c9aff}field.delim{color}|{color:#4c9aff},{color}| > |{color:#4c9aff}4{color}|{color:#4c9aff}serialization.format{color}|{color:#4c9aff},{color}| > ||SKEWED_COL_NAMES|| > ||SD_ID||SKEWED_COL_NAME||INTEGER_IDX|| > |{color:#4c9aff}2{color}|{color:#4c9aff}num{color}|{color:#4c9aff}0{color}| > |{color:#4c9aff}3{color}|{color:#4c9aff}num{color}|{color:#4c9aff}0{color}| > |{color:#4c9aff}4{color}|{color:#4c9aff}num{color}|{color:#4c9aff}0{color}| > ||SKEWED_COL_VALUE_LOC_MAP|| > ||SD_ID||STRING_LIST_ID_KID||LOCATION|| > |{color:#4c9aff}3{color}|{color:#4c9aff}3{color}|{color:#4c9aff}[file:/tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3|file:///tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3]{color}| > |{color:#4c9aff}3{color}|{color:#4c9aff}14{color}|{color:#4c9aff}[file:/tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3|file:///tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3]{color}| > |{color:#4c9aff}4{color}|{color:#4c9aff}6{color}|{color:#4c9aff}[file:/tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4|file:///tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4]{color}| > |{color:#4c9aff}4{color}|{color:#4c9aff}11{color}|{color:#4c9aff}[file:/tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4|file:///tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4]{color}| > ||SKEWED_VALUES|| > ||SD_ID_OID||STRING_LIST_ID_EID||INTEGER_IDX|| > |{color:#4c9aff}2{color}|{color:#4c9aff}1{color}|{color:#4c9aff}0{color}| > |{color:#4c9aff}2{color}|{color:#4c9aff}2{color}|{color:#4c9aff}1{color}| > |{color:#4c9aff}4{color}|{color:#4c9aff}9{color}|{color:#4c9aff}0{color}| > |{color:#4c9aff}4{color}|{color:#4c9aff}10{color}|{color:#4c9aff}1{color}| > |{color:#4c9aff}3{color}|{color:#4c9aff}12{color}|{color:#4c9aff}0{color}| > |{color:#4c9aff}3{color}|{color:#4c9aff}13{color}|{color:#4c9aff}1{color}| > > 4. Drop *skewpart* table and verify HMS backing datastore tables again. > *Row with SD_ID=2 is deleted. But rows with SD_ID 3 and 4 are not deleted. > They remain in the database forever.* > ||SDS|| > ||SD_ID||CD_ID||INPUT_FORMAT||IS_COMPRESSED||IS_STOREDASSUBDIRECTORIES||LOCATION||NUM_BUCKETS||OUTPUT_FORMAT||SERDE_ID|| > |1|1|org.apache.hadoop.mapred.TextInputFormat|0|0|file:/tmp/warehouse/external/mydb.db/stage|-1|org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat|1| > |{color:#de350b}3{color}|{color:#de350b}2{color}|{color:#de350b}org.apache.hadoop.mapred.TextInputFormat{color}|{color:#de350b}0{color}|{color:#de350b}1{color}|{color:#de350b}[file:/tmp/warehouse/external/mydb.db/skewpart/category=vegetable|file:///tmp/warehouse/external/mydb.db/skewpart/category=vegetable]{color}|{color:#de350b}-1{color}|{color:#de350b}org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat{color}|{color:#de350b}3{color}| > |{color:#de350b}4{color}|{color:#de350b}2{color}|{color:#de350b}org.apache.hadoop.mapred.TextInputFormat{color}|{color:#de350b}0{color}|{color:#de350b}1{color}|{color:#de350b}[file:/tmp/warehouse/external/mydb.db/skewpart/category=fruit|file:///tmp/warehouse/external/mydb.db/skewpart/category=fruit]{color}|{color:#de350b}-1{color}|{color:#de350b}org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat{color}|{color:#de350b}4{color}| > *Row with SERDE_ID=2 is deleted. But rows with SERDE_ID 3 and 4 are not > deleted since they are being reference from SDS table.* > ||SERDES|| > ||SERDE_ID||NAME||SLIB||DESCRIPTION||SERIALIZER_CLASS||DESERIALIZER_CLASS||SERDE_TYPE|| > |1| |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe| | | |0| > |{color:#de350b}3{color}| > |{color:#de350b}org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe{color}| | > | |{color:#de350b}0{color}| > |{color:#de350b}4{color}| > |{color:#de350b}org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe{color}| | > | |{color:#de350b}0{color}| > ||SERDE_PARAMS|| > ||SERDE_ID||PARAM_KEY||PARAM_VALUE|| > |1|field.delim|,| > |1|serialization.format|,| > |{color:#de350b}3{color}|{color:#de350b}field.delim{color}|{color:#de350b},{color}| > |{color:#de350b}3{color}|{color:#de350b}serialization.format{color}|{color:#de350b},{color}| > |{color:#de350b}4{color}|{color:#de350b}field.delim{color}|{color:#de350b},{color}| > |{color:#de350b}4{color}|{color:#de350b}serialization.format{color}|{color:#de350b},{color}| > Rows in SKEWED tables which has reference to SD_ID=3 and 4 remains forever. > ||SKEWED_COL_NAMES|| > ||SD_ID||SKEWED_COL_NAME||INTEGER_IDX|| > |{color:#de350b}3{color}|{color:#de350b}num{color}|{color:#de350b}0{color}| > |{color:#de350b}4{color}|{color:#de350b}num{color}|{color:#de350b}0{color}| > ||SKEWED_COL_VALUE_LOC_MAP|| > ||SD_ID||STRING_LIST_ID_KID||LOCATION|| > |{color:#de350b}3{color}|{color:#de350b}3{color}|{color:#de350b}[file:/tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3|file:///tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3]{color}| > |{color:#de350b}3{color}|{color:#de350b}14{color}|{color:#de350b}[file:/tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3|file:///tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3]{color}| > |{color:#de350b}4{color}|{color:#de350b}6{color}|{color:#de350b}[file:/tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4|file:///tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4]{color}| > |{color:#de350b}4{color}|{color:#de350b}11{color}|{color:#de350b}[file:/tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4|file:///tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4]{color}| > ||SKEWED_VALUES|| > ||SD_ID_OID||STRING_LIST_ID_EID||INTEGER_IDX|| > |{color:#de350b}4{color}|{color:#de350b}9{color}|{color:#de350b}0{color}| > |{color:#de350b}4{color}|{color:#de350b}10{color}|{color:#de350b}1{color}| > |{color:#de350b}3{color}|{color:#de350b}12{color}|{color:#de350b}0{color}| > |{color:#de350b}3{color}|{color:#de350b}13{color}|{color:#de350b}1{color}| -- This message was sent by Atlassian Jira (v8.20.10#820010)