Venugopal Reddy K created HIVE-26861:
----------------------------------------

Summary: Skewed column table load do not work as expected if the user data for skewed column is not in lowercase.
Key: HIVE-26861
URL: https://issues.apache.org/jira/browse/HIVE-26861
Project: Hive
Issue Type: Bug
Reporter: Venugopal Reddy K
Attachments: data

*[Description]*

A skewed table whose skewed column contains mixed-case data does not work as expected. Skewed values are stored in lower case, and the user data is expected to be in the same lower case (i.e., the comparison is case sensitive). Otherwise it does not work.

*[Steps to reproduce]*

1. Create a stage table, load some data into it, create a table with a skewed column, and load data into that table from the stage table. The data file is attached.
{code:java}
0: jdbc:hive2://localhost:10000> create database mydb;
0: jdbc:hive2://localhost:10000> use mydb;
{code}
{code:java}
0: jdbc:hive2://localhost:10000> create table stage(num int, name string, category string) row format delimited fields terminated by ',' stored as textfile;{code}
{code:java}
0: jdbc:hive2://localhost:10000> load data local inpath 'data' into table stage;{code}
{code:java}
0: jdbc:hive2://localhost:10000> select * from stage;
+------------+-------------+-----------------+
| stage.num  | stage.name  | stage.category  |
+------------+-------------+-----------------+
| 1          | apple       | Fruit           |
| 2          | banana      | Fruit           |
| 3          | carrot      | vegetable       |
| 4          | cherry      | Fruit           |
| 5          | potato      | vegetable       |
| 6          | mango       | Fruit           |
| 7          | tomato      | vegetable       |
+------------+-------------+-----------------+
7 rows selected (2.688 seconds)
{code}
{code:java}
0: jdbc:hive2://localhost:10000> create table skew(num int, name string, category string) skewed by(category) on ('Fruit','Vegetable') stored as directories row format delimited fields terminated by ',' stored as textfile;{code}
{code:java}
0: jdbc:hive2://localhost:10000> insert into skew select * from stage;{code}

2. Check the skew table data in the warehouse directory.
Table was created with the *skewed by(category) on ('Fruit','Vegetable')* clause. {color:#de350b}*But there is no directory created for category=fruit.*{color} Data related to category Fruit is present in the HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME directory itself.

Internally, skewed values are stored in lower case, and the user data is expected to be in the same lower case (i.e., the comparison is case sensitive). Thus, the directory for fruit is not created.
{code:java}
kvenureddy@192 mydb.db % cd skew
kvenureddy@192 skew % ls
kvenureddy@192 skew % ls
HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME    category=vegetable
kvenureddy@192 skew % pwd
/tmp/warehouse/external/mydb.db/skew
kvenureddy@192 skew % cd HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME
kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % ls
000000_0
kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % cat 000000_0
1,apple,Fruit
2,banana,Fruit
4,cherry,Fruit
6,mango,Fruit
kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % cd ../
kvenureddy@192 skew % cd category=vegetable
kvenureddy@192 category=vegetable % ls
000000_0
kvenureddy@192 category=vegetable % cat 000000_0
3,carrot,vegetable
5,potato,vegetable
7,tomato,vegetable
kvenureddy@192 category=vegetable %
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
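
For readers who want to see the mismatch in isolation, the behaviour described in this report can be modelled with a small Python sketch. It is not Hive code: the function names (`store_skewed_values`, `target_dir`, `route`) are hypothetical, and the model simply assumes what the description states, namely that the declared skewed values are lower-cased when stored while each row's value is compared against them case-sensitively.

```python
# Illustrative model only; the names here are hypothetical, not Hive APIs.
DEFAULT_DIR = "HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME"

# Rows from the stage table in the report.
ROWS = [
    (1, "apple", "Fruit"),
    (2, "banana", "Fruit"),
    (3, "carrot", "vegetable"),
    (4, "cherry", "Fruit"),
    (5, "potato", "vegetable"),
    (6, "mango", "Fruit"),
    (7, "tomato", "vegetable"),
]


def store_skewed_values(declared):
    """Assumption taken from the report: skewed values are stored in lower case."""
    return [value.lower() for value in declared]


def target_dir(column, value, stored_values):
    """Pick a directory using a case-sensitive match against the stored values."""
    if value in stored_values:
        return f"{column}={value}"
    return DEFAULT_DIR


def route(rows, column_index, column, stored_values):
    """Group rows by the directory they would be written to."""
    layout = {}
    for row in rows:
        directory = target_dir(column, row[column_index], stored_values)
        layout.setdefault(directory, []).append(row)
    return layout


stored = store_skewed_values(["Fruit", "Vegetable"])
layout = route(ROWS, 2, "category", stored)
for directory in sorted(layout):
    print(directory, [row[1] for row in layout[directory]])
```

Under these assumptions the four `Fruit` rows fall into the default list-bucketing directory and only `category=vegetable` gets its own directory, which matches the warehouse listing above.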