Venugopal Reddy K created HIVE-26861:
----------------------------------------

             Summary: Skewed column table load does not work as expected if the 
user data for the skewed column is not in lowercase.
                 Key: HIVE-26861
                 URL: https://issues.apache.org/jira/browse/HIVE-26861
             Project: Hive
          Issue Type: Bug
            Reporter: Venugopal Reddy K
         Attachments: data

*[Description]*

A skewed table does not work as expected when the data in the skewed column 
is in mixed case. Skewed values are stored in lower case, and user data is 
expected to be in the same lower case (i.e., the comparison is case sensitive). 
Otherwise the rows are not placed in the directory for their skewed value.

*[Steps to reproduce]* 

1. Create a stage table, load some data into it, create a table with a 
skewed column, and load data into that table from the stage table. The data 
file is attached.

 
{code:java}
0: jdbc:hive2://localhost:10000> create database mydb;
0: jdbc:hive2://localhost:10000> use mydb;
{code}
 

 
{code:java}
0: jdbc:hive2://localhost:10000> create table stage(num int, name string, 
category string) row format delimited fields terminated by ',' stored as 
textfile;{code}
 

 
{code:java}
0: jdbc:hive2://localhost:10000> load data local inpath 'data' into table 
stage;{code}
 

 
{code:java}
0: jdbc:hive2://localhost:10000> select * from stage;
+------------+-------------+-----------------+
| stage.num  | stage.name  | stage.category  |
+------------+-------------+-----------------+
| 1          | apple       | Fruit           |
| 2          | banana      | Fruit           |
| 3          | carrot      | vegetable       |
| 4          | cherry      | Fruit           |
| 5          | potato      | vegetable       |
| 6          | mango       | Fruit           |
| 7          | tomato      | vegetable       |
+------------+-------------+-----------------+
7 rows selected (2.688 seconds)
{code}
 
{code:java}
0: jdbc:hive2://localhost:10000> create table skew(num int, name string, 
category string) skewed by(category) on ('Fruit','Vegetable') stored as 
directories row format delimited fields terminated by ',' stored as 
textfile;{code}
 

 
{code:java}
0: jdbc:hive2://localhost:10000> insert into skew select * from stage;{code}
 

2. Check the warehouse directory for the skew table data. The table was 
created with the *skewed by(category) on ('Fruit','Vegetable')* clause, *but 
no directory is created for category=fruit.* The rows with category Fruit are 
present in the HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME directory itself.

Internally, skewed values are stored in lower case, and user data is expected 
to be in the same lower case (i.e., the comparison is case sensitive). Thus, 
the directory for fruit is not created.
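
The mismatch described above can be illustrated with a minimal, standalone sketch. This is not Hive's actual code: the class SkewDirSketch and the methods storeSkewedValues and dirFor are hypothetical names, and the sketch only assumes what this report states, namely that the declared skewed values are lowercased when stored and that row values are compared against them case-sensitively.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical illustration only; this is NOT Hive's implementation.
public class SkewDirSketch {
    // Directory name taken from the warehouse listing in this report.
    static final String DEFAULT_DIR = "HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME";

    // Assumption from the report: declared skewed values are stored in lower case.
    static List<String> storeSkewedValues(List<String> declared) {
        return declared.stream()
                .map(v -> v.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    // Assumption from the report: the row value is matched case-sensitively
    // against the stored (lowercased) skewed values.
    static String dirFor(String column, String rowValue, List<String> stored) {
        return stored.contains(rowValue) ? column + "=" + rowValue : DEFAULT_DIR;
    }

    public static void main(String[] args) {
        // Values from: skewed by(category) on ('Fruit','Vegetable')
        List<String> stored = storeSkewedValues(List.of("Fruit", "Vegetable"));
        // Category values as they appear in the stage table.
        for (String value : List.of("Fruit", "vegetable")) {
            System.out.println(value + " -> " + dirFor("category", value, stored));
        }
    }
}
```

Under these assumptions, 'Fruit' falls through to the default directory while 'vegetable' gets its own directory, which is consistent with the listing in this report.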

 
{code:java}
kvenureddy@192 mydb.db % cd skew 
kvenureddy@192 skew % ls
kvenureddy@192 skew % ls
HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME category=vegetable
kvenureddy@192 skew % pwd
/tmp/warehouse/external/mydb.db/skew
kvenureddy@192 skew % cd HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME 
kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % ls
000000_0
kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % cat 000000_0 
1,apple,Fruit
2,banana,Fruit
4,cherry,Fruit
6,mango,Fruit
kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % cd ../
kvenureddy@192 skew % cd category=vegetable 
kvenureddy@192 category=vegetable % ls
000000_0
kvenureddy@192 category=vegetable % cat 000000_0 
3,carrot,vegetable
5,potato,vegetable
7,tomato,vegetable
kvenureddy@192 category=vegetable % 
{code}
 

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
