Sergey Shelukhin created HIVE-15148:
---------------------------------------

             Summary: disallow loading data into bucketed tables by default
                 Key: HIVE-15148
                 URL: https://issues.apache.org/jira/browse/HIVE-15148
             Project: Hive
          Issue Type: Bug
            Reporter: Sergey Shelukhin


A few q file tests still use the following pattern, which is currently allowed:
{noformat}
CREATE TABLE bucket_small (key string, value string) partitioned by (ds string)
CLUSTERED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
load data local inpath '../../data/files/smallsrcsortbucket1outof4.txt'
INTO TABLE bucket_small partition(ds='2008-04-08');
load data local inpath '../../data/files/smallsrcsortbucket2outof4.txt'
INTO TABLE bucket_small partition(ds='2008-04-08');
{noformat}

This relies on the user to load the correct number of files, each with correctly 
hashed data and the correct name; if the user doesn't do that, queries will 
fail, or may produce incorrect results if bucket-based optimizations kick in.
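
To make "correctly hashed data" concrete, here is a minimal sketch of the check a user would effectively have to perform by hand. It assumes, for illustration only, that a row's bucket is (hash(key) & 0x7FFFFFFF) % num_buckets with a Java String.hashCode()-style hash over ASCII keys; Hive's actual hash depends on the column type and version, and the helper names (java_string_hash, bucket_for, check_bucket_files) are hypothetical.

```python
from typing import List

# Hypothetical helpers. ASSUMPTION: a row's bucket is
# (hash(key) & 0x7FFFFFFF) % num_buckets, using a Java String.hashCode()-style
# hash over ASCII keys. Hive's real hash depends on column type and version.


def java_string_hash(s: str) -> int:
    """Java-style String.hashCode(): h = 31*h + c, as a signed 32-bit int."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep 32 bits, like Java int overflow
    return h - (1 << 32) if h >= (1 << 31) else h


def bucket_for(key: str, num_buckets: int) -> int:
    # Mask off the sign bit so the modulo result is never negative.
    return (java_string_hash(key) & 0x7FFFFFFF) % num_buckets


def check_bucket_files(files: List[List[str]], num_buckets: int) -> List[str]:
    """files[i] holds the keys stored in the i-th file; returns detected problems."""
    problems = []
    if len(files) != num_buckets:
        problems.append(f"expected {num_buckets} files, got {len(files)}")
    for i, keys in enumerate(files):
        for key in keys:
            b = bucket_for(key, num_buckets)
            if b != i:
                problems.append(f"file {i}: key {key!r} belongs in bucket {b}")
    return problems
```

Under these assumptions, check_bucket_files([["b"], ["a", "c"]], 2) reports no problems, while loading the same two files in the opposite order reports every key as misplaced, and loading only one of the two files reports a missing bucket.
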
Additionally, even if the user does everything correctly, as far as I know some 
code derives the bucket number from the file name, which won't work in this case 
(as opposed to code that derives buckets from the order of files, which will work 
here but won't work as per HIVE-14970... sigh).
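
The two approaches can be sketched as follows, to show why the loaded files break name-based lookup. This assumes, without confirming it against Hive's code, that Hive-written bucket files are named like 000000_0, 000001_0 with the bucket id as the numeric prefix; the function names are hypothetical and do not correspond to Hive's own methods.

```python
import re
from typing import Dict, List, Optional

# ASSUMPTION: Hive-written bucket files are named like "000000_0", "000001_0",
# where the numeric prefix is the bucket id. Function names are hypothetical.
_BUCKET_PREFIX = re.compile(r"^(\d+)_\d+")


def bucket_id_from_name(file_name: str) -> Optional[int]:
    """Name-based: parse the bucket id from the file name; None if no match."""
    m = _BUCKET_PREFIX.match(file_name)
    return int(m.group(1)) if m else None


def bucket_ids_from_order(file_names: List[str]) -> Dict[str, int]:
    """Order-based: the i-th file in sorted order is treated as bucket i."""
    return {name: i for i, name in enumerate(sorted(file_names))}
```

For the files in the q test above, bucket_id_from_name("smallsrcsortbucket1outof4.txt") returns None because the name carries no parsable bucket id, whereas bucket_ids_from_order maps smallsrcsortbucket1outof4.txt to 0 and smallsrcsortbucket2outof4.txt to 1, which happens to be correct here.
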

Hive enforces bucketing in other cases (these days the check cannot even be 
disabled), so I suggest that we either prohibit the above outright or at least 
add a safety config setting that prohibits it by default.
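
A minimal sketch of the second option, a check that rejects loads into bucketed tables unless explicitly overridden. The property name, the Table type, and check_load_allowed are all hypothetical and for illustration only; no such Hive setting is implied to exist.

```python
from dataclasses import dataclass
from typing import Dict

# HYPOTHETICAL: this property name is illustrative only; it is not an
# existing Hive configuration setting.
ALLOW_LOAD_INTO_BUCKETED = "hypothetical.load.data.allow.bucketed"


@dataclass
class Table:
    name: str
    num_buckets: int  # 0 means the table is not bucketed


class LoadRejectedError(Exception):
    pass


def check_load_allowed(table: Table, conf: Dict[str, str]) -> None:
    """Reject LOAD DATA into a bucketed table unless the override is set to true."""
    # Config values are strings; anything other than "true" keeps the safe default.
    allowed = conf.get(ALLOW_LOAD_INTO_BUCKETED, "false").strip().lower() == "true"
    if table.num_buckets > 0 and not allowed:
        raise LoadRejectedError(
            f"LOAD DATA into bucketed table {table.name} is disabled by default; "
            f"set {ALLOW_LOAD_INTO_BUCKETED}=true to override"
        )
```

Prohibiting the pattern outright would amount to dropping the override lookup and rejecting every load into a bucketed table.
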



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
