[ 
https://issues.apache.org/jira/browse/HIVE-15148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15646162#comment-15646162
 ] 

Sergey Shelukhin edited comment on HIVE-15148 at 11/8/16 2:00 AM:
------------------------------------------------------------------

[~ashutoshc] [~jdere] do you have any input? or do you know who would be the 
bucketing expert?
I can make a patch if there's consensus.


was (Author: sershe):
[~ashutoshc] [~jdere] do you have any input? or do you know who would be the 
bucketing expert?

> disallow loading data into bucketed tables (by default?)
> --------------------------------------------------------
>
>                 Key: HIVE-15148
>                 URL: https://issues.apache.org/jira/browse/HIVE-15148
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>
> A few q file tests still use the following, allowed, pattern:
> {noformat}
> CREATE TABLE bucket_small (key string, value string) partitioned by (ds 
> string) CLUSTERED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
> load data local inpath '../../data/files/smallsrcsortbucket1outof4.txt' INTO 
> TABLE bucket_small partition(ds='2008-04-08');
> load data local inpath '../../data/files/smallsrcsortbucket2outof4.txt' INTO 
> TABLE bucket_small partition(ds='2008-04-08');
> {noformat}
> This relies on the user to load the correct number of files with correctly 
> hashed data and the correct order of file names; if there's some discrepancy 
> in any of the above, the queries will fail or may produce incorrect results 
> if some bucket-based optimizations kick in.
> Additionally, even if the user does everything correctly, as far as I know 
> some code derives bucket number from file name, which won't work in this case 
> (as opposed to getting buckets based on the order of files, which will work 
> here but won't work as per  HIVE-14970... sigh).
> Hive enforces bucketing in other cases (the check cannot even be disabled 
> these days), so I suggest that we either prohibit the above outright, or at 
> least add a safety config setting that would disallow it by default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to