[ https://issues.apache.org/jira/browse/HIVE-15148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15646162#comment-15646162 ]
Sergey Shelukhin edited comment on HIVE-15148 at 11/8/16 2:00 AM:
------------------------------------------------------------------

[~ashutoshc] [~jdere] Do you have any input, or do you know who the bucketing expert would be? I can make a patch if there's consensus.

> disallow loading data into bucketed tables (by default?)
> --------------------------------------------------------
>
>                 Key: HIVE-15148
>                 URL: https://issues.apache.org/jira/browse/HIVE-15148
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>
> A few q file tests still use the following pattern, which is currently allowed:
> {noformat}
> CREATE TABLE bucket_small (key STRING, value STRING) PARTITIONED BY (ds STRING) CLUSTERED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '../../data/files/smallsrcsortbucket1outof4.txt' INTO TABLE bucket_small PARTITION (ds='2008-04-08');
> LOAD DATA LOCAL INPATH '../../data/files/smallsrcsortbucket2outof4.txt' INTO TABLE bucket_small PARTITION (ds='2008-04-08');
> {noformat}
> This relies on the user to load the correct number of files, each containing correctly hashed data, with file names in the correct order. If any of these is wrong, queries will fail, or may produce incorrect results when bucket-based optimizations kick in.
> Additionally, even if the user does everything correctly, as far as I know some code derives the bucket number from the file name, which won't work in this case (as opposed to code that assigns buckets based on the order of files, which works here but won't work as per HIVE-14970... sigh).
> Hive enforces bucketing in other cases (the check cannot even be disabled these days), so I suggest that we either prohibit the above outright, or at least add a safety config setting that disallows it by default.
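To make the failure modes above concrete, here is a minimal Python sketch of the checks a user implicitly has to get right when loading files into a bucketed table. It is not Hive code. It assumes Hive's older (bucketing version 1) hash for a single STRING bucketing column, as we understand it: a Java-style `31 * h + b` hash over the value's UTF-8 bytes as signed bytes, then `(hash & Integer.MAX_VALUE) % numBuckets`. It also assumes a hypothetical name-based lookup that expects `000000_0`-style file names. The function names and the in-memory file map are hypothetical.

```python
import re

# Hypothetical name pattern: assumes bucket files are named like "000001_0",
# with the bucket id in the leading digits.
BUCKET_FILE_NAME = re.compile(r"^(\d+)_\d+")


def legacy_string_hash(value: str) -> int:
    """Assumed legacy STRING hash: h = 31*h + b over the UTF-8 bytes,
    with b as a signed Java byte and h wrapped to a signed 32-bit int."""
    h = 0
    for b in value.encode("utf-8"):
        signed_b = b - 256 if b > 127 else b  # Java bytes are signed
        h = (31 * h + signed_b) & 0xFFFFFFFF  # wrap like 32-bit int arithmetic
    return h - (1 << 32) if h >= (1 << 31) else h  # reinterpret as signed int


def bucket_for(key: str, num_buckets: int) -> int:
    """Bucket id computed as (hash & Integer.MAX_VALUE) % num_buckets."""
    return (legacy_string_hash(key) & 0x7FFFFFFF) % num_buckets


def check_loaded_files(files: dict[str, list[str]], num_buckets: int) -> list[str]:
    """Return the problems found in a set of files loaded with LOAD DATA.

    `files` maps each file name to the bucketing-column keys it contains
    (a hypothetical in-memory stand-in for the real data files).
    """
    problems = []
    # 1. Exactly one file per bucket.
    if len(files) != num_buckets:
        problems.append(f"expected {num_buckets} files, found {len(files)}")
    for position, name in enumerate(sorted(files)):
        # 2. Order-based lookup: the i-th file (sorted by name) is bucket i,
        #    so every key in it must hash to i.
        for key in files[name]:
            expected = bucket_for(key, num_buckets)
            if expected != position:
                problems.append(f"{name}: key {key!r} belongs in bucket {expected}, not {position}")
        # 3. Name-based lookup: the bucket id must be parseable from the
        #    file name and agree with the file's position.
        match = BUCKET_FILE_NAME.match(name)
        if match is None:
            problems.append(f"{name}: no bucket id in file name")
        elif int(match.group(1)) != position:
            problems.append(f"{name}: name says bucket {int(match.group(1))}, position is {position}")
    return problems


if __name__ == "__main__":
    # Hive-style names with correctly hashed keys: no problems reported.
    print(check_loaded_files({"000000_0": ["b"], "000001_0": ["a"]}, 2))
    # q-file-style names with correctly hashed keys: the order-based check
    # passes, but both files fail the name-based check.
    print(check_loaded_files({"smallsrcsortbucket1outof4.txt": ["b"],
                              "smallsrcsortbucket2outof4.txt": ["a"]}, 2))
```

The sketch checks order-based and name-based lookups side by side because, as the issue notes, the same set of files can satisfy one and fail the other. Real bucketed data files would also need the hash function that matches the table's bucketing version, which this sketch does not model.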