[ https://issues.apache.org/jira/browse/HIVE-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yongzhi Chen updated HIVE-10880: -------------------------------- Priority: Critical (was: Blocker) > The bucket number is not respected in insert overwrite. > ------------------------------------------------------- > > Key: HIVE-10880 > URL: https://issues.apache.org/jira/browse/HIVE-10880 > Project: Hive > Issue Type: Bug > Affects Versions: 1.2.0 > Reporter: Yongzhi Chen > Assignee: Yongzhi Chen > Priority: Critical > Attachments: HIVE-10880.1.patch, HIVE-10880.2.patch, > HIVE-10880.3.patch > > > When hive.enforce.bucketing is true, the bucket number defined in the table > is no longer respected in current master and 1.2. This is a regression. > Reproduce: > {noformat} > CREATE TABLE IF NOT EXISTS buckettestinput( > data string > ) > ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; > CREATE TABLE IF NOT EXISTS buckettestoutput1( > data string > )CLUSTERED BY(data) > INTO 2 BUCKETS > ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; > CREATE TABLE IF NOT EXISTS buckettestoutput2( > data string > )CLUSTERED BY(data) > INTO 2 BUCKETS > ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; > Then I inserted the following data into the "buckettestinput" table > firstinsert1 > firstinsert2 > firstinsert3 > firstinsert4 > firstinsert5 > firstinsert6 > firstinsert7 > firstinsert8 > secondinsert1 > secondinsert2 > secondinsert3 > secondinsert4 > secondinsert5 > secondinsert6 > secondinsert7 > secondinsert8 > set hive.enforce.bucketing = true; > set hive.enforce.sorting=true; > insert overwrite table buckettestoutput1 > select * from buckettestinput where data like 'first%'; > set hive.auto.convert.sortmerge.join=true; > set hive.optimize.bucketmapjoin = true; > set hive.optimize.bucketmapjoin.sortedmerge = true; > select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data); > Error: Error while compiling statement: FAILED: SemanticException [Error > 10141]: Bucketed table metadata is not correct. Fix the metadata or don't use > bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number > of buckets for table buckettestoutput1 is 2, whereas the number of files is 1 > (state=42000,code=10141) > {noformat} > The related debug information related to insert overwrite: > {noformat} > 0: jdbc:hive2://localhost:10000> insert overwrite table buckettestoutput1 > select * from buckettestinput where data like 'first%'insert overwrite table > buckettestoutput1 > 0: jdbc:hive2://localhost:10000> ; > select * from buckettestinput where data like ' > first%'; > INFO : Number of reduce tasks determined at compile time: 2 > INFO : In order to change the average load for a reducer (in bytes): > INFO : set hive.exec.reducers.bytes.per.reducer=<number> > INFO : In order to limit the maximum number of reducers: > INFO : set hive.exec.reducers.max=<number> > INFO : In order to set a constant number of reducers: > INFO : set mapred.reduce.tasks=<number> > INFO : Job running in-process (local Hadoop) > INFO : 2015-06-01 11:09:29,650 Stage-1 map = 86%, reduce = 100% > INFO : Ended Job = job_local107155352_0001 > INFO : Loading data to table default.buckettestoutput1 from > file:/user/hive/warehouse/buckettestoutput1/.hive-staging_hive_2015-06-01_11-09-28_166_3109203968904090801-1/-ext-10000 > INFO : Table default.buckettestoutput1 stats: [numFiles=1, numRows=4, > totalSize=52, rawDataSize=48] > No rows affected (1.692 seconds) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)