Ryan P created HIVE-10283: ----------------------------- Summary: HIVE-4240 may be causing issue with bucketed tables Key: HIVE-10283 URL: https://issues.apache.org/jira/browse/HIVE-10283 Project: Hive Issue Type: Bug Components: Hive Reporter: Ryan P
I suspect that by removing the reducer, HIVE-4240, may be causing issues. Because of this inserts will not consolidate 'buckets' into single files which is problematic when attempting to use bucketmapjoin. CREATE TABLE IF NOT EXISTS buckettestinput( data string ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; CREATE TABLE IF NOT EXISTS buckettestoutput1( data string )CLUSTERED BY(data) INTO 2 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; CREATE TABLE IF NOT EXISTS buckettestoutput2( data string )CLUSTERED BY(data) INTO 2 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; Then I inserted the following data into the "buckettestinput" table firstinsert1 firstinsert2 firstinsert3 firstinsert4 firstinsert5 firstinsert6 firstinsert7 firstinsert8 secondinsert1 secondinsert2 secondinsert3 secondinsert4 secondinsert5 secondinsert6 secondinsert7 secondinsert8 set hive.enforce.bucketing = true; set hive.enforce.sorting=true; insert into table buckettestoutput1 select * from buckettestinput where data like 'first%' SELECT * FROM buckettestoutput1 TABLESAMPLE(BUCKET 1 OUT OF 1 ON data) s; insert into table buckettestoutput1 select * from buckettestinput where data like 'second%' check the results of the table sample query. for sort merge bucket map join set hive.auto.convert.sortmerge.join=true; set hive.optimize.bucketmapjoin = true; set hive.optimize.bucketmapjoin.sortedmerge = true; set hive.auto.convert.sortmerge.join.noconditionaltask=true; select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data) hive> select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data); FAILED: SemanticException [Error 10141]: Bucketed table metadata is not correct. Fix the metadata or don't use bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number of buckets for table buckettestoutput1 is 2, whereas the number of files is 4 -- This message was sent by Atlassian JIRA (v6.3.4#6332)