Edward Capriolo created HIVE-3083:
-------------------------------------

Summary: In local mode bucketing does not work
Key: HIVE-3083
URL: https://issues.apache.org/jira/browse/HIVE-3083
Project: Hive
Issue Type: Bug
Affects Versions: 0.9.0, 0.8.1, 0.7.1
Reporter: Edward Capriolo
Assignee: Edward Capriolo
Priority: Critical
In local mode, Hive bucketing does not work: inserting into a table declared with N buckets writes a single file instead of N. None of the bucketing unit tests assert that N files are actually created, so I am willing to bet those tests are producing false positives as well.

To reproduce:

[edward@tablitha hive-0.9.0-bin]$ bin/hive
hive> create table numbersflat(number int);
hive> load data local inpath '/home/edward/numbers' into table numbersflat;
Copying data from file:/home/edward/numbers
Copying file: file:/home/edward/numbers
Loading data to table default.numbersflat
OK
Time taken: 0.288 seconds
hive> select * from numbersflat;
OK
1
2
3
4
5
6
7
8
9
10
Time taken: 0.274 seconds
hive> CREATE TABLE numbers_bucketed(number int,number1 int) CLUSTERED BY (number) INTO 3 BUCKETS;
OK
Time taken: 0.082 seconds
hive> set hive.enforce.bucketing = true;
hive> set hive.exec.reducers.max = 200;
hive> set hive.merge.mapfiles=false;
hive>
    > insert OVERWRITE table numbers_bucketed select number,number+1 as number1 from numbersflat;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 3
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
12/06/04 00:50:35 WARN conf.HiveConf: hive-site.xml not found on CLASSPATH
Execution log at: /tmp/edward/edward_20120604005050_e17eb952-af76-4cf3-aee1-93bd59e74517.log
Job running in-process (local Hadoop)
Hadoop job information for null: number of mappers: 0; number of reducers: 0
2012-06-04 00:50:47,938 null map = 0%, reduce = 0%
2012-06-04 00:50:48,940 null map = 100%, reduce = 0%
2012-06-04 00:50:49,942 null map = 100%, reduce = 100%
Ended Job = job_local_0001
Execution completed successfully
Mapred Local Task Succeeded .
Convert the Join into MapJoin
Loading data to table default.numbers_bucketed
Deleted file:/user/hive/warehouse/numbers_bucketed
Table default.numbers_bucketed stats: [num_partitions: 0, num_files: 1, num_rows: 10, total_size: 43, raw_data_size: 33]
OK
Time taken: 16.722 seconds
hive> dfs -ls /user/hive/warehouse/numbers_bucketed;
Found 1 items
-rwxrwxrwx 1 edward edward 43 2012-06-04 00:50 /user/hive/warehouse/numbers_bucketed/000000_0
hive> dfs -ls /user/hive/warehouse/numbers_bucketed/000000_0;
Found 1 items
-rwxrwxrwx 1 edward edward 43 2012-06-04 00:50 /user/hive/warehouse/numbers_bucketed/000000_0
hive> cat /user/hive/warehouse/numbers_bucketed/000000_0;
FAILED: Parse Error: line 1:0 cannot recognize input near 'cat' '/' 'user'
hive> dfs -cat /user/hive/warehouse/numbers_bucketed/000000_0;
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
9 10
10 11
hive>

The plan calls for 3 reduce tasks (one per bucket), but the job runs in-process reporting 0 reducers, and the table ends up with num_files: 1. All 10 rows land in the single file 000000_0 instead of being spread across 3 bucket files.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
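For comparison with the single 000000_0 file in the transcript, here is a rough sketch of which rows of numbers_bucketed would be expected in each of the 3 buckets. It assumes (this is my understanding, not something the report states) that for a single int bucketing column Hive hashes the value to itself, masks it to a non-negative 32-bit int, and takes it modulo the bucket count. The helpers `expected_bucket` and `expected_layout` are hypothetical illustrations, not Hive APIs.

```python
from typing import Dict, Iterable, List


def expected_bucket(value: int, num_buckets: int) -> int:
    # Assumption: an int column hashes to its own value; the hash is masked
    # to a non-negative 32-bit int and reduced modulo the bucket count.
    return (value & 0x7FFFFFFF) % num_buckets


def expected_layout(values: Iterable[int], num_buckets: int) -> Dict[int, List[int]]:
    # Group each row's bucketing-column value under the bucket it should land in.
    layout: Dict[int, List[int]] = {b: [] for b in range(num_buckets)}
    for v in values:
        layout[expected_bucket(v, num_buckets)].append(v)
    return layout


if __name__ == "__main__":
    # numbersflat holds 1..10 and numbers_bucketed is CLUSTERED BY (number) INTO 3 BUCKETS.
    for bucket, rows in expected_layout(range(1, 11), 3).items():
        print(bucket, rows)
```

Under that assumption the script prints `0 [3, 6, 9]`, `1 [1, 4, 7, 10]` and `2 [2, 5, 8]`, i.e. three non-empty bucket files would be expected rather than one.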
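The report points out that the bucketing tests never assert that N files are created. A minimal sketch of such an assertion might look like the following, assuming local mode, where the warehouse shown in the transcript (file:/user/hive/warehouse/...) is on the local filesystem and a plain directory listing suffices. The helper names `data_files` and `assert_bucket_file_count` are hypothetical; this is not Hive's test harness, and against HDFS you would list the directory through a Hadoop client instead.

```python
import os
import tempfile
from typing import List


def data_files(table_dir: str) -> List[str]:
    # List regular files in a table directory, skipping hidden and
    # underscore-prefixed entries such as .crc checksums or _SUCCESS markers.
    return sorted(
        name
        for name in os.listdir(table_dir)
        if not name.startswith((".", "_"))
        and os.path.isfile(os.path.join(table_dir, name))
    )


def assert_bucket_file_count(table_dir: str, num_buckets: int) -> List[str]:
    # Fail loudly unless the table directory holds exactly one file per bucket.
    files = data_files(table_dir)
    if len(files) != num_buckets:
        raise AssertionError(
            f"expected {num_buckets} bucket files in {table_dir}, "
            f"found {len(files)}: {files}"
        )
    return files


if __name__ == "__main__":
    # Reproduce the layout from the report: one file for a 3-bucket table.
    with tempfile.TemporaryDirectory() as table_dir:
        open(os.path.join(table_dir, "000000_0"), "w").close()
        try:
            assert_bucket_file_count(table_dir, num_buckets=3)
        except AssertionError as err:
            print(err)
```

Pointed at /user/hive/warehouse/numbers_bucketed after the insert above, a check like this would fail with "found 1" rather than silently passing.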