On bucketing : fewer files than buckets.

Ajo Fod Mon, 17 Jan 2011 11:03:55 -0800

Hello,

In the documentation I read that each partition gets as many files as
there are buckets. In the following sample script I created 32 buckets,
but I find only 2 files in each partition directory. Am I missing
something?


In this sample script, I load a tab-separated file from disk into the
table trades and then transfer the data into alltrades, based on the
example at:
http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL/BucketedTables

BTW, another question: how does one put comments in a hive.q file?

-------- sample script ------------
SET hive.enforce.bucketing=TRUE;

CREATE TABLE trades
       (symbol STRING, time STRING, exchange STRING, price FLOAT, volume INT)
PARTITIONED BY (dt STRING)
CLUSTERED BY (symbol)
SORTED BY (time ASC)
INTO 1 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH 'data/2001-05-22'
     INTO TABLE trades
     PARTITION (dt='2001-05-22');

CREATE TABLE alltrades
       (symbol STRING, time STRING, exchange STRING, price FLOAT, volume INT)
PARTITIONED BY (dt STRING)
CLUSTERED BY (symbol)
SORTED BY (time ASC)
INTO 32 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

FROM trades
INSERT OVERWRITE TABLE alltrades
PARTITION (dt='2001-05-22')
SELECT symbol, time, exchange, price, volume
WHERE dt='2001-05-22';
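
For reference, a hedged sketch of how the file count could be checked and of the manual alternative that the wiki page above describes for when hive.enforce.bucketing is not relied on. It is not a confirmed fix for the behaviour reported here. The warehouse path is an assumption (the default /user/hive/warehouse location), and using DISTRIBUTE BY with SORT BY in place of the wiki's CLUSTER BY is an adaptation to match the SORTED BY (time ASC) clause of alltrades. The sketch also uses `--` lines, which Hive script files treat as comments.

```sql
-- Lines starting with two dashes are comments in a Hive script file.

-- List the files written for the partition. The path below is an
-- assumption: it presumes the default warehouse directory.
dfs -ls /user/hive/warehouse/alltrades/dt=2001-05-22;

-- Manual alternative: set the reducer count to the bucket count and
-- distribute rows on the bucketing column explicitly.
SET mapred.reduce.tasks=32;

FROM trades
INSERT OVERWRITE TABLE alltrades
PARTITION (dt='2001-05-22')
SELECT symbol, time, exchange, price, volume
WHERE dt='2001-05-22'
DISTRIBUTE BY symbol
SORT BY time ASC;
```

Each reducer writes one output file, so comparing the number of files listed by dfs -ls with the reducer count of the insert job shows whether the 32-bucket setting was actually applied.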
