I tried to create a skewed table from the GroupLens 100k data set, with
the movie rating as the skewed column, but only one file gets created.
My understanding was that a separate file would be created for each
skewed value. Is there anything else that needs to be done?
Hive commands:
CREATE TABLE u_data (userid int, movieid int, rating int, unixtime string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;
CREATE TABLE u_data2 (userid int, movieid int, rating int, unixtime string)
SKEWED BY (rating) ON (3,4,5);
LOAD DATA LOCAL INPATH './ml-100k.base' OVERWRITE INTO TABLE u_data;
insert into u_data2 select * from u_data;
Output of hadoop fs -ls:
% hadoop fs -ls /user/hive/warehouse/u_data
Found 1 items
... 1792501 2013-12-26 15:06 /user/hive/warehouse/u_data/ua.base
% hadoop fs -ls /user/hive/warehouse/u_data2
Found 1 items
... 1792501 2013-12-26 15:22 /user/hive/warehouse/u_data2/000000_0