I want to convert a table to a bucketed table, so I made a new table with the same schema as the old table and specified a cluster column:

    create table foo_bucketed ( a string, b int, c float ) clustered by (b) into 10 buckets;

Then I populate it from my original table:

    set hive.enforce.bucketing = true;
    insert overwrite table foo_bucketed select * from foo;

All of the data goes into the first bucket, leaving the remaining 9 buckets empty (in the file system, the remaining 9 files are 0 size). Furthermore, the cluster column is now NULL. Its values have been completely erased by the insertion (which might explain how they all ended up in a single bucket, of course).

Keith Wiley
kwi...@keithwiley.com
keithwiley.com
music.keithwiley.com