I don't know of any way to avoid creating new tables and moving the data. In fact, that's the official way to do it: load from a temp table into the final table, so Hive can ensure the bucketing is done correctly:

https://cwiki.apache.org/Hive/languagemanual-ddl-bucketedtables.html

In other words, you might have a big move now, but going forward you'll want to stage your data in a temp table, use this procedure to put it in the final location, then delete the temp data.

dean

On Fri, Mar 29, 2013 at 4:58 PM, Sadananda Hegde <saduhe...@gmail.com> wrote:

> Hello,
>
> We run M/R jobs to parse and process large and highly complex XML files
> into Avro files. Then we build external Hive tables on top of the parsed
> Avro files. The Hive tables are partitioned by day, but the partitions are
> still huge and joins do not perform well. So I would like to try creating
> buckets on the join key. How do I create the buckets on the existing HDFS
> files? I would prefer to avoid creating another set of (bucketed) tables
> and loading data from the non-bucketed tables into the bucketed ones, if
> at all possible. Is it possible to do the bucketing in Java as part of the
> M/R jobs while creating the Avro files?
>
> Any help / insight would be greatly appreciated.
>
> Thank you very much for your time and help.
>
> Sadu

--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
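[Editor's note: a minimal HiveQL sketch of the staging procedure described above. The table names (`page_views_staging`, `page_views_bucketed`), columns (`user_id`, `url`, `dt`), and bucket count are placeholders, not from this thread.]

```sql
-- Assume the existing external Avro table serves as the staging table.

-- Create the final bucketed table, clustered on the join key.
CREATE TABLE page_views_bucketed (
  user_id BIGINT,
  url     STRING
)
PARTITIONED BY (dt STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS;  -- bucket count is an example

-- Let Hive enforce the bucketing on insert (sets the reducer count
-- to the number of buckets and hashes rows on the CLUSTERED BY key).
SET hive.enforce.bucketing = true;

-- Populate one partition from the staging table.
INSERT OVERWRITE TABLE page_views_bucketed PARTITION (dt = '2013-03-29')
SELECT user_id, url
FROM page_views_staging
WHERE dt = '2013-03-29';
```

After verifying a partition, the corresponding staging data can be deleted.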