Thanks, Dean. Does that mean this bucketing is exclusively a Hive feature and not available to others like Java, Pig, etc.?
And also, my final tables have to be managed tables, not external tables, right?

Thanks again for your time and help.

Sadu

On Fri, Mar 29, 2013 at 5:57 PM, Dean Wampler <dean.wamp...@thinkbiganalytics.com> wrote:

> I don't know of any way to avoid creating new tables and moving the data.
> In fact, that's the official way to do it, from a temp table to the final
> table, so Hive can ensure the bucketing is done correctly:
>
> https://cwiki.apache.org/Hive/languagemanual-ddl-bucketedtables.html
>
> In other words, you might have a big move now, but going forward, you'll
> want to stage your data in a temp table, use this procedure to put it in
> the final location, then delete the temp data.
>
> dean
>
> On Fri, Mar 29, 2013 at 4:58 PM, Sadananda Hegde <saduhe...@gmail.com> wrote:
>
>> Hello,
>>
>> We run M/R jobs to parse and process large and highly complex XML files
>> into Avro files. Then we build external Hive tables on top of the parsed
>> Avro files. The Hive tables are partitioned by day, but the partitions
>> are still huge and joins do not perform that well. So I would like to try
>> out creating buckets on the join key. How do I create the buckets on the
>> existing HDFS files? I would prefer to avoid creating another set of
>> tables (bucketed) and loading data from the non-bucketed tables into the
>> bucketed tables, if at all possible. Is it possible to do the bucketing
>> in Java as part of the M/R jobs while creating the Avro files?
>>
>> Any help / insight would be greatly appreciated.
>>
>> Thank you very much for your time and help.
>>
>> Sadu
>
> --
> *Dean Wampler, Ph.D.*
> thinkbiganalytics.com
> +1-312-339-1330
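
On the question of doing the bucketing in Java: below is a minimal sketch of how an M/R job might derive a bucket number from an integer join key. It assumes the common convention of clearing the hash's sign bit and taking the remainder modulo the bucket count, and it assumes an int key whose hash is the value itself. The class name `BucketSketch`, the method `bucketFor`, and the bucket count of 32 are hypothetical names and values chosen for illustration. Hive's own hashing depends on the column type and the Hive version, so nothing here guarantees that files written this way would match the layout Hive expects for a bucketed table.

```java
// Illustrative only: BucketSketch and bucketFor are hypothetical names,
// not part of Hive or Hadoop.
final class BucketSketch {
    private BucketSketch() {}

    // Assumed convention: clear the sign bit so the result is never
    // negative, then take the remainder modulo the number of buckets.
    static int bucketFor(int hash, int numBuckets) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive");
        }
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        int numBuckets = 32; // hypothetical bucket count
        int[] joinKeys = {0, 42, 1000, -1};
        for (int key : joinKeys) {
            // For an int key, the hash is assumed to be the key itself.
            System.out.println(key + " -> bucket " + bucketFor(key, numBuckets));
        }
    }
}
```

Using such a function to route records to reducers would put each bucket's rows in a single output file. Whether Hive would then treat those files as valid buckets of a table declared with a matching CLUSTERED BY clause is exactly the open question in this thread, and the sketch does not settle it.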