I don't know of any way to avoid creating new tables and moving the data.
In fact, that's the official way to do it: load from a temp table into the
final table so that Hive can ensure the bucketing is done correctly:

 https://cwiki.apache.org/Hive/languagemanual-ddl-bucketedtables.html

In other words, you might have a big move now, but going forward, you'll
want to stage your data in a temp table, use this procedure to put it in
the final location, then delete the temp data.
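For example, here's a minimal, untested sketch; the table names, columns,
join key, and bucket count are all placeholders to adapt to your schema:

  -- Final table, bucketed on the join key (all names hypothetical;
  -- storage clauses, e.g. the Avro SerDe, omitted for brevity).
  CREATE TABLE events_bucketed (user_id BIGINT, payload STRING)
  PARTITIONED BY (day STRING)
  CLUSTERED BY (user_id) INTO 32 BUCKETS;

  -- Have Hive set the reducer count to match the bucket count.
  SET hive.enforce.bucketing = true;

  -- Populate one day's partition from the unbucketed staging table.
  INSERT OVERWRITE TABLE events_bucketed PARTITION (day = '2013-03-29')
  SELECT user_id, payload
  FROM events_staging
  WHERE day = '2013-03-29';

With hive.enforce.bucketing set, Hive uses one reducer per bucket, so the
output files line up with the table's bucket definition.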

dean

On Fri, Mar 29, 2013 at 4:58 PM, Sadananda Hegde <saduhe...@gmail.com> wrote:

> Hello,
>
> We run M/R jobs to parse and process large, highly complex XML files
> into Avro files. Then we build external Hive tables on top of the parsed
> Avro files. The Hive tables are partitioned by day, but the partitions
> are still huge and joins do not perform that well. So I would like to
> try creating buckets on the join key. How do I create the buckets on the
> existing HDFS files? I would prefer to avoid creating another set of
> (bucketed) tables and loading data from the non-bucketed tables into the
> bucketed tables, if at all possible. Is it possible to do the bucketing
> in Java as part of the M/R jobs while creating the Avro files?
>
> Any help / insight would greatly be appreciated.
>
> Thank you very much for your time and help.
>
> Sadu
>



-- 
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
