Thanks, Dean.

Does that mean this bucketing is exclusively a Hive feature and not
available to other tools like Java, Pig, etc.?

And also, my final tables have to be managed tables, not external tables,
right?

Thanks again for your time and help.

Sadu
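
For anyone following the thread, the staged-load procedure Dean describes below can be sketched in HiveQL roughly as follows (table, column, and partition names here are made up for illustration; the bucket count is an arbitrary choice):

```sql
-- Final managed table, bucketed on the join key.
CREATE TABLE events_bucketed (
  event_id STRING,
  payload  STRING
)
PARTITIONED BY (dt STRING)
CLUSTERED BY (event_id) INTO 32 BUCKETS;

-- Have Hive pick the right number of reducers and enforce bucketing on insert.
SET hive.enforce.bucketing = true;

-- Populate one partition from the unbucketed staging table, then the
-- staging data for that day can be deleted.
INSERT OVERWRITE TABLE events_bucketed PARTITION (dt = '2013-03-29')
SELECT event_id, payload
FROM events_staging
WHERE dt = '2013-03-29';
```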



On Fri, Mar 29, 2013 at 5:57 PM, Dean Wampler <
dean.wamp...@thinkbiganalytics.com> wrote:

> I don't know of any way to avoid creating new tables and moving the data.
> In fact, that's the official way to do it, from a temp table to the final
> table, so Hive can ensure the bucketing is done correctly:
>
>  https://cwiki.apache.org/Hive/languagemanual-ddl-bucketedtables.html
>
> In other words, you might have a big move now, but going forward, you'll
> want to stage your data in a temp table, use this procedure to put it in
> the final location, then delete the temp data.
>
> dean
>
> On Fri, Mar 29, 2013 at 4:58 PM, Sadananda Hegde <saduhe...@gmail.com>wrote:
>
>> Hello,
>>
>> We run M/R jobs to parse and process large and highly complex xml files
>> into AVRO files. Then we build external Hive tables on top of the parsed Avro
>> files. The Hive tables are partitioned by day; but they are still huge
>> partitions and joins do not perform that well. So I would like to try
>> out creating buckets on the join key. How do I create the buckets on the
>> existing HDFS files? I would prefer to avoid creating another set of tables
>> (bucketed) and load data from non-bucketed table to bucketed tables if at
>> all possible. Is it possible to do the bucketing in Java as part of the M/R
>> jobs while creating the Avro files?
>>
>> Any help / insight would be greatly appreciated.
>>
>> Thank you very much for your time and help.
>>
>> Sadu
>>
>
>
>
> --
> *Dean Wampler, Ph.D.*
> thinkbiganalytics.com
> +1-312-339-1330
>
>
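
On the question of doing the bucketing in Java: a bucketed layout is essentially a file-layout convention — each row goes to bucket hash(key) mod numBuckets, with one output file per bucket. A minimal sketch of that assignment in Java follows, assuming String join keys and Java's String.hashCode; note this is an assumption — Hive computes its hash via its own object inspectors, and it is not guaranteed to match String.hashCode, which is why the wiki recommends letting Hive do the load:

```java
public class BucketAssigner {
    // Assign a row to a bucket: non-negative hash of the key, modulo the
    // bucket count. Masking with Integer.MAX_VALUE keeps the result
    // non-negative even when hashCode() is negative.
    public static int bucketFor(String joinKey, int numBuckets) {
        return (joinKey.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        // In an M/R job this function would drive a custom Partitioner,
        // with the number of reduce tasks set to the bucket count so each
        // reducer writes exactly one bucket file.
        System.out.println(bucketFor("order-12345", 32));
    }
}
```

Even if an M/R job lays files out this way, Hive has no way to verify the hashing matches its own, so joins that rely on bucketing could silently produce wrong results — which is the reason for the temp-table procedure above.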
