Thanks Bejoy

On Tue, Aug 14, 2012 at 2:00 PM, Bejoy Ks <bejoy...@yahoo.com> wrote:
> Hi Sudeep
>
> You can also look at join optimizations like map join, bucketed map
> join, sort merge join etc. and choose the right one that fits your
> requirement.
>
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins
>
> Regards,
> Bejoy KS
>
> ------------------------------
> *From:* sudeep tokala <sudeeptok...@gmail.com>
> *To:* user@hive.apache.org
> *Sent:* Tuesday, August 14, 2012 11:00 PM
> *Subject:* Re: OPTIMIZING A HIVE QUERY
>
> hi Bertrand,
>
> Thanks for the reply.
>
> My question was: every join in a Hive query translates into a MapReduce
> job, and a MapReduce job goes through serialization and deserialization
> of objects. Isn't that an overhead?
>
> "Store data in a smarter way"? Can you please elaborate on this?
>
> Regards
> Sudeep
>
> On Tue, Aug 14, 2012 at 11:39 AM, Bertrand Dechoux <decho...@gmail.com> wrote:
>
> You may want to be clearer. Is your question: how can I change the
> serialization strategy of Hive? (If so, I'll let other users answer; I am
> also interested in the answer.)
>
> Otherwise the answer is simple. If you want to join data which cannot be
> held in memory, you need to serialize it. The only alternative is to
> store the data in a smarter way which would not require you to do the join.
> By the way, how do you know the serialization is the bottleneck?
>
> Bertrand
>
> On Tue, Aug 14, 2012 at 5:11 PM, sudeep tokala <sudeeptok...@gmail.com> wrote:
>
> On Tue, Aug 14, 2012 at 11:08 AM, sudeep tokala <sudeeptok...@gmail.com> wrote:
>
> Hi all,
>
> How do I avoid serialization and deserialization overhead in a Hive join
> query? Will this improve my query performance?
>
> Regards
> sudeep
>
> --
> Bertrand Dechoux
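The join optimizations Bejoy mentions can be sketched in HiveQL roughly as below. The table names (`orders`, `customers`, and their bucketed variants) are hypothetical; the `SET` properties are standard Hive settings, though exact behavior depends on the Hive version:

```sql
-- Map join: the small table is loaded into memory on each mapper,
-- so the reduce-side shuffle (and its serialization of join rows)
-- is avoided entirely. Table names here are made up for illustration.
SET hive.auto.convert.join=true;   -- let Hive convert to a map join when one side is small
SELECT /*+ MAPJOIN(c) */ o.order_id, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id;

-- Bucketed map join / sort-merge bucket join: if both tables are
-- bucketed (and sorted) on the join key with compatible bucket counts,
-- matching buckets can be joined directly without a full shuffle.
SET hive.optimize.bucketmapjoin=true;
SET hive.optimize.bucketmapjoin.sortedmerge=true;
SELECT o.order_id, c.name
FROM orders_bucketed o
JOIN customers_bucketed c ON o.customer_id = c.id;
```

This does not eliminate serialization as such, but a successful map join removes the shuffle stage where most of the SerDe overhead in a common join occurs.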