Re: OPTIMIZING A HIVE QUERY

2012-08-14 Thread sudeep tokala
nfluence/display/Hive/LanguageManual+Joins

> Regards,
> Bejoy KS
>
> --
> From: sudeep tokala
> To: user@hive.apache.org
> Sent: Tuesday, August 14, 2012 11:00 PM
> Subject: Re: OPTIMIZING A HIVE QUERY
>
> hi Bertrand,

Re: OPTIMIZING A HIVE QUERY

2012-08-14 Thread sudeep tokala
Thanks for the reply, Bertrand.

On Tue, Aug 14, 2012 at 2:12 PM, Bertrand Dechoux wrote:
> > My question was whether every join in a Hive query would constitute a MapReduce job.
> In the general case, yes. BUT if one side of your join is small enough (i.e. you can keep all of it in memory), a hash join/m

Re: OPTIMIZING A HIVE QUERY

2012-08-14 Thread Bertrand Dechoux
> My question was whether every join in a Hive query would constitute a MapReduce job.

In the general case, yes. BUT if one side of your join is small enough (i.e. you can keep all of it in memory), a hash join/map join can be performed, which is much more performant (no reduce is required). Bejoy KS has just p
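To illustrate the hash join/map join idea described above, here is a minimal Python sketch. It is not Hive's implementation: the function name `map_join` and the sample tables `orders` and `customers` are made up for illustration. It only shows why no reduce step is needed when one side fits in memory: the small side is hashed once, and the big side is streamed past it.

```python
from collections import defaultdict

def map_join(big_rows, small_rows, key):
    """Join two lists of dict rows on `key` by hashing the small side in memory.

    Hypothetical helper for illustration only; not a Hive API.
    """
    # Build phase: load the small table into an in-memory hash table.
    table = defaultdict(list)
    for row in small_rows:
        table[row[key]].append(row)

    # Probe phase: stream the big table once and look up matches.
    # No shuffle or reduce step is involved.
    joined = []
    for row in big_rows:
        for match in table.get(row[key], []):
            joined.append({**match, **row})
    return joined

# Made-up sample data: `customers` plays the role of the small table.
orders = [
    {"cust_id": 1, "amount": 30},
    {"cust_id": 2, "amount": 15},
    {"cust_id": 1, "amount": 7},
    {"cust_id": 9, "amount": 1},  # no matching customer, dropped by the inner join
]
customers = [
    {"cust_id": 1, "name": "ann"},
    {"cust_id": 2, "name": "bob"},
]

result = map_join(orders, customers, "cust_id")
```

In Hive itself this kind of join is typically requested with the `MAPJOIN` hint or enabled through the `hive.auto.convert.join` setting; the sketch above only mirrors the build-and-probe idea.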

Re: OPTIMIZING A HIVE QUERY

2012-08-14 Thread Bejoy Ks
: sudeep tokala
To: user@hive.apache.org
Sent: Tuesday, August 14, 2012 11:00 PM
Subject: Re: OPTIMIZING A HIVE QUERY

hi Bertrand,

Thanks for the reply.

My question was whether every join in a Hive query would constitute a MapReduce job. A MapReduce job goes through serialization and deserialization of

Re: OPTIMIZING A HIVE QUERY

2012-08-14 Thread sudeep tokala
Hi Bertrand,

Thanks for the reply.

My question was whether every join in a Hive query would constitute a MapReduce job. A MapReduce job goes through serialization and deserialization of objects. Isn't that an overhead?

Store data in a smarter way? Can you please elaborate on this?

Regards,
Sudeep

On Tue,

Re: OPTIMIZING A HIVE QUERY

2012-08-14 Thread Bertrand Dechoux
You may want to be clearer. Is your question: how can I change the serialization strategy of Hive? (If so, I will let other users answer, and I am also interested in the answer.) Otherwise the answer is simple: if you want to join data which cannot be stored in memory, you need to serialize it. The only
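As a rough illustration of why serialization comes with a join that does not fit in memory, here is a toy Python sketch of a shuffle (reduce-side) join. It is an assumption-laden model, not how Hive or Hadoop is implemented: `shuffle_join`, the `pickle` encoding, and the sample tables are all hypothetical. The point is only that rows have to be turned into bytes to travel from the map side to the partition that handles their key, and turned back into objects there.

```python
import pickle
from collections import defaultdict

def shuffle_join(left_rows, right_rows, key, num_partitions=2):
    """Toy reduce-side join: rows are serialized, partitioned by key, then joined.

    Hypothetical helper for illustration only; not a Hive or Hadoop API.
    """
    # "Map" phase: tag each row with its side, serialize it, and route it
    # to the partition chosen by the hash of its join key.
    partitions = [[] for _ in range(num_partitions)]
    for tag, rows in (("L", left_rows), ("R", right_rows)):
        for row in rows:
            blob = pickle.dumps((tag, row))
            partitions[hash(row[key]) % num_partitions].append(blob)

    # "Reduce" phase: each partition deserializes its records and joins
    # the left and right rows that share a key.
    joined = []
    for part in partitions:
        left, right = defaultdict(list), defaultdict(list)
        for blob in part:
            tag, row = pickle.loads(blob)
            (left if tag == "L" else right)[row[key]].append(row)
        for k, left_matches in left.items():
            for l_row in left_matches:
                for r_row in right.get(k, []):
                    joined.append({**l_row, **r_row})
    return joined

# Made-up sample data with integer join keys.
visits = [{"user_id": 1, "page": "a"}, {"user_id": 2, "page": "b"}, {"user_id": 3, "page": "c"}]
users = [{"user_id": 1, "country": "fr"}, {"user_id": 2, "country": "in"}]

shuffled = sorted(shuffle_join(visits, users, "user_id"), key=lambda r: r["user_id"])
```

Every row is serialized and deserialized once here, which is the overhead the original question asks about; the in-memory hash join avoids the shuffle but only works when one side is small.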

Re: OPTIMIZING A HIVE QUERY

2012-08-14 Thread sudeep tokala
On Tue, Aug 14, 2012 at 11:08 AM, sudeep tokala wrote:
> Hi all,
>
> How can I avoid serialization and deserialization overhead in a Hive join query? Will this optimize my query performance?
>
> Regards
> sudeep