Re: Performance tuning a hive query

kulkarni.swar...@gmail.com Thu, 19 Jul 2012 06:50:59 -0700

A couple more to add to the list:

Indexing[1]
Columnar Storage/RCFile[2]


[1] https://cwiki.apache.org/confluence/display/Hive/IndexDev
[2] http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-4.pdf
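
As a rough sketch of both ideas (the table, column and index names below
are made up for illustration, and index support depends on your Hive
version):

```sql
-- Hypothetical table: page_views(user_id BIGINT, url STRING, dt STRING)

-- Compact index on a column that is filtered on frequently.
CREATE INDEX idx_page_views_user
ON TABLE page_views (user_id)
AS 'COMPACT'
WITH DEFERRED REBUILD;

-- WITH DEFERRED REBUILD leaves the index empty until it is rebuilt.
ALTER INDEX idx_page_views_user ON page_views REBUILD;

-- Columnar copy of the same data, stored as RCFile.
CREATE TABLE page_views_rc (user_id BIGINT, url STRING, dt STRING)
STORED AS RCFILE;

INSERT OVERWRITE TABLE page_views_rc
SELECT user_id, url, dt FROM page_views;
```

As far as I know, hive.optimize.index.filter=true is also needed before
queries pick up the index automatically, so check the plan with EXPLAIN.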

On Thu, Jul 19, 2012 at 8:39 AM, Jan Dolinár <dolik....@gmail.com> wrote:

> There are many ways, but beware that some of them may result in worse
> performance when used inappropriately.
>
> Some of the settings we use to achieve faster queries:
> hive.map.aggr=true
> hive.exec.parallel=true
> hive.exec.compress.intermediate=true
> mapred.job.reuse.jvm.num.tasks=-1
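>
> A minimal sketch of applying these for one session, in the Hive CLI or
> at the top of a script (the comments are my own short summaries; measure
> each setting on your own workload):
>
> ```sql
> -- Partial aggregation on the map side.
> SET hive.map.aggr=true;
> -- Run independent stages of a query concurrently.
> SET hive.exec.parallel=true;
> -- Compress the data passed between stages.
> SET hive.exec.compress.intermediate=true;
> -- Reuse task JVMs without limit within a job.
> SET mapred.job.reuse.jvm.num.tasks=-1;
> ```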
>
> Structuring the queries properly can help a lot, for example by
> eliminating unneeded data early in the query, before further processing.
> E.g. if you use a subquery in FROM, you should put as many of the WHERE
> conditions as possible into the subquery, to reduce the amount of data
> passed to the next stage.
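>
> A sketch of what that looks like, assuming two hypothetical tables
> page_views(user_id, url, dt) and users(user_id, country):
>
> ```sql
> -- Filters applied only after the join:
> SELECT u.country, COUNT(*) AS views
> FROM (SELECT user_id, url, dt FROM page_views) pv
> JOIN users u ON (pv.user_id = u.user_id)
> WHERE pv.dt = '2012-07-18' AND pv.url LIKE '/product/%'
> GROUP BY u.country;
>
> -- Same result, with the filters moved into the subquery so that
> -- fewer rows and columns reach the join:
> SELECT u.country, COUNT(*) AS views
> FROM (
>   SELECT user_id
>   FROM page_views
>   WHERE dt = '2012-07-18' AND url LIKE '/product/%'
> ) pv
> JOIN users u ON (pv.user_id = u.user_id)
> GROUP BY u.country;
> ```
>
> Hive's optimizer can push some predicates down by itself
> (hive.optimize.ppd), so compare the EXPLAIN output of both forms.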
>
> Using multi-group-by queries helps a lot when computing multiple queries
> on the same set of data.
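>
> For illustration, a multi-insert with two GROUP BYs over one scan of the
> source. All table names are hypothetical and both target tables are
> assumed to exist already:
>
> ```sql
> FROM page_views pv
> INSERT OVERWRITE TABLE views_by_user
>   SELECT pv.user_id, COUNT(*)
>   WHERE pv.dt = '2012-07-18'
>   GROUP BY pv.user_id
> INSERT OVERWRITE TABLE views_by_url
>   SELECT pv.url, COUNT(*)
>   WHERE pv.dt = '2012-07-18'
>   GROUP BY pv.url;
> ```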
>
> As Nitin Pawar mentioned, the JOINs can often be optimized as well.
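>
> One common technique, not necessarily the one meant above, is a
> map-side join when one table is small enough to fit in memory. A sketch
> with hypothetical tables, where users is the small one:
>
> ```sql
> SELECT /*+ MAPJOIN(u) */ u.country, COUNT(*) AS views
> FROM page_views pv
> JOIN users u ON (pv.user_id = u.user_id)
> GROUP BY u.country;
> ```
>
> Setting hive.auto.convert.join=true asks Hive to attempt this
> conversion without the hint.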
>
> Also, fine-tuning the Hadoop server itself for your specific needs might
> help.
>
> I am very interested in optimization of queries as well, so if anyone
> knows some more tricks, please share...
>
> J. Dolinar
>
>
>
> On Thu, Jul 19, 2012 at 3:24 PM, Abhishek <abhishek.dod...@gmail.com> wrote:
>
>>
>> Apart from partitions and buckets, how can I improve the performance
>> of Hive queries?
>>
>> Regards,
>> Abhi
>> Sent from my iPhone
>>
>
>


-- 
Swarnim
