A couple more to add to the list: indexing [1] and columnar storage with RCFile [2].
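
As a rough sketch of what these two suggestions could look like in HiveQL: the table `page_views`, its columns, the index name, and the target table `page_views_rc` are all hypothetical names picked for illustration, and the statements assume a Hive release that supports `CREATE INDEX` and `STORED AS RCFILE`. Whether either one helps depends on the data and the queries.

```sql
-- Hypothetical table, column, and index names, for illustration only.
-- Assumes page_views already exists with columns user_id, url, view_ts.

-- Indexing [1]: define a compact index on a frequently filtered column.
CREATE INDEX idx_page_views_user
ON TABLE page_views (user_id)
AS 'COMPACT'
WITH DEFERRED REBUILD;

-- WITH DEFERRED REBUILD leaves the index empty until it is rebuilt.
ALTER INDEX idx_page_views_user ON page_views REBUILD;

-- Columnar storage [2]: copy the data into an RCFile-backed table.
CREATE TABLE page_views_rc (
  user_id BIGINT,
  url     STRING,
  view_ts STRING
)
STORED AS RCFILE;

INSERT OVERWRITE TABLE page_views_rc
SELECT user_id, url, view_ts
FROM page_views;
```

Queries that read only a few columns would then go against `page_views_rc`, since a columnar layout lets Hive skip the columns it does not need.
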
[1] https://cwiki.apache.org/confluence/display/Hive/IndexDev
[2] http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-4.pdf

On Thu, Jul 19, 2012 at 8:39 AM, Jan Dolinár <dolik....@gmail.com> wrote:

> There are many ways, but beware that some of them may result in worse
> performance when used inappropriately.
>
> Some of the settings we use to achieve faster queries:
> hive.map.aggr=true
> hive.exec.parallel=true
> hive.exec.compress.intermediate=true
> mapred.job.reuse.jvm.num.tasks=-1
>
> Structuring the queries properly can help a lot, for example by
> eliminating unneeded data early in the query, before further processing.
> E.g. if you use a subquery in FROM, you should put all WHERE clauses into
> the subquery where possible, to reduce the amount of data passed to the
> next stage.
>
> Using multi-group-by queries helps a lot when computing multiple queries
> on the same set of data.
>
> As Nitin Pawar mentioned, JOINs can often be optimized as well.
>
> Also, fine-tuning the Hadoop server itself for your specific needs might
> help.
>
> I am very interested in optimization of queries as well, so if anyone
> knows some more tricks, please share...
>
> J. Dolinar
>
> On Thu, Jul 19, 2012 at 3:24 PM, Abhishek <abhishek.dod...@gmail.com> wrote:
>
>> Apart from partitions and buckets, how can I improve the performance of
>> Hive queries?
>>
>> Regards,
>> Abhi

--
Swarnim