There are many ways, but beware that some of them may result in worse performance when used inappropriately.
Some of the settings we use to achieve faster queries:

hive.map.aggr=true
hive.exec.parallel=true
hive.exec.compress.intermediate=true
mapred.job.reuse.jvm.num.tasks=-1

Structuring the queries properly can help a lot, for example by eliminating unneeded data early in the query, before further processing. E.g. if you use a subquery in FROM, you should put as many of the WHERE clauses as possible into the subquery, to reduce the amount of data passed to the next stage.

Using multi-group-by queries helps a lot when computing multiple queries on the same set of data.

As Nitin Pawar mentioned, JOINs can often be optimized as well. Also, fine-tuning the Hadoop server itself for your specific needs might help.

I am very interested in query optimization as well, so if anyone knows more tricks, please share...

J. Dolinar

On Thu, Jul 19, 2012 at 3:24 PM, Abhishek <[email protected]> wrote:

> Apart from partitions and buckets how to improve of hive queries
>
> Regards
> Abhi
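For reference, the settings listed above can be applied per session from the Hive CLI with SET statements, roughly like this (a sketch; the comments summarize what each property is generally understood to do, so check them against the documentation for your Hive/Hadoop version):

```sql
-- Do partial aggregation on the map side before the shuffle.
SET hive.map.aggr=true;
-- Run independent stages of a single query in parallel.
SET hive.exec.parallel=true;
-- Compress intermediate data written between the MapReduce jobs of a query.
SET hive.exec.compress.intermediate=true;
-- Reuse task JVMs within a job; -1 means no limit on the number of tasks per JVM.
SET mapred.job.reuse.jvm.num.tasks=-1;
```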
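A minimal sketch of moving the WHERE clause into the FROM subquery. The table page_views and its columns user_id and dt are hypothetical. Depending on the Hive version and settings, the optimizer may already push simple predicates down on its own, but writing the filter inside the subquery makes the intent explicit:

```sql
-- Filter applied only after the subquery has produced all of its rows:
SELECT t.user_id, COUNT(*)
FROM (SELECT user_id, dt FROM page_views) t
WHERE t.dt = '2012-07-19'
GROUP BY t.user_id;

-- Filter moved into the subquery, so less data reaches the GROUP BY stage:
SELECT t.user_id, COUNT(*)
FROM (SELECT user_id FROM page_views WHERE dt = '2012-07-19') t
GROUP BY t.user_id;
```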
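A multi-group-by (multi-insert) query reads the source table once and computes several aggregations from that single scan. A sketch, assuming a hypothetical page_views table and two hypothetical, already-created output tables views_per_user and views_per_page:

```sql
-- One scan of page_views feeds both aggregations.
FROM page_views pv
INSERT OVERWRITE TABLE views_per_user
  SELECT pv.user_id, COUNT(*)
  GROUP BY pv.user_id
INSERT OVERWRITE TABLE views_per_page
  SELECT pv.page_url, COUNT(*)
  GROUP BY pv.page_url;
```

Running the two GROUP BY queries separately would read page_views twice. That is why this form helps when several results come from the same data.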
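One common JOIN optimization in Hive is the map-side join. When one side of the join is small enough to fit in memory, it can be loaded into every mapper, so the reduce phase is skipped. A sketch with hypothetical tables page_views (large) and users (small). Whether this actually helps depends on the size of the small table, so treat it as an illustration, not a general recommendation:

```sql
-- Ask Hive to load the small table (users) into memory on the map side.
SELECT /*+ MAPJOIN(u) */ pv.page_url, u.country
FROM page_views pv
JOIN users u ON (pv.user_id = u.user_id);
```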
