Thanks to all for your suggestions; I really appreciate them. However, we are constrained by running on Amazon EMR. It would be great if I could get any pointers on how to tune the Hadoop configuration (e.g. core-site.xml, mapred-site.xml, etc.) so that Hive queries execute faster.
Please help ASAP. Sorry for the urgency.

Thanks,
Shouvanik

From: Bala Krishna Gangisetty [mailto:b...@altiscale.com]
Sent: Friday, May 30, 2014 4:08 PM
To: user@hive.apache.org
Subject: Re: Need urgent help on hive query performance

Another dimension: try storing the Hive table in ORC format. In my experience, it significantly improves performance compared to other formats. Since you mentioned join queries, on a side note, as a long-term goal you probably want to explore Hive with Tez.

--Bala G.

On Fri, May 30, 2014 at 3:59 PM, kulkarni.swar...@gmail.com <kulkarni.swar...@gmail.com> wrote:

> It has innumerable joins. Since it is a client-specific query, you understand
> I cannot share it. Sorry about that.

Like I said, joins are slow and, if not done correctly, can have terrible performance. A couple of handy techniques are available, depending on how exactly you are trying to perform the join. For instance, if you are joining a smaller table to a larger one, a map join could work well for you: the smaller table is kept in memory while the join is performed. Also, if you are able to break your table down into smaller buckets, you might be able to use a bucketed map join. The following links should be helpful [1][2].

Hope this helps.

[1] https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization
[2] http://stackoverflow.com/questions/20199077/hive-efficient-join-of-two-tables

On Fri, May 30, 2014 at 5:38 PM, <shouvanik.hal...@accenture.com> wrote:

Please find the answers inline.

From: kulkarni.swar...@gmail.com [mailto:kulkarni.swar...@gmail.com]
Sent: Friday, May 30, 2014 3:34 PM
To: user@hive.apache.org
Subject: Re: Need urgent help on hive query performance

I feel it's pretty hard to answer this without understanding the following:

1. What exactly are you trying to query? CSV? Avro? ....
   Hive table
2. Where is your data? HDFS? HBase? Local filesystem?
   Data is in S3
3. What version of Hive are you using?
   Hive 0.12
4. What is an example of a query that is slow? Some queries, like joins, are inherently slower than simpler ones (though they can be optimized).
   It has innumerable joins. Since it is a client-specific query, you understand I cannot share it. Sorry about that.

Thanks,
-- Swarnim

On Fri, May 30, 2014 at 5:32 PM, <shouvanik.hal...@accenture.com> wrote:

Can you please give a specific example or a blog to refer to? I did not understand.

From: Ashish Garg [mailto:gargcreation1...@gmail.com]
Sent: Friday, May 30, 2014 3:31 PM
To: user@hive.apache.org
Subject: Re: Need urgent help on hive query performance

Try partitioning the table and running queries that are partition-specific.

Hope this helps.

Thanks and Regards,
Ashish Garg.

On Fri, May 30, 2014 at 6:05 PM, <shouvanik.hal...@accenture.com> wrote:

Hi,

Can anybody help urgently with optimizing Hive query performance? I am looking at this more from a Hadoop tuning point of view. Currently, even a small amount of table data takes a long time to query. We are running an EMR cluster with 1 MASTER node, 2 Core Nodes and Task Nodes.

Quick help is much appreciated.

Thanks,
Shouvanik
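
For readers following the thread, the map join idea mentioned above (keep the smaller table in memory and stream the larger one past it) can be illustrated with a minimal Python sketch. This is only a conceptual illustration, not Hive's implementation: the function name `map_join`, the column names, and the sample rows are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]


def map_join(small: Iterable[Row], large: Iterable[Row], key: str) -> Iterator[Row]:
    """Inner-join two row streams on `key`, holding only `small` in memory."""
    # Build phase: index the smaller table by its join key.
    index: Dict[object, List[Row]] = defaultdict(list)
    for row in small:
        index[row[key]].append(row)

    # Probe phase: stream the larger table once and look up matches.
    # No sort or shuffle of the larger table is needed.
    for row in large:
        for match in index.get(row[key], []):
            yield {**match, **row}


# Hypothetical sample data: a small dimension table and a larger fact table.
countries = [
    {"country_id": 1, "country": "IN"},
    {"country_id": 2, "country": "US"},
]
orders = [
    {"order_id": 10, "country_id": 2},
    {"order_id": 11, "country_id": 1},
    {"order_id": 12, "country_id": 3},  # no matching country, dropped by the inner join
]

joined = list(map_join(countries, orders, "country_id"))
```

In Hive the engine performs this conversion itself; whether a join is turned into a map join is governed by settings such as hive.auto.convert.join, which are described on the JoinOptimization page linked in the thread [1].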