Yes, when you set a property via the 'set' command on the Hive CLI, it lives only for the life of that particular client session.
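For example, at the CLI these apply only to the current session (the values here are just illustrative numbers, not recommendations):

hive> SET hive.exec.reducers.bytes.per.reducer=268435456;
hive> SET mapred.reduce.tasks=4;

If you want a value to be the default for every session, put it in hive-site.xml instead, e.g.:

<property>
  <name>hive.exec.reducers.bytes.per.reducer</name>
  <value>268435456</value>
</property>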
There is no 'golden rule' that increases performance; it all depends on your installation, data and query pattern. Based on these, you can consider leveraging join optimizations, partitions, compression techniques, and storage formats, if they really make sense for your use case and if the numbers prove it (a rough sketch follows after the quoted message below). You might want to take a look at some of these articles, which can be starting points for you:

http://kb.tableausoftware.com/articles/knowledgebase/cloudera-hadoop-hive-performance
http://blog.cloudera.com/blog/2009/12/7-tips-for-improving-mapreduce-performance/

Thanks,
Bharath

On Tue, Oct 30, 2012 at 7:21 PM, sagar nikam <sagarnikam...@gmail.com> wrote:

> Respected sir,
>
> I am dealing with a database (2.5 GB) whose tables range from only 40 rows
> to about 9 million rows.
> When I run a query on a large table, it takes a long time.
> I want results in less time.
>
> small query-->
> =========================================================================
> hive> select count(*) from cidade;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
> Starting Job = job_201210300724_0003, Tracking URL =
> http://localhost:50030/jobdetails.jsp?jobid=job_201210300724_0003
> Kill Command = /home/trendwise/Hadoop/hadoop-0.20.2/bin/../bin/hadoop job
> -Dmapred.job.tracker=localhost:54311 -kill job_201210300724_0003
> 2012-10-30 07:37:41,588 Stage-1 map = 0%, reduce = 0%
> 2012-10-30 07:37:57,493 Stage-1 map = 100%, reduce = 0%
> 2012-10-30 07:38:17,905 Stage-1 map = 100%, reduce = 33%
> 2012-10-30 07:38:20,965 Stage-1 map = 100%, reduce = 100%
> Ended Job = job_201210300724_0003
> OK
> 5566
> Time taken: 50.172 seconds
>
> =================================================================================================================
> hdfs-site.xml
>
> <configuration>
>   <property>
>     <name>dfs.replication</name>
>     <value>3</value>
>     <description>Default block replication.
>     The actual number of replications can be specified when the file is
>     created.
>     The default is used if replication is not specified in create time.
>     </description>
>   </property>
>
>   <property>
>     <name>dfs.block.size</name>
>     <value>131072</value>
>     <description>Default block replication.
>     The actual number of replications can be specified when the file is
>     created.
>     The default is used if replication is not specified in create time.
>     </description>
>   </property>
> </configuration>
>
> Do these settings affect the performance of Hive?
> dfs.replication=3
> dfs.block.size=131072
>
> Can I set them from the Hive prompt as
> hive> set dfs.replication=5
> Does this value remain for a particular session only?
> Or is it better to change it in the .xml file?
>
> Which other settings should I change to increase performance?
>
> Sagar Nikam
> Trendwise Analytics
> Bangalore, INDIA
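As a rough sketch of the partitioning and compression ideas mentioned above (the table name cidade_part, its columns and the partition column estado are made up for illustration, and whether any of this helps depends entirely on your data and queries):

hive> CREATE TABLE cidade_part (id INT, nome STRING)
    > PARTITIONED BY (estado STRING)
    > STORED AS RCFILE;

hive> SET hive.exec.compress.output=true;
hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

Partition pruning then only pays off if your queries actually filter on the partition column, so measure before and after on your own workload.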