Yes, when you set a property via the 'set' command on the Hive CLI, it lives only for the life of that particular client session.
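For example, at the CLI these apply only to the current session (the values here are just illustrative numbers, not recommendations):

hive> SET hive.exec.reducers.bytes.per.reducer=268435456;
hive> SET mapred.reduce.tasks=4;

If you want a value to be the default for every session, put it in hive-site.xml instead, e.g.:

<property>
  <name>hive.exec.reducers.bytes.per.reducer</name>
  <value>268435456</value>
</property>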
There is no 'golden rule' that increases performance; it all depends on your installation, data and query pattern. Based on these, you can consider leveraging join optimizations, partitions, compression techniques, and storage formats, if they really make sense for your use case and if the numbers prove it (a rough sketch follows after the quoted message below). You might want to take a look at some of these articles, which can be starting points for you:

http://kb.tableausoftware.com/articles/knowledgebase/cloudera-hadoop-hive-performance
http://blog.cloudera.com/blog/2009/12/7-tips-for-improving-mapreduce-performance/

Thanks,
Bharath

On Tue, Oct 30, 2012 at 7:21 PM, sagar nikam <sagarnikam...@gmail.com> wrote:

> Respected sir,
>
> I am dealing with a database (2.5 GB) whose tables range from only 40 rows
> to about 9 million rows.
> When I run a query on a large table, it takes a long time.
> I want results in less time.
>
> small query-->
> =========================================================================
> hive> select count(*) from cidade;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
> Starting Job = job_201210300724_0003, Tracking URL =
> http://localhost:50030/jobdetails.jsp?jobid=job_201210300724_0003
> Kill Command = /home/trendwise/Hadoop/hadoop-0.20.2/bin/../bin/hadoop job
> -Dmapred.job.tracker=localhost:54311 -kill job_201210300724_0003
> 2012-10-30 07:37:41,588 Stage-1 map = 0%, reduce = 0%
> 2012-10-30 07:37:57,493 Stage-1 map = 100%, reduce = 0%
> 2012-10-30 07:38:17,905 Stage-1 map = 100%, reduce = 33%
> 2012-10-30 07:38:20,965 Stage-1 map = 100%, reduce = 100%
> Ended Job = job_201210300724_0003
> OK
> 5566
> Time taken: 50.172 seconds
>
> =================================================================================================================
> hdfs-site.xml
>
> <configuration>
>   <property>
>     <name>dfs.replication</name>
>     <value>3</value>
>     <description>Default block replication.
>     The actual number of replications can be specified when the file is
>     created.
>     The default is used if replication is not specified in create time.
>     </description>
>   </property>
>
>   <property>
>     <name>dfs.block.size</name>
>     <value>131072</value>
>     <description>Default block replication.
>     The actual number of replications can be specified when the file is
>     created.
>     The default is used if replication is not specified in create time.
>     </description>
>   </property>
> </configuration>
>
> Do these settings affect the performance of Hive?
> dfs.replication=3
> dfs.block.size=131072
>
> Can I set them from the Hive prompt as
> hive> set dfs.replication=5
> Does this value remain for a particular session only?
> Or is it better to change it in the .xml file?
>
> Which other settings should I change to increase performance?
>
> Sagar Nikam
> Trendwise Analytics
> Bangalore, INDIA
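As a rough sketch of the partitioning and compression ideas mentioned above (the table name cidade_part, its columns and the partition column estado are made up for illustration, and whether any of this helps depends entirely on your data and queries):

hive> CREATE TABLE cidade_part (id INT, nome STRING)
    > PARTITIONED BY (estado STRING)
    > STORED AS RCFILE;

hive> SET hive.exec.compress.output=true;
hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

Partition pruning then only pays off if your queries actually filter on the partition column, so measure before and after on your own workload.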