What happens when you set -Xmx2048m or similar? Did that have any negative effects on running queries?
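For reference, a minimal sketch of what a smaller heap cap could look like, assuming hiveserver picks up JVM flags through HADOOP_OPTS as shown later in this thread; the exact values and the file you put this in (hive-env.sh or a startup wrapper) are illustrative, not a recommendation:

```shell
# Hedged sketch: cap the hiveserver heap at 2 GB instead of 15 GB.
# Assumption: hiveserver reads HADOOP_OPTS as in the export shown
# further down in this thread; values here are illustrative only.
export HADOOP_OPTS="$HADOOP_OPTS -Xms512m -Xmx2048m \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
```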
2011/12/12 王锋 <wfeng1...@163.com>

> I have modified the Hive JVM args. The new args are -Xmx15000m
> -XX:NewRatio=1 -Xms2000m, but the memory used by hiveserver is still
> large.
>
> At 2011-12-12 16:20:54, "Aaron Sun" <aaron.su...@gmail.com> wrote:
>
> Not from the running jobs. What I am saying is that the heap size of the
> Hadoop NameNode really depends on the number of files and directories on
> HDFS. Removing old files periodically or merging small files would bring
> some performance boost.
>
> On the Hive end, the memory consumed also depends on the queries that
> are executed. Monitor the reducers of the Hadoop job; my experience is
> that the reduce part can be the bottleneck here.
>
> It's totally okay to host multiple Hive servers on one machine.
>
> 2011/12/12 王锋 <wfeng1...@163.com>
>
>> Are the files you mentioned the files from jobs our system has run?
>> They can't be that large.
>>
>> Why is the NameNode the cause? What is hiveserver doing when it uses
>> so much memory?
>>
>> How do you use Hive? Is our way of using hiveserver correct?
>>
>> Thanks.
>>
>> On 2011-12-12 14:27:09, "Aaron Sun" <aaron.su...@gmail.com> wrote:
>>
>> Not sure if this is because of the number of files, since the NameNode
>> tracks every file, directory, and block. See this one:
>> http://www.cloudera.com/blog/2009/02/the-small-files-problem/
>>
>> Please correct me if I am wrong, because this seems to be more of an
>> HDFS problem, which is actually irrelevant to Hive.
>>
>> Thanks
>> Aaron
>>
>> 2011/12/11 王锋 <wfeng1...@163.com>
>>
>>> I want to know why hiveserver uses such large memory, and where the
>>> memory has been used.
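Aaron's suggestion to merge small files can also be applied on the Hive side. A hedged sketch, assuming the hive.merge.* options exist in the Hive version in use (they date from the 0.4-0.7 era); the size thresholds and the table names target/source are hypothetical:

```shell
# Hedged sketch: Hive-side settings that merge small output files,
# reducing the number of objects the NameNode must track.
# Assumptions: hive.merge.* options available in this Hive version;
# thresholds and table names (target, source) are illustrative.
hive -e "
  SET hive.merge.mapfiles=true;
  SET hive.merge.mapredfiles=true;
  SET hive.merge.size.per.task=256000000;
  SET hive.merge.smallfiles.avgsize=16000000;
  INSERT OVERWRITE TABLE target SELECT * FROM source;
"
```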
>>> On 2011-12-12 10:02:44, "王锋" <wfeng1...@163.com> wrote:
>>>
>>> The namenode summary:
>>>
>>> the mr summary:
>>>
>>> and hiveserver:
>>>
>>> hiveserver jvm args:
>>> export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=1 -Xms15000m
>>> -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParallelGC
>>> -XX:ParallelGCThreads=20 -XX:+UseParallelOldGC -XX:-UseGCOverheadLimit
>>> -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
>>>
>>> Now we are running 3 hiveservers on the same machine.
>>>
>>> On 2011-12-12 09:54:29, "Aaron Sun" <aaron.su...@gmail.com> wrote:
>>>
>>> What does the data look like, and what's the size of the cluster?
>>>
>>> 2011/12/11 王锋 <wfeng1...@163.com>
>>>
>>>> Hi,
>>>>
>>>> I'm an engineer at sina.com. We have been using Hive and hiveserver
>>>> for several months. We have our own task scheduling system, which can
>>>> schedule tasks to run against hiveserver over JDBC.
>>>>
>>>> But hiveserver uses a very large amount of memory, usually more than
>>>> 10 GB. We have 5-minute tasks that run every 5 minutes, plus hourly
>>>> tasks; the total number of tasks is 40. We start 3 hiveservers on one
>>>> Linux server and connect to them in a round-robin cycle.
>>>>
>>>> So why does hiveserver use so much memory? What should we do, and do
>>>> you have any suggestions?
>>>>
>>>> Thanks and Best Regards!
>>>>
>>>> Royce Wang

--
Alexander Lorenz
http://mapredit.blogspot.com
<<hiveserver1.png>>
<<mr.png>>
<<namenode.png>>
<<hiveserver.png>>
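The original post mentions starting 3 hiveservers on one machine and cycling connections among them. A minimal sketch of that round-robin URL selection, assuming the legacy HiveServer1 JDBC URL scheme (jdbc:hive://host:port/default) and ports 10000-10002; the class and method names here are hypothetical, not part of any Hive API:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hedged sketch: round-robin selection of HiveServer JDBC URLs.
// Assumptions: legacy HiveServer1 URL scheme, ports 10000-10002;
// HiveServerPool and nextUrl are hypothetical names.
public class HiveServerPool {
    private final String[] urls;
    private final AtomicInteger next = new AtomicInteger(0);

    public HiveServerPool(String host, int... ports) {
        urls = new String[ports.length];
        for (int i = 0; i < ports.length; i++) {
            urls[i] = "jdbc:hive://" + host + ":" + ports[i] + "/default";
        }
    }

    /** Returns the next JDBC URL in round-robin order (thread-safe). */
    public String nextUrl() {
        int i = Math.floorMod(next.getAndIncrement(), urls.length);
        return urls[i];
    }

    public static void main(String[] args) {
        HiveServerPool pool = new HiveServerPool("localhost", 10000, 10001, 10002);
        System.out.println(pool.nextUrl()); // jdbc:hive://localhost:10000/default
        System.out.println(pool.nextUrl()); // jdbc:hive://localhost:10001/default
        System.out.println(pool.nextUrl()); // jdbc:hive://localhost:10002/default
        System.out.println(pool.nextUrl()); // wraps back to port 10000
    }
}
```

Each scheduled task would take the next URL before opening its JDBC connection, spreading load evenly across the three server processes.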