I have a question about hiveserver's large memory usage.

The JVM args before:

export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=1 -Xms2000m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParallelGC -XX:ParallelGCThreads=20 -XX:+UseParallelOldGC -XX:-UseGCOverheadLimit -XX:MaxTenuringThreshold=8 -XX:PermSize=800M -XX:MaxPermSize=800M -XX:GCTimeRatio=19 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"

The parameter -XX:NewRatio=1 did not take effect: the young generation stayed at its default size of about 1g, with an eden space of about 800m. So every time tasks came in, part of the new objects was promoted straight into the old generation. The young GCs ran, but full GC did not reclaim that memory, so the hiveserver heap grew very large. I don't know why -XX:NewRatio did not work; if you know, please tell me.
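One possibility I have not confirmed: -XX:+UseParallelGC turns on -XX:+UseAdaptiveSizePolicy by default, which resizes the generations at runtime and can make an explicit -XX:NewRatio appear to be ignored. A quick sketch for inspecting what the live JVM actually chose (the pgrep pattern is hypothetical and may need adjusting to match how your hiveserver process shows up):

    # find the hiveserver JVM (hypothetical process pattern)
    HIVESERVER_PID=$(pgrep -f HiveServer | head -1)

    # print the heap configuration plus current eden/survivor/old usage (JDK 6 HotSpot)
    jmap -heap $HIVESERVER_PID

    # query individual flags on the live process
    jinfo -flag NewRatio $HIVESERVER_PID
    jinfo -flag UseAdaptiveSizePolicy $HIVESERVER_PID

    # or ask the JVM what it computes from a given flag set (needs 1.6.0_21 or later)
    java -XX:NewRatio=1 -Xms2000m -XX:+PrintFlagsFinal -version | grep -iE 'newratio|newsize'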
And I modified the config to:

export HADOOP_OPTS="$HADOOP_OPTS -Xms5000m -Xmn4000m -XX:MaxNewSize=4000m -Xss128k -XX:MaxHeapFreeRatio=80 -XX:MinHeapFreeRatio=40 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=0 -XX:-UseGCOverheadLimit -XX:MaxTenuringThreshold=8 -XX:PermSize=800M -XX:MaxPermSize=800M -XX:GCTimeRatio=19 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"

-Xmn4000m should make the new generation large enough that each young GC can clean up the short-lived objects before they are promoted.
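For scale, assuming the HotSpot default -XX:SurvivorRatio=8 (an assumption; we do not set it explicitly), the 4000m young generation splits up as:

    eden     = 4000m x 8/10 = 3200m
    survivor = 4000m x 1/10 =  400m  (each of the two survivor spaces)
    old gen  = 5000m - 4000m = 1000m at the initial -Xms5000m heap size

Note that no -Xmx is set on this line, so the maximum heap size is left to JVM ergonomics; with CMS triggered at 70% occupancy of a small initial old generation, concurrent collections will start early.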
On 2011-12-12 19:20:35, "王锋" <wfeng1...@163.com> wrote:

Yes, we are using JDK 1.6.0_26:

[hdfs@d048049 conf]$ java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)

I will read the document at that URL, thanks very much!

On 2011-12-12 19:08:37, "alo alt" <wget.n...@googlemail.com> wrote:
>Argh, increase! Sorry, too-fast typing.
>
>2011/12/12 alo alt <wget.n...@googlemail.com>:
>> Did you update your JDK recently? A Java dev told me that could be
>> an issue in JDK _26
>> (https://forums.oracle.com/forums/thread.jspa?threadID=2309872); some
>> devs report a memory decrease when they use GC flags. I'm not quite
>> sure; it sounds far-fetched to me.
>>
>> The stacks show a lot of waiting threads, but I see nothing special.
>>
>> - Alex
>>
>> 2011/12/12 王锋 <wfeng1...@163.com>:
>>>
>>> The hive log:
>>>
>>> Hive history file=/tmp/hdfs/hive_job_log_hdfs_201112121840_767713480.txt
>>> 8159.581: [GC [PSYoungGen: 1927208K->688K(2187648K)]
>>> 9102425K->7176256K(9867648K), 0.0765670 secs] [Times: user=0.36 sys=0.00,
>>> real=0.08 secs]
>>> Hive history file=/tmp/hdfs/hive_job_log_hdfs_201112121841_451939518.txt
>>> 8219.455: [GC [PSYoungGen: 1823477K->608K(2106752K)]
>>> 8999046K->7176707K(9786752K), 0.0719450 secs] [Times: user=0.66 sys=0.01,
>>> real=0.07 secs]
>>> Hive history file=/tmp/hdfs/hive_job_log_hdfs_201112121842_1930999319.txt
>>>
>>> Now we have 3 hiveservers and I set the concurrent job number to 4, but
>>> the memory is still so large. It's driving me mad.
>>>
>>> Any other suggestions?
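(Reading those two GC lines: each minor collection empties the young generation almost completely, 1927208K -> 688K, yet total heap usage only drops from 9102425K to 7176256K, roughly 8.7 GB to 6.8 GB. So nearly 7 GB is sitting in the old generation and surviving every young collection; that is the memory full GC never seems to give back.)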
>>> On 2011-12-12 17:59:52, "alo alt" <wget.n...@googlemail.com> wrote:
>>>>When you start a high-load hive query, can you watch the stack traces?
>>>>That is possible over the web interface:
>>>>http://jobtracker:50030/stacks
>>>>
>>>>- Alex
>>>>
>>>>2011/12/12 王锋 <wfeng1...@163.com>
>>>>>
>>>>> hiveserver will throw an OOM after several hours.
>>>>>
>>>>> At 2011-12-12 17:39:21, "alo alt" <wget.n...@googlemail.com> wrote:
>>>>>
>>>>> What happens when you set xmx=2048m or similar? Did that have any
>>>>> negative effects on running queries?
>>>>>
>>>>> 2011/12/12 王锋 <wfeng1...@163.com>
>>>>>>
>>>>>> I have modified the hive JVM args.
>>>>>> The new args are -Xmx15000m -XX:NewRatio=1 -Xms2000m,
>>>>>> but the memory used by hiveserver is still large.
>>>>>>
>>>>>> At 2011-12-12 16:20:54, "Aaron Sun" <aaron.su...@gmail.com> wrote:
>>>>>>
>>>>>> Not from the running jobs; what I am saying is that the heap size of
>>>>>> Hadoop really depends on the number of files and directories on HDFS.
>>>>>> Removing old files periodically or merging small files would bring
>>>>>> some performance boost.
>>>>>>
>>>>>> On the Hive end, the memory consumed also depends on the queries that
>>>>>> are executed. Monitor the reducers of the Hadoop job; my experience is
>>>>>> that the reduce part can be the bottleneck here.
>>>>>>
>>>>>> It's totally okay to host multiple Hive servers on one machine.
>>>>>>
>>>>>> 2011/12/12 王锋 <wfeng1...@163.com>
>>>>>>>
>>>>>>> Are the files you mention the files from jobs our system has run?
>>>>>>> They can't be that large.
>>>>>>>
>>>>>>> Why would the namenode be the cause? What is hiveserver doing when it
>>>>>>> uses so much memory?
>>>>>>>
>>>>>>> How do you use hive? Is our way of using hiveserver correct?
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> On 2011-12-12 14:27:09, "Aaron Sun" <aaron.su...@gmail.com> wrote:
>>>>>>>
>>>>>>> Not sure if this is because of the number of files, since the
>>>>>>> namenode tracks each file, directory, and block.
>>>>>>> See this one:
>>>>>>> http://www.cloudera.com/blog/2009/02/the-small-files-problem/
>>>>>>>
>>>>>>> Please correct me if I am wrong, because this seems to be more of an
>>>>>>> HDFS problem, which is actually irrelevant to Hive.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Aaron
>>>>>>>
>>>>>>> 2011/12/11 王锋 <wfeng1...@163.com>
>>>>>>>>
>>>>>>>> I want to know why the hiveserver uses such large memory, and where
>>>>>>>> the memory has gone.
>>>>>>>>
>>>>>>>> On 2011-12-12 10:02:44, "王锋" <wfeng1...@163.com> wrote:
>>>>>>>>
>>>>>>>> The namenode summary: [screenshot omitted]
>>>>>>>>
>>>>>>>> The MR summary: [screenshot omitted]
>>>>>>>>
>>>>>>>> And hiveserver: [screenshot omitted]
>>>>>>>>
>>>>>>>> hiveserver JVM args:
>>>>>>>> export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=1 -Xms15000m
>>>>>>>> -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParallelGC
>>>>>>>> -XX:ParallelGCThreads=20 -XX:+UseParallelOldGC
>>>>>>>> -XX:-UseGCOverheadLimit -verbose:gc -XX:+PrintGCDetails
>>>>>>>> -XX:+PrintGCTimeStamps"
>>>>>>>>
>>>>>>>> Now we are using 3 hiveservers on the same machine.
>>>>>>>>
>>>>>>>> On 2011-12-12 09:54:29, "Aaron Sun" <aaron.su...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> How does the data look, and what's the size of the cluster?
>>>>>>>>
>>>>>>>> 2011/12/11 王锋 <wfeng1...@163.com>
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I'm an engineer at sina.com. We have used hive and hiveserver for
>>>>>>>>> several months. We have our own task scheduling system, which
>>>>>>>>> schedules tasks against hiveserver over JDBC.
>>>>>>>>>
>>>>>>>>> But hiveserver uses very large memory, usually more than 10g. We
>>>>>>>>> have 5-minute tasks which run every 5 minutes, and hourly tasks;
>>>>>>>>> the total number of tasks is 40. We start 3 hiveservers on one
>>>>>>>>> Linux server and connect to them in a round-robin cycle.
>>>>>>>>>
>>>>>>>>> So why is hiveserver using so much memory, and what should we do?
>>>>>>>>> Any suggestions from you?
>>>>>>>>>
>>>>>>>>> Thanks and Best Regards!
>>>>>>>>>
>>>>>>>>> Royce Wang
>
>--
>Alexander Lorenz
>http://mapredit.blogspot.com
>
>P Think of the environment: please don't print this email unless you
>really need to.