Thanks for the lines, looks like a JRE issue.

2011/12/13 王锋 <wfeng1...@163.com>:
> I looked into the question of Hive's large memory use.
>
> Before, the JVM args were:
>
> export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=1 -Xms2000m
> -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParallelGC
> -XX:ParallelGCThreads=20 -XX:+UseParallelOldGC -XX:-UseGCOverheadLimit
> -XX:MaxTenuringThreshold=8 -XX:PermSize=800M -XX:MaxPermSize=800M
> -XX:GCTimeRatio=19 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
>
> The parameter -XX:NewRatio=1 did not work: the young generation stayed
> at the default 1 GB, with an eden space of about 800 MB. So every time
> tasks came in, part of the new objects was stored straight into the old
> generation. The young GCs ran, but a full GC never did, so the
> hiveserver heap grew very large. I don't know why -XX:NewRatio did not
> work; if you know, please tell me.
>
> So I modified the config:
>
> export HADOOP_OPTS="$HADOOP_OPTS -Xms5000m -Xmn4000m -XX:MaxNewSize=4000m
> -Xss128k -XX:MaxHeapFreeRatio=80 -XX:MinHeapFreeRatio=40 -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70
> -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=0
> -XX:-UseGCOverheadLimit -XX:MaxTenuringThreshold=8 -XX:PermSize=800M
> -XX:MaxPermSize=800M -XX:GCTimeRatio=19 -verbose:gc -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps"
>
> -Xmn4000m makes sure the new generation is large enough, so each young
> GC can clean up the data.
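One way to check what sizes the generations really got, rather than what the flags request, is the standard java.lang.management API (jmap -heap <pid> reports the same numbers from outside the process). A minimal, Hive-independent sketch; the pool names are whatever the running collector reports, e.g. "PS Eden Space" under the parallel collector above:

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;
    import java.lang.management.MemoryUsage;

    public class HeapPools {
        public static void main(String[] args) {
            // Print every memory pool the JVM reports (eden, survivor,
            // old gen, perm gen, code cache) so the effect of
            // -XX:NewRatio / -Xmn / -XX:MaxNewSize can be checked against
            // what the JVM actually allocated.
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                MemoryUsage u = pool.getUsage();
                System.out.printf("%-20s used=%6dM committed=%6dM max=%6dM%n",
                        pool.getName(),
                        u.getUsed() >> 20, u.getCommitted() >> 20, u.getMax() >> 20);
            }
        }
    }

As a hedged guess on the NewRatio question: with -XX:+UseParallelGC the adaptive size policy is enabled by default and resizes the generations at runtime, which can make a fixed -XX:NewRatio look as if it were ignored; pinning the young generation with -Xmn, as in the new config, sidesteps that.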
> On 2011-12-12 19:20:35, "王锋" <wfeng1...@163.com> wrote:
>
> Yes, we are using JDK 1.6.0_26:
>
> [hdfs@d048049 conf]$ java -version
> java version "1.6.0_26"
> Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
>
> I will read the document at that URL, thanks very much!
>
> On 2011-12-12 19:08:37, "alo alt" <wget.n...@googlemail.com> wrote:
>> Argh, increase! Sorry, too-fast typing.
>>
>> 2011/12/12 alo alt <wget.n...@googlemail.com>:
>>> Did you update your JDK recently? A Java dev told me that could be
>>> an issue in JDK _26
>>> (https://forums.oracle.com/forums/thread.jspa?threadID=2309872); some
>>> devs report a memory decrease when they use GC flags. I'm not quite
>>> sure; it sounds a bit far-fetched to me.
>>>
>>> The stacks have a lot of waits, but I see nothing special.
>>>
>>> - Alex
>>>
>>> 2011/12/12 王锋 <wfeng1...@163.com>:
>>>> The hive log:
>>>>
>>>> Hive history file=/tmp/hdfs/hive_job_log_hdfs_201112121840_767713480.txt
>>>> 8159.581: [GC [PSYoungGen: 1927208K->688K(2187648K)]
>>>> 9102425K->7176256K(9867648K), 0.0765670 secs] [Times: user=0.36 sys=0.00,
>>>> real=0.08 secs]
>>>> Hive history file=/tmp/hdfs/hive_job_log_hdfs_201112121841_451939518.txt
>>>> 8219.455: [GC [PSYoungGen: 1823477K->608K(2106752K)]
>>>> 8999046K->7176707K(9786752K), 0.0719450 secs] [Times: user=0.66 sys=0.01,
>>>> real=0.07 secs]
>>>> Hive history file=/tmp/hdfs/hive_job_log_hdfs_201112121842_1930999319.txt
>>>>
>>>> Now we have 3 hiveservers and I set the concurrent job number to 4,
>>>> but the memory is still so large. I'm going mad.
>>>>
>>>> Do you have other suggestions?
>>>>
>>>> On 2011-12-12 17:59:52, "alo alt" <wget.n...@googlemail.com> wrote:
>>>>> When you start a high-load Hive query, can you watch the stack
>>>>> traces? It's possible over the web interface:
>>>>> http://jobtracker:50030/stacks
>>>>>
>>>>> - Alex
>>>>>
>>>>> 2011/12/12 王锋 <wfeng1...@163.com>:
>>>>>> The hiveserver will throw an OOM after several hours.
>>>>>>
>>>>>> At 2011-12-12 17:39:21, "alo alt" <wget.n...@googlemail.com> wrote:
>>>>>>
>>>>>> What happens when you set xmx=2048m or similar? Did that have any
>>>>>> negative effects on running queries?
>>>>>>
>>>>>> 2011/12/12 王锋 <wfeng1...@163.com>:
>>>>>>> I have modified the Hive JVM args.
>>>>>>> The new args are -Xmx15000m -XX:NewRatio=1 -Xms2000m.
>>>>>>>
>>>>>>> But the memory used by the hiveserver is still large.
>>>>>>>
>>>>>>> At 2011-12-12 16:20:54, "Aaron Sun" <aaron.su...@gmail.com> wrote:
>>>>>>>
>>>>>>> Not from the running jobs. What I am saying is that the heap size
>>>>>>> of Hadoop really depends on the number of files and directories on
>>>>>>> the HDFS. Removing old files periodically or merging small files
>>>>>>> would bring in some performance boost.
>>>>>>>
>>>>>>> On the Hive end, the memory consumed also depends on the queries
>>>>>>> that are executed. Monitor the reducers of the Hadoop job; my
>>>>>>> experience is that the reduce part could be the bottleneck here.
>>>>>>>
>>>>>>> It's totally okay to host multiple Hive servers on one machine.
>>>>>>>
>>>>>>> 2011/12/12 王锋 <wfeng1...@163.com>:
>>>>>>>> Are the files you mentioned the files from jobs our system has
>>>>>>>> run? They can't be that large.
>>>>>>>>
>>>>>>>> Why is the namenode the cause? What is the hiveserver doing when
>>>>>>>> it uses so much memory?
>>>>>>>>
>>>>>>>> How do you use Hive? Is our way of using the hiveserver correct?
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> On 2011-12-12 14:27:09, "Aaron Sun" <aaron.su...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Not sure if this is because of the number of files, since the
>>>>>>>> namenode tracks every file, directory, and block.
>>>>>>>> See this one:
>>>>>>>> http://www.cloudera.com/blog/2009/02/the-small-files-problem/
>>>>>>>>
>>>>>>>> Please correct me if I am wrong, because this seems to be more of
>>>>>>>> an HDFS problem, which is actually irrelevant to Hive.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Aaron
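To put a number on Aaron's point, the file and directory counts that drive namenode heap can be read with the Hadoop FileSystem API. A minimal sketch; /user/hive/warehouse is only the default warehouse location and an assumption here:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.ContentSummary;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WarehouseFileCount {
        public static void main(String[] args) throws Exception {
            // Default Hive warehouse path; pass another path as the first argument.
            Path root = new Path(args.length > 0 ? args[0] : "/user/hive/warehouse");
            FileSystem fs = FileSystem.get(new Configuration());
            ContentSummary cs = fs.getContentSummary(root);
            long files = cs.getFileCount();
            System.out.println("files:       " + files);
            System.out.println("directories: " + cs.getDirectoryCount());
            System.out.println("bytes:       " + cs.getLength());
            if (files > 0) {
                // Namenode heap scales with files + directories + blocks, so
                // many small files cost heap without matching data volume.
                System.out.println("avg file MB: " + cs.getLength() / files / (1024 * 1024));
            }
        }
    }

A large file count combined with a small average file size is the pattern the small-files post above describes.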
>>>>>>>> 2011/12/11 王锋 <wfeng1...@163.com>:
>>>>>>>>> I want to know why the hiveserver uses so much memory, and where
>>>>>>>>> the memory has been used.
>>>>>>>>>
>>>>>>>>> On 2011-12-12 10:02:44, "王锋" <wfeng1...@163.com> wrote:
>>>>>>>>>
>>>>>>>>> The namenode summary: (screenshot omitted)
>>>>>>>>>
>>>>>>>>> The MR summary: (screenshot omitted)
>>>>>>>>>
>>>>>>>>> And the hiveserver: (screenshot omitted)
>>>>>>>>>
>>>>>>>>> The hiveserver JVM args:
>>>>>>>>>
>>>>>>>>> export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=1 -Xms15000m
>>>>>>>>> -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParallelGC
>>>>>>>>> -XX:ParallelGCThreads=20 -XX:+UseParallelOldGC
>>>>>>>>> -XX:-UseGCOverheadLimit -verbose:gc -XX:+PrintGCDetails
>>>>>>>>> -XX:+PrintGCTimeStamps"
>>>>>>>>>
>>>>>>>>> We are now running 3 hiveservers on the same machine.
>>>>>>>>>
>>>>>>>>> On 2011-12-12 09:54:29, "Aaron Sun" <aaron.su...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> How does the data look, and what's the size of the cluster?
>>>>>>>>>
>>>>>>>>> 2011/12/11 王锋 <wfeng1...@163.com>:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I'm an engineer at sina.com. We have been using Hive and
>>>>>>>>>> hiveserver for several months. We have our own task scheduling
>>>>>>>>>> system, which schedules tasks to run against the hiveserver over
>>>>>>>>>> JDBC, roughly as in the sketch below.
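For reference, a client of the kind such a scheduler would wrap looks roughly like this against the pre-HiveServer2 Thrift hiveserver; localhost and 10000 are assumptions, 10000 being the default hiveserver port:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveJdbcClient {
        public static void main(String[] args) throws Exception {
            // Pre-HiveServer2 driver class; localhost:10000 is an assumption
            // (10000 is the default hiveserver port).
            Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
            Connection con = DriverManager.getConnection(
                    "jdbc:hive://localhost:10000/default", "", "");
            Statement stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("SHOW TABLES");
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
            rs.close();
            stmt.close();
            // Close promptly: the server keeps state for each open session.
            con.close();
        }
    }

Closing statements and connections promptly matters here, since the server keeps state per open session; a scheduler that leaks connections across its 5-minute cycles could show exactly this kind of steadily growing heap.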
>>>>>>>>>> But the hiveserver uses very large memory, usually more than
>>>>>>>>>> 10 GB. We have 5-minute tasks which run every 5 minutes, and
>>>>>>>>>> hourly tasks; the total number of tasks is 40. And we start
>>>>>>>>>> 3 hiveservers on one Linux server and connect to them in a cycle.
>>>>>>>>>>
>>>>>>>>>> So why is the memory use of the hiveserver so large, and what
>>>>>>>>>> should we do? Or do you have some suggestions for us?
>>>>>>>>>>
>>>>>>>>>> Thanks and best regards!
>>>>>>>>>>
>>>>>>>>>> Royce Wang

--
Alexander Lorenz
http://mapredit.blogspot.com

P Think of the environment: please don't print this email unless you really need to.