When you start a high-load Hive query, can you watch the stack traces? It's possible over the web interface: http://jobtracker:50030/stacks
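For context, the /stacks page returns a plain-text dump of every JVM thread in the JobTracker. A minimal sketch of working with such a dump is to tally threads by state to see where they pile up; the curl target is the URL above, and a tiny sample dump is inlined so the counting part runs standalone:

```shell
# Grab a dump from the JobTracker's stacks servlet (host/port as in the
# URL above; adjust for your cluster):
#   curl -s http://jobtracker:50030/stacks > /tmp/stacks.txt
# HiveServer has no such servlet; for it you would run jstack on its pid.

# Count threads by state to see where they sit.
count_states() {
  grep -o 'java\.lang\.Thread\.State: [A-Z_]*' "$1" | sort | uniq -c | sort -rn
}

# Tiny inlined sample dump, just to show the output shape:
cat > /tmp/stacks.txt <<'EOF'
"IPC Server handler 0" daemon prio=10 tid=0x01 nid=0x02 waiting on condition
   java.lang.Thread.State: WAITING
"IPC Server handler 1" daemon prio=10 tid=0x03 nid=0x04 waiting on condition
   java.lang.Thread.State: WAITING
"main" prio=10 tid=0x05 nid=0x06 runnable
   java.lang.Thread.State: RUNNABLE
EOF

count_states /tmp/stacks.txt   # prints counts: 2 WAITING, 1 RUNNABLE
```

Many threads stuck in BLOCKED or WAITING during a high-load query is usually the interesting signal here.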
- Alex

2011/12/12 王锋 <wfeng1...@163.com>:
> hiveserver will throw an OOM after several hours.
>
> At 2011-12-12 17:39:21, "alo alt" <wget.n...@googlemail.com> wrote:
>> What happens when you set xmx=2048m or similar? Did that have any
>> negative effects on running queries?
>>
>> 2011/12/12 王锋 <wfeng1...@163.com>:
>>> I have modified the Hive JVM args; the new args are -Xmx15000m
>>> -XX:NewRatio=1 -Xms2000m, but the memory used by hiveserver is
>>> still large.
>>>
>>> At 2011-12-12 16:20:54, "Aaron Sun" <aaron.su...@gmail.com> wrote:
>>>> Not from the running jobs. What I am saying is that the heap size of
>>>> Hadoop really depends on the number of files and directories on HDFS.
>>>> Removing old files periodically or merging small files would bring in
>>>> some performance boost.
>>>>
>>>> On the Hive end, the memory consumed also depends on the queries that
>>>> are executed. Monitor the reducers of the Hadoop job; my experience is
>>>> that the reduce part can be the bottleneck here.
>>>>
>>>> It's totally okay to host multiple Hive servers on one machine.
>>>>
>>>> 2011/12/12 王锋 <wfeng1...@163.com>:
>>>>> Are the files you mention the files from jobs our system has run?
>>>>> Those can't be that large.
>>>>>
>>>>> Why is the namenode the cause? What is hiveserver doing when it uses
>>>>> such large memory?
>>>>>
>>>>> How do you use Hive? Is our way of using hiveserver correct?
>>>>>
>>>>> Thanks.
>>>>>
>>>>> At 2011-12-12 14:27:09, "Aaron Sun" <aaron.su...@gmail.com> wrote:
>>>>>> Not sure if this is because of the number of files, since the
>>>>>> namenode would track each of the files, directories, and blocks.
>>>>>> See this one:
>>>>>> http://www.cloudera.com/blog/2009/02/the-small-files-problem/
>>>>>>
>>>>>> Please correct me if I am wrong, because this seems to be more like
>>>>>> an HDFS problem which is actually irrelevant to Hive.
>>>>>>
>>>>>> Thanks
>>>>>> Aaron
>>>>>>
>>>>>> 2011/12/11 王锋 <wfeng1...@163.com>:
>>>>>>> I want to know why the hiveserver uses such large memory, and
>>>>>>> where the memory has been used.
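The small-files article Aaron links gives a rule of thumb of roughly 150 bytes of namenode heap per file, directory, or block object, which makes the file-count question easy to sanity-check. A back-of-the-envelope sketch, where the file count is hypothetical and not taken from the thread:

```shell
# Rough namenode-heap estimate, assuming ~150 bytes per namespace object
# (the rule of thumb cited in the linked small-files article).
files=10000000                      # hypothetical: 10M small files, one block each
bytes_per_object=150
objects=$((files * 2))              # one file object + one block object apiece
heap_mb=$((objects * bytes_per_object / 1024 / 1024))
echo "~${heap_mb} MB of namenode heap"   # → ~2861 MB of namenode heap
```

At this scale the namenode alone wants gigabytes of heap, which is why merging small files helps; it does not, however, explain memory growth inside hiveserver itself.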
>>>>>>> At 2011-12-12 10:02:44, "王锋" <wfeng1...@163.com> wrote:
>>>>>>>> The namenode summary, the MR summary, and hiveserver:
>>>>>>>> (inline screenshots not preserved in the archive)
>>>>>>>>
>>>>>>>> hiveserver JVM args:
>>>>>>>> export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=1 -Xms15000m
>>>>>>>>   -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15
>>>>>>>>   -XX:+UseParallelGC -XX:ParallelGCThreads=20 -XX:+UseParallelOldGC
>>>>>>>>   -XX:-UseGCOverheadLimit -verbose:gc -XX:+PrintGCDetails
>>>>>>>>   -XX:+PrintGCTimeStamps"
>>>>>>>>
>>>>>>>> Now we are using 3 hiveservers on the same machine.
>>>>>>>>
>>>>>>>> At 2011-12-12 09:54:29, "Aaron Sun" <aaron.su...@gmail.com> wrote:
>>>>>>>>> How does the data look, and what's the size of the cluster?
>>>>>>>>>
>>>>>>>>> 2011/12/11 王锋 <wfeng1...@163.com>:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I'm an engineer at sina.com. We have used Hive and hiveserver
>>>>>>>>>> for several months. We have our own task-scheduling system,
>>>>>>>>>> which can schedule tasks to run against hiveserver over JDBC.
>>>>>>>>>>
>>>>>>>>>> But hiveserver uses very large memory, usually more than 10 GB.
>>>>>>>>>> We have 5-minute tasks which run every 5 minutes, and hourly
>>>>>>>>>> tasks; the total number of tasks is 40. And we start 3
>>>>>>>>>> hiveservers on one Linux server and connect to them in a cycle.
>>>>>>>>>>
>>>>>>>>>> So why is the memory use of hiveserver so large, and what should
>>>>>>>>>> we do? Any suggestions from you?
>>>>>>>>>>
>>>>>>>>>> Thanks and Best Regards!
>>>>>>>>>>
>>>>>>>>>> Royce Wang

--
Alexander Lorenz
http://mapredit.blogspot.com

Think of the environment: please don't print this email unless you really need to.
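For comparison, the smaller-heap variant alo alt asks about earlier in the thread would replace the 15 GB export along these lines. Only -Xmx2048m comes from the thread; the other values are illustrative and untuned, and the later OOM report suggests 2 GB may be too small for this query load:

```shell
# Sketch of a lower-heap HADOOP_OPTS for hiveserver; -Xmx2048m is the value
# from the thread, -Xms512m and the carried-over GC flags are illustrative.
export HADOOP_OPTS="$HADOOP_OPTS -Xmx2048m -Xms512m -XX:NewRatio=1 \
  -XX:+UseParallelGC -XX:+UseParallelOldGC \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
echo "$HADOOP_OPTS"
```

With -verbose:gc left on, the GC log will show whether the process OOMs because the heap is genuinely too small or because memory is never reclaimed between queries.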