Thanks for the lines, looks like a JRE issue.

2011/12/13 王锋 <wfeng1...@163.com>:
> I looked into the question of Hive's large memory use.
>
> Before, the JVM args were:
>
> export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=1 -Xms2000m
> -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParallelGC
> -XX:ParallelGCThreads=20 -XX:+UseParallelOldGC -XX:-UseGCOverheadLimit
> -XX:MaxTenuringThreshold=8 -XX:PermSize=800M -XX:MaxPermSize=800M
> -XX:GCTimeRatio=19 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
>
> The parameter -XX:NewRatio=1 did not work: the young generation stayed
> at the default 1 GB, with an eden space of about 800 MB. So every time
> tasks came in, part of the new objects was stored straight into the old
> generation. The young GCs ran, but a full GC never did, so the
> hiveserver heap grew very large. I don't know why -XX:NewRatio did not
> work; if you know, please tell me.
>
> So I modified the config:
>
> export HADOOP_OPTS="$HADOOP_OPTS -Xms5000m -Xmn4000m -XX:MaxNewSize=4000m
> -Xss128k -XX:MaxHeapFreeRatio=80 -XX:MinHeapFreeRatio=40 -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70
> -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=0
> -XX:-UseGCOverheadLimit -XX:MaxTenuringThreshold=8 -XX:PermSize=800M
> -XX:MaxPermSize=800M -XX:GCTimeRatio=19 -verbose:gc -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps"
>
> -Xmn4000m makes sure the new generation is large enough, so each young
> GC can clean up the data.
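One way to check what sizes the generations really got, rather than what the flags request, is the standard java.lang.management API (jmap -heap <pid> reports the same numbers from outside the process). A minimal, Hive-independent sketch; the pool names are whatever the running collector reports, e.g. "PS Eden Space" under the parallel collector above:

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;
    import java.lang.management.MemoryUsage;

    public class HeapPools {
        public static void main(String[] args) {
            // Print every memory pool the JVM reports (eden, survivor,
            // old gen, perm gen, code cache) so the effect of
            // -XX:NewRatio / -Xmn / -XX:MaxNewSize can be checked against
            // what the JVM actually allocated.
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                MemoryUsage u = pool.getUsage();
                System.out.printf("%-20s used=%6dM committed=%6dM max=%6dM%n",
                        pool.getName(),
                        u.getUsed() >> 20, u.getCommitted() >> 20, u.getMax() >> 20);
            }
        }
    }

As a hedged guess on the NewRatio question: with -XX:+UseParallelGC the adaptive size policy is enabled by default and resizes the generations at runtime, which can make a fixed -XX:NewRatio look as if it were ignored; pinning the young generation with -Xmn, as in the new config, sidesteps that.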
> On 2011-12-12 19:20:35, "王锋" <wfeng1...@163.com> wrote:
>
> Yes, we are using JDK 1.6.0_26:
>
> [hdfs@d048049 conf]$ java -version
> java version "1.6.0_26"
> Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
>
> I will read the document at that URL, thanks very much!
>
> On 2011-12-12 19:08:37, "alo alt" <wget.n...@googlemail.com> wrote:
>> Argh, increase! Sorry, too-fast typing.
>>
>> 2011/12/12 alo alt <wget.n...@googlemail.com>:
>>> Did you update your JDK recently? A Java dev told me that could be
>>> an issue in JDK _26
>>> (https://forums.oracle.com/forums/thread.jspa?threadID=2309872); some
>>> devs report a memory decrease when they use GC flags. I'm not quite
>>> sure; it sounds a bit far-fetched to me.
>>>
>>> The stacks have a lot of waits, but I see nothing special.
>>>
>>> - Alex
>>>
>>> 2011/12/12 王锋 <wfeng1...@163.com>:
>>>> The hive log:
>>>>
>>>> Hive history file=/tmp/hdfs/hive_job_log_hdfs_201112121840_767713480.txt
>>>> 8159.581: [GC [PSYoungGen: 1927208K->688K(2187648K)]
>>>> 9102425K->7176256K(9867648K), 0.0765670 secs] [Times: user=0.36 sys=0.00,
>>>> real=0.08 secs]
>>>> Hive history file=/tmp/hdfs/hive_job_log_hdfs_201112121841_451939518.txt
>>>> 8219.455: [GC [PSYoungGen: 1823477K->608K(2106752K)]
>>>> 8999046K->7176707K(9786752K), 0.0719450 secs] [Times: user=0.66 sys=0.01,
>>>> real=0.07 secs]
>>>> Hive history file=/tmp/hdfs/hive_job_log_hdfs_201112121842_1930999319.txt
>>>>
>>>> Now we have 3 hiveservers and I set the concurrent job number to 4,
>>>> but the memory is still so large. I'm going mad.
>>>>
>>>> Do you have other suggestions?
>>>>
>>>> On 2011-12-12 17:59:52, "alo alt" <wget.n...@googlemail.com> wrote:
>>>>> When you start a high-load Hive query, can you watch the stack
>>>>> traces? It's possible over the web interface:
>>>>> http://jobtracker:50030/stacks
>>>>>
>>>>> - Alex
>>>>>
>>>>> 2011/12/12 王锋 <wfeng1...@163.com>:
>>>>>> The hiveserver will throw an OOM after several hours.
>>>>>>
>>>>>> At 2011-12-12 17:39:21, "alo alt" <wget.n...@googlemail.com> wrote:
>>>>>>
>>>>>> What happens when you set xmx=2048m or similar? Did that have any
>>>>>> negative effects on running queries?
>>>>>>
>>>>>> 2011/12/12 王锋 <wfeng1...@163.com>:
>>>>>>> I have modified the Hive JVM args.
>>>>>>> The new args are -Xmx15000m -XX:NewRatio=1 -Xms2000m.
>>>>>>>
>>>>>>> But the memory used by the hiveserver is still large.
>>>>>>>
>>>>>>> At 2011-12-12 16:20:54, "Aaron Sun" <aaron.su...@gmail.com> wrote:
>>>>>>>
>>>>>>> Not from the running jobs. What I am saying is that the heap size
>>>>>>> of Hadoop really depends on the number of files and directories on
>>>>>>> the HDFS. Removing old files periodically or merging small files
>>>>>>> would bring in some performance boost.
>>>>>>>
>>>>>>> On the Hive end, the memory consumed also depends on the queries
>>>>>>> that are executed. Monitor the reducers of the Hadoop job; my
>>>>>>> experience is that the reduce part could be the bottleneck here.
>>>>>>>
>>>>>>> It's totally okay to host multiple Hive servers on one machine.
>>>>>>>
>>>>>>> 2011/12/12 王锋 <wfeng1...@163.com>:
>>>>>>>> Are the files you mentioned the files from jobs our system has
>>>>>>>> run? They can't be that large.
>>>>>>>>
>>>>>>>> Why is the namenode the cause? What is the hiveserver doing when
>>>>>>>> it uses so much memory?
>>>>>>>>
>>>>>>>> How do you use Hive? Is our way of using the hiveserver correct?
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> On 2011-12-12 14:27:09, "Aaron Sun" <aaron.su...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Not sure if this is because of the number of files, since the
>>>>>>>> namenode tracks every file, directory, and block.
>>>>>>>> See this one:
>>>>>>>> http://www.cloudera.com/blog/2009/02/the-small-files-problem/
>>>>>>>>
>>>>>>>> Please correct me if I am wrong, because this seems to be more of
>>>>>>>> an HDFS problem, which is actually irrelevant to Hive.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Aaron
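To put a number on Aaron's point, the file and directory counts that drive namenode heap can be read with the Hadoop FileSystem API. A minimal sketch; /user/hive/warehouse is only the default warehouse location and an assumption here:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.ContentSummary;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WarehouseFileCount {
        public static void main(String[] args) throws Exception {
            // Default Hive warehouse path; pass another path as the first argument.
            Path root = new Path(args.length > 0 ? args[0] : "/user/hive/warehouse");
            FileSystem fs = FileSystem.get(new Configuration());
            ContentSummary cs = fs.getContentSummary(root);
            long files = cs.getFileCount();
            System.out.println("files:       " + files);
            System.out.println("directories: " + cs.getDirectoryCount());
            System.out.println("bytes:       " + cs.getLength());
            if (files > 0) {
                // Namenode heap scales with files + directories + blocks, so
                // many small files cost heap without matching data volume.
                System.out.println("avg file MB: " + cs.getLength() / files / (1024 * 1024));
            }
        }
    }

A large file count combined with a small average file size is the pattern the small-files post above describes.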
>>>>>>>> 2011/12/11 王锋 <wfeng1...@163.com>:
>>>>>>>>> I want to know why the hiveserver uses so much memory, and where
>>>>>>>>> the memory has been used.
>>>>>>>>>
>>>>>>>>> On 2011-12-12 10:02:44, "王锋" <wfeng1...@163.com> wrote:
>>>>>>>>>
>>>>>>>>> The namenode summary: (screenshot omitted)
>>>>>>>>>
>>>>>>>>> The MR summary: (screenshot omitted)
>>>>>>>>>
>>>>>>>>> And the hiveserver: (screenshot omitted)
>>>>>>>>>
>>>>>>>>> The hiveserver JVM args:
>>>>>>>>>
>>>>>>>>> export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=1 -Xms15000m
>>>>>>>>> -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParallelGC
>>>>>>>>> -XX:ParallelGCThreads=20 -XX:+UseParallelOldGC
>>>>>>>>> -XX:-UseGCOverheadLimit -verbose:gc -XX:+PrintGCDetails
>>>>>>>>> -XX:+PrintGCTimeStamps"
>>>>>>>>>
>>>>>>>>> We are now running 3 hiveservers on the same machine.
>>>>>>>>>
>>>>>>>>> On 2011-12-12 09:54:29, "Aaron Sun" <aaron.su...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> How does the data look, and what's the size of the cluster?
>>>>>>>>>
>>>>>>>>> 2011/12/11 王锋 <wfeng1...@163.com>:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I'm an engineer at sina.com. We have been using Hive and
>>>>>>>>>> hiveserver for several months. We have our own task scheduling
>>>>>>>>>> system, which schedules tasks to run against the hiveserver over
>>>>>>>>>> JDBC, roughly as in the sketch below.
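For reference, a client of the kind such a scheduler would wrap looks roughly like this against the pre-HiveServer2 Thrift hiveserver; localhost and 10000 are assumptions, 10000 being the default hiveserver port:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveJdbcClient {
        public static void main(String[] args) throws Exception {
            // Pre-HiveServer2 driver class; localhost:10000 is an assumption
            // (10000 is the default hiveserver port).
            Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
            Connection con = DriverManager.getConnection(
                    "jdbc:hive://localhost:10000/default", "", "");
            Statement stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("SHOW TABLES");
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
            rs.close();
            stmt.close();
            // Close promptly: the server keeps state for each open session.
            con.close();
        }
    }

Closing statements and connections promptly matters here, since the server keeps state per open session; a scheduler that leaks connections across its 5-minute cycles could show exactly this kind of steadily growing heap.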
>>>>>>>>>> But the hiveserver uses very large memory, usually more than
>>>>>>>>>> 10 GB. We have 5-minute tasks which run every 5 minutes, and
>>>>>>>>>> hourly tasks; the total number of tasks is 40. And we start
>>>>>>>>>> 3 hiveservers on one Linux server and connect to them in a cycle.
>>>>>>>>>>
>>>>>>>>>> So why is the memory use of the hiveserver so large, and what
>>>>>>>>>> should we do? Or do you have some suggestions for us?
>>>>>>>>>>
>>>>>>>>>> Thanks and best regards!
>>>>>>>>>>
>>>>>>>>>> Royce Wang

--
Alexander Lorenz
http://mapredit.blogspot.com

P Think of the environment: please don't print this email unless you really need to.