I have a question about hiveserver's large memory usage.

The JVM args before:

export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=1 -Xms2000m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParallelGC -XX:ParallelGCThreads=20 -XX:+UseParallelOldGC -XX:-UseGCOverheadLimit -XX:MaxTenuringThreshold=8 -XX:PermSize=800M -XX:MaxPermSize=800M -XX:GCTimeRatio=19 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"

The parameter -XX:NewRatio=1 did not take effect: the young generation stayed at its default size of about 1g, with an eden space of about 800m. So every time tasks came in, part of the new objects was promoted straight into the old generation. The young GCs ran, but full GC did not reclaim that memory, so the hiveserver heap grew very large. I don't know why -XX:NewRatio did not work; if you know, please tell me.
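One possibility I have not confirmed: -XX:+UseParallelGC turns on -XX:+UseAdaptiveSizePolicy by default, which resizes the generations at runtime and can make an explicit -XX:NewRatio appear to be ignored. A quick sketch for inspecting what the live JVM actually chose (the pgrep pattern is hypothetical and may need adjusting to match how your hiveserver process shows up):

    # find the hiveserver JVM (hypothetical process pattern)
    HIVESERVER_PID=$(pgrep -f HiveServer | head -1)

    # print the heap configuration plus current eden/survivor/old usage (JDK 6 HotSpot)
    jmap -heap $HIVESERVER_PID

    # query individual flags on the live process
    jinfo -flag NewRatio $HIVESERVER_PID
    jinfo -flag UseAdaptiveSizePolicy $HIVESERVER_PID

    # or ask the JVM what it computes from a given flag set (needs 1.6.0_21 or later)
    java -XX:NewRatio=1 -Xms2000m -XX:+PrintFlagsFinal -version | grep -iE 'newratio|newsize'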
And I modified the config to:

export HADOOP_OPTS="$HADOOP_OPTS -Xms5000m -Xmn4000m -XX:MaxNewSize=4000m -Xss128k -XX:MaxHeapFreeRatio=80 -XX:MinHeapFreeRatio=40 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=0 -XX:-UseGCOverheadLimit -XX:MaxTenuringThreshold=8 -XX:PermSize=800M -XX:MaxPermSize=800M -XX:GCTimeRatio=19 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"

-Xmn4000m should make the new generation large enough that each young GC can clean up the short-lived objects before they are promoted.
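For scale, assuming the HotSpot default -XX:SurvivorRatio=8 (an assumption; we do not set it explicitly), the 4000m young generation splits up as:

    eden     = 4000m x 8/10 = 3200m
    survivor = 4000m x 1/10 =  400m  (each of the two survivor spaces)
    old gen  = 5000m - 4000m = 1000m at the initial -Xms5000m heap size

Note that no -Xmx is set on this line, so the maximum heap size is left to JVM ergonomics; with CMS triggered at 70% occupancy of a small initial old generation, concurrent collections will start early.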
On 2011-12-12 19:20:35, "王锋" <wfeng1...@163.com> wrote:

Yes, we are using JDK 1.6.0_26:

[hdfs@d048049 conf]$ java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)

I will read the document at that URL, thanks very much!

On 2011-12-12 19:08:37, "alo alt" <wget.n...@googlemail.com> wrote:
>Argh, increase! Sorry, too-fast typing.
>
>2011/12/12 alo alt <wget.n...@googlemail.com>:
>> Did you update your JDK recently? A Java dev told me that could be
>> an issue in JDK _26
>> (https://forums.oracle.com/forums/thread.jspa?threadID=2309872); some
>> devs report a memory decrease when they use GC flags. I'm not quite
>> sure; it sounds far-fetched to me.
>>
>> The stacks show a lot of waiting threads, but I see nothing special.
>>
>> - Alex
>>
>> 2011/12/12 王锋 <wfeng1...@163.com>:
>>>
>>> The hive log:
>>>
>>> Hive history file=/tmp/hdfs/hive_job_log_hdfs_201112121840_767713480.txt
>>> 8159.581: [GC [PSYoungGen: 1927208K->688K(2187648K)]
>>> 9102425K->7176256K(9867648K), 0.0765670 secs] [Times: user=0.36 sys=0.00,
>>> real=0.08 secs]
>>> Hive history file=/tmp/hdfs/hive_job_log_hdfs_201112121841_451939518.txt
>>> 8219.455: [GC [PSYoungGen: 1823477K->608K(2106752K)]
>>> 8999046K->7176707K(9786752K), 0.0719450 secs] [Times: user=0.66 sys=0.01,
>>> real=0.07 secs]
>>> Hive history file=/tmp/hdfs/hive_job_log_hdfs_201112121842_1930999319.txt
>>>
>>> Now we have 3 hiveservers and I set the concurrent job number to 4, but
>>> the memory is still so large. It's driving me mad.
>>>
>>> Any other suggestions?
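(Reading those two GC lines: each minor collection empties the young generation almost completely, 1927208K -> 688K, yet total heap usage only drops from 9102425K to 7176256K, roughly 8.7 GB to 6.8 GB. So nearly 7 GB is sitting in the old generation and surviving every young collection; that is the memory full GC never seems to give back.)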
>>> On 2011-12-12 17:59:52, "alo alt" <wget.n...@googlemail.com> wrote:
>>>>When you start a high-load hive query, can you watch the stack traces?
>>>>That is possible over the web interface:
>>>>http://jobtracker:50030/stacks
>>>>
>>>>- Alex
>>>>
>>>>2011/12/12 王锋 <wfeng1...@163.com>
>>>>>
>>>>> hiveserver will throw an OOM after several hours.
>>>>>
>>>>> At 2011-12-12 17:39:21, "alo alt" <wget.n...@googlemail.com> wrote:
>>>>>
>>>>> What happens when you set xmx=2048m or similar? Did that have any
>>>>> negative effects on running queries?
>>>>>
>>>>> 2011/12/12 王锋 <wfeng1...@163.com>
>>>>>>
>>>>>> I have modified the hive JVM args.
>>>>>> The new args are -Xmx15000m -XX:NewRatio=1 -Xms2000m,
>>>>>> but the memory used by hiveserver is still large.
>>>>>>
>>>>>> At 2011-12-12 16:20:54, "Aaron Sun" <aaron.su...@gmail.com> wrote:
>>>>>>
>>>>>> Not from the running jobs; what I am saying is that the heap size of
>>>>>> Hadoop really depends on the number of files and directories on HDFS.
>>>>>> Removing old files periodically or merging small files would bring
>>>>>> some performance boost.
>>>>>>
>>>>>> On the Hive end, the memory consumed also depends on the queries that
>>>>>> are executed. Monitor the reducers of the Hadoop job; my experience is
>>>>>> that the reduce part can be the bottleneck here.
>>>>>>
>>>>>> It's totally okay to host multiple Hive servers on one machine.
>>>>>>
>>>>>> 2011/12/12 王锋 <wfeng1...@163.com>
>>>>>>>
>>>>>>> Are the files you mention the files from jobs our system has run?
>>>>>>> They can't be that large.
>>>>>>>
>>>>>>> Why would the namenode be the cause? What is hiveserver doing when it
>>>>>>> uses so much memory?
>>>>>>>
>>>>>>> How do you use hive? Is our way of using hiveserver correct?
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> On 2011-12-12 14:27:09, "Aaron Sun" <aaron.su...@gmail.com> wrote:
>>>>>>>
>>>>>>> Not sure if this is because of the number of files, since the
>>>>>>> namenode tracks each file, directory, and block.
>>>>>>> See this one:
>>>>>>> http://www.cloudera.com/blog/2009/02/the-small-files-problem/
>>>>>>>
>>>>>>> Please correct me if I am wrong, because this seems to be more of an
>>>>>>> HDFS problem, which is actually irrelevant to Hive.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Aaron
>>>>>>>
>>>>>>> 2011/12/11 王锋 <wfeng1...@163.com>
>>>>>>>>
>>>>>>>> I want to know why the hiveserver uses such large memory, and where
>>>>>>>> the memory has gone.
>>>>>>>>
>>>>>>>> On 2011-12-12 10:02:44, "王锋" <wfeng1...@163.com> wrote:
>>>>>>>>
>>>>>>>> The namenode summary: [screenshot omitted]
>>>>>>>>
>>>>>>>> The MR summary: [screenshot omitted]
>>>>>>>>
>>>>>>>> And hiveserver: [screenshot omitted]
>>>>>>>>
>>>>>>>> hiveserver JVM args:
>>>>>>>> export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=1 -Xms15000m
>>>>>>>> -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParallelGC
>>>>>>>> -XX:ParallelGCThreads=20 -XX:+UseParallelOldGC
>>>>>>>> -XX:-UseGCOverheadLimit -verbose:gc -XX:+PrintGCDetails
>>>>>>>> -XX:+PrintGCTimeStamps"
>>>>>>>>
>>>>>>>> Now we are using 3 hiveservers on the same machine.
>>>>>>>>
>>>>>>>> On 2011-12-12 09:54:29, "Aaron Sun" <aaron.su...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> How does the data look, and what's the size of the cluster?
>>>>>>>>
>>>>>>>> 2011/12/11 王锋 <wfeng1...@163.com>
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I'm an engineer at sina.com. We have used hive and hiveserver for
>>>>>>>>> several months. We have our own task scheduling system, which
>>>>>>>>> schedules tasks against hiveserver over JDBC.
>>>>>>>>>
>>>>>>>>> But hiveserver uses very large memory, usually more than 10g. We
>>>>>>>>> have 5-minute tasks which run every 5 minutes, and hourly tasks;
>>>>>>>>> the total number of tasks is 40. We start 3 hiveservers on one
>>>>>>>>> Linux server and connect to them in a round-robin cycle.
>>>>>>>>>
>>>>>>>>> So why is hiveserver using so much memory, and what should we do?
>>>>>>>>> Any suggestions from you?
>>>>>>>>>
>>>>>>>>> Thanks and Best Regards!
>>>>>>>>>
>>>>>>>>> Royce Wang
>
>--
>Alexander Lorenz
>http://mapredit.blogspot.com
>
>P Think of the environment: please don't print this email unless you
>really need to.