Re:Re: Re: Re: Re: Re:Re: hiveserver usage

王锋 Mon, 12 Dec 2011 02:49:25 -0800

The hive log:


Hive history file=/tmp/hdfs/hive_job_log_hdfs_201112121840_767713480.txt
8159.581: [GC [PSYoungGen: 1927208K->688K(2187648K)] 
9102425K->7176256K(9867648K), 0.0765670 secs] [Times: user=0.36 sys=0.00, 
real=0.08 secs] 
Hive history file=/tmp/hdfs/hive_job_log_hdfs_201112121841_451939518.txt
8219.455: [GC [PSYoungGen: 1823477K->608K(2106752K)] 
8999046K->7176707K(9786752K), 0.0719450 secs] [Times: user=0.66 sys=0.01, 
real=0.07 secs] 
Hive history file=/tmp/hdfs/hive_job_log_hdfs_201112121842_1930999319.txt
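
To make the GC entries above easier to read, here is a minimal Python sketch that pulls the numbers out of them. It is not part of Hive or the JVM tooling; the names `parse_gc_log` and `GC_ENTRY` and the dictionary keys are made up for illustration, and the pattern assumes exactly the ParallelGC young-collection format printed above (-XX:+PrintGCDetails with -XX:+PrintGCTimeStamps).

```python
import re

# One ParallelGC young-collection entry, in the format seen in the log above.
# This pattern is an assumption based on those two entries only.
GC_ENTRY = re.compile(
    r"(?P<ts>\d+\.\d+): \[GC \[PSYoungGen: "
    r"(?P<young_before>\d+)K->(?P<young_after>\d+)K\((?P<young_cap>\d+)K\)\] "
    r"(?P<heap_before>\d+)K->(?P<heap_after>\d+)K\((?P<heap_cap>\d+)K\)"
)


def parse_gc_log(text):
    """Return one dict per young GC entry; sizes are in KB, ts in seconds."""
    # The mail client wrapped the long log lines, so collapse whitespace first.
    flat = " ".join(text.split())
    entries = []
    for match in GC_ENTRY.finditer(flat):
        entry = {
            key: (float(value) if key == "ts" else int(value))
            for key, value in match.groupdict().items()
        }
        # Heap occupancy outside the young generation after the collection,
        # i.e. what is sitting in the old generation at that moment.
        entry["old_after"] = entry["heap_after"] - entry["young_after"]
        entries.append(entry)
    return entries


# The two entries from the log above, wrapped the same way as in the mail.
SAMPLE = """
Hive history file=/tmp/hdfs/hive_job_log_hdfs_201112121840_767713480.txt
8159.581: [GC [PSYoungGen: 1927208K->688K(2187648K)]
9102425K->7176256K(9867648K), 0.0765670 secs] [Times: user=0.36 sys=0.00,
real=0.08 secs]
Hive history file=/tmp/hdfs/hive_job_log_hdfs_201112121841_451939518.txt
8219.455: [GC [PSYoungGen: 1823477K->608K(2106752K)]
8999046K->7176707K(9786752K), 0.0719450 secs] [Times: user=0.66 sys=0.01,
real=0.07 secs]
"""

if __name__ == "__main__":
    for e in parse_gc_log(SAMPLE):
        print(f"t={e['ts']}s old gen after young GC: {e['old_after'] / 1024:.0f} MB")
```

In both entries the young generation is emptied almost completely, while roughly 7 GB stays in the old generation. These are young collections only, so this does not show how much of that is live data; only a full GC entry would.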


We now have 3 hiveservers, and I set the number of concurrent jobs to 4, but 
the memory usage is still very large. It is driving me mad.


Are there any other suggestions?
At 2011-12-12 17:59:52, "alo alt" <wget.n...@googlemail.com> wrote:
>When you start a high-load Hive query, can you watch the stack traces?
>It's possible via the web interface:
>http://jobtracker:50030/stacks
>
>- Alex
>
>
>2011/12/12 王锋 <wfeng1...@163.com>
>>
>> hiveserver throws an OOM error after several hours.
>>
>>
>> At 2011-12-12 17:39:21,"alo alt" <wget.n...@googlemail.com> wrote:
>>
>> What happens when you set xmx=2048m or similar? Did that have any negative 
>> effects on running queries?
>>
>> 2011/12/12 王锋 <wfeng1...@163.com>
>>>
>>> I have modified the hive JVM args.
>>> The new args are -Xmx15000m -XX:NewRatio=1 -Xms2000m.
>>>
>>> But the memory used by hiveserver is still large.
>>>
>>>
>>>
>>>
>>>
>>> At 2011-12-12 16:20:54,"Aaron Sun" <aaron.su...@gmail.com> wrote:
>>>
>>> Not from the running jobs. What I am saying is that the heap size of Hadoop 
>>> really depends on the number of files and directories on HDFS. Removing old 
>>> files periodically or merging small files would bring some performance 
>>> boost.
>>>
>>> On the Hive end, the memory consumed also depends on the queries that are 
>>> executed. Monitor the reducers of the Hadoop job; in my experience, the 
>>> reduce part could be the bottleneck here.
>>>
>>> It's totally okay to host multiple Hive servers on one machine.
>>>
>>> 2011/12/12 王锋 <wfeng1...@163.com>
>>>>
>>>> Are the files you mentioned the files from the jobs our system has run? 
>>>> They can't be that large.
>>>>
>>>> Why would the namenode be the cause? What is hiveserver doing when it uses 
>>>> so much memory?
>>>>
>>>> How do you use Hive? Is our way of using hiveserver correct?
>>>>
>>>> Thanks.
>>>>
>>>> At 2011-12-12 14:27:09, "Aaron Sun" <aaron.su...@gmail.com> wrote:
>>>>
>>>> Not sure if this is because of the number of files, since the namenode 
>>>> tracks every file, directory, and block.
>>>> See this one: http://www.cloudera.com/blog/2009/02/the-small-files-problem/
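
As a rough illustration of the point above, here is a minimal Python sketch of the rule of thumb given in the linked Cloudera post: every file, directory, and block is an object in namenode memory, at about 150 bytes each. The constant is an approximation from that post, not a measured value for this cluster, and the function name `estimate_namenode_heap_bytes` is made up for illustration.

```python
# Rule-of-thumb size of one namenode object (file, directory, or block),
# as quoted in the linked Cloudera post. This is an approximation only.
BYTES_PER_OBJECT = 150


def estimate_namenode_heap_bytes(files, directories, blocks):
    """Rough namenode memory needed to track the given HDFS object counts."""
    return (files + directories + blocks) * BYTES_PER_OBJECT


if __name__ == "__main__":
    # Example from the post: 10 million files with one block each.
    needed = estimate_namenode_heap_bytes(
        files=10_000_000, directories=0, blocks=10_000_000
    )
    print(f"about {needed / 1e9:.0f} GB of namenode memory")
```

This estimates namenode heap only. It says nothing about the memory used by the hiveserver process itself.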
>>>>
>>>> Please correct me if I am wrong, but this seems to be more of an HDFS 
>>>> problem, which is actually unrelated to Hive.
>>>>
>>>> Thanks
>>>> Aaron
>>>>
>>>> 2011/12/11 王锋 <wfeng1...@163.com>
>>>>>
>>>>>
>>>>> I want to know why the hiveserver uses so much memory, and where the 
>>>>> memory is being used.
>>>>>
>>>>> At 2011-12-12 10:02:44, "王锋" <wfeng1...@163.com> wrote:
>>>>>
>>>>>
>>>>> The namenode summary:
>>>>>
>>>>>
>>>>>
>>>>> the mr summary
>>>>>
>>>>>
>>>>> and hiveserver:
>>>>>
>>>>>
>>>>> hiveserver jvm args:
>>>>> export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=1 -Xms15000m 
>>>>> -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParallelGC 
>>>>> -XX:ParallelGCThreads=20 -XX:+UseParallelOldGC -XX:-UseGCOverheadLimit 
>>>>> -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
>>>>>
>>>>> We are now using 3 hiveservers on the same machine.
>>>>>
>>>>>
>>>>> At 2011-12-12 09:54:29, "Aaron Sun" <aaron.su...@gmail.com> wrote:
>>>>>
>>>>> What does the data look like? And what's the size of the cluster?
>>>>>
>>>>> 2011/12/11 王锋 <wfeng1...@163.com>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>     I'm an engineer at sina.com. We have used hive and hiveserver for 
>>>>>> several months. We have our own task scheduling system, which 
>>>>>> schedules tasks to run against hiveserver via JDBC.
>>>>>>
>>>>>>     But the hiveserver uses a lot of memory, usually more than 10 GB. We 
>>>>>> have 5-minute tasks, which run every 5 minutes, and hourly tasks. The 
>>>>>> total number of tasks is 40. We start 3 hiveservers on one Linux 
>>>>>> server and connect to them in rotation.
>>>>>>
>>>>>>     So why does hiveserver use so much memory, and what can we do about 
>>>>>> it? Do you have any suggestions?
>>>>>>
>>>>>> Thanks and Best Regards!
>>>>>>
>>>>>> Royce Wang
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>> Alexander Lorenz
>> http://mapredit.blogspot.com
>>
>> P Think of the environment: please don't print this email unless you really 
>> need to.
>>
>>
>>
>>
>
>
>
>--
>Alexander Lorenz
>http://mapredit.blogspot.com
>
>P Think of the environment: please don't print this email unless you
>really need to.
