When you start a high-load Hive query, can you watch the stack traces?
That's possible via the web interface:
http://jobtracker:50030/stacks
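To keep a record of those dumps over time, the /stacks page can just be fetched from the command line. A minimal sketch, assuming the JobTracker host and port from the URL above (adjust for your cluster):

```shell
# Host/port are assumptions taken from the URL above.
JT_HOST="${JT_HOST:-jobtracker}"
JT_PORT="${JT_PORT:-50030}"

# Compose the /stacks URL so it can be reused by curl or wget.
stacks_url() {
  echo "http://${1}:${2}/stacks"
}

# Example: save a timestamped stack dump while the heavy query runs:
#   curl -s "$(stacks_url "$JT_HOST" "$JT_PORT")" > "stacks-$(date +%s).txt"
```

Grabbing a few dumps a minute apart makes it easier to see which threads are stuck rather than just busy.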

- Alex


2011/12/12 王锋 <wfeng1...@163.com>
>
> hiveserver will throw oom after several hours .
>
>
> At 2011-12-12 17:39:21,"alo alt" <wget.n...@googlemail.com> wrote:
>
> What happens when you set -Xmx2048m or similar? Did that have any negative 
> effects on running queries?
>
> 2011/12/12 王锋 <wfeng1...@163.com>
>>
>> I have modified the Hive JVM args.
>> The new args are -Xmx15000m -XX:NewRatio=1 -Xms2000m.
>>
>> But the memory used by hiveserver is still large.
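One way to see what is actually holding that heap is a live-object class histogram of the running hiveserver process. A sketch, assuming a Sun JDK with jps/jmap on the path; the "HiveServer" process-name match is an assumption, so check what jps prints on your box:

```shell
# Filter `jps` output down to HiveServer PIDs. Reads stdin rather than
# calling jps directly, so the filter itself can be tested offline.
hiveserver_pids() {
  awk '/HiveServer/ { print $1 }'
}

# Usage (on the server):
#   jps | hiveserver_pids
#   jmap -histo:live <pid> | head -20   # top heap consumers by class
```

Note that `-histo:live` forces a full GC first, so the numbers reflect live objects rather than garbage waiting to be collected.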
>>
>>
>>
>>
>>
>> At 2011-12-12 16:20:54,"Aaron Sun" <aaron.su...@gmail.com> wrote:
>>
>> Not from the running jobs. What I am saying is that the heap size of the Hadoop 
>> namenode really depends on the number of files and directories on the HDFS. Removing old 
>> files periodically or merging small files would bring some performance 
>> boost.
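To see which warehouse directories actually hold the most files before merging anything, the output of `hadoop fs -count` (DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATH columns in the 0.20-era layout) can be sorted by file count. A sketch; the warehouse path in the usage line is an assumption:

```shell
# Print "FILE_COUNT PATH" from `hadoop fs -count` output so directories
# with many small files stand out. Reads stdin, so it is testable offline.
file_counts() {
  awk '{ print $2, $4 }'
}

# Usage: hadoop fs -count /user/hive/warehouse/* | file_counts | sort -rn | head
```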
>>
>> On the Hive end, the memory consumed also depends on the queries that are 
>> executed. Monitor the reducers of the Hadoop job, and my experiences are 
>> that reduce part could be the bottleneck here.
>>
>> It's totally okay to host multiple Hive servers on one machine.
>>
>> 2011/12/12 王锋 <wfeng1...@163.com>
>>>
>>> Are the files you mentioned the files from jobs that have run on our system? They 
>>> can't be that large.
>>>
>>> Why would the namenode be the cause? What is hiveserver doing when it uses so 
>>> much memory?
>>>
>>> How do you use Hive? Is our way of using hiveserver correct?
>>>
>>> Thanks.
>>>
>>> At 2011-12-12 14:27:09,"Aaron Sun" <aaron.su...@gmail.com> wrote:
>>>
>>> Not sure if this is because of the number of files, since the namenode 
>>> tracks every file, directory, and block.
>>> See this one. http://www.cloudera.com/blog/2009/02/the-small-files-problem/
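The rule of thumb in that article is roughly 150 bytes of namenode heap per filesystem object (file, directory, or block), so estimating the heap a given object count needs is simple arithmetic. A sketch based on that figure:

```shell
# Estimate namenode heap (in MB) from the number of files + directories +
# blocks, using the ~150 bytes/object figure from the article above.
namenode_heap_mb() {
  echo $(( $1 * 150 / 1024 / 1024 ))
}

# e.g. namenode_heap_mb 10000000   # 10M objects -> about 1430 MB
```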
>>>
>>> Please correct me if I am wrong, because this seems to be more of an HDFS 
>>> problem, which is actually irrelevant to Hive.
>>>
>>> Thanks
>>> Aaron
>>>
>>> 2011/12/11 王锋 <wfeng1...@163.com>
>>>>
>>>>
>>>> I want to know why hiveserver uses so much memory, and where the memory 
>>>> is being used.
>>>>
>>>> At 2011-12-12 10:02:44,"王锋" <wfeng1...@163.com> wrote:
>>>>
>>>>
>>>> [screenshots omitted: namenode summary, MR summary, and hiveserver summary]
>>>>
>>>> hiveserver jvm args:
>>>> export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=1 -Xms15000m 
>>>> -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParallelGC 
>>>> -XX:ParallelGCThreads=20 -XX:+UseParallelOldGC -XX:-UseGCOverheadLimit 
>>>> -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
>>>>
>>>> Now we are running 3 hiveservers on the same machine.
>>>>
>>>>
>>>> At 2011-12-12 09:54:29,"Aaron Sun" <aaron.su...@gmail.com> wrote:
>>>>
>>>> What does the data look like, and what's the size of the cluster?
>>>>
>>>> 2011/12/11 王锋 <wfeng1...@163.com>
>>>>>
>>>>> Hi,
>>>>>
>>>>>     I'm an engineer at sina.com. We have used Hive and hiveserver for 
>>>>> several months. We have our own task scheduling system, which can 
>>>>> schedule tasks to run against hiveserver over JDBC.
>>>>>
>>>>>     But hiveserver uses a very large amount of memory, usually more than 
>>>>> 10 GB. We have 5-minute tasks which run every 5 minutes, and hourly 
>>>>> tasks; the total number of tasks is 40. And we start 3 hiveservers on 
>>>>> one Linux server and connect to them in a round-robin cycle.
>>>>>
>>>>>     So why is hiveserver's memory usage so large, and what should we do? 
>>>>> Do you have any suggestions?
>>>>>
>>>>> Thanks and Best Regards!
>>>>>
>>>>> Royce Wang
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>
> --
> Alexander Lorenz
> http://mapredit.blogspot.com
>
> P Think of the environment: please don't print this email unless you really 
> need to.
>
>
>
>



