Argh, I meant increase! Sorry, typing too fast.

2011/12/12 alo alt <wget.n...@googlemail.com>:
> Did you update your JDK recently? A Java dev told me that could be an
> issue in JDK _26
> (https://forums.oracle.com/forums/thread.jspa?threadID=2309872); some
> devs report a memory decrease when they use GC flags. I'm not quite
> sure, it sounds too far-fetched to me.
>
> The stacks show a lot of waiting, but I see nothing special.
>
> - Alex
>
> 2011/12/12 王锋 <wfeng1...@163.com>:
>>
>> The hive log:
>>
>> Hive history file=/tmp/hdfs/hive_job_log_hdfs_201112121840_767713480.txt
>> 8159.581: [GC [PSYoungGen: 1927208K->688K(2187648K)]
>> 9102425K->7176256K(9867648K), 0.0765670 secs] [Times: user=0.36 sys=0.00,
>> real=0.08 secs]
>> Hive history file=/tmp/hdfs/hive_job_log_hdfs_201112121841_451939518.txt
>> 8219.455: [GC [PSYoungGen: 1823477K->608K(2106752K)]
>> 8999046K->7176707K(9786752K), 0.0719450 secs] [Times: user=0.66 sys=0.01,
>> real=0.07 secs]
>> Hive history file=/tmp/hdfs/hive_job_log_hdfs_201112121842_1930999319.txt
>>
>> Now we have 3 hiveservers and I set the concurrent job number to 4, but
>> the memory use is still so large. I'm going mad.
>>
>> Any other suggestions?
>>
>> On 2011-12-12 17:59:52, "alo alt" <wget.n...@googlemail.com> wrote:
>>> When you start a high-load Hive query, can you watch the stack traces?
>>> It's possible over the web interface:
>>> http://jobtracker:50030/stacks
>>>
>>> - Alex
>>>
>>> 2011/12/12 王锋 <wfeng1...@163.com>
>>>>
>>>> hiveserver will throw an OOM after several hours.
>>>>
>>>> At 2011-12-12 17:39:21, "alo alt" <wget.n...@googlemail.com> wrote:
>>>>
>>>> What happens when you set xmx=2048m or similar? Does that have any
>>>> negative effects on running queries?
>>>>
>>>> 2011/12/12 王锋 <wfeng1...@163.com>
>>>>>
>>>>> I have modified the hive JVM args.
>>>>> The new args are -Xmx15000m -XX:NewRatio=1 -Xms2000m.
>>>>>
>>>>> But the memory used by hiveserver is still large.
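[Editor's note: the GC lines quoted above are standard HotSpot ParallelGC minor-GC log lines. A minimal sketch of reading one, to show what they already tell us about the leak; the `parse_gc_line` helper is hypothetical and the pattern only covers lines shaped like the ones quoted:]

```python
import re

# A ParallelGC minor-GC line looks like:
#   8159.581: [GC [PSYoungGen: 1927208K->688K(2187648K)]
#   9102425K->7176256K(9867648K), 0.0765670 secs]
# The first before->after(capacity) triple is the young generation, the
# second is the whole heap. Heap-after minus young-after approximates
# what is still live in the old generation after the collection.
def parse_gc_line(line):
    m = re.search(
        r"\[PSYoungGen: (\d+)K->(\d+)K\((\d+)K\)\] (\d+)K->(\d+)K\((\d+)K\)",
        line)
    (young_before, young_after, young_cap,
     heap_before, heap_after, heap_cap) = map(int, m.groups())
    return {
        "young_after_mb": young_after // 1024,
        "heap_after_mb": heap_after // 1024,
        "old_gen_live_mb": (heap_after - young_after) // 1024,
    }

line = ("8159.581: [GC [PSYoungGen: 1927208K->688K(2187648K)] "
        "9102425K->7176256K(9867648K), 0.0765670 secs]")
print(parse_gc_line(line))  # old_gen_live_mb is 7007, i.e. ~7 GB survives
```

[The young generation empties almost completely on every collection, but roughly 7 GB stays live in the old generation, so shrinking -Xmx alone would likely just make the OOM arrive sooner.]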
>>>>>
>>>>> At 2011-12-12 16:20:54, "Aaron Sun" <aaron.su...@gmail.com> wrote:
>>>>>
>>>>> Not from the running jobs; what I am saying is that the heap size of
>>>>> the Hadoop namenode really depends on the number of files and
>>>>> directories on HDFS. Removing old files periodically or merging small
>>>>> files would bring some performance boost.
>>>>>
>>>>> On the Hive end, the memory consumed also depends on the queries that
>>>>> are executed. Monitor the reducers of the Hadoop job; my experience is
>>>>> that the reduce part could be the bottleneck here.
>>>>>
>>>>> It's totally okay to host multiple Hive servers on one machine.
>>>>>
>>>>> 2011/12/12 王锋 <wfeng1...@163.com>
>>>>>>
>>>>>> Are the files you mentioned the files from finished jobs of our
>>>>>> system? They can't be that large.
>>>>>>
>>>>>> Why would the namenode be the cause? What is hiveserver doing when
>>>>>> it uses so much memory?
>>>>>>
>>>>>> How do you use Hive? Is our way of using hiveserver correct?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> On 2011-12-12 14:27:09, "Aaron Sun" <aaron.su...@gmail.com> wrote:
>>>>>>
>>>>>> Not sure if this is because of the number of files, since the
>>>>>> namenode tracks each file, directory, and block.
>>>>>> See this one:
>>>>>> http://www.cloudera.com/blog/2009/02/the-small-files-problem/
>>>>>>
>>>>>> Please correct me if I am wrong, because this seems to be more like
>>>>>> an HDFS problem, which is actually irrelevant to Hive.
>>>>>>
>>>>>> Thanks
>>>>>> Aaron
>>>>>>
>>>>>> 2011/12/11 王锋 <wfeng1...@163.com>
>>>>>>>
>>>>>>> I want to know why hiveserver uses so much memory, and where the
>>>>>>> memory has been used.
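[Editor's note: the small-files post linked above estimates that each file, directory, and block is an in-memory namenode object of very roughly 150 bytes. A back-of-the-envelope sketch of that estimate; the 150-byte constant is the blog's rough figure, not a measured value:]

```python
# Rough namenode heap estimate per the Cloudera small-files post:
# every file, directory, and block costs ~150 bytes of namenode heap.
# Small files hurt because each file needs at least one block object.
BYTES_PER_OBJECT = 150  # rough estimate from the blog post

def namenode_heap_mb(num_files, num_dirs, blocks_per_file=1):
    objects = num_files + num_dirs + num_files * blocks_per_file
    return objects * BYTES_PER_OBJECT / (1024 * 1024)

# 10 million small files spread over 100k directories:
print(round(namenode_heap_mb(10_000_000, 100_000)))  # 2875 (MB)
```

[So tens of millions of small files translate into multiple GB of namenode heap, which is why merging or deleting small files helps, but, as Aaron says, this affects the namenode, not the hiveserver process that is OOMing here.]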
>>>>>>>
>>>>>>> On 2011-12-12 10:02:44, "王锋" <wfeng1...@163.com> wrote:
>>>>>>>
>>>>>>> The namenode summary:
>>>>>>>
>>>>>>> The MR summary:
>>>>>>>
>>>>>>> And hiveserver:
>>>>>>>
>>>>>>> hiveserver JVM args:
>>>>>>> export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=1 -Xms15000m
>>>>>>> -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParallelGC
>>>>>>> -XX:ParallelGCThreads=20 -XX:+UseParallelOldGC
>>>>>>> -XX:-UseGCOverheadLimit -verbose:gc -XX:+PrintGCDetails
>>>>>>> -XX:+PrintGCTimeStamps"
>>>>>>>
>>>>>>> Now we are using 3 hiveservers on the same machine.
>>>>>>>
>>>>>>> On 2011-12-12 09:54:29, "Aaron Sun" <aaron.su...@gmail.com> wrote:
>>>>>>>
>>>>>>> How does the data look? And what's the size of the cluster?
>>>>>>>
>>>>>>> 2011/12/11 王锋 <wfeng1...@163.com>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I'm an engineer at sina.com. We have used Hive and hiveserver for
>>>>>>>> several months. We have our own task scheduling system, which can
>>>>>>>> schedule tasks running against hiveserver over JDBC.
>>>>>>>>
>>>>>>>> But hiveserver uses a lot of memory, usually more than 10 GB.
>>>>>>>> We have 5-minute tasks, which run every 5 minutes, and hourly
>>>>>>>> tasks; the total number of tasks is 40. And we start 3 hiveservers
>>>>>>>> on one Linux server, connected in a round-robin cycle.
>>>>>>>>
>>>>>>>> So why is the memory use of hiveserver so large, and what should
>>>>>>>> we do? Any suggestions from you?
>>>>>>>>
>>>>>>>> Thanks and Best Regards!
>>>>>>>>
>>>>>>>> Royce Wang
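[Editor's note: the `-XX:NewRatio=1` flag quoted above makes the old and young generations the same size, so a 15 GB heap leaves only about half for long-lived objects. A small sketch of that arithmetic, assuming standard HotSpot NewRatio semantics and ignoring survivor-space detail:]

```python
# -XX:NewRatio=N sets old:young = N:1, so the young generation gets
# heap / (N + 1). With -Xmx15000m and NewRatio=1, half of the heap goes
# to the young generation and only half remains for the old generation.
def young_gen_mb(xmx_mb, new_ratio):
    return xmx_mb // (new_ratio + 1)

xmx = 15000
print(young_gen_mb(xmx, 1))        # 7500 MB young generation
print(xmx - young_gen_mb(xmx, 1))  # 7500 MB left for the old generation
```

[Combined with the GC logs earlier in the thread, which show ~7 GB surviving every minor collection, a ~7.5 GB old generation would be nearly full, consistent with an eventual OOM regardless of -Xmx.]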
--
Alexander Lorenz
http://mapredit.blogspot.com

P Think of the environment: please don't print this email unless you
really need to.