Hi Rajesh,
   Thanks a lot for the insight. When you say CPU, are you referring to YARN 
vcores?
The YARN minimum container size (yarn.scheduler.minimum-allocation-mb) is set 
to 1.5 GB, and the minimum cores per 
container (yarn.scheduler.minimum-allocation-vcores) is set to 1.

Are you saying that if the container-to-vcore ratio is not 1:1, then merely 
increasing the number of containers will not help, since each container will 
not get a vcore at the same time to process its task? (Rough sketch of that 
math below.)
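
As a back-of-the-envelope check (a minimal sketch, reusing the 98 GB node, 32 
CPUs and 2 GB container size from your example):

    # Minimal sketch: how many 2 GB containers fit on a 98 GB node, and how
    # far that oversubscribes the 32 available vcores.
    node_memory_gb = 98      # example node size from your mail
    node_vcores = 32         # example CPU count from your mail
    container_size_gb = 2    # proposed Hive/Tez container size

    containers_per_node = node_memory_gb // container_size_gb   # 49
    oversubscription = containers_per_node / node_vcores        # ~1.53

    print(containers_per_node, round(oversubscription, 2))
    # 49 containers but only 32 vcores, so ~1.5 containers compete per core
    # and adding more containers does not add parallel CPU capacity.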

Thanks for the help!!

Ranjan

-----Original Message-----
From: Rajesh Balamohan [mailto:rajesh.balamo...@gmail.com] 
Sent: Friday, November 25, 2016 5:40 PM
To: dev@hive.apache.org
Cc: dev-h...@hive.apache.org
Subject: Re: Oversized container estimation

Those are cumulative figures at the DAG level. You may want to check the GC 
logs emitted at the task level to see whether the full container memory is 
actually used. I am not sure what the YARN minimum container size is in your 
cluster, but depending on it, lowering the container size runs the risk of 
packing too many containers onto the same node (e.g. 49 containers on a 98 GB 
machine with 2 GB as both the Hive container size and the YARN minimum 
container size). If you have only 32 CPUs in the system, this would 
oversubscribe the CPUs heavily and could adversely impact job performance.
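
As a rough sanity check (a minimal sketch; the task count of 400 below is only 
an assumed example), you can turn the cumulative PHYSICAL_MEMORY_BYTES figure 
into an average per-task figure by dividing by the number of tasks in the DAG:

    # Minimal sketch: average per-task physical memory from the cumulative
    # DAG-level counter. The task count (400) is an assumed example; use the
    # actual number of tasks reported for the DAG in the Tez view.
    physical_memory_bytes = 907965628416   # PHYSICAL_MEMORY_BYTES from the DAG
    num_tasks = 400                        # assumed example value

    avg_per_task_gb = physical_memory_bytes / num_tasks / (1024 ** 3)
    print(round(avg_per_task_gb, 2))       # ~2.11 GB per task with 400 tasks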

~Rajesh.B

On Fri, Nov 25, 2016 at 11:03 PM, Ranjan Banerjee <raba...@microsoft.com>
wrote:

> Hi everyone,
> I have a cluster where each container is configured at 4 GB, and some of 
> my queries finish in 30 to 40 seconds. This leads me to believe that my 
> containers have too much memory, and I am thinking of reducing the 
> container size (hive.tez.container.size) to 1.5 GB, but I am looking for a 
> few more concrete data points to determine whether I really have oversized 
> containers.
> I looked into the Tez view of my DAG, and the counters give me:
> PHYSICAL_MEMORY_BYTES 907965628416
> VIRTUAL_MEMORY_BYTES 1560263561216
> I am guessing this is wrong, as there is no way the query could finish in 
> 20 seconds on a 98 GB cluster if the actual memory required by the query 
> were 907 GB. Any help with finding data points for determining oversized 
> containers is very much appreciated!
>
> Thanks
> Ranjan
>



--
~Rajesh.B
