If DominantResourceCalculator is not used, the number of containers
launched depends only on the amount of memory allocated to the NM and the
container size. So in your cluster, if you have allocated 96 GB to the NM
and the container size is set to 1.5 GB, a node can potentially launch 64
containers. If the node has 64 CPUs that is fine; otherwise you end up
oversubscribing the CPU, which can lead to slowness due to context
switching etc. (note that every container can also spawn a number of
threads depending on the map or reduce phase).
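
As a rough back-of-the-envelope check, here is a small Python sketch of
the per-node arithmetic above (the figures are the ones assumed in this
thread, not read from your cluster):

    # Assumed node figures from the discussion above.
    nm_memory_gb = 96     # memory allocated to the NodeManager
    container_gb = 1.5    # per-container size
    node_vcores = 64      # vcores available on the node

    max_containers = int(nm_memory_gb // container_gb)   # -> 64
    print(max_containers)
    if max_containers > node_vcores:
        print("CPU oversubscribed: more containers than vcores")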

With the current configuration, a 4 GB container would anyway waste
0.5 GB: YARN rounds the request up to 4.5 GB to fit the 4 GB slot, and
when that is handed to Tez, only 80% is used for the heap by default, so
you effectively get about 3.2 GB. Instead of 4 GB, you could start off
with 3 GB. In that case YARN allocates exactly 3 GB, Tez spins up with a
2.4 GB heap, and a single node with 96 GB could fit 32 containers.
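
To make the rounding and the 80% heap fraction concrete, here is a small
Python sketch of that math (the 1.5 GB YARN minimum and the 0.8 fraction
are the values assumed above, not necessarily your cluster's settings):

    import math

    yarn_min_gb = 1.5     # requests are rounded up to a multiple of this
    heap_fraction = 0.8   # Tez uses ~80% of the container for the heap by default

    def effective(requested_gb):
        granted = math.ceil(requested_gb / yarn_min_gb) * yarn_min_gb
        return granted, round(requested_gb * heap_fraction, 2)

    print(effective(4))   # (4.5, 3.2) -> 0.5 GB wasted, ~3.2 GB usable heap
    print(effective(3))   # (3.0, 2.4) -> a 96 GB node fits 32 containers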

~Rajesh.B

On Sat, Nov 26, 2016 at 7:31 AM, Ranjan Banerjee <raba...@microsoft.com>
wrote:

> Hi Rajesh,
>    Thanks a lot for the insight. When you say CPU, are you referring to
> YARN vcores?
> The YARN minimum container size (yarn.scheduler.minimum-allocation-mb)
> is set to 1.5 GB and the minimum cores per container
> (yarn.scheduler.minimum-allocation-vcores) is set to 1.
>
> Are you saying that if the container-to-vcore ratio is not 1:1, then
> merely increasing the number of containers will not help, since the
> containers will not all get a vcore at the same time to process their
> tasks?
>
> Thanks for the help!!
>
> Ranjan
>
> -----Original Message-----
> From: Rajesh Balamohan [mailto:rajesh.balamo...@gmail.com]
> Sent: Friday, November 25, 2016 5:40 PM
> To: dev@hive.apache.org
> Cc: dev-h...@hive.apache.org
> Subject: Re: Oversized container estimation
>
> Those are cumulative figures at the DAG level. You may want to check
> the GC logs emitted at the task level to see whether the full memory is
> actually being used. I am not sure what YARN minimum container size is
> set in your cluster, but depending on that, lowering the container size
> risks running too many containers on the same node (e.g. 49 containers
> on a 98 GB machine with 2 GB as both the Hive container size and the
> YARN minimum container size; if you have only 32 CPUs in your system,
> this would oversubscribe the CPU heavily and could adversely impact job
> performance).
>
> ~Rajesh.B
>
> On Fri, Nov 25, 2016 at 11:03 PM, Ranjan Banerjee <raba...@microsoft.com>
> wrote:
>
> > Hi everyone,
> > I have a cluster where each container is configured at 4 GB, and some
> > of my queries finish in 30 to 40 seconds. This leads me to believe
> > that my containers have too much memory, and I am thinking of reducing
> > the container size (hive.tez.container.size) to 1.5 GB, but I am
> > looking for a few more concrete data points to confirm that the
> > containers really are oversized.
> > I looked into the Tez view of my DAG and the counters give me:
> > PHYSICAL_MEMORY_BYTES 907965628416
> > VIRTUAL_MEMORY_BYTES 1560263561216
> > I am guessing these are wrong, as there is no way the query could
> > finish in 20 seconds on a 98 GB cluster if the memory actually
> > required by the query were 907 GB. Any help in finding data points for
> > determining oversized containers is much appreciated!
> >
> > Thanks
> > Ranjan
> >
>
>
>
> --
> ~Rajesh.B
>



-- 
~Rajesh.B
