Hey Malcolm,

**YARN**

*yarn.nodemanager.resource.memory-mb (Amount of physical memory, in MB,
that can be allocated for containers)*
The value for this depends on whether there are any other side-car applications
running on the machine that the NodeManager runs on.
E.g., on your 32GB machine, if other apps on the machine need 4GB at peak to
function properly, set this value to 28GB.
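For example, on that 32GB box (illustrative value only; the property takes MB,
so 28G is 28672):
yarn.nodemanager.resource.memory-mb = 28672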

*yarn.nodemanager.resource.cpu-vcores (Number of vcores that can be
allocated for containers. This is used by the RM scheduler when allocating
resources for containers. This is not used to limit the number of physical
cores used by YARN containers.)*
vCores are used to slice your physical CPUs into units that can be allocated
to each container.
E.g., with your 4 CPUs (hyperthreaded?), if you set this value to 8, then
setting `cluster-manager.container.cpu.cores` to `2` will guarantee at least 1
physical CPU to each container.
If you run on a heterogeneous cluster (VMs/hosts with different SKUs), I
would recommend setting the following:
yarn.nodemanager.resource.cpu-vcores = -1
yarn.nodemanager.resource.pcores-vcores-multiplier = 2
yarn.nodemanager.resource.detect-hardware-capabilities = true
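For a homogeneous setup like your 4-CPU box, the static equivalent of the
example above would be (illustrative value):
yarn.nodemanager.resource.cpu-vcores = 8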

*yarn.nodemanager.resource.percentage-physical-cpu-limit (Percentage of CPU
that can be allocated for containers. This setting allows users to limit
the amount of CPU that YARN containers use. The default is to use 100% of
CPU)*
This is similar to yarn.nodemanager.resource.memory-mb above, but with respect
to CPU usage instead of memory.
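For example, if you wanted to leave roughly 20% of the CPU on the box for
non-YARN processes (illustrative value):
yarn.nodemanager.resource.percentage-physical-cpu-limit = 80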


**SAMZA**
I would recommend looking at
https://samza.apache.org/learn/documentation/latest/jobs/configuration-table.html
for more information on each of these configs, but I've briefly summarized
them below.

*cluster-manager.container.memory.mb (How much memory, in megabytes, to
request from the resource manager per container of your job)*
The value for this depends on your workload.
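As a purely illustrative starting point (tune against your actual
per-container memory usage):
cluster-manager.container.memory.mb = 4096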

*cluster-manager.container.cpu.cores (The number of CPU cores to request
per container of your job. Each node in the cluster has a certain number of
CPU cores available, so this number (along with
cluster-manager.container.memory.mb) determines how many containers can be
run on one machine)*
With YARN, this value is a proxy for the number of vCores that your container
requests in order to process data. The value for this depends on your workload.
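Continuing the earlier example of one physical CPU per container on a
4-CPU / 8-vCore box (illustrative value):
cluster-manager.container.cpu.cores = 2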

*yarn.am.container.memory.mb (How much memory, in megabytes, to request for
the AM container of your job)*
This is usually a constant as the AM doesn't have an actual data workload.
It should be more than safe to set this to 2048 (1024 should be fine in
most cases).
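For example (illustrative value):
yarn.am.container.memory.mb = 2048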

*task.opts (The JVM flags that you want to pass on to the processing
containers)*
E.g., flags like -Xmx, -Xms, -XX:+HeapDumpOnOutOfMemoryError, etc.
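For example (illustrative values; keep -Xmx comfortably below
cluster-manager.container.memory.mb to leave headroom for non-heap memory):
task.opts = -Xmx3072m -XX:+HeapDumpOnOutOfMemoryError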

*yarn.am.opts (The JVM flags that you want to pass on to the AM container)*
Similar to task.opts
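For example (illustrative; sized to fit within yarn.am.container.memory.mb
above):
yarn.am.opts = -Xmx1024m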

*job.container.count (Number of containers you want to use to run the job)*
The value depends on your workload.
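For example, with your 1000 partitions (Samza creates roughly one task per
input partition), something like the following would give each container
roughly 125 tasks (illustrative value):
job.container.count = 8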

*job.container.thread.pool.size (Number of threads in the container
thread-pool that will be used to run synchronous operations of each task in
parallel)*
The value depends on your workload and on whether you are using StreamTask.
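For example (illustrative value; mainly relevant for synchronous StreamTask
workloads whose tasks can be processed in parallel):
job.container.thread.pool.size = 16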


> On another, similar theme, has anybody tried running Samza on Hadoop 2.8.5?
> I'm experimenting with it right now, and can't get it to recognize the CPU
> core configuration. I'm curious if anybody knows about an API change
> between 2.7.x and 2.8.x in how applications are requested.


I'm sorry, but I don't fully understand what you mean. Can you provide
the stack trace or describe the error you see in more detail?

Thanks,
Abhishek

On Mon, Feb 24, 2020 at 7:05 PM Malcolm McFarland <mmcfarl...@cavulus.com>
wrote:

> On another, similar theme, has anybody tried running Samza on Hadoop 2.8.5?
> I'm experimenting with it right now, and can't get it to recognize the CPU
> core configuration. I'm curious if anybody knows about an API change
> between 2.7.x and 2.8.x in how applications are requested.
>
> What would the effect be on a container that was only allowed one CPU core?
> Would it be ok to trade that off for more containers?
>
> Cheers,
> Malcolm McFarland
> Cavulus
>
>
> This correspondence is from HealthPlanCRM, LLC, d/b/a Cavulus. Any
> unauthorized or improper disclosure, copying, distribution, or use of the
> contents of this message is prohibited. The information contained in this
> message is intended only for the personal and confidential use of the
> recipient(s) named above. If you have received this message in error,
> please notify the sender immediately and delete the original message.
>
>
> On Sun, Feb 23, 2020 at 11:56 AM Malcolm McFarland <mmcfarl...@cavulus.com>
> wrote:
>
> > Hey folks,
> >
> > Does anybody have recommendations for resource allocation configs when
> > running Samza on YARN? Ie, for a box that has 32GB of memory and 4 CPUs --
> > and let's say we're running a Samza task with 1000 partitions -- any
> > suggestions on what to set for:
> >
> > *YARN*
> > yarn.nodemanager.resource.memory-mb
> > yarn.nodemanager.resource.cpu-vcores
> > yarn.nodemanager.resource.percentage-physical-cpu-limit
> >
> > *SAMZA*
> > cluster-manager.container.memory.mb
> > cluster-manager.container.cpu.cores
> > yarn.am.container.memory.mb
> > task.opts
> > yarn.am.opts
> > job.container.count
> > job.container.thread.pool.size
> >
> > Also, do you recommend scaling up in box YARN node processing capability,
> > or out in YARN node count?
> >
> > Thanks,
> > Malcolm McFarland
> > Cavulus
> >
> >
> > This correspondence is from HealthPlanCRM, LLC, d/b/a Cavulus. Any
> > unauthorized or improper disclosure, copying, distribution, or use of the
> > contents of this message is prohibited. The information contained in this
> > message is intended only for the personal and confidential use of the
> > recipient(s) named above. If you have received this message in error,
> > please notify the sender immediately and delete the original message.
> >
>
