Hey Malcolm,

**YARN**
*yarn.nodemanager.resource.memory-mb (Amount of physical memory, in MB, that can be allocated for containers)*
The value for this depends on whether there are any other side-car applications on the machine that the node manager runs on. E.g., on your 32GB machine, if other apps on the machine take 4GB at peak to function properly, set this value to 28GB.

*yarn.nodemanager.resource.cpu-vcores (Number of vcores that can be allocated for containers. This is used by the RM scheduler when allocating resources for containers. This is not used to limit the number of physical cores used by YARN containers.)*
vCores are used to slice your physical CPUs into units that can be allocated to each container. E.g., with your 4 CPUs (hyperthreaded?), if you set this value to 8, then setting `cluster-manager.container.cpu.cores` to `2` will guarantee at least 1 physical CPU to each container.

If you run on a heterogeneous cluster (VMs/hosts with different SKUs), I would recommend setting the following:
yarn.nodemanager.resource.cpu-vcores = -1
yarn.nodemanager.resource.pcores-vcores-multiplier = 2
yarn.nodemanager.resource.detect-hardware-capabilities = true

*yarn.nodemanager.resource.percentage-physical-cpu-limit (Percentage of CPU that can be allocated for containers. This setting allows users to limit the amount of CPU that YARN containers use. The default is to use 100% of CPU.)*
This is similar to yarn.nodemanager.resource.memory-mb above, but with respect to CPU usage instead of memory.

**SAMZA**

I would recommend looking at https://samza.apache.org/learn/documentation/latest/jobs/configuration-table.html for more information on each of these configs, but I have briefly summarized them below.

*cluster-manager.container.memory.mb (How much memory, in megabytes, to request from the resource manager per container of your job)*
The value for this depends on your workload.

*cluster-manager.container.cpu.cores (The number of CPU cores to request per container of your job. Each node in the cluster has a certain number of CPU cores available, so this number (along with cluster-manager.container.memory.mb) determines how many containers can be run on one machine.)*
With YARN, cores are a proxy for the number of vCores your container requires to process data. The value for this depends on your workload.

*yarn.am.container.memory.mb (How much memory, in megabytes, to request for the AM container of your job)*
This is usually a constant, as the AM doesn't have an actual data workload. It should be more than safe to set this to 2048 (1024 should be fine in most cases).

*task.opts (The JVM flags that you want to pass on to the processing containers)*
E.g. flags: -Xmx, -Xms, -XX:+HeapDumpOnOutOfMemoryError, etc.

*yarn.am.opts (The JVM flags that you want to pass on to the AM container)*
Similar to task.opts.

*job.container.count (Number of containers you want to use to run the job)*
The value depends on your workload.

*job.container.thread.pool.size (Number of threads in the container thread pool that will be used to run the synchronous operations of each task in parallel)*
The value depends on your workload and on whether you are using StreamTask.

I've also sketched some example values for both sets of configs at the end of this mail.

> On another, similar theme, has anybody tried running Samza on Hadoop 2.8.5?
> I'm experimenting with it right now, and can't get it to recognize the CPU
> core configuration. I'm curious if anybody knows about an API change
> between 2.7.x and 2.8.x in how applications are requested.

I'm sorry, but I don't fully understand what you mean. Can you provide the stack trace or describe the error you see in more detail?
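To make the YARN side concrete, here is a rough sketch for the 32GB / 4-CPU box you described. The numbers are illustrative assumptions (roughly 4GB reserved for the OS, the NodeManager, and any side-car processes; 2 vcores per hyperthreaded physical core), not recommendations, and in yarn-site.xml each line would of course be a <property>/<value> entry:

  # yarn-site.xml settings, shown as key = value for brevity
  # values are illustrative for a 32GB / 4-CPU (8 hyperthread) node
  yarn.nodemanager.resource.memory-mb = 28672
  yarn.nodemanager.resource.cpu-vcores = 8
  yarn.nodemanager.resource.percentage-physical-cpu-limit = 100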
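And a corresponding sketch of the Samza job properties. Again, every value below is a placeholder that depends on your workload; the one real rule of thumb is to keep the -Xmx in task.opts a little below cluster-manager.container.memory.mb so the JVM has headroom for non-heap memory:

  # Samza job .properties; all values are examples, not recommendations
  cluster-manager.container.memory.mb = 4096
  cluster-manager.container.cpu.cores = 2
  yarn.am.container.memory.mb = 2048
  yarn.am.opts = -Xmx1g
  task.opts = -Xmx3g -Xms3g -XX:+HeapDumpOnOutOfMemoryError
  job.container.count = 6
  job.container.thread.pool.size = 8

With 28GB and 8 vcores per NodeManager, a 4096MB / 2-vcore container works out to about 4 containers per node (CPU is the binding constraint there, since memory alone would allow 7), which is how I would sanity-check job.container.count against your cluster size.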
Thanks,
Abhishek

On Mon, Feb 24, 2020 at 7:05 PM Malcolm McFarland <mmcfarl...@cavulus.com>
wrote:

> On another, similar theme, has anybody tried running Samza on Hadoop 2.8.5?
> I'm experimenting with it right now, and can't get it to recognize the CPU
> core configuration. I'm curious if anybody knows about an API change
> between 2.7.x and 2.8.x in how applications are requested.
>
> What would the effect be on a container that was only allowed one CPU core?
> Would it be ok to trade that off for more containers?
>
> Cheers,
> Malcolm McFarland
> Cavulus
>
> This correspondence is from HealthPlanCRM, LLC, d/b/a Cavulus. Any
> unauthorized or improper disclosure, copying, distribution, or use of the
> contents of this message is prohibited. The information contained in this
> message is intended only for the personal and confidential use of the
> recipient(s) named above. If you have received this message in error,
> please notify the sender immediately and delete the original message.
>
>
> On Sun, Feb 23, 2020 at 11:56 AM Malcolm McFarland <mmcfarl...@cavulus.com>
> wrote:
>
> > Hey folks,
> >
> > Does anybody have recommendations for resource allocation configs when
> > running Samza on YARN? Ie, for a box that has 32GB of memory and 4 CPUs --
> > and let's say we're running a Samza task with 1000 partitions -- any
> > suggestions on what to set for:
> >
> > *YARN*
> > yarn.nodemanager.resource.memory-mb
> > yarn.nodemanager.resource.cpu-vcores
> > yarn.nodemanager.resource.percentage-physical-cpu-limit
> >
> > *SAMZA*
> > cluster-manager.container.memory.mb
> > cluster-manager.container.cpu.cores
> > yarn.am.container.memory.mb
> > task.opts
> > yarn.am.opts
> > job.container.count
> > job.container.thread.pool.size
> >
> > Also, do you recommend scaling up in box YARN node processing capability,
> > or out in YARN node count?
> >
> > Thanks,
> > Malcolm McFarland
> > Cavulus
> >
> > This correspondence is from HealthPlanCRM, LLC, d/b/a Cavulus. Any
> > unauthorized or improper disclosure, copying, distribution, or use of the
> > contents of this message is prohibited. The information contained in this
> > message is intended only for the personal and confidential use of the
> > recipient(s) named above. If you have received this message in error,
> > please notify the sender immediately and delete the original message.
> >