As of Hadoop 2.5.1 in MapR 4.1.0, the virtual memory checker is disabled while
the physical memory checker is enabled by default.
Since CentOS/RHEL 6 allocates virtual memory aggressively due to OS behavior,
you should disable the virtual memory checker or increase
yarn.nodemanager.vmem-pmem-ratio
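Either fix could look like the following yarn-site.xml fragment (the ratio value 5 is illustrative, not a recommendation from this thread):

```xml
<!-- Option 1: disable the virtual memory check entirely -->
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>

<!-- Option 2: raise the vmem/pmem ratio from its default of 2.1
     (the value 5 here is just an example) -->
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>5</value>
</property>
```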
On 11 Mar 2016, at 23:01, Alexander Pivovarov <mailto:apivova...@gmail.com> wrote:
Forgot to mention. To avoid unnecessary container termination, add the following
setting to the YARN config:
yarn.nodemanager.vmem-check-enabled = false
That can kill performance on a shared cluster: if your container code
you need to set
yarn.scheduler.minimum-allocation-mb=32
otherwise the Spark AM container will be running on a dedicated box instead of
running together with an executor container on one of the boxes
for slaves I use Amazon EC2 r3.2xlarge boxes (61GB / 8 cores) - cost ~$0.10 /
hour (spot instance)
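The packing argument above can be sketched as follows. YARN rounds every container request up to a multiple of yarn.scheduler.minimum-allocation-mb, so with a coarse minimum a small AM request balloons into a full scheduling unit and may no longer fit beside a large executor. All sizes below are hypothetical, not values from this thread:

```python
import math

def round_up_to_min(request_mb, min_alloc_mb):
    """YARN rounds each container request up to a multiple of
    yarn.scheduler.minimum-allocation-mb."""
    return math.ceil(request_mb / min_alloc_mb) * min_alloc_mb

node_mb = 54 * 1024        # assumed usable memory on a 61GB r3.2xlarge node
executor_mb = 48 * 1024    # hypothetical large executor container
am_mb = 512                # hypothetical small Spark AM container

# With a coarse 8192 MB minimum, the AM request becomes a full 8 GB
# container and no longer fits next to the executor on the same node.
coarse_am = round_up_to_min(am_mb, 8192)   # 8192
fine_am = round_up_to_min(am_mb, 32)       # 512

fits_with_coarse = executor_mb + coarse_am <= node_mb  # False
fits_with_fine = executor_mb + fine_am <= node_mb      # True
```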
Thanks Koert and Alexander
I think the YARN configuration parameters in yarn-site.xml are important.
For those I have:

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <description>Amount of max physical memory, in MB, that can be allocated
  for YARN containers.</description>
  <value>8192</value>
</property>

<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <description>Ratio between virtual memory to physical memory when setting
  memory limits for containers.</description>
</property>
YARN cores are virtual cores which are used just to calculate available
resources. But usually memory is used to manage YARN resources (not cores).
Spark executor memory should be ~90% of yarn.scheduler.maximum-allocation-mb
(which should be the same as yarn.nodemanager.resource.memory-mb);
the remaining ~10% should be left for the executor memory overhead
you get a spark executor per yarn container. the spark executor can have
multiple cores, yes. this is configurable. so the number of partitions that
can be processed in parallel is num-executors * executor-cores. and for
processing a partition the available memory is executor-memory /
executor-cores
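That arithmetic can be sketched directly (all values hypothetical, standing in for --num-executors, --executor-cores, and --executor-memory):

```python
# Hypothetical cluster settings, matching the formula above.
num_executors = 4
executor_cores = 8
executor_memory_mb = 49152  # 48 GB per executor

# number of partitions processed in parallel across the cluster
parallel_tasks = num_executors * executor_cores

# memory available while processing one partition
memory_per_task_mb = executor_memory_mb // executor_cores
```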