Hi, Jordi,

Can you post your task.opts settings as well? The Xms and Xmx JVM opts
also play a role here. The Xmx size should be set to less than
yarn.container.memory.mb.
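
For reference, here is a rough sketch (in Python, just for the arithmetic)
of the two checks involved. It assumes yarn.nodemanager.vmem-pmem-ratio is
still at its default of 2.1, since the yarn-site.xml you posted does not
set it; that would account for the 537.6 MB virtual memory limit in your
log (256 MB * 2.1). The function names are made up for illustration and
are not part of Samza or YARN.

```python
import re

# Assumption: yarn.nodemanager.vmem-pmem-ratio is at its default (2.1),
# because the posted yarn-site.xml does not override it.
VMEM_PMEM_RATIO = 2.1

# Conversion factors from a JVM size suffix to megabytes.
# No suffix means the value is given in bytes.
_TO_MB = {"": 1.0 / (1024 * 1024), "k": 1.0 / 1024, "m": 1.0, "g": 1024.0}


def vmem_limit_mb(container_mb, ratio=VMEM_PMEM_RATIO):
    """Virtual memory ceiling for a container of the given physical size."""
    return container_mb * ratio


def parse_xmx_mb(jvm_opts):
    """Return the -Xmx value from a JVM option string in MB, or None."""
    match = re.search(r"-Xmx(\d+)([kKmMgG]?)", jvm_opts)
    if match is None:
        return None
    value, unit = int(match.group(1)), match.group(2).lower()
    return value * _TO_MB[unit]


def heap_fits(jvm_opts, container_mb):
    """True only if -Xmx is present and smaller than the container size."""
    xmx_mb = parse_xmx_mb(jvm_opts)
    return xmx_mb is not None and xmx_mb < container_mb


if __name__ == "__main__":
    # Options abbreviated from the AM command line in the log below.
    am_opts = "-server -Xmx768M -XX:+PrintGCDateStamps"
    print(round(vmem_limit_mb(256), 1))
    print(heap_fits(am_opts, 256))
```

With the numbers from your log, the AM was started with -Xmx768M inside a
256 MB container, so the second check fails.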

-Yi

On Tue, Sep 22, 2015 at 4:32 AM, Jordi Blasi Uribarri <jbl...@nextel.es>
wrote:

> I am seeing that I cannot get even a single job running. I have restored
> the original configuration of yarn-site.xml and capacity-scheduler.xml and
> that does not work. I am thinking that maybe there is some information
> related to old jobs that was not correctly cleaned up when killing them.
> Is there any place where I can look to remove temporary files or
> something similar?
>
> Thanks
>
>         jordi
>
> -----Original Message-----
> From: Jordi Blasi Uribarri [mailto:jbl...@nextel.es]
> Sent: Tuesday, September 22, 2015 10:06
> To: dev@samza.apache.org
> Subject: container is running beyond virtual memory limits
>
> Hi,
>
> I am not really sure if this is related to any of the previous questions,
> so I am asking it in a new message. I am running three different Samza jobs
> that perform different actions and exchange information. As I ran into
> memory limits that were preventing the jobs from moving from ACCEPTED to
> RUNNING, I introduced some configuration changes in YARN, as suggested on
> this list:
>
>
> yarn-site.xml
>
> <configuration>
>   <property>
>     <name>yarn.scheduler.minimum-allocation-mb</name>
>     <value>128</value>
>     <description>Minimum limit of memory to allocate to each container
> request at the Resource Manager.</description>
>   </property>
>   <property>
>     <name>yarn.scheduler.maximum-allocation-mb</name>
>     <value>512</value>
>     <description>Maximum limit of memory to allocate to each container
> request at the Resource Manager.</description>
>   </property>
>   <property>
>     <name>yarn.scheduler.minimum-allocation-vcores</name>
>     <value>1</value>
>     <description>The minimum allocation for every container request at the
> RM, in terms of virtual CPU cores. Requests lower than this won't take
> effect, and the specified value will get allocated the
> minimum.</description>
>   </property>
>   <property>
>     <name>yarn.scheduler.maximum-allocation-vcores</name>
>     <value>2</value>
>     <description>The maximum allocation for every container request at the
> RM, in terms of virtual CPU cores. Requests higher than this won't take
> effect, and will get capped to this value.</description>
>   </property>
> <property>
> <name>yarn.resourcemanager.hostname</name>
> <value>kfk-samza01</value>
> </property>
> </configuration>
>
> capacity-scheduler.xml
> Alter value
>     <property>
>     <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
>     <value>0.5</value>
>     <description>
>       Maximum percent of resources in the cluster which can be used to run
>       application masters i.e. controls number of concurrent running
>       applications.
>     </description>
>   </property>
>
> The jobs are configured to reduce the memory usage:
>
> yarn.container.memory.mb=256
> yarn.am.container.memory.mb=256
>
> After introducing these changes I saw a noticeable drop in speed. That
> seemed normal, as the memory assigned to the jobs was lowered and more of
> them were running. Everything was running until yesterday, but today the
> jobs are not moving from ACCEPTED to RUNNING. I have found the following
> in the log (full log at the end):
>
> 2015-09-22 09:54:36,661 INFO  [Container Monitor]
> monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408)) -
> Memory usage of ProcessTree 10346 for container-id
> container_1442908447829_0001_01_000001: 70.0 MB of 256 MB physical memory
> used; 1.2 GB of 537.6 MB virtual memory used
>
> I am not sure where that 1.2 GB comes from, but it is what makes the
> processes die.
>
> Thanks,
>
>    Jordi
>
>
>
>
> 2015-09-22 09:54:36,519 INFO  [Container Monitor]
> monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(458)) -
> Removed ProcessTree with root 10271
> 2015-09-22 09:54:36,519 INFO  [AsyncDispatcher event handler]
> container.Container (ContainerImpl.java:handle(999)) - Container
> container_1442908447829_0002_01_000001 transitioned from RUNNING to KILLING
> 2015-09-22 09:54:36,533 INFO  [AsyncDispatcher event handler]
> launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(370)) -
> Cleaning up container container_1442908447829_0002_01_000001
> 2015-09-22 09:54:36,661 INFO  [Container Monitor]
> monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408)) -
> Memory usage of ProcessTree 10346 for container-id
> container_1442908447829_0001_01_000001: 70.0 MB of 256 MB physical memory
> used; 1.2 GB of 537.6 MB virtual memory used
> 2015-09-22 09:54:36,661 WARN  [Container Monitor]
> monitor.ContainersMonitorImpl
> (ContainersMonitorImpl.java:isProcessTreeOverLimit(293)) - Process tree for
> container: container_1442908447829_0001_01_000001 running over twice the
> configured limit. Limit=563714432, current usage = 1269743616
> 2015-09-22 09:54:36,662 WARN  [Container Monitor]
> monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(447)) -
> Container [pid=10346,containerID=container_1442908447829_0001_01_000001] is
> running beyond virtual memory limits. Current usage: 70.0 MB of 256 MB
> physical memory used; 1.2 GB of 537.6 MB virtual memory used. Killing
> container.
> Dump of the process-tree for container_1442908447829_0001_01_000001 :
>         |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
>         |- 10346 10344 10346 10346 (java) 253 7 1269743616 17908
> /usr/lib/jvm/java-7-openjdk-amd64/bin/java -server 
> -Dsamza.container.name=samza-application-master
> -Dsamza.log.dir=/opt/hadoop-2.6.0/logs/userlogs/application_1442908447829_0001/container_1442908447829_0001_01_000001
> -Djava.io.tmpdir=/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/tmp
> -Xmx768M -XX:+PrintGCDateStamps
> -Xloggc:/opt/hadoop-2.6.0/logs/userlogs/application_1442908447829_0001/container_1442908447829_0001_01_000001/gc.log
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
> -XX:GCLogFileSize=10241024 -d64 -cp
> /opt/hadoop-2.6.0/conf:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/jackson-annotations-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/jackson-core-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/jackson-databind-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/jackson-dataformat-smile-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/jackson-jaxrs-json-provider-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/jackson-module-jaxb-annotations-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/nxtBroker-0.0.1.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/nxtBroker-0.0.1-jar-with-dependencies.jar
> org.apache.samza.job.yarn.SamzaAppMaster
>
> 2015-09-22 09:54:36,663 INFO  [Container Monitor]
> monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(458)) -
> Removed ProcessTree with root 10346
> 2015-09-22 09:54:36,663 INFO  [AsyncDispatcher event handler]
> container.Container (ContainerImpl.java:handle(999)) - Container
> container_1442908447829_0001_01_000001 transitioned from RUNNING to KILLING
> 2015-09-22 09:54:36,663 INFO  [AsyncDispatcher event handler]
> launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(370)) -
> Cleaning up container container_1442908447829_0001_01_000001
> ________________________________
> Jordi Blasi Uribarri
> R&D&i Department
>
> jbl...@nextel.es
> Bilbao Office
>
>
