To give you as complete a view of my situation as possible, I am compiling what I have done and what my problem is, so that you have the fullest information.
What I have done is the following, in two virtual machines with 4 cores and 4 GB of RAM each.

Install Debian 7.8, plain, with no graphical interface.

apt-get install openjdk-7-jdk openjdk-7-jre git maven curl
git clone http://git-wip-us.apache.org/repos/asf/samza.git
./gradlew clean build

As there was a bug in the Keyrocks testing script, I just commented out the code in the TestTTL script.

wget http://apache.rediris.es/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
tar -xvf hadoop-2.6.0.tar.gz
vi conf/yarn-site.xml

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>kfk-samza01</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>128</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>3</value>
  </property>
</configuration>

cp ./etc/hadoop/capacity-scheduler.xml conf
vi $HADOOP_YARN_HOME/conf/core-site.xml

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.http.impl</name>
    <value>org.apache.samza.util.hadoop.HttpFileSystem</value>
  </property>
</configuration>

curl http://www.scala-lang.org/files/archive/scala-2.10.4.tgz > scala-2.10.4.tgz
tar -xvf scala-2.10.4.tgz
cp /tmp/scala-2.10.4/lib/scala-compiler.jar $HADOOP_YARN_HOME/share/hadoop/hdfs/lib
cp /tmp/scala-2.10.4/lib/scala-library.jar $HADOOP_YARN_HOME/share/hadoop/hdfs/lib
curl -L http://search.maven.org/remotecontent?filepath=org/clapper/grizzled-slf4j_2.10/1.0.1/grizzled-slf4j_2.10-1.0.1.jar > $HADOOP_YARN_HOME/share/hadoop/hdfs/lib/grizzled-slf4j_2.10-1.0.1.jar
curl -L http://search.maven.org/remotecontent?filepath=org/apache/samza/samza-yarn_2.10/0.9.1/samza-yarn_2.10-0.9.1.jar > $HADOOP_YARN_HOME/share/hadoop/hdfs/lib/samza-yarn_2.10-0.9.1.jar
curl -L http://search.maven.org/remotecontent?filepath=org/apache/samza/samza-core_2.10/0.9.1/samza-core_2.10-0.9.1.jar > $HADOOP_YARN_HOME/share/hadoop/hdfs/lib/samza-core_2.10-0.9.1.jar
cd /opt/hadoop-2.6.0/
scp -r . 192.168.15.94:/opt/hadoop-2.6.0
echo 192.168.15.92 >> conf/slaves
echo 192.168.15.94 >> conf/slaves
sbin/start-yarn.sh

I have copied into /opt/jobs/bin all the scripts in the /opt/samza/samza-shell/src/main/bash/ folder.

I have generated an Eclipse project with the Samza dependencies included, via Maven, and no jobs, packaged it and copied it to /opt/jobs/lib.

I have generated an Eclipse project with the Samza dependencies included, via Maven, and three jobs that implement StreamTask and InitableTask. The functions are empty, for testing purposes. It is published in a folder served through the Apache web server.
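For reference, each of these empty test jobs is roughly a minimal skeleton along these lines (shown here for the flow.WorkFlow class referenced in the job config below; the method bodies are intentionally empty):

package flow;

import org.apache.samza.config.Config;
import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.task.InitableTask;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskContext;
import org.apache.samza.task.TaskCoordinator;

// Skeleton test job: implements both StreamTask and InitableTask but does no work.
public class WorkFlow implements StreamTask, InitableTask {

  @Override
  public void init(Config config, TaskContext context) throws Exception {
    // intentionally empty (testing purposes only)
  }

  @Override
  public void process(IncomingMessageEnvelope envelope,
                      MessageCollector collector,
                      TaskCoordinator coordinator) throws Exception {
    // intentionally empty (testing purposes only)
  }
}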
I have created the associated job options file in the /opt/job/dtan folder, like this:

task.class=flow.WorkFlow
job.name=flow.WorkFlow
job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
yarn.package.path=http://192.168.15.92/jobs/DataAnalyzer-0.0.1-bin.tar.gz
systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
systems.kafka.consumer.zookeeper.connect=kfk-kafka01:2181,kfk-kafka02:2181
systems.kafka.producer.bootstrap.servers=kfk-kafka01:9092,kfk-kafka01:9093,kfk-kafka02:9092,kfk-kafka02:9093
systems.kafka.producer.metadata.broker.list=kfk-kafka01:9092,kfk-kafka01:9093,kfk-kafka02:9092,kfk-kafka02:909
task.checkpoint.factory=org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory
task.checkpoint.system=kafka
task.inputs=kafka.flowtpc
serializers.registry.json.class=org.apache.samza.serializers.JsonSerdeFactory
serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory
systems.kafka.samza.msg.serde=string
systems.kafka.streams.tracetpc.samza.msg.serde=json
yarn.container.memory.mb=256
yarn.am.container.memory.mb=256
task.opts= -Xms128M -Xmx128M
task.commit.ms=100

What I see:

• If I launch the three jobs, only one of them gets to the RUNNING state, the one called Router, and it is always the same one. The others stay in ACCEPTED until they are killed by the system. I have seen this error:

  Container [pid=23007,containerID=container_1443454508386_0003_01_000001] is running beyond virtual memory limits. Current usage: 13.9 MB of 256 MB physical memory used; 1.1 GB of 537.6 MB virtual memory used. Killing container

• When I kill the jobs with the kill-yarn-job.sh script, the java process does not get killed.

• Although I have set in the options that the job should be launched with -Xms128M -Xmx128M, I see that it runs with -Xmx768M. I have even changed the run-class.sh script, but it does not change.

Some of the things I am describing do not make sense to me, so I am lost on what to do or where to look.

Thanks for your help,

Jordi

-----Original Message-----
From: Jordi Blasi Uribarri [mailto:jbl...@nextel.es]
Sent: Monday, September 28, 2015 11:26
To: dev@samza.apache.org
Subject: RE: container is running beyond virtual memory limits

I just changed the task options file to add the following line:

task.opts=-Xmx128M

And I found no change in the behaviour. I see that the job is being launched with the default -Xmx768M value:

root 8296 8294 1 11:16 ?
00:00:05 /usr/lib/jvm/java-7-openjdk-amd64/bin/java -server -Dsamza.container.name=samza-application-master -Dsamza.log.dir=/opt/hadoop-2.6.0/logs/userlogs/application_1443431699703_0003/container_1443431699703_0003_01_000001 -Djava.io.tmpdir=/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1443431699703_0003/container_1443431699703_0003_01_000001/__package/tmp -Xmx768M -XX:+PrintGCDateStamps -Xloggc:/opt/hadoop-2.6.0/logs/userlogs/application_1443431699703_0003/container_1443431699703_0003_01_000001/gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10241024 -d64 -cp /opt/hadoop-2.6.0/conf:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1443431699703_0003/container_1443431699703_0003_01_000001/__package/lib/DataAnalyzer-0.0.1.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1443431699703_0003/container_1443431699703_0003_01_000001/__package/lib/DataAnalyzer-0.0.1-jar-with-dependencies.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1443431699703_0003/container_1443431699703_0003_01_000001/__package/lib/jackson-annotations-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1443431699703_0003/container_1443431699703_0003_01_000001/__package/lib/jackson-core-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1443431699703_0003/container_1443431699703_0003_01_000001/__package/lib/jackson-databind-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1443431699703_0003/container_1443431699703_0003_01_000001/__package/lib/jackson-dataformat-smile-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1443431699703_0003/container_1443431699703_0003_01_000001/__package/lib/jackson-jaxrs-json-provider-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1443431699703_0003/container_1443431699703_0003_01_000001/__package/lib/jackson-module-jaxb-annotations-2.6.0.jar org.apache.samza.job.yarn.SamzaAppMaster

How do I set the correct value?

Thanks,

Jordi

-----Original Message-----
From: Yi Pan [mailto:nickpa...@gmail.com]
Sent: Monday, September 28, 2015 10:56
To: dev@samza.apache.org
Subject: Re: container is running beyond virtual memory limits

Hi, Jordi,

Please find the config variable task.opts in this table:
http://samza.apache.org/learn/documentation/0.9/jobs/configuration-table.html

This allows you to add additional JVM opts when launching the containers.

-Yi

On Mon, Sep 28, 2015 at 1:48 AM, Jordi Blasi Uribarri <jbl...@nextel.es> wrote:

> The three tasks have a similar options file, like this one.
>
> task.class=flow.OperationJob
> job.name=flow.OperationJob
> job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
> yarn.package.path=http://IP/javaapp.tar.gz
>
> systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
> systems.kafka.consumer.zookeeper.connect=kfk-kafka01:2181,kfk-kafka02:2181
> systems.kafka.producer.bootstrap.servers=kfk-kafka01:9092,kfk-kafka01:9093,kfk-kafka02:9092,kfk-kafka02:9093
> systems.kafka.producer.metadata.broker.list=kfk-kafka01:9092,kfk-kafka01:9093,kfk-kafka02:9092,kfk-kafka02:909
>
> task.checkpoint.factory=org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory
> task.checkpoint.system=kafka
> task.inputs=kafka.operationtpc
>
> serializers.registry.json.class=org.apache.samza.serializers.JsonSerdeFactory
> serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory
>
> systems.kafka.samza.msg.serde=string
> systems.kafka.streams.tracetpc.samza.msg.serde=json
>
> yarn.container.memory.mb=256
> yarn.am.container.memory.mb=256
>
> task.commit.ms=1000
> task.window.ms=60000
>
> Where do I have to change the Xmx parameter?
>
> Thanks.
>
> Jordi
>
> -----Original Message-----
> From: Yi Pan [mailto:nickpa...@gmail.com]
> Sent: Monday, September 28, 2015 10:39
> To: dev@samza.apache.org
> Subject: Re: container is running beyond virtual memory limits
>
> Hi, Jordi,
>
> Can you post your task.opts settings as well? The Xms and Xmx JVM opts
> will play a role here as well. The Xmx size should be set to less than
> yarn.container.memory.mb.
>
> -Yi
>
> On Tue, Sep 22, 2015 at 4:32 AM, Jordi Blasi Uribarri <jbl...@nextel.es> wrote:
>
> > I am seeing that I can not get even a single job running. I have
> > recovered the original configuration of yarn-site.xml and
> > capacity-scheduler.xml and that does not work. I am thinking that
> > maybe there is some kind of information related to old jobs that
> > has not been correctly cleaned when killing them. Is there any
> > place where I can look to remove temporary files or something similar?
> >
> > Thanks
> >
> > Jordi
> >
> > -----Original Message-----
> > From: Jordi Blasi Uribarri [mailto:jbl...@nextel.es]
> > Sent: Tuesday, September 22, 2015 10:06
> > To: dev@samza.apache.org
> > Subject: container is running beyond virtual memory limits
> >
> > Hi,
> >
> > I am not really sure if this is related to any of the previous
> > questions, so I am asking it in a new message. I am running three
> > different Samza jobs that perform different actions and interchange
> > information. As I found limits in the memory that were preventing
> > the jobs from getting from Accepted to Running, I introduced some
> > configurations in Yarn, as suggested in this list:
> >
> > yarn-site.xml
> >
> > <configuration>
> >   <property>
> >     <name>yarn.scheduler.minimum-allocation-mb</name>
> >     <value>128</value>
> >     <description>Minimum limit of memory to allocate to each
> >     container request at the Resource Manager.</description>
> >   </property>
> >   <property>
> >     <name>yarn.scheduler.maximum-allocation-mb</name>
> >     <value>512</value>
> >     <description>Maximum limit of memory to allocate to each
> >     container request at the Resource Manager.</description>
> >   </property>
> >   <property>
> >     <name>yarn.scheduler.minimum-allocation-vcores</name>
> >     <value>1</value>
> >     <description>The minimum allocation for every container request
> >     at the RM, in terms of virtual CPU cores. Requests lower than this
> >     won't take effect, and the specified value will get allocated the
> >     minimum.</description>
> >   </property>
> >   <property>
> >     <name>yarn.scheduler.maximum-allocation-vcores</name>
> >     <value>2</value>
> >     <description>The maximum allocation for every container request
> >     at the RM, in terms of virtual CPU cores. Requests higher than this
> >     won't take effect, and will get capped to this value.</description>
> >   </property>
> >   <property>
> >     <name>yarn.resourcemanager.hostname</name>
> >     <value>kfk-samza01</value>
> >   </property>
> > </configuration>
> >
> > capacity-scheduler.xml (altered value):
> >
> > <property>
> >   <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
> >   <value>0.5</value>
> >   <description>
> >     Maximum percent of resources in the cluster which can be used to run
> >     application masters i.e. controls number of concurrent running
> >     applications.
> >   </description>
> > </property>
> >
> > The jobs are configured to reduce the memory usage:
> >
> > yarn.container.memory.mb=256
> > yarn.am.container.memory.mb=256
> >
> > After introducing these changes I experienced a very appreciable
> > reduction in speed. That seemed normal, as the memory assigned to
> > the jobs was lowered and there were more of them running. It was
> > working until yesterday, but what I have seen today is that the jobs
> > are not moving from ACCEPTED to RUNNING. I have found the following
> > in the log (full log at the end):
> >
> > 2015-09-22 09:54:36,661 INFO [Container Monitor]
> > monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408))
> > - Memory usage of ProcessTree 10346 for container-id
> > container_1442908447829_0001_01_000001: 70.0 MB of 256 MB physical
> > memory used; 1.2 GB of 537.6 MB virtual memory used
> >
> > I am not sure where that 1.2 GB comes from and why it makes the processes die.
> >
> > Thanks,
> >
> > Jordi
> >
> > 2015-09-22 09:54:36,519 INFO [Container Monitor]
> > monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(458))
> > - Removed ProcessTree with root 10271
> > 2015-09-22 09:54:36,519 INFO [AsyncDispatcher event handler]
> > container.Container (ContainerImpl.java:handle(999)) - Container
> > container_1442908447829_0002_01_000001 transitioned from RUNNING to KILLING
> > 2015-09-22 09:54:36,533 INFO [AsyncDispatcher event handler]
> > launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(370))
> > - Cleaning up container container_1442908447829_0002_01_000001
> > 2015-09-22 09:54:36,661 INFO [Container Monitor]
> > monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408))
> > - Memory usage of ProcessTree 10346 for container-id
> > container_1442908447829_0001_01_000001: 70.0 MB of 256 MB physical
> > memory used; 1.2 GB of 537.6 MB virtual memory used
> > 2015-09-22 09:54:36,661 WARN [Container Monitor]
> > monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:isProcessTreeOverLimit(293))
> > - Process tree for container: container_1442908447829_0001_01_000001
> > running over twice the configured limit. Limit=563714432, current usage = 1269743616
> > 2015-09-22 09:54:36,662 WARN [Container Monitor]
> > monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(447))
> > - Container [pid=10346,containerID=container_1442908447829_0001_01_000001] is
> > running beyond virtual memory limits. Current usage: 70.0 MB of 256 MB
> > physical memory used; 1.2 GB of 537.6 MB virtual memory used. Killing container.
> > Dump of the process-tree for container_1442908447829_0001_01_000001 :
> > |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> > |- 10346 10344 10346 10346 (java) 253 7 1269743616 17908 /usr/lib/jvm/java-7-openjdk-amd64/bin/java -server -Dsamza.container.name=samza-application-master -Dsamza.log.dir=/opt/hadoop-2.6.0/logs/userlogs/application_1442908447829_0001/container_1442908447829_0001_01_000001 -Djava.io.tmpdir=/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/tmp -Xmx768M -XX:+PrintGCDateStamps -Xloggc:/opt/hadoop-2.6.0/logs/userlogs/application_1442908447829_0001/container_1442908447829_0001_01_000001/gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10241024 -d64 -cp /opt/hadoop-2.6.0/conf:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/jackson-annotations-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/jackson-core-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/jackson-databind-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/jackson-dataformat-smile-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/jackson-jaxrs-json-provider-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/jackson-module-jaxb-annotations-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/nxtBroker-0.0.1.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/nxtBroker-0.0.1-jar-with-dependencies.jar org.apache.samza.job.yarn.SamzaAppMaster
> >
> > 2015-09-22 09:54:36,663 INFO [Container Monitor]
> > monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(458))
> > - Removed ProcessTree with root 10346
> > 2015-09-22 09:54:36,663 INFO [AsyncDispatcher event handler]
> > container.Container (ContainerImpl.java:handle(999)) - Container
> > container_1442908447829_0001_01_000001 transitioned from RUNNING to KILLING
> > 2015-09-22 09:54:36,663 INFO [AsyncDispatcher event handler]
> > launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(370))
> > - Cleaning up container container_1442908447829_0001_01_000001
> >
> > ________________________________
> > Jordi Blasi Uribarri
> > Área I+D+i
> >
> > jbl...@nextel.es
> > Oficina Bilbao