Hi,

As I said, the process you see in the Docker container and the job manager process on the VM are the same process. To start the task managers in Docker, you need to set "mesos.resourcemanager.tasks.container.type" to "docker" in the job manager config; otherwise Flink will just start the task managers as plain processes.
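If you want to check the "same process, two PIDs" point yourself, here is a minimal Python sketch. It is my own illustration, not something Flink or Mesos provides. It assumes a Linux kernel 4.1 or newer, which reports an NSpid line in /proc/<pid>/status; the helper names and the sample text are made up for the example.

```python
from pathlib import Path
from typing import List


def parse_nspid(status_text: str) -> List[int]:
    """Return the PIDs listed on the NSpid line of a /proc/<pid>/status dump.

    The first entry is the PID as seen from the namespace of the procfs
    mount (the VM, when read on the host); later entries are the PIDs in
    successively nested PID namespaces (for example, a Docker container).
    An empty list means the kernel did not report an NSpid line.
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    return []


def pids_across_namespaces(pid: int) -> List[int]:
    """Read /proc/<pid>/status on a Linux host and parse its NSpid line."""
    return parse_nspid(Path(f"/proc/{pid}/status").read_text())


# Hypothetical excerpt of /proc/13276/status as it might look on the VM.
sample = "Name:\tjava\nPid:\t13276\nPPid:\t9689\nNSpid:\t13276\t789\n"
print(parse_nspid(sample))  # [13276, 789]
```

On the VM you would call pids_across_namespaces(13276); if the result contains two numbers, the second one is the PID the same process has inside the container.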
I don't understand what you mean when you say that you can't access the VM's filesystem.

On Tue, Jul 31, 2018 at 2:25 AM NEKRASSOV, ALEXEI <an4...@att.com> wrote:

> Renjie,
>
> In my observation Task Managers don’t run in Docker containers – they run
> as JVM processes directly on the VM.
>
> The only Docker container is the one that runs Job Manager.
>
> What am I missing?
>
> Thanks,
> Alex
>
> *From:* Renjie Liu [mailto:liurenjie2...@gmail.com]
> *Sent:* Friday, July 20, 2018 8:56 PM
> *To:* Till Rohrmann <trohrm...@apache.org>
> *Cc:* NEKRASSOV, ALEXEI <an4...@att.com>; Fabian Hueske <fhue...@gmail.com>;
> user <user@flink.apache.org>
> *Subject:* Re: Flink on Mesos: containers question
>
> Hi, Alexei:
>
> What you pasted is expected behavior. The job manager and the two task
> managers should each run in a Docker instance.
>
> 13276 should be the process of the job manager, and it is the same process
> as 789. They have different process IDs because they are shown in
> different PID namespaces (a Linux kernel feature that Docker depends on).
>
> On Thu, Jul 19, 2018 at 10:00 PM Till Rohrmann <trohrm...@apache.org>
> wrote:
>
> Hi Alexei,
>
> I actually never used Mesos with container images. I always used it in a
> way where the Mesos task directly starts the Java process.
>
> Cheers,
> Till
>
> On Thu, Jul 19, 2018 at 2:44 PM NEKRASSOV, ALEXEI <an4...@att.com> wrote:
>
> Till,
>
> Any insight into how Flink components are containerized in Mesos?
>
> Thanks!
> Alex
>
> *From:* Fabian Hueske [mailto:fhue...@gmail.com]
> *Sent:* Monday, July 16, 2018 7:57 AM
> *To:* NEKRASSOV, ALEXEI <an4...@att.com>
> *Cc:* user@flink.apache.org; Till Rohrmann <trohrm...@apache.org>
> *Subject:* Re: Flink on Mesos: containers question
>
> Hi Alexei,
>
> Till (in CC) is familiar with Flink's Mesos support in 1.4.x.
>
> Best, Fabian
>
> 2018-07-13 15:07 GMT+02:00 NEKRASSOV, ALEXEI <an4...@att.com>:
>
> Can someone please clarify how Flink on Mesos is containerized?
>
> On a 5-node Mesos cluster I started Flink (1.4.2) with two Task Managers.
> Mesos shows a “flink” task and two “taskmanager” tasks, all on the same VM.
>
> On that VM I see one Docker container running a process that seems to be
> the Mesos App Master:
>
> $ docker ps -a
> CONTAINER ID  IMAGE                            COMMAND                  CREATED       STATUS       PORTS  NAMES
> 97b6840466c0  mesosphere/dcos-flink:1.4.2-1.0  "/bin/sh -c /sbin/..."  41 hours ago  Up 41 hours         mesos-a0079d85-9ccb-4c43-8d31-e6b1ad750197
>
> $ docker exec 97b6840466c0 /bin/ps -efww
> UID    PID  PPID  C STIME TTY    TIME     CMD
> root     1     0  0 Jul11 ?      00:00:00 /bin/sh -c /sbin/init.sh
> root     7     1  0 Jul11 ?      00:00:02 runsvdir -P /etc/service
> root     8     7  0 Jul11 ?      00:00:00 runsv flink
> root   629     0  0 Jul12 pts/0  00:00:00 /bin/bash
> root   789     8  1 Jul12 ?      00:09:16 /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
>     -classpath /flink-1.4.2/lib/flink-python_2.11-1.4.2.jar:/flink-1.4.2/lib/flink-shaded-hadoop2-uber-1.4.2.jar:/flink-1.4.2/lib/log4j-1.2.17.jar:/flink-1.4.2/lib/slf4j-log4j12-1.7.7.jar:/flink-1.4.2/lib/flink-dist_2.11-1.4.2.jar::/etc/hadoop/conf/:
>     -Dlog.file=/mnt/mesos/sandbox/flink--mesos-appmaster-alex-tfc87d-private-agents-3.novalocal.log
>     -Dlog4j.configuration=file:/flink-1.4.2/conf/log4j.properties
>     -Dlogback.configurationFile=file:/flink-1.4.2/conf/logback.xml
>     org.apache.flink.mesos.runtime.clusterframework.MesosApplicationMasterRunner
>     -Dblob.server.port=23170 -Djobmanager.heap.mb=256
>     -Djobmanager.rpc.port=23169 -Djobmanager.web.port=23168
>     -Dmesos.artifact-server.port=23171 -Dmesos.initial-tasks=2
>     -Dmesos.resourcemanager.tasks.cpus=2 -Dmesos.resourcemanager.tasks.mem=2048
>     -Dtaskmanager.heap.mb=512 -Dtaskmanager.memory.preallocate=true
>     -Dtaskmanager.numberOfTaskSlots=1 -Dparallelism.default=1
>     -Djobmanager.rpc.address=localhost -Dmesos.resourcemanager.framework.role=*
>     -Dsecurity.kerberos.login.use-ticket-cache=true
> root  1027     0  0 12:54 ?      00:00:00 /bin/ps -efww
>
> Then on the VM itself I see another process with the same command line as
> the one in the container:
>
> root 13276  9689  1 Jul12 ?      00:09:18 /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
>     -classpath /flink-1.4.2/lib/flink-python_2.11-1.4.2.jar:/flink-1.4.2/lib/flink-shaded-hadoop2-uber-1.4.2.jar:/flink-1.4.2/lib/log4j-1.2.17.jar:/flink-1.4.2/lib/slf4j-log4j12-1.7.7.jar:/flink-1.4.2/lib/flink-dist_2.11-1.4.2.jar::/etc/hadoop/conf/:
>     -Dlog.file=/mnt/mesos/sandbox/flink--mesos-appmaster-alex-tfc87d-private-agents-3.novalocal.log
>     -Dlog4j.configuration=file:/flink-1.4.2/conf/log4j.properties
>     -Dlogback.configurationFile=file:/flink-1.4.2/conf/logback.xml
>     org.apache.flink.mesos.runtime.clusterframework.MesosApplicationMasterRunner
>     -Dblob.server.port=23170 -Djobmanager.heap.mb=256
>     -Djobmanager.rpc.port=23169 -Djobmanager.web.port=23168
>     -Dmesos.artifact-server.port=23171 -Dmesos.initial-tasks=2
>     -Dmesos.resourcemanager.tasks.cpus=2 -Dmesos.resourcemanager.tasks.mem=2048
>     -Dtaskmanager.heap.mb=512 -Dtaskmanager.memory.preallocate=true
>     -Dtaskmanager.numberOfTaskSlots=1 -Dparallelism.default=1
>     -Djobmanager.rpc.address=localhost -Dmesos.resourcemanager.framework.role=*
>     -Dsecurity.kerberos.login.use-ticket-cache=true
>
> And I see two processes on the VM that seem to be related to Task Managers:
>
> root 13688 13687  0 Jul12 ?      00:04:25 /docker-java-home/jre/bin/java -Xms1448m -Xmx1448m
>     -classpath /mnt/mesos/sandbox/flink/lib/flink-python_2.11-1.4.2.jar:/mnt/mesos/sandbox/flink/lib/flink-shaded-hadoop2-uber-1.4.2.jar:/mnt/mesos/sandbox/flink/lib/log4j-1.2.17.jar:/mnt/mesos/sandbox/flink/lib/slf4j-log4j12-1.7.7.jar:/mnt/mesos/sandbox/flink/lib/flink-dist_2.11-1.4.2.jar:::
>     -Dlog.file=flink-taskmanager.log
>     -Dlog4j.configuration=file:/mnt/mesos/sandbox/flink/conf/log4j.properties
>     -Dlogback.configurationFile=file:/mnt/mesos/sandbox/flink/conf/logback.xml
>     org.apache.flink.mesos.runtime.clusterframework.MesosTaskManager
>     -Dblob.server.port=23170 -Dmesos.artifact-server.port=23171
>     -Djobmanager.heap.mb=256 -Djobmanager.rpc.address=localhost
>     -Djobmanager.web.port=23168 -Dsecurity.kerberos.login.use-ticket-cache=true
>     -Djobmanager.rpc.port=23169 -Dtaskmanager.memory.preallocate=true
>     -Dtaskmanager.rpc.port=1027 -Dmesos.initial-tasks=2
>     -Dmesos.resourcemanager.tasks.cpus=2
>     -Dtaskmanager.maxRegistrationDuration=5 minutes
>     -Dtaskmanager.data.port=1028 -Dparallelism.default=1
>     -Dtaskmanager.numberOfTaskSlots=1 -Dmesos.resourcemanager.tasks.mem=2048
>     -Dtaskmanager.heap.mb=512 -Dmesos.resourcemanager.framework.role=*
>
> root 13892 13891  0 Jul12 ?      00:04:15 /docker-java-home/jre/bin/java -Xms1448m -Xmx1448m
>     -classpath /mnt/mesos/sandbox/flink/lib/flink-python_2.11-1.4.2.jar:/mnt/mesos/sandbox/flink/lib/flink-shaded-hadoop2-uber-1.4.2.jar:/mnt/mesos/sandbox/flink/lib/log4j-1.2.17.jar:/mnt/mesos/sandbox/flink/lib/slf4j-log4j12-1.7.7.jar:/mnt/mesos/sandbox/flink/lib/flink-dist_2.11-1.4.2.jar:::
>     -Dlog.file=flink-taskmanager.log
>     -Dlog4j.configuration=file:/mnt/mesos/sandbox/flink/conf/log4j.properties
>     -Dlogback.configurationFile=file:/mnt/mesos/sandbox/flink/conf/logback.xml
>     org.apache.flink.mesos.runtime.clusterframework.MesosTaskManager
>     -Dblob.server.port=23170 -Dmesos.artifact-server.port=23171
>     -Djobmanager.heap.mb=256 -Djobmanager.rpc.address=localhost
>     -Djobmanager.web.port=23168 -Dsecurity.kerberos.login.use-ticket-cache=true
>     -Djobmanager.rpc.port=23169 -Dtaskmanager.memory.preallocate=true
>     -Dtaskmanager.rpc.port=1025 -Dmesos.initial-tasks=2
>     -Dmesos.resourcemanager.tasks.cpus=2
>     -Dtaskmanager.maxRegistrationDuration=5 minutes
>     -Dtaskmanager.data.port=1026 -Dparallelism.default=1
>     -Dtaskmanager.numberOfTaskSlots=1 -Dmesos.resourcemanager.tasks.mem=2048
>     -Dtaskmanager.heap.mb=512 -Dmesos.resourcemanager.framework.role=*
>
> But I don’t see any containers for Task Managers.
>
> I thought maybe Task Managers run directly on the VM (PIDs 13688, 13892),
> but my code executed in Task Managers has no access to the VM’s filesystem.
>
> It is almost like there are more containers running than “docker ps” is
> showing me. Can someone clarify?
>
> Also, what is the relationship between PID 13276 and the process that I
> see in the container (the two processes with the same command line)?
>
> Thanks!
> Alex
>
> --
> Liu, Renjie
> Software Engineer, MVAD

--
Liu, Renjie
Software Engineer, MVAD