Hi Alexei,

I have actually never used Mesos with container images. I have always used it in a way where the Mesos task directly starts the Java process.

Cheers,
Till

On Thu, Jul 19, 2018 at 2:44 PM NEKRASSOV, ALEXEI <an4...@att.com> wrote:

> Till,
>
> Any insight into how Flink components are containerized in Mesos?
>
> Thanks!
> Alex
>
> *From:* Fabian Hueske [mailto:fhue...@gmail.com]
> *Sent:* Monday, July 16, 2018 7:57 AM
> *To:* NEKRASSOV, ALEXEI <an4...@att.com>
> *Cc:* user@flink.apache.org; Till Rohrmann <trohrm...@apache.org>
> *Subject:* Re: Flink on Mesos: containers question
>
> Hi Alexei,
>
> Till (in CC) is familiar with Flink's Mesos support in 1.4.x.
>
> Best, Fabian
>
> 2018-07-13 15:07 GMT+02:00 NEKRASSOV, ALEXEI <an4...@att.com>:
>
> Can someone please clarify how Flink on Mesos is containerized?
>
> On a 5-node Mesos cluster I started Flink (1.4.2) with two Task Managers.
> Mesos shows a “flink” task and two “taskmanager” tasks, all on the same VM.
>
> On that VM I see one Docker container running a process that seems to be
> the Mesos App Master:
>
> $ docker ps -a
> CONTAINER ID   IMAGE                             COMMAND                  CREATED        STATUS        PORTS   NAMES
> 97b6840466c0   mesosphere/dcos-flink:1.4.2-1.0   "/bin/sh -c /sbin/..."   41 hours ago   Up 41 hours           mesos-a0079d85-9ccb-4c43-8d31-e6b1ad750197
>
> $ docker exec 97b6840466c0 /bin/ps -efww
> UID    PID  PPID  C STIME TTY    TIME     CMD
> root     1     0  0 Jul11 ?      00:00:00 /bin/sh -c /sbin/init.sh
> root     7     1  0 Jul11 ?      00:00:02 runsvdir -P /etc/service
> root     8     7  0 Jul11 ?      00:00:00 runsv flink
> root   629     0  0 Jul12 pts/0  00:00:00 /bin/bash
> root   789     8  1 Jul12 ?      00:09:16
> /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -classpath
> /flink-1.4.2/lib/flink-python_2.11-1.4.2.jar:/flink-1.4.2/lib/flink-shaded-hadoop2-uber-1.4.2.jar:/flink-1.4.2/lib/log4j-1.2.17.jar:/flink-1.4.2/lib/slf4j-log4j12-1.7.7.jar:/flink-1.4.2/lib/flink-dist_2.11-1.4.2.jar::/etc/hadoop/conf/:
> -Dlog.file=/mnt/mesos/sandbox/flink--mesos-appmaster-alex-tfc87d-private-agents-3.novalocal.log
> -Dlog4j.configuration=file:/flink-1.4.2/conf/log4j.properties
> -Dlogback.configurationFile=file:/flink-1.4.2/conf/logback.xml
> org.apache.flink.mesos.runtime.clusterframework.MesosApplicationMasterRunner
> -Dblob.server.port=23170 -Djobmanager.heap.mb=256
> -Djobmanager.rpc.port=23169 -Djobmanager.web.port=23168
> -Dmesos.artifact-server.port=23171 -Dmesos.initial-tasks=2
> -Dmesos.resourcemanager.tasks.cpus=2 -Dmesos.resourcemanager.tasks.mem=2048
> -Dtaskmanager.heap.mb=512 -Dtaskmanager.memory.preallocate=true
> -Dtaskmanager.numberOfTaskSlots=1 -Dparallelism.default=1
> -Djobmanager.rpc.address=localhost -Dmesos.resourcemanager.framework.role=*
> -Dsecurity.kerberos.login.use-ticket-cache=true
> root  1027     0  0 12:54 ?      00:00:00 /bin/ps -efww
>
> Then on the VM itself I see another process with the same command line as
> the one in the container:
>
> root 13276  9689  1 Jul12 ?  00:09:18
> /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -classpath
> /flink-1.4.2/lib/flink-python_2.11-1.4.2.jar:/flink-1.4.2/lib/flink-shaded-hadoop2-uber-1.4.2.jar:/flink-1.4.2/lib/log4j-1.2.17.jar:/flink-1.4.2/lib/slf4j-log4j12-1.7.7.jar:/flink-1.4.2/lib/flink-dist_2.11-1.4.2.jar::/etc/hadoop/conf/:
> -Dlog.file=/mnt/mesos/sandbox/flink--mesos-appmaster-alex-tfc87d-private-agents-3.novalocal.log
> -Dlog4j.configuration=file:/flink-1.4.2/conf/log4j.properties
> -Dlogback.configurationFile=file:/flink-1.4.2/conf/logback.xml
> org.apache.flink.mesos.runtime.clusterframework.MesosApplicationMasterRunner
> -Dblob.server.port=23170 -Djobmanager.heap.mb=256
> -Djobmanager.rpc.port=23169 -Djobmanager.web.port=23168
> -Dmesos.artifact-server.port=23171 -Dmesos.initial-tasks=2
> -Dmesos.resourcemanager.tasks.cpus=2 -Dmesos.resourcemanager.tasks.mem=2048
> -Dtaskmanager.heap.mb=512 -Dtaskmanager.memory.preallocate=true
> -Dtaskmanager.numberOfTaskSlots=1 -Dparallelism.default=1
> -Djobmanager.rpc.address=localhost -Dmesos.resourcemanager.framework.role=*
> -Dsecurity.kerberos.login.use-ticket-cache=true
>
> And I see two processes on the VM that seem to be related to the Task Managers:
>
> root 13688 13687  0 Jul12 ?  00:04:25
> /docker-java-home/jre/bin/java -Xms1448m -Xmx1448m -classpath
> /mnt/mesos/sandbox/flink/lib/flink-python_2.11-1.4.2.jar:/mnt/mesos/sandbox/flink/lib/flink-shaded-hadoop2-uber-1.4.2.jar:/mnt/mesos/sandbox/flink/lib/log4j-1.2.17.jar:/mnt/mesos/sandbox/flink/lib/slf4j-log4j12-1.7.7.jar:/mnt/mesos/sandbox/flink/lib/flink-dist_2.11-1.4.2.jar:::
> -Dlog.file=flink-taskmanager.log
> -Dlog4j.configuration=file:/mnt/mesos/sandbox/flink/conf/log4j.properties
> -Dlogback.configurationFile=file:/mnt/mesos/sandbox/flink/conf/logback.xml
> org.apache.flink.mesos.runtime.clusterframework.MesosTaskManager
> -Dblob.server.port=23170 -Dmesos.artifact-server.port=23171
> -Djobmanager.heap.mb=256 -Djobmanager.rpc.address=localhost
> -Djobmanager.web.port=23168 -Dsecurity.kerberos.login.use-ticket-cache=true
> -Djobmanager.rpc.port=23169 -Dtaskmanager.memory.preallocate=true
> -Dtaskmanager.rpc.port=1027 -Dmesos.initial-tasks=2
> -Dmesos.resourcemanager.tasks.cpus=2
> -Dtaskmanager.maxRegistrationDuration=5 minutes
> -Dtaskmanager.data.port=1028 -Dparallelism.default=1
> -Dtaskmanager.numberOfTaskSlots=1 -Dmesos.resourcemanager.tasks.mem=2048
> -Dtaskmanager.heap.mb=512 -Dmesos.resourcemanager.framework.role=*
>
> root 13892 13891  0 Jul12 ?  00:04:15
> /docker-java-home/jre/bin/java -Xms1448m -Xmx1448m -classpath
> /mnt/mesos/sandbox/flink/lib/flink-python_2.11-1.4.2.jar:/mnt/mesos/sandbox/flink/lib/flink-shaded-hadoop2-uber-1.4.2.jar:/mnt/mesos/sandbox/flink/lib/log4j-1.2.17.jar:/mnt/mesos/sandbox/flink/lib/slf4j-log4j12-1.7.7.jar:/mnt/mesos/sandbox/flink/lib/flink-dist_2.11-1.4.2.jar:::
> -Dlog.file=flink-taskmanager.log
> -Dlog4j.configuration=file:/mnt/mesos/sandbox/flink/conf/log4j.properties
> -Dlogback.configurationFile=file:/mnt/mesos/sandbox/flink/conf/logback.xml
> org.apache.flink.mesos.runtime.clusterframework.MesosTaskManager
> -Dblob.server.port=23170 -Dmesos.artifact-server.port=23171
> -Djobmanager.heap.mb=256 -Djobmanager.rpc.address=localhost
> -Djobmanager.web.port=23168 -Dsecurity.kerberos.login.use-ticket-cache=true
> -Djobmanager.rpc.port=23169 -Dtaskmanager.memory.preallocate=true
> -Dtaskmanager.rpc.port=1025 -Dmesos.initial-tasks=2
> -Dmesos.resourcemanager.tasks.cpus=2
> -Dtaskmanager.maxRegistrationDuration=5 minutes
> -Dtaskmanager.data.port=1026 -Dparallelism.default=1
> -Dtaskmanager.numberOfTaskSlots=1 -Dmesos.resourcemanager.tasks.mem=2048
> -Dtaskmanager.heap.mb=512 -Dmesos.resourcemanager.framework.role=*
>
> But I don’t see any containers for the Task Managers.
>
> I thought maybe the Task Managers run directly on the VM (PIDs 13688 and
> 13892), but code executed in the Task Managers has no access to the VM’s
> filesystem.
>
> It is almost as if more containers are running than “docker ps” shows me.
> Can someone clarify?
>
> Also, what is the relationship between PID 13276 and the process that I
> see in the container (the two processes with the same command line)?
>
> Thanks!
> Alex
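Both open questions (is PID 13276 the same process as PID 789 in the container, and are the Task Manager PIDs isolated even though `docker ps` lists no container for them) can be probed from the VM through Linux's /proc. Below is a minimal Python sketch under stated assumptions: a Linux kernel of at least 4.1 (for the `NSpid:` field), the script run as root on the agent VM, and a hypothetical file name `nscheck.py`. If PID 13276 reports an NSpid chain ending in 789, it would be the very same process seen from inside the container's PID namespace. A Task Manager whose mount namespace differs from PID 1's is isolated by something `docker ps` does not show; Mesos's own containerizer, for example, does not go through the Docker daemon. That is one possible explanation, not a confirmed diagnosis of this cluster.

```python
import os
import sys
from typing import List, Optional


def ns_pids(pid: int) -> Optional[List[int]]:
    """Return the NSpid chain of a process, or None if the kernel omits it.

    On Linux >= 4.1, /proc/<pid>/status has an "NSpid:" line listing the
    process's PID in each nested PID namespace. The leftmost value is
    relative to the namespace that mounted this /proc; the rightmost is
    the PID the process sees for itself (e.g. "NSpid: 13276 789").
    """
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("NSpid:"):
                return [int(p) for p in line.split()[1:]]
    return None


def same_namespace(pid_a: int, pid_b: int, kind: str = "mnt") -> bool:
    """True if both processes are in the same namespace of the given kind.

    /proc/<pid>/ns/<kind> is a symlink such as 'mnt:[4026531840]'; equal
    link targets mean the same namespace. Reading another process's entry
    usually requires root.
    """
    link_a = os.readlink(f"/proc/{pid_a}/ns/{kind}")
    link_b = os.readlink(f"/proc/{pid_b}/ns/{kind}")
    return link_a == link_b


if __name__ == "__main__":
    # Hypothetical usage on the agent VM (as root):
    #   python3 nscheck.py 13276 13688 13892
    # PID 1 is assumed to be the host's init, so "same mnt ns as PID 1"
    # means the process sees the VM's own filesystem.
    for arg in sys.argv[1:]:
        pid = int(arg)
        print(f"PID {pid}: NSpid={ns_pids(pid)}, "
              f"same mnt ns as PID 1: {same_namespace(1, pid, 'mnt')}")
```

Comparing namespace symlinks rather than command lines is deliberate: as the listings show, two views of one process and two distinct processes can carry identical command lines, while namespace identity is what actually decides what a process can see.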