I am trying to deploy a Flink cluster via Mesos following
https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/deployment/resource-providers/mesos/
(I know Mesos support has been deprecated, and I am planning to migrate my
deployment tools to Kubernetes, but for now I am stuck using Mesos). To
deploy, I am using a custom Docker image that contains both Flink and my
user binaries. The command I am using to start the cluster is:

/opt/flink/bin/mesos-appmaster.sh \
      -Djobmanager.rpc.address=$HOST \
      -Dmesos.resourcemanager.framework.user=flink \
      -Dmesos.resourcemanager.framework.name=timeline-flink-populator \
      -Dmesos.master=10.0.25.139:5050 \
      -Dmesos.resourcemanager.tasks.cpus=4 \
      -Dmesos.resourcemanager.tasks.container.type=docker \
      -Dmesos.resourcemanager.tasks.container.image.name=docker.strava.com/strava/flink:jv-mesos \
      -Dtaskmanager.numberOfTaskSlots=4 ;

mesos-appmaster.sh is able to start a Mesos framework and a Flink job
manager, but it fails to start any task managers. Looking in the Mesos
syslog, I see that the Mesos framework was sending offers that were being
declined very quickly, and that the agents ended up in the LOST state. I am
attaching all the relevant lines from the syslog.

Any ideas what the problem could be or what else I could check to see what
is happening?

Thanks,

Javier Vegas

Attachment: syslog
Description: Binary data
