I am trying to deploy a Flink cluster on Mesos, following https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/deployment/resource-providers/mesos/ (I know Mesos support has been deprecated, and I plan to migrate my deployment tools to Kubernetes, but for now I am stuck with Mesos). To deploy, I am using a custom Docker image that contains both Flink and my user binaries. The command I am using to start the cluster is:

/opt/flink/bin/mesos-appmaster.sh \
    -Djobmanager.rpc.address=$HOST \
    -Dmesos.resourcemanager.framework.user=flink \
    -Dmesos.resourcemanager.framework.name=timeline-flink-populator \
    -Dmesos.master=10.0.25.139:5050 \
    -Dmesos.resourcemanager.tasks.cpus=4 \
    -Dmesos.resourcemanager.tasks.container.type=docker \
    -Dmesos.resourcemanager.tasks.container.image.name= docker.strava.com/strava/flink:jv-mesos \
    -Dtaskmanager.numberOfTaskSlots=4

mesos-appmaster.sh is able to start a Mesos framework and a Flink job manager, but fails to start task managers. Looking in the Mesos syslog, I see that the Mesos framework was sending offers that were being declined very quickly, and the agents ended up in the LOST state. I am attaching all the relevant lines from the syslog.

Any ideas what the problem could be, or what else I could check to see what is happening?

Thanks,
Javier Vegas