Thanks, Roman!

Looking at the log, it seems that the TaskManager can resolve $HOSTNAME to its
own hostname (07a6b681ee0f), as seen in these lines:

2021-09-27 22:02:41.067 [main] INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint  - -Djobmanager.rpc.address=*07a6b681ee0f*

2021-09-27 22:02:43.025 [main] INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint  - Rest endpoint listening at *07a6b681ee0f*:8081

2021-09-27 22:02:43.025 [main] INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint  - http://*07a6b681ee0f*:8081 was granted leadership with leaderSessionID=00000000-0000-0000-0000-000000000000

2021-09-27 22:02:43.026 [main] INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint  - Web frontend listening at http://*07a6b681ee0f*:8081.


I am deploying to Mesos with Marathon, so I have no way other than
$HOSTNAME to indicate the host that will execute mesos-appmaster.sh.
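
To double-check name resolution from another machine (for example a Mesos
agent), something like the following could be run there. This is only a
sketch: the `resolves` helper name is made up for illustration, and it
assumes `getent` is installed on the machine where it runs.

```shell
# Sketch: succeed only if the given host name resolves on this machine.
# `resolves` is a hypothetical helper; `getent hosts` consults the
# sources configured for host lookups (typically /etc/hosts and DNS).
resolves() {
    # An empty name can never resolve, so reject it up front.
    [ -n "$1" ] && getent hosts "$1" >/dev/null 2>&1
}
```

For instance, `resolves 07a6b681ee0f && echo ok || echo "not resolvable"`
run on an agent would show whether the container's hostname is visible from
there.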

The environment variables are set; this is what I see when I hop into the
Docker container:

root@07a6b681ee0f:/opt/flink# echo $HADOOP_CLASSPATH

/opt/flink/hadoop-3.2.2/etc/hadoop:/opt/flink/hadoop-3.2.2/share/hadoop/common/lib/*:/opt/flink/hadoop-3.2.2/share/hadoop/common/*:/opt/flink/hadoop-3.2.2/share/hadoop/hdfs:/opt/flink/hadoop-3.2.2/share/hadoop/hdfs/lib/*:/opt/flink/hadoop-3.2.2/share/hadoop/hdfs/*:/opt/flink/hadoop-3.2.2/share/hadoop/mapreduce/lib/*:/opt/flink/hadoop-3.2.2/share/hadoop/mapreduce/*:/opt/flink/hadoop-3.2.2/share/hadoop/yarn:/opt/flink/hadoop-3.2.2/share/hadoop/yarn/lib/*:/opt/flink/hadoop-3.2.2/share/hadoop/yarn/*:/opt/flink/lib


root@07a6b681ee0f:/opt/flink# echo $MESOS_NATIVE_JAVA_LIBRARY

/usr/lib/libmesos.so
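
To make sure both variables are also set at the moment the entrypoint runs
(and not only in an interactive shell), a guard like the one below could go
at the top of the entrypoint script. It is a sketch under assumptions:
`require_env` is a hypothetical helper, not something Flink or Mesos
provides.

```shell
# Sketch: fail fast if any named environment variable is unset or empty.
# require_env is a hypothetical helper, not part of Flink or Mesos.
require_env() {
    for var in "$@"; do
        # Indirect lookup of the variable whose name is stored in $var.
        eval "val=\${$var:-}"
        if [ -z "$val" ]; then
            echo "missing: $var" >&2
            return 1
        fi
    done
}

# Possible use before launching the appmaster:
#   require_env HADOOP_CLASSPATH MESOS_NATIVE_JAVA_LIBRARY || exit 1
```

The helper returns non-zero at the first missing variable, so the container
would exit before mesos-appmaster.sh is started with an incomplete
environment.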




On Tue, Sep 28, 2021 at 5:45 AM Roman Khachatryan <ro...@apache.org> wrote:

> Hi,
>
> No additional ports need to be open as far as I know.
>
> Probably, $HOSTNAME is substituted for something not resolvable on TMs?
>
> Please also make sure that the following gets executed before
> mesos-appmaster.sh:
> export HADOOP_CLASSPATH=$(hadoop classpath)
> export MESOS_NATIVE_JAVA_LIBRARY=/path/to/lib/libmesos.so
> (as per the documentation you linked)
>
> Regards,
> Roman
>
> On Mon, Sep 27, 2021 at 7:38 PM Javier Vegas <jve...@strava.com> wrote:
> >
> > I am trying to start Flink 1.13.2 on Mesos following the instructions in
> https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/deployment/resource-providers/mesos/
> and using Marathon to deploy a Docker image with both the Flink and my
> binaries.
> >
> > My entrypoint for the Docker image is:
> >
> >
> > /opt/flink/bin/mesos-appmaster.sh \
> >       -Djobmanager.rpc.address=$HOSTNAME \
> >       -Dmesos.resourcemanager.framework.user=flink \
> >       -Dmesos.master=10.0.18.246:5050 \
> >       -Dmesos.resourcemanager.tasks.cpus=6
> >
> >
> >
> > When mesos-appmaster.sh starts, in the stderr I see this:
> >
> >
> > I0927 16:50:32.306691 801308 exec.cpp:164] Version: 1.7.3
> > I0927 16:50:32.310277 801345 exec.cpp:238] Executor registered on agent f671d9ee-57f6-4f92-b1b2-3137676f6cdf-S6090
> > I0927 16:50:32.311120 801355 executor.cpp:130] Registered docker executor on 10.0.20.177
> > I0927 16:50:32.311394 801345 executor.cpp:186] Starting task tl_flink_prod.fb215c64-1fb2-11ec-9ce6-aaa2e9cb6ba0
> > WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
> > WARNING: An illegal reflective access operation has occurred
> > WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/opt/flink/lib/flink-shaded-hadoop-2-uber-2.8.3-10.0.jar) to method sun.security.krb5.Config.getInstance()
> > WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
> > WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
> > WARNING: All illegal access operations will be denied in a future release
> > I0927 16:50:43.622053   237 sched.cpp:232] Version: 1.7.3
> > I0927 16:50:43.624439   328 sched.cpp:336] New master detected at master@10.0.18.246:5050
> > I0927 16:50:43.624779   328 sched.cpp:356] No credentials provided. Attempting to register without authentication
> >
> >
> > where the "New master detected" line is promising.
> >
> > However, on the Flink UI I see only the jobmanager started, and there
> are no task managers.  Getting into the Docker container, I see this in the
> log:
> >
> > WARN  org.apache.flink.mesos.scheduler.ConnectionMonitor  - Unable to connect to Mesos; still trying...
> >
> >
> > I have verified that from the container I can access the Mesos master at
> 10.0.18.246:5050.
> >
> >
> > Does any other port besides the web UI port 5050 need to be open for
> mesos-appmaster to connect with the Mesos master?
> >
> >
> > In the appmaster log (attached) I see one exception, and I don't know if
> it is related to the Mesos connection problem:
> >
> >
> > java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.
> >         at org.apache.hadoop.util.Shell.checkHadoopHomeInner(Shell.java:448)
> >         at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:419)
> >         at org.apache.hadoop.util.Shell.<clinit>(Shell.java:496)
> >         at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
> >         at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1555)
> >         at org.apache.hadoop.security.SecurityUtil.getLogSlowLookupsEnabled(SecurityUtil.java:497)
> >         at org.apache.hadoop.security.SecurityUtil.<clinit>(SecurityUtil.java:90)
> >         at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:289)
> >         at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:277)
> >         at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:833)
> >         at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:803)
> >         at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:676)
> >         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
> >         at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
> >         at java.base/java.lang.reflect.Method.invoke(Unknown Source)
> >         at org.apache.flink.runtime.util.EnvironmentInformation.getHadoopUser(EnvironmentInformation.java:215)
> >         at org.apache.flink.runtime.util.EnvironmentInformation.logEnvironmentInfo(EnvironmentInformation.java:432)
> >         at org.apache.flink.mesos.entrypoint.MesosSessionClusterEntrypoint.main(MesosSessionClusterEntrypoint.java:95)
> >
> >
> >
> >
> > I am not trying (yet) to run in high availability mode, so I am not sure
> if I need to have HADOOP_HOME set or not, but I don't see anything about
> HADOOP_HOME in the Flink docs.
> >
> >
> >
> > Any tips on how I can fix my Docker+Marathon+Mesos environment so Flink
> can connect to my Mesos master?
> >
> >
> > Thanks,
> >
> >
> > Javier Vegas
> >
> >
>
