+1 from our end as well. At Criteo, we are running some Flink jobs on Mesos in 
production to compute short-term features for machine learning. We’d love to 
help out and contribute to this initiative.

Thanks,
-- Piyush


From: Till Rohrmann <trohrm...@apache.org>
Date: Friday, December 6, 2019 at 8:10 AM
To: dev <d...@flink.apache.org>
Cc: user <user@flink.apache.org>
Subject: Re: [DISCUSS] Adding e2e tests for Flink's Mesos integration

Big +1 for adding a fully working e2e test for Flink's Mesos integration. 
Ideally we would have it ready for the 1.10 release. The lack of such a test 
has bitten us already multiple times.

In general I would prefer to use the official image if possible, since it frees 
us from maintaining our own custom image. Since Java 9 is no longer officially 
supported, as we opted for supporting Java 11 (LTS), it might not be feasible, 
though. How much longer would building the custom image take compared to 
downloading it from DockerHub? Maybe it is OK to build the image locally. Then 
we would not have to maintain a published image.

Cheers,
Till

On Fri, Dec 6, 2019 at 11:05 AM Yangze Guo <karma...@gmail.com> wrote:
Hi, all,

Currently, there is no end-to-end test or IT case for Mesos deployment,
while general deployment-related development inevitably touches the
logic of this component. Thus, some work needs to be done to guarantee
a good experience for both Mesos users and contributors. After an
offline discussion with Till and Xintong, we have some basic ideas and
would like to start a discussion thread on adding end-to-end tests for
Flink's Mesos integration.

As a first step, we would like to keep the scope of this contribution
relatively small. This may also help us quickly get some basic test
cases that might be helpful for the upcoming 1.10 release.

As far as we can tell, what needs to be done is to set up a Mesos
framework during testing and to determine which tests need to be
included.


** Regarding the Mesos framework, after trying out several approaches,
I find that setting up Mesos in Docker is probably what we want. The
resources needed for building and setting up Mesos from source are
probably not affordable in most scenarios. So, the one open question
worth discussing is the choice of Docker image. We have come up with
two options.

- Use the official Mesos image[1]
The official image was the first alternative that came to mind, but we
ran into a Java version compatibility problem that leads to failures
when launching task executors. Flink has supported Java 9 since
version 1.9.0 [2]. However, the official Docker image of Mesos is
built with a development version of JDK 9, which has probably caused
this problem. Unless we want to make Flink compatible with the JDK
development version used by the official Mesos image as well, this
option does not work out. Besides, according to the official
roadmap[5], Java 9 is not a long-term support version, which may bring
stability risks in the future.

- Build a custom image
I've already tried building a custom image[3] and successfully ran
most of the existing end-to-end test cases with it. The image is built
with Ubuntu 16.04, JDK 8 and Mesos 1.7.1. For the Mesos e2e test
framework, we could either build the image from a Dockerfile or pull
the pre-built image from DockerHub (or another hub service) during
testing.
If we decide to publish an image on DockerHub, we probably need an
official Flink repository/account to hold it.
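
As a rough illustration of the two ways to obtain the custom image
(build it locally or pull a pre-built one), here is a minimal Python
sketch. The function name image_command, the "latest" tag and the
build context directory are assumptions made for illustration; only
the repository name is taken from [3], and this is not how the Flink
test scripts are actually written.

```python
# Sketch only: names below are illustrative, not part of Flink's test scripts.
from typing import List

# Repository name from reference [3]; the "latest" tag is an assumption.
PREBUILT_IMAGE = "karmagyz/mesos-flink:latest"


def image_command(build_locally: bool, context_dir: str = ".") -> List[str]:
    """Return the docker command that makes the Mesos test image available."""
    if build_locally:
        # Build from the Dockerfile in context_dir and tag the result
        # with the same name as the pre-built image.
        return ["docker", "build", "-t", PREBUILT_IMAGE, context_dir]
    # Otherwise download the pre-built image from the registry.
    return ["docker", "pull", PREBUILT_IMAGE]


if __name__ == "__main__":
    print(" ".join(image_command(build_locally=True)))
    print(" ".join(image_command(build_locally=False)))
```

Timing both commands on a CI machine would be one way to compare the
cost of building against the cost of downloading.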


** Regarding the test coverage, we think the following three tests
could be a good starting point that covers an essential set of
behaviors for Mesos deployment.
- Wordcount end-to-end test. For verifying the basic process of Mesos
deployment.
- Multiple submissions of the same job. For preventing resource
management problems on Mesos, such as [4].
- State TTL RocksDB backend end-to-end test. For verifying memory
configuration behaviors, since Mesos has its own config options and
logic.

Unfortunately, none of us who participated in the initial offline
discussion has much experience running Flink on Mesos in production.
It would be good if users and experts who actually use Flink on Mesos
could join the discussion and provide some feedback. Any feedback,
ideas, suggestions, concerns and questions are welcome and
appreciated.


BTW, we would like to run a survey on the usage of Flink on Mesos in
the community. From the Flink on Mesos users, we would like to learn:
- Which version of Mesos do you use, and what setup (such as Marathon)
do you need for Mesos?
- Do you mainly use Flink job clusters or session clusters?
- What is the scale of your Flink / Mesos cluster?

[1]https://hub.docker.com/r/mesosphere/mesos
[2]https://issues.apache.org/jira/browse/FLINK-11307
[3]https://hub.docker.com/repository/docker/karmagyz/mesos-flink
[4]https://issues.apache.org/jira/browse/FLINK-14074
[5]https://www.oracle.com/technetwork/java/java-se-support-roadmap.html


Best,
Yangze Guo
