+1 for officially deprecating this component for the 1.13 release.

Cheers,
Till

On Thu, Mar 25, 2021 at 1:49 PM Konstantin Knauf <kna...@apache.org> wrote:

> Hi Matthias,
>
> Thank you for following up on this. +1 to officially deprecate Mesos in
> the code and documentation, too. It will be confusing for users if this
> diverges from the roadmap.
>
> Cheers,
>
> Konstantin
>
> On Thu, Mar 25, 2021 at 12:23 PM Matthias Pohl <matth...@ververica.com>
> wrote:
>
>> Hi everyone,
>> considering the upcoming release of Flink 1.13, I wanted to revive the
>> discussion about the Mesos support once more. Mesos is also already listed
>> as deprecated in Flink's overall roadmap [1]. Maybe it's time to align the
>> documentation accordingly to make it more explicit?
>>
>> What do you think?
>>
>> Best,
>> Matthias
>>
>> [1] https://flink.apache.org/roadmap.html#feature-radar
>>
>> On Wed, Oct 28, 2020 at 9:40 AM Till Rohrmann <trohrm...@apache.org>
>> wrote:
>>
>> > Hi Oleksandr,
>> >
>> > yes, you are right. The biggest problem at the moment is the lack of
>> > test coverage, and thereby of confidence to make changes. We have some
>> > e2e tests which you can find here [1]. These tests are, however, quite
>> > coarse-grained and are missing a lot of cases. One idea would be to add
>> > a Mesos e2e test based on Flink's end-to-end test framework [2]. I think
>> > what needs to be done there is to add a Mesos resource and a way to
>> > submit jobs to a Mesos cluster to write e2e tests.
>> >
>> > [1] https://github.com/apache/flink/tree/master/flink-jepsen
>> > [2] https://github.com/apache/flink/tree/master/flink-end-to-end-tests/flink-end-to-end-tests-common
>> >
>> > Cheers,
>> > Till
>> >
>> > On Tue, Oct 27, 2020 at 12:29 PM Oleksandr Nitavskyi
>> > <o.nitavs...@criteo.com> wrote:
>> >
>> >> Hello Xintong,
>> >>
>> >> Thanks for the insights and support.
>> >>
>> >> I browsed the Mesos backlog and didn't identify anything critical left
>> >> there.
>> >>
>> >> I see that there were quite a lot of contributions to Flink Mesos in
>> >> the recent version:
>> >> https://github.com/apache/flink/commits/master/flink-mesos
>> >> We plan to validate the current Flink master (or the release 1.12
>> >> branch) on our Mesos setup. In case of any issues, we will try to
>> >> propose changes. My feeling is that our test results shouldn't affect
>> >> the Flink 1.12 release cycle, and if any resulting commits land in
>> >> 1.12.1, that should be totally fine.
>> >>
>> >> In the future, we would be glad to help you guys with any
>> >> maintenance-related questions. One of the highest priorities around
>> >> this component seems to be the development of a full e2e test.
>> >>
>> >> Kind Regards
>> >> Oleksandr Nitavskyi
>> >> ________________________________
>> >> From: Xintong Song <tonysong...@gmail.com>
>> >> Sent: Tuesday, October 27, 2020 7:14 AM
>> >> To: dev <d...@flink.apache.org>; user <user@flink.apache.org>
>> >> Cc: Piyush Narang <p.nar...@criteo.com>
>> >> Subject: [BULK]Re: [SURVEY] Remove Mesos support
>> >>
>> >> Hi Piyush,
>> >>
>> >> Thanks a lot for sharing the information. It is a great relief that
>> >> you are good with Flink on Mesos as is.
>> >>
>> >> As for the jira issues, I believe the most essential ones should have
>> >> already been resolved. You may find some remaining open issues here
>> >> [1], but not all of them are necessary if we decide to keep Flink on
>> >> Mesos as is.
>> >>
>> >> At the moment and in the near future, I think help is mostly needed
>> >> with testing the upcoming release 1.12 with Mesos use cases. The
>> >> community is currently actively preparing the new release, and
>> >> hopefully we can come up with a release candidate early next month. It
>> >> would be greatly appreciated if you folks, as experienced Flink on
>> >> Mesos users, can help with verifying the release candidates.
>> >>
>> >> Thank you~
>> >>
>> >> Xintong Song
>> >>
>> >> [1] https://issues.apache.org/jira/browse/FLINK-17402?jql=project%20%3D%20FLINK%20AND%20component%20%3D%20%22Deployment%20%2F%20Mesos%22%20AND%20status%20%3D%20Open
>> >>
>> >> On Tue, Oct 27, 2020 at 2:58 AM Piyush Narang <p.nar...@criteo.com>
>> >> wrote:
>> >>
>> >> Hi Xintong,
>> >>
>> >> Do you have any jiras that cover any of the items on 1 or 2? I can
>> >> reach out to folks internally and see if I can get some folks to
>> >> commit to helping out.
>> >>
>> >> To cover the other qs:
>> >>
>> >> * Yes, we’ve not got a plan at the moment to get off Mesos. We use
>> >>   Yarn for some of our Flink workloads when we can. Mesos is only used
>> >>   when we need streaming capabilities in our WW dcs (as our Yarn is
>> >>   centralized in one DC).
>> >> * We’re currently on Flink 1.9 (old planner). We have a plan to bump
>> >>   to 1.11 / 1.12 this quarter.
>> >> * We typically upgrade once every 6 months to a year (not every
>> >>   release). We’d like to speed up the cadence but we’re not there yet.
>> >> * We’d largely be good with keeping Flink on Mesos as-is and
>> >>   functional while missing out on some of the newer features.
>> >>   We understand the pain on the community's side, and if we see some
>> >>   fancy improvement in Flink on Yarn / K8s that we want in Mesos, we
>> >>   can take on the work to port it over.
>> >>
>> >> Thanks,
>> >>
>> >> -- Piyush
>> >>
>> >> From: Xintong Song <tonysong...@gmail.com>
>> >> Date: Sunday, October 25, 2020 at 10:57 PM
>> >> To: dev <d...@flink.apache.org>, user <user@flink.apache.org>
>> >> Cc: Lasse Nedergaard <lassenedergaardfl...@gmail.com>,
>> >> <p.nar...@criteo.com>
>> >> Subject: Re: [SURVEY] Remove Mesos support
>> >>
>> >> Thanks for sharing the information with us, Piyush and Lasse.
>> >>
>> >> @Piyush
>> >>
>> >> Thanks for offering the help. IMO, there are currently several
>> >> problems that make supporting Flink on Mesos challenging for us.
>> >>
>> >> 1. Lack of Mesos experts. AFAIK, there are very few people (if not
>> >>    none) among the active contributors in this community that are
>> >>    familiar with Mesos and can help with development on this
>> >>    component.
>> >> 2. Absence of tests. Mesos does not provide a testing cluster, like
>> >>    `MiniYARNCluster`, making it hard to test interactions between
>> >>    Flink and Mesos. We have only a few very simple e2e tests running
>> >>    on Mesos deployed in Docker, covering the most fundamental
>> >>    workflows. We are not sure how well those tests work, especially
>> >>    against some potential corner cases.
>> >> 3. Divergence from other deployments. Because of 1 and 2, the new
>> >>    efforts (features, maintenance, refactors) tend to exclude Mesos if
>> >>    possible.
>> >>    When the new efforts have to touch the Mesos-related components
>> >>    (e.g., changes to the common resource manager interfaces), we have
>> >>    to be very careful and make as few changes as possible, to avoid
>> >>    accidentally breaking anything that we are not familiar with. As a
>> >>    result, the component diverges a lot from other deployment
>> >>    components (K8s/Yarn), which makes it harder to maintain.
>> >>
>> >> It would be greatly appreciated if you can help with any of the above
>> >> issues.
>> >>
>> >> Additionally, I have a few questions concerning your use cases at
>> >> Criteo. IIUC, you are going to stay on Mesos in the foreseeable
>> >> future, while keeping the Flink version up-to-date? What Flink version
>> >> are you currently using? How often do you upgrade (e.g., every
>> >> release)? Would you be good with keeping the Flink on Mesos component
>> >> as it is (meaning that deployment and resource management improvements
>> >> may not be ported to Mesos), while keeping other components up-to-date
>> >> (e.g., improvements from programming APIs, operators, state backends,
>> >> etc.)?
>> >>
>> >> Thank you~
>> >>
>> >> Xintong Song
>> >>
>> >> On Sat, Oct 24, 2020 at 2:48 AM Lasse Nedergaard
>> >> <lassenedergaardfl...@gmail.com> wrote:
>> >>
>> >> Hi
>> >>
>> >> At Trackunit we have been using Mesos for a long time but have now
>> >> moved to k8s.
>> >>
>> >> Best regards
>> >>
>> >> Lasse Nedergaard
>> >>
>> >> On Oct 23, 2020, at 17:01, Robert Metzger <rmetz...@apache.org> wrote:
>> >>
>> >> Hey Piyush,
>> >>
>> >> thanks a lot for raising this concern. I believe we should keep Mesos
>> >> in Flink then in the foreseeable future.
>> >>
>> >> Your offer to help is much appreciated.
>> >> We'll let you know once there is something.
>> >>
>> >> On Fri, Oct 23, 2020 at 4:28 PM Piyush Narang <p.nar...@criteo.com>
>> >> wrote:
>> >>
>> >> Thanks Kostas. If there are items we can help with, I'm sure we'd be
>> >> able to find folks who would be excited to contribute / help in any
>> >> way.
>> >>
>> >> -- Piyush
>> >>
>> >> On 10/23/20, 10:25 AM, "Kostas Kloudas" <kklou...@gmail.com> wrote:
>> >>
>> >>     Thanks Piyush for the message.
>> >>     After this, I revoke my +1. I agree with the previous opinions
>> >>     that we cannot drop code that is actively used by users,
>> >>     especially if it is something as deep in the stack as support for
>> >>     a cluster management framework.
>> >>
>> >>     Cheers,
>> >>     Kostas
>> >>
>> >>     On Fri, Oct 23, 2020 at 4:15 PM Piyush Narang
>> >>     <p.nar...@criteo.com> wrote:
>> >> >
>> >> > Hi folks,
>> >> >
>> >> > We at Criteo are active users of the Flink on Mesos resource
>> >> > management component. We are pretty heavy users of Mesos for
>> >> > scheduling workloads on our edge datacenters and we do want to
>> >> > continue to be able to run some of our Flink topologies (to compute
>> >> > machine learning short term features) on those DCs. If possible, our
>> >> > vote would be not to drop Mesos support, as that would tie us to an
>> >> > old release or force us to maintain a fork, since we’re not planning
>> >> > to migrate off Mesos anytime soon. Is the burden something that can
>> >> > be helped with by the community? (Or are you referring to having to
>> >> > ensure PRs handle the Mesos piece as well when they touch the
>> >> > resource managers?)
>> >> >
>> >> > Thanks,
>> >> >
>> >> > -- Piyush
>> >> >
>> >> > From: Till Rohrmann <trohrm...@apache.org>
>> >> > Date: Friday, October 23, 2020 at 8:19 AM
>> >> > To: Xintong Song <tonysong...@gmail.com>
>> >> > Cc: dev <d...@flink.apache.org>, user <user@flink.apache.org>
>> >> > Subject: Re: [SURVEY] Remove Mesos support
>> >> >
>> >> > Thanks for starting this survey Robert! I second Konstantin and
>> >> > Xintong in the sense that our Mesos users' opinions should matter
>> >> > most here. If our community is no longer using the Mesos
>> >> > integration, then I would be +1 for removing it in order to decrease
>> >> > the maintenance burden.
>> >> >
>> >> > Cheers,
>> >> >
>> >> > Till
>> >> >
>> >> > On Fri, Oct 23, 2020 at 2:03 PM Xintong Song <tonysong...@gmail.com>
>> >> > wrote:
>> >> >
>> >> > +1 for adding a warning in 1.12 about planning to remove Mesos
>> >> > support.
>> >> >
>> >> > With my developer hat on, removing the Mesos support would
>> >> > definitely reduce the maintenance overhead for the deployment and
>> >> > resource management related components. On the other hand, the
>> >> > Flink on Mesos users' voices definitely matter a lot for this
>> >> > community. Either way, it would be good to draw users' attention to
>> >> > this discussion early.
>> >> >
>> >> > Thank you~
>> >> >
>> >> > Xintong Song
>> >> >
>> >> > On Fri, Oct 23, 2020 at 7:53 PM Konstantin Knauf <kna...@apache.org>
>> >> > wrote:
>> >> >
>> >> > Hi Robert,
>> >> >
>> >> > +1 to the plan you outlined.
>> >> > If we were to drop support in Flink 1.13+, we would still support
>> >> > it in Flink 1.12- with bug fixes for some time so that users have
>> >> > time to move on.
>> >> >
>> >> > It would certainly be very interesting to hear from current Flink
>> >> > on Mesos users, on how they see the evolution of this part of the
>> >> > ecosystem.
>> >> >
>> >> > Best,
>> >> >
>> >> > Konstantin
>
> --
>
> Konstantin Knauf
>
> https://twitter.com/snntrable
>
> https://github.com/knaufk