+1 for Azure Pipelines because it promises better performance. However,
I have two concerns:

1) Travis provides a free service for testing personal branches.
Contributors usually use it to test PoCs or to run cron jobs for pull
requests; running these on a local machine takes a lot of time. Does AZP
provide the same free service?

2) We currently have a webhook [1] deployed that receives Travis CI
build notifications [2] and forwards them to the
bui...@flink.apache.org mailing list. We need to figure out how to send
Azure build results to the mailing list. This [3] might be the way to go.

Best,
Jark

[1]: https://github.com/wuchong/flink-notification-bot
[2]: https://docs.travis-ci.com/user/notifications/#configuring-webhook-notifications
[3]: https://docs.microsoft.com/en-us/azure/devops/service-hooks/overview?view=azure-devops

On Wed, 4 Dec 2019 at 22:48, Jeff Zhang <zjf...@gmail.com> wrote:

> +1
>
> Till Rohrmann <trohrm...@apache.org> wrote on Wed, 4 Dec 2019 at 10:43 PM:
>
> > +1 for moving to Azure pipelines as it promises better scalability
> > and tooling. Looking forward to having faster builds and hence
> > shorter feedback cycles :-)
> >
> > Cheers,
> > Till
> >
> > On Wed, Dec 4, 2019 at 1:24 PM Chesnay Schepler <ches...@apache.org>
> > wrote:
> >
> > > @robert Can you expand on how the azure setup interacts with
> > > CiBot? Do we have to continue mirroring builds into flink-ci? How
> > > will the cronjob configuration work? We should have a general idea
> > > on how to implement this before proceeding.
> > > Additionally, moving /all/ jobs into flink-ci requires setting up
> > > the environment variables we have; can we set these up via files
> > > or will we have to give all committers permissions for
> > > flink-ci/flink?
> > >
> > > On 04/12/2019 12:55, Chesnay Schepler wrote:
> > > > From what I've seen so far Azure will provide us a better
> > > > experience, so I'd say +1 for the transition as a whole.
> > > >
> > > > I'd delay merge at least until the feature branch is cut.
> > > > Given the parental leave it may even make sense to only start
> > > > merging in January afterwards, to reduce the total time taken
> > > > for the transition.
> > > >
> > > > Reviews could maybe be made earlier, but I'm wondering whether
> > > > anyone would even have the time at the moment to do so.
> > > >
> > > > On 04/12/2019 12:35, Kurt Young wrote:
> > > >> Thanks Robert for driving this. There is another big pain
> > > >> point with the current Travis setup: its cache mechanism fails
> > > >> from time to time. Around 50% of the build failures are caused
> > > >> by cache problems. I opened an issue with Travis about this but
> > > >> have got no response yet. So big +1 from my side.
> > > >>
> > > >> Just one comment: we are close to the 1.10 feature freeze and
> > > >> we will spend some time making the tests stable before the
> > > >> release. I wish this replacement could happen after the 1.10
> > > >> release; otherwise it will be an unstable factor during release
> > > >> testing.
> > > >>
> > > >> Best,
> > > >> Kurt
> > > >>
> > > >> On Wed, Dec 4, 2019 at 7:16 PM Zhu Zhu <reed...@gmail.com> wrote:
> > > >>
> > > >>> Thanks Robert for the updates! And thanks a lot for all the
> > > >>> efforts to investigate, experiment and tune Azure Pipelines
> > > >>> for Flink building. Big +1 for it.
> > > >>>
> > > >>> It would be great if the community build capacity could be
> > > >>> extended with custom machines, so that tests are not queued
> > > >>> for long as the number of PRs grows daily.
> > > >>>
> > > >>> The increased timeout would also be very helpful.
> > > >>> The 50min timeout for free travis accounts is a pain
> > > >>> currently, especially when we'd like to run e2e tests in our
> > > >>> own travis. I had to manually split the jobs to make it
> > > >>> possible for them to pass.
> > > >>>
> > > >>> Thanks,
> > > >>> Zhu Zhu
> > > >>>
> > > >>> Robert Metzger <rmetz...@apache.org> wrote on Wed, 4 Dec 2019 at 6:36 PM:
> > > >>>
> > > >>>> Hi all,
> > > >>>>
> > > >>>> as a follow up from our discussion on reducing the build
> > > >>>> time [1], I would like to propose migrating our build
> > > >>>> infrastructure to Azure Pipelines (away from Travis).
> > > >>>>
> > > >>>> I believe that we have reached the limits of what Travis can
> > > >>>> provide the Flink community, and I don't want the build
> > > >>>> system to limit or influence the project's growth.
> > > >>>>
> > > >>>> *Benefits:*
> > > >>>> 1. The free Travis accounts are limited to 5 parallel builds,
> > > >>>> with a timeout of 50 minutes. Azure offers *10 parallel
> > > >>>> builds with 300 minute timeouts* for free for open source
> > > >>>> projects.
> > > >>>> 2. Azure Pipelines allows us to *add custom build machines*
> > > >>>> to the pool of 10 free parallel builders.
> > > >>>> This will allow the Flink community to scale the available
> > > >>>> build capacity as the project grows. We are dependent on
> > > >>>> donations from supporting companies, but I believe that it is
> > > >>>> easier for companies to donate machines than money.
> > > >>>> Alibaba is willing to provide 10 machines, with 32 cores each,
> > > >>>> to the Flink project for this purpose.
> > > >>>> In addition, Xiyuan, who's working on adding ARM support for
> > > >>>> Flink, provided me with 2 ARM machines (16 cores each).
> > > >>>> I want to use the custom, more efficient build machines for
> > > >>>> building Flink's pull requests and master-pushes.
> > > >>>> 3. *Azure Pipelines is a more feature-rich tool*, allowing
> > > >>>> for example to transfer intermediate build artifacts between
> > > >>>> pipeline stages.
> > > >>>> This will allow us to make the build more reliable (we are
> > > >>>> currently abusing the caching mechanism in Travis for this).
> > > >>>> It also has some basic analytics on test results / flaky
> > > >>>> tests etc.
> > > >>>>
> > > >>>> *Known problems:*
> > > >>>> - Initially, we might see different build instabilities than
> > > >>>> before
> > > >>>> - There's a higher maintenance overhead for the custom build
> > > >>>> machines (keeping them up to date etc.)
> > > >>>> - We can not use the build status integration of AZP, because
> > > >>>> they require write access to the repository's source. The
> > > >>>> foundation does not allow that [2].
> > > >>>> I propose to extend flinkbot / the flink-ci repository.
> > > >>>>
> > > >>>> *Current Status:*
> > > >>>> - I'm able [3] to execute [4] the current custom build
> > > >>>> scripts on Azure Pipelines: This means that we will have one
> > > >>>> compile stage, and N testing jobs in the 2nd stage.
> > > >>>> Currently, we have N=10 testing jobs.
> > > >>>> The time from the start of a build till all tests have
> > > >>>> completed is 1h22 minutes.
> > > >>>> - I'm working on getting the nightly end to end tests to run
> > > >>>> on the new infrastructure.
> > > >>>> - I'm working on getting the build to work on our pool of
> > > >>>> custom machines as well
> > > >>>> - I'm working on setting up the full matrix of builds
> > > >>>> (different scala, hadoop etc. versions) for the nightlies
> > > >>>>
> > > >>>> *Next Steps:*
> > > >>>> - I propose to document the entire build system in the Flink
> > > >>>> Wiki
> > > >>>> - Once Azure can cover the same pull request tests as Travis,
> > > >>>> I would set it up to run in parallel (including Flinkbot
> > > >>>> posting links to Azure).
> > > >>>> I hope that this phase lasts for 1-2 weeks only, so that we
> > > >>>> do not have to maintain things concurrently. I will monitor
> > > >>>> the build stability closely, but would expect some support
> > > >>>> with debugging potential issues from the contributors.
> > > >>>> - Once there are no problems with the new setup, we remove
> > > >>>> the Travis setup.
> > > >>>> - Independently, I will work on triggering builds from master
> > > >>>> / release-branch pushes, as well as cron builds from the
> > > >>>> master branch ... all this will be described in the Wiki.
> > > >>>>
> > > >>>> *Timeline:*
> > > >>>> - Once I have the feeling that people are supportive of the
> > > >>>> idea, I will start documenting in the Wiki. The first pull
> > > >>>> requests should show up after a few more days.
> > > >>>> I will do a one month parental leave starting some time later
> > > >>>> in December, which will probably delay things a bit. I hope
> > > >>>> to have everything finished by end of January.
> > > >>>>
> > > >>>> I'm happy to hear your thoughts on this work.
> > > >>>> If nobody objects, I will start documenting the system and
> > > >>>> prepare everything for the migration.
> > > >>>>
> > > >>>> Best,
> > > >>>> Robert
> > > >>>>
> > > >>>> [1] https://lists.apache.org/thread.html/b90aa518fcabce94f8e1de4132f46120fae613db6e95a2705f1bd1ea@%3Cdev.flink.apache.org%3E
> > > >>>> [2] https://issues.apache.org/jira/browse/INFRA-17030
> > > >>>> [3] https://github.com/rmetzger/flink/tree/azure_playground
> > > >>>> [4] https://dev.azure.com/rmetzger/Flink/_build?definitionId=4&_a=summary
>
> --
> Best Regards
>
> Jeff Zhang