If we still need to accept PRs for Flink-1.9/1.10, that could explain why we still need that command hint.
Chesnay, thanks for your explanation.

________________________________
From: Chesnay Schepler <ches...@apache.org>
Sent: Monday, May 25, 2020 18:17
To: dev@flink.apache.org <dev@flink.apache.org>; Yun Tang <myas...@live.com>
Subject: Re: [DISCUSS] Switch to Azure Pipelines as the primary CI tool / switch off Travis

The travis bot commands must be retained so long as we accept PRs for 1.9/1.10.

On 25/05/2020 10:50, Yun Tang wrote:
> I noticed that there still existed travis related bot commands in the github PR page, and I think we should remove the command hint now.
> ________________________________
> From: Robert Metzger <rmetz...@apache.org>
> Sent: Thursday, April 23, 2020 15:44
> To: dev <dev@flink.apache.org>
> Subject: Re: [DISCUSS] Switch to Azure Pipelines as the primary CI tool / switch off Travis
>
> FYI: I have moved the Flink PR and master builds from my personal Azure account to a PMC controlled account:
> https://dev.azure.com/apache-flink/apache-flink/_build
>
> On Fri, Apr 17, 2020 at 8:28 PM Robert Metzger <rmetz...@apache.org> wrote:
>
>> Thanks a lot for bringing up this topic again.
>> The reason why I was hesitant to decommission Travis was that we were still facing some issues with the Azure infrastructure that I want to resolve, so that we have a strong test coverage.
>>
>> In the last few weeks, we had the following issues:
>> - unstable e2e tests (we are running the e2e tests much more frequently, thus we see more failures (and discover actual bugs!))
>> - network issues (mostly around downloading maven artifacts. This is solved at the cost of slower builds. I'm preparing a fix to have stable & fast maven downloads)
>> - the private builds were never really stable (this is work in progress. the situation is definitely better than the private Travis builds)
>> - I haven't followed the overall master stability closely before February, but I have the feeling that April so far was a pretty unstable month on master. Piotr is regularly reverting commits that somehow broke master. The problem with unstable master is that it causes a "CI fatigue", where people assume that failing builds are not worth investigating anymore, leading to more instability.
>>   This is not a problem of the CI infrastructure itself, but it makes me less confident switching systems :)
>>
>> Unless something unexpected happens, I'm proposing to disable pull request processing on Travis next week.
>>
>> On Fri, Apr 17, 2020 at 10:05 AM Gary Yao <g...@apache.org> wrote:
>>
>>> I am in favour of decommissioning Travis.
>>>
>>> Moreover, I wanted to use this thread to raise another issue with Travis that I have discovered recently; many of the builds running in my private Travis account are timing out in the compilation stage (i.e., compilation takes more than 50 minutes). This means that I am not able to reliably run a full build on a CI server without creating a pull request. If other developers also experience this issue, it would speak for putting more effort into making Azure Pipelines the project-wide default.
>>>
>>> Best,
>>> Gary
>>>
>>> On Thu, Mar 26, 2020 at 12:26 PM Yu Li <car...@gmail.com> wrote:
>>>
>>>> Thanks for the clarification Robert.
>>>>
>>>> Since the first step plan is to replace the travis PR runs, I checked all PR builds from 2020-01-01 (PR#10735-11526) [1], and below is the result:
>>>>
>>>> * Travis FAILURE: 298
>>>> * Travis SUCCESS: 649 (68.5%)
>>>> * Azure FAILURE: 420
>>>> * Azure SUCCESS: 571 (57.6%)
>>>>
>>>> Since the patch for each run is equivalent for Travis and Azure, there seems to be a slightly higher failure rate (~10%) when running in Azure.
>>>>
>>>> However, with the just-merged fix for uploading logs (FLINK-16480), I believe the success rate of Azure could compete with Travis now (uploading files contributes to 20% of the failures according to the report [2]).
>>>>
>>>> So I'm +1 to disable travis runs according to the numbers.
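The percentages quoted above follow directly from the failure and success counts. A minimal Python sketch of that arithmetic is given here; the function and variable names are illustrative and do not come from the thread, and only the four counts are taken from Yu Li's message.

```python
def success_rate(failures: int, successes: int) -> float:
    """Return the share of successful builds as a percentage."""
    return 100.0 * successes / (failures + successes)


# Counts as reported in the thread for PR builds since 2020-01-01.
travis_rate = success_rate(failures=298, successes=649)
azure_rate = success_rate(failures=420, successes=571)

# Difference in percentage points between the two CI systems.
gap = travis_rate - azure_rate

print(f"Travis success rate: {travis_rate:.1f}%")
print(f"Azure success rate:  {azure_rate:.1f}%")
print(f"Gap: {gap:.1f} percentage points")
```

Read this way, the "~10%" in the message is a gap of roughly eleven percentage points between the two success rates, not a relative increase in failures.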
>>>>
>>>> Best Regards,
>>>> Yu
>>>>
>>>> [1] https://github.com/apache/flink/pulls?q=is%3Apr+created%3A%3E%3D2020-01-01
>>>> [2] https://dev.azure.com/rmetzger/Flink/_pipeline/analytics/stageawareoutcome?definitionId=4
>>>>
>>>> On Thu, 26 Mar 2020 at 03:28, Robert Metzger <rmetz...@apache.org> wrote:
>>>>> Thank you for your responses.
>>>>>
>>>>> @Yu Li: In the current master, the log upload always fails if the e2e job failed. I just merged a PR that fixes this issue [1]. The problem was not really the network stability, rather a problem with the interaction of the jobs in the pipeline (the e2e job did not set the right variables for the log upload)
>>>>> Secondly, you are looking at the report of the "flink-ci.flink" pipeline, where pull requests are built. Naturally, pull request builds fail all the time, because the PRs are not yet perfect.
>>>>>
>>>>> "flink-ci.flink-master" is the right pipeline to look at:
>>>>> https://dev.azure.com/rmetzger/Flink/_pipeline/analytics/stageawareoutcome?definitionId=8&contextType=build
>>>>>
>>>>> We have a fairly high number of failures there, because we currently have some issues downloading the maven artifacts [2]. I'm working already with Chesnay on merging a fix for that.
>>>>>
>>>>> [1] https://github.com/apache/flink/commit/1c86b8b9dd05615a3b2600984db738a9bf388259
>>>>> [2] https://issues.apache.org/jira/browse/FLINK-16720
>>>>>
>>>>> On Wed, Mar 25, 2020 at 4:48 PM Chesnay Schepler <ches...@apache.org> wrote:
>>>>>
>>>>>> The easiest way to disable travis for pushes is to remove all builds from the .travis.yml with a push/pr condition.
>>>>>>
>>>>>> On 25/03/2020 15:03, Robert Metzger wrote:
>>>>>>> Thank you for the feedback so far.
>>>>>>>
>>>>>>> Responses to the items Chesnay raised:
>>>>>>>
>>>>>>>> - by virtue of maintaining the past 2 releases we will have to maintain any Travis infrastructure as long as 1.10 is supported, i.e., until 1.12
>>>>>>>
>>>>>>> Okay. I wasn't sure about the exact policy there.
>>>>>>>
>>>>>>>> - the azure setup doesn't appear to be equivalent yet since the java e2e profile isn't setting the hadoop switch (-Pe2e-hadoop), as a result of which SQLClientKafkaITCase isn't run
>>>>>>>
>>>>>>> I filed a ticket to address this:
>>>>>>> https://issues.apache.org/jira/browse/FLINK-16778
>>>>>>>
>>>>>>>> - the nightly scripts still seems to be using a maven version other than 3.2.5; from today on master:
>>>>>>>> 2020-03-25T05:31:52.7412964Z [INFO] --------< org.apache.flink:flink-end-to-end-tests-common-kafka >--------
>>>>>>>> 2020-03-25T05:31:52.7413854Z [INFO] Building flink-end-to-end-tests-common-kafka 1.11-SNAPSHOT [39/46]
>>>>>>>> 2020-03-25T05:31:52.7414689Z [INFO] --------------------------------[ jar ]---------------------------------
>>>>>>>> 2020-03-25T05:31:52.7518360Z [INFO]
>>>>>>>> 2020-03-25T05:31:52.7519770Z [INFO] --- maven-checkstyle-plugin:2.17:check (validate) @ flink-end-to-end-tests-common-kafka ---
>>>>>>>
>>>>>>> I'm planning to address this as part of https://issues.apache.org/jira/browse/FLINK-16411, where I work on centralizing all mvn invocations.
>>>>>>>
>>>>>>>> - there is no real benefit in retiring the travis support in CiBot; the important part is whether Travis is run or not for pull requests.
>>>>>>>> From what I can tell though azure seems to be working fine for pull requests, so +1 to at least disable the travis PR runs.
>>>>>>>
>>>>>>> So we disable Travis for https://github.com/flink-ci/flink ?
>>>>>>> I will do it once there are no new concerns and the above tickets are resolved.
>>>>>>>
>>>>>>> What about disabling travis for master pushes? (e.g. removing the .travis.yml file from master)?
>>>>>>>
>>>>>>> @Dian:
>>>>>>> Thanks a lot for your feedback.
>>>>>>>
>>>>>>>> - The report of Azure is still not viewable[1] (I noticed that Hequn has also reported this issue in another thread). This is very useful information.
>>>>>>>
>>>>>>> You are referring to the emails sent to builds@f.a.o right?
>>>>>>> I have reported this both as a bug [1] and a feature request [2] to Azure. But I don't believe they will resolve this issue anytime soon.
>>>>>>> Azure has a notifications API that we could use to build a service that sends emails to that list, but I feel that this is really a waste of time.
>>>>>>> The URL in the link even contains the ID of the build. We would just need to extract this ID and generate the appropriate URL. I will try to directly reach the product management of AZP, maybe I can get some attention from there.
>>>>>>>
>>>>>>> [1] https://developercommunity.visualstudio.com/content/problem/957778/third-parties-are-unable-to-access-notification-li.html?childToView=960403#comment-960403
>>>>>>> [2] https://developercommunity.visualstudio.com/content/idea/960472/third-parties-are-unable-to-access-notification-li-1.html
>>>>>>>
>>>>>>> On Wed, Mar 25, 2020 at 10:34 AM Chesnay Schepler <ches...@apache.org> wrote:
>>>>>>>
>>>>>>>> It was left out since it adds significant additional complexity and the value is dubious at best for PRs that aren't merged shortly after the build has finished.
>>>>>>>>
>>>>>>>> On 25/03/2020 10:28, Dian Fu wrote:
>>>>>>>>> Thanks for the information.
>>>>>>>>> I'm sorry that I was not aware of this before; I have checked the build log of travis and confirmed that this is true.
>>>>>>>>> @Chesnay Are there any specific reasons for this and is it possible to add this back for Azure Pipelines?
>>>>>>>>> Thanks,
>>>>>>>>> Dian
>>>>>>>>>
>>>>>>>>>> On March 25, 2020, at 4:43 PM, Chesnay Schepler <ches...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>> @Dian we haven't been rebasing PR's against master for months, ever since we switched to CiBot.
>>>>>>>>>>
>>>>>>>>>> On 25/03/2020 09:29, Dian Fu wrote:
>>>>>>>>>>> Hi Robert,
>>>>>>>>>>>
>>>>>>>>>>> Thanks a lot for your great work!
>>>>>>>>>>>
>>>>>>>>>>> Overall I'm +1 to switch to Azure as the primary CI tool if it's stable enough as I think there is no need to run both the travis and Azure for one single PR.
>>>>>>>>>>>
>>>>>>>>>>> However, there are still some improvements to be made and it would be great if these issues could be addressed before fully switching to Azure:
>>>>>>>>>>> - The report of Azure is still not viewable[1] (I noticed that Hequn has also reported this issue in another thread). This is very useful information.
>>>>>>>>>>> - For PR test of Azure pipeline, it seems that it will not rebase the master code before running the tests.
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Dian
>>>>>>>>>>>
>>>>>>>>>>> [1] https://dev.azure.com/rmetzger/web/build.aspx?pcguid=03e2a4fd-f647-46c5-a324-527d2c2984ce&builduri=vstfs%3a%2f%2f%2fBuild%2fBuild%2f6593&tracking_data=eyJTb3VyY2UiOiJFbWFpbCIsIlR5cGUiOiJOb3RpZmljYXRpb24iLCJTSUQiOiIzMzk0MzciLCJTVHlwZSI6IkdSUCIsIlJlY2lwIjoxLCJfeGNpIjp7Ik5JRCI6NDAyODQ3NzksIk1SZWNpcCI6Im0wPTEgIiwiQWN0IjoiMTNjNDc3YWMtZTBjYS00MjJkLTkxOTItZWI0NzFkZmUzMWY0In0sIkVsZW1lbnQiOiJoZXJvL2N0YSJ9
>>>>>>>>>>>
>>>>>>>>>>>> On March 25, 2020, at 3:33 PM, Chesnay Schepler <ches...@apache.org> wrote:
>>>>>>>>>>>> Some thoughts:
>>>>>>>>>>>> - by virtue of maintaining the past 2 releases we will have to maintain any Travis infrastructure as long as 1.10 is supported, i.e.,
>>>>>>>>>>>>   until 1.12
>>>>>>>>>>>> - the azure setup doesn't appear to be equivalent yet since the java e2e profile isn't setting the hadoop switch (-Pe2e-hadoop), as a result of which SQLClientKafkaITCase isn't run
>>>>>>>>>>>> - the nightly scripts still seems to be using a maven version other than 3.2.5; from today on master:
>>>>>>>>>>>> 2020-03-25T05:31:52.7412964Z [INFO] --------< org.apache.flink:flink-end-to-end-tests-common-kafka >--------
>>>>>>>>>>>> 2020-03-25T05:31:52.7413854Z [INFO] Building flink-end-to-end-tests-common-kafka 1.11-SNAPSHOT [39/46]
>>>>>>>>>>>> 2020-03-25T05:31:52.7414689Z [INFO] --------------------------------[ jar ]---------------------------------
>>>>>>>>>>>> 2020-03-25T05:31:52.7518360Z [INFO]
>>>>>>>>>>>> 2020-03-25T05:31:52.7519770Z [INFO] --- maven-checkstyle-plugin:2.17:check (validate) @ flink-end-to-end-tests-common-kafka ---
>>>>>>>>>>>> - there is no real benefit in retiring the travis support in CiBot; the important part is whether Travis is run or not for pull requests.
>>>>>>>>>>>> From what I can tell though azure seems to be working fine for pull requests, so +1 to at least disable the travis PR runs.
>>>>>>>>>>>>
>>>>>>>>>>>> On 23/03/2020 14:48, Robert Metzger wrote:
>>>>>>>>>>>>> Hey devs,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would like to discuss whether it makes sense to fully switch to Azure Pipelines and phase out our Travis integration.
>>>>>>>>>>>>> More information on our Azure integration can be found here:
>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/2020/03/22/Migrating+Flink%27s+CI+Infrastructure+from+Travis+CI+to+Azure+Pipelines
>>>>>>>>>>>>>
>>>>>>>>>>>>> Travis will stay for the release-1.10 and older branches, as I have set up Azure only for the master branch.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Proposal:
>>>>>>>>>>>>> - We keep the flinkbot infrastructure supporting both Travis and Azure around, while we still receive pull requests and pushes for the "master" and "release-1.10" branches.
>>>>>>>>>>>>> - We remove the travis-specific files from "master", so that builds are not triggered anymore
>>>>>>>>>>>>> - once we receive no more builds at Travis (because 1.11 has been released), we remove the remaining travis-related infrastructure
>>>>>>>>>>>>>
>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Robert
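
Robert's remark in the thread that the notification URL already contains the build ID, so one would "just need to extract this ID and generate the appropriate URL", can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the helper names are hypothetical, the example URL is the one quoted in the thread with its `tracking_data` parameter left out for brevity, and the target format (`_build/results?buildId=...`) together with the project name "Flink" are assumptions about where the public build page lives, not something the thread confirms.

```python
from urllib.parse import parse_qs, urlparse


def extract_build_id(notification_url: str) -> int:
    """Pull the numeric build ID out of the 'builduri' query parameter."""
    query = parse_qs(urlparse(notification_url).query)
    # parse_qs percent-decodes the value, e.g. "vstfs:///Build/Build/6593".
    build_uri = query["builduri"][0]
    return int(build_uri.rsplit("/", 1)[-1])


def results_url(organization: str, project: str, build_id: int) -> str:
    """Assemble a build results URL (assumed format, see the note above)."""
    return (
        f"https://dev.azure.com/{organization}/{project}"
        f"/_build/results?buildId={build_id}"
    )


# Notification link from the thread, shortened (tracking_data omitted).
notification_url = (
    "https://dev.azure.com/rmetzger/web/build.aspx"
    "?pcguid=03e2a4fd-f647-46c5-a324-527d2c2984ce"
    "&builduri=vstfs%3a%2f%2f%2fBuild%2fBuild%2f6593"
)

build_id = extract_build_id(notification_url)
public_url = results_url("rmetzger", "Flink", build_id)
print(build_id)
print(public_url)
```

The same parsing also handles the unencoded `builduri=vstfs:///Build/Build/6593` form, since only the last path segment of the value is used.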