The Travis bot commands must be retained as long as we accept PRs for 1.9/1.10.

On 25/05/2020 10:50, Yun Tang wrote:
I noticed that Travis-related bot commands still exist on the GitHub PR
page, and I think we should remove the command hint now.
________________________________
From: Robert Metzger <rmetz...@apache.org>
Sent: Thursday, April 23, 2020 15:44
To: dev <dev@flink.apache.org>
Subject: Re: [DISCUSS] Switch to Azure Pipelines as the primary CI tool / 
switch off Travis

FYI: I have moved the Flink PR and master builds from my personal Azure
account to a PMC controlled account:
https://dev.azure.com/apache-flink/apache-flink/_build

On Fri, Apr 17, 2020 at 8:28 PM Robert Metzger <rmetz...@apache.org> wrote:

Thanks a lot for bringing up this topic again.
The reason I was hesitant to decommission Travis was that we were
still facing some issues with the Azure infrastructure that I want to
resolve, so that we have strong test coverage.

In the last few weeks, we had the following issues:
- unstable e2e tests (we are running the e2e tests much more frequently,
thus we see more failures (and discover actual bugs!))
- network issues (mostly around downloading maven artifacts. This is
solved at the cost of slower builds. I'm preparing a fix to have stable &
fast maven downloads)
- the private builds were never really stable (this is work in progress.
the situation is definitely better than the private Travis builds)
- I haven't followed the overall master stability closely before February,
but I have the feeling that April so far was a pretty unstable month on
master. Piotr is regularly reverting commits that somehow broke master. The
problem with an unstable master is that it causes "CI fatigue", where people
assume that failing builds are not worth investigating anymore, leading to
more instability. This is not a problem of the CI infrastructure itself,
but it makes me less confident about switching systems :)


Unless something unexpected happens, I'm proposing to disable pull request
processing on Travis next week.



On Fri, Apr 17, 2020 at 10:05 AM Gary Yao <g...@apache.org> wrote:

I am in favour of decommissioning Travis.

Moreover, I wanted to use this thread to raise another issue with Travis
that I have discovered recently: many of the builds running in my private
Travis account are timing out in the compilation stage (i.e., compilation
takes more than 50 minutes). This means that I am not able to reliably run
a full build on a CI server without creating a pull request. If other
developers also experience this issue, it would speak for putting more
effort into making Azure Pipelines the project-wide default.

Best,
Gary

On Thu, Mar 26, 2020 at 12:26 PM Yu Li <car...@gmail.com> wrote:

Thanks for the clarification Robert.

Since the first step of the plan is to replace the Travis PR runs, I checked
all PR builds from 2020-01-01 (PR#10735-11526) [1], and below is the result:

* Travis FAILURE: 298
* Travis SUCCESS: 649 (68.5%)
* Azure FAILURE: 420
* Azure SUCCESS: 571 (57.6%)

Since the patch for each run is equivalent for Travis and Azure, there
seems to be a slightly higher failure rate (~10 percentage points) when
running in Azure.
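
As a quick check on the figures above, the success rates and the gap between them can be recomputed from the four counts. This is only a sketch; the function name `success_rate` is made up for illustration and is not part of any Flink tooling.

```python
def success_rate(failures: int, successes: int) -> float:
    """Return the share of successful runs, in percent."""
    total = failures + successes
    return 100.0 * successes / total


# Counts quoted in this mail for PR builds since 2020-01-01.
travis = success_rate(failures=298, successes=649)
azure = success_rate(failures=420, successes=571)

print(f"Travis: {travis:.1f}%")
print(f"Azure:  {azure:.1f}%")
print(f"Gap:    {travis - azure:.1f} percentage points")
```

With these counts the gap comes out at roughly 10.9 percentage points, which matches the "~10%" estimate.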

However, with the just-merged fix for uploading logs (FLINK-16480), I
believe the success rate of Azure could compete with Travis now (uploading
files contributes to 20% of the failures according to the report [2]).

So I'm +1 to disable travis runs according to the numbers.

Best Regards,
Yu

[1] https://github.com/apache/flink/pulls?q=is%3Apr+created%3A%3E%3D2020-01-01
[2] https://dev.azure.com/rmetzger/Flink/_pipeline/analytics/stageawareoutcome?definitionId=4

On Thu, 26 Mar 2020 at 03:28, Robert Metzger <rmetz...@apache.org> wrote:

Thank you for your responses.

@Yu Li: In the current master, the log upload always fails if the e2e job
failed. I just merged a PR that fixes this issue [1]. The problem was not
really the network stability, but rather a problem with the interaction of
the jobs in the pipeline (the e2e job did not set the right variables for
the log upload).
Secondly, you are looking at the report of the "flink-ci.flink" pipeline,
where pull requests are built. Naturally, pull request builds fail all the
time, because the PRs are not yet perfect.

"flink-ci.flink-master" is the right pipeline to look at:


https://dev.azure.com/rmetzger/Flink/_pipeline/analytics/stageawareoutcome?definitionId=8&contextType=build

We have a fairly high number of failures there, because we currently have
some issues downloading the Maven artifacts [2]. I'm already working with
Chesnay on merging a fix for that.


[1] https://github.com/apache/flink/commit/1c86b8b9dd05615a3b2600984db738a9bf388259
[2] https://issues.apache.org/jira/browse/FLINK-16720



On Wed, Mar 25, 2020 at 4:48 PM Chesnay Schepler <ches...@apache.org> wrote:

The easiest way to disable Travis for pushes is to remove all builds
with a push/pr condition from the .travis.yml.

On 25/03/2020 15:03, Robert Metzger wrote:
Thank you for the feedback so far.

Responses to the items Chesnay raised:

- by virtue of maintaining the past 2 releases we will have to maintain any
Travis infrastructure as long as 1.10 is supported, i.e., until 1.12

Okay. I wasn't sure about the exact policy there.


- the azure setup doesn't appear to be equivalent yet since the java e2e
profile isn't setting the hadoop switch (-Pe2e-hadoop), as a result of
which SQLClientKafkaITCase isn't run

I filed a ticket to address this:
https://issues.apache.org/jira/browse/FLINK-16778


- the nightly scripts still seem to be using a maven version other than
3.2.5; from today on master:
2020-03-25T05:31:52.7412964Z [INFO] --------< org.apache.flink:flink-end-to-end-tests-common-kafka >--------
2020-03-25T05:31:52.7413854Z [INFO] Building flink-end-to-end-tests-common-kafka 1.11-SNAPSHOT [39/46]
2020-03-25T05:31:52.7414689Z [INFO] --------------------------------[ jar ]---------------------------------
2020-03-25T05:31:52.7518360Z [INFO]
2020-03-25T05:31:52.7519770Z [INFO] --- maven-checkstyle-plugin:2.17:check (validate) @ flink-end-to-end-tests-common-kafka ---

I'm planning to address this as part of
https://issues.apache.org/jira/browse/FLINK-16411, where I work on
centralizing all mvn invocations.


- there is no real benefit in retiring the travis support in CiBot; the
important part is whether Travis is run or not for pull requests.
  From what I can tell though azure seems to be working fine for pull
requests, so +1 to at least disable the travis PR runs.

So we disable Travis for https://github.com/flink-ci/flink ? I will do it
once there are no new concerns and the above tickets are resolved.

What about disabling Travis for master pushes (e.g. removing the
.travis.yml file from master)?


@Dian:
Thanks a lot for your feedback.

- The report of Azure is still not viewable[1] (I noticed that Hequn has
also reported this issue in another thread). This is very useful
information.

You are referring to the emails sent to builds@f.a.o, right?
I have reported this both as a bug [1] and a feature request [2] to Azure.
But I don't believe they will resolve this issue anytime soon.
Azure has a notifications API that we could use to build a service that
sends emails to that list, but I feel that this is really a waste of time.
The URL in the link even contains the ID of the build. We would just need
to extract this ID and generate the appropriate URL. I will try to directly
reach the product management of AZP, maybe I can get some attention from
there.
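
The "extract this ID and generate the appropriate URL" idea could look roughly like the sketch below. It assumes that the build ID is the last path segment of the `builduri` query parameter (as in the notification link quoted in this thread) and that `https://dev.azure.com/<organization>/<project>/_build/results?buildId=<id>` is the publicly viewable results page. The project name `Flink` and the helper name `public_build_url` are assumptions for illustration only.

```python
from urllib.parse import parse_qs, urlparse


def public_build_url(notification_url: str, project: str = "Flink") -> str:
    """Turn a notification link into a build results link (assumed format)."""
    parsed = urlparse(notification_url)
    # The organization is the first path segment, e.g. "rmetzger".
    organization = parsed.path.strip("/").split("/")[0]
    # parse_qs percent-decodes values: "vstfs:///Build/Build/6593".
    build_uri = parse_qs(parsed.query)["builduri"][0]
    build_id = build_uri.rsplit("/", 1)[-1]
    if not build_id.isdigit():
        raise ValueError(f"unexpected builduri: {build_uri!r}")
    return (
        f"https://dev.azure.com/{organization}/{project}"
        f"/_build/results?buildId={build_id}"
    )


# Notification link from this thread, with the tracking_data parameter dropped.
link = (
    "https://dev.azure.com/rmetzger/web/build.aspx"
    "?pcguid=03e2a4fd-f647-46c5-a324-527d2c2984ce"
    "&builduri=vstfs%3a%2f%2f%2fBuild%2fBuild%2f6593"
)
print(public_build_url(link))
```

Whether the generated link is actually reachable for third parties would still need to be verified against a real build.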



[1] https://developercommunity.visualstudio.com/content/problem/957778/third-parties-are-unable-to-access-notification-li.html?childToView=960403#comment-960403
[2] https://developercommunity.visualstudio.com/content/idea/960472/third-parties-are-unable-to-access-notification-li-1.html


On Wed, Mar 25, 2020 at 10:34 AM Chesnay Schepler <ches...@apache.org> wrote:

It was left out since it adds significant additional complexity and the
value is dubious at best for PRs that aren't merged shortly after the
build has finished.

On 25/03/2020 10:28, Dian Fu wrote:
Thanks for the information. I'm sorry that I wasn't aware of this before;
I have checked the build log of Travis and confirmed that this is true.
@Chesnay Are there any specific reasons for this, and is it possible to
add this back for Azure Pipelines?
Thanks,
Dian

On Mar 25, 2020, at 4:43 PM, Chesnay Schepler <ches...@apache.org> wrote:

@Dian we haven't been rebasing PRs against master for months, ever
since we switched to CiBot.

On 25/03/2020 09:29, Dian Fu wrote:
Hi Robert,

Thanks a lot for your great work!

Overall I'm +1 to switch to Azure as the primary CI tool if it's stable
enough, as I think there is no need to run both Travis and Azure for one
single PR.
However, some improvements are still needed, and it would be great if
these issues could be addressed before fully switching to Azure:
- The report of Azure is still not viewable[1] (I noticed that Hequn has
also reported this issue in another thread). This is very useful
information.
- For the PR test of the Azure pipeline, it seems that it will not rebase
onto the master code before running the tests.
Thanks,
Dian

[1] https://dev.azure.com/rmetzger/web/build.aspx?pcguid=03e2a4fd-f647-46c5-a324-527d2c2984ce&builduri=vstfs%3a%2f%2f%2fBuild%2fBuild%2f6593&tracking_data=eyJTb3VyY2UiOiJFbWFpbCIsIlR5cGUiOiJOb3RpZmljYXRpb24iLCJTSUQiOiIzMzk0MzciLCJTVHlwZSI6IkdSUCIsIlJlY2lwIjoxLCJfeGNpIjp7Ik5JRCI6NDAyODQ3NzksIk1SZWNpcCI6Im0wPTEgIiwiQWN0IjoiMTNjNDc3YWMtZTBjYS00MjJkLTkxOTItZWI0NzFkZmUzMWY0In0sIkVsZW1lbnQiOiJoZXJvL2N0YSJ9

On Mar 25, 2020, at 3:33 PM, Chesnay Schepler <ches...@apache.org> wrote:

Some thoughts:
- by virtue of maintaining the past 2 releases we will have to maintain any
Travis infrastructure as long as 1.10 is supported, i.e., until 1.12
- the azure setup doesn't appear to be equivalent yet since the java e2e
profile isn't setting the hadoop switch (-Pe2e-hadoop), as a result of
which SQLClientKafkaITCase isn't run
- the nightly scripts still seem to be using a maven version other than
3.2.5; from today on master:
2020-03-25T05:31:52.7412964Z [INFO] --------< org.apache.flink:flink-end-to-end-tests-common-kafka >--------
2020-03-25T05:31:52.7413854Z [INFO] Building flink-end-to-end-tests-common-kafka 1.11-SNAPSHOT       [39/46]
2020-03-25T05:31:52.7414689Z [INFO] --------------------------------[ jar ]---------------------------------
2020-03-25T05:31:52.7518360Z [INFO]
2020-03-25T05:31:52.7519770Z [INFO] --- maven-checkstyle-plugin:2.17:check (validate) @ flink-end-to-end-tests-common-kafka ---
- there is no real benefit in retiring the travis support in CiBot; the
important part is whether Travis is run or not for pull requests.
   From what I can tell though azure seems to be working fine for pull
requests, so +1 to at least disable the travis PR runs.

On 23/03/2020 14:48, Robert Metzger wrote:
Hey devs,

I would like to discuss whether it makes sense to fully switch to Azure
Pipelines and phase out our Travis integration.
More information on our Azure integration can be found here:
https://cwiki.apache.org/confluence/display/FLINK/2020/03/22/Migrating+Flink%27s+CI+Infrastructure+from+Travis+CI+to+Azure+Pipelines

Travis will stay for the release-1.10 and older branches, as I have set up
Azure only for the master branch.

Proposal:
- We keep the flinkbot infrastructure supporting both Travis and Azure
around while we still receive pull requests and pushes for the "master"
and "release-1.10" branches.
- We remove the Travis-specific files from "master", so that builds are no
longer triggered.
- Once we receive no more builds at Travis (because 1.11 has been
released), we remove the remaining Travis-related infrastructure.

What do you think?


Best,
Robert


