Thanks for driving this, Matthias! +1 for joining the INFRA trial.
Apache Infra did some experimenting on self-hosted runners in
collaboration with Apache Airflow (see ashb/runner with
releases/pr-security-options branch) where they only allow certain
groups of users (e.g. committers) to run their workflows on self-hosted
machines. Any other group would have to rely on GitHub’s runners.
This is mentioned in the "Limitations of GitHub Actions in the past"
section of the FLIP. Does this also apply to the Apache INFRA setup, or
can we expect contributors' runs to be executed there too? As you
mentioned in the FLIP, there are some timeout-related test discrepancies
between different setups. Similar discrepancies could manifest
themselves between the GitHub runners and the Apache INFRA runners. It
would be great if we could have a uniform setup, where tests that pass
in the individual CI also pass in the main runner and vice versa.
Currently we have such memory-limit-related issues in the individual vs.
main Azure CI pipelines.
2. Disable Flink’s CI bot for PRs if step #1 is considered successful
3. Join trial program for ephemeral GHA runners
Due to potential new kinds of instabilities manifesting themselves in
the new setup, can we keep both CIs running in parallel and keep relying
on the existing one until we are confident in the test stability on the
new ephemeral GHA infra (i.e. skip step 2)?
Best,
Alex
On Wed, 29 Nov 2023 at 13:42, Xintong Song <tonysong...@gmail.com>
wrote:
Thanks for the efforts, Matthias.
I think it would be helpful if we can in the end migrate the CI to
ASF-managed GitHub Actions, as long as it provides us with similar
computation capacity and stability. Given that the proposal is only to
start a trial and investigate whether the migration is feasible, I don't
see much concern in this.
I have only one suggestion and one question.
- Regarding the migration plan, I wonder if we should not disable the CI
  bot until we fully decide to migrate to GitHub Actions? In case the
  nightly runs don't really work well, it might be debatable whether we
  should maintain the CI in two places (i.e. PRs on GitHub Actions and
  cron builds on Azure).
- What exactly are the changes that would affect contributors during the
  trial period? Is it only an additional CI report that you can
  potentially just ignore? Or would there be some larger impacts, e.g.
  you cannot merge a PR if the GitHub Actions CI has not passed (I don't
  know, I just made this up)?
Best,
Xintong
On Wed, Nov 29, 2023 at 8:07 PM Yuxin Tan <tanyuxinw...@gmail.com>
wrote:
OK, thanks for the update and the explanations.
Best,
Yuxin
On Wed, Nov 29, 2023 at 15:43, Matthias Pohl
<matthias.p...@aiven.io.invalid> wrote:
According to the FLIP, the new tests will support ARM environments.
I believe that's good news for ARM users. I have a minor
question here. Will it be a blocker before migrating the new
tests? If not, when can we expect ARM environment
support to be implemented? Thanks.
Thanks for your feedback, everyone.
About the ARM support: I want to underline that this FLIP is not about
migrating to GitHub Actions but about starting a trial run in the Apache
Flink repository. That would allow us to come to a proper decision on
whether GitHub Actions is what we want. I admit that the title is a bit
"clickbaity". I updated the FLIP's title and its Motivation to make
things clear.
The FLIP suggests starting a trial period until 1.19 is released to try
things out. A proper decision on whether we want to migrate would be
made at the end of the 1.19 release cycle.
About the ARM support: The related content of the FLIP is entirely
based on documentation from Apache INFRA. INFRA seems to offer ARM
support for their ephemeral runners. The ephemeral runners are in the
testing stage, i.e. these runners are still experimental. Apache INFRA
asks Apache projects to join this test.
Whether ARM support is actually achievable within Flink is something we
have to figure out as part of the trial run. One conclusion of the trial
run could be that we still move ahead with GHA but don't use ARM
machines due to some blocking issues.
Matthias
On Wed, Nov 29, 2023 at 4:46 AM Yuxin Tan <tanyuxinw...@gmail.com>
wrote:
Hi, Matthias,
Thanks for driving this.
+1 from my side.
According to the FLIP, the new tests will support ARM environments.
I believe that's good news for ARM users. I have a minor
question here. Will it be a blocker before migrating the new
tests? If not, when can we expect ARM environment
support to be implemented? Thanks.
Best,
Yuxin
On Wed, Nov 29, 2023 at 03:09, Márton Balassi
<balassi.mar...@gmail.com> wrote:
Thanks, Matthias. Big +1 from me.
On Tue, Nov 28, 2023 at 5:30 PM Matthias Pohl
<matthias.p...@aiven.io.invalid> wrote:
Thanks for the pointer. I'm planning to join that meeting.
On Tue, Nov 28, 2023 at 4:16 PM Etienne Chauchot
<echauc...@apache.org> wrote:
Hi all,
FYI, the ASF Infra roundtable is coming up soon. One of the subjects
for this session is GitHub Actions. It could be worth stopping by:
December 6th, 2023 at 17:00 UTC on the #Roundtable channel on Slack.
For information about the roundtables, and about how to join, see:
https://infra.apache.org/roundtable.html
Best
Etienne
On 24/11/2023 at 14:16, Maximilian Michels wrote:
Thanks for reviving the efforts here, Matthias! +1 for the transition
to GitHub Actions.
As for ASF Infra Jenkins, it works fine. Jenkins is extremely
feature-rich. Not sure about the spare capacity, though. I know that
for Apache Beam, Google donated a bunch of servers to get additional
build capacity.
-Max
On Thu, Nov 23, 2023 at 10:30 AM Matthias Pohl
<matthias.p...@aiven.io.invalid> wrote:
Btw., even though we've been focusing on GitHub Actions with this FLIP,
I'm curious whether somebody has experience with Apache Infra's Jenkins
deployment. The discussion I found about Jenkins [1] is quite outdated
(2014). I haven't worked with it myself but could imagine that there are
some features provided through plugins which are missing in GitHub
Actions.
[1] https://lists.apache.org/thread/vs81xdhn3q777r7x9k7wd4dyl9kvoqn4
On Tue, Nov 21, 2023 at 4:19 PM Matthias Pohl
<matthias.p...@aiven.io> wrote:
That's a valid point. I updated the FLIP accordingly:
Currently, the secrets (e.g. for S3 access tokens) are maintained by
certain PMC members with access to the corresponding configuration in
the Azure CI project. This responsibility will be moved to Apache Infra.
They are in charge of handling secrets in the Apache organization. As a
consequence, updating secrets is becoming a bit more complicated. This
can still be considered an improvement from a legal standpoint because
the responsibility is transferred from an individual company (i.e.
Ververica, which maintains the Azure CI project) to the Apache
Foundation.
On Tue, Nov 21, 2023 at 3:37 PM Martijn Visser
<martijnvis...@apache.org> wrote:
Hi Matthias,
Thanks for the write-up and for the efforts on this. I really hope that
we can move away from Azure towards GHA for a better integration as well
(directly seeing if a PR can be merged due to CI passing, for example).
The one thing I'm missing in the FLIP is how we would set up the secrets
for the nightly runs (for the S3 tests, potential tests with external
services, etc.). My guess is we need to provide the secrets to ASF Infra
and then we would be able to refer to them in a pipeline?
Best regards,
Martijn
On Tue, Nov 21, 2023 at 3:05 PM Matthias Pohl
<matthias.p...@aiven.io.invalid> wrote:
I realized that I mixed up FLIP IDs. FLIP-395 is already reserved [1].
I switched to FLIP-396 [2] for the sake of consistency. 8)
[1] https://lists.apache.org/thread/wjd3nbvg6nt93lb0sd52f0lzls6559tv
[2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-396%3A+Migration+to+GitHub+Actions
On Tue, Nov 21, 2023 at 2:58 PM Matthias Pohl
<matthias.p...@aiven.io> wrote:
Hi everyone,
The Flink community discussed migrating from Azure CI to GitHub Actions
quite some time ago [1]. The efforts around that stalled due to
limitations around self-hosted runner support from Apache Infra’s side.
There were some recent developments on that topic. Apache Infra is now
experimenting with ephemeral runners, which might enable us to move
ahead with GitHub Actions.
The goal is to join the trial phase for ephemeral runners and
experiment with our CI workflows in terms of stability and performance.
At the end we can decide whether we want to abandon Azure CI and move to
GitHub Actions or stick to the former.
Nico Weidner and Chesnay laid the groundwork on this topic in the past.
I picked up the work they did and continued experimenting with it in my
own fork XComp/flink [2] over the past few weeks. The workflows are in a
state where I think that we can start moving the relevant code into
Flink’s repository. Example runs for the basic workflow [3] and the
extended (nightly) workflow [4] are provided.
This will bring a few more changes for Flink contributors. That is why
I wanted to bring this discussion to the mailing list first. I did a
write-up on (hopefully) all related topics in FLIP-395 [5].
I’m looking forward to your feedback.
Matthias
[1] https://lists.apache.org/thread/vcyx2nx0mhklqwm827vgykv8pc54gg3k
[2] https://github.com/XComp/flink/actions
[3] https://github.com/XComp/flink/actions/runs/6926309782
[4] https://github.com/XComp/flink/actions/runs/6927443941
[5] https://cwiki.apache.org/confluence/display/FLINK/FLIP-395%3A+Migration+to+GitHub+Actions