Hi Nico,

Thanks a lot for drafting the proposal. I really like the fully fledged phasing model. All in all, I am +1 to move away from Azure and can only second all the points you have mentioned.
I only want to clarify one point. So far, my understanding was that GHA resources are managed at the GitHub organization level, in contrast to Azure Pipelines, where individual projects have their own resources. What happens if more and more projects inside the Apache GitHub organization migrate to GHA? Will this affect the build queue time?

Best,
Fabian

On Thu, Dec 16, 2021 at 3:59 PM Nicolaus Weidner <nicolaus.weid...@ververica.com> wrote:
>
> Hi all,
>
> as several people know by now, we are planning to move from Azure CI to
> Github Actions. This is motivated by (not an exhaustive list):
> - Not needing to mirror the repo anymore for CI
> - Improving the contributor experience, especially for new contributors
> - GHA development being more active than Azure CI development
>
> In case someone wants to check out the current version of the planned GHA
> workflow, you can find it here:
> https://github.com/ververica/flink/blob/master/.github/workflows/hadoop-2.8.3-scala-2.12-workflow.yml
> Past runs can be seen here: https://github.com/ververica/flink/actions (lots
> of red, but this is almost always not due to the workflow)
>
> I want to put a draft for the migration roadmap up for discussion. It's
> divided into several phases:
>
> *Phase 1: *GHA activated on master (but not required)
> - A single CI machine is converted to run GHA runners (instead of Azure
> runners) and runs the workflow on pushes to master
> - Azure CI remains unchanged and is still the source of truth
> - We can compare runtimes and behavior/failures
> - Timeframe: 2 weeks
>
> *Phase 2: *Additional features
> - Any additional functionality that we want to add to GHA is added (e.g.
> not running the workflow if workflow files were modified)
> - Functionality from FlinkCIBot that we want to keep is ported over
> (syncing with the mirror repo can be dropped, but there are some automated
> checks that we want to keep)
> - We can monitor whether performance is impacted by any change
> - Timeframe: 2 weeks
>
> *Phase 3: *Cron jobs and (some) PR triggers run on GHA
> - GHA cron builds activated (for master and release branches)
> - Note: Includes some backports to all affected branches, else the
> workflows won’t run:
> https://stackoverflow.com/questions/61989951/github-action-workflow-not-running/61992817#61992817
> - GHA builds run for PRs of select committers (the idea is to try out
> builds for all the intended trigger conditions)
> - Timeframe: 1 week
>
> *Up to this point, the existing CI pipeline is mostly unaffected - we only
> took away one CI machine.*
>
> *Phase 4: *Full switch to GHA
> - Set up GHA runners on all machines
> - GHA builds are activated for all PRs
> - Either Azure or GHA build is required
> - GHA runners are activated, Azure runners are deactivated (but not yet
> removed) apart from 1 machine (for stragglers)
> - Azure cron jobs are disabled, but kept around in case we need to revert
> - Timeframe: 1-2 weeks
>
> *Phase 5: *Removal of Azure CI leftovers
> - Only after we are satisfied that GHA is stable (at least 1 month after
> the switch, can be longer)
> - Green GHA build is required from now on
> - Stale PRs that don't have a GHA run will have to trigger a new one (but
> they would most likely have to rebase anyway...)
> - (old) FlinkCIBot is disabled
> - Azure yamls are deleted
> - Azure runners are removed from machines
>
>
> Timing-wise, the full switch to GHA should happen during a quiet time, far
> away from a release. The remaining phases shouldn't have much impact, but
> right before a release is not a good moment, of course.
> Please give us your thoughts and point out anything we missed or that
> doesn't seem to make sense!
>
> Best,
> Nico