Hi all,

as several people know by now, we are planning to move from Azure CI to GitHub Actions (GHA). This is motivated by (not an exhaustive list):

- Not needing to mirror the repo anymore for CI
- Improving the contributor experience, especially for new contributors
- GHA development being more active than Azure CI development

In case someone wants to check out the current version of the planned GHA workflow, you can find it here:
https://github.com/ververica/flink/blob/master/.github/workflows/hadoop-2.8.3-scala-2.12-workflow.yml

Past runs can be seen here:
https://github.com/ververica/flink/actions
(lots of red, but this is almost always not due to the workflow)

I want to put a draft for the migration roadmap up for discussion. It's divided into several phases:

*Phase 1:* GHA activated on master (but not required)

- A single CI machine is converted to run GHA runners (instead of Azure runners) and runs the workflow on pushes to master
- Azure CI remains unchanged and is still the source of truth
- We can compare runtimes and behavior/failures
- Timeframe: 2 weeks

*Phase 2:* Additional features

- Any additional functionality that we want to add to GHA is added (e.g. not running the workflow if workflow files were modified)
- Functionality from FlinkCIBot that we want to keep is ported over (syncing with the mirror repo can be dropped, but there are some automated checks that we want to keep)
- We can monitor whether performance is impacted by any change
- Timeframe: 2 weeks

*Phase 3:* Cron jobs and (some) PR triggers run on GHA

- GHA cron builds activated (for master and release branches)
  - Note: Includes some backports to all affected branches, else the workflows won't run: https://stackoverflow.com/questions/61989951/github-action-workflow-not-running/61992817#61992817
- GHA builds run for PRs of select committers (the idea is to try out builds for all the intended trigger conditions)
- Timeframe: 1 week

*Up to this point, the existing CI pipeline is mostly unaffected - we only took away one CI machine.*

*Phase 4:* Full switch to GHA

- Set up GHA runners on all machines
- GHA builds are activated for all PRs
- Either Azure or GHA build is required
- GHA runners are activated, Azure runners are deactivated (but not yet removed) apart from 1 machine (for stragglers)
- Azure cron jobs are disabled, but kept around in case we need to revert
- Timeframe: 1-2 weeks

*Phase 5:* Removal of Azure CI leftovers

- Only after we are satisfied that GHA is stable (at least 1 month after the switch, can be longer)
- Green GHA build is required from now on
- Stale PRs that don't have a GHA run will have to trigger a new one (but they would most likely have to rebase anyway...)
- (old) FlinkCIBot is disabled
- Azure yamls are deleted
- Azure runners are removed from machines

Timing-wise, the full switch to GHA should happen during a quiet time, far away from a release. The remaining phases shouldn't have much impact, but right before a release is not a good moment, of course.

Please give us your thoughts and point out anything we missed or that doesn't seem to make sense!

Best,
Nico