Currently we run 16 different permutations of end-to-end (e2e) tests on every PR pushed to yunikorn-k8shim in GitHub. The test matrix consists of two scheduler profiles (default + plugin mode) multiplied by every supported Kubernetes minor release (eight of them, currently 1.24.x through 1.31.x). This consumes a large amount of the GitHub resources that Apache pays for, and it also lengthens build times.

I propose that we simplify our testing matrix for PRs in two ways.

First, now that the plugin deployment model is deprecated and slated for removal, we can eliminate the plugin deployment form factor. This alone would reduce the number of e2e test runs per PR from 16 to 8.

Second, as we support so many different Kubernetes releases, we can reduce our testing load by adopting a policy of (3 newest + 1 oldest) releases. This would give us v1.31.0, v1.30.4, v1.29.8, and v1.24.17 currently, cutting the runs per PR from 8 to 4. The reasoning behind this is to keep the e2e test runs to a minimum while maximizing our ability to detect regressions quickly. The likelihood of a bug affecting an intermediate version (currently 1.25 through 1.28) but *not* either the oldest or the newest versions of Kubernetes is vanishingly low, low enough that testing against every release on every PR is not worth it.

To ensure that every combination still gets tested regularly, I propose that we adopt a weekly cron job to run the full 8x2 matrix of deployments against the master branch. This ensures that we still cover all possible combinations at regular intervals before a release, without spending resources on them for every PR.

Overall, these changes would reduce our e2e test load on PRs by 75% (from 16 runs to 4).

Please, let's discuss here further.

Craig Condit
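
P.S. To make the arithmetic above easy to check, here is a rough Python sketch of the selection policy. It is only an illustration: the names SUPPORTED_MINORS, PROFILES, and pr_versions are hypothetical and do not come from the yunikorn-k8shim workflow files, and it works on minor versions only, ignoring the patch releases listed above.

```python
from itertools import product

# Assumption: the supported minors are 1.24 through 1.31, as stated in the message.
SUPPORTED_MINORS = [f"1.{m}" for m in range(24, 32)]
PROFILES = ["default", "plugin"]


def pr_versions(minors, newest=3, oldest=1):
    """Return the `oldest` earliest and `newest` latest minors (hypothetical helper)."""
    ordered = sorted(minors, key=lambda v: tuple(int(p) for p in v.split(".")))
    if len(ordered) <= newest + oldest:
        return ordered
    return ordered[:oldest] + ordered[-newest:]


# Full matrix: what runs on every PR today, and what the weekly cron job would run.
full_matrix = list(product(PROFILES, SUPPORTED_MINORS))

# Proposed PR matrix: default profile only, 3 newest + 1 oldest minors.
pr_matrix = list(product(["default"], pr_versions(SUPPORTED_MINORS)))

reduction = 1 - len(pr_matrix) / len(full_matrix)
print(len(full_matrix), len(pr_matrix), f"{reduction:.0%}")  # prints: 16 4 75%
```

Because the selection is computed from the list of supported minors, adding or dropping a release would shift the PR set automatically instead of requiring a manual edit.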
