Currently we run 16 different combinations of end-to-end (e2e) tests on every
PR pushed to yunikorn-k8shim on GitHub. The test matrix consists of two
scheduler profiles (default + plugin mode) multiplied by every supported
Kubernetes minor release (8x, currently 1.24.x through 1.31.x). This costs
Apache a large amount of GitHub resources and also lengthens our build times.

I propose that we simplify our PR testing matrix in two ways. First, now that
the plugin deployment model is deprecated and slated for removal, we can stop
testing the plugin deployment form factor on PRs. This alone would reduce the
number of e2e test runs per PR from 16 to 8. Second, because we support so many
Kubernetes releases, we can further reduce our testing load by adopting a
policy of testing only the 3 newest releases plus the oldest one. Today that
would be v1.31.0, v1.30.4, v1.29.8, and v1.24.17. The goal is to keep e2e test
runs to a minimum while still detecting regressions quickly. The likelihood of
a bug affecting an intermediate version (currently 1.25 through 1.28) but *not*
either the oldest or one of the newest Kubernetes versions is vanishingly low;
low enough that testing against every release on every PR isn't worth it.
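As a rough illustration of the "3 newest + 1 oldest" rule, here is a minimal
Python sketch. It assumes releases are identified by minor version only ("1.N");
the exact patch versions (v1.31.0, v1.30.4, ...) would still come from whatever
the CI configuration pins. The names `SUPPORTED_RELEASES`, `minor`, and
`pr_releases` are hypothetical and not taken from the yunikorn-k8shim build.

```python
# Hypothetical list of supported minor releases (1.24 through 1.31).
SUPPORTED_RELEASES = [f"1.{n}" for n in range(24, 32)]


def minor(release: str) -> int:
    """Return the minor version number of a "1.N" release string."""
    return int(release.split(".")[1])


def pr_releases(releases: list[str], newest: int = 3) -> list[str]:
    """Keep the `newest` most recent releases plus the oldest one, newest first."""
    ordered = sorted(releases, key=minor, reverse=True)
    if len(ordered) <= newest + 1:
        # Nothing to skip: the policy already covers every release.
        return ordered
    return ordered[:newest] + [ordered[-1]]


if __name__ == "__main__":
    selected = pr_releases(SUPPORTED_RELEASES)
    print(selected)  # ['1.31', '1.30', '1.29', '1.24']
    skipped = sorted(set(SUPPORTED_RELEASES) - set(selected), key=minor)
    print(skipped)   # ['1.25', '1.26', '1.27', '1.28']
```

When a new Kubernetes release is added to the supported list, the selection
shifts automatically: the oldest of the previous "3 newest" drops into the
weekly-only set.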

To ensure that every combination is still tested regularly, I propose that we
add a weekly cron job that runs the full 8x2 deployment matrix against the
master branch. This keeps every supported combination covered at regular
intervals before a release, without spending resources on all of them for
every PR.

Taken together, these changes would reduce our e2e test load on PRs by 75%
(from 16 runs to 4).
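To make the PR-versus-weekly split and the 75% figure concrete, here is a
minimal Python sketch of how the matrix could be chosen per trigger. It assumes
the workflow can branch on the GitHub event name ("pull_request" vs.
"schedule"); the function `e2e_matrix` and the constant names are hypothetical,
and the release list again uses minor versions only.

```python
from itertools import product

PROFILES = ["default", "plugin"]                      # scheduler profiles
RELEASES = [f"1.{n}" for n in range(24, 32)]          # 1.24 .. 1.31
PR_RELEASES = ["1.31", "1.30", "1.29", "1.24"]        # 3 newest + 1 oldest


def e2e_matrix(trigger: str) -> list[tuple[str, str]]:
    """Return the (profile, release) pairs to run for a given trigger."""
    if trigger == "schedule":
        # Weekly cron against master: full 8x2 matrix, plugin mode included.
        return list(product(PROFILES, RELEASES))
    if trigger == "pull_request":
        # PRs: default profile only, reduced release set.
        return list(product(["default"], PR_RELEASES))
    raise ValueError(f"unknown trigger: {trigger}")


full = len(e2e_matrix("schedule"))
pr = len(e2e_matrix("pull_request"))
print(full, pr, f"{1 - pr / full:.0%}")  # 16 4 75%
```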

Please share your thoughts here so we can discuss further.


Craig Condit

