Our pools of test slaves are often at or over capacity, which increases both job coalescing and test wait times. This, in turn, can lengthen tree closures caused by test bustage and can make try runs very slow to complete.

One of the easiest ways to mitigate this is to run tests less often.

To assess the impact of doing this, we will be performing an experiment the week of August 25, in which we will run debug tests on mozilla-inbound on most desktop platforms every other run, instead of every run as we do now. Debug tests on linux64 will continue to run every time. Non-desktop platforms and trees other than mozilla-inbound will not be affected.
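For illustration only, the scheduling rule described above could be sketched roughly as follows. This is not the actual scheduler configuration: the function name, the run counter, and the placeholder platform names are assumptions, since this post does not list the affected platforms.

```python
# Hypothetical sketch of the "every other run" rule; not the real scheduler code.
THROTTLED_TREE = "mozilla-inbound"
ALWAYS_RUN_PLATFORMS = {"linux64"}  # debug tests keep running every time here
# Placeholder names: the affected desktop platforms are not listed in the post.
THROTTLED_PLATFORMS = {"desktop-a", "desktop-b"}


def should_run_debug_tests(tree: str, platform: str, run_index: int) -> bool:
    """Return True if debug tests should be scheduled for this run.

    run_index is an assumed zero-based counter of runs on the tree.
    """
    if tree != THROTTLED_TREE:
        return True  # other trees are not affected
    if platform in ALWAYS_RUN_PLATFORMS:
        return True  # linux64 debug tests run every time
    if platform not in THROTTLED_PLATFORMS:
        return True  # non-desktop platforms are not affected
    return run_index % 2 == 0  # affected platforms: every other run


if __name__ == "__main__":
    for i in range(4):
        print(i, should_run_debug_tests("mozilla-inbound", "desktop-a", i))
```

Under these assumptions, an affected platform gets debug tests on runs 0, 2, 4, and so on, while linux64 and every other tree are scheduled exactly as they are today.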

This approach is based on the premise that the number of debug-only platform-specific failures on desktop is low enough to be manageable, and that the extra burden this imposes on the sheriffs will be small enough compared to the improvement in test slave metrics to justify the cost.

While this experiment is in progress, we will be monitoring job coalescing and test wait times, as well as the impact on sheriffs and developers. If the experiment prevents sheriffs from doing their jobs effectively, it can be ended early.

We intend to use the data we collect during the experiment to validate the premise on which it is based, and to inform decisions about the additional tooling we would need to make this or a similar plan permanent at some point in the future.

After the experiment concludes, we will publish a follow-up post discussing our findings. If you have any concerns, feel free to reach out to me.

Jonathan

_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform