Hello all,

As we near 57, the Firefox CI group felt it was important to send out a reminder about infrastructure usage when you push.

*tl;dr* There is a real cost (both time and $) to using the 'all' flags in pushes. They are there if you need them, but please think about which platforms and test suites you actually need to run before you push, and limit the scope of execution if you can.

A bit of background: our build and test infrastructure is a mix of physical hardware and AWS cloud instances. AWS scales dynamically to our load, but our physical hardware is limited. Occasionally you might see wait times and queues build up; this is typically due to our hardware being overwhelmed. When it gets really bad, we sometimes have to close the trees to allow the machines to catch up. Obviously, that's not good for anyone.

Specifically, over the last few weeks we have seen a few long backlogs on our OSX machines, once requiring tree closure. We never want to have to close trees; it's a last resort, especially this close to beta.

Because of the physical hardware limitation, this is particularly concerning for performance tests and tests that run on OSX (OSX builds are now cross-compiled on Linux and not really affected). If you don't need to run perf or OSX tests, please consider excluding them from your pushes. ahal sent mail a few weeks ago about the new fuzzy matching tool <https://ahal.ca/blog/2017/mach-try-fuzzy/>, which can help you figure out what to select.

To give you an idea of scale, we average 1000 pushes per week on integration branches (excluding try). Our desktop tests alone (excluding numbers for android, build jobs, and a handful of others) use roughly 900 machine hours per push, which comes to about 900k machine hours per week combined. Including try and those other configurations, you can roughly double these numbers. Needless to say, that's a lot of machine time, so any savings we can get really add up.

We are continuously monitoring our capacity requirements for today and for the future (new platforms, updated OSes, new experiments, new tests, etc).
But it's a dynamic problem, and sometimes things pile up. While we accept that today, it's a problem we want to limit further in the future. There are a lot of interesting things we're working on here, such as selective test execution, intermittent reduction strategies, smarter tooling, and smarter infrastructure allocation, that will hopefully go a long way toward reducing these issues. We'll continue to update everyone here as we make those improvements.

In the meantime, just a reminder to be diligent about which platforms and test suites you are running. If you have any questions, feel free to reach out.

Thanks!

_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform