hi folks,

There has periodically been a discussion about employing dedicated
compute resources to serve our testing needs beyond what can be
accomplished in free / public CI services like GitHub Actions,
Appveyor, etc. For example:

* Workloads requiring a CUDA-capable GPU
* Tests requiring a lot of memory
* ARM architecture

While physical machines can be hooked up to some CI/CD services like
GitHub Actions and Buildkite, I believe we should not be 100%
dependent on the availability of such hardware (the recent tornado in
Nashville is a good example of what can go wrong).

At some point it will make sense to be able to provision cloud hosts
(either temporary spot instances or persistent nodes) to meet these
needs. This brings up several questions:

* Who's going to pay for it? Perhaps Amazon, Google, or Microsoft can
donate cloud compute credits to the project.
* What kind of devops tooling would be appropriate to provision and
manage the instances, scaling up and down based on need?
* What CI/CD platform would be appropriate to dispatch work to the
cloud nodes (taking into consideration the high cost of system
administration, and seeking to minimize nodes sitting unused)?

This will probably take time to work out and there is significant
engineering involved in achieving any solution, but it would be good
to have all the options on the table with a frank analysis of the
pros/cons and costs (both in money and volunteer time) involved.

Thanks,
Wes
