We are also experiencing huge delays for Airflow (it seems that we are the third "whale" according to https://lists.apache.org/thread.html/af52e2a3e865c01596d46374e8b294f2740587dbd59d85e132429b6c@%3Cbuilds.apache.org%3E).
We are evaluating other options for funding as well (including getting some credits from Google for Google Cloud Build / GCP), but it will take time to get resources and to switch. In the meantime, maybe INFRA can help to coordinate some effort between Flink/Arrow/Airflow to decrease pressure on Travis?

We considered a few options (and are going to implement some of them shortly, I think). Some of them are not direct changes to the Travis CI builds but other workflow/infrastructure changes that will decrease pressure on Travis:

* We are going to decrease the matrix of builds we run - currently we have several combinations of Airflow builds (postgres/mysql/sqlite) x (Python 3.5 / Python 3.6) - but we will only run a subset of those rather than the full matrix.

* We are going to combine several of our jobs into one using parallel processing. This is mainly for static code analysis - currently we have one job for each analysis, which makes them run in parallel. After the change - when you include machine boot times and use all processors - the overall build time might be even faster than today, AND there will be far fewer VMs to start for the builds.

* We have a separate Kubernetes-related job. It currently runs only one suite of tests specific to Kubernetes, as it requires a special setup of the environment, but we are looking into the possibility of merging the Kubernetes tests into the main tests (with a faster environment setup using docker-compose) and saving 1 job (25% of our test jobs). The main jobs will run a bit longer, but the whole overhead of starting an extra job will be gone.

* We are introducing (the PR is in the final stages of review) an easy way for contributors to run static code analysis in their own environment. A lot of our PR builds fail because of the static code analysis that is run on Travis. Until now it has been a bit convoluted and not easily reproducible to run the full analysis locally, but we are moving to a fully dockerised setup for builds that will allow contributors to easily run such checks on their machines, and we will encourage people to run them locally rather than submit PRs just to check if the code is right.

* Even more - we are introducing and encouraging the easy-to-use "pre-commit" framework in our developer workflow, where the analysis will be run at commit time for only the changes being committed - this might further decrease the number of builds submitted by the contributors.

* Lastly - we are introducing an easy-to-use "simplified development environment" where developers will be able to run all or a subset of the test suites easily on their machine. Currently our setup is fairly convoluted as well, but we have a PR in progress to address it and provide a very easy way (again - fully dockerised) to reproduce the test environment.

Maybe the committers from Flink and Arrow can also take a look at non-obvious ways in which their projects can decrease pressure on Travis (at least for the time being). Maybe there are some quick wins we can apply in a short time, in a coordinated way, and buy more time for switching the infrastructure?
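
To make the "combine several jobs into one using parallel processing" point above a bit more concrete, here is a minimal sketch of the idea, not our actual setup. The helper names (`run_checks`, `failed_checks`) and the check names in the usage note are hypothetical; the only assumption is that each static check is an external command that exits non-zero on failure.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_checks(checks, max_workers=None):
    """Run all commands in `checks` concurrently inside one job.

    `checks` maps a (hypothetical) check name to its command line.
    Returns a dict mapping each check name to its exit code.
    """
    def run_one(item):
        name, cmd = item
        # Each check runs as a separate process, so plain threads are
        # enough to keep several processors busy at the same time.
        result = subprocess.run(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
        )
        return name, result.returncode

    # max_workers=None lets the executor pick its own default pool size.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, checks.items()))


def failed_checks(results):
    """Return the sorted names of checks that exited with a non-zero code."""
    return sorted(name for name, code in results.items() if code != 0)
```

A single CI job could then call something like `run_checks({"check-a": [...], "check-b": [...]})` with the real analysis commands filled in, and fail the build if `failed_checks(...)` is non-empty. Only one VM needs to boot instead of one per analysis.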