Microsoft has now doubled our CI capacity (to 20 concurrent VMs for executing e2e tests). If e2e test execution is back to normal tomorrow, I will revert the hotfix and re-enable e2e tests on PRs.
Sorry for the back and forth.

On Tue, May 19, 2020 at 3:11 PM Robert Metzger <rmetz...@apache.org> wrote:

> Microsoft has not increased our capacity yet (even though it was promised
> to me yesterday again).
>
> I have now merged a hotfix disabling the e2e test execution on pull
> requests to have enough capacity on master.
> Please run e2e tests using your private Azure accounts. Thanks for your
> understanding!
>
> Best,
> Robert
>
>
> On Thu, May 14, 2020 at 11:23 AM Robert Metzger <rmetz...@apache.org>
> wrote:
>
>> Roughly speaking, I see the following problematic areas (I have initially
>> tried running the E2E tests on those machines):
>>
>> a) e2e tests starting Docker images (including Kubernetes). Since the
>> tests on the Ali infra are running in docker themselves, we need to adjust
>> the test scripts (which is not trivial, because both containers need to be
>> in the same network, and the volume mount paths are different)
>>
>> b) tests that modify the underlying file system: common_kubernetes.sh
>> installs stuff in "/usr/local/bin/". (Now that I think about it, it's not a
>> problem in the docker environment).
>>
>> c) Tests that don't clean up properly when failing. IIRC I saw leftover
>> docker containers by test_streaming_kinesis.sh when I was trying to run the
>> E2E tests on the Ali machines.
>>
>> And then there are pull requests that propose changes to the e2e scripts
>> that mess something up :)
>> We certainly need to isolate the e2e test execution somehow. Maybe we
>> could launch VMs on the Ali machines for running the E2Es? (Using Vagrant)
>>
>> If Microsoft is not going to provide us with more test capacity, I will
>> evaluate other options for the E2E tests.
>>
>>
>> On Thu, May 14, 2020 at 10:36 AM Till Rohrmann <trohrm...@apache.org>
>> wrote:
>>
>>> Thanks for the update Robert.
>>>
>>> One idea to make the e2e also run on the Alibaba infrastructure would be
>>> to ensure that e2e tests clean up after they have run.
>>> Do we know which e2e tests don't do this properly?
>>>
>>> Cheers,
>>> Till
>>>
>>> On Thu, May 14, 2020 at 8:38 AM Robert Metzger <rmetz...@apache.org>
>>> wrote:
>>>
>>> > Hi all,
>>> >
>>> > tl;dr: I will have to cancel some E2E test executions of pull requests
>>> > because we have reached the capacity limit of Flink's Azure Pipelines
>>> > account.
>>> >
>>> > Long version: We have two types of agent pools in Azure Pipelines:
>>> > Microsoft-hosted VMs and an Alibaba-hosted Docker environment.
>>> > In the Microsoft VMs, we are running the E2E tests, because we have an
>>> > environment that will always be destroyed after each execution (and the
>>> > E2E tests often leave dangling docker containers, processes etc.; and
>>> > they modify files in system directories).
>>> > In the Alibaba-hosted Docker environment, we are compiling and running
>>> > the regular Maven tests.
>>> >
>>> > We only have 10 Microsoft-hosted VMs available, and each E2E execution
>>> > takes around 3.5 hours. That means we have a capacity of ~70 E2E test
>>> > executions a day.
>>> > On Tuesday, we had 110 builds, on Wednesday 98 builds.
>>> > Because of this, I will (manually) cancel some E2E test executions for
>>> > pull requests. If I see that a PR is explicitly changing something on
>>> > E2E tests, I will keep it. If I see that a PR is a docs change, has
>>> > other test failures etc., I will cancel the E2E execution.
>>> >
>>> > If you want to verify that the E2E tests are passing for your own
>>> > changes, you can set up Azure Pipelines for your GitHub account, it's
>>> > free and works quite well. Here's a tutorial:
>>> >
>>> > https://cwiki.apache.org/confluence/display/FLINK/Azure+Pipelines#AzurePipelines-Tutorial:SettingupAzurePipelinesforaforkoftheFlinkrepository
>>> >
>>> > What can we do to avoid this situation in the future?
>>> > Sadly, Microsoft does not allow buying additional processing slots for
>>> > open source projects [1].
>>> > However, I'm in touch with a product manager at
>>> > Microsoft who promised me (yesterday) to increase the limit for us.
>>> >
>>> > In the Alibaba environment, we have 80 slots available, and usually no
>>> > capacity constraints. This means we don't need to make compromises
>>> > there.
>>> >
>>> > Sorry for this inconvenience.
>>> >
>>> > Best,
>>> > Robert
>>> >
>>> > PS: I'm considering keeping this thread as a permanent "status update"
>>> > thread for Azure Pipelines
>>> >
>>> > [1]
>>> > https://developercommunity.visualstudio.com/content/problem/1028884/additionally-purchased-microsoft-hosted-build-agen.html
>>> >
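
For reference, the capacity figures quoted in this thread (10 VMs, later 20; about 3.5 hours per E2E execution) can be checked with a quick back-of-the-envelope calculation. This is only a rough sketch: the function name is made up for illustration, and it assumes every VM is busy around the clock with no queueing or startup overhead, which the thread does not state.

```python
HOURS_PER_DAY = 24


def daily_e2e_capacity(vms: int, hours_per_run: float) -> float:
    """Upper bound on E2E executions per day.

    Assumption (not from the thread): every VM runs E2E tests
    back to back for 24 hours with no idle time in between.
    """
    return vms * HOURS_PER_DAY / hours_per_run


# Figures taken from the thread: 3.5 h per run, 10 VMs before and 20 VMs after.
before = daily_e2e_capacity(vms=10, hours_per_run=3.5)
after = daily_e2e_capacity(vms=20, hours_per_run=3.5)

print(f"10 VMs: about {before:.0f} E2E executions per day")
print(f"20 VMs: about {after:.0f} E2E executions per day")
```

Under these assumptions, 10 VMs come out slightly below the "~70" mentioned above and well below the 110 and 98 builds reported for Tuesday and Wednesday, while 20 VMs would cover both days.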