Microsoft has now doubled our CI capacity (to 20 concurrent VMs for
executing e2e tests).
If e2e test execution is back to normal tomorrow, I will revert the hotfix
and re-enable e2e tests on PRs.
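For a rough sense of what the doubling buys us, here is a back-of-the-envelope sketch. It assumes, as stated further down in this thread, that an E2E run takes about 3.5 hours and occupies one VM for its entire duration. It ignores queueing gaps and uneven load over the day, so the numbers are upper bounds. The function name daily_e2e_capacity is made up for illustration.

```python
# Back-of-the-envelope E2E capacity estimate (hypothetical helper).
# Assumptions: each E2E run takes ~3.5 hours and occupies one
# Microsoft-hosted VM for its whole duration; VMs never sit idle.
HOURS_PER_DAY = 24
E2E_RUN_HOURS = 3.5


def daily_e2e_capacity(concurrent_vms: int, run_hours: float = E2E_RUN_HOURS) -> float:
    """Upper bound on the number of E2E runs a VM pool can finish per day."""
    return concurrent_vms * HOURS_PER_DAY / run_hours


if __name__ == "__main__":
    for vms in (10, 20):
        print(f"{vms} VMs: ~{daily_e2e_capacity(vms):.0f} E2E runs/day")
```

Under these assumptions, 10 VMs give roughly 69 runs per day and 20 VMs roughly 137. That would cover the 98 to 110 builds per day reported earlier in the thread, but only if builds are spread evenly over the day.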

Sorry for the back and forth.

On Tue, May 19, 2020 at 3:11 PM Robert Metzger <rmetz...@apache.org> wrote:

> Microsoft has not increased our capacity yet (even though this was promised
> to me again yesterday).
>
> I have now merged a hotfix disabling the e2e test execution on pull
> requests to have enough capacity on master.
> Please run e2e tests using your private Azure accounts. Thanks for your
> understanding!
>
> Best,
> Robert
>
>
> On Thu, May 14, 2020 at 11:23 AM Robert Metzger <rmetz...@apache.org>
> wrote:
>
>> Roughly speaking, I see the following problematic areas (I initially
>> tried running the E2E tests on those machines):
>>
>> a) e2e tests starting Docker images (including Kubernetes). Since the
>> tests on the Ali infra themselves run in Docker, we need to adjust the
>> test scripts (which is not trivial, because both containers need to be
>> in the same network, and the volume mount paths are different).
>>
>> b) tests that modify the underlying file system: common_kubernetes.sh
>> installs stuff in "/usr/local/bin/". (Now that I think about it, this is
>> not a problem in the Docker environment.)
>>
>> c) Tests that don't clean up properly when failing. IIRC I saw Docker
>> containers left over by test_streaming_kinesis.sh when I was trying to
>> run the E2E tests on the Ali machines.
>>
>> And then there are pull requests that propose changes to the e2e scripts
>> that mess something up :)
>> We certainly need to isolate the e2e test execution somehow. Maybe we
>> could launch VMs on the Ali machines (using Vagrant) for running the E2E
>> tests?
>>
>> If Microsoft is not going to provide us with more test capacity, I will
>> evaluate other options for the E2E tests.
>>
>>
>> On Thu, May 14, 2020 at 10:36 AM Till Rohrmann <trohrm...@apache.org>
>> wrote:
>>
>>> Thanks for the update Robert.
>>>
>>> One idea to make the e2e tests also run on the Alibaba infrastructure
>>> would be to ensure that e2e tests clean up after they have run. Do we
>>> know which e2e tests don't do this properly?
>>>
>>> Cheers,
>>> Till
>>>
>>> On Thu, May 14, 2020 at 8:38 AM Robert Metzger <rmetz...@apache.org>
>>> wrote:
>>>
>>> > Hi all,
>>> >
>>> > tl;dr: I will have to cancel some E2E test executions of pull requests
>>> > because we have reached the capacity limit of Flink's Azure Pipelines
>>> > account.
>>> >
>>> > Long version: We have two types of agent pools in Azure Pipelines:
>>> > Microsoft-hosted VMs and an Alibaba-hosted Docker environment.
>>> > We run the E2E tests on the Microsoft VMs, because that environment is
>>> > always destroyed after each execution (the E2E tests often leave
>>> > dangling Docker containers, processes, etc., and they modify files in
>>> > system directories).
>>> > In the Alibaba-hosted Docker environment, we compile Flink and run the
>>> > regular Maven tests.
>>> >
>>> > We only have 10 Microsoft-hosted VMs available, and each E2E execution
>>> > takes around 3.5 hours. That means we have a capacity of ~70 E2E runs
>>> > per day (10 VMs × 24 h / 3.5 h ≈ 69).
>>> > On Tuesday, we had 110 builds; on Wednesday, 98.
>>> > Because of this, I will (manually) cancel some E2E test executions for
>>> > pull requests. If I see that a PR explicitly changes something in the
>>> > E2E tests, I will keep it. If I see that a PR is a docs change, has
>>> > other test failures, etc., I will cancel the E2E execution.
>>> >
>>> > If you want to verify that the E2E tests pass for your own changes,
>>> > you can set up Azure Pipelines for your GitHub account; it's free and
>>> > works quite well. Here's a tutorial:
>>> >
>>> > https://cwiki.apache.org/confluence/display/FLINK/Azure+Pipelines#AzurePipelines-Tutorial:SettingupAzurePipelinesforaforkoftheFlinkrepository
>>> >
>>> > What can we do to avoid this situation in the future?
>>> > Sadly, Microsoft does not allow open source projects to buy additional
>>> > processing slots [1]. However, I'm in touch with a product manager at
>>> > Microsoft who promised me (yesterday) to increase the limit for us.
>>> >
>>> > In the Alibaba environment, we have 80 slots available and usually no
>>> > capacity constraints, so we don't need to make compromises there.
>>> >
>>> > Sorry for this inconvenience.
>>> >
>>> > Best,
>>> > Robert
>>> >
>>> > PS: I'm considering keeping this thread as a permanent "status update"
>>> > thread for Azure Pipelines.
>>> >
>>> > [1]
>>> > https://developercommunity.visualstudio.com/content/problem/1028884/additionally-purchased-microsoft-hosted-build-agen.html
>>> >
>>>
>>
