Re: [PROPOSAL] Dealing with public runner test failures (Integration tests restructuring)

Jarek Potiuk Thu, 08 Dec 2022 00:46:40 -0800

Merged. I hope things will be more stable for quite a while :). Let me know
if you see any instability - there are still a few occasional flaky tests,
but they should be few and far between, and I hope we can now get rid of
them efficiently.


J.


On Thu, Dec 8, 2022 at 2:05 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> Two PRs merged. Two more to go:
>
> * https://github.com/apache/airflow/pull/28209
> * https://github.com/apache/airflow/pull/28207
>
> I ran quite a few public runs and have not seen any more memory problems :)
> - so once we merge those, we should be back to green for public runners as
> well (plus the builds should be a bit faster).
>
> J.
>
>
> On Wed, Dec 7, 2022 at 6:36 PM Oliveira, Niko <oniko...@amazon.com.invalid>
> wrote:
>
>> Awesome to hear this!
>>
>> I was really battling this issue last week, so I'm very excited about these
>> improvements. Let me know if I can help.
>>
>> Cheers,
>> Niko
>> ------------------------------
>> *From:* Jarek Potiuk <ja...@potiuk.com>
>> *Sent:* Tuesday, December 6, 2022 5:54:07 AM
>> *To:* dev@airflow.apache.org
>> *Subject:* [EXTERNAL] [PROPOSAL] Dealing with public runner test failures
>> (Integration tests restructuring)
>>
>>
>> Hey everyone,
>>
>> I think many contributors (non-committers) have started to suffer from
>> frequently failing (disappearing) test runs (mostly for sqlite).
>>
>> Together with @Taragolis, we looked at those recent stability issues
>> with "public runners". They all boil down to the integration tests
>> taking too much memory.
>>
>> Attached is an example screenshot from a debug run I made while trying to
>> "catch the problem in the act" with debugging enabled. It shows that just
>> before such a failure we had only 55 MB free (out of 7 GB available on the
>> public runners), right before the runner "disappeared". Looks like the
>> writing is on the wall.
>>
>> There are two ways we will be addressing it shortly (unless someone
>> objects or has other ideas to improve it):
>>
>> 1. Improving how integration tests are structured and run
>>
>> * We will reorganize our integration tests into a separate subfolder of
>> "tests" (similar to system tests) - this will allow for easier discovery
>> and a better-structured approach to all integration tests.
>>
>> * We will STOP running integration tests in our regular test jobs.
>> Instead we will introduce a separate "Integration Test" job that will
>> run only integration tests, and will run the integrations
>> "one-by-one" - i.e. we will not start kerberos, mongo and redis
>> all together, but will only start the minimal set of integrations
>> needed by the tests that use them.
>>
>> 2. Arranging for bigger public runners
>>
>> I am discussing using more powerful public runners in the Apache
>> Infrastructure meetings (the next meeting is on Wednesday). This is
>> possible; we just need to make sure INFRA/Apache is not overusing the
>> free runners the Apache Software Foundation gets as a generous
>> sponsorship from GitHub. This might vastly decrease the feedback time
>> you get as non-committers, as we can get up to 4x faster builds this way.
>>
>> J.
>>
>
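The "one-by-one" integration job described in point 1 of the proposal can be pictured as a small planning step: group the integration tests by the services they need, then start only those services for each run. The sketch below is hypothetical and is not the actual Airflow CI code. The test paths (including the `tests/integration` folder name), the integration names used as keys, and the helpers `plan_integration_runs` and `peak_services` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical mapping of integration test modules to the services they need.
# Real Airflow paths and integration names may differ; illustrative only.
TEST_REQUIREMENTS = {
    "tests/integration/kerberos/test_kerberos.py": {"kerberos"},
    "tests/integration/mongo/test_mongo_hook.py": {"mongo"},
    "tests/integration/redis/test_redis_hook.py": {"redis"},
    "tests/integration/redis/test_redis_sensor.py": {"redis"},
}


def plan_integration_runs(requirements):
    """Group test modules by the exact set of integrations they need.

    Each resulting group is one run that starts only those services,
    instead of starting every integration at once.
    """
    groups = defaultdict(list)
    for test_path, integrations in requirements.items():
        groups[frozenset(integrations)].append(test_path)
    # Sort services and paths so the plan is deterministic and readable.
    return sorted(
        ((sorted(integrations), sorted(paths)) for integrations, paths in groups.items()),
        key=lambda item: item[0],
    )


def peak_services(plan):
    """Largest number of integrations running at the same time in the plan."""
    return max((len(integrations) for integrations, _ in plan), default=0)


if __name__ == "__main__":
    plan = plan_integration_runs(TEST_REQUIREMENTS)
    for integrations, paths in plan:
        print(f"start {', '.join(integrations)} -> run {len(paths)} module(s)")
        for path in paths:
            print(f"    {path}")
    all_at_once = len(set().union(*TEST_REQUIREMENTS.values()))
    print(f"peak services: {peak_services(plan)} (vs {all_at_once} all at once)")
```

With the assumed mapping, at most one integration is running at any time, versus three when everything is started together. That is the kind of memory reduction the proposal aims for on the 7 GB public runners.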
