Re: External CI Service Limitations

Jarek Potiuk Tue, 23 Jul 2019 00:52:54 -0700

Just for your information, Greg: we are now experiencing another issue with
Travis CI. Suddenly our jobs are resource-starved. Our builds started to
fail with "Out of Memory" and "Not enough CPU" errors (when running
minikube). I opened a critical infrastructure ticket
(https://issues.apache.org/jira/browse/INFRA-18787), disabled automated PR
builds (they are all failing anyway), and temporarily removed the limit on
the number of concurrent jobs so that I can test potential solutions, or
retest if Travis fixes the problem (a big if).


Now we are REALLY blocked.

J.

On Mon, Jul 22, 2019 at 3:40 PM Jarek Potiuk <jarek.pot...@polidea.com>
wrote:

> A harsh "no" indeed. I understand where it comes from, although I am
> looking at it from my project's perspective.
>
> But maybe, instead of a binary yes/no answer, we can think about some
> compromise or temporary solution (for a week or so, while we run the
> release and try to migrate to another CI).
> Maybe increasing just our project's capacity (selectively, if possible) to
> 10 or 15 for a week would help us get through the coming release and the
> migration to another CI.
>
> I feel really sad and angry that, in this case, Apache Infra is not a bit
> more helpful.
>
> It seems that we are suddenly, without any real warning, being punished for
> having very well-tested code with good engineering practices (small
> commits, one commit per PR), a strong expectation that every commit goes
> through extensive, full testing on all environments, and a very active
> community with a lot of contributors. Our workflow really depends on those
> tests working, and such a sudden limitation is not a nice signal to the
> community.
>
> We are doing whatever we can to limit the pressure on Travis:
>
>    - We have already reduced the build matrix to a very limited set of
>    tests.
>    - I've already asked all contributors to run most of the tests
>    (static analysis especially) locally, and we even merged a big change
>    last week to make that super easy.
>    - We are adding a pre-commit hook framework to make it fully automated
>    and even easier for contributors to run those checks before they hit
>    Travis.
>    - We are going to merge some of those jobs into one to reduce the
>    number of jobs.
>    - We merged a change that makes it easy to migrate out of Travis.
>    - We are already in the process of moving to GKE-provided
>    infrastructure: we initiated discussions with GitLab CI and managed to
>    secure resources from Google in GCP to run our workflows. I hope to have
>    a working POC this week and to be able to migrate soon after.
>
> The problem is that, with the current limitation, even our effort to move
> out is hampered: we have a number of changes pending that we need in order
> to finally migrate, but it takes hours for those changes to even go
> through CI, let alone get merged.
> Is there anything Apache Infra can do to help?
>
>
> J.
>
> On Mon, Jul 22, 2019 at 2:31 PM Greg Stein <gst...@gmail.com> wrote:
>
>> On Mon, Jul 22, 2019 at 2:20 AM Jarek Potiuk <jarek.pot...@polidea.com>
>> wrote:
>>
>> > Hello everyone (especially the Infrastructure team),
>> >
>> > Can we increase the number of workers/jobs we have per project? Decreasing
>> > it to 5 (which I believe is the case) is terrible for us right now. We are
>> > nearing the 1.10.4 release of Airflow, and if we have more than one PR in
>> > the queue, it waits for several hours to run!
>> >
>> > Can we increase the limits to 15 or so (3 parallel builds for Airflow, as
>> > we are running 5 jobs per stage)?
>> >
>>
>> Sorry to say, but "no". Travis is a *shared* resource.
>>
>> As noted elsethread, before we applied limits, Airflow used about 77,000
>> minutes in a month. That is tying up two executors full-time for the
>> entire month. We have a hundred projects using Travis, and Airflow
>> consumed a 20th of our entire capacity.
>>
>> The limit for all projects shall remain at five (5).
>>
>> You can always run your tests locally to prepare for your upcoming
>> release. The Foundation's paid resources need to remain shared and
>> available for all of our projects.
>>
>> Regards,
>> Greg Stein
>> Infrastructure Administrator, ASF
>>
>
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129 <+48660796129>
> [image: Polidea] <https://www.polidea.com/>
>
>
