Re: Probable CF bot degradation

Andres Freund Sun, 20 Mar 2022 17:18:12 -0700

Hi,

On 2022-03-21 12:23:02 +1300, Thomas Munro wrote:
> It was set to try to recheck every ~48 hours, though it couldn't quite
> always achieve that when the total number of eligible submissions is
> too large.  In this case it had stalled for too long after the github
> outage, which I'm going to try to improve.  The reason for the 48+
> hour cycle is the Windows tests now take ~25 minutes (since we started
> actually running all the tests on that platform)


I regularly see 26-28 minutes :(. And that doesn't even include the task's
"boot time" of around 3-4 minutes, which is quite a bit higher for Windows
than for the other OSs.


> and we could only
> have two Windows tasks running at a time in practice, because the
> limit for Windows was 8 CPUs, and we use 4 for each task, which means
> we could only test ~115 branches per day, or actually a shade fewer
> because it's pretty dumb and only wakes up once a minute to decide
> what to do, and we currently have 242 submissions (though some don't
> apply, those are free, so the number varies over time...).  There are
> limits on the Unixes too but they are more generous, and the Unix
> tests only take 4-10 minutes, so we can ignore that for now, it's all
> down to Windows.
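
For reference, the arithmetic in the quoted paragraph can be reproduced with a
rough sketch like the one below. The function and parameter names are made up
for illustration and are not taken from cfbot; the model also ignores the
once-a-minute wakeup and the boot time, so it gives an upper bound rather than
what cfbot actually achieves.

```python
# Back-of-the-envelope model of the Windows CI throughput described above.
# All names here are hypothetical; the numbers come from the quoted mail.

MINUTES_PER_DAY = 24 * 60


def branches_per_day(cpu_limit, cpus_per_task, task_minutes):
    """Upper bound on branches tested per day on one platform."""
    concurrent_tasks = cpu_limit // cpus_per_task
    return concurrent_tasks * (MINUTES_PER_DAY / task_minutes)


def full_cycle_hours(submissions, per_day):
    """Hours needed to test every submission once at the given rate."""
    return submissions / per_day * 24


# 8 CPUs, 4 per task, ~25 minutes per Windows run, 242 submissions
per_day = branches_per_day(cpu_limit=8, cpus_per_task=4, task_minutes=25)
cycle = full_cycle_hours(submissions=242, per_day=per_day)

print(f"{per_day:.1f} branches/day, full cycle in {cycle:.1f} hours")
```

With these inputs the model gives roughly 115 branches per day and a cycle of
a bit over 50 hours, in line with the "~115" and "48+ hour" figures quoted
above. Passing cpu_limit=16 models a doubled CPU limit.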

I wonder if it's worth using the number of concurrently running Windows tasks
as the limit, rather than the number of commits being tested
concurrently. It's not rare for Windows to fail more quickly than the other
OSs. But the 4 concurrent tests are probably good enough for now...

I'd love to merge the patch adding mingw CI testing, which'd increase the
pressure substantially :/


> I had been meaning to stump up the USD$10/month it costs to double the
> CPU limits from the basic free Cirrus account, and I've just now done
> that and told cfbot it's allowed to test 4 branches at once and to try
> to test every branch every 24 hours.  Let's see how that goes.

Yay.


> Here's hoping we can cut down the time it takes to run the tests on
> Windows... there's some really dumb stuff happening there.  Top items
> I'm aware of:  (1) general lack of test concurrency, (2) exec'ing new
> backends is glacially slow on that OS but we do it for every SQL
> statement in the TAP tests and every regression test script (I have
> some patches for this to share after the code freeze).

(3) the build is quite slow and has no caching


With meson the effect of (1) and (3) is quite visible. Look at
https://cirrus-ci.com/build/5265480968568832

current buildsystem: 28:07 min
meson w/ msbuild: 22:21 min
meson w/ ninja: 19:24 min

meson runs quite a few tests that the "current buildsystem" doesn't, so the
win is actually bigger than the time difference indicates...
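
To put those three timings in relative terms, here is a small sketch that
converts them to seconds and computes the saving against the current
buildsystem. The helper name is hypothetical, and it only compares wall-clock
time, so it does not account for the extra tests meson runs.

```python
# Relative wall-clock savings for the three Windows timings listed above.
# The timings are copied from the mail; to_seconds() is an illustrative helper.

def to_seconds(mmss):
    """Convert a 'MM:SS' string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


timings = {
    "current buildsystem": "28:07",
    "meson w/ msbuild": "22:21",
    "meson w/ ninja": "19:24",
}

baseline = to_seconds(timings["current buildsystem"])

# Percentage of the baseline run time saved by each configuration
savings = {
    name: 100 * (baseline - to_seconds(t)) / baseline
    for name, t in timings.items()
}

for name, pct in savings.items():
    print(f"{name}: {pct:.1f}% faster than the current buildsystem")
```

That works out to roughly a fifth less time with msbuild and a bit under a
third less with ninja.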


Greetings,

Andres Freund
