Re: dealing with tester timeout in a CI job

Antoine Pitrou Wed, 17 Aug 2022 07:23:19 -0700


Look for ARROW_SCOPED_TRACE

On 17/08/2022 at 16:22, Yaron Gvili wrote:

There are neither sleeps nor deadlocks; it's just due to a large configuration space
that I agree can be reduced by sampling. Could you explain how to use
SCOPED_TRACE, or point me to documentation about it? I understand your idea; I'm just
looking for an example use of SCOPED_TRACE.


Yaron.
________________________________
From: Weston Pace <weston.p...@gmail.com>
Sent: Wednesday, August 17, 2022 10:05 AM
To: dev@arrow.apache.org <dev@arrow.apache.org>
Subject: Re: dealing with tester timeout in a CI job

My first suspicion on a test timeout is usually a deadlock.  That
being said, I haven't looked at this test / change in any real detail
so I don't know if that's the case here.  How long does the test take
to run locally?

Second, I would try to remove sleeps and make sure to use the
utilities SleepABit and SleepABitAsync (which handle very tiny sleeps
much better on Windows), but it doesn't look like that is the case
here.

If there is no deadlock, and there is no sleep, and your test is
simply burning CPU for 5 minutes, then yes, I think it is probably time
to reduce the configuration space. Can you sample the configuration
space with a random seed? (Make sure to use SCOPED_TRACE to record both
the seed and the case under test so that any failure can be
reproduced.) CI runs quite often, so if there is a failure on any
particular case it should still pop up reasonably soon.

Finally, if the configuration space can't be reduced for whatever
reason, then I think we could potentially investigate some kind of
nightly (crossbow) test with a longer timeout but I don't know that
we've had to resort to that yet.
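If a nightly job were ever needed, one sketch is to keep the sampled run as the default and let the nightly job opt into the full sweep through an environment variable. The variable name below is hypothetical (not an existing Arrow or CI setting), and the longer timeout itself would still have to be configured in the nightly job rather than in the test.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical opt-in switch; a nightly job could set it, regular CI would not.
constexpr const char* kExhaustiveEnvVar = "ASOF_JOIN_EXHAUSTIVE_TESTS";

// True if the variable is set to a non-empty value other than "0".
bool ExhaustiveModeEnabled() {
  const char* value = std::getenv(kExhaustiveEnvVar);
  return value != nullptr && value[0] != '\0' && std::string(value) != "0";
}

// Number of configurations to run: all of them in exhaustive mode,
// otherwise at most `ci_budget` (e.g. fed to a seeded sampler).
std::size_t NumCasesToRun(std::size_t total, std::size_t ci_budget) {
  if (ExhaustiveModeEnabled()) return total;
  return total < ci_budget ? total : ci_budget;
}
```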

On Wed, Aug 17, 2022 at 3:41 AM Yaron Gvili <rt...@hotmail.com> wrote:


It looks like the test normally takes less than a second. The gap in
running time is not surprising because the tests I added locally cover a much
larger configuration space. Before I reduce the configuration space being
tested, I'd like to figure out what the acceptable alternatives are.


Yaron.
________________________________
From: Li Jin <ice.xell...@gmail.com>
Sent: Wednesday, August 17, 2022 9:04 AM
To: dev@arrow.apache.org <dev@arrow.apache.org>
Subject: Re: dealing with tester timeout in a CI job

Yaron, how long do the asof join tests normally take?

On Wed, Aug 17, 2022 at 6:13 AM Yaron Gvili <rt...@hotmail.com> wrote:

Sorry, yes, C++. The failed job is
https://github.com/apache/arrow/runs/7839062613?check_suite_focus=true
and it timed out on code I wrote (in a PR, not merged). I'd like to avoid a
timeout without reengineering or reducing the set of tests I wrote, hence
my questions.


Yaron.
________________________________
From: Sutou Kouhei <k...@clear-code.com>
Sent: Tuesday, August 16, 2022 8:13 PM
To: dev@arrow.apache.org <dev@arrow.apache.org>
Subject: Re: dealing with tester timeout in a CI job

Hi,

What language are you talking about? C++?
For C++, we have two timeouts:
* GitHub Actions' timeout
* GTest's timeout

Could you show the URL of the failed macOS-related CI job?

Thanks,
--
kou

In
  <paxp190mb1565310e470e696da667f540bd...@paxp190mb1565.eurp190.prod.outlook.com>
  "dealing with tester timeout in a CI job" on Tue, 16 Aug 2022 16:34:24 +0000,
  Yaron Gvili <rt...@hotmail.com> wrote:

Hi,

What are some acceptable ways to handle a timeout failure in a CI job
for a tester I implemented? For reference, I got such a timeout in only
one macOS-related CI job, while the other CI jobs did not time out.


Let's assume that I cannot (easily) make the tests run any faster. Is it
possible/acceptable, and if so how, to:
* change the timeout?
* turn off some of the tests for one or all CI jobs?
* split the tester into several, so that each meets the timeout allotment?
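For the last option, a sketch of splitting one large configuration list into disjoint shards, so that several smaller tests (or test executables) each run a fraction of the cases. TakeShard is a hypothetical helper. gtest also has built-in sharding via the GTEST_TOTAL_SHARDS and GTEST_SHARD_INDEX environment variables, but that distributes whole test functions across processes rather than splitting the cases inside one test.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Round-robin split: shard `index` of `count` gets elements index,
// index + count, index + 2 * count, ... Every element lands in exactly
// one shard, and shard sizes differ by at most one.
template <typename T>
std::vector<T> TakeShard(const std::vector<T>& all, std::size_t index,
                         std::size_t count) {
  if (count == 0 || index >= count) {
    throw std::invalid_argument("shard index must be less than shard count");
  }
  std::vector<T> shard;
  for (std::size_t i = index; i < all.size(); i += count) {
    shard.push_back(all[i]);
  }
  return shard;
}

// Example: four TEST functions, each running a quarter of the cases
// (not compiled here; AllConfigs and RunCase are hypothetical):
//   TEST(AsofJoinTest, Shard0) {
//     for (const auto& c : TakeShard(AllConfigs(), 0, 4)) RunCase(c);
//   }
```

Whether splitting helps depends on what the CI timeout applies to: if it covers a whole test executable, the shards would need to end up in separate executables.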



Cheers,
Yaron.
