Re: [PATCH v3] test/service: fix spurious failures by extending timeout

Thomas Monjalon Fri, 03 Feb 2023 07:16:24 -0800

03/02/2023 16:03, Van Haaren, Harry:
> From: Van Haaren, Harry
> > > The timeout approach just does not have its place in a functional test.
> > > Either this test is rewritten, or it must go to the performance tests
> > > list so that we stop getting false positives.
> > > Can you work on this?
> > 
> > I'll investigate various approaches on Thursday and reply here with
> > suggested next steps.
> 
> I've identified 3 checks that fail in CI (from the above log outputs). All 3
> cases have different delays: 100 ms, 200 ms and 1000 ms.
> In the CI, the service-core just hasn't been scheduled (yet), which causes the
> "failure".
> 
> Option 1)
> One option is to while(1) loop, waiting for the service-thread to be
> scheduled. This can be seen as "increasing the timeout"; however, in this
> case the test-case would error not in the test-code, but in the meson-test
> runner, as a timeout (with a 10 sec default?).
> The benefit here is that massively increasing the timeout (from ~1 sec or
> less to 10 sec) will cover all/many of the CI timeouts.
> 
> Option 2)
> Move to perf-tests, and do not run these in a noisy CI environment where the
> results are not consistent enough to have value. This would mean that the
> 3 checks in question are not run in CI. They are listed below; they all
> *require* the service core to be scheduled:
> service_attr_get() -> requires service core to run for service stats to increment
> service_lcore_attr_get() -> requires service core to run for lcore stats to increment
> service_lcore_start_stop() -> requires service to run to ensure service-func itself executes.
> 
> I don't see how we can "improve" option 2 to not require the service-thread
> to be scheduled by the OS.
> And the only way to make the OS schedule it in the CI more consistently is to
> give it more time?


We are talking about seconds.
Are there setups where scheduling a thread takes seconds?

> Thoughts and input welcomed. I'm happy to make the code changes themselves;
> it's a small effort for both option 1 & 2.

For time-sensitive tests, yes, they should be in the perf tests category.
As David said earlier, no timeout approach in functional tests.
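
For illustration, Option 1 quoted above (wait in a loop, with no timeout in
the test code itself) could look roughly like the sketch below. It is not the
actual test: it uses plain pthreads and a C11 atomic counter instead of the
DPDK service API. The names service_calls, fake_service_core,
wait_for_service_run and run_demo are hypothetical, and the 50 ms sleep only
simulates a service core that the OS schedules late.

```c
#define _POSIX_C_SOURCE 200809L

#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the per-service call counter. */
static atomic_uint_fast64_t service_calls;

static void sleep_ms(long ms)
{
	struct timespec ts = {
		.tv_sec = ms / 1000,
		.tv_nsec = (ms % 1000) * 1000000L,
	};

	nanosleep(&ts, NULL);
}

/* Assumption: simulates a service core that only gets CPU time after 50 ms. */
static void *fake_service_core(void *arg)
{
	(void)arg;
	sleep_ms(50);
	atomic_fetch_add(&service_calls, 1);
	return NULL;
}

/*
 * Option 1: no deadline in the test code. Poll until the service has run
 * at least once. If it never runs, this never returns, and the outer test
 * runner's own timeout is what reports the failure.
 */
static uint64_t wait_for_service_run(void)
{
	while (1) {
		uint64_t calls = atomic_load(&service_calls);

		if (calls > 0)
			return calls;
		sleep_ms(1);
	}
}

/* Start the simulated service core, wait for it, return the observed count. */
static uint64_t run_demo(void)
{
	pthread_t tid;
	uint64_t calls;

	atomic_store(&service_calls, 0);
	if (pthread_create(&tid, NULL, fake_service_core, NULL) != 0)
		return 0;
	calls = wait_for_service_run();
	pthread_join(tid, NULL);
	return calls;
}
```

Calling run_demo() returns once the simulated service has incremented the
counter. The check itself no longer depends on a fixed 100/200/1000 ms delay,
but the test still depends on the thread being scheduled eventually.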
