05/10/2022 22:33, Mattias Rönnblom:
> On 2022-10-05 21:14, David Marchand wrote:
> > Hello,
> >
> > The service_autotest unit test has been failing randomly.
> > This is not something new.
> > We have been fixing this unit test and the service code, here and there.
> > For some time we were "fine": the failures were rare.
> >
> > But recently (for the last two weeks at least), it started failing more
> > frequently in UNH lab.
> >
> > The symptoms are linked to places where the unit test code is "waiting
> > for some time":
> >
> > - service_lcore_attr_get:
> > + TestCase [ 5] : service_lcore_attr_get failed
> > EAL: Test assert service_lcore_attr_get line 422 failed: Service lcore
> > not stopped after waiting.
> >
> >
> > - service_may_be_active:
> > + TestCase [15] : service_may_be_active failed
> > ...
> > EAL: Test assert service_may_be_active line 960 failed: Error: Service
> > not stopped after 100ms
> >
> > Ideas?
> >
> >
> > Thanks.
>
> Do you run the test suite in a controlled environment? I.e., one where
> you can trust that the lcore threads aren't interrupted for long periods
> of time.
>
> 100 ms is not a long time if a SCHED_OTHER lcore thread competes for the
> CPU with other threads.

You mean the tests cannot be interrupted? Then they look very fragile. Could you please help make them more robust?
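One common way to make such checks tolerant of preemption is to poll for the expected state until a generous deadline, instead of waiting once for a fixed 100 ms and checking once. Below is a minimal sketch in plain C, assuming POSIX `clock_gettime(CLOCK_MONOTONIC)` and `nanosleep`; `wait_until`, `now_ms` and `wait_cond_fn` are hypothetical helpers for illustration, not existing DPDK test APIs.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical predicate type: returns true once the awaited state is reached. */
typedef bool (*wait_cond_fn)(void *arg);

/* Monotonic time in milliseconds; unaffected by wall-clock adjustments. */
static uint64_t now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000u + (uint64_t)ts.tv_nsec / 1000000u;
}

/*
 * Hypothetical helper: poll cond(arg) every poll_ms milliseconds until it
 * returns true or timeout_ms has elapsed. Returns true if the condition was
 * observed, false on timeout. The condition is checked one last time after
 * the deadline, so a thread that was descheduled just before the deadline
 * still gets a chance to be seen in the expected state.
 */
static bool wait_until(wait_cond_fn cond, void *arg,
                       uint64_t timeout_ms, uint64_t poll_ms)
{
    const uint64_t deadline = now_ms() + timeout_ms;
    const struct timespec interval = {
        .tv_sec = (time_t)(poll_ms / 1000u),
        .tv_nsec = (long)((poll_ms % 1000u) * 1000000u),
    };

    while (now_ms() < deadline) {
        if (cond(arg))
            return true;
        nanosleep(&interval, NULL);
    }
    return cond(arg); /* final check after the deadline */
}
```

Because the helper returns as soon as the condition holds, a passing test is no slower than today even with a timeout of several seconds; only a genuinely stuck service pays the full timeout, which makes the test far less sensitive to a SCHED_OTHER lcore thread being preempted for a while.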