On Mon, Oct 11, 2021 at 4:54 PM David Marchand <david.march...@redhat.com> wrote:
>
> The CI reported rare (and cryptic) failures like:
>
> RTE>>service_autotest
> + ------------------------------------------------------- +
> + Test Suite : service core test suite
> + ------------------------------------------------------- +
> + TestCase [ 0] : unregister_all succeeded
> + TestCase [ 1] : service_name succeeded
> + TestCase [ 2] : service_get_by_name succeeded
> Service dummy_service Summary
> dummy_service: stats 1 calls 0 cycles 0 avg: 0
> Service dummy_service Summary
> dummy_service: stats 0 calls 0 cycles 0 avg: 0
> + TestCase [ 3] : service_dump succeeded
> + TestCase [ 4] : service_attr_get failed
> + TestCase [ 5] : service_lcore_attr_get succeeded
> + TestCase [ 6] : service_probe_capability succeeded
> + TestCase [ 7] : service_start_stop succeeded
> + TestCase [ 8] : service_lcore_add_del succeeded
> + TestCase [ 9] : service_lcore_start_stop succeeded
> + TestCase [10] : service_lcore_en_dis_able succeeded
> + TestCase [11] : service_mt_unsafe_poll succeeded
> + TestCase [12] : service_mt_safe_poll succeeded
> perf test for MT Safe: 42.7 cycles per call
> + TestCase [13] : service_app_lcore_mt_safe succeeded
> perf test for MT Unsafe: 73.3 cycles per call
> + TestCase [14] : service_app_lcore_mt_unsafe succeeded
> + TestCase [15] : service_may_be_active succeeded
> + TestCase [16] : service_active_two_cores succeeded
> + ------------------------------------------------------- +
> + Test Suite Summary : service core test suite
> + ------------------------------------------------------- +
> + Tests Total : 17
> + Tests Skipped : 0
> + Tests Executed : 17
> + Tests Unsupported: 0
> + Tests Passed : 16
> + Tests Failed : 1
> + ------------------------------------------------------- +
> Test Failed
> RTE>>
> stderr:
> EAL: Detected CPU lcores: 16
> EAL: Detected NUMA nodes: 2
> EAL: Detected static linkage of DPDK
> EAL: Multi-process socket /var/run/dpdk/service_autotest/mp_socket
> EAL: Selected IOVA mode 'PA'
> EAL: No available 1048576 kB hugepages reported
> EAL: VFIO support initialized
> EAL: Device 0000:03:00.0 is not NUMA-aware, defaulting socket to 0
> APP: HPET is not enabled, using TSC as default timer
> EAL: Test assert service_attr_get line 340 failed: attr_get() call didn't
> get call count (zero)
>
> According to API, trying to stop a service lcore is not possible if this
> lcore is the only one associated to a service.
> Doing this will result in a -EBUSY return code from
> rte_service_lcore_stop() which the service_attr_get subtest was not
> checking.
> This left the service lcore running, and a race existed with the main
> lcore on checking the service attributes which triggered this CI
> failure.
>
> To fix this, dissociate the service lcore with current service.
>
> Once fixed this first issue, a race still exists, because the
> wait_slcore_inactive helper added in a previous fix was not
> paired with a check that the service lcore _did_ stop.
>
> Add missing check on rte_service_lcore_may_be_active.
>
> Fixes: 4d55194d76a4 ("service: add attribute get function")
> Fixes: 52bb6be259ff ("test/service: fix race condition on stopping lcore")
> Cc: sta...@dpdk.org
>
> Signed-off-by: David Marchand <david.march...@redhat.com>

Acked-by: Aaron Conole <acon...@redhat.com>
Acked-by: Harry van Haaren <harry.van.haa...@intel.com>

Applied, thanks.

--
David Marchand
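
For readers following the thread, the stop sequence described in the commit message can be sketched with a small self-contained model. This is not DPDK code and not the patch itself: every toy_* name below is hypothetical, and the model is single-threaded. It only mirrors the contract stated in the message: a stop request is refused with -EBUSY while the lcore is the only one mapped to the service, so the lcore is unmapped first (in DPDK this would presumably be done with rte_service_map_lcore_set()), the return value of the stop call is checked, and the caller confirms the lcore went inactive before reading any attributes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model of one service lcore. NOT DPDK code; all names are hypothetical. */
struct toy_lcore {
    bool mapped;   /* lcore is the only one mapped to the service */
    bool running;  /* lcore has not been asked to stop yet */
    bool active;   /* lcore may still be inside a service iteration */
};

/* Mirrors the -EBUSY rule described for rte_service_lcore_stop():
 * refuse to stop while this is the only lcore mapped to the service. */
static int toy_lcore_stop(struct toy_lcore *lc)
{
    if (lc->mapped)
        return -EBUSY;
    lc->running = false;
    return 0;
}

/* Stand-in for dissociating the lcore from the service. */
static void toy_unmap(struct toy_lcore *lc)
{
    lc->mapped = false;
}

/* Stand-in for rte_service_lcore_may_be_active(): reports the current
 * state, then lets a stopped lcore finish its last iteration. */
static bool toy_may_be_active(struct toy_lcore *lc)
{
    bool was_active = lc->active;

    if (!lc->running)
        lc->active = false;
    return was_active;
}

/* Bounded wait that reports failure instead of silently giving up. */
static int toy_wait_inactive(struct toy_lcore *lc, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (!toy_may_be_active(lc))
            return 0;
    }
    return -EAGAIN;
}

/* Checked sequence: unmap, stop (checking the result), then confirm
 * that the lcore really went inactive. */
static int toy_stop_checked(struct toy_lcore *lc)
{
    int ret;

    toy_unmap(lc);
    ret = toy_lcore_stop(lc);
    if (ret != 0)
        return ret;
    return toy_wait_inactive(lc, 100);
}
```

In this model, calling toy_lcore_stop() on a still-mapped lcore returns -EBUSY and leaves it running, which corresponds to the unchecked failure the subtest was hitting. toy_stop_checked() returns 0 only once the lcore has been observed inactive.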