On Wed, Nov 27, 2019 at 3:16 PM Van Haaren, Harry <harry.van.haa...@intel.com> wrote:
>
> > -----Original Message-----
> > From: Aaron Conole <acon...@redhat.com>
> > Sent: Wednesday, November 27, 2019 2:10 PM
> > To: Van Haaren, Harry <harry.van.haa...@intel.com>
> > Cc: dev@dpdk.org
> > Subject: Re: [PATCH] test/service: fix wait for service core
> >
> > Harry van Haaren <harry.van.haa...@intel.com> writes:
> >
> > > This commit fixes a sporadic failure of the service_autotest
> > > unit test, as seen in the DPDK CI. The failure occurs because the main
> > > test thread did not wait for the service thread to return, allowing it
> > > to read a flag before the service was able to write to it.
> > >
> > > The fix changes the wait API call to specify the service-core ID,
> > > and this waits for cores with both ROLE_RTE and ROLE_SERVICE.
> > >
> > > The rte_eal_mp_wait_lcore() call does not (and should not) wait
> > > for service cores, so it must not be used to wait on service cores.
> > >
> > > Fixes: f038a81e1c56 ("service: add unit tests")
> > >
> > > Reported-by: Aaron Conole <acon...@redhat.com>
> > > Signed-off-by: Harry van Haaren <harry.van.haa...@intel.com>
> > >
> > > ---
> >
> > It might also be good to document this behavior in the API area. It's
> > unclear that the lcore wait function which takes a core id will work,
> > but the broad wait will not.
>
> Yes, agreed that the docs can improve here - different patch.
>
> > > Given this is a fix in the unit test, and not a functional change,
> > > I'm not sure it's worth backporting to LTS / stable releases?
> > > I've not added stable on CC yet.
> >
> > I think it's worth it if the LTS / stable branches use the unit tests
> > (otherwise, they will observe sporadic failures).
>
> Ok, I've added sta...@dpdk.org on CC now
>
> > >  app/test/test_service_cores.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/app/test/test_service_cores.c b/app/test/test_service_cores.c
> > > index 9fe38f5e0..a922c7ddc 100644
> > > --- a/app/test/test_service_cores.c
> > > +++ b/app/test/test_service_cores.c
> > > @@ -483,7 +483,7 @@ service_lcore_en_dis_able(void)
> > >  	int ret = rte_eal_remote_launch(service_remote_launch_func, NULL,
> > >  					slcore_id);
> > >  	TEST_ASSERT_EQUAL(0, ret, "Ex-service core remote launch failed.");
> > > -	rte_eal_mp_wait_lcore();
> > > +	rte_eal_wait_lcore(slcore_id);
> > >  	TEST_ASSERT_EQUAL(1, service_remote_launch_flag,
> > >  			"Ex-service core function call had no effect.");
> >
> > Should we also have some change like the following (just a guess):
> >
> > diff --git a/app/test/test_service_cores.c b/app/test/test_service_cores.c
> > index 9fe38f5e08..695c35ac6c 100644
> > --- a/app/test/test_service_cores.c
> > +++ b/app/test/test_service_cores.c
> > @@ -773,7 +773,7 @@ service_app_lcore_poll_impl(const int mt_safe)
> >
> >  	/* flag done, then wait for the spawned 2nd core to return */
> >  	params[0] = 1;
> > -	rte_eal_mp_wait_lcore();
> > +	rte_eal_wait_lcore(app_core2);
> >
> >  	/* core two gets launched first - and should hold the service lock */
> >  	TEST_ASSERT_EQUAL(0, app_core2_ret,
>
> I reviewed this usage of the function, and I believe it waits on application
> cores (aka, ROLE_RTE, not ROLE_SERVICE). Hence this usage is actually correct.
> Please review and double check my logic though - more eyes is good.
It seems to be the case, yes.

My overall feeling is that the services stuff is a giant hack, so better
documentation will prove me wrong :-).

As I said, I am for taking this change in 19.11 now, as it only impacts
this test and it seems to solve the random failures.

Acked-by: David Marchand <david.march...@redhat.com>

--
David Marchand
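The race fixed above can be sketched outside DPDK with POSIX threads. This is a loose analogy, not DPDK code. `pthread_create()` and `pthread_join()` stand in for `rte_eal_remote_launch()` and `rte_eal_wait_lcore()`. The `worker_role` enum and the functions `wait_app_workers()`, `wait_worker()` and `launch_and_check()` are hypothetical names made up for illustration.

The sketch shows why a "broad" wait that skips service workers is not enough before reading a flag that a service worker writes. That matches the thread's point that `rte_eal_mp_wait_lcore()` does not wait for service cores.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical roles, loosely modelled on ROLE_RTE and ROLE_SERVICE. */
enum worker_role { ROLE_APP, ROLE_SERVICE };

struct worker {
	pthread_t tid;
	enum worker_role role;
	int joined; /* set once pthread_join() has returned for this thread */
};

/* Written by the service worker, read by the launching thread. */
static atomic_int service_flag;

static void *app_func(void *arg)
{
	(void)arg;
	return NULL; /* an application worker with nothing to do */
}

static void *service_func(void *arg)
{
	(void)arg;
	atomic_store(&service_flag, 1);
	return NULL;
}

/* Broad wait: joins ROLE_APP workers only, so it says nothing about
 * service workers (compare rte_eal_mp_wait_lcore()). */
static void wait_app_workers(struct worker *w, int n)
{
	for (int i = 0; i < n; i++) {
		if (w[i].role == ROLE_APP && !w[i].joined) {
			pthread_join(w[i].tid, NULL);
			w[i].joined = 1;
		}
	}
}

/* Targeted wait on one worker regardless of its role
 * (compare rte_eal_wait_lcore(id)). Returns 0 on success. */
static int wait_worker(struct worker *w)
{
	if (w->joined)
		return 0;
	if (pthread_join(w->tid, NULL) != 0)
		return -1;
	w->joined = 1;
	return 0;
}

/* Launch one app worker and one service worker, then read the flag only
 * after the service worker itself has been joined. pthread_join() makes
 * the joined thread's writes visible to the caller.
 * Returns the observed flag value, or -1 on a pthread error. */
static int launch_and_check(void)
{
	struct worker w[2] = {
		{ .role = ROLE_APP },
		{ .role = ROLE_SERVICE },
	};

	atomic_store(&service_flag, 0);
	if (pthread_create(&w[0].tid, NULL, app_func, NULL) != 0)
		return -1;
	if (pthread_create(&w[1].tid, NULL, service_func, NULL) != 0) {
		pthread_join(w[0].tid, NULL);
		return -1;
	}

	wait_app_workers(w, 2);      /* not enough alone: w[1] may still run */
	if (wait_worker(&w[1]) != 0) /* the fix: wait on the specific worker */
		return -1;

	return atomic_load(&service_flag);
}
```

Calling `launch_and_check()` should return 1. If you read the flag right after `wait_app_workers()`, before `wait_worker(&w[1])`, the result may be 0 on some runs. That is the same kind of sporadic failure the patch addresses.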