"Van Haaren, Harry" <harry.van.haa...@intel.com> writes:

> Hi Aaron,
>
>> -----Original Message-----
>> From: Aaron Conole <acon...@redhat.com>
>> Sent: Monday, November 25, 2019 10:54 PM
>> To: Thomas Monjalon <tho...@monjalon.net>
>> Cc: Van Haaren, Harry <harry.van.haa...@intel.com>; Amber, Kumar
>> <kumar.am...@intel.com>; dev@dpdk.org; Wang, Yipeng1
>> <yipeng1.w...@intel.com>; Yigit, Ferruh <ferruh.yi...@intel.com>; Thakur,
>> Sham Singh <sham.singh.tha...@intel.com>; David Marchand
>> <dmarc...@redhat.com>
>> Subject: Re: [dpdk-dev] [PATCH v3] hash: added a new API to hash to query
>> key id
>> 
>> Aaron Conole <acon...@redhat.com> writes:
>> 
>> > Thomas Monjalon <tho...@monjalon.net> writes:
>> >
>> >>> From: Aaron Conole <acon...@redhat.com>
>> >>> > -      if (!service_valid(id))
>> >>> > +      if (id >= RTE_SERVICE_NUM_MAX || !service_valid(id))
>> >>
>> >> Why not adding this check in service_valid()?
>> >
>> > I think the best fix is to use SERVICE_VALID_GET_OR_ERR_RET() in these
>> > places.  For this, I at least want to try and show that there aren't any
>> > further errors.  And my test loop has been running for a while now
>> > without any more errors or segfaults, so I guess it's okay to build a
>> > proper patch.
>> 
>> This popped up:
>> 
>> EAL: Test assert service_lcore_en_dis_able line 487 failed: Ex-service core
>> function call had no effect.
>> 
>> So I'll spend some time in this area, it seems.
>
>
> The below diff makes it 100% reproducible here, failing every time.
>
> It seems like the main thread is returning before the service thread has
> returned.
>
> The rte_eal_mp_wait_lcore() call seems not to wait on the service core, which
> allows the main thread to read the "service_remote_launch_flag" value as 0
> (before the service thread writes it to 1).
>
> Adding a delay between the service launch and the service write makes this
> issue much more likely to occur, so I have confidence in the above
> description.
>
> What I'm not clear on (yet) is why rte_eal_mp_wait_lcore() isn't waiting...

As I wrote in the other thread, it's because rte_eal_mp_wait_lcore() won't
look at lcores with ROLE_SERVICE.
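
To make that concrete, here is a rough toy model of the behaviour (not the
actual EAL code). Every name in it (toy_role, toy_lcore, toy_wait_lcore,
toy_mp_wait) is a hypothetical stand-in made up for illustration, and it
assumes the "wait for all" helper only visits lcores whose role is the
regular one:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not the DPDK definitions. */
enum toy_role { TOY_ROLE_RTE, TOY_ROLE_SERVICE, TOY_ROLE_OFF };
enum toy_state { TOY_WAIT, TOY_RUNNING, TOY_FINISHED };

struct toy_lcore {
	enum toy_role role;
	enum toy_state state;
};

/* Model of waiting on a single lcore: a running lcore is driven to
 * completion, anything else is left untouched. */
void toy_wait_lcore(struct toy_lcore *lc)
{
	assert(lc != NULL);
	if (lc->state == TOY_RUNNING)
		lc->state = TOY_FINISHED;
}

/* Model of a "wait for all" helper that only visits TOY_ROLE_RTE lcores.
 * Returns how many lcores are still running once it comes back. */
size_t toy_mp_wait(struct toy_lcore *lcores, size_t n)
{
	size_t still_running = 0;

	assert(lcores != NULL);
	for (size_t i = 0; i < n; i++) {
		/* Service lcores are skipped here, which is the assumed gap. */
		if (lcores[i].role == TOY_ROLE_RTE)
			toy_wait_lcore(&lcores[i]);
		if (lcores[i].state == TOY_RUNNING)
			still_running++;
	}
	return still_running;
}
```

In this model a service lcore that is still running stays running after
toy_mp_wait() returns, so the caller can go on to read state that the
service function has not written yet.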

> -H

I've been running something similar to the suggested patch for 24
minutes now with no failure.  I've also removed the rte_eal_mp_wait_lcore()
call in other areas throughout the test and switched to waiting on
individual cores "just in case."  I don't think it's the right fix, though.
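
For reference, the "wait on the individual core" idea can be sketched with
plain pthreads rather than the EAL API. This is only an analogy under stated
assumptions: toy_service_fn and toy_launch_and_check are hypothetical names,
the flag stands in for service_remote_launch_flag, and the spin loop stands
in for the artificial delay from the reproducer.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical worker: after a short artificial delay, it sets the flag
 * it was handed to 1. */
void *toy_service_fn(void *arg)
{
	atomic_int *flag = arg;

	assert(flag != NULL);
	/* Busy loop that widens the window in which the flag is still 0. */
	for (volatile int i = 0; i < 100000; i++)
		;
	atomic_store(flag, 1);
	return NULL;
}

/* Launch the worker, wait on that specific thread, and only then read the
 * flag. Returns the flag value, or -1 if the thread calls fail. */
int toy_launch_and_check(void)
{
	atomic_int flag;
	pthread_t tid;

	atomic_init(&flag, 0);
	if (pthread_create(&tid, NULL, toy_service_fn, &flag) != 0)
		return -1;
	/* Waiting on this one thread guarantees its write has happened. */
	if (pthread_join(tid, NULL) != 0)
		return -1;
	return atomic_load(&flag);
}
```

Because the join targets the exact thread that writes the flag, the read
after it cannot observe 0, no matter how long the worker is delayed.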
