"Burakov, Anatoly" <anatoly.bura...@intel.com> writes:

> On 13-Mar-20 10:04 AM, David Marchand wrote:
>> On Wed, Mar 11, 2020 at 3:39 PM Harry van Haaren
>> <harry.van.haa...@intel.com> wrote:
>>>
>>> This commit releases all service cores from their role,
>>> returning them to ROLE_RTE on rte_service_finalize().
>>>
>>> This may fix an issue relating to the service cores causing
>>
>> s/may fix/fixes/
>>
>>> a race-condition on eal_cleanup(), where the service core
>>> could still be executing while the main thread has already
>>> free-d the service memory, leading to a segfault.
>>>
>>> Fixes: 21698354c832 ("service: introduce service cores concept")
>>
>> Replaced with:
>> Fixes: da23f0aa87d8 ("service: fix memory leak with new function")
>>
>>> Cc: sta...@dpdk.org
>>>
>>> Reported-by: David Marchand <david.march...@redhat.com>
>>> Reported-by: Aaron Conole <acon...@redhat.com>
>>> Signed-off-by: David Marchand <david.march...@redhat.com>
>>> Signed-off-by: Harry van Haaren <harry.van.haa...@intel.com>
>>> Acked-by: Aaron Conole <acon...@redhat.com>
>>
>> Applied, thanks.
>>
>>
>
> This patch breaks a couple of apps (or rather the apps were broken to
> begin with, but the brokenness has been exposed with this patch).
>
> A "good" way to handle a SIGINT is to catch it, set some kind of
> global exit flag, and exit the signal handler, so that all of the
> threads see the exit flag, stop spinning, and exit the main loop and
> proceed to gracefully shutdown. That's what majority of our apps do.
>
> A bad way to handle SIGINT is to call rte_exit() inside the signal
> handler, without setting any global exit flags. Since rte_exit() now
> waits for all of the threads to stop, the exit will never actually
> happen because threads can't stop without an exit signal, and no exit
> signal was provided by the signal handler.

Yes, I don't consider it 'breaking' anything - exit in signal handlers
is always a bad idea.  I guess we should correct the examples to show
this.

> Affected apps:
>
> * l3fwd-power (i'm preparing a patch)
> * ip_reassembly (see main.c:988) - +Konstantin
>
> There are also a bunch of apps that simply call exit(0) and do unclean
> shutdown without DPDK cleanup, and also apps i have no idea what
> they're doing (call kill() on themselves in the SIGINT handler?
> l3fwd-cat does that, so do a bunch of others), but this is probably a
> bigger problem that should be addressed separately.

I think one way to mitigate this is to register an at_exit() function
that will check if eal is currently initialized and do the needed
cleanup call.  I don't know if there are any side-effects that we need
to consider for it, though.

Reply via email to