"Burakov, Anatoly" <anatoly.bura...@intel.com> writes: > On 13-Mar-20 10:04 AM, David Marchand wrote: >> On Wed, Mar 11, 2020 at 3:39 PM Harry van Haaren >> <harry.van.haa...@intel.com> wrote: >>> >>> This commit releases all service cores from their role, >>> returning them to ROLE_RTE on rte_service_finalize(). >>> >>> This may fix an issue relating to the service cores causing >> >> s/may fix/fixes/ >> >>> a race-condition on eal_cleanup(), where the service core >>> could still be executing while the main thread has already >>> free-d the service memory, leading to a segfault. >>> >>> Fixes: 21698354c832 ("service: introduce service cores concept") >> >> Replaced with: >> Fixes: da23f0aa87d8 ("service: fix memory leak with new function") >> >>> Cc: sta...@dpdk.org >>> >>> Reported-by: David Marchand <david.march...@redhat.com> >>> Reported-by: Aaron Conole <acon...@redhat.com> >>> Signed-off-by: David Marchand <david.march...@redhat.com> >>> Signed-off-by: Harry van Haaren <harry.van.haa...@intel.com> >>> Acked-by: Aaron Conole <acon...@redhat.com> >> >> Applied, thanks. >> >> > > This patch breaks a couple of apps (or rather the apps were broken to > begin with, but the brokenness has been exposed with this patch). > > A "good" way to handle a SIGINT is to catch it, set some kind of > global exit flag, and exit the signal handler, so that all of the > threads see the exit flag, stop spinning, and exit the main loop and > proceed to gracefully shutdown. That's what majority of our apps do. > > A bad way to handle SIGINT is to call rte_exit() inside the signal > handler, without setting any global exit flags. Since rte_exit() now > waits for all of the threads to stop, the exit will never actually > happen because threads can't stop without an exit signal, and no exit > signal was provided by the signal handler.
Yes, I don't consider it 'breaking' anything - exit in signal handlers is always a bad idea. I guess we should correct the examples to show this. > Affected apps: > > * l3fwd-power (i'm preparing a patch) > * ip_reassembly (see main.c:988) - +Konstantin > > There are also a bunch of apps that simply call exit(0) and do unclean > shutdown without DPDK cleanup, and also apps i have no idea what > they're doing (call kill() on themselves in the SIGINT handler? > l3fwd-cat does that, so do a bunch of others), but this is probably a > bigger problem that should be addressed separately. I think one way to mitigate this is to register an at_exit() function that will check if eal is currently initialized and do the needed cleanup call. I don't know if there are any side-effects that we need to consider for it, though.