On Fri, Jan 17, 2020 at 9:17 AM David Marchand <david.march...@redhat.com> wrote:
>
> On Thu, Jan 16, 2020 at 8:50 PM Aaron Conole <acon...@redhat.com> wrote:
> >
> > I've noticed an occasional segfault from the build system in the
> > service_autotest and after talking with David (CC'd), it seems like it's
> > due to the rte_service_finalize deleting the lcore_states object while
> > active lcores are running.
> >
> > The below patch is an attempt to solve it by first reassigning all the
> > lcores back to ROLE_RTE before releasing the memory. There is probably
> > a larger question for DPDK proper about actually closing the pending
> > lcore threads, but that's a separate issue. I've been running with the
> > patch for a while, and haven't seen the crash anymore on my system.
> >
> > Thoughts? Is it acceptable as-is?
>
> Added this patch to my env, still reproducing the same issue after ~10-20
> tries.
> I added a breakpoint to service_lcore_uninit that is indeed caught
> when exiting the test application (just wanted to make sure your
> change was in my binary).

Harry,

We need a fix for this issue.

Interestingly, Stephen's patch that joins all pthreads at
rte_eal_cleanup [1] makes this issue disappear.
So my understanding is that we are missing an API (well, I could not
find a way) to synchronously stop service lcores.

1: https://patchwork.dpdk.org/patch/64201/

-- 
David Marchand
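
To make the ordering problem concrete, here is a minimal sketch in plain C with pthreads. It is not DPDK code: worker_state, worker_loop and run_and_stop_workers are hypothetical names standing in for the per-lcore state and the service lcore loop. It only illustrates the "request stop, wait for the threads to exit, then free" sequence that a synchronous stop would give us.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical per-worker state, standing in for the per-lcore state. */
struct worker_state {
	atomic_int run;           /* cleared to ask the worker to stop */
	atomic_ulong iterations;  /* touched by the worker on every loop */
};

/* Worker loop: keeps touching its state until asked to stop. */
static void *worker_loop(void *arg)
{
	struct worker_state *st = arg;

	while (atomic_load(&st->run))
		atomic_fetch_add(&st->iterations, 1);
	return NULL;
}

/*
 * Start n workers, stop them synchronously, then release the state.
 * Returns the number of workers joined, or -1 on allocation failure.
 */
int run_and_stop_workers(int n)
{
	struct worker_state *states = calloc(n, sizeof(*states));
	pthread_t *tids = calloc(n, sizeof(*tids));
	int started = 0, joined = 0;

	if (states == NULL || tids == NULL) {
		free(states);
		free(tids);
		return -1;
	}

	for (int i = 0; i < n; i++) {
		atomic_init(&states[i].run, 1);
		atomic_init(&states[i].iterations, 0);
		if (pthread_create(&tids[i], NULL, worker_loop, &states[i]) != 0)
			break;
		started++;
	}

	/* Step 1: request stop. The workers may still be running here. */
	for (int i = 0; i < started; i++)
		atomic_store(&states[i].run, 0);

	/* Step 2: wait until every worker has really left its loop. */
	for (int i = 0; i < started; i++)
		if (pthread_join(tids[i], NULL) == 0)
			joined++;

	/* Step 3: only now is it safe to free what the workers used. */
	free(states);
	free(tids);
	return joined;
}
```

Freeing the state right after step 1, without step 2, leaves a window where a worker can still dereference freed memory, which is the kind of use-after-free that would match the occasional segfault.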