David Marchand <david.march...@redhat.com> writes:

> On Fri, Jan 17, 2020 at 9:17 AM David Marchand
> <david.march...@redhat.com> wrote:
>>
>> On Thu, Jan 16, 2020 at 8:50 PM Aaron Conole <acon...@redhat.com> wrote:
>> >
>> > I've noticed an occasional segfault from the build system in the
>> > service_autotest and after talking with David (CC'd), it seems like it's
>> > due to the rte_service_finalize deleting the lcore_states object while
>> > active lcores are running.
>> >
>> > The below patch is an attempt to solve it by first reassigning all the
>> > lcores back to ROLE_RTE before releasing the memory.  There is probably
>> > a larger question for DPDK proper about actually closing the pending
>> > lcore threads, but that's a separate issue.  I've been running with the
>> > patch for a while, and haven't seen the crash anymore on my system.
>> >
>> > Thoughts?  Is it acceptable as-is?
>>
>> Added this patch to my env, still reproducing the same issue after ~10-20
>> tries.
>> I added a breakpoint to service_lcore_uninit that is indeed caught
>> when exiting the test application (just wanted to make sure your
>> change was in my binary).
>
> Harry,
>
> We need a fix for this issue.

+1

> Interestingly, Stephen's patch that joins all pthreads at
> rte_eal_cleanup [1] makes this issue disappear.
> So my understanding is that we are missing an API (well, I could not
> find a way) to synchronously stop service lcores.

Maybe we can take that patch as a fix.  I hate to see this segfault
in the field.  I need to figure out what I missed in my cleanup
(probably missed a synchronization point).

>
> 1: https://patchwork.dpdk.org/patch/64201/
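
For anyone following along, the ordering problem discussed above can be sketched in plain C with pthreads. This is not DPDK code and not the patch under discussion: `worker_state`, `worker_loop` and `run_and_finalize` are hypothetical names, and the `states` array only stands in for lcore_states. The sketch assumes that the missing synchronization point is "wait for every worker to leave its loop before freeing the state it reads", which is what joining the threads provides.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-lcore state read by the worker loop. */
struct worker_state {
    atomic_int run;      /* 1 while the worker should keep looping */
    atomic_ulong loops;  /* iterations completed so far */
};

/* Shared array, playing the role of lcore_states in this sketch. */
static struct worker_state *states;

static void *worker_loop(void *arg)
{
    struct worker_state *ws = arg;

    /* The worker dereferences ws on every iteration, so ws must stay
     * valid until the loop has really exited. */
    while (atomic_load_explicit(&ws->run, memory_order_acquire))
        atomic_fetch_add_explicit(&ws->loops, 1, memory_order_relaxed);
    return NULL;
}

/* Start n workers, stop them synchronously, then free the shared state.
 * Returns the number of workers joined, or -1 on allocation failure. */
int run_and_finalize(unsigned int n)
{
    unsigned int i, started = 0;
    int joined = 0;
    pthread_t *tids;

    assert(states == NULL); /* no earlier instance may still be live */

    tids = calloc(n, sizeof(*tids));
    states = calloc(n, sizeof(*states));
    if (tids == NULL || states == NULL) {
        free(tids);
        free(states);
        states = NULL;
        return -1;
    }

    for (i = 0; i < n; i++) {
        atomic_init(&states[i].run, 1);
        atomic_init(&states[i].loops, 0);
        if (pthread_create(&tids[i], NULL, worker_loop, &states[i]) != 0)
            break;
        started++;
    }

    /* Step 1: ask every worker to stop. This alone is asynchronous;
     * a worker may still be in the middle of an iteration. */
    for (i = 0; i < started; i++)
        atomic_store_explicit(&states[i].run, 0, memory_order_release);

    /* Step 2: the synchronization point. After a successful join the
     * worker can no longer touch its state. */
    for (i = 0; i < started; i++)
        if (pthread_join(tids[i], NULL) == 0)
            joined++;

    /* Step 3: only now is it safe to release the shared state. */
    free(states);
    states = NULL;
    free(tids);
    return joined;
}
```

Calling `run_and_finalize(4)` starts four workers and returns only once all four have been joined and the state freed. Swapping steps 2 and 3 in this sketch gives a use-after-free of the same general shape as the one suspected in this thread: a worker can still be reading its state while it is being freed.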