> -----Original Message-----
> From: David Marchand <david.march...@redhat.com>
> Sent: Tuesday, March 10, 2020 1:05 PM
> To: Van Haaren, Harry <harry.van.haa...@intel.com>
> Cc: Aaron Conole <acon...@redhat.com>; dev <dev@dpdk.org>
> Subject: Re: [RFC] service: stop lcore threads before 'finalize'
>
> On Fri, Feb 21, 2020 at 1:28 PM Van Haaren, Harry
> <harry.van.haa...@intel.com> wrote:
<snip>
> >
> > Hi David,
> >
> > I have been attempting to reproduce, unfortunately without success.
> >
> > I attempted your suggested meson test approach (thanks for suggesting!),
> > but I haven't had a segfault with that approach (yet, and it's done a lot
> > of iterations...)
>
> I reproduced it on the first try, just now.
> Travis catches it every once in a while (look at the ovsrobot).
>
> For the reproduction, this is on my laptop (core i7-8650U), baremetal,
> no fancy stuff.
> FWIW, the cores are ruled by the "powersave" governor.
> I can see the frequency oscillates between 3.5GHz and 3.7GHz while the
> max frequency is 4.2GHz.
>
> Travis runs virtual machines with 2 cores, and there must be quite
> some overprovisioning on those servers.
> We can expect some cycles being stolen or at least something happening
> on the various cores.
>
> >
> > I've made the service-cores unit tests delay before exit, in an attempt
> > to have them access previously rte_free()-ed memory, no luck reproducing.
>
> Ok, let's forget about the segfault, what do you think of the
> backtrace I caught?
> A service lcore thread is still in the service loop.
> The master thread of the application is in the libc exiting code.
>
> This is what I get in all crashes.

Hi,

I was actually coding up the above as a patch to send to the ML for testing.
I've tried to reproduce it, but it doesn't happen here. I don't like sending
patches for fixes that I haven't been able to reliably reproduce and fix
locally, but in this case I don't see any other option. I'll post the fix
patch to the mailing list ASAP; your and Aaron's help in testing would be
greatly appreciated.

> > Thinking perhaps we need it on exit, I've also POCed a unit test that leaves
> > service cores active on exit on purpose, to try to have them poll after exit,
> > still no luck.
> >
> > Simplifying the problem, and using the hello-world sample app with a
> > rte_eal_cleanup() call at the end also doesn't easily aggravate the problem.
> >
> > From code inspection, I agree there is an issue. It seems like a call to
> > rte_service_lcore_reset_all() from rte_service_finalize() is enough...
> > But without reproducing it is hard to have good confidence in a fix.
>
> You promised a doc update on the services API.
> Thanks.

Yes, I heard there are some questions around what service cores are useful
for. Having reviewed the programmer's guide and the doxygen of the API, I'm
not sure what needs to change. Do you have specific questions you'd like to
see addressed, or what do you feel needs to change?

https://doc.dpdk.org/guides/prog_guide/service_cores.html
http://doc.dpdk.org/api/rte__service_8h.html

Regards,
-Harry