On Thu, 9 Mar 2023 17:19:59 +0000 "Dumitrescu, Cristian" <cristian.dumitre...@intel.com> wrote:
> > -----Original Message----- > > From: Stephen Hemminger <step...@networkplumber.org> > > Sent: Thursday, March 9, 2023 4:31 PM > > To: Jangra, Yogesh <yogesh.jan...@intel.com> > > Cc: dev@dpdk.org; Dumitrescu, Cristian <cristian.dumitre...@intel.com>; R, > > Kamalakannan <kamalakanna...@intel.com>; Suresh Narayane, Harshad > > <harshad.suresh.naray...@intel.com> > > Subject: Re: [PATCH] app/testpmd: fix closing softnic port before ethdev > > ports > > > > On Thu, 9 Mar 2023 14:42:49 +0000 > > Yogesh Jangra <yogesh.jan...@intel.com> wrote: > > > > > + /* > > > + * SoftNIC runs on the sevice core, it uses the resources from > > > + * the testpmd application. When we run quit command, the > > testpmd > > > + * application stops ethdev ports first, SoftNIC will try to > > > + * access the port and sometimes that result in segmentation > > > + * error. So first closing the SoftNIC port. > > > + */ > > > + RTE_ETH_FOREACH_DEV(pt_id) { > > > + if (!strcmp(ports[pt_id].dev_info.driver_name, > > "net_softnic")) { > > > + stop_port(pt_id); > > > + close_port(pt_id); > > > + } > > > + } > > > + > > > > NAK > > No driver specific hacks please. > > > > Instead fix the driver design or bug please. > > Hi Stephen, > > This is not a Soft NIC driver-specific hack, this is required for working > around some of the ethdev drivers that don't implement the stop() API > correctly and free up the device queues or some other internal resources on > stop() instead of close(). > > The Soft NIC is a meta-device that sits on top of other "physical" ethdev > devices, so when the Soft NIC device continues to poll the queues of those > physical devices after their queues have been freed, the Soft NIC will get a > segfault. This fix is required to protect against this sort of incorrect > driver behavior by simply stopping the Soft NIC devices first. > > We already have several driver specific branches in the test-pmd for e.g. LAG > or virtual devices; IMO this small change falls in the same category and it > should get accepted. > > Please let us know if this makes sense to you? > > Regards, > Cristian If the application has to do this then something is wrong with the architecture. You aren't propagating state changes through in a safe manner. Other applications will have same issue. If LAG or virtual devices have similar problems then a generic solution is needed.