From: Jeff Guo > On 11/9/2018 1:24 PM, Matan Azrad wrote: > > > > From: Jeff Guo > >> On 11/8/2018 5:35 PM, Matan Azrad wrote: > >>> From: Jeff Guo > >>>> On 11/8/2018 3:28 PM, Matan Azrad wrote: > >>>>> From: Ananyev, Konstantin > >>>>>>> -----Original Message----- > >>>>>>> From: Guo, Jia > >>>>>>> Sent: Wednesday, November 7, 2018 7:30 AM > >>>>>>> To: Matan Azrad <ma...@mellanox.com>; Ananyev, Konstantin > >>>>>>> <konstantin.anan...@intel.com>; Burakov, Anatoly > >>>>>>> <anatoly.bura...@intel.com>; Thomas Monjalon > >>>>>> <tho...@monjalon.net>; > >>>>>>> Iremonger, Bernard <bernard.iremon...@intel.com>; Wu, Jingjing > >>>>>>> <jingjing...@intel.com>; Lu, Wenzhuo <wenzhuo...@intel.com> > >>>>>>> Cc: Yigit, Ferruh <ferruh.yi...@intel.com>; dev@dpdk.org; Zhang, > >>>>>>> Helin <helin.zh...@intel.com>; He, Shaopeng > >>>> <shaopeng...@intel.com> > >>>>>>> Subject: Re: [PATCH 3/3] app/testpmd: fix callback issue for > >>>>>>> hot-unplug > >>>>>>> > >>>>>>> matan > >>>>>>> > >>>>>>> On 11/6/2018 2:36 PM, Matan Azrad wrote: > >>>>>>>> Hi Jeff > >>>>>>>> > >>>>>>>> From: Jeff Guo <jia....@intel.com> > >>>>>>>>> Before detach device when device be hot-unplugged, the failure > >>>>>>>>> process in user space and kernel space both need to be > >>>>>>>>> finished, such as eal interrupt callback need to be inactive > >>>>>>>>> before the callback be unregistered when device is being > >>>>>>>>> cleaned. This patch add rte alarm for device detaching, with > >>>>>>>>> that it could finish interrupt callback soon and give time to > >>>>>>>>> let the failure process done > >>>>>> before detaching. > >>>>>>>>> Fixes: 2049c5113fe8 ("app/testpmd: use hotplug failure > >>>>>>>>> handler") > >>>>>>>>> Signed-off-by: Jeff Guo <jia....@intel.com> > >>>>>>>>> --- > >>>>>>>>> app/test-pmd/testpmd.c | 13 ++++++++++++- > >>>>>>>>> 1 file changed, 12 insertions(+), 1 deletion(-) > >>>>>>>>> > >>>>>>>>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c > >>>>>>>>> index 9c0edca..9c673cf 100644 > >>>>>>>>> --- a/app/test-pmd/testpmd.c > >>>>>>>>> +++ b/app/test-pmd/testpmd.c > >>>>>>>>> @@ -2620,7 +2620,18 @@ eth_dev_event_callback(const char > >>>>>>>>> *device_name, enum rte_dev_event_type type, > >>>>>>>>> device_name); > >>>>>>>>> return; > >>>>>>>>> } > >>>>>>>>> - rmv_event_callback((void *)(intptr_t)port_id); > >>>>>>>>> + /* > >>>>>>>>> + * Before detach device, the hot-unplug failure > >>>> process in > >>>>>>>>> + * user space and kernel space both need to be > >>>> finished, > >>>>>>>>> + * such as eal interrupt callback need to be inactive > >>>> before > >>>>>>>>> + * the callback be unregistered when device is being > >>>> cleaned. > >>>>>>>>> + * So finished interrupt callback soon here and give > >>>> time to > >>>>>>>>> + * let the work done before detaching. > >>>>>>>>> + */ > >>>>>>>>> + if (rte_eal_alarm_set(100000, > >>>>>>>>> + rmv_event_callback, (void > >>>>>>>>> *)(intptr_t)port_id)) > >>>>>>>>> + RTE_LOG(ERR, EAL, > >>>>>>>>> + "Could not set up deferred device > >>>>>>>> It looks me strange to use callback and alarm to remove a device: > >>>>>>>> Why not to use callback and that is it? > >>>>>>>> > >>>>>>>> I think that it's better to let the EAL to detach the device > >>>>>>>> after all the > >>>>>> callbacks were done and not to do it by the user callback. > >>>>>>>> So the application\callback owners just need to clean its > >>>>>>>> resources with understanding that after the callback the > >>>>>>>> device(and the callback > >>>>>>> itself) will be detached by the EAL. > >>>>>>> > >>>>>>> > >>>>>>> Firstly, at the currently framework and solution, such as > >>>>>>> callback for RTE_ETH_EVENT_INTR_RMV, still need to use the > >>>>>>> deferred device > >>>>>> removal, > >>>>>>> we tend to give the control of detaching device to the > >>>>>>> application, and the whole process is located on the user's > >>>>>>> callback. Notify app to detach device by callback but make it > >>>>>>> deferred, > >> i think it is fine. > >>>>> But the device must be detached in remove event, why not to do it > >>>>> in > >> EAL? > >>>> I think it because of before detached the device, application > >>>> should be stop the forwarding at first, then stop the device, then > >>>> close > >>>> > >>>> the device, finally call eal unplug API to detach device. If eal > >>>> directly detach device at the first step, there will be mountain > >>>> user space error need to handle, so that is one reason why need to > >>>> provider the remove notification to app, and let app to process it. > >>> This is why the EAL need to detach the device only after all the > >>> user > >> callbacks were done. > >> > >> > >> If i correctly got your meaning, you suppose to let eal to mandatory > >> detach device but not app, app just need to stop/close port, right? > > Yes, the app should stop,close,clean its own resources of the removed > > device, Then, EAL to detach the device. > > > >> If so, it will need to modify rmv_event_callback by not invoke the > >> detaching func and add some detaching logic to hotplug framework in eal. > >> > > rmv_event_callback is using by other hotplug mechanism (ETHDEV RMV > event), so you need to use another one\ add parameter to it. > > And yes, you need to detach the device from EAL, should be simple. > > > I think rmv_event_callback is original use for other hotplug event (ETHDEV > RMV event), but it still use the common hotplug mechanism(app callback and > app detach), > > so i think it will still need to face this callback issue and you could check > that > eth_event_callback also use the rte alarm to detach device.
The ETHDEV layer cannot know if more than one ethdev port is associated to the rte_device, So, it is not correct to detach a rte_device by the ETHDEV layer. Hence, the application should decide in the ETHDEV event case. But, in the EAL event case this is different as I explained before. Moreover, I think that the ethdev RMV event should be deprecated one day, after the EAL hot-plug mechanism will be stable. > > so my suggestion is that, you maybe propose a good idea but let we keep on > current mechanism until we come to a final good solution agreement, before > that, just let > > it functional. > I still suggest to fix the issue by an EAL detaching. > > >> It is hardly say better or worse but this new propose need to discuss > >> more in public. And it might be better to use another specific patch to > handler it. > >> What do you or other guys think so? > > Since you are fixing issue here, it can be done by a fix series. > > > > Other feedbacks are welcome all the time 😊 > > > >> > >>>>>> It is also unclear to me my we need an alarm here. > >>>>>> First (probably wrong) impression we just try to hide some > >>>>>> synchronization Problem by introducing delay. > >>>>> Looks like, the issue is that the callback function memory will be > >>>>> removed > >>>> from the function itself (by the detach call), no? > >>>> > >>>> > >>>> Answer here for both Konstantin and Matan. > >>>> > >>>> Yes, i think matan is right, the interrupt callback will be destroy > >>>> in the app callback itself, the sequence is that as below > >>>> > >>>> hot-unplug interrupt -> interrupt callback -> app callback(return > >>>> to finish interrupt callback, delay detaching) -> detach > >>>> device(unregister interrupt > >>>> callback) > >>>> > >>>> > >>>>>> Konstantin > >>>>>> > >>>>>>> Secondly, the vfio is different with igb_uio for hot-unplug, it > >>>>>>> register/unregister hotplug interrupt callback for each device, > >>>>>>> so need to make the callback done before unregister the callback. > >>>>>>> > >>>>>>> So I think it should be considerate as an workaround here, > >>>>>>> before we find a better way. > >>>>>>> > >>>>>>> > >>>>>>>>> removal\n"); > >>>>>>>>> break; > >>>>>>>>> case RTE_DEV_EVENT_ADD: > >>>>>>>>> RTE_LOG(ERR, EAL, "The device: %s has been > >>>> added!\n", > >>>>>>>>> -- > >>>>>>>>> 2.7.4