On 11/9/2018 1:24 PM, Matan Azrad wrote:

  From: Jeff Guo
On 11/8/2018 5:35 PM, Matan Azrad wrote:
From: Jeff Guo
On 11/8/2018 3:28 PM, Matan Azrad wrote:
From: Ananyev, Konstantin
-----Original Message-----
From: Guo, Jia
Sent: Wednesday, November 7, 2018 7:30 AM
To: Matan Azrad <ma...@mellanox.com>; Ananyev, Konstantin
<konstantin.anan...@intel.com>; Burakov, Anatoly
<anatoly.bura...@intel.com>; Thomas Monjalon
<tho...@monjalon.net>;
Iremonger, Bernard <bernard.iremon...@intel.com>; Wu, Jingjing
<jingjing...@intel.com>; Lu, Wenzhuo <wenzhuo...@intel.com>
Cc: Yigit, Ferruh <ferruh.yi...@intel.com>; dev@dpdk.org; Zhang,
Helin <helin.zh...@intel.com>; He, Shaopeng
<shaopeng...@intel.com>
Subject: Re: [PATCH 3/3] app/testpmd: fix callback issue for
hot-unplug

matan

On 11/6/2018 2:36 PM, Matan Azrad wrote:
Hi Jeff

     From: Jeff Guo <jia....@intel.com>
Before detach device when device be hot-unplugged, the failure
process in user space and kernel space both need to be finished,
such as eal interrupt callback need to be inactive before the
callback be unregistered when device is being cleaned. This
patch add rte alarm for device detaching, with that it could
finish interrupt callback soon and give time to let the failure
process done
before detaching.
Fixes: 2049c5113fe8 ("app/testpmd: use hotplug failure handler")
Signed-off-by: Jeff Guo <jia....@intel.com>
---
     app/test-pmd/testpmd.c | 13 ++++++++++++-
     1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 9c0edca..9c673cf 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -2620,7 +2620,18 @@ eth_dev_event_callback(const char
*device_name, enum rte_dev_event_type type,
                                device_name);
                        return;
                }
-               rmv_event_callback((void *)(intptr_t)port_id);
+               /*
+                * Before detach device, the hot-unplug failure
process in
+                * user space and kernel space both need to be
finished,
+                * such as eal interrupt callback need to be inactive
before
+                * the callback be unregistered when device is being
cleaned.
+                * So finished interrupt callback soon here and give
time to
+                * let the work done before detaching.
+                */
+               if (rte_eal_alarm_set(100000,
+                               rmv_event_callback, (void
*)(intptr_t)port_id))
+                       RTE_LOG(ERR, EAL,
+                               "Could not set up deferred device
It looks me strange to use callback and alarm to remove a device:
Why not to use callback and that is it?

I think that it's better to let the EAL to detach the device
after all the
callbacks were done and not to do it by the user callback.
So the application\callback owners just need to clean its
resources with understanding that after the callback the
device(and the callback
itself) will be detached by the EAL.


Firstly, at the currently framework and solution, such as callback
for RTE_ETH_EVENT_INTR_RMV, still need to use the deferred device
removal,
we tend to give the control of detaching device to the
application, and the whole process is located on the user's
callback. Notify app to detach device by callback but make it deferred,
i think it is fine.
But the device must be detached in remove event, why not to do it in
EAL?
I think it because of before detached the device, application should
be stop the forwarding at first, then stop the device, then close

the device, finally call eal unplug API to detach device. If eal
directly detach device at the first step, there will be mountain user
space error need to handle, so that is one reason why need to
provider the remove notification to app, and let app to process it.
This is why the EAL need to detach the device only after all the user
callbacks were done.


If i correctly got your meaning, you suppose to let eal to mandatory detach
device but not app, app just need to stop/close port, right?
Yes, the app should stop,close,clean its own resources of the removed device,
Then, EAL to detach the device.

If so, it will need to modify rmv_event_callback by not invoke the detaching
func and add some detaching logic to hotplug framework in eal.

rmv_event_callback is using by other hotplug mechanism (ETHDEV RMV event), so 
you need to use another one\ add parameter to it.
And yes, you need to detach the device from EAL, should be simple.


I think rmv_event_callback is original use for other hotplug event (ETHDEV RMV event), but it still use the common hotplug mechanism(app callback and app detach),

so i think it will still need to face this callback issue and you could check that eth_event_callback also use the rte alarm to detach device.

so my suggestion is that, you maybe propose a good idea but let we keep on current mechanism until we come to a final good solution agreement, before that, just let

it functional.


It is hardly say better or worse but this new propose need to discuss more in
public. And it might be better to use another specific patch to handler it.
What do you or other guys think so?
Since you are fixing issue here, it can be done by a fix series.

Other feedbacks are welcome all the time 😊


It is also unclear to me my we need an alarm here.
First (probably wrong) impression we just try to hide some
synchronization Problem by introducing delay.
Looks like, the issue is that the callback function memory will be
removed
from the function itself (by the detach call), no?


Answer here for both Konstantin and Matan.

Yes, i think matan is right, the interrupt callback will be destroy
in the app callback itself, the sequence is that as below

hot-unplug interrupt -> interrupt callback -> app callback(return to
finish interrupt callback, delay detaching) -> detach
device(unregister interrupt
callback)


Konstantin

Secondly, the vfio is different with igb_uio for hot-unplug, it
register/unregister hotplug interrupt callback for each device, so
need to make  the callback done before unregister the callback.

So I think it should be considerate as an workaround here, before
we find a better way.


removal\n");
                break;
        case RTE_DEV_EVENT_ADD:
                RTE_LOG(ERR, EAL, "The device: %s has been
added!\n",
--
2.7.4

Reply via email to