Hi From: Ferruh Yigit > On 1/23/2020 2:05 PM, Matan Azrad wrote: > > Hi > > > > From: Yigit, Ferruh > >> On 11/12/2019 8:47 AM, Matan Azrad wrote: > >>> The port was not validated before detaching. > >>> > >>> Ignore port detach operation when the port is not valid. > >>> > >>> Fixes: f8e5baa2662d ("app/testpmd: check not detaching device > >>> twice") > >>> Cc: tho...@monjalon.net > >>> Cc: sta...@dpdk.org > >>> > >>> Signed-off-by: Matan Azrad <ma...@mellanox.com> > >>> --- > >>> app/test-pmd/testpmd.c | 3 +++ > >>> 1 file changed, 3 insertions(+) > >>> > >>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index > >>> 4444346..370eefe 100644 > >>> --- a/app/test-pmd/testpmd.c > >>> +++ b/app/test-pmd/testpmd.c > >>> @@ -2545,6 +2545,9 @@ struct extmem_param { > >>> > >>> printf("Removing a device...\n"); > >>> > >>> + if (port_id_is_invalid(port_id, ENABLED_WARN)) > >>> + return; > >>> + > >>> dev = rte_eth_devices[port_id].device; > >>> if (dev == NULL) { > >>> printf("Device already removed\n"); > >>> > >> > >> The patch is already in 19.11 [1] but it is breaking the testpmd > >> hotplug support. > >> Before 'detach_port_device()' called, the port has been stopped and > >> closed [2], which will make port fail from 'port_id_is_invalid()' > >> check and the device removal path never fully called. > >> The implication is, since device not detached, vfio request interrupt > >> keeps triggered continuously and re-starts the detach path, but > >> because of the half cleaned device it fails and app gets stuck with a > continuous log [3]. > >> > >> I wonder if the actual hotplug has been tested with this patch, the > >> commit log is not clear about the motivation and implication of the > >> patch, I am not clear why this check is added but I am sending a > >> patch soon to remove it back. > > > > The motivation of this patch was to prevent double detach on same port, > so the user cannot call detach of invalid port. > > What is the definition of the 'invalid port', if you mean device already > detached case, in the second call of the function "if (dev == NULL)" check > should prevent it going forward.
No, ethdev doesn't zero the device pointer when it release a port. So even if the port is in unused state already - means invalid, the device pointer still may be valid and point to the last port that used the same id. > But according the 'port_id_is_invalid()' API, a closed port is an invalid > port, I > think that is wrong in this context. Why? You are going to look on ethdev portid structure, don't you think we should valid the port before using its structure? > > > > I agree this patch is not good and we need a fix but I think the bug is > conceptual. > > > > Testpmd tries to do detach by port_id which is derived by ethdev port id > while detach work with rte_device. > > > > For example: > > you can see in the line above after +++: dev = > > rte_eth_devices[port_id].device, Testpmd may access invalid or > reallocated ethdev structure to get the device name and may even detach > unwanted rte_device. > > I thinks whichever function calling 'detach_port_device()' should check the > port validity. > 'detach_port_device()' doesn't know if port reallocated or not, it will free > the > given port_id, and when freeing done 'rte_eth_devices[port_id].device' will > be NULL, this looks to me a valid check. Please validate me, check ethdev, I don't think so, 'rte_eth_devices[port_id].device still valid after detach. > The caller of the 'detach_port_device()' should ensure correct port_id > passed to the function. What is correct port id, if the port was released , is it correct? > > > > So, detach is broken with and without this patch. > > I can't see how it is broken without the check, how the problem you > mentioned can be reproduced? Or is it a theoretical issue? > But with this check hotplug support is %100 reproducible broken. > > > > > > > I think Testpmd should change the concept of rte_device mapping and put > attention to next: > > 1. Don't detach by ethdev port ID. > > 2. Multiple ethdev port IDs may related to the same rte_device. > > > > The Testpmd user should be sure that all the port IDs of the rte_device are > released before the detach call and Testpmd maybe need to validate it. > > And like attach, detach should be triggered by PCI address \ rte_device > name. > > > > We need to know about port_id too to be able to stop/close it. > And sure no objection to improve the hotplug support but it is broken now, > lets fix it first. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> Regards, > >> ferruh > >> > >> > >> [1] > >> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit > >> .dp > >> > dk.org%2Fdpdk%2Fcommit%2F%3Fid%3D43d0e304980a1527bcac92dc679057 > >> > b189e2545a&data=02%7C01%7Cmatan%40mellanox.com%7Cc3f40356d > >> > d124e20faf708d7a006e68c%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7 > >> > C0%7C637153823809699996&sdata=dBy9m%2BxCA%2Bme1IpX2LqPARa > >> 62giznKi8Xbtu220GA%2Bg%3D&reserved=0 > >> > >> [2] > >> rmv_port_callback > >> stop_port(port_id); > >> close_port(port_id); > >> detach_port_device(port_id); > >> > >> [3] > >> EAL: can not get port by device 0000:00:05.0! > >> EAL: can not get port by device 0000:00:05.0! > >> EAL: can not get port by device 0000:00:05.0! > >> EAL: can not get port by device 0000:00:05.0! > >> EAL: can not get port by device 0000:00:05.0! > >> EAL: can not get port by device 0000:00:05.0! > >> ...