Hi From: Yigit, Ferruh > On 11/12/2019 8:47 AM, Matan Azrad wrote: > > The port was not validated before detaching. > > > > Ignore port detach operation when the port is not valid. > > > > Fixes: f8e5baa2662d ("app/testpmd: check not detaching device twice") > > Cc: tho...@monjalon.net > > Cc: sta...@dpdk.org > > > > Signed-off-by: Matan Azrad <ma...@mellanox.com> > > --- > > app/test-pmd/testpmd.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index > > 4444346..370eefe 100644 > > --- a/app/test-pmd/testpmd.c > > +++ b/app/test-pmd/testpmd.c > > @@ -2545,6 +2545,9 @@ struct extmem_param { > > > > printf("Removing a device...\n"); > > > > + if (port_id_is_invalid(port_id, ENABLED_WARN)) > > + return; > > + > > dev = rte_eth_devices[port_id].device; > > if (dev == NULL) { > > printf("Device already removed\n"); > > > > The patch is already in 19.11 [1] but it is breaking the testpmd hotplug > support. > Before 'detach_port_device()' called, the port has been stopped and closed > [2], which will make port fail from 'port_id_is_invalid()' check and the > device > removal path never fully called. > The implication is, since device not detached, vfio request interrupt keeps > triggered continuously and re-starts the detach path, but because of the half > cleaned device it fails and app gets stuck with a continuous log [3]. > > I wonder if the actual hotplug has been tested with this patch, the commit > log is not clear about the motivation and implication of the patch, I am not > clear why this check is added but I am sending a patch soon to remove it > back.
The motivation of this patch was to prevent double detach on same port, so the user cannot call detach of invalid port. I agree this patch is not good and we need a fix but I think the bug is conceptual. Testpmd tries to do detach by port_id which is derived by ethdev port id while detach work with rte_device. For example: you can see in the line above after +++: dev = rte_eth_devices[port_id].device, Testpmd may access invalid or reallocated ethdev structure to get the device name and may even detach unwanted rte_device. So, detach is broken with and without this patch. I think Testpmd should change the concept of rte_device mapping and put attention to next: 1. Don't detach by ethdev port ID. 2. Multiple ethdev port IDs may related to the same rte_device. The Testpmd user should be sure that all the port IDs of the rte_device are released before the detach call and Testpmd maybe need to validate it. And like attach, detach should be triggered by PCI address \ rte_device name. Matan > Regards, > ferruh > > > [1] > https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.dp > dk.org%2Fdpdk%2Fcommit%2F%3Fid%3D43d0e304980a1527bcac92dc679057 > b189e2545a&data=02%7C01%7Cmatan%40mellanox.com%7Cc3f40356d > d124e20faf708d7a006e68c%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7 > C0%7C637153823809699996&sdata=dBy9m%2BxCA%2Bme1IpX2LqPARa > 62giznKi8Xbtu220GA%2Bg%3D&reserved=0 > > [2] > rmv_port_callback > stop_port(port_id); > close_port(port_id); > detach_port_device(port_id); > > [3] > EAL: can not get port by device 0000:00:05.0! > EAL: can not get port by device 0000:00:05.0! > EAL: can not get port by device 0000:00:05.0! > EAL: can not get port by device 0000:00:05.0! > EAL: can not get port by device 0000:00:05.0! > EAL: can not get port by device 0000:00:05.0! > ...