On 2/13/2020 1:36 PM, Thomas Monjalon wrote: > More details below about the plan for 20.02. > > 13/02/2020 13:37, Thomas Monjalon: >> Hi, >> >> This discussion becomes confusing so I do a summary below. >> I think we can do several fixes in 20.02.
Thanks for checking this. >> >> 12/02/2020 14:49, Ferruh Yigit: >>> On 2/3/2020 5:10 PM, Matan Azrad wrote: >> >> [stripping long discussion in favor of a summary below] >> >>>> Even if the PMD clear the device pointer, the testpmd still may release >>>> wrong rte_device. >>> >>> Yes it may, although that is less likely to occur, it requires a new device >>> hot >>> added between close() and detach of the other device. >>> >>> Would you be agree to say there are two problems: >>> >>> 1) When testpmd close a port, a new attached port can re-use it over writing >>> some fields, relying the data structures of the closed port is not safe. >>> >>> 2) PMD not cleaning ethdev->device pointer in the .remove() may cause >>> issues in >>> double detach of a port. >>> >>> >>> For (1) I suggest fixing it in the attach path, don't re-use an eth_dev >>> port id >>> unless it is completely freed, may need to add new state for it. Does it >>> make sense? >> >> Yes we could add a CLOSED state which is set on ethdev close. >> When the rte_device is freed, the PMD could set attached ports as UNUSED. >> But given some ethdev ports can be open and closed dynamically, >> I am not sure it is a good solution to keep them in CLOSED state and ask >> PMD to remember them. >> >> An alternative workaround could be to allocate port_id by incrementing >> a saved biggest id. So the race condition would be very unlikely. >> The drawbacks are having big port_id numbers and changing the id >> allocation algorithm (which is not documented anyway). OK to keep increase port_id instead of re-using closed ones, that simplifies a lot. >> >> The proposals above for port_id allocation or states rework cannot be >> done in 20.02. Let's discuss and work on it in a separated thread. +1 >> >>> For (2) PMDs want to get hotplug support needs to fix it. >> >> Yes PMDs should clear rte_eth_devices[port_id].device in .remove(). > > I am sending a patch adding > memset(eth_dev, 0, sizeof(struct rte_eth_dev)); > in rte_eth_dev_release_port(). > But this patch cannot be merged after 20.02-rc1. It will wait for 20.05. Not sure about this, close() calls the 'rte_eth_dev_release_port()', memset the struct in close() will wipe the device pointers and prevents freeing them in hot remove, silently. > >> We must also protect from user calling detach on a closed port >> by adding a check in cmd_operate_detach_port_parsed(), >> before calling detach_port_device(). > > I am sending a patch adding RTE_ETH_VALID_PORTID_OR_RET() > in cmd_operate_detach_port_parsed(). > It should fix the issue observed by Matan with double detach. > It will be a double protection if keeping the check > port_id_is_invalid() in detach_port_device(). OK > >> The hotplug rmv_port_callback() must be able to call detach after close. >> There are three possible fixes: >> - revert the port_id_is_invalid() check in detach_port_device() >> - call rte_dev_remove(rte_device) directly >> - call a new function with rte_device (detach_port_device() can use it) > > I am sending a patch implementing the third alternative > as it is both keeping the detach behaviour and fixing the race condition > (i.e. protect from new port re-using the port_id between close and detach). Should work, only concern if any possible side affect occurs, can be discussed on patch. > >> About the function detach_port_device() itself, yes this function is >> strange to say the least. It was a convenience for detaching a rte_device >> from a port_id. >> The cleanup of siblings with RTE_ETH_FOREACH_DEV_OF(sibling, dev), >> should probably be removed. I've added it as a temporary solution >> before all PMDs are properly fixed: >> rte_eth_devices[sibling].device = NULL; > > I propose sending such patch in 20.05 in order to merge the memset above > first, and have time to get agreement from all PMD maintainers. OK > >> For info, there is a function detach_device() used by the command >> "device detach <identifier>" > > >