On 8/19/2021 4:45 AM, Huisong Li wrote: > 在 2021/8/18 19:24, Ferruh Yigit 写道: >> On 8/13/2021 9:16 AM, Huisong Li wrote: >>> 在 2021/8/13 14:12, Thomas Monjalon 写道: >>>> 13/08/2021 04:11, Huisong Li: >>>>> Hi, all >>>>> >>>>> This patch can enhance the security of device uninstallation to >>>>> eliminate dependency on user usage methods. >>>>> >>>>> Can you check this patch? >>>>> >>>>> >>>>> 在 2021/8/3 10:30, Huisong Li 写道: >>>>>> Ethernet devices in DPDK can be released by rte_eth_dev_close() and >>>>>> rte_dev_remove(). These APIs both call xxx_dev_close() in PMD layer >>>>>> to uninstall hardware. However, the two APIs do not have explicit >>>>>> invocation restrictions. In other words, at the ethdev layer, it is >>>>>> possible to call rte_eth_dev_close() before calling rte_dev_remove() >>>>>> or rte_eal_hotplug_remove(). In such a bad scenario, >>>> It is not a bad scenario. >>>> If there is no more port for the device after calling close, >>>> the device should be removed automatically. >>>> Keep in mind "close" is for one port, "remove" is for the entire device >>>> which can have more than one port. >>> I know. >>> >>> dev_close() is for removing an eth device. And rte_dev_remove() can be used >>> >>> for removing the rte device and all its eth devices belonging to the rte >>> device. >>> >>> In rte_dev_remove(), "remove" is executed in primary or one of secondary, >>> >>> all eth devices having same pci address will be closed and removed. >>> >>>>>> the primary >>>>>> process may be fine, but it may cause that xxx_dev_close() in the PMD >>>>>> layer will be called twice in the secondary process. So this patch >>>>>> fixes it. >>>> If a port is closed in primary, it should be the same in secondary. >>>> >>>> >>>>>> + /* >>>>>> + * The eth_dev->data->name doesn't be cleared by the secondary >>>>>> process, >>>>>> + * so above "eth_dev" isn't NULL after rte_eth_dev_close() called. >>>> This assumption is not clear. All should be closed together. >>> However, dev_close() does not have the feature similar to rte_dev_remove(). >>> >>> Namely, it is not guaranteed that all eth devices are closed together in >>> ethdev >>> layer. It depends on app or user. >>> >>> If the app does not close together, the operation of repeatedly >>> uninstalling an >>> eth device in the secondary process >>> >>> will be triggered when dev_close() is first called by one secondary >>> process, and >>> then rte_dev_remove() is called. >>> >>> So I think it should be avoided. >> First of all, I am not sure about calling 'rte_eth_dev_close()' or >> 'rte_dev_remove()' from the secondary process. >> There are explicit checks in various locations to prevent clearing resources >> completely from secondary process. > > There's no denying that. > > Generally, hardware resources of eth device and shared data of the primary and > secondary process > > are cleared by primary, which are controled by ethdev layer or PMD layer. > > But there may be some private data or resources of each process (primary or > secondary ), such as mp action > > registered by rte_mp_action_register() or others. For these resources, the > secondary process still needs to clear. > > Namely, both primary and secondary processes need to prevent repeated > offloading > of resources. > >> >> Calling 'rte_eth_dev_close()' or 'rte_dev_remove()' by secondary is >> technically >> can be done but application needs to be extra cautious and should take extra >> measures and synchronization to make it work. >> Regular use-case is secondary processes do the packet processing and all >> control >> commands run by primary. > > You are right. We have a consensus that 'rte_eth_dev_close()' or > 'rte_dev_remove()' > > can be called by primary and secondary processes. > > But DPDK framework cannot assume user behavior.😁 > > We need to make it more secure and reliable for both primary and secondary > processes. > >> >> In primary, if you call 'rte_eth_dev_close()' it will clear all ethdev >> resources >> and further 'rte_dev_remove()' call will detect missing ethdev resources and >> won't try to clear them again. >> >> In secondary, if you call 'rte_eth_dev_close()', it WON'T clear all resources >> and further 'rte_dev_remove()' call (either from primary or secondary) will >> try >> to clean ethdev resources again. You are trying to prevent this retry in >> remove >> happening for secondary process. > > Right. However, if secondary process in PMD layer has its own private > resources > to be > > cleared, it still need to do it by calling 'rte_eth_dev_close()' or > 'rte_dev_remove()'. > >> In secondary it won't free ethdev resources anyway if you let it continue, >> but I >> guess here you are trying to prevent the PMD dev_close() called again. Why? >> Is >> it just for optimization or does it cause unexpected behavior in the PMD? >> >> >> Overall, to free resources you need to do the 'rte_eth_dev_close()' or >> 'rte_dev_remove()' in the primary anyway. So instead of this workaround, I >> would >> suggest making PMD dev_close() safe to be called multiple times (if this is >> the >> problem.) > > In conclusion, primary and secondary processes in PMD layer may have their > own > > private data and resources, which need to be processed and released. > > Currently, these for PMD are either handled and cleaned up in dev_close() or > remove(). > > However, code streams in rte_dev_remove() cannot ensure that the > uninstallation > > from secondary process will not be repeated if rte_eth_dev_close() is first > called by > > secondary(primary is ok, plese review this patch). > > I think, this is the same for each PMD and is better suited to doing it in > ethdev layer. >
This patch prevents to call dev_close() twice in the secondary process, is this fixing a theoretical problem or an actual problem? If it is an actual problem can you please provide details, callstack of the problematic case? >> >> And again, please re-consider calling 'rte_eth_dev_close()' or >> 'rte_dev_remove()' from the secondary process. >> >>>>>> + * Namely, whether "eth_dev" is NULL cannot be used to determine >>>>>> whether >>>>>> + * an ethdev port has been released. >>>>>> + * For both primary process and secondary process, eth_dev->state is >>>>>> + * RTE_ETH_DEV_UNUSED, which means the ethdev port has been >>>>>> released. >>>>>> + */ >>>>>> + if (eth_dev->state == RTE_ETH_DEV_UNUSED) { >>>>>> + RTE_ETHDEV_LOG(INFO, "The ethdev port has been released."); >>>>>> + return 0; >>>>>> + } >>>> >>>> . >> .