On Thu, Aug 17, 2017 at 06:04:27AM +0000, Shahaf Shuler wrote: > Wednesday, August 16, 2017 6:26 PM, Gaëtan Rivet: > > > Even though it is reasonable for driver to call the > > rte_eth_dev_port_release, I still think the ethdev layer should protect from > > such bad behavior from the application side. > > > It is more robust than counting on the different PMD to implement such > > release. > > > > > > > The ethdev layer does so in the rte_eth_dev_detach function, which is > > currently the only way to detach a device from the ethdev level. > > > > `rte_eth_dev_detach` setting the device state seems to me to be a crutch > > against badly implemented drivers. If I was nitpicky I would actually > > remove it > > and ideally everyone should enforce the call of rte_eth_dev_release_port > > from device removal functions when reviewing PMD implementations. > > > > The hotplug API is available to applications and the only way to have > > consistent devices states after calling rte_eal_hotplug_remove is to have > > drivers using a hook in the ethdev layer allowing to clean-up resources upon > > release. While it is trivial in its current state, such entry-point in the > > ethdev > > layer will be useful and should be kept and enforced IMO. > > > > I'm thinking about the 16-bit portid scheduled for v17.11, which implies an > > arbitrary number of port available. This would imply a dynamic allocation of > > rte_eth_devices, which would *need* such release hook to be available. > > Well the API should not be designed from conjectures or speculations of > > course, but I think it should be useful and there is no reason to remove it. > > > > Going further, I find it dangerous to have two points where the port is > > semantically released (device state set to UNUSED). If the API of the port > > release changes, we now have two points where we need to enforce the > > changes. While trivial concerning an enum, it could become more complex / > > dangerous if this veered toward memory management. > > Those are valid concerns, and you convinced me the RTE_ETH_DEV_UNUSED cannot > be set after device close. > > I still think the ethdev layer missing protection against driver calls (other > than detach) following a device close. The API not allows, but the ethdev > should enforce it. > Considering some driver memset to 0 their control structs after device close, > with no protection such bad calls might lead to segfault, which is not what > we want even under bad behavior. > > One could suggest to protect it inside the driver by replacing the vtable of > the driver ops, however it will introduce a lot of duplicated code which can > easily be avoided if ethdev would not pass such > Calls after device close. > > So, how about adding another state which will indicate the device is closed, > cannot be used, yet still attached? >
It's feasible. It might be useful. Adding finer device states seems logical to me, I made those explicit in the fail-safe. However, in the fail-safe I had the need to make automatic the handling of devices, meaning that these device states had to be properly abstracted and parseable. Adding finer device states in ethdev: Pros: - Easier to follow API and to spot usage mistakes. - Increasing the rigidity of the API might make apparent mistakes from PMD implementations. Helps PMD maintainers. Cons: - Some of those "mistakes" might be due to hardware-specific limitations that cannot be bypassed. I'm thinking about the flow-isolation implementation in MLX4 that had specific requirements about when it could be enabled, that would actually be forbidden by this proposal as it had to be done before rte_eth_dev_configure. - Handling special cases could quickly become a mess of edge-cases with workarounds here and there. Making the API evolve would imply untangling this mess at times. I'm considering there a full-blown device state implementation. I know you are only proposing adding an intermediate device state allowing to protect the user from using a closed device. But this proposal should be examined on the direction it would lead us into. It might be simpler for example to introduce a flag "dev_closed" in rte_eth_dev_data (along dev_started) for example, and checking only on this in the relevant functions. I don't have a strong opinion about the finer-grained device states. I only wanted to lay out what I saw as possible longer-term effects of choosing this solution and a possible alternative. -- Gaëtan Rivet 6WIND