config restore > >> > >> External email: Use caution opening links or attachments > >> > >> > >> On 10/10/2024 1:08 PM, Dariusz Sosnowski wrote: > >>>> -----Original Message----- > >>>> From: Ferruh Yigit <ferruh.yi...@amd.com> > >>>> Sent: Thursday, October 10, 2024 01:17 > >>>> To: Dariusz Sosnowski <dsosnow...@nvidia.com>; Konstantin Ananyev > >>>> <konstantin.anan...@huawei.com>; NBU-Contact-Thomas Monjalon > >>>> (EXTERNAL) <tho...@monjalon.net>; Andrew Rybchenko > >>>> <andrew.rybche...@oktetlabs.ru> > >>>> Cc: dev@dpdk.org; Bruce Richardson <bruce.richard...@intel.com> > >>>> Subject: Re: [RFC 0/4] ethdev: rework config restore > >>>> > >>>> External email: Use caution opening links or attachments > >>>> > >>>> > >>>> On 10/9/2024 5:18 PM, Dariusz Sosnowski wrote: > >>>>>> -----Original Message----- > >>>>>> From: Ferruh Yigit <ferruh.yi...@amd.com> > >>>>>> Sent: Wednesday, October 9, 2024 03:08 > >>>>>> To: Konstantin Ananyev <konstantin.anan...@huawei.com>; Dariusz > >>>>>> Sosnowski <dsosnow...@nvidia.com>; NBU-Contact-Thomas Monjalon > >>>>>> (EXTERNAL) <tho...@monjalon.net>; Andrew Rybchenko > >>>>>> <andrew.rybche...@oktetlabs.ru> > >>>>>> Cc: dev@dpdk.org; Bruce Richardson <bruce.richard...@intel.com> > >>>>>> Subject: Re: [RFC 0/4] ethdev: rework config restore > >>>>>> > >>>>>> External email: Use caution opening links or attachments > >>>>>> > >>>>>> > >>>>>> On 10/8/2024 6:21 PM, Konstantin Ananyev wrote: > >>>>>>> > >>>>>>> > >>>>>>>>>>>> We have been working on optimizing the latency of calls to > >>>>>>>>>>>> rte_eth_dev_start(), on ports spawned by mlx5 PMD. Most of > >>>>>>>>>>>> the work requires changes in the implementation of > >>>>>>>>>>>> .dev_start() PMD callback, but I also wanted to start a > >>>>>>>>>>>> discussion regarding configuration restore. > >>>>>>>>>>>> > >>>>>>>>>>>> rte_eth_dev_start() does a few things on top of calling > >>>>>>>>>>>> .dev_start() > >>>>>> callback: > >>>>>>>>>>>> > >>>>>>>>>>>> - Before calling it: > >>>>>>>>>>>> - eth_dev_mac_restore() - if device supports > >>>>>>>>>>>> RTE_ETH_DEV_NOLIVE_MAC_ADDR; > >>>>>>>>>>>> - After calling it: > >>>>>>>>>>>> - eth_dev_mac_restore() - if device does not support > >>>>>>>>>>> RTE_ETH_DEV_NOLIVE_MAC_ADDR; > >>>>>>>>>>>> - restore promiscuous config > >>>>>>>>>>>> - restore all multicast config > >>>>>>>>>>>> > >>>>>>>>>>>> eth_dev_mac_restore() iterates over all known MAC addresses - > >>>>>>>>>>>> stored in rte_eth_dev_data.mac_addrs array - and calls > >>>>>>>>>>>> .mac_addr_set() and .mac_addr_add() callbacks to apply these > >>>>>>>>>>>> MAC > >>>>>> addresses. > >>>>>>>>>>>> > >>>>>>>>>>>> Promiscuous config restore checks if promiscuous mode is > >>>>>>>>>>>> enabled or not, and calls .promiscuous_enable() or > >>>>>>>>>>>> .promiscuous_disable() > >>>>>> callback. > >>>>>>>>>>>> > >>>>>>>>>>>> All multicast config restore checks if all multicast mode is > >>>>>>>>>>>> enabled or not, and calls .allmulticast_enable() or > >>>>>>>>>>>> .allmulticast_disable() > >>>>>> callback. > >>>>>>>>>>>> > >>>>>>>>>>>> Callbacks are called directly in all of these cases, to > >>>>>>>>>>>> bypass the checks for applying the same configuration, which > >>>>>>>>>>>> exist in relevant > >>>>>> APIs. > >>>>>>>>>>>> Checks are bypassed to force drivers to reapply the > >>>>>>>>>>>> configuration. > >>>>>>>>>>>> > >>>>>>>>>>>> Let's consider what happens in the following sequence of API > >>>>>>>>>>>> calls. > >>>>>>>>>>>> > >>>>>>>>>>>> 1. rte_eth_dev_configure() > >>>>>>>>>>>> 2. rte_eth_tx_queue_setup() > >>>>>>>>>>>> 3. rte_eth_rx_queue_setup() > >>>>>>>>>>>> 4. rte_eth_promiscuous_enable() > >>>>>>>>>>>> - Call dev->dev_ops->promiscuous_enable() > >>>>>>>>>>>> - Stores promiscuous state in dev->data->promiscuous 5. > >>>>>>>>>>>> rte_eth_allmulticast_enable() > >>>>>>>>>>>> - Call dev->dev_ops->allmulticast_enable() > >>>>>>>>>>>> - Stores allmulticast state in dev->data->allmulticast 6. > >>>>>>>>>>>> rte_eth_dev_start() > >>>>>>>>>>>> - Call dev->dev_ops->dev_start() > >>>>>>>>>>>> - Call dev->dev_ops->mac_addr_set() - apply default MAC > >>>>>>>>>>>> address > >>>>>>>>>>>> - Call dev->dev_ops->promiscuous_enable() > >>>>>>>>>>>> - Call dev->dev_ops->allmulticast_enable() > >>>>>>>>>>>> > >>>>>>>>>>>> Even though all configuration is available in dev->data after > >>>>>>>>>>>> step 5, library forces reapplying this configuration in step 6. > >>>>>>>>>>>> > >>>>>>>>>>>> In mlx5 PMD case all relevant callbacks require communication > >>>>>>>>>>>> with the kernel driver, to configure the device (mlx5 PMD > >>>>>>>>>>>> must create/destroy new kernel flow rules and/or change netdev > >> config). > >>>>>>>>>>>> > >>>>>>>>>>>> mlx5 PMD handles applying all configuration in .dev_start(), > >>>>>>>>>>>> so the following forced callbacks force additional > >>>>>>>>>>>> communication with the kernel. The > >>>>>>>>>>> same configuration is applied multiple times. > >>>>>>>>>>>> > >>>>>>>>>>>> As an optimization, mlx5 PMD could check if a given > >>>>>>>>>>>> configuration was applied, but this would duplicate the > >>>>>>>>>>>> functionality of the library (for example > >>>>>>>>>>>> rte_eth_promiscuous_enable() does not call the driver if > >>>>>>>>>>>> dev->data->promiscuous is set). > >>>>>>>>>>>> > >>>>>>>>>>>> Question: Since all of the configuration is available before > >>>>>>>>>>>> .dev_start() callback is called, why ethdev library does not > >>>>>>>>>>>> expect .dev_start() to > >>>>>>>>>>> take this configuration into account? > >>>>>>>>>>>> In other words, why library has to reapply the configuration? > >>>>>>>>>>>> > >>>>>>>>>>>> I could not find any particular reason why configuration > >>>>>>>>>>>> restore exists as part of the process (it was in the initial DPDK > >> commit). > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> My assumption is .dev_stop() cause these values reset in some > >>>>>>>>>>> devices, so > >>>>>>>>>>> .dev_start() restores them back. > >>>>>>>>>>> @Bruce or @Konstantin may remember the history. > >>>>>>>>> > >>>>>>>>> Yep, as I remember, at least some Intel PMDs calling hw_reset() > >>>>>>>>> ad > >>>>>>>>> dec_stop() and even dev_start() to make sure that HW is in a > >>>>>>>>> clean > >>>>>>>>> (known) > >>>>>> state. > >>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> But I agree this is device specific behavior, and can be > >>>>>>>>>>> managed by what device requires. > >>>>>>>>> > >>>>>>>>> Probably yes. > >>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>> The patches included in this RFC, propose a mechanism which > >>>>>>>>>>>> would help with managing which drivers rely on forceful > >>>>>>>>>>>> configuration > >>>> restore. > >>>>>>>>>>>> Drivers could advertise if forceful configuration restore is > >>>>>>>>>>>> needed through `RTE_ETH_DEV_*_FORCE_RESTORE` device flag. If > >>>>>>>>>>>> this flag is set, then the driver in question requires ethdev > >>>>>>>>>>>> to forcefully restore > >>>>>>>>>>> configuration. > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> OK to use flag for it, but not sure about using > >>>>>>>>>>> 'dev_info->dev_flags' > >>>>>>>>>>> (RTE_ETH_DEV_*) for this, as this flag is shared with user and > >>>>>>>>>>> this is all dpdk internal. > >>>>>>>>>>> > >>>>>>>>>>> What about to have a dedicated flag for it? We can have a > >>>>>>>>>>> dedicated set of flag values for restore. > >>>>>>>>>> > >>>>>>>>>> Agreed. What do you think about the following? > >>>>>>>>> > >>>>>>>>> Instead of exposing that, can we probably make it transparent to > >>>>>>>>> the user and probably ethdev layer too? > >>>>>>>>> > >>>>>>>> > >>>>>>>> +1 to make it transparent to user, but not sure if we can make it > >>>>>>>> transparent to ethdev layer. > >>>>>>> > >>>>>>> Just to be clear: > >>>>>>> Let say, using example from above: > >>>>>>> > >>>>>>> rte_eth_dev_start() > >>>>>>> - Call dev->dev_ops->dev_start() > >>>>>>> - Call dev->dev_ops->mac_addr_set() - apply default MAC address > >>>>>>> - Call dev->dev_ops->promiscuous_enable() > >>>>>>> - Call dev->dev_ops->allmulticast_enable() > >>>>>>> > >>>>>>> We probably can introduce ethdev internal function (still visible > >>>>>>> to > >>>>>>> PMDs) that would do last 3 steps: > >>>>>>> ethdev_replay_user_conf(...) > >>>>>>> - Call dev->dev_ops->mac_addr_set() - apply default MAC address > >>>>>>> - Call dev->dev_ops->promiscuous_enable() > >>>>>>> - Call dev->dev_ops->allmulticast_enable() > >>>>>>> > >>>>>>> And let PMD itself to decide does it needs to call it at dev_start() > >>>>>>> or not. > >>>>>>> So it will become: > >>>>>>> rte_eth_dev_start() > >>>>>>> - Call dev->dev_ops->dev_start() > >>>>>>> -Call ethdev_replay_user_conf(.) > >>>>>>> - Call dev->dev_ops->mac_addr_set() - apply default MAC > >>>>>>> address > >>>>>>> - Call dev->dev_ops->promiscuous_enable() > >>>>>>> -Call dev->dev_ops->allmulticast_enable() > >>>>>>> > >>>>>>> For PMDs that do need to restore user provided config And > >>>>>>> rte_eth_dev_start() > >>>>>>> - Call dev->dev_ops->dev_start() > >>>>>>> > >>>>>>> For those who do not. > >>>>>>> > >>>>>> > >>>>>> OK, got it what you mean. > >>>>>> Pushing restore functionality to PMDs works, but this may be doing > >>>>>> redundant work on each PMD. > >>>>>> > >>>>>> Instead Dariusz suggests PMD to provide a flag to ehtdev to what to > >>>>>> restore and common code in ethdev does the work. > >>>>>> My below dedicated data struct comment is to have this flag in a > >>>>>> new struct, overall like following: > >>>>>> > >>>>>> rte_eth_dev_start() > >>>>>> - Call dev->dev_ops->dev_start() > >>>>>> - Call dev->dev_ops->get_restore_flags(ethdev, RTE_ETH_START, > >>>>>> &flags) > >>>>>> - if (flags & MAC) dev->dev_ops->mac_addr_set() > >>>>>> - if (flags & PROMISC) dev->dev_ops->promiscuous_enable() > >>>>>> - ... > >>>>> > >>>>> Could you please explain what is the benefit of exposing flags > >>>>> through dev_ops > >>>> callback vs a dedicated flags field in rte_eth_dev_data? > >>>>> In both solutions: > >>>>> - config restore is transparent to the user, > >>>>> - drivers can omit config restore (either by not implementing the > >>>>> callback or not providing the flags), > >>>>> - an ABI change is introduced (not a huge concern, at least for 24.11). > >>>>> > >>>>> I understand that my initial proposal with "internal_flags" was too > >>>>> vague, but renaming and splitting this field into: > >>>>> > >>>>> - dev_start_restore_flags > >>>>> - dev_reset_restore_flags > >>>>> - and so on... > >>>>> > >>>>> seems sufficient, at least in my opinion. > >>>>> > >>>> > >>>> Hi Dariusz, > >>>> > >>>> Putting flags to rte_eth_dev_data works, and it is easier since there > >>>> is direct access from rte_eth_dev to rte_eth_dev_data, so you don't > >>>> need new dev_ops. So this is a valid option. > >>>> > >>>> But benefit of new dev_ops is to keep "struct rte_eth_dev_data" clean. > >>>> > >>>> "struct rte_eth_dev_data" is integral data structure for ethdev and > >>>> it is used in multiple locations, mostly related to the datapath and > >>>> all drivers needs to deal with fields of this struct. > >>>> Like [rx]_queues, dev_private, dev_conf all important and used a lot. > >>>> > >>>> I want to protect "struct rte_eth_dev_data" from noise as much as > >>>> possible, though what is noise is not always that clear. > >>>> > >>>> This restore flag is not critical, and I expect most of the drivers > >>>> won't care and populate this restore flag at all. That is why to me > >>>> it is better have dedicated struct for it and only drivers care about > >>>> restore > >> feature know it. > >>> > >>> I see. Thank you very much for the explanation. > >>> > >>> In this case, it looks like adding this to dev_ops is the way to go. > >>> > >>> So, summarizing it all: > >>> > >>> 1. dev_ops should be extended with a callback with the following > >>> signature and > >> enums/flags: > >>> > >>> enum rte_eth_dev_operation op { > >>> RTE_ETH_START, > >>> RTE_ETH_STOP, > >>> RTE_ETH_RESET, > >>> }; > >>> > >>> #define RTE_ETH_RESTORE_MAC_ADDR RTE_BIT32(0) #define > >>> RTE_ETH_RESTORE_PROMISC RTE_BIT32(1) #define > >> RTE_ETH_RESTORE_ALLMULTI > >>> RTE_BIT32(2) > >>> > >>> void (*get_restore_flags)( > >>> struct rte_eth_dev *dev, > >>> enum rte_eth_dev_operation op, > >>> uint32_t *flags); > >>> > >>> 2. rte_eth_dev_start() will work as follows: > >>> > >>> - Call dev->dev_ops->dev_start() > >>> - Call dev->dev_ops->get_restore_flags(dev, RTE_ETH_START, &flags). If > >>> callback > >> is not provided, assume flags == 0. > >>> - if (flags & RTE_ETH_RESTORE_MAC_ADDR) - restore MAC addresses > >>> - and so on... > >>> > >> > >> All above looks good. > >> > >>> Also, I would like to add the following: > >>> > >>> 3. Patchset introducing this change should add get_restore_flags() > >> implementation to all drivers, which informs that all config should be > >> restored. > >>> This would preserve the current behavior. > >>> Later, this could be refined driver by driver. > >>> > >>> What do you think? > >>> > >> > >> What you are saying is correct, but I suspect most of the drivers don't > >> really need > >> this restore, but they have it since it was in the ethdev layer. > >> > >> If we introduce back restore via get_restore_flags(), it may stay as it is > >> in drivers, at > >> least for most of them. > >> > >> What do you think to risk breaking stuff for this case. > >> > >> So don't implement this in the drivers by default, so who needs it will > >> recognize > >> the issue and will implement it. If we merge this patch for -rc1, it gives > >> enough > >> time for drivers to detect the issue and fix it. > > > > It seems rather too risky, especially considering that for example, there > > are a few Intel drivers which do not have maintainers (like > i40e). > > So, I don't know what will happen to such drivers. They may be left broken > > (if they are affected) for 24.11 and future releases. > > But I agree that if default behavior is preserved, this dependence of > > drivers on config restore might stay as is. > > I'm on the fence about it. > > > > Yeah, not sure. > > Do you think the dev_ops function can be implemented in the > 'ethdev_driver.c' and all drivers use exact same function? > So this reduces changes and duplication in drivers while preserving the > behavior.
Wonder why it can't be done visa-versa? If this new function is not implemented by PMD, then simply assume that everything needs to be restored (as it happens now)? > >> > >> Only we may implement this to the drivers that exist when this restore > >> code was > >> introduced. > >> I mean whatever driver exist in the initial DPDK commit, implement this > >> logic only > >> to those drivers. > > > > Seems reasonable to me. In this case, it would be igb (IIUC, now it's named > > e1000) and ixgbe. > > > >> > >>> Also, there's an open question about 'stop' and 'reset' operations. > >>> At the moment, ethdev layer does not do any config manipulation during > >>> these > >> operations. > >>> Maybe we should limit get_restore_flags() to 'start' only? > >>> > >> > >> Ack, I was about to suggest the same, for now only have 'RTE_ETH_START' > >> as a placeholder for later possible usages. > > > > Ack > > > >> > >>>> > >>>> > >>>> > >>>>>> > >>>>>> So PMDs only will provide what to restore with an internal API and > >>>>>> common ethdev layer will restore it. > >>>>>> If no restore required PMD may not implement .get_restore_flags() at > >>>>>> all. > >>>>>> > >>>>>> Additionally, RTE_ETH_START, RTE_ETH_RESET etc flag can be provided > >>>>>> to internal API to get what to restore in different states... > >>>>>> > >>>>>>>> Suggested 'internal_flag' in "struct rte_eth_dev_data" can be > >>>>>>>> confusing and open to interpretation what to use it for and by > >>>>>>>> time become source of defect. > >>>>>>> > >>>>>>> Yes, same thoughts. > >>>>>>> > >>>>>>>> Instead what do you think to have a separate, dedicated data struct > >>>>>>>> for it? > >>>>>>> > >>>>>>> Hmm... not sure I understood you here... > >>>>>>> > >>>>>>>> > >>>>>>>>> Might be we can move this restoration code into the new ethdev > >>>>>>>>> helper function,(ethdevd_user_config_restore() or so) that PMD > >>>>>>>>> can invoke > >>>>>> during its dev_start() if needed? > >>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> #define RTE_ETH_DEV_INTERNAL_PROMISC_FORCE_RESTORE > >>>> RTE_BIT32(0) > >>>>>>>>>> #define RTE_ETH_DEV_INTERNAL_ALLMULTI_FORCE_RESTORE > >>>> RTE_BIT32(1) > >>>>>>>>>> #define RTE_ETH_DEV_INTERNAL_MAC_ADDR_FORCE_RESTORE > >>>>>> RTE_BIT32(2) > >>>>>>>>>> > >>>>>>>>>> struct rte_eth_dev_data { > >>>>>>>>>> /* snip */ > >>>>>>>>>> > >>>>>>>>>> uint32_t dev_flags; > >>>>>>>>>> > >>>>>>>>>> /** > >>>>>>>>>> * Internal device capabilities, used only by ethdev library. > >>>>>>>>>> * Certain functionalities provided by the library might > >>>> enabled/disabled, > >>>>>>>>>> * based on driver exposing certain capabilities. > >>>>>>>>>> */ > >>>>>>>>>> uint32_t internal_flags; > >>>>>>>>>> > >>>>>>>>>> /* snip */ > >>>>>>>>>> }; > >>>>>>>>>> > >>>>>>>>>>> Also perhaps we have go into details what needs to be restored > >>>>>>>>>>> after 'stop' and what needs to be restored after 'reset' and > >>>>>>>>>>> use similar > >>>>>> mechanism etc... > >>>>>>>>>> > >>>>>>>>>> I think we should look into that. > >>>>>>>>>> Any 'codification' of semantics between drivers and ethdev > >>>>>>>>>> library is good in > >>>>>> my opinion. > >>>>>>>>>> > >>>>>>>>>> At least right now, ethdev does not change any configuration in > >>>>>>>>>> 'stop' and > >>>>>> 'reset' from what I see. > >>>>>>>>>> But that's on library side only. > >>>>>>>>>> > >>>>>>>>>>>> This way, if we would conclude that it makes sense for > >>>>>>>>>>>> .dev_start() to handle all starting configuration aspects, we > >>>>>>>>>>>> could track which drivers still rely > >>>>>>>>>>> on configuration restore. > >>>>>>>>>>>> > >>>>>>>>>>>> Dariusz Sosnowski (4): > >>>>>>>>>>>> ethdev: rework config restore > >>>>>>>>>>>> ethdev: omit promiscuous config restore if not required > >>>>>>>>>>>> ethdev: omit all multicast config restore if not required > >>>>>>>>>>>> ethdev: omit MAC address restore if not required > >>>>>>>>>>>> > >>>>>>>>>>>> lib/ethdev/rte_ethdev.c | 39 > >>>>>>>>>>>> ++++++++++++++++++++++++++++++++++----- > >>>>>>>>>>>> lib/ethdev/rte_ethdev.h | 18 ++++++++++++++++++ > >>>>>>>>>>>> 2 files changed, 52 insertions(+), 5 deletions(-) > >>>>>>>>>>>> > >>>>>>>>>>>> -- > >>>>>>>>>>>> 2.39.5 > >>>>>>>>>>>> > >>>>>>>>> > >>>>>>> > >>>>> > >>> > >