> -----Original Message----- > From: Ferruh Yigit <ferruh.yi...@amd.com> > Sent: Thursday, October 10, 2024 14:52 > To: Dariusz Sosnowski <dsosnow...@nvidia.com>; Konstantin Ananyev > <konstantin.anan...@huawei.com>; NBU-Contact-Thomas Monjalon (EXTERNAL) > <tho...@monjalon.net>; Andrew Rybchenko <andrew.rybche...@oktetlabs.ru> > Cc: dev@dpdk.org; Bruce Richardson <bruce.richard...@intel.com> > Subject: Re: [RFC 0/4] ethdev: rework config restore > > External email: Use caution opening links or attachments > > > On 10/10/2024 1:08 PM, Dariusz Sosnowski wrote: > >> -----Original Message----- > >> From: Ferruh Yigit <ferruh.yi...@amd.com> > >> Sent: Thursday, October 10, 2024 01:17 > >> To: Dariusz Sosnowski <dsosnow...@nvidia.com>; Konstantin Ananyev > >> <konstantin.anan...@huawei.com>; NBU-Contact-Thomas Monjalon > >> (EXTERNAL) <tho...@monjalon.net>; Andrew Rybchenko > >> <andrew.rybche...@oktetlabs.ru> > >> Cc: dev@dpdk.org; Bruce Richardson <bruce.richard...@intel.com> > >> Subject: Re: [RFC 0/4] ethdev: rework config restore > >> > >> External email: Use caution opening links or attachments > >> > >> > >> On 10/9/2024 5:18 PM, Dariusz Sosnowski wrote: > >>>> -----Original Message----- > >>>> From: Ferruh Yigit <ferruh.yi...@amd.com> > >>>> Sent: Wednesday, October 9, 2024 03:08 > >>>> To: Konstantin Ananyev <konstantin.anan...@huawei.com>; Dariusz > >>>> Sosnowski <dsosnow...@nvidia.com>; NBU-Contact-Thomas Monjalon > >>>> (EXTERNAL) <tho...@monjalon.net>; Andrew Rybchenko > >>>> <andrew.rybche...@oktetlabs.ru> > >>>> Cc: dev@dpdk.org; Bruce Richardson <bruce.richard...@intel.com> > >>>> Subject: Re: [RFC 0/4] ethdev: rework config restore > >>>> > >>>> External email: Use caution opening links or attachments > >>>> > >>>> > >>>> On 10/8/2024 6:21 PM, Konstantin Ananyev wrote: > >>>>> > >>>>> > >>>>>>>>>> We have been working on optimizing the latency of calls to > >>>>>>>>>> rte_eth_dev_start(), on ports spawned by mlx5 PMD. Most of > >>>>>>>>>> the work requires changes in the implementation of > >>>>>>>>>> .dev_start() PMD callback, but I also wanted to start a > >>>>>>>>>> discussion regarding configuration restore. > >>>>>>>>>> > >>>>>>>>>> rte_eth_dev_start() does a few things on top of calling > >>>>>>>>>> .dev_start() > >>>> callback: > >>>>>>>>>> > >>>>>>>>>> - Before calling it: > >>>>>>>>>> - eth_dev_mac_restore() - if device supports > >>>>>>>>>> RTE_ETH_DEV_NOLIVE_MAC_ADDR; > >>>>>>>>>> - After calling it: > >>>>>>>>>> - eth_dev_mac_restore() - if device does not support > >>>>>>>>> RTE_ETH_DEV_NOLIVE_MAC_ADDR; > >>>>>>>>>> - restore promiscuous config > >>>>>>>>>> - restore all multicast config > >>>>>>>>>> > >>>>>>>>>> eth_dev_mac_restore() iterates over all known MAC addresses - > >>>>>>>>>> stored in rte_eth_dev_data.mac_addrs array - and calls > >>>>>>>>>> .mac_addr_set() and .mac_addr_add() callbacks to apply these > >>>>>>>>>> MAC > >>>> addresses. > >>>>>>>>>> > >>>>>>>>>> Promiscuous config restore checks if promiscuous mode is > >>>>>>>>>> enabled or not, and calls .promiscuous_enable() or > >>>>>>>>>> .promiscuous_disable() > >>>> callback. > >>>>>>>>>> > >>>>>>>>>> All multicast config restore checks if all multicast mode is > >>>>>>>>>> enabled or not, and calls .allmulticast_enable() or > >>>>>>>>>> .allmulticast_disable() > >>>> callback. > >>>>>>>>>> > >>>>>>>>>> Callbacks are called directly in all of these cases, to > >>>>>>>>>> bypass the checks for applying the same configuration, which > >>>>>>>>>> exist in relevant > >>>> APIs. > >>>>>>>>>> Checks are bypassed to force drivers to reapply the configuration. > >>>>>>>>>> > >>>>>>>>>> Let's consider what happens in the following sequence of API calls. > >>>>>>>>>> > >>>>>>>>>> 1. rte_eth_dev_configure() > >>>>>>>>>> 2. rte_eth_tx_queue_setup() > >>>>>>>>>> 3. rte_eth_rx_queue_setup() > >>>>>>>>>> 4. rte_eth_promiscuous_enable() > >>>>>>>>>> - Call dev->dev_ops->promiscuous_enable() > >>>>>>>>>> - Stores promiscuous state in dev->data->promiscuous 5. > >>>>>>>>>> rte_eth_allmulticast_enable() > >>>>>>>>>> - Call dev->dev_ops->allmulticast_enable() > >>>>>>>>>> - Stores allmulticast state in dev->data->allmulticast 6. > >>>>>>>>>> rte_eth_dev_start() > >>>>>>>>>> - Call dev->dev_ops->dev_start() > >>>>>>>>>> - Call dev->dev_ops->mac_addr_set() - apply default MAC address > >>>>>>>>>> - Call dev->dev_ops->promiscuous_enable() > >>>>>>>>>> - Call dev->dev_ops->allmulticast_enable() > >>>>>>>>>> > >>>>>>>>>> Even though all configuration is available in dev->data after > >>>>>>>>>> step 5, library forces reapplying this configuration in step 6. > >>>>>>>>>> > >>>>>>>>>> In mlx5 PMD case all relevant callbacks require communication > >>>>>>>>>> with the kernel driver, to configure the device (mlx5 PMD > >>>>>>>>>> must create/destroy new kernel flow rules and/or change netdev > config). > >>>>>>>>>> > >>>>>>>>>> mlx5 PMD handles applying all configuration in .dev_start(), > >>>>>>>>>> so the following forced callbacks force additional > >>>>>>>>>> communication with the kernel. The > >>>>>>>>> same configuration is applied multiple times. > >>>>>>>>>> > >>>>>>>>>> As an optimization, mlx5 PMD could check if a given > >>>>>>>>>> configuration was applied, but this would duplicate the > >>>>>>>>>> functionality of the library (for example > >>>>>>>>>> rte_eth_promiscuous_enable() does not call the driver if > >>>>>>>>>> dev->data->promiscuous is set). > >>>>>>>>>> > >>>>>>>>>> Question: Since all of the configuration is available before > >>>>>>>>>> .dev_start() callback is called, why ethdev library does not > >>>>>>>>>> expect .dev_start() to > >>>>>>>>> take this configuration into account? > >>>>>>>>>> In other words, why library has to reapply the configuration? > >>>>>>>>>> > >>>>>>>>>> I could not find any particular reason why configuration > >>>>>>>>>> restore exists as part of the process (it was in the initial DPDK > commit). > >>>>>>>>>> > >>>>>>>>> > >>>>>>>>> My assumption is .dev_stop() cause these values reset in some > >>>>>>>>> devices, so > >>>>>>>>> .dev_start() restores them back. > >>>>>>>>> @Bruce or @Konstantin may remember the history. > >>>>>>> > >>>>>>> Yep, as I remember, at least some Intel PMDs calling hw_reset() > >>>>>>> ad > >>>>>>> dec_stop() and even dev_start() to make sure that HW is in a > >>>>>>> clean > >>>>>>> (known) > >>>> state. > >>>>>>> > >>>>>>>>> > >>>>>>>>> But I agree this is device specific behavior, and can be > >>>>>>>>> managed by what device requires. > >>>>>>> > >>>>>>> Probably yes. > >>>>>>> > >>>>>>>>> > >>>>>>>>>> The patches included in this RFC, propose a mechanism which > >>>>>>>>>> would help with managing which drivers rely on forceful > >>>>>>>>>> configuration > >> restore. > >>>>>>>>>> Drivers could advertise if forceful configuration restore is > >>>>>>>>>> needed through `RTE_ETH_DEV_*_FORCE_RESTORE` device flag. If > >>>>>>>>>> this flag is set, then the driver in question requires ethdev > >>>>>>>>>> to forcefully restore > >>>>>>>>> configuration. > >>>>>>>>>> > >>>>>>>>> > >>>>>>>>> OK to use flag for it, but not sure about using > >>>>>>>>> 'dev_info->dev_flags' > >>>>>>>>> (RTE_ETH_DEV_*) for this, as this flag is shared with user and > >>>>>>>>> this is all dpdk internal. > >>>>>>>>> > >>>>>>>>> What about to have a dedicated flag for it? We can have a > >>>>>>>>> dedicated set of flag values for restore. > >>>>>>>> > >>>>>>>> Agreed. What do you think about the following? > >>>>>>> > >>>>>>> Instead of exposing that, can we probably make it transparent to > >>>>>>> the user and probably ethdev layer too? > >>>>>>> > >>>>>> > >>>>>> +1 to make it transparent to user, but not sure if we can make it > >>>>>> transparent to ethdev layer. > >>>>> > >>>>> Just to be clear: > >>>>> Let say, using example from above: > >>>>> > >>>>> rte_eth_dev_start() > >>>>> - Call dev->dev_ops->dev_start() > >>>>> - Call dev->dev_ops->mac_addr_set() - apply default MAC address > >>>>> - Call dev->dev_ops->promiscuous_enable() > >>>>> - Call dev->dev_ops->allmulticast_enable() > >>>>> > >>>>> We probably can introduce ethdev internal function (still visible > >>>>> to > >>>>> PMDs) that would do last 3 steps: > >>>>> ethdev_replay_user_conf(...) > >>>>> - Call dev->dev_ops->mac_addr_set() - apply default MAC address > >>>>> - Call dev->dev_ops->promiscuous_enable() > >>>>> - Call dev->dev_ops->allmulticast_enable() > >>>>> > >>>>> And let PMD itself to decide does it needs to call it at dev_start() or > >>>>> not. > >>>>> So it will become: > >>>>> rte_eth_dev_start() > >>>>> - Call dev->dev_ops->dev_start() > >>>>> -Call ethdev_replay_user_conf(.) > >>>>> - Call dev->dev_ops->mac_addr_set() - apply default MAC > >>>>> address > >>>>> - Call dev->dev_ops->promiscuous_enable() > >>>>> -Call dev->dev_ops->allmulticast_enable() > >>>>> > >>>>> For PMDs that do need to restore user provided config And > >>>>> rte_eth_dev_start() > >>>>> - Call dev->dev_ops->dev_start() > >>>>> > >>>>> For those who do not. > >>>>> > >>>> > >>>> OK, got it what you mean. > >>>> Pushing restore functionality to PMDs works, but this may be doing > >>>> redundant work on each PMD. > >>>> > >>>> Instead Dariusz suggests PMD to provide a flag to ehtdev to what to > >>>> restore and common code in ethdev does the work. > >>>> My below dedicated data struct comment is to have this flag in a > >>>> new struct, overall like following: > >>>> > >>>> rte_eth_dev_start() > >>>> - Call dev->dev_ops->dev_start() > >>>> - Call dev->dev_ops->get_restore_flags(ethdev, RTE_ETH_START, &flags) > >>>> - if (flags & MAC) dev->dev_ops->mac_addr_set() > >>>> - if (flags & PROMISC) dev->dev_ops->promiscuous_enable() > >>>> - ... > >>> > >>> Could you please explain what is the benefit of exposing flags > >>> through dev_ops > >> callback vs a dedicated flags field in rte_eth_dev_data? > >>> In both solutions: > >>> - config restore is transparent to the user, > >>> - drivers can omit config restore (either by not implementing the > >>> callback or not providing the flags), > >>> - an ABI change is introduced (not a huge concern, at least for 24.11). > >>> > >>> I understand that my initial proposal with "internal_flags" was too > >>> vague, but renaming and splitting this field into: > >>> > >>> - dev_start_restore_flags > >>> - dev_reset_restore_flags > >>> - and so on... > >>> > >>> seems sufficient, at least in my opinion. > >>> > >> > >> Hi Dariusz, > >> > >> Putting flags to rte_eth_dev_data works, and it is easier since there > >> is direct access from rte_eth_dev to rte_eth_dev_data, so you don't > >> need new dev_ops. So this is a valid option. > >> > >> But benefit of new dev_ops is to keep "struct rte_eth_dev_data" clean. > >> > >> "struct rte_eth_dev_data" is integral data structure for ethdev and > >> it is used in multiple locations, mostly related to the datapath and > >> all drivers needs to deal with fields of this struct. > >> Like [rx]_queues, dev_private, dev_conf all important and used a lot. > >> > >> I want to protect "struct rte_eth_dev_data" from noise as much as > >> possible, though what is noise is not always that clear. > >> > >> This restore flag is not critical, and I expect most of the drivers > >> won't care and populate this restore flag at all. That is why to me > >> it is better have dedicated struct for it and only drivers care about > >> restore > feature know it. > > > > I see. Thank you very much for the explanation. > > > > In this case, it looks like adding this to dev_ops is the way to go. > > > > So, summarizing it all: > > > > 1. dev_ops should be extended with a callback with the following signature > > and > enums/flags: > > > > enum rte_eth_dev_operation op { > > RTE_ETH_START, > > RTE_ETH_STOP, > > RTE_ETH_RESET, > > }; > > > > #define RTE_ETH_RESTORE_MAC_ADDR RTE_BIT32(0) #define > > RTE_ETH_RESTORE_PROMISC RTE_BIT32(1) #define > RTE_ETH_RESTORE_ALLMULTI > > RTE_BIT32(2) > > > > void (*get_restore_flags)( > > struct rte_eth_dev *dev, > > enum rte_eth_dev_operation op, > > uint32_t *flags); > > > > 2. rte_eth_dev_start() will work as follows: > > > > - Call dev->dev_ops->dev_start() > > - Call dev->dev_ops->get_restore_flags(dev, RTE_ETH_START, &flags). If > > callback > is not provided, assume flags == 0. > > - if (flags & RTE_ETH_RESTORE_MAC_ADDR) - restore MAC addresses > > - and so on... > > > > All above looks good. > > > Also, I would like to add the following: > > > > 3. Patchset introducing this change should add get_restore_flags() > implementation to all drivers, which informs that all config should be > restored. > > This would preserve the current behavior. > > Later, this could be refined driver by driver. > > > > What do you think? > > > > What you are saying is correct, but I suspect most of the drivers don't > really need > this restore, but they have it since it was in the ethdev layer. > > If we introduce back restore via get_restore_flags(), it may stay as it is in > drivers, at > least for most of them. > > What do you think to risk breaking stuff for this case. > > So don't implement this in the drivers by default, so who needs it will > recognize > the issue and will implement it. If we merge this patch for -rc1, it gives > enough > time for drivers to detect the issue and fix it.
It seems rather too risky, especially considering that for example, there are a few Intel drivers which do not have maintainers (like i40e). So, I don't know what will happen to such drivers. They may be left broken (if they are affected) for 24.11 and future releases. But I agree that if default behavior is preserved, this dependence of drivers on config restore might stay as is. I'm on the fence about it. > > Only we may implement this to the drivers that exist when this restore code > was > introduced. > I mean whatever driver exist in the initial DPDK commit, implement this logic > only > to those drivers. Seems reasonable to me. In this case, it would be igb (IIUC, now it's named e1000) and ixgbe. > > > Also, there's an open question about 'stop' and 'reset' operations. > > At the moment, ethdev layer does not do any config manipulation during these > operations. > > Maybe we should limit get_restore_flags() to 'start' only? > > > > Ack, I was about to suggest the same, for now only have 'RTE_ETH_START' > as a placeholder for later possible usages. Ack > > >> > >> > >> > >>>> > >>>> So PMDs only will provide what to restore with an internal API and > >>>> common ethdev layer will restore it. > >>>> If no restore required PMD may not implement .get_restore_flags() at all. > >>>> > >>>> Additionally, RTE_ETH_START, RTE_ETH_RESET etc flag can be provided > >>>> to internal API to get what to restore in different states... > >>>> > >>>>>> Suggested 'internal_flag' in "struct rte_eth_dev_data" can be > >>>>>> confusing and open to interpretation what to use it for and by > >>>>>> time become source of defect. > >>>>> > >>>>> Yes, same thoughts. > >>>>> > >>>>>> Instead what do you think to have a separate, dedicated data struct > >>>>>> for it? > >>>>> > >>>>> Hmm... not sure I understood you here... > >>>>> > >>>>>> > >>>>>>> Might be we can move this restoration code into the new ethdev > >>>>>>> helper function,(ethdevd_user_config_restore() or so) that PMD > >>>>>>> can invoke > >>>> during its dev_start() if needed? > >>>>>>> > >>>>>>>> > >>>>>>>> #define RTE_ETH_DEV_INTERNAL_PROMISC_FORCE_RESTORE > >> RTE_BIT32(0) > >>>>>>>> #define RTE_ETH_DEV_INTERNAL_ALLMULTI_FORCE_RESTORE > >> RTE_BIT32(1) > >>>>>>>> #define RTE_ETH_DEV_INTERNAL_MAC_ADDR_FORCE_RESTORE > >>>> RTE_BIT32(2) > >>>>>>>> > >>>>>>>> struct rte_eth_dev_data { > >>>>>>>> /* snip */ > >>>>>>>> > >>>>>>>> uint32_t dev_flags; > >>>>>>>> > >>>>>>>> /** > >>>>>>>> * Internal device capabilities, used only by ethdev library. > >>>>>>>> * Certain functionalities provided by the library might > >> enabled/disabled, > >>>>>>>> * based on driver exposing certain capabilities. > >>>>>>>> */ > >>>>>>>> uint32_t internal_flags; > >>>>>>>> > >>>>>>>> /* snip */ > >>>>>>>> }; > >>>>>>>> > >>>>>>>>> Also perhaps we have go into details what needs to be restored > >>>>>>>>> after 'stop' and what needs to be restored after 'reset' and > >>>>>>>>> use similar > >>>> mechanism etc... > >>>>>>>> > >>>>>>>> I think we should look into that. > >>>>>>>> Any 'codification' of semantics between drivers and ethdev > >>>>>>>> library is good in > >>>> my opinion. > >>>>>>>> > >>>>>>>> At least right now, ethdev does not change any configuration in > >>>>>>>> 'stop' and > >>>> 'reset' from what I see. > >>>>>>>> But that's on library side only. > >>>>>>>> > >>>>>>>>>> This way, if we would conclude that it makes sense for > >>>>>>>>>> .dev_start() to handle all starting configuration aspects, we > >>>>>>>>>> could track which drivers still rely > >>>>>>>>> on configuration restore. > >>>>>>>>>> > >>>>>>>>>> Dariusz Sosnowski (4): > >>>>>>>>>> ethdev: rework config restore > >>>>>>>>>> ethdev: omit promiscuous config restore if not required > >>>>>>>>>> ethdev: omit all multicast config restore if not required > >>>>>>>>>> ethdev: omit MAC address restore if not required > >>>>>>>>>> > >>>>>>>>>> lib/ethdev/rte_ethdev.c | 39 > >>>>>>>>>> ++++++++++++++++++++++++++++++++++----- > >>>>>>>>>> lib/ethdev/rte_ethdev.h | 18 ++++++++++++++++++ > >>>>>>>>>> 2 files changed, 52 insertions(+), 5 deletions(-) > >>>>>>>>>> > >>>>>>>>>> -- > >>>>>>>>>> 2.39.5 > >>>>>>>>>> > >>>>>>> > >>>>> > >>> > >