On 10/10/2024 5:23 PM, Dariusz Sosnowski wrote: >> -----Original Message----- >> From: Ferruh Yigit <ferruh.yi...@amd.com> >> Sent: Thursday, October 10, 2024 14:52 >> To: Dariusz Sosnowski <dsosnow...@nvidia.com>; Konstantin Ananyev >> <konstantin.anan...@huawei.com>; NBU-Contact-Thomas Monjalon (EXTERNAL) >> <tho...@monjalon.net>; Andrew Rybchenko <andrew.rybche...@oktetlabs.ru> >> Cc: dev@dpdk.org; Bruce Richardson <bruce.richard...@intel.com> >> Subject: Re: [RFC 0/4] ethdev: rework config restore >> >> External email: Use caution opening links or attachments >> >> >> On 10/10/2024 1:08 PM, Dariusz Sosnowski wrote: >>>> -----Original Message----- >>>> From: Ferruh Yigit <ferruh.yi...@amd.com> >>>> Sent: Thursday, October 10, 2024 01:17 >>>> To: Dariusz Sosnowski <dsosnow...@nvidia.com>; Konstantin Ananyev >>>> <konstantin.anan...@huawei.com>; NBU-Contact-Thomas Monjalon >>>> (EXTERNAL) <tho...@monjalon.net>; Andrew Rybchenko >>>> <andrew.rybche...@oktetlabs.ru> >>>> Cc: dev@dpdk.org; Bruce Richardson <bruce.richard...@intel.com> >>>> Subject: Re: [RFC 0/4] ethdev: rework config restore >>>> >>>> External email: Use caution opening links or attachments >>>> >>>> >>>> On 10/9/2024 5:18 PM, Dariusz Sosnowski wrote: >>>>>> -----Original Message----- >>>>>> From: Ferruh Yigit <ferruh.yi...@amd.com> >>>>>> Sent: Wednesday, October 9, 2024 03:08 >>>>>> To: Konstantin Ananyev <konstantin.anan...@huawei.com>; Dariusz >>>>>> Sosnowski <dsosnow...@nvidia.com>; NBU-Contact-Thomas Monjalon >>>>>> (EXTERNAL) <tho...@monjalon.net>; Andrew Rybchenko >>>>>> <andrew.rybche...@oktetlabs.ru> >>>>>> Cc: dev@dpdk.org; Bruce Richardson <bruce.richard...@intel.com> >>>>>> Subject: Re: [RFC 0/4] ethdev: rework config restore >>>>>> >>>>>> External email: Use caution opening links or attachments >>>>>> >>>>>> >>>>>> On 10/8/2024 6:21 PM, Konstantin Ananyev wrote: >>>>>>> >>>>>>> >>>>>>>>>>>> We have been working on optimizing the latency of calls to >>>>>>>>>>>> rte_eth_dev_start(), on ports spawned by mlx5 PMD. Most of >>>>>>>>>>>> the work requires changes in the implementation of >>>>>>>>>>>> .dev_start() PMD callback, but I also wanted to start a >>>>>>>>>>>> discussion regarding configuration restore. >>>>>>>>>>>> >>>>>>>>>>>> rte_eth_dev_start() does a few things on top of calling >>>>>>>>>>>> .dev_start() >>>>>> callback: >>>>>>>>>>>> >>>>>>>>>>>> - Before calling it: >>>>>>>>>>>> - eth_dev_mac_restore() - if device supports >>>>>>>>>>>> RTE_ETH_DEV_NOLIVE_MAC_ADDR; >>>>>>>>>>>> - After calling it: >>>>>>>>>>>> - eth_dev_mac_restore() - if device does not support >>>>>>>>>>> RTE_ETH_DEV_NOLIVE_MAC_ADDR; >>>>>>>>>>>> - restore promiscuous config >>>>>>>>>>>> - restore all multicast config >>>>>>>>>>>> >>>>>>>>>>>> eth_dev_mac_restore() iterates over all known MAC addresses - >>>>>>>>>>>> stored in rte_eth_dev_data.mac_addrs array - and calls >>>>>>>>>>>> .mac_addr_set() and .mac_addr_add() callbacks to apply these >>>>>>>>>>>> MAC >>>>>> addresses. >>>>>>>>>>>> >>>>>>>>>>>> Promiscuous config restore checks if promiscuous mode is >>>>>>>>>>>> enabled or not, and calls .promiscuous_enable() or >>>>>>>>>>>> .promiscuous_disable() >>>>>> callback. >>>>>>>>>>>> >>>>>>>>>>>> All multicast config restore checks if all multicast mode is >>>>>>>>>>>> enabled or not, and calls .allmulticast_enable() or >>>>>>>>>>>> .allmulticast_disable() >>>>>> callback. >>>>>>>>>>>> >>>>>>>>>>>> Callbacks are called directly in all of these cases, to >>>>>>>>>>>> bypass the checks for applying the same configuration, which >>>>>>>>>>>> exist in relevant >>>>>> APIs. >>>>>>>>>>>> Checks are bypassed to force drivers to reapply the configuration. >>>>>>>>>>>> >>>>>>>>>>>> Let's consider what happens in the following sequence of API calls. >>>>>>>>>>>> >>>>>>>>>>>> 1. rte_eth_dev_configure() >>>>>>>>>>>> 2. rte_eth_tx_queue_setup() >>>>>>>>>>>> 3. rte_eth_rx_queue_setup() >>>>>>>>>>>> 4. rte_eth_promiscuous_enable() >>>>>>>>>>>> - Call dev->dev_ops->promiscuous_enable() >>>>>>>>>>>> - Stores promiscuous state in dev->data->promiscuous 5. >>>>>>>>>>>> rte_eth_allmulticast_enable() >>>>>>>>>>>> - Call dev->dev_ops->allmulticast_enable() >>>>>>>>>>>> - Stores allmulticast state in dev->data->allmulticast 6. >>>>>>>>>>>> rte_eth_dev_start() >>>>>>>>>>>> - Call dev->dev_ops->dev_start() >>>>>>>>>>>> - Call dev->dev_ops->mac_addr_set() - apply default MAC address >>>>>>>>>>>> - Call dev->dev_ops->promiscuous_enable() >>>>>>>>>>>> - Call dev->dev_ops->allmulticast_enable() >>>>>>>>>>>> >>>>>>>>>>>> Even though all configuration is available in dev->data after >>>>>>>>>>>> step 5, library forces reapplying this configuration in step 6. >>>>>>>>>>>> >>>>>>>>>>>> In mlx5 PMD case all relevant callbacks require communication >>>>>>>>>>>> with the kernel driver, to configure the device (mlx5 PMD >>>>>>>>>>>> must create/destroy new kernel flow rules and/or change netdev >> config). >>>>>>>>>>>> >>>>>>>>>>>> mlx5 PMD handles applying all configuration in .dev_start(), >>>>>>>>>>>> so the following forced callbacks force additional >>>>>>>>>>>> communication with the kernel. The >>>>>>>>>>> same configuration is applied multiple times. >>>>>>>>>>>> >>>>>>>>>>>> As an optimization, mlx5 PMD could check if a given >>>>>>>>>>>> configuration was applied, but this would duplicate the >>>>>>>>>>>> functionality of the library (for example >>>>>>>>>>>> rte_eth_promiscuous_enable() does not call the driver if >>>>>>>>>>>> dev->data->promiscuous is set). >>>>>>>>>>>> >>>>>>>>>>>> Question: Since all of the configuration is available before >>>>>>>>>>>> .dev_start() callback is called, why ethdev library does not >>>>>>>>>>>> expect .dev_start() to >>>>>>>>>>> take this configuration into account? >>>>>>>>>>>> In other words, why library has to reapply the configuration? >>>>>>>>>>>> >>>>>>>>>>>> I could not find any particular reason why configuration >>>>>>>>>>>> restore exists as part of the process (it was in the initial DPDK >> commit). >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> My assumption is .dev_stop() cause these values reset in some >>>>>>>>>>> devices, so >>>>>>>>>>> .dev_start() restores them back. >>>>>>>>>>> @Bruce or @Konstantin may remember the history. >>>>>>>>> >>>>>>>>> Yep, as I remember, at least some Intel PMDs calling hw_reset() >>>>>>>>> ad >>>>>>>>> dec_stop() and even dev_start() to make sure that HW is in a >>>>>>>>> clean >>>>>>>>> (known) >>>>>> state. >>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> But I agree this is device specific behavior, and can be >>>>>>>>>>> managed by what device requires. >>>>>>>>> >>>>>>>>> Probably yes. >>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> The patches included in this RFC, propose a mechanism which >>>>>>>>>>>> would help with managing which drivers rely on forceful >>>>>>>>>>>> configuration >>>> restore. >>>>>>>>>>>> Drivers could advertise if forceful configuration restore is >>>>>>>>>>>> needed through `RTE_ETH_DEV_*_FORCE_RESTORE` device flag. If >>>>>>>>>>>> this flag is set, then the driver in question requires ethdev >>>>>>>>>>>> to forcefully restore >>>>>>>>>>> configuration. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> OK to use flag for it, but not sure about using >>>>>>>>>>> 'dev_info->dev_flags' >>>>>>>>>>> (RTE_ETH_DEV_*) for this, as this flag is shared with user and >>>>>>>>>>> this is all dpdk internal. >>>>>>>>>>> >>>>>>>>>>> What about to have a dedicated flag for it? We can have a >>>>>>>>>>> dedicated set of flag values for restore. >>>>>>>>>> >>>>>>>>>> Agreed. What do you think about the following? >>>>>>>>> >>>>>>>>> Instead of exposing that, can we probably make it transparent to >>>>>>>>> the user and probably ethdev layer too? >>>>>>>>> >>>>>>>> >>>>>>>> +1 to make it transparent to user, but not sure if we can make it >>>>>>>> transparent to ethdev layer. >>>>>>> >>>>>>> Just to be clear: >>>>>>> Let say, using example from above: >>>>>>> >>>>>>> rte_eth_dev_start() >>>>>>> - Call dev->dev_ops->dev_start() >>>>>>> - Call dev->dev_ops->mac_addr_set() - apply default MAC address >>>>>>> - Call dev->dev_ops->promiscuous_enable() >>>>>>> - Call dev->dev_ops->allmulticast_enable() >>>>>>> >>>>>>> We probably can introduce ethdev internal function (still visible >>>>>>> to >>>>>>> PMDs) that would do last 3 steps: >>>>>>> ethdev_replay_user_conf(...) >>>>>>> - Call dev->dev_ops->mac_addr_set() - apply default MAC address >>>>>>> - Call dev->dev_ops->promiscuous_enable() >>>>>>> - Call dev->dev_ops->allmulticast_enable() >>>>>>> >>>>>>> And let PMD itself to decide does it needs to call it at dev_start() or >>>>>>> not. >>>>>>> So it will become: >>>>>>> rte_eth_dev_start() >>>>>>> - Call dev->dev_ops->dev_start() >>>>>>> -Call ethdev_replay_user_conf(.) >>>>>>> - Call dev->dev_ops->mac_addr_set() - apply default MAC >>>>>>> address >>>>>>> - Call dev->dev_ops->promiscuous_enable() >>>>>>> -Call dev->dev_ops->allmulticast_enable() >>>>>>> >>>>>>> For PMDs that do need to restore user provided config And >>>>>>> rte_eth_dev_start() >>>>>>> - Call dev->dev_ops->dev_start() >>>>>>> >>>>>>> For those who do not. >>>>>>> >>>>>> >>>>>> OK, got it what you mean. >>>>>> Pushing restore functionality to PMDs works, but this may be doing >>>>>> redundant work on each PMD. >>>>>> >>>>>> Instead Dariusz suggests PMD to provide a flag to ehtdev to what to >>>>>> restore and common code in ethdev does the work. >>>>>> My below dedicated data struct comment is to have this flag in a >>>>>> new struct, overall like following: >>>>>> >>>>>> rte_eth_dev_start() >>>>>> - Call dev->dev_ops->dev_start() >>>>>> - Call dev->dev_ops->get_restore_flags(ethdev, RTE_ETH_START, &flags) >>>>>> - if (flags & MAC) dev->dev_ops->mac_addr_set() >>>>>> - if (flags & PROMISC) dev->dev_ops->promiscuous_enable() >>>>>> - ... >>>>> >>>>> Could you please explain what is the benefit of exposing flags >>>>> through dev_ops >>>> callback vs a dedicated flags field in rte_eth_dev_data? >>>>> In both solutions: >>>>> - config restore is transparent to the user, >>>>> - drivers can omit config restore (either by not implementing the >>>>> callback or not providing the flags), >>>>> - an ABI change is introduced (not a huge concern, at least for 24.11). >>>>> >>>>> I understand that my initial proposal with "internal_flags" was too >>>>> vague, but renaming and splitting this field into: >>>>> >>>>> - dev_start_restore_flags >>>>> - dev_reset_restore_flags >>>>> - and so on... >>>>> >>>>> seems sufficient, at least in my opinion. >>>>> >>>> >>>> Hi Dariusz, >>>> >>>> Putting flags to rte_eth_dev_data works, and it is easier since there >>>> is direct access from rte_eth_dev to rte_eth_dev_data, so you don't >>>> need new dev_ops. So this is a valid option. >>>> >>>> But benefit of new dev_ops is to keep "struct rte_eth_dev_data" clean. >>>> >>>> "struct rte_eth_dev_data" is integral data structure for ethdev and >>>> it is used in multiple locations, mostly related to the datapath and >>>> all drivers needs to deal with fields of this struct. >>>> Like [rx]_queues, dev_private, dev_conf all important and used a lot. >>>> >>>> I want to protect "struct rte_eth_dev_data" from noise as much as >>>> possible, though what is noise is not always that clear. >>>> >>>> This restore flag is not critical, and I expect most of the drivers >>>> won't care and populate this restore flag at all. That is why to me >>>> it is better have dedicated struct for it and only drivers care about >>>> restore >> feature know it. >>> >>> I see. Thank you very much for the explanation. >>> >>> In this case, it looks like adding this to dev_ops is the way to go. >>> >>> So, summarizing it all: >>> >>> 1. dev_ops should be extended with a callback with the following signature >>> and >> enums/flags: >>> >>> enum rte_eth_dev_operation op { >>> RTE_ETH_START, >>> RTE_ETH_STOP, >>> RTE_ETH_RESET, >>> }; >>> >>> #define RTE_ETH_RESTORE_MAC_ADDR RTE_BIT32(0) #define >>> RTE_ETH_RESTORE_PROMISC RTE_BIT32(1) #define >> RTE_ETH_RESTORE_ALLMULTI >>> RTE_BIT32(2) >>> >>> void (*get_restore_flags)( >>> struct rte_eth_dev *dev, >>> enum rte_eth_dev_operation op, >>> uint32_t *flags); >>> >>> 2. rte_eth_dev_start() will work as follows: >>> >>> - Call dev->dev_ops->dev_start() >>> - Call dev->dev_ops->get_restore_flags(dev, RTE_ETH_START, &flags). If >>> callback >> is not provided, assume flags == 0. >>> - if (flags & RTE_ETH_RESTORE_MAC_ADDR) - restore MAC addresses >>> - and so on... >>> >> >> All above looks good. >> >>> Also, I would like to add the following: >>> >>> 3. Patchset introducing this change should add get_restore_flags() >> implementation to all drivers, which informs that all config should be >> restored. >>> This would preserve the current behavior. >>> Later, this could be refined driver by driver. >>> >>> What do you think? >>> >> >> What you are saying is correct, but I suspect most of the drivers don't >> really need >> this restore, but they have it since it was in the ethdev layer. >> >> If we introduce back restore via get_restore_flags(), it may stay as it is >> in drivers, at >> least for most of them. >> >> What do you think to risk breaking stuff for this case. >> >> So don't implement this in the drivers by default, so who needs it will >> recognize >> the issue and will implement it. If we merge this patch for -rc1, it gives >> enough >> time for drivers to detect the issue and fix it. > > It seems rather too risky, especially considering that for example, there are > a few Intel drivers which do not have maintainers (like i40e). > So, I don't know what will happen to such drivers. They may be left broken > (if they are affected) for 24.11 and future releases. > But I agree that if default behavior is preserved, this dependence of drivers > on config restore might stay as is. > I'm on the fence about it. >
Yeah, not sure. Do you think the dev_ops function can be implemented in the 'ethdev_driver.c' and all drivers use exact same function? So this reduces changes and duplication in drivers while preserving the behavior. >> >> Only we may implement this to the drivers that exist when this restore code >> was >> introduced. >> I mean whatever driver exist in the initial DPDK commit, implement this >> logic only >> to those drivers. > > Seems reasonable to me. In this case, it would be igb (IIUC, now it's named > e1000) and ixgbe. > >> >>> Also, there's an open question about 'stop' and 'reset' operations. >>> At the moment, ethdev layer does not do any config manipulation during these >> operations. >>> Maybe we should limit get_restore_flags() to 'start' only? >>> >> >> Ack, I was about to suggest the same, for now only have 'RTE_ETH_START' >> as a placeholder for later possible usages. > > Ack > >> >>>> >>>> >>>> >>>>>> >>>>>> So PMDs only will provide what to restore with an internal API and >>>>>> common ethdev layer will restore it. >>>>>> If no restore required PMD may not implement .get_restore_flags() at all. >>>>>> >>>>>> Additionally, RTE_ETH_START, RTE_ETH_RESET etc flag can be provided >>>>>> to internal API to get what to restore in different states... >>>>>> >>>>>>>> Suggested 'internal_flag' in "struct rte_eth_dev_data" can be >>>>>>>> confusing and open to interpretation what to use it for and by >>>>>>>> time become source of defect. >>>>>>> >>>>>>> Yes, same thoughts. >>>>>>> >>>>>>>> Instead what do you think to have a separate, dedicated data struct >>>>>>>> for it? >>>>>>> >>>>>>> Hmm... not sure I understood you here... >>>>>>> >>>>>>>> >>>>>>>>> Might be we can move this restoration code into the new ethdev >>>>>>>>> helper function,(ethdevd_user_config_restore() or so) that PMD >>>>>>>>> can invoke >>>>>> during its dev_start() if needed? >>>>>>>>> >>>>>>>>>> >>>>>>>>>> #define RTE_ETH_DEV_INTERNAL_PROMISC_FORCE_RESTORE >>>> RTE_BIT32(0) >>>>>>>>>> #define RTE_ETH_DEV_INTERNAL_ALLMULTI_FORCE_RESTORE >>>> RTE_BIT32(1) >>>>>>>>>> #define RTE_ETH_DEV_INTERNAL_MAC_ADDR_FORCE_RESTORE >>>>>> RTE_BIT32(2) >>>>>>>>>> >>>>>>>>>> struct rte_eth_dev_data { >>>>>>>>>> /* snip */ >>>>>>>>>> >>>>>>>>>> uint32_t dev_flags; >>>>>>>>>> >>>>>>>>>> /** >>>>>>>>>> * Internal device capabilities, used only by ethdev library. >>>>>>>>>> * Certain functionalities provided by the library might >>>> enabled/disabled, >>>>>>>>>> * based on driver exposing certain capabilities. >>>>>>>>>> */ >>>>>>>>>> uint32_t internal_flags; >>>>>>>>>> >>>>>>>>>> /* snip */ >>>>>>>>>> }; >>>>>>>>>> >>>>>>>>>>> Also perhaps we have go into details what needs to be restored >>>>>>>>>>> after 'stop' and what needs to be restored after 'reset' and >>>>>>>>>>> use similar >>>>>> mechanism etc... >>>>>>>>>> >>>>>>>>>> I think we should look into that. >>>>>>>>>> Any 'codification' of semantics between drivers and ethdev >>>>>>>>>> library is good in >>>>>> my opinion. >>>>>>>>>> >>>>>>>>>> At least right now, ethdev does not change any configuration in >>>>>>>>>> 'stop' and >>>>>> 'reset' from what I see. >>>>>>>>>> But that's on library side only. >>>>>>>>>> >>>>>>>>>>>> This way, if we would conclude that it makes sense for >>>>>>>>>>>> .dev_start() to handle all starting configuration aspects, we >>>>>>>>>>>> could track which drivers still rely >>>>>>>>>>> on configuration restore. >>>>>>>>>>>> >>>>>>>>>>>> Dariusz Sosnowski (4): >>>>>>>>>>>> ethdev: rework config restore >>>>>>>>>>>> ethdev: omit promiscuous config restore if not required >>>>>>>>>>>> ethdev: omit all multicast config restore if not required >>>>>>>>>>>> ethdev: omit MAC address restore if not required >>>>>>>>>>>> >>>>>>>>>>>> lib/ethdev/rte_ethdev.c | 39 >>>>>>>>>>>> ++++++++++++++++++++++++++++++++++----- >>>>>>>>>>>> lib/ethdev/rte_ethdev.h | 18 ++++++++++++++++++ >>>>>>>>>>>> 2 files changed, 52 insertions(+), 5 deletions(-) >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> 2.39.5 >>>>>>>>>>>> >>>>>>>>> >>>>>>> >>>>> >>> >