On 10/10/2024 11:58 PM, Konstantin Ananyev wrote: > > > config restore >>>> >>>> External email: Use caution opening links or attachments >>>> >>>> >>>> On 10/10/2024 1:08 PM, Dariusz Sosnowski wrote: >>>>>> -----Original Message----- >>>>>> From: Ferruh Yigit <ferruh.yi...@amd.com> >>>>>> Sent: Thursday, October 10, 2024 01:17 >>>>>> To: Dariusz Sosnowski <dsosnow...@nvidia.com>; Konstantin Ananyev >>>>>> <konstantin.anan...@huawei.com>; NBU-Contact-Thomas Monjalon >>>>>> (EXTERNAL) <tho...@monjalon.net>; Andrew Rybchenko >>>>>> <andrew.rybche...@oktetlabs.ru> >>>>>> Cc: dev@dpdk.org; Bruce Richardson <bruce.richard...@intel.com> >>>>>> Subject: Re: [RFC 0/4] ethdev: rework config restore >>>>>> >>>>>> External email: Use caution opening links or attachments >>>>>> >>>>>> >>>>>> On 10/9/2024 5:18 PM, Dariusz Sosnowski wrote: >>>>>>>> -----Original Message----- >>>>>>>> From: Ferruh Yigit <ferruh.yi...@amd.com> >>>>>>>> Sent: Wednesday, October 9, 2024 03:08 >>>>>>>> To: Konstantin Ananyev <konstantin.anan...@huawei.com>; Dariusz >>>>>>>> Sosnowski <dsosnow...@nvidia.com>; NBU-Contact-Thomas Monjalon >>>>>>>> (EXTERNAL) <tho...@monjalon.net>; Andrew Rybchenko >>>>>>>> <andrew.rybche...@oktetlabs.ru> >>>>>>>> Cc: dev@dpdk.org; Bruce Richardson <bruce.richard...@intel.com> >>>>>>>> Subject: Re: [RFC 0/4] ethdev: rework config restore >>>>>>>> >>>>>>>> External email: Use caution opening links or attachments >>>>>>>> >>>>>>>> >>>>>>>> On 10/8/2024 6:21 PM, Konstantin Ananyev wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>> We have been working on optimizing the latency of calls to >>>>>>>>>>>>>> rte_eth_dev_start(), on ports spawned by mlx5 PMD. Most of >>>>>>>>>>>>>> the work requires changes in the implementation of >>>>>>>>>>>>>> .dev_start() PMD callback, but I also wanted to start a >>>>>>>>>>>>>> discussion regarding configuration restore. >>>>>>>>>>>>>> >>>>>>>>>>>>>> rte_eth_dev_start() does a few things on top of calling >>>>>>>>>>>>>> .dev_start() >>>>>>>> callback: >>>>>>>>>>>>>> >>>>>>>>>>>>>> - Before calling it: >>>>>>>>>>>>>> - eth_dev_mac_restore() - if device supports >>>>>>>>>>>>>> RTE_ETH_DEV_NOLIVE_MAC_ADDR; >>>>>>>>>>>>>> - After calling it: >>>>>>>>>>>>>> - eth_dev_mac_restore() - if device does not support >>>>>>>>>>>>> RTE_ETH_DEV_NOLIVE_MAC_ADDR; >>>>>>>>>>>>>> - restore promiscuous config >>>>>>>>>>>>>> - restore all multicast config >>>>>>>>>>>>>> >>>>>>>>>>>>>> eth_dev_mac_restore() iterates over all known MAC addresses - >>>>>>>>>>>>>> stored in rte_eth_dev_data.mac_addrs array - and calls >>>>>>>>>>>>>> .mac_addr_set() and .mac_addr_add() callbacks to apply these >>>>>>>>>>>>>> MAC >>>>>>>> addresses. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Promiscuous config restore checks if promiscuous mode is >>>>>>>>>>>>>> enabled or not, and calls .promiscuous_enable() or >>>>>>>>>>>>>> .promiscuous_disable() >>>>>>>> callback. >>>>>>>>>>>>>> >>>>>>>>>>>>>> All multicast config restore checks if all multicast mode is >>>>>>>>>>>>>> enabled or not, and calls .allmulticast_enable() or >>>>>>>>>>>>>> .allmulticast_disable() >>>>>>>> callback. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Callbacks are called directly in all of these cases, to >>>>>>>>>>>>>> bypass the checks for applying the same configuration, which >>>>>>>>>>>>>> exist in relevant >>>>>>>> APIs. >>>>>>>>>>>>>> Checks are bypassed to force drivers to reapply the >>>>>>>>>>>>>> configuration. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Let's consider what happens in the following sequence of API >>>>>>>>>>>>>> calls. >>>>>>>>>>>>>> >>>>>>>>>>>>>> 1. rte_eth_dev_configure() >>>>>>>>>>>>>> 2. rte_eth_tx_queue_setup() >>>>>>>>>>>>>> 3. rte_eth_rx_queue_setup() >>>>>>>>>>>>>> 4. rte_eth_promiscuous_enable() >>>>>>>>>>>>>> - Call dev->dev_ops->promiscuous_enable() >>>>>>>>>>>>>> - Stores promiscuous state in dev->data->promiscuous 5. >>>>>>>>>>>>>> rte_eth_allmulticast_enable() >>>>>>>>>>>>>> - Call dev->dev_ops->allmulticast_enable() >>>>>>>>>>>>>> - Stores allmulticast state in dev->data->allmulticast 6. >>>>>>>>>>>>>> rte_eth_dev_start() >>>>>>>>>>>>>> - Call dev->dev_ops->dev_start() >>>>>>>>>>>>>> - Call dev->dev_ops->mac_addr_set() - apply default MAC >>>>>>>>>>>>>> address >>>>>>>>>>>>>> - Call dev->dev_ops->promiscuous_enable() >>>>>>>>>>>>>> - Call dev->dev_ops->allmulticast_enable() >>>>>>>>>>>>>> >>>>>>>>>>>>>> Even though all configuration is available in dev->data after >>>>>>>>>>>>>> step 5, library forces reapplying this configuration in step 6. >>>>>>>>>>>>>> >>>>>>>>>>>>>> In mlx5 PMD case all relevant callbacks require communication >>>>>>>>>>>>>> with the kernel driver, to configure the device (mlx5 PMD >>>>>>>>>>>>>> must create/destroy new kernel flow rules and/or change netdev >>>> config). >>>>>>>>>>>>>> >>>>>>>>>>>>>> mlx5 PMD handles applying all configuration in .dev_start(), >>>>>>>>>>>>>> so the following forced callbacks force additional >>>>>>>>>>>>>> communication with the kernel. The >>>>>>>>>>>>> same configuration is applied multiple times. >>>>>>>>>>>>>> >>>>>>>>>>>>>> As an optimization, mlx5 PMD could check if a given >>>>>>>>>>>>>> configuration was applied, but this would duplicate the >>>>>>>>>>>>>> functionality of the library (for example >>>>>>>>>>>>>> rte_eth_promiscuous_enable() does not call the driver if >>>>>>>>>>>>>> dev->data->promiscuous is set). >>>>>>>>>>>>>> >>>>>>>>>>>>>> Question: Since all of the configuration is available before >>>>>>>>>>>>>> .dev_start() callback is called, why ethdev library does not >>>>>>>>>>>>>> expect .dev_start() to >>>>>>>>>>>>> take this configuration into account? >>>>>>>>>>>>>> In other words, why library has to reapply the configuration? >>>>>>>>>>>>>> >>>>>>>>>>>>>> I could not find any particular reason why configuration >>>>>>>>>>>>>> restore exists as part of the process (it was in the initial DPDK >>>> commit). >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> My assumption is .dev_stop() cause these values reset in some >>>>>>>>>>>>> devices, so >>>>>>>>>>>>> .dev_start() restores them back. >>>>>>>>>>>>> @Bruce or @Konstantin may remember the history. >>>>>>>>>>> >>>>>>>>>>> Yep, as I remember, at least some Intel PMDs calling hw_reset() >>>>>>>>>>> ad >>>>>>>>>>> dec_stop() and even dev_start() to make sure that HW is in a >>>>>>>>>>> clean >>>>>>>>>>> (known) >>>>>>>> state. >>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> But I agree this is device specific behavior, and can be >>>>>>>>>>>>> managed by what device requires. >>>>>>>>>>> >>>>>>>>>>> Probably yes. >>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> The patches included in this RFC, propose a mechanism which >>>>>>>>>>>>>> would help with managing which drivers rely on forceful >>>>>>>>>>>>>> configuration >>>>>> restore. >>>>>>>>>>>>>> Drivers could advertise if forceful configuration restore is >>>>>>>>>>>>>> needed through `RTE_ETH_DEV_*_FORCE_RESTORE` device flag. If >>>>>>>>>>>>>> this flag is set, then the driver in question requires ethdev >>>>>>>>>>>>>> to forcefully restore >>>>>>>>>>>>> configuration. >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> OK to use flag for it, but not sure about using >>>>>>>>>>>>> 'dev_info->dev_flags' >>>>>>>>>>>>> (RTE_ETH_DEV_*) for this, as this flag is shared with user and >>>>>>>>>>>>> this is all dpdk internal. >>>>>>>>>>>>> >>>>>>>>>>>>> What about to have a dedicated flag for it? We can have a >>>>>>>>>>>>> dedicated set of flag values for restore. >>>>>>>>>>>> >>>>>>>>>>>> Agreed. What do you think about the following? >>>>>>>>>>> >>>>>>>>>>> Instead of exposing that, can we probably make it transparent to >>>>>>>>>>> the user and probably ethdev layer too? >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> +1 to make it transparent to user, but not sure if we can make it >>>>>>>>>> transparent to ethdev layer. >>>>>>>>> >>>>>>>>> Just to be clear: >>>>>>>>> Let say, using example from above: >>>>>>>>> >>>>>>>>> rte_eth_dev_start() >>>>>>>>> - Call dev->dev_ops->dev_start() >>>>>>>>> - Call dev->dev_ops->mac_addr_set() - apply default MAC address >>>>>>>>> - Call dev->dev_ops->promiscuous_enable() >>>>>>>>> - Call dev->dev_ops->allmulticast_enable() >>>>>>>>> >>>>>>>>> We probably can introduce ethdev internal function (still visible >>>>>>>>> to >>>>>>>>> PMDs) that would do last 3 steps: >>>>>>>>> ethdev_replay_user_conf(...) >>>>>>>>> - Call dev->dev_ops->mac_addr_set() - apply default MAC address >>>>>>>>> - Call dev->dev_ops->promiscuous_enable() >>>>>>>>> - Call dev->dev_ops->allmulticast_enable() >>>>>>>>> >>>>>>>>> And let PMD itself to decide does it needs to call it at dev_start() >>>>>>>>> or not. >>>>>>>>> So it will become: >>>>>>>>> rte_eth_dev_start() >>>>>>>>> - Call dev->dev_ops->dev_start() >>>>>>>>> -Call ethdev_replay_user_conf(.) >>>>>>>>> - Call dev->dev_ops->mac_addr_set() - apply default MAC >>>>>>>>> address >>>>>>>>> - Call dev->dev_ops->promiscuous_enable() >>>>>>>>> -Call dev->dev_ops->allmulticast_enable() >>>>>>>>> >>>>>>>>> For PMDs that do need to restore user provided config And >>>>>>>>> rte_eth_dev_start() >>>>>>>>> - Call dev->dev_ops->dev_start() >>>>>>>>> >>>>>>>>> For those who do not. >>>>>>>>> >>>>>>>> >>>>>>>> OK, got it what you mean. >>>>>>>> Pushing restore functionality to PMDs works, but this may be doing >>>>>>>> redundant work on each PMD. >>>>>>>> >>>>>>>> Instead Dariusz suggests PMD to provide a flag to ehtdev to what to >>>>>>>> restore and common code in ethdev does the work. >>>>>>>> My below dedicated data struct comment is to have this flag in a >>>>>>>> new struct, overall like following: >>>>>>>> >>>>>>>> rte_eth_dev_start() >>>>>>>> - Call dev->dev_ops->dev_start() >>>>>>>> - Call dev->dev_ops->get_restore_flags(ethdev, RTE_ETH_START, >>>>>>>> &flags) >>>>>>>> - if (flags & MAC) dev->dev_ops->mac_addr_set() >>>>>>>> - if (flags & PROMISC) dev->dev_ops->promiscuous_enable() >>>>>>>> - ... >>>>>>> >>>>>>> Could you please explain what is the benefit of exposing flags >>>>>>> through dev_ops >>>>>> callback vs a dedicated flags field in rte_eth_dev_data? >>>>>>> In both solutions: >>>>>>> - config restore is transparent to the user, >>>>>>> - drivers can omit config restore (either by not implementing the >>>>>>> callback or not providing the flags), >>>>>>> - an ABI change is introduced (not a huge concern, at least for 24.11). >>>>>>> >>>>>>> I understand that my initial proposal with "internal_flags" was too >>>>>>> vague, but renaming and splitting this field into: >>>>>>> >>>>>>> - dev_start_restore_flags >>>>>>> - dev_reset_restore_flags >>>>>>> - and so on... >>>>>>> >>>>>>> seems sufficient, at least in my opinion. >>>>>>> >>>>>> >>>>>> Hi Dariusz, >>>>>> >>>>>> Putting flags to rte_eth_dev_data works, and it is easier since there >>>>>> is direct access from rte_eth_dev to rte_eth_dev_data, so you don't >>>>>> need new dev_ops. So this is a valid option. >>>>>> >>>>>> But benefit of new dev_ops is to keep "struct rte_eth_dev_data" clean. >>>>>> >>>>>> "struct rte_eth_dev_data" is integral data structure for ethdev and >>>>>> it is used in multiple locations, mostly related to the datapath and >>>>>> all drivers needs to deal with fields of this struct. >>>>>> Like [rx]_queues, dev_private, dev_conf all important and used a lot. >>>>>> >>>>>> I want to protect "struct rte_eth_dev_data" from noise as much as >>>>>> possible, though what is noise is not always that clear. >>>>>> >>>>>> This restore flag is not critical, and I expect most of the drivers >>>>>> won't care and populate this restore flag at all. That is why to me >>>>>> it is better have dedicated struct for it and only drivers care about >>>>>> restore >>>> feature know it. >>>>> >>>>> I see. Thank you very much for the explanation. >>>>> >>>>> In this case, it looks like adding this to dev_ops is the way to go. >>>>> >>>>> So, summarizing it all: >>>>> >>>>> 1. dev_ops should be extended with a callback with the following >>>>> signature and >>>> enums/flags: >>>>> >>>>> enum rte_eth_dev_operation op { >>>>> RTE_ETH_START, >>>>> RTE_ETH_STOP, >>>>> RTE_ETH_RESET, >>>>> }; >>>>> >>>>> #define RTE_ETH_RESTORE_MAC_ADDR RTE_BIT32(0) #define >>>>> RTE_ETH_RESTORE_PROMISC RTE_BIT32(1) #define >>>> RTE_ETH_RESTORE_ALLMULTI >>>>> RTE_BIT32(2) >>>>> >>>>> void (*get_restore_flags)( >>>>> struct rte_eth_dev *dev, >>>>> enum rte_eth_dev_operation op, >>>>> uint32_t *flags); >>>>> >>>>> 2. rte_eth_dev_start() will work as follows: >>>>> >>>>> - Call dev->dev_ops->dev_start() >>>>> - Call dev->dev_ops->get_restore_flags(dev, RTE_ETH_START, &flags). If >>>>> callback >>>> is not provided, assume flags == 0. >>>>> - if (flags & RTE_ETH_RESTORE_MAC_ADDR) - restore MAC addresses >>>>> - and so on... >>>>> >>>> >>>> All above looks good. >>>> >>>>> Also, I would like to add the following: >>>>> >>>>> 3. Patchset introducing this change should add get_restore_flags() >>>> implementation to all drivers, which informs that all config should be >>>> restored. >>>>> This would preserve the current behavior. >>>>> Later, this could be refined driver by driver. >>>>> >>>>> What do you think? >>>>> >>>> >>>> What you are saying is correct, but I suspect most of the drivers don't >>>> really need >>>> this restore, but they have it since it was in the ethdev layer. >>>> >>>> If we introduce back restore via get_restore_flags(), it may stay as it is >>>> in drivers, at >>>> least for most of them. >>>> >>>> What do you think to risk breaking stuff for this case. >>>> >>>> So don't implement this in the drivers by default, so who needs it will >>>> recognize >>>> the issue and will implement it. If we merge this patch for -rc1, it gives >>>> enough >>>> time for drivers to detect the issue and fix it. >>> >>> It seems rather too risky, especially considering that for example, there >>> are a few Intel drivers which do not have maintainers (like >> i40e). >>> So, I don't know what will happen to such drivers. They may be left broken >>> (if they are affected) for 24.11 and future releases. >>> But I agree that if default behavior is preserved, this dependence of >>> drivers on config restore might stay as is. >>> I'm on the fence about it. >>> >> >> Yeah, not sure. >> >> Do you think the dev_ops function can be implemented in the >> 'ethdev_driver.c' and all drivers use exact same function? >> So this reduces changes and duplication in drivers while preserving the >> behavior. > > Wonder why it can't be done visa-versa? > If this new function is not implemented by PMD, then simply assume > that everything needs to be restored (as it happens now)? >
True, this preserve the behavior and prevents updating all drivers. I think can be done in two ways: 1. dev_ops() tells what NOT to restore, if dev_ops() not implemented or it returns zero, restore everything. Here what not nice is what to restore in this case is not very well defined. 2. dev_ops() tells what to restore, but when not implemented default flag value is all restore. Which is current behavior. I am for option 2. and I assume that is what Konstantin also suggest, but I want to clarify to be sure. >>>> >>>> Only we may implement this to the drivers that exist when this restore >>>> code was >>>> introduced. >>>> I mean whatever driver exist in the initial DPDK commit, implement this >>>> logic only >>>> to those drivers. >>> >>> Seems reasonable to me. In this case, it would be igb (IIUC, now it's named >>> e1000) and ixgbe. >>> >>>> >>>>> Also, there's an open question about 'stop' and 'reset' operations. >>>>> At the moment, ethdev layer does not do any config manipulation during >>>>> these >>>> operations. >>>>> Maybe we should limit get_restore_flags() to 'start' only? >>>>> >>>> >>>> Ack, I was about to suggest the same, for now only have 'RTE_ETH_START' >>>> as a placeholder for later possible usages. >>> >>> Ack >>> >>>> >>>>>> >>>>>> >>>>>> >>>>>>>> >>>>>>>> So PMDs only will provide what to restore with an internal API and >>>>>>>> common ethdev layer will restore it. >>>>>>>> If no restore required PMD may not implement .get_restore_flags() at >>>>>>>> all. >>>>>>>> >>>>>>>> Additionally, RTE_ETH_START, RTE_ETH_RESET etc flag can be provided >>>>>>>> to internal API to get what to restore in different states... >>>>>>>> >>>>>>>>>> Suggested 'internal_flag' in "struct rte_eth_dev_data" can be >>>>>>>>>> confusing and open to interpretation what to use it for and by >>>>>>>>>> time become source of defect. >>>>>>>>> >>>>>>>>> Yes, same thoughts. >>>>>>>>> >>>>>>>>>> Instead what do you think to have a separate, dedicated data struct >>>>>>>>>> for it? >>>>>>>>> >>>>>>>>> Hmm... not sure I understood you here... >>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Might be we can move this restoration code into the new ethdev >>>>>>>>>>> helper function,(ethdevd_user_config_restore() or so) that PMD >>>>>>>>>>> can invoke >>>>>>>> during its dev_start() if needed? >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> #define RTE_ETH_DEV_INTERNAL_PROMISC_FORCE_RESTORE >>>>>> RTE_BIT32(0) >>>>>>>>>>>> #define RTE_ETH_DEV_INTERNAL_ALLMULTI_FORCE_RESTORE >>>>>> RTE_BIT32(1) >>>>>>>>>>>> #define RTE_ETH_DEV_INTERNAL_MAC_ADDR_FORCE_RESTORE >>>>>>>> RTE_BIT32(2) >>>>>>>>>>>> >>>>>>>>>>>> struct rte_eth_dev_data { >>>>>>>>>>>> /* snip */ >>>>>>>>>>>> >>>>>>>>>>>> uint32_t dev_flags; >>>>>>>>>>>> >>>>>>>>>>>> /** >>>>>>>>>>>> * Internal device capabilities, used only by ethdev library. >>>>>>>>>>>> * Certain functionalities provided by the library might >>>>>> enabled/disabled, >>>>>>>>>>>> * based on driver exposing certain capabilities. >>>>>>>>>>>> */ >>>>>>>>>>>> uint32_t internal_flags; >>>>>>>>>>>> >>>>>>>>>>>> /* snip */ >>>>>>>>>>>> }; >>>>>>>>>>>> >>>>>>>>>>>>> Also perhaps we have go into details what needs to be restored >>>>>>>>>>>>> after 'stop' and what needs to be restored after 'reset' and >>>>>>>>>>>>> use similar >>>>>>>> mechanism etc... >>>>>>>>>>>> >>>>>>>>>>>> I think we should look into that. >>>>>>>>>>>> Any 'codification' of semantics between drivers and ethdev >>>>>>>>>>>> library is good in >>>>>>>> my opinion. >>>>>>>>>>>> >>>>>>>>>>>> At least right now, ethdev does not change any configuration in >>>>>>>>>>>> 'stop' and >>>>>>>> 'reset' from what I see. >>>>>>>>>>>> But that's on library side only. >>>>>>>>>>>> >>>>>>>>>>>>>> This way, if we would conclude that it makes sense for >>>>>>>>>>>>>> .dev_start() to handle all starting configuration aspects, we >>>>>>>>>>>>>> could track which drivers still rely >>>>>>>>>>>>> on configuration restore. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Dariusz Sosnowski (4): >>>>>>>>>>>>>> ethdev: rework config restore >>>>>>>>>>>>>> ethdev: omit promiscuous config restore if not required >>>>>>>>>>>>>> ethdev: omit all multicast config restore if not required >>>>>>>>>>>>>> ethdev: omit MAC address restore if not required >>>>>>>>>>>>>> >>>>>>>>>>>>>> lib/ethdev/rte_ethdev.c | 39 >>>>>>>>>>>>>> ++++++++++++++++++++++++++++++++++----- >>>>>>>>>>>>>> lib/ethdev/rte_ethdev.h | 18 ++++++++++++++++++ >>>>>>>>>>>>>> 2 files changed, 52 insertions(+), 5 deletions(-) >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> 2.39.5 >>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>> >>>>> >>> >