Fri, Dec 15, 2023 at 12:06:52PM CET, pab...@redhat.com wrote:
>On Thu, 2023-12-14 at 17:46 -0800, Jakub Kicinski wrote:
>> On Thu, 14 Dec 2023 21:29:51 +0100 Paolo Abeni wrote:
>> > Together with Simon, I spent some time on the above. We think the
>> > ndo_setup_tc(TC_SETUP_QDISC_TBF) hook could be used as a common basis
>> > for these offloads, with some small extensions (adding a 'max_rate'
>> > param, too).
>>
>> uAPI aside, why would we use ndo_setup_tc(TC_SETUP_QDISC_TBF)
>> to implement the common basis?
>>
>> Is it not cleaner to have a separate driver API, with its own ops
>> and capabilities?
>
>We understand one of the end goals is consolidating the existing
>rate-related in-kernel interfaces. Adding a new one does not feel like a
>good starting point to reach that goal, see [1] & [2] ;). ndo_setup_tc()
>feels like the natural choice for H/W offload, and TBF is the existing
>interface IMHO nearest to the requirements here.
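
For illustration, the existing TBF offload already passes the committed
rate and burst via struct tc_tbf_qopt_offload (include/net/pkt_cls.h); a
rough sketch of the kind of small extension described above might look as
follows. The max_rate field and the foo_* driver helpers are assumptions
made up for the sketch, not existing kernel code.

struct tc_tbf_qopt_offload_replace_params {
	struct psched_ratecfg rate;	/* committed rate (existing) */
	u32 max_size;			/* burst size, bytes (existing) */
	struct gnet_stats_queue *qstats;
	struct psched_ratecfg max_rate;	/* new: ceiling, 0 == unlimited */
};

/* Hypothetical driver side, dispatched from .ndo_setup_tc();
 * foo_hw_set_shaper()/foo_hw_del_shaper() stand in for whatever the
 * driver uses to program its scheduler.
 */
static int foo_setup_tc_tbf(struct net_device *dev,
			    struct tc_tbf_qopt_offload *qopt)
{
	switch (qopt->command) {
	case TC_TBF_REPLACE:
		/* program committed + max rate into the HW shaper for the
		 * queue/class identified by qopt->parent
		 */
		return foo_hw_set_shaper(dev, qopt->parent,
					 qopt->replace_params.rate.rate_bytes_ps,
					 qopt->replace_params.max_rate.rate_bytes_ps);
	case TC_TBF_DESTROY:
		return foo_hw_del_shaper(dev, qopt->parent);
	default:
		return -EOPNOTSUPP;
	}
}
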
>
>The devlink rate API could be a possible alternative...

Again, devlink rate was introduced for rate configuration of entities
that are not represented (by a netdev, for example) on the host. If we
have a netdev, let's use it.

>
>> > The idea would be:
>> > - 'fixing' sch_tbf so that the s/w path becomes a no-op when h/w
>> >   offload is enabled
>> > - extend sch_tbf to support max rate
>> > - do the relevant ice implementation
>> > - ndo_set_tx_maxrate could be replaced with the mentioned ndo call
>> >   (the latter interface is a strict super-set of the former)
>> > - ndo_set_vf_rate could also be replaced with the mentioned ndo call
>> >   (with another small extension to the offload data)
>> >
>> > I think mqprio deserves its own separate offload interface, as it
>> > covers multiple tasks other than shaping (grouping queues and mapping
>> > priorities to classes)
>> >
>> > In the long run we could have a generic implementation of
>> > ndo_setup_tc(TC_SETUP_QDISC_TBF) in terms of devlink rate, adding a
>> > generic way to fetch the devlink_port instance corresponding to the
>> > given netdev and mapping the TBF features to the devlink_rate API.
>> >
>> > Not starting this due to what Jiri mentioned [1].
>>
>> Jiri, AFAIU, is against using devlink rate *uAPI* to configure network
>> rate limiting. That's separate from the internal representation.
>
>... with a couple of caveats:
>
>1) AFAICS devlink (and/or devlink_port) does not have a fine-grained,
>per-queue representation, and Intel wants to be able to configure
>shaping on a per-queue basis. I think/hope we don't want to bring the
>discussion to extending the devlink interface with queue support; I fear
>that would block us for a long time. Perhaps I'm missing or
>misunderstanding something here. Otherwise, in retrospect, this looks
>like a reasonable point to avoid devlink here entirely.
>
>2) My understanding of Jiri's statement was more restrictive. @Jiri, it
>would be great if you could share your genuine interpretation: are you
>ok with using the devlink_port rate API as a basis to replace
>ndo_set_tx_maxrate() (via dev->devlink_port->devlink->) and possibly

Does not make any sense to me.

>ndo_set_vf_rate(). Note that, given the previous point, this option would

ndo_set_vf_rate() (and the rest of the ndo_[gs]et_vf_*() ndos) is the
legacy way. Devlink rate replaced it when the switchdev eswitch mode is
configured by:

$ sudo devlink dev eswitch set pci/0000:08:00.1 mode switchdev

In drivers, ndo_set_vf_rate() and devlink rate are implemented in the
same way. See mlx5 for example:
mlx5_esw_qos_set_vport_rate()
mlx5_esw_devlink_rate_leaf_tx_share_set()
(a schematic sketch of this pattern is appended at the end of this
message)

>still feel problematic.
>
>Cheers,
>
>Paolo
>
>[1] https://xkcd.com/927/
>[2] https://www.youtube.com/watch?v=f8kO_L-pDwo
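
To make the point above concrete -- that the legacy VF ndo and the
devlink rate op usually funnel into one low-level driver helper -- here
is a schematic sketch with a made-up "foo" driver. Every foo_* name is
invented for illustration; only the ndo_set_vf_rate() and devlink_ops
rate_leaf_tx_max_set() prototypes are meant to match the real kernel
ones, and the mlx5 functions named earlier are an actual implementation
of this pattern.

#include <linux/netdevice.h>
#include <net/devlink.h>

struct foo_eswitch;			/* hypothetical per-PF eswitch state */
struct foo_vport {			/* hypothetical per-VF/vport state */
	struct foo_eswitch *esw;
	u16 index;
};

/* common low-level helper: program the per-vport TX ceiling, in bytes/s */
int foo_hw_set_vport_tx_max(struct foo_eswitch *esw, u16 vport, u64 bytes_ps);
struct foo_eswitch *foo_netdev_to_eswitch(struct net_device *dev);

/* legacy path: "ip link set <pf> vf <n> max_tx_rate <Mbps>"
 * (min_tx_rate / tx_share handling elided for brevity)
 */
static int foo_ndo_set_vf_rate(struct net_device *dev, int vf,
			       int min_tx_rate, int max_tx_rate)
{
	/* ndo_set_vf_rate() rates are in Mbit/s; normalize to bytes/s */
	return foo_hw_set_vport_tx_max(foo_netdev_to_eswitch(dev), vf,
				       (u64)max_tx_rate * 1000000 / 8);
}

/* switchdev path: "devlink port function rate set ... tx_max <rate>" */
static int foo_rate_leaf_tx_max_set(struct devlink_rate *rate_leaf, void *priv,
				    u64 tx_max, struct netlink_ext_ack *extack)
{
	struct foo_vport *vport = priv;

	/* devlink rate already hands us bytes/s */
	return foo_hw_set_vport_tx_max(vport->esw, vport->index, tx_max);
}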