> From: Konstantin Ananyev [mailto:konstantin.v.anan...@yandex.ru] > Sent: Saturday, 23 July 2022 20.25 > > 23/07/2022 09:24, Morten Brørup пишет: > > +CC: i40e maintainers > > +CC: mlx5 maintainers > > > >> From: Konstantin Ananyev [mailto:konstantin.v.anan...@yandex.ru] > >> Sent: Saturday, 23 July 2022 00.35 > >> > >> 22/07/2022 17:14, Morten Brørup пишет: > >>> From: Huichao Cai [mailto:chcch...@163.com] > >>> Sent: Friday, 22 July 2022 17.59 > >>> > >>>> At 2022-07-22 23:52:28, "Morten Brørup" <m...@smartsharesystems.com> > >> wrote: > >>>>>> From: Stephen Hemminger [mailto:step...@networkplumber.org] > >>>>>> Sent: Friday, 22 July 2022 16.49 > >>>>>> > >>>>>> On Fri, 22 Jul 2022 21:01:50 +0800 > >>>>>> Huichao Cai <chcch...@163.com> wrote: > >>>>>> > >>>>>>> Some NIC drivers support MBUF_FAST_FREE(Device supports > >> optimization > >>>>>>> for fast release of mbufs. When set application must guarantee > >> that > >>>>>>> per-queue all mbufs comes from the same mempool and has refcnt > = > >> 1) > >>>>>>> offload. In order to adapt to this offload function, add this > >> API. > >>>>>>> Add some test data for this API. > >>>>>>> > >>>>>>> Signed-off-by: Huichao Cai <chcch...@163.com> > >>>>>> > >>>>>> The code should just be checking that refcnt == 1 directly. > >>>>>> > >>>>>> There are cases where sender passes a cloned mbuf. This is > >> independent > >>>>>> of the fast free optimization. > >>>>>> > >>>>>> Similar to what Linux kernel does with skb_cow(). > >>>>> > >>>>> Olivier just confirmed that MBUF_FAST_FREE requires that the > mbufs > >> are direct and non-segmented, although these requirements are not > yet > >> documented. > >>>>> > >>>>> This means that you should not generate segmented mbufs with this > >> patch. I don't know what to do instead; probably fail with an > >> appropriate errno. > >>>> > >>>> When the bnxt driver sends mbuf, it will take the mbuf segments > >> apart and hang it to the tx_buf_ring, so there is no mbuf segments > when > >> it is released. Does this mean that there can be mbuf segments? > >>> > >>> Only if the bnxt driver also resets the segmentation fields > (nb_segs > >> and next) in those mbufs, which I suppose it does, if it supports > >> MBUF_FAST_FREE with segmented packets. > >>> > >>> However, other Ethernet drivers don't do that, so a generic library > >> function cannot rely on it. These missing requirements for > >> MBUF_FAST_FREE is a bug, either in the MBUF_FAST_FREE documentation, > or > >> in the drivers where MBUF_FAST_FREE only works correctly with direct > >> and non-segmented mbufs. > >>> > >> > >> I believe multi-segment packets work ok with MBUF_FAST_FREE > >> (as long as other requirements are met). > > > > Looking at the i40e and mlx5 drivers, they both seem to call > rte_mempool_put_bulk() without first calling rte_pktmbuf_prefree_seg(). > So segmented packets freed with MBUF_FAST_FREE, will be stored in the > mbuf pool without m->nb_segs and m->next being reset first. > > > > I don't have deep knowledge of these drivers, so maybe I have > overlooked something. > > > > The point of MBUF_FAST_FREE is to bypass a lot of code under certain > conditions. So I believe that these two undocumented requirements > should remain, so the drivers can bypass this code. Otherwise, don't > use MBUF_FAST_FREE. > > > > Actually, after another look, I think you and Olivier are right - > multi-seg packets should not be used together with MBUF_FAST_FREE. > I forgot that mbuf_prefree() is responsible to reset both 'next' > and 'nb_segs' fields of the mbuf. > It might keep working for some simple forwarding app (like l3fwd), > as most PMDs reset these fields at RX path anyway, but that's just a > coincidence we shouldn't rely on.
I hope the PMDs don't reset these fields in their RX path, unless they are creating multi-seg packets and therefore must. It might cause an extra cache miss per packet, if the PMD unnecessarily sets m->next, which is in the second cache line of the mbuf. Or perhaps everyone has forgotten about this RX/TX split of the first/second cache line of the mbufs, because all tests are based on run-to-completion, where the second cache line will be written shortly afterwards anyway. :-( > We probably need to update l3fwd (and other examples) to dis-allow > MBUF_FAST_FREE when TX_OFFLOAD_MULTI_SEGS is selected. +1 > > Konstantin