> From: Olivier Matz [mailto:olivier.m...@6wind.com] > Sent: Friday, November 6, 2020 11:05 AM > > On Fri, Nov 06, 2020 at 09:50:45AM +0100, Morten Brørup wrote: > > > From: Olivier Matz [mailto:olivier.m...@6wind.com] > > > Sent: Friday, November 6, 2020 9:21 AM > > > > > > On Fri, Nov 06, 2020 at 08:52:58AM +0100, Morten Brørup wrote: > > > > > From: Ananyev, Konstantin [mailto:konstantin.anan...@intel.com] > > > > > Sent: Friday, November 6, 2020 12:55 AM > > > > > > > > > > > > > > > > > > >> Hi Olivier, > > > > > > > > > > > > > >> > > > > > > > > > > > > > >>> m->nb_seg must be reset on mbuf free > whatever > > > the > > > > > value > > > > > > > of m->next, > > > > > > > > > > > > > >>> because it can happen that m->nb_seg is != > 1. > > > For > > > > > > > instance in this > > > > > > > > > > > > > >>> case: > > > > > > > > > > > > > >>> > > > > > > > > > > > > > >>> m1 = rte_pktmbuf_alloc(mp); > > > > > > > > > > > > > >>> rte_pktmbuf_append(m1, 500); > > > > > > > > > > > > > >>> m2 = rte_pktmbuf_alloc(mp); > > > > > > > > > > > > > >>> rte_pktmbuf_append(m2, 500); > > > > > > > > > > > > > >>> rte_pktmbuf_chain(m1, m2); > > > > > > > > > > > > > >>> m0 = rte_pktmbuf_alloc(mp); > > > > > > > > > > > > > >>> rte_pktmbuf_append(m0, 500); > > > > > > > > > > > > > >>> rte_pktmbuf_chain(m0, m1); > > > > > > > > > > > > > >>> > > > > > > > > > > > > > >>> As rte_pktmbuf_chain() does not reset > nb_seg in > > > the > > > > > > > initial m1 > > > > > > > > > > > > > >>> segment (this is not required), after this > code > > > the > > > > > > > mbuf chain > > > > > > > > > > > > > >>> have 3 segments: > > > > > > > > > > > > > >>> - m0: next=m1, nb_seg=3 > > > > > > > > > > > > > >>> - m1: next=m2, nb_seg=2 > > > > > > > > > > > > > >>> - m2: next=NULL, nb_seg=1 > > > > > > > > > > > > > >>> > > > > > > > > > > > > > >>> Freeing this mbuf chain will not restore > > > nb_seg=1 > > > > > in > > > > > > > the second > > > > > > > > > > > > > >>> segment. > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> Hmm, not sure why is that? > > > > > > > > > > > > > >> You are talking about freeing m1, right? > > > > > > > > > > > > > >> rte_pktmbuf_prefree_seg(struct rte_mbuf *m) > > > > > > > > > > > > > >> { > > > > > > > > > > > > > >> ... > > > > > > > > > > > > > >> if (m->next != NULL) { > > > > > > > > > > > > > >> m->next = NULL; > > > > > > > > > > > > > >> m->nb_segs = 1; > > > > > > > > > > > > > >> } > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> m1->next != NULL, so it will enter the if() > > > block, > > > > > > > > > > > > > >> and will reset both next and nb_segs. > > > > > > > > > > > > > >> What I am missing here? > > > > > > > > > > > > > >> Thinking in more generic way, that change: > > > > > > > > > > > > > >> - if (m->next != NULL) { > > > > > > > > > > > > > >> - m->next = NULL; > > > > > > > > > > > > > >> - m->nb_segs = 1; > > > > > > > > > > > > > >> - } > > > > > > > > > > > > > >> + m->next = NULL; > > > > > > > > > > > > > >> + m->nb_segs = 1; > > > > > > > > > > > > > > > > > > > > > > > > > > > > Ah, sorry. I oversimplified the example and > now > > > it > > > > > does > > > > > > > not > > > > > > > > > > > > > > show the issue... > > > > > > > > > > > > > > > > > > > > > > > > > > > > The full example also adds a split() to break > the > > > > > mbuf > > > > > > > chain > > > > > > > > > > > > > > between m1 and m2. The kind of thing that > would > > > be > > > > > done > > > > > > > for > > > > > > > > > > > > > > software TCP segmentation. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > If so, may be the right solution is to care > about > > > > > nb_segs > > > > > > > > > > > > > when next is set to NULL on split? Any place > when > > > next > > > > > is > > > > > > > set > > > > > > > > > > > > > to NULL. Just to keep the optimization in a > more > > > > > generic > > > > > > > place. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The problem with that approach is that there are > > > already > > > > > > > several > > > > > > > > > > > > existing split() or trim() implementations in > > > different > > > > > DPDK- > > > > > > > based > > > > > > > > > > > > applications. For instance, we have some in > > > 6WINDGate. If > > > > > we > > > > > > > force > > > > > > > > > > > > applications to set nb_seg to 1 when resetting > next, > > > it > > > > > has > > > > > > > to be > > > > > > > > > > > > documented because it is not straightforward. > > > > > > > > > > > > > > > > > > > > > > I think it is better to go that way. > > > > > > > > > > > From my perspective it seems natural to reset > nb_seg at > > > > > same > > > > > > > time > > > > > > > > > > > we reset next, otherwise inconsistency will occur. > > > > > > > > > > > > > > > > > > > > While it is not explicitly stated for nb_segs, to me > it > > > was > > > > > clear > > > > > > > that > > > > > > > > > > nb_segs is only valid in the first segment, like for > many > > > > > fields > > > > > > > (port, > > > > > > > > > > ol_flags, vlan, rss, ...). > > > > > > > > > > > > > > > > > > > > If we say that nb_segs has to be valid in any > segments, > > > it > > > > > means > > > > > > > that > > > > > > > > > > chain() or split() will have to update it in all > > > segments, > > > > > which > > > > > > > is not > > > > > > > > > > efficient. > > > > > > > > > > > > > > > > > > Why in all? > > > > > > > > > We can state that nb_segs on non-first segment should > > > always > > > > > equal > > > > > > > 1. > > > > > > > > > As I understand in that case, both split() and chain() > have > > > to > > > > > > > update nb_segs > > > > > > > > > only for head mbufs, rest ones will remain untouched. > > > > > > > > > > > > > > > > Well, anyway, I think it's strange to have a constraint > on m- > > > > > >nb_segs > > > > > > > for > > > > > > > > non-first segment. We don't have that kind of constraints > for > > > > > other > > > > > > > fields. > > > > > > > > > > > > > > True, we don't. But this is one of the fields we consider > > > critical > > > > > > > for proper work of mbuf alloc/free mechanism. > > > > > > > > > > > > > > > > > > > I am not sure that requiring m->nb_segs == 1 on non-first > > > segments > > > > > will provide any benefits. > > > > > > > > > > It would make this patch unneeded. > > > > > So, for direct, non-segmented mbufs pktmbuf_free() will remain > > > write- > > > > > free. > > > > > > > > I see. Then I agree with Konstantin that alternative solutions > should > > > be considered. > > > > > > > > The benefit regarding free()'ing non-segmented mbufs - which is a > > > very common operation - certainly outweighs the cost of requiring > > > split()/chain() operations to set the new head mbuf's nb_segs = 1. > > > > > > > > Nonetheless, the bug needs to be fixed somehow. > > > > > > > > If we can't come up with a better solution that doesn't break the > > > ABI, we are forced to accept the patch. > > > > > > > > Unless the techboard accepts to break the ABI in order to avoid > the > > > performance cost of this patch. > > > > > > Did someone notice a performance drop with this patch? > > > On my side, I don't see any regression on a L3 use case. > > > > I am afraid that the DPDK performance regression tests are based on > TX immediately following RX, so cache misses in TX may go by unnoticed > because RX warmed up the cache for TX already. And similarly for RX > reusing mbufs that have been warmed up by the preceding free() at TX. > > > > Please consider testing the performance difference with the mbuf > being completely cold at TX, and going completely cold again before > being reused for RX. > > > > > > > > Let's sumarize: splitting a mbuf chain and freeing it causes > subsequent > > > mbuf > > > allocation to return a mbuf which is not correctly initialized. > There > > > are 2 > > > options to fix it: > > > > > > 1/ change the mbuf free function (this patch) > > > > > > - m->nb_segs would behave like many other field: valid in the > first > > > segment, ignored in other segments > > > - may impact performance (suspected) > > > > > > 2/ change all places where a mbuf chain is split, or trimmed > > > > > > - m->nb_segs would have a specific behavior: count the number of > > > segments in the first mbuf, should be 1 in the last segment, > > > ignored in other ones. > > > - no code change in mbuf library, so no performance impact > > > - need to patch all places where we do a mbuf split or trim. > From > > > afar, > > > I see at least mbuf_cut_seg_ofs() in DPDK. Some external > > > applications > > > may have to be patched (for instance, I already found 3 places > in > > > 6WIND code base without a deep search). > > > > > > In my opinion, 1/ is better, except we notice a significant > > > performance, > > > because the (implicit) behavior is unchanged. > > > > > > Whatever the solution, some documentation has to be added. > > > > > > Olivier > > > > > > > Unfortunately, I don't think that anything but the first option will > go into 20.11 and stable releases of older versions, so I stand by my > acknowledgment of the patch. > > If we are affraid about 20.11 performance (it is legitimate, few days > before the release), we can target 21.02. After all, everybody lives > with this bug since 2017, so there is no urgency. If accepted and well > tested, it can be backported in stable branches.
+1 Good thinking, Olivier!