On Thu, Nov 05, 2020 at 01:14:07PM +0000, Ananyev, Konstantin wrote: > > > > > > On Thu, Nov 05, 2020 at 11:34:18AM +0000, Ananyev, Konstantin wrote: > > > > > > > > > > On Thu, Nov 05, 2020 at 11:26:51AM +0300, Andrew Rybchenko wrote: > > > > > On 11/5/20 10:46 AM, Olivier Matz wrote: > > > > > > On Thu, Nov 05, 2020 at 12:15:49AM +0000, Ananyev, Konstantin wrote: > > > > > >> > > > > > >> Hi Olivier, > > > > > >> > > > > > >>> m->nb_seg must be reset on mbuf free whatever the value of > > > > > >>> m->next, > > > > > >>> because it can happen that m->nb_seg is != 1. For instance in this > > > > > >>> case: > > > > > >>> > > > > > >>> m1 = rte_pktmbuf_alloc(mp); > > > > > >>> rte_pktmbuf_append(m1, 500); > > > > > >>> m2 = rte_pktmbuf_alloc(mp); > > > > > >>> rte_pktmbuf_append(m2, 500); > > > > > >>> rte_pktmbuf_chain(m1, m2); > > > > > >>> m0 = rte_pktmbuf_alloc(mp); > > > > > >>> rte_pktmbuf_append(m0, 500); > > > > > >>> rte_pktmbuf_chain(m0, m1); > > > > > >>> > > > > > >>> As rte_pktmbuf_chain() does not reset nb_seg in the initial m1 > > > > > >>> segment (this is not required), after this code the mbuf chain > > > > > >>> have 3 segments: > > > > > >>> - m0: next=m1, nb_seg=3 > > > > > >>> - m1: next=m2, nb_seg=2 > > > > > >>> - m2: next=NULL, nb_seg=1 > > > > > >>> > > > > > >>> Freeing this mbuf chain will not restore nb_seg=1 in the second > > > > > >>> segment. > > > > > >> > > > > > >> Hmm, not sure why is that? > > > > > >> You are talking about freeing m1, right? > > > > > >> rte_pktmbuf_prefree_seg(struct rte_mbuf *m) > > > > > >> { > > > > > >> ... > > > > > >> if (m->next != NULL) { > > > > > >> m->next = NULL; > > > > > >> m->nb_segs = 1; > > > > > >> } > > > > > >> > > > > > >> m1->next != NULL, so it will enter the if() block, > > > > > >> and will reset both next and nb_segs. > > > > > >> What I am missing here? > > > > > >> Thinking in more generic way, that change: > > > > > >> - if (m->next != NULL) { > > > > > >> - m->next = NULL; > > > > > >> - m->nb_segs = 1; > > > > > >> - } > > > > > >> + m->next = NULL; > > > > > >> + m->nb_segs = 1; > > > > > > > > > > > > Ah, sorry. I oversimplified the example and now it does not > > > > > > show the issue... > > > > > > > > > > > > The full example also adds a split() to break the mbuf chain > > > > > > between m1 and m2. The kind of thing that would be done for > > > > > > software TCP segmentation. > > > > > > > > > > > > > > > > If so, may be the right solution is to care about nb_segs > > > > > when next is set to NULL on split? Any place when next is set > > > > > to NULL. Just to keep the optimization in a more generic place. > > > > > > > > > > The problem with that approach is that there are already several > > > > existing split() or trim() implementations in different DPDK-based > > > > applications. For instance, we have some in 6WINDGate. If we force > > > > applications to set nb_seg to 1 when resetting next, it has to be > > > > documented because it is not straightforward. > > > > > > I think it is better to go that way. > > > From my perspective it seems natural to reset nb_seg at same time > > > we reset next, otherwise inconsistency will occur. > > > > While it is not explicitly stated for nb_segs, to me it was clear that > > nb_segs is only valid in the first segment, like for many fields (port, > > ol_flags, vlan, rss, ...). > > > > If we say that nb_segs has to be valid in any segments, it means that > > chain() or split() will have to update it in all segments, which is not > > efficient. > > Why in all? > We can state that nb_segs on non-first segment should always equal 1. > As I understand in that case, both split() and chain() have to update nb_segs > only for head mbufs, rest ones will remain untouched.
Well, anyway, I think it's strange to have a constraint on m->nb_segs for non-first segment. We don't have that kind of constraints for other fields. > > > > > Saying that nb_segs has to be valid for the first and last segment seems > > really odd to me. What is the logic behind that except keeping this test > > as is? > > > > In any case, it has to be better documented. > > > > > > Olivier > > > > > > > > I think the approach from > > > > this patch is safer. > > > > > > It might be easier from perspective that changes in less places are > > > required, > > > Though I think that this patch will introduce some performance drop. > > > As now each mbuf_prefree_seg() will cause update of 2 cache lines > > > unconditionally. > > > > > > > By the way, for 21.11, if we are able to do some optimizations and have > > > > both pool (index?) and next in the first cache line, we may reconsider > > > > the fact that next and nb_segs are already set for new allocated mbufs, > > > > because it is not straightforward either. > > > > > > My suggestion - let's put future optimization discussion aside for now, > > > and concentrate on that particular patch. > > > > > > > > > > > > > After this operation, we have 2 mbuf chain: > > > > > > - m0 with 2 segments, the last one has next=NULL but nb_seg=2 > > > > > > - new_m with 1 segment > > > > > > > > > > > > Freeing m0 will not restore nb_seg=1 in the second segment. > > > > > > > > > > > >> Assumes that it is ok to have an mbuf with > > > > > >> nb_seg > 1 and next == NULL. > > > > > >> Which seems wrong to me. > > > > > > > > > > > > I don't think it is wrong: nb_seg is just ignored when not in the > > > > > > first > > > > > > segment, and there is nothing saying it should be set to 1. > > > > > > Typically, > > > > > > rte_pktmbuf_chain() does not change it, and I guess it's the same > > > > > > for > > > > > > many similar functions in applications. > > > > > > > > > > > > Olivier > > > > > > > > > > > >> > > > > > >> > > > > > >>> This is expected that mbufs stored in pool have their > > > > > >>> nb_seg field set to 1. > > > > > >>> > > > > > >>> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool") > > > > > >>> Cc: sta...@dpdk.org > > > > > >>> > > > > > >>> Signed-off-by: Olivier Matz <olivier.m...@6wind.com> > > > > > >>> --- > > > > > >>> lib/librte_mbuf/rte_mbuf.c | 6 ++---- > > > > > >>> lib/librte_mbuf/rte_mbuf.h | 12 ++++-------- > > > > > >>> 2 files changed, 6 insertions(+), 12 deletions(-) > > > > > >>> > > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.c > > > > > >>> b/lib/librte_mbuf/rte_mbuf.c > > > > > >>> index 8a456e5e64..e632071c23 100644 > > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.c > > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.c > > > > > >>> @@ -129,10 +129,8 @@ rte_pktmbuf_free_pinned_extmem(void *addr, > > > > > >>> void *opaque) > > > > > >>> > > > > > >>> rte_mbuf_ext_refcnt_set(m->shinfo, 1); > > > > > >>> m->ol_flags = EXT_ATTACHED_MBUF; > > > > > >>> - if (m->next != NULL) { > > > > > >>> - m->next = NULL; > > > > > >>> - m->nb_segs = 1; > > > > > >>> - } > > > > > >>> + m->next = NULL; > > > > > >>> + m->nb_segs = 1; > > > > > >>> rte_mbuf_raw_free(m); > > > > > >>> } > > > > > >>> > > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.h > > > > > >>> b/lib/librte_mbuf/rte_mbuf.h > > > > > >>> index a1414ed7cd..ef5800c8ef 100644 > > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.h > > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.h > > > > > >>> @@ -1329,10 +1329,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m) > > > > > >>> return NULL; > > > > > >>> } > > > > > >>> > > > > > >>> - if (m->next != NULL) { > > > > > >>> - m->next = NULL; > > > > > >>> - m->nb_segs = 1; > > > > > >>> - } > > > > > >>> + m->next = NULL; > > > > > >>> + m->nb_segs = 1; > > > > > >>> > > > > > >>> return m; > > > > > >>> > > > > > >>> @@ -1346,10 +1344,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m) > > > > > >>> return NULL; > > > > > >>> } > > > > > >>> > > > > > >>> - if (m->next != NULL) { > > > > > >>> - m->next = NULL; > > > > > >>> - m->nb_segs = 1; > > > > > >>> - } > > > > > >>> + m->next = NULL; > > > > > >>> + m->nb_segs = 1; > > > > > >>> rte_mbuf_refcnt_set(m, 1); > > > > > >>> > > > > > >>> return m; > > > > > >>> -- > > > > > >>> 2.25.1 > > > > > >> > > > > >