Re: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free

Andrew Rybchenko Sun, 08 Nov 2020 06:17:18 -0800

On 11/6/20 3:23 PM, Morten Brørup wrote:

From: Ananyev, Konstantin [mailto:konstantin.anan...@intel.com]
Sent: Friday, November 6, 2020 12:54 PM

Hi Olivier,

m->nb_seg must be reset on mbuf free

whatever

the

value

of m->next,

because it can happen that m->nb_seg is

!=

1.

For

instance in this

case:

   m1 = rte_pktmbuf_alloc(mp);
   rte_pktmbuf_append(m1, 500);
   m2 = rte_pktmbuf_alloc(mp);
   rte_pktmbuf_append(m2, 500);
   rte_pktmbuf_chain(m1, m2);
   m0 = rte_pktmbuf_alloc(mp);
   rte_pktmbuf_append(m0, 500);
   rte_pktmbuf_chain(m0, m1);

As rte_pktmbuf_chain() does not reset

nb_seg in

the

initial m1

segment (this is not required), after

this

code

the

mbuf chain

have 3 segments:
   - m0: next=m1, nb_seg=3
   - m1: next=m2, nb_seg=2
   - m2: next=NULL, nb_seg=1

Freeing this mbuf chain will not

restore

nb_seg=1

in

the second

segment.


Hmm, not sure why is that?
You are talking about freeing m1, right?
rte_pktmbuf_prefree_seg(struct rte_mbuf

*m)

{
        ...
        if (m->next != NULL) {
                         m->next = NULL;
                         m->nb_segs = 1;
                 }

m1->next != NULL, so it will enter the

if()

block,

and will reset both next and nb_segs.
What I am missing here?
Thinking in more generic way, that

change:

  -             if (m->next != NULL) {
  -                     m->next = NULL;
  -                     m->nb_segs = 1;
  -             }
  +             m->next = NULL;
  +             m->nb_segs = 1;


Ah, sorry. I oversimplified the example

and

now

it

does

not

show the issue...

The full example also adds a split() to

break

the

mbuf

chain

between m1 and m2. The kind of thing that

would

be

done

for

software TCP segmentation.


If so, may be the right solution is to care

about

nb_segs

when next is set to NULL on split? Any

place

when

next

is

set

to NULL. Just to keep the optimization in a

more

generic

place.

The problem with that approach is that there

are

already

several

existing split() or trim() implementations in

different

DPDK-

based

applications. For instance, we have some in

6WINDGate. If

we

force

applications to set nb_seg to 1 when

resetting

next,

it

has

to be

documented because it is not straightforward.


I think it is better to go that way.
 From my perspective it seems natural to reset

nb_seg at

same

time

we reset next, otherwise inconsistency will

occur.


While it is not explicitly stated for nb_segs, to

me

it

was

clear

that

nb_segs is only valid in the first segment, like

for

many

fields

(port,

ol_flags, vlan, rss, ...).

If we say that nb_segs has to be valid in any

segments,

it

means

that

chain() or split() will have to update it in all

segments,

which

is not

efficient.


Why in all?
We can state that nb_segs on non-first segment

should

always

equal

1.

As I understand in that case, both split() and

chain()

have

to

update nb_segs

only for head mbufs, rest ones will remain

untouched.


Well, anyway, I think it's strange to have a

constraint

on m-

nb_segs

for

non-first segment. We don't have that kind of

constraints

for

other

fields.

True, we don't. But this is one of the fields we

consider

critical

for proper work of mbuf alloc/free mechanism.


I am not sure that requiring m->nb_segs == 1 on non-first

segments

will provide any benefits.

It would make this patch unneeded.
So, for direct, non-segmented mbufs  pktmbuf_free() will

remain

write-

free.


I see. Then I agree with Konstantin that alternative

solutions

should

be considered.


The benefit regarding free()'ing non-segmented mbufs - which

is a

very common operation - certainly outweighs the cost of

requiring

split()/chain() operations to set the new head mbuf's nb_segs =

1.


Nonetheless, the bug needs to be fixed somehow.

If we can't come up with a better solution that doesn't break

the

ABI, we are forced to accept the patch.


Unless the techboard accepts to break the ABI in order to

avoid

the

performance cost of this patch.

Did someone notice a performance drop with this patch?
On my side, I don't see any regression on a L3 use case.


I am afraid that the DPDK performance regression tests are based

on

TX immediately following RX, so cache misses in TX may go by

unnoticed

because RX warmed up the cache for TX already. And similarly for RX
reusing mbufs that have been warmed up by the preceding free() at

TX.


Please consider testing the performance difference with the mbuf

being completely cold at TX, and going completely cold again before
being reused for RX.


Let's sumarize: splitting a mbuf chain and freeing it causes

subsequent

mbuf
allocation to return a mbuf which is not correctly initialized.

There

are 2
options to fix it:

1/ change the mbuf free function (this patch)

    - m->nb_segs would behave like many other field: valid in

the

first

      segment, ignored in other segments
    - may impact performance (suspected)

2/ change all places where a mbuf chain is split, or trimmed

    - m->nb_segs would have a specific behavior: count the

number of

      segments in the first mbuf, should be 1 in the last

segment,

      ignored in other ones.
    - no code change in mbuf library, so no performance impact
    - need to patch all places where we do a mbuf split or trim.

From

afar,
      I see at least mbuf_cut_seg_ofs() in DPDK. Some external
applications
      may have to be patched (for instance, I already found 3

places

in

      6WIND code base without a deep search).

In my opinion, 1/ is better, except we notice a significant
performance,
because the (implicit) behavior is unchanged.

Whatever the solution, some documentation has to be added.

Olivier


Unfortunately, I don't think that anything but the first option

will

go into 20.11 and stable releases of older versions, so I stand by

my

acknowledgment of the patch.

If we are affraid about 20.11 performance (it is legitimate, few

days

before the release), we can target 21.02. After all, everybody

lives

with this bug since 2017, so there is no urgency. If accepted and

well

tested, it can be backported in stable branches.


+1

Good thinking, Olivier!


Looking at the changes once again, it probably can be reworked a bit:

-       if (m->next != NULL) {
-               m->next = NULL;
-               m->nb_segs = 1;
-       }

+       if (m->next != NULL)
+               m->next = NULL;
+       if (m->nb_segs != 1)
+               m->nb_segs = 1;

That way we add one more condition checking, but I suppose it
shouldn't be that perf critical.
That way for direct,non-segmented mbuf it still should be write-free.
Except cases as you described above: chain(), then split().

Of-course we still need to do perf testing for that approach too.
So if your preference it to postpone it till 21.02 - that's ok for me.
Konstantin


With this suggestion, I cannot imagine any performance drop for direct, non-segmented 
mbufs: It now reads m->nb_segs, residing in the mbuf's first cache line, but the 
function already reads m->refcnt in the first cache line; so no cache misses are 
introduced.

+1

Re: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free

Reply via email to