+CC techboard

> From: Jerin Jacob [mailto:jerinjac...@gmail.com]
> Sent: Monday, November 9, 2020 6:18 AM
> 
> On Sun, Nov 8, 2020 at 2:03 AM Thomas Monjalon <tho...@monjalon.net>
> wrote:
> >
> > 07/11/2020 20:05, Jerin Jacob:
> > > On Sun, Nov 8, 2020 at 12:09 AM Thomas Monjalon
> <tho...@monjalon.net> wrote:
> > > > 07/11/2020 18:12, Jerin Jacob:
> > > > > On Sat, Nov 7, 2020 at 10:04 PM Thomas Monjalon
> <tho...@monjalon.net> wrote:
> > > > > >
> > > > > > The mempool pointer in the mbuf struct is moved
> > > > > > from the second to the first half.
> > > > > > It should increase performance on most systems having a
> > > > > > 64-byte cache line,
> > > > >
> > > > > > i.e. mbuf is split in two cache lines.
> > > > >
> > > > > But in any event, Tx needs to touch the pool to free mbufs
> > > > > back to the pool upon Tx completion. Right?
> > > > > Not able to understand the motivation for moving it to the
> > > > > first 64B cache line?
> > > > > The gain varies from driver to driver. For example, a typical
> > > > > ARM-based NPU does not need to
> > > > > touch the pool in Rx since it has been filled by HW, whereas it
> > > > > needs to
> > > > > touch it in Tx if the reference count is implemented.
> > >
> > > See below.
> > >
> > > > >
> > > > > > Due to this change, tx_offload is moved, so some vector data
> > > > > > paths
> > > > > > may need to be adjusted. Note: OCTEON TX2 check is removed
> > > > > > temporarily!
> > > > >
> > > > > It will break the Tx path. Please don't remove the static
> > > > > assert without adjusting the code.
> > > >
> > > > Of course not.
> > > > I looked at the vector Tx path of OCTEON TX2,
> > > > it's close to impossible to understand :)
> > > > Please help!
> > >
> > > Of course. Could you check the above section and share the
> > > rationale
> > > for this change,
> > > and where it helps and how much it helps?
> >
> > It has been concluded in the techboard meeting you were part of.
> > I don't understand why we restart this discussion again.
> > I won't have the energy to restart this process myself.
> > If you don't want to apply the techboard decision, then please
> > do the necessary to request another quick decision.
> 
> Yes. Initially, I thought it was OK as we have a 128B cache line. After
> looking into Thomas's change, I realized
> it is not good for ARM64 NPUs with 64B cache lines, because:
> - A typical ARM-based NPU does not need to touch the pool in Rx since
> it has been filled by HW, whereas it needs to
> touch it in Tx if the reference count is implemented.

Jerin, I don't understand what the problem is here...

Since RX doesn't touch m->pool, it shouldn't matter for RX which cache line 
m->pool resides in. I get that.

You are saying that TX needs to touch m->pool if the reference count is 
implemented. I get that. But I don't understand why it is worse having m->pool 
in the first cache line than in the second cache line; can you please clarify?

> - Also it will be affecting existing vector routines

That is unavoidable if we move something from the second to the first cache 
line.

It may require some rework on the vector routines, but it shouldn't be too 
difficult for whoever wrote these vector routines.

> 
> I request to reconsider the tech board decision.

I attended the techboard meeting as an observer (or whatever the correct term 
would be for non-members), and this is my impression of the decisions made at 
the meeting:

The techboard clearly decided not to move any dynamic fields into the first 
cache line, on the grounds that if we move them away again in a later version, 
DPDK users utilizing a dynamic field in the first cache line might experience a 
performance drop at that later time. And this would be a very bad user 
experience, causing grief and complaints. To me, this seemed like a firm 
decision, based on solid arguments.

Then the techboard discussed which other field to move to the freed-up space in 
the first cache line. There were no performance reports showing any 
improvements from moving any of the suggested fields (m->pool, m->next, 
m->tx_offload), and there was a performance report showing no improvement from 
moving m->next in a test case with large segmented packets. The techboard 
decided to move m->pool as originally suggested. To me, this seemed like a 
somewhat random choice between A, B and C, on the grounds that moving one of 
them is probably better than moving none of them.

The techboard made its decision based on the information available at that time.

Unfortunately, I do not have the resources to test the performance improvement 
by moving m->next to the first cache line instead of m->pool and utilizing the 
DEV_TX_OFFLOAD_MBUF_FAST_FREE flag mentioned by Konstantin.
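To sketch why that flag is relevant here: with fast free, all mbufs on a TX queue are guaranteed to come from a single mempool (and reference counting is not used), so the completion path can free to a pool pointer cached in the queue instead of reading m->pool from each mbuf. The code below is a hypothetical illustration; bulk_free and txq_pool are made-up names, not DPDK API:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy mbuf; only the field relevant to this discussion. */
struct toy_mbuf {
	void *pool;
};

/* Hypothetical driver-side Tx completion loop. Returns the number of
 * per-mbuf m->pool dereferences, purely for illustration: with
 * fast_free the loop never has to touch the mbuf to find its pool. */
static int bulk_free(struct toy_mbuf **mbufs, int n,
		     void *txq_pool, bool fast_free)
{
	int pool_reads = 0;

	for (int i = 0; i < n; i++) {
		void *pool;

		if (fast_free) {
			pool = txq_pool;	/* queue-wide pool; no mbuf read */
		} else {
			pool = mbufs[i]->pool;	/* touches the mbuf itself */
			pool_reads++;
		}
		(void)pool; /* a real driver would return mbufs[i] to 'pool' here */
	}
	return pool_reads;
}
```

Under fast free, then, the cache-line position of m->pool stops mattering on the TX completion path, which is why the combination of moving m->next and using fast free was an interesting alternative to measure.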

If no new information comes to light, we cannot expect the techboard to change 
a decision it has already made.

In any case, I am grateful for the joint effort put into nurturing the mbuf, 
and especially Thomas' unrelenting hard work in this area!


Med venlig hilsen / kind regards
- Morten Brørup
