Thomas,

Adding my thoughts to the already detailed feedback on this important patch...

The first cache line is not inherently "hotter" than the second. The hotness 
depends on their usage.

The mbuf cacheline1 marker has the following comment:
/* second cache line - fields only used in slow path or on TX */

In other words, the second cache line is intended not to be touched in fast 
path RX.
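For reference, where that boundary falls for the fields discussed below can be
checked with a throwaway program like this (hypothetical, not part of any
patch; assumes a DPDK build environment, and the exact layout varies between
DPDK versions):

/* Print the offsets of the fields discussed in this mail, and where the
 * second cache line of the mbuf begins. */
#include <stdio.h>
#include <stddef.h>
#include <rte_mbuf.h>

int main(void)
{
	printf("second cache line starts at offset %d\n", RTE_CACHE_LINE_MIN_SIZE);
	printf("m->pool       offset: %zu\n", offsetof(struct rte_mbuf, pool));
	printf("m->next       offset: %zu\n", offsetof(struct rte_mbuf, next));
	printf("m->tx_offload offset: %zu\n", offsetof(struct rte_mbuf, tx_offload));
	return 0;
}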

I do not think that intention holds anymore. Not even with simple non-scattered
RX. And regression testing probably didn't catch this, because the tests perform
TX immediately after RX: the cache miss on the second cache line simply moved
from TX to RX and became a cache hit in TX instead, so the total cost looked
unchanged. (I may be wrong about this claim, but it's not important for the
discussion.)

I think the right question for this patch is: Can we achieve this - not using 
the second cache line for fast path RX - again by putting the right fields in 
the first cache line?

Probably not in all cases, but perhaps for some...

Consider the application scenarios.

When a packet is received, one of three things happens to it:
1. It is immediately transmitted on one or more ports.
2. It is immediately discarded, e.g. by a firewall rule.
3. It is put in some sort of queue, e.g. a ring for the next pipeline stage, or 
in a QoS queue.

1. If the packet is immediately transmitted, the m->tx_offload field in the 
second cache line will be touched by the application and TX function anyway, so 
we don't need to optimize the mbuf layout for this scenario.
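
A minimal sketch of that run-to-completion case, using the standard
rte_eth_rx_burst()/rte_eth_tx_burst() pair (port and queue numbers are just
placeholders):

#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* Scenario 1 sketch: immediate forwarding. The application and/or the TX
 * function touch second-cache-line fields such as m->tx_offload anyway,
 * so there is nothing to gain here from moving fields around. */
static void
forward_burst(uint16_t rx_port, uint16_t tx_port)
{
	struct rte_mbuf *pkts[32];
	uint16_t nb_rx = rte_eth_rx_burst(rx_port, 0, pkts, 32);
	uint16_t nb_tx = rte_eth_tx_burst(tx_port, 0, pkts, nb_rx);

	/* Free whatever the TX queue did not accept. */
	for (uint16_t i = nb_tx; i < nb_rx; i++)
		rte_pktmbuf_free(pkts[i]);
}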

2. The second scenario touches m->pool no matter how it is implemented. The 
application can avoid touching m->next by using rte_mbuf_raw_free(), knowing 
that the mbuf came directly from RX and thus no other fields have been touched. 
In this scenario, we want m->pool in the first cache line.
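
A sketch of that discard path, assuming non-scattered RX so every received
mbuf is a single, direct segment with refcnt 1 (the firewall lookup is just a
placeholder):

#include <stdbool.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* Placeholder for the actual firewall lookup. */
static bool
firewall_drop(const struct rte_mbuf *m)
{
	(void)m;
	return true;
}

/* Scenario 2 sketch: drop packets straight from RX. rte_pktmbuf_free()
 * would read m->next (second cache line); rte_mbuf_raw_free() only needs
 * m->pool. This is only valid because the mbufs come directly from
 * non-scattered RX: single segment, direct, refcnt 1. */
static void
drop_burst(uint16_t port_id, uint16_t queue_id)
{
	struct rte_mbuf *pkts[32];
	uint16_t nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts, 32);

	for (uint16_t i = 0; i < nb_rx; i++) {
		if (firewall_drop(pkts[i]))
			rte_mbuf_raw_free(pkts[i]);
		else
			rte_pktmbuf_free(pkts[i]); /* or pass it on */
	}
}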

3. Now, let's consider the third scenario, where RX is followed by an enqueue
into a ring. If the application does nothing but put the packet into a ring, we
don't need to move anything into the first cache line. But applications usually
do more... So which fields would be good to move to the first cache line is
application specific (a code sketch of variant A follows below):

A. If the application does not use segmented mbufs, and performs analysis and 
preparation for transmission in the initial pipeline stages, and only the last 
pipeline stage performs TX, we could move m->tx_offload to the first cache 
line, which would keep the second cache line cold until the actual TX happens 
in the last pipeline stage - maybe even after the packet has waited in a QoS 
queue for a long time, and its cache lines have gone cold.

B. If the application uses segmented mbufs on RX, it might make sense to move
m->next to the first cache line. (We don't use segmented mbufs, so I'm not sure
about this.)
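
To make variant A concrete, here is a sketch of the two ends of such a
pipeline, assuming non-scattered IPv4 traffic and a shared rte_ring (the
function names and the fixed header lengths are made up for illustration):

#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_ring.h>
#include <rte_ether.h>
#include <rte_ip.h>

/* Early pipeline stage (variant A sketch): fill in the TX offload lengths
 * right after RX, while the packet is hot, then hand the mbufs to the next
 * stage through a ring. With m->tx_offload in the first cache line, these
 * stores would not need to pull in the second cache line at all. */
static void
rx_and_classify(uint16_t port_id, struct rte_ring *to_next_stage)
{
	struct rte_mbuf *pkts[32];
	uint16_t nb_rx = rte_eth_rx_burst(port_id, 0, pkts, 32);

	for (uint16_t i = 0; i < nb_rx; i++) {
		pkts[i]->l2_len = sizeof(struct rte_ether_hdr);
		pkts[i]->l3_len = sizeof(struct rte_ipv4_hdr); /* assume IPv4 */
	}

	unsigned int sent = rte_ring_enqueue_burst(to_next_stage,
			(void **)pkts, nb_rx, NULL);
	for (unsigned int i = sent; i < nb_rx; i++)
		rte_pktmbuf_free(pkts[i]); /* ring full */
}

/* Last pipeline stage: the only place that calls TX, possibly long after
 * the mbuf's cache lines have gone cold again. */
static void
dequeue_and_tx(struct rte_ring *from_prev_stage, uint16_t port_id)
{
	struct rte_mbuf *pkts[32];
	unsigned int n = rte_ring_dequeue_burst(from_prev_stage,
			(void **)pkts, 32, NULL);
	uint16_t nb_tx = rte_eth_tx_burst(port_id, 0, pkts, (uint16_t)n);

	for (unsigned int i = nb_tx; i < n; i++)
		rte_pktmbuf_free(pkts[i]);
}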


However, reality perhaps beats theory:

Looking at the E1000 PMD, it seems like even its non-scattered RX function,
eth_igb_recv_pkts(), sets m->next. If only it kept its own free pool
pre-initialized instead... I haven't investigated other PMDs, except briefly
looking at the mlx5 PMD, which doesn't appear to touch m->next in RX.

I haven't looked deeper into how m->pool is being used by RX in PMDs, but I 
suppose that it isn't touched in RX.

<rant on>
If only we had a performance test where RX was not immediately followed by TX,
but the packets were passed through a large queue in between, so RX cache
misses would no longer appear free of charge by merely turning would-be TX
cache misses into cache hits...
<rant off>

Whatever you choose, I am sure that most applications will find it more useful 
than the timestamp. :-)


Med venlig hilsen / kind regards
- Morten Brørup
