2015-06-17 14:23, Damjan Marion:
> 
> > On 17 Jun 2015, at 16:06, Bruce Richardson <bruce.richardson at intel.com> wrote:
> > 
> > On Wed, Jun 17, 2015 at 01:55:57PM +0000, Damjan Marion (damarion) wrote:
> >> 
> >>> On 15 Jun 2015, at 16:12, Bruce Richardson <bruce.richardson at intel.com> wrote:
> >>> 
> >>> The next pointers always start out as NULL when the mbuf pool is
> >>> created. The only time it is set to non-NULL is when we have chained
> >>> mbufs. If we never have any chained mbufs, we never need to touch the
> >>> next field, or even read it - since we have the num-segments count in
> >>> the first cache line. If we do have a multi-segment mbuf, it's likely
> >>> to be a big packet, so we have more processing time available and we
> >>> can then take the hit of setting the next pointer.
> >> 
> >> There are applications which are not using rx offload, but they deal
> >> with chained mbufs. Why are they less important than the ones using
> >> rx offload? This is something people should be able to configure at
> >> build time.
> > 
> > It's not that they are less important, it's that the packet processing
> > cycle count budget is going to be greater. A packet which is 64 bytes,
> > or 128 bytes in size can make use of a number of RX offloads to reduce
> > its processing time. However, a 64/128-byte packet is not going to be
> > split across multiple buffers [unless we are dealing with a very
> > unusual setup!].
> > 
> > To handle 64-byte packets at 40G line rate, one has 50 cycles per core
> > per packet when running at 3GHz [3000000000 cycles / 59.5 mpps]. If we
> > assume that we are dealing with fairly small buffers here, and that any
> > packet greater than 1k is chained, we still have 626 cycles per 3GHz
> > core per packet to work with for that 1k packet. Given that "normal"
> > DPDK buffers are 2k in size, we have over a thousand cycles per packet
> > for any packet that is split.
> > 
> > In summary, packets spread across multiple buffers are large packets,
> > and so have larger per-packet cycle count budgets and can much better
> > absorb the cost of touching a second cache line in the mbuf than a
> > 64-byte packet can. Therefore, we optimize for the 64B packet case.
> 
> This makes sense if there is no other work to do on the same core.
> Otherwise it is better to spend those cycles on actual work instead of
> waiting for the 2nd cache line...
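
For reference, the access pattern Bruce describes boils down to something
like the sketch below. The struct rte_mbuf fields (data_len, nb_segs, next)
are the real ones; the helper itself is only illustrative, not code from
this thread. The hot path reads first-cache-line fields only, and the
second-cache-line next pointer is dereferenced only when the packet is
actually segmented:

#include <rte_mbuf.h>
#include <rte_branch_prediction.h>

/* Sum the payload bytes of a packet while touching the mbuf's second
 * cache line only for multi-segment (i.e. large) packets. */
static inline uint32_t
count_payload_bytes(struct rte_mbuf *m)
{
        uint32_t bytes = m->data_len;   /* first cache line */

        if (likely(m->nb_segs == 1))    /* segment count, first cache line */
                return bytes;

        /* Chained mbuf: pay for the 2nd cache line, but only for packets
         * whose size gives a bigger per-packet cycle budget anyway. */
        while ((m = m->next) != NULL)
                bytes += m->data_len;

        return bytes;
}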

You're probably right.
I wonder whether this flexibility can be implemented only in static lib builds?
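
On the build-time angle, a rough sketch of what such a knob could look like
on the application side is below. The macro name RTE_APP_NO_CHAINED_MBUFS is
invented purely for illustration (it is not an existing DPDK config option);
rte_pktmbuf_free() and rte_pktmbuf_free_seg() are the normal mbuf APIs:

#include <rte_mbuf.h>

/* Hypothetical compile-time switch: an application built with a guarantee
 * that it never creates chained mbufs can skip the chain handling. */
static inline void
app_pktmbuf_free(struct rte_mbuf *m)
{
#ifdef RTE_APP_NO_CHAINED_MBUFS
        /* Single-segment-only build: free just this segment, no walk of
         * the segment chain. */
        rte_pktmbuf_free_seg(m);
#else
        /* General build: rte_pktmbuf_free() walks and frees the whole
         * segment chain. */
        rte_pktmbuf_free(m);
#endif
}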
