2015-06-17 14:23, Damjan Marion:
>
> > On 17 Jun 2015, at 16:06, Bruce Richardson <bruce.richardson at intel.com> wrote:
> >
> > On Wed, Jun 17, 2015 at 01:55:57PM +0000, Damjan Marion (damarion) wrote:
> >>
> >>> On 15 Jun 2015, at 16:12, Bruce Richardson <bruce.richardson at intel.com> wrote:
> >>>
> >>> The next pointers always start out as NULL when the mbuf pool is created.
> >>> The only time it is set to non-NULL is when we have chained mbufs. If we
> >>> never have any chained mbufs, we never need to touch the next field, or
> >>> even read it - since we have the num-segments count in the first cache
> >>> line. If we do have a multi-segment mbuf, it's likely to be a big packet,
> >>> so we have more processing time available and we can then take the hit of
> >>> setting the next pointer.
> >>
> >> There are applications which are not using rx offload, but they deal with
> >> chained mbufs. Why are they less important than ones using rx offload?
> >> This is something people should be able to configure at build time.
> >
> > It's not that they are less important, it's that the packet processing
> > cycle count budget is going to be greater. A packet which is 64 bytes, or
> > 128 bytes in size can make use of a number of RX offloads to reduce its
> > processing time. However, a 64/128 packet is not going to be split across
> > multiple buffers [unless we are dealing with a very unusual setup!].
> >
> > To handle 64 byte packets at 40G line rate, one has 50 cycles per core per
> > packet when running at 3GHz. [3000000000 cycles / 59.5 mpps].
> > If we assume that we are dealing with fairly small buffers here, and that
> > anything greater than 1k packets are chained, we still have 626 cycles per
> > 3GHz core per packet to work with for that 1k packet. Given that "normal"
> > DPDK buffers are 2k in size, we have over a thousand cycles per packet for
> > any packet that is split.
> >
> > In summary, packets spread across multiple buffers are large packets, and
> > so have larger packet cycle count budgets and so can much better absorb the
> > cost of touching a second cache line in the mbuf than a 64-byte packet can.
> > Therefore, we optimize for the 64B packet case.
>
> This makes sense if there is no other work to do on the same core.
> Otherwise it is better to spend those cycles on actual work instead of
> waiting for 2nd cache line...

You're probably right. I wonder whether this flexibility can be implemented only in static lib builds?
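
For anyone wanting to check the cycle budgets quoted in the thread (50, 626 and "over a thousand"), here is a minimal sketch of the arithmetic. It assumes 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter and inter-frame gap) and reads "1k" and "2k" as 1024 and 2048 bytes; those assumptions reproduce the quoted figures but are not stated in the original mails. The helper names are hypothetical and are not part of DPDK.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed per-frame wire overhead: 7B preamble + 1B SFD + 12B inter-frame gap. */
#define ETH_WIRE_OVERHEAD_BYTES 20u

/* Hypothetical helper: packets per second at line rate for a given frame size. */
static double line_rate_pps(double link_bps, uint32_t frame_bytes)
{
    assert(frame_bytes > 0);
    return link_bps / (8.0 * (double)(frame_bytes + ETH_WIRE_OVERHEAD_BYTES));
}

/* Hypothetical helper: CPU cycles one core can spend per packet at line rate. */
static double cycles_per_packet(double cpu_hz, double link_bps,
                                uint32_t frame_bytes)
{
    return cpu_hz / line_rate_pps(link_bps, frame_bytes);
}
```

With a 3GHz core and a 40G link, cycles_per_packet(3e9, 40e9, 64) comes out at about 50, 1024 bytes at about 626, and 2048 bytes at about 1240, which matches the numbers Bruce gives.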