On Thu, Sep 04, 2014 at 05:00:12PM +0600, Yerden Zhumabekov wrote:
> I get your point. I've also read through the code of various PMDs and
> have found no indication of the l2_len/l3_len fields being set either.
>
> As for testing, we'd be happy to test the patchset, but we are currently
> in the process of building our testing facilities, so we are not ready to
> provide enough workload for the hardware/software. I was also wondering
> if anyone has run some tests and can provide numbers on that matter.
>
> Personally, I don't think the frag/reassembly app is a perfect example for
> evaluating the 2nd cache line performance penalty. The offsets to the L3
> and L4 headers need to be calculated for all TCP/IP traffic, and fragmented
> traffic is not representative in this case. Maybe it would be better to
> write an app which calculates these offsets for different sets of mbufs
> and provides some stats. For example, l2fwd/l3fwd plus the additional
> l2_len and l3_len calculation.
>
> And I'm also figuring out how to rewrite our app/libs (prefetch etc.) to
> reflect the future changes in the mbuf, hence my concerns :)
>
Just a final point on this. Note that the second cache line is always being
read by the TX leg of the code in order to free mbufs back to their mbuf pool
post-transmit. The overall fast-path RX+TX benchmarks show no performance
degradation due to that access.
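
To be concrete about the TX point, below is a minimal, purely illustrative
sketch of a post-transmit cleanup loop. The function name is made up, but
rte_pktmbuf_free() is the real API, and it has to read m->pool (and m->next
for chained mbufs), both of which sit on the second cache line in the
proposed layout:

#include <rte_mbuf.h>

/*
 * Hypothetical post-TX cleanup: free the mbufs whose descriptors the NIC
 * has reported as done. rte_pktmbuf_free() dereferences m->pool and
 * m->next, so the second cache line is read here in any case.
 */
static void
tx_free_done_mbufs(struct rte_mbuf **done, unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++)
		rte_pktmbuf_free(done[i]);
}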
For the sample apps, you make a good point indeed about the existing apps not
being very useful, as they work on larger packets. I'll see what I can throw
together here to make a more realistic test; a rough sketch of the kind of
per-packet loop I have in mind is appended below the quoted mail.

/Bruce

> 04.09.2014 16:27, Bruce Richardson wrote:
> > Hi Yerden,
> >
> > I understand your concerns and it's good to have this discussion.
> >
> > There are a number of reasons why I've moved these particular fields
> > to the second cache line. Firstly, the main reason is that, obviously
> > enough, not all fields will fit in cache line 0, and we need to prioritize
> > what does get stored there. The guiding principle I've chosen for this
> > patch set is to move fields that are not used on the receive path (or,
> > more specifically, the fast-path receive path, so that we can move fields
> > only used by jumbo frames that span mbufs) to the second cache line. From
> > a search through the existing codebase, there are no drivers which set the
> > l2/l3 length fields on RX; they are only used by the reassembly
> > libraries/apps and by the drivers on TX.
> >
> > The other reason for moving them to the second cache line is that they
> > logically belong with all the other length fields that we need to add to
> > enable tunneling support. [To get an idea of the extra fields that I
> > propose adding to the mbuf, please see the RFC patchset I sent out
> > previously as "[RFC PATCH 00/14] Extend the mbuf structure".] While we
> > probably can fit the 16 bits needed for the l2/l3 lengths on mbuf cache
> > line 0, there is not enough room for all the lengths, so we would end up
> > splitting them, with other fields in between.
> >
> > So, in terms of what to do about this particular issue: I would hope that
> > for applications that use these fields the impact should be small and/or
> > possible to work around, e.g. by prefetching the second cache line on RX
> > in the driver. If not, then I'm happy to see about withdrawing this
> > particular change and seeing if we can keep the l2/l3 lengths on cache
> > line zero, with the other length fields being on cache line 1.
> >
> > Question: would you consider the IP fragmentation and reassembly example
> > apps in the Intel DPDK releases good examples to test to see the impacts
> > of this change, or is there some other test you would prefer that I look
> > to do? Can you perhaps test out the patch sets for the mbuf that I've
> > upstreamed so far and let me know what regressions, if any, you see in
> > your use-case scenarios?
> >
> > Regards,
> > /Bruce
>
> --
> Sincerely,
>
> Yerden Zhumabekov
> STS, ACI
> Astana, KZ
>
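
P.S. For reference, the kind of per-packet work I was thinking of adding to an
l2fwd/l3fwd-style test is sketched below. It is only a sketch under some
assumptions: plain Ethernet/IPv4 input (no VLAN, no tunnels), the l2_len/l3_len
field names of the reworked mbuf, and a made-up helper name (calc_offsets).
The rte_prefetch0() of &m->pool is the "prefetch the second cache line on RX"
workaround mentioned above, since m->pool sits on that line.

#include <rte_mbuf.h>
#include <rte_ether.h>
#include <rte_ip.h>
#include <rte_byteorder.h>
#include <rte_prefetch.h>

/*
 * Illustrative per-packet work for a test app: compute l2_len/l3_len for
 * plain Ethernet/IPv4 frames and store them in the mbuf, which with the
 * proposed layout means writing to the second cache line.
 */
static inline void
calc_offsets(struct rte_mbuf *m)
{
	struct ether_hdr *eth;
	struct ipv4_hdr *ip;

	/* Pull in the second cache line early; m->pool lives there. */
	rte_prefetch0(&m->pool);

	eth = rte_pktmbuf_mtod(m, struct ether_hdr *);
	if (eth->ether_type != rte_cpu_to_be_16(0x0800)) /* not IPv4 */
		return;

	ip = (struct ipv4_hdr *)(eth + 1);
	m->l2_len = sizeof(*eth);
	m->l3_len = (ip->version_ihl & 0x0f) * 4; /* IHL in 32-bit words */
}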