Stephen,

Agree. Growing to two cache lines is inevitable. Reorganizing the mbuf a bit to relieve the immediate space pressure with as little performance impact as possible (including moving the QoS fields out completely into their own separate area) is a good idea: the first cache line would hold packet- and mbuf-related information, the second more of the metadata that we need. Any suggestions on how many bytes would be needed for QoS?
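
To make the proposed split concrete, here is a rough sketch in C. It is not the actual rte_mbuf definition: the struct name, the field names, the 64-byte cache line size, and the 64-bit QoS placeholder are all assumptions for illustration only. The point is just that the frequently used fields stay in the first cache line, while the TSO MSS and a wider QoS area start on the second.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size; the real value is platform dependent. */
#define SKETCH_CACHE_LINE 64

/* Hypothetical two-cache-line mbuf. None of these names are DPDK's. */
struct sketch_mbuf {
    /* First cache line: packet and mbuf bookkeeping used on every packet. */
    void *buf_addr;            /* start of the data buffer */
    void *pool;                /* pool the mbuf was allocated from */
    struct sketch_mbuf *next;  /* next segment of a chained packet */
    uint32_t pkt_len;          /* total packet length */
    uint16_t data_len;         /* data length in this segment */
    uint16_t refcnt;           /* reference count */
    uint16_t ol_flags;         /* offload flags */
    uint8_t nb_segs;           /* number of segments */
    uint8_t in_port;           /* input port */

    /* Second cache line: metadata that is used less often.
     * _Alignas forces this member onto a cache line boundary. */
    _Alignas(SKETCH_CACHE_LINE) uint16_t tso_mss; /* the 16 bits TSO needs */
    uint64_t qos_sched;        /* placeholder: the QoS size is still open */
};

/* Compile-time checks that the layout is what the comments claim. */
static_assert(offsetof(struct sketch_mbuf, tso_mss) == SKETCH_CACHE_LINE,
              "TSO/QoS metadata must start on the second cache line");
static_assert(sizeof(struct sketch_mbuf) == 2 * SKETCH_CACHE_LINE,
              "sketch mbuf must occupy exactly two cache lines");

/* Hypothetical helper: record the MSS for a packet that wants TSO. */
static inline void sketch_set_tso_mss(struct sketch_mbuf *m, uint16_t mss)
{
    m->tso_mss = mss;
}
```

With a layout along these lines, the non-TSO, non-QoS fast path only touches the first cache line, so the cost of the extra line is paid only by the paths that actually need the metadata.
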

Qinglai,

For your TSO implementation patch, let's work with the patch as is (assuming the mbuf grows): add the 16 bits to the pktmbuf structure (and it will grow beyond a cache line). We can then get some performance numbers for the standard benchmarks. I will look at a few ideas to free up some space in the mbuf, to keep the packet-related stuff within the first cache line while keeping performance close to where it is today.

Regards,
-Venky

-----Original Message-----
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Stephen Hemminger
Sent: Friday, October 04, 2013 9:41 AM
To: jigsaw
Cc: dev at dpdk.org
Subject: Re: [dpdk-dev] Need comment on 82599 TSO

On Fri, 4 Oct 2013 15:44:19 +0300
jigsaw <jigsaw at gmail.com> wrote:

> Hi,
>
> I'm working on TSO for 82599, and encounter a problem: nowhere to store MSS.
>
> TSO must be aware of MSS, or gso in skb of kernel.
> But MSS needs 16 bits per mbuf. And we have no spare 16 bits in
> rte_mbuf or rte_pktmbuf.
> If we add 16 bit field in rte_pktmbuf, the size of rte_mbuf will be
> doubled, coz currently the size is at the edge of cacheline(32 byte).
>
> I have two solutions here:
>
> 1. Store MSS in struct rte_eth_conf.
> This is actually a very bad idea, coz MSS is not bound to device.
>
> 2. Turn on and off TSO with rte_ctrlmbuf.
> I found that rte_ctrlmbuf is not used at all. So it could be the first
> use case of it.
> With rte_ctrlmbuf we have enough space to store MSS.
>
> Looking forward to your comments.
>
> thx &
> rgds,
> -Qinglai

The mbuf needs to grow to 2 cache lines. There are other things that need
to be added to mbuf eventually as well. For example the QoS bitfield is too
small when crammed into 32 bits.

Ideally the normal small packet stuff would be in the first cacheline;
and the other part of the struct would have things less likely to be used.