> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Slava Ovsiienko
> Sent: Tuesday, November 3, 2020 3:03 PM
>
> Hi, Morten
>
> > From: Morten Brørup <m...@smartsharesystems.com>
> > Sent: Tuesday, November 3, 2020 14:10
> >
> > > From: Thomas Monjalon [mailto:tho...@monjalon.net]
> > > Sent: Monday, November 2, 2020 4:58 PM
> > >
> > > +Cc techboard
> > >
> > > We need benchmark numbers in order to take a decision.
> > > Please all, prepare some arguments and numbers so we can discuss
> > > the mbuf layout in the next techboard meeting.
> >
> > I propose that the techboard considers this from two angles:
> >
> > 1. Long term goals and their relative priority, i.e. what can be
> > achieved with wide-ranging modifications, requiring yet another ABI
> > break and due notices.
> >
> > 2. Short term goals, i.e. what can be achieved for this release.
> >
> >
> > My suggestions follow...
> >
> > 1. Regarding long term goals:
> >
> > I have argued that simple forwarding of non-segmented packets using
> > only the first mbuf cache line can be achieved by making three
> > modifications:
> >
> > a) Move m->tx_offload to the first cache line.
>
> Not all PMDs use this field on Tx. HW might support the checksum
> offloads directly, not requiring these fields at all.
>
> > b) Use an 8 bit pktmbuf mempool index in the first cache line,
> > instead of the 64 bit m->pool pointer in the second cache line.
>
> 256 mempools look like enough to me. Regarding the indirect access to
> the pool (via some table) - it might introduce some performance impact.

It might, but I hope that it is negligible, so the benefits outweigh the disadvantages. It would have to be measured, though.

And m->pool is only used for free()'ing (and detach()'ing) mbufs.
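To make b) concrete, here is roughly what I have in mind. It is only a sketch - none of it exists in DPDK today, and the pool_idx field and the table below are made up for illustration:

/* Sketch only: an 8 bit index (hypothetical m->pool_idx field in the
 * first cache line) replaces the 64 bit m->pool pointer. */
#include <stdint.h>
#include <rte_mempool.h>

#define PKTMBUF_MAX_POOLS 256	/* what fits in an 8 bit index */

static struct rte_mempool *pktmbuf_pool_tbl[PKTMBUF_MAX_POOLS];

/* Called once when a pktmbuf pool is created; the index is then stored
 * in every mbuf allocated from that pool. */
static inline void
pktmbuf_pool_tbl_set(uint8_t idx, struct rte_mempool *mp)
{
	pktmbuf_pool_tbl[idx] = mp;
}

/* Replaces direct reads of m->pool, e.g. when freeing:
 * rte_mempool_put(pktmbuf_pool_from_idx(m->pool_idx), m); */
static inline struct rte_mempool *
pktmbuf_pool_from_idx(uint8_t idx)
{
	return pktmbuf_pool_tbl[idx];
}

The price is one extra load from a small, read-mostly table, which should normally stay resident in the L1 cache - but as said, it would have to be measured.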
> For example, mlx5 PMD strongly relies on the pool field for allocating
> mbufs in the Rx datapath. We're going to update (o-o, we found a point
> to optimize), but for now it does.

Without looking at the source code, I don't think the PMD is using m->pool in the RX datapath. I think it is using a pool dedicated to a receive queue, used for RX descriptors in the PMD (i.e. driver->queue->pool).

> > c) Do not access m->next when we know that it is NULL.
> > We can use m->nb_segs == 1 or some other invariant as the gate.
> > It can be implemented by adding an m->next accessor function:
> >
> > struct rte_mbuf * rte_mbuf_next(struct rte_mbuf * m)
> > {
> >     return m->nb_segs == 1 ? NULL : m->next;
> > }
>
> Sorry, not sure about this. IIRC, nb_segs is valid in the first
> segment/mbuf only. If we have 4 segments in the pkt, we see nb_segs=4
> in the first one and nb_segs=1 in the others. The next field is NULL
> in the last mbuf only. Am I wrong and missing something?

You are correct. This would have to be updated too: either by increasing m->nb_segs in the following segments, or by splitting up the relevant functions into functions for working on first segments (incl. non-segmented packets), and functions for working on following segments of segmented packets.

> > Regarding the priority of this goal, I guess that simple forwarding
> > of non-segmented packets is probably the path taken by the majority
> > of packets handled by DPDK.
> >
> > An alternative goal could be:
> > Do not touch the second cache line during RX.
> > A comment in the mbuf structure says so, but it is not true anymore.
> >
> > (I guess that regression testing didn't catch this because the tests
> > perform TX immediately after RX, so the cache miss just moves from
> > the TX to the RX part of the test application.)
> >
> >
> > 2. Regarding short term goals:
> >
> > The current DPDK source code looks to me like m->next is the most
> > frequently accessed field in the second cache line, so it makes
> > sense moving this to the first cache line, rather than m->pool.
>
> Benchmarking may help here.
>
> Moreover, for the segmented packets the packet size is supposed to be
> large, and it imposes a relatively low packet rate, so the optimization
> of moving next to the 1st cache line might be negligible. Just compare
> 148 Mpps of 64B pkts and 4 Mpps of 3000B pkts over a 100Gbps link.
> Currently we are benchmarking and have not yet succeeded in finding a
> difference. The benefit can't be expressed as an Mpps delta; we should
> measure CPU clocks, but the Rx queue is almost always empty - we have
> empty loops. So, if there is a boost, it is extremely hard to catch.

Very good point regarding the value of such an optimization, Slava!

And when free()'ing packets, both m->next and m->pool are touched. So perhaps the free()/detach() functions in the mbuf library can be modified to handle first segments (and non-segmented packets) and following segments differently, so accessing m->next can be avoided for non-segmented packets. Then m->pool should be moved to the first cache line.
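To illustrate the kind of split I mean, something along these lines. It is only a sketch under the assumption that m->pool has already been moved to the first cache line; refcounting, indirect mbufs and error handling are left out:

/* Sketch only - not how the mbuf library actually looks. The gate on
 * m->nb_segs (first cache line) avoids touching m->next (second cache
 * line) for non-segmented packets. */
#include <rte_branch_prediction.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

static inline void
pktmbuf_free_split(struct rte_mbuf *m)
{
	if (likely(m->nb_segs == 1)) {
		/* Non-segmented packet: m->next is known to be NULL,
		 * so the second cache line is never read here. */
		rte_mempool_put(m->pool, m);
		return;
	}

	/* Segmented packet: use the generic path, which walks m->next. */
	rte_pktmbuf_free(m);
}

Whether this pays off depends on how often the non-segmented branch is taken, so it is another thing that would have to be benchmarked.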