+Cc techboard

We need benchmark numbers in order to make a decision.
Please, everyone, prepare some arguments and numbers
so we can discuss the mbuf layout in the next techboard meeting.
01/11/2020 21:59, Morten Brørup:
> > From: Thomas Monjalon [mailto:tho...@monjalon.net]
> > Sent: Sunday, November 1, 2020 5:38 PM
> >
> > 01/11/2020 10:12, Morten Brørup:
> > > One thing has always puzzled me:
> > > Why do we use 64 bits to indicate which memory pool
> > > an mbuf belongs to?
> > > The portid only uses 16 bits and an indirection index.
> > > Why don't we use the same kind of indirection index for mbuf pools?
> >
> > I wonder what would be the cost of indirection. Probably negligible.
>
> Probably. The portid does it, and that indirection is heavily used everywhere.
>
> The size of the mbuf memory pool indirection array should be compile time
> configurable, like the size of the portid indirection array.
>
> And for reference, the indirection array will fit into one cache line if we
> default to 8 mbuf pools, thus supporting an 8 CPU socket system with one mbuf
> pool per CPU socket, or a 4 CPU socket system with two mbuf pools per CPU
> socket.
>
> (And as a side note: Our application is optimized for single-socket systems,
> and we only use one mbuf pool. I guess many applications were developed
> without carefully optimizing for multi-socket systems, and also just use one
> mbuf pool. In these cases, the mbuf structure doesn't really need a pool
> field. But it is still there, and the DPDK libraries use it, so we didn't
> bother removing it.)
>
> > I think it is a good proposal...
> > ... for next year, after a deprecation notice.
> >
> > > I can easily imagine using one mbuf pool (or perhaps a few pools)
> > > per CPU socket (or per physical memory bus closest to an attached NIC),
> > > but not more than 256 mbuf memory pools in total.
> > > So, let's introduce an mbufpoolid like the portid,
> > > and cut this mbuf field down from 64 to 8 bits.

We will need to measure the performance of the solution.
There is a chance that the cost is too high.

> > > If we also cut down m->pkt_len from 32 to 24 bits,
> >
> > Who is using packets larger than 64k? Are 16 bits enough?
>
> I personally consider 64k a reasonable packet size limit. Exotic applications
> with even larger packets would have to live with this constraint. But let's
> see if there are any objections. For reference, 64k corresponds to ca. 44
> Ethernet (1500 byte) packets.
>
> (The limit could be 65535 bytes, to avoid translation of the value 0 into
> 65536 bytes.)
>
> This modification would go nicely hand in hand with the mbuf pool indirection
> modification.
>
> ... after yet another round of ABI stability discussions, deprecation
> notices, and so on. :-)

After more thought, I'm afraid 64k is too small in some cases.
And 24-bit manipulation would probably break performance.
I'm afraid we are stuck with a 32-bit length.

> > > we can get the 8 bit mbuf pool index into the first cache line
> > > at no additional cost.
> >
> > I like the idea.
> > It means we don't need to move the pool pointer now,
> > i.e. it does not have to replace the timestamp field.
>
> Agreed! Don't move m->pool to the first cache line; it is not used for RX.
>
> > > In other words: This would free up another 64 bit field in the mbuf
> > > structure!
> >
> > That would be great!
> >
> > > And even though the m->next pointer for scattered packets resides
> > > in the second cache line, the libraries and applications know
> > > that m->next is NULL when m->nb_segs is 1.
> > > This proves that my suggestion would make touching
> > > the second cache line unnecessary (in simple cases),
> > > even for re-initializing the mbuf.
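
To have something concrete to benchmark, here is a minimal sketch of what the
indirection could look like. All names (RTE_MAX_MBUF_POOLS, mbuf_pool_tbl,
mbuf_pool_from_idx, pool_idx) are made up for illustration; they are not
existing DPDK symbols, and this is not a layout proposal, only a way to see
where the extra lookup cost would come from.

/* Illustrative sketch only -- hypothetical names, not existing DPDK code. */

#include <stdint.h>
#include <rte_mempool.h>

/* Hypothetical compile-time configurable indirection array, analogous to the
 * portid indirection. With the default of 8 pools, the array of 64-bit
 * pointers fits into a single cache line. */
#ifndef RTE_MAX_MBUF_POOLS
#define RTE_MAX_MBUF_POOLS 8
#endif

static struct rte_mempool *mbuf_pool_tbl[RTE_MAX_MBUF_POOLS];

/* The cost of the indirection is this one table lookup, replacing a direct
 * read of the 64-bit m->pool pointer. */
static inline struct rte_mempool *
mbuf_pool_from_idx(uint8_t pool_idx)
{
	return mbuf_pool_tbl[pool_idx];
}

/* Packing sketch for Morten's original idea (24-bit pkt_len plus 8-bit pool
 * index in the 32 bits currently holding pkt_len alone). As said above, we
 * may have to keep the 32-bit length, in which case the index would need
 * another hole in the first cache line. */
struct pkt_len_and_pool_idx {
	uint32_t pkt_len:24;
	uint32_t pool_idx:8;
};
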
> > So you think the "next" pointer should stay in the second half of mbuf?
> >
> > I feel you would like to move the Tx offloads in the first half
> > to improve performance of very simple apps.
>
> "Very simple apps" sounds like a minority of apps. I would rather say "very
> simple packet handling scenarios", e.g. forwarding of normal size
> non-segmented packets. I would guess that the vast majority of packets
> handled by DPDK applications actually match this scenario. So I'm proposing
> to optimize for what I think is the most common scenario.
>
> If segmented packets are common, then m->next could be moved to the first
> cache line. But it will only improve the pure RX steps of the pipeline. When
> preparing the packet for TX, m->tx_offload will need to be set, and the
> second cache line comes into play. So I'm wondering how big the benefit of
> having m->next in the first cache line really is - assuming that m->nb_segs
> will be checked before accessing m->next.
>
> > I am thinking the opposite: we could have some dynamic fields space
> > in the first half to improve performance of complex Rx.
> > Note: we can add a flag hint for field registration in this first half.
>
> I have had the same thoughts. However, I would prefer being able to forward
> ordinary packets without using the second mbuf cache line at all (although
> only in specific scenarios like my example above).
>
> Furthermore, the application can abuse the 64 bit m->tx_offload field for
> private purposes until it is time to prepare the packet for TX and pass it on
> to the driver. This hack somewhat resembles a dynamic field in the first
> cache line, and will not be possible if the m->pool or m->next field is moved
> there.
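
For clarity, a minimal sketch of the m->tx_offload hack described above. The
helper names are made up; only the rte_mbuf fields (tx_offload, l2_len,
l3_len, ol_flags) and the PKT_TX_* flags are real, and the offload values are
just an example.

/* Sketch of the "m->tx_offload as private scratch space" hack;
 * illustrative only, not a recommended or supported pattern. */

#include <stdint.h>
#include <rte_mbuf.h>

/* Between RX and TX preparation, stash an application-private
 * 64-bit value in the otherwise unused tx_offload field. */
static inline void
app_stash_private(struct rte_mbuf *m, uint64_t priv)
{
	m->tx_offload = priv;
}

static inline uint64_t
app_read_private(const struct rte_mbuf *m)
{
	return m->tx_offload;
}

/* Before handing the mbuf to the driver, the private value must be
 * replaced by real offload metadata (example values only). */
static inline void
app_prepare_tx(struct rte_mbuf *m, uint8_t l2_len, uint8_t l3_len)
{
	m->tx_offload = 0;   /* drop the private value */
	m->l2_len = l2_len;  /* bit-fields overlaying tx_offload */
	m->l3_len = l3_len;
	m->ol_flags |= PKT_TX_IP_CKSUM | PKT_TX_IPV4;
}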