+Cc techboard

We need benchmark numbers in order to make a decision.
Please all, prepare some arguments and numbers so we can discuss
the mbuf layout in the next techboard meeting.


01/11/2020 21:59, Morten Brørup:
> > From: Thomas Monjalon [mailto:tho...@monjalon.net]
> > Sent: Sunday, November 1, 2020 5:38 PM
> > 
> > 01/11/2020 10:12, Morten Brørup:
> > > One thing has always puzzled me:
> > > Why do we use 64 bits to indicate which memory pool
> > > an mbuf belongs to?
> > > The portid only uses 16 bits and an indirection index.
> > > Why don't we use the same kind of indirection index for mbuf pools?
> > 
> > I wonder what would be the cost of indirection. Probably negligible.
> 
> Probably. The portid does it, and that indirection is heavily used everywhere.
> 
> The size of mbuf memory pool indirection array should be compile time 
> configurable, like the size of the portid indirection array.
> 
> And for reference, the indirection array will fit into one cache line if we 
> default to 8 mbuf pools, thus supporting an 8 CPU socket system with one mbuf 
> pool per CPU socket, or a 4 CPU socket system with two mbuf pools per CPU 
> socket.
> 
> (And as a side note: Our application is optimized for single-socket systems, 
> and we only use one mbuf pool. I guess many applications were developed 
> without carefully optimizing for multi-socket systems, and also just use one 
> mbuf pool. In these cases, the mbuf structure doesn't really need a pool 
> field. But it is still there, and the DPDK libraries use it, so we didn't 
> bother removing it.)
> 
> > I think it is a good proposal...
> > ... for next year, after a deprecation notice.
> > 
> > > I can easily imagine using one mbuf pool (or perhaps a few pools)
> > > per CPU socket (or per physical memory bus closest to an attached NIC),
> > > but not more than 256 mbuf memory pools in total.
> > > So, let's introduce an mbufpoolid like the portid,
> > > and cut this mbuf field down from 64 to 8 bits.

We will need to measure the perf of the solution.
There is a chance that the cost is too high.
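
To make the benchmark discussion concrete, below is a rough sketch of the
kind of indirection being proposed, mirroring how a 16-bit port_id indexes
into rte_eth_devices[]. The names are hypothetical (no such table or mbuf
field exists in DPDK today); the sketch only shows the extra dependent load
we would have to measure:

/* Hypothetical sketch of the proposed mbuf pool indirection.
 * rte_mbuf_pool_table, RTE_MBUF_MAX_POOLS and pool_id are made up. */
#include <stdint.h>
#include <rte_mempool.h>

#define RTE_MBUF_MAX_POOLS 8	/* compile-time configurable; 8 pointers = 1 cache line */

/* global indirection array, filled when each mbuf pool is created */
extern struct rte_mempool *rte_mbuf_pool_table[RTE_MBUF_MAX_POOLS];

static inline struct rte_mempool *
mbuf_pool_from_id(uint8_t pool_id)
{
	/* one extra load compared to storing the pointer in the mbuf;
	 * the 64-byte table is expected to stay hot in L1 */
	return rte_mbuf_pool_table[pool_id];
}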


> > > If we also cut down m->pkt_len from 32 to 24 bits,
> > 
> > Who is using packets larger than 64k? Are 16 bits enough?
> 
> I personally consider 64k a reasonable packet size limit. Exotic applications 
> with even larger packets would have to live with this constraint. But let's 
> see if there are any objections. For reference, 64k corresponds to ca. 44 
> Ethernet (1500 byte) packets.
> 
> (The limit could be 65535 bytes, to avoid translation of the value 0 into 
> 65536 bytes.)
> 
> This modification would go nicely hand in hand with the mbuf pool indirection 
> modification.
> 
> ... after yet another round of ABI stability discussions, deprecation 
> notices, and so on. :-)

After more thought, I'm afraid 64k is too small in some cases.
And 24-bit manipulation would probably hurt performance.
I'm afraid we are stuck with 32-bit length.
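
To illustrate the cost (the struct below is made up, not the real rte_mbuf
layout): writing a 24-bit bitfield turns a single aligned 32-bit store into
a read-modify-write of the surrounding word.

/* Illustration only: a 24-bit pkt_len packed next to an 8-bit pool_id. */
#include <stdint.h>

struct mbuf_len_demo {
	uint32_t pkt_len : 24;	/* max 16 MB - 1 */
	uint32_t pool_id : 8;
};

static inline void
set_pkt_len(struct mbuf_len_demo *m, uint32_t len)
{
	/* the compiler must load the 32-bit word, mask out the old 24 bits,
	 * insert the new value and store it back, instead of issuing one
	 * plain 32-bit store - that read-modify-write is the cost above */
	m->pkt_len = len;
}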


> > > we can get the 8 bit mbuf pool index into the first cache line
> > > at no additional cost.
> > 
> > I like the idea.
> > It means we don't need to move the pool pointer now,
> > i.e. it does not have to replace the timestamp field.
> 
> Agreed! Don't move m->pool to the first cache line; it is not used for RX.
> 
> > 
> > > In other words: This would free up another 64 bit field in the mbuf
> > structure!
> > 
> > That would be great!
> > 
> > 
> > > And even though the m->next pointer for scattered packets resides
> > > in the second cache line, the libraries and applications know
> > > that m->next is NULL when m->nb_segs is 1.
> > > This proves that my suggestion would make touching
> > > the second cache line unnecessary (in simple cases),
> > > even for re-initializing the mbuf.
> > 
> > So you think the "next" pointer should stay in the second half of mbuf?
> > 
> > I feel you would like to move the Tx offloads in the first half
> > to improve performance of very simple apps.
> 
> "Very simple apps" sounds like a minority of apps. I would rather say "very 
> simple packet handling scenarios", e.g. forwarding of normal size 
> non-segmented packets. I would guess that the vast majority of packets 
> handled by DPDK applications actually match this scenario. So I'm proposing 
> to optimize for what I think is the most common scenario.
> 
> If segmented packets are common, then m->next could be moved to the first 
> cache line. But it will only improve the pure RX steps of the pipeline. When 
> preparing the packet for TX, m->tx_offloads will need to be set, and the 
> second cache line comes into play. So I'm wondering how big the benefit of 
> having m->next in the first cache line really is - assuming that m->nb_segs 
> will be checked before accessing m->next.
> 
> > I am thinking the opposite: we could have some dynamic fields space
> > in the first half to improve performance of complex Rx.
> > Note: we can add a flag hint for field registration in this first half.
> > 
> 
> I have had the same thoughts. However, I would prefer being able to forward 
> ordinary packets without using the second mbuf cache line at all (although 
> only in specific scenarios like my example above).
> 
> Furthermore, the application can abuse the 64 bit m->tx_offload field for 
> private purposes until it is time to prepare the packet for TX and pass it on 
> to the driver. This hack somewhat resembles a dynamic field in the first 
> cache line, and will not be possible if the m->pool or m->next field is moved 
> there.
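
For reference, the nb_segs check described above looks roughly like the
sketch below (not a proposed API, and it recomputes what m->pkt_len already
caches; the only point is that the non-segmented fast path never
dereferences m->next):

#include <stdint.h>
#include <rte_branch_prediction.h>
#include <rte_mbuf.h>

/* Toy walk over the segment list, showing which cache lines each branch touches. */
static inline uint32_t
pkt_total_data_len(const struct rte_mbuf *m)
{
	if (likely(m->nb_segs == 1))
		return m->data_len;	/* first cache line only */

	uint32_t len = 0;
	do {
		len += m->data_len;	/* data_len: first cache line */
		m = m->next;		/* next: second cache line */
	} while (m != NULL);
	return len;
}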


