> From: Morten Brørup [mailto:m...@smartsharesystems.com]
> Sent: Wednesday, 8 November 2023 18.49
> 
> > From: Stephen Hemminger [mailto:step...@networkplumber.org]
> > Sent: Wednesday, 8 November 2023 17.52
> >

[...]

> >
> > Would it make sense to have an rte_config.h value for maximum burst
> > size?
> 
> I would support that!

It would also be a good place to document the reasoning behind the choice of 
burst size, so application developers can better understand how to fine-tune 
the values according to available hardware and application-specific 
requirements.

Those build-time configurable values should also be used by DPDK libraries, 
instead of more or less randomly chosen hardcoded burst sizes.

E.g. when I implemented rte_pktmbuf_free_bulk(), I considered 64 plenty of 
burst capacity, because it was double the size of the traditional burst size of 
32. But it is probably sub-optimal for applications using a default burst size 
of 128.

> There could be a few burst size defines, e.g.
> 
> - SMALL: used for small bursts (I think some drivers use bursts of 8)

The reason for choosing 8 is probably rooted in cache alignment:
Eight 64-bit pointers cover one 64 B cache line.

I wonder if those drivers would perform better using bursts of 16 mbufs on 
32-bit architectures, or on 64-bit architectures with 128 B cache line size?
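
To make the arithmetic behind that question concrete, here is a minimal sketch. 
The helper name ptrs_per_cache_line() is made up for illustration and is not a 
DPDK API; it only divides the cache line size by the pointer size.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of DPDK): number of mbuf pointers that
 * fit in one cache line, given the cache line size and the pointer size,
 * both in bytes.
 */
static inline size_t
ptrs_per_cache_line(size_t cache_line_size, size_t ptr_size)
{
	/* The sizes are expected to divide evenly on real hardware. */
	assert(ptr_size != 0 && cache_line_size % ptr_size == 0);
	return cache_line_size / ptr_size;
}
```

With a 64 B cache line, 8-byte pointers give 8 per line and 4-byte pointers 
give 16; with a 128 B cache line, 8-byte pointers also give 16.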

> - NORMAL: used for typical bursts

This is usually a balance between latency and throughput:
Using shorter bursts can reduce the latency (if the application is designed 
with this in mind).
Using larger bursts improves processing performance, and thus increases 
throughput.

There is also some upper limit:
If the burst is too large, the amount of memory touched by a pipeline stage 
might not fit into the CPU data cache size, and performance drops like a rock.
E.g. a CPU with 64 B cache line size and 32 KB L1 data cache per lcore can hold 
512 cache lines in its L1 data cache, so a burst of 32 mbufs allows touching an 
average of 512/32 = 16 cache lines per packet.
The mbuf structure itself uses 2 cache lines, so the max theoretical burst 
would be 512/2 = 256 if no other memory was touched.
However, the array holding the mbuf pointers is also touched, so I would put 
128 as the largest good burst size on such a CPU.
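
The same back-of-the-envelope calculation can be written down as a small 
sketch. The function names are hypothetical (not DPDK APIs), and the model 
deliberately ignores everything except the L1 data cache size, the cache line 
size and the number of cache lines touched per mbuf.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers for illustration only; none of these exist in DPDK. */

/* Number of cache lines the L1 data cache can hold. */
static inline size_t
l1d_cache_lines(size_t l1d_size, size_t line_size)
{
	assert(line_size != 0);
	return l1d_size / line_size;
}

/* Average number of cache lines available per packet for a given burst. */
static inline size_t
cache_lines_per_pkt(size_t total_lines, size_t burst_size)
{
	assert(burst_size != 0);
	return total_lines / burst_size;
}

/* Largest burst that fits if each packet touches lines_per_pkt lines. */
static inline size_t
max_theoretical_burst(size_t total_lines, size_t lines_per_pkt)
{
	assert(lines_per_pkt != 0);
	return total_lines / lines_per_pkt;
}
```

For a 32 KB L1 data cache with 64 B cache lines this gives 512 lines, 16 lines 
per packet at a burst of 32, and a theoretical maximum burst of 256 when only 
the 2 cache lines of the mbuf structure are touched.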

> - LARGE: used for large bursts, e.g. mempool cache flush

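If kept at 512, matching the magnitude of the mempool cache flushes/refills, it 
should only be used for moving mbuf pointers around, without touching the mbufs 
themselves, or the CPU's L1 data cache will overflow: 512 mbufs at 2 cache 
lines each is 1024 cache lines, twice the 512 lines of the L1 data cache in the 
example above.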

> 
> Having these available at build time would also allow more
> optimizations in DPDK libs and drivers for those specific burst sizes.
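
As a rough sketch of what such defines could look like: the names and values 
below are purely hypothetical, nothing like this exists in rte_config.h today. 
The values are taken from the discussion above (8 for small, the traditional 32 
for normal, 512 for large), and burst_count() is a made-up helper showing how a 
library could use the define instead of a hardcoded number.

```c
/* Hypothetical names and values; not present in DPDK's rte_config.h. */
#define RTE_BURST_SIZE_SMALL   8    /* one 64 B cache line of 64-bit pointers */
#define RTE_BURST_SIZE_NORMAL  32   /* traditional burst size */
#define RTE_BURST_SIZE_LARGE   512  /* pointer-only moves, e.g. mempool cache flush */

/* Build-time sanity check (C11): the three sizes must be ordered. */
_Static_assert(RTE_BURST_SIZE_SMALL <= RTE_BURST_SIZE_NORMAL &&
	       RTE_BURST_SIZE_NORMAL <= RTE_BURST_SIZE_LARGE,
	       "burst sizes must be ordered SMALL <= NORMAL <= LARGE");

/*
 * Hypothetical library-side helper: number of NORMAL-sized bursts needed
 * to process n objects, rounding up.
 */
static inline unsigned int
burst_count(unsigned int n)
{
	return (n + RTE_BURST_SIZE_NORMAL - 1) / RTE_BURST_SIZE_NORMAL;
}
```

A library processing an array of mbufs could then size its on-stack scratch 
array from RTE_BURST_SIZE_NORMAL and loop burst_count(n) times, so it follows 
whatever the build is configured for.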
