> > There is a performance penalty in the replenish scheme used by the vectorized
> > Rx burst for both MPRQ and SPRQ.
> > Mbuf elements are filled at the end of the mbufs array but replenished at
> > the beginning. This increases cache misses and causes a performance drop.
> > The more Rx descriptors are used, the worse the situation becomes.
> >
> > Change the allocation scheme for the vectorized MPRQ Rx burst:
> > allocate new mbufs only when the consumed mbufs are almost depleted (always
> > keep a one-burst gap between the allocated and consumed indices). Keeping a
> > small number of mbufs allocated improves cache locality and improves
> > performance significantly.
> >
> > Unfortunately, this approach cannot be applied to the SPRQ Rx burst routine.
> > In the MPRQ Rx burst we simply copy packets from external MPRQ buffers or
> > attach these buffers to mbufs.
> > In the SPRQ Rx burst we let the NIC fill the mbufs for us.
> > Hence keeping a small number of allocated mbufs would limit the NIC's ability
> > to fill as many buffers as possible, which offsets the advantage of better
> > cache locality.
> >
> > Fixes: 0f20acbf5e ("net/mlx5: implement vectorized MPRQ burst")
> >
> > Signed-off-by: Alexander Kozyrev <akozy...@nvidia.com>
> Acked-by: Viacheslav Ovsiienko <viachesl...@nvidia.com>
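A minimal sketch of the two replenish policies described in the commit message, for illustration only. It is not the mlx5 PMD code: `struct rxq_sim`, `alloc_ci`, `cons_ci`, `RING_SIZE`, `BURST` and the helper functions are all hypothetical names, and "one burst gap" is read here as "top up to exactly one burst ahead of the consumer once fewer than a burst remain". The only thing modelled is how many mbufs are allocated but not yet consumed, used as a rough proxy for the working set the CPU has to keep warm in cache.

```c
#include <stdint.h>

#define RING_SIZE 512u /* hypothetical number of Rx descriptors */
#define BURST     32u  /* hypothetical Rx burst size */

/* Hypothetical bookkeeping, not the PMD's real Rx queue structure. */
struct rxq_sim {
    uint32_t alloc_ci;        /* running index of the next slot to fill with an mbuf */
    uint32_t cons_ci;         /* running index of the next slot the Rx burst consumes */
    uint32_t max_outstanding; /* largest alloc_ci - cons_ci observed */
};

/* Old scheme: refill every free slot so the whole ring stays populated. */
static void replenish_full(struct rxq_sim *q)
{
    q->alloc_ci = q->cons_ci + RING_SIZE;
}

/* New scheme: refill only when fewer than one burst of mbufs is left
 * ahead of the consumer, then top up to exactly one burst. */
static void replenish_lazy(struct rxq_sim *q)
{
    if (q->alloc_ci - q->cons_ci < BURST)
        q->alloc_ci = q->cons_ci + BURST;
}

/* One Rx burst: replenish first, then consume up to BURST packets,
 * never more than the mbufs currently allocated. */
static void rx_burst(struct rxq_sim *q, void (*replenish)(struct rxq_sim *))
{
    uint32_t outstanding, pkts = BURST;

    replenish(q);
    outstanding = q->alloc_ci - q->cons_ci;
    if (outstanding > q->max_outstanding)
        q->max_outstanding = outstanding;
    if (pkts > outstanding)
        pkts = outstanding;
    q->cons_ci += pkts;
}

/* Run many bursts and report the peak number of allocated, unconsumed mbufs. */
static uint32_t simulate(void (*replenish)(struct rxq_sim *))
{
    struct rxq_sim q = {0, 0, 0};

    for (uint32_t i = 0; i < 1000; i++)
        rx_burst(&q, replenish);
    return q.max_outstanding;
}
```

Calling `simulate(replenish_full)` yields `RING_SIZE` (512), while `simulate(replenish_lazy)` yields `BURST` (32): under the lazy policy the mbufs being consumed were allocated at most one burst earlier, which is the locality argument made above. The sketch deliberately ignores the SPRQ case, where the NIC itself needs many posted buffers.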
Applied in next-net-mlx, thanks.