RE: [PATCH v6 3/4] mempool: fix cache flushing algorithm

Morten Brørup Fri, 14 Oct 2022 08:57:58 -0700

> From: Olivier Matz [mailto:[email protected]]
> Sent: Friday, 14 October 2022 16.01
> 
> Hi Morten, Andrew,
> 
> On Sun, Oct 09, 2022 at 05:08:39PM +0200, Morten Brørup wrote:
> > > From: Andrew Rybchenko [mailto:[email protected]]
> > > Sent: Sunday, 9 October 2022 16.52
> > >
> > > On 10/9/22 17:31, Morten Brørup wrote:
> > > >> From: Andrew Rybchenko [mailto:[email protected]]
> > > >> Sent: Sunday, 9 October 2022 15.38
> > > >>
> > > >> From: Morten Brørup <[email protected]>
> > > >>
> >
> > [...]
> 
> I finally took a couple of hours to carefully review the mempool-
> related
> series (including the ones that have already been pushed).
> 
> The new behavior looks better to me in all situations I can think
> about.


Extreme care is required when touching a core library like the mempool.

Thank you, Olivier.

> 
> >
> > > >> --- a/lib/mempool/rte_mempool.h
> > > >> +++ b/lib/mempool/rte_mempool.h
> > > >> @@ -90,7 +90,7 @@ struct rte_mempool_cache {
> > > >>         * Cache is allocated to this size to allow it to overflow
> in
> > > >> certain
> > > >>         * cases to avoid needless emptying of cache.
> > > >>         */
> > > >> -      void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 3]; /**< Cache
> objects */
> > > >> +      void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 2]; /**< Cache
> objects */
> > > >>   } __rte_cache_aligned;
> > > >
> > > > How much are we allowed to break the ABI here?
> > > >
> > > > This patch reduces the size of the structure by removing a now
> unused
> > > part at the end, which should be harmless.
> 
> It is an ABI breakage: an existing application will use the new 22.11
> function to create the mempool (with a smaller cache), but will use the
> old inlined get/put that can exceed MAX_SIZE x 2 will remain.
> 
> But this is a nice memory consumption improvement, in my opinion we
> should accept it for 22.11 with an entry in the release note.
> 
> 
> > > >
> > > > If we may also move the position of the objs array, I would add
> > > __rte_cache_aligned to the objs array. It makes no difference in
> the
> > > general case, but if get/put operations are always 32 objects, it
> will
> > > reduce the number of memory (or last level cache) accesses from
> five to
> > > four 64 B cache lines for every get/put operation.
> 
> Will it really be the case? Since cache->len has to be accessed too,
> I don't think it would make a difference.

Yes, the first cache line, containing cache->len, will be accessed always. I 
forgot to count that; so the improvement by aligning cache->objs will be five 
cache line accesses instead of six.

Let me try to explain the scenario in other words:

In an application where a mempool cache is only accessed in bursts of 32 
objects (256 B), it matters if those 256 B accesses in the mempool cache start 
at a cache line aligned address or not. If cache line aligned, accessing those 
256 B in the mempool cache will only touch 4 cache lines; if not, 5 cache lines 
will be touched. (For architectures with 128 B cache line, it will be 2 instead 
of 3 touched cache lines per mempool cache get/put operation in applications 
using only bursts of 32 objects.)

If we cache line align cache->objs, those bursts of 32 objects (256 B) will be 
cache line aligned: Any address at cache->objs[N * 32 objects] is cache line 
aligned if objs->objs[0] is cache line aligned.

Currently, the cache->objs directly follows cache->len, which makes 
cache->objs[0] cache line unaligned.

If we decide to break the mempool cache ABI, we might as well include my 
suggested cache line alignment performance improvement. It doesn't degrade 
performance for mempool caches not only accessed in bursts of 32 objects.

> 
> 
> > > >
> > > >         uint32_t len;         /**< Current cache count */
> > > > -       /*
> > > > -        * Cache is allocated to this size to allow it to overflow
> in
> > > certain
> > > > -        * cases to avoid needless emptying of cache.
> > > > -        */
> > > > -       void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 3]; /**< Cache
> objects */
> > > > +       /**
> > > > +        * Cache objects
> > > > +        *
> > > > +        * Cache is allocated to this size to allow it to overflow
> in
> > > certain
> > > > +        * cases to avoid needless emptying of cache.
> > > > +        */
> > > > +       void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 2]
> __rte_cache_aligned;
> > > > } __rte_cache_aligned;
> > >
> > > I think aligning objs on cacheline should be a separate patch.
> >
> > Good point. I'll let you do it. :-)
> >
> > PS: Thank you for following up on this patch series, Andrew!
> 
> Many thanks for this rework.
> 
> Acked-by: Olivier Matz <[email protected]>

Perhaps Reviewed-by would be appropriate?

RE: [PATCH v6 3/4] mempool: fix cache flushing algorithm

Reply via email to