On Tue, Mar 23, 2021 at 05:00:08PM +0100, Jesper Dangaard Brouer wrote:
> > +	/*
> > +	 * If there are no allowed local zones that meets the watermarks then
> > +	 * try to allocate a single page and reclaim if necessary.
> > +	 */
> > +	if (!zone)
> > +		goto failed;
> > +
> > +	/* Attempt the batch allocation */
> > +	local_irq_save(flags);
> > +	pcp = &this_cpu_ptr(zone->pageset)->pcp;
> > +	pcp_list = &pcp->lists[ac.migratetype];
> > +
> > +	while (allocated < nr_pages) {
> > +		page = __rmqueue_pcplist(zone, ac.migratetype, alloc_flags,
> > +								pcp, pcp_list);
>
> The function __rmqueue_pcplist() is now used in two places, which causes
> the compiler to uninline the static function.
>

This was expected. It was not something I was particularly happy with
but avoiding it was problematic without major refactoring.

> My tests show you should inline __rmqueue_pcplist(). See patch I'm
> using below my signature, which also has some benchmark notes. (Please
> squash it into your patch and drop these notes).
>

The cycle savings per element are very marginal at just 4 cycles. I
expect the silly stat updates alone are far more costly, but the series
that addresses that is likely to be controversial. As I know the cycle
budget for processing a packet is tight, I've applied the patch but am
keeping it separate to preserve the data in case someone points out that
it is a big function to inline and "fixes" it.
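
For anyone following along, here is a minimal userspace sketch of the
pattern being discussed: a static helper with two callers, forced inline
so the compiler cannot emit it out of line. It is not the kernel code.
ALWAYS_INLINE, struct item, pop_item(), alloc_one() and alloc_bulk() are
hypothetical names for illustration, and the attribute assumes GCC or
Clang (the kernel wraps the same attribute as __always_inline).

```c
#include <stddef.h>

/* Assumption: GCC/Clang attribute; the kernel spells this __always_inline. */
#define ALWAYS_INLINE inline __attribute__((__always_inline__))

/* Hypothetical list element standing in for a page on a per-cpu list. */
struct item {
	struct item *next;
	int value;
};

/*
 * Hypothetical stand-in for a hot-path helper such as __rmqueue_pcplist().
 * With two callers the compiler may otherwise choose not to inline it.
 */
static ALWAYS_INLINE struct item *pop_item(struct item **list)
{
	struct item *it = *list;

	if (it)
		*list = it->next;
	return it;
}

/* First caller: take a single element. */
static struct item *alloc_one(struct item **list)
{
	return pop_item(list);
}

/* Second caller: take up to nr elements, stopping when the list is empty. */
static size_t alloc_bulk(struct item **list, size_t nr, struct item **out)
{
	size_t allocated = 0;

	while (allocated < nr) {
		struct item *it = pop_item(list);

		if (!it)
			break;
		out[allocated++] = it;
	}
	return allocated;
}
```

Whether forcing the inline pays off depends on the size of the helper
and how hot each caller is, so it needs measuring in each case.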
--
Mel Gorman
SUSE Labs