On Wed, 2007-05-16 at 12:53 -0700, Christoph Lameter wrote:
> On Wed, 16 May 2007, Peter Zijlstra wrote:
>
> > If this 4k cpu system ever gets to touch the new lock it is in way
> > deeper problems than a bouncing cache-line.
>
> So its no use on NUMA?
It is, it's just that we're swapping very heavily at that point; a
bouncing cache-line will not significantly slow down the box compared
to waiting for block IO, will it?

> > Please look at it more carefully.
> >
> > We differentiate pages allocated at the level where GFP_ATOMIC starts to
> > fail. By not updating the percpu slabs those are retried every time,
> > except for ALLOC_NO_WATERMARKS allocations; those are served from the
> > ->reserve_slab.
> >
> > Once a regular slab allocation succeeds again, the ->reserve_slab is
> > cleaned up and never looked at again until we're in distress again.

> A single slab? This may only give you a single object in an extreme
> case. Are you sure that this solution is generic enough?

Well, single as in a single active one; it gets spilled onto the full
list and a new one is instantiated if more are needed.

> The problem here is that you may spinlock and take out the slab for one
> cpu but then (AFAICT) other cpus can still not get their high priority
> allocs satisfied. Some comments follow.

All cpus are redirected to ->reserve_slab when the regular allocations
start to fail.

> > Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
> > ---
> >  include/linux/slub_def.h |    2 +
> >  mm/slub.c                |   85 ++++++++++++++++++++++++++++++++++++++++++-----
> >  2 files changed, 78 insertions(+), 9 deletions(-)
> >
> > Index: linux-2.6-git/include/linux/slub_def.h
> > ===================================================================
> > --- linux-2.6-git.orig/include/linux/slub_def.h
> > +++ linux-2.6-git/include/linux/slub_def.h
> > @@ -46,6 +46,8 @@ struct kmem_cache {
> >  	struct list_head list;	/* List of slab caches */
> >  	struct kobject kobj;	/* For sysfs */
> >
> > +	struct page *reserve_slab;
> > +
> >  #ifdef CONFIG_NUMA
> >  	int defrag_ratio;
> >  	struct kmem_cache_node *node[MAX_NUMNODES];
> >
> > Index: linux-2.6-git/mm/slub.c
> > ===================================================================
> > --- linux-2.6-git.orig/mm/slub.c
> > +++ linux-2.6-git/mm/slub.c
> > @@ -20,11 +20,13 @@
> >  #include <linux/mempolicy.h>
> >  #include <linux/ctype.h>
> >  #include <linux/kallsyms.h>
> > +#include "internal.h"
> >
> >  /*
> >   * Lock order:
> > - *   1. slab_lock(page)
> > - *   2. slab->list_lock
> > + *   1. reserve_lock
> > + *   2. slab_lock(page)
> > + *   3. node->list_lock
> >   *
> >   * The slab_lock protects operations on the object of a particular
> >   * slab and its metadata in the page struct. If the slab lock
> > @@ -259,6 +261,8 @@ static int sysfs_slab_alias(struct kmem_
> >  static void sysfs_slab_remove(struct kmem_cache *s) {}
> >  #endif
> >
> > +static DEFINE_SPINLOCK(reserve_lock);
> > +
> >  /********************************************************************
> >   *			Core slab cache functions
> >   *******************************************************************/
> >
> > @@ -1007,7 +1011,7 @@ static void setup_object(struct kmem_cac
> >  		s->ctor(object, s, 0);
> >  }
> >
> > -static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node)
> > +static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node,
> > +		int *rank)
> >  {
> >  	struct page *page;
> >  	struct kmem_cache_node *n;
> > @@ -1025,6 +1029,7 @@ static struct page *new_slab(struct kmem
> >  	if (!page)
> >  		goto out;
> >
> > +	*rank = page->rank;
> >  	n = get_node(s, page_to_nid(page));
> >  	if (n)
> >  		atomic_long_inc(&n->nr_slabs);
> >
> > @@ -1311,7 +1316,7 @@ static void unfreeze_slab(struct kmem_ca
> >  /*
> >   * Remove the cpu slab
> >   */
> > -static void deactivate_slab(struct kmem_cache *s, struct page *page, int cpu)
> > +static void __deactivate_slab(struct kmem_cache *s, struct page *page)
> >  {
> >  	/*
> >  	 * Merge cpu freelist into freelist. Typically we get here
> > @@ -1330,10 +1335,15 @@ static void deactivate_slab(struct kmem_
> >  		page->freelist = object;
> >  		page->inuse--;
> >  	}
> > -	s->cpu_slab[cpu] = NULL;
> >  	unfreeze_slab(s, page);
> >  }

> So you want to spill back the lockless_freelist without deactivating the
> slab? Why are you using the lockless_freelist at all? If you do not use it
> then you can call unfreeze_slab. No need for this split.

Ah, quite right; I don't use the lockless_freelist at all.

> > @@ -1395,6 +1405,7 @@ static void *__slab_alloc(struct kmem_ca
> >  {
> >  	void **object;
> >  	int cpu = smp_processor_id();
> > +	int rank = 0;
> >
> >  	if (!page)
> >  		goto new_slab;
> > @@ -1424,10 +1435,26 @@ new_slab:
> >  	if (page) {
> >  		s->cpu_slab[cpu] = page;
> >  		goto load_freelist;
> > -	}
> > +	} else if (unlikely(gfp_to_alloc_flags(gfpflags) & ALLOC_NO_WATERMARKS))
> > +		goto try_reserve;

> Ok so we are trying to allocate a slab and do not get one thus ->
> try_reserve.

Right, so the cpu-slab is NULL, and we need a new slab.

> But this is only working if we are using the slab after
> explicitly flushing the cpuslabs. Otherwise the slab may be full and we
> get to alloc_slab.

/me fails to parse.

When we need a new_slab:
 - we try the partial lists;
 - we try the reserve slab (if ALLOC_NO_WATERMARKS);
 - otherwise we fall through to alloc_slab.

(I've put a rough sketch of this flow a little further down.)

> > -	page = new_slab(s, gfpflags, node);
> > -	if (page) {
> >
> > +alloc_slab:
> > +	page = new_slab(s, gfpflags, node, &rank);
> > +	if (page && rank) {

> Huh? You mean !page?

No, no, we did get a page, and it did not take ALLOC_NO_WATERMARKS to
get it.

> > +		if (unlikely(s->reserve_slab)) {
> > +			struct page *reserve;
> > +
> > +			spin_lock(&reserve_lock);
> > +			reserve = s->reserve_slab;
> > +			s->reserve_slab = NULL;
> > +			spin_unlock(&reserve_lock);
> > +
> > +			if (reserve) {
> > +				slab_lock(reserve);
> > +				__deactivate_slab(s, reserve);
> > +				putback_slab(s, reserve);

> Remove the above two lines (they are wrong regardless) and simply make
> this the cpu slab.

It need not be the same node; the reserve_slab is node agnostic.

So here the free page watermarks are good again, and we can forget all
about the ->reserve_slab. We just push it onto the free/partial lists
and forget about it.
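To make that concrete, here is the promised sketch of the intended flow
(rough pseudo-code only, not the literal patch: alloc_from(),
alloc_from_reserve(), flush_reserve() and install_reserve() are made-up
names for the open-coded paths above; locking, debug and the cpu-slab
bookkeeping are omitted):

	/* Sketch of the new_slab fallback order in __slab_alloc(). */
	static void *slab_alloc_sketch(struct kmem_cache *s, gfp_t gfpflags,
			int node)
	{
		struct page *page;
		void *object;
		int rank = 0;

		/* 1: try to take a slab off the partial lists */
		page = get_partial(s, gfpflags, node);
		if (page)
			return alloc_from(s, page);

		/* 2: emergency allocations may dip into the reserve slab */
		if (gfp_to_alloc_flags(gfpflags) & ALLOC_NO_WATERMARKS) {
			object = alloc_from_reserve(s);
			if (object)
				return object;
			/* no usable reserve slab: fall through to alloc_slab */
		}

		/* 3: allocate a fresh slab */
		page = new_slab(s, gfpflags, node, &rank);
		if (!page)
			return NULL;

		if (rank) {
			/*
			 * The watermarks are good again: spill the old
			 * ->reserve_slab back to the free/partial lists and
			 * use the new page as the regular cpu slab.
			 */
			flush_reserve(s);
			return alloc_from(s, page);
		}

		/*
		 * The page took ALLOC_NO_WATERMARKS to get: make it the
		 * ->reserve_slab instead of the cpu slab, so subsequent
		 * allocations keep re-testing the watermarks instead of
		 * eating up the reserve.
		 */
		return install_reserve(s, page);
	}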
But like you said above: unfreeze_slab() should be good, since I don't
use the lockless_freelist.

> > +			}
> > +		}
> >  		cpu = smp_processor_id();
> >  		if (s->cpu_slab[cpu]) {
> >  			/*
> > @@ -1455,6 +1482,18 @@ new_slab:
> >  		SetSlabFrozen(page);
> >  		s->cpu_slab[cpu] = page;
> >  		goto load_freelist;
> > +	} else if (page) {
> > +		spin_lock(&reserve_lock);
> > +		if (s->reserve_slab) {
> > +			discard_slab(s, page);
> > +			page = s->reserve_slab;
> > +		}
> > +		slab_lock(page);
> > +		SetPageActive(page);
> > +		s->reserve_slab = page;
> > +		spin_unlock(&reserve_lock);
> > +
> > +		goto got_reserve;

So this is when we get a page and it took ALLOC_NO_WATERMARKS to get
it. Instead of updating the cpu_slab we leave that unset, so that
subsequent allocations will try to allocate a slab again, thereby
re-testing the current free page watermarks (and not gobbling up the
reserve).

> >  	}
> >  	return NULL;
> > debug:
> > @@ -1470,6 +1509,31 @@ debug:
> >  	page->freelist = object[page->offset];
> >  	slab_unlock(page);
> >  	return object;
> > +
> > +try_reserve:
> > +	spin_lock(&reserve_lock);
> > +	page = s->reserve_slab;
> > +	if (!page) {
> > +		spin_unlock(&reserve_lock);
> > +		goto alloc_slab;
> > +	}
> > +
> > +	slab_lock(page);
> > +	if (!page->freelist) {
> > +		s->reserve_slab = NULL;
> > +		spin_unlock(&reserve_lock);
> > +		__deactivate_slab(s, page);

> replace with unfreeze slab.

> > +		putback_slab(s, page);

> Putting back the slab twice.

__deactivate_slab() doesn't do putback_slab(), and now I see that whole
function isn't there anymore. unfreeze_slab() it is.

> > +		goto alloc_slab;
> > +	}
> > +	spin_unlock(&reserve_lock);
> > +
> > +got_reserve:
> > +	object = page->freelist;
> > +	page->inuse++;
> > +	page->freelist = object[page->offset];
> > +	slab_unlock(page);
> > +	return object;
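IOW, with your suggestions folded in, the empty-reserve case in
try_reserve would become something like this (untested, just to check I
got it right):

	slab_lock(page);
	if (!page->freelist) {
		/* the reserve slab is exhausted: retire it, allocate anew */
		s->reserve_slab = NULL;
		spin_unlock(&reserve_lock);
		unfreeze_slab(s, page);	/* was: __deactivate_slab() + putback_slab() */
		goto alloc_slab;
	}
	spin_unlock(&reserve_lock);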