On Wed,  9 Oct 2019 18:49:34 +0200 "Uladzislau Rezki (Sony)" <ure...@gmail.com> 
wrote:

> Get rid of preempt_disable() and preempt_enable() when the
> preload is done for splitting purpose. The reason is that
> calling spin_lock() with disabled preemtion is forbidden in
> CONFIG_PREEMPT_RT kernel.
> 
> Therefore, we do not guarantee that a CPU is preloaded, instead
> we minimize the case when it is not with this change.
> 
> For example i run the special test case that follows the preload
> pattern and path. 20 "unbind" threads run it and each does
> 1000000 allocations. Only 3.5 times among 1000000 a CPU was
> not preloaded thus. So it can happen but the number is rather
> negligible.
>
> ...
>

A few questions about the resulting alloc_vmap_area():

: static struct vmap_area *alloc_vmap_area(unsigned long size,
:                               unsigned long align,
:                               unsigned long vstart, unsigned long vend,
:                               int node, gfp_t gfp_mask)
: {
:       struct vmap_area *va, *pva;
:       unsigned long addr;
:       int purged = 0;
: 
:       BUG_ON(!size);
:       BUG_ON(offset_in_page(size));
:       BUG_ON(!is_power_of_2(align));
: 
:       if (unlikely(!vmap_initialized))
:               return ERR_PTR(-EBUSY);
: 
:       might_sleep();
: 
:       va = kmem_cache_alloc_node(vmap_area_cachep,
:                       gfp_mask & GFP_RECLAIM_MASK, node);

Why does this use GFP_RECLAIM_MASK?  Please add a comment explaining
this.

:       if (unlikely(!va))
:               return ERR_PTR(-ENOMEM);
: 
:       /*
:        * Only scan the relevant parts containing pointers to other objects
:        * to avoid false negatives.
:        */
:       kmemleak_scan_area(&va->rb_node, SIZE_MAX, gfp_mask & GFP_RECLAIM_MASK);
: 
: retry:
:       /*
:        * Preload this CPU with one extra vmap_area object. It is used
:        * when fit type of free area is NE_FIT_TYPE. Please note, it
:        * does not guarantee that an allocation occurs on a CPU that
:        * is preloaded, instead we minimize the case when it is not.
:        * It can happen because of migration, because there is a race
:        * until the below spinlock is taken.
:        *
:        * The preload is done in non-atomic context, thus it allows us
:        * to use more permissive allocation masks to be more stable under
:        * low memory condition and high memory pressure.
:        *
:        * Even if it fails we do not really care about that. Just proceed
:        * as it is. "overflow" path will refill the cache we allocate from.
:        */
:       if (!this_cpu_read(ne_fit_preload_node)) {

Readability nit: local `pva' should be defined here, rather than having
function-wide scope.

:               pva = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, node);

Why doesn't this honour gfp_mask?  If it's not a bug, please add
comment explaining this.

The kmem_cache_alloc() in adjust_va_to_fit_type() omits the caller's
gfp_mask also.  If not a bug, please document the unexpected behaviour.

: 
:               if (this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva)) {
:                       if (pva)
:                               kmem_cache_free(vmap_area_cachep, pva);
:               }
:       }
: 
:       spin_lock(&vmap_area_lock);
: 
:       /*
:        * If an allocation fails, the "vend" address is
:        * returned. Therefore trigger the overflow path.
:        */

As for the intent of this patch, why not preallocate the vmap_area
outside the spinlock and use it within the spinlock?  Does spin_lock()
disable preemption on RT?  I forget, but it doesn't matter much anyway
- doing this will make the code better in the regular kernel I think? 
Something like this:

        struct vmap_area *pva = NULL;

        ...

        if (!this_cpu_read(ne_fit_preload_node))
                pva = kmem_cache_alloc_node(vmap_area_cachep, ...);

        spin_lock(&vmap_area_lock);

        if (pva && __this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva))
                kmem_cache_free(vmap_area_cachep, pva);




Reply via email to