On Mon, Feb 29, 2016 at 5:29 PM, Andrey Ryabinin <ryabinin....@gmail.com> wrote:
>
>
> On 02/26/2016 07:48 PM, Alexander Potapenko wrote:
>> Stack depot will allow KASAN to store allocation/deallocation stack traces
>> for memory chunks. The stack traces are stored in a hash table and
>> referenced by handles which reside in the kasan_alloc_meta and
>> kasan_free_meta structures in the allocated memory chunks.
>>
>> IRQ stack traces are cut below the IRQ entry point to avoid unnecessary
>> duplication.
>>
>> Right now stackdepot support is only enabled in the SLAB allocator.
>> Once KASAN features in SLAB are on par with those in SLUB we can switch
>> SLUB to stackdepot as well, thus removing the dependency on SLUB stack
>> bookkeeping, which wastes a lot of memory.
>>
>> This patch is based on the "mm: kasan: stack depots" patch originally
>> prepared by Dmitry Chernenkov.
>>
>> Signed-off-by: Alexander Potapenko <gli...@google.com>
>> ---
>> v2: - per request from Joonsoo Kim, moved the stackdepot implementation
>>       to lib/, as there's a plan to use it for page owner
>>     - added copyright comments
>>     - added comments about smp_load_acquire()/smp_store_release()
>>
>> v3: - minor description changes
>> ---
>
>
>
>> diff --git a/lib/Makefile b/lib/Makefile
>> index a7c26a4..10a4ae3 100644
>> --- a/lib/Makefile
>> +++ b/lib/Makefile
>> @@ -167,6 +167,13 @@ obj-$(CONFIG_SG_SPLIT) += sg_split.o
>>  obj-$(CONFIG_STMP_DEVICE) += stmp_device.o
>>  obj-$(CONFIG_IRQ_POLL) += irq_poll.o
>>
>> +ifeq ($(CONFIG_KASAN),y)
>> +ifeq ($(CONFIG_SLAB),y)
>
> Just try to imagine that another subsystem wants to use stackdepot. How is
> that going to look?
>
> We have Kconfig to describe dependencies, so this should be under
> CONFIG_STACKDEPOT. Then any user of this feature can just do
> 'select STACKDEPOT' in Kconfig.
Agreed. Will fix this in the updated patch.
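
Roughly like this, I think (a sketch only; exact naming and placement
to be confirmed in the updated patch):

  # lib/Kconfig
  config STACKDEPOT
  	bool
  	select STACKTRACE

  # lib/Makefile
  obj-$(CONFIG_STACKDEPOT) += stackdepot.o

  # lib/Kconfig.kasan
  config KASAN
  	...
  	select STACKDEPOT if SLAB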

>> +     obj-y   += stackdepot.o
>> +     KASAN_SANITIZE_slub.o := n
>> +endif
>> +endif
>> +
>>  libfdt_files = fdt.o fdt_ro.o fdt_wip.o fdt_rw.o fdt_sw.o fdt_strerror.o \
>>              fdt_empty_tree.o
>>  $(foreach file, $(libfdt_files), \
>> diff --git a/lib/stackdepot.c b/lib/stackdepot.c
>> new file mode 100644
>> index 0000000..f09b0da
>> --- /dev/null
>> +++ b/lib/stackdepot.c
>
>
>> +/* Allocation of a new stack in raw storage */
>> +static struct stack_record *depot_alloc_stack(unsigned long *entries, int size,
>> +             u32 hash, void **prealloc, gfp_t alloc_flags)
>> +{
>
>
>> +
>> +     stack->hash = hash;
>> +     stack->size = size;
>> +     stack->handle.slabindex = depot_index;
>> +     stack->handle.offset = depot_offset >> STACK_ALLOC_ALIGN;
>> +     __memcpy(stack->entries, entries, size * sizeof(unsigned long));
>
> s/__memcpy/memcpy
Ack.
>> +     depot_offset += required_size;
>> +
>> +     return stack;
>> +}
>> +
>
>
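(For context: the two handle assignments above pack everything needed
to locate the stack again into a single 32-bit value, roughly along
these lines - the field names and widths are illustrative, not the
exact patch:

	union {
		depot_stack_handle handle;
		struct {
			u32 slabindex : STACK_ALLOC_INDEX_BITS;
			u32 offset : STACK_ALLOC_OFFSET_BITS;
		};
	};

A zero handle can then double as "no stack recorded".)
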
>> +/*
>> + * depot_save_stack - save stack in a stack depot.
>> + * @trace - the stacktrace to save.
>> + * @alloc_flags - flags for allocating additional memory if required.
>> + *
>> + * Returns the handle of the stack struct stored in depot.
>> + */
>> +depot_stack_handle depot_save_stack(struct stack_trace *trace,
>> +                                 gfp_t alloc_flags)
>> +{
>> +     u32 hash;
>> +     depot_stack_handle retval = 0;
>> +     struct stack_record *found = NULL, **bucket;
>> +     unsigned long flags;
>> +     struct page *page = NULL;
>> +     void *prealloc = NULL;
>> +
>> +     if (unlikely(trace->nr_entries == 0))
>> +             goto exit;
>> +
>> +     hash = hash_stack(trace->entries, trace->nr_entries);
>> +     /* Bad luck, we won't store this stack. */
>> +     if (hash == 0)
>> +             goto exit;
>> +
>> +     bucket = &stack_table[hash & STACK_HASH_MASK];
>> +
>> +     /* Fast path: look the stack trace up without locking.
>> +      *
>> +      * The smp_load_acquire() here pairs with smp_store_release() to
>> +      * |bucket| below.
>> +      */
>> +     found = find_stack(smp_load_acquire(bucket), trace->entries,
>> +                        trace->nr_entries, hash);
>> +     if (found)
>> +             goto exit;
>> +
>> +     /* Check if the current or the next stack slab need to be initialized.
>> +      * If so, allocate the memory - we won't be able to do that under the
>> +      * lock.
>> +      *
>> +      * The smp_load_acquire() here pairs with smp_store_release() to
>> +      * |next_slab_inited| in depot_alloc_stack() and init_stack_slab().
>> +      */
>> +     if (unlikely(!smp_load_acquire(&next_slab_inited))) {
>> +             if (!preempt_count() && !in_irq()) {
>
> If you're trying to detect atomic context here, this doesn't work. E.g. you
> can't know about held spinlocks in a non-preemptible kernel.
> And I'm not sure why we need this. You know the gfp flags here, so
> allocation in atomic context shouldn't be a problem.
Yeah, we can just remove these checks. As discussed before, this will
eliminate allocations from kfree(), but that's very unlikely to become
a problem.
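
Something along these lines should do (a sketch on my side;
gfpflags_allow_blocking() is the existing helper from <linux/gfp.h>):

	/*
	 * Rely on the gfp flags passed in by the caller instead of
	 * guessing at the context via preempt_count()/in_irq().
	 */
	if (unlikely(!smp_load_acquire(&next_slab_inited)) &&
	    gfpflags_allow_blocking(alloc_flags)) {
		page = alloc_pages(alloc_flags, STACK_ALLOC_ORDER);
		if (page)
			prealloc = page_address(page);
	}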

>
>> +                     alloc_flags &= (__GFP_RECLAIM | __GFP_IO | __GFP_FS |
>> +                             __GFP_NOWARN | __GFP_NORETRY |
>> +                             __GFP_NOMEMALLOC | __GFP_DIRECT_RECLAIM);
>
> I think blacklist approach would be better here.
Perhaps we don't need to change the mask at all.
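
If we keep a mask at all, a blacklist variant could look like this
(a sketch; exactly which flags to strip is up for discussion):

	/* Strip only what's meaningless or harmful for this allocation. */
	alloc_flags &= ~GFP_ZONEMASK;	/* zone modifiers don't apply here */
	alloc_flags |= __GFP_NOWARN;	/* a failed depot allocation isn't fatal */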

>> +                     page = alloc_pages(alloc_flags, STACK_ALLOC_ORDER);
>
> STACK_ALLOC_ORDER = 4 - that's a lot. Do you really need that much?
Well, this is not "that" much, actually. The allocation happens only
~150 times within three hours under Trinity, which amounts to only
~9 megabytes.
At around 250 allocations the stack depot saturates and new stacks
become very rare.
We can probably drop the order to 3 or 2, which will increase the
number of allocations by just a factor of 2 to 4, but will be better
from the standpoint of page fragmentation; see the arithmetic below.
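
For reference, the arithmetic behind those figures (assuming 4 KiB
pages):

  order 4:  2^4 pages * 4 KiB = 64 KiB per stack slab
  ~150 slabs * 64 KiB        ~= 9.4 MiB over the whole run
  order 2:  2^2 pages * 4 KiB = 16 KiB per slab, i.e. up to 4x as many
            allocations for the same capacity, but each one much easier
            for the page allocator to satisfy.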
>
>> diff --git a/mm/kasan/Makefile b/mm/kasan/Makefile
>> index a61460d..32bd73a 100644
>> --- a/mm/kasan/Makefile
>> +++ b/mm/kasan/Makefile
>> @@ -7,3 +7,4 @@ CFLAGS_REMOVE_kasan.o = -pg
>>  CFLAGS_kasan.o := $(call cc-option, -fno-conserve-stack -fno-stack-protector)
>>
>>  obj-y := kasan.o report.o kasan_init.o
>> +
>
> Extra newline.
Ack.
>
>> diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
>> index 7b9e4ab9..b4e5942 100644
>> --- a/mm/kasan/kasan.h
>> +++ b/mm/kasan/kasan.h
>> @@ -2,6 +2,7 @@
>>  #define __MM_KASAN_KASAN_H
>>
>>  #include <linux/kasan.h>
>> +#include <linux/stackdepot.h>
>>
>>  #define KASAN_SHADOW_SCALE_SIZE (1UL << KASAN_SHADOW_SCALE_SHIFT)
>>  #define KASAN_SHADOW_MASK       (KASAN_SHADOW_SCALE_SIZE - 1)
>> @@ -64,10 +65,13 @@ enum kasan_state {
>>       KASAN_STATE_FREE
>>  };
>>
>> +#define KASAN_STACK_DEPTH 64
>
> I think you can reduce this (32 perhaps?). Kernel stacks are usually not
> that deep.
>
>> +
>>  struct kasan_track {
>>       u64 cpu : 6;                    /* for NR_CPUS = 64 */
>>       u64 pid : 16;                   /* 65536 processes */
>>       u64 when : 42;                  /* ~140 years */
>> +     depot_stack_handle stack : sizeof(depot_stack_handle);
>>  };
>>
>
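
For scale (assuming 8-byte unsigned longs, i.e. a 64-bit kernel), the
per-record entry storage works out to:

  depth 64: 64 entries * 8 bytes = 512 bytes per unique stack
  depth 32: 32 entries * 8 bytes = 256 bytes per unique stack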



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Matthew Scott Sucherman, Paul Terence Manicle
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
