On Mon, Feb 29, 2016 at 5:29 PM, Andrey Ryabinin <ryabinin....@gmail.com> wrote:
>
>
> On 02/26/2016 07:48 PM, Alexander Potapenko wrote:
>> Stack depot will allow KASAN to store allocation/deallocation stack traces
>> for memory chunks. The stack traces are stored in a hash table and
>> referenced by handles which reside in the kasan_alloc_meta and
>> kasan_free_meta structures in the allocated memory chunks.
>>
>> IRQ stack traces are cut below the IRQ entry point to avoid unnecessary
>> duplication.
>>
>> Right now stackdepot support is only enabled in the SLAB allocator.
>> Once KASAN features in SLAB are on par with those in SLUB we can switch
>> SLUB to stackdepot as well, thus removing the dependency on SLUB stack
>> bookkeeping, which wastes a lot of memory.
>>
>> This patch is based on the "mm: kasan: stack depots" patch originally
>> prepared by Dmitry Chernenkov.
>>
>> Signed-off-by: Alexander Potapenko <gli...@google.com>
>> ---
>> v2: - per request from Joonsoo Kim, moved the stackdepot implementation to
>>       lib/, as there's a plan to use it for page owner
>>     - added copyright comments
>>     - added comments about smp_load_acquire()/smp_store_release()
>>
>> v3: - minor description changes
>> ---
>
>
>
>> diff --git a/lib/Makefile b/lib/Makefile
>> index a7c26a4..10a4ae3 100644
>> --- a/lib/Makefile
>> +++ b/lib/Makefile
>> @@ -167,6 +167,13 @@ obj-$(CONFIG_SG_SPLIT) += sg_split.o
>>  obj-$(CONFIG_STMP_DEVICE) += stmp_device.o
>>  obj-$(CONFIG_IRQ_POLL) += irq_poll.o
>>
>> +ifeq ($(CONFIG_KASAN),y)
>> +ifeq ($(CONFIG_SLAB),y)
>
> Just try to imagine that another subsystem wants to use stackdepot. How is
> this going to look?
>
> We have Kconfig to describe dependencies. So, this should be under
> CONFIG_STACKDEPOT.
> So any user of this feature can just do 'select STACKDEPOT' in Kconfig.

Agreed. Will fix this in the updated patch.
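Roughly something like the following (just a sketch; the exact option name,
the "if SLAB" condition and the file placement are not final):

  # lib/Kconfig
  config STACKDEPOT
          bool
          select STACKTRACE

  # lib/Makefile
  obj-$(CONFIG_STACKDEPOT) += stackdepot.o
  # don't instrument the depot itself
  KASAN_SANITIZE_stackdepot.o := n

  # lib/Kconfig.kasan, under "config KASAN"
          select STACKDEPOT if SLAB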
>> +     obj-y += stackdepot.o
>> +     KASAN_SANITIZE_slub.o := n
>> +endif
>> +endif
>> +
>>  libfdt_files = fdt.o fdt_ro.o fdt_wip.o fdt_rw.o fdt_sw.o fdt_strerror.o \
>>        fdt_empty_tree.o
>>  $(foreach file, $(libfdt_files), \
>> diff --git a/lib/stackdepot.c b/lib/stackdepot.c
>> new file mode 100644
>> index 0000000..f09b0da
>> --- /dev/null
>> +++ b/lib/stackdepot.c
>
>
>> +/* Allocation of a new stack in raw storage */
>> +static struct stack_record *depot_alloc_stack(unsigned long *entries, int size,
>> +             u32 hash, void **prealloc, gfp_t alloc_flags)
>> +{
>
>
>> +
>> +     stack->hash = hash;
>> +     stack->size = size;
>> +     stack->handle.slabindex = depot_index;
>> +     stack->handle.offset = depot_offset >> STACK_ALLOC_ALIGN;
>> +     __memcpy(stack->entries, entries, size * sizeof(unsigned long));
>
> s/__memcpy/memcpy

Ack.

>> +     depot_offset += required_size;
>> +
>> +     return stack;
>> +}
>> +
>
>
>> +/*
>> + * depot_save_stack - save stack in a stack depot.
>> + * @trace - the stacktrace to save.
>> + * @alloc_flags - flags for allocating additional memory if required.
>> + *
>> + * Returns the handle of the stack struct stored in depot.
>> + */
>> +depot_stack_handle depot_save_stack(struct stack_trace *trace,
>> +                                 gfp_t alloc_flags)
>> +{
>> +     u32 hash;
>> +     depot_stack_handle retval = 0;
>> +     struct stack_record *found = NULL, **bucket;
>> +     unsigned long flags;
>> +     struct page *page = NULL;
>> +     void *prealloc = NULL;
>> +
>> +     if (unlikely(trace->nr_entries == 0))
>> +             goto exit;
>> +
>> +     hash = hash_stack(trace->entries, trace->nr_entries);
>> +     /* Bad luck, we won't store this stack. */
>> +     if (hash == 0)
>> +             goto exit;
>> +
>> +     bucket = &stack_table[hash & STACK_HASH_MASK];
>> +
>> +     /* Fast path: look the stack trace up without locking.
>> +      *
>> +      * The smp_load_acquire() here pairs with smp_store_release() to
>> +      * |bucket| below.
>> +      */
>> +     found = find_stack(smp_load_acquire(bucket), trace->entries,
>> +                        trace->nr_entries, hash);
>> +     if (found)
>> +             goto exit;
>> +
>> +     /* Check if the current or the next stack slab need to be initialized.
>> +      * If so, allocate the memory - we won't be able to do that under the
>> +      * lock.
>> +      *
>> +      * The smp_load_acquire() here pairs with smp_store_release() to
>> +      * |next_slab_inited| in depot_alloc_stack() and init_stack_slab().
>> +      */
>> +     if (unlikely(!smp_load_acquire(&next_slab_inited))) {
>> +             if (!preempt_count() && !in_irq()) {
>
> If you're trying to detect atomic context here, then this doesn't work.
> E.g. you can't know about held spinlocks in a non-preemptible kernel.
> And I'm not sure why we need this. You know the gfp flags here, so an
> allocation in atomic context shouldn't be a problem.

Yeah, we can just remove these checks. As discussed before, this will
eliminate allocations from kfree(), but that's very unlikely to become a
problem.

>
>> +                     alloc_flags &= (__GFP_RECLAIM | __GFP_IO | __GFP_FS |
>> +                             __GFP_NOWARN | __GFP_NORETRY |
>> +                             __GFP_NOMEMALLOC | __GFP_DIRECT_RECLAIM);
>
> I think a blacklist approach would be better here.

Perhaps we don't need to change the mask at all.

>> +             page = alloc_pages(alloc_flags, STACK_ALLOC_ORDER);
>
> STACK_ALLOC_ORDER = 4 - that's a lot. Do you really need that much?

Well, this is not "that" much, actually. The allocation happens only ~150
times within three hours under Trinity, which amounts to only about 9
megabytes. At around 250 allocations the stack depot saturates and new
stacks become very rare.
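For reference, the rough arithmetic behind that number (assuming 4 KiB
pages): one slab at STACK_ALLOC_ORDER = 4 is (1 << 4) * 4 KiB = 64 KiB, so
~150 slab allocations come to roughly 9.4 MiB.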
We can probably drop the order to 3 or 2, which will increase the number of
allocations by a factor of only 2 to 4, but will be better from the
standpoint of page fragmentation.

>
>> diff --git a/mm/kasan/Makefile b/mm/kasan/Makefile
>> index a61460d..32bd73a 100644
>> --- a/mm/kasan/Makefile
>> +++ b/mm/kasan/Makefile
>> @@ -7,3 +7,4 @@ CFLAGS_REMOVE_kasan.o = -pg
>>  CFLAGS_kasan.o := $(call cc-option, -fno-conserve-stack -fno-stack-protector)
>>
>>  obj-y := kasan.o report.o kasan_init.o
>> +
>
> Extra newline.

Ack.

>
>> diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
>> index 7b9e4ab9..b4e5942 100644
>> --- a/mm/kasan/kasan.h
>> +++ b/mm/kasan/kasan.h
>> @@ -2,6 +2,7 @@
>>  #define __MM_KASAN_KASAN_H
>>
>>  #include <linux/kasan.h>
>> +#include <linux/stackdepot.h>
>>
>>  #define KASAN_SHADOW_SCALE_SIZE (1UL << KASAN_SHADOW_SCALE_SHIFT)
>>  #define KASAN_SHADOW_MASK       (KASAN_SHADOW_SCALE_SIZE - 1)
>> @@ -64,10 +65,13 @@ enum kasan_state {
>>       KASAN_STATE_FREE
>>  };
>>
>> +#define KASAN_STACK_DEPTH 64
>
> I think you can reduce this (32, perhaps?). Kernel stacks are usually not
> that deep.
>
>> +
>>  struct kasan_track {
>>       u64 cpu : 6;                    /* for NR_CPUS = 64 */
>>       u64 pid : 16;                   /* 65536 processes */
>>       u64 when : 42;                  /* ~140 years */
>> +     depot_stack_handle stack : sizeof(depot_stack_handle);
>>  };
>>

--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Matthew Scott Sucherman, Paul Terence Manicle
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg