On Thu, Feb 28, 2019 at 10:22 AM Andrey Ryabinin <aryabi...@virtuozzo.com> wrote:
>
> On 2/27/19 4:11 PM, Christophe Leroy wrote:
> >
> > On 27/02/2019 at 10:19, Andrey Ryabinin wrote:
> >>
> >> On 2/27/19 11:25 AM, Christophe Leroy wrote:
> >>> With version v8 of the series implementing KASAN on 32-bit powerpc
> >>> (https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=94309),
> >>> I'm now able to activate KASAN on a mac99 in QEMU.
> >>>
> >>> Then I get the following reports at startup. Which of the two reports I
> >>> get seems to depend on the option used to build the kernel, but for a
> >>> given kernel I always get the same report.
> >>>
> >>> Is that a real bug, in which case how could I spot it? Or is it
> >>> something wrong in my implementation of KASAN?
> >>>
> >>> I checked that after kasan_init(), the entire shadow memory is full of 0
> >>> only.
> >>>
> >>> I also tried with the strong STACK_PROTECTOR compiled in, but no
> >>> difference and nothing detected by the stack protector.
> >>>
> >>> ==================================================================
> >>> BUG: KASAN: stack-out-of-bounds in memchr+0x24/0x74
> >>> Read of size 1 at addr c0ecdd40 by task swapper/0
> >>>
> >>> CPU: 0 PID: 0 Comm: swapper Not tainted 5.0.0-rc7+ #1133
> >>> Call Trace:
> >>> [c0e9dca0] [c01c42a0] print_address_description+0x64/0x2bc (unreliable)
> >>> [c0e9dcd0] [c01c4684] kasan_report+0xfc/0x180
> >>> [c0e9dd10] [c089579c] memchr+0x24/0x74
> >>> [c0e9dd30] [c00a9e38] msg_print_text+0x124/0x574
> >>> [c0e9dde0] [c00ab710] console_unlock+0x114/0x4f8
> >>> [c0e9de40] [c00adc60] vprintk_emit+0x188/0x1c4
> >>> --- interrupt: c0e9df00 at 0x400f330
> >>>     LR = init_stack+0x1f00/0x2000
> >>> [c0e9de80] [c00ae3c4] printk+0xa8/0xcc (unreliable)
> >>> [c0e9df20] [c0c28e44] early_irq_init+0x38/0x108
> >>> [c0e9df50] [c0c16434] start_kernel+0x310/0x488
> >>> [c0e9dff0] [00003484] 0x3484
> >>>
> >>> The buggy address belongs to the variable:
> >>>  __log_buf+0xec0/0x4020
> >>> The buggy address belongs to the page:
> >>> page:c6eac9a0 count:1 mapcount:0 mapping:00000000 index:0x0
> >>> flags: 0x1000(reserved)
> >>> raw: 00001000 c6eac9a4 c6eac9a4 00000000 00000000 00000000 ffffffff
> >>> 00000001
> >>> page dumped because: kasan: bad access detected
> >>>
> >>> Memory state around the buggy address:
> >>>  c0ecdc00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>>  c0ecdc80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>> >c0ecdd00: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00 00 00
> >>>                                    ^
> >>>  c0ecdd80: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
> >>>  c0ecde00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>> ==================================================================
> >>>
> >>
> >> This one doesn't look good. Notice that it says stack-out-of-bounds, but
> >> at the same time there is
> >>         "The buggy address belongs to the variable: __log_buf+0xec0/0x4020"
> >> which is printed by the following code:
> >>         if (kernel_or_module_addr(addr) && !init_task_stack_addr(addr)) {
> >>                 pr_err("The buggy address belongs to the variable:\n");
> >>                 pr_err(" %pS\n", addr);
> >>         }
> >>
> >> So a stack-unrelated address got stack-related poisoning. This could be
> >> a stack overflow, did you increase THREAD_SHIFT?
> >> KASAN with stack instrumentation significantly increases stack usage.
> >>
> >
> > I get the above with THREAD_SHIFT set to 13 (default value).
> > If increasing it to 14, I get the following instead. That means that in
> > that case the problem arises a lot earlier in the boot process (but still
> > after the final kasan shadow setup).
> >
>
> We usually use 15 (with 4k pages), but I think 14 should be enough for a
> clean boot.
>
> > ==================================================================
> > BUG: KASAN: stack-out-of-bounds in pmac_nvram_init+0x1f8/0x5d0
> > Read of size 1 at addr f6f37de0 by task swapper/0
> >
> > CPU: 0 PID: 0 Comm: swapper Not tainted 5.0.0-rc7+ #1143
> > Call Trace:
> > [c0e9fd60] [c01c43c0] print_address_description+0x164/0x2bc (unreliable)
> > [c0e9fd90] [c01c46a4] kasan_report+0xfc/0x180
> > [c0e9fdd0] [c0c226d4] pmac_nvram_init+0x1f8/0x5d0
> > [c0e9fef0] [c0c1f73c] pmac_setup_arch+0x298/0x314
> > [c0e9ff20] [c0c1ac40] setup_arch+0x250/0x268
> > [c0e9ff50] [c0c151dc] start_kernel+0xb8/0x488
> > [c0e9fff0] [00003484] 0x3484
> >
> >
> > Memory state around the buggy address:
> >  f6f37c80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >  f6f37d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > >f6f37d80: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
> >                                                ^
> >  f6f37e00: 00 00 01 f4 f2 f2 f2 f2 00 00 00 00 f2 f2 f2 f2
> >  f6f37e80: 00 00 00 00 f3 f3 f3 f3 00 00 00 00 00 00 00 00
> > ==================================================================
>
> Powerpc's show_stack() prints stack addresses, so we know that the stack is
> somewhere near the 0xc0e9f... address.
> f6f37de0 is definitely not a stack address and it's too far away for a stack
> overflow.
> So it looks like the shadow for the stack - kasan_mem_to_shadow(0xc0e9f...) - and
> the shadow for the address in the report - kasan_mem_to_shadow(0xf6f37de0) -
> point to the same physical page.
Shouldn't the shadow start at 0xf8 for powerpc32? I did some math yesterday which I think led me to 0xf8. That allows covering at most 1GB of memory. Do you have more by any chance?
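
To make the math concrete, here is a minimal sketch in C. It assumes generic KASAN's 1:8 shadow scale, PAGE_OFFSET at 0xc0000000, and that "0xf8" means a shadow start of 0xf8000000. The macro and function names are made up for illustration and are not the ones used in the patch series, so treat it as a back-of-the-envelope check rather than the actual implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Generic KASAN maps 8 bytes of memory to 1 shadow byte. */
#define KASAN_SHADOW_SCALE_SHIFT 3

/* Assumed values for illustration only, not taken from the patch series. */
#define PAGE_OFFSET_ASSUMED  0xc0000000u /* start of the kernel mapping */
#define SHADOW_START_ASSUMED 0xf8000000u /* the "0xf8" guessed above */

/* Offset chosen so that PAGE_OFFSET_ASSUMED maps to SHADOW_START_ASSUMED. */
static uint32_t shadow_offset(void)
{
	return SHADOW_START_ASSUMED -
	       (PAGE_OFFSET_ASSUMED >> KASAN_SHADOW_SCALE_SHIFT);
}

/* Same arithmetic as kasan_mem_to_shadow(): (addr >> 3) + offset. */
static uint32_t mem_to_shadow(uint32_t addr)
{
	/* This sketch only covers addresses at or above PAGE_OFFSET_ASSUMED. */
	assert(addr >= PAGE_OFFSET_ASSUMED);
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + shadow_offset();
}

/* Bytes of memory that shadow spanning [shadow_start, 2^32) can cover. */
static uint64_t covered_bytes(uint32_t shadow_start)
{
	uint64_t shadow_bytes = (1ull << 32) - shadow_start;

	return shadow_bytes << KASAN_SHADOW_SCALE_SHIFT;
}
```

Under these assumptions the offset comes out as 0xe0000000, an example stack address such as 0xc0e9f000 (picked for illustration) maps to shadow 0xf81d3e00, and the 128MB from 0xf8000000 to the top of the address space cover exactly 1GB.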