> On Nov 21, 2017, at 2:09 AM, Dmitry Vyukov <dvyu...@google.com> wrote:
>
>> On Mon, Nov 20, 2017 at 10:44 PM, Andy Lutomirski <l...@kernel.org> wrote:
>>> On Mon, Nov 20, 2017 at 9:07 AM, Andy Lutomirski <l...@kernel.org> wrote:
>>> This sets up stack switching, including for SYSCALL. I think it's
>>> in decent shape.
>>>
>>> Known issues:
>>> - KASAN is likely to be busted. This could be fixed either by teaching
>>>   KASAN that cpu_entry_area contains valid stacks (I have no clue how
>>>   to go about doing this) or by rigging up the IST entry code to switch
>>>   RSP to point to the direct-mapped copy of the stacks before calling
>>>   into non-KASAN-excluded C code.
>>>
>>
>> I tried to fix the KASAN issue, and I'm doing something wrong. I'm
>> building this tree:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/entry_stack&id=8319677bd04a1ab291ca71fe1da7aa023306e4a9
>>
>> for 64 bits with KASAN on. The relevant commit is:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/entry_stack&id=a4bdb48c3469708b6b51e5ab90d27bf0c859000c
>>
>> If I run tools/testing/selftests/single_step_syscall_32, then the
>> kernel goes into lala land and infinite loops. The root cause seems
>> to be that we're hitting do_debug with RSP pointing into the fixmap,
>> specifically in the cpu_entry_area's exception stack, with a value of
>> roughly 0xffffffffff1bd108. The KASAN instrumentation in do_debug is
>> then getting a page fault. I think my KASAN setup code should be
>> populating the KASAN data there and, indeed, gdb seems to be able to
>> access the faulting address. So I'm confused.
>
>
> Hi,
>
> I don't have any great insights.
>
> You have stack instrumentation turned on, right? And the fault happens
> on stack instrumentation?
> Stack instrumentation is turned on with gcc7+ I think. And as a
> result the compiler adds redzones on the stack and poisons/unpoisons
> shadow for them in the function prologue/epilogue.

I found the problem. I goofed in the setup code, so I ended up with a
read-only zero page in the shadow. Turns out that gdb can happily write
to read-only memory :(

>
> The fact that KASAN instrumentation faults, but gdb can access it
> sounds strange. KASAN instrumentation is no magic, it just does a
> normal memory load. Please check the exact faulting address. KASAN can
> do accesses with a large offset from RSP.
>
> Does the fault happen before/after kasan_early_init? Before that there
> is a different bootstrap shadow mapped by kasan_map_early_shadow.
>
> Does the fault happen on read access or write access? Stack
> instrumentation does write into shadow, but some parts of shadow are
> mapped with a single read-only page. Can gdb write to that address?
>
> Is it possible that the stack has overflowed? I see that we increase
> EXCEPTION_STACK_ORDER by order 1 under KASAN (from 4k page to 8k
> pages), but it may not be enough. Normal stacks are increased from 16k
> to 32k.
>
> Last stupid question: why is it -1 here:
> FIX_CPU_ENTRY_AREA_BOTTOM = FIX_CPU_ENTRY_AREA_TOP +
> (CPU_ENTRY_AREA_PAGES * NR_CPUS) - 1,
> ?
> Say CPU_ENTRY_AREA_PAGES=1 (we need only 1 page) and NR_CPUS=1, then
> the increment will be 0, which looks wrong for any case (must be at
> least 1, right?).
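
On the -1: one way to read it, assuming FIX_CPU_ENTRY_AREA_TOP and
FIX_CPU_ENTRY_AREA_BOTTOM both name slots inside the range (i.e. the
range is inclusive at both ends), is that TOP itself already accounts
for one page, so a range of N slots ends at TOP + N - 1. A minimal
standalone sketch of that arithmetic follows. The enum name, the
placeholder entries around the range, slots_in_range() and the values
1 and 1 are made up for illustration and are not the kernel's actual
definitions:

```c
#include <assert.h>	/* provides the C11 static_assert macro */

/* Illustrative values only, taken from the example in the question. */
#define CPU_ENTRY_AREA_PAGES	1
#define NR_CPUS			1

/* Hypothetical stand-in for the fixmap enum; not the real kernel enum. */
enum fixed_addresses_sketch {
	FIX_SOMETHING_BEFORE,		/* placeholder for earlier slots */
	FIX_CPU_ENTRY_AREA_TOP,		/* first slot, already one page */
	FIX_CPU_ENTRY_AREA_BOTTOM = FIX_CPU_ENTRY_AREA_TOP +
				    (CPU_ENTRY_AREA_PAGES * NR_CPUS) - 1,
	FIX_NEXT_SLOT,			/* whatever follows the range */
};

/* Number of slots in an inclusive [top, bottom] index range. */
static inline int slots_in_range(int top, int bottom)
{
	return bottom - top + 1;
}

/* With the -1, the inclusive range covers exactly PAGES * NR_CPUS slots. */
static_assert(FIX_CPU_ENTRY_AREA_BOTTOM - FIX_CPU_ENTRY_AREA_TOP + 1 ==
	      CPU_ENTRY_AREA_PAGES * NR_CPUS,
	      "inclusive range must cover PAGES * NR_CPUS slots");
```

Under that reading, an increment of 0 for one page on one CPU just
means BOTTOM == TOP, which is a range of exactly one slot. Dropping the
-1 would reserve one slot more than needed.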