* Andy Lutomirski <l...@amacapital.net> wrote:

> Hi all-
>
> On x86_64, we use IST for #BP and #DB. On x86_32, we don't.
>
> We started using IST for #BP in:
>
> b556b35e98ad [PATCH] x86_64: Move int 3 handler to debug stack and
> allow to increase it.
>
> and we started using IST for #DB even earlier in:
>
> 7abe2c67299e [PATCH] x86-64 merge for 2.6.4
>
> This has some unpleasant side effects these days. Primarily, it
> requires a bunch of ugly code to avoid recursive use of the debug
> stack when, say, an NMI interrupts do_int3 or do_debug and either hits
> a kprobe int3 or a #DB if it inadvertently touches a userspace
> watchpoint. See TRACE_IRQS_OFF_DEBUG for another bit wart in that
> code.
>
> Here are all of the reasons I can come up with for using IST:
>
> 1. SYSENTER with TF set will immediately (or after one instruction --
> I'm not quite sure) cause #DB. This is easy to handle -- we can just
> set up a sysenter stack just like x86_32.
>
> 2. #DB needs paranoid gsbase handling (due to SYSENTER if nothing
> else). However, there's no real reason that IST and paranoid gsbase
> handling need to be tied together.
>
> 3. Stack usage. Almost anything can hit a kprobe and any uaccess
> operation can hit a watchpoint. I'm not sure how much of a problem
> this is. If it is a real problem, we could use something more like
> the irqstack mechanism instead of IST.

This might have been an issue back when we still tried to fit things into
8K kernel stacks (4K on 32-bit). These days we have ~15K kernel stacks on
64-bit:

  arch/x86/include/asm/page_64_types.h:#define THREAD_SIZE_ORDER (2 + KASAN_STACK_ORDER)

and we also have irq stacks that dramatically reduce asynchronous stack
nesting effects.

> 4. kgdb. kgdb doesn't appear to respect the kprobe blacklist at
> all, so kgdb would blow up if it tried to breakpoint early or late
> in syscall handling. (Hmm. I bet kgdb also blows up if you use it
> to put a breakpoint early in do_int3.)

Yes, my answer to kernel debuggers is: "Don't do it then, or implement
support for it more cleanly than this hackery."

> Thoughts?
>
> Even if it turns out that we can't get rid of IST for #DB and #BP, I
> bet we could simplify matters by rigging up all of the IST
> entries to switch IST off for #DB and #BP immediately upon entry and
> to leave them off until immediately before returning, thereby
> simplifying the logic quite a bit. I think this would be a pure
> performance win -- the only path here in which performance matters
> is NMI AFAICT, and the NMI code already does that, albeit rather
> deeply buried.

I'd suggest we try to get rid of it and restart with a clean
implementation.

Thanks,

	Ingo
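
For reference, the stack-size arithmetic behind the THREAD_SIZE_ORDER line quoted above can be sketched in userspace C. This is only an illustration, not kernel code: it assumes 4 KiB pages, assumes the stack size is the page size shifted left by THREAD_SIZE_ORDER, and assumes KASAN_STACK_ORDER is 0 without KASAN and 1 with it. All sketch_* names are hypothetical.

```c
#include <assert.h>

/* Assumption: 4 KiB base pages, as is usual on x86_64. */
#define SKETCH_PAGE_SIZE 4096UL

/* Mirrors "#define THREAD_SIZE_ORDER (2 + KASAN_STACK_ORDER)". */
static inline unsigned int sketch_thread_size_order(unsigned int kasan_stack_order)
{
	return 2 + kasan_stack_order;
}

/* Assumed relation: stack size = page size << THREAD_SIZE_ORDER. */
static inline unsigned long sketch_thread_size(unsigned int kasan_stack_order)
{
	return SKETCH_PAGE_SIZE << sketch_thread_size_order(kasan_stack_order);
}

/* Sanity check of the two configurations assumed above. */
static inline void sketch_self_check(void)
{
	assert(sketch_thread_size(0) == 16384UL);	/* order 2: four pages */
	assert(sketch_thread_size(1) == 32768UL);	/* order 3: eight pages */
}
```

Under those assumptions a 64-bit kernel stack is four pages without KASAN, which is the headroom the reply to point 3 relies on.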