From: Nadav Amit <na...@vmware.com>
Date: Fri, May 10, 2019 at 7:45 PM
To: <x...@kernel.org>
Cc: Borislav Petkov, <linux-kernel@vger.kernel.org>, Nadav Amit,
	Andy Lutomirski, Ingo Molnar, Peter Zijlstra, Thomas Gleixner,
	Jann Horn
> It may be useful to check at runtime whether certain assertions are
> violated even during speculative execution. This makes it possible to
> avoid adding unnecessary memory fences, and at the same time to check
> that no data leak channels exist.
>
> For example, adding such checks can show that allocating zeroed pages
> can speculatively return non-zeroed pages (the first qword is not
> zero). [This might be a problem when the page-fault handler performs
> a software page-walk, for example.]
>
> Introduce SPEC_WARN_ON(), which checks at runtime whether a certain
> condition is violated during speculative execution. The condition
> should be computed without branches, e.g., using bitwise operators.
> The check waits for the condition to be realized (i.e., no longer
> speculative), and if the assertion is violated, a warning is emitted.
>
> Warnings can be provided in one of two modes: precise and imprecise.
> Neither mode is perfect. The precise mode does not always make it easy
> to understand which assertion was broken; instead it points to a spot
> in the execution somewhere around the point at which the assertion was
> violated. In addition, it prints a warning for each violation (rather
> than WARN_ONCE()-like behavior).
>
> The imprecise mode, on the other hand, can sometimes give the wrong
> indication, specifically if the control flow has changed between the
> speculative execution and the actual one. Note that this is not a
> false positive; it just means that the output would mislead the user
> into thinking the wrong assertion was broken.
>
> There are some more limitations. Since the mechanism requires an
> indirect branch, it should not be used in production systems that are
> susceptible to Spectre v2. The mechanism requires TSX and performance
> counters that are only available on Skylake and later. There is a
> hidden assumption that TSX is not used in the kernel for anything
> other than this mechanism.

Nice trick! Can you eliminate the indirect call by forcing an access
fault to abort the transaction instead, e.g. with a CMOV that selects a
NULL pointer which is then dereferenced? (If this works, it may also
allow support on older architectures, as the RTM_RETIRED.ABORT*
events go back further, I believe.) A rough, untested sketch of what I
have in mind follows the quoted changelog below.

> The basic idea behind the implementation is to use a performance
> counter that is updated even during speculative execution as an
> indication of assertion failure. By using a conditional move, which is
> not predicted, to affect the control flow, the condition is realized
> before the event that affects the PMU is triggered.
>
> Enable this feature by setting the "spec_warn=on" or
> "spec_warn=precise" kernel parameter. I did not run performance
> numbers, but I guess the overhead should not be too high.
>
> I did not run too many tests, but brief experiments suggest that it
> does work. Let me know if I missed anything and whether you think this
> can be useful. To be frank, the exact use cases are not super clear,
> and there are various possible extensions (e.g., ensuring the
> speculation window is long enough by adding data dependencies). I
> would appreciate your input.
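To flesh that out: instead of CMOV-selecting between nop_func and
xabort_func and making an indirect call, CMOV-select between a valid
scratch address and NULL and load through the result. A true condition
then dereferences NULL, the access fault aborts the transaction, and
there is no indirect branch to worry about. Untested sketch -- the
spec_check_scratch and spec_check_fault names are made up:

	/*
	 * Branchless transaction body with no indirect call: rax starts
	 * as NULL, and CMOV replaces it with a valid scratch address
	 * only if cond == 0.  The load through rax then faults, and
	 * thus aborts the transaction, iff cond != 0.
	 */
	static unsigned long spec_check_scratch;

	static __always_inline void spec_check_fault(unsigned long cond)
	{
		unsigned long *ptr = &spec_check_scratch;

		asm volatile ("xbegin 1f\n\t"
			      /* Start with the faulting (NULL) pointer. */
			      "xorl %%eax, %%eax\n\t"
			      "testq %[cond], %[cond]\n\t"
			      /* Keep the valid pointer only if cond == 0. */
			      "cmovzq %[ptr], %%rax\n\t"
			      /* Faults, and thus aborts, iff cond != 0. */
			      "movq (%%rax), %%rax\n\t"
			      "xend\n\t"
			      "1:"
			      : : [cond] "r" (cond), [ptr] "r" (ptr)
			      : "memory", "rax");
	}

The abort would then presumably be counted with one of the
RTM_RETIRED.ABORTED* events rather than the raw event used below, but
I have not verified how far back those count speculatively.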
> Cc: Andy Lutomirski <l...@kernel.org>
> Cc: Ingo Molnar <mi...@kernel.org>
> Cc: Peter Zijlstra <pet...@infradead.org>
> Cc: Thomas Gleixner <t...@linutronix.de>
> Cc: Jann Horn <ja...@google.com>
> Signed-off-by: Nadav Amit <na...@vmware.com>
> ---
>  arch/x86/Kconfig                     |   4 +
>  arch/x86/include/asm/nospec-branch.h |  30 +++++
>  arch/x86/kernel/Makefile             |   1 +
>  arch/x86/kernel/nospec.c             | 185 +++++++++++++++++++++++++++
>  4 files changed, 220 insertions(+)
>  create mode 100644 arch/x86/kernel/nospec.c
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 62fc3fda1a05..2cc57c2172be 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -2887,6 +2887,10 @@ config X86_DMA_REMAP
>  config HAVE_GENERIC_GUP
>  	def_bool y
>
> +config DEBUG_SPECULATIVE_EXECUTION
> +	bool "Debug speculative execution"
> +	depends on X86_64
> +
>  source "drivers/firmware/Kconfig"
>
>  source "arch/x86/kvm/Kconfig"
> diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> index dad12b767ba0..3f1af6378304 100644
> --- a/arch/x86/include/asm/nospec-branch.h
> +++ b/arch/x86/include/asm/nospec-branch.h
> @@ -290,6 +290,36 @@ static inline void indirect_branch_prediction_barrier(void)
>  /* The Intel SPEC CTRL MSR base value cache */
>  extern u64 x86_spec_ctrl_base;
>
> +#ifdef CONFIG_DEBUG_SPECULATIVE_EXECUTION
> +
> +extern bool spec_check(unsigned long cond);
> +
> +DECLARE_STATIC_KEY_FALSE(spec_test_key);
> +DECLARE_STATIC_KEY_FALSE(spec_test_precise_key);
> +
> +#define SPEC_WARN_ON(cond)						\
> +do {									\
> +	bool _error;							\
> +									\
> +	if (!static_branch_unlikely(&spec_test_key))			\
> +		break;							\
> +									\
> +	_error = spec_check((unsigned long)(cond));			\
> +									\
> +	/* In precise mode the warning comes from the PMU handler. */	\
> +	if (static_branch_unlikely(&spec_test_precise_key))		\
> +		break;							\
> +									\
> +	WARN_ONCE(_error,						\
> +		  "Speculative execution assertion failed: (%s)\n",	\
> +		  __stringify(cond));					\
> +} while (0)
> +
> +#else
> +
> +#define SPEC_WARN_ON(cond) BUILD_BUG_ON_INVALID(cond)
> +
> +#endif
> +
>  /*
>   * With retpoline, we must use IBRS to restrict branch prediction
>   * before calling into firmware.
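On the interface: it might be worth showing an intended call site in
the changelog. If I read the macro right, the zeroed-page example would
look roughly like this (hypothetical code, not from the patch):

	/*
	 * Assert that a freshly allocated zeroed page is not observed
	 * non-zero under speculation.  The condition is a plain load
	 * with no branches, so it cannot be predicted around.
	 */
	static inline void check_page_is_zeroed(struct page *page)
	{
		u64 first_qword = *(u64 *)page_address(page);

		/* Warns if first_qword is non-zero during speculation. */
		SPEC_WARN_ON(first_qword);
	}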
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index 00b7e27bc2b7..63a3a1420f8e 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -139,6 +139,7 @@ obj-$(CONFIG_X86_INTEL_UMIP) += umip.o
>  obj-$(CONFIG_UNWINDER_ORC)		+= unwind_orc.o
>  obj-$(CONFIG_UNWINDER_FRAME_POINTER)	+= unwind_frame.o
>  obj-$(CONFIG_UNWINDER_GUESS)		+= unwind_guess.o
> +obj-y					+= nospec.o
>
>  ###
>  # 64 bit specific files
> diff --git a/arch/x86/kernel/nospec.c b/arch/x86/kernel/nospec.c
> new file mode 100644
> index 000000000000..9e0711d34543
> --- /dev/null
> +++ b/arch/x86/kernel/nospec.c
> @@ -0,0 +1,185 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/init.h>
> +#include <linux/perf_event.h>
> +
> +#include <asm/cmdline.h>
> +
> +#include "../events/perf_event.h"
> +
> +#ifdef CONFIG_DEBUG_SPECULATIVE_EXECUTION
> +
> +static struct perf_event_attr spec_event_attr = {
> +	.type = PERF_TYPE_RAW,
> +	.size = sizeof(struct perf_event_attr),
> +	.config = 0x15d,
> +	.sample_freq = 1,
> +	.pinned = 1,
> +	.freq = 1,
> +	.exclude_user = 1,
> +};
> +
> +static DEFINE_PER_CPU(unsigned int, spec_warn_irq_count);
> +
> +DEFINE_STATIC_KEY_FALSE(spec_test_key);
> +EXPORT_SYMBOL(spec_test_key);
> +
> +DEFINE_STATIC_KEY_FALSE(spec_test_precise_key);
> +EXPORT_SYMBOL(spec_test_precise_key);
> +
> +static void spec_event_handler(struct perf_event *evt,
> +		struct perf_sample_data *data, struct pt_regs *regs)
> +{
> +	unsigned int cnt;
> +
> +	/*
> +	 * We need a barrier to prevent the xabort from being executed
> +	 * speculatively.
> +	 */
> +	indirect_branch_prediction_barrier();
> +
> +	/*
> +	 * Avoid false positives, or indications in the wrong spot, due to
> +	 * IRQs.
> +	 */
> +	cnt = this_cpu_read(irq_count);
> +	this_cpu_write(spec_warn_irq_count, cnt);
> +	if (!static_branch_unlikely(&spec_test_precise_key))
> +		return;
> +
> +	WARN(1, "Speculative execution detected at ip=%lx\n", regs->ip);
> +}
> +
> +static DEFINE_PER_CPU(struct perf_event *, spec_perf_event);
> +
> +void nop_func(void)
> +{
> +}
> +
> +void xabort_func(void)
> +{
> +	asm volatile ("xabort $1");
> +}
> +
> +bool spec_check(unsigned long cond)
> +{
> +	bool error = false;
> +
> +	/*
> +	 * Avoid mistakenly triggering more events inside the NMI handler. We
> +	 * will assume that this condition can be resolved with speculation.
> +	 */
> +	if (in_nmi())
> +		return false;
> +
> +	preempt_disable();
> +
> +	/*
> +	 * Run a transaction that triggers xabort only if the condition is
> +	 * true. The xabort is counted by the PMU and triggers an interrupt.
> +	 * The conditional move prevents speculative evaluation of the
> +	 * condition. The branch will not mistakenly be mispredicted later
> +	 * to run xabort again, since we run IBPB in the NMI handler.
> +	 *
> +	 * The implementation is based on the assumption that cmov is not
> +	 * subject to prediction, which is a common assumption, supported by
> +	 * the Intel SDM: "Use the SETCC and CMOV instructions to eliminate
> +	 * unpredictable conditional branches where possible".
> +	 *
> +	 * This code is susceptible to Spectre v2 attacks.
> +	 */
> +	asm volatile ("xbegin 1f\n\t"
> +		      "movq $nop_func, %%rax\n\t"
> +		      "testq %[cond], %[cond]\n\t"
> +		      "cmovnzq %[xabort_func], %%rax\n\t"
> +		      ANNOTATE_RETPOLINE_SAFE
> +		      "call *%%rax\n\t"
> +		      "xend\n\t"
> +		      "1:"
> +		      : : [cond] "r" ((u64)(cond)), [xabort_func] "r" (xabort_func)
> +		      : "memory", "rax");
> +
> +	/*
> +	 * Prevent wrong alerts due to speculative events that happened in
> +	 * the IRQ handler.
> +	 */
> +	if (likely(__this_cpu_read(spec_warn_irq_count) !=
> +		   __this_cpu_read(irq_count)))
> +		goto out;
> +
> +	/* Reset to an impossible value, so this event is not counted again. */
> +	this_cpu_write(spec_warn_irq_count, -2);
> +	error = true;
> +out:
> +	preempt_enable();
> +	return error;
> +}
> +EXPORT_SYMBOL(spec_check);
> +
> +static int __init nospec_init(void)
> +{
> +	int cpu, len;
> +	char buf[20];
> +	bool precise, enable;
> +
> +	/* Check that TSX and the required PMU support are present. */
> +	if (!boot_cpu_has(X86_FEATURE_RTM) ||
> +	    !boot_cpu_has(X86_FEATURE_SPEC_CTRL)) {
> +		pr_err("Speculative checks are not supported on this CPU\n");
> +		return 0;
> +	}
> +
> +	/*
> +	 * Contrary to what their name indicates, precise warnings are
> +	 * actually imprecise, in the sense that they do not point to the
> +	 * exact code that caused the speculation failure. They are precise
> +	 * in the sense that they prevent scenarios in which the wrong
> +	 * assertion is assumed to have failed. This might happen if a
> +	 * certain assertion fails speculatively, but does not fail during
> +	 * the actual execution.
> +	 */
> +	len = cmdline_find_option(boot_command_line, "spec_warn", buf,
> +				  sizeof(buf));
> +
> +	if (len < 0)
> +		return 0;
> +
> +	precise = (len == 7 && !strncmp(buf, "precise", 7));
> +	enable = (len == 2 && !strncmp(buf, "on", 2)) || precise;
> +
> +	if (!enable) {
> +		pr_err("Invalid spec_warn argument (spec_warn=%s)\n", buf);
> +		return 0;
> +	}
> +
> +	/* Set an impossible value in order not to trigger a warning. */
> +	for_each_online_cpu(cpu)
> +		per_cpu(spec_warn_irq_count, cpu) = (unsigned int)-2;
> +
> +	if (precise)
> +		static_key_enable(&spec_test_precise_key.key);
> +
> +	pr_info("Enabling %s speculative execution checks\n",
> +		precise ? "precise" : "imprecise");
> +
> +	/*
> +	 * For each CPU, set a performance counter to trigger a PMU interrupt
> +	 * whenever an instruction that causes a transaction abort is
> +	 * executed. These instructions are only counted within a
> +	 * transaction.
> +	 */
> +	for_each_online_cpu(cpu) {
> +		struct perf_event *evt =
> +			perf_event_create_kernel_counter(&spec_event_attr, cpu,
> +					NULL, &spec_event_handler, NULL);
> +
> +		if (IS_ERR(evt)) {
> +			pr_err("Failed to enable speculative checks\n");
> +			break;
> +		}
> +
> +		per_cpu(spec_perf_event, cpu) = evt;
> +	}
> +
> +	static_key_enable(&spec_test_key.key);
> +
> +	return 0;
> +}
> +arch_initcall(nospec_init);
> +
> +#endif
> --
> 2.19.1
>
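One more nit: the raw .config = 0x15d deserves a named constant or at
least a comment. If I am reading the Skylake event tables correctly
(please double-check), that is event 0x5D, umask 0x01, i.e.
TX_EXEC.MISC1, which counts abort-causing instructions when they
execute inside a transaction, even if they never retire -- which is
exactly the property this trick depends on. Something like:

	/*
	 * TX_EXEC.MISC1 (event 0x5D, umask 0x01): instructions that may
	 * cause a transactional abort, counted at execution rather than
	 * at retirement, so an aborted xabort still leaves a trace.
	 * (Event name per my reading of the event tables -- please verify.)
	 */
	#define SPEC_WARN_EVENT_TX_EXEC_MISC1	0x15d

	static struct perf_event_attr spec_event_attr = {
		.type = PERF_TYPE_RAW,
		.size = sizeof(struct perf_event_attr),
		.config = SPEC_WARN_EVENT_TX_EXEC_MISC1,
		.sample_freq = 1,
		.pinned = 1,
		.freq = 1,
		.exclude_user = 1,
	};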