This patch introduces the mitigation for Straight Line Speculation past BLR instructions.
This mitigation replaces each BLR instruction with a BL to a stub which simply
consists of a BR to the original register.  These function stubs are followed
by a speculation barrier so that no straight-line speculation can continue
past these indirect jumps.

When optimising for speed we use a set of stubs for each function, since this
should help the branch predictor make more accurate predictions about where a
stub should branch.

When optimising for size we use one set of stubs for the entire compilation
unit.  This set of stubs can have human-readable names, and we are currently
using `__call_indirect_x<N>` for register x<N>.

As an example when optimising for size, a
    BLR x0
instruction would get transformed to
    BL __call_indirect_x0
with __call_indirect_x0 labelling a thunk that contains
__call_indirect_x0:
    BR X0
    <speculation barrier>

Since we add these function stubs to the assembly output all in one chunk, we
need not add the speculation barrier directly after each one.  We know for
certain that the instructions directly after the BR in all but the last
function stub will be from another one of these stubs and hence will not
contain a speculation gadget.  Instead we add a single speculation barrier at
the end of the sequence of stubs.

Special care needs to be taken when this transformation occurs in a context
where BTI is enabled.  A BLR can jump to a `BTI c` target using any register,
while a BR can only jump to a `BTI c` target if it uses the registers x16 or
x17.  Hence we use constraints to limit the registers used when this
transformation is being made in an environment that uses BTI.

This mitigation does not apply for BLR instructions in the following places:
- Some accesses to thread-local variables use a code sequence with a BLR
  instruction.  This code sequence is part of the binary interface between
  compiler and linker.  If this BLR instruction needs to be mitigated, it
  would probably be best to do so in the linker.  It seems unlikely that the
  code sequence for thread-local variable access leads to a Spectre
  revelation gadget.
- PLT stubs are produced by the linker and each contain a BLR instruction.
  A Spectre revelation gadget could at most appear after the last PLT stub.

Testing:
  Bootstrap and regtest on AArch64
  (with BOOT_CFLAGS="-mharden-sls=retbr,blr")
  Used a temporary hack(1) in gcc-dg.exp to use these options on every test
  in the testsuite, a slight modification to emit the speculation barrier
  after every function stub, and a script to check that the output never
  emitted a BLR, or an unmitigated BR or RET instruction (a rough sketch of
  such a check is included below).
  Similar on an aarch64-none-elf cross-compiler.

1) The temporary hack emitted a speculation barrier at the end of every stub
   function, and used a script to ensure that:
   a) Every RET or BR is immediately followed by a speculation barrier.
   b) No BLR instruction is emitted by the compiler.
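For reference, a checking script along those lines can be fairly small.  The
sketch below is *not* the script used for the testing above; it is an
illustrative approximation in Python, and it assumes the temporary hack so
that every stub is followed by a barrier:

#!/usr/bin/env python3
"""Illustrative sketch of the kind of assembly check described above.

Assumes every function stub is followed by a speculation barrier, then
checks that (a) every RET or BR is immediately followed by a speculation
barrier and (b) no BLR instruction is emitted at all."""
import re
import sys

def is_barrier(insns, i):
    # Either the dedicated SB instruction or a DSB SY; ISB pair counts as a
    # speculation barrier.
    if i >= len(insns):
        return False
    if insns[i] == "sb":
        return True
    return (insns[i].startswith("dsb")
            and i + 1 < len(insns)
            and insns[i + 1] == "isb")

def check_file(path):
    # Keep only instruction lines: drop blanks, labels, directives, comments.
    insns = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith((".", "//", "#")) or line.endswith(":"):
                continue
            insns.append(re.sub(r"\s+", " ", line).lower())

    ok = True
    for i, insn in enumerate(insns):
        mnemonic = insn.split(" ", 1)[0]
        if mnemonic == "blr":
            print(f"{path}: unexpected BLR: {insn}")
            ok = False
        elif mnemonic in ("ret", "br") and not is_barrier(insns, i + 1):
            print(f"{path}: {insn} not followed by a speculation barrier")
            ok = False
    return ok

if __name__ == "__main__":
    results = [check_file(p) for p in sys.argv[1:]]
    sys.exit(0 if all(results) else 1)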
gcc/ChangeLog:

2020-06-08  Matthew Malcomson  <matthew.malcom...@arm.com>

	* config/aarch64/aarch64-protos.h (aarch64_indirect_call_asm):
	New declaration.
	* config/aarch64/aarch64.c (aarch64_use_return_insn_p): Return
	false if hardening BLR instructions.
	(aarch64_sls_shared_thunks): Global array to store stub labels.
	(aarch64_sls_create_blr_label): New.
	(print_asm_branch): New macro.
	(aarch64_sls_emit_blr_function_thunks): New.
	(aarch64_sls_emit_shared_blr_thunks): New.
	(aarch64_asm_file_end): New.
	(aarch64_indirect_call_asm): New.
	(TARGET_ASM_FILE_END): Use aarch64_asm_file_end.
	(TARGET_ASM_FUNCTION_EPILOGUE): Use
	aarch64_sls_emit_blr_function_thunks.
	* config/aarch64/aarch64.h (struct machine_function): Introduce
	`call_via` array to store function-local stub labels.
	* config/aarch64/aarch64.md (*call_insn, *call_value_insn): Use
	aarch64_indirect_call_asm to emit code when hardening BLR
	instructions.

gcc/testsuite/ChangeLog:

2020-06-08  Matthew Malcomson  <matthew.malcom...@arm.com>

	* gcc.target/aarch64/sls-mitigation/sls-miti-blr-bti.c: New test.
	* gcc.target/aarch64/sls-mitigation/sls-miti-blr.c: New test.

############### Attachment also inlined for ease of reply ###############

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index d2eb739bc89ecd9d0212416b8dc3ee4ba236a271..e79f9cbc783e75132e999395ff975f9768436419 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -781,6 +781,7 @@ extern const atomic_ool_names aarch64_ool_ldeor_names;
 
 tree aarch64_resolve_overloaded_builtin_general (location_t, tree, void *);
 const char * aarch64_sls_barrier (int);
+const char * aarch64_indirect_call_asm (rtx);
 extern bool aarch64_harden_sls_retbr_p (void);
 extern bool aarch64_harden_sls_blr_p (void);
 
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -862,8 +862,9 @@ typedef struct GTY (()) machine_function
   struct aarch64_frame frame;
   /* One entry for each hard register.  */
   bool reg_is_wrapped_separately[LAST_SAVED_REGNUM];
+  rtx call_via[LAST_SAVED_REGNUM];
   bool label_is_assembled;
 } machine_function;
 #endif
 
 /* Which ABI to use.  */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 9356937fe266c68196392a1589b3cf96607de104..93552acda553e3258ccebdb9b82979b72489ba8e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8075,6 +8075,13 @@ aarch64_expand_prologue (void)
 bool
 aarch64_use_return_insn_p (void)
 {
+  /* Documentation says we should not have the "return" pattern enabled if we
+     wish to use the TARGET_ASM_FUNCTION_EPILOGUE hook.  We wish to use that
+     hook to implement the BLR function stubs, so we always disable this
+     pattern when using those stubs.  */
+  if (aarch64_harden_sls_blr_p ())
+    return false;
+
   if (!reload_completed)
     return false;
 
@@ -22923,6 +22930,180 @@ aarch64_sls_barrier (int mitigation_required)
     : "";
 }
 
+static GTY (()) rtx aarch64_sls_shared_thunks[31];
+static GTY (()) bool aarch64_sls_shared_thunks_needed = false;
+const char *indirect_symbol_names[31] = {
+    "__call_indirect_x0",
+    "__call_indirect_x1",
+    "__call_indirect_x2",
+    "__call_indirect_x3",
+    "__call_indirect_x4",
+    "__call_indirect_x5",
+    "__call_indirect_x6",
+    "__call_indirect_x7",
+    "__call_indirect_x8",
+    "__call_indirect_x9",
+    "__call_indirect_x10",
+    "__call_indirect_x11",
+    "__call_indirect_x12",
+    "__call_indirect_x13",
+    "__call_indirect_x14",
+    "__call_indirect_x15",
+    "__call_indirect_x16",
+    "__call_indirect_x17",
+    "__call_indirect_x18",
+    "__call_indirect_x19",
+    "__call_indirect_x20",
+    "__call_indirect_x21",
+    "__call_indirect_x22",
+    "__call_indirect_x23",
+    "__call_indirect_x24",
+    "__call_indirect_x25",
+    "__call_indirect_x26",
+    "__call_indirect_x27",
+    "__call_indirect_x28",
+    "__call_indirect_x29",
+    "__call_indirect_x30",
+};
+
+/* Function to create a BLR thunk.  This thunk is used to mitigate straight
+   line speculation.  Instead of a simple BLR that can be speculated past,
+   code emits a BL to this thunk, and this thunk emits a BR to the relevant
+   register.  These thunks have the relevant speculation barriers put after
+   their indirect branch so that speculation is blocked.
+
+   We use such a thunk so the speculation barriers are kept off the
+   architecturally executed path in order to reduce the performance overhead.
+
+   When optimising for size we use stubs shared by the entire compilation
+   unit.  When optimising for performance we emit stubs for each function in
+   the hope that the branch predictor can better train on jumps specific for
+   a given function.  */
+rtx
+aarch64_sls_create_blr_label (int regnum)
+{
+  gcc_assert (regnum < 31);
+  if (optimize_function_for_size_p (cfun))
+    {
+      /* For the thunks shared between different functions in this
+	 compilation unit we use a named symbol -- this is just for users to
+	 more easily understand the generated assembly.  */
+      aarch64_sls_shared_thunks_needed = true;
+      if (aarch64_sls_shared_thunks[regnum] == NULL)
+	aarch64_sls_shared_thunks[regnum]
+	  = gen_rtx_SYMBOL_REF (Pmode, indirect_symbol_names[regnum]);
+
+      return aarch64_sls_shared_thunks[regnum];
+    }
+
+  if (cfun->machine->call_via[regnum] == NULL)
+    cfun->machine->call_via[regnum]
+      = gen_rtx_LABEL_REF (Pmode, gen_label_rtx ());
+  return cfun->machine->call_via[regnum];
+}
+
+/* Emit all BLR stubs for this particular function.
+   Here we emit all the BLR stubs needed for the current function.  Since we
+   emit these stubs in a consecutive block we know there will be no
+   speculation gadgets between each stub, and hence we only emit a
+   speculation barrier at the end of the stub sequences.
+
+   This is called in the TARGET_ASM_FUNCTION_EPILOGUE hook.  */
+#define print_asm_branch(regno) asm_fprintf (out_file, "\tbr\tx%d\n", regno)
+void
+aarch64_sls_emit_blr_function_thunks (FILE *out_file)
+{
+  if (! aarch64_harden_sls_blr_p ())
+    return;
+
+  bool any_functions_emitted = false;
+  /* We must save and restore the current function section since this
+     assembly is emitted at the end of the function.  This means it can be
+     emitted *just after* the cold section of a function.  That cold part
+     would be emitted in a different section.  That switch would trigger a
+     `.cfi_endproc` directive to be emitted in the original section and a
+     `.cfi_startproc` directive to be emitted in the new section.  Switching
+     to the original section without restoring would mean that the
+     `.cfi_endproc` emitted as a function ends would happen in a different
+     section -- leaving an unmatched `.cfi_startproc` in the cold text
+     section and an unmatched `.cfi_endproc` in the standard text section.  */
+  section *save_text_section = in_section;
+  switch_to_section (function_section (current_function_decl));
+  for (int regnum = 0; regnum < 31; ++regnum)
+    {
+      rtx specu_label = cfun->machine->call_via[regnum];
+      if (specu_label == NULL)
+	continue;
+
+      output_operand (specu_label, 0);
+      asm_fprintf (out_file, ":\n");
+      print_asm_branch (regnum);
+      any_functions_emitted = true;
+    }
+  if (any_functions_emitted)
+    /* Can use the SB if needs be here, since this stub will only be used
+       by the current function, and hence for the current target.  */
+    output_asm_insn (aarch64_sls_barrier (true), NULL);
+  switch_to_section (save_text_section);
+}
+
+/* Emit all BLR stubs for the current compilation unit.
+   Over the course of compiling this unit we may have converted some BLR
+   instructions to a BL to a shared stub function.  This is where we emit
+   those stub functions.
+   This function is for the stubs shared between different functions in this
+   compilation unit.  We share when optimising for size instead of speed.
+
+   This function is called through the TARGET_ASM_FILE_END hook.  */
+void
+aarch64_sls_emit_shared_blr_thunks (FILE *out_file)
+{
+  if (! aarch64_sls_shared_thunks_needed)
+    return;
+
+  switch_to_section (text_section);
+  ASM_OUTPUT_ALIGN (out_file, 2);
+  for (int regnum = 0; regnum < 31; ++regnum)
+    {
+      rtx specu_label = aarch64_sls_shared_thunks[regnum];
+      if (!specu_label)
+	continue;
+
+      ASM_OUTPUT_LABEL (out_file, indirect_symbol_names[regnum]);
+      print_asm_branch (regnum);
+    }
+  /* Use the most conservative target to ensure it can always be used by any
+     function in the translation unit.  */
+  asm_fprintf (out_file, "\tdsb\tsy\n\tisb\n");
+}
+#undef print_asm_branch
+
+/* Implement TARGET_ASM_FILE_END.  */
+void
+aarch64_asm_file_end ()
+{
+  aarch64_sls_emit_shared_blr_thunks (asm_out_file);
+  /* Since this function will be called for the ASM_FILE_END hook, we ensure
+     that what would be called otherwise (e.g. `file_end_indicate_exec_stack`
+     for FreeBSD) still gets called.  */
+#ifdef TARGET_ASM_FILE_END
+  TARGET_ASM_FILE_END ();
+#endif
+}
+
+const char *
+aarch64_indirect_call_asm (rtx addr)
+{
+  if (aarch64_harden_sls_blr_p () && REG_P (addr))
+    {
+      rtx stub_label = aarch64_sls_create_blr_label (REGNO (addr));
+      output_asm_insn ("bl\t%0", &stub_label);
+    }
+  else
+    output_asm_insn ("blr\t%0", &addr);
+  return "";
+}
+
 /* Target-specific selftests.  */
 
 #if CHECKING_P
@@ -23473,6 +23654,12 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_MD_ASM_ADJUST
 #define TARGET_MD_ASM_ADJUST arm_md_asm_adjust
 
+#undef TARGET_ASM_FILE_END
+#define TARGET_ASM_FILE_END aarch64_asm_file_end
+
+#undef TARGET_ASM_FUNCTION_EPILOGUE
+#define TARGET_ASM_FUNCTION_EPILOGUE aarch64_sls_emit_blr_function_thunks
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-aarch64.h"
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1019,15 +1019,22 @@
 )
 
 (define_insn "*call_insn"
-  [(call (mem:DI (match_operand:DI 0 "aarch64_call_insn_operand" "r, Usf"))
+  [(call (mem:DI (match_operand:DI 0 "aarch64_call_insn_operand" "r, Ucs, Usf"))
	 (match_operand 1 "" ""))
    (unspec:DI [(match_operand:DI 2 "const_int_operand")] UNSPEC_CALLEE_ABI)
    (clobber (reg:DI LR_REGNUM))]
   ""
   "@
-  blr\\t%0
+  * return aarch64_indirect_call_asm (operands[0]);
+  * return aarch64_indirect_call_asm (operands[0]);
   bl\\t%c0"
-  [(set_attr "type" "call, call")]
+  [(set_attr "type" "call, call, call")
+   (set_attr_alternative
+     "enabled" [(if_then_else (and (match_test "aarch64_enable_bti")
+				   (match_test "aarch64_harden_sls_blr_p ()"))
+		  (const_string "no")
+		  (const_string "yes"))
+		(const_string "yes") (const_string "yes")])]
 )
 
 (define_expand "call_value"
@@ -1047,15 +1054,22 @@
 (define_insn "*call_value_insn"
   [(set (match_operand 0 "" "")
-	(call (mem:DI (match_operand:DI 1 "aarch64_call_insn_operand" "r, Usf"))
+	(call (mem:DI (match_operand:DI 1 "aarch64_call_insn_operand" "r, Ucs, Usf"))
	      (match_operand 2 "" "")))
    (unspec:DI [(match_operand:DI 3 "const_int_operand")] UNSPEC_CALLEE_ABI)
    (clobber (reg:DI LR_REGNUM))]
   ""
   "@
-  blr\\t%1
+  * return aarch64_indirect_call_asm (operands[1]);
+  * return aarch64_indirect_call_asm (operands[1]);
   bl\\t%c1"
-  [(set_attr "type" "call, call")]
+  [(set_attr "type" "call, call, call")
+   (set_attr_alternative
+     "enabled" [(if_then_else (and (match_test "aarch64_enable_bti")
+				   (match_test "aarch64_harden_sls_blr_p ()"))
+		  (const_string "no")
+		  (const_string "yes"))
+		(const_string "yes") (const_string "yes")])]
 )
 
 (define_expand "sibcall"
diff --git a/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr-bti.c b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr-bti.c
new file mode 100644
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr-bti.c
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-mharden-sls=blr -mbranch-protection=bti" } */
+/*
+   Ensure that the SLS hardening of BLR leaves no BLR instructions.
+   Here we also check that there are no BR instructions with anything except
+   an x16 or x17 register.  This is because a `BTI c` instruction can be
+   branched to using a BLR instruction using any register, but can only be
+   branched to with a BR using an x16 or x17 register.
+  */
+typedef int (foo) (int, int);
+typedef void (bar) (int, int);
+struct sls_testclass {
+    foo *x;
+    bar *y;
+    int left;
+    int right;
+};
+
+/* We test both RTL patterns for a call which returns a value and a call
+   which does not.  */
+int blr_call_value (struct sls_testclass x)
+{
+  int retval = x.x(x.left, x.right);
+  if (retval % 10)
+    return 100;
+  return 9;
+}
+
+int blr_call (struct sls_testclass x)
+{
+  x.y(x.left, x.right);
+  if (x.left % 10)
+    return 100;
+  return 9;
+}
+
+/* { dg-final { scan-assembler-not "\tblr\t" } } */
+/* { dg-final { scan-assembler-not "\tbr\tx(?!16|17)" } } */
+/* { dg-final { scan-assembler "\tbr\tx(16|17)" } } */
+
diff --git a/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr.c b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr.c
new file mode 100644
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr.c
@@ -0,0 +1,35 @@
+/* { dg-additional-options "-mharden-sls=blr -save-temps" } */
+/*
+   Ensure that the SLS hardening of BLR leaves no BLR instructions.
+   We only test that all BLR instructions have been removed, not that the
+   resulting code makes sense.
+  */
+typedef int (foo) (int, int);
+typedef void (bar) (int, int);
+struct sls_testclass {
+    foo *x;
+    bar *y;
+    int left;
+    int right;
+};
+
+/* We test both RTL patterns for a call which returns a value and a call
+   which does not.  */
+int blr_call_value (struct sls_testclass x)
+{
+  int retval = x.x(x.left, x.right);
+  if (retval % 10)
+    return 100;
+  return 9;
+}
+
+int blr_call (struct sls_testclass x)
+{
+  x.y(x.left, x.right);
+  if (x.left % 10)
+    return 100;
+  return 9;
+}
+
+/* { dg-final { scan-assembler-not "\tblr\t" } } */
+/* { dg-final { scan-assembler "\tbr\tx\[0-9\]\[0-9\]?" } } */
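For illustration only (this is not actual compiler output, and which stubs
appear depends on the registers used): when one of the tests above is built
at -Os, each `blr xN` should become `bl __call_indirect_xN`, and the shared
stubs emitted at the end of the assembly file should look roughly like

    __call_indirect_x1:
            br      x1
    __call_indirect_x2:
            br      x2
            dsb     sy
            isb

with the single barrier covering the whole block of stubs.  With
-mbranch-protection=bti the constraints restrict the register operands so
that only x16 or x17 stubs are used (see sls-miti-blr-bti.c above).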