Hi! This is an attempt to add the Undefined Behavior Sanitizer to GCC. Note that it's very alpha version; so far it doesn't do that much, at the moment it should handle division by zero cases, INT_MIN / -1, and various shift cases (shifting by a negative value, shifting when second operand is >= than TYPE_PRECISION (first_operand) and suchlike. (On integer types, so far.)
It works by creating a COMPOUND_EXPR around original expression, so e.g. it creates: if (b < 0 || (b > 31 || a < 0)) { __builtin___ubsan_handle_shift_out_of_bounds (); } else { 0 }, a << b; from original "a <<= b;". There is of course a lot of stuff that needs to be done, more specifically: 0) fix an ICE which I've noticed right now ;( long a = 1; int b = 3; a <<= b; (error: mismatching comparison operand types) temporarily solved by surrounding "doing_shift = true;" with if (comptypes (type0, type1)) But that needs a better solution I'm afraid. Bah. 1) import & build the ubsan library from LLVM I've already spent some time on this, but failed miserably. I've thought that importing ubsan/ from LLVM into libsanitizer/, adding libsanitizer/ubsan/Makefile.{am,in}, editing libsanitizer/Makefile.am and libsanitizer/configure.ac, then something like aclocal && automake could be sufficient, but no. I'd very much appreciate any help with this; is someone willing to help me with this one? And it seemed so easy... 2) construct arguments for ubsan library I guess that if we want to call for instance void __ubsan::__ubsan_handle_shift_out_of_bounds(ShiftOutOfBoundsData *Data, ValueHandle LHS, ValueHandle RHS) from GCC, we need to construct arguments compatible with ShiftOutOfBoundsData/ValueHandle. So, perhaps we need some helper function that constructs the CALL_EXPR for the builtin; so far I haven't spent much time on this and don't know what exactly to do here. Time to look at what asan/tsan do. 3) add parsing of -fsanitize=<...> LLVM supports e.g. -fsanitize=shift,divbyzero combination, we should too. This doesn't sound like a big deal; just parse the arguments and set various flags, or error out on invalid combinations. 4) and of course, more instrumentation (C/C++ FE, gimple level) What comes to mind is: - float/double to integer conversions, - integer overflows (a long list of various cases here), - invalid conversions of int to bool, - reaching a __builtin_unreachable() call, - VLAs size (e.g. negative size), - store to/load of misaligned address, - store to/load of null pointer, - etc. For the time being, I plan to work on overflows instrumentation. Regtested/bootstrapped on x86_64-linux. Comments, please? 2013-06-05 Marek Polacek <pola...@redhat.com> * Makefile.in: Add ubsan.c * common.opt: Add -fsanitize=undefined option. * doc/invoke.texi: Document the new flag. * ubsan.h: New file. * ubsan.c): New file. * sanitizer.def (DEF_SANITIZER_BUILTIN): * builtins.def: Define BUILT_IN_UBSAN_HANDLE_DIVREM_OVERFLOW and BUILT_IN_UBSAN_HANDLE_SHIFT_OUT_OF_BOUNDS. * cp/typeck.c (cp_build_binary_op): Add division by zero and shift instrumentation. * c/c-typeck.c (build_binary_op): Likewise. * builtin-attrs.def: Define ATTR_COLD. * asan.c (initialize_sanitizer_builtins): Build BT_FN_VOID_PTR_PTR_PTR. --- gcc/sanitizer.def.mp 2013-06-05 18:23:41.077439836 +0200 +++ gcc/sanitizer.def 2013-06-05 18:26:04.749921181 +0200 @@ -283,3 +283,13 @@ DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOM DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC_SIGNAL_FENCE, "__tsan_atomic_signal_fence", BT_FN_VOID_INT, ATTR_NOTHROW_LEAF_LIST) + +/* Undefined Behavior Sanitizer */ +DEF_SANITIZER_BUILTIN(BUILT_IN_UBSAN_HANDLE_DIVREM_OVERFLOW, + "__ubsan_handle_divrem_overflow", + BT_FN_VOID_PTR_PTR_PTR, + ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST) +DEF_SANITIZER_BUILTIN(BUILT_IN_UBSAN_HANDLE_SHIFT_OUT_OF_BOUNDS, + "__ubsan_handle_shift_out_of_bounds", + BT_FN_VOID_PTR_PTR_PTR, + ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST) --- gcc/builtins.def.mp 2013-06-05 18:23:41.072439816 +0200 +++ gcc/builtins.def 2013-06-05 18:26:04.728921097 +0200 @@ -155,7 +155,7 @@ along with GCC; see the file COPYING3. #define DEF_SANITIZER_BUILTIN(ENUM, NAME, TYPE, ATTRS) \ DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE, \ true, true, true, ATTRS, true, \ - (flag_asan || flag_tsan)) + (flag_asan || flag_tsan || flag_ubsan)) #undef DEF_CILKPLUS_BUILTIN #define DEF_CILKPLUS_BUILTIN(ENUM, NAME, TYPE, ATTRS) \ --- gcc/Makefile.in.mp 2013-06-05 18:23:25.807388466 +0200 +++ gcc/Makefile.in 2013-06-05 18:26:04.723921077 +0200 @@ -1377,6 +1377,7 @@ OBJS = \ tree-affine.o \ asan.o \ tsan.o \ + ubsan.o \ tree-call-cdce.o \ tree-cfg.o \ tree-cfgcleanup.o \ @@ -2259,6 +2260,10 @@ tsan.o : $(CONFIG_H) $(SYSTEM_H) $(TREE_ $(TM_P_H) $(TREE_FLOW_H) $(DIAGNOSTIC_CORE_H) $(GIMPLE_H) tree-iterator.h \ intl.h cfghooks.h output.h options.h c-family/c-common.h tsan.h asan.h \ tree-ssa-propagate.h +ubsan.o : ubsan.c $(CONFIG_H) $(SYSTEM_H) $(GIMPLE_H) \ + output.h coretypes.h $(GIMPLE_PRETTY_PRINT_H) $(CFGLOOP_H) \ + tree-iterator.h $(TREE_FLOW_H) $(TREE_PASS_H) \ + $(TARGET_H) $(EXPR_H) $(OPTABS_H) $(TM_P_H) langhooks.h tree-ssa-tail-merge.o: tree-ssa-tail-merge.c \ $(SYSTEM_H) $(CONFIG_H) coretypes.h $(TM_H) $(BITMAP_H) \ $(FLAGS_H) $(TM_P_H) $(BASIC_BLOCK_H) $(CFGLOOP_H) \ --- gcc/doc/invoke.texi.mp 2013-06-05 18:29:18.301611796 +0200 +++ gcc/doc/invoke.texi 2013-06-05 18:33:53.756623280 +0200 @@ -5143,6 +5143,11 @@ Memory access instructions will be instr data race bugs. See @uref{http://code.google.com/p/data-race-test/wiki/ThreadSanitizer} for more details. +@item -fsanitize=undefined +Enable UndefinedBehaviorSanitizer, a fast undefined behavior detector +Various computations will be instrumented to detect +undefined behavior, e.g. division by zero or various overflows. + @item -fdump-final-insns@r{[}=@var{file}@r{]} @opindex fdump-final-insns Dump the final internal representation (RTL) to @var{file}. If the --- gcc/ubsan.h.mp 2013-06-05 18:23:55.083486235 +0200 +++ gcc/ubsan.h 2013-06-05 18:10:21.284693807 +0200 @@ -0,0 +1,27 @@ +/* UndefinedBehaviorSanitizer, undefined behavior detector. + Copyright (C) 2013 Free Software Foundation, Inc. + Contributed by Marek Polacek <pola...@redhat.com> + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +<http://www.gnu.org/licenses/>. */ + +#ifndef GCC_UBSAN_H +#define GCC_UBSAN_H + +extern tree ubsan_instrument_division (location_t, enum tree_code, tree, tree); +extern tree ubsan_instrument_shift (location_t, enum tree_code, tree, tree); + +#endif /* GCC_UBSAN_H */ --- gcc/ubsan.c.mp 2013-06-05 18:23:49.411467508 +0200 +++ gcc/ubsan.c 2013-06-05 18:00:25.000000000 +0200 @@ -0,0 +1,107 @@ +/* UndefinedBehaviorSanitizer, undefined behavior detector. + Copyright (C) 2013 Free Software Foundation, Inc. + Contributed by Marek Polacek <pola...@redhat.com> + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +<http://www.gnu.org/licenses/>. */ + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "tree.h" +#include "c-family/c-common.h" + +/* Instrument division by zero and INT_MIN / -1. */ + +tree +ubsan_instrument_division (location_t loc, enum tree_code code, + tree op0, tree op1) +{ + tree t, tt; + tree orig = build2 (code, TREE_TYPE (op0), op0, op1); + + if (TREE_CODE (TREE_TYPE (op0)) != INTEGER_TYPE + || TREE_CODE (TREE_TYPE (op1)) != INTEGER_TYPE) + return orig; + + /* If we *know* that the divisor is not -1 or 0, we don't have to + instrument this expression. + ??? We could use decl_constant_value to cover up more cases. */ + if (TREE_CODE (op1) == INTEGER_CST + && integer_nonzerop (op1) + && !integer_minus_onep (op1)) + return orig; + + tt = fold_build2 (EQ_EXPR, boolean_type_node, op1, + integer_minus_one_node); + t = fold_build2 (EQ_EXPR, boolean_type_node, op0, + TYPE_MIN_VALUE (TREE_TYPE (op0))); + t = fold_build2 (TRUTH_AND_EXPR, boolean_type_node, t, tt); + tt = build2 (EQ_EXPR, boolean_type_node, + op1, integer_zero_node); + t = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, tt, t); + tt = builtin_decl_explicit (BUILT_IN_UBSAN_HANDLE_DIVREM_OVERFLOW); + // XXX Do we want _loc version here? + tt = build_call_expr_loc (loc, tt, 0); + t = fold_build3 (COND_EXPR, void_type_node, t, tt, void_zero_node); + t = fold_build2 (COMPOUND_EXPR, TREE_TYPE (orig), t, orig); + + return t; +} + +/* Instrument left and right shifts. */ + +tree +ubsan_instrument_shift (location_t loc, enum tree_code code, + tree op0, tree op1) +{ + tree t, tt; + tree orig = build2 (code, TREE_TYPE (op0), op0, op1); + tree prec = build_int_cst (TREE_TYPE (op0), + TYPE_PRECISION (TREE_TYPE (op0))); + + t = fold_build2 (LT_EXPR, boolean_type_node, op1, integer_zero_node); + tt = fold_build2 (GE_EXPR, boolean_type_node, op1, prec); + + /* int a = 1; + a <<= 31; + is undefined in C99/C11. */ + if (code == LSHIFT_EXPR + && !TYPE_UNSIGNED (TREE_TYPE (op0)) + && (flag_isoc99 || flag_isoc11)) + { + tree prec1 = build_int_cst (TREE_TYPE (op1), + TYPE_PRECISION (TREE_TYPE (op1)) - 1); + tree x = fold_build2 (EQ_EXPR, boolean_type_node, op1, prec1); + tt = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, tt, x); + } + + /* For left shift, shifting a negative value is undefined. */ + if (code == LSHIFT_EXPR) + { + tree x = fold_build2 (LT_EXPR, boolean_type_node, op0, + integer_zero_node); + tt = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, tt, x); + } + + t = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, t, tt); + tt = builtin_decl_explicit (BUILT_IN_UBSAN_HANDLE_SHIFT_OUT_OF_BOUNDS); + tt = build_call_expr_loc (loc, tt, 0); + t = fold_build3 (COND_EXPR, void_type_node, t, tt, void_zero_node); + t = fold_build2 (COMPOUND_EXPR, TREE_TYPE (orig), t, orig); + + return t; +} --- gcc/cp/typeck.c.mp 2013-06-05 18:23:41.076439832 +0200 +++ gcc/cp/typeck.c 2013-06-05 18:26:04.746921169 +0200 @@ -35,6 +35,7 @@ along with GCC; see the file COPYING3. #include "intl.h" #include "target.h" #include "convert.h" +#include "ubsan.h" #include "c-family/c-common.h" #include "c-family/c-objc.h" #include "params.h" @@ -3891,6 +3892,12 @@ cp_build_binary_op (location_t location, op0 = orig_op0; op1 = orig_op1; + /* Remember whether we're doing / or %. */ + bool doing_div_or_mod = false; + + /* Remember whether we're doing << or >>. */ + bool doing_shift = false; + if (code == TRUTH_AND_EXPR || code == TRUTH_ANDIF_EXPR || code == TRUTH_OR_EXPR || code == TRUTH_ORIF_EXPR || code == TRUTH_XOR_EXPR) @@ -4070,8 +4077,15 @@ cp_build_binary_op (location_t location, { enum tree_code tcode0 = code0, tcode1 = code1; tree cop1 = fold_non_dependent_expr_sfinae (op1, tf_none); + cop1 = maybe_constant_value (cop1); + + if (!processing_template_decl && tcode0 == INTEGER_TYPE + && (TREE_CODE (cop1) != INTEGER_CST + || integer_zerop (cop1) + || integer_minus_onep (cop1))) + doing_div_or_mod = true; - warn_for_div_by_zero (location, maybe_constant_value (cop1)); + warn_for_div_by_zero (location, cop1); if (tcode0 == COMPLEX_TYPE || tcode0 == VECTOR_TYPE) tcode0 = TREE_CODE (TREE_TYPE (TREE_TYPE (op0))); @@ -4109,8 +4123,14 @@ cp_build_binary_op (location_t location, case FLOOR_MOD_EXPR: { tree cop1 = fold_non_dependent_expr_sfinae (op1, tf_none); + cop1 = maybe_constant_value (cop1); - warn_for_div_by_zero (location, maybe_constant_value (cop1)); + if (!processing_template_decl && code0 == INTEGER_TYPE + && (TREE_CODE (cop1) != INTEGER_CST + || integer_zerop (cop1) + || integer_minus_onep (cop1))) + doing_div_or_mod = true; + warn_for_div_by_zero (location, cop1); } if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE @@ -4164,6 +4184,7 @@ cp_build_binary_op (location_t location, if (TREE_CODE (const_op1) != INTEGER_CST) const_op1 = op1; result_type = type0; + doing_shift = true; if (TREE_CODE (const_op1) == INTEGER_CST) { if (tree_int_cst_lt (const_op1, integer_zero_node)) @@ -4211,6 +4232,7 @@ cp_build_binary_op (location_t location, if (TREE_CODE (const_op1) != INTEGER_CST) const_op1 = op1; result_type = type0; + doing_shift = true; if (TREE_CODE (const_op1) == INTEGER_CST) { if (tree_int_cst_lt (const_op1, integer_zero_node)) @@ -4607,6 +4629,18 @@ cp_build_binary_op (location_t location, break; } + if (flag_ubsan && doing_div_or_mod) + { + resultcode = COMPOUND_EXPR; + return ubsan_instrument_division (location, code, op0, op1); + } + + if (flag_ubsan && doing_shift) + { + resultcode = COMPOUND_EXPR; + return ubsan_instrument_shift (location, code, op0, op1); + } + if (((code0 == INTEGER_TYPE || code0 == REAL_TYPE || code0 == COMPLEX_TYPE || code0 == ENUMERAL_TYPE) && (code1 == INTEGER_TYPE || code1 == REAL_TYPE --- gcc/common.opt.mp 2013-06-05 18:23:41.075439828 +0200 +++ gcc/common.opt 2013-06-05 18:26:04.740921145 +0200 @@ -858,6 +858,10 @@ fsanitize=thread Common Report Var(flag_tsan) Enable ThreadSanitizer, a data race detector +fsanitize=undefined +Common Report Var(flag_ubsan) +Enable UndefinedBehaviorSanitizer, an undefined behavior detector + fasynchronous-unwind-tables Common Report Var(flag_asynchronous_unwind_tables) Optimization Generate unwind tables that are exact at each instruction boundary --- gcc/builtin-attrs.def.mp 2013-06-05 18:23:41.071439812 +0200 +++ gcc/builtin-attrs.def 2013-06-05 18:26:04.727921093 +0200 @@ -83,6 +83,7 @@ DEF_LIST_INT_INT (5,6) #undef DEF_LIST_INT_INT /* Construct trees for identifiers. */ +DEF_ATTR_IDENT (ATTR_COLD, "cold") DEF_ATTR_IDENT (ATTR_CONST, "const") DEF_ATTR_IDENT (ATTR_FORMAT, "format") DEF_ATTR_IDENT (ATTR_FORMAT_ARG, "format_arg") @@ -130,6 +131,8 @@ DEF_ATTR_TREE_LIST (ATTR_NORETURN_NOTHRO ATTR_NULL, ATTR_NOTHROW_LIST) DEF_ATTR_TREE_LIST (ATTR_NORETURN_NOTHROW_LEAF_LIST, ATTR_NORETURN,\ ATTR_NULL, ATTR_NOTHROW_LEAF_LIST) +DEF_ATTR_TREE_LIST (ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST, ATTR_COLD,\ + ATTR_NULL, ATTR_NORETURN_NOTHROW_LEAF_LIST) DEF_ATTR_TREE_LIST (ATTR_CONST_NORETURN_NOTHROW_LEAF_LIST, ATTR_CONST,\ ATTR_NULL, ATTR_NORETURN_NOTHROW_LEAF_LIST) DEF_ATTR_TREE_LIST (ATTR_MALLOC_NOTHROW_LIST, ATTR_MALLOC, \ --- gcc/c/c-typeck.c.mp 2013-06-05 18:23:41.073439820 +0200 +++ gcc/c/c-typeck.c 2013-06-05 18:26:04.736921129 +0200 @@ -37,6 +37,7 @@ along with GCC; see the file COPYING3. #include "tree-iterator.h" #include "bitmap.h" #include "gimple.h" +#include "ubsan.h" #include "c-family/c-objc.h" #include "c-family/c-common.h" @@ -9542,6 +9543,12 @@ build_binary_op (location_t location, en operands to truth-values. */ bool boolean_op = false; + /* Remember whether we're doing / or %. */ + bool doing_div_or_mod = false; + + /* Remember whether we're doing << or >>. */ + bool doing_shift = false; + if (location == UNKNOWN_LOCATION) location = input_location; @@ -9743,6 +9750,7 @@ build_binary_op (location_t location, en case FLOOR_DIV_EXPR: case ROUND_DIV_EXPR: case EXACT_DIV_EXPR: + doing_div_or_mod = true; warn_for_div_by_zero (location, op1); if ((code0 == INTEGER_TYPE || code0 == REAL_TYPE @@ -9790,6 +9798,7 @@ build_binary_op (location_t location, en case TRUNC_MOD_EXPR: case FLOOR_MOD_EXPR: + doing_div_or_mod = true; warn_for_div_by_zero (location, op1); if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE @@ -9888,6 +9897,7 @@ build_binary_op (location_t location, en else if ((code0 == INTEGER_TYPE || code0 == FIXED_POINT_TYPE) && code1 == INTEGER_TYPE) { + doing_shift = true; if (TREE_CODE (op1) == INTEGER_CST) { if (tree_int_cst_sgn (op1) < 0) @@ -9940,6 +9950,7 @@ build_binary_op (location_t location, en else if ((code0 == INTEGER_TYPE || code0 == FIXED_POINT_TYPE) && code1 == INTEGER_TYPE) { + doing_shift = true; if (TREE_CODE (op1) == INTEGER_CST) { if (tree_int_cst_sgn (op1) < 0) @@ -10224,6 +10235,20 @@ build_binary_op (location_t location, en return error_mark_node; } + if (flag_ubsan && doing_div_or_mod) + { + ret = ubsan_instrument_division (location, code, op0, op1); + resultcode = COMPOUND_EXPR; + goto return_build_binary_op; + } + + if (flag_ubsan && doing_shift) + { + ret = ubsan_instrument_shift (location, code, op0, op1); + resultcode = COMPOUND_EXPR; + goto return_build_binary_op; + } + if ((code0 == INTEGER_TYPE || code0 == REAL_TYPE || code0 == COMPLEX_TYPE || code0 == FIXED_POINT_TYPE || code0 == VECTOR_TYPE) && --- gcc/asan.c.mp 2013-06-05 18:23:41.070439808 +0200 +++ gcc/asan.c 2013-06-05 18:26:04.726921089 +0200 @@ -2034,6 +2034,9 @@ initialize_sanitizer_builtins (void) tree BT_FN_VOID = build_function_type_list (void_type_node, NULL_TREE); tree BT_FN_VOID_PTR = build_function_type_list (void_type_node, ptr_type_node, NULL_TREE); + tree BT_FN_VOID_PTR_PTR_PTR + = build_function_type_list (void_type_node, ptr_type_node, + ptr_type_node, ptr_type_node, NULL_TREE); tree BT_FN_VOID_PTR_PTRMODE = build_function_type_list (void_type_node, ptr_type_node, build_nonstandard_integer_type (POINTER_SIZE, @@ -2099,6 +2102,9 @@ initialize_sanitizer_builtins (void) #undef ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST #define ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST \ ECF_TM_PURE | ATTR_NORETURN_NOTHROW_LEAF_LIST +#undef ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST +#define ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST \ + /* ECF_COLD missing */ ATTR_NORETURN_NOTHROW_LEAF_LIST #undef DEF_SANITIZER_BUILTIN #define DEF_SANITIZER_BUILTIN(ENUM, NAME, TYPE, ATTRS) \ decl = add_builtin_function ("__builtin_" NAME, TYPE, ENUM, \ Marek