On Thu, 27 May 2021, Qing Zhao wrote: > Hi, Richard, > > Thanks a lot for your comments. > > > On May 26, 2021, at 6:18 AM, Richard Biener > <rguent...@suse.de<mailto:rguent...@suse.de>> wrote: > > On Wed, 12 May 2021, Qing Zhao wrote: > > Hi, > > This is the 3rd version of the patch for the new security feature for GCC.
Meh - can you try using a mailer that does proper quoting? It's difficult to spot your added comments. Will try anyway (and sorry for the delay) > Please take look and let me know your comments and suggestions. > > thanks. > > Qing > > ******Compare with the 2nd version, the following are the major changes: > > 1. use "lookup_attribute ("uninitialized",) directly instead of adding > one new field "uninitialized" into tree_decl_with_vis. > 2. update documentation to mention that the new option will not confuse > -Wuninitialized, GCC still consider an auto without explicit initializer > as uninitialized. > 3. change the name of "build_pattern_cst" to more specific name as > "build_pattern_cst_for_auto_init". > 4. handling of nested VLA; > Adding new testing cases (auto-init-15/16.c) for this new handling. > 5. Add new verifications of calls to .DEFERRED_INIT in tree-cfg.c; > 6. in tree-sra.c, update the handling of "grp_to_be_debug_replaced", > bind the lhs variable to a call to .DEFERRED_INIT. > 7. In tree-ssa-structalias.c, delete "find_func_aliases_for_deferred_init", > return directly for a call to .DEFERRED_INIT in > "find_func_aliases_for_call". > 8. Add more detailed comments in tree-ssa-uninit.c and tree-ssa.c to explain > the special handling on REALPART_EXPR/IMAGPRT_EXPR. > 9. in build_pattern_cst_for_auto_init: > BOOLEAN_TYPE will be set to zero always; > INTEGER_TYPE (?and ENUMERAL_TYPE) use wi::from_buffer in order to > correctly handle 128-bit integers. > POINTER_TYPE will not assert on SIZE < 32. > REAL_TYPE add fallback; > 10. changed gcc_assert to gcc_unreachable in several places; > 11. add more comments; > 12. some style issue changes. > > ******Please see the version 2 at: > https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567262.html > > > ******The following 2 items are the ones I didn’t addressed in this version > due to further study and might need more discussion: > > 1. Using __builtin_clear_padding to replace type_has_padding. > > My study shows: the call to __builtin_clear_padding is expanded during > gimplification phase. > And there is no __bultin_clear_padding expanding during rtx expanding phase. > If so, for -ftrivial-auto-var-init, padding initialization should be done > both in gimplification phase and rtx expanding phase. > And since the __builtin_clear_padding might not be good for rtx expanding, > reusing __builtin_clear_padding might not work. > > 2. Pattern init to NULLPTR_TYPE and ENUMERAL_TYPE: need more comments from > Richard Biener on this. > > ******The change of the 3rd version compared to the 2nd version are: > > > +@item -ftrivial-auto-var-init=@var{choice} > +@opindex ftrivial-auto-var-init > +Initialize automatic variables with either a pattern or with zeroes to > increase > +the security and predictability of a program by preventing uninitialized > memory > +disclosure and use. > > the docs do not state what "trivial" actually means? Does it affect > C++ classes with ctors, thus is "trivial" equal to what C++ considers > a POD type? > > Thank you for this question. > > The name -ftrivial-auto-var-init is just for compatible with Clang. I really > don’t know why > they added trivial. > > As I checked a small example with C++ class with ctors, I see both Clang and > my patch add > Initialization to this class: > > ===== > #include <iostream> > > using namespace std; > > class Line { > public: > void setLength( double len ); > double getLength( void ); > Line(); // This is the constructor > private: > double length; > }; > > // Member functions definitions including constructor > Line::Line(void) { > cout << "Object is being created" << endl; > } > void Line::setLength( double len ) { > length = len; > } > double Line::getLength( void ) { > return length; > } > > // Main function for the program > int main() { > Line line; > > // set line length > line.setLength(6.0); > cout << "Length of line : " << line.getLength() <<endl; > > return 0; > } > > ===== > > Both clang and my patch add initialization to the above auto variable “line”. > > So, I have the following questions need help: > > 1. Do we need to exclude C++ class with ctor from auto initialization? > > 2. I see Clang use call to internal memset to initialize such class, but for > my patch, I only initialize the data fields inside this class. > Which one is better? I can't answer either question, but generally using block-initialization (for example via memset, but we'd generally prefer X = {}) is better for later optimization. > > > diff --git a/gcc/expr.c b/gcc/expr.c > index 798285eb52ca..9092349d7017 100644 > --- a/gcc/expr.c > +++ b/gcc/expr.c > @@ -6539,14 +6539,19 @@ store_constructor (tree exp, rtx target, int > cleared, poly_int64 size, > cleared = 1; > } > > - /* If the constructor has fewer fields than the structure or > - if we are initializing the structure to mostly zeros, clear > - the whole structure first. Don't do this if TARGET is a > - register whose mode size isn't equal to SIZE since > - clear_storage can't handle this case. */ > + /* If the constructor has fewer fields than the structure, > + or if we are initializing the structure to mostly zeros, > + or if the user requests to initialize automatic variables with > + paddings inside the type, > + we should clear the whole structure first. > + Don't do this if TARGET is a register whose mode size isn't > equal > + SIZE since clear_storage can't handle this case. */ > else if (known_size_p (size) > && (((int) CONSTRUCTOR_NELTS (exp) != fields_length > (type)) > - || mostly_zeros_p (exp)) > + || mostly_zeros_p (exp) > + || (flag_trivial_auto_var_init > > AUTO_INIT_UNINITIALIZED > + && !TREE_STATIC (exp) > + && type_has_padding (type))) > > testing flag_trivial_auto_var_init tests the global options, if TUs > are compiled with different setting of flag_trivial_auto_var_init > and you use LTO or flag_trivial_auto_var_init is specified per > function via optimize attributes it's more appropriate to test > opt_for_fn (cfun->decl, flag_trivial_auto_var_init) > > Okay. Thanks for the info. I will update this. > > > You do not actually test whether TARGET is an auto-var in this place, > so I question this change > > Will add checking for Auto on this. > > - the documentation of ftrivial-auto-var-init > also doesn't mention initialization of padding > > Will add initialization of padding on this. > > Clang add the padding initialization with this option, I guess that we might > need to > be compatible with it? > > and the above doesn't > seem to honor -ftrivial-auto-var-init=pattern for this. > > Richard Sandiford raised the same question in the previous review. > And this behavior also follows Clang’s behavior. > > The following is the answer from Kees Cook on this question: > > ===== > > Originally, Clang > implemented using pattern, but there was discussion around it and the > decision there was to go with zero init, as it seemed to more closely > match the C spec: > https://github.com/llvm/llvm-project/commit/d39fbc7e20d84364e409ce59724ce20625637062 > > ===== > > > +enum auto_init_approach { > + AUTO_INIT_A = 0, > + AUTO_INIT_D = 1 > +}; > > I'm assuming we're going for one implementation in the end - have > you made up your mind yet? > > Yes, I already decided to take the approach D. (Adding call to internal > function .DEFFERED_INIT). > > The reason to temporarily keep both implementations is just for the > convenience to run some > Performance comparison during the implementation period. I am afraid that I > might need some > Changes for approach D during the review process, in case there might be > major change, we > need to run the performance testing again. > > So, before the last version, I will just keep both implementations. > > If all the change is good, I will complete delete approach A at that time. > Is this okay? OK. > > > +/* If FOR_UNINIT is true, GIMPLE_CALL S is a call to builtin_memset that > + is known to be emitted for unintialized VLA objects. */ > + > +static inline void > +gimple_call_set_memset_for_uninit (gcall *s, bool for_uninit) > > seeing this, can you explain why using .DEFERRED_INIT does not > work for VLAs? > > The major reason for going different routes for VLAs vs. no-VLAs is: > > In the original gimplification phase, VLAs and no-VLAs go different routes. > I just followed the different routes for them: > > In “gimplify_decl_expr”, VLA goes to “gimplify_vla_decl”, and is expanded to > call to alloca. Naturally, I add calls to “memset/memcpy” in > “gimplify_vla_decl” to > Initialize it. > > On the other hand, no-VLAs are handled differently in “gimplify_decl_expr”, so > I added calls to “.DEFFERED_INIT” to initialize them. > > What’s the major issue if I add calls to “memset/memcpy” in > “gimplify_vla_decl” to > Initialize VLAs? Just inconsistency and unexpected different behavior with respect to uninitialized warnings? > You can build > > .DEFERRED_INIT (WITH_SIZE_EXPR <decl, size-expression>, type) > > to carry the initialization size. There's maybe_with_size_expr that > you can conveniently use to perform the WITH_SIZE_EXPR wrapping. > I'd strongly prefer not going different routes for VLAs vs. on-VLAs. > > What’s the major benefit to this? See above. > > @@ -5001,6 +5185,17 @@ gimplify_init_constructor (tree *expr_p, gimple_seq > *pre_p, gimple_seq *post_p, > /* If a single access to the target must be ensured and all > elements > are zero, then it's optimal to clear whatever their number. > */ > cleared = true; > + else if (flag_trivial_auto_var_init > AUTO_INIT_UNINITIALIZED > + && !TREE_STATIC (object) > + && type_has_padding (type)) > + /* If the user requests to initialize automatic variables with > + paddings inside the type, we should initialize the paddings > too. > + C guarantees that brace-init with fewer initializers than > members > + aggregate will initialize the rest of the aggregate as-if it > were > + static initialization. In turn static initialization > guarantees > + that pad is initialized to zero bits. > + So, it's better to clear the whole record under such > situation. */ > + cleared = true; > > so here we have padding as well - I think this warrants to be controlled > by an extra option? And we can maybe split this out to a separate > patch? (the whole padding stuff) > > Clang does the padding initialization with this option, shall we be > consistent with Clang? Just for the sake of consistency? No. Is there a technical reason for this complication? Say we have struct { short s; int i; } a; what's the technical reason to initialize the padding? I might be tempted to use -ftrivial-auto-init but I'd definitely don't want to spend cycles/instructions initializing the padding in the above struct. At this point I also wonder whether doing the actual initialization by block-initializing the current function frame at allocation time. That would be a way smaller patch (but possibly backend specific). On x86 it could be a single rep mov; for all but the VLA cases. Just a thought. > +/* Expand the IFN_DEFERRED_INIT function according to its second > argument. */ > +static void > +expand_DEFERRED_INIT (internal_fn, gcall *stmt) > +{ > + tree var = gimple_call_lhs (stmt); > + tree init = NULL_TREE; > + enum auto_init_type init_type > + = (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (stmt, 1)); > + > + switch (init_type) > + { > + default: > + gcc_unreachable (); > + case AUTO_INIT_PATTERN: > + init = build_pattern_cst_for_auto_init (TREE_TYPE (var)); > + expand_assignment (var, init, false); > + break; > + case AUTO_INIT_ZERO: > + init = build_zero_cst (TREE_TYPE (var)); > + expand_assignment (var, init, false); > + break; > + } > > I think actually building build_pattern_cst_for_auto_init can generate > massive garbage and for big auto vars code size is also a concern and > ideally on x86 you'd produce rep movq. So I don't think going > via expand_assignment is good. Instead you possibly want to lower > .DEFERRED_INIT to MEMs following expand_builtin_memset and > eventually enhance that to allow storing pieces larger than a byte. > > Will study this approach a little bit more, might need more help from you on > this part. > > > diff --git a/gcc/tree-core.h b/gcc/tree-core.h > index e457b917b98e..2e0e76ea8838 100644 > --- a/gcc/tree-core.h > +++ b/gcc/tree-core.h > @@ -1829,7 +1829,7 @@ struct GTY(()) tree_decl_with_vis { > unsigned final : 1; > /* Belong to FUNCTION_DECL exclusively. */ > unsigned regdecl_flag : 1; > - /* 14 unused bits. */ > + /* 14 unused bits. */ > /* 32 more unused on 64 bit HW. */ > }; > > > spurious change, please drop. > > Okay. > > > + /* Ignore REALPART_EXPR or IMAGPART_EXPR if its operand is a call to > + .DEFERRED_INIT. This is for handling the following case correctly: > + > + 1 typedef _Complex float C; > + 2 C foo(int cond) > + 3 { > + 4 C f; > + 5 __imag__ f = 0; > + 6 if (cond) > + 7 { > + 8 __real__ f = 1; > + 9 return f; > + 10 } > + 11 return f; > + 12 } > + > + with -ftrivial-auto-var-init, compiler will insert the following > + artificial initialization at line 4: > + f = .DEFERRED_INIT (f, 2); > + _1 = REALPART_EXPR <f>; > + > + without the following special handling, _1 = REALPART_EXPR <f> will > + be treated as the uninitialized use point, which is incorrect. (the > + real uninitialized use point is at line 11). */ > + if (is_gimple_assign (context) > + && (gimple_assign_rhs_code (context) == REALPART_EXPR > + || gimple_assign_rhs_code (context) == IMAGPART_EXPR)) > + { > + tree v = gimple_assign_rhs1 (context); > + if (TREE_CODE (TREE_OPERAND (v, 0)) == SSA_NAME > + && gimple_call_internal_p (SSA_NAME_DEF_STMT (TREE_OPERAND (v, > 0)), > + IFN_DEFERRED_INIT)) > + return; > + } > > will this not mishandle (not report) > > C foo () > { > C f; > return __real f; > } > > ? I think ssa_undefined_value_p is supposed to catch this, you > seem to duplicate something there as well. > > Will check on this. > > > +/* Returns true when the given TYPE has padding inside it. > + return false otherwise. */ > +bool > +type_has_padding (tree type) > +{ > + switch (TREE_CODE (type)) > + { > + case RECORD_TYPE: > + { > > btw, there's __builtin_clear_padding and a whole machinery around > it in gimple-fold.c, I'm sure that parts could be re-used if they > are neccessary in the end. > > Richard Sandiford provided this suggestion previously, my previous study > shows that it might not > Be convenient to use it. But I will study this a little bit more and get back > to you. > > > Otherwise the changes look OK. > > Do DCE/DSE remove unused .DEFERRED_INIT? > > It should, but I will double check on this. > > Thanks again. > > Qing > > Thanks, > Richard. > > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)