On Thu, 27 May 2021, Qing Zhao wrote:

> Hi, Richard,
> 
> Thanks a lot for your comments.
> 
> 
> On May 26, 2021, at 6:18 AM, Richard Biener 
> <rguent...@suse.de<mailto:rguent...@suse.de>> wrote:
> 
> On Wed, 12 May 2021, Qing Zhao wrote:
> 
> Hi,
> 
> This is the 3rd version of the patch for the new security feature for GCC.

Meh - can you try using a mailer that does proper quoting?  It's difficult
to spot your added comments.  Will try anyway (and sorry for the delay)

> Please take look and let me know your comments and suggestions.
> 
> thanks.
> 
> Qing
> 
> ******Compare with the 2nd version, the following are the major changes:
> 
> 1. use "lookup_attribute ("uninitialized",) directly instead of adding
>   one new field "uninitialized" into tree_decl_with_vis.
> 2. update documentation to mention that the new option will not confuse
>   -Wuninitialized, GCC still consider an auto without explicit initializer
>   as uninitialized.
> 3. change the name of "build_pattern_cst" to more specific name as
>   "build_pattern_cst_for_auto_init".
> 4. handling of nested VLA;
>   Adding new testing cases (auto-init-15/16.c) for this new handling.
> 5. Add  new verifications of calls to .DEFERRED_INIT in tree-cfg.c;
> 6. in tree-sra.c, update the handling of "grp_to_be_debug_replaced",
>   bind the lhs variable to a call to .DEFERRED_INIT.
> 7. In tree-ssa-structalias.c, delete "find_func_aliases_for_deferred_init",
>   return directly for a call to .DEFERRED_INIT in 
> "find_func_aliases_for_call".
> 8. Add more detailed comments in tree-ssa-uninit.c and tree-ssa.c to explain
>   the special handling on REALPART_EXPR/IMAGPRT_EXPR.
> 9. in build_pattern_cst_for_auto_init:
>   BOOLEAN_TYPE will be set to zero always;
>   INTEGER_TYPE (?and ENUMERAL_TYPE) use wi::from_buffer in order to
>                correctly handle 128-bit integers.
>   POINTER_TYPE will not assert on SIZE < 32.
>   REAL_TYPE add fallback;
> 10. changed gcc_assert to gcc_unreachable in several places;
> 11. add more comments;
> 12. some style issue changes.
> 
> ******Please see the version 2 at:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567262.html
> 
> 
> ******The following 2 items are the ones I didn’t addressed in this version 
> due to further study and might need more discussion:
> 
> 1. Using __builtin_clear_padding  to replace type_has_padding.
> 
> My study shows: the call to __builtin_clear_padding is expanded during 
> gimplification phase.
> And there is no __bultin_clear_padding expanding during rtx expanding phase.
> If so,  for -ftrivial-auto-var-init, padding initialization should be done 
> both in gimplification phase and rtx expanding phase.
> And since the __builtin_clear_padding might not be good for rtx expanding, 
> reusing __builtin_clear_padding might not work.
> 
> 2. Pattern init to NULLPTR_TYPE and ENUMERAL_TYPE: need more comments from 
> Richard Biener on this.
> 
> ******The change of the 3rd version compared to the 2nd version are:
> 
> 
> +@item -ftrivial-auto-var-init=@var{choice}
> +@opindex ftrivial-auto-var-init
> +Initialize automatic variables with either a pattern or with zeroes to
> increase
> +the security and predictability of a program by preventing uninitialized
> memory
> +disclosure and use.
> 
> the docs do not state what "trivial" actually means?  Does it affect
> C++ classes with ctors, thus is "trivial" equal to what C++ considers
> a POD type?
> 
> Thank you for this question.
> 
> The name -ftrivial-auto-var-init is just for compatible with Clang. I really 
> don’t know why
> they added trivial.
> 
> As I checked a small example with C++ class with ctors, I see both Clang and 
> my patch add
> Initialization to this class:
> 
> =====
> #include <iostream>
> 
> using namespace std;
> 
> class Line {
>    public:
>       void setLength( double len );
>       double getLength( void );
>       Line();  // This is the constructor
>    private:
>       double length;
> };
> 
> // Member functions definitions including constructor
> Line::Line(void) {
>    cout << "Object is being created" << endl;
> }
> void Line::setLength( double len ) {
>   length = len;
> }
> double Line::getLength( void ) {
>   return length;
> }
> 
> // Main function for the program
> int main() {
>   Line line;
> 
>   // set line length
>   line.setLength(6.0);
>   cout << "Length of line : " << line.getLength() <<endl;
> 
>   return 0;
> }
> 
> =====
> 
> Both clang and my patch add initialization to the above auto variable “line”.
> 
> So, I have the following questions need help:
> 
> 1. Do we need to exclude C++ class with ctor from auto initialization?
> 
> 2. I see Clang use call to internal memset to initialize such class, but for 
> my patch, I only initialize the data fields inside this class.
>     Which one is better?

I can't answer either question, but generally using block-initialization
(for example via memset, but we'd generally prefer X = {}) is better for
later optimization.

> 
> 
> diff --git a/gcc/expr.c b/gcc/expr.c
> index 798285eb52ca..9092349d7017 100644
> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -6539,14 +6539,19 @@ store_constructor (tree exp, rtx target, int
> cleared, poly_int64 size,
>            cleared = 1;
>          }
> 
> -        /* If the constructor has fewer fields than the structure or
> -          if we are initializing the structure to mostly zeros, clear
> -          the whole structure first.  Don't do this if TARGET is a
> -          register whose mode size isn't equal to SIZE since
> -          clear_storage can't handle this case.  */
> +       /* If the constructor has fewer fields than the structure,
> +          or if we are initializing the structure to mostly zeros,
> +          or if the user requests to initialize automatic variables with
> +          paddings inside the type,
> +          we should clear the whole structure first.
> +          Don't do this if TARGET is a register whose mode size isn't
> equal
> +          SIZE since clear_storage can't handle this case.  */
>        else if (known_size_p (size)
>                 && (((int) CONSTRUCTOR_NELTS (exp) != fields_length
> (type))
> -                    || mostly_zeros_p (exp))
> +                    || mostly_zeros_p (exp)
> +                    || (flag_trivial_auto_var_init >
> AUTO_INIT_UNINITIALIZED
> +                        && !TREE_STATIC (exp)
> +                        && type_has_padding (type)))
> 
> testing flag_trivial_auto_var_init tests the global options, if TUs
> are compiled with different setting of flag_trivial_auto_var_init
> and you use LTO or flag_trivial_auto_var_init is specified per
> function via optimize attributes it's more appropriate to test
> opt_for_fn (cfun->decl, flag_trivial_auto_var_init)
> 
> Okay.  Thanks for the info. I will update this.
> 
> 
> You do not actually test whether TARGET is an auto-var in this place,
> so I question this change
> 
> Will add checking for Auto on this.
> 
> - the documentation of ftrivial-auto-var-init
> also doesn't mention initialization of padding
> 
> Will add initialization of padding on this.
> 
> Clang add the padding initialization with this option, I guess that we might 
> need to
> be compatible with it?
> 
> and the above doesn't
> seem to honor -ftrivial-auto-var-init=pattern for this.
> 
> Richard Sandiford raised the same question in the previous review.
> And this behavior also follows Clang’s behavior.
> 
> The following is the answer from Kees Cook on this question:
> 
> =====
> 
> Originally, Clang
> implemented using pattern, but there was discussion around it and the
> decision there was to go with zero init, as it seemed to more closely
> match the C spec:
> https://github.com/llvm/llvm-project/commit/d39fbc7e20d84364e409ce59724ce20625637062
> 
> =====
> 
> 
> +enum auto_init_approach {
> +  AUTO_INIT_A = 0,
> +  AUTO_INIT_D = 1
> +};
> 
> I'm assuming we're going for one implementation in the end - have
> you made up your mind yet?
> 
> Yes, I already decided to take the approach D. (Adding call to internal 
> function .DEFFERED_INIT).
> 
> The reason to temporarily keep both implementations is just for the 
> convenience to run some
> Performance comparison during the implementation period. I am afraid that I 
> might need some
> Changes for approach D during the review process, in case there might be 
> major change, we
> need to run the performance testing again.
> 
> So, before the last version, I will just keep both implementations.
> 
> If all the change is good, I will complete delete approach A at that time.
> Is this okay?

OK.

> 
> 
> +/* If FOR_UNINIT is true, GIMPLE_CALL S is a call to builtin_memset that
> +   is known to be emitted for unintialized VLA objects.  */
> +
> +static inline void
> +gimple_call_set_memset_for_uninit (gcall *s, bool for_uninit)
> 
> seeing this, can you explain why using .DEFERRED_INIT does not
> work for VLAs?
> 
> The major reason for going different routes for VLAs vs. no-VLAs is:
> 
> In the original gimplification phase, VLAs and no-VLAs go different routes.
> I just followed the different routes for them:
> 
> In “gimplify_decl_expr”, VLA goes to “gimplify_vla_decl”, and is expanded to
> call to alloca.  Naturally, I add calls to “memset/memcpy” in 
> “gimplify_vla_decl” to
> Initialize it.
> 
> On the other hand, no-VLAs are handled differently in “gimplify_decl_expr”, so
> I added calls to “.DEFFERED_INIT” to initialize them.
> 
> What’s the major issue if I add calls to “memset/memcpy” in 
> “gimplify_vla_decl” to
> Initialize VLAs?

Just inconsistency and unexpected different behavior with respect to
uninitialized warnings?

>  You can build
> 
>  .DEFERRED_INIT (WITH_SIZE_EXPR <decl, size-expression>, type)
> 
> to carry the initialization size.  There's maybe_with_size_expr that
> you can conveniently use to perform the WITH_SIZE_EXPR wrapping.
> I'd strongly prefer not going different routes for VLAs vs. on-VLAs.
> 
> What’s the major benefit to this?

See above.

> 
> @@ -5001,6 +5185,17 @@ gimplify_init_constructor (tree *expr_p, gimple_seq
> *pre_p, gimple_seq *post_p,
>          /* If a single access to the target must be ensured and all
> elements
>             are zero, then it's optimal to clear whatever their number.
> */
>          cleared = true;
> +       else if (flag_trivial_auto_var_init > AUTO_INIT_UNINITIALIZED
> +                && !TREE_STATIC (object)
> +                && type_has_padding (type))
> +         /* If the user requests to initialize automatic variables with
> +            paddings inside the type, we should initialize the paddings
> too.
> +            C guarantees that brace-init with fewer initializers than
> members
> +            aggregate will initialize the rest of the aggregate as-if it
> were
> +            static initialization.  In turn static initialization
> guarantees
> +            that pad is initialized to zero bits.
> +            So, it's better to clear the whole record under such
> situation.  */
> +         cleared = true;
> 
> so here we have padding as well - I think this warrants to be controlled
> by an extra option?  And we can maybe split this out to a separate
> patch? (the whole padding stuff)
> 
> Clang does the padding initialization with this option, shall we be 
> consistent with Clang?

Just for the sake of consistency?  No.  Is there a technical reason
for this complication?  Say we have

  struct { short s; int i; } a;

what's the technical reason to initialize the padding?  I might
be tempted to use -ftrivial-auto-init but I'd definitely don't
want to spend cycles/instructions initializing the padding in the
above struct.

At this point I also wonder whether doing the actual initialization
by block-initializing the current function frame at allocation
time.  That would be a way smaller patch (but possibly backend
specific).  On x86 it could be a single rep mov; for all but the
VLA cases.  Just a thought.

> +/* Expand the IFN_DEFERRED_INIT function according to its second
> argument.  */
> +static void
> +expand_DEFERRED_INIT (internal_fn, gcall *stmt)
> +{
> +  tree var = gimple_call_lhs (stmt);
> +  tree init = NULL_TREE;
> +  enum auto_init_type init_type
> +    = (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
> +
> +  switch (init_type)
> +    {
> +    default:
> +      gcc_unreachable ();
> +    case AUTO_INIT_PATTERN:
> +      init = build_pattern_cst_for_auto_init (TREE_TYPE (var));
> +      expand_assignment (var, init, false);
> +      break;
> +    case AUTO_INIT_ZERO:
> +      init = build_zero_cst (TREE_TYPE (var));
> +      expand_assignment (var, init, false);
> +      break;
> +    }
> 
> I think actually building build_pattern_cst_for_auto_init can generate
> massive garbage and for big auto vars code size is also a concern and
> ideally on x86 you'd produce rep movq.  So I don't think going
> via expand_assignment is good.  Instead you possibly want to lower
> .DEFERRED_INIT to MEMs following expand_builtin_memset and
> eventually enhance that to allow storing pieces larger than a byte.
> 
> Will study this approach a little bit more, might need more help from you on 
> this part.
> 
> 
> diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> index e457b917b98e..2e0e76ea8838 100644
> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -1829,7 +1829,7 @@ struct GTY(()) tree_decl_with_vis {
>  unsigned final : 1;
>  /* Belong to FUNCTION_DECL exclusively.  */
>  unsigned regdecl_flag : 1;
> - /* 14 unused bits. */
> + /* 14 unused bits.  */
>  /* 32 more unused on 64 bit HW. */
> };
> 
> 
> spurious change, please drop.
> 
> Okay.
> 
> 
> +  /* Ignore REALPART_EXPR or IMAGPART_EXPR if its operand is a call to
> +     .DEFERRED_INIT.  This is for handling the following case correctly:
> +
> +  1 typedef _Complex float C;
> +  2 C foo(int cond)
> +  3 {
> +  4   C f;
> +  5   __imag__ f = 0;
> +  6   if (cond)
> +  7     {
> +  8       __real__ f = 1;
> +  9       return f;
> + 10     }
> + 11   return f;
> + 12 }
> +
> +    with -ftrivial-auto-var-init, compiler will insert the following
> +    artificial initialization at line 4:
> +  f = .DEFERRED_INIT (f, 2);
> +  _1 = REALPART_EXPR <f>;
> +
> +    without the following special handling, _1 = REALPART_EXPR <f> will
> +    be treated as the uninitialized use point, which is incorrect. (the
> +    real uninitialized use point is at line 11).  */
> +  if (is_gimple_assign (context)
> +      && (gimple_assign_rhs_code (context) == REALPART_EXPR
> +         || gimple_assign_rhs_code (context) == IMAGPART_EXPR))
> +    {
> +      tree v = gimple_assign_rhs1 (context);
> +      if (TREE_CODE (TREE_OPERAND (v, 0)) == SSA_NAME
> +         && gimple_call_internal_p (SSA_NAME_DEF_STMT (TREE_OPERAND (v,
> 0)),
> +                                    IFN_DEFERRED_INIT))
> +       return;
> +    }
> 
> will this not mishandle (not report)
> 
> C foo ()
> {
>  C f;
>  return __real f;
> }
> 
> ?  I think ssa_undefined_value_p is supposed to catch this, you
> seem to duplicate something there as well.
> 
> Will check on this.
> 
> 
> +/* Returns true when the given TYPE has padding inside it.
> +   return false otherwise.  */
> +bool
> +type_has_padding (tree type)
> +{
> +  switch (TREE_CODE (type))
> +    {
> +    case RECORD_TYPE:
> +      {
> 
> btw, there's __builtin_clear_padding and a whole machinery around
> it in gimple-fold.c, I'm sure that parts could be re-used if they
> are neccessary in the end.
> 
> Richard Sandiford provided this suggestion previously, my previous study 
> shows that it might not
> Be convenient to use it. But I will study this a little bit more and get back 
> to you.
> 
> 
> Otherwise the changes look OK.
> 
> Do DCE/DSE remove unused .DEFERRED_INIT?
> 
> It should, but I will double check on this.
> 
> Thanks again.
> 
> Qing
> 
> Thanks,
> Richard.
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Reply via email to