Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

Qing Zhao via Gcc-patches Wed, 16 Jun 2021 08:05:15 -0700

Hi, Richard,

On Jun 16, 2021, at 1:19 AM, Richard Biener 
<rguent...@suse.de<mailto:rguent...@suse.de>> wrote:


+/* Expand the IFN_DEFERRED_INIT function according to its second
argument.  */
+static void
+expand_DEFERRED_INIT (internal_fn, gcall *stmt)
+{
+  tree var = gimple_call_lhs (stmt);
+  tree init = NULL_TREE;
+  enum auto_init_type init_type
+    = (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
+
+  switch (init_type)
+    {
+    default:
+      gcc_unreachable ();
+    case AUTO_INIT_PATTERN:
+      init = build_pattern_cst_for_auto_init (TREE_TYPE (var));
+      expand_assignment (var, init, false);
+      break;
+    case AUTO_INIT_ZERO:
+      init = build_zero_cst (TREE_TYPE (var));
+      expand_assignment (var, init, false);
+      break;
+    }

I think actually building build_pattern_cst_for_auto_init can generate
massive garbage and for big auto vars code size is also a concern and
ideally on x86 you'd produce rep movq.  So I don't think going
via expand_assignment is good.  Instead you possibly want to lower
.DEFERRED_INIT to MEMs following expand_builtin_memset and
eventually enhance that to allow storing pieces larger than a byte.

Due to “BOOLEAN_TYPE” and “POINTER_TYPE”, we cannot always have a
repeated byte-pattern for variables that include BOOLEAN_TYPE Or Pointer
types. Therefore, lowering the .DEFERRED_INIT for “PATTERN”
initialization through “memset” is not always possible.

Let me know if I miss anything in the above. Do you have other suggestions?

The main point is that you need to avoid building the explicit initializer
only to have it consumed by assignment expansion.  If you want to keep
all the singing and dancing (as opposed to maybe initializing with a
0x1 byte pattern) then I think for efficiency you still want to
block-initialize the variable and then only fixup the special fields.

Yes, this is a good idea.

We can memset the whole structure with repeated pattern “0xAA” first,
Then mixup BOOLEAN_TYPE and POINTER TYPE for 32-bit platform.
That might be more efficient.

However, after more consideration, I feel that this might be a more
general optimization for “store_constructor” itself:

I.e,  if the “constructor” includes repeated byte value “0xAA” or any other 
value over a certain threshold,
i.e, 70% of the total size, then we might need to use a call to memset first, 
and then emit some additional single
field stores  to fix up the fields that have different initialization values?

Just like the current handling of “zeroes” in the current “store_constructor”, 
if “zeroes” occupy most of the constructor, then
“Clear the whole structure” first, then emit additional single field stories to 
fix up other fields that do not hold zeros.

So, I think that it might be better to keep the current
“expand_assignment” for “Pattern initialization” as it is in this patch.

And then, later we can add a separate patch to add this more general
optimization in “store_constructor” to improve the run time performance
and code size in general?

What’s your opinion on this?

My point is that _building_ the constructor is what we want to avoid
since that involves a lot of overhead memory-wise, it also requires
yet another complex structure field walk with much room for errors.

Block-initializing the object is so much easier and more efficient.
Implementing block initialization with a block size different from
a single byte should be also reasonably possible.  I mean there's
wmemset (not in GCC), so such block initialization would have other
uses as well.

If the pattern of the value that is used to initialize is repeatable, then
Block-initializing is ideal. However, Since the patterns of the values that
are used to initialize might not be completely repeatable due to BOOLEAN (0),
POINTER_TYPE at 32-bit platform (0x000000AA) and FLOATING TYPE (NaN),
After block initializing of the whole object, we still need to add additional 
fix up
stores of these different patterns to the corresponding fields.

But that's a bug with the pattern used then.  You can never be sure that
an object is used only as its declared type but you are initializing it
as if it were.  Also all uninit uses invoke undefined behavior so I don't
see why you need to pay special attention here.  After all this makes
pattern init so much more fragile than zero-init which makes me question
it even more ...

Yes, you are right.  The major reason for the complexity of the code to handle 
pattern initialization
is because multiple different patterns are assigned to different types.

This is for the compatibility with CLANG. -:). (https://reviews.llvm.org/D54604)

For reference, I copied the part for pattern initialization from CLANG’s patch 
below:


1. Pattern initialization

  This is the recommended initialization approach. Pattern initialization's
  goal is to initialize automatic variables with values which will likely
  transform logic bugs into crashes down the line, are easily recognizable in
  a crash dump, without being values which programmers can rely on for useful
  program semantics. At the same time, pattern initialization tries to
  generate code which will optimize well. You'll find the following details in
  `patternFor`:

  - Integers are initialized with repeated 0xAA bytes (infinite scream).
  - Vectors of integers are also initialized with infinite scream.
  - Pointers are initialized with infinite scream on 64-bit platforms because
    it's an unmappable pointer value on architectures I'm aware of. Pointers
    are initialize to 0x000000AA (small scream) on 32-bit platforms because
    32-bit platforms don't consistently offer unmappable pages. When they do
    it's usually the zero page. As people try this out, I expect that we'll
    want to allow different platforms to customize this, let's do so later.
  - Vectors of pointers are initialized the same way pointers are.
  - Floating point values and vectors are initialized with a negative quiet NaN
    with repeated 0xFF payload (e.g. 0xffffffff and 0xffffffffffffffff). NaNs 
are nice
    (here, anways) because they propagate on arithmetic, making it more likely
    that entire computations become NaN when a single uninitialized value
    sneaks in.
  - Arrays are initialized to their homogeneous elements' initialization
    value, repeated. Stack-based Variable-Length Arrays (VLAs) are
    runtime-initialized to the allocated size (no effort is made for negative
    size, but zero-sized VLAs are untouched even if technically undefined).
  - Structs are initialized to their heterogeneous element's initialization
    values. Zero-size structs are initialized as 0xAA since they're allocated
    a single byte.
  - Unions are initialized using the initialization for the largest member of
    the union.

  Expect the values used for pattern initialization to change over time, as we
  refine heuristics (both for performance and security). The goal is truly to
  avoid injecting semantics into undefined behavior, and we should be
  comfortable changing these values when there's a worthwhile point in doing
  so.

  Why so much infinite scream? Repeated byte patterns tend to be easy to
  synthesize on most architectures, and otherwise memset is usually very
  efficient. For values which aren't entirely repeated byte patterns, LLVM
  will often generate code which does memset + a few stores.




For some of the objects whose most fields are BOOLEAN, POINTER_TYPE,
rr FLOATING_TYPE, pattern  initializing likee this might be less efficient. Do 
you
agree on this?

Use a pattern that fits them all.  I mean memory allocation hardening
fills allocated storage with a repeated (byte) pattern and people are
happy with that.  It also makes it easy to spot uninitialized storage
from a debugger.  So please, do not over-design this, it really doesn't
make any sense and the common case you are inevitably chasing here
would already be fine with a random repeated pattern.

So, My question is:

If we want to pattern initialize with the single repeated pattern for all 
types, with one is better to use:  “0xAAAAAAAA”
or “0xFFFFFFFF” , or other pattern that our current glibc used? What’s that 
pattern?

Will  “0xAAAAAAAA” in a floating type auto variable crash the program?
Will “0xFFFFFFFF” in a pointer type auto variable crash the program? (Might 
crash?)


(thus also my suggestion to split out
padding handling - now we can also split out pattern init handling,
maybe somebody else feels like reviewing and approving this, who knows).

I am okay with further splitting pattern initialization part to a separate 
patch. Then we will
have 4 independent patches in total:

1. -fauto-var-init=zero and all the handling in other passes to the new added 
call to .DEFERRED_INIT.
2. Add -fauto-var-init=pattern
3. Add -fauto-var-init-padding
4. Add -ftrivial-auto-var-init for CLANG compatibility.

Are the above the correct understanding?

As said, block-initializing with a repeated pattern is OK and I can see
that being useful.  Trying to produce "nicer" values for floats, bools
and pointers on 32bit platforms is IMHO not going to fix anything and
introduce as many problems as it will "fix".

Yes, I agree, if we can find a good repeated pattern for all types’s pattern 
initialization, that will
be much easier and simpler to implement, I am happy to do that.  (Honestly, the 
part of implementation
that took me most of the time is pattern-initialization.. and I am still not 
very comfortable with this part
Of the code myself.  -:)


And if you block-initialize stuff you then automagically cover padding.
I call this a win-win, no?

Yes, this will also initialize paddings with patterns (Not zeroes as CLANG did).
Shall we compatible with CLANG on this?


Now, what you _could_ try is do sth like

tree ar = build_array_type (uint_ptr_type_node, size_type_node, false);
tree range = build2 (RANGE_EXPR, size_type_node,
                     size_zero_node, build_int_cst (size_type_node,
size-in-words));
tree pattern = build_int_cst (uint_ptr_type_node, 0xdeadbeef);
ctor = build_constructor_single (ar, range, pattern);
ctor = build1 (VIEW_CONVERT_EXPR, type, ctor);


So, the above will be used to replace the following original code in my patch:

+    case AUTO_INIT_PATTERN:
+      init = build_pattern_cst_for_auto_init (TREE_TYPE (var));
+      expand_assignment (var, init, false);
+      break;

?

in my example code (untested) you then still need

  expand_assignment (var, ctor, false);

it would be the easiest way to try pattern init with a pattern that's
bigger than a byte (otherwise of course the memset path is optimal).

If the pattern that is used to initialize all types is byte-repeatable, for 
example, 0xA or 0xF, then
We can use memset to initialize all types, however, the potential problem is, 
if later we decide
To change to another pattern that might not be byte-repeatable, then the memset 
implementation
is not proper at that time.

Is it possible that we might change the pattern later?



thus build a range-init CTOR of an array of pointer-sized elements
but do the actual assignment to the target object by viewing that
as the target objects type.

Okay.

So, for a target object that is smaller than a word, for example, BOOLEAN,
CHAR or short, is the above still working?

Well, you obviously have to adjust that - you can't pattern-init
a BOOL with word-size stores of a word-size pattern.

As said, for example glibc allocator hardening with MALLOC_PERTURB_
uses simple byte-init.

What’s the pattern glibc used?

Thanks a lot.

Qing

That should block-initialize with
a uint_ptr_type_node sized pattern (but likely less efficient than
special block init would do).

Not sure what problems you will run into this, you'll have to try.

I will try this.

Thanks a lot for your suggestions.

Qing

Richard.



--
Richard Biener <rguent...@suse.de<mailto:rguent...@suse.de>>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

Reply via email to