On Tue, 15 Jun 2021, Qing Zhao wrote: > Hi, Richard, > > > > On Jun 15, 2021, at 8:21 AM, Richard Biener <rguent...@suse.de> wrote: > >> > >> > >> +/* Expand the IFN_DEFERRED_INIT function according to its second > >> argument. */ > >> +static void > >> +expand_DEFERRED_INIT (internal_fn, gcall *stmt) > >> +{ > >> + tree var = gimple_call_lhs (stmt); > >> + tree init = NULL_TREE; > >> + enum auto_init_type init_type > >> + = (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (stmt, 1)); > >> + > >> + switch (init_type) > >> + { > >> + default: > >> + gcc_unreachable (); > >> + case AUTO_INIT_PATTERN: > >> + init = build_pattern_cst_for_auto_init (TREE_TYPE (var)); > >> + expand_assignment (var, init, false); > >> + break; > >> + case AUTO_INIT_ZERO: > >> + init = build_zero_cst (TREE_TYPE (var)); > >> + expand_assignment (var, init, false); > >> + break; > >> + } > >> > >> I think actually building build_pattern_cst_for_auto_init can generate > >> massive garbage and for big auto vars code size is also a concern and > >> ideally on x86 you'd produce rep movq. So I don't think going > >> via expand_assignment is good. Instead you possibly want to lower > >> .DEFERRED_INIT to MEMs following expand_builtin_memset and > >> eventually enhance that to allow storing pieces larger than a byte. > >> > >> Due to “BOOLEAN_TYPE” and “POINTER_TYPE”, we cannot always have a > >> repeated byte-pattern for variables that include BOOLEAN_TYPE Or Pointer > >> types. Therefore, lowering the .DEFERRED_INIT for “PATTERN” > >> initialization through “memset” is not always possible. > >> > >> Let me know if I miss anything in the above. Do you have other suggestions? > >> > >> The main point is that you need to avoid building the explicit initializer > >> only to have it consumed by assignment expansion. If you want to keep > >> all the singing and dancing (as opposed to maybe initializing with a > >> 0x1 byte pattern) then I think for efficiency you still want to > >> block-initialize the variable and then only fixup the special fields. > >> > >> Yes, this is a good idea. > >> > >> We can memset the whole structure with repeated pattern “0xAA” first, > >> Then mixup BOOLEAN_TYPE and POINTER TYPE for 32-bit platform. > >> That might be more efficient. > >> > >> However, after more consideration, I feel that this might be a more > >> general optimization for “store_constructor” itself: > >> > >> I.e, if the “constructor” includes repeated byte value “0xAA” or any > >> other value over a certain threshold, > >> i.e, 70% of the total size, then we might need to use a call to memset > >> first, and then emit some additional single > >> field stores to fix up the fields that have different initialization > >> values? > >> > >> Just like the current handling of “zeroes” in the current > >> “store_constructor”, if “zeroes” occupy most of the constructor, then > >> “Clear the whole structure” first, then emit additional single field > >> stories to fix up other fields that do not hold zeros. > >> > >> So, I think that it might be better to keep the current > >> “expand_assignment” for “Pattern initialization” as it is in this patch. > >> > >> And then, later we can add a separate patch to add this more general > >> optimization in “store_constructor” to improve the run time performance > >> and code size in general? > >> > >> What’s your opinion on this? > > > > My point is that _building_ the constructor is what we want to avoid > > since that involves a lot of overhead memory-wise, it also requires > > yet another complex structure field walk with much room for errors. > > So, you mean I should completely get rid of the new added routine > “build_pattern_cst_for_auto_init”, since it built constructors for RECORD, > UNION, and ARRAY types.
Yes. > And the current RTL expansion of constructor > assignment is not efficient enough for pattern initialization purpose? Well, it is the wrong tool to use. It's efficient for initialization from a constructor but building a constructor and assigning from it is not an efficient way to do pattern init. > > > > Block-initializing the object is so much easier and more efficient. > > Implementing block initialization with a block size different from > > a single byte should be also reasonably possible. I mean there's > > wmemset (not in GCC), so such block initialization would have other > > uses as well. > > If the pattern of the value that is used to initialize is repeatable, then > Block-initializing is ideal. However, Since the patterns of the values that > are used to initialize might not be completely repeatable due to BOOLEAN (0), > POINTER_TYPE at 32-bit platform (0x000000AA) and FLOATING TYPE (NaN), > After block initializing of the whole object, we still need to add additional > fix up > stores of these different patterns to the corresponding fields. But that's a bug with the pattern used then. You can never be sure that an object is used only as its declared type but you are initializing it as if it were. Also all uninit uses invoke undefined behavior so I don't see why you need to pay special attention here. After all this makes pattern init so much more fragile than zero-init which makes me question it even more ... > For some of the objects whose most fields are BOOLEAN, POINTER_TYPE, > rr FLOATING_TYPE, pattern initializing likee this might be less efficient. > Do you > agree on this? Use a pattern that fits them all. I mean memory allocation hardening fills allocated storage with a repeated (byte) pattern and people are happy with that. It also makes it easy to spot uninitialized storage from a debugger. So please, do not over-design this, it really doesn't make any sense and the common case you are inevitably chasing here would already be fine with a random repeated pattern. > > I'm going to repeatedly point at those large chunks of code that > > handle padding and building the CTOR - I don't even want to review > > them ;) They should not exist > > So, just want to confirm -:), do you mean to completely delete the routine > “build_pattern_cst_for_auto_init”? And then use the approach you suggested > below > to replace this part of work? Yes. > > (thus also my suggestion to split out > > padding handling - now we can also split out pattern init handling, > > maybe somebody else feels like reviewing and approving this, who knows). > > I am okay with further splitting pattern initialization part to a separate > patch. Then we will > have 4 independent patches in total: > > 1. -fauto-var-init=zero and all the handling in other passes to the new added > call to .DEFERRED_INIT. > 2. Add -fauto-var-init=pattern > 3. Add -fauto-var-init-padding > 4. Add -ftrivial-auto-var-init for CLANG compatibility. > > Are the above the correct understanding? As said, block-initializing with a repeated pattern is OK and I can see that being useful. Trying to produce "nicer" values for floats, bools and pointers on 32bit platforms is IMHO not going to fix anything and introduce as many problems as it will "fix". And if you block-initialize stuff you then automagically cover padding. I call this a win-win, no? > > > > Now, what you _could_ try is do sth like > > > > tree ar = build_array_type (uint_ptr_type_node, size_type_node, false); > > tree range = build2 (RANGE_EXPR, size_type_node, > > size_zero_node, build_int_cst (size_type_node, > > size-in-words)); > > tree pattern = build_int_cst (uint_ptr_type_node, 0xdeadbeef); > > ctor = build_constructor_single (ar, range, pattern); > > ctor = build1 (VIEW_CONVERT_EXPR, type, ctor); > > > So, the above will be used to replace the following original code in my patch: > > >> + case AUTO_INIT_PATTERN: > >> + init = build_pattern_cst_for_auto_init (TREE_TYPE (var)); > >> + expand_assignment (var, init, false); > >> + break; > > ? in my example code (untested) you then still need expand_assignment (var, ctor, false); it would be the easiest way to try pattern init with a pattern that's bigger than a byte (otherwise of course the memset path is optimal). > > > > thus build a range-init CTOR of an array of pointer-sized elements > > but do the actual assignment to the target object by viewing that > > as the target objects type. > > Okay. > > So, for a target object that is smaller than a word, for example, BOOLEAN, > CHAR or short, is the above still working? Well, you obviously have to adjust that - you can't pattern-init a BOOL with word-size stores of a word-size pattern. As said, for example glibc allocator hardening with MALLOC_PERTURB_ uses simple byte-init. > > That should block-initialize with > > a uint_ptr_type_node sized pattern (but likely less efficient than > > special block init would do). > > > > Not sure what problems you will run into this, you'll have to try. > > I will try this. > > Thanks a lot for your suggestions. > > Qing > > > > Richard. > > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)