Hi, Richard, On Jun 16, 2021, at 1:19 AM, Richard Biener <rguent...@suse.de<mailto:rguent...@suse.de>> wrote:
+/* Expand the IFN_DEFERRED_INIT function according to its second argument. */ +static void +expand_DEFERRED_INIT (internal_fn, gcall *stmt) +{ + tree var = gimple_call_lhs (stmt); + tree init = NULL_TREE; + enum auto_init_type init_type + = (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (stmt, 1)); + + switch (init_type) + { + default: + gcc_unreachable (); + case AUTO_INIT_PATTERN: + init = build_pattern_cst_for_auto_init (TREE_TYPE (var)); + expand_assignment (var, init, false); + break; + case AUTO_INIT_ZERO: + init = build_zero_cst (TREE_TYPE (var)); + expand_assignment (var, init, false); + break; + } I think actually building build_pattern_cst_for_auto_init can generate massive garbage and for big auto vars code size is also a concern and ideally on x86 you'd produce rep movq. So I don't think going via expand_assignment is good. Instead you possibly want to lower .DEFERRED_INIT to MEMs following expand_builtin_memset and eventually enhance that to allow storing pieces larger than a byte. Due to “BOOLEAN_TYPE” and “POINTER_TYPE”, we cannot always have a repeated byte-pattern for variables that include BOOLEAN_TYPE Or Pointer types. Therefore, lowering the .DEFERRED_INIT for “PATTERN” initialization through “memset” is not always possible. Let me know if I miss anything in the above. Do you have other suggestions? The main point is that you need to avoid building the explicit initializer only to have it consumed by assignment expansion. If you want to keep all the singing and dancing (as opposed to maybe initializing with a 0x1 byte pattern) then I think for efficiency you still want to block-initialize the variable and then only fixup the special fields. Yes, this is a good idea. We can memset the whole structure with repeated pattern “0xAA” first, Then mixup BOOLEAN_TYPE and POINTER TYPE for 32-bit platform. That might be more efficient. However, after more consideration, I feel that this might be a more general optimization for “store_constructor” itself: I.e, if the “constructor” includes repeated byte value “0xAA” or any other value over a certain threshold, i.e, 70% of the total size, then we might need to use a call to memset first, and then emit some additional single field stores to fix up the fields that have different initialization values? Just like the current handling of “zeroes” in the current “store_constructor”, if “zeroes” occupy most of the constructor, then “Clear the whole structure” first, then emit additional single field stories to fix up other fields that do not hold zeros. So, I think that it might be better to keep the current “expand_assignment” for “Pattern initialization” as it is in this patch. And then, later we can add a separate patch to add this more general optimization in “store_constructor” to improve the run time performance and code size in general? What’s your opinion on this? My point is that _building_ the constructor is what we want to avoid since that involves a lot of overhead memory-wise, it also requires yet another complex structure field walk with much room for errors. Block-initializing the object is so much easier and more efficient. Implementing block initialization with a block size different from a single byte should be also reasonably possible. I mean there's wmemset (not in GCC), so such block initialization would have other uses as well. If the pattern of the value that is used to initialize is repeatable, then Block-initializing is ideal. However, Since the patterns of the values that are used to initialize might not be completely repeatable due to BOOLEAN (0), POINTER_TYPE at 32-bit platform (0x000000AA) and FLOATING TYPE (NaN), After block initializing of the whole object, we still need to add additional fix up stores of these different patterns to the corresponding fields. But that's a bug with the pattern used then. You can never be sure that an object is used only as its declared type but you are initializing it as if it were. Also all uninit uses invoke undefined behavior so I don't see why you need to pay special attention here. After all this makes pattern init so much more fragile than zero-init which makes me question it even more ... Yes, you are right. The major reason for the complexity of the code to handle pattern initialization is because multiple different patterns are assigned to different types. This is for the compatibility with CLANG. -:). (https://reviews.llvm.org/D54604) For reference, I copied the part for pattern initialization from CLANG’s patch below: 1. Pattern initialization This is the recommended initialization approach. Pattern initialization's goal is to initialize automatic variables with values which will likely transform logic bugs into crashes down the line, are easily recognizable in a crash dump, without being values which programmers can rely on for useful program semantics. At the same time, pattern initialization tries to generate code which will optimize well. You'll find the following details in `patternFor`: - Integers are initialized with repeated 0xAA bytes (infinite scream). - Vectors of integers are also initialized with infinite scream. - Pointers are initialized with infinite scream on 64-bit platforms because it's an unmappable pointer value on architectures I'm aware of. Pointers are initialize to 0x000000AA (small scream) on 32-bit platforms because 32-bit platforms don't consistently offer unmappable pages. When they do it's usually the zero page. As people try this out, I expect that we'll want to allow different platforms to customize this, let's do so later. - Vectors of pointers are initialized the same way pointers are. - Floating point values and vectors are initialized with a negative quiet NaN with repeated 0xFF payload (e.g. 0xffffffff and 0xffffffffffffffff). NaNs are nice (here, anways) because they propagate on arithmetic, making it more likely that entire computations become NaN when a single uninitialized value sneaks in. - Arrays are initialized to their homogeneous elements' initialization value, repeated. Stack-based Variable-Length Arrays (VLAs) are runtime-initialized to the allocated size (no effort is made for negative size, but zero-sized VLAs are untouched even if technically undefined). - Structs are initialized to their heterogeneous element's initialization values. Zero-size structs are initialized as 0xAA since they're allocated a single byte. - Unions are initialized using the initialization for the largest member of the union. Expect the values used for pattern initialization to change over time, as we refine heuristics (both for performance and security). The goal is truly to avoid injecting semantics into undefined behavior, and we should be comfortable changing these values when there's a worthwhile point in doing so. Why so much infinite scream? Repeated byte patterns tend to be easy to synthesize on most architectures, and otherwise memset is usually very efficient. For values which aren't entirely repeated byte patterns, LLVM will often generate code which does memset + a few stores. For some of the objects whose most fields are BOOLEAN, POINTER_TYPE, rr FLOATING_TYPE, pattern initializing likee this might be less efficient. Do you agree on this? Use a pattern that fits them all. I mean memory allocation hardening fills allocated storage with a repeated (byte) pattern and people are happy with that. It also makes it easy to spot uninitialized storage from a debugger. So please, do not over-design this, it really doesn't make any sense and the common case you are inevitably chasing here would already be fine with a random repeated pattern. So, My question is: If we want to pattern initialize with the single repeated pattern for all types, with one is better to use: “0xAAAAAAAA” or “0xFFFFFFFF” , or other pattern that our current glibc used? What’s that pattern? Will “0xAAAAAAAA” in a floating type auto variable crash the program? Will “0xFFFFFFFF” in a pointer type auto variable crash the program? (Might crash?) (thus also my suggestion to split out padding handling - now we can also split out pattern init handling, maybe somebody else feels like reviewing and approving this, who knows). I am okay with further splitting pattern initialization part to a separate patch. Then we will have 4 independent patches in total: 1. -fauto-var-init=zero and all the handling in other passes to the new added call to .DEFERRED_INIT. 2. Add -fauto-var-init=pattern 3. Add -fauto-var-init-padding 4. Add -ftrivial-auto-var-init for CLANG compatibility. Are the above the correct understanding? As said, block-initializing with a repeated pattern is OK and I can see that being useful. Trying to produce "nicer" values for floats, bools and pointers on 32bit platforms is IMHO not going to fix anything and introduce as many problems as it will "fix". Yes, I agree, if we can find a good repeated pattern for all types’s pattern initialization, that will be much easier and simpler to implement, I am happy to do that. (Honestly, the part of implementation that took me most of the time is pattern-initialization.. and I am still not very comfortable with this part Of the code myself. -:) And if you block-initialize stuff you then automagically cover padding. I call this a win-win, no? Yes, this will also initialize paddings with patterns (Not zeroes as CLANG did). Shall we compatible with CLANG on this? Now, what you _could_ try is do sth like tree ar = build_array_type (uint_ptr_type_node, size_type_node, false); tree range = build2 (RANGE_EXPR, size_type_node, size_zero_node, build_int_cst (size_type_node, size-in-words)); tree pattern = build_int_cst (uint_ptr_type_node, 0xdeadbeef); ctor = build_constructor_single (ar, range, pattern); ctor = build1 (VIEW_CONVERT_EXPR, type, ctor); So, the above will be used to replace the following original code in my patch: + case AUTO_INIT_PATTERN: + init = build_pattern_cst_for_auto_init (TREE_TYPE (var)); + expand_assignment (var, init, false); + break; ? in my example code (untested) you then still need expand_assignment (var, ctor, false); it would be the easiest way to try pattern init with a pattern that's bigger than a byte (otherwise of course the memset path is optimal). If the pattern that is used to initialize all types is byte-repeatable, for example, 0xA or 0xF, then We can use memset to initialize all types, however, the potential problem is, if later we decide To change to another pattern that might not be byte-repeatable, then the memset implementation is not proper at that time. Is it possible that we might change the pattern later? thus build a range-init CTOR of an array of pointer-sized elements but do the actual assignment to the target object by viewing that as the target objects type. Okay. So, for a target object that is smaller than a word, for example, BOOLEAN, CHAR or short, is the above still working? Well, you obviously have to adjust that - you can't pattern-init a BOOL with word-size stores of a word-size pattern. As said, for example glibc allocator hardening with MALLOC_PERTURB_ uses simple byte-init. What’s the pattern glibc used? Thanks a lot. Qing That should block-initialize with a uint_ptr_type_node sized pattern (but likely less efficient than special block init would do). Not sure what problems you will run into this, you'll have to try. I will try this. Thanks a lot for your suggestions. Qing Richard. -- Richard Biener <rguent...@suse.de<mailto:rguent...@suse.de>> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)