On Mon, 14 Jun 2021, Qing Zhao wrote: > Hi, Richard: > > On Jun 11, 2021, at 10:49 AM, Qing Zhao via Gcc-patches > <gcc-patches@gcc.gnu.org<mailto:gcc-patches@gcc.gnu.org>> wrote: > > > On May 26, 2021, at 6:18 AM, Richard Biener > <rguent...@suse.de<mailto:rguent...@suse.de>> wrote: > > +/* Expand the IFN_DEFERRED_INIT function according to its second > argument. */ > +static void > +expand_DEFERRED_INIT (internal_fn, gcall *stmt) > +{ > + tree var = gimple_call_lhs (stmt); > + tree init = NULL_TREE; > + enum auto_init_type init_type > + = (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (stmt, 1)); > + > + switch (init_type) > + { > + default: > + gcc_unreachable (); > + case AUTO_INIT_PATTERN: > + init = build_pattern_cst_for_auto_init (TREE_TYPE (var)); > + expand_assignment (var, init, false); > + break; > + case AUTO_INIT_ZERO: > + init = build_zero_cst (TREE_TYPE (var)); > + expand_assignment (var, init, false); > + break; > + } > > I think actually building build_pattern_cst_for_auto_init can generate > massive garbage and for big auto vars code size is also a concern and > ideally on x86 you'd produce rep movq. So I don't think going > via expand_assignment is good. Instead you possibly want to lower > .DEFERRED_INIT to MEMs following expand_builtin_memset and > eventually enhance that to allow storing pieces larger than a byte. > > Due to “BOOLEAN_TYPE” and “POINTER_TYPE”, we cannot always have a > repeated byte-pattern for variables that include BOOLEAN_TYPE Or Pointer > types. Therefore, lowering the .DEFERRED_INIT for “PATTERN” > initialization through “memset” is not always possible. > > Let me know if I miss anything in the above. Do you have other suggestions? > > The main point is that you need to avoid building the explicit initializer > only to have it consumed by assignment expansion. If you want to keep > all the singing and dancing (as opposed to maybe initializing with a > 0x1 byte pattern) then I think for efficiency you still want to > block-initialize the variable and then only fixup the special fields. > > Yes, this is a good idea. > > We can memset the whole structure with repeated pattern “0xAA” first, > Then mixup BOOLEAN_TYPE and POINTER TYPE for 32-bit platform. > That might be more efficient. > > However, after more consideration, I feel that this might be a more > general optimization for “store_constructor” itself: > > I.e, if the “constructor” includes repeated byte value “0xAA” or any other > value over a certain threshold, > i.e, 70% of the total size, then we might need to use a call to memset first, > and then emit some additional single > field stores to fix up the fields that have different initialization values? > > Just like the current handling of “zeroes” in the current > “store_constructor”, if “zeroes” occupy most of the constructor, then > “Clear the whole structure” first, then emit additional single field stories > to fix up other fields that do not hold zeros. > > So, I think that it might be better to keep the current > “expand_assignment” for “Pattern initialization” as it is in this patch. > > And then, later we can add a separate patch to add this more general > optimization in “store_constructor” to improve the run time performance > and code size in general? > > What’s your opinion on this?
My point is that _building_ the constructor is what we want to avoid since that involves a lot of overhead memory-wise, it also requires yet another complex structure field walk with much room for errors. Block-initializing the object is so much easier and more efficient. Implementing block initialization with a block size different from a single byte should be also reasonably possible. I mean there's wmemset (not in GCC), so such block initialization would have other uses as well. I'm going to repeatedly point at those large chunks of code that handle padding and building the CTOR - I don't even want to review them ;) They should not exist (thus also my suggestion to split out padding handling - now we can also split out pattern init handling, maybe somebody else feels like reviewing and approving this, who knows). Now, what you _could_ try is do sth like tree ar = build_array_type (uint_ptr_type_node, size_type_node, false); tree range = build2 (RANGE_EXPR, size_type_node, size_zero_node, build_int_cst (size_type_node, size-in-words)); tree pattern = build_int_cst (uint_ptr_type_node, 0xdeadbeef); ctor = build_constructor_single (ar, range, pattern); ctor = build1 (VIEW_CONVERT_EXPR, type, ctor); thus build a range-init CTOR of an array of pointer-sized elements but do the actual assignment to the target object by viewing that as the target objects type. That should block-initialize with a uint_ptr_type_node sized pattern (but likely less efficient than special block init would do). Not sure what problems you will run into this, you'll have to try. Richard.