Hi, Richard: On Jun 11, 2021, at 10:49 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org<mailto:gcc-patches@gcc.gnu.org>> wrote:
On May 26, 2021, at 6:18 AM, Richard Biener <rguent...@suse.de<mailto:rguent...@suse.de>> wrote: +/* Expand the IFN_DEFERRED_INIT function according to its second argument. */ +static void +expand_DEFERRED_INIT (internal_fn, gcall *stmt) +{ + tree var = gimple_call_lhs (stmt); + tree init = NULL_TREE; + enum auto_init_type init_type + = (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (stmt, 1)); + + switch (init_type) + { + default: + gcc_unreachable (); + case AUTO_INIT_PATTERN: + init = build_pattern_cst_for_auto_init (TREE_TYPE (var)); + expand_assignment (var, init, false); + break; + case AUTO_INIT_ZERO: + init = build_zero_cst (TREE_TYPE (var)); + expand_assignment (var, init, false); + break; + } I think actually building build_pattern_cst_for_auto_init can generate massive garbage and for big auto vars code size is also a concern and ideally on x86 you'd produce rep movq. So I don't think going via expand_assignment is good. Instead you possibly want to lower .DEFERRED_INIT to MEMs following expand_builtin_memset and eventually enhance that to allow storing pieces larger than a byte. Due to “BOOLEAN_TYPE” and “POINTER_TYPE”, we cannot always have a repeated byte-pattern for variables that include BOOLEAN_TYPE Or Pointer types. Therefore, lowering the .DEFERRED_INIT for “PATTERN” initialization through “memset” is not always possible. Let me know if I miss anything in the above. Do you have other suggestions? The main point is that you need to avoid building the explicit initializer only to have it consumed by assignment expansion. If you want to keep all the singing and dancing (as opposed to maybe initializing with a 0x1 byte pattern) then I think for efficiency you still want to block-initialize the variable and then only fixup the special fields. Yes, this is a good idea. We can memset the whole structure with repeated pattern “0xAA” first, Then mixup BOOLEAN_TYPE and POINTER TYPE for 32-bit platform. That might be more efficient. However, after more consideration, I feel that this might be a more general optimization for “store_constructor” itself: I.e, if the “constructor” includes repeated byte value “0xAA” or any other value over a certain threshold, i.e, 70% of the total size, then we might need to use a call to memset first, and then emit some additional single field stores to fix up the fields that have different initialization values? Just like the current handling of “zeroes” in the current “store_constructor”, if “zeroes” occupy most of the constructor, then “Clear the whole structure” first, then emit additional single field stories to fix up other fields that do not hold zeros. So, I think that it might be better to keep the current “expand_assignment” for “Pattern initialization” as it is in this patch. And then, later we can add a separate patch to add this more general optimization in “store_constructor” to improve the run time performance and code size in general? What’s your opinion on this? Qing