Hi, Richard,

> On Jun 15, 2021, at 8:21 AM, Richard Biener <rguent...@suse.de> wrote:
>> 
>> 
>> +/* Expand the IFN_DEFERRED_INIT function according to its second
>> argument.  */
>> +static void
>> +expand_DEFERRED_INIT (internal_fn, gcall *stmt)
>> +{
>> +  tree var = gimple_call_lhs (stmt);
>> +  tree init = NULL_TREE;
>> +  enum auto_init_type init_type
>> +    = (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
>> +
>> +  switch (init_type)
>> +    {
>> +    default:
>> +      gcc_unreachable ();
>> +    case AUTO_INIT_PATTERN:
>> +      init = build_pattern_cst_for_auto_init (TREE_TYPE (var));
>> +      expand_assignment (var, init, false);
>> +      break;
>> +    case AUTO_INIT_ZERO:
>> +      init = build_zero_cst (TREE_TYPE (var));
>> +      expand_assignment (var, init, false);
>> +      break;
>> +    }
>> 
>> I think actually building build_pattern_cst_for_auto_init can generate
>> massive garbage and for big auto vars code size is also a concern and
>> ideally on x86 you'd produce rep movq.  So I don't think going
>> via expand_assignment is good.  Instead you possibly want to lower
>> .DEFERRED_INIT to MEMs following expand_builtin_memset and
>> eventually enhance that to allow storing pieces larger than a byte.
>> 
>> Due to “BOOLEAN_TYPE” and “POINTER_TYPE”, we cannot always have a
>> repeated byte-pattern for variables that include BOOLEAN_TYPE Or Pointer
>> types. Therefore, lowering the .DEFERRED_INIT for “PATTERN”
>> initialization through “memset” is not always possible.
>> 
>> Let me know if I miss anything in the above. Do you have other suggestions?
>> 
>> The main point is that you need to avoid building the explicit initializer
>> only to have it consumed by assignment expansion.  If you want to keep
>> all the singing and dancing (as opposed to maybe initializing with a
>> 0x1 byte pattern) then I think for efficiency you still want to
>> block-initialize the variable and then only fixup the special fields.
>> 
>> Yes, this is a good idea.
>> 
>> We can memset the whole structure with repeated pattern “0xAA” first,
>> Then mixup BOOLEAN_TYPE and POINTER TYPE for 32-bit platform.
>> That might be more efficient.
>> 
>> However, after more consideration, I feel that this might be a more 
>> general optimization for “store_constructor” itself:
>> 
>> I.e,  if the “constructor” includes repeated byte value “0xAA” or any other 
>> value over a certain threshold,
>> i.e, 70% of the total size, then we might need to use a call to memset 
>> first, and then emit some additional single
>> field stores  to fix up the fields that have different initialization values?
>> 
>> Just like the current handling of “zeroes” in the current 
>> “store_constructor”, if “zeroes” occupy most of the constructor, then
>> “Clear the whole structure” first, then emit additional single field stories 
>> to fix up other fields that do not hold zeros.
>> 
>> So, I think that it might be better to keep the current 
>> “expand_assignment” for “Pattern initialization” as it is in this patch.
>> 
>> And then, later we can add a separate patch to add this more general 
>> optimization in “store_constructor” to improve the run time performance 
>> and code size in general?
>> 
>> What’s your opinion on this?
> 
> My point is that _building_ the constructor is what we want to avoid
> since that involves a lot of overhead memory-wise, it also requires
> yet another complex structure field walk with much room for errors.

So, you mean I should completely get rid of the new added routine 
“build_pattern_cst_for_auto_init”, since it built constructors for RECORD,
UNION, and ARRAY types.  And the current RTL expansion of constructor 
assignment is not efficient enough for pattern initialization purpose? 

> 
> Block-initializing the object is so much easier and more efficient.
> Implementing block initialization with a block size different from
> a single byte should be also reasonably possible.  I mean there's
> wmemset (not in GCC), so such block initialization would have other
> uses as well.

If the pattern of the value that is used to initialize is repeatable, then 
Block-initializing is ideal. However, Since the patterns of the values that
are used to initialize might not be completely repeatable due to BOOLEAN (0),
POINTER_TYPE at 32-bit platform (0x000000AA) and FLOATING TYPE (NaN), 
After block initializing of the whole object, we still need to add additional 
fix up 
stores of these different patterns to the corresponding fields. 

For some of the objects whose most fields are BOOLEAN, POINTER_TYPE, 
rr FLOATING_TYPE, pattern  initializing likee this might be less efficient. Do 
you 
agree on this?


> 
> I'm going to repeatedly point at those large chunks of code that
> handle padding and building the CTOR - I don't even want to review
> them ;)  They should not exist

So, just want to confirm -:),  do you mean to completely delete the routine 
“build_pattern_cst_for_auto_init”? And then use the approach you suggested below
to replace this part of work?

> (thus also my suggestion to split out
> padding handling - now we can also split out pattern init handling,
> maybe somebody else feels like reviewing and approving this, who knows).

I am okay with further splitting pattern initialization part to a separate 
patch. Then we will
have 4 independent patches in total:

1. -fauto-var-init=zero and all the handling in other passes to the new added 
call to .DEFERRED_INIT.
2. Add -fauto-var-init=pattern 
3. Add -fauto-var-init-padding
4. Add -ftrivial-auto-var-init for CLANG compatibility. 

Are the above the correct understanding?

> 
> Now, what you _could_ try is do sth like
> 
>  tree ar = build_array_type (uint_ptr_type_node, size_type_node, false);
>  tree range = build2 (RANGE_EXPR, size_type_node,
>                       size_zero_node, build_int_cst (size_type_node, 
> size-in-words));
>  tree pattern = build_int_cst (uint_ptr_type_node, 0xdeadbeef);
>  ctor = build_constructor_single (ar, range, pattern);
>  ctor = build1 (VIEW_CONVERT_EXPR, type, ctor);


So, the above will be used to replace the following original code in my patch:

>> +    case AUTO_INIT_PATTERN:
>> +      init = build_pattern_cst_for_auto_init (TREE_TYPE (var));
>> +      expand_assignment (var, init, false);
>> +      break;

?

> 
> thus build a range-init CTOR of an array of pointer-sized elements
> but do the actual assignment to the target object by viewing that
> as the target objects type.  

Okay.

So, for a target object that is smaller than a word, for example, BOOLEAN, 
CHAR or short, is the above still working?

> That should block-initialize with
> a uint_ptr_type_node sized pattern (but likely less efficient than
> special block init would do).
> 
> Not sure what problems you will run into this, you'll have to try.

I will try this.

Thanks a lot for your suggestions.

Qing
> 
> Richard.

Reply via email to