Hi, Richard:

On Jun 11, 2021, at 10:49 AM, Qing Zhao via Gcc-patches 
<gcc-patches@gcc.gnu.org<mailto:gcc-patches@gcc.gnu.org>> wrote:


On May 26, 2021, at 6:18 AM, Richard Biener 
<rguent...@suse.de<mailto:rguent...@suse.de>> wrote:

+/* Expand the IFN_DEFERRED_INIT function according to its second
argument.  */
+static void
+expand_DEFERRED_INIT (internal_fn, gcall *stmt)
+{
+  tree var = gimple_call_lhs (stmt);
+  tree init = NULL_TREE;
+  enum auto_init_type init_type
+    = (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
+
+  switch (init_type)
+    {
+    default:
+      gcc_unreachable ();
+    case AUTO_INIT_PATTERN:
+      init = build_pattern_cst_for_auto_init (TREE_TYPE (var));
+      expand_assignment (var, init, false);
+      break;
+    case AUTO_INIT_ZERO:
+      init = build_zero_cst (TREE_TYPE (var));
+      expand_assignment (var, init, false);
+      break;
+    }

I think actually building build_pattern_cst_for_auto_init can generate
massive garbage and for big auto vars code size is also a concern and
ideally on x86 you'd produce rep movq.  So I don't think going
via expand_assignment is good.  Instead you possibly want to lower
.DEFERRED_INIT to MEMs following expand_builtin_memset and
eventually enhance that to allow storing pieces larger than a byte.

Due to “BOOLEAN_TYPE” and “POINTER_TYPE”, we cannot always have a
repeated byte-pattern for variables that include BOOLEAN_TYPE Or Pointer
types. Therefore, lowering the .DEFERRED_INIT for “PATTERN”
initialization through “memset” is not always possible.

Let me know if I miss anything in the above. Do you have other suggestions?

The main point is that you need to avoid building the explicit initializer
only to have it consumed by assignment expansion.  If you want to keep
all the singing and dancing (as opposed to maybe initializing with a
0x1 byte pattern) then I think for efficiency you still want to
block-initialize the variable and then only fixup the special fields.

Yes, this is a good idea.

We can memset the whole structure with repeated pattern “0xAA” first,
Then mixup BOOLEAN_TYPE and POINTER TYPE for 32-bit platform.
That might be more efficient.

However, after more consideration, I feel that this might be a more general 
optimization for “store_constructor” itself:

I.e,  if the “constructor” includes repeated byte value “0xAA” or any other 
value over a certain threshold,
i.e, 70% of the total size, then we might need to use a call to memset first, 
and then emit some additional single
field stores  to fix up the fields that have different initialization values?

Just like the current handling of “zeroes” in the current “store_constructor”, 
if “zeroes” occupy most of the constructor, then
“Clear the whole structure” first, then emit additional single field stories to 
fix up other fields that do not hold zeros.

So, I think that it might be better to keep the current “expand_assignment”  
for “Pattern initialization” as it is in this patch.

And then, later we can  add a separate patch to add this more general 
optimization in “store_constructor” to improve the
run time performance and code size in general?

What’s your opinion on this?

Qing

Reply via email to