On Tue, 5 Oct 2021, Richard Sandiford wrote: > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > > On Mon, 4 Oct 2021, Qing Zhao wrote: > > > >> > >> > >> > On Oct 4, 2021, at 12:19 PM, Richard Biener <rguent...@suse.de> wrote: > >> > > >> > On October 4, 2021 7:00:10 PM GMT+02:00, Qing Zhao > >> > <qing.z...@oracle.com> wrote: > >> >> I have several questions on this fix: > >> >> > >> >> 1. This fix avoided expanding ?.DEFERRED_INIT? when !tree_fits_uhwi_p > >> >> (TYPE_SIZE_UNIT (var_type)). > >> >> As a result, this call to .DEFERRED_INIT will NOT be expanded at all. > >> > > >> > Yes. > >> > >> Then, should we exclude such auto init during gimplification phase? > > > > No, we do want to and can handle such variables just fine. > > > >> > > >> >> Then not expanding .DEFERRED_INIT in RTL expanding phase will trigger > >> >> more issues in later RTL phases, this looks not correct to me. > >> >> (Actually, with is the patch, this testing case still failed in a later > >> >> RTL stage). > >> >> > >> >> So, If we really want to avoid auto-init for VLA vectors, we should > >> >> not add call to .DEFERRED_INIT in gimplification phase at all. > >> > >> > >> >> > >> >> > >> >> 2. For the added .DEFERRED_INIT: > >> >> > >> >> __SVFloat64_t f64; > >> >> > >> >> f64 = .DEFERRED_INIT (POLY_INT_CST [16, 16], 2, 0); > >> >> > >> >> What does ?POLY_INT_CST[16,16]? mean? Is this a constant size? If YES, > >> >> what?s the value of it? If Not, can we use ?memset? to expand it? > >> > > >> > When the target is a register memset doesn't work. I'm not sure the > >> > memset expansion path will work as-is either for aggregates with vla > >> > parts - > >> > >> Stupid question here: what does POLY_INT_CST[16,16] mean? It?s not a > >> constant? > > > > It's 16 * <vector-factor> where the factor is determined by the hardware > > implementation but fixed throughout the programs lifetime. You could > > think of the POLY_INT_CST expanding to a multiplication of 16 by a special > > hardware register. > > > > For vector types the zero-init could be done using build_zero_cst and > > the expand_assignment path. Also the memset path should just work > > as well. > > > > It's the pattern init that's a bit more complicated but I'm sure > > Richard will sort that out. > > > > Note TYPE_SIZE_UNIT will honor tree_fits_poly_uint64_p but for the > > pattern init we'd have to repeat the constant and maybe there's > > a clever way to do this repeating just the single pattern byte. > > > > But as said... > > > >> > but I'll leave that to Richard S. to sort out. > > > > ^^^ > > Yeah, I'm hoping to get to this in stage 3 :-) > > The PR is still open until then and I agree the bypass is a good idea in > the meantime.
Btw, I've just completed testing the following which restores init on aarch64 (when you specify -march=armv8.3-a+sve, otherwise we ICE on SVE register uses) and also restores the init of the VLA case that was lost. The only caveat is that we use zero-init for the VLA vectors even with pattern init - that's something to improve. Also initializing from build_zero_cst might explode later for poly-int sized things I cannot imagine right now ;) Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. Richard. >From bd73fdacf72563ce27edbcdfc0d06d5378339f85 Mon Sep 17 00:00:00 2001 From: Richard Biener <rguent...@suse.de> Date: Tue, 5 Oct 2021 09:28:20 +0200 Subject: [PATCH] More .DEFERRED_INIT expansion rework To: gcc-patches@gcc.gnu.org This avoids looking at the type size and instead uses the size as passed to .DEFERRED_INIT to determine the size of the non-MEM to be initialized. It also arranges for possibly poly-int inits to always use zero-initialization rather than not initializing and when we need to pun puns the LHS instead of the constant value. That correctly initializes the variable-size typed array in the testcase for PR102285 and the SVE vector in PR102587 where for the testcase I needed to add a SVE capable -march as to not ICE later. 2021-10-05 Richard Biener <rguent...@suse.de> PR middle-end/102587 PR middle-end/102285 * internal-fn.c (expand_DEFERRED_INIT): Fall back to zero-initialization as last resort, use the constant size as given by the DEFERRED_INIT argument to build the initializer. * gcc.target/aarch64/sve/pr102587-1.c: Add -march=armv8.3-a+sve. * gcc.target/aarch64/sve/pr102587-2.c: Likewise. --- gcc/internal-fn.c | 27 ++++++++++--------- .../gcc.target/aarch64/sve/pr102587-1.c | 2 +- .../gcc.target/aarch64/sve/pr102587-2.c | 2 +- 3 files changed, 17 insertions(+), 14 deletions(-) diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c index 110145218b9..78db25bbac4 100644 --- a/gcc/internal-fn.c +++ b/gcc/internal-fn.c @@ -3038,19 +3038,18 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt) /* Expand this memset call. */ expand_builtin_memset (m_call, NULL_RTX, TYPE_MODE (var_type)); } - /* ??? Deal with poly-int sized registers. */ - else if (tree_fits_uhwi_p (TYPE_SIZE_UNIT (var_type))) + else { - /* If this variable is in a register, use expand_assignment might - generate better code. */ - tree init = build_zero_cst (var_type); - unsigned HOST_WIDE_INT total_bytes - = tree_to_uhwi (TYPE_SIZE_UNIT (var_type)); - - if (init_type == AUTO_INIT_PATTERN) + /* If this variable is in a register use expand_assignment. */ + tree init; + if (tree_fits_uhwi_p (var_size) + && (init_type == AUTO_INIT_PATTERN + || !is_gimple_reg_type (var_type))) { + unsigned HOST_WIDE_INT total_bytes = tree_to_uhwi (var_size); unsigned char *buf = (unsigned char *) xmalloc (total_bytes); - memset (buf, INIT_PATTERN_VALUE, total_bytes); + memset (buf, (init_type == AUTO_INIT_PATTERN + ? INIT_PATTERN_VALUE : 0), total_bytes); if (can_native_interpret_type_p (var_type)) init = native_interpret_expr (var_type, buf, total_bytes); else @@ -3058,10 +3057,14 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt) tree itype = build_nonstandard_integer_type (total_bytes * BITS_PER_UNIT, 1); wide_int w = wi::from_buffer (buf, total_bytes); - init = build1 (VIEW_CONVERT_EXPR, var_type, - wide_int_to_tree (itype, w)); + init = wide_int_to_tree (itype, w); + /* Pun the LHS to make sure its type has constant size. */ + lhs = build1 (VIEW_CONVERT_EXPR, itype, lhs); } } + else + /* Use zero-init also for variable-length sizes. */ + init = build_zero_cst (var_type); expand_assignment (lhs, init, false); } diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr102587-1.c b/gcc/testsuite/gcc.target/aarch64/sve/pr102587-1.c index 2b9a68b0b59..af2ae59e5d4 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/pr102587-1.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr102587-1.c @@ -1,4 +1,4 @@ /* { dg-do compile } */ -/* { dg-options "-ftrivial-auto-var-init=zero" } */ +/* { dg-options "-march=armv8.3-a+sve -ftrivial-auto-var-init=zero" } */ void foo() { __SVFloat64_t f64; } diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr102587-2.c b/gcc/testsuite/gcc.target/aarch64/sve/pr102587-2.c index 4cdb9056002..8c9d9908bac 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/pr102587-2.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr102587-2.c @@ -1,4 +1,4 @@ /* { dg-do compile } */ -/* { dg-options "-ftrivial-auto-var-init=pattern" } */ +/* { dg-options "-march=armv8.3-a+sve -ftrivial-auto-var-init=pattern" } */ void foo() { __SVFloat64_t f64; } -- 2.31.1