xiezhiheng <xiezhih...@huawei.com> writes: >> -----Original Message----- >> From: Richard Biener [mailto:richard.guent...@gmail.com] >> Sent: Thursday, July 2, 2020 10:46 PM >> To: xiezhiheng <xiezhih...@huawei.com> >> Cc: gcc-patches@gcc.gnu.org >> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions >> emitted at -O3 >> >> On Thu, Jul 2, 2020 at 3:22 PM xiezhiheng <xiezhih...@huawei.com> wrote: >> > >> > Hi, >> > >> > This is a fix for pr94442. >> > I modify get_inner_reference to handle the case for MEM[ptr, off]. >> > I extract the "off" and add it to the recorded offset, then I build a >> > MEM[ptr, 0] and return it later. >> > >> > diff --git a/gcc/expr.c b/gcc/expr.c >> > index 3c68b0d754c..8cc18449a0c 100644 >> > --- a/gcc/expr.c >> > +++ b/gcc/expr.c >> > @@ -7362,7 +7362,8 @@ tree >> > get_inner_reference (tree exp, poly_int64_pod *pbitsize, >> > poly_int64_pod *pbitpos, tree *poffset, >> > machine_mode *pmode, int *punsignedp, >> > - int *preversep, int *pvolatilep) >> > + int *preversep, int *pvolatilep, >> > + bool include_memref_p) >> > { >> > tree size_tree = 0; >> > machine_mode mode = VOIDmode; >> > @@ -7509,6 +7510,21 @@ get_inner_reference (tree exp, poly_int64_pod >> *pbitsize, >> > } >> > exp = TREE_OPERAND (TREE_OPERAND (exp, 0), 0); >> > } >> > + else if (include_memref_p >> > + && TREE_CODE (TREE_OPERAND (exp, 0)) == >> SSA_NAME) >> > + { >> > + tree off = TREE_OPERAND (exp, 1); >> > + if (!integer_zerop (off)) >> > + { >> > + poly_offset_int boff = mem_ref_offset (exp); >> > + boff <<= LOG2_BITS_PER_UNIT; >> > + bit_offset += boff; >> > + >> > + exp = build2 (MEM_REF, TREE_TYPE (exp), >> > + TREE_OPERAND (exp, 0), >> > + build_int_cst (TREE_TYPE (off), 0)); >> > + } >> > + } >> > goto done; >> > >> > default: >> > @@ -10786,7 +10802,7 @@ expand_expr_real_1 (tree exp, rtx target, >> machine_mode tmode, >> > int reversep, volatilep = 0, must_force_mem; >> > tree tem >> > = get_inner_reference (exp, &bitsize, &bitpos, &offset, >> &mode1, >> > - &unsignedp, &reversep, &volatilep); >> > + &unsignedp, &reversep, &volatilep, >> true); >> > rtx orig_op0, memloc; >> > bool clear_mem_expr = false; >> > >> > diff --git a/gcc/tree.h b/gcc/tree.h >> > index a74872f5f3e..7df0d15f7f9 100644 >> > --- a/gcc/tree.h >> > +++ b/gcc/tree.h >> > @@ -6139,7 +6139,8 @@ extern bool complete_ctor_at_level_p >> (const_tree, HOST_WIDE_INT, const_tree); >> > look for the ultimate containing object, which is returned and specify >> > the access position and size. */ >> > extern tree get_inner_reference (tree, poly_int64_pod *, poly_int64_pod >> *, >> > - tree *, machine_mode *, int *, int *, >> int *); >> > + tree *, machine_mode *, int *, int *, >> int *, >> > + bool = false); >> > >> > extern tree build_personality_function (const char *); >> > >> > >> > I add an argument "include_memref_p" to control whether to go into >> MEM_REF, >> > because without it will cause the test case "Warray-bounds-46.c" to fail in >> regression. >> > >> > It because function set_base_and_offset in gimple-ssa-warn-restrict.c >> > base = get_inner_reference (expr, &bitsize, &bitpos, &var_off, >> > &mode, &sign, &reverse, &vol); >> > ... >> > ... >> > if (TREE_CODE (base) == MEM_REF) >> > { >> > tree memrefoff = fold_convert (ptrdiff_type_node, TREE_OPERAND >> (base, 1)); >> > extend_offset_range (memrefoff); >> > base = TREE_OPERAND (base, 0); >> > >> > if (refoff != HOST_WIDE_INT_MIN >> > && TREE_CODE (expr) == COMPONENT_REF) >> > { >> > /* Bump up the offset of the referenced subobject to reflect >> > the offset to the enclosing object. For example, so that >> > in >> > struct S { char a, b[3]; } s[2]; >> > strcpy (s[1].b, "1234"); >> > REFOFF is set to s[1].b - (char*)s. */ >> > offset_int off = tree_to_shwi (memrefoff); >> > refoff += off; >> > } >> > >> > if (!integer_zerop (memrefoff)) <================= >> > /* A non-zero offset into an array of struct with flexible array >> > members implies that the array is empty because there is no >> > way to initialize such a member when it belongs to an array. >> > This must be some sort of a bug. */ >> > refsize = 0; >> > } >> > >> > needs MEM_REF offset to judge whether refsize should be set to zero. >> > But I fold the offset into bitpos and the offset will always be zero. >> > >> > Suggestion? >> >> The thing you want to fix is not get_inner_reference but the aarch64 backend >> to not make __builtin_aarch64_sqaddv16qi clobber global memory. That >> way >> CSE can happen on GIMPLE which can handle the difference in the IL just >> fine. >> >> Richard. > > Yes, __builtin_aarch64_sqaddv16qi is not set any attributes to describe that > it would not clobber global memory. But I find it strange that when building > SIMD built-in FUNCTION_DECLs they are not set any attributes in the backend. > > void > aarch64_init_simd_builtins (void) > { > ... > ftype = build_function_type (return_type, args); > > gcc_assert (ftype != NULL); > > if (print_type_signature_p) > snprintf (namebuf, sizeof (namebuf), "__builtin_aarch64_%s_%s", > d->name, type_signature); > else > snprintf (namebuf, sizeof (namebuf), "__builtin_aarch64_%s", > d->name); > > fndecl = aarch64_general_add_builtin (namebuf, ftype, fcode); > aarch64_builtin_decls[fcode] = fndecl; > ... > } > static tree > aarch64_general_add_builtin (const char *name, tree type, unsigned int code) > { > code = (code << AARCH64_BUILTIN_SHIFT) | AARCH64_BUILTIN_GENERAL; > return add_builtin_function (name, type, code, BUILT_IN_MD, > NULL, NULL_TREE); > } > > The loop in aarch64_init_simd_builtins creates FUNCTION_DECL node for each > build-in function and put the node in array. But it does not set any > attributes. > And I did not find interface for each build-in function to control the > attributes. > > Did I miss anything?
No, this is unfortunately a known bug. See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95964 (Although the PR is recent, it's been a known bug for longer.) As you say, the difficulty is that the correct attributes depend on what the built-in function does. Most integer arithmetic is “const”, but things get more complicated for floating-point arithmetic. The SVE intrinsics use a three stage process: - each function is classified into one of several groups - each group has a set of flags that describe what functions in the group can do - these flags get converted into attributes based on the current command-line options I guess we should have something similar for the arm_neon.h built-ins. If you're willing to help fix this, that'd be great. I think a first step would be to agree a design. Thanks, Richard