RE: [PATCH] PR tree-opt/40210: Fold (bswap(X)>>C1)&C2 to (X>>C3)&C2 in match.pd
Hi Richard,
Thanks.  Yep, you've correctly diagnosed that the motivation for the
get_builtin_precision helper function was that the TREE_TYPE of the
argument is affected by argument promotion.  Your suggestion to instead
use the TREE_TYPE of the function result is a much nicer solution.

I also agree that all of these bswap optimizations make the assumption
that BITS_PER_UNIT is 8 (i.e. that bytes are 8 bits), and some that the
front-end supports an 8-bit type (i.e. that CHAR_TYPE_SIZE is 8), which
can be checked explicitly.

Both of these improvements are implemented in the attached revised patch,
which has been tested on x86_64-pc-linux-gnu with a "make bootstrap"
and "make -k check" with no new failures.

Ok for mainline?

2021-07-08  Roger Sayle
            Richard Biener

gcc/ChangeLog
        PR tree-optimization/40210
        * match.pd (bswap optimizations): Simplify (bswap(x)>>C1)&C2 as
        (x>>C3)&C2 when possible.  Simplify bswap(x)>>C1 as ((T)x)>>C2
        when possible.  Simplify bswap(x)&C1 as (x>>C2)&C1 when 0<=C1<=255.

gcc/testsuite/ChangeLog
        PR tree-optimization/40210
        * gcc.dg/builtin-bswap-13.c: New test.
        * gcc.dg/builtin-bswap-14.c: New test.

Roger
--

-----Original Message-----
From: Richard Biener
Sent: 07 July 2021 08:56
To: Roger Sayle
Cc: GCC Patches
Subject: Re: [PATCH] PR tree-opt/40210: Fold (bswap(X)>>C1)&C2 to (X>>C3)&C2 in match.pd

On Tue, Jul 6, 2021 at 9:01 PM Roger Sayle wrote:
>
>
> All of the optimizations/transformations mentioned in bugzilla for PR
> tree-optimization/40210 are already implemented in mainline GCC, with
> one exception.  In comment #5, there's a suggestion that
> (bswap64(x)>>56)&0xff can be implemented without the bswap as
> (unsigned char)x, or equivalently x&0xff.
>
> This patch implements the above optimization, and closely related
> variants.  For any single bit, (bswap(X)>>C1)&1 can be simplified to
> (X>>C2)&1, where bit position C2 is the appropriate permutation of C1.
> Similarly, the bswap can be eliminated if the desired set of bits all lie
> within the same byte, hence (bswap(x)>>8)&255 can always be optimized,
> as can (bswap(x)>>8)&123.
>
> Previously,
>
> int foo(long long x) {
>   return (__builtin_bswap64(x) >> 56) & 0xff;
> }
>
> compiled with -O2 to
> foo:    movq    %rdi, %rax
>         bswap   %rax
>         shrq    $56, %rax
>         ret
>
> with this patch, it now compiles to
> foo:    movzbl  %dil, %eax
>         ret
>
> This patch has been tested on x86_64-pc-linux-gnu with a "make
> bootstrap" and "make -k check" with no new failures.
>
> Ok for mainline?

I don't like get_builtin_precision too much, did you consider simply using

+ (bit_and (convert1? (rshift@0 (convert2? (bswap@3 @1))
+                              INTEGER_CST@2))

and TYPE_PRECISION (TREE_TYPE (@3))?  I think while we'll see argument
promotion and thus cannot use @1 to derive the type, the return value will
be the original type.

Now, I see '8' being used which likely should be CHAR_TYPE_SIZE since you
also use char_type_node.

I wonder whether

+ /* (bswap(x) >> C1) & C2 can sometimes be simplified to (x >> C3) & C2.  */
+ (simplify
+  (bit_and (convert1? (rshift@0 (convert2? (bswap @1)) INTEGER_CST@2))
+           INTEGER_CST@3)

and

+ /* bswap(x) >> C1 can sometimes be simplified to (T)x >> C2.  */
+ (simplify
+  (rshift (convert? (bswap @0)) INTEGER_CST@1)

can build upon each other, for example by extending the latter to handle more
cases, transforming to ((T)x >> C2) & C3?  That might of course be only
profitable when the bswap goes away.

Thanks,
Richard.

>
>
> 2021-07-06  Roger Sayle
>
> gcc/ChangeLog
>         PR tree-optimization/40210
>         * builtins.c (get_builtin_precision): Helper function to determine
>         the precision in bits of a built-in function.
>         * builtins.h (get_builtin_precision): Prototype here.
>         * match.pd (bswap optimizations): Simplify (bswap(x)>>C1)&C2 as
>         (x>>C3)&C2 when possible.  Simplify bswap(x)>>C1 as ((T)x)>>C2
>         when possible.  Simplify bswap(x)&C1 as (x>>C2)&C1 when 0<=C1<=255.
> > gcc/testsuite/ChangeLog > PR tree-optimization/40210 > * gcc.dg/builtin-bswap-13.c: New test. > * gcc.dg/builtin-bswap-14.c: New test. > > Roger > -- > Roger Sayle > NextMove Software > Cambridge, UK > diff --git a/gcc/match.pd b/gcc/match.pd index 39fb57e..a134485 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -3610,7 +3610,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (complex (convert:itype @0) (negate (convert:itype @1) /* BSWAP simplifications, transforms checked by gcc.dg/builtin-bswap-8.c. */ -(for bswap (BUILT_IN_BSWAP16 BUILT_IN_BSWAP32 BUILT_IN_BSWAP64) +(for bswap (BUILT_IN_BSWAP16 BUILT_IN_BSWAP32 + BUILT_IN_BSWAP64 BUILT_IN_BSWAP128) (simplify (bswap (bswap @0)) @0) @@ -3620,7 +3621,70 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (for bitop (bit_xor bit_ior bit_and) (simplify (bswap (bitop:c (bswap
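
For readers following the thread, the identity behind the fold can be checked with a small standalone C sketch. It is not part of the patch; the helper names (via_bswap, direct, permute_bit, bswap_fold_holds) are made up for illustration, and it only covers the 64-bit case with shift counts that are multiples of 8 and masks that fit in one byte. It relies on the GCC/Clang builtin __builtin_bswap64.

```c
#include <assert.h>
#include <stdint.h>

/* (bswap64(x) >> c1) & mask, with c1 a multiple of 8 and mask <= 0xff.  */
static unsigned via_bswap(uint64_t x, unsigned c1, unsigned mask)
{
    return (unsigned)((__builtin_bswap64(x) >> c1) & mask);
}

/* The same byte read straight from x.  Byte k of bswap64(x) is byte 7-k
   of x, so the new shift count is c3 = 56 - c1.  */
static unsigned direct(uint64_t x, unsigned c1, unsigned mask)
{
    unsigned c3 = 56 - c1;
    return (unsigned)((x >> c3) & mask);
}

/* Single-bit case: bit c1 of bswap64(x) is bit c2 of x.  The byte index
   is mirrored, the position inside the byte is unchanged.  */
static unsigned permute_bit(unsigned c1)
{
    return (7 - c1 / 8) * 8 + c1 % 8;
}

/* Returns 1 if both forms agree for every byte and every bit of x.  */
static int bswap_fold_holds(uint64_t x)
{
    for (unsigned c1 = 0; c1 < 64; c1 += 8) {
        if (via_bswap(x, c1, 0xff) != direct(x, c1, 0xff))
            return 0;
        if (via_bswap(x, c1, 123) != direct(x, c1, 123))
            return 0;
    }
    for (unsigned c1 = 0; c1 < 64; c1++) {
        uint64_t swapped_bit = (__builtin_bswap64(x) >> c1) & 1;
        uint64_t direct_bit = (x >> permute_bit(c1)) & 1;
        if (swapped_bit != direct_bit)
            return 0;
    }
    return 1;
}
```

Calling bswap_fold_holds on a few arbitrary values is enough to see that the bswap is redundant whenever the selected bits all come from one byte; the (bswap64(x)>>56)&0xff example from the PR is the c1 = 56 case, where c3 is 0.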
Re: PING 2 [PATCH] correct handling of variable offset minus constant in -Warray-bounds (PR 100137)
On Thu, Jul 8, 2021 at 5:12 AM Martin Sebor wrote: > > On 7/7/21 7:48 PM, Marek Polacek wrote: > > On Wed, Jul 07, 2021 at 02:38:11PM -0600, Martin Sebor via Gcc-patches > > wrote: > >> On 7/7/21 1:38 AM, Richard Biener wrote: > >>> On Tue, Jul 6, 2021 at 5:47 PM Martin Sebor via Gcc-patches > >>> wrote: > > Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573349.html > >>> > >>> + if (TREE_CODE (axstype) != UNION_TYPE) > >>> > >>> what about QUAL_UNION_TYPE? (why constrain union type accesses > >>> here - note you don't seem to constrain accesses of union members here) > >> > >> I didn't know a QUAL_UNION_TYPE was a thing. Removing the test > >> doesn't seem to cause any regressions so let me do that in a followup. > >> > >>> > >>> +if (tree access_size = TYPE_SIZE_UNIT (axstype)) > >>> > >>> + /* The byte size of the array has already been determined above > >>> + based on a pointer ARG. Set ELTSIZE to the size of the type > >>> + it points to and REFTYPE to the array with the size, rounded > >>> + down as necessary. */ > >>> + if (POINTER_TYPE_P (reftype)) > >>> +reftype = TREE_TYPE (reftype); > >>> + if (TREE_CODE (reftype) == ARRAY_TYPE) > >>> +reftype = TREE_TYPE (reftype); > >>> + if (tree refsize = TYPE_SIZE_UNIT (reftype)) > >>> +if (TREE_CODE (refsize) == INTEGER_CST) > >>> + eltsize = wi::to_offset (refsize); > >>> > >>> probably pre-existing but the pointer indirection is definitely confusing > >>> me again and again given the variable is named 'reftype' - obviously > >>> an access to a pointer does not have any element size. Possibly the > >>> paths arriving here ensure somehow that the only case is when > >>> reftype is not the access type but a pointer to the accessed memory. > >>> "jump-threading" the source might help me avoiding to trip over this > >>> again and again ... > >> > >> I agree (it is confusing). There's more to simplify here. It's on > >> my to do list so let me see about this piece of code then. 
> >> > >>> > >>> The patch removes a lot of odd code, I like that. You know this code best > >>> and it's hard to spot errors. > >>> > >>> So OK, you'll deal with the fallout. > >> > >> I certainly will. Pushed in r12-2132. > > > > I think this patch breaks bootstrap on x86_64: > > > > In member function ‘availability > > varpool_node::get_availability(symtab_node*)’, > > inlined from ‘availability > > symtab_node::get_availability(symtab_node*)’ at > > /opt/notnfs/polacek/gcc/gcc/cgraph.h:3360:63, > > inlined from ‘availability > > symtab_node::get_availability(symtab_node*)’ at > > /opt/notnfs/polacek/gcc/gcc/cgraph.h:3355:1, > > inlined from ‘symtab_node* > > symtab_node::ultimate_alias_target(availability*, symtab_node*)’ at > > /opt/notnfs/polacek/gcc/gcc/cgraph.h:3199:35, > > inlined from ‘symtab_node* > > symtab_node::ultimate_alias_target(availability*, symtab_node*)’ at > > /opt/notnfs/polacek/gcc/gcc/cgraph.h:3193:1, > > inlined from ‘varpool_node* > > varpool_node::ultimate_alias_target(availability*, symtab_node*)’ at > > /opt/notnfs/polacek/gcc/gcc/cgraph.h:3234:5, > > inlined from ‘availability > > varpool_node::_ZN12varpool_node16get_availabilityEP11symtab_node.part.0(symtab_node*)’ > > at /opt/notnfs/polacek/gcc/gcc/varpool.c:501:29: > > /opt/notnfs/polacek/gcc/gcc/varpool.c:490:19: error: array subscript > > ‘varpool_node[0]’ is partly outside array bounds of ‘varpool_node [0]’ > > [-Werror=array-bounds] > >490 | if (!definition && !in_other_partition) > >| ^~ > > In file included from /opt/notnfs/polacek/gcc/gcc/varpool.c:29: > > /opt/notnfs/polacek/gcc/gcc/cgraph.h: In member function ‘availability > > varpool_node::_ZN12varpool_node16get_availabilityEP11symtab_node.part.0(symtab_node*)’: > > /opt/notnfs/polacek/gcc/gcc/cgraph.h:1969:39: note: object > > ‘varpool_node::’ of size 120 > > 1969 | struct GTY((tag ("SYMTAB_VARIABLE"))) varpool_node : public > > symtab_node > >| ^~~~ > > cc1plus: all warnings being treated as errors > > I bootstrapped 
& regtested it on top of r12-2131 just before pushing
> it but let me try with the top of trunk (r12-2135 as of now).
>
> [a bit later]
>
> The bootstrap succeeded with the same configuration settings:
>
>    --enable-languages=ada,c,c++,d,fortran,jit,lto,objc,obj-c++
>    --enable-checking=yes --enable-host-shared --enable-valgrind-annotations
>
> But with --enable-checking=release I was able to reproduce the error
> above.  Since there is a simple way to bootstrap I'm not going to
> revert the patch tonight.  I'll look into the problem tomorrow and
> see if it can be easily fixed.  If not, I'll revert it then.

Plain ./configure triggers the failure already, I guess your
--enable-host-shared hides it.

Richard.

>
> Martin
[x86_64 PATCH]: Improvement to signed division of integer constant.
This patch tweaks the way GCC handles 32-bit integer division on
x86_64, when the numerator is constant.  Currently the function

int foo (int x) {
  return 100/x;
}

generates the code:
foo:    movl    $100, %eax
        cltd
        idivl   %edi
        ret

where the sign-extension instruction "cltd" creates a long
dependency chain, as it depends on the "mov" before it, and
is depended upon by "idivl" after it.

With this patch, GCC now matches both icc and LLVM and
uses an xor instead, generating:
foo:    xorl    %edx, %edx
        movl    $100, %eax
        idivl   %edi
        ret

Microbenchmarking confirms that this is faster on Intel
processors (Kaby Lake), and no worse on AMD processors (Zen2),
which agrees with intuition, but oddly disagrees with the
llvm-mca cycle count prediction on godbolt.org.

The tricky bit is that this sign-extension instruction is only
produced by late (postreload) splitting, and unfortunately none
of the subsequent passes (e.g. cprop_hardreg) is able to
propagate and simplify its constant argument.  The solution
here is to introduce a define_insn_and_split that allows the
constant numerator operand to be captured (by combine) and
then split into an optimal form after reload.

The above microbenchmarking also shows that eliminating the
sign extension of negative values (using movl $-1,%edx) is also
a performance improvement, as performed by icc but not by LLVM.
Both the xor and movl sign-extensions are larger than cltd,
so this transformation is prevented for -Os.


This patch has been tested on x86_64-pc-linux-gnu with a "make
bootstrap" and "make -k check" with no new failures.

Ok for mainline?


2021-07-08  Roger Sayle

gcc/ChangeLog
        * config/i386/i386.md (*divmodsi4_const): Optimize SImode
        divmod of a constant numerator with new define_insn_and_split.

gcc/testsuite/ChangeLog
        * gcc.target/i386/divmod-9.c: New test case.

Roger -- Roger Sayle NextMove Software Cambridge, UK diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 700c158..908ae33 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -8657,6 +8657,33 @@ [(set_attr "type" "idiv") (set_attr "mode" "SI")]) +;; Avoid sign-extension (using cdq) for constant numerators. +(define_insn_and_split "*divmodsi4_const" + [(set (match_operand:SI 0 "register_operand" "=&a") +(div:SI (match_operand:SI 2 "const_int_operand" "n") + (match_operand:SI 3 "nonimmediate_operand" "rm"))) + (set (match_operand:SI 1 "register_operand" "=&d") +(mod:SI (match_dup 2) (match_dup 3))) + (clobber (reg:CC FLAGS_REG))] + "!optimize_function_for_size_p (cfun)" + "#" + "reload_completed" + [(parallel [(set (match_dup 0) + (div:SI (match_dup 0) (match_dup 3))) + (set (match_dup 1) + (mod:SI (match_dup 0) (match_dup 3))) + (use (match_dup 1)) + (clobber (reg:CC FLAGS_REG))])] +{ + emit_move_insn (operands[0], operands[2]); + if (INTVAL (operands[2]) < 0) +emit_move_insn (operands[1], constm1_rtx); + else +ix86_expand_clear (operands[1]); +} + [(set_attr "type" "multi") + (set_attr "mode" "SI")]) + (define_expand "divmodqi4" [(parallel [(set (match_operand:QI 0 "register_operand") (div:QI /* { dg-do compile } */ /* { dg-options "-O2" } */ int foo (int x) { return 100/x; } int bar(int x) { return -100/x; } /* { dg-final { scan-assembler-not "(cltd|cdq)" } } */
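
As a rough illustration of why the split can replace cltd with either an xor or a move of -1, here is a small C model of the edx:eax dividend that idivl consumes. It is a sketch only, not code from the patch: high_half and idivl_quotient are hypothetical names, and the model ignores the remainder and the overflow trap.

```c
#include <assert.h>
#include <stdint.h>

/* What cltd/cdq would leave in %edx for a given %eax: all ones for a
   negative value, zero otherwise.  For a constant numerator this is
   known at compile time, which is what the new pattern exploits.  */
static int32_t high_half(int32_t eax)
{
    return eax < 0 ? -1 : 0;
}

/* Model of the quotient produced by idivl: the 64-bit dividend is
   edx * 2^32 + (unsigned) eax, divided by a 32-bit divisor.  */
static int32_t idivl_quotient(int32_t edx, int32_t eax, int32_t divisor)
{
    int64_t dividend = (int64_t)edx * 4294967296LL + (int64_t)(uint32_t)eax;
    return (int32_t)(dividend / divisor);
}

/* 100/x and -100/x as the split form would compute them.  */
static int32_t foo_model(int32_t x)
{
    return idivl_quotient(high_half(100), 100, x);
}

static int32_t bar_model(int32_t x)
{
    return idivl_quotient(high_half(-100), -100, x);
}
```

With a zero high half, -100 would be read as the large positive value 4294967196, which is why the negative case needs movl $-1 rather than the xor.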
Re: [PATCH] PR tree-opt/40210: Fold (bswap(X)>>C1)&C2 to (X>>C3)&C2 in match.pd
On Thu, Jul 8, 2021 at 9:37 AM Roger Sayle wrote:
>
>
> Hi Richard,
> Thanks.  Yep, you've correctly diagnosed that the motivation for the
> get_builtin_precision helper function was that the TREE_TYPE of the
> argument is affected by argument promotion.  Your suggestion to instead
> use the TREE_TYPE of the function result is a much nicer solution.
>
> I also agree that all of these bswap optimizations make the assumption
> that BITS_PER_UNIT is 8 (i.e. that bytes are 8 bits), and some that the
> front-end supports an 8-bit type (i.e. that CHAR_TYPE_SIZE is 8), which
> can be checked explicitly.
>
> Both of these improvements are implemented in the attached revised patch,
> which has been tested on x86_64-pc-linux-gnu with a "make bootstrap"
> and "make -k check" with no new failures.
>
> Ok for mainline?

OK.

Thanks,
Richard.

> 2021-07-08  Roger Sayle
>             Richard Biener
>
> gcc/ChangeLog
>         PR tree-optimization/40210
>         * match.pd (bswap optimizations): Simplify (bswap(x)>>C1)&C2 as
>         (x>>C3)&C2 when possible.  Simplify bswap(x)>>C1 as ((T)x)>>C2
>         when possible.  Simplify bswap(x)&C1 as (x>>C2)&C1 when 0<=C1<=255.
>
> gcc/testsuite/ChangeLog
>         PR tree-optimization/40210
>         * gcc.dg/builtin-bswap-13.c: New test.
>         * gcc.dg/builtin-bswap-14.c: New test.
>
> Roger
> --
>
> -----Original Message-----
> From: Richard Biener
> Sent: 07 July 2021 08:56
> To: Roger Sayle
> Cc: GCC Patches
> Subject: Re: [PATCH] PR tree-opt/40210: Fold (bswap(X)>>C1)&C2 to (X>>C3)&C2
> in match.pd
>
> On Tue, Jul 6, 2021 at 9:01 PM Roger Sayle wrote:
> >
> >
> > All of the optimizations/transformations mentioned in bugzilla for PR
> > tree-optimization/40210 are already implemented in mainline GCC, with
> > one exception.  In comment #5, there's a suggestion that
> > (bswap64(x)>>56)&0xff can be implemented without the bswap as
> > (unsigned char)x, or equivalently x&0xff.
> >
> > This patch implements the above optimization, and closely related
> > variants.  For any single bit, (bswap(X)>>C1)&1 can be simplified to
> > (X>>C2)&1, where bit position C2 is the appropriate permutation of C1.
> > Similarly, the bswap can be eliminated if the desired set of bits all lie
> > within the same byte, hence (bswap(x)>>8)&255 can always be optimized,
> > as can (bswap(x)>>8)&123.
> >
> > Previously,
> >
> > int foo(long long x) {
> >   return (__builtin_bswap64(x) >> 56) & 0xff;
> > }
> >
> > compiled with -O2 to
> > foo:    movq    %rdi, %rax
> >         bswap   %rax
> >         shrq    $56, %rax
> >         ret
> >
> > with this patch, it now compiles to
> > foo:    movzbl  %dil, %eax
> >         ret
> >
> > This patch has been tested on x86_64-pc-linux-gnu with a "make
> > bootstrap" and "make -k check" with no new failures.
> >
> > Ok for mainline?
>
> I don't like get_builtin_precision too much, did you consider simply using
>
> + (bit_and (convert1? (rshift@0 (convert2? (bswap@3 @1))
> +                              INTEGER_CST@2))
>
> and TYPE_PRECISION (TREE_TYPE (@3))?  I think while we'll see argument
> promotion and thus cannot use @1 to derive the type, the return value will
> be the original type.
>
> Now, I see '8' being used which likely should be CHAR_TYPE_SIZE since you
> also use char_type_node.
>
> I wonder whether
>
> + /* (bswap(x) >> C1) & C2 can sometimes be simplified to (x >> C3) & C2.  */
> + (simplify
> +  (bit_and (convert1? (rshift@0 (convert2? (bswap @1)) INTEGER_CST@2))
> +           INTEGER_CST@3)
>
> and
>
> + /* bswap(x) >> C1 can sometimes be simplified to (T)x >> C2.  */
> + (simplify
> +  (rshift (convert? (bswap @0)) INTEGER_CST@1)
>
> can build upon each other, for example by extending the latter to handle more
> cases, transforming to ((T)x >> C2) & C3?
> That might of course be only profitable when the bswap goes away.
>
> Thanks,
> Richard.
>
> >
> >
> > 2021-07-06  Roger Sayle
> >
> > gcc/ChangeLog
> >         PR tree-optimization/40210
> >         * builtins.c (get_builtin_precision): Helper function to determine
> >         the precision in bits of a built-in function.
> > * builtins.h (get_builtin_precision): Prototype here. > > * match.pd (bswap optimizations): Simplify (bswap(x)>>C1)&C2 as > > (x>>C3)&C2 when possible. Simplify bswap(x)>>C1 as ((T)x)>>C2 > > when possible. Simplify bswap(x)&C1 as (x>>C2)&C1 when 0<=C1<=255. > > > > gcc/testsuite/ChangeLog > > PR tree-optimization/40210 > > * gcc.dg/builtin-bswap-13.c: New test. > > * gcc.dg/builtin-bswap-14.c: New test. > > > > Roger > > -- > > Roger Sayle > > NextMove Software > > Cambridge, UK > >
[PATCH] PR tree-optimization/38943: Preserve trapping instructions with -fnon-call-exceptions
This patch addresses PR tree-optimization/38943 where gcc may optimize
away trapping instructions even when -fnon-call-exceptions is specified.
Interestingly this only affects the C compiler (when -fexceptions is not
specified) as g++ (or -fexceptions) supports C++-style exception handling,
where -fnon-call-exceptions triggers the stmt_could_throw_p machinery.
Without -fexceptions, trapping instructions aren't always considered
visible side-effects.

This patch fixes this in two places.  Firstly, gimple_has_side_effects
is tweaked such that gimple_could_trap_p is considered a side-effect
if the current function can throw non-call exceptions.  And secondly,
check_stmt in ipa-pure-const.c is tweaked such that a function
containing a trapping statement is considered to have a side-effect
with -fnon-call-exceptions, and therefore cannot be pure or const.

Calling gimple_could_trap_p (which previously took a non-const gimple)
from gimple_has_side_effects (which takes a const gimple) required
improving the const-safety of gimple_could_trap_p (a relatively minor
tweak) and its prototypes.  Hopefully this is considered a clean-up/
improvement.

This patch has been tested on x86_64-pc-linux-gnu with a "make
bootstrap" and "make -k check" with no new failures.  This should
be relatively safe, as there are no changes in behaviour unless
the user explicitly specifies -fnon-call-exceptions, when the C
compiler then behaves more like the C++/Ada compiler.

Ok for mainline?


2021-07-08  Roger Sayle

gcc/ChangeLog
        PR tree-optimization/38943
        * gimple.c (gimple_has_side_effects): Consider trapping to
        be a side-effect when -fnon-call-exceptions is specified.
        (gimple_could_trap_p_1): Make S argument a "const gimple*".
        Preserve constness in call to gimple_asm_volatile_p.
        (gimple_could_trap_p): Make S argument a "const gimple*".
        * gimple.h (gimple_could_trap_p_1, gimple_could_trap_p):
        Update function prototypes.
* ipa-pure-const.c (check_stmt): When the current function can throw non-call exceptions, a trapping statement should be considered a side-effect, so the function is neither const nor pure. gcc/testsuite/ChangeLog PR tree-optimization/38943 * gcc.dg/pr38943.c: New test case. Roger -- Roger Sayle NextMove Software Cambridge, UK diff --git a/gcc/gimple.c b/gcc/gimple.c index f1044e9..4b150b0 100644 --- a/gcc/gimple.c +++ b/gcc/gimple.c @@ -2090,7 +2090,8 @@ gimple_move_vops (gimple *new_stmt, gimple *old_stmt) statement to have side effects if: - It is a GIMPLE_CALL not marked with ECF_PURE or ECF_CONST. - - Any of its operands are marked TREE_THIS_VOLATILE or TREE_SIDE_EFFECTS. */ + - Any of its operands are marked TREE_THIS_VOLATILE or TREE_SIDE_EFFECTS. + - It may trap and -fnon-call-exceptions has been specified. */ bool gimple_has_side_effects (const gimple *s) @@ -2108,6 +2109,10 @@ gimple_has_side_effects (const gimple *s) && gimple_asm_volatile_p (as_a (s))) return true; + if (cfun->can_throw_non_call_exceptions + && gimple_could_trap_p (s)) +return true; + if (is_gimple_call (s)) { int flags = gimple_call_flags (s); @@ -2129,7 +2134,7 @@ gimple_has_side_effects (const gimple *s) S is a GIMPLE_ASSIGN, the LHS of the assignment is also checked. */ bool -gimple_could_trap_p_1 (gimple *s, bool include_mem, bool include_stores) +gimple_could_trap_p_1 (const gimple *s, bool include_mem, bool include_stores) { tree t, div = NULL_TREE; enum tree_code op; @@ -2146,7 +2151,7 @@ gimple_could_trap_p_1 (gimple *s, bool include_mem, bool include_stores) switch (gimple_code (s)) { case GIMPLE_ASM: - return gimple_asm_volatile_p (as_a (s)); + return gimple_asm_volatile_p (as_a (s)); case GIMPLE_CALL: t = gimple_call_fndecl (s); @@ -2192,7 +2197,7 @@ gimple_could_trap_p_1 (gimple *s, bool include_mem, bool include_stores) /* Return true if statement S can trap. 
*/ bool -gimple_could_trap_p (gimple *s) +gimple_could_trap_p (const gimple *s) { return gimple_could_trap_p_1 (s, true, true); } diff --git a/gcc/gimple.h b/gcc/gimple.h index e7dc2a4..1a2e120 100644 --- a/gcc/gimple.h +++ b/gcc/gimple.h @@ -1601,8 +1601,8 @@ void gimple_set_lhs (gimple *, tree); gimple *gimple_copy (gimple *); void gimple_move_vops (gimple *, gimple *); bool gimple_has_side_effects (const gimple *); -bool gimple_could_trap_p_1 (gimple *, bool, bool); -bool gimple_could_trap_p (gimple *); +bool gimple_could_trap_p_1 (const gimple *, bool, bool); +bool gimple_could_trap_p (const gimple *); bool gimple_assign_rhs_could_trap_p (gimple *); extern void dump_gimple_statistics (void); unsigned get_gimple_rhs_num_ops (enum tree_code); diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c index f045108..436cbcd 100644 --- a/gcc/ipa-pure-const.c +++ b/gcc/ipa-pure-const.c @@ -765,6 +765,14 @@ check_stmt (gimple_s
Re: [x86_64 PATCH]: Improvement to signed division of integer constant.
On Thu, Jul 8, 2021 at 10:25 AM Roger Sayle wrote:
>
>
> This patch tweaks the way GCC handles 32-bit integer division on
> x86_64, when the numerator is constant.  Currently the function
>
> int foo (int x) {
>   return 100/x;
> }
>
> generates the code:
> foo:    movl    $100, %eax
>         cltd
>         idivl   %edi
>         ret
>
> where the sign-extension instruction "cltd" creates a long
> dependency chain, as it depends on the "mov" before it, and
> is depended upon by "idivl" after it.
>
> With this patch, GCC now matches both icc and LLVM and
> uses an xor instead, generating:
> foo:    xorl    %edx, %edx
>         movl    $100, %eax
>         idivl   %edi
>         ret

You made me look up idiv and I figured we're not optimally
handling

int foo (long x, int y)
{
  return x / y;
}

by using a 32:32 / 32 bit divide.  combine manages to
see enough to eventually do this though.

> Microbenchmarking confirms that this is faster on Intel
> processors (Kaby Lake), and no worse on AMD processors (Zen2),
> which agrees with intuition, but oddly disagrees with the
> llvm-mca cycle count prediction on godbolt.org.
>
> The tricky bit is that this sign-extension instruction is only
> produced by late (postreload) splitting, and unfortunately none
> of the subsequent passes (e.g. cprop_hardreg) is able to
> propagate and simplify its constant argument.  The solution
> here is to introduce a define_insn_and_split that allows the
> constant numerator operand to be captured (by combine) and
> then split into an optimal form after reload.
>
> The above microbenchmarking also shows that eliminating the
> sign extension of negative values (using movl $-1,%edx) is also
> a performance improvement, as performed by icc but not by LLVM.
> Both the xor and movl sign-extensions are larger than cltd,
> so this transformation is prevented for -Os.
>
>
> This patch has been tested on x86_64-pc-linux-gnu with a "make
> bootstrap" and "make -k check" with no new failures.
>
> Ok for mainline?
> > > 2021-07-08 Roger Sayle > > gcc/ChangeLog > * config/i386/i386.md (*divmodsi4_const): Optimize SImode > divmod of a constant numerator with new define_insn_and_split. > > gcc/testsuite/ChangeLog > * gcc.target/i386/divmod-9.c: New test case. > > > Roger > -- > Roger Sayle > NextMove Software > Cambridge, UK >
Re: [PATCH] PR tree-optimization/38943: Preserve trapping instructions with -fnon-call-exceptions
On Thu, Jul 8, 2021 at 11:54 AM Roger Sayle wrote:
>
>
> This patch addresses PR tree-optimization/38943 where gcc may optimize
> away trapping instructions even when -fnon-call-exceptions is specified.
> Interestingly this only affects the C compiler (when -fexceptions is not
> specified) as g++ (or -fexceptions) supports C++-style exception handling,
> where -fnon-call-exceptions triggers the stmt_could_throw_p machinery.
> Without -fexceptions, trapping instructions aren't always considered
> visible side-effects.

But -fnon-call-exceptions without -fexceptions doesn't make much sense,
does it?  I see the testcase behaves correctly when -fexceptions is also
specified.

The call vanishes in DCE because stmt_could_throw_p starts with

bool
stmt_could_throw_p (function *fun, gimple *stmt)
{
  if (!flag_exceptions)
    return false;

the documentation of -fnon-call-exceptions says

  Generate code that allows trapping instructions to throw exceptions.

so either the above check is wrong or -fnon-call-exceptions should
imply -fexceptions (or we should diagnose missing -fexceptions)

>
> This patch fixes this in two places.  Firstly, gimple_has_side_effects
> is tweaked such that gimple_could_trap_p is considered a side-effect
> if the current function can throw non-call exceptions.

But exceptions are not considered side-effects - they are explicit in the
IL and thus passes are supposed to check for those and preserve dead
(externally) throwing stmts if not told otherwise
(flag_delete_dead_exceptions).

> And secondly,
> check_stmt in ipa-pure-const.c is tweaked such that a function
> containing a trapping statement is considered to have a side-effect
> with -fnon-call-exceptions, and therefore cannot be pure or const.

EH is orthogonal to pure/const, so I think that's wrong.

> Calling gimple_could_trap_p (which previously took a non-const gimple)
> from gimple_has_side_effects (which takes a const gimple) required
> improving the const-safety of gimple_could_trap_p (a relatively minor
> tweak) and its prototypes.  Hopefully this is considered a clean-up/
> improvement.

Yeah, even an obvious one I think - you can push that independently.

> This patch has been tested on x86_64-pc-linux-gnu with a "make
> bootstrap" and "make -k check" with no new failures.  This should
> be relatively safe, as there are no changes in behaviour unless
> the user explicitly specifies -fnon-call-exceptions, when the C
> compiler then behaves more like the C++/Ada compiler.
>
> Ok for mainline?
>
>
> 2021-07-08  Roger Sayle
>
> gcc/ChangeLog
>         PR tree-optimization/38943
>         * gimple.c (gimple_has_side_effects): Consider trapping to
>         be a side-effect when -fnon-call-exceptions is specified.
>         (gimple_could_trap_p_1): Make S argument a "const gimple*".
>         Preserve constness in call to gimple_asm_volatile_p.
>         (gimple_could_trap_p): Make S argument a "const gimple*".
>         * gimple.h (gimple_could_trap_p_1, gimple_could_trap_p):
>         Update function prototypes.
>         * ipa-pure-const.c (check_stmt): When the current function
>         can throw non-call exceptions, a trapping statement should
>         be considered a side-effect, so the function is neither
>         const nor pure.
>
> gcc/testsuite/ChangeLog
>         PR tree-optimization/38943
>         * gcc.dg/pr38943.c: New test case.
>
> Roger
> --
> Roger Sayle
> NextMove Software
> Cambridge, UK
>
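
To make the kind of statement under discussion concrete, here is a minimal C sketch. It is not the gcc.dg/pr38943.c test case, whose contents are not shown in this thread; dead_division is a hypothetical name. The division result is never used, so dead-code elimination may delete the statement unless the compiler treats the potential trap as something to preserve.

```c
#include <assert.h>

/* The quotient is computed and then discarded.  On x86 the division
   traps (SIGFPE) when x is zero; whether that trap must survive
   optimization is exactly what the thread is arguing about.  */
static int dead_division(int x)
{
    int unused = 1 / x;   /* may trap when x == 0 */
    (void)unused;
    return 0;
}
```

Calling it with a non-zero argument is well defined and returns 0 either way; the zero-divisor call is the case where behaviour with -fnon-call-exceptions (with or without -fexceptions) is in question.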
[PATCH] i386: Add pack/unpack patterns for 32bit vectors [PR100637]
V1SI mode shift is needed to shift 32bit operands and consequently we need to implement V1SI moves and pushes. 2021-07-08 Uroš Bizjak gcc/ PR target/100637 * config/i386/i386-expand.c (ix86_expand_sse_unpack): Handle V4QI mode. * config/i386/mmx.md (V_32): New mode iterator. (mov): Use V_32 mode iterator. (*mov_internal): Ditto. (*push2_rex64): Ditto. (*push2): Ditto. (movmisalign): Ditto. (mmx_v1si3): New insn pattern. (sse4_1_v2qiv2hi2): Ditto. (vec_unpacks_lo_v4qi): New expander. (vec_unpacks_hi_v4qi): Ditto. (vec_unpacku_lo_v4qi): Ditto. (vec_unpacku_hi_v4qi): Ditto. * config/i386/i386.h (VALID_SSE2_REG_MODE): Add V1SImode. (VALID_INT_MODE_P): Ditto. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Pushed to master. Uros. diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index 58c208e166b..65764ad88c5 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -5355,6 +5355,12 @@ ix86_expand_sse_unpack (rtx dest, rtx src, bool unsigned_p, bool high_p) else unpack = gen_sse4_1_sign_extendv2hiv2si2; break; + case E_V4QImode: + if (unsigned_p) + unpack = gen_sse4_1_zero_extendv2qiv2hi2; + else + unpack = gen_sse4_1_sign_extendv2qiv2hi2; + break; default: gcc_unreachable (); } @@ -5380,6 +5386,12 @@ ix86_expand_sse_unpack (rtx dest, rtx src, bool unsigned_p, bool high_p) emit_insn (gen_mmx_lshrv1di3 (tmp, gen_lowpart (V1DImode, src), GEN_INT (32))); break; + case 4: + /* Shift higher 2 bytes to lower 2 bytes. 
*/ + tmp = gen_reg_rtx (V1SImode); + emit_insn (gen_mmx_lshrv1si3 (tmp, gen_lowpart (V1SImode, src), + GEN_INT (16))); + break; default: gcc_unreachable (); } @@ -5427,6 +5439,12 @@ ix86_expand_sse_unpack (rtx dest, rtx src, bool unsigned_p, bool high_p) else unpack = gen_mmx_punpcklwd; break; + case E_V4QImode: + if (high_p) + unpack = gen_mmx_punpckhbw_low; + else + unpack = gen_mmx_punpcklbw_low; + break; default: gcc_unreachable (); } diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index 03d176143fe..8c3eace56da 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -1016,7 +1016,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv); #define VALID_SSE2_REG_MODE(MODE) \ ((MODE) == V16QImode || (MODE) == V8HImode || (MODE) == V2DFmode \ - || (MODE) == V4QImode || (MODE) == V2HImode \ + || (MODE) == V4QImode || (MODE) == V2HImode || (MODE) == V1SImode \ || (MODE) == V2DImode || (MODE) == DFmode) #define VALID_SSE_REG_MODE(MODE) \ @@ -1048,7 +1048,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv); || (MODE) == SImode || (MODE) == DImode \ || (MODE) == CQImode || (MODE) == CHImode \ || (MODE) == CSImode || (MODE) == CDImode \ - || (MODE) == V4QImode || (MODE) == V2HImode \ + || (MODE) == V4QImode || (MODE) == V2HImode || (MODE) == V1SImode \ || (TARGET_64BIT\ && ((MODE) == TImode || (MODE) == CTImode \ || (MODE) == TFmode || (MODE) == TCmode \ diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md index 7e83b64ab59..986b758396a 100644 --- a/gcc/config/i386/mmx.md +++ b/gcc/config/i386/mmx.md @@ -57,10 +57,13 @@ (define_mode_iterator MMXMODE14 [V8QI V2SI]) (define_mode_iterator MMXMODE24 [V4HI V2SI]) (define_mode_iterator MMXMODE248 [V4HI V2SI V1DI]) -;; All 32bit integer vector modes +;; All 4-byte integer vector modes +(define_mode_iterator V_32 [V4QI V2HI V1SI]) + +;; 4-byte integer vector modes (define_mode_iterator VI_32 [V4QI V2HI]) -;; All V2S* modes +;; V2S* modes 
(define_mode_iterator V2FI [V2SF V2SI]) ;; Mapping from integer vector mode to mnemonic suffix @@ -238,8 +241,8 @@ (define_expand "movmisalign" }) (define_expand "mov" - [(set (match_operand:VI_32 0 "nonimmediate_operand") - (match_operand:VI_32 1 "nonimmediate_operand"))] + [(set (match_operand:V_32 0 "nonimmediate_operand") + (match_operand:V_32 1 "nonimmediate_operand"))] "TARGET_SSE2" { ix86_expand_vector_move (mode, operands); @@ -247,9 +250,9 @@ (define_expand "mov" }) (define_insn "*mov_internal" - [(set (match_operand:VI_32 0 "nonimmediate_operand" + [(set (match_operand:V_32 0 "nonimmediate_op
Re: PING 2 [PATCH] correct handling of variable offset minus constant in -Warray-bounds (PR 100137)
On Jul 07 2021, Marek Polacek via Gcc-patches wrote:

> On Wed, Jul 07, 2021 at 02:38:11PM -0600, Martin Sebor via Gcc-patches wrote:
>> I certainly will.  Pushed in r12-2132.
>
> I think this patch breaks bootstrap on x86_64:

It also breaks bootstrap on aarch64 and ia64 in stage2.

In file included from ../../gcc/c-family/c-common.h:26,
                 from ../../gcc/cp/cp-tree.h:40,
                 from ../../gcc/cp/module.cc:209:
In function 'tree_node* identifier(const cpp_hashnode*)',
    inlined from 'bool module_state::read_macro_maps()' at ../../gcc/cp/module.cc:16305:10:
../../gcc/tree.h:1089:58: error: array subscript -1 is outside array bounds of 'cpp_hashnode [288230376151711743]' [-Werror=array-bounds]
 1089 |   ((tree) ((char *) (NODE) - sizeof (struct tree_common)))
      | ^
../../gcc/cp/module.cc:277:10: note: in expansion of macro 'HT_IDENT_TO_GCC_IDENT'
  277 |   return HT_IDENT_TO_GCC_IDENT (HT_NODE (const_cast (node)));
      | ^
In file included from ../../gcc/tree.h:23,
                 from ../../gcc/c-family/c-common.h:26,
                 from ../../gcc/cp/cp-tree.h:40,
                 from ../../gcc/cp/module.cc:209:
../../gcc/tree-core.h: In member function 'bool module_state::read_macro_maps()':
../../gcc/tree-core.h:1445:24: note: at offset -24 into object 'tree_identifier::id' of size 16
 1445 |   struct ht_identifier id;
      | ^~

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
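
For readers unfamiliar with the idiom behind the diagnostic: the macro named in the error recovers an enclosing node from a pointer to a member embedded inside it, by stepping back a fixed number of bytes. The following is only a minimal sketch of that pattern. The struct and function names are hypothetical stand-ins, not GCC's real types, and it uses offsetof rather than the sizeof-based subtraction shown in the error output.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the header and the embedded identifier;
   the real GCC structures are different.  */
struct common_header { void *chain; void *type; int code; };
struct identifier { const char *str; unsigned int len; };

struct container
{
  struct common_header common;
  struct identifier id;   /* embedded member handed out to callers */
};

/* Recover the enclosing container from a pointer to its ID member.
   The arithmetic deliberately steps back to an address before ID,
   which is only meaningful because ID really is embedded in a
   struct container.  */
struct container *
ident_to_container (struct identifier *id)
{
  return (struct container *) ((char *) id - offsetof (struct container, id));
}
```

Seen only through the member pointer, the subtraction looks like an access at a negative offset from the member, which appears to be the shape that the warning reports above.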
Re: [x86_64 PATCH]: Improvement to signed division of integer constant.
On Thu, 8 Jul 2021, Richard Biener via Gcc-patches wrote:

> You made me lookup idiv and I figured we're not optimally
> handling
>
> int foo (long x, int y)
> {
>   return x / y;
> }
>
> by using a 32:32 / 32 bit divide.  combine manages to
> see enough to eventually do this though.

We cannot do that in general because idiv will cause an exception if the signed result is not representable in 32 bits, but GCC defines signed conversions to truncate without trapping.

Alexander
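
To make the objection concrete, here is a small sketch of the semantics the quoted foo must preserve. The function name and the fixed-width types are assumptions chosen so the example does not depend on the width of long.

```c
#include <stdint.h>

/* Same shape as the quoted foo, written with fixed-width types.  */
int32_t
div_then_truncate (int64_t x, int32_t y)
{
  /* The division is carried out in 64 bits; only afterwards is the
     quotient narrowed.  GCC documents this narrowing as reduction
     modulo 2^32, with no trap.  */
  return (int32_t) (x / y);
}
```

With x = 2^40 and y = 1 the 64-bit quotient is 2^40, which narrows to 0. A 64-by-32-bit idiv would have to deliver that quotient in a 32-bit register, cannot, and would fault instead of returning 0.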
[committed] match.pd: Relax rule to include POLY_INT_CSTs
match.pd has a rule to simplify an extension, operation and truncation back to the original type: (simplify (convert (op:s@0 (convert1?@3 @1) (convert2?@4 @2))) Currently it handles cases in which @2 is an INTEGER_CST, but it also works for POLY_INT_CSTs.[*] For INTEGER_CST it doesn't matter whether we test @2 or @4, but for POLY_INT_CST it is possible to have unfolded (convert …)s. Originally I saw this leading to some bad ivopts decisions, because we weren't folding away redundancies from candidate iv expressions. It's also possible to test the fold directly using the SVE ACLE. Tested on aarch64-linux-gnu and x86_64-linux-gnu, pushed as obvious. Richard [*] Not all INTEGER_CST rules work for POLY_INT_CSTs, since extensions don't necessarily distribute over the internals of the POLY_INT_CST. But in this case that isn't an issue. gcc/ * match.pd: Simplify an extend-operate-truncate sequence involving a POLY_INT_CST. gcc/testsuite/ * gcc.target/aarch64/sve/acle/general/cntb_1.c: New test. --- gcc/match.pd | 2 +- .../gcc.target/aarch64/sve/acle/general/cntb_1.c | 14 ++ 2 files changed, 15 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/cntb_1.c diff --git a/gcc/match.pd b/gcc/match.pd index 334e8cc0496..30680d488ab 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -6175,7 +6175,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) && (types_match (@1, @2) /* Or the second operand is const integer or converted const integer from valueize. 
*/ - || TREE_CODE (@2) == INTEGER_CST)) + || poly_int_tree_p (@4))) (if (TYPE_OVERFLOW_WRAPS (TREE_TYPE (@1))) (op @1 (convert @2)) (with { tree utype = unsigned_type_for (TREE_TYPE (@1)); } diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/cntb_1.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/cntb_1.c new file mode 100644 index 000..b43fcf0ed6d --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/cntb_1.c @@ -0,0 +1,14 @@ +/* { dg-options "-O -fdump-tree-optimized" } */ + +#include + +unsigned int +foo (unsigned int x) +{ + unsigned long tmp = x; + tmp += svcntb (); + x = tmp; + return x - svcntb (); +} + +/* { dg-final { scan-tree-dump-not { POLY_INT_CST } optimized } } */ -- 2.17.1
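
As a rough illustration of why the extend-operate-truncate rule is safe for wrapping arithmetic, the sketch below compares the wide and narrow forms of an addition in plain C. The function names are hypothetical, and an ordinary run-time value stands in for svcntb (), which is only available with the SVE ACLE.

```c
#include <stdint.h>

/* Wide form: extend X, add in 64 bits, truncate back to 32 bits.  */
uint32_t
add_wide (uint32_t x, uint64_t c)
{
  uint64_t tmp = x;
  tmp += c;
  return (uint32_t) tmp;
}

/* Narrow form: truncate the second operand and add in 32 bits.  */
uint32_t
add_narrow (uint32_t x, uint64_t c)
{
  return x + (uint32_t) c;
}
```

Both functions agree for every input because truncation to 32 bits commutes with addition modulo 2^32; that is the property the (op @1 (convert @2)) replacement relies on when the type wraps.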
[committed] vect: Remove always-true condition
vectorizable_reduction had code guarded by: if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def || STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def) But that's always true after: if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_reduction_def && STMT_VINFO_DEF_TYPE (stmt_info) != vect_double_reduction_def && STMT_VINFO_DEF_TYPE (stmt_info) != vect_nested_cycle) return false; if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle) { … return true; } (I wasn't sure at first how the empty “else” for the first “if” above was supposed to work.) Tested on aarch64-linux-gnu and x86_64-linux-gnu, pushed as obvious. Richard gcc/ * tree-vect-loop.c (vectorizable_reduction): Remove always-true if condition. --- gcc/tree-vect-loop.c | 50 +--- 1 file changed, 24 insertions(+), 26 deletions(-) diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index 51a46a6d852..bc523d151c6 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -6516,33 +6516,31 @@ vectorizable_reduction (loop_vec_info loop_vinfo, stmt_vec_info orig_stmt_of_analysis = stmt_info; stmt_vec_info phi_info = stmt_info; - if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def - || STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def) + if (!is_a (stmt_info->stmt)) { - if (!is_a (stmt_info->stmt)) - { - STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type; - return true; - } - if (slp_node) - { - slp_node_instance->reduc_phis = slp_node; - /* ??? We're leaving slp_node to point to the PHIs, we only -need it to get at the number of vector stmts which wasn't -yet initialized for the instance root. 
*/ - } - if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def) - stmt_info = vect_stmt_to_vectorize (STMT_VINFO_REDUC_DEF (stmt_info)); - else /* STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def */ - { - use_operand_p use_p; - gimple *use_stmt; - bool res = single_imm_use (gimple_phi_result (stmt_info->stmt), -&use_p, &use_stmt); - gcc_assert (res); - phi_info = loop_vinfo->lookup_stmt (use_stmt); - stmt_info = vect_stmt_to_vectorize (STMT_VINFO_REDUC_DEF (phi_info)); - } + STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type; + return true; +} + if (slp_node) +{ + slp_node_instance->reduc_phis = slp_node; + /* ??? We're leaving slp_node to point to the PHIs, we only +need it to get at the number of vector stmts which wasn't +yet initialized for the instance root. */ +} + if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def) +stmt_info = vect_stmt_to_vectorize (STMT_VINFO_REDUC_DEF (stmt_info)); + else +{ + gcc_assert (STMT_VINFO_DEF_TYPE (stmt_info) + == vect_double_reduction_def); + use_operand_p use_p; + gimple *use_stmt; + bool res = single_imm_use (gimple_phi_result (stmt_info->stmt), +&use_p, &use_stmt); + gcc_assert (res); + phi_info = loop_vinfo->lookup_stmt (use_stmt); + stmt_info = vect_stmt_to_vectorize (STMT_VINFO_REDUC_DEF (phi_info)); } /* PHIs should not participate in patterns. */
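
The control-flow argument in the commit message can be checked on a toy model. The enum and function below are hypothetical and only mirror the shape of the two early exits; they are not the vectoriser's real types.

```c
/* Hypothetical stand-in for the STMT_VINFO_DEF_TYPE values involved.  */
enum def_type { REDUCTION, DOUBLE_REDUCTION, NESTED_CYCLE, OTHER };

int
classify (enum def_type t)
{
  /* First early exit: anything that is not one of the three kinds.  */
  if (t != REDUCTION && t != DOUBLE_REDUCTION && t != NESTED_CYCLE)
    return -1;

  /* Second early exit: nested cycles are dealt with separately.  */
  if (t == NESTED_CYCLE)
    return 0;

  /* Only REDUCTION and DOUBLE_REDUCTION can reach this point, so a
     guard of the form (t == REDUCTION || t == DOUBLE_REDUCTION)
     would always be true here.  */
  return t == REDUCTION ? 1 : 2;
}
```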
[PATCH] ifcvt: Improve tests for predicated operations
-msve-vector-bits=128 causes the AArch64 port to list 128-bit Advanced SIMD as the first-choice mode for vectorisation, with SVE being used for things that Advanced SIMD can't handle as easily. However, ifcvt would not then try to use SVE's predicated FP arithmetic, leading to tests like TSVC ControlFlow-flt failing to vectorise. The mask load/store code did try other vector modes, but could also be improved to make sure that SVEness sticks when computing derived modes. (Unlike mode_for_vector, related_vector_mode always returns a vector mode, so there's no need to check VECTOR_MODE_P as well.) Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? Richard gcc/ * internal-fn.c (vectorized_internal_fn_supported_p): Handle vector types first. For scalar types, consider both the preferred vector mode and the alternative vector modes. * optabs-query.c (can_vec_mask_load_store_p): Use the same structure as above, in particular using related_vector_mode for modes provided by autovectorize_vector_modes. gcc/testsuite/ * gcc.target/aarch64/sve/cond_arith_6.c: New test. 
--- gcc/internal-fn.c | 28 +++ gcc/optabs-query.c| 23 +-- .../gcc.target/aarch64/sve/cond_arith_6.c | 14 ++ 3 files changed, 43 insertions(+), 22 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_arith_6.c diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c index fb8b43d1ce2..cd5e63f9acd 100644 --- a/gcc/internal-fn.c +++ b/gcc/internal-fn.c @@ -4109,16 +4109,32 @@ expand_internal_call (gcall *stmt) bool vectorized_internal_fn_supported_p (internal_fn ifn, tree type) { + if (VECTOR_MODE_P (TYPE_MODE (type))) +return direct_internal_fn_supported_p (ifn, type, OPTIMIZE_FOR_SPEED); + scalar_mode smode; - if (!VECTOR_TYPE_P (type) && is_a (TYPE_MODE (type), &smode)) + if (!is_a (TYPE_MODE (type), &smode)) +return false; + + machine_mode vmode = targetm.vectorize.preferred_simd_mode (smode); + if (VECTOR_MODE_P (vmode)) { - machine_mode vmode = targetm.vectorize.preferred_simd_mode (smode); - if (VECTOR_MODE_P (vmode)) - type = build_vector_type_for_mode (type, vmode); + tree vectype = build_vector_type_for_mode (type, vmode); + if (direct_internal_fn_supported_p (ifn, vectype, OPTIMIZE_FOR_SPEED)) + return true; } - return (VECTOR_MODE_P (TYPE_MODE (type)) - && direct_internal_fn_supported_p (ifn, type, OPTIMIZE_FOR_SPEED)); + auto_vector_modes vector_modes; + targetm.vectorize.autovectorize_vector_modes (&vector_modes, true); + for (machine_mode base_mode : vector_modes) +if (related_vector_mode (base_mode, smode).exists (&vmode)) + { + tree vectype = build_vector_type_for_mode (type, vmode); + if (direct_internal_fn_supported_p (ifn, vectype, OPTIMIZE_FOR_SPEED)) + return true; + } + + return false; } void diff --git a/gcc/optabs-query.c b/gcc/optabs-query.c index 3248ce2c06e..05ee5f517da 100644 --- a/gcc/optabs-query.c +++ b/gcc/optabs-query.c @@ -582,27 +582,18 @@ can_vec_mask_load_store_p (machine_mode mode, return false; vmode = targetm.vectorize.preferred_simd_mode (smode); - if (!VECTOR_MODE_P (vmode)) -return false; - - if 
(targetm.vectorize.get_mask_mode (vmode).exists (&mask_mode) + if (VECTOR_MODE_P (vmode) + && targetm.vectorize.get_mask_mode (vmode).exists (&mask_mode) && convert_optab_handler (op, vmode, mask_mode) != CODE_FOR_nothing) return true; auto_vector_modes vector_modes; targetm.vectorize.autovectorize_vector_modes (&vector_modes, true); - for (unsigned int i = 0; i < vector_modes.length (); ++i) -{ - poly_uint64 cur = GET_MODE_SIZE (vector_modes[i]); - poly_uint64 nunits; - if (!multiple_p (cur, GET_MODE_SIZE (smode), &nunits)) - continue; - if (mode_for_vector (smode, nunits).exists (&vmode) - && VECTOR_MODE_P (vmode) - && targetm.vectorize.get_mask_mode (vmode).exists (&mask_mode) - && convert_optab_handler (op, vmode, mask_mode) != CODE_FOR_nothing) - return true; -} + for (machine_mode base_mode : vector_modes) +if (related_vector_mode (base_mode, smode).exists (&vmode) + && targetm.vectorize.get_mask_mode (vmode).exists (&mask_mode) + && convert_optab_handler (op, vmode, mask_mode) != CODE_FOR_nothing) + return true; return false; } diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_arith_6.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_arith_6.c new file mode 100644 index 000..4085ab12444 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_arith_6.c @@ -0,0 +1,14 @@ +/* { dg-options "-O3 -msve-vector-bits=128" } */ + +void +f (float *x) +{ + for (int i = 0; i < 100; ++i) +if (x[i] > 1.0f) + x[i] -= 1.0f; +} + +/* { dg-final { scan-assembler {\tld1w
Re: [committed] match.pd: Relax rule to include POLY_INT_CSTs
On Thu, Jul 8, 2021 at 1:52 PM Richard Sandiford via Gcc-patches wrote: > > match.pd has a rule to simplify an extension, operation and truncation > back to the original type: > > (simplify >(convert (op:s@0 (convert1?@3 @1) (convert2?@4 @2))) > > Currently it handles cases in which @2 is an INTEGER_CST, but it > also works for POLY_INT_CSTs.[*] > > For INTEGER_CST it doesn't matter whether we test @2 or @4, > but for POLY_INT_CST it is possible to have unfolded (convert …)s. But if it is an unfolded conversion then @4 is the conversion and of course not POLY_INT_CST_P, so I'm not sure what you say makes sense. But maybe you want to _not_ simplify the unfolded conversion case? > Originally I saw this leading to some bad ivopts decisions, because > we weren't folding away redundancies from candidate iv expressions. > It's also possible to test the fold directly using the SVE ACLE. > > Tested on aarch64-linux-gnu and x86_64-linux-gnu, pushed as obvious. > > Richard > > [*] Not all INTEGER_CST rules work for POLY_INT_CSTs, since extensions > don't necessarily distribute over the internals of the POLY_INT_CST. > But in this case that isn't an issue. > > > gcc/ > * match.pd: Simplify an extend-operate-truncate sequence involving > a POLY_INT_CST. > > gcc/testsuite/ > * gcc.target/aarch64/sve/acle/general/cntb_1.c: New test. > --- > gcc/match.pd | 2 +- > .../gcc.target/aarch64/sve/acle/general/cntb_1.c | 14 ++ > 2 files changed, 15 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/cntb_1.c > > diff --git a/gcc/match.pd b/gcc/match.pd > index 334e8cc0496..30680d488ab 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -6175,7 +6175,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > && (types_match (@1, @2) > /* Or the second operand is const integer or converted const >integer from valueize. 
*/ > - || TREE_CODE (@2) == INTEGER_CST)) > + || poly_int_tree_p (@4))) > (if (TYPE_OVERFLOW_WRAPS (TREE_TYPE (@1))) > (op @1 (convert @2)) > (with { tree utype = unsigned_type_for (TREE_TYPE (@1)); } > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/cntb_1.c > b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/cntb_1.c > new file mode 100644 > index 000..b43fcf0ed6d > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/cntb_1.c > @@ -0,0 +1,14 @@ > +/* { dg-options "-O -fdump-tree-optimized" } */ > + > +#include > + > +unsigned int > +foo (unsigned int x) > +{ > + unsigned long tmp = x; > + tmp += svcntb (); > + x = tmp; > + return x - svcntb (); > +} > + > +/* { dg-final { scan-tree-dump-not { POLY_INT_CST } optimized } } */ > -- > 2.17.1 >
Re: [PATCH] ifcvt: Improve tests for predicated operations
On Thu, Jul 8, 2021 at 2:04 PM Richard Sandiford via Gcc-patches wrote: > > -msve-vector-bits=128 causes the AArch64 port to list 128-bit Advanced > SIMD as the first-choice mode for vectorisation, with SVE being used for > things that Advanced SIMD can't handle as easily. However, ifcvt would > not then try to use SVE's predicated FP arithmetic, leading to tests > like TSVC ControlFlow-flt failing to vectorise. > > The mask load/store code did try other vector modes, but could also be > improved to make sure that SVEness sticks when computing derived modes. > > (Unlike mode_for_vector, related_vector_mode always returns a vector > mode, so there's no need to check VECTOR_MODE_P as well.) > > Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? OK. Richard. > Richard > > > gcc/ > * internal-fn.c (vectorized_internal_fn_supported_p): Handle > vector types first. For scalar types, consider both the preferred > vector mode and the alternative vector modes. > * optabs-query.c (can_vec_mask_load_store_p): Use the same > structure as above, in particular using related_vector_mode > for modes provided by autovectorize_vector_modes. > > gcc/testsuite/ > * gcc.target/aarch64/sve/cond_arith_6.c: New test. 
> --- > gcc/internal-fn.c | 28 +++ > gcc/optabs-query.c| 23 +-- > .../gcc.target/aarch64/sve/cond_arith_6.c | 14 ++ > 3 files changed, 43 insertions(+), 22 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_arith_6.c > > diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c > index fb8b43d1ce2..cd5e63f9acd 100644 > --- a/gcc/internal-fn.c > +++ b/gcc/internal-fn.c > @@ -4109,16 +4109,32 @@ expand_internal_call (gcall *stmt) > bool > vectorized_internal_fn_supported_p (internal_fn ifn, tree type) > { > + if (VECTOR_MODE_P (TYPE_MODE (type))) > +return direct_internal_fn_supported_p (ifn, type, OPTIMIZE_FOR_SPEED); > + >scalar_mode smode; > - if (!VECTOR_TYPE_P (type) && is_a (TYPE_MODE (type), &smode)) > + if (!is_a (TYPE_MODE (type), &smode)) > +return false; > + > + machine_mode vmode = targetm.vectorize.preferred_simd_mode (smode); > + if (VECTOR_MODE_P (vmode)) > { > - machine_mode vmode = targetm.vectorize.preferred_simd_mode (smode); > - if (VECTOR_MODE_P (vmode)) > - type = build_vector_type_for_mode (type, vmode); > + tree vectype = build_vector_type_for_mode (type, vmode); > + if (direct_internal_fn_supported_p (ifn, vectype, OPTIMIZE_FOR_SPEED)) > + return true; > } > > - return (VECTOR_MODE_P (TYPE_MODE (type)) > - && direct_internal_fn_supported_p (ifn, type, OPTIMIZE_FOR_SPEED)); > + auto_vector_modes vector_modes; > + targetm.vectorize.autovectorize_vector_modes (&vector_modes, true); > + for (machine_mode base_mode : vector_modes) > +if (related_vector_mode (base_mode, smode).exists (&vmode)) > + { > + tree vectype = build_vector_type_for_mode (type, vmode); > + if (direct_internal_fn_supported_p (ifn, vectype, OPTIMIZE_FOR_SPEED)) > + return true; > + } > + > + return false; > } > > void > diff --git a/gcc/optabs-query.c b/gcc/optabs-query.c > index 3248ce2c06e..05ee5f517da 100644 > --- a/gcc/optabs-query.c > +++ b/gcc/optabs-query.c > @@ -582,27 +582,18 @@ can_vec_mask_load_store_p (machine_mode mode, > return false; > >vmode = 
targetm.vectorize.preferred_simd_mode (smode); > - if (!VECTOR_MODE_P (vmode)) > -return false; > - > - if (targetm.vectorize.get_mask_mode (vmode).exists (&mask_mode) > + if (VECTOR_MODE_P (vmode) > + && targetm.vectorize.get_mask_mode (vmode).exists (&mask_mode) >&& convert_optab_handler (op, vmode, mask_mode) != CODE_FOR_nothing) > return true; > >auto_vector_modes vector_modes; >targetm.vectorize.autovectorize_vector_modes (&vector_modes, true); > - for (unsigned int i = 0; i < vector_modes.length (); ++i) > -{ > - poly_uint64 cur = GET_MODE_SIZE (vector_modes[i]); > - poly_uint64 nunits; > - if (!multiple_p (cur, GET_MODE_SIZE (smode), &nunits)) > - continue; > - if (mode_for_vector (smode, nunits).exists (&vmode) > - && VECTOR_MODE_P (vmode) > - && targetm.vectorize.get_mask_mode (vmode).exists (&mask_mode) > - && convert_optab_handler (op, vmode, mask_mode) != CODE_FOR_nothing) > - return true; > -} > + for (machine_mode base_mode : vector_modes) > +if (related_vector_mode (base_mode, smode).exists (&vmode) > + && targetm.vectorize.get_mask_mode (vmode).exists (&mask_mode) > + && convert_optab_handler (op, vmode, mask_mode) != CODE_FOR_nothing) > + return true; >return false; > } > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_arith_6.c > b/gcc/testsuite/gcc.target/aarch64/sve/cond_arith_6.c > new file mode 100644 > inde
Re: [committed] match.pd: Relax rule to include POLY_INT_CSTs
Richard Biener via Gcc-patches writes: > On Thu, Jul 8, 2021 at 1:52 PM Richard Sandiford via Gcc-patches > wrote: >> >> match.pd has a rule to simplify an extension, operation and truncation >> back to the original type: >> >> (simplify >>(convert (op:s@0 (convert1?@3 @1) (convert2?@4 @2))) >> >> Currently it handles cases in which @2 is an INTEGER_CST, but it >> also works for POLY_INT_CSTs.[*] >> >> For INTEGER_CST it doesn't matter whether we test @2 or @4, >> but for POLY_INT_CST it is possible to have unfolded (convert …)s. > > But if it is an unfolded conversion then @4 is the conversion and of > course not POLY_INT_CST_P, so I'm not sure what you says makes > sense. But maybe you want to _not_ simplify the unfolded > conversion case? Yeah, exactly that. Extensions of POLY_INT_CSTs won't be folded because extension doesn't distribute over (modulo) +. If an unfolded POLY_INT_CST has the same type as @1 then the match will succeed thanks to types_match (@1, @2). So the new pattern handles both that case and the case in which POLY_INT_CST occurs without a conversion. If an unfolded POLY_INT_CST has a different type from @1 then we'd need a more complicated check for validity. Maybe that would be useful, but it would no longer be a one-line change :-) Thanks, Richard > >> Originally I saw this leading to some bad ivopts decisions, because >> we weren't folding away redundancies from candidate iv expressions. >> It's also possible to test the fold directly using the SVE ACLE. >> >> Tested on aarch64-linux-gnu and x86_64-linux-gnu, pushed as obvious. >> >> Richard >> >> [*] Not all INTEGER_CST rules work for POLY_INT_CSTs, since extensions >> don't necessarily distribute over the internals of the POLY_INT_CST. >> But in this case that isn't an issue. >> >> >> gcc/ >> * match.pd: Simplify an extend-operate-truncate sequence involving >> a POLY_INT_CST. >> >> gcc/testsuite/ >> * gcc.target/aarch64/sve/acle/general/cntb_1.c: New test. 
>> --- >> gcc/match.pd | 2 +- >> .../gcc.target/aarch64/sve/acle/general/cntb_1.c | 14 ++ >> 2 files changed, 15 insertions(+), 1 deletion(-) >> create mode 100644 >> gcc/testsuite/gcc.target/aarch64/sve/acle/general/cntb_1.c >> >> diff --git a/gcc/match.pd b/gcc/match.pd >> index 334e8cc0496..30680d488ab 100644 >> --- a/gcc/match.pd >> +++ b/gcc/match.pd >> @@ -6175,7 +6175,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) >> && (types_match (@1, @2) >> /* Or the second operand is const integer or converted const >>integer from valueize. */ >> - || TREE_CODE (@2) == INTEGER_CST)) >> + || poly_int_tree_p (@4))) >> (if (TYPE_OVERFLOW_WRAPS (TREE_TYPE (@1))) >> (op @1 (convert @2)) >> (with { tree utype = unsigned_type_for (TREE_TYPE (@1)); } >> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/cntb_1.c >> b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/cntb_1.c >> new file mode 100644 >> index 000..b43fcf0ed6d >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/cntb_1.c >> @@ -0,0 +1,14 @@ >> +/* { dg-options "-O -fdump-tree-optimized" } */ >> + >> +#include >> + >> +unsigned int >> +foo (unsigned int x) >> +{ >> + unsigned long tmp = x; >> + tmp += svcntb (); >> + x = tmp; >> + return x - svcntb (); >> +} >> + >> +/* { dg-final { scan-tree-dump-not { POLY_INT_CST } optimized } } */ >> -- >> 2.17.1 >>
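
The remark that extension does not distribute over modular addition can be seen with ordinary integers. This is a sketch only: two plain 32-bit operands stand in for the terms of a POLY_INT_CST, and the function names are hypothetical.

```c
#include <stdint.h>

/* Add in 32 bits (wrapping), then zero-extend the sum.  */
uint64_t
extend_after_add (uint32_t a, uint32_t b)
{
  return (uint64_t) (uint32_t) (a + b);
}

/* Zero-extend each operand first, then add in 64 bits.  */
uint64_t
add_after_extend (uint32_t a, uint32_t b)
{
  return (uint64_t) a + (uint64_t) b;
}
```

The two results agree while the 32-bit sum does not wrap and differ as soon as it does, which is why an extension of a sum cannot simply be pushed onto its terms and folded away.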
[PATCH 00/10] vect: Reuse reduction accumulators between loops
Quoting from the final patch in the series:

This patch adds support for reusing a main loop's reduction accumulator in an epilogue loop. This in turn lets the loops share a single piece of vector->scalar reduction code.

The patch has the following restrictions:

(1) The epilogue reduction can only operate on a single vector (e.g. ncopies must be 1 for non-SLP reductions, and the group size must be <= the element count for SLP reductions).

(2) Both loops must use the same vector mode for their accumulators. This means that the patch is restricted to targets that support --param vect-partial-vector-usage=1.

(3) The reduction must be a standard “tree code” reduction.

However, these restrictions could be lifted in future. For example, if the main loop operates on 128-bit vectors and the epilogue loop operates on 64-bit vectors, we could in future reduce the 128-bit vector by one stage and use the 64-bit result as the starting point for the epilogue result.

The patch tries to handle chained SLP reductions, unchained SLP reductions and non-SLP reductions. It also handles cases in which the epilogue loop is entered directly (rather than via the main loop) and cases in which the epilogue loop can be skipped.

However, it ended up being difficult to do that without some preparatory clean-ups. Some of them could probably stand on their own, but others are a bit “meh” without the final patch to justify them.
The diff below shows the effect of the patch when compiling:

  unsigned short __attribute__((noipa))
  add_loop (unsigned short *x, int n)
  {
    unsigned short res = 0;
    for (int i = 0; i < n; ++i)
      res += x[i];
    return res;
  }

with -O3 --param vect-partial-vector-usage=1 on an SVE target:

add_loop:                                 add_loop:
.LFB0:                                    .LFB0:
        .cfi_startproc                            .cfi_startproc
        mov     x4, x0                  <
        cmp     w1, 0                             cmp     w1, 0
        ble     .L7                               ble     .L7
        cnth    x0                      |         cnth    x4
        sub     w2, w1, #1                        sub     w2, w1, #1
        sub     w3, w0, #1              |         sub     w3, w4, #1
        cmp     w2, w3                            cmp     w2, w3
        bcc     .L8                               bcc     .L8
        sub     w0, w1, w0              |         sub     w4, w1, w4
        mov     x3, 0                             mov     x3, 0
        cnth    x5                                cnth    x5
        mov     z0.b, #0                          mov     z0.b, #0
        ptrue   p0.b, all                         ptrue   p0.b, all
        .p2align 3,,7                             .p2align 3,,7
.L4:                                      .L4:
        ld1h    z1.h, p0/z, [x4, x3,    |         ld1h    z1.h, p0/z, [x0, x3,
        mov     x2, x3                            mov     x2, x3
        add     x3, x3, x5                        add     x3, x3, x5
        add     z0.h, z0.h, z1.h                  add     z0.h, z0.h, z1.h
        cmp     w0, w3                  |         cmp     w4, w3
        bcs     .L4                               bcs     .L4
        uaddv   d0, p0, z0.h            <
        umov    w0, v0.h[0]             <
        inch    x2                                inch    x2
        and     w0, w0, 65535           <
        cmp     w1, w2                            cmp     w1, w2
        beq     .L2                     |         beq     .L6
.L3:                                      .L3:
        sub     w1, w1, w2                        sub     w1, w1, w2
        mov     z1.b, #0                |         add     x2, x0, w2, uxtw 1
        whilelo p0.h, wzr, w1                     whilelo p0.h, wzr, w1
        add     x2, x4, w2, uxtw 1      |         ld1h    z1.h, p0/z, [x2]
        ptrue   p1.b, all               |         add     z0.h, p0/m, z0.h, z1.
        ld1h    z0.h, p0/z, [x2]        | .L6:
        sel     z0.h, p0, z0.h, z1.h    |         ptrue   p0.b, all
        uaddv   d0, p1, z0.h            |         uaddv   d0, p0, z0.h
        fmov    x1, d0                  |         umov    w0, v0.h[0]
        add     w0, w0, w1, uxth        <
        and     w0, w0, 65535                     and     w0, w0, 65535
.L2:                                    <
        ret                                       ret
        .p2align 2,,3                             .p2align 2,,3
.L7:                                      .L7:
        mov     w0, 0                             mov     w0, 0
        ret                                       ret
.L8:                                      .L8:
        mov     w2, 0                             mov     w2, 0
        mov     w0, 0                   |         mov     z0.b, #0
        b       .L3                               b       .L3
        .cfi_end
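
The idea described in the cover letter can be modelled in scalar C. The following is a sketch under stated assumptions, not what the vectoriser emits: LANES is a hypothetical fixed vector width, the lane array plays the role of the vector accumulator, and add_loop_sketch is an illustrative name.

```c
#define LANES 8  /* hypothetical number of 16-bit elements per vector */

unsigned short
add_loop_sketch (const unsigned short *x, int n)
{
  unsigned short acc[LANES] = { 0 };  /* shared "vector" accumulator */
  int i = 0;

  /* Main loop: whole vectors only.  */
  for (; i + LANES <= n; i += LANES)
    for (int lane = 0; lane < LANES; ++lane)
      acc[lane] += x[i + lane];

  /* Epilogue loop: one partial vector, accumulated into the same ACC
     rather than into a fresh zero-initialised accumulator.  Lanes past
     the end of the input are left untouched, mimicking a predicated
     add.  */
  for (int lane = 0; i + lane < n; ++lane)
    acc[lane] += x[i + lane];

  /* A single vector->scalar reduction serves both loops.  */
  unsigned short res = 0;
  for (int lane = 0; lane < LANES; ++lane)
    res += acc[lane];
  return res;
}
```

Because the epilogue continues from the main loop's accumulator, there is only one final reduction instead of two partial results that must each be reduced and then combined.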
[PATCH 01/10] vect: Simplify epilogue reduction code
vect_create_epilog_for_reduction only handles two cases: single-loop reductions and double reductions. “nested cycles” (i.e. reductions in the inner loop when vectorising an outer loop) are handled elsewhere and don't need a vector->scalar reduction. The function had variables called nested_in_vect_loop and double_reduc and asserted that nested_in_vect_loop implied double_reduc, but it still had code to handle nested_in_vect_loop && !double_reduc. This patch removes that and uses double_reduc everywhere. gcc/ * tree-vect-loop.c (vect_create_epilog_for_reduction): Remove nested_in_vect_loop and use double_reduc everywhere. Remove dead assignment to "loop". --- gcc/tree-vect-loop.c | 30 -- 1 file changed, 4 insertions(+), 26 deletions(-) diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index bc523d151c6..7c3e3352b43 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -5005,7 +5005,6 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo, imm_use_iterator imm_iter, phi_imm_iter; use_operand_p use_p, phi_use_p; gimple *use_stmt; - bool nested_in_vect_loop = false; auto_vec new_phis; int j, i; auto_vec scalar_results; @@ -5023,10 +5022,8 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo, { outer_loop = loop; loop = loop->inner; - nested_in_vect_loop = true; - gcc_assert (!slp_node); + gcc_assert (!slp_node && double_reduc); } - gcc_assert (!nested_in_vect_loop || double_reduc); vectype = STMT_VINFO_REDUC_VECTYPE (reduc_info); gcc_assert (vectype); @@ -5049,8 +5046,6 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo, induc_val = STMT_VINFO_VEC_INDUC_COND_INITIAL_VAL (reduc_info); else if (double_reduc) ; - else if (nested_in_vect_loop) - ; else adjustment_def = STMT_VINFO_REDUC_EPILOGUE_ADJUSTMENT (reduc_info); } @@ -5923,7 +5918,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo, { gcc_assert (!slp_reduc); gimple_seq stmts = NULL; - if (nested_in_vect_loop) + if (double_reduc) { new_phi = new_phis[0]; 
gcc_assert (VECTOR_TYPE_P (TREE_TYPE (adjustment_def))); @@ -5942,21 +5937,12 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo, epilog_stmt = gimple_seq_last_stmt (stmts); gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT); - if (nested_in_vect_loop) -{ - if (!double_reduc) -scalar_results.quick_push (new_temp); - else -scalar_results[0] = new_temp; -} - else -scalar_results[0] = new_temp; - + scalar_results[0] = new_temp; new_phis[0] = epilog_stmt; } if (double_reduc) -loop = loop->inner; +loop = outer_loop; /* 2.6 Handle the loop-exit phis. Replace the uses of scalar loop-exit phis with new adjusted scalar results, i.e., replace use @@ -6017,14 +6003,6 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo, scalar_dest = gimple_assign_lhs (scalar_stmt_info->stmt); } - if (nested_in_vect_loop) -{ - if (double_reduc) -loop = outer_loop; - else - gcc_unreachable (); -} - phis.create (3); /* Find the loop-closed-use at the loop exit of the original scalar result. (The reduction result is expected to have two immediate uses,
[PATCH 02/10] vect: Create array_slice of live-out stmts
This patch constructs an array_slice of the scalar statements that produce live-out reduction results in the original unvectorised loop. There are three cases:

- SLP reduction chains: the final SLP stmt is live-out
- full SLP reductions: all SLP stmts are live-out
- non-SLP reductions: the single scalar stmt is live-out

This is a slight simplification on its own, mostly because it means “group_size” has a consistent meaning throughout the function. The main justification though is that it helps with later patches.

gcc/
	* tree-vect-loop.c (vect_create_epilog_for_reduction): Truncate scalar_results to group_size elements after reducing down from N*group_size elements. Construct an array_slice of the live-out stmts and assert that there is one stmt per scalar result.

--- gcc/tree-vect-loop.c | 61 +++- 1 file changed, 21 insertions(+), 40 deletions(-) diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index 7c3e3352b43..8390ac80ca0 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -5010,7 +5010,12 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo, auto_vec scalar_results; unsigned int group_size = 1, k; auto_vec phis; - bool slp_reduc = false; + /* SLP reduction without reduction chain, e.g., + # a1 = phi + # b1 = phi + a2 = operation (a1) + b2 = operation (b1) */ + bool slp_reduc = (slp_node && !REDUC_GROUP_FIRST_ELEMENT (stmt_info)); bool direct_slp_reduc; tree new_phi_result; tree induction_index = NULL_TREE; @@ -5050,6 +5055,16 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo, adjustment_def = STMT_VINFO_REDUC_EPILOGUE_ADJUSTMENT (reduc_info); } + stmt_vec_info single_live_out_stmt[] = { stmt_info }; + array_slice live_out_stmts = single_live_out_stmt; + if (slp_reduc) +/* All statements produce live-out values. */ +live_out_stmts = SLP_TREE_SCALAR_STMTS (slp_node); + else if (slp_node) +/* The last statement in the reduction chain produces the live-out + value. 
*/ +single_live_out_stmt[0] = SLP_TREE_SCALAR_STMTS (slp_node)[group_size - 1]; + unsigned vec_num; int ncopies; if (slp_node) @@ -5248,13 +5263,6 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo, new_scalar_dest = vect_create_destination_var (scalar_dest, NULL); bitsize = TYPE_SIZE (scalar_type); - /* SLP reduction without reduction chain, e.g., - # a1 = phi - # b1 = phi - a2 = operation (a1) - b2 = operation (b1) */ - slp_reduc = (slp_node && !REDUC_GROUP_FIRST_ELEMENT (stmt_info)); - /* True if we should implement SLP_REDUC using native reduction operations instead of scalar operations. */ direct_slp_reduc = (reduc_fn != IFN_LAST @@ -5877,6 +5885,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo, first_res, res); scalar_results[j % group_size] = new_res; } + scalar_results.truncate (group_size); for (k = 0; k < group_size; k++) scalar_results[k] = gimple_convert (&stmts, scalar_type, scalar_results[k]); @@ -5969,39 +5978,11 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo, use use */ - - /* In SLP reduction chain we reduce vector results into one vector if - necessary, hence we set here REDUC_GROUP_SIZE to 1. SCALAR_DEST is the - LHS of the last stmt in the reduction chain, since we are looking for - the loop exit phi node. */ - if (REDUC_GROUP_FIRST_ELEMENT (stmt_info)) -{ - stmt_vec_info dest_stmt_info - = vect_orig_stmt (SLP_TREE_SCALAR_STMTS (slp_node)[group_size - 1]); - scalar_dest = gimple_assign_lhs (dest_stmt_info->stmt); - group_size = 1; -} - - /* In SLP we may have several statements in NEW_PHIS and REDUCTION_PHIS (in - case that REDUC_GROUP_SIZE is greater than vectorization factor). - Therefore, we need to match SCALAR_RESULTS with corresponding statements. - The first (REDUC_GROUP_SIZE / number of new vector stmts) scalar results - correspond to the first vector stmt, etc. - (RATIO is equal to (REDUC_GROUP_SIZE / number of new vector stmts)). 
*/ - if (group_size > new_phis.length ()) -gcc_assert (!(group_size % new_phis.length ())); - - for (k = 0; k < group_size; k++) + gcc_assert (live_out_stmts.size () == scalar_results.length ()); + for (k = 0; k < live_out_stmts.size (); k++) { - if (slp_reduc) -{ - stmt_vec_info scalar_stmt_info = SLP_TREE_SCALAR_STMTS (slp_node)[k]; - - orig_stmt_info = STMT_VINFO_RELATED_STMT (scalar_stmt_info); - /* SLP statements can't participate in patterns. */ - gcc_assert (!orig_stmt_info); - scalar_dest = gimple_assign_lhs (scalar_stmt_info->stmt); -} + stmt_vec
[PATCH 03/10] vect: Remove new_phis from
vect_create_epilog_for_reduction had a variable called new_phis. It collected the statements that produce the exit block definitions of the vector reduction accumulators. Although those statements are indeed phis initially, they are often replaced with normal statements later, leading to puzzling code like:

  FOR_EACH_VEC_ELT (new_phis, i, new_phi)
    {
      int bit_offset;
      if (gimple_code (new_phi) == GIMPLE_PHI)
        vec_temp = PHI_RESULT (new_phi);
      else
        vec_temp = gimple_assign_lhs (new_phi);

Also, although the array collects statements, in practice all users want the lhs instead.

This patch therefore replaces new_phis with a vector of gimple values called “reduc_inputs”.

Also, reduction chains and ncopies>1 were handled with identical code (and there was a comment saying so). The patch unites them into a single “if”.

gcc/
	* tree-vect-loop.c (vect_create_epilog_for_reduction): Replace the new_phis vector with a reduc_inputs vector. Combine handling of reduction chains and ncopies > 1.
--- gcc/tree-vect-loop.c | 113 --- 1 file changed, 41 insertions(+), 72 deletions(-) diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index 8390ac80ca0..b7f73ca52c7 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -5005,7 +5005,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo, imm_use_iterator imm_iter, phi_imm_iter; use_operand_p use_p, phi_use_p; gimple *use_stmt; - auto_vec new_phis; + auto_vec reduc_inputs; int j, i; auto_vec scalar_results; unsigned int group_size = 1, k; @@ -5017,7 +5017,6 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo, b2 = operation (b1) */ bool slp_reduc = (slp_node && !REDUC_GROUP_FIRST_ELEMENT (stmt_info)); bool direct_slp_reduc; - tree new_phi_result; tree induction_index = NULL_TREE; if (slp_node) @@ -5215,7 +5214,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo, if (double_reduc) loop = outer_loop; exit_bb = single_exit (loop)->dest; - new_phis.create (slp_node ?
vec_num : ncopies); + reduc_inputs.create (slp_node ? vec_num : ncopies); for (unsigned i = 0; i < vec_num; i++) { if (slp_node) @@ -5223,19 +5222,14 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo, else def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[0]); for (j = 0; j < ncopies; j++) -{ + { tree new_def = copy_ssa_name (def); - phi = create_phi_node (new_def, exit_bb); - if (j == 0) -new_phis.quick_push (phi); - else - { - def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]); - new_phis.quick_push (phi); - } - - SET_PHI_ARG_DEF (phi, single_exit (loop)->dest_idx, def); -} + phi = create_phi_node (new_def, exit_bb); + if (j) + def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]); + SET_PHI_ARG_DEF (phi, single_exit (loop)->dest_idx, def); + reduc_inputs.quick_push (new_def); + } } exit_gsi = gsi_after_labels (exit_bb); @@ -5274,52 +5268,32 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo, a2 = operation (a1) a3 = operation (a2), - we may end up with more than one vector result. Here we reduce them to - one vector. */ - if (REDUC_GROUP_FIRST_ELEMENT (stmt_info) || direct_slp_reduc) + we may end up with more than one vector result. Here we reduce them + to one vector. + + The same is true if we couldn't use a single defuse cycle. 
*/ + if (REDUC_GROUP_FIRST_ELEMENT (stmt_info) + || direct_slp_reduc + || ncopies > 1) { gimple_seq stmts = NULL; - tree first_vect = PHI_RESULT (new_phis[0]); - first_vect = gimple_convert (&stmts, vectype, first_vect); - for (k = 1; k < new_phis.length (); k++) + tree first_vect = gimple_convert (&stmts, vectype, reduc_inputs[0]); + for (k = 1; k < reduc_inputs.length (); k++) { - gimple *next_phi = new_phis[k]; - tree second_vect = PHI_RESULT (next_phi); - second_vect = gimple_convert (&stmts, vectype, second_vect); + tree second_vect = gimple_convert (&stmts, vectype, reduc_inputs[k]); first_vect = gimple_build (&stmts, code, vectype, first_vect, second_vect); } gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT); - new_phi_result = first_vect; - new_phis.truncate (0); - new_phis.safe_push (SSA_NAME_DEF_STMT (first_vect)); + reduc_inputs.truncate (0); + reduc_inputs.safe_push (first_vect); } - /* Likewise if we couldn't use a single defuse cycle. */ - else if (ncopies > 1) -{ - gimple_seq stmts = NULL; - tree first_vect = PHI
[PATCH 04/10] vect: Ensure reduc_inputs always have vectype
Vector reduction accumulators can differ in signedness from the final scalar result. The conversions to handle that case were distributed through vect_create_epilog_for_reduction; this patch does the conversion up-front instead. gcc/ * tree-vect-loop.c (vect_create_epilog_for_reduction): Convert the phi results to vectype after creating them. Remove later conversion code that thus becomes redundant. --- gcc/tree-vect-loop.c | 28 +++- 1 file changed, 11 insertions(+), 17 deletions(-) diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index b7f73ca52c7..1bd9a6ea52c 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -5214,9 +5214,11 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo, if (double_reduc) loop = outer_loop; exit_bb = single_exit (loop)->dest; + exit_gsi = gsi_after_labels (exit_bb); reduc_inputs.create (slp_node ? vec_num : ncopies); for (unsigned i = 0; i < vec_num; i++) { + gimple_seq stmts = NULL; if (slp_node) def = vect_get_slp_vect_def (slp_node, i); else @@ -5228,12 +5230,12 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo, if (j) def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]); SET_PHI_ARG_DEF (phi, single_exit (loop)->dest_idx, def); + new_def = gimple_convert (&stmts, vectype, new_def); reduc_inputs.quick_push (new_def); } + gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT); } - exit_gsi = gsi_after_labels (exit_bb); - /* 2.2 Get the relevant tree-code to use in the epilog for schemes 2,3 (i.e. when reduc_fn is not available) and in the final adjustment code (if needed). 
Also get the original scalar reduction variable as @@ -5277,17 +5279,14 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo, || ncopies > 1) { gimple_seq stmts = NULL; - tree first_vect = gimple_convert (&stmts, vectype, reduc_inputs[0]); + tree single_input = reduc_inputs[0]; for (k = 1; k < reduc_inputs.length (); k++) -{ - tree second_vect = gimple_convert (&stmts, vectype, reduc_inputs[k]); - first_vect = gimple_build (&stmts, code, vectype, -first_vect, second_vect); -} + single_input = gimple_build (&stmts, code, vectype, +single_input, reduc_inputs[k]); gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT); reduc_inputs.truncate (0); - reduc_inputs.safe_push (first_vect); + reduc_inputs.safe_push (single_input); } if (STMT_VINFO_REDUC_TYPE (reduc_info) == COND_REDUCTION @@ -5323,10 +5322,6 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo, /* Vector of {0, 0, 0,...}. */ tree zero_vec = build_zero_cst (vectype); - gimple_seq stmts = NULL; - reduc_inputs[0] = gimple_convert (&stmts, vectype, reduc_inputs[0]); - gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT); - /* Find maximum value from the vector of found indexes. */ tree max_index = make_ssa_name (index_scalar_type); gcall *max_index_stmt = gimple_build_call_internal (IFN_REDUC_MAX, @@ -5394,7 +5389,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo, /* Convert the reduced value back to the result type and set as the result. 
*/ - stmts = NULL; + gimple_seq stmts = NULL; new_temp = gimple_build (&stmts, VIEW_CONVERT_EXPR, scalar_type, data_reduc); gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT); @@ -5412,7 +5407,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo, val = data_reduc[i], idx_val = induction_index[i]; return val; */ - tree data_eltype = TREE_TYPE (TREE_TYPE (reduc_inputs[0])); + tree data_eltype = TREE_TYPE (vectype); tree idx_eltype = TREE_TYPE (TREE_TYPE (induction_index)); unsigned HOST_WIDE_INT el_size = tree_to_uhwi (TYPE_SIZE (idx_eltype)); poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE (induction_index)); @@ -5488,8 +5483,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo, "Reduce using direct vector reduction.\n"); gimple_seq stmts = NULL; - reduc_inputs[0] = gimple_convert (&stmts, vectype, reduc_inputs[0]); - vec_elem_type = TREE_TYPE (TREE_TYPE (reduc_inputs[0])); + vec_elem_type = TREE_TYPE (vectype); new_temp = gimple_build (&stmts, as_combined_fn (reduc_fn), vec_elem_type, reduc_inputs[0]); new_temp = gimple_convert (&stmts, scalar_type, new_temp);
[PATCH 05/10] vect: Add a vect_phi_initial_value helper function
This patch adds a helper function called vect_phi_initial_value for returning the incoming value of a given loop phi. The main reason for adding it is to ensure that the right preheader edge is used when vectorising nested loops. (PHI_ARG_DEF_FROM_EDGE itself doesn't assert that the given edge is for the right block, although I guess that would be good to add separately.) gcc/ * tree-vectorizer.h: Include tree-ssa-operands.h. (vect_phi_initial_value): New function. * tree-vect-loop.c (neutral_op_for_slp_reduction): Use it. (get_initial_defs_for_reduction, info_for_reduction): Likewise. (vect_create_epilog_for_reduction, vectorizable_reduction): Likewise. (vect_transform_cycle_phi, vectorizable_induction): Likewise. --- gcc/tree-vect-loop.c | 29 + gcc/tree-vectorizer.h | 21 - 2 files changed, 29 insertions(+), 21 deletions(-) diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index 1bd9a6ea52c..a31d7621c3b 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -3288,8 +3288,7 @@ neutral_op_for_slp_reduction (slp_tree slp_node, tree vector_type, has only a single initial value, so that value is neutral for all statements. */ if (reduc_chain) - return PHI_ARG_DEF_FROM_EDGE (stmt_vinfo->stmt, - loop_preheader_edge (loop)); + return vect_phi_initial_value (stmt_vinfo); return NULL_TREE; default: @@ -4829,13 +4828,13 @@ get_initial_defs_for_reduction (vec_info *vinfo, /* Get the def before the loop. In reduction chain we have only one initial value. Else we have as many as PHIs in the group. */ if (reduc_chain) - op = j != 0 ? neutral_op : PHI_ARG_DEF_FROM_EDGE (stmt_vinfo->stmt, pe); + op = j != 0 ? neutral_op : vect_phi_initial_value (stmt_vinfo); else if (((vec_oprnds->length () + 1) * nunits - number_of_places_left_in_vector >= group_size) && neutral_op) op = neutral_op; else - op = PHI_ARG_DEF_FROM_EDGE (stmt_vinfo->stmt, pe); + op = vect_phi_initial_value (stmt_vinfo); /* Create 'vect_ = {op0,op1,...,opn}'. 
*/ number_of_places_left_in_vector--; @@ -4906,9 +4905,7 @@ info_for_reduction (vec_info *vinfo, stmt_vec_info stmt_info) } else if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle) { - edge pe = loop_preheader_edge (gimple_bb (phi)->loop_father); - stmt_vec_info info - = vinfo->lookup_def (PHI_ARG_DEF_FROM_EDGE (phi, pe)); + stmt_vec_info info = vinfo->lookup_def (vect_phi_initial_value (phi)); if (info && STMT_VINFO_DEF_TYPE (info) == vect_double_reduction_def) stmt_info = info; } @@ -5042,8 +5039,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo, { /* Get at the scalar def before the loop, that defines the initial value of the reduction variable. */ - initial_def = PHI_ARG_DEF_FROM_EDGE (reduc_def_stmt, - loop_preheader_edge (loop)); + initial_def = vect_phi_initial_value (reduc_def_stmt); /* Optimize: for induction condition reduction, if we can't use zero for induc_val, use initial_def. */ if (STMT_VINFO_REDUC_TYPE (reduc_info) == INTEGER_INDUC_COND_REDUCTION) @@ -5558,9 +5554,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo, for MIN and MAX reduction, for example. 
*/ if (!neutral_op) { - tree scalar_value - = PHI_ARG_DEF_FROM_EDGE (orig_phis[i]->stmt, -loop_preheader_edge (loop)); + tree scalar_value = vect_phi_initial_value (orig_phis[i]); scalar_value = gimple_convert (&seq, TREE_TYPE (vectype), scalar_value); vector_identity = gimple_build_vector_from_val (&seq, vectype, @@ -6752,10 +6746,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo, else if (cond_reduc_dt == vect_constant_def) { enum vect_def_type cond_initial_dt; - tree cond_initial_val - = PHI_ARG_DEF_FROM_EDGE (reduc_def_phi, loop_preheader_edge (loop)); - - gcc_assert (cond_reduc_val != NULL_TREE); + tree cond_initial_val = vect_phi_initial_value (reduc_def_phi); vect_is_simple_use (cond_initial_val, loop_vinfo, &cond_initial_dt); if (cond_initial_dt == vect_constant_def && types_compatible_p (TREE_TYPE (cond_initial_val), @@ -7528,8 +7519,7 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo, { /* Get at the scalar def before the loop, that defines the initial value of the reduction variable. */ - tree initial_def = PHI_ARG_DEF_FROM_EDGE (phi, - loop_preheader_edge (loop))
[PATCH 06/10] vect: Pass reduc_info to get_initial_defs_for_reduction
This patch passes the reduc_info to get_initial_defs_for_reduction, so that the function can get general information from there rather than from the first SLP statement. This isn't a win on its own, but it becomes important with later patches. gcc/ * tree-vect-loop.c (get_initial_defs_for_reduction): Take the reduc_info as an additional parameter. (vect_transform_cycle_phi): Update accordingly. --- gcc/tree-vect-loop.c | 23 ++- 1 file changed, 10 insertions(+), 13 deletions(-) diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index a31d7621c3b..565c2859477 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -4764,32 +4764,28 @@ get_initial_def_for_reduction (loop_vec_info loop_vinfo, return init_def; } -/* Get at the initial defs for the reduction PHIs in SLP_NODE. - NUMBER_OF_VECTORS is the number of vector defs to create. - If NEUTRAL_OP is nonnull, introducing extra elements of that - value will not change the result. */ +/* Get at the initial defs for the reduction PHIs for REDUC_INFO, whose + associated SLP node is SLP_NODE. NUMBER_OF_VECTORS is the number of vector + defs to create. If NEUTRAL_OP is nonnull, introducing extra elements of + that value will not change the result. 
*/ static void get_initial_defs_for_reduction (vec_info *vinfo, + stmt_vec_info reduc_info, slp_tree slp_node, vec *vec_oprnds, unsigned int number_of_vectors, bool reduc_chain, tree neutral_op) { vec stmts = SLP_TREE_SCALAR_STMTS (slp_node); - stmt_vec_info stmt_vinfo = stmts[0]; unsigned HOST_WIDE_INT nunits; unsigned j, number_of_places_left_in_vector; - tree vector_type; + tree vector_type = STMT_VINFO_VECTYPE (reduc_info); unsigned int group_size = stmts.length (); unsigned int i; class loop *loop; - vector_type = STMT_VINFO_VECTYPE (stmt_vinfo); - - gcc_assert (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def); - - loop = (gimple_bb (stmt_vinfo->stmt))->loop_father; + loop = (gimple_bb (reduc_info->stmt))->loop_father; gcc_assert (loop); edge pe = loop_preheader_edge (loop); @@ -4823,7 +4819,7 @@ get_initial_defs_for_reduction (vec_info *vinfo, { tree op; i = j % group_size; - stmt_vinfo = stmts[i]; + stmt_vec_info stmt_vinfo = stmts[i]; /* Get the def before the loop. In reduction chain we have only one initial value. Else we have as many as PHIs in the group. */ @@ -7510,7 +7506,8 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo, = neutral_op_for_slp_reduction (slp_node, vectype_out, STMT_VINFO_REDUC_CODE (reduc_info), first != NULL); - get_initial_defs_for_reduction (loop_vinfo, slp_node_instance->reduc_phis, + get_initial_defs_for_reduction (loop_vinfo, reduc_info, + slp_node_instance->reduc_phis, &vec_initial_defs, vec_num, first != NULL, neutral_op); }
[PATCH 07/10] vect: Pass reduc_info to get_initial_def_for_reduction
Similarly to the previous patch, this one passes the reduc_info to get_initial_def_for_reduction, rather than a stmt_vec_info that lacks the metadata. This again becomes useful later. gcc/ * tree-vect-loop.c (get_initial_def_for_reduction): Take the reduc_info instead of the original stmt_vec_info. (vect_transform_cycle_phi): Update accordingly. --- gcc/tree-vect-loop.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index 565c2859477..a67036f92e0 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -4625,7 +4625,7 @@ vect_model_reduction_cost (loop_vec_info loop_vinfo, /* Function get_initial_def_for_reduction Input: - STMT_VINFO - a stmt that performs a reduction operation in the loop. + REDUC_INFO - the info_for_reduction INIT_VAL - the initial value of the reduction variable Output: @@ -4667,7 +4667,7 @@ vect_model_reduction_cost (loop_vec_info loop_vinfo, static tree get_initial_def_for_reduction (loop_vec_info loop_vinfo, - stmt_vec_info stmt_vinfo, + stmt_vec_info reduc_info, enum tree_code code, tree init_val, tree *adjustment_def) { @@ -4685,8 +4685,8 @@ get_initial_def_for_reduction (loop_vec_info loop_vinfo, gcc_assert (POINTER_TYPE_P (scalar_type) || INTEGRAL_TYPE_P (scalar_type) || SCALAR_FLOAT_TYPE_P (scalar_type)); - gcc_assert (nested_in_vect_loop_p (loop, stmt_vinfo) - || loop == (gimple_bb (stmt_vinfo->stmt))->loop_father); + gcc_assert (nested_in_vect_loop_p (loop, reduc_info) + || loop == (gimple_bb (reduc_info->stmt))->loop_father); /* ADJUSTMENT_DEF is NULL when called from vect_create_epilog_for_reduction to vectorize double reduction. 
*/ @@ -7556,7 +7556,7 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo, if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def) adjustment_defp = NULL; vec_initial_def - = get_initial_def_for_reduction (loop_vinfo, reduc_stmt_info, code, + = get_initial_def_for_reduction (loop_vinfo, reduc_info, code, initial_def, adjustment_defp); STMT_VINFO_REDUC_EPILOGUE_ADJUSTMENT (reduc_info) = adjustment_def; vec_initial_defs.create (ncopies);
[PATCH 08/10] vect: Generalise neutral_op_for_slp_reduction
This patch generalises the interface to neutral_op_for_slp_reduction so that it can be used for non-SLP reductions too. This isn't much of a win on its own, but it helps later patches. gcc/ * tree-vect-loop.c (neutral_op_for_slp_reduction): Replace with... (neutral_op_for_reduction): ...this, providing a more general interface. (vect_create_epilog_for_reduction): Update accordingly. (vectorizable_reduction): Likewise. (vect_transform_cycle_phi): Likewise. --- gcc/tree-vect-loop.c | 59 +++- 1 file changed, 26 insertions(+), 33 deletions(-) diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index a67036f92e0..744645d8bad 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -3248,23 +3248,15 @@ reduction_fn_for_scalar_code (enum tree_code code, internal_fn *reduc_fn) } } -/* If there is a neutral value X such that SLP reduction NODE would not - be affected by the introduction of additional X elements, return that X, - otherwise return null. CODE is the code of the reduction and VECTOR_TYPE - is the vector type that would hold element X. REDUC_CHAIN is true if - the SLP statements perform a single reduction, false if each statement - performs an independent reduction. */ +/* If there is a neutral value X such that a reduction would not be affected + by the introduction of additional X elements, return that X, otherwise + return null. CODE is the code of the reduction and SCALAR_TYPE is type + of the scalar elements. If the reduction has just a single initial value + then INITIAL_VALUE is that value, otherwise it is null. 
*/ static tree -neutral_op_for_slp_reduction (slp_tree slp_node, tree vector_type, - tree_code code, bool reduc_chain) +neutral_op_for_reduction (tree scalar_type, tree_code code, tree initial_value) { - vec stmts = SLP_TREE_SCALAR_STMTS (slp_node); - stmt_vec_info stmt_vinfo = stmts[0]; - tree scalar_type = TREE_TYPE (vector_type); - class loop *loop = gimple_bb (stmt_vinfo->stmt)->loop_father; - gcc_assert (loop); - switch (code) { case WIDEN_SUM_EXPR: @@ -3284,12 +3276,7 @@ neutral_op_for_slp_reduction (slp_tree slp_node, tree vector_type, case MAX_EXPR: case MIN_EXPR: - /* For MIN/MAX the initial values are neutral. A reduction chain -has only a single initial value, so that value is neutral for -all statements. */ - if (reduc_chain) - return vect_phi_initial_value (stmt_vinfo); - return NULL_TREE; + return initial_value; default: return NULL_TREE; @@ -5535,10 +5522,11 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo, tree neutral_op = NULL_TREE; if (slp_node) { - stmt_vec_info first = REDUC_GROUP_FIRST_ELEMENT (stmt_info); - neutral_op - = neutral_op_for_slp_reduction (slp_node_instance->reduc_phis, - vectype, code, first != NULL); + tree initial_value = NULL_TREE; + if (REDUC_GROUP_FIRST_ELEMENT (stmt_info)) + initial_value = vect_phi_initial_value (orig_phis[0]); + neutral_op = neutral_op_for_reduction (TREE_TYPE (vectype), code, +initial_value); } if (neutral_op) vector_identity = gimple_build_vector_from_val (&seq, vectype, @@ -6935,9 +6923,13 @@ vectorizable_reduction (loop_vec_info loop_vinfo, /* For SLP reductions, see if there is a neutral value we can use. 
*/ tree neutral_op = NULL_TREE; if (slp_node) -neutral_op = neutral_op_for_slp_reduction - (slp_node_instance->reduc_phis, vectype_out, orig_code, - REDUC_GROUP_FIRST_ELEMENT (stmt_info) != NULL); +{ + tree initial_value = NULL_TREE; + if (REDUC_GROUP_FIRST_ELEMENT (stmt_info) != NULL) + initial_value = vect_phi_initial_value (reduc_def_phi); + neutral_op = neutral_op_for_reduction (TREE_TYPE (vectype_out), +orig_code, initial_value); +} if (double_reduc && reduction_type == FOLD_LEFT_REDUCTION) { @@ -7501,15 +7493,16 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo, else { gcc_assert (slp_node == slp_node_instance->reduc_phis); - stmt_vec_info first = REDUC_GROUP_FIRST_ELEMENT (reduc_stmt_info); - tree neutral_op - = neutral_op_for_slp_reduction (slp_node, vectype_out, - STMT_VINFO_REDUC_CODE (reduc_info), - first != NULL); + tree initial_value = NULL_TREE; + if (REDUC_GROUP_FIRST_ELEMENT (reduc_stmt_info)) + initial_value = vect_phi_initial_value (phi); + tree_code code = STMT_VINFO_REDUC_CODE (reduc_info); + tree neutral_op = neutral_op_for_reduction (TREE_TYPE (vectype
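As an aside for readers of the archive, the generalised interface can be pictured with a toy scalar model. This is only a sketch under stated assumptions: `ReducCode`, `MaybeLong`, `neutral_op_sketch` and `apply` are hypothetical names that do not exist in GCC, plain `long` stands in for trees, and the values for the add/or/xor, mult and and cases are taken from the switch that patch 09 removes from get_initial_def_for_reduction rather than from the hunk above. The point it illustrates is that MIN/MAX have no universal neutral value, so the caller passes the single initial value when there is one.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-ins for a few tree codes; not GCC identifiers.
enum class ReducCode { Plus, BitIor, BitXor, Mult, BitAnd, Min, Max, Other };

// Minimal "optional long": present == false plays the role of NULL_TREE.
struct MaybeLong {
    bool present;
    long value;
};

// Toy model of the generalised interface: the caller supplies the single
// initial value (if the reduction has one) instead of an SLP node.
inline MaybeLong neutral_op_sketch(ReducCode code, MaybeLong initial_value) {
    switch (code) {
    case ReducCode::Plus:
    case ReducCode::BitIor:
    case ReducCode::BitXor:
        return MaybeLong{true, 0};   // x + 0, x | 0, x ^ 0 all equal x
    case ReducCode::Mult:
        return MaybeLong{true, 1};   // x * 1 equals x
    case ReducCode::BitAnd:
        return MaybeLong{true, -1};  // x & ~0 equals x
    case ReducCode::Min:
    case ReducCode::Max:
        // Only the initial value itself is known to be neutral here.
        return initial_value;
    default:
        return MaybeLong{false, 0};
    }
}

// Scalar model of each operation, used to check neutrality.
inline long apply(ReducCode code, long a, long b) {
    switch (code) {
    case ReducCode::Plus:   return a + b;
    case ReducCode::BitIor: return a | b;
    case ReducCode::BitXor: return a ^ b;
    case ReducCode::Mult:   return a * b;
    case ReducCode::BitAnd: return a & b;
    case ReducCode::Min:    return std::min(a, b);
    case ReducCode::Max:    return std::max(a, b);
    default:
        assert(false && "no scalar model for this code");
        return a;
    }
}
```

With no single initial value (an unchained SLP reduction in the patch's terms), the MIN/MAX cases report that no neutral value is available, which matches the `return initial_value;` line in the hunk when INITIAL_VALUE is null.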
[PATCH 09/10] vect: Simplify get_initial_def_for_reduction
After previous patches, we can now easily provide the neutral op as an argument to get_initial_def_for_reduction. This in turn allows the adjustment calculation to be moved outside of get_initial_def_for_reduction, which is the main motivation of the patch. gcc/ * tree-vect-loop.c (get_initial_def_for_reduction): Remove adjustment handling. Take the neutral value as an argument, in place of the code argument. (vect_transform_cycle_phi): Update accordingly. Handle the initial values of cond reductions separately from code reductions. Choose the adjustment here rather than in get_initial_def_for_reduction. Sink the splat of vec_initial_def. --- gcc/tree-vect-loop.c | 177 +++ 1 file changed, 59 insertions(+), 118 deletions(-) diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index 744645d8bad..fe7e73f655f 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -4614,57 +4614,26 @@ vect_model_reduction_cost (loop_vec_info loop_vinfo, Input: REDUC_INFO - the info_for_reduction INIT_VAL - the initial value of the reduction variable + NEUTRAL_OP - a value that has no effect on the reduction, as per + neutral_op_for_reduction Output: - ADJUSTMENT_DEF - a tree that holds a value to be added to the final result -of the reduction (used for adjusting the epilog - see below). Return a vector variable, initialized according to the operation that STMT_VINFO performs. This vector will be used as the initial value of the vector of partial results. - Option1 (adjust in epilog): Initialize the vector as follows: - add/bit or/xor:[0,0,...,0,0] - mult/bit and: [1,1,...,1,1] - min/max/cond_expr: [init_val,init_val,..,init_val,init_val] - and when necessary (e.g. add/mult case) let the caller know - that it needs to adjust the result by init_val. - - Option2: Initialize the vector as follows: - add/bit or/xor:[init_val,0,0,...,0] - mult/bit and: [init_val,1,1,...,1] - min/max/cond_expr: [init_val,init_val,...,init_val] - and no adjustments are needed. 
- - For example, for the following code: - - s = init_val; - for (i=0;istmt))->loop_father); - /* ADJUSTMENT_DEF is NULL when called from - vect_create_epilog_for_reduction to vectorize double reduction. */ - if (adjustment_def) -*adjustment_def = NULL; - - switch (code) + if (operand_equal_p (init_val, neutral_op)) { -case WIDEN_SUM_EXPR: -case DOT_PROD_EXPR: -case SAD_EXPR: -case PLUS_EXPR: -case MINUS_EXPR: -case BIT_IOR_EXPR: -case BIT_XOR_EXPR: -case MULT_EXPR: -case BIT_AND_EXPR: - { -if (code == MULT_EXPR) - { -real_init_val = dconst1; -int_init_val = 1; - } - -if (code == BIT_AND_EXPR) - int_init_val = -1; - -if (SCALAR_FLOAT_TYPE_P (scalar_type)) - def_for_init = build_real (scalar_type, real_init_val); -else - def_for_init = build_int_cst (scalar_type, int_init_val); - - if (adjustment_def || operand_equal_p (def_for_init, init_val, 0)) - { - /* Option1: the first element is '0' or '1' as well. */ - if (!operand_equal_p (def_for_init, init_val, 0)) - *adjustment_def = init_val; - init_def = gimple_build_vector_from_val (&stmts, vectype, -def_for_init); - } - else if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant ()) - { - /* Option2 (variable length): the first element is INIT_VAL. */ - init_def = gimple_build_vector_from_val (&stmts, vectype, -def_for_init); - init_def = gimple_build (&stmts, CFN_VEC_SHL_INSERT, -vectype, init_def, init_val); - } - else - { - /* Option2: the first element is INIT_VAL. */ - tree_vector_builder elts (vectype, 1, 2); - elts.quick_push (init_val); - elts.quick_push (def_for_init); - init_def = gimple_build_vector (&stmts, &elts); - } - } - break; - -case MIN_EXPR: -case MAX_EXPR: -case COND_EXPR: - { - init_val = gimple_convert (&stmts, TREE_TYPE (vectype), init_val); - init_def = gimple_build_vector_from_val (&stmts, vectype, init_val); - } - break; - -default: - gcc_unreachable (); + /* If both elements are equal then the vector described above is +just a splat. 
*/ + neutral_op = gimple_convert (&stmts, TREE_TYPE (vectype), neutral_op); + init_def = gimple_build_vector_from_val (&stmts, vectype, neutral_op); +} + else +{ + neutral_op = gimple_convert (&stmts, TREE_TYPE (vectype), neutral_op); + init_val = gimple_convert
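For readers following the two initialisation schemes described in the removed comment, here is a small scalar sketch for a sum reduction. It is an illustration under assumptions, not GCC code: `kLanes`, `Vec`, `initial_def_sketch` and `vector_sum` are hypothetical names, and a fixed four-lane `std::array` stands in for a vector type. It shows that placing the initial value in the first lane of an otherwise neutral vector gives the same result as starting from an all-neutral splat and letting the caller add the initial value as an adjustment afterwards.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kLanes = 4;          // assumed vector width
using Vec = std::array<int, kLanes>;

// Build {init_val, neutral_op, neutral_op, ...}.  When init_val equals
// neutral_op this degenerates to a splat, as the new comment notes.
inline Vec initial_def_sketch(int init_val, int neutral_op) {
    Vec v;
    v.fill(neutral_op);
    v[0] = init_val;
    return v;
}

// Lane-wise sum of DATA into ACC, followed by a vector->scalar reduction.
// ADJUSTMENT models the value the caller adds in the epilogue.
inline int vector_sum(const std::vector<int> &data, Vec acc, int adjustment) {
    assert(data.size() % kLanes == 0);     // keep the sketch to full vectors
    for (std::size_t i = 0; i < data.size(); i += kLanes)
        for (std::size_t lane = 0; lane < kLanes; ++lane)
            acc[lane] += data[i + lane];
    int result = adjustment;
    for (int x : acc)
        result += x;
    return result;
}
```

Calling `vector_sum (data, initial_def_sketch (10, 0), 0)` and `vector_sum (data, initial_def_sketch (0, 0), 10)` models the two choices; which one is used is now the caller's decision, which is the stated motivation of the patch.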
[PATCH 10/10] vect: Reuse reduction accumulators between loops
This patch adds support for reusing a main loop's reduction accumulator in an epilogue loop. This in turn lets the loops share a single piece of vector->scalar reduction code.

The patch has the following restrictions:

(1) The epilogue reduction can only operate on a single vector (e.g. ncopies must be 1 for non-SLP reductions, and the group size must be <= the element count for SLP reductions).

(2) Both loops must use the same vector mode for their accumulators. This means that the patch is restricted to targets that support --param vect-partial-vector-usage=1.

(3) The reduction must be a standard “tree code” reduction.

However, these restrictions could be lifted in future. For example, if the main loop operates on 128-bit vectors and the epilogue loop operates on 64-bit vectors, we could in future reduce the 128-bit vector by one stage and use the 64-bit result as the starting point for the epilogue result.

The patch tries to handle chained SLP reductions, unchained SLP reductions and non-SLP reductions. It also handles cases in which the epilogue loop is entered directly (rather than via the main loop) and cases in which the epilogue loop can be skipped.

vect_get_main_loop_result is a bit more general than the current patch needs.

gcc/
	* tree-vectorizer.h (vect_reusable_accumulator): New structure.
	(_loop_vec_info::main_loop_edge): New field.
	(_loop_vec_info::skip_main_loop_edge): Likewise.
	(_loop_vec_info::skip_this_loop_edge): Likewise.
	(_loop_vec_info::reusable_accumulators): Likewise.
	(_stmt_vec_info::reduc_scalar_results): Likewise.
	(_stmt_vec_info::reused_accumulator): Likewise.
	(vect_get_main_loop_result): Declare.
	* tree-vectorizer.c (vec_info::new_stmt_vec_info): Initialize reduc_scalar_inputs.
	(vec_info::free_stmt_vec_info): Free reduc_scalar_inputs.
	* tree-vect-loop-manip.c (vect_get_main_loop_result): New function.
	(vect_do_peeling): Fill an epilogue loop's main_loop_edge, skip_main_loop_edge and skip_this_loop_edge fields.
	* tree-vect-loop.c (INCLUDE_ALGORITHM): Define.
	(vect_emit_reduction_init_stmts): New function.
	(get_initial_def_for_reduction): Use it.
	(get_initial_defs_for_reduction): Likewise. Change the vinfo parameter to a loop_vec_info.
	(vect_create_epilog_for_reduction): Store the scalar results in the reduc_info. If an epilogue loop is reusing an accumulator from the main loop, and if the epilogue loop can also be skipped, try to place the reduction code in the join block. Record accumulators that could potentially be reused by epilogue loops.
	(vect_transform_cycle_phi): When vectorizing epilogue loops, try to reuse accumulators from the main loop. Record the initial value in reduc_info for non-SLP reductions too.

gcc/testsuite/
	* gcc.target/aarch64/sve/reduc_9.c: New test.
	* gcc.target/aarch64/sve/reduc_9_run.c: Likewise.
	* gcc.target/aarch64/sve/reduc_10.c: Likewise.
	* gcc.target/aarch64/sve/reduc_10_run.c: Likewise.
	* gcc.target/aarch64/sve/reduc_11.c: Likewise.
	* gcc.target/aarch64/sve/reduc_11_run.c: Likewise.
	* gcc.target/aarch64/sve/reduc_12.c: Likewise.
	* gcc.target/aarch64/sve/reduc_12_run.c: Likewise.
	* gcc.target/aarch64/sve/reduc_13.c: Likewise.
	* gcc.target/aarch64/sve/reduc_13_run.c: Likewise.
	* gcc.target/aarch64/sve/reduc_14.c: Likewise.
	* gcc.target/aarch64/sve/reduc_14_run.c: Likewise.
	* gcc.target/aarch64/sve/reduc_15.c: Likewise.
	* gcc.target/aarch64/sve/reduc_15_run.c: Likewise.
--- .../gcc.target/aarch64/sve/reduc_10.c | 77 + .../gcc.target/aarch64/sve/reduc_10_run.c | 49 +++ .../gcc.target/aarch64/sve/reduc_11.c | 71 .../gcc.target/aarch64/sve/reduc_11_run.c | 34 ++ .../gcc.target/aarch64/sve/reduc_12.c | 71 .../gcc.target/aarch64/sve/reduc_12_run.c | 66 .../gcc.target/aarch64/sve/reduc_13.c | 101 ++ .../gcc.target/aarch64/sve/reduc_13_run.c | 61 .../gcc.target/aarch64/sve/reduc_14.c | 107 ++ .../gcc.target/aarch64/sve/reduc_14_run.c | 187 +++ .../gcc.target/aarch64/sve/reduc_15.c | 16 + .../gcc.target/aarch64/sve/reduc_15_run.c | 22 ++ .../gcc.target/aarch64/sve/reduc_9.c | 77 + .../gcc.target/aarch64/sve/reduc_9_run.c | 29 ++ gcc/tree-vect-loop-manip.c| 29 ++ gcc/tree-vect-loop.c | 309 ++ gcc/tree-vectorizer.c | 4 + gcc/tree-vectorizer.h | 51 ++- 18 files changed, 1297 insertions(+), 64 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/reduc_10.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/reduc_10_run
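The idea in the cover text can be pictured with a scalar model of a sum reduction. This is a hedged sketch only: `kLanes`, `Vec` and `sum_with_shared_accumulator` are hypothetical names, a four-lane `std::array` stands in for the vector mode, and the lane guard in the second loop is a rough stand-in for the partial vectors implied by restriction (2). The epilogue loop continues in the accumulator left by the main loop, so only one vector->scalar reduction is needed at the end.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kLanes = 4;          // assumed vector width
using Vec = std::array<long, kLanes>;

inline long sum_with_shared_accumulator(const std::vector<long> &data,
                                        long init) {
    Vec acc{};                             // zero is neutral for addition
    std::size_t i = 0;

    // Main loop: full vectors only.
    for (; i + kLanes <= data.size(); i += kLanes)
        for (std::size_t lane = 0; lane < kLanes; ++lane)
            acc[lane] += data[i + lane];

    // Epilogue loop: same "vector mode", inactive trailing lanes skipped.
    // It starts from the main loop's accumulator instead of a fresh one,
    // and also works when the main loop ran zero times.
    for (; i < data.size(); i += kLanes)
        for (std::size_t lane = 0; lane < kLanes; ++lane)
            if (i + lane < data.size())
                acc[lane] += data[i + lane];

    assert(i >= data.size());              // every element was consumed

    // Single vector->scalar reduction shared by both loops.
    long result = init;
    for (long x : acc)
        result += x;
    return result;
}
```

An input shorter than one vector models the case where the epilogue loop is entered directly, and an input whose length is a multiple of `kLanes` models the case where the epilogue loop is skipped.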
Re: [PATCH] PR tree-optimization/38943: Preserve trapping instructions with -fnon-call-exceptions
> This patch has been tested on x86_64-pc-linux-gnu with a "make
> bootstrap" and "make -k check" with no new failures. This should
> be relatively safe, as there are no changes in behaviour unless
> the user explicitly specifies -fnon-call-exceptions, when the C
> compiler then behaves more like the C++/Ada compiler.

I think this will pessimize Ada, which defaults to -fnon-call-exceptions but where we do *not* want to preserve trapping instructions just because they may trap (i.e. -fdelete-dead-exceptions is enabled by default).

And, as noticed by Richard, EH is orthogonal to side effects and pure/const.

--
Eric Botcazou
Re: [x86_64 PATCH]: Improvement to signed division of integer constant.
On Thu, Jul 8, 2021 at 10:25 AM Roger Sayle wrote:
>
>
> This patch tweaks the way GCC handles 32-bit integer division on
> x86_64, when the numerator is constant. Currently the function
>
> int foo (int x) {
>   return 100/x;
> }
>
> generates the code:
> foo:    movl    $100, %eax
>         cltd
>         idivl   %edi
>         ret
>
> where the sign-extension instruction "cltd" creates a long
> dependency chain, as it depends on the "mov" before it, and
> is depended upon by "idivl" after it.
>
> With this patch, GCC now matches both icc and LLVM and
> uses an xor instead, generating:
> foo:    xorl    %edx, %edx
>         movl    $100, %eax
>         idivl   %edi
>         ret
>
> Microbenchmarking confirms that this is faster on Intel
> processors (Kaby Lake), and no worse on AMD processors (Zen2),
> which agrees with intuition, but oddly disagrees with the
> llvm-mca cycle count prediction on godbolt.org.
>
> The tricky bit is that this sign-extension instruction is only
> produced by late (postreload) splitting, and unfortunately none
> of the subsequent passes (e.g. cprop_hardreg) is able to
> propagate and simplify its constant argument. The solution
> here is to introduce a define_insn_and_split that allows the
> constant numerator operand to be captured (by combine) and
> then split into an optimal form after reload.
>
> The above microbenchmarking also shows that eliminating the
> sign extension of negative values (using movl $-1,%edx) is also
> a performance improvement, as performed by icc but not by LLVM.
> Both the xor and movl sign-extensions are larger than cltd,
> so this transformation is prevented for -Os.
>
>
> This patch has been tested on x86_64-pc-linux-gnu with a "make
> bootstrap" and "make -k check" with no new failures.
>
> Ok for mainline?
>
>
> 2021-07-08 Roger Sayle
>
> gcc/ChangeLog
>         * config/i386/i386.md (*divmodsi4_const): Optimize SImode
>         divmod of a constant numerator with new define_insn_and_split.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/divmod-9.c: New test case.
+  if (INTVAL (operands[2]) < 0)
+    emit_move_insn (operands[1], constm1_rtx);
+  else
+    ix86_expand_clear (operands[1]);

No need to call ix86_expand_clear,

emit_move_insn (operands[1], const0_rtx);

will result in xor, too.

OK with the above change.

Thanks,
Uros.

>
>
> Roger
> --
> Roger Sayle
> NextMove Software
> Cambridge, UK
>
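For readers following along, here is a minimal C sketch of why the split can pick either "xor" or "movl $-1": when the numerator is a compile-time constant, the upper half that "cltd" would compute depends only on the sign of that constant. The function names below are hypothetical and only model the arithmetic; they are not part of the patch or of the i386 backend.

```c
#include <stdint.h>

/* Hypothetical helper: the value the upper register (EDX) must hold
   before "idivl" when the lower register (EAX) holds the 32-bit
   numerator N, i.e. the upper half of N sign-extended to 64 bits.
   This models what "cltd" computes at run time.  */
uint32_t
high_half_by_sign_extension (int32_t n)
{
  /* Converting to uint64_t is modular, so the shift is well defined.  */
  return (uint32_t) ((uint64_t) (int64_t) n >> 32);
}

/* The same value chosen from the sign of N alone, which is all that is
   needed when N is known at compile time: zero (the "xor" case) for a
   non-negative numerator, all ones (the "movl $-1" case) otherwise.  */
uint32_t
high_half_for_constant (int32_t n)
{
  return n < 0 ? UINT32_MAX : 0;
}
```

Both functions should agree for every 32-bit numerator, for example 100, -100 and INT32_MIN.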
Re: [PATCH 01/10] vect: Simplify epilogue reduction code
On Thu, Jul 8, 2021 at 2:41 PM Richard Sandiford via Gcc-patches wrote: > > vect_create_epilog_for_reduction only handles two cases: single-loop > reductions and double reductions. “nested cycles” (i.e. reductions > in the inner loop when vectorising an outer loop) are handled elsewhere > and don't need a vector->scalar reduction. > > The function had variables called nested_in_vect_loop and double_reduc > and asserted that nested_in_vect_loop implied double_reduc, but it > still had code to handle nested_in_vect_loop && !double_reduc. > This patch removes that and uses double_reduc everywhere. OK. (cleaning up after the GCC 10 time refactoring was still on my list :/) > gcc/ > * tree-vect-loop.c (vect_create_epilog_for_reduction): Remove > nested_in_vect_loop and use double_reduc everywhere. Remove dead > assignment to "loop". > --- > gcc/tree-vect-loop.c | 30 -- > 1 file changed, 4 insertions(+), 26 deletions(-) > > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c > index bc523d151c6..7c3e3352b43 100644 > --- a/gcc/tree-vect-loop.c > +++ b/gcc/tree-vect-loop.c > @@ -5005,7 +5005,6 @@ vect_create_epilog_for_reduction (loop_vec_info > loop_vinfo, >imm_use_iterator imm_iter, phi_imm_iter; >use_operand_p use_p, phi_use_p; >gimple *use_stmt; > - bool nested_in_vect_loop = false; >auto_vec new_phis; >int j, i; >auto_vec scalar_results; > @@ -5023,10 +5022,8 @@ vect_create_epilog_for_reduction (loop_vec_info > loop_vinfo, > { >outer_loop = loop; >loop = loop->inner; > - nested_in_vect_loop = true; > - gcc_assert (!slp_node); > + gcc_assert (!slp_node && double_reduc); > } > - gcc_assert (!nested_in_vect_loop || double_reduc); > >vectype = STMT_VINFO_REDUC_VECTYPE (reduc_info); >gcc_assert (vectype); > @@ -5049,8 +5046,6 @@ vect_create_epilog_for_reduction (loop_vec_info > loop_vinfo, > induc_val = STMT_VINFO_VEC_INDUC_COND_INITIAL_VAL (reduc_info); >else if (double_reduc) > ; > - else if (nested_in_vect_loop) > - ; >else > adjustment_def = 
STMT_VINFO_REDUC_EPILOGUE_ADJUSTMENT (reduc_info); > } > @@ -5923,7 +5918,7 @@ vect_create_epilog_for_reduction (loop_vec_info > loop_vinfo, > { >gcc_assert (!slp_reduc); >gimple_seq stmts = NULL; > - if (nested_in_vect_loop) > + if (double_reduc) > { >new_phi = new_phis[0]; > gcc_assert (VECTOR_TYPE_P (TREE_TYPE (adjustment_def))); > @@ -5942,21 +5937,12 @@ vect_create_epilog_for_reduction (loop_vec_info > loop_vinfo, > >epilog_stmt = gimple_seq_last_stmt (stmts); >gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT); > - if (nested_in_vect_loop) > -{ > - if (!double_reduc) > -scalar_results.quick_push (new_temp); > - else > -scalar_results[0] = new_temp; > -} > - else > -scalar_results[0] = new_temp; > - > + scalar_results[0] = new_temp; >new_phis[0] = epilog_stmt; > } > >if (double_reduc) > -loop = loop->inner; > +loop = outer_loop; > >/* 2.6 Handle the loop-exit phis. Replace the uses of scalar loop-exit >phis with new adjusted scalar results, i.e., replace use > @@ -6017,14 +6003,6 @@ vect_create_epilog_for_reduction (loop_vec_info > loop_vinfo, > scalar_dest = gimple_assign_lhs (scalar_stmt_info->stmt); > } > > - if (nested_in_vect_loop) > -{ > - if (double_reduc) > -loop = outer_loop; > - else > - gcc_unreachable (); > -} > - >phis.create (3); >/* Find the loop-closed-use at the loop exit of the original scalar > result. (The reduction result is expected to have two immediate > uses,
Re: [PATCH 02/10] vect: Create array_slice of live-out stmts
On Thu, Jul 8, 2021 at 2:42 PM Richard Sandiford via Gcc-patches wrote: > > This patch constructs an array_slice of the scalar statements that > produce live-out reduction results in the original unvectorised loop. > There are three cases: > > - SLP reduction chains: the final SLP stmt is live-out > - full SLP reductions: all SLP stmts are live-out > - non-SLP reductions: the single scalar stmt is live-out > > This is a slight simplification on its own, mostly because it means > “group_size” has a consistent meaning throughout the function. > The main justification though is that it helps with later patches. OK > gcc/ > * tree-vect-loop.c (vect_create_epilog_for_reduction): Truncate > scalar_results to group_size elements after reducing down from > N*group_size elements. Construct an array_slice of the live-out > stmts and assert that there is one stmt per scalar result. > --- > gcc/tree-vect-loop.c | 61 +++- > 1 file changed, 21 insertions(+), 40 deletions(-) > > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c > index 7c3e3352b43..8390ac80ca0 100644 > --- a/gcc/tree-vect-loop.c > +++ b/gcc/tree-vect-loop.c > @@ -5010,7 +5010,12 @@ vect_create_epilog_for_reduction (loop_vec_info > loop_vinfo, >auto_vec scalar_results; >unsigned int group_size = 1, k; >auto_vec phis; > - bool slp_reduc = false; > + /* SLP reduction without reduction chain, e.g., > + # a1 = phi > + # b1 = phi > + a2 = operation (a1) > + b2 = operation (b1) */ > + bool slp_reduc = (slp_node && !REDUC_GROUP_FIRST_ELEMENT (stmt_info)); >bool direct_slp_reduc; >tree new_phi_result; >tree induction_index = NULL_TREE; > @@ -5050,6 +5055,16 @@ vect_create_epilog_for_reduction (loop_vec_info > loop_vinfo, > adjustment_def = STMT_VINFO_REDUC_EPILOGUE_ADJUSTMENT (reduc_info); > } > > + stmt_vec_info single_live_out_stmt[] = { stmt_info }; > + array_slice live_out_stmts = single_live_out_stmt; > + if (slp_reduc) > +/* All statements produce live-out values. 
*/ > +live_out_stmts = SLP_TREE_SCALAR_STMTS (slp_node); > + else if (slp_node) > +/* The last statement in the reduction chain produces the live-out > + value. */ > +single_live_out_stmt[0] = SLP_TREE_SCALAR_STMTS (slp_node)[group_size - > 1]; > + >unsigned vec_num; >int ncopies; >if (slp_node) > @@ -5248,13 +5263,6 @@ vect_create_epilog_for_reduction (loop_vec_info > loop_vinfo, >new_scalar_dest = vect_create_destination_var (scalar_dest, NULL); >bitsize = TYPE_SIZE (scalar_type); > > - /* SLP reduction without reduction chain, e.g., > - # a1 = phi > - # b1 = phi > - a2 = operation (a1) > - b2 = operation (b1) */ > - slp_reduc = (slp_node && !REDUC_GROUP_FIRST_ELEMENT (stmt_info)); > - >/* True if we should implement SLP_REDUC using native reduction operations > instead of scalar operations. */ >direct_slp_reduc = (reduc_fn != IFN_LAST > @@ -5877,6 +5885,7 @@ vect_create_epilog_for_reduction (loop_vec_info > loop_vinfo, > first_res, res); >scalar_results[j % group_size] = new_res; > } > + scalar_results.truncate (group_size); > for (k = 0; k < group_size; k++) > scalar_results[k] = gimple_convert (&stmts, scalar_type, > scalar_results[k]); > @@ -5969,39 +5978,11 @@ vect_create_epilog_for_reduction (loop_vec_info > loop_vinfo, >use >use */ > > - > - /* In SLP reduction chain we reduce vector results into one vector if > - necessary, hence we set here REDUC_GROUP_SIZE to 1. SCALAR_DEST is the > - LHS of the last stmt in the reduction chain, since we are looking for > - the loop exit phi node. */ > - if (REDUC_GROUP_FIRST_ELEMENT (stmt_info)) > -{ > - stmt_vec_info dest_stmt_info > - = vect_orig_stmt (SLP_TREE_SCALAR_STMTS (slp_node)[group_size - 1]); > - scalar_dest = gimple_assign_lhs (dest_stmt_info->stmt); > - group_size = 1; > -} > - > - /* In SLP we may have several statements in NEW_PHIS and REDUCTION_PHIS (in > - case that REDUC_GROUP_SIZE is greater than vectorization factor). > - Therefore, we need to match SCALAR_RESULTS with corresponding > statements. 
> - The first (REDUC_GROUP_SIZE / number of new vector stmts) scalar results > - correspond to the first vector stmt, etc. > - (RATIO is equal to (REDUC_GROUP_SIZE / number of new vector stmts)). */ > - if (group_size > new_phis.length ()) > -gcc_assert (!(group_size % new_phis.length ())); > - > - for (k = 0; k < group_size; k++) > + gcc_assert (live_out_stmts.size () == scalar_results.length ()); > + for (k = 0; k < live_out_stmts.size (); k++) > { > - if (slp_reduc) > -{ > - stmt_vec_info scalar_stmt_info = SLP
Re: [PATCH 03/10] vect: Remove new_phis from
On Thu, Jul 8, 2021 at 2:43 PM Richard Sandiford via Gcc-patches wrote: > > vect_create_epilog_for_reduction had a variable called new_phis. > It collected the statements that produce the exit block definitions > of the vector reduction accumulators. Although those statements > are indeed phis initially, they are often replaced with normal > statements later, leading to puzzling code like: > > FOR_EACH_VEC_ELT (new_phis, i, new_phi) > { > int bit_offset; > if (gimple_code (new_phi) == GIMPLE_PHI) > vec_temp = PHI_RESULT (new_phi); > else > vec_temp = gimple_assign_lhs (new_phi); > > Also, although the array collects statements, in practice all users want > the lhs instead. > > This patch therefore replaces new_phis with a vector of gimple values > called “reduc_inputs”. > > Also, reduction chains and ncopies>1 were handled with identical code > (and there was a comment saying so). The patch unites them into > a single “if”. OK. Thanks, Richard. > gcc/ > * tree-vect-loop.c (vect_create_epilog_for_reduction): Replace > the new_phis vector with a reduc_inputs vector. Combine handling > of reduction chains and ncopies > 1. 
> --- > gcc/tree-vect-loop.c | 113 --- > 1 file changed, 41 insertions(+), 72 deletions(-) > > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c > index 8390ac80ca0..b7f73ca52c7 100644 > --- a/gcc/tree-vect-loop.c > +++ b/gcc/tree-vect-loop.c > @@ -5005,7 +5005,7 @@ vect_create_epilog_for_reduction (loop_vec_info > loop_vinfo, >imm_use_iterator imm_iter, phi_imm_iter; >use_operand_p use_p, phi_use_p; >gimple *use_stmt; > - auto_vec new_phis; > + auto_vec reduc_inputs; >int j, i; >auto_vec scalar_results; >unsigned int group_size = 1, k; > @@ -5017,7 +5017,6 @@ vect_create_epilog_for_reduction (loop_vec_info > loop_vinfo, > b2 = operation (b1) */ >bool slp_reduc = (slp_node && !REDUC_GROUP_FIRST_ELEMENT (stmt_info)); >bool direct_slp_reduc; > - tree new_phi_result; >tree induction_index = NULL_TREE; > >if (slp_node) > @@ -5215,7 +5214,7 @@ vect_create_epilog_for_reduction (loop_vec_info > loop_vinfo, >if (double_reduc) > loop = outer_loop; >exit_bb = single_exit (loop)->dest; > - new_phis.create (slp_node ? vec_num : ncopies); > + reduc_inputs.create (slp_node ? 
vec_num : ncopies); >for (unsigned i = 0; i < vec_num; i++) > { >if (slp_node) > @@ -5223,19 +5222,14 @@ vect_create_epilog_for_reduction (loop_vec_info > loop_vinfo, >else > def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[0]); >for (j = 0; j < ncopies; j++) > -{ > + { > tree new_def = copy_ssa_name (def); > - phi = create_phi_node (new_def, exit_bb); > - if (j == 0) > -new_phis.quick_push (phi); > - else > - { > - def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]); > - new_phis.quick_push (phi); > - } > - > - SET_PHI_ARG_DEF (phi, single_exit (loop)->dest_idx, def); > -} > + phi = create_phi_node (new_def, exit_bb); > + if (j) > + def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]); > + SET_PHI_ARG_DEF (phi, single_exit (loop)->dest_idx, def); > + reduc_inputs.quick_push (new_def); > + } > } > >exit_gsi = gsi_after_labels (exit_bb); > @@ -5274,52 +5268,32 @@ vect_create_epilog_for_reduction (loop_vec_info > loop_vinfo, > a2 = operation (a1) > a3 = operation (a2), > > - we may end up with more than one vector result. Here we reduce them to > - one vector. */ > - if (REDUC_GROUP_FIRST_ELEMENT (stmt_info) || direct_slp_reduc) > + we may end up with more than one vector result. Here we reduce them > + to one vector. > + > + The same is true if we couldn't use a single defuse cycle. 
*/ > + if (REDUC_GROUP_FIRST_ELEMENT (stmt_info) > + || direct_slp_reduc > + || ncopies > 1) > { >gimple_seq stmts = NULL; > - tree first_vect = PHI_RESULT (new_phis[0]); > - first_vect = gimple_convert (&stmts, vectype, first_vect); > - for (k = 1; k < new_phis.length (); k++) > + tree first_vect = gimple_convert (&stmts, vectype, reduc_inputs[0]); > + for (k = 1; k < reduc_inputs.length (); k++) > { > - gimple *next_phi = new_phis[k]; > - tree second_vect = PHI_RESULT (next_phi); > - second_vect = gimple_convert (&stmts, vectype, second_vect); > + tree second_vect = gimple_convert (&stmts, vectype, > reduc_inputs[k]); >first_vect = gimple_build (&stmts, code, vectype, > first_vect, second_vect); > } >gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT); > > - new_phi_result = fi
Re: [PATCH 04/10] vect: Ensure reduc_inputs always have vectype
On Thu, Jul 8, 2021 at 2:44 PM Richard Sandiford via Gcc-patches wrote: > > Vector reduction accumulators can differ in signedness from the > final scalar result. The conversions to handle that case were > distributed through vect_create_epilog_for_reduction; this patch > does the conversion up-front instead. But is that still correct? The conversions should be unsigned -> signed, that is, we've performed the reduction in unsigned because we associated the originally undefined overflow signed reduction. But the final reduction of the vector lanes in the epilogue still needs to be done unsigned. So it's just not obvious that the patch preserves this - if it does then the patch is OK. Richard. > gcc/ > * tree-vect-loop.c (vect_create_epilog_for_reduction): Convert > the phi results to vectype after creating them. Remove later > conversion code that thus becomes redundant. > --- > gcc/tree-vect-loop.c | 28 +++- > 1 file changed, 11 insertions(+), 17 deletions(-) > > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c > index b7f73ca52c7..1bd9a6ea52c 100644 > --- a/gcc/tree-vect-loop.c > +++ b/gcc/tree-vect-loop.c > @@ -5214,9 +5214,11 @@ vect_create_epilog_for_reduction (loop_vec_info > loop_vinfo, >if (double_reduc) > loop = outer_loop; >exit_bb = single_exit (loop)->dest; > + exit_gsi = gsi_after_labels (exit_bb); >reduc_inputs.create (slp_node ? 
vec_num : ncopies); >for (unsigned i = 0; i < vec_num; i++) > { > + gimple_seq stmts = NULL; >if (slp_node) > def = vect_get_slp_vect_def (slp_node, i); >else > @@ -5228,12 +5230,12 @@ vect_create_epilog_for_reduction (loop_vec_info > loop_vinfo, > if (j) > def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]); > SET_PHI_ARG_DEF (phi, single_exit (loop)->dest_idx, def); > + new_def = gimple_convert (&stmts, vectype, new_def); > reduc_inputs.quick_push (new_def); > } > + gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT); > } > > - exit_gsi = gsi_after_labels (exit_bb); > - >/* 2.2 Get the relevant tree-code to use in the epilog for schemes 2,3 > (i.e. when reduc_fn is not available) and in the final adjustment > code (if needed). Also get the original scalar reduction variable as > @@ -5277,17 +5279,14 @@ vect_create_epilog_for_reduction (loop_vec_info > loop_vinfo, >|| ncopies > 1) > { >gimple_seq stmts = NULL; > - tree first_vect = gimple_convert (&stmts, vectype, reduc_inputs[0]); > + tree single_input = reduc_inputs[0]; >for (k = 1; k < reduc_inputs.length (); k++) > -{ > - tree second_vect = gimple_convert (&stmts, vectype, > reduc_inputs[k]); > - first_vect = gimple_build (&stmts, code, vectype, > -first_vect, second_vect); > -} > + single_input = gimple_build (&stmts, code, vectype, > +single_input, reduc_inputs[k]); >gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT); > >reduc_inputs.truncate (0); > - reduc_inputs.safe_push (first_vect); > + reduc_inputs.safe_push (single_input); > } > >if (STMT_VINFO_REDUC_TYPE (reduc_info) == COND_REDUCTION > @@ -5323,10 +5322,6 @@ vect_create_epilog_for_reduction (loop_vec_info > loop_vinfo, >/* Vector of {0, 0, 0,...}. */ >tree zero_vec = build_zero_cst (vectype); > > - gimple_seq stmts = NULL; > - reduc_inputs[0] = gimple_convert (&stmts, vectype, reduc_inputs[0]); > - gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT); > - >/* Find maximum value from the vector of found indexes. 
*/ >tree max_index = make_ssa_name (index_scalar_type); >gcall *max_index_stmt = gimple_build_call_internal (IFN_REDUC_MAX, > @@ -5394,7 +5389,7 @@ vect_create_epilog_for_reduction (loop_vec_info > loop_vinfo, > >/* Convert the reduced value back to the result type and set as the > result. */ > - stmts = NULL; > + gimple_seq stmts = NULL; >new_temp = gimple_build (&stmts, VIEW_CONVERT_EXPR, scalar_type, >data_reduc); >gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT); > @@ -5412,7 +5407,7 @@ vect_create_epilog_for_reduction (loop_vec_info > loop_vinfo, > val = data_reduc[i], idx_val = induction_index[i]; > return val; */ > > - tree data_eltype = TREE_TYPE (TREE_TYPE (reduc_inputs[0])); > + tree data_eltype = TREE_TYPE (vectype); >tree idx_eltype = TREE_TYPE (TREE_TYPE (induction_index)); >unsigned HOST_WIDE_INT el_size = tree_to_uhwi (TYPE_SIZE (idx_eltype)); >poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE > (induction_index)); > @@ -5488,8 +5483,7 @@ vect_create_epilog_for_reduction (loop_vec_info > loop_vinfo, >
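To make the signedness concern in the review above concrete, here is a small standalone C sketch. It is not vectorizer code: the function name and the two-lane layout are assumptions chosen for illustration. It shows a signed sum whose in-order evaluation never overflows, but whose reassociated per-lane partial sums would, which is why both the accumulation and the final combination of the lanes are done in an unsigned type, with a single conversion back to the signed result at the very end.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum N elements using two independent accumulators, the way a two-lane
   vector loop would, then combine the lanes in an "epilogue".
   Everything up to the final conversion is done in uint32_t, where
   wrap-around is well defined.  */
int32_t
sum_two_lanes (const int32_t *data, size_t n)
{
  uint32_t lane[2] = { 0, 0 };

  /* Reassociated accumulation: lane 0 sees the even elements and lane 1
     the odd ones, so partial sums can exceed the signed range even when
     the original left-to-right sum does not.  */
  for (size_t i = 0; i < n; i++)
    lane[i % 2] += (uint32_t) data[i];

  /* Epilogue: combining the lanes must still be unsigned, since the
     partial sums themselves may already have wrapped.  */
  uint32_t total = lane[0] + lane[1];

  /* Single conversion back to the signed result type.  For values above
     INT32_MAX this conversion is implementation-defined in ISO C before
     C23; GCC documents it as modular.  */
  return (int32_t) total;
}
```

With the input { INT32_MAX, -1, 1, -INT32_MAX } the in-order signed sum stays in range throughout, whereas lane 0 alone holds INT32_MAX + 1, which would be undefined behaviour if computed as a signed addition.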
Re: [PATCH 05/10] vect: Add a vect_phi_initial_value helper function
On Thu, Jul 8, 2021 at 2:45 PM Richard Sandiford via Gcc-patches wrote: > > This patch adds a helper function called vect_phi_initial_value > for returning the incoming value of a given loop phi. The main > reason for adding it is to ensure that the right preheader edge > is used when vectorising nested loops. (PHI_ARG_DEF_FROM_EDGE > itself doesn't assert that the given edge is for the right block, > although I guess that would be good to add separately.) We were sometimes (most of the time?) using an explicit loop where you now get it from the PHI - that makes the assert somewhat pointless to some extent - of course it makes sense on its own that the loop is the same as that of the PHI def. I just wonder if you think any of the existing code might have been wrong? If so the new assert doesn't catch all originally wrong cases. Otherwise OK, Richard. > gcc/ > * tree-vectorizer.h: Include tree-ssa-operands.h. > (vect_phi_initial_value): New function. > * tree-vect-loop.c (neutral_op_for_slp_reduction): Use it. > (get_initial_defs_for_reduction, info_for_reduction): Likewise. > (vect_create_epilog_for_reduction, vectorizable_reduction): Likewise. > (vect_transform_cycle_phi, vectorizable_induction): Likewise. > --- > gcc/tree-vect-loop.c | 29 + > gcc/tree-vectorizer.h | 21 - > 2 files changed, 29 insertions(+), 21 deletions(-) > > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c > index 1bd9a6ea52c..a31d7621c3b 100644 > --- a/gcc/tree-vect-loop.c > +++ b/gcc/tree-vect-loop.c > @@ -3288,8 +3288,7 @@ neutral_op_for_slp_reduction (slp_tree slp_node, tree > vector_type, > has only a single initial value, so that value is neutral for > all statements. */ >if (reduc_chain) > - return PHI_ARG_DEF_FROM_EDGE (stmt_vinfo->stmt, > - loop_preheader_edge (loop)); > + return vect_phi_initial_value (stmt_vinfo); >return NULL_TREE; > > default: > @@ -4829,13 +4828,13 @@ get_initial_defs_for_reduction (vec_info *vinfo, >/* Get the def before the loop. 
In reduction chain we have only > one initial value. Else we have as many as PHIs in the group. */ >if (reduc_chain) > - op = j != 0 ? neutral_op : PHI_ARG_DEF_FROM_EDGE (stmt_vinfo->stmt, > pe); > + op = j != 0 ? neutral_op : vect_phi_initial_value (stmt_vinfo); >else if (((vec_oprnds->length () + 1) * nunits > - number_of_places_left_in_vector >= group_size) >&& neutral_op) > op = neutral_op; >else > - op = PHI_ARG_DEF_FROM_EDGE (stmt_vinfo->stmt, pe); > + op = vect_phi_initial_value (stmt_vinfo); > >/* Create 'vect_ = {op0,op1,...,opn}'. */ >number_of_places_left_in_vector--; > @@ -4906,9 +4905,7 @@ info_for_reduction (vec_info *vinfo, stmt_vec_info > stmt_info) > } >else if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle) > { > - edge pe = loop_preheader_edge (gimple_bb (phi)->loop_father); > - stmt_vec_info info > - = vinfo->lookup_def (PHI_ARG_DEF_FROM_EDGE (phi, pe)); > + stmt_vec_info info = vinfo->lookup_def (vect_phi_initial_value (phi)); >if (info && STMT_VINFO_DEF_TYPE (info) == vect_double_reduction_def) > stmt_info = info; > } > @@ -5042,8 +5039,7 @@ vect_create_epilog_for_reduction (loop_vec_info > loop_vinfo, > { >/* Get at the scalar def before the loop, that defines the initial > value > of the reduction variable. */ > - initial_def = PHI_ARG_DEF_FROM_EDGE (reduc_def_stmt, > - loop_preheader_edge (loop)); > + initial_def = vect_phi_initial_value (reduc_def_stmt); >/* Optimize: for induction condition reduction, if we can't use zero > for induc_val, use initial_def. */ >if (STMT_VINFO_REDUC_TYPE (reduc_info) == INTEGER_INDUC_COND_REDUCTION) > @@ -5558,9 +5554,7 @@ vect_create_epilog_for_reduction (loop_vec_info > loop_vinfo, > for MIN and MAX reduction, for example. 
*/ > if (!neutral_op) > { > - tree scalar_value > - = PHI_ARG_DEF_FROM_EDGE (orig_phis[i]->stmt, > -loop_preheader_edge (loop)); > + tree scalar_value = vect_phi_initial_value (orig_phis[i]); > scalar_value = gimple_convert (&seq, TREE_TYPE (vectype), > scalar_value); > vector_identity = gimple_build_vector_from_val (&seq, vectype, > @@ -6752,10 +6746,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo, >else if (cond_reduc_dt == vect_constant_def) > { > enum vect_def_type cond_initial_dt; > - tree cond_initial_val > - = PHI_ARG_DEF_FROM_EDGE (reduc_def_phi, loop_preheader_edge >
RE: [PATCH] testsuite: Add arm_arch_v7a_ok effective-target to pr57351.c
> -Original Message-
> From: Gcc-patches bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Christophe
> Lyon via Gcc-patches
> Sent: 07 July 2021 13:24
> To: Christophe LYON
> Cc: gcc Patches
> Subject: Re: [PATCH] testsuite: Add arm_arch_v7a_ok effective-target to
> pr57351.c
>
> ping?
>
> On Wed, Jun 30, 2021 at 3:58 PM Christophe LYON via Gcc-patches <
> gcc-patches@gcc.gnu.org> wrote:
>
> > I've noticed that overriding cpu/arch flags when running the testsuite
> > can cause this test to fail rather than being skipped because of
> > incompatible flags combination.
> >
> > Since the test forces -march=armv7-a, make sure it is accepted in
> > combination with the current runtestflags.

Ok, I would have counted it as obvious I suppose.
Thanks,
Kyrill

> >
> > 2021-06-30 Christophe Lyon
> >
> > gcc/testsuite/
> >         * gcc.dg/debug/pr57351.c: Require arm_arch_v7a_ok
> > effective-target.
> >
> >
> >
> >
Re: [PATCH 06/10] vect: Pass reduc_info to get_initial_defs_for_reduction
On Thu, Jul 8, 2021 at 2:46 PM Richard Sandiford via Gcc-patches wrote: > > This patch passes the reduc_info to get_initial_defs_for_reduction, > so that the function can get general information from there rather > than from the first SLP statement. This isn't a win on its own, > but it becomes important with later patches. So the original code should have used SLP_TREE_REPRESENTATIVE instead of SLP_TREE_SCALAR_STMTS ()[0] (there might have been issues with doing that - my recollection is weak here). I'm not sure if reduc_info is actually better - only the representative will have STMT_VINFO_VECTYPE set, for the reduc_info there's STMT_VINFO_REDUC_VECTYPE (and STMT_VINFO_REDUC_VECTYPE_IN). So I think if you want to use reduc_info then you want to use STMT_VINFO_REDUC_VECTYPE? > gcc/ > * tree-vect-loop.c (get_initial_defs_for_reduction): Take the > reduc_info as an additional parameter. > (vect_transform_cycle_phi): Update accordingly. > --- > gcc/tree-vect-loop.c | 23 ++- > 1 file changed, 10 insertions(+), 13 deletions(-) > > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c > index a31d7621c3b..565c2859477 100644 > --- a/gcc/tree-vect-loop.c > +++ b/gcc/tree-vect-loop.c > @@ -4764,32 +4764,28 @@ get_initial_def_for_reduction (loop_vec_info > loop_vinfo, >return init_def; > } > > -/* Get at the initial defs for the reduction PHIs in SLP_NODE. > - NUMBER_OF_VECTORS is the number of vector defs to create. > - If NEUTRAL_OP is nonnull, introducing extra elements of that > - value will not change the result. */ > +/* Get at the initial defs for the reduction PHIs for REDUC_INFO, whose > + associated SLP node is SLP_NODE. NUMBER_OF_VECTORS is the number of > vector > + defs to create. If NEUTRAL_OP is nonnull, introducing extra elements of > + that value will not change the result. 
*/ > > static void > get_initial_defs_for_reduction (vec_info *vinfo, > + stmt_vec_info reduc_info, > slp_tree slp_node, > vec *vec_oprnds, > unsigned int number_of_vectors, > bool reduc_chain, tree neutral_op) > { >vec stmts = SLP_TREE_SCALAR_STMTS (slp_node); > - stmt_vec_info stmt_vinfo = stmts[0]; >unsigned HOST_WIDE_INT nunits; >unsigned j, number_of_places_left_in_vector; > - tree vector_type; > + tree vector_type = STMT_VINFO_VECTYPE (reduc_info); >unsigned int group_size = stmts.length (); >unsigned int i; >class loop *loop; > > - vector_type = STMT_VINFO_VECTYPE (stmt_vinfo); > - > - gcc_assert (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def); > - > - loop = (gimple_bb (stmt_vinfo->stmt))->loop_father; > + loop = (gimple_bb (reduc_info->stmt))->loop_father; >gcc_assert (loop); >edge pe = loop_preheader_edge (loop); > > @@ -4823,7 +4819,7 @@ get_initial_defs_for_reduction (vec_info *vinfo, > { >tree op; >i = j % group_size; > - stmt_vinfo = stmts[i]; > + stmt_vec_info stmt_vinfo = stmts[i]; > >/* Get the def before the loop. In reduction chain we have only > one initial value. Else we have as many as PHIs in the group. */ > @@ -7510,7 +7506,8 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo, > = neutral_op_for_slp_reduction (slp_node, vectype_out, > STMT_VINFO_REDUC_CODE > (reduc_info), > first != NULL); > - get_initial_defs_for_reduction (loop_vinfo, > slp_node_instance->reduc_phis, > + get_initial_defs_for_reduction (loop_vinfo, reduc_info, > + slp_node_instance->reduc_phis, > &vec_initial_defs, vec_num, > first != NULL, neutral_op); > }
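The comment in the patch above says that, when NEUTRAL_OP is nonnull, introducing extra elements of that value will not change the result. A rough standalone C sketch of that idea follows. It does not use any GCC internals: the enum, the fixed lane count and both function names are hypothetical. Zero is used as the neutral value for addition, and for MAX the initial value itself serves as the neutral value, which is the single-initial-value case mentioned earlier in the thread.

```c
#include <stddef.h>

#define NLANES 4

/* Hypothetical reduction kinds, standing in for tree codes.  */
enum reduc_kind { REDUC_ADD, REDUC_MAX };

/* Build the initial accumulator "vector" for a reduction that starts
   from the single scalar value INITIAL.  Lane 0 receives INITIAL and
   the remaining lanes are padded with a neutral value.  */
void
fill_initial_lanes (int lanes[NLANES], enum reduc_kind kind, int initial)
{
  /* For addition the neutral value is 0.  For MAX, repeating the
     initial value is harmless, so it doubles as the neutral value.  */
  int neutral = (kind == REDUC_ADD) ? 0 : initial;

  lanes[0] = initial;
  for (size_t i = 1; i < NLANES; i++)
    lanes[i] = neutral;
}

/* Reduce the lanes back to a scalar, as a reduction epilogue would.  */
int
reduce_lanes (const int lanes[NLANES], enum reduc_kind kind)
{
  int result = lanes[0];
  for (size_t i = 1; i < NLANES; i++)
    {
      if (kind == REDUC_ADD)
        result += lanes[i];
      else if (lanes[i] > result)
        result = lanes[i];
    }
  return result;
}
```

Filling and then immediately reducing should give back the initial value for either kind, which is the property the padding relies on.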
Re: [PATCH 05/10] vect: Add a vect_phi_initial_value helper function
Richard Biener writes: > On Thu, Jul 8, 2021 at 2:45 PM Richard Sandiford via Gcc-patches > wrote: >> >> This patch adds a helper function called vect_phi_initial_value >> for returning the incoming value of a given loop phi. The main >> reason for adding it is to ensure that the right preheader edge >> is used when vectorising nested loops. (PHI_ARG_DEF_FROM_EDGE >> itself doesn't assert that the given edge is for the right block, >> although I guess that would be good to add separately.) > > We were sometimes (most of the time?) using an explicit > loop where you now get it from the PHI - that makes the > assert somewhat pointless to some extent - of course it > makes sense on its own that the loop is the same as that > of the PHI def. I just wonder if you think any of the existing > code might have been wrong? If so the new assert doesn't > catch all originally wrong cases. I don't remember seeing a case where the existing code got it wrong, but I think one of the patches in the series did initially use the wrong loop's preheader. But yeah, the function and assert only help to avoid using PHI_ARG_DEF_FROM_EDGE with the wrong edge. If the problem was instead passing the wrong phi then the patch doesn't help to catch that. The edge mistake is more likely to be a silent failure though, since the edge indices for both loops might happen to be the same (but might not). Thanks, Richard > > Otherwise OK, > Richard. > >> gcc/ >> * tree-vectorizer.h: Include tree-ssa-operands.h. >> (vect_phi_initial_value): New function. >> * tree-vect-loop.c (neutral_op_for_slp_reduction): Use it. >> (get_initial_defs_for_reduction, info_for_reduction): Likewise. >> (vect_create_epilog_for_reduction, vectorizable_reduction): Likewise. >> (vect_transform_cycle_phi, vectorizable_induction): Likewise. 
>> --- >> gcc/tree-vect-loop.c | 29 + >> gcc/tree-vectorizer.h | 21 - >> 2 files changed, 29 insertions(+), 21 deletions(-) >> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c >> index 1bd9a6ea52c..a31d7621c3b 100644 >> --- a/gcc/tree-vect-loop.c >> +++ b/gcc/tree-vect-loop.c >> @@ -3288,8 +3288,7 @@ neutral_op_for_slp_reduction (slp_tree slp_node, tree >> vector_type, >> has only a single initial value, so that value is neutral for >> all statements. */ >>if (reduc_chain) >> - return PHI_ARG_DEF_FROM_EDGE (stmt_vinfo->stmt, >> - loop_preheader_edge (loop)); >> + return vect_phi_initial_value (stmt_vinfo); >>return NULL_TREE; >> >> default: >> @@ -4829,13 +4828,13 @@ get_initial_defs_for_reduction (vec_info *vinfo, >>/* Get the def before the loop. In reduction chain we have only >> one initial value. Else we have as many as PHIs in the group. */ >>if (reduc_chain) >> - op = j != 0 ? neutral_op : PHI_ARG_DEF_FROM_EDGE (stmt_vinfo->stmt, >> pe); >> + op = j != 0 ? neutral_op : vect_phi_initial_value (stmt_vinfo); >>else if (((vec_oprnds->length () + 1) * nunits >> - number_of_places_left_in_vector >= group_size) >>&& neutral_op) >> op = neutral_op; >>else >> - op = PHI_ARG_DEF_FROM_EDGE (stmt_vinfo->stmt, pe); >> + op = vect_phi_initial_value (stmt_vinfo); >> >>/* Create 'vect_ = {op0,op1,...,opn}'. 
*/ >>number_of_places_left_in_vector--; >> @@ -4906,9 +4905,7 @@ info_for_reduction (vec_info *vinfo, stmt_vec_info >> stmt_info) >> } >>else if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle) >> { >> - edge pe = loop_preheader_edge (gimple_bb (phi)->loop_father); >> - stmt_vec_info info >> - = vinfo->lookup_def (PHI_ARG_DEF_FROM_EDGE (phi, pe)); >> + stmt_vec_info info = vinfo->lookup_def (vect_phi_initial_value (phi)); >>if (info && STMT_VINFO_DEF_TYPE (info) == vect_double_reduction_def) >> stmt_info = info; >> } >> @@ -5042,8 +5039,7 @@ vect_create_epilog_for_reduction (loop_vec_info >> loop_vinfo, >> { >>/* Get at the scalar def before the loop, that defines the initial >> value >> of the reduction variable. */ >> - initial_def = PHI_ARG_DEF_FROM_EDGE (reduc_def_stmt, >> - loop_preheader_edge (loop)); >> + initial_def = vect_phi_initial_value (reduc_def_stmt); >>/* Optimize: for induction condition reduction, if we can't use zero >> for induc_val, use initial_def. */ >>if (STMT_VINFO_REDUC_TYPE (reduc_info) == >> INTEGER_INDUC_COND_REDUCTION) >> @@ -5558,9 +5554,7 @@ vect_create_epilog_for_reduction (loop_vec_info >> loop_vinfo, >> for MIN and MAX reduction, for example. */ >> if (!neutral_op) >> { >> - tree scalar_value >> - = PHI_ARG_DEF_FROM_EDGE (orig_phis[i]->stmt, >> -
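As an illustration of the "silent failure" described above, here is a toy C model. None of these types or functions are GCC's: struct edge, struct block, struct phi, phi_arg_from_edge and phi_initial_value are all made up for this sketch, and the real helper in the patch may differ in detail. The unchecked lookup indexes the phi arguments by the edge's position among its destination's incoming edges, so an edge that belongs to a different loop can still return a plausible value. The checked helper derives the edge from the phi's own block and asserts that it really enters that block.

```c
#include <assert.h>
#include <stddef.h>

struct block;

/* Toy control-flow edge.  DEST_IDX is the position of the edge among
   the incoming edges of DEST, and so the index of the matching phi
   argument.  */
struct edge
{
  struct block *dest;
  size_t dest_idx;
};

/* Toy loop header block, recording the preheader edge of its loop.  */
struct block
{
  struct edge *preheader_edge;
};

/* Toy phi node with one argument per incoming edge of BB.  */
struct phi
{
  struct block *bb;
  int args[2];
};

/* Unchecked lookup: nothing verifies that E enters PHI's block.  */
int
phi_arg_from_edge (const struct phi *phi, const struct edge *e)
{
  return phi->args[e->dest_idx];
}

/* Checked lookup: take the preheader edge from PHI's own block and
   assert that it is an incoming edge of that block.  */
int
phi_initial_value (const struct phi *phi)
{
  const struct edge *pe = phi->bb->preheader_edge;
  assert (pe->dest == phi->bb);
  return phi_arg_from_edge (phi, pe);
}
```

If an outer loop's preheader edge happens to have the same index as the inner loop's, passing it to phi_arg_from_edge returns the right value by accident; if the index differs, it quietly returns the wrong argument instead of failing.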
Re: [PATCH 08/10] vect: Generalise neutral_op_for_slp_reduction
On Thu, Jul 8, 2021 at 2:48 PM Richard Sandiford via Gcc-patches wrote: > > This patch generalises the interface to neutral_op_for_slp_reduction > so that it can be used for non-SLP reductions too. This isn't much > of a win on its own, but it helps later patches. I guess that makes sense - OK. Richard. > gcc/ > * tree-vect-loop.c (neutral_op_for_slp_reduction): Replace with... > (neutral_op_for_reduction): ...this, providing a more general > interface. > (vect_create_epilog_for_reduction): Update accordingly. > (vectorizable_reduction): Likewise. > (vect_transform_cycle_phi): Likewise. > --- > gcc/tree-vect-loop.c | 59 +++- > 1 file changed, 26 insertions(+), 33 deletions(-) > > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c > index a67036f92e0..744645d8bad 100644 > --- a/gcc/tree-vect-loop.c > +++ b/gcc/tree-vect-loop.c > @@ -3248,23 +3248,15 @@ reduction_fn_for_scalar_code (enum tree_code code, > internal_fn *reduc_fn) > } > } > > -/* If there is a neutral value X such that SLP reduction NODE would not > - be affected by the introduction of additional X elements, return that X, > - otherwise return null. CODE is the code of the reduction and VECTOR_TYPE > - is the vector type that would hold element X. REDUC_CHAIN is true if > - the SLP statements perform a single reduction, false if each statement > - performs an independent reduction. */ > +/* If there is a neutral value X such that a reduction would not be affected > + by the introduction of additional X elements, return that X, otherwise > + return null. CODE is the code of the reduction and SCALAR_TYPE is type > + of the scalar elements. If the reduction has just a single initial value > + then INITIAL_VALUE is that value, otherwise it is null. 
*/ > > static tree > -neutral_op_for_slp_reduction (slp_tree slp_node, tree vector_type, > - tree_code code, bool reduc_chain) > +neutral_op_for_reduction (tree scalar_type, tree_code code, tree > initial_value) > { > - vec stmts = SLP_TREE_SCALAR_STMTS (slp_node); > - stmt_vec_info stmt_vinfo = stmts[0]; > - tree scalar_type = TREE_TYPE (vector_type); > - class loop *loop = gimple_bb (stmt_vinfo->stmt)->loop_father; > - gcc_assert (loop); > - >switch (code) > { > case WIDEN_SUM_EXPR: > @@ -3284,12 +3276,7 @@ neutral_op_for_slp_reduction (slp_tree slp_node, tree > vector_type, > > case MAX_EXPR: > case MIN_EXPR: > - /* For MIN/MAX the initial values are neutral. A reduction chain > -has only a single initial value, so that value is neutral for > -all statements. */ > - if (reduc_chain) > - return vect_phi_initial_value (stmt_vinfo); > - return NULL_TREE; > + return initial_value; > > default: >return NULL_TREE; > @@ -5535,10 +5522,11 @@ vect_create_epilog_for_reduction (loop_vec_info > loop_vinfo, >tree neutral_op = NULL_TREE; >if (slp_node) > { > - stmt_vec_info first = REDUC_GROUP_FIRST_ELEMENT (stmt_info); > - neutral_op > - = neutral_op_for_slp_reduction (slp_node_instance->reduc_phis, > - vectype, code, first != NULL); > + tree initial_value = NULL_TREE; > + if (REDUC_GROUP_FIRST_ELEMENT (stmt_info)) > + initial_value = vect_phi_initial_value (orig_phis[0]); > + neutral_op = neutral_op_for_reduction (TREE_TYPE (vectype), code, > +initial_value); > } >if (neutral_op) > vector_identity = gimple_build_vector_from_val (&seq, vectype, > @@ -6935,9 +6923,13 @@ vectorizable_reduction (loop_vec_info loop_vinfo, >/* For SLP reductions, see if there is a neutral value we can use. 
*/ >tree neutral_op = NULL_TREE; >if (slp_node) > -neutral_op = neutral_op_for_slp_reduction > - (slp_node_instance->reduc_phis, vectype_out, orig_code, > - REDUC_GROUP_FIRST_ELEMENT (stmt_info) != NULL); > +{ > + tree initial_value = NULL_TREE; > + if (REDUC_GROUP_FIRST_ELEMENT (stmt_info) != NULL) > + initial_value = vect_phi_initial_value (reduc_def_phi); > + neutral_op = neutral_op_for_reduction (TREE_TYPE (vectype_out), > +orig_code, initial_value); > +} > >if (double_reduc && reduction_type == FOLD_LEFT_REDUCTION) > { > @@ -7501,15 +7493,16 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo, >else > { > gcc_assert (slp_node == slp_node_instance->reduc_phis); > - stmt_vec_info first = REDUC_GROUP_FIRST_ELEMENT (reduc_stmt_info); > - tree neutral_op > - = neutral_op_for_slp_reduction (slp_node, vectype_out, > - STMT_VINFO_REDUC_CODE > (reduc_info), > -
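As a rough illustration of the interface described in the patch above, here is a standalone C++ sketch of what a neutral value is for a few reduction codes. It is not GCC code: ReductionCode, MaybeInt, some, none and neutral_op_sketch are hypothetical names, a plain int stands in for a tree, and the zero, one and all-ones entries are the usual algebraic identities rather than anything copied from tree-vect-loop.c. Only the MIN/MAX case, which hands back the single initial value when there is one, is taken directly from the hunk quoted above.

```cpp
#include <cassert>

// Hypothetical stand-ins for the tree codes named in the patch.
enum class ReductionCode { Plus, Mult, BitAnd, BitIor, BitXor, Min, Max, Other };

// A scalar that may be absent; plays the role of a possibly-null tree.
struct MaybeInt
{
  bool present;
  int value;
};

inline MaybeInt some (int v) { return MaybeInt{true, v}; }
inline MaybeInt none () { return MaybeInt{false, 0}; }

// Return a value X such that adding extra X elements to the reduction does
// not change its result, or none () if no such value is known.
// INITIAL_VALUE is the single initial value of the reduction, if it has one.
inline MaybeInt
neutral_op_sketch (ReductionCode code, MaybeInt initial_value)
{
  switch (code)
    {
    case ReductionCode::Plus:
    case ReductionCode::BitIor:
    case ReductionCode::BitXor:
      return some (0);        // x + 0 == x, x | 0 == x, x ^ 0 == x
    case ReductionCode::Mult:
      return some (1);        // x * 1 == x
    case ReductionCode::BitAnd:
      return some (-1);       // all bits set: x & ~0 == x
    case ReductionCode::Min:
    case ReductionCode::Max:
      // Only the initial value itself is known to be neutral, and only
      // when there is exactly one of them.
      return initial_value;
    default:
      return none ();
    }
}
```

With this shape the caller decides whether a single initial value exists, which is what lets the same helper serve both SLP and non-SLP reductions.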
Re: [PATCH 09/10] vect: Simplify get_initial_def_for_reduction
On Thu, Jul 8, 2021 at 2:49 PM Richard Sandiford via Gcc-patches wrote: > > After previous patches, we can now easily provide the neutral op > as an argument to get_initial_def_for_reduction. This in turn > allows the adjustment calculation to be moved outside of > get_initial_def_for_reduction, which is the main motivation > of the patch. OK. > gcc/ > * tree-vect-loop.c (get_initial_def_for_reduction): Remove > adjustment handling. Take the neutral value as an argument, > in place of the code argument. > (vect_transform_cycle_phi): Update accordingly. Handle the > initial values of cond reductions separately from code reductions. > Choose the adjustment here rather than in > get_initial_def_for_reduction. Sink the splat of vec_initial_def. > --- > gcc/tree-vect-loop.c | 177 +++ > 1 file changed, 59 insertions(+), 118 deletions(-) > > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c > index 744645d8bad..fe7e73f655f 100644 > --- a/gcc/tree-vect-loop.c > +++ b/gcc/tree-vect-loop.c > @@ -4614,57 +4614,26 @@ vect_model_reduction_cost (loop_vec_info loop_vinfo, > Input: > REDUC_INFO - the info_for_reduction > INIT_VAL - the initial value of the reduction variable > + NEUTRAL_OP - a value that has no effect on the reduction, as per > + neutral_op_for_reduction > > Output: > - ADJUSTMENT_DEF - a tree that holds a value to be added to the final result > -of the reduction (used for adjusting the epilog - see below). > Return a vector variable, initialized according to the operation that > STMT_VINFO performs. This vector will be used as the initial value > of the vector of partial results. > > - Option1 (adjust in epilog): Initialize the vector as follows: > - add/bit or/xor:[0,0,...,0,0] > - mult/bit and: [1,1,...,1,1] > - min/max/cond_expr: [init_val,init_val,..,init_val,init_val] > - and when necessary (e.g. add/mult case) let the caller know > - that it needs to adjust the result by init_val. 
> - > - Option2: Initialize the vector as follows: > - add/bit or/xor:[init_val,0,0,...,0] > - mult/bit and: [init_val,1,1,...,1] > - min/max/cond_expr: [init_val,init_val,...,init_val] > - and no adjustments are needed. > - > - For example, for the following code: > - > - s = init_val; > - for (i=0;i - s = s + a[i]; > - > - STMT_VINFO is 's = s + a[i]', and the reduction variable is 's'. > - For a vector of 4 units, we want to return either [0,0,0,init_val], > - or [0,0,0,0] and let the caller know that it needs to adjust > - the result at the end by 'init_val'. > - > - FORNOW, we are using the 'adjust in epilog' scheme, because this way the > - initialization vector is simpler (same element in all entries), if > - ADJUSTMENT_DEF is not NULL, and Option2 otherwise. > - > - A cost model should help decide between these two schemes. */ > + The value we need is a vector in which element 0 has value INIT_VAL > + and every other element has value NEUTRAL_OP. */ > > static tree > get_initial_def_for_reduction (loop_vec_info loop_vinfo, >stmt_vec_info reduc_info, > - enum tree_code code, tree init_val, > - tree *adjustment_def) > + tree init_val, tree neutral_op) > { >class loop *loop = LOOP_VINFO_LOOP (loop_vinfo); >tree scalar_type = TREE_TYPE (init_val); >tree vectype = get_vectype_for_scalar_type (loop_vinfo, scalar_type); > - tree def_for_init; >tree init_def; > - REAL_VALUE_TYPE real_init_val = dconst0; > - int int_init_val = 0; >gimple_seq stmts = NULL; > >gcc_assert (vectype); > @@ -4675,75 +4644,34 @@ get_initial_def_for_reduction (loop_vec_info > loop_vinfo, >gcc_assert (nested_in_vect_loop_p (loop, reduc_info) > || loop == (gimple_bb (reduc_info->stmt))->loop_father); > > - /* ADJUSTMENT_DEF is NULL when called from > - vect_create_epilog_for_reduction to vectorize double reduction. 
*/ > - if (adjustment_def) > -*adjustment_def = NULL; > - > - switch (code) > + if (operand_equal_p (init_val, neutral_op)) > { > -case WIDEN_SUM_EXPR: > -case DOT_PROD_EXPR: > -case SAD_EXPR: > -case PLUS_EXPR: > -case MINUS_EXPR: > -case BIT_IOR_EXPR: > -case BIT_XOR_EXPR: > -case MULT_EXPR: > -case BIT_AND_EXPR: > - { > -if (code == MULT_EXPR) > - { > -real_init_val = dconst1; > -int_init_val = 1; > - } > - > -if (code == BIT_AND_EXPR) > - int_init_val = -1; > - > -if (SCALAR_FLOAT_TYPE_P (scalar_type)) > - def_for_init = build_real (scalar_type, real_init_val); > -else > - def_for_init = build_int_cst (scalar_type, int_init_val); > - > - if (adjustment_
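The new comment in the patch says the value needed is a vector in which element 0 is INIT_VAL and every other element is NEUTRAL_OP. A minimal standalone model of that idea for a sum reduction might look like the sketch below. All names here (initial_def_sketch, scalar_sum, vector_sum, sample_input) are hypothetical, std::vector<int> stands in for a vector register, and the lane-wise accumulation is only a simplified picture of what the vectorizer generates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build the initial vector of partial results: lane 0 holds INIT_VAL and
// every other lane holds NEUTRAL_OP.
inline std::vector<int>
initial_def_sketch (std::size_t nunits, int init_val, int neutral_op)
{
  std::vector<int> lanes (nunits, neutral_op);
  if (!lanes.empty ())
    lanes[0] = init_val;
  return lanes;
}

// Scalar reference loop: s = init_val; for each element, s = s + a[i].
inline int
scalar_sum (int init_val, const std::vector<int> &a)
{
  int s = init_val;
  for (int x : a)
    s += x;
  return s;
}

// Simplified vector model: accumulate lane-wise, then reduce the lanes in
// an "epilogue".  Assumes NUNITS >= 1.  Because lane 0 already carries
// INIT_VAL and the other lanes start at the neutral value 0, no separate
// adjustment of the final result is needed.
inline int
vector_sum (std::size_t nunits, int init_val, const std::vector<int> &a)
{
  assert (nunits > 0);
  std::vector<int> acc = initial_def_sketch (nunits, init_val, 0);
  for (std::size_t i = 0; i < a.size (); ++i)
    acc[i % nunits] += a[i];
  int result = 0;
  for (int lane : acc)
    result += lane;
  return result;
}

// Small fixed input used to compare the two versions.
inline std::vector<int>
sample_input ()
{
  return std::vector<int>{1, 2, 3, 4, 5, 6, 7, 8};
}
```

Both versions give the same total for the sample input, which is the property that lets the adjustment logic move out of get_initial_def_for_reduction and into its caller.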
Re: [PATCH] ipa-sra: Fix thinko when overriding safe_to_import_accesses (PR 101066)
Hi,

> 2021-06-16  Martin Jambor
>
>         PR ipa/101066
>         * ipa-sra.c (class isra_call_summary): New member
>         m_before_any_store, initialize it in the constructor.
>         (isra_call_summary::dump): Dump the new field.
>         (ipa_sra_call_summaries::duplicate): Copy it.
>         (process_scan_results): Set it.
>         (isra_write_edge_summary): Stream it.
>         (isra_read_edge_summary): Likewise.
>         (param_splitting_across_edge): Only override
>         safe_to_import_accesses if m_before_any_store is set.
>
> gcc/testsuite/ChangeLog:
>
> 2021-06-16  Martin Jambor
>
>         PR ipa/101066
>         * gcc.dg/ipa/pr101066.c: New test.

OK, thanks!

The analysis that disables the transformation on any memory store is overly conservative. We have a pointer (which is a parameter and comes from the outside world) and no type information; however, the alias oracle will still be able to disambiguate when the memory access is to non-escaping local memory, a malloc-allocated memory block, etc.

Honza
Re: [patch][version 4]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc
Hi, On Wed, Jul 07 2021, Qing Zhao via Gcc-patches wrote: > Hi, > > This is the 4th version of the patch for the new security feature for GCC. I have been following the threads about this feature only very lightly, so please accept my apologies if my comments are about something which has been already discussed, but... [...] > diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c > index c05d22f3e8f1..35051d7c6b96 100644 > --- a/gcc/tree-sra.c > +++ b/gcc/tree-sra.c > @@ -384,6 +384,13 @@ static struct > >/* Numbber of components created when splitting aggregate parameters. */ >int param_reductions_created; > + > + /* Number of deferred_init calls that are modified. */ > + int deferred_init; > + > + /* Number of deferred_init calls that are created by > + generate_subtree_deferred_init. */ > + int subtree_deferred_init; > } sra_stats; > > static void > @@ -4096,6 +4103,110 @@ get_repl_default_def_ssa_name (struct access *racc, > tree reg_type) >return get_or_create_ssa_default_def (cfun, racc->replacement_decl); > } > > + > +/* Generate statements to call .DEFERRED_INIT to initialize scalar > replacements > + of accesses within a subtree ACCESS; all its children, siblings and their > + children are to be processed. > + GSI is a statement iterator used to place the new statements. 
*/ > +static void > +generate_subtree_deferred_init (struct access *access, > + tree init_type, > + tree is_vla, > + gimple_stmt_iterator *gsi, > + location_t loc) > +{ > + do > +{ > + if (access->grp_to_be_replaced) > + { > + tree repl = get_access_replacement (access); > + gimple *call > + = gimple_build_call_internal (IFN_DEFERRED_INIT, 3, > + TYPE_SIZE_UNIT (TREE_TYPE (repl)), > + init_type, is_vla); > + gimple_call_set_lhs (call, repl); > + gsi_insert_before (gsi, call, GSI_SAME_STMT); > + update_stmt (call); > + gimple_set_location (call, loc); > + sra_stats.subtree_deferred_init++; > + } > + else if (access->grp_to_be_debug_replaced) > + { > + tree drepl = get_access_replacement (access); > + tree call = build_call_expr_internal_loc > + (UNKNOWN_LOCATION, IFN_DEFERRED_INIT, > + TREE_TYPE (drepl), 3, > + TYPE_SIZE_UNIT (TREE_TYPE (drepl)), > + init_type, is_vla); > + gdebug *ds = gimple_build_debug_bind (drepl, call, > + gsi_stmt (*gsi)); > + gsi_insert_before (gsi, ds, GSI_SAME_STMT); Is handling of grp_to_be_debug_replaced accesses necessary here? If so, why? grp_to_be_debug_replaced accesses are there only to facilitate debug information about a part of an aggregate decl that is likely going to be entirely removed, so that debuggers can sometimes show users information about what they would contain had they not been removed. It seems strange that you need to mark them as uninitialized because they should not have any consumers. (But perhaps it is also harmless.) On a related note, if the intent of the feature is for optimizers to behave (almost?) as if it was not taking place, I believe you need to handle specially, and probably just ignore, calls to IFN_DEFERRED_INIT in scan_function in tree-sra.c. Otherwise the generated SRA access structures will have extra write flags turned on in them and that will lead to different behavior of the pass. 
Martin > + } > + if (access->first_child) > + generate_subtree_deferred_init (access->first_child, init_type, > + is_vla, gsi, loc); > + > + access = access ->next_sibling; > +} > + while (access); > +} > + > +/* For a call to .DEFERRED_INIT: > + var = .DEFERRED_INIT (size_of_var, init_type, is_vla); > + examine the LHS variable VAR and replace it with a scalar replacement if > + there is one, also replace the RHS call to a call to .DEFERRED_INIT of > + the corresponding scalar relacement variable. Examine the subtree and > + do the scalar replacements in the subtree too. STMT is the call, GSI is > + the statment iterator to place newly created statement. */ > + > +static enum assignment_mod_result > +sra_modify_deferred_init (gimple *stmt, gimple_stmt_iterator *gsi) > +{ > + tree lhs = gimple_call_lhs (stmt); > + tree init_type = gimple_call_arg (stmt, 1); > + tree is_vla = gimple_call_arg (stmt, 2); > + > + struct access *lhs_access = get_access_for_expr (lhs); > + if (!lhs_access) > +return SRA_AM_NONE; > + > + location_t loc = gimple_location (stmt); > + > + if (lhs_access->grp_to_be_replaced) > +{ > + tree lhs_repl = get_access_replacement (lhs_access); > + gimple_call_set_lhs (stmt, lhs_repl); > + tree arg0_repl = TYPE_SIZE_UNIT (TREE_T
Re: [PATCH] c++: Fix noexcept with unevaluated operand [PR101087]
On 7/7/21 9:40 PM, Marek Polacek wrote: It sounds plausible that this assert int f(); static_assert(noexcept(sizeof(f())), ""); should pass: sizeof produces a std::size_t and its operand is not evaluated, so it can't throw. noexcept should only evaluate to false for potentially evaluated operands. Therefore I think that check_noexcept_r shouldn't walk into operands of sizeof/decltype/ alignof/typeof. Only checking cp_unevaluated_operand therein does not work, because expr_noexcept_p can be called in an unevaluated context, so I resorted to the following cp_evaluated hack. Does that seem acceptable? I suppose, but why not check for SIZEOF_EXPR/ALIGNOF_EXPR/NOEXCEPT_EXPR directly? Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? PR c++/101087 gcc/cp/ChangeLog: * except.c (check_noexcept_r): Don't walk into unevaluated operands. (expr_noexcept_p): Use cp_evaluated. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/noexcept70.C: New test. --- gcc/cp/except.c | 14 +++--- gcc/testsuite/g++.dg/cpp0x/noexcept70.C | 5 + 2 files changed, 16 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept70.C diff --git a/gcc/cp/except.c b/gcc/cp/except.c index a8cea53cf91..6f97ac40b4b 100644 --- a/gcc/cp/except.c +++ b/gcc/cp/except.c @@ -1033,12 +1033,15 @@ check_handlers (tree handlers) expression whose type is a polymorphic class type (10.3). 
*/ static tree -check_noexcept_r (tree *tp, int * /*walk_subtrees*/, void * /*data*/) +check_noexcept_r (tree *tp, int *walk_subtrees, void *) { tree t = *tp; enum tree_code code = TREE_CODE (t); - if ((code == CALL_EXPR && CALL_EXPR_FN (t)) - || code == AGGR_INIT_EXPR) + + if (cp_unevaluated_operand) +*walk_subtrees = false; + else if ((code == CALL_EXPR && CALL_EXPR_FN (t)) + || code == AGGR_INIT_EXPR) { /* We can only use the exception specification of the called function for determining the value of a noexcept expression; we can't use @@ -1155,6 +1158,11 @@ expr_noexcept_p (tree expr, tsubst_flags_t complain) if (expr == error_mark_node) return false; + /* Even though the operand of noexcept is an _unevaluated_ operand, + temporarily clearing cp_unevaluated_operand allows us to check it + in check_noexcept_r, to handle noexcept(sizeof(f())). It could be + set when we are called in the context of synthesized_method_walk. */ + cp_evaluated ev; fn = cp_walk_tree_without_duplicates (&expr, check_noexcept_r, 0); if (fn) { diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept70.C b/gcc/testsuite/g++.dg/cpp0x/noexcept70.C new file mode 100644 index 000..45a6137dd6f --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/noexcept70.C @@ -0,0 +1,5 @@ +// PR c++/101087 +// { dg-do compile { target c++11 } } + +int f(); +static_assert(noexcept(sizeof(f())), ""); base-commit: a110855667782dac7b674d3e328b253b3b3c919b
Re: [PATCH] c++: Fix noexcept with unevaluated operand [PR101087]
On Thu, Jul 08, 2021 at 09:30:27AM -0400, Jason Merrill wrote:
> On 7/7/21 9:40 PM, Marek Polacek wrote:
> > It sounds plausible that this assert
> >
> >   int f();
> >   static_assert(noexcept(sizeof(f())), "");
> >
> > should pass: sizeof produces a std::size_t and its operand is not
> > evaluated, so it can't throw.  noexcept should only evaluate to
> > false for potentially evaluated operands.  Therefore I think that
> > check_noexcept_r shouldn't walk into operands of sizeof/decltype/
> > alignof/typeof.  Only checking cp_unevaluated_operand therein does
> > not work, because expr_noexcept_p can be called in an unevaluated
> > context, so I resorted to the following cp_evaluated hack.  Does
> > that seem acceptable?
>
> I suppose, but why not check for SIZEOF_EXPR/ALIGNOF_EXPR/NOEXCEPT_EXPR
> directly?

I thought I would, but then it occurred to me that it might be better to rely on cp_walk_subtrees which ++/--s cp_unevaluated_operand for those codes. I'd be happy to change the patch to check those codes directly; maybe I'm overthinking things here.

--
Marek Polacek • Red Hat, Inc. • 300 A St, Boston, MA
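For readers who want to see the language rule under discussion outside the compiler, here is a small standalone C++11 sketch. The function names may_throw and no_throw are hypothetical. The value of sizeof_operand_is_noexcept is deliberately not asserted: it is the case the PR is about, so a compiler without the fix may still compute false for it, whereas with the patch applied it is expected to be true.

```cpp
#include <cassert>
#include <cstddef>

// Declared only.  Every use below is in an unevaluated operand, so no
// definition is needed and nothing is ever called.
int may_throw ();
int no_throw () noexcept;

// A potentially-evaluated call to a function without a noexcept
// specification is potentially-throwing.
static_assert (!noexcept (may_throw ()), "plain call may throw");
static_assert (noexcept (no_throw ()), "noexcept call cannot throw");

// sizeof does not evaluate its operand; it only inspects the type.
static_assert (sizeof (may_throw ()) == sizeof (int),
	       "sizeof looks at the return type only");

// The case from PR c++/101087: the call sits inside an unevaluated sizeof
// operand, so the whole expression should not be potentially-throwing.
constexpr bool sizeof_operand_is_noexcept = noexcept (sizeof (may_throw ()));
```

Whether check_noexcept_r skips such operands by testing cp_unevaluated_operand or by matching SIZEOF_EXPR and friends directly is an implementation choice; the observable result for this snippet should be the same.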
Re: PING 2 [PATCH] correct handling of variable offset minus constant in -Warray-bounds (PR 100137)
On Thu, 8 Jul 2021 at 12:42, Andreas Schwab wrote:
>
> On Jul 07 2021, Marek Polacek via Gcc-patches wrote:
>
> > On Wed, Jul 07, 2021 at 02:38:11PM -0600, Martin Sebor via Gcc-patches wrote:
> >> I certainly will.  Pushed in r12-2132.
> >
> > I think this patch breaks bootstrap on x86_64:
>
> It also breaks bootstrap on aarch64 and ia64 in stage2.
>
> In file included from ../../gcc/c-family/c-common.h:26,
>                  from ../../gcc/cp/cp-tree.h:40,
>                  from ../../gcc/cp/module.cc:209:
> In function 'tree_node* identifier(const cpp_hashnode*)',
>     inlined from 'bool module_state::read_macro_maps()' at ../../gcc/cp/module.cc:16305:10:
> ../../gcc/tree.h:1089:58: error: array subscript -1 is outside array bounds of 'cpp_hashnode [288230376151711743]' [-Werror=array-bounds]
>  1089 | ((tree) ((char *) (NODE) - sizeof (struct tree_common)))
>       | ^
> ../../gcc/cp/module.cc:277:10: note: in expansion of macro 'HT_IDENT_TO_GCC_IDENT'
>   277 | return HT_IDENT_TO_GCC_IDENT (HT_NODE (const_cast
> (node)));
>       | ^
> In file included from ../../gcc/tree.h:23,
>                  from ../../gcc/c-family/c-common.h:26,
>                  from ../../gcc/cp/cp-tree.h:40,
>                  from ../../gcc/cp/module.cc:209:
> ../../gcc/tree-core.h: In member function 'bool module_state::read_macro_maps()':
> ../../gcc/tree-core.h:1445:24: note: at offset -24 into object 'tree_identifier::id' of size 16
>  1445 | struct ht_identifier id;
>       |^~
>
> Andreas.
>

On arm-linux-gnueabi, it breaks in:
libatomic/config/linux/arm/host-config.h:42:34: error: array subscript 0 is outside array bounds of 'unsigned int[0]' [-Werror=array-bounds]

Christophe

> --
> Andreas Schwab, sch...@linux-m68k.org
> GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
> "And now for something completely different."
[Ada] Simplify string manipulation related to preprocessing
Code cleanup; semantics is unaffected.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

        * sinput-l.adb (Load_File): Simplify foreword manipulation with
        concatenation; similar for filename with preprocessed output.

diff --git a/gcc/ada/sinput-l.adb b/gcc/ada/sinput-l.adb
--- a/gcc/ada/sinput-l.adb
+++ b/gcc/ada/sinput-l.adb
@@ -551,19 +551,10 @@ package body Sinput.L is
 Set_Source_File_Index_Table (X);
 if Opt.List_Preprocessing_Symbols then
- Get_Name_String (N);
-
 declare
- Foreword : String (1 .. Foreword_Start'Length +
- Name_Len + Foreword_End'Length);
-
+ Foreword : constant String :=
+Foreword_Start & Get_Name_String (N) & Foreword_End;
 begin
- Foreword (1 .. Foreword_Start'Length) := Foreword_Start;
- Foreword (Foreword_Start'Length + 1 ..
- Foreword_Start'Length + Name_Len) :=
-Name_Buffer (1 .. Name_Len);
- Foreword (Foreword'Last - Foreword_End'Length + 1 ..
- Foreword'Last) := Foreword_End;
 Prep.List_Symbols (Foreword);
 end;
 end if;
@@ -654,14 +645,13 @@ package body Sinput.L is
 NB : Integer;
 Status : Boolean;
- begin
-Get_Name_String (N);
-Add_Str_To_Name_Buffer (Prep_Suffix);
+Prep_Filename : constant String :=
+ Get_Name_String (N) & Prep_Suffix;
-Delete_File (Name_Buffer (1 .. Name_Len), Status);
+ begin
+Delete_File (Prep_Filename, Status);
-FD :=
- Create_New_File (Name_Buffer (1 .. Name_Len), Text);
+FD := Create_New_File (Prep_Filename, Text);
 Status := FD /= Invalid_FD;
[Ada] Avoid linear search when ensuring dependency on System
Replace a linear search with a hash table query.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

        * lib-writ.adb (Ensure_System_Dependency): Replace search in
        Lib.Units with a search in Lib.Unit_Names.

diff --git a/gcc/ada/lib-writ.adb b/gcc/ada/lib-writ.adb
--- a/gcc/ada/lib-writ.adb
+++ b/gcc/ada/lib-writ.adb
@@ -137,7 +137,8 @@ package body Lib.Writ is
 --
 procedure Ensure_System_Dependency is
- System_Uname : Unit_Name_Type;
+ System_Uname : constant Unit_Name_Type :=
+Name_To_Unit_Name (Name_System);
 -- Unit name for system spec if needed for dummy entry
 System_Fname : File_Name_Type;
@@ -146,11 +147,9 @@ package body Lib.Writ is
 begin
 -- Nothing to do if we already compiled System
- for Unum in Units.First .. Last_Unit loop
- if Source_Index (Unum) = System_Source_File_Index then
-return;
- end if;
- end loop;
+ if Unit_Names.Get (System_Uname) /= No_Unit then
+ return;
+ end if;
 -- If no entry for system.ads in the units table, then add a entry
 -- to the units table for system.ads, which will be referenced when
@@ -158,7 +157,6 @@ package body Lib.Writ is
 -- on system as a result of Targparm scanning the system.ads file to
 -- determine the target dependent parameters for the compilation.
- System_Uname := Name_To_Unit_Name (Name_System);
 System_Fname := File_Name (System_Source_File_Index);
 Units.Increment_Last;
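To make the trade-off in this change concrete outside of Ada, the following C++ sketch keeps a table of units together with a name-to-index hash map and offers both lookup styles. It is only a model: UnitTable, find_linear, find_hashed, sample_table and no_unit are hypothetical names, and unlike the original loop (which compared source file indexes) both lookups here search by unit name for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Sentinel returned when a unit is not in the table (hypothetical).
const int no_unit = -1;

class UnitTable
{
public:
  // Append a unit and record its index in the hash map.
  int add (const std::string &name)
  {
    int index = static_cast<int> (names_.size ());
    names_.push_back (name);
    by_name_[name] = index;
    return index;
  }

  // Old style: walk every unit until a match is found, O(n).
  int find_linear (const std::string &name) const
  {
    for (std::size_t i = 0; i < names_.size (); ++i)
      if (names_[i] == name)
	return static_cast<int> (i);
    return no_unit;
  }

  // New style: a single hash table query, O(1) on average.
  int find_hashed (const std::string &name) const
  {
    auto it = by_name_.find (name);
    return it == by_name_.end () ? no_unit : it->second;
  }

private:
  std::vector<std::string> names_;
  std::unordered_map<std::string, int> by_name_;
};

// A tiny table used to compare the two lookups.
inline UnitTable
sample_table ()
{
  UnitTable t;
  t.add ("ada");
  t.add ("system");
  return t;
}
```

Both lookups return the same answer; the hashed one simply avoids touching every unit when the table is large.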
[Ada] Make tools compatible with No_Dynamic_Accessibility_Checks
To help experiment with this new model.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

        * make.adb, osint.adb: Make code compatible with
        No_Dynamic_Accessibility_Checks restriction.

diff --git a/gcc/ada/make.adb b/gcc/ada/make.adb
--- a/gcc/ada/make.adb
+++ b/gcc/ada/make.adb
@@ -2364,7 +2364,7 @@ package body Make is
 Osint.Full_Source_Name (Source.File, Full_File => Full_Source_File,
- Attr => Source_File_Attr'Access);
+ Attr => Source_File_Attr'Unchecked_Access);
 Lib_File := Osint.Lib_File_Name (Source.File, Source.Index);
@@ -2392,7 +2392,7 @@ package body Make is
 Get_Name_String (Full_Lib_File);
 Name_Buffer (Name_Len + 1) := ASCII.NUL;
 Read_Only := not Is_Writable_File
-(Name_Buffer'Address, Lib_File_Attr'Access);
+(Name_Buffer'Address, Lib_File_Attr'Unchecked_Access);
 else
 Read_Only := False;
 end if;
@@ -2460,7 +2460,7 @@ package body Make is
 The_Args => Args,
 Lib_File => Lib_File,
 Full_Lib_File => Full_Lib_File,
- Lib_File_Attr => Lib_File_Attr'Access,
+ Lib_File_Attr => Lib_File_Attr'Unchecked_Access,
 Read_Only => Read_Only,
 ALI=> ALI,
 O_File => Obj_File,
@@ -2630,7 +2630,8 @@ package body Make is
 Text := Read_Library_Info_From_Full
- (Data.Full_Lib_File, Data.Lib_File_Attr'Access);
+ (Data.Full_Lib_File,
+ Data.Lib_File_Attr'Unchecked_Access);
 -- Restore Check_Object_Consistency to its initial value
diff --git a/gcc/ada/osint.adb b/gcc/ada/osint.adb
--- a/gcc/ada/osint.adb
+++ b/gcc/ada/osint.adb
@@ -1915,7 +1915,8 @@ package body Osint is
 begin
 if Opt.Look_In_Primary_Dir then
 Locate_File
- (N, Source, Primary_Directory, File_Name, File, Attr'Access);
+ (N, Source, Primary_Directory, File_Name, File,
+ Attr'Unchecked_Access);
 if File /= No_File and then T = File_Stamp (N) then
 return File;
@@ -1925,7 +1926,7 @@ package body Osint is
 Last_Dir := Src_Search_Directories.Last;
 for D in Primary_Directory + 1 .. 
Last_Dir loop -Locate_File (N, Source, D, File_Name, File, Attr'Access); +Locate_File (N, Source, D, File_Name, File, Attr'Unchecked_Access); if File /= No_File and then T = File_Stamp (File) then return File;
[Ada] Revert meaning of -gnatd_b
As part of experimenting with No_Dynamic_Accessibility_Checks, it seems that reverting the meaning of -gnatd_b is a better default for this experiment. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * debug.adb, sem_util.adb: Revert meaning of -gnatd_b. * sem_res.adb: Minor reformatting.diff --git a/gcc/ada/debug.adb b/gcc/ada/debug.adb --- a/gcc/ada/debug.adb +++ b/gcc/ada/debug.adb @@ -140,7 +140,7 @@ package body Debug is -- d.Z Do not enable expansion in configurable run-time mode -- d_a Stop elaboration checks on accept or select statement - -- d_b Use compatibility model under No_Dynamic_Accessibility_Checks + -- d_b Use designated type model under No_Dynamic_Accessibility_Checks -- d_c CUDA compilation : compile for the host -- d_d -- d_e Ignore entry calls and requeue statements for elaboration @@ -956,6 +956,10 @@ package body Debug is -- behavior is similar to that of No_Entry_Calls_In_Elaboration_Code, -- but does not penalize actual entry calls in elaboration code. + -- d_b When the restriction No_Dynamic_Accessibility_Checks is enabled, + -- use the simple "designated type" accessibility model, instead of + -- using the implicit level of the anonymous access type declaration. + -- d_e The compiler ignores simple entry calls, asynchronous transfer of -- control, conditional entry calls, timed entry calls, and requeue -- statements in both the static and dynamic elaboration models. diff --git a/gcc/ada/sem_res.adb b/gcc/ada/sem_res.adb --- a/gcc/ada/sem_res.adb +++ b/gcc/ada/sem_res.adb @@ -13738,8 +13738,7 @@ package body Sem_Res is Deepest_Type_Access_Level (Target_Type) and then (Nkind (Associated_Node_For_Itype (Opnd_Type)) /= N_Function_Specification -or else Ekind (Target_Type) in - Anonymous_Access_Kind) +or else Ekind (Target_Type) in Anonymous_Access_Kind) -- Check we are not in a return value ??? 
diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb --- a/gcc/ada/sem_util.adb +++ b/gcc/ada/sem_util.adb @@ -410,17 +410,18 @@ package body Sem_Util is and then No_Dynamic_Accessibility_Checks_Enabled (N) and then Is_Anonymous_Access_Type (Etype (N)) then - -- In the alternative model the level is that of the subprogram + -- In the alternative model the level is that of the + -- designated type. if Debug_Flag_Underscore_B then + return Make_Level_Literal (Typ_Access_Level (Etype (N))); + + -- Otherwise the level is that of the subprogram + + else return Make_Level_Literal (Subprogram_Access_Level (Current_Subprogram)); end if; - - -- Otherwise the level is that of the designated type - - return Make_Level_Literal -(Typ_Access_Level (Etype (N))); end if; if Nkind (N) = N_Function_Call then @@ -659,24 +660,22 @@ package body Sem_Util is if Allow_Alt_Model and then No_Dynamic_Accessibility_Checks_Enabled (E) then - -- In the alternative model the level depends on the - -- entity's context. + -- In the alternative model the level is that of the + -- designated type entity's context. if Debug_Flag_Underscore_B then - if Is_Formal (E) then -return Make_Level_Literal - (Subprogram_Access_Level - (Enclosing_Subprogram (E))); - end if; + return Make_Level_Literal (Typ_Access_Level (Etype (E))); + + -- Otherwise the level depends on the entity's context + elsif Is_Formal (E) then + return Make_Level_Literal + (Subprogram_Access_Level +(Enclosing_Subprogram (E))); + else return Make_Level_Literal (Scope_Depth (Enclosing_Dynamic_Scope (E))); end if; - - -- Otherwise the level is that of the designated type - - return Make_Level_Literal - (Typ_Access_Level (Etype (E))); end if; -- Return the dynamic level in the normal case @@ -701,10 +700,11 @@ package body Sem_Util is elsif Is_Type (E) then -- When restriction No_Dynamic_Accessibility_Checks is active + -- along with -gnatd_b.
[Ada] Incorrect iteration over hashed containers after multiple Inserts
Cursors for Hashed maps and hashed sets include a component that speeds up iteration over these containers. However, in the presence of multiple insertions into the corresponding hash-tables, this component may become unreliable when a cursor obtained before an iteration is compared with a cursor denoting the same element but obtained during a loop over the container. To prevent these anomalies, we introduce an explicit equality operator for the corresponding Cursor types, which ignores the additional component. This patch assumes that the mention of "predefined" equality in the sections of the RM that discuss these cursors is in fact an over specification. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * libgnat/a-cohama.ads: Introduce an equality operator over cursors. * libgnat/a-cohase.ads: Ditto. * libgnat/a-cohama.adb: Add body for "=" over cursors. (Insert): Do not set the Position component of the cursor that denotes the inserted element. * libgnat/a-cohase.adb: Ditto.diff --git a/gcc/ada/libgnat/a-cohama.adb b/gcc/ada/libgnat/a-cohama.adb --- a/gcc/ada/libgnat/a-cohama.adb +++ b/gcc/ada/libgnat/a-cohama.adb @@ -116,6 +116,13 @@ is -- "=" -- - + function "=" (Left, Right : Cursor) return Boolean is + begin + return + Left.Container = Right.Container + and then Left.Node = Right.Node; + end "="; + function "=" (Left, Right : Map) return Boolean is begin return Is_Equal (Left.HT, Right.HT); @@ -636,7 +643,11 @@ is end if; Position.Container := Container'Unrestricted_Access; - Position.Position := HT_Ops.Index (HT, Position.Node); + + -- Note that we do not set the Position component of the cursor, + -- because it may become incorrect on subsequent insertions/deletions + -- from the container. This will lose some optimizations but prevents + -- anomalies when the underlying hash-table is expanded or shrunk. 
end Insert; procedure Insert @@ -679,7 +690,6 @@ is end if; Position.Container := Container'Unrestricted_Access; - Position.Position := HT_Ops.Index (HT, Position.Node); end Insert; procedure Insert diff --git a/gcc/ada/libgnat/a-cohama.ads b/gcc/ada/libgnat/a-cohama.ads --- a/gcc/ada/libgnat/a-cohama.ads +++ b/gcc/ada/libgnat/a-cohama.ads @@ -110,6 +110,14 @@ is type Cursor is private; pragma Preelaborable_Initialization (Cursor); + function "=" (Left, Right : Cursor) return Boolean; + -- The representation of cursors includes a component used to optimize + -- iteration over maps. This component may become unreliable after + -- multiple map insertions, and must be excluded from cursor equality, + -- so we need to provide an explicit definition for it, instead of + -- using predefined equality (as implied by a questionable comment + -- in the RM). + Empty_Map : constant Map; -- Map objects declared without an initialization expression are -- initialized to the value Empty_Map. diff --git a/gcc/ada/libgnat/a-cohase.adb b/gcc/ada/libgnat/a-cohase.adb --- a/gcc/ada/libgnat/a-cohase.adb +++ b/gcc/ada/libgnat/a-cohase.adb @@ -145,6 +145,13 @@ is -- "=" -- - + function "=" (Left, Right : Cursor) return Boolean is + begin + return + Left.Container = Right.Container + and then Left.Node = Right.Node; + end "="; + function "=" (Left, Right : Set) return Boolean is begin return Is_Equal (Left.HT, Right.HT); @@ -763,11 +770,14 @@ is Position : out Cursor; Inserted : out Boolean) is - HT : Hash_Table_Type renames Container'Unrestricted_Access.HT; begin Insert (Container.HT, New_Item, Position.Node, Inserted); Position.Container := Container'Unchecked_Access; - Position.Position := HT_Ops.Index (HT, Position.Node); + + -- Note that we do not set the Position component of the cursor, + -- because it may become incorrect on subsequent insertions/deletions + -- from the container. 
This will lose some optimizations but prevents + -- anomalies when the underlying hash-table is expanded or shrunk. end Insert; procedure Insert diff --git a/gcc/ada/libgnat/a-cohase.ads b/gcc/ada/libgnat/a-cohase.ads --- a/gcc/ada/libgnat/a-cohase.ads +++ b/gcc/ada/libgnat/a-cohase.ads @@ -69,6 +69,15 @@ is type Cursor is private; pragma Preelaborable_Initialization (Cursor); + function "=" (Left, Right : Cursor) return Boolean; + -- The representation of cursors includes a component used to optimize + -- iteration over sets. This component may become unreliable after + -- multiple set insertions, and must be excluded from cursor equality, + -- so we need to provide an explicit definition for it, instead of + -- using predefined equality (as implied by a questionable comment + -- in the RM). This is also the c
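
The effect of an explicit "=" that ignores a cached component can be sketched in isolation. The types below (Node_Type, Node_Access, and a Cursor with only Node and Position) are simplified stand-ins invented for illustration, not the declarations in a-cohama.ads, and the Container comparison from the patch is left out for brevity:

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Cursor_Eq_Demo is

   type Node_Type is record
      Value : Integer;
   end record;

   type Node_Access is access all Node_Type;

   --  Hypothetical cursor: Position is only a cached hint used to speed up
   --  iteration; it does not identify the element.
   type Cursor is record
      Node     : Node_Access;
      Position : Natural := 0;
   end record;

   --  Explicit equality that ignores the cached Position component
   function "=" (Left, Right : Cursor) return Boolean is
     (Left.Node = Right.Node);

   Item : aliased Node_Type := (Value => 42);

   --  Two cursors denoting the same element, obtained at different times,
   --  so their cached hints differ.
   Before : constant Cursor := (Node => Item'Access, Position => 3);
   During : constant Cursor := (Node => Item'Access, Position => 7);

begin
   --  Prints TRUE; predefined component-wise equality would give FALSE
   --  because the Position values differ.
   Put_Line (Boolean'Image (Before = During));
end Cursor_Eq_Demo;
```

The design choice mirrors the patch: equality is defined on what identifies the element, and the iteration hint is treated as disposable.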
[Ada] Add No_Tasking restriction in system.ads for bootstrap
Make it explicit that tasking is not used in the compiler, which also allows generating simpler and more efficient code.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

	* gcc-interface/system.ads: Add No_Tasking restriction.

diff --git a/gcc/ada/gcc-interface/system.ads b/gcc/ada/gcc-interface/system.ads
--- a/gcc/ada/gcc-interface/system.ads
+++ b/gcc/ada/gcc-interface/system.ads
@@ -50,6 +50,10 @@ pragma Restrictions (No_Finalization);
 -- access type on incomplete type Perm_Tree_Wrapper (which is required for
 -- defining a recursive type).
 
+pragma Restrictions (No_Tasking);
+-- Make it explicit that tasking is not used in the compiler, which also
+-- allows generating simpler and more efficient code.
+
 package System is
    pragma Pure;
    -- Note that we take advantage of the implementation permission to make
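
For readers unfamiliar with the restriction, here is a minimal sketch of the same pragma used as a configuration pragma at the top of an ordinary source file. The procedure name is made up for illustration, and the sketch assumes GNAT, where No_Tasking is an implementation-defined restriction:

```ada
pragma Restrictions (No_Tasking);

with Ada.Text_IO;

procedure No_Tasking_Demo is
begin
   Ada.Text_IO.Put_Line ("no tasks are declared in this program");

   --  With the restriction in effect, a task declaration such as
   --     task T;
   --  in this procedure would be reported as a restriction violation.
end No_Tasking_Demo;
```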
[Ada] Unsynchronized concurrent access to a Boolean variable
If an exception declaration occurs in a nonstatic scope (for example, within the body of a task type), System.Exception_Table.Register_Exception is to be called the first (and *only* the first) time the declaration is elaborated. A library-level "this exception has been registered" Boolean flag was being used to accomplish this, but this solution introduces potential problems with concurrency. So instead of Boolean, use the type System.Atomic_Operations.Test_And_Set.Test_And_Set_Flag if this option is available and concurrent access via tasking is a possibility; otherwise, stick with the old Boolean-based approach. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * rtsfind.ads, rtsfind.adb: Add support for finding the packages System.Atomic_Operations and System.Atomic_Operations.Test_And_Set and the declarations within that latter package of the type Test_And_Set_Flag and the function Atomic_Test_And_Set. * exp_ch11.adb (Expand_N_Exception_Declaration): If an exception is declared other than at library level, then we need to call Register_Exception the first time (and only the first time) the declaration is elaborated. In order to decide whether to perform this call for a given elaboration of the declaration, we used to unconditionally use a (library-level) Boolean variable. Now we instead use a variable of type System.Atomic_Operations.Test_And_Set.Test_And_Set_Flag unless either that type is unavailable or a No_Tasking restriction is in effect (in which case we use a Boolean variable as before).diff --git a/gcc/ada/exp_ch11.adb b/gcc/ada/exp_ch11.adb --- a/gcc/ada/exp_ch11.adb +++ b/gcc/ada/exp_ch11.adb @@ -1088,10 +1088,19 @@ package body Exp_Ch11 is -- (protecting test only needed if not at library level) - -- exceptF : Boolean := True -- static data + -- exceptF : aliased System.Atomic_Operations.Test_And_Set. 
+ -- .Test_And_Set_Flag := 0; -- static data + -- if not Atomic_Test_And_Set (exceptF) then + --Register_Exception (except'Unrestricted_Access); + -- end if; + + -- If a No_Tasking restriction is in effect, or if Test_And_Set_Flag + -- is unavailable, then use Boolean instead. In that case, we generate: + -- + -- exceptF : Boolean := True; -- static data -- if exceptF then - --exceptF := False; - --Register_Exception (except'Unchecked_Access); + --ExceptF := False; + --Register_Exception (except'Unrestricted_Access); -- end if; procedure Expand_N_Exception_Declaration (N : Node_Id) is @@ -1275,7 +1284,7 @@ package body Exp_Ch11 is Force_Static_Allocation_Of_Referenced_Objects (Expression (N)); - -- Register_Exception (except'Unchecked_Access); + -- Register_Exception (except'Unrestricted_Access); if not No_Exception_Handlers_Set and then not Restriction_Active (No_Exception_Registration) @@ -1296,27 +1305,59 @@ package body Exp_Ch11 is Flag_Id := Make_Defining_Identifier (Loc, Chars => New_External_Name (Chars (Id), 'F')); - -Insert_Action (N, - Make_Object_Declaration (Loc, -Defining_Identifier => Flag_Id, -Object_Definition => - New_Occurrence_Of (Standard_Boolean, Loc), -Expression => - New_Occurrence_Of (Standard_True, Loc))); - Set_Is_Statically_Allocated (Flag_Id); -Append_To (L, - Make_Assignment_Statement (Loc, -Name => New_Occurrence_Of (Flag_Id, Loc), -Expression => New_Occurrence_Of (Standard_False, Loc))); +declare + Use_Test_And_Set_Flag : constant Boolean := + (not Global_No_Tasking) + and then RTE_Available (RE_Test_And_Set_Flag); + + Flag_Decl : Node_Id; + Condition : Node_Id; +begin + if Use_Test_And_Set_Flag then + Flag_Decl := +Make_Object_Declaration (Loc, + Defining_Identifier => Flag_Id, + Aliased_Present => True, + Object_Definition => +New_Occurrence_Of (RTE (RE_Test_And_Set_Flag), Loc), + Expression => +Make_Integer_Literal (Loc, 0)); + else + Flag_Decl := +Make_Object_Declaration (Loc, + Defining_Identifier => Flag_Id, + Object_Definition 
=> +New_Occurrence_Of (Standard_Boolean, Loc), + Expression => +New_Occurrence_Of (Standard_True, Loc)); +
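
The "first time only" pattern described in the expansion comment can be written by hand with the Ada 2022 package System.Atomic_Operations.Test_And_Set. The following is a sketch only: Register_Once and Registered are illustrative names, the Put_Line call stands in for Register_Exception, and it assumes a GNAT recent enough to provide the package (compile with -gnat2022):

```ada
with Ada.Text_IO;
with System.Atomic_Operations.Test_And_Set;

procedure Register_Once_Demo is
   use System.Atomic_Operations.Test_And_Set;

   --  Illustrative flag; 0 means "not yet set", as in the patch comment
   Registered : aliased Test_And_Set_Flag := 0;

   procedure Register_Once is
   begin
      --  Atomic_Test_And_Set sets the flag and returns True if it was
      --  already set, so only the first caller sees False.
      if not Atomic_Test_And_Set (Registered) then
         Ada.Text_IO.Put_Line ("registering exception");
      end if;
   end Register_Once;

begin
   Register_Once;
   Register_Once;  --  no output: the flag is already set
end Register_Once_Demo;
```

Unlike the Boolean version, the test and the update happen in one atomic operation, so two tasks elaborating the declaration concurrently cannot both observe "not yet registered".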
[Ada] Compute sizes when possible for packed array with Component_Size
For a packed constrained array type with a Component_Size clause, it may be possible to compute both its RM_Size and Esize. Do this as it benefits GNATprove for checking validity of overlays. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * layout.adb (Layout_Type): Special case when RM_Size and Esize can be computed for packed arrays.diff --git a/gcc/ada/layout.adb b/gcc/ada/layout.adb --- a/gcc/ada/layout.adb +++ b/gcc/ada/layout.adb @@ -487,6 +487,48 @@ package body Layout is then Set_Alignment (E, Alignment (Component_Type (E))); end if; + + -- If packing was requested, the one-dimensional array is constrained + -- with static bounds, the component size was set explicitly, and + -- the alignment is known, we can set (if not set explicitly) the + -- RM_Size and the Esize of the array type, as RM_Size is equal to + -- (arr'length * arr'component_size) and Esize is the same value + -- rounded to the next multiple of arr'alignment. This is not + -- applicable to packed arrays that are implemented specially + -- in GNAT, i.e. when Packed_Array_Impl_Type is set. + + if Is_Array_Type (E) + and then Number_Dimensions (E) = 1 + and then not Present (Packed_Array_Impl_Type (E)) + and then Has_Pragma_Pack (E) + and then Is_Constrained (E) + and then Compile_Time_Known_Bounds (E) + and then Known_Component_Size (E) + and then Known_Alignment (E) + then +declare + Abits : constant Int := UI_To_Int (Alignment (E)) * SSU; + Lo, Hi : Node_Id; + Siz : Uint; + +begin + Get_Index_Bounds (First_Index (E), Lo, Hi); + Siz := (Expr_Value (Hi) - Expr_Value (Lo) + 1) + * Component_Size (E); + + -- Do not overwrite a different value of 'Size specified + -- explicitly by the user. In that case, also do not set Esize. 
+ + if Unknown_RM_Size (E) or else RM_Size (E) = Siz then + Set_RM_Size (E, Siz); + + if Unknown_Esize (E) then + Siz := ((Siz + (Abits - 1)) / Abits) * Abits; + Set_Esize (E, Siz); + end if; + end if; +end; + end if; end if; -- Even if the backend performs the layout, we still do a little in
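
The size computation in this hunk is plain integer arithmetic and can be checked by hand. The sketch below reuses the names Abits and SSU from the patch, but the input values (10 components of 3 bits, 1-byte alignment, 8-bit storage units) are assumptions chosen for illustration:

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Packed_Size_Sketch is
   SSU : constant := 8;  --  assumed System.Storage_Unit

   --  Hypothetical inputs
   Length      : constant := 10;  --  arr'Length
   Comp_Size   : constant := 3;   --  arr'Component_Size, in bits
   Align_Bytes : constant := 1;   --  arr'Alignment, in storage units

   Abits   : constant := Align_Bytes * SSU;
   RM_Size : constant := Length * Comp_Size;

   --  Round RM_Size up to the next multiple of Abits
   Esize   : constant := ((RM_Size + (Abits - 1)) / Abits) * Abits;
begin
   Put_Line ("RM_Size =" & Integer'Image (RM_Size));  --  RM_Size = 30
   Put_Line ("Esize   =" & Integer'Image (Esize));    --  Esize   = 32
end Packed_Size_Sketch;
```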
[Ada] Make runtime code compatible with No_Dynamic_Accessibility_Checks
To help experiment with this new model. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * libgnat/a-cbdlli.adb, libgnat/a-cbhama.adb, libgnat/a-cbhase.adb, libgnat/a-cbmutr.adb, libgnat/a-cborma.adb, libgnat/a-cborse.adb, libgnat/a-cobove.adb, libgnat/a-textio.adb, libgnat/a-witeio.adb, libgnat/a-ztexio.adb: Make code compatible with No_Dynamic_Accessibility_Checks restriction.diff --git a/gcc/ada/libgnat/a-cbdlli.adb b/gcc/ada/libgnat/a-cbdlli.adb --- a/gcc/ada/libgnat/a-cbdlli.adb +++ b/gcc/ada/libgnat/a-cbdlli.adb @@ -312,7 +312,7 @@ is Container.TC'Unrestricted_Access; begin return R : constant Constant_Reference_Type := - (Element => N.Element'Access, + (Element => N.Element'Unchecked_Access, Control => (Controlled with TC)) do Busy (TC.all); @@ -1608,7 +1608,7 @@ is Container.TC'Unrestricted_Access; begin return R : constant Reference_Type := - (Element => N.Element'Access, + (Element => N.Element'Unchecked_Access, Control => (Controlled with TC)) do Busy (TC.all); diff --git a/gcc/ada/libgnat/a-cbhama.adb b/gcc/ada/libgnat/a-cbhama.adb --- a/gcc/ada/libgnat/a-cbhama.adb +++ b/gcc/ada/libgnat/a-cbhama.adb @@ -213,7 +213,7 @@ is Container.TC'Unrestricted_Access; begin return R : constant Constant_Reference_Type := - (Element => N.Element'Access, + (Element => N.Element'Unchecked_Access, Control => (Controlled with TC)) do Busy (TC.all); @@ -239,7 +239,7 @@ is Container.TC'Unrestricted_Access; begin return R : constant Constant_Reference_Type := - (Element => N.Element'Access, + (Element => N.Element'Unchecked_Access, Control => (Controlled with TC)) do Busy (TC.all); @@ -1028,7 +1028,7 @@ is Container.TC'Unrestricted_Access; begin return R : constant Reference_Type := - (Element => N.Element'Access, + (Element => N.Element'Unchecked_Access, Control => (Controlled with TC)) do Busy (TC.all); @@ -1053,7 +1053,7 @@ is Container.TC'Unrestricted_Access; begin return R : constant Reference_Type := - (Element => N.Element'Access, + (Element => 
N.Element'Unchecked_Access, Control => (Controlled with TC)) do Busy (TC.all); diff --git a/gcc/ada/libgnat/a-cbhase.adb b/gcc/ada/libgnat/a-cbhase.adb --- a/gcc/ada/libgnat/a-cbhase.adb +++ b/gcc/ada/libgnat/a-cbhase.adb @@ -232,7 +232,7 @@ is Container.TC'Unrestricted_Access; begin return R : constant Constant_Reference_Type := - (Element => N.Element'Access, + (Element => N.Element'Unchecked_Access, Control => (Controlled with TC)) do Busy (TC.all); @@ -1643,7 +1643,7 @@ is Container.TC'Unrestricted_Access; begin return R : constant Constant_Reference_Type := - (Element => N.Element'Access, + (Element => N.Element'Unchecked_Access, Control => (Controlled with TC)) do Busy (TC.all); diff --git a/gcc/ada/libgnat/a-cbmutr.adb b/gcc/ada/libgnat/a-cbmutr.adb --- a/gcc/ada/libgnat/a-cbmutr.adb +++ b/gcc/ada/libgnat/a-cbmutr.adb @@ -600,7 +600,7 @@ is Container.TC'Unrestricted_Access; begin return R : constant Constant_Reference_Type := - (Element => Container.Elements (Position.Node)'Access, + (Element => Container.Elements (Position.Node)'Unchecked_Access, Control => (Controlled with TC)) do Busy (TC.all); @@ -2533,7 +2533,7 @@ is Container.TC'Unrestricted_Access; begin return R : constant Reference_Type := - (Element => Container.Elements (Position.Node)'Access, + (Element => Container.Elements (Position.Node)'Unchecked_Access, Control => (Controlled with TC)) do Busy (TC.all); diff --git a/gcc/ada/libgnat/a-cborma.adb b/gcc/ada/libgnat/a-cborma.adb --- a/gcc/ada/libgnat/a-cborma.adb +++ b/gcc/ada/libgnat/a-cborma.adb @@ -420,7 +420,7 @@ is Container.TC'Unrestricted_Access; begin return R : constant Constant_Reference_Type := - (Element => N.Element'Access, + (Element => N.Element'Unchecked_Access, Control => (Controlled with TC)) do Busy (TC.all); @@ -445,7 +445,7 @@ is Container.TC'Unrestricted_Access; begin return R : constant Constant_Reference_Type := - (Element => N.Element'Access, + (Element => N.Element'U
[Ada] Fix on computation of packed array size in case of error
In case of compilation error, the low and high bounds of the array type might have been replaced by an error node. Deal with this case by checking that the bounds are known at compile time. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * layout.adb (Layout_Type): Add guard before calling Expr_Value.diff --git a/gcc/ada/layout.adb b/gcc/ada/layout.adb --- a/gcc/ada/layout.adb +++ b/gcc/ada/layout.adb @@ -513,18 +513,28 @@ package body Layout is begin Get_Index_Bounds (First_Index (E), Lo, Hi); - Siz := (Expr_Value (Hi) - Expr_Value (Lo) + 1) - * Component_Size (E); - -- Do not overwrite a different value of 'Size specified - -- explicitly by the user. In that case, also do not set Esize. + -- Even if the bounds are known at compile time, they could + -- have been replaced by an error node. Check each bound + -- explicitly. - if Unknown_RM_Size (E) or else RM_Size (E) = Siz then - Set_RM_Size (E, Siz); + if Compile_Time_Known_Value (Lo) + and then Compile_Time_Known_Value (Hi) + then + Siz := (Expr_Value (Hi) - Expr_Value (Lo) + 1) +* Component_Size (E); + + -- Do not overwrite a different value of 'Size specified + -- explicitly by the user. In that case, also do not set + -- Esize. - if Unknown_Esize (E) then - Siz := ((Siz + (Abits - 1)) / Abits) * Abits; - Set_Esize (E, Siz); + if Unknown_RM_Size (E) or else RM_Size (E) = Siz then + Set_RM_Size (E, Siz); + + if Unknown_Esize (E) then +Siz := ((Siz + (Abits - 1)) / Abits) * Abits; +Set_Esize (E, Siz); + end if; end if; end if; end;
[Ada] Prevent crash on inspection point for unfrozen entity
Before this patch, the following program would make GNAT crash:

procedure P is
   Unused_Var : Integer with Shared => False;
   pragma Inspection_Point;
begin
   null;
end P;

This was because the Shared aspect resulted in a freeze node being inserted after the Inspection_Point pragma. This made Gigi delay the translation of the declaration of Unused_Var until the freeze node. The delay resulted in a reference to an undeclared entity when trying to translate Inspection_Point from gnat to gnu.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

	* exp_prag.adb (Expand_Pragma_Inspection_Point): After expansion
	of the Inspection_Point pragma, check if referenced entities
	that have a freeze node are already frozen. If they aren't, emit
	a warning and turn the pragma into a no-op.

diff --git a/gcc/ada/exp_prag.adb b/gcc/ada/exp_prag.adb
--- a/gcc/ada/exp_prag.adb
+++ b/gcc/ada/exp_prag.adb
@@ -2361,6 +2361,7 @@ package body Exp_Prag is
 
       S : Entity_Id;
       E : Entity_Id;
+      Remove_Inspection_Point : Boolean := False;
    begin
       if No (Pragma_Argument_Associations (N)) then
          A := New_List;
@@ -2400,6 +2401,36 @@ package body Exp_Prag is
          Expand (Expression (Assoc));
          Next (Assoc);
       end loop;
+
+      -- If any of the references have a freeze node, it must appear before
+      -- pragma Inspection_Point, otherwise the entity won't be available when
+      -- Gigi processes Inspection_Point.
+      -- When this requirement isn't met, turn the pragma into a no-op.
+
+      Assoc := First (Pragma_Argument_Associations (N));
+      while Present (Assoc) loop
+
+         if Present (Freeze_Node (Entity (Expression (Assoc)))) and then
+           not Is_Frozen (Entity (Expression (Assoc)))
+         then
+            Error_Msg_NE ("?inspection point references unfrozen object &",
+                          Assoc,
+                          Entity (Expression (Assoc)));
+            Remove_Inspection_Point := True;
+         end if;
+
+         Next (Assoc);
+      end loop;
+
+      if Remove_Inspection_Point then
+         Error_Msg_N ("\pragma will be ignored", N);
+
+         -- We can't just remove the pragma from the tree as it might be
+         -- iterated over by the caller. Turn it into a null statement
+         -- instead.
+
+         Rewrite (N, Make_Null_Statement (Sloc (N)));
+      end if;
    end Expand_Pragma_Inspection_Point;
 
    --
[Ada] Skip types in error for test to compute array size
After a syntax error, if the code is compiled with -gnatq, semantic analysis should still proceed without internal errors if possible. Add a special case to recognize an ill-formed array type.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

	* layout.adb (Layout_Type): Do not call Number_Dimensions if the
	type does not have First_Index set.

diff --git a/gcc/ada/layout.adb b/gcc/ada/layout.adb
--- a/gcc/ada/layout.adb
+++ b/gcc/ada/layout.adb
@@ -498,6 +498,7 @@ package body Layout is
          -- in GNAT, i.e. when Packed_Array_Impl_Type is set.
 
          if Is_Array_Type (E)
+           and then Present (First_Index (E)) -- Skip types in error
            and then Number_Dimensions (E) = 1
            and then not Present (Packed_Array_Impl_Type (E))
            and then Has_Pragma_Pack (E)
[Ada] Fix use of single question mark in error message
Single question marks in warning messages are deprecated; use "??" instead.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

	* exp_prag.adb (Expand_Pragma_Inspection_Point): Fix error
	message.

diff --git a/gcc/ada/exp_prag.adb b/gcc/ada/exp_prag.adb
--- a/gcc/ada/exp_prag.adb
+++ b/gcc/ada/exp_prag.adb
@@ -2413,7 +2413,7 @@ package body Exp_Prag is
          if Present (Freeze_Node (Entity (Expression (Assoc)))) and then
            not Is_Frozen (Entity (Expression (Assoc)))
          then
-            Error_Msg_NE ("?inspection point references unfrozen object &",
+            Error_Msg_NE ("??inspection point references unfrozen object &",
                           Assoc,
                           Entity (Expression (Assoc)));
             Remove_Inspection_Point := True;
[Ada] Fix style in comments and code related to compilation units
Only style fixes; comments and code themselves are unchanged. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * lib-load.adb (Load_Unit): Fix style in comment. * par-load.adb (Load): Likewise. * scng.adb (Initialize_Scanner): Fix whitespace.diff --git a/gcc/ada/lib-load.adb b/gcc/ada/lib-load.adb --- a/gcc/ada/lib-load.adb +++ b/gcc/ada/lib-load.adb @@ -823,7 +823,7 @@ package body Lib.Load is Units.Table (Calling_Unit).Fatal_Error := Error_Detected; -- If with'ed unit had an ignored error, then propagate it - -- but do not overide an existring setting. + -- but do not overide an existing setting. when Error_Ignored => if Units.Table (Calling_Unit).Fatal_Error = None then @@ -900,7 +900,7 @@ package body Lib.Load is Remove_Unit (Unum); -- If unit not required, remove load stack entry and the junk --- file table entry, and return No_Unit to indicate not found, +-- file table entry, and return No_Unit to indicate not found. else Load_Stack.Decrement_Last; diff --git a/gcc/ada/par-load.adb b/gcc/ada/par-load.adb --- a/gcc/ada/par-load.adb +++ b/gcc/ada/par-load.adb @@ -129,8 +129,8 @@ begin Save_Style_Check_Options (Save_Style_Checks); Save_Style_Check := Opt.Style_Check; - -- If main unit, set Main_Unit_Entity (this will get overwritten if - -- the main unit has a separate spec, that happens later on in Load) + -- If main unit, set Main_Unit_Entity (this will get overwritten if the + -- main unit has a separate spec, that happens later on in Load). 
if Cur_Unum = Main_Unit then Main_Unit_Entity := Cunit_Entity (Main_Unit); diff --git a/gcc/ada/scng.adb b/gcc/ada/scng.adb --- a/gcc/ada/scng.adb +++ b/gcc/ada/scng.adb @@ -230,16 +230,16 @@ package body Scng is -- Initialize scan control variables - Current_Source_File := Index; - Source:= Source_Text (Current_Source_File); - Scan_Ptr := Source_First (Current_Source_File); - Token := No_Token; - Token_Ptr := Scan_Ptr; - Current_Line_Start:= Scan_Ptr; - Token_Node:= Empty; - Token_Name:= No_Name; - Start_Column := Set_Start_Column; - First_Non_Blank_Location := Scan_Ptr; + Current_Source_File := Index; + Source := Source_Text (Current_Source_File); + Scan_Ptr := Source_First (Current_Source_File); + Token:= No_Token; + Token_Ptr:= Scan_Ptr; + Current_Line_Start := Scan_Ptr; + Token_Node := Empty; + Token_Name := No_Name; + Start_Column := Set_Start_Column; + First_Non_Blank_Location := Scan_Ptr; Initialize_Checksum; Wide_Char_Byte_Count := 0;
[Ada] Prevent infinite recursion when there is no expected unit
The comment in Par.Load says "... or we are in big trouble, and abandon the compilation", but the code merely emitted errors and kept going. Now it emits errors, flags the problem in the unit table and gives up.

Also, it was wrong for this routine to remove the unit, because the callers that add entries to the unit table assume those entries will be filled by the parser and not removed, even when irrecoverable errors happen.

This prevents an infinite recursion that happened when parsing a file with multiple compilation units and wrong indexes: the compiler was scanning unit X and followed its WITH Y clause, but instead of unit Y it got unit X and scanned it again and again.

It also fixes a crash when compiling a program with a subunit that contains an unexpected program unit (previously the compiler only avoided such a crash with the -gnatc switch).

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

	* par-load.adb (Load): Don't remove unit, but flag it as
	erroneous and return.

diff --git a/gcc/ada/par-load.adb b/gcc/ada/par-load.adb
--- a/gcc/ada/par-load.adb
+++ b/gcc/ada/par-load.adb
@@ -234,9 +234,10 @@ begin
             Error_Msg ("\\found unit $!", Loc);
          end if;
 
-         -- In both cases, remove the unit so that it is out of the way later
+         -- In both cases, flag the fatal error and give up
 
-         Remove_Unit (Cur_Unum);
+         Set_Fatal_Error (Cur_Unum, Error_Detected);
+         return;
       end if;
 
       -- If current unit is a body, load its corresponding spec
[Ada] Replace low-level condition with a high-level call
Code cleanup; semantics is unaffected.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

	* lib-writ.adb (Ensure_System_Dependency): Simplify condition.

diff --git a/gcc/ada/lib-writ.adb b/gcc/ada/lib-writ.adb
--- a/gcc/ada/lib-writ.adb
+++ b/gcc/ada/lib-writ.adb
@@ -147,7 +147,7 @@ package body Lib.Writ is
    begin
       -- Nothing to do if we already compiled System
 
-      if Unit_Names.Get (System_Uname) /= No_Unit then
+      if Is_Loaded (System_Uname) then
          return;
       end if;
 
[Ada] Restore context on failure in loading of renamed child unit
When loading a renamed child unit failed, we did not properly restore the value of the global variable Parsing_Main_Extended_Source. This is primarily a cleanup change; behaviour is not affected (except perhaps for errors reported on complicated code that is illegal anyway).

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

	* lib-load.adb (Load): Replace early return with goto to
	properly restore context on failure.

diff --git a/gcc/ada/lib-load.adb b/gcc/ada/lib-load.adb
--- a/gcc/ada/lib-load.adb
+++ b/gcc/ada/lib-load.adb
@@ -451,8 +451,8 @@ package body Lib.Load is
                  With_Node => With_Node);
 
             if Unump = No_Unit then
-               Parsing_Main_Extended_Source := Save_PMES;
-               return No_Unit;
+               Unum := No_Unit;
+               goto Done;
             end if;
 
             -- If parent is a renaming, then we use the renamed package as
[Ada] Remove redundant condition for listing compilation units
There is only one call to Unit_Display and it is guarded by the List_Units global variable. There is no need to retest this variable inside the Unit_Display routine. Code cleanup; semantics is unaffected. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * par-ch10.adb (Unit_Display): Remove redundant condition; fix whitespace.diff --git a/gcc/ada/par-ch10.adb b/gcc/ada/par-ch10.adb --- a/gcc/ada/par-ch10.adb +++ b/gcc/ada/par-ch10.adb @@ -1162,24 +1162,22 @@ package body Ch10 is Loc: Source_Ptr; SR_Present : Boolean) is - Unum : constant Unit_Number_Type:= Get_Cunit_Unit_Number (Cunit); - Sind : constant Source_File_Index := Source_Index (Unum); - Unam : constant Unit_Name_Type := Unit_Name (Unum); + Unum : constant Unit_Number_Type := Get_Cunit_Unit_Number (Cunit); + Sind : constant Source_File_Index := Source_Index (Unum); + Unam : constant Unit_Name_Type:= Unit_Name (Unum); begin - if List_Units then - Write_Str ("Unit "); - Write_Unit_Name (Unit_Name (Unum)); - Unit_Location (Sind, Loc); + Write_Str ("Unit "); + Write_Unit_Name (Unit_Name (Unum)); + Unit_Location (Sind, Loc); - if SR_Present then -Write_Str (", SR"); - end if; - - Write_Str (", file name "); - Write_Name (Get_File_Name (Unam, Nkind (Unit (Cunit)) = N_Subunit)); - Write_Eol; + if SR_Present then + Write_Str (", SR"); end if; + + Write_Str (", file name "); + Write_Name (Get_File_Name (Unam, Nkind (Unit (Cunit)) = N_Subunit)); + Write_Eol; end Unit_Display; ---
[Ada] Simplify redundant checks for non-empty lists
Simplify "Present (L) and then not Is_Empty_List (L)" into "not Is_Empty_List (L)", since Is_Empty_List can be called on No_List and returns True. Code cleanup; semantics is unaffected. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * sem_ch12.adb, sem_ch6.adb, sem_ch9.adb, sprint.adb: Simplify checks for non-empty lists.diff --git a/gcc/ada/sem_ch12.adb b/gcc/ada/sem_ch12.adb --- a/gcc/ada/sem_ch12.adb +++ b/gcc/ada/sem_ch12.adb @@ -9724,7 +9724,6 @@ package body Sem_Ch12 is if Nkind (Par_N) = N_Package_Specification and then Decls = Visible_Declarations (Par_N) - and then Present (Private_Declarations (Par_N)) and then not Is_Empty_List (Private_Declarations (Par_N)) then Decls := Private_Declarations (Par_N); diff --git a/gcc/ada/sem_ch6.adb b/gcc/ada/sem_ch6.adb --- a/gcc/ada/sem_ch6.adb +++ b/gcc/ada/sem_ch6.adb @@ -549,7 +549,6 @@ package body Sem_Ch6 is else if Nkind (Par) = N_Package_Specification and then Decls = Visible_Declarations (Par) - and then Present (Private_Declarations (Par)) and then not Is_Empty_List (Private_Declarations (Par)) then Decls := Private_Declarations (Par); diff --git a/gcc/ada/sem_ch9.adb b/gcc/ada/sem_ch9.adb --- a/gcc/ada/sem_ch9.adb +++ b/gcc/ada/sem_ch9.adb @@ -1955,9 +1955,7 @@ package body Sem_Ch9 is Tasking_Used := True; Analyze_Declarations (Visible_Declarations (N)); - if Present (Private_Declarations (N)) -and then not Is_Empty_List (Private_Declarations (N)) - then + if not Is_Empty_List (Private_Declarations (N)) then Last_Id := Last_Entity (Prot_Typ); Analyze_Declarations (Private_Declarations (N)); diff --git a/gcc/ada/sprint.adb b/gcc/ada/sprint.adb --- a/gcc/ada/sprint.adb +++ b/gcc/ada/sprint.adb @@ -1065,16 +1065,12 @@ package body Sprint is if Present (Expressions (Node)) then Sprint_Comma_List (Expressions (Node)); - if Present (Component_Associations (Node)) -and then not Is_Empty_List (Component_Associations (Node)) - then + if not Is_Empty_List (Component_Associations (Node)) then Write_Str (", 
"); end if; end if; - if Present (Component_Associations (Node)) - and then not Is_Empty_List (Component_Associations (Node)) - then + if not Is_Empty_List (Component_Associations (Node)) then Indent_Begin; declare
[Ada] Fix violation of No_Implicit_Loops restriction for enumeration type
The perfect hash function generated by the compiler to speed up the Value attribute of an enumeration type contains an implicit loop and, therefore, violates the No_Implicit_Loops restriction when it is active. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * exp_imgv.adb: Add with and use clause for Restrict and Rident. (Build_Enumeration_Image_Tables): Do not generate the hash function if the No_Implicit_Loops restriction is active.diff --git a/gcc/ada/exp_imgv.adb b/gcc/ada/exp_imgv.adb --- a/gcc/ada/exp_imgv.adb +++ b/gcc/ada/exp_imgv.adb @@ -37,6 +37,8 @@ with Namet; use Namet; with Nmake; use Nmake; with Nlists; use Nlists; with Opt;use Opt; +with Restrict; use Restrict; +with Rident; use Rident; with Rtsfind;use Rtsfind; with Sem_Aux;use Sem_Aux; with Sem_Res;use Sem_Res; @@ -160,6 +162,8 @@ package body Exp_Imgv is Expression => Make_Aggregate (Loc, Expressions => V))); end Append_Table_To; + -- Start of Build_Enumeration_Image_Tables + begin -- Nothing to do for types other than a root enumeration type @@ -247,7 +251,7 @@ package body Exp_Imgv is Append_Table_To (Act, Eind, Nlit, Ityp, Ind); -- If the number of literals is not greater than Threshold, then we are - -- done. Otherwise we compute a (perfect) hash function for use by the + -- done. Otherwise we generate a (perfect) hash function for use by the -- Value attribute. if Nlit > Threshold then @@ -283,11 +287,12 @@ package body Exp_Imgv is -- If the unit where the type is declared is the main unit, and the -- number of literals is greater than Threshold_For_Size when we are - -- optimizing for size, and -gnatd_h is not specified, try to compute - -- the hash function. + -- optimizing for size, and the restriction No_Implicit_Loops is not + -- active, and -gnatd_h is not specified, generate the hash function. if In_Main_Unit and then (Optimize_Size = 0 or else Nlit > Threshold_For_Size) + and then not Restriction_Active (No_Implicit_Loops) and then not Debug_Flag_Underscore_H then declare
[Ada] Spurious warning in generic instance
In the case of complex generic instantiations, the warning on component not being present can be spurious (corresponding to dead code for the given instance), so we disable it. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * sem_util.ads, sem_util.adb (Apply_Compile_Time_Constraint_Error): New parameter Emit_Message. * sem_ch4.adb (Analyze_Selected_Component): Disable warning within an instance.diff --git a/gcc/ada/sem_ch4.adb b/gcc/ada/sem_ch4.adb --- a/gcc/ada/sem_ch4.adb +++ b/gcc/ada/sem_ch4.adb @@ -5471,7 +5471,9 @@ package body Sem_Ch4 is Apply_Compile_Time_Constraint_Error (N, "component not present in }??", CE_Discriminant_Check_Failed, -Ent => Prefix_Type); +Ent => Prefix_Type, +Emit_Message => + SPARK_Mode = On or not In_Instance_Not_Visible); return; end if; diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb --- a/gcc/ada/sem_util.adb +++ b/gcc/ada/sem_util.adb @@ -1510,13 +1510,14 @@ package body Sem_Util is - procedure Apply_Compile_Time_Constraint_Error - (N : Node_Id; - Msg: String; - Reason : RT_Exception_Code; - Ent: Entity_Id := Empty; - Typ: Entity_Id := Empty; - Loc: Source_Ptr := No_Location; - Warn : Boolean:= False) + (N: Node_Id; + Msg : String; + Reason : RT_Exception_Code; + Ent : Entity_Id := Empty; + Typ : Entity_Id := Empty; + Loc : Source_Ptr := No_Location; + Warn : Boolean:= False; + Emit_Message : Boolean:= True) is Stat : constant Boolean := Is_Static_Expression (N); R_Stat : constant Node_Id := @@ -1530,8 +1531,10 @@ package body Sem_Util is Rtyp := Typ; end if; - Discard_Node -(Compile_Time_Constraint_Error (N, Msg, Ent, Loc, Warn => Warn)); + if Emit_Message then + Discard_Node + (Compile_Time_Constraint_Error (N, Msg, Ent, Loc, Warn => Warn)); + end if; -- Now we replace the node by an N_Raise_Constraint_Error node -- This does not need reanalyzing, so set it as analyzed now. 
diff --git a/gcc/ada/sem_util.ads b/gcc/ada/sem_util.ads --- a/gcc/ada/sem_util.ads +++ b/gcc/ada/sem_util.ads @@ -161,13 +161,14 @@ package Sem_Util is -- part of the current package. procedure Apply_Compile_Time_Constraint_Error - (N : Node_Id; - Msg: String; - Reason : RT_Exception_Code; - Ent: Entity_Id := Empty; - Typ: Entity_Id := Empty; - Loc: Source_Ptr := No_Location; - Warn : Boolean:= False); + (N: Node_Id; + Msg : String; + Reason : RT_Exception_Code; + Ent : Entity_Id := Empty; + Typ : Entity_Id := Empty; + Loc : Source_Ptr := No_Location; + Warn : Boolean:= False; + Emit_Message : Boolean:= True); -- N is a subexpression that will raise Constraint_Error when evaluated -- at run time. Msg is a message that explains the reason for raising the -- exception. The last character is ? if the message is always a warning, @@ -189,6 +190,7 @@ package Sem_Util is -- when the caller wants to parameterize whether an error or warning is -- given), or when the message should be treated as a warning even when -- SPARK_Mode is On (which otherwise would force an error). + -- If Emit_Message is False, then do not emit any message. function Async_Readers_Enabled (Id : Entity_Id) return Boolean; -- Id should be the entity of a state abstraction, an object, or a type.
[Ada] AI12-0156 Use subtype indication in generalized iterators
Add syntax and semantic support for this new Ada 2022 feature. Support for proper accessibility levels to be investigated in a second step. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * par-ch5.adb (P_Iterator_Specification): Add support for access definition in loop parameter. * sem_ch5.adb (Check_Subtype_Indication): Renamed... (Check_Subtype_Definition): ... into this and check for conformance on access definitions, and improve error messages. (Analyze_Iterator_Specification): Add support for access definition in loop parameter.diff --git a/gcc/ada/par-ch5.adb b/gcc/ada/par-ch5.adb --- a/gcc/ada/par-ch5.adb +++ b/gcc/ada/par-ch5.adb @@ -1741,7 +1741,15 @@ package body Ch5 is if Token = Tok_Colon then Scan; -- past : - Set_Subtype_Indication (Node1, P_Subtype_Indication); + + if Token = Tok_Access then +Error_Msg_Ada_2022_Feature + ("access definition in loop parameter", Token_Ptr); +Set_Subtype_Indication (Node1, P_Access_Definition (False)); + + else +Set_Subtype_Indication (Node1, P_Subtype_Indication); + end if; end if; if Token = Tok_Of then @@ -1761,7 +1769,7 @@ package body Ch5 is Set_Of_Present (Node1); Error_Msg_N ("subtype indication is only legal on an element iterator", - Subtype_Indication (Node1)); +Subtype_Indication (Node1)); else return Error; diff --git a/gcc/ada/sem_ch5.adb b/gcc/ada/sem_ch5.adb --- a/gcc/ada/sem_ch5.adb +++ b/gcc/ada/sem_ch5.adb @@ -2176,9 +2176,11 @@ package body Sem_Ch5 is -- indicator, verify that the container type has an Iterate aspect that -- implements the reversible iterator interface. - procedure Check_Subtype_Indication (Comp_Type : Entity_Id); + procedure Check_Subtype_Definition (Comp_Type : Entity_Id); -- If a subtype indication is present, verify that it is consistent -- with the component type of the array or container name. + -- In Ada 2022, the subtype indication may be an access definition, + -- if the array or container has elements of an anonymous access type. 
function Get_Cursor_Type (Typ : Entity_Id) return Entity_Id; -- For containers with Iterator and related aspects, the cursor is @@ -2209,24 +2211,46 @@ package body Sem_Ch5 is end Check_Reverse_Iteration; --- - -- Check_Subtype_Indication -- + -- Check_Subtype_Definition -- --- - procedure Check_Subtype_Indication (Comp_Type : Entity_Id) is + procedure Check_Subtype_Definition (Comp_Type : Entity_Id) is begin - if Present (Subt) - and then (not Covers (Base_Type ((Bas)), Comp_Type) + if not Present (Subt) then +return; + end if; + + if Is_Anonymous_Access_Type (Entity (Subt)) then +if not Is_Anonymous_Access_Type (Comp_Type) then + Error_Msg_NE + ("component type& is not an anonymous access", + Subt, Comp_Type); + +elsif not Conforming_Types +(Designated_Type (Entity (Subt)), + Designated_Type (Comp_Type), + Fully_Conformant) +then + Error_Msg_NE + ("subtype indication does not match component type&", + Subt, Comp_Type); +end if; + + elsif Present (Subt) + and then (not Covers (Base_Type (Bas), Comp_Type) or else not Subtypes_Statically_Match (Bas, Comp_Type)) then if Is_Array_Type (Typ) then - Error_Msg_N - ("subtype indication does not match component type", Subt); + Error_Msg_NE + ("subtype indication does not match component type&", + Subt, Comp_Type); else - Error_Msg_N - ("subtype indication does not match element type", Subt); + Error_Msg_NE + ("subtype indication does not match element type&", + Subt, Comp_Type); end if; end if; - end Check_Subtype_Indication; + end Check_Subtype_Definition; - -- Get_Cursor_Type -- @@ -2288,6 +2312,39 @@ package body Sem_Ch5 is Analyze (Decl); Rewrite (Subt, New_Occurrence_Of (S, Sloc (Subt))); end; + + -- Ada 2022: the subtype definition may be for an anonymous + -- access type. + + elsif Nkind (Subt) = N_Access_Definition then +declare + S: constant Entity_Id := Make_Temporary (Sloc (Subt), 'S'); + Decl : Node_Id; +begin + if Present
[Ada] Spurious style message on missing overriding indicator
In the presence of style switch -gnatyO, the compiler emits a spurious
style violation message naming an inherited operation that does not come
from an explicit subprogram declaration.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

        * style.adb (Missing_Overriding): Do not emit message when
        parent of subprogram is a full type declaration.

diff --git a/gcc/ada/style.adb b/gcc/ada/style.adb
--- a/gcc/ada/style.adb
+++ b/gcc/ada/style.adb
@@ -265,11 +265,15 @@ package body Style is
       --  indicators were introduced in Ada 2005. We apply Comes_From_Source
       --  to Original_Node to catch the case of a procedure body declared with
       --  "is null" that has been rewritten as a normal empty body.
+      --  We do not emit a warning on an inherited operation that comes from
+      --  a type derivation.

       if Style_Check_Missing_Overriding
         and then (Comes_From_Source (Original_Node (N))
                    or else Is_Generic_Instance (E))
         and then Ada_Version_Explicit >= Ada_2005
+        and then Present (Parent (E))
+        and then Nkind (Parent (E)) /= N_Full_Type_Declaration
       then
          --  If the subprogram is an instantiation, its declaration appears
          --  within a wrapper package that precedes the instance node. Place
[Ada] Duplicated D lines in ali files
GNATcoverage possibly relies on the presence of the duplicate D lines in
ALI files for its Source Coverage Obligation tables among different
instantiations of a same generic. Mention this in comments.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

        * lib-writ.ads: Mention SCOs dependency as reason for duplicates.
        * lib.ads (Units): Update documentation to mention duplicated
        units.

diff --git a/gcc/ada/lib-writ.ads b/gcc/ada/lib-writ.ads
--- a/gcc/ada/lib-writ.ads
+++ b/gcc/ada/lib-writ.ads
@@ -1053,6 +1053,9 @@ package Lib.Writ is
    --  The Object parameter is true if an object file is created, and false
    --  otherwise. Note that the pseudo-object file generated in GNATprove mode
    --  does count as an object file from this point of view.
+   --  May output duplicate D lines caused by generic instantiations. This is
+   --  by design as GNATcoverage relies on them for its coverage of generic
+   --  instantiations, although this may be revisited in the future.

    procedure Add_Preprocessing_Dependency (S : Source_File_Index);
    --  Indicate that there is a dependency to be added on a preprocessing data

diff --git a/gcc/ada/lib.ads b/gcc/ada/lib.ads
--- a/gcc/ada/lib.ads
+++ b/gcc/ada/lib.ads
@@ -926,7 +926,9 @@ private
    --  The following table records a mapping between a name and the entry in
    --  the units table whose Unit_Name is this name. It is used to speed up
    --  the Is_Loaded function, whose original implementation (linear search)
-   --  could account for 2% of the time spent in the front end. Note that, in
+   --  could account for 2% of the time spent in the front end. When the unit
+   --  is an instance of a generic, the unit might get duplicated in the unit
+   --  table - see Make_Instance_Unit for more information. Note that, in
    --  the case of source files containing multiple units, the units table may
    --  temporarily contain two entries with the same Unit_Name during parsing,
    --  which means that the mapping must be to the first entry in the table.
[Ada] Rename sigtramp-vxworks-target.inc to sigtramp-vxworks-target.h
The .inc extension isn't recognized by gprconfig. The original motivation
for using this extension was to match the convention of putting code in
.inc ala unwind.inc. However it's easier in this situation to just rename
it to a .h file.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

        * sigtramp-vxworks-target.inc: Rename to...
        * sigtramp-vxworks-target.h: ... this.
        * sigtramp-vxworks.c, Makefile.rtl: Likewise.

diff --git a/gcc/ada/Makefile.rtl b/gcc/ada/Makefile.rtl
--- a/gcc/ada/Makefile.rtl
+++ b/gcc/ada/Makefile.rtl
@@ -1043,7 +1043,7 @@ EXTRA_GNATRTL_NONTASKING_OBJS=
 EXTRA_GNATRTL_TASKING_OBJS=

 # Subsets of extra libgnat sources that always go together
-VX_SIGTRAMP_EXTRA_SRCS=sigtramp.h sigtramp-vxworks-target.inc
+VX_SIGTRAMP_EXTRA_SRCS=sigtramp.h sigtramp-vxworks-target.h

 # Additional object files that should go in the same directory as libgnat,
 # aside the library itself. Typically useful for crtbegin/crtend kind of files.

diff --git a/gcc/ada/sigtramp-vxworks-target.inc b/gcc/ada/sigtramp-vxworks-target.h
--- a/gcc/ada/sigtramp-vxworks-target.inc
+++ b/gcc/ada/sigtramp-vxworks-target.h
@@ -6,7 +6,7 @@
  *                                                                          *
  *                      Asm Implementation Include File                     *
  *                                                                          *
- *          Copyright (C) 2011-2018, Free Software Foundation, Inc.         *
+ *          Copyright (C) 2011-2021, Free Software Foundation, Inc.         *
  *                                                                          *
  * GNAT is free software; you can redistribute it and/or modify it under    *
  * terms of the GNU General Public License as published by the Free Soft-   *

diff --git a/gcc/ada/sigtramp-vxworks.c b/gcc/ada/sigtramp-vxworks.c
--- a/gcc/ada/sigtramp-vxworks.c
+++ b/gcc/ada/sigtramp-vxworks.c
@@ -180,7 +180,7 @@ void __gnat_sigtramp (int signo, void *si, void *sc,
 }

 /* Include the target specific bits.  */
-#include "sigtramp-vxworks-target.inc"
+#include "sigtramp-vxworks-target.h"

 /* sigtramp stub for common registers.  */
[Ada] Transient scope cleanup
Misc cleanups found while working on transient scopes. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * comperr.adb (Compiler_Abort): Call Sinput.Unlock, because if this is called late, then Source_Dump would crash otherwise. * debug.adb: Correct documentation of the -gnatd.9 switch. * exp_ch4.adb (Expand_Allocator_Expression): Add a comment. * exp_ch6.adb: Minor comment fixes. Add assertion. * exp_ch6.ads (Is_Build_In_Place_Result_Type): Correct comment. * exp_ch7.adb, checks.ads: Minor comment fixes.diff --git a/gcc/ada/checks.ads b/gcc/ada/checks.ads --- a/gcc/ada/checks.ads +++ b/gcc/ada/checks.ads @@ -851,7 +851,7 @@ package Checks is --are not following the flow graph (more properly the flow of actual --processing only corresponds to the flow graph for local assignments). --For non-local variables, we preserve the current setting, i.e. a - --validity check is performed when assigning to a knonwn valid global. + --validity check is performed when assigning to a known valid global. -- Note: no validity checking is required if range checks are suppressed -- regardless of the setting of the validity checking mode. diff --git a/gcc/ada/comperr.adb b/gcc/ada/comperr.adb --- a/gcc/ada/comperr.adb +++ b/gcc/ada/comperr.adb @@ -404,6 +404,7 @@ package body Comperr is Set_Standard_Output; Tree_Dump; + Sinput.Unlock; -- so Source_Dump can modify it Source_Dump; raise Unrecoverable_Error; end if; diff --git a/gcc/ada/debug.adb b/gcc/ada/debug.adb --- a/gcc/ada/debug.adb +++ b/gcc/ada/debug.adb @@ -1101,7 +1101,7 @@ package body Debug is -- issues (e.g., assuming that a low bound of an array parameter -- of an unconstrained subtype belongs to the index subtype). - -- d.9 Enable build-in-place for function calls returning some nonlimited + -- d.9 Disable build-in-place for function calls returning nonlimited -- types. 
-- diff --git a/gcc/ada/exp_ch4.adb b/gcc/ada/exp_ch4.adb --- a/gcc/ada/exp_ch4.adb +++ b/gcc/ada/exp_ch4.adb @@ -1166,6 +1166,9 @@ package body Exp_Ch4 is -- secondary stack). In that case, the object will be moved, so we do -- want to Adjust. However, if it's a nonlimited build-in-place -- function call, Adjust is not wanted. + -- + -- Needs_Finalization (DesigT) can differ from Needs_Finalization (T) + -- if one of the two types is class-wide, and the other is not. if Needs_Finalization (DesigT) and then Needs_Finalization (T) diff --git a/gcc/ada/exp_ch6.adb b/gcc/ada/exp_ch6.adb --- a/gcc/ada/exp_ch6.adb +++ b/gcc/ada/exp_ch6.adb @@ -4913,7 +4913,7 @@ package body Exp_Ch6 is -- Optimization, if the returned value (which is on the sec-stack) is -- returned again, no need to copy/readjust/finalize, we can just pass -- the value thru (see Expand_N_Simple_Return_Statement), and thus no - -- attachment is needed + -- attachment is needed. if Nkind (Parent (N)) = N_Simple_Return_Statement then return; @@ -7310,15 +7310,16 @@ package body Exp_Ch6 is Set_Enclosing_Sec_Stack_Return (N); - -- Optimize the case where the result is a function call. In this - -- case the result is already on the secondary stack and no further - -- processing is required except to set the By_Ref flag to ensure - -- that gigi does not attempt an extra unnecessary copy. (Actually - -- not just unnecessary but wrong in the case of a controlled type, - -- where gigi does not know how to do a copy.) + -- Optimize the case where the result is a function call that also + -- returns on the secondary stack. In this case the result is already + -- on the secondary stack and no further processing is required + -- except to set the By_Ref flag to ensure that gigi does not attempt + -- an extra unnecessary copy. (Actually not just unnecessary but + -- wrong in the case of a controlled type, where gigi does not know + -- how to do a copy.) 
- if Requires_Transient_Scope (Exp_Typ) - and then Exp_Is_Function_Call + pragma Assert (Requires_Transient_Scope (R_Type)); + if Exp_Is_Function_Call and then Requires_Transient_Scope (Exp_Typ) then Set_By_Ref (N); @@ -7849,7 +7850,7 @@ package body Exp_Ch6 is Compute_Returns_By_Ref (Subp); - -- Wnen freezing a null procedure, analyze its delayed aspects now + -- When freezing a null procedure, analyze its delayed aspects now -- because we may not have reached the end of the declarative list when -- delayed aspects are normally analyzed. This ensures that dispatching -
[Ada] Use encoded names only with -fgnat-encodings=all
This disables the last special encoding done in Get_Encoded_Name, except
when -fgnat-encodings=all is passed on the command line.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

        * exp_dbug.adb (Get_Encoded_Name): Do not encode names of discrete
        types with custom bounds, except with -fgnat-encodings=all.
        * exp_pakd.adb (Create_Packed_Array_Impl_Type): Adjust comment.

diff --git a/gcc/ada/exp_dbug.adb b/gcc/ada/exp_dbug.adb
--- a/gcc/ada/exp_dbug.adb
+++ b/gcc/ada/exp_dbug.adb
@@ -655,10 +655,10 @@ package body Exp_Dbug is

       Has_Suffix := True;

-      --  Fixed-point case: generate GNAT encodings when asked to
+      --  Generate GNAT encodings when asked to for fixed-point case

-      if Is_Fixed_Point_Type (E)
-        and then GNAT_Encodings = DWARF_GNAT_Encodings_All
+      if GNAT_Encodings = DWARF_GNAT_Encodings_All
+        and then Is_Fixed_Point_Type (E)
       then
          Get_External_Name (E, True, "XF_");
          Add_Real_To_Buffer (Delta_Value (E));
@@ -668,10 +668,9 @@ package body Exp_Dbug is
             Add_Real_To_Buffer (Small_Value (E));
          end if;

-      --  Discrete case where bounds do not match size. Not necessary if we can
-      --  emit standard DWARF.
+      --  Likewise for discrete case where bounds do not match size

-      elsif GNAT_Encodings /= DWARF_GNAT_Encodings_Minimal
+      elsif GNAT_Encodings = DWARF_GNAT_Encodings_All
         and then Is_Discrete_Type (E)
         and then not Bounds_Match_Size (E)
       then

diff --git a/gcc/ada/exp_pakd.adb b/gcc/ada/exp_pakd.adb
--- a/gcc/ada/exp_pakd.adb
+++ b/gcc/ada/exp_pakd.adb
@@ -828,8 +828,8 @@ package body Exp_Pakd is

       elsif not Is_Constrained (Typ) then

-         --  When generating standard DWARF (i.e when GNAT_Encodings is
-         --  DWARF_GNAT_Encodings_Minimal), the ___XP suffix will be stripped
+         --  When generating standard DWARF (i.e when GNAT_Encodings is not
+         --  DWARF_GNAT_Encodings_All), the ___XP suffix will be stripped
          --  by the back-end but generate it anyway to ease compiler debugging.
          --  This will help to distinguish implementation types from original
          --  packed arrays.
[Ada] Diagnose properly illegal uses of Target_Name
Ada 2022 introduces the notion of Target_Name, written @, to be used in
assignment statements, where it denotes the value of the left-hand side
prior to the assignment. This patch diagnoses illegal uses of the target
name outside of its legal context.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

        * sem_ch5.adb (Analyze_Target_Name): Properly reject a
        Target_Name when it appears outside of an assignment statement,
        or within the left-hand side of one.

diff --git a/gcc/ada/sem_ch5.adb b/gcc/ada/sem_ch5.adb
--- a/gcc/ada/sem_ch5.adb
+++ b/gcc/ada/sem_ch5.adb
@@ -4233,10 +4233,50 @@ package body Sem_Ch5 is
    -------------------------

    procedure Analyze_Target_Name (N : Node_Id) is
+      procedure Report_Error;
+
+      ------------------
+      -- Report_Error --
+      ------------------
+
+      procedure Report_Error is
+      begin
+         Error_Msg_N
+           ("must appear in the right-hand side of an assignment statement",
+            N);
+         Rewrite (N, New_Occurrence_Of (Any_Id, Sloc (N)));
+      end Report_Error;
+
    begin
       --  A target name has the type of the left-hand side of the enclosing
       --  assignment.

+      --  First, verify that the context is the right-hand side of an
+      --  assignment statement.
+
+      if No (Current_Assignment) then
+         Report_Error;
+         return;
+
+      else
+         declare
+            P : Node_Id := N;
+         begin
+            while Present (P)
+              and then Nkind (Parent (P)) /= N_Assignment_Statement
+            loop
+               P := Parent (P);
+            end loop;
+
+            if No (P)
+              or else P /= Expression (Parent (P))
+            then
+               Report_Error;
+               return;
+            end if;
+         end;
+      end if;
+
       Set_Etype (N, Etype (Name (Current_Assignment)));
    end Analyze_Target_Name;
[Ada] Tune detection of illegal occurrences of target_name
Prevent AST climbing from going outside of the current program unit; tune
style; add comments. Also, only set the Current_Assignment global variable
when needed and clear it once the analysis of an assignment statement is
done.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

        * sem_ch5.adb (Analyze_Assignment): Clear Current_Assignment at
        exit.
        (Analyze_Target_Name): Prevent AST climbing from going too far.

diff --git a/gcc/ada/sem_ch5.adb b/gcc/ada/sem_ch5.adb
--- a/gcc/ada/sem_ch5.adb
+++ b/gcc/ada/sem_ch5.adb
@@ -480,12 +480,11 @@ package body Sem_Ch5 is
       Mark_And_Set_Ghost_Assignment (N);

       if Has_Target_Names (N) then
+         pragma Assert (No (Current_Assignment));
          Current_Assignment := N;
          Expander_Mode_Save_And_Set (False);
          Save_Full_Analysis := Full_Analysis;
          Full_Analysis := False;
-      else
-         Current_Assignment := Empty;
       end if;

       Analyze (Lhs);
@@ -1302,6 +1301,7 @@ package body Sem_Ch5 is
       if Has_Target_Names (N) then
          Expander_Mode_Restore;
          Full_Analysis := Save_Full_Analysis;
+         Current_Assignment := Empty;
       end if;

       pragma Assert (not Should_Transform_BIP_Assignment (Typ => T1));
@@ -4234,6 +4234,8 @@ package body Sem_Ch5 is

    procedure Analyze_Target_Name (N : Node_Id) is
       procedure Report_Error;
+      --  Complain about illegal use of target_name and rewrite it into unknown
+      --  identifier.

       ------------------
       -- Report_Error --
       ------------------
@@ -4247,6 +4249,8 @@ package body Sem_Ch5 is
          Rewrite (N, New_Occurrence_Of (Any_Id, Sloc (N)));
       end Report_Error;

+   --  Start of processing for Analyze_Target_Name
+
    begin
       --  A target name has the type of the left-hand side of the enclosing
       --  assignment.
@@ -4257,27 +4261,39 @@ package body Sem_Ch5 is
       if No (Current_Assignment) then
          Report_Error;
          return;
+      end if;

-      else
-         declare
-            P : Node_Id := N;
-         begin
-            while Present (P)
-              and then Nkind (Parent (P)) /= N_Assignment_Statement
-            loop
-               P := Parent (P);
-            end loop;
+      declare
+         Current : Node_Id := N;
+         Context : Node_Id := Parent (N);
+      begin
+         while Present (Context) loop

-            if No (P)
-              or else P /= Expression (Parent (P))
-            then
+            --  Check if target_name appears in the expression of the enclosing
+            --  assignment.
+
+            if Nkind (Context) = N_Assignment_Statement then
+               if Current = Expression (Context) then
+                  pragma Assert (Context = Current_Assignment);
+                  Set_Etype (N, Etype (Name (Current_Assignment)));
+               else
+                  Report_Error;
+               end if;
+               return;
+
+            --  Prevent the search from going too far
+
+            elsif Is_Body_Or_Package_Declaration (Context) then
                Report_Error;
                return;
             end if;
-         end;
-      end if;

-      Set_Etype (N, Etype (Name (Current_Assignment)));
+            Current := Context;
+            Context := Parent (Context);
+         end loop;
+
+         Report_Error;
+      end;
    end Analyze_Target_Name;
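For readers not familiar with the GNAT tree, the parent-chain walk in the revised Analyze_Target_Name can be sketched outside the compiler. The following is only an illustration in C under assumed names: struct node, enum node_kind and target_name_is_legal are hypothetical stand-ins for Node_Id, Nkind and the logic of the last hunk above, not code from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node kinds; GNAT's real Nkind values are far richer.  */
enum node_kind
{
  NODE_OTHER,         /* any node we do not care about */
  NODE_ASSIGNMENT,    /* stands in for N_Assignment_Statement */
  NODE_UNIT_BOUNDARY  /* stands in for Is_Body_Or_Package_Declaration */
};

/* Hypothetical tree node: a parent link plus, for assignments, the
   right-hand side (what Expression (N) returns in GNAT).  */
struct node
{
  enum node_kind kind;
  struct node *parent;
  struct node *expression;  /* only meaningful for NODE_ASSIGNMENT */
};

/* Return true if TARGET sits inside the right-hand side of the nearest
   enclosing assignment, climbing no further than the enclosing unit.  */
bool
target_name_is_legal (const struct node *target)
{
  const struct node *current = target;
  const struct node *context = target->parent;

  while (context != NULL)
    {
      /* Legal only when the assignment was reached via its
         right-hand side.  */
      if (context->kind == NODE_ASSIGNMENT)
        return current == context->expression;

      /* Prevent the search from going too far.  */
      if (context->kind == NODE_UNIT_BOUNDARY)
        return false;

      current = context;
      context = context->parent;
    }

  /* Ran off the top of the tree without meeting an assignment.  */
  return false;
}
```

Tracking both the current node and its parent is what lets the walk tell the left-hand side from the right-hand side: when the assignment is reached, the child it was reached from is compared against the assignment's expression.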
[Ada] Remove Unknown_ functions
Remove the Unknown_ type representation attribute predicates from Einfo.Utils. "not Known_Alignment (...)" is at least as readable as "Unknown_Alignment (...)" -- we don't need a bunch of functions that just do a "not". Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * einfo-utils.ads, einfo-utils.adb (Unknown_Alignment, Unknown_Component_Bit_Offset, Unknown_Component_Size, Unknown_Esize, Unknown_Normalized_First_Bit, Unknown_Normalized_Position, Unknown_Normalized_Position_Max, Unknown_RM_Size): Remove these functions. * exp_pakd.adb, exp_util.adb, fe.h, freeze.adb, layout.adb, repinfo.adb, sem_ch13.adb, sem_ch3.adb, sem_util.adb: Remove calls to these functions; do "not Known_..." instead. * gcc-interface/decl.c, gcc-interface/trans.c (Unknown_Alignment, Unknown_Component_Size, Unknown_Esize, Unknown_RM_Size): Remove calls to these functions; do "!Known_..." instead.diff --git a/gcc/ada/einfo-utils.adb b/gcc/ada/einfo-utils.adb --- a/gcc/ada/einfo-utils.adb +++ b/gcc/ada/einfo-utils.adb @@ -597,46 +597,6 @@ package body Einfo.Utils is and then not Is_Generic_Type (E); end Known_Static_RM_Size; - function Unknown_Alignment (E : Entity_Id) return B is - begin - return not Known_Alignment (E); - end Unknown_Alignment; - - function Unknown_Component_Bit_Offset (E : Entity_Id) return B is - begin - return not Known_Component_Bit_Offset (E); - end Unknown_Component_Bit_Offset; - - function Unknown_Component_Size(E : Entity_Id) return B is - begin - return not Known_Component_Size (E); - end Unknown_Component_Size; - - function Unknown_Esize (E : Entity_Id) return B is - begin - return not Known_Esize (E); - end Unknown_Esize; - - function Unknown_Normalized_First_Bit (E : Entity_Id) return B is - begin - return not Known_Normalized_First_Bit (E); - end Unknown_Normalized_First_Bit; - - function Unknown_Normalized_Position (E : Entity_Id) return B is - begin - return not Known_Normalized_Position (E); - end Unknown_Normalized_Position; - - function 
Unknown_Normalized_Position_Max (E : Entity_Id) return B is - begin - return not Known_Normalized_Position_Max (E); - end Unknown_Normalized_Position_Max; - - function Unknown_RM_Size (E : Entity_Id) return B is - begin - return not Known_RM_Size (E); - end Unknown_RM_Size; - -- Address_Clause -- diff --git a/gcc/ada/einfo-utils.ads b/gcc/ada/einfo-utils.ads --- a/gcc/ada/einfo-utils.ads +++ b/gcc/ada/einfo-utils.ads @@ -314,12 +314,11 @@ package Einfo.Utils is -- Type Representation Attribute Predicates -- -- - -- These predicates test the setting of the indicated attribute. If the - -- value has been set, then Known is True, and Unknown is False. If no - -- value is set, then Known is False and Unknown is True. The Known_Static - -- predicate is true only if the value is set (Known) and is set to a - -- compile time known value. Note that in the case of Alignment and - -- Normalized_First_Bit, dynamic values are not possible, so we do not + -- These predicates test the setting of the indicated attribute. The + -- Known predicate is True if and only if the value has been set. The + -- Known_Static predicate is True only if the value is set (Known) and is + -- set to a compile time known value. Note that in the case of Alignment + -- and Normalized_First_Bit, dynamic values are not possible, so we do not -- need a separate Known_Static calls in these cases. 
The not set (unknown) -- values are as follows: @@ -364,15 +363,6 @@ package Einfo.Utils is function Known_Static_Normalized_Position_Max (E : Entity_Id) return B; function Known_Static_RM_Size (E : Entity_Id) return B; - function Unknown_Alignment (E : Entity_Id) return B; - function Unknown_Component_Bit_Offset (E : Entity_Id) return B; - function Unknown_Component_Size(E : Entity_Id) return B; - function Unknown_Esize (E : Entity_Id) return B; - function Unknown_Normalized_First_Bit (E : Entity_Id) return B; - function Unknown_Normalized_Position (E : Entity_Id) return B; - function Unknown_Normalized_Position_Max (E : Entity_Id) return B; - function Unknown_RM_Size (E : Entity_Id) return B; - pragma Inline (Known_Alignment); pragma Inline (Known_Component_Bit_Offset); pragma Inline (Known_Component_Size); @@ -390,15 +380,6 @@ package Einfo.Utils is pragma Inline (Known_Static_Normalized_Position_Max); pragma Inline (Known_Static_RM_Size); - pragma Inline
Re: [patch][version 4]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc
Hi, Martin,

Thank you for the review and comment.

On Jul 8, 2021, at 8:29 AM, Martin Jambor <mjam...@suse.cz> wrote:

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index c05d22f3e8f1..35051d7c6b96 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -384,6 +384,13 @@ static struct

   /* Numbber of components created when splitting aggregate parameters.  */
   int param_reductions_created;
+
+  /* Number of deferred_init calls that are modified.  */
+  int deferred_init;
+
+  /* Number of deferred_init calls that are created by
+     generate_subtree_deferred_init.  */
+  int subtree_deferred_init;
 } sra_stats;

 static void
@@ -4096,6 +4103,110 @@ get_repl_default_def_ssa_name (struct access *racc, tree reg_type)
   return get_or_create_ssa_default_def (cfun, racc->replacement_decl);
 }

+
+/* Generate statements to call .DEFERRED_INIT to initialize scalar replacements
+   of accesses within a subtree ACCESS; all its children, siblings and their
+   children are to be processed.
+   GSI is a statement iterator used to place the new statements.  */
+static void
+generate_subtree_deferred_init (struct access *access,
+                                tree init_type,
+                                tree is_vla,
+                                gimple_stmt_iterator *gsi,
+                                location_t loc)
+{
+  do
+    {
+      if (access->grp_to_be_replaced)
+        {
+          tree repl = get_access_replacement (access);
+          gimple *call
+            = gimple_build_call_internal (IFN_DEFERRED_INIT, 3,
+                                          TYPE_SIZE_UNIT (TREE_TYPE (repl)),
+                                          init_type, is_vla);
+          gimple_call_set_lhs (call, repl);
+          gsi_insert_before (gsi, call, GSI_SAME_STMT);
+          update_stmt (call);
+          gimple_set_location (call, loc);
+          sra_stats.subtree_deferred_init++;
+        }
+      else if (access->grp_to_be_debug_replaced)
+        {
+          tree drepl = get_access_replacement (access);
+          tree call = build_call_expr_internal_loc
+                     (UNKNOWN_LOCATION, IFN_DEFERRED_INIT,
+                      TREE_TYPE (drepl), 3,
+                      TYPE_SIZE_UNIT (TREE_TYPE (drepl)),
+                      init_type, is_vla);
+          gdebug *ds = gimple_build_debug_bind (drepl, call,
+                                                gsi_stmt (*gsi));
+          gsi_insert_before (gsi, ds, GSI_SAME_STMT);

Is handling of grp_to_be_debug_replaced accesses necessary here? If so,
why? grp_to_be_debug_replaced accesses are there only to facilitate
debug information about a part of an aggregate decl that is likely
going to be entirely removed - so that debuggers can sometimes show to
users information about what they would contain had they not been
removed. It seems strange you need to mark them as uninitialized because
they should not have any consumers. (But perhaps it is also harmless.)

This part has been discussed during the 2nd version of the patch, but I
think that more discussion might be necessary.

In the previous discussion, Richard Sandiford mentioned
(https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568620.html):

=====

I guess the thing we need to decide here is whether
-ftrivial-auto-var-init should affect debug-only constructs too.
If it doesn't, examining removed components in a debugger might
show uninitialised values in cases where the user was expecting
initialised ones. There would be no security concern, but it
might be surprising.

I think in principle the DRHS can contain a call to DEFERRED_INIT.
Doing that would probably require further handling elsewhere though.

=====

I am still not very confident about this part of the change.

My questions:

1. If we don’t handle grp_to_be_debug_replaced at all, what will happen?
   (Will the user of the debugger see uninitialized values in the removed
   part of the aggregate? Or something else?)

2. On the other hand, if we handle grp_to_be_debug_replaced as the current
   patch does, what will the user of the debugger see?

On a related note, if the intent of the feature is for optimizers to
behave (almost?) as if it was not taking place,

What do you mean by “it” here?

I believe you need to handle specially, and probably just ignore, calls
to IFN_DEFERRED_INIT in scan_function in tree-sra.c.

Do you mean to let the tree-sra phase ignore IFN_DEFERRED_INIT calls
completely?

My main purpose in changing the tree-sra.c phase is to change:

  tmp = .DEFERRED_INIT (24, 2, 0)

to

  tmp1 = .DEFERRED_INIT (8, 2, 0);
  tmp2 = .DEFERRED_INIT (8, 2, 0);
  tmp3 = .DEFERRED_INIT (8, 2, 0);

The point of doing this is to reduce the stack usage.

Otherwise the generated SRA access structures will have extra write
flags turned on in them and that will lead to different behavior of the
pass.

Could you please explain this more?

thanks.

Qing

Martin

+        }
+      if (access->first_child)
+        generate_subtree_deferred_init (access->first_child, init_type,
+                                        is_vla, gsi, loc);
+
+      access = access ->next_sibling;
+    }
+  while (access);
+}
+
+/* For a call to .DEFERRED_INIT:
+   var = .DEFERRED_INIT (size_of_var, init_type, is_vla);
+   examine the LHS variable VAR and replace it with a scalar replacement if
+   there is one, also replace the RHS call to a call to .DEFERRED_INIT of
+   the corresponding scalar relacement variable.  Examine the subtree and
+   do the s
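To make the stack-usage point in the message above more concrete, here is a small C sketch of the kind of local aggregate under discussion. It assumes an LP64 target where long is 8 bytes, so the struct is 24 bytes like the .DEFERRED_INIT (24, 2, 0) example; struct triple and sum_triple are made-up names, and the comments describe the intended effect of the patch rather than verified compiler output.

```c
/* Hypothetical aggregate: three 8-byte fields on an LP64 target.  */
struct triple
{
  long a;
  long b;
  long c;
};

long
sum_triple (long x)
{
  /* With -ftrivial-auto-var-init enabled, the compiler would conceptually
     emit  tmp = .DEFERRED_INIT (sizeof tmp, init_type, is_vla)  at this
     declaration.  If SRA scalarizes TMP, the patch aims to replace that
     single call with one .DEFERRED_INIT per scalar replacement, so that
     TMP itself no longer needs a stack slot.  */
  struct triple tmp;

  tmp.a = x;
  tmp.b = x + 1;
  tmp.c = x + 2;

  return tmp.a + tmp.b + tmp.c;
}
```

Every field is written before it is read, so the function behaves the same with or without the option; the automatic initialization only matters for code paths that read a field before storing to it.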
[PATCH] c++: requires-expr with dependent extra args [PR101181]
Here we're crashing ultimately because the mechanism for delaying
substitution into a requires-expression (or constexpr if) doesn't
expect to see dependent args.  But we end up capturing dependent
args here when substituting into the default template argument during
coerce_template_parms for the dependent specialization p.

This patch enables the commented out code in add_extra_args for
handling this situation.  It turns out we also need to make a copy of
the captured arguments so that coerce_template_parms doesn't later
add to the argument, which would form an unexpected cycle.  And we
need to make tsubst_template_args more forgiving about missing template
arguments, since the arguments we capture from coerce_template_parms
are incomplete.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/11?

        PR c++/101181

gcc/cp/ChangeLog:

        * constraint.cc (tsubst_requires_expr): Pass complain/in_decl to
        add_extra_args.
        * cp-tree.h (add_extra_args): Add complain/in_decl parameters.
        * pt.c (build_extra_args): Make a copy of args.
        (add_extra_args): Add complain/in_decl parameters.  Handle the
        case where the extra arguments are dependent.
        (tsubst_pack_expansion): Pass complain/in_decl to
        add_extra_args.
        (tsubst_template_args): Handle missing template arguments.
        (tsubst_expr) : Pass complain/in_decl to add_extra_args.

gcc/testsuite/ChangeLog:

        * g++.dg/cpp2a/concepts-requires26.C: New test.
        * g++.dg/cpp2a/lambda-uneval16.C: New test.
--- gcc/cp/constraint.cc | 3 +- gcc/cp/cp-tree.h | 2 +- gcc/cp/pt.c | 31 +-- .../g++.dg/cpp2a/concepts-requires26.C| 18 +++ gcc/testsuite/g++.dg/cpp2a/lambda-uneval16.C | 22 + 5 files changed, 58 insertions(+), 18 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-requires26.C create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-uneval16.C diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc index 99d3ccc6998..4ee5215df50 100644 --- a/gcc/cp/constraint.cc +++ b/gcc/cp/constraint.cc @@ -2266,7 +2266,8 @@ tsubst_requires_expr (tree t, tree args, sat_info info) /* A requires-expression is an unevaluated context. */ cp_unevaluated u; - args = add_extra_args (REQUIRES_EXPR_EXTRA_ARGS (t), args); + args = add_extra_args (REQUIRES_EXPR_EXTRA_ARGS (t), args, +info.complain, info.in_decl); if (processing_template_decl) { /* We're partially instantiating a generic lambda. Substituting into diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index 58da7460001..0a5f13489cc 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -7289,7 +7289,7 @@ extern void add_mergeable_specialization(bool is_decl, bool is_alias, tree outer, unsigned); extern tree add_to_template_args (tree, tree); extern tree add_outermost_template_args(tree, tree); -extern tree add_extra_args (tree, tree); +extern tree add_extra_args (tree, tree, tsubst_flags_t, tree); extern tree build_extra_args (tree, tree, tsubst_flags_t); /* in rtti.c */ diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index 06116d16887..e4bdac087ad 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -12928,7 +12928,9 @@ extract_local_specs (tree pattern, tsubst_flags_t complain) tree build_extra_args (tree pattern, tree args, tsubst_flags_t complain) { - tree extra = args; + /* Make a copy of the extra arguments so that they won't get changed + from under us. 
*/ + tree extra = copy_template_args (args); if (local_specializations) if (tree locals = extract_local_specs (pattern, complain)) extra = tree_cons (NULL_TREE, extra, locals); @@ -12939,7 +12941,7 @@ build_extra_args (tree pattern, tree args, tsubst_flags_t complain) normal template args to ARGS. */ tree -add_extra_args (tree extra, tree args) +add_extra_args (tree extra, tree args, tsubst_flags_t complain, tree in_decl) { if (extra && TREE_CODE (extra) == TREE_LIST) { @@ -12959,20 +12961,14 @@ add_extra_args (tree extra, tree args) gcc_assert (!TREE_PURPOSE (extra)); extra = TREE_VALUE (extra); } -#if 1 - /* I think we should always be able to substitute dependent args into the - pattern. If that turns out to be incorrect in some cases, enable the - alternate code (and add complain/in_decl parms to this function). */ - gcc_checking_assert (!uses_template_parms (extra)); -#else - if (!uses_template_parms (extra)) + if (uses_template_parms (extra)) { - gcc_unreachable (); + /* This can happen during dependent substitution into a requires-expr +or a lambda that uses constexpr if. */ extra = tsubst_template_args (extra
Re: [PATCH 06/10] vect: Pass reduc_info to get_initial_defs_for_reduction
Richard Biener writes:
> On Thu, Jul 8, 2021 at 2:46 PM Richard Sandiford via Gcc-patches
> wrote:
>>
>> This patch passes the reduc_info to get_initial_defs_for_reduction,
>> so that the function can get general information from there rather
>> than from the first SLP statement.  This isn't a win on its own,
>> but it becomes important with later patches.
>
> So the original code should have used SLP_TREE_REPRESENTATIVE
> instead of SLP_TREE_SCALAR_STMTS ()[0] (there might have been
> issues with doing that - my recollection is weak here).
>
> I'm not sure if reduc_info is actually better - only the representative
> will have STMT_VINFO_VECTYPE set, for the reduc_info
> there's STMT_VINFO_REDUC_VECTYPE (and STMT_VINFO_REDUC_VECTYPE_IN).
>
> So I think if you want to use reduc_info then you want to use
> STMT_VINFO_REDUC_VECTYPE?

I guess I'm a bit fuzzy on the details, but AIUI STMT_VINFO_REDUC_VECTYPE
is the type that we do the arithmetic in, which might be different from
the types of the phis.  Is that right?

In this context we want the types of the phis, since the routine
is providing the initial values.  Using STMT_VINFO_REDUC_VECTYPE
gives things like:

---
gcc.dg/torture/pr92345.c:8:1: error: incompatible types in 'PHI' argument 1
vector(4) int
vector(4) unsigned int
vect_fr_lsm.11_58 = PHI
---

Thanks,
Richard

>
>> gcc/
>>         * tree-vect-loop.c (get_initial_defs_for_reduction): Take the
>>         reduc_info as an additional parameter.
>>         (vect_transform_cycle_phi): Update accordingly.
>> ---
>>  gcc/tree-vect-loop.c | 23 ++++++++++-------------
>>  1 file changed, 10 insertions(+), 13 deletions(-)
>>
>> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>> index a31d7621c3b..565c2859477 100644
>> --- a/gcc/tree-vect-loop.c
>> +++ b/gcc/tree-vect-loop.c
>> @@ -4764,32 +4764,28 @@ get_initial_def_for_reduction (loop_vec_info loop_vinfo,
>>    return init_def;
>>  }
>>
>> -/* Get at the initial defs for the reduction PHIs in SLP_NODE.
>> -   NUMBER_OF_VECTORS is the number of vector defs to create.
>> -   If NEUTRAL_OP is nonnull, introducing extra elements of that
>> -   value will not change the result.  */
>> +/* Get at the initial defs for the reduction PHIs for REDUC_INFO, whose
>> +   associated SLP node is SLP_NODE.  NUMBER_OF_VECTORS is the number of vector
>> +   defs to create.  If NEUTRAL_OP is nonnull, introducing extra elements of
>> +   that value will not change the result.  */
>>
>>  static void
>>  get_initial_defs_for_reduction (vec_info *vinfo,
>> +                                stmt_vec_info reduc_info,
>>                                  slp_tree slp_node,
>>                                  vec *vec_oprnds,
>>                                  unsigned int number_of_vectors,
>>                                  bool reduc_chain, tree neutral_op)
>>  {
>>    vec stmts = SLP_TREE_SCALAR_STMTS (slp_node);
>> -  stmt_vec_info stmt_vinfo = stmts[0];
>>    unsigned HOST_WIDE_INT nunits;
>>    unsigned j, number_of_places_left_in_vector;
>> -  tree vector_type;
>> +  tree vector_type = STMT_VINFO_VECTYPE (reduc_info);
>>    unsigned int group_size = stmts.length ();
>>    unsigned int i;
>>    class loop *loop;
>>
>> -  vector_type = STMT_VINFO_VECTYPE (stmt_vinfo);
>> -
>> -  gcc_assert (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def);
>> -
>> -  loop = (gimple_bb (stmt_vinfo->stmt))->loop_father;
>> +  loop = (gimple_bb (reduc_info->stmt))->loop_father;
>>    gcc_assert (loop);
>>    edge pe = loop_preheader_edge (loop);
>>
>> @@ -4823,7 +4819,7 @@ get_initial_defs_for_reduction (vec_info *vinfo,
>>          {
>>            tree op;
>>            i = j % group_size;
>> -          stmt_vinfo = stmts[i];
>> +          stmt_vec_info stmt_vinfo = stmts[i];
>>
>>            /* Get the def before the loop.  In reduction chain we have only
>>               one initial value.  Else we have as many as PHIs in the group.  */
>> @@ -7510,7 +7506,8 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo,
>>              = neutral_op_for_slp_reduction (slp_node, vectype_out,
>>                                              STMT_VINFO_REDUC_CODE (reduc_info),
>>                                              first != NULL);
>> -          get_initial_defs_for_reduction (loop_vinfo, slp_node_instance->reduc_phis,
>> +          get_initial_defs_for_reduction (loop_vinfo, reduc_info,
>> +                                          slp_node_instance->reduc_phis,
>>                                            &vec_initial_defs, vec_num,
>>                                            first != NULL, neutral_op);
>>          }
[committed] Use Object Size Type zero for -Warray-bounds [PR101374]
I have committed the attached patch to unblock the bootstrap errors due to the tightening up of the -Warray-bounds checking in r12-213. I have also temporarily disabled a couple of instances of the warning in gcc/cp/module.cc. They don't appear to be caused by the same tighter checking but I haven't determined the root cause yet. I'll submit another patch and/or bug when I do. I tested this change by configuring with --enable-checking=release and --enable-checking=yes,extra and successfully bootstrapping all languages but libgo. Libgo fails with a couple of new instances of -Warray-bounds where it writes into an invalid address. I have a patch that suppresses these two -Warray-bounds instances but it doesn't look like I can commit it myself so I'll forward the patch to Ian separately. Martin Use Object Size Type zero for -Warray-bounds [PR101374]. PR bootstrap/101374 - -Warray-bounds accessing a member subobject as derived gcc/cp/ChangeLog: * module.cc (module_state::read_macro_maps): Temporarily disable -Warray-bounds. (module_state::install_macros): Same. gcc/ChangeLog: * gimple-array-bounds.cc (array_bounds_checker::check_mem_ref): Use Object Size Type 0 instead of 1. gcc/testsuite/ChangeLog: * c-c++-common/Warray-bounds-3.c: Xfail assertion. * c-c++-common/Warray-bounds-4.c: Same. libgo/ChangeLog: * runtime/proc.c (runtime_mstart): Suppress -Warray-bounds. * runtime/runtime_c.c (runtime_signalstack): Same. diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc index f259515a498..8a890c167cf 100644 --- a/gcc/cp/module.cc +++ b/gcc/cp/module.cc @@ -16301,11 +16301,18 @@ module_state::read_macro_maps () } if (count) sec.set_overrun (); + + /* FIXME: Re-enable or fix after root causing. 
*/ +#pragma GCC diagnostic push +#pragma GCC diagnostic ignored "-Warray-bounds" + dump (dumper::LOCATION) && dump ("Macro:%u %I %u/%u*2 locations [%u,%u)", ix, identifier (node), runs, n_tokens, MAP_START_LOCATION (macro), MAP_START_LOCATION (macro) + n_tokens); + +#pragma GCC diagnostic pop } location_t lwm = sec.u (); macro_locs.first = lwm - slurp->loc_deltas.second; @@ -16911,6 +16918,10 @@ module_state::install_macros () macro_import::slot &slot = imp.append (mod, flags); slot.offset = sec.u (); + /* FIXME: Re-enable or fix after root causing. */ +#pragma GCC diagnostic push +#pragma GCC diagnostic ignored "-Warray-bounds" + dump (dumper::MACRO) && dump ("Read %s macro %s%s%s %I at %u", imp.length () > 1 ? "add" : "new", @@ -16931,6 +16942,8 @@ module_state::install_macros () exp.def = cur; dump (dumper::MACRO) && dump ("Saving current #define %I", identifier (node)); + +#pragma GCC diagnostic pop } } diff --git a/gcc/gimple-array-bounds.cc b/gcc/gimple-array-bounds.cc index 83b8db9755e..8dfd6f9500a 100644 --- a/gcc/gimple-array-bounds.cc +++ b/gcc/gimple-array-bounds.cc @@ -427,7 +427,7 @@ array_bounds_checker::check_mem_ref (location_t location, tree ref, axssize = wi::to_offset (access_size); access_ref aref; - if (!compute_objsize (ref, 1, &aref, ranges)) + if (!compute_objsize (ref, 0, &aref, ranges)) return false; if (aref.offset_in_range (axssize)) diff --git a/gcc/testsuite/c-c++-common/Warray-bounds-3.c b/gcc/testsuite/c-c++-common/Warray-bounds-3.c index 3d7c7687374..75f9a496eae 100644 --- a/gcc/testsuite/c-c++-common/Warray-bounds-3.c +++ b/gcc/testsuite/c-c++-common/Warray-bounds-3.c @@ -178,7 +178,7 @@ void test_memcpy_bounds_memarray_range (void) TM (ma.a5, ma.a5 + i, ma.a5, 1); TM (ma.a5, ma.a5 + i, ma.a5, 3); - TM (ma.a5, ma.a5 + i, ma.a5, 5); /* { dg-warning "\\\[-Warray-bounds" } */ + TM (ma.a5, ma.a5 + i, ma.a5, 5); /* { dg-warning "\\\[-Warray-bounds" "pr101374" { xfail *-*-* } } */ TM (ma.a5, ma.a5 + i, ma.a5, 7); /* diagnosed with 
-Warray-bounds=2 */ } diff --git a/gcc/testsuite/c-c++-common/Warray-bounds-4.c b/gcc/testsuite/c-c++-common/Warray-bounds-4.c index 1f73f11943f..835c634fd27 100644 --- a/gcc/testsuite/c-c++-common/Warray-bounds-4.c +++ b/gcc/testsuite/c-c++-common/Warray-bounds-4.c @@ -52,7 +52,7 @@ void test_memcpy_bounds_memarray_range (void) = MEM [(char * {ref-all})&ma]; and could be improved. Just verify that one is issued but not its full text. */ - TM (ma.a5, ma.a5 + j, ma.a5, 5);/* { dg-warning "\\\[-Warray-bounds" } */ + TM (ma.a5, ma.a5 + j, ma.a5, 5);/* { dg-warning "\\\[-Warray-bounds" "pr101374" { xfail *-*-* } } */ TM (ma.a5, ma.a5 + j, ma.a5, 7);/* { dg-warning "offset \\\[5, 7] from the object at .ma. is out of the bounds of referenced subobject .\(MA::\)?a5. with type .char ?\\\[5]. at offset 0" } */ TM (ma.a5, ma.a5 + j, ma.a5, 9);/* { dg-warning "offset \\\[5, 9] from the object at .ma. is out of the bounds of referenced subobject .\(MA::\)?a5. with type .char ?\\\[5]. at offset 0" } */
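For readers unfamiliar with the Object Size Type numbering behind the check_mem_ref change, the following is a minimal sketch using __builtin_object_size, whose second argument uses the same 0/1 numbering. It assumes compute_objsize follows that convention. The struct MA here is a hypothetical stand-in modeled on the ma.a5 member used in the tests, not the testsuite's actual type.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the testsuite's MA type (assumption).
struct MA { char a5[5]; char a7[7]; };

static MA ma;

// Type 0: size measured to the end of the whole enclosing object,
// so an access that runs from a5 into a7 still counts as in bounds.
std::size_t size_type0() { return __builtin_object_size(ma.a5, 0); }

// Type 1: size measured to the end of the closest enclosing subobject,
// here the member array a5 itself.
std::size_t size_type1() { return __builtin_object_size(ma.a5, 1); }
```

Since Type 0 never yields a smaller size than Type 1, switching check_mem_ref to Type 0 makes the check more permissive, which would be consistent with the two dg-warning lines above becoming xfails.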
disable -Warray-bounds in libgo (PR 101374)
Hi Ian,

Yesterday's enhancement to -Warray-bounds has exposed a couple of
issues in libgo where the code writes into an invalid constant
address that the warning is designed to flag.

On the assumption that those invalid addresses are deliberate,
the attached patch suppresses these instances by using #pragma
GCC diagnostic but I don't think I'm supposed to commit it (at
least Git won't let me). To avoid Go bootstrap failures please
either apply the patch or otherwise suppress the warning (e.g.,
by using a volatile pointer temporary).

Thanks
Martin

Use Object Size Type zero for -Warray-bounds [PR101374].

PR bootstrap/101374 - -Warray-bounds accessing a member subobject as derived

libgo/ChangeLog:
	PR bootstrap/101374
	* runtime/proc.c (runtime_mstart): Suppress -Warray-bounds.
	* runtime/runtime_c.c (runtime_signalstack): Same.

diff --git a/libgo/runtime/proc.c b/libgo/runtime/proc.c
index 38bf7a6b255..61635e6c1ea 100644
--- a/libgo/runtime/proc.c
+++ b/libgo/runtime/proc.c
@@ -594,7 +594,14 @@ runtime_mstart(void *arg)
 		gp->entry = nil;
 		gp->param = nil;
 		__builtin_call_with_static_chain(pfn(gp1), fv);
+
+		/* Writing to an invalid address is detected. */
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Warray-bounds"
+
 		*(int*)0x21 = 0x21;
+
+#pragma GCC diagnostic pop
 	}
 
 	if(mp->exiting) {
@@ -662,7 +669,12 @@ setGContext(void)
 		gp->entry = nil;
 		gp->param = nil;
 		__builtin_call_with_static_chain(pfn(gp1), fv);
+
+		/* Writing to an invalid address is detected. */
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Warray-bounds"
 		*(int*)0x22 = 0x22;
+#pragma GCC diagnostic pop
 	}
 }
 
diff --git a/libgo/runtime/runtime_c.c b/libgo/runtime/runtime_c.c
index 18222c14465..53feaa075c7 100644
--- a/libgo/runtime/runtime_c.c
+++ b/libgo/runtime/runtime_c.c
@@ -116,7 +116,11 @@ runtime_signalstack(byte *p, uintptr n)
 	if(p == nil)
 		st.ss_flags = SS_DISABLE;
 	if(sigaltstack(&st, nil) < 0)
+		/* Writing to an invalid address is detected. */
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Warray-bounds"
 		*(int *)0xf1 = 0xf1;
+#pragma GCC diagnostic pop
 }
 
 int32 go_open(char *, int32, int32)
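The two suppression options mentioned in the message (a scoped #pragma GCC diagnostic, or a volatile pointer temporary) can be sketched side by side as below. This is only an illustration: the function names and the addresses 0xf1/0xf2 are made up here and are not libgo's, and whether the volatile form silences the warning in every GCC version is an assumption.

```cpp
#include <cassert>

// Variant 1: silence the warning locally.  Each "push" is paired with a
// "pop" so the previous diagnostic state is restored afterwards.
void die_with_pragma(bool failed) {
  if (failed) {
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Warray-bounds"
    *(int *)0xf1 = 0xf1;   // deliberate write to an invalid address
#pragma GCC diagnostic pop
  }
}

// Variant 2: route the address through a volatile pointer temporary, so
// the constant is not visible to the compiler at the point of the store.
void die_with_volatile(bool failed) {
  if (failed) {
    int *volatile p = (int *)0xf2;
    *p = 0xf2;             // deliberate write to an invalid address
  }
}
```

Both helpers are meant to crash when called with true; calling them with false is harmless and only confirms that the code compiles and links.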
Re: PING 2 [PATCH] correct handling of variable offset minus constant in -Warray-bounds (PR 100137)
On 7/8/21 4:41 AM, Andreas Schwab wrote: On Jul 07 2021, Marek Polacek via Gcc-patches wrote: On Wed, Jul 07, 2021 at 02:38:11PM -0600, Martin Sebor via Gcc-patches wrote: I certainly will. Pushed in r12-2132. I think this patch breaks bootstrap on x86_64: It also breaks bootstrap on aarch64 and ia64 in stage2. In file included from ../../gcc/c-family/c-common.h:26, from ../../gcc/cp/cp-tree.h:40, from ../../gcc/cp/module.cc:209: In function 'tree_node* identifier(const cpp_hashnode*)', inlined from 'bool module_state::read_macro_maps()' at ../../gcc/cp/module.cc:16305:10: ../../gcc/tree.h:1089:58: error: array subscript -1 is outside array bounds of 'cpp_hashnode [288230376151711743]' [-Werror=array-bounds] 1089 | ((tree) ((char *) (NODE) - sizeof (struct tree_common))) | ^ ../../gcc/cp/module.cc:277:10: note: in expansion of macro 'HT_IDENT_TO_GCC_IDENT' 277 | return HT_IDENT_TO_GCC_IDENT (HT_NODE (const_cast (node))); | ^ In file included from ../../gcc/tree.h:23, from ../../gcc/c-family/c-common.h:26, from ../../gcc/cp/cp-tree.h:40, from ../../gcc/cp/module.cc:209: ../../gcc/tree-core.h: In member function 'bool module_state::read_macro_maps()': ../../gcc/tree-core.h:1445:24: note: at offset -24 into object 'tree_identifier::id' of size 16 1445 | struct ht_identifier id; |^~ Thanks. This is a different issue than what triggered the other warnings. I have temporarily suppressed these two instances until I root cause them. Bootstrap should now be restored (at least on x86_64). If there are any outstanding warnings that are causing failures please either update pr101374 or open new bugs. Martin
Re: [RFA] Attach MEM_EXPR information when flushing BLKmode args to the stack
On 7/5/2021 5:17 AM, Richard Biener via Gcc-patches wrote:
> On Fri, Jul 2, 2021 at 6:13 PM Jeff Law wrote:
>> This is a minor missed optimization we found with our internal port.
>>
>> Given this code:
>>
>> typedef struct {short a; short b;} T;
>>
>> extern void g1();
>>
>> void f(T s)
>> {
>>   if (s.a < 0)
>>     g1();
>> }
>>
>> "s" is passed in a register, but it's still a BLKmode object because
>> the alignment of T is smaller than the alignment that an integer of
>> the same size would have (16 bits vs 32 bits).
>>
>> Because "s" is BLKmode function.c is going to store it into a stack
>> slot and we'll load it from that slot for each reference. So on the
>> v850 (just to pick a port that likely has the same behavior we're
>> seeing) we have this RTL from CSE2:
>>
>> (insn 2 4 3 2 (set (mem/c:SI (plus:SI (reg/f:SI 34 .fp)
>>                 (const_int -4 [0xfffc])) [2 S4 A32])
>>         (reg:SI 6 r6)) "j.c":6:1 7 {*movsi_internal}
>>      (expr_list:REG_DEAD (reg:SI 6 r6)
>>         (nil)))
>> (note 3 2 8 2 NOTE_INSN_FUNCTION_BEG)
>> (insn 8 3 9 2 (set (reg:HI 44 [ s.a ])
>>         (mem/c:HI (plus:SI (reg/f:SI 34 .fp)
>>                 (const_int -4 [0xfffc])) [1 s.a+0 S2 A32])) "j.c":7:5 3 {*movhi_internal}
>>      (nil))
>> (insn 9 8 10 2 (parallel [
>>             (set (reg:SI 45)
>>                 (ashift:SI (subreg:SI (reg:HI 44 [ s.a ]) 0)
>>                     (const_int 16 [0x10])))
>>             (clobber (reg:CC 32 psw))
>>         ]) "j.c":7:5 94 {ashlsi3_clobber_flags}
>>      (expr_list:REG_DEAD (reg:HI 44 [ s.a ])
>>         (expr_list:REG_UNUSED (reg:CC 32 psw)
>>             (nil))))
>> (insn 10 9 11 2 (parallel [
>>             (set (reg:SI 43)
>>                 (ashiftrt:SI (reg:SI 45)
>>                     (const_int 16 [0x10])))
>>             (clobber (reg:CC 32 psw))
>>         ]) "j.c":7:5 104 {ashrsi3_clobber_flags}
>>      (expr_list:REG_DEAD (reg:SI 45)
>>         (expr_list:REG_UNUSED (reg:CC 32 psw)
>>             (nil))))
>>
>> Insn 2 is the store into the stack. insn 8 is the load for s.a in the
>> conditional. DSE1 replaces the MEM in insn 8 with (reg 6) since
>> (reg 6) has the value we want. After that the store at insn 2 is
>> dead. Sadly DSE never removes the store.
>>
>> The problem is RTL DSE considers a store with no MEM_EXPR as escaping,
>> which keeps the MEM live. The lack of a MEM_EXPR is due to the call to
>> change_address to twiddle the mode on the MEM for the store at insn 2.
>> It should be safe to copy the MEM_EXPR (which should always be a
>> PARM_DECL) from the original memory to the memory returned by
>> change_address. Doing so results in DSE1 removing the store at insn 2.
>>
>> It would be nice to remove the stack setup/teardown. I'm not offhand
>> aware of mechanisms to remove the setup/teardown after we've already
>> allocated a slot, even if the slot is no longer used.
>>
>> Bootstrapped and regression tested on x86, though I don't think that's
>> a particularly useful test. So I also ran it through my tester across
>> those pesky embedded targets without regressions as well.
>>
>> I didn't include a test simply because I didn't want to have an insane
>> target selector. I guess if we really wanted a test we could look
>> after DSE1 is done and verify there aren't any MEMs left at all.
>> Willing to try that if the consensus is we want this tested.
>>
>> OK for the trunk?
>
> I wonder why the code doesn't use adjust_address instead? That
> handles most cases already and the code doesn't change the
> address but just the mode (and access size)?

Yea, adjust_address seems to work fine. I'm spinning that in my tester
at the moment.

Jeff
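The size/alignment relationship that the original message says leaves "s" in BLKmode can be checked with a few constants. This is a sketch, assuming a target with 16-bit short and a 32-bit integer type aligned to 32 bits; the helper name is invented for illustration, and the BLKmode decision itself is internal to GCC and not observable from here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Same shape as the T in the message above.
struct T { short a; short b; };

constexpr std::size_t t_size  = sizeof(T);
constexpr std::size_t t_align = alignof(T);
constexpr std::size_t i_size  = sizeof(std::int32_t);
constexpr std::size_t i_align = alignof(std::int32_t);

// True when T is as large as a 32-bit integer but less strictly aligned,
// which is the situation the message describes (16 bits vs 32 bits).
constexpr bool same_size_weaker_alignment() {
  return t_size == i_size && t_align < i_align;
}
```

On targets where this predicate holds, T fits in one integer register but cannot simply be treated as an SImode value, which is why the argument goes through a stack slot in the first place.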
Re: [PATCH v2] c++: Fix noexcept with unevaluated operand [PR101087]
On Thu, Jul 08, 2021 at 09:35:02AM -0400, Marek Polacek wrote:
> On Thu, Jul 08, 2021 at 09:30:27AM -0400, Jason Merrill wrote:
> > On 7/7/21 9:40 PM, Marek Polacek wrote:
> > > It sounds plausible that this assert
> > >
> > >   int f();
> > >   static_assert(noexcept(sizeof(f())), "");
> > >
> > > should pass: sizeof produces a std::size_t and its operand is not
> > > evaluated, so it can't throw. noexcept should only evaluate to
> > > false for potentially evaluated operands. Therefore I think that
> > > check_noexcept_r shouldn't walk into operands of sizeof/decltype/
> > > alignof/typeof. Only checking cp_unevaluated_operand therein does
> > > not work, because expr_noexcept_p can be called in an unevaluated
> > > context, so I resorted to the following cp_evaluated hack. Does
> > > that seem acceptable?
> >
> > I suppose, but why not check for SIZEOF_EXPR/ALIGNOF_EXPR/NOEXCEPT_EXPR
> > directly?
>
> I thought I would, but then it occurred to me that it might be better to
> rely on cp_walk_subtrees which ++/--s cp_unevaluated_operand for those
> codes. I'd be happy to change the patch to check those codes directly;
> maybe I'm overthinking things here.

So here's v2 which checks the codes directly, via a new inline:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
It sounds plausible that this assert

  int f();
  static_assert(noexcept(sizeof(f())), "");

should pass: sizeof produces a std::size_t and its operand is not
evaluated, so it can't throw. noexcept should only evaluate to
false for potentially evaluated operands. Therefore I think that
check_noexcept_r shouldn't walk into operands of sizeof/decltype/
alignof/typeof.

	PR c++/101087

gcc/cp/ChangeLog:

	* cp-tree.h (unevaluated_p): New.
	* except.c (check_noexcept_r): Use it. Don't walk into
	unevaluated operands.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/noexcept70.C: New test.
--- gcc/cp/cp-tree.h| 13 + gcc/cp/except.c | 9 ++--- gcc/testsuite/g++.dg/cpp0x/noexcept70.C | 5 + 3 files changed, 24 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept70.C diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index b4501576b26..d4810c0c986 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -8465,6 +8465,19 @@ is_constrained_auto (const_tree t) return is_auto (t) && PLACEHOLDER_TYPE_CONSTRAINTS_INFO (t); } +/* True if CODE, a tree code, denotes a tree whose operand is not evaluated + as per [expr.context], i.e., an operand to sizeof, typeof, decltype, or + alignof. */ + +inline bool +unevaluated_p (tree_code code) +{ + return (code == DECLTYPE_TYPE + || code == ALIGNOF_EXPR + || code == SIZEOF_EXPR + || code == NOEXCEPT_EXPR); +} + /* RAII class to push/pop the access scope for T. */ struct push_access_scope_guard diff --git a/gcc/cp/except.c b/gcc/cp/except.c index a8cea53cf91..a8acbc4b7b2 100644 --- a/gcc/cp/except.c +++ b/gcc/cp/except.c @@ -1033,12 +1033,15 @@ check_handlers (tree handlers) expression whose type is a polymorphic class type (10.3). 
*/ static tree -check_noexcept_r (tree *tp, int * /*walk_subtrees*/, void * /*data*/) +check_noexcept_r (tree *tp, int *walk_subtrees, void *) { tree t = *tp; enum tree_code code = TREE_CODE (t); - if ((code == CALL_EXPR && CALL_EXPR_FN (t)) - || code == AGGR_INIT_EXPR) + + if (unevaluated_p (code)) +*walk_subtrees = false; + else if ((code == CALL_EXPR && CALL_EXPR_FN (t)) + || code == AGGR_INIT_EXPR) { /* We can only use the exception specification of the called function for determining the value of a noexcept expression; we can't use diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept70.C b/gcc/testsuite/g++.dg/cpp0x/noexcept70.C new file mode 100644 index 000..45a6137dd6f --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/noexcept70.C @@ -0,0 +1,5 @@ +// PR c++/101087 +// { dg-do compile { target c++11 } } + +int f(); +static_assert(noexcept(sizeof(f())), ""); base-commit: 763121ccd908f52bc666f277ea2cf42110b3aad9 -- 2.31.1
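The language rule the patch relies on can be seen without any compiler internals. The sketch below is illustrative only; the counter and helper names are invented here. It shows that an operand of sizeof is never evaluated, while an evaluated call to a function that is not declared noexcept makes the noexcept operator yield false.

```cpp
#include <cassert>
#include <cstddef>

int calls = 0;

// Not declared noexcept, so an evaluated call is potentially throwing.
int f() { ++calls; return 42; }

// The operand of sizeof is unevaluated: f is never called here.
std::size_t size_of_call() { return sizeof(f()); }

// An evaluated call to f makes the noexcept operator yield false.
constexpr bool evaluated_call_is_noexcept = noexcept(f());

// The case from the PR: the only call sits inside sizeof.  With the
// patch applied this is expected to be true; a compiler without the fix
// may report false, so the value is exposed here rather than asserted.
constexpr bool sizeof_call_is_noexcept = noexcept(sizeof(f()));
```

Calling size_of_call() leaves calls at zero, which is the runtime counterpart of the static_assert in the new noexcept70.C test.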
Re: disable -Warray-bounds in libgo (PR 101374)
Hi Martin, > Yesterday's enhancement to -Warray-bounds has exposed a couple of > issues in libgo where the code writes into an invalid constant > address that the warning is designed to flag. > > On the assumption that those invalid addresses are deliberate, > the attached patch suppresses these instances by using #pragma > GCC diagnostic but I don't think I'm supposed to commit it (at > least Git won't let me). To avoid Go bootstrap failures please > either apply the patch or otherwise suppress the warning (e.g., > by using a volatile pointer temporary). while this patch does fix the libgo bootstrap failure, Go is completely broken: almost 1000 go.test failures and all libgo tests FAIL as well. Seen on both i386-pc-solaris2.11 and sparc-sun-solaris2.11. Please fix. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [patch][version 4]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc
(Resend this email since the previous one didn’t quote, I changed one setting in my mail client, hopefully that can fix this issue). Hi, Martin, Thank you for the review and comment. > On Jul 8, 2021, at 8:29 AM, Martin Jambor wrote: >> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c >> index c05d22f3e8f1..35051d7c6b96 100644 >> --- a/gcc/tree-sra.c >> +++ b/gcc/tree-sra.c >> @@ -384,6 +384,13 @@ static struct >> >> /* Numbber of components created when splitting aggregate parameters. */ >> int param_reductions_created; >> + >> + /* Number of deferred_init calls that are modified. */ >> + int deferred_init; >> + >> + /* Number of deferred_init calls that are created by >> + generate_subtree_deferred_init. */ >> + int subtree_deferred_init; >> } sra_stats; >> >> static void >> @@ -4096,6 +4103,110 @@ get_repl_default_def_ssa_name (struct access *racc, >> tree reg_type) >> return get_or_create_ssa_default_def (cfun, racc->replacement_decl); >> } >> >> + >> +/* Generate statements to call .DEFERRED_INIT to initialize scalar >> replacements >> + of accesses within a subtree ACCESS; all its children, siblings and their >> + children are to be processed. >> + GSI is a statement iterator used to place the new statements. 
*/ >> +static void >> +generate_subtree_deferred_init (struct access *access, >> +tree init_type, >> +tree is_vla, >> +gimple_stmt_iterator *gsi, >> +location_t loc) >> +{ >> + do >> +{ >> + if (access->grp_to_be_replaced) >> +{ >> + tree repl = get_access_replacement (access); >> + gimple *call >> += gimple_build_call_internal (IFN_DEFERRED_INIT, 3, >> + TYPE_SIZE_UNIT (TREE_TYPE (repl)), >> + init_type, is_vla); >> + gimple_call_set_lhs (call, repl); >> + gsi_insert_before (gsi, call, GSI_SAME_STMT); >> + update_stmt (call); >> + gimple_set_location (call, loc); >> + sra_stats.subtree_deferred_init++; >> +} >> + else if (access->grp_to_be_debug_replaced) >> +{ >> + tree drepl = get_access_replacement (access); >> + tree call = build_call_expr_internal_loc >> + (UNKNOWN_LOCATION, IFN_DEFERRED_INIT, >> + TREE_TYPE (drepl), 3, >> + TYPE_SIZE_UNIT (TREE_TYPE (drepl)), >> + init_type, is_vla); >> + gdebug *ds = gimple_build_debug_bind (drepl, call, >> +gsi_stmt (*gsi)); >> + gsi_insert_before (gsi, ds, GSI_SAME_STMT); > > Is handling of grp_to_be_debug_replaced accesses necessary here? If so, > why? grp_to_be_debug_replaced accesses are there only to facilitate > debug information about a part of an aggregate decl is that is likely > going to be entirely removed - so that debuggers can sometimes show to > users information about what they would contain had they not removed. > It seems strange you need to mark them as uninitialized because they > should not have any consumers. (But perhaps it is also harmless.) This part has been discussed during the 2nd version of the patch, but I think that more discussion might be necessary. In the previous discussion, Richard Sandiford mentioned: (https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568620.html): = I guess the thing we need to decide here is whether -ftrivial-auto-var-init should affect debug-only constructs too. 
If it doesn't, examining removed components in a debugger might show
uninitialised values in cases where the user was expecting initialised
ones. There would be no security concern, but it might be surprising.

I think in principle the DRHS can contain a call to DEFERRED_INIT.
Doing that would probably require further handling elsewhere though.
=

I am still not very confident about this part of the change.

My questions:

1. If we don’t handle grp_to_be_debug_replaced at all, what will happen?
   (Will the user of the debugger see uninitialized values in the removed
   part of the aggregate? Or something else?)
2. On the other hand, if we handle grp_to_be_debug_replaced as the current
   patch does, what will the user of the debugger see?

>
> On a related note, if the intent of the feature is for optimizers to
> behave (almost?) as if it was not taking place,

What do you mean by “it” here?

> I believe you need to
> handle specially, and probably just ignore, calls to IFN_DEFERRED_INIT
> in scan_function in tree-sra.c.

Do you mean to let the tree-sra phase ignore IFN_DEFERRED_INIT calls
completely?

My main purpose in changing the tree-sra.c phase is to change:

  tmp = .DEFERRED_INIT (24, 2, 0)

to

  tmp1 = .DEFERRED_INIT (8, 2, 0);
  tmp2 = .DEFERRED_INIT (8, 2, 0);
  tmp3 = .DEFERRED_INIT (8, 2, 0);

Doing this is to reduce the stack usage.

> Otherwise the generated SRA access
> structures will have extra write flags tu
[committed] Further improvements to H8 variable shift patterns
And another installment in optimizing a dead architecture. This builds on prior patches to improve compare/test elimination for shifts. Specifically for the older chips in the H8 family we have to handle variable shifts with a loop. Right now the splitter generates (set (pc) (if_then_else (lt (countreg) (const_int 0))) to test the shift count. That will get lowered into a compare and a conditional branch using CC_REG. However, this lowering happens after the compare-elim pass, so we don't get much benefit. Instead we can lower directly to the cc exposing form and remove the unnecessary test ourselves (particularly for the case where the shift count pseudo does not die). Essentially we know that the copy into the scratch register is going to set the condition codes in a useful way. So we expose the condition codes on that copy and emit a condition code exposed conditional branch and no longer generate the comparison. Built & tested in the usual way on the H8 without regressions. Committed to the trunk. Probably the last H8 patch before going on vacation :-) Jeff commit b14ac7b29c9a05c94f62fe065c219bbaa83653db Author: Jeff Law Date: Thu Jul 8 17:09:36 2021 -0400 Further improvements to H8 variable shift patterns gcc/ * config/h8300/shiftrotate.md (variable shifts): Expose condition code handling for the test before the loop. 
diff --git a/gcc/config/h8300/shiftrotate.md b/gcc/config/h8300/shiftrotate.md index 485303cb906..d3aa6bea064 100644 --- a/gcc/config/h8300/shiftrotate.md +++ b/gcc/config/h8300/shiftrotate.md @@ -377,8 +377,10 @@ (clobber (reg:CC CC_REG))] "epilogue_completed && find_regno_note (insn, REG_DEAD, REGNO (operands[1]))" - [(set (pc) -(if_then_else (le (match_dup 1) (const_int 0)) + [(set (reg:CCZN CC_REG) +(compare:CCZN (match_dup 1) (const_int 0))) + (set (pc) +(if_then_else (le (reg:CCZN CC_REG) (const_int 0)) (label_ref (match_dup 5)) (pc))) (match_dup 4) @@ -411,10 +413,12 @@ (clobber (reg:CC CC_REG))] "epilogue_completed && !find_regno_note (insn, REG_DEAD, REGNO (operands[1]))" - [(set (match_dup 3) - (match_dup 1)) + [(parallel + [(set (reg:CCZN CC_REG) + (compare:CCZN (match_dup 1) (const_int 0))) + (set (match_dup 3) (match_dup 1))]) (set (pc) -(if_then_else (le (match_dup 3) (const_int 0)) +(if_then_else (le (reg:CCZN CC_REG) (const_int 0)) (label_ref (match_dup 5)) (pc))) (match_dup 4)
[PATCH] Fix for powerpc64 long double complex divide failure
This patch resolves the failure of powerpc64 long double complex divide
in native ibm long double format after the patch "Practical improvement
to libgcc complex divide".

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101104

The new code uses the following macros which are intended to be mapped
to appropriate values according to the underlying hardware
representation.

RBIG      a value near the maximum representation
RMIN      a value near the minimum representation
          (but not in the subnormal range)
RMIN2     a value moderately less than 1
RMINSCAL  the inverse of RMIN2
RMAX2     RBIG * RMIN2 - a value to limit scaling to not overflow

When "long double" values were not using the IEEE 128-bit format but
the traditional IBM 128-bit, the previous code used the LDBL values
which caused overflow for RMINSCAL. The new code uses the DBL values.

RBIG      LDBL_MAX = 0x1.f800p+1022
          DBL_MAX  = 0x1.f000p+1022

RMIN      LDBL_MIN = 0x1.p-969
RMIN      DBL_MIN  = 0x1.p-1022

RMIN2     LDBL_EPSILON = 0x0.1000p-1022 = 0x1.0p-1074
RMIN2     DBL_EPSILON  = 0x1.p-52

RMINSCAL  1/LDBL_EPSILON = inf (1.0p+1074 does not fit in IBM 128-bit).
          1/DBL_EPSILON  = 0x1.p+52

RMAX2     = RBIG * RMIN2 = 0x1.f800p-52
          RBIG * RMIN2 = 0x1.f000p+970

The MAX and MIN values have only modest changes since the exponent
field for IBM 128-bit floating point values is the same size as
the exponent field for IBM 64-bit floating point values. However
the EPSILON value is considerably different. Due to how small
values can be represented in the lower 64 bits of the IBM 128-bit
floating point, EPSILON is extremely small, so far beyond the
desired value that inversion of the value overflows and even
without the overflow, the RMAX2 is so small as to eliminate most
usage of the test.

In addition, the gcc support for the KF fields (IBM native long double
format) does not exist on older gcc compilers such as the default
compilers on the gcc compiler farm. That adds build complexity for
users whose environment is only a few years out of date.
Instead of just replacing the use of KF_EPSILON with DF_EPSILON, we
replace all uses of KF_* with DF_*.

The change has been tested on gcc135.fsffrance.org and gains the
expected improvements in accuracy for long double complex divide.

libgcc/
	* config/rs6000/_divkc3.c (RBIG, RMIN, RMIN2, RMINSCAL, RMAX2):
	Fix long double complex divide for native IBM 128-bit
---
 libgcc/config/rs6000/_divkc3.c | 8
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/libgcc/config/rs6000/_divkc3.c b/libgcc/config/rs6000/_divkc3.c
index a1d29d2..2b229c8 100644
--- a/libgcc/config/rs6000/_divkc3.c
+++ b/libgcc/config/rs6000/_divkc3.c
@@ -38,10 +38,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
 #endif
 
 #ifndef __LONG_DOUBLE_IEEE128__
-#define RBIG (__LIBGCC_KF_MAX__ / 2)
-#define RMIN (__LIBGCC_KF_MIN__)
-#define RMIN2 (__LIBGCC_KF_EPSILON__)
-#define RMINSCAL (1 / __LIBGCC_KF_EPSILON__)
+#define RBIG (__LIBGCC_DF_MAX__ / 2)
+#define RMIN (__LIBGCC_DF_MIN__)
+#define RMIN2 (__LIBGCC_DF_EPSILON__)
+#define RMINSCAL (1 / __LIBGCC_DF_EPSILON__)
 #define RMAX2 (RBIG * RMIN2)
 #else
 #define RBIG (__LIBGCC_TF_MAX__ / 2)
-- 
1.8.3.1
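The overflow described above can be reproduced with plain double arithmetic. This is a sketch under one assumption taken from the message: that IBM 128-bit long double shares double's exponent range and has an epsilon of 2^-1074, which happens to equal double's smallest subnormal. The function names are invented for illustration.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>
#include <limits>

// 1/DBL_EPSILON: DBL_EPSILON is 2^-52, so the inverse is 2^52 and finite.
double inv_dbl_epsilon() { return 1.0 / DBL_EPSILON; }

// Stand-in for 1/LDBL_EPSILON on IBM 128-bit: 2^-1074 is used as the
// epsilon.  Its inverse, 2^1074, exceeds DBL_MAX (just under 2^1024)
// and therefore overflows to infinity.
double inv_tiny_epsilon() {
  volatile double tiny = std::numeric_limits<double>::denorm_min();
  return 1.0 / tiny;
}

// RMAX2 as defined in the patch, computed from the DF (double) values.
double rmax2_df() { return (DBL_MAX / 2) * DBL_EPSILON; }
```

With the double values, RMINSCAL stays finite and RMAX2 remains a large value, which matches the intent of replacing the KF_* macros with DF_*.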
Re: [PATCH v2] c++: Fix noexcept with unevaluated operand [PR101087]
On 7/8/21 4:26 PM, Marek Polacek wrote:
> On Thu, Jul 08, 2021 at 09:35:02AM -0400, Marek Polacek wrote:
>> On Thu, Jul 08, 2021 at 09:30:27AM -0400, Jason Merrill wrote:
>>> On 7/7/21 9:40 PM, Marek Polacek wrote:
>>>> It sounds plausible that this assert
>>>>
>>>>   int f();
>>>>   static_assert(noexcept(sizeof(f())), "");
>>>>
>>>> should pass: sizeof produces a std::size_t and its operand is not
>>>> evaluated, so it can't throw. noexcept should only evaluate to
>>>> false for potentially evaluated operands. Therefore I think that
>>>> check_noexcept_r shouldn't walk into operands of sizeof/decltype/
>>>> alignof/typeof. Only checking cp_unevaluated_operand therein does
>>>> not work, because expr_noexcept_p can be called in an unevaluated
>>>> context, so I resorted to the following cp_evaluated hack. Does
>>>> that seem acceptable?
>>>
>>> I suppose, but why not check for SIZEOF_EXPR/ALIGNOF_EXPR/NOEXCEPT_EXPR
>>> directly?
>>
>> I thought I would, but then it occurred to me that it might be better to
>> rely on cp_walk_subtrees which ++/--s cp_unevaluated_operand for those
>> codes. I'd be happy to change the patch to check those codes directly;
>> maybe I'm overthinking things here.
>
> So here's v2 which checks the codes directly, via a new inline:
>
> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

OK for trunk and 11, at least. I lean toward putting it on older
release branches as well, but it doesn't seem urgent.

> -- >8 --
> It sounds plausible that this assert
>
>   int f();
>   static_assert(noexcept(sizeof(f())), "");
>
> should pass: sizeof produces a std::size_t and its operand is not
> evaluated, so it can't throw. noexcept should only evaluate to
> false for potentially evaluated operands. Therefore I think that
> check_noexcept_r shouldn't walk into operands of sizeof/decltype/
> alignof/typeof.
>
> 	PR c++/101087
>
> gcc/cp/ChangeLog:
>
> 	* cp-tree.h (unevaluated_p): New.
> 	* except.c (check_noexcept_r): Use it. Don't walk into
> 	unevaluated operands.
>
> gcc/testsuite/ChangeLog:
>
> 	* g++.dg/cpp0x/noexcept70.C: New test.
--- gcc/cp/cp-tree.h| 13 + gcc/cp/except.c | 9 ++--- gcc/testsuite/g++.dg/cpp0x/noexcept70.C | 5 + 3 files changed, 24 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept70.C diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index b4501576b26..d4810c0c986 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -8465,6 +8465,19 @@ is_constrained_auto (const_tree t) return is_auto (t) && PLACEHOLDER_TYPE_CONSTRAINTS_INFO (t); } +/* True if CODE, a tree code, denotes a tree whose operand is not evaluated + as per [expr.context], i.e., an operand to sizeof, typeof, decltype, or + alignof. */ + +inline bool +unevaluated_p (tree_code code) +{ + return (code == DECLTYPE_TYPE + || code == ALIGNOF_EXPR + || code == SIZEOF_EXPR + || code == NOEXCEPT_EXPR); +} + /* RAII class to push/pop the access scope for T. */ struct push_access_scope_guard diff --git a/gcc/cp/except.c b/gcc/cp/except.c index a8cea53cf91..a8acbc4b7b2 100644 --- a/gcc/cp/except.c +++ b/gcc/cp/except.c @@ -1033,12 +1033,15 @@ check_handlers (tree handlers) expression whose type is a polymorphic class type (10.3). 
*/ static tree -check_noexcept_r (tree *tp, int * /*walk_subtrees*/, void * /*data*/) +check_noexcept_r (tree *tp, int *walk_subtrees, void *) { tree t = *tp; enum tree_code code = TREE_CODE (t); - if ((code == CALL_EXPR && CALL_EXPR_FN (t)) - || code == AGGR_INIT_EXPR) + + if (unevaluated_p (code)) +*walk_subtrees = false; + else if ((code == CALL_EXPR && CALL_EXPR_FN (t)) + || code == AGGR_INIT_EXPR) { /* We can only use the exception specification of the called function for determining the value of a noexcept expression; we can't use diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept70.C b/gcc/testsuite/g++.dg/cpp0x/noexcept70.C new file mode 100644 index 000..45a6137dd6f --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/noexcept70.C @@ -0,0 +1,5 @@ +// PR c++/101087 +// { dg-do compile { target c++11 } } + +int f(); +static_assert(noexcept(sizeof(f())), ""); base-commit: 763121ccd908f52bc666f277ea2cf42110b3aad9
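For readers unfamiliar with the walker, here is a minimal standalone sketch of the idea behind the v2 patch: a recursive check that refuses to descend into unevaluated operands. The Kind enum, the Node struct and can_throw are hypothetical stand-ins invented for illustration; they are not GCC's tree codes or its cp_walk_tree machinery.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified node kinds; these are NOT GCC's real tree codes.
enum class Kind { Call, Sizeof, Alignof, Decltype, Noexcept, Add, Literal };

struct Node {
  Kind kind;
  bool may_throw;               // only meaningful for Kind::Call
  std::vector<Node> operands;
};

// Same role as the unevaluated_p helper in the patch: operands of these
// nodes are never evaluated, so they cannot contribute a throw.
inline bool unevaluated_p(Kind k) {
  return k == Kind::Sizeof || k == Kind::Alignof
         || k == Kind::Decltype || k == Kind::Noexcept;
}

// Returns true if evaluating N could throw.
bool can_throw(const Node &n) {
  if (unevaluated_p(n.kind))
    return false;               // do not walk into unevaluated operands
  if (n.kind == Kind::Call && n.may_throw)
    return true;
  for (const Node &op : n.operands)
    if (can_throw(op))
      return true;
  return false;
}
```

With this toy model, a potentially throwing call wrapped in a Sizeof node is reported as non-throwing, while the same call used directly is still reported as throwing.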
Re: [PATCH v2] c++: Fix noexcept with unevaluated operand [PR101087]
On Thu, Jul 08, 2021 at 05:34:24PM -0400, Jason Merrill wrote: > On 7/8/21 4:26 PM, Marek Polacek wrote: > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? > > OK for trunk and 11, at least. I lean toward putting it on older release > branches as well, but it doesn't seem urgent. Ok, I'll backport to 11 and 10, it seems very safe. Thanks, Marek
Re: [PATCH] c++: requires-expr with dependent extra args [PR101181]
On 7/8/21 11:28 AM, Patrick Palka wrote: Here we're crashing ultimately because the mechanism for delaying substitution into a requires-expression (or constexpr if) doesn't expect to see dependent args. But we end up capturing dependent args here when substituting into the default template argument during coerce_template_parms for the dependent specialization p. This patch enables the commented out code in add_extra_args for handling this situation. It turns out we also need to make a copy of the captured arguments so that coerce_template_parms doesn't later add to the argument, which would form an unexpected cycle. And we need to make tsubst_template_args more forgiving about missing template arguments, since the arguments we capture from coerce_template_parms are incomplete. Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk/11? PR c++/101181 gcc/cp/ChangeLog: * constraint.cc (tsubst_requires_expr): Pass complain/in_decl to add_extra_args. * cp-tree.h (add_extra_args): Add complain/in_decl parameters. * pt.c (build_extra_args): Make a copy of args. (add_extra_args): Add complain/in_decl parameters. Handle the case where the extra arguments are dependent. (tsubst_pack_expansion): Pass complain/in_decl to add_extra_args. (tsubst_template_args): Handle missing template arguments. (tsubst_expr) : Pass complain/in_decl to add_extra_args. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/concepts-requires26.C: New test. * g++.dg/cpp2a/lambda-uneval16.C: New test. 
--- gcc/cp/constraint.cc | 3 +- gcc/cp/cp-tree.h | 2 +- gcc/cp/pt.c | 31 +-- .../g++.dg/cpp2a/concepts-requires26.C| 18 +++ gcc/testsuite/g++.dg/cpp2a/lambda-uneval16.C | 22 + 5 files changed, 58 insertions(+), 18 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-requires26.C create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-uneval16.C diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc index 99d3ccc6998..4ee5215df50 100644 --- a/gcc/cp/constraint.cc +++ b/gcc/cp/constraint.cc @@ -2266,7 +2266,8 @@ tsubst_requires_expr (tree t, tree args, sat_info info) /* A requires-expression is an unevaluated context. */ cp_unevaluated u; - args = add_extra_args (REQUIRES_EXPR_EXTRA_ARGS (t), args); + args = add_extra_args (REQUIRES_EXPR_EXTRA_ARGS (t), args, +info.complain, info.in_decl); if (processing_template_decl) { /* We're partially instantiating a generic lambda. Substituting into diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index 58da7460001..0a5f13489cc 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -7289,7 +7289,7 @@ extern void add_mergeable_specialization(bool is_decl, bool is_alias, tree outer, unsigned); extern tree add_to_template_args (tree, tree); extern tree add_outermost_template_args (tree, tree); -extern tree add_extra_args (tree, tree); +extern tree add_extra_args (tree, tree, tsubst_flags_t, tree); extern tree build_extra_args (tree, tree, tsubst_flags_t); /* in rtti.c */ diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index 06116d16887..e4bdac087ad 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -12928,7 +12928,9 @@ extract_local_specs (tree pattern, tsubst_flags_t complain) tree build_extra_args (tree pattern, tree args, tsubst_flags_t complain) { - tree extra = args; + /* Make a copy of the extra arguments so that they won't get changed + from under us. 
*/ + tree extra = copy_template_args (args); if (local_specializations) if (tree locals = extract_local_specs (pattern, complain)) extra = tree_cons (NULL_TREE, extra, locals); @@ -12939,7 +12941,7 @@ build_extra_args (tree pattern, tree args, tsubst_flags_t complain) normal template args to ARGS. */ tree -add_extra_args (tree extra, tree args) +add_extra_args (tree extra, tree args, tsubst_flags_t complain, tree in_decl) { if (extra && TREE_CODE (extra) == TREE_LIST) { @@ -12959,20 +12961,14 @@ add_extra_args (tree extra, tree args) gcc_assert (!TREE_PURPOSE (extra)); extra = TREE_VALUE (extra); } -#if 1 - /* I think we should always be able to substitute dependent args into the - pattern. If that turns out to be incorrect in some cases, enable the - alternate code (and add complain/in_decl parms to this function). */ Ah, because these cases aren't pack expansions, so we aren't trying to do the substitution; I wonder if it would be feasible to do so. But this approach is probably simpler. OK. - gcc_checking_assert (!uses_template_parms (extra)); -#else - if (!use
rs6000: Generate an lxvp instead of two adjacent lxv instructions
The MMA build built-ins currently use individual lxv instructions to load up the registers of a __vector_pair or __vector_quad. If the memory addresses of the built-in operands are to adjacent locations, then we could use an lxvp in some cases to load up two registers at once. The patch below adds support for checking whether memory addresses are adjacent and emitting an lxvp instead of two lxv instructions.

This passed bootstrap and regtesting on powerpc64le-linux with no regressions. Ok for trunk?

This seems simple enough, that I'd like to backport this to GCC 11 after some burn in on trunk, if that is ok?

Given the MMA redesign from GCC 10 to GCC 11, I have no plans to backport this to GCC 10.

Peter


gcc/
	* config/rs6000/rs6000.c (consecutive_mem_locations): New function.
	(rs6000_split_multireg_move): Handle MMA build built-ins with operands
	in consecutive memory locations.
	(adjacent_mem_locations): Return the lower addressed memory rtx, if any.
	(power6_sched_reorder2): Update for adjacent_mem_locations change.

gcc/testsuite/
	* gcc.target/powerpc/mma-builtin-9.c: New test.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 9a5db63d0ef..de36c5ecd91 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -293,6 +293,8 @@ bool cpu_builtin_p = false;
    don't link in rs6000-c.c, so we can't call it directly.  */
 void (*rs6000_target_modify_macros_ptr) (bool, HOST_WIDE_INT, HOST_WIDE_INT);
 
+static bool consecutive_mem_locations (rtx, rtx);
+
 /* Simplfy register classes into simpler classifications.  We assume
    GPR_REG_TYPE - FPR_REG_TYPE are ordered so that we can use a simple range
    check for standard register classes (gpr/floating/altivec/vsx) and
@@ -16841,8 +16843,35 @@ rs6000_split_multireg_move (rtx dst, rtx src)
           for (int i = 0; i < nvecs; i++)
             {
               int index = WORDS_BIG_ENDIAN ? i : nvecs - 1 - i;
-              rtx dst_i = gen_rtx_REG (reg_mode, reg + index);
-              emit_insn (gen_rtx_SET (dst_i, XVECEXP (src, 0, i)));
+              int index_next = WORDS_BIG_ENDIAN ? index + 1 : index - 1;
+              rtx dst_i;
+              int regno = reg + i;
+
+              /* If we are loading an even VSX register and our memory location
+                 is adjacent to the next register's memory location (if any),
+                 then we can load them both with one LXVP instruction.  */
+              if ((regno & 1) == 0
+                  && VSX_REGNO_P (regno)
+                  && MEM_P (XVECEXP (src, 0, index))
+                  && MEM_P (XVECEXP (src, 0, index_next)))
+                {
+                  rtx base = WORDS_BIG_ENDIAN ? XVECEXP (src, 0, index)
+                                              : XVECEXP (src, 0, index_next);
+                  rtx next = WORDS_BIG_ENDIAN ? XVECEXP (src, 0, index_next)
+                                              : XVECEXP (src, 0, index);
+
+                  if (consecutive_mem_locations (base, next))
+                    {
+                      dst_i = gen_rtx_REG (OOmode, regno);
+                      emit_move_insn (dst_i, adjust_address (base, OOmode, 0));
+                      /* Skip the next register, since we just loaded it.  */
+                      i++;
+                      continue;
+                    }
+                }
+
+              dst_i = gen_rtx_REG (reg_mode, reg + i);
+              emit_insn (gen_rtx_SET (dst_i, XVECEXP (src, 0, index)));
             }
 
           /* We are writing an accumulator register, so we have to
@@ -18427,23 +18456,37 @@ get_memref_parts (rtx mem, rtx *base, HOST_WIDE_INT *offset,
   return true;
 }
 
-/* The function returns true if the target storage location of
-   mem1 is adjacent to the target storage location of mem2 */
-/* Return 1 if memory locations are adjacent.  */
+/* If the target storage locations of arguments MEM1 and MEM2 are
+   adjacent, then return the argument that has the lower address.
+   Otherwise, return NULL_RTX.  */
 
-static bool
+static rtx
 adjacent_mem_locations (rtx mem1, rtx mem2)
 {
   rtx reg1, reg2;
   HOST_WIDE_INT off1, size1, off2, size2;
 
   if (get_memref_parts (mem1, &reg1, &off1, &size1)
-      && get_memref_parts (mem2, &reg2, &off2, &size2))
-    return ((REGNO (reg1) == REGNO (reg2))
-            && ((off1 + size1 == off2)
-                || (off2 + size2 == off1)));
+      && get_memref_parts (mem2, &reg2, &off2, &size2)
+      && REGNO (reg1) == REGNO (reg2))
+    {
+      if (off1 + size1 == off2)
+        return mem1;
+      else if (off2 + size2 == off1)
+        return mem2;
+    }
 
-  return false;
+  return NULL_RTX;
+}
+
+/* The function returns true if the target storage location of
+   MEM1 is adjacent to the target storage location of MEM2 and
+   MEM1 has a lower address then MEM2.  */
+
+static bool
+consecutive_mem_locations (rtx mem1, rtx mem2)
+{
+  return adjacent_mem_locations (mem1, mem2) == mem1;
 }
 
 /* This fun
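The adjacency test in the patch can be sketched outside of GCC as follows. This is only an illustration under stated assumptions: MemRef is a hypothetical stand-in for what get_memref_parts extracts from an rtx (base register number, offset, size), and the two functions merely mirror the logic of adjacent_mem_locations and consecutive_mem_locations from the diff above.

```cpp
#include <cassert>

// Hypothetical stand-in for a decomposed memory reference; in GCC these
// parts would come from get_memref_parts on an rtx.
struct MemRef {
  int base_reg;   // base register number
  long offset;    // byte offset from the base register
  long size;      // access size in bytes
};

// If A and B are adjacent, return a pointer to whichever one has the
// lower address; otherwise return nullptr.
const MemRef *adjacent_mem_locations(const MemRef &a, const MemRef &b) {
  if (a.base_reg != b.base_reg)
    return nullptr;
  if (a.offset + a.size == b.offset)
    return &a;
  if (b.offset + b.size == a.offset)
    return &b;
  return nullptr;
}

// True only if A is immediately followed in memory by B.
bool consecutive_mem_locations(const MemRef &a, const MemRef &b) {
  return adjacent_mem_locations(a, b) == &a;
}
```

Returning the lower-addressed operand rather than a bool lets one helper serve both callers: code that only cares about adjacency tests for non-null, while the lxvp case can also insist on the order.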
[committed] avoid including <new> to ease cross-compiler testing
I have committed the attached change to ease testing with bare bones cross-compilers with no libstdc++ headers. Tested on x86_64 and with a powerpc64 cross-compiler.

Martin

commit c68cac900ab4ccaf6b1a31168bc9a302ebc46428
Author: Martin Sebor
Date:   Thu Jul 8 16:02:01 2021 -0600

    Avoid including <new> to make cross-compiler testing easy.

    gcc/testsuite/ChangeLog:

            * g++.dg/warn/Warray-bounds-11.C: Avoid including <new>.
            * g++.dg/warn/Warray-bounds-13.C: Same.

diff --git a/gcc/testsuite/g++.dg/warn/Warray-bounds-11.C b/gcc/testsuite/g++.dg/warn/Warray-bounds-11.C
index 70b39122f78..9670898770f 100644
--- a/gcc/testsuite/g++.dg/warn/Warray-bounds-11.C
+++ b/gcc/testsuite/g++.dg/warn/Warray-bounds-11.C
@@ -4,7 +4,24 @@
    { dg-do compile }
    { dg-options "-O2 -Wall -Warray-bounds -ftrack-macro-expansion=0" } */
 
-#include <new>
+#if 0
+// Avoid including <new> to make cross-compiler testing easy.
+// #include <new>
+#else
+namespace std {
+
+typedef __SIZE_TYPE__ size_t;
+struct nothrow_t { };
+extern const nothrow_t nothrow;
+
+}
+
+void* operator new (std::size_t, const std::nothrow_t &) throw ()
+  __attribute__ ((__alloc_size__ (1), __malloc__));
+void* operator new[] (std::size_t, const std::nothrow_t &) throw ()
+  __attribute__ ((__alloc_size__ (1), __malloc__));
+
+#endif
 
 typedef __INT32_TYPE__ int32_t;
 
diff --git a/gcc/testsuite/g++.dg/warn/Warray-bounds-13.C b/gcc/testsuite/g++.dg/warn/Warray-bounds-13.C
index 2d3e9dcfd68..449324a315d 100644
--- a/gcc/testsuite/g++.dg/warn/Warray-bounds-13.C
+++ b/gcc/testsuite/g++.dg/warn/Warray-bounds-13.C
@@ -4,7 +4,24 @@
    { dg-do compile }
    { dg-options "-O2 -Wall -Warray-bounds -ftrack-macro-expansion=0" } */
 
-#include <new>
+#if 0
+// Avoid including <new> to make cross-compiler testing easy.
+// #include <new>
+#else
+namespace std {
+
+typedef __SIZE_TYPE__ size_t;
+struct nothrow_t { };
+extern const nothrow_t nothrow;
+
+}
+
+void* operator new (std::size_t, const std::nothrow_t &) throw ()
+  __attribute__ ((__alloc_size__ (1), __malloc__));
+void* operator new[] (std::size_t, const std::nothrow_t &) throw ()
+  __attribute__ ((__alloc_size__ (1), __malloc__));
+
+#endif
 
 typedef __INT32_TYPE__ int32_t;
[committed] adjust expected test output to LP32 (PR100451)
I have committed the attached change adjusting the expected test output to the difference between LP64 and ILP32. Tested in both modes on x86_64 and with a powerpc64 cross-compiler. Martin Adjust expected output for LP32 [PR100451]. gcc/testsuite/ChangeLog: PR testsuite/100451 * g++.dg/warn/Warray-bounds-20.C: Adjust expected output for LP32. diff --git a/gcc/testsuite/g++.dg/warn/Warray-bounds-20.C b/gcc/testsuite/g++.dg/warn/Warray-bounds-20.C index a65b29e6269..f4876d8a269 100644 --- a/gcc/testsuite/g++.dg/warn/Warray-bounds-20.C +++ b/gcc/testsuite/g++.dg/warn/Warray-bounds-20.C @@ -27,7 +27,7 @@ struct D1: virtual B, virtual C to the opening brace. */ D1 () { // { dg-warning "\\\[-Warray-bounds" "brace" } -ci = 0; // { dg-warning "\\\[-Warray-bounds" "assign" { xfail *-*-* } } +ci = 0; // { dg-warning "\\\[-Warray-bounds" "assign" { xfail lp64 } } } }; @@ -35,7 +35,8 @@ void sink (void*); void warn_derived_ctor_access_new_decl () { - char a[sizeof (D1)];// { dg-message "at offset 1 into object 'a' of size 40" "note" } + char a[sizeof (D1)];// { dg-message "at offset 1 into object 'a' of size 40" "LP64 note" { target lp64} } + // { dg-message "at offset 1 into object 'a' of size 20" "LP64 note" { target ilp32} .-1 } char *p = a; ++p; D1 *q = new (p) D1; @@ -52,7 +53,8 @@ void warn_derived_ctor_access_new_alloc () void warn_derived_ctor_access_new_array_decl () { - char b[sizeof (D1) * 2];// { dg-message "at offset \\d+ into object 'b' of size 80" "note" } + char b[sizeof (D1) * 2];// { dg-message "at offset \\d+ into object 'b' of size 80" "LP64 note" { target lp64 } } + // { dg-message "at offset \\d+ into object 'b' of size 40" "LP64 note" { target ilp32 } .-1 } char *p = b; ++p; D1 *q = new (p) D1[2];
[PATCH] [wwwdocs] Update description of GM2 and document branch
Hello Gerald, Here are two proposed patches to wwwdocs: htdocs/frontends.html: Update the description of GNU Modula-2. htdocs/git.html: Document the new devel/modula-2 branch. regards, Gaius = diff --git a/htdocs/frontends.html b/htdocs/frontends.html index bec33b7b..60f08aa4 100644 --- a/htdocs/frontends.html +++ b/htdocs/frontends.html @@ -42,10 +42,10 @@ has a back end that generates assembler directly, using the GCC back end. http://www.nongnu.org/gm2/";>GNU Modula-2 implements the PIM2, PIM3, PIM4 and ISO dialects of the language. The compiler -is fully operational with the GCC 4.1.2 back end (on GNU/Linux x86 -systems). Work is in progress to move the front end to the GCC trunk. -The front end is mostly written in Modula-2, but includes a bootstrap -procedure via a heavily modified version of p2c. +is fully operational with the GCC 10 and GCC 11 back ends (on +GNU/Linux x86 systems). Work is in progress to move the front end to +the GCC trunk. The front end is mostly written in Modula-2, but +includes a bootstrap procedure using mc. Modula-3 (for links see http://www.modula3.org/";>www.modula3.org); SRC M3 is based on an old diff --git a/htdocs/git.html b/htdocs/git.html index 2bbfc334..4fea5224 100644 --- a/htdocs/git.html +++ b/htdocs/git.html @@ -471,6 +471,17 @@ in Git. Further information can be found on the https://github.com/Intrepid/GUPC";>GNU UPC page. + modula-2 + This branch is for the +http://nongnu.org/gm2/homepage.html";>GNU Modula-2 +front end to gcc prior to its integration with the mainline. The +branch will be regularly rebased against the mainline. It is +maintained by +mailto:gaius.mul...@southwales.ac.uk";>Gaius Mulley. +Patches should be +prefixed with [modula-2] in the subject line. + + pph This branch implements https://gcc.gnu.org/wiki/pph";> Pre-Parsed Headers for C++. It is maintained by
[RFC,PATCH] Allow means for targets to opt out of CTF/BTF support
Hello,

It was brought up when discussing PR debug/101283 (Several tests fail on Darwin with -gctf/gbtf) that it would be good to provide a means for targets to opt out of CTF/BTF support.

By and large, it seems to me that CTF/BTF debug formats can be safely enabled for all ELF-based targets by default in GCC. So, at a high level:

- By default, CTF/BTF formats can be enabled for all ELF-based targets.
- By default, CTF/BTF formats can be disabled for all non ELF-based targets.
- If the user passed a -gctf but CTF is not enabled for the target, GCC issues an error to the user (as is done currently with other debug formats) - "target system does not support the 'ctf' debug format".

This is a makeshift patch which fulfills the above requirements and is based on the approach taken for DWARF via DWARF2_DEBUGGING_INFO (I still have to see if I need some specific handling in common_handle_option in opts.c). On minimal testing, the patch works as desired on x86_64-pc-linux-gnu and a darwin-based target.

My question is: looking around in config.gcc etc., it seems that defining these macros in elfos.h gives targets/platforms a means to override them by virtue of the recommended order of #includes in $tm_file. What I cannot say for certain is whether this is true in practice. On first look, I believe this could work fine. What do you think?

If you think this approach could work, I will continue on this track and test/refine the patch.

Thanks
Indu

-

gcc/ChangeLog:

	* config/elfos.h (CTF_DEBUGGING_INFO): New definition.
	(BTF_DEBUGGING_INFO): Likewise.
	* toplev.c: Guard initialization of debug hooks.

gcc/testsuite/ChangeLog:

	* gcc.dg/debug/btf/btf.exp: Do not run BTF testsuite if target does not
	support BTF format.
	* gcc.dg/debug/ctf/ctf.exp: Do not run CTF testsuite if target does not
	support CTF format.
--- gcc/config/elfos.h | 8 gcc/testsuite/gcc.dg/debug/btf/btf.exp | 11 +-- gcc/testsuite/gcc.dg/debug/ctf/ctf.exp | 11 +-- gcc/toplev.c | 11 +-- 4 files changed, 35 insertions(+), 6 deletions(-) diff --git a/gcc/config/elfos.h b/gcc/config/elfos.h index 7a736cc..e5cb487 100644 --- a/gcc/config/elfos.h +++ b/gcc/config/elfos.h @@ -68,6 +68,14 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see #define DWARF2_DEBUGGING_INFO 1 +/* All ELF targets can support CTF. */ + +#define CTF_DEBUGGING_INFO 1 + +/* All ELF targets can support BTF. */ + +#define BTF_DEBUGGING_INFO 1 + /* The GNU tools operate better with dwarf2, and it is required by some psABI's. Since we don't have any native tools to be compatible with, default to dwarf2. */ diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf.exp b/gcc/testsuite/gcc.dg/debug/btf/btf.exp index e173515..a3e680c 100644 --- a/gcc/testsuite/gcc.dg/debug/btf/btf.exp +++ b/gcc/testsuite/gcc.dg/debug/btf/btf.exp @@ -39,8 +39,15 @@ if ![info exists DEFAULT_CFLAGS] then { dg-init # Main loop. -dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cS\] ]] \ - "" $DEFAULT_CFLAGS +set comp_output [gcc_target_compile \ +"$srcdir/$subdir/../trivial.c" "trivial.S" assembly \ +"additional_flags=-gbtf"] +if { ! [string match "*: target system does not support the * debug format*" \ +$comp_output] } { +remove-build-file "trivial.S" +dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cS\] ]] \ + "" $DEFAULT_CFLAGS +} # All done. dg-finish diff --git a/gcc/testsuite/gcc.dg/debug/ctf/ctf.exp b/gcc/testsuite/gcc.dg/debug/ctf/ctf.exp index 0b650ed..c53cd8b 100644 --- a/gcc/testsuite/gcc.dg/debug/ctf/ctf.exp +++ b/gcc/testsuite/gcc.dg/debug/ctf/ctf.exp @@ -39,8 +39,15 @@ if ![info exists DEFAULT_CFLAGS] then { dg-init # Main loop. 
-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cS\] ]] \ - "" $DEFAULT_CFLAGS +set comp_output [gcc_target_compile \ +"$srcdir/$subdir/../trivial.c" "trivial.S" assembly \ +"additional_flags=-gctf"] +if { ! [string match "*: target system does not support the * debug format*" \ +$comp_output] } { +remove-build-file "trivial.S" +dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cS\] ]] \ + "" $DEFAULT_CFLAGS +} # All done. dg-finish diff --git a/gcc/toplev.c b/gcc/toplev.c index 43f1f7d..8103812 100644 --- a/gcc/toplev.c +++ b/gcc/toplev.c @@ -1463,8 +1463,15 @@ process_options (void) debug_hooks = &xcoff_debug_hooks; #endif #ifdef DWARF2_DEBUGGING_INFO - else if (dwarf_debuginfo_p () - || dwarf_based_debuginfo_p ()) + else if (dwarf_debuginfo_p ()) +debug_hooks = &dwarf2_debug_hooks; +#endif +#ifdef CTF_DEBUGGING_INFO + else if (write_symbols & CTF_DEBUG) +debug_hooks = &dwarf2_debug_hooks; +#endif +#ifdef BTF_DEBUGGING_INFO + else if (btf_debuginfo_p ()) debug_hooks = &dwarf2_debug_hooks; #endif #ifdef VMS_DEBUGGING_INFO -- 1.8.3.1
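The opt-out mechanism proposed above boils down to a target header defining a macro and generic code testing for it. The sketch below shows that pattern in isolation. The two macro names come from the patch; DebugFormat and check_debug_format are hypothetical names invented for this example, and the real handling lives in toplev.c and opts.c rather than in a function like this.

```cpp
#include <cassert>
#include <string>

// Stand-in for what a target header such as elfos.h would provide.
// A target that wants to opt out would simply not define these.
#define CTF_DEBUGGING_INFO 1
#define BTF_DEBUGGING_INFO 1

// Hypothetical enum for this sketch only.
enum DebugFormat { DEBUG_CTF, DEBUG_BTF };

// Returns an empty string when the format is supported on this "target",
// otherwise an error message modelled on the one quoted in the mail.
std::string check_debug_format(DebugFormat fmt) {
  switch (fmt) {
  case DEBUG_CTF:
#ifdef CTF_DEBUGGING_INFO
    return "";
#else
    return "target system does not support the 'ctf' debug format";
#endif
  case DEBUG_BTF:
#ifdef BTF_DEBUGGING_INFO
    return "";
#else
    return "target system does not support the 'btf' debug format";
#endif
  }
  return "unknown debug format";
}
```

Removing one of the two #define lines models a target that opts out of that format: the corresponding request then yields the error string instead of an empty result.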
[committed] remove an xfail
The test xfailed for ILP32 has apparently been passing for some time. I've removed the xfail after confirming it with -m32 on x86_64 and powerpc64.

Martin

commit 68b938fada4c728c0b850b44125d9a173c01c8fb
Author: Martin Sebor
Date:   Thu Jul 8 16:22:25 2021 -0600

    testsuite: Remove an xfail.

    gcc/testsuite/ChangeLog:

            * gcc.dg/Wstringop-overflow-43.c: Remove an xfail.

diff --git a/gcc/testsuite/gcc.dg/Wstringop-overflow-43.c b/gcc/testsuite/gcc.dg/Wstringop-overflow-43.c
index 14ab925afdc..6d045c58bf6 100644
--- a/gcc/testsuite/gcc.dg/Wstringop-overflow-43.c
+++ b/gcc/testsuite/gcc.dg/Wstringop-overflow-43.c
@@ -167,9 +167,7 @@ void warn_memset_reversed_range (void)
   /* The following are represented as ordinary ranges with reversed bounds
      and those are handled. */
   T1 (p, SAR (INT_MIN, 11), n11); // { dg-warning "writing 11 or more bytes into a region of size 0" }
-  /* In ILP32 the offset in the following has no range info associated
-     with it. */
-  T1 (p, SAR (INT_MIN, 1), n11); // { dg-warning "writing 11 or more bytes into a region of size 0" "pr?" { xfail ilp32 } }
+  T1 (p, SAR (INT_MIN, 1), n11); // { dg-warning "writing 11 or more bytes into a region of size 0" }
   T1 (p, SAR (INT_MIN, 0), n11); // { dg-warning "writing 11 or more bytes into a region of size 0" }
   /* Also represented as a true anti-range. */
   T1 (p, SAR (-12, -11), n11); // { dg-warning "writing 11 or more bytes into a region of size \\d+" }
Re: [PATCH] [wwwdocs] Update description of GM2 and document branch
Hi Gaius, On Thu, 8 Jul 2021, Gaius Mulley wrote: > Here are two proposed patches to wwwdocs: thank you for thinking of updating the web pages, too! > diff --git a/htdocs/frontends.html b/htdocs/frontends.html : > http://www.nongnu.org/gm2/";>GNU Modula-2 implements > the PIM2, PIM3, PIM4 and ISO dialects of the language. The compiler > +is fully operational with the GCC 10 and GCC 11 back ends (on > +GNU/Linux x86 systems). I realize this predates your patch (which merely changes version numbers), but a reference to back ends could be misunderstood. I assume GNU Modula-2 doesn't just use the back ends (x86, aarch64,...), but also the middle-end and tree optimizers etc.? What do you think about just saying "with GCC 10 and GCC 11". > Work is in progress to move the front end to > +the GCC trunk. The front end is mostly written in Modula-2, but > +includes a bootstrap procedure using mc. On my system mc refers to Midnight Commander :-), whereas I guess mc here is about "Modula Compiler"? Can you rephrase this for the sake of those not so closely involved? > --- a/htdocs/git.html > +++ b/htdocs/git.html > + This branch is for the > +http://nongnu.org/gm2/homepage.html";>GNU Modula-2 > +front end to gcc prior to its integration with the mainline. The GCC (all uppercase) > +branch will be regularly rebased against the mainline. It is > +maintained by > +mailto:gaius.mul...@southwales.ac.uk";>Gaius Mulley. > +Patches should be > +prefixed with [modula-2] in the subject line. Usually I'd just say "subject", which is a header in our mail systems; the term "subject line" isn't widely used. Thanks (and okay considering the above), Gerald
[committed] move warning suppression closer to invalid access (PR101372)
To unblock bootstrap this morning that was failing due to stricter array bounds checking, I suppressed two -Warray-bounds instances in cp/module.cc without analyzing them, tracking the to-do in pr101372.

Now that I understand what's going on -- the warning is behaving as designed, flagging accesses to one member via a pointer derived from another -- I believe the suppression is still appropriate but can be moved to the inline function that does the access. Thanks to the recent improvements to warning suppression (r12-1992 and related) this more targeted fix should work reliably while also avoiding a recurrence of the warning in future uses of the function.

I have committed the attached patch to make this change after testing it on x86_64-linux.

Martin

commit 79d3378c7d73814442eb468c562ab8aa572f9c43
Author: Martin Sebor
Date:   Thu Jul 8 16:36:15 2021 -0600

    Move warning suppression to the ultimate callee.

    Resolves:
    PR bootstrap/101372 - -Warray-bounds in gcc/cp/module.cc causing bootstrap failure

    gcc/cp/ChangeLog:

            PR bootstrap/101372
            * module.cc (identifier): Suppress warning.
            (module_state::read_macro_maps): Remove warning suppression.
            (module_state::install_macros): Ditto.

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 8a890c167cf..ccbde292c22 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -274,7 +274,14 @@ static inline cpp_hashnode *cpp_node (tree id)
 
 static inline tree identifier (const cpp_hashnode *node)
 {
+  /* HT_NODE() expands to node->ident that HT_IDENT_TO_GCC_IDENT()
+     then subtracts a nonzero constant, deriving a pointer to
+     a different member than ident.  That's strictly undefined
+     and detected by -Warray-bounds.  Suppress it.  See PR 101372.  */
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Warray-bounds"
   return HT_IDENT_TO_GCC_IDENT (HT_NODE (const_cast<cpp_hashnode *> (node)));
+#pragma GCC diagnostic pop
 }
 
 /* Id for dumping module information.  */
@@ -16301,18 +16308,11 @@ module_state::read_macro_maps ()
         }
       if (count)
         sec.set_overrun ();
-
-      /* FIXME: Re-enable or fix after root causing.  */
-#pragma GCC diagnostic push
-#pragma GCC diagnostic ignored "-Warray-bounds"
-
       dump (dumper::LOCATION)
         && dump ("Macro:%u %I %u/%u*2 locations [%u,%u)",
                  ix, identifier (node), runs, n_tokens,
                  MAP_START_LOCATION (macro),
                  MAP_START_LOCATION (macro) + n_tokens);
-
-#pragma GCC diagnostic pop
     }
   location_t lwm = sec.u ();
   macro_locs.first = lwm - slurp->loc_deltas.second;
@@ -16918,10 +16918,6 @@ module_state::install_macros ()
       macro_import::slot &slot = imp.append (mod, flags);
       slot.offset = sec.u ();
 
-      /* FIXME: Re-enable or fix after root causing.  */
-#pragma GCC diagnostic push
-#pragma GCC diagnostic ignored "-Warray-bounds"
-
       dump (dumper::MACRO)
         && dump ("Read %s macro %s%s%s %I at %u",
                  imp.length () > 1 ? "add" : "new",
@@ -16942,8 +16938,6 @@ module_state::install_macros ()
           exp.def = cur;
           dump (dumper::MACRO)
             && dump ("Saving current #define %I",
                      identifier (node));
-
-#pragma GCC diagnostic pop
         }
     }
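As a rough illustration of the approach described in this commit (suppressing the warning inside the one helper that performs the access, instead of around every caller), here is a small standalone sketch. Outer and outer_from_ident are hypothetical names made up for the example, and the pointer arithmetic is the same kind of member-to-enclosing-object derivation that the commit message calls strictly undefined; it appears here only to give the pragmas something to guard.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical two-member struct standing in for a node whose 'ident'
// member is handed around by pointer.
struct Outer {
  int tag;
  int ident;
};

// Recover the enclosing object from a pointer to its 'ident' member.
// The suppression is scoped to this one function, so every caller is
// covered without repeating the pragmas at each call site.
static inline Outer *outer_from_ident(int *ident) {
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Warray-bounds"
  return reinterpret_cast<Outer *>(
      reinterpret_cast<char *>(ident) - offsetof(Outer, ident));
#pragma GCC diagnostic pop
}
```

Keeping the push/pop pair tight around the single return statement limits the suppression to the one expression known to trigger the diagnostic.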
Re: PING: [PATCH] mips: check MSA support for vector modes [PR100760,PR100761,PR100762]
On 7/5/2021 8:04 PM, Paul Hua wrote:
> Looks good to me, but I have no right to approve.

But your opinions are well respected :-) I'll go ahead and ACK, though in general I'm stepping away from reviewing target specific work.

jeff
Re: rs6000: Generate an lxvp instead of two adjacent lxv instructions
Hi! On Thu, Jul 08, 2021 at 05:01:05PM -0500, Peter Bergner wrote: > The MMA build built-ins currently use individual lxv instructions to > load up the registers of a __vector_pair or __vector_quad. If the > memory addresses of the built-in operands are to adjacent locations, > then we could use an lxvp in some cases to load up two registers at once. > The patch below adds support for checking whether memory addresses are > adjacent and emitting an lxvp instead of two lxv instructions. > > This passed bootstrap and regtesting on powerpc64le-linux with no regressions. > Ok for trunk? It needs testing on BE. > +static bool consecutive_mem_locations (rtx, rtx); Please don't; just move functions to somewhere earlier than where they are used. > @@ -16841,8 +16843,35 @@ rs6000_split_multireg_move (rtx dst, rtx src) > for (int i = 0; i < nvecs; i++) > { > int index = WORDS_BIG_ENDIAN ? i : nvecs - 1 - i; > - rtx dst_i = gen_rtx_REG (reg_mode, reg + index); > - emit_insn (gen_rtx_SET (dst_i, XVECEXP (src, 0, i))); > + int index_next = WORDS_BIG_ENDIAN ? index + 1 : index - 1; What does index_next mean? The machine instructions do the same thing in any endianness. > + rtx dst_i; > + int regno = reg + i; > + > + /* If we are loading an even VSX register and our memory location > + is adjacent to the next register's memory location (if any), > + then we can load them both with one LXVP instruction. */ > + if ((regno & 1) == 0 > + && VSX_REGNO_P (regno) > + && MEM_P (XVECEXP (src, 0, index)) > + && MEM_P (XVECEXP (src, 0, index_next))) > + { > + rtx base = WORDS_BIG_ENDIAN ? XVECEXP (src, 0, index) > + : XVECEXP (src, 0, index_next); > + rtx next = WORDS_BIG_ENDIAN ? XVECEXP (src, 0, index_next) > + : XVECEXP (src, 0, index); Please get rid of index_next, if you still have to do different code for LE here -- it doesn't make the code any clearer (in fact I cannot follow it at all anymore :-( ) So this converts pairs of lxv to an lxvp in only a very limited case, right? 
Can we instead do it more generically? And what about stxvp? Segher
[r12-2132 Regression] FAIL: g++.dg/warn/Warray-bounds-20.C -std=gnu++98 note (test for warnings, line 55) on Linux/x86_64
On Linux/x86_64,

a110855667782dac7b674d3e328b253b3b3c919b is the first bad commit
commit a110855667782dac7b674d3e328b253b3b3c919b
Author: Martin Sebor
Date:   Wed Jul 7 14:05:25 2021 -0600

    Correct handling of variable offset minus constant in -Warray-bounds [PR100137]

caused

FAIL: gcc.dg/Wstringop-overflow-47.c pr97027 (test for warnings, line 34)
FAIL: gcc.dg/Wstringop-overflow-47.c pr97027 (test for warnings, line 37)
FAIL: gcc.dg/Wstringop-overflow-47.c pr97027 (test for warnings, line 42)
FAIL: g++.dg/warn/Warray-bounds-20.C -std=gnu++14 note (test for warnings, line 38)
FAIL: g++.dg/warn/Warray-bounds-20.C -std=gnu++14 note (test for warnings, line 55)
FAIL: g++.dg/warn/Warray-bounds-20.C -std=gnu++17 note (test for warnings, line 38)
FAIL: g++.dg/warn/Warray-bounds-20.C -std=gnu++17 note (test for warnings, line 55)
FAIL: g++.dg/warn/Warray-bounds-20.C -std=gnu++2a note (test for warnings, line 38)
FAIL: g++.dg/warn/Warray-bounds-20.C -std=gnu++2a note (test for warnings, line 55)
FAIL: g++.dg/warn/Warray-bounds-20.C -std=gnu++98 note (test for warnings, line 38)
FAIL: g++.dg/warn/Warray-bounds-20.C -std=gnu++98 note (test for warnings, line 55)

with GCC configured with

../../gcc/configure --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-2132/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/Wstringop-overflow-47.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/Wstringop-overflow-47.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/warn/Warray-bounds-20.C --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/warn/Warray-bounds-20.C --target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me at skpgkp2 at gmail dot com)