RE: [PATCH] PR tree-opt/40210: Fold (bswap(X)>>C1)&C2 to (X>>C3)&C2 in match.pd

2021-07-08 Thread Roger Sayle

Hi Richard,
Thanks. Yep, you've correctly diagnosed that the motivation for the
get_builtin_precision helper function was that the TREE_TYPE of the
argument is affected by argument promotion.  Your suggestion to instead
use the TREE_TYPE of the function result is a much nicer solution.

I also agree that all of these bswap optimizations make the assumption
that BITS_PER_UNIT is 8 (i.e. that bytes are 8 bits wide), and some assume that
the front-end supports an 8-bit type (i.e. that CHAR_TYPE_SIZE is 8), which
can be checked explicitly.

Both of these improvements are implemented in the attached revised patch,
which has been tested on x86_64-pc-linux-gnu with a "make bootstrap"
and "make -k check" with no new failures.

Ok for mainline?

2021-07-08  Roger Sayle  
Richard Biener  

gcc/ChangeLog
PR tree-optimization/40210
* match.pd (bswap optimizations): Simplify (bswap(x)>>C1)&C2 as
(x>>C3)&C2 when possible.  Simplify bswap(x)>>C1 as ((T)x)>>C2
when possible.  Simplify bswap(x)&C1 as (x>>C2)&C1 when 0<=C1<=255.

gcc/testsuite/ChangeLog
PR tree-optimization/40210
* gcc.dg/builtin-bswap-13.c: New test.
* gcc.dg/builtin-bswap-14.c: New test.

Roger
--

-Original Message-
From: Richard Biener  
Sent: 07 July 2021 08:56
To: Roger Sayle 
Cc: GCC Patches 
Subject: Re: [PATCH] PR tree-opt/40210: Fold (bswap(X)>>C1)&C2 to (X>>C3)&C2 in 
match.pd

On Tue, Jul 6, 2021 at 9:01 PM Roger Sayle  wrote:
>
>
> All of the optimizations/transformations mentioned in bugzilla for PR 
> tree-optimization/40210 are already implemented in mainline GCC, with 
> one exception.  In comment #5, there's a suggestion that 
> (bswap64(x)>>56)&0xff can be implemented without the bswap as 
> (unsigned char)x, or equivalently x&0xff.
>
> This patch implements the above optimization, and closely related 
> variants.  For any single bit, (bswap(X)>>C1)&1 can be simplified to 
> (X>>C2)&1, where bit position C2 is the appropriate permutation of C1.  
> Similarly, the bswap can be eliminated if the desired set of bits all lie 
> within the same byte, hence (bswap(x)>>8)&255 can always be optimized, 
> as can (bswap(x)>>8)&123.
>
> Previously,
>
> int foo(long long x) {
>   return (__builtin_bswap64(x) >> 56) & 0xff; }
>
> compiled with -O2 to
> > foo:    movq    %rdi, %rax
> >         bswap   %rax
> >         shrq    $56, %rax
> >         ret
>
> with this patch, it now compiles to
> > foo:    movzbl  %dil, %eax
> >         ret
>
> This patch has been tested on x86_64-pc-linux-gnu with a "make 
> bootstrap" and "make -k check" with no new failures.
>
> Ok for mainline?

I don't like get_builtin_precision too much, did you consider simply using

+  (bit_and (convert1? (rshift@0 (convert2? (bswap@3 @1)) 
+ INTEGER_CST@2))

and TYPE_PRECISION (TREE_TYPE (@3))?  I think while we'll see argument 
promotion and thus cannot use @1 to derive the type the return value will be 
the original type.

Now, I see '8' being used which likely should be CHAR_TYPE_SIZE since you also 
use char_type_node.

I wonder whether

+ /* (bswap(x) >> C1) & C2 can sometimes be simplified to (x >> C3) & C2.  */
+ (simplify
+  (bit_and (convert1? (rshift@0 (convert2? (bswap @1)) INTEGER_CST@2))
+   INTEGER_CST@3)

and

+ /* bswap(x) >> C1 can sometimes be simplified to (T)x >> C2.  */
+ (simplify
+  (rshift (convert? (bswap @0)) INTEGER_CST@1)

can build upon each other, for example by extending the latter to handle more 
cases, transforming to ((T)x >> C2) & C3?
That might of course be only profitable when the bswap goes away.

Thanks,
Richard.

>
>
> 2021-07-06  Roger Sayle  
>
> gcc/ChangeLog
> PR tree-optimization/40210
> * builtins.c (get_builtin_precision): Helper function to determine
> the precision in bits of a built-in function.
> * builtins.h (get_builtin_precision): Prototype here.
> * match.pd (bswap optimizations): Simplify (bswap(x)>>C1)&C2 as
> (x>>C3)&C2 when possible.  Simplify bswap(x)>>C1 as ((T)x)>>C2
> when possible.  Simplify bswap(x)&C1 as (x>>C2)&C1 when 0<=C1<=255.
>
> gcc/testsuite/ChangeLog
> PR tree-optimization/40210
> * gcc.dg/builtin-bswap-13.c: New test.
> * gcc.dg/builtin-bswap-14.c: New test.
>
> Roger
> --
> Roger Sayle
> NextMove Software
> Cambridge, UK
>
diff --git a/gcc/match.pd b/gcc/match.pd
index 39fb57e..a134485 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3610,7 +3610,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (complex (convert:itype @0) (negate (convert:itype @1)
 
 /* BSWAP simplifications, transforms checked by gcc.dg/builtin-bswap-8.c.  */
-(for bswap (BUILT_IN_BSWAP16 BUILT_IN_BSWAP32 BUILT_IN_BSWAP64)
+(for bswap (BUILT_IN_BSWAP16 BUILT_IN_BSWAP32
+   BUILT_IN_BSWAP64 BUILT_IN_BSWAP128)
  (simplify
   (bswap (bswap @0))
   @0)
@@ -3620,7 +3621,70 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (for bitop (bit_xor bit_ior bit_and)
   (simplify
(bswap (bitop:c (bswap

Re: PING 2 [PATCH] correct handling of variable offset minus constant in -Warray-bounds (PR 100137)

2021-07-08 Thread Richard Biener via Gcc-patches
On Thu, Jul 8, 2021 at 5:12 AM Martin Sebor  wrote:
>
> On 7/7/21 7:48 PM, Marek Polacek wrote:
> > On Wed, Jul 07, 2021 at 02:38:11PM -0600, Martin Sebor via Gcc-patches 
> > wrote:
> >> On 7/7/21 1:38 AM, Richard Biener wrote:
> >>> On Tue, Jul 6, 2021 at 5:47 PM Martin Sebor via Gcc-patches
> >>>  wrote:
> 
>  Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573349.html
> >>>
> >>> +  if (TREE_CODE (axstype) != UNION_TYPE)
> >>>
> >>> what about QUAL_UNION_TYPE?  (why constrain union type accesses
> >>> here - note you don't seem to constrain accesses of union members here)
> >>
> >> I didn't know a QUAL_UNION_TYPE was a thing.  Removing the test
> >> doesn't seem to cause any regressions so let me do that in a followup.
> >>
> >>>
> >>> +if (tree access_size = TYPE_SIZE_UNIT (axstype))
> >>>
> >>> +  /* The byte size of the array has already been determined above
> >>> + based on a pointer ARG.  Set ELTSIZE to the size of the type
> >>> + it points to and REFTYPE to the array with the size, rounded
> >>> + down as necessary.  */
> >>> +  if (POINTER_TYPE_P (reftype))
> >>> +reftype = TREE_TYPE (reftype);
> >>> +  if (TREE_CODE (reftype) == ARRAY_TYPE)
> >>> +reftype = TREE_TYPE (reftype);
> >>> +  if (tree refsize = TYPE_SIZE_UNIT (reftype))
> >>> +if (TREE_CODE (refsize) == INTEGER_CST)
> >>> +  eltsize = wi::to_offset (refsize);
> >>>
> >>> probably pre-existing but the pointer indirection is definitely confusing
> >>> me again and again given the variable is named 'reftype' - obviously
> >>> an access to a pointer does not have any element size.  Possibly the
> >>> paths arriving here ensure somehow that the only case is when
> >>> reftype is not the access type but a pointer to the accessed memory.
> >>> "jump-threading" the source might help me avoiding to trip over this
> >>> again and again ...
> >>
> >> I agree (it is confusing).  There's more to simplify here.  It's on
> >> my to do list so let me see about this piece of code then.
> >>
> >>>
> >>> The patch removes a lot of odd code, I like that.  You know this code best
> >>> and it's hard to spot errors.
> >>>
> >>> So OK, you'll deal with the fallout.
> >>
> >> I certainly will.  Pushed in r12-2132.
> >
> > I think this patch breaks bootstrap on x86_64:
> >
> > In member function ‘availability 
> > varpool_node::get_availability(symtab_node*)’,
> >  inlined from ‘availability 
> > symtab_node::get_availability(symtab_node*)’ at 
> > /opt/notnfs/polacek/gcc/gcc/cgraph.h:3360:63,
> >  inlined from ‘availability 
> > symtab_node::get_availability(symtab_node*)’ at 
> > /opt/notnfs/polacek/gcc/gcc/cgraph.h:3355:1,
> >  inlined from ‘symtab_node* 
> > symtab_node::ultimate_alias_target(availability*, symtab_node*)’ at 
> > /opt/notnfs/polacek/gcc/gcc/cgraph.h:3199:35,
> >  inlined from ‘symtab_node* 
> > symtab_node::ultimate_alias_target(availability*, symtab_node*)’ at 
> > /opt/notnfs/polacek/gcc/gcc/cgraph.h:3193:1,
> >  inlined from ‘varpool_node* 
> > varpool_node::ultimate_alias_target(availability*, symtab_node*)’ at 
> > /opt/notnfs/polacek/gcc/gcc/cgraph.h:3234:5,
> >  inlined from ‘availability 
> > varpool_node::_ZN12varpool_node16get_availabilityEP11symtab_node.part.0(symtab_node*)’
> >  at /opt/notnfs/polacek/gcc/gcc/varpool.c:501:29:
> > /opt/notnfs/polacek/gcc/gcc/varpool.c:490:19: error: array subscript 
> > ‘varpool_node[0]’ is partly outside array bounds of ‘varpool_node [0]’ 
> > [-Werror=array-bounds]
> >490 |   if (!definition && !in_other_partition)
> >|   ^~
> > In file included from /opt/notnfs/polacek/gcc/gcc/varpool.c:29:
> > /opt/notnfs/polacek/gcc/gcc/cgraph.h: In member function ‘availability 
> > varpool_node::_ZN12varpool_node16get_availabilityEP11symtab_node.part.0(symtab_node*)’:
> > /opt/notnfs/polacek/gcc/gcc/cgraph.h:1969:39: note: object 
> > ‘varpool_node::’ of size 120
> >   1969 | struct GTY((tag ("SYMTAB_VARIABLE"))) varpool_node : public 
> > symtab_node
> >|   ^~~~
> > cc1plus: all warnings being treated as errors
>
> I bootstrapped & regtested it on top of r12-2131 just before pushing
> it but let me try with the top of trunk (r12-2135 as of now).
>
> [a bit later]
>
> The bootstrap succeeded with the same configuration settings:
>
>--enable-languages=ada,c,c++,d,fortran,jit,lto,objc,obj-c++
> --enable-checking=yes --enable-host-shared --enable-valgrind-annotations
>
> But with --enable-checking=release I was able to reproduce the error
> above.  Since there is a simple way to bootstrap I'm not going to
> revert the patch tonight.  I'll look into the problem tomorrow and
> see if it can be easily fixed.  If not, I'll revert it then.

plain ./configure triggers the failure already, I guess your
--enable-host-shared
hides it.

Richard.

>
> Martin


[x86_64 PATCH]: Improvement to signed division of integer constant.

2021-07-08 Thread Roger Sayle

This patch tweaks the way GCC handles 32-bit integer division on
x86_64, when the numerator is constant.  Currently the function

int foo (int x) {
  return 100/x;
}

generates the code:
foo:    movl    $100, %eax
        cltd
        idivl   %edi
        ret

where the sign-extension instruction "cltd" creates a long
dependency chain, as it depends on the "mov" before it, and
is depended upon by "idivl" after it.

With this patch, GCC now matches both icc and LLVM and
uses an xor instead, generating:
foo:    xorl    %edx, %edx
        movl    $100, %eax
        idivl   %edi
        ret

Microbenchmarking confirms that this is faster on Intel
processors (Kaby Lake), and no worse on AMD processors (Zen2),
which agrees with intuition, but oddly disagrees with the
llvm-mca cycle count prediction on godbolt.org.

The tricky bit is that this sign-extension instruction is only
produced by late (postreload) splitting, and unfortunately none
of the subsequent passes (e.g. cprop_hardreg) is able to
propagate and simplify its constant argument.  The solution
here is to introduce a define_insn_and_split that allows the
constant numerator operand to be captured (by combine) and
then split into an optimal form after reload.

The above microbenchmarking also shows that eliminating the
sign extension of negative values (using movl $-1,%edx) is also
a performance improvement, as performed by icc but not by LLVM.
Both the xor and movl sign-extensions are larger than cltd,
so this transformation is prevented for -Os.
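
For the negative case, a minimal illustration (my sketch of the intent, not
verified compiler output; the new divmod-9.c test below exercises both
signs):

int bar (int x)
{
  /* With the patch (and when not optimizing for size) the %edx
     sign-extension is expected to become "movl $-1, %edx" rather than
     cltd, because the constant numerator is known to be negative.  */
  return -100 / x;
}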


This patch has been tested on x86_64-pc-linux-gnu with a "make
bootstrap" and "make -k check" with no new failures.

Ok for mainline?


2021-07-08  Roger Sayle  

gcc/ChangeLog
* config/i386/i386.md (*divmodsi4_const): Optimize SImode
divmod of a constant numerator with new define_insn_and_split.

gcc/testsuite/ChangeLog
* gcc.target/i386/divmod-9.c: New test case.


Roger
--
Roger Sayle
NextMove Software
Cambridge, UK

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 700c158..908ae33 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -8657,6 +8657,33 @@
   [(set_attr "type" "idiv")
(set_attr "mode" "SI")])
 
+;; Avoid sign-extension (using cdq) for constant numerators.
+(define_insn_and_split "*divmodsi4_const"
+  [(set (match_operand:SI 0 "register_operand" "=&a")
+(div:SI (match_operand:SI 2 "const_int_operand" "n")
+   (match_operand:SI 3 "nonimmediate_operand" "rm")))
+   (set (match_operand:SI 1 "register_operand" "=&d")
+(mod:SI (match_dup 2) (match_dup 3)))
+   (clobber (reg:CC FLAGS_REG))]
+  "!optimize_function_for_size_p (cfun)"
+  "#"
+  "reload_completed"
+  [(parallel [(set (match_dup 0)
+   (div:SI (match_dup 0) (match_dup 3)))
+  (set (match_dup 1)
+   (mod:SI (match_dup 0) (match_dup 3)))
+  (use (match_dup 1))
+  (clobber (reg:CC FLAGS_REG))])]
+{
+  emit_move_insn (operands[0], operands[2]);
+  if (INTVAL (operands[2]) < 0)
+emit_move_insn (operands[1], constm1_rtx);
+  else
+ix86_expand_clear (operands[1]);
+}
+  [(set_attr "type" "multi")
+   (set_attr "mode" "SI")])
+
 (define_expand "divmodqi4"
   [(parallel [(set (match_operand:QI 0 "register_operand")
   (div:QI
/* { dg-do compile } */
/* { dg-options "-O2" } */

int foo (int x)
{
  return 100/x;
}

int bar(int x)
{
  return -100/x;
}
/* { dg-final { scan-assembler-not "(cltd|cdq)" } } */



Re: [PATCH] PR tree-opt/40210: Fold (bswap(X)>>C1)&C2 to (X>>C3)&C2 in match.pd

2021-07-08 Thread Richard Biener via Gcc-patches
On Thu, Jul 8, 2021 at 9:37 AM Roger Sayle  wrote:
>
>
> Hi Richard,
> Thanks. Yep, you've correctly diagnosed that the motivation for the
> get_builtin_precision helper function was that the TREE_TYPE of the
> argument is affected by argument promotion.  Your suggestion to instead
> use the TREE_TYPE of the function result is a much nicer solution.
>
> I also agree that all of these bswap optimizations make the assumption
> that BITS_PER_UNIT is 8 (i.e. that bytes are 8 bits wide), and some assume that
> the front-end supports an 8-bit type (i.e. that CHAR_TYPE_SIZE is 8), which
> can be checked explicitly.
>
> Both of these improvements are implemented in the attached revised patch,
> which has been tested on x86_64-pc-linux-gnu with a "make bootstrap"
> and "make -k check" with no new failures.
>
> Ok for mainline?

OK.

Thanks,
Richard.

> 2021-07-08  Roger Sayle  
> Richard Biener  
>
> gcc/ChangeLog
> PR tree-optimization/40210
> * match.pd (bswap optimizations): Simplify (bswap(x)>>C1)&C2 as
> (x>>C3)&C2 when possible.  Simplify bswap(x)>>C1 as ((T)x)>>C2
> when possible.  Simplify bswap(x)&C1 as (x>>C2)&C1 when 0<=C1<=255.
>
> gcc/testsuite/ChangeLog
> PR tree-optimization/40210
> * gcc.dg/builtin-bswap-13.c: New test.
> * gcc.dg/builtin-bswap-14.c: New test.
>
> Roger
> --
>
> -Original Message-
> From: Richard Biener 
> Sent: 07 July 2021 08:56
> To: Roger Sayle 
> Cc: GCC Patches 
> Subject: Re: [PATCH] PR tree-opt/40210: Fold (bswap(X)>>C1)&C2 to (X>>C3)&C2 
> in match.pd
>
> On Tue, Jul 6, 2021 at 9:01 PM Roger Sayle  wrote:
> >
> >
> > All of the optimizations/transformations mentioned in bugzilla for PR
> > tree-optimization/40210 are already implemented in mainline GCC, with
> > one exception.  In comment #5, there's a suggestion that
> > (bswap64(x)>>56)&0xff can be implemented without the bswap as
> > (unsigned char)x, or equivalently x&0xff.
> >
> > This patch implements the above optimization, and closely related
> > variants.  For any single bit, (bswap(X)>>C1)&1 can be simplified to
> > (X>>C2)&1, where bit position C2 is the appropriate permutation of C1.
> > Similarly, the bswap can be eliminated if the desired set of bits all lie
> > within the same byte, hence (bswap(x)>>8)&255 can always be optimized,
> > as can (bswap(x)>>8)&123.
> >
> > Previously,
> >
> > int foo(long long x) {
> >   return (__builtin_bswap64(x) >> 56) & 0xff; }
> >
> > compiled with -O2 to
> > foo:    movq    %rdi, %rax
> >         bswap   %rax
> >         shrq    $56, %rax
> >         ret
> >
> > with this patch, it now compiles to
> > foo:    movzbl  %dil, %eax
> >         ret
> >
> > This patch has been tested on x86_64-pc-linux-gnu with a "make
> > bootstrap" and "make -k check" with no new failures.
> >
> > Ok for mainline?
>
> I don't like get_builtin_precision too much, did you consider simply using
>
> +  (bit_and (convert1? (rshift@0 (convert2? (bswap@3 @1))
> + INTEGER_CST@2))
>
> and TYPE_PRECISION (TREE_TYPE (@3))?  I think while we'll see argument 
> promotion and thus cannot use @1 to derive the type the return value will be 
> the original type.
>
> Now, I see '8' being used which likely should be CHAR_TYPE_SIZE since you 
> also use char_type_node.
>
> I wonder whether
>
> + /* (bswap(x) >> C1) & C2 can sometimes be simplified to (x >> C3) & C2.  */
> + (simplify
> +  (bit_and (convert1? (rshift@0 (convert2? (bswap @1)) INTEGER_CST@2))
> +   INTEGER_CST@3)
>
> and
>
> + /* bswap(x) >> C1 can sometimes be simplified to (T)x >> C2.  */
> + (simplify
> +  (rshift (convert? (bswap @0)) INTEGER_CST@1)
>
> can build upon each other, for example by extending the latter to handle more 
> cases, transforming to ((T)x >> C2) & C3?
> That might of course be only profitable when the bswap goes away.
>
> Thanks,
> Richard.
>
> >
> >
> > 2021-07-06  Roger Sayle  
> >
> > gcc/ChangeLog
> > PR tree-optimization/40210
> > * builtins.c (get_builtin_precision): Helper function to determine
> > the precision in bits of a built-in function.
> > * builtins.h (get_builtin_precision): Prototype here.
> > * match.pd (bswap optimizations): Simplify (bswap(x)>>C1)&C2 as
> > (x>>C3)&C2 when possible.  Simplify bswap(x)>>C1 as ((T)x)>>C2
> > when possible.  Simplify bswap(x)&C1 as (x>>C2)&C1 when 0<=C1<=255.
> >
> > gcc/testsuite/ChangeLog
> > PR tree-optimization/40210
> > * gcc.dg/builtin-bswap-13.c: New test.
> > * gcc.dg/builtin-bswap-14.c: New test.
> >
> > Roger
> > --
> > Roger Sayle
> > NextMove Software
> > Cambridge, UK
> >


[PATCH] PR tree-optimization/38943: Preserve trapping instructions with -fnon-call-exceptions

2021-07-08 Thread Roger Sayle

This patch addresses PR tree-optimization/38943 where gcc may optimize
away trapping instructions even when -fnon-call-exceptions is specified.
Interestingly this only affects the C compiler (when -fexceptions is not
specified) as g++ (or -fexceptions) supports C++-style exception handling,
where -fnon-call-exceptions triggers the stmt_could_throw_p machinery.
Without -fexceptions, trapping instructions aren't always considered
visible side-effects.
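
A minimal illustration of the problem (my sketch in the spirit of the PR,
not the exact new pr38943.c test):

int drop_trap (int a, int b)
{
  int t = a / b;   /* may trap, e.g. when b == 0 */
  (void) t;        /* ... but the result is otherwise dead */
  /* With -O2 -fnon-call-exceptions (and no -fexceptions) the dead
     division could previously be deleted, silently losing the trap.  */
  return a;
}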

This patch fixes this in two places.  Firstly, gimple_has_side_effects
is tweaked such that gimple_could_trap_p is considered a side-effect
if the current function can throw non-call exceptions.  And secondly,
check_stmt in ipa-pure-const.c is tweaked such that a function
containing a trapping statement is considered to have a side-effect
with -fnon-call-exceptions, and therefore cannot be pure or const.

Calling gimple_could_trap_p (which previously took a non-const gimple)
from gimple_has_side_effects (which takes a const gimple) required
improving the const-safety of gimple_could_trap_p (a relatively minor
tweak) and its prototypes.  Hopefully this is considered a clean-up/
improvement.

This patch has been tested on x86_64-pc-linux-gnu with a "make
bootstrap" and "make -k check" with no new failures.  This should
be relatively safe, as there are no changes in behaviour unless
the user explicitly specifies -fnon-call-exceptions, when the C
compiler then behaves more like the C++/Ada compiler.

Ok for mainline?


2021-07-08  Roger Sayle  

gcc/ChangeLog
PR tree-optimization/38943
* gimple.c (gimple_has_side_effects): Consider trapping to
be a side-effect when -fnon-call-exceptions is specified.
(gimple_could_trap_p_1):  Make S argument a "const gimple*".
Preserve constness in call to gimple_asm_volatile_p.
(gimple_could_trap_p): Make S argument a "const gimple*".
* gimple.h (gimple_could_trap_p_1, gimple_could_trap_p):
Update function prototypes.
* ipa-pure-const.c (check_stmt): When the current function
can throw non-call exceptions, a trapping statement should
be considered a side-effect, so the function is neither
const nor pure.

gcc/testsuite/ChangeLog
PR tree-optimization/38943
* gcc.dg/pr38943.c: New test case.

Roger
--
Roger Sayle
NextMove Software
Cambridge, UK

diff --git a/gcc/gimple.c b/gcc/gimple.c
index f1044e9..4b150b0 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -2090,7 +2090,8 @@ gimple_move_vops (gimple *new_stmt, gimple *old_stmt)
statement to have side effects if:
 
- It is a GIMPLE_CALL not marked with ECF_PURE or ECF_CONST.
-   - Any of its operands are marked TREE_THIS_VOLATILE or TREE_SIDE_EFFECTS.  */
+   - Any of its operands are marked TREE_THIS_VOLATILE or TREE_SIDE_EFFECTS.
+   - It may trap and -fnon-call-exceptions has been specified.  */
 
 bool
 gimple_has_side_effects (const gimple *s)
@@ -2108,6 +2109,10 @@ gimple_has_side_effects (const gimple *s)
   && gimple_asm_volatile_p (as_a <const gasm *> (s)))
 return true;
 
+  if (cfun->can_throw_non_call_exceptions
+  && gimple_could_trap_p (s))
+return true;
+
   if (is_gimple_call (s))
 {
   int flags = gimple_call_flags (s);
@@ -2129,7 +2134,7 @@ gimple_has_side_effects (const gimple *s)
S is a GIMPLE_ASSIGN, the LHS of the assignment is also checked.  */
 
 bool
-gimple_could_trap_p_1 (gimple *s, bool include_mem, bool include_stores)
+gimple_could_trap_p_1 (const gimple *s, bool include_mem, bool include_stores)
 {
   tree t, div = NULL_TREE;
   enum tree_code op;
@@ -2146,7 +2151,7 @@ gimple_could_trap_p_1 (gimple *s, bool include_mem, bool 
include_stores)
   switch (gimple_code (s))
 {
 case GIMPLE_ASM:
-  return gimple_asm_volatile_p (as_a <gasm *> (s));
+  return gimple_asm_volatile_p (as_a <const gasm *> (s));
 
 case GIMPLE_CALL:
   t = gimple_call_fndecl (s);
@@ -2192,7 +2197,7 @@ gimple_could_trap_p_1 (gimple *s, bool include_mem, bool 
include_stores)
 /* Return true if statement S can trap.  */
 
 bool
-gimple_could_trap_p (gimple *s)
+gimple_could_trap_p (const gimple *s)
 {
   return gimple_could_trap_p_1 (s, true, true);
 }
diff --git a/gcc/gimple.h b/gcc/gimple.h
index e7dc2a4..1a2e120 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -1601,8 +1601,8 @@ void gimple_set_lhs (gimple *, tree);
 gimple *gimple_copy (gimple *);
 void gimple_move_vops (gimple *, gimple *);
 bool gimple_has_side_effects (const gimple *);
-bool gimple_could_trap_p_1 (gimple *, bool, bool);
-bool gimple_could_trap_p (gimple *);
+bool gimple_could_trap_p_1 (const gimple *, bool, bool);
+bool gimple_could_trap_p (const gimple *);
 bool gimple_assign_rhs_could_trap_p (gimple *);
 extern void dump_gimple_statistics (void);
 unsigned get_gimple_rhs_num_ops (enum tree_code);
diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
index f045108..436cbcd 100644
--- a/gcc/ipa-pure-const.c
+++ b/gcc/ipa-pure-const.c
@@ -765,6 +765,14 @@ check_stmt (gimple_s

Re: [x86_64 PATCH]: Improvement to signed division of integer constant.

2021-07-08 Thread Richard Biener via Gcc-patches
On Thu, Jul 8, 2021 at 10:25 AM Roger Sayle  wrote:
>
>
> This patch tweaks the way GCC handles 32-bit integer division on
> x86_64, when the numerator is constant.  Currently the function
>
> int foo (int x) {
>   return 100/x;
> }
>
> generates the code:
> foo:    movl    $100, %eax
>         cltd
>         idivl   %edi
>         ret
>
> where the sign-extension instruction "cltd" creates a long
> dependency chain, as it depends on the "mov" before it, and
> is depended upon by "idivl" after it.
>
> With this patch, GCC now matches both icc and LLVM and
> uses an xor instead, generating:
> foo:    xorl    %edx, %edx
>         movl    $100, %eax
>         idivl   %edi
>         ret

You made me lookup idiv and I figured we're not optimally
handling

int foo (long x, int y)
{
  return x / y;
}

by using a 32:32 / 32 bit divide.  combine manages to
see enough to eventually do this though.

> Microbenchmarking confirms that this is faster on Intel
> processors (Kaby Lake), and no worse on AMD processors (Zen2),
> which agrees with intuition, but oddly disagrees with the
> llvm-mca cycle count prediction on godbolt.org.
>
> The tricky bit is that this sign-extension instruction is only
> produced by late (postreload) splitting, and unfortunately none
> of the subsequent passes (e.g. cprop_hardreg) is able to
> propagate and simplify its constant argument.  The solution
> here is to introduce a define_insn_and_split that allows the
> constant numerator operand to be captured (by combine) and
> then split into an optimal form after reload.
>
> The above microbenchmarking also shows that eliminating the
> sign extension of negative values (using movl $-1,%edx) is also
> a performance improvement, as performed by icc but not by LLVM.
> Both the xor and movl sign-extensions are larger than cltd,
> so this transformation is prevented for -Os.
>
>
> This patch has been tested on x86_64-pc-linux-gnu with a "make
> bootstrap" and "make -k check" with no new failures.
>
> Ok for mainline?
>
>
> 2021-07-08  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386.md (*divmodsi4_const): Optimize SImode
> divmod of a constant numerator with new define_insn_and_split.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/divmod-9.c: New test case.
>
>
> Roger
> --
> Roger Sayle
> NextMove Software
> Cambridge, UK
>


Re: [PATCH] PR tree-optimization/38943: Preserve trapping instructions with -fnon-call-exceptions

2021-07-08 Thread Richard Biener via Gcc-patches
On Thu, Jul 8, 2021 at 11:54 AM Roger Sayle  wrote:
>
>
> This patch addresses PR tree-optimization/38943 where gcc may optimize
> away trapping instructions even when -fnon-call-exceptions is specified.
> Interestingly this only affects the C compiler (when -fexceptions is not
> specified) as g++ (or -fexceptions) supports C++-style exception handling,
> where -fnon-call-exceptions triggers the stmt_could_throw_p machinery.
> Without -fexceptions, trapping instructions aren't always considered
> visible side-effects.

But -fnon-call-exceptions without -fexceptions doesn't make much sense,
does it?  I see the testcase behaves correctly when -fexceptions is also
specified.

The call vanishes in DCE because stmt_could_throw_p starts with

bool
stmt_could_throw_p (function *fun, gimple *stmt)
{
  if (!flag_exceptions)
return false;

the documentation of -fnon-call-exceptions says

Generate code that allows trapping instructions to throw exceptions.

so either the above check is wrong or -fnon-call-exceptions should
imply -fexceptions (or we should diagnose missing -fexceptions)

>
> This patch fixes this in two places.  Firstly, gimple_has_side_effects
> is tweaked such that gimple_could_trap_p is considered a side-effect
> if the current function can throw non-call exceptions.

But exceptions are not considered side-effects - they are explicit in the
IL and thus passes are supposed to check for those and preserve
dead (externally) throwing stmts if not told otherwise
(flag_delete_dead_exceptions).

>  And secondly,
> check_stmt in ipa-pure-const.c is tweaked such that a function
> containing a trapping statement is considered to have a side-effect
> with -fnon-call-exceptions, and therefore cannot be pure or const.

EH is orthogonal to pure/const, so I think that's wrong.

> Calling gimple_could_trap_p (which previously took a non-const gimple)
> from gimple_has_side_effects (which takes a const gimple) required
> improving the const-safety of gimple_could_trap_p (a relatively minor
> tweak) and its prototypes.  Hopefully this is considered a clean-up/
> improvement.

Yeah, even an obvious one I think - you can push that independently.

> This patch has been tested on x86_64-pc-linux-gnu with a "make
> bootstrap" and "make -k check" with no new failures.  This should
> be relatively safe, as there are no changes in behaviour unless
> the user explicitly specifies -fnon-call-exceptions, when the C
> compiler then behaves more like the C++/Ada compiler.
>
> Ok for mainline?
>
>
> 2021-07-08  Roger Sayle  
>
> gcc/ChangeLog
> PR tree-optimization/38943
> * gimple.c (gimple_has_side_effects): Consider trapping to
> be a side-effect when -fnon-call-exceptions is specified.
> (gimple_could_trap_p_1):  Make S argument a "const gimple*".
> Preserve constness in call to gimple_asm_volatile_p.
> (gimple_could_trap_p): Make S argument a "const gimple*".
> * gimple.h (gimple_could_trap_p_1, gimple_could_trap_p):
> Update function prototypes.
> * ipa-pure-const.c (check_stmt): When the current function
> can throw non-call exceptions, a trapping statement should
> be considered a side-effect, so the function is neither
> const nor pure.
>
> gcc/testsuite/ChangeLog
> PR tree-optimization/38943
> * gcc.dg/pr38943.c: New test case.
>
> Roger
> --
> Roger Sayle
> NextMove Software
> Cambridge, UK
>


[PATCH] i386: Add pack/unpack patterns for 32bit vectors [PR100637]

2021-07-08 Thread Uros Bizjak via Gcc-patches
A V1SI-mode shift is needed to shift 32-bit operands, and consequently we
need to implement V1SI moves and pushes.
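
For illustration (a hedged sketch of the kind of char -> short widening the
new V4QI unpack patterns target; the function name and loop shape are mine,
not the PR testcase):

void widen (short *restrict dst, signed char *restrict src)
{
  /* Four chars widened to four shorts: the lo/hi halves can now use the
     new vec_unpacks_lo/hi_v4qi expanders, with the high half reached via
     the new V1SI shift that moves the upper 2 bytes down.  */
  for (int i = 0; i < 4; i++)
    dst[i] = src[i];
}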

2021-07-08  Uroš Bizjak  

gcc/
PR target/100637
* config/i386/i386-expand.c (ix86_expand_sse_unpack):
Handle V4QI mode.
* config/i386/mmx.md (V_32): New mode iterator.
(mov): Use V_32 mode iterator.
(*mov_internal): Ditto.
(*push2_rex64): Ditto.
(*push2): Ditto.
(movmisalign): Ditto.
(mmx_v1si3): New insn pattern.
(sse4_1_v2qiv2hi2): Ditto.
(vec_unpacks_lo_v4qi): New expander.
(vec_unpacks_hi_v4qi): Ditto.
(vec_unpacku_lo_v4qi): Ditto.
(vec_unpacku_hi_v4qi): Ditto.
* config/i386/i386.h (VALID_SSE2_REG_MODE): Add V1SImode.
(VALID_INT_MODE_P): Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 58c208e166b..65764ad88c5 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -5355,6 +5355,12 @@ ix86_expand_sse_unpack (rtx dest, rtx src, bool 
unsigned_p, bool high_p)
  else
unpack = gen_sse4_1_sign_extendv2hiv2si2;
  break;
+   case E_V4QImode:
+ if (unsigned_p)
+   unpack = gen_sse4_1_zero_extendv2qiv2hi2;
+ else
+   unpack = gen_sse4_1_sign_extendv2qiv2hi2;
+ break;
default:
  gcc_unreachable ();
}
@@ -5380,6 +5386,12 @@ ix86_expand_sse_unpack (rtx dest, rtx src, bool 
unsigned_p, bool high_p)
  emit_insn (gen_mmx_lshrv1di3 (tmp, gen_lowpart (V1DImode, src),
GEN_INT (32)));
  break;
+   case 4:
+ /* Shift higher 2 bytes to lower 2 bytes.  */
+ tmp = gen_reg_rtx (V1SImode);
+ emit_insn (gen_mmx_lshrv1si3 (tmp, gen_lowpart (V1SImode, src),
+   GEN_INT (16)));
+ break;
default:
  gcc_unreachable ();
}
@@ -5427,6 +5439,12 @@ ix86_expand_sse_unpack (rtx dest, rtx src, bool 
unsigned_p, bool high_p)
  else
unpack = gen_mmx_punpcklwd;
  break;
+   case E_V4QImode:
+ if (high_p)
+   unpack = gen_mmx_punpckhbw_low;
+ else
+   unpack = gen_mmx_punpcklbw_low;
+ break;
default:
  gcc_unreachable ();
}
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 03d176143fe..8c3eace56da 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1016,7 +1016,7 @@ extern const char *host_detect_local_cpu (int argc, const 
char **argv);
 
 #define VALID_SSE2_REG_MODE(MODE)  \
   ((MODE) == V16QImode || (MODE) == V8HImode || (MODE) == V2DFmode \
-   || (MODE) == V4QImode || (MODE) == V2HImode \
+   || (MODE) == V4QImode || (MODE) == V2HImode || (MODE) == V1SImode   \
|| (MODE) == V2DImode || (MODE) == DFmode)
 
 #define VALID_SSE_REG_MODE(MODE)   \
@@ -1048,7 +1048,7 @@ extern const char *host_detect_local_cpu (int argc, const 
char **argv);
|| (MODE) == SImode || (MODE) == DImode \
|| (MODE) == CQImode || (MODE) == CHImode   \
|| (MODE) == CSImode || (MODE) == CDImode   \
-   || (MODE) == V4QImode || (MODE) == V2HImode \
+   || (MODE) == V4QImode || (MODE) == V2HImode || (MODE) == V1SImode   \
|| (TARGET_64BIT\
&& ((MODE) == TImode || (MODE) == CTImode   \
   || (MODE) == TFmode || (MODE) == TCmode  \
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 7e83b64ab59..986b758396a 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -57,10 +57,13 @@ (define_mode_iterator MMXMODE14 [V8QI V2SI])
 (define_mode_iterator MMXMODE24 [V4HI V2SI])
 (define_mode_iterator MMXMODE248 [V4HI V2SI V1DI])
 
-;; All 32bit integer vector modes
+;; All 4-byte integer vector modes
+(define_mode_iterator V_32 [V4QI V2HI V1SI])
+
+;; 4-byte integer vector modes
 (define_mode_iterator VI_32 [V4QI V2HI])
 
-;; All V2S* modes
+;; V2S* modes
 (define_mode_iterator V2FI [V2SF V2SI])
 
 ;; Mapping from integer vector mode to mnemonic suffix
@@ -238,8 +241,8 @@ (define_expand "movmisalign"
 })
 
 (define_expand "mov"
-  [(set (match_operand:VI_32 0 "nonimmediate_operand")
-   (match_operand:VI_32 1 "nonimmediate_operand"))]
+  [(set (match_operand:V_32 0 "nonimmediate_operand")
+   (match_operand:V_32 1 "nonimmediate_operand"))]
   "TARGET_SSE2"
 {
   ix86_expand_vector_move (mode, operands);
@@ -247,9 +250,9 @@ (define_expand "mov"
 })
 
 (define_insn "*mov_internal"
-  [(set (match_operand:VI_32 0 "nonimmediate_operand"
+  [(set (match_operand:V_32 0 "nonimmediate_op

Re: PING 2 [PATCH] correct handling of variable offset minus constant in -Warray-bounds (PR 100137)

2021-07-08 Thread Andreas Schwab
On Jul 07 2021, Marek Polacek via Gcc-patches wrote:

> On Wed, Jul 07, 2021 at 02:38:11PM -0600, Martin Sebor via Gcc-patches wrote:
>> I certainly will.  Pushed in r12-2132.
>
> I think this patch breaks bootstrap on x86_64:

It also breaks bootstrap on aarch64 and ia64 in stage2.

In file included from ../../gcc/c-family/c-common.h:26,
 from ../../gcc/cp/cp-tree.h:40,
 from ../../gcc/cp/module.cc:209:
In function 'tree_node* identifier(const cpp_hashnode*)',
inlined from 'bool module_state::read_macro_maps()' at 
../../gcc/cp/module.cc:16305:10:
../../gcc/tree.h:1089:58: error: array subscript -1 is outside array bounds of 
'cpp_hashnode [288230376151711743]' [-Werror=array-bounds]
 1089 |   ((tree) ((char *) (NODE) - sizeof (struct tree_common)))
  |  ^
../../gcc/cp/module.cc:277:10: note: in expansion of macro 
'HT_IDENT_TO_GCC_IDENT'
  277 |   return HT_IDENT_TO_GCC_IDENT (HT_NODE (const_cast 
(node)));
  |  ^
In file included from ../../gcc/tree.h:23,
 from ../../gcc/c-family/c-common.h:26,
 from ../../gcc/cp/cp-tree.h:40,
 from ../../gcc/cp/module.cc:209:
../../gcc/tree-core.h: In member function 'bool 
module_state::read_macro_maps()':
../../gcc/tree-core.h:1445:24: note: at offset -24 into object 
'tree_identifier::id' of size 16
 1445 |   struct ht_identifier id;
  |^~

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [x86_64 PATCH]: Improvement to signed division of integer constant.

2021-07-08 Thread Alexander Monakov via Gcc-patches
On Thu, 8 Jul 2021, Richard Biener via Gcc-patches wrote:

> You made me lookup idiv and I figured we're not optimally
> handling
> 
> int foo (long x, int y)
> {
>   return x / y;
> }
> 
> by using a 32:32 / 32 bit divide.  combine manages to
> see enough to eventually do this though.

We cannot do that in general because idiv will cause an exception
if the signed result is not representable in 32 bits, but GCC
defines signed conversions to truncate without trapping.
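
A concrete case (my example, assuming the foo above on an LP64 target):

int overflow_case (void)
{
  long x = 1L << 32;   /* the 64-bit quotient 1L << 32 needs 33 bits */
  return foo (x, 1);   /* GCC's truncating conversion yields 0, whereas a
                          32-bit idivl would raise #DE on the oversized
                          quotient.  */
}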

Alexander


[committed] match.pd: Relax rule to include POLY_INT_CSTs

2021-07-08 Thread Richard Sandiford via Gcc-patches
match.pd has a rule to simplify an extension, operation and truncation
back to the original type:

 (simplify
   (convert (op:s@0 (convert1?@3 @1) (convert2?@4 @2)))

Currently it handles cases in which @2 is an INTEGER_CST, but it
also works for POLY_INT_CSTs.[*]

For INTEGER_CST it doesn't matter whether we test @2 or @4,
but for POLY_INT_CST it is possible to have unfolded (convert …)s.

Originally I saw this leading to some bad ivopts decisions, because
we weren't folding away redundancies from candidate iv expressions.
It's also possible to test the fold directly using the SVE ACLE.

Tested on aarch64-linux-gnu and x86_64-linux-gnu, pushed as obvious.

Richard

[*] Not all INTEGER_CST rules work for POLY_INT_CSTs, since extensions
don't necessarily distribute over the internals of the POLY_INT_CST.
But in this case that isn't an issue.


gcc/
* match.pd: Simplify an extend-operate-truncate sequence involving
a POLY_INT_CST.

gcc/testsuite/
* gcc.target/aarch64/sve/acle/general/cntb_1.c: New test.
---
 gcc/match.pd   |  2 +-
 .../gcc.target/aarch64/sve/acle/general/cntb_1.c   | 14 ++
 2 files changed, 15 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/cntb_1.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 334e8cc0496..30680d488ab 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -6175,7 +6175,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
&& (types_match (@1, @2)
/* Or the second operand is const integer or converted const
   integer from valueize.  */
-   || TREE_CODE (@2) == INTEGER_CST))
+   || poly_int_tree_p (@4)))
  (if (TYPE_OVERFLOW_WRAPS (TREE_TYPE (@1)))
(op @1 (convert @2))
(with { tree utype = unsigned_type_for (TREE_TYPE (@1)); }
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/cntb_1.c 
b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/cntb_1.c
new file mode 100644
index 000..b43fcf0ed6d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/cntb_1.c
@@ -0,0 +1,14 @@
+/* { dg-options "-O -fdump-tree-optimized" } */
+
+#include <arm_sve.h>
+
+unsigned int
+foo (unsigned int x)
+{
+  unsigned long tmp = x;
+  tmp += svcntb ();
+  x = tmp;
+  return x - svcntb ();
+}
+
+/* { dg-final { scan-tree-dump-not { POLY_INT_CST } optimized } } */
-- 
2.17.1



[committed] vect: Remove always-true condition

2021-07-08 Thread Richard Sandiford via Gcc-patches
vectorizable_reduction had code guarded by:

  if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def
  || STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def)

But that's always true after:

  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_reduction_def
  && STMT_VINFO_DEF_TYPE (stmt_info) != vect_double_reduction_def
  && STMT_VINFO_DEF_TYPE (stmt_info) != vect_nested_cycle)
return false;

  if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle)
{
  …
  return true;
}

(I wasn't sure at first how the empty “else” for the first “if” above
was supposed to work.)

Tested on aarch64-linux-gnu and x86_64-linux-gnu, pushed as obvious.

Richard


gcc/
* tree-vect-loop.c (vectorizable_reduction): Remove always-true
if condition.
---
 gcc/tree-vect-loop.c | 50 +---
 1 file changed, 24 insertions(+), 26 deletions(-)

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 51a46a6d852..bc523d151c6 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -6516,33 +6516,31 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
 
   stmt_vec_info orig_stmt_of_analysis = stmt_info;
   stmt_vec_info phi_info = stmt_info;
-  if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def
-  || STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def)
+  if (!is_a <gphi *> (stmt_info->stmt))
 {
-  if (!is_a <gphi *> (stmt_info->stmt))
-   {
- STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
- return true;
-   }
-  if (slp_node)
-   {
- slp_node_instance->reduc_phis = slp_node;
- /* ???  We're leaving slp_node to point to the PHIs, we only
-need it to get at the number of vector stmts which wasn't
-yet initialized for the instance root.  */
-   }
-  if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def)
-   stmt_info = vect_stmt_to_vectorize (STMT_VINFO_REDUC_DEF (stmt_info));
-  else /* STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def */
-   {
- use_operand_p use_p;
- gimple *use_stmt;
- bool res = single_imm_use (gimple_phi_result (stmt_info->stmt),
-&use_p, &use_stmt);
- gcc_assert (res);
- phi_info = loop_vinfo->lookup_stmt (use_stmt);
- stmt_info = vect_stmt_to_vectorize (STMT_VINFO_REDUC_DEF (phi_info));
-   }
+  STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
+  return true;
+}
+  if (slp_node)
+{
+  slp_node_instance->reduc_phis = slp_node;
+  /* ???  We're leaving slp_node to point to the PHIs, we only
+need it to get at the number of vector stmts which wasn't
+yet initialized for the instance root.  */
+}
+  if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def)
+stmt_info = vect_stmt_to_vectorize (STMT_VINFO_REDUC_DEF (stmt_info));
+  else
+{
+  gcc_assert (STMT_VINFO_DEF_TYPE (stmt_info)
+ == vect_double_reduction_def);
+  use_operand_p use_p;
+  gimple *use_stmt;
+  bool res = single_imm_use (gimple_phi_result (stmt_info->stmt),
+&use_p, &use_stmt);
+  gcc_assert (res);
+  phi_info = loop_vinfo->lookup_stmt (use_stmt);
+  stmt_info = vect_stmt_to_vectorize (STMT_VINFO_REDUC_DEF (phi_info));
 }
 
   /* PHIs should not participate in patterns.  */


[PATCH] ifcvt: Improve tests for predicated operations

2021-07-08 Thread Richard Sandiford via Gcc-patches
-msve-vector-bits=128 causes the AArch64 port to list 128-bit Advanced
SIMD as the first-choice mode for vectorisation, with SVE being used for
things that Advanced SIMD can't handle as easily.  However, ifcvt would
not then try to use SVE's predicated FP arithmetic, leading to tests
like TSVC ControlFlow-flt failing to vectorise.

The mask load/store code did try other vector modes, but could also be
improved to make sure that SVEness sticks when computing derived modes.

(Unlike mode_for_vector, related_vector_mode always returns a vector
mode, so there's no need to check VECTOR_MODE_P as well.)

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
* internal-fn.c (vectorized_internal_fn_supported_p): Handle
vector types first.  For scalar types, consider both the preferred
vector mode and the alternative vector modes.
* optabs-query.c (can_vec_mask_load_store_p): Use the same
structure as above, in particular using related_vector_mode
for modes provided by autovectorize_vector_modes.

gcc/testsuite/
* gcc.target/aarch64/sve/cond_arith_6.c: New test.
---
 gcc/internal-fn.c | 28 +++
 gcc/optabs-query.c| 23 +--
 .../gcc.target/aarch64/sve/cond_arith_6.c | 14 ++
 3 files changed, 43 insertions(+), 22 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_arith_6.c

diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index fb8b43d1ce2..cd5e63f9acd 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -4109,16 +4109,32 @@ expand_internal_call (gcall *stmt)
 bool
 vectorized_internal_fn_supported_p (internal_fn ifn, tree type)
 {
+  if (VECTOR_MODE_P (TYPE_MODE (type)))
+return direct_internal_fn_supported_p (ifn, type, OPTIMIZE_FOR_SPEED);
+
   scalar_mode smode;
-  if (!VECTOR_TYPE_P (type) && is_a <scalar_mode> (TYPE_MODE (type), &smode))
+  if (!is_a <scalar_mode> (TYPE_MODE (type), &smode))
+return false;
+
+  machine_mode vmode = targetm.vectorize.preferred_simd_mode (smode);
+  if (VECTOR_MODE_P (vmode))
 {
-  machine_mode vmode = targetm.vectorize.preferred_simd_mode (smode);
-  if (VECTOR_MODE_P (vmode))
-   type = build_vector_type_for_mode (type, vmode);
+  tree vectype = build_vector_type_for_mode (type, vmode);
+  if (direct_internal_fn_supported_p (ifn, vectype, OPTIMIZE_FOR_SPEED))
+   return true;
 }
 
-  return (VECTOR_MODE_P (TYPE_MODE (type))
- && direct_internal_fn_supported_p (ifn, type, OPTIMIZE_FOR_SPEED));
+  auto_vector_modes vector_modes;
+  targetm.vectorize.autovectorize_vector_modes (&vector_modes, true);
+  for (machine_mode base_mode : vector_modes)
+if (related_vector_mode (base_mode, smode).exists (&vmode))
+  {
+   tree vectype = build_vector_type_for_mode (type, vmode);
+   if (direct_internal_fn_supported_p (ifn, vectype, OPTIMIZE_FOR_SPEED))
+ return true;
+  }
+
+  return false;
 }
 
 void
diff --git a/gcc/optabs-query.c b/gcc/optabs-query.c
index 3248ce2c06e..05ee5f517da 100644
--- a/gcc/optabs-query.c
+++ b/gcc/optabs-query.c
@@ -582,27 +582,18 @@ can_vec_mask_load_store_p (machine_mode mode,
 return false;
 
   vmode = targetm.vectorize.preferred_simd_mode (smode);
-  if (!VECTOR_MODE_P (vmode))
-return false;
-
-  if (targetm.vectorize.get_mask_mode (vmode).exists (&mask_mode)
+  if (VECTOR_MODE_P (vmode)
+  && targetm.vectorize.get_mask_mode (vmode).exists (&mask_mode)
   && convert_optab_handler (op, vmode, mask_mode) != CODE_FOR_nothing)
 return true;
 
   auto_vector_modes vector_modes;
   targetm.vectorize.autovectorize_vector_modes (&vector_modes, true);
-  for (unsigned int i = 0; i < vector_modes.length (); ++i)
-{
-  poly_uint64 cur = GET_MODE_SIZE (vector_modes[i]);
-  poly_uint64 nunits;
-  if (!multiple_p (cur, GET_MODE_SIZE (smode), &nunits))
-   continue;
-  if (mode_for_vector (smode, nunits).exists (&vmode)
- && VECTOR_MODE_P (vmode)
- && targetm.vectorize.get_mask_mode (vmode).exists (&mask_mode)
- && convert_optab_handler (op, vmode, mask_mode) != CODE_FOR_nothing)
-   return true;
-}
+  for (machine_mode base_mode : vector_modes)
+if (related_vector_mode (base_mode, smode).exists (&vmode)
+   && targetm.vectorize.get_mask_mode (vmode).exists (&mask_mode)
+   && convert_optab_handler (op, vmode, mask_mode) != CODE_FOR_nothing)
+  return true;
   return false;
 }
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_arith_6.c 
b/gcc/testsuite/gcc.target/aarch64/sve/cond_arith_6.c
new file mode 100644
index 000..4085ab12444
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_arith_6.c
@@ -0,0 +1,14 @@
+/* { dg-options "-O3 -msve-vector-bits=128" } */
+
+void
+f (float *x)
+{
+  for (int i = 0; i < 100; ++i)
+if (x[i] > 1.0f)
+  x[i] -= 1.0f;
+}
+
+/* { dg-final { scan-assembler {\tld1w

Re: [committed] match.pd: Relax rule to include POLY_INT_CSTs

2021-07-08 Thread Richard Biener via Gcc-patches
On Thu, Jul 8, 2021 at 1:52 PM Richard Sandiford via Gcc-patches
 wrote:
>
> match.pd has a rule to simplify an extension, operation and truncation
> back to the original type:
>
>  (simplify
>(convert (op:s@0 (convert1?@3 @1) (convert2?@4 @2)))
>
> Currently it handles cases in which @2 is an INTEGER_CST, but it
> also works for POLY_INT_CSTs.[*]
>
> For INTEGER_CST it doesn't matter whether we test @2 or @4,
> but for POLY_INT_CST it is possible to have unfolded (convert …)s.

But if it is an unfolded conversion then @4 is the conversion and of
course not POLY_INT_CST_P, so I'm not sure what you say makes
sense.  But maybe you want to _not_ simplify the unfolded
conversion case?

> Originally I saw this leading to some bad ivopts decisions, because
> we weren't folding away redundancies from candidate iv expressions.
> It's also possible to test the fold directly using the SVE ACLE.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu, pushed as obvious.
>
> Richard
>
> [*] Not all INTEGER_CST rules work for POLY_INT_CSTs, since extensions
> don't necessarily distribute over the internals of the POLY_INT_CST.
> But in this case that isn't an issue.
>
>
> gcc/
> * match.pd: Simplify an extend-operate-truncate sequence involving
> a POLY_INT_CST.
>
> gcc/testsuite/
> * gcc.target/aarch64/sve/acle/general/cntb_1.c: New test.
> ---
>  gcc/match.pd   |  2 +-
>  .../gcc.target/aarch64/sve/acle/general/cntb_1.c   | 14 ++
>  2 files changed, 15 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/cntb_1.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 334e8cc0496..30680d488ab 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -6175,7 +6175,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> && (types_match (@1, @2)
> /* Or the second operand is const integer or converted const
>integer from valueize.  */
> -   || TREE_CODE (@2) == INTEGER_CST))
> +   || poly_int_tree_p (@4)))
>   (if (TYPE_OVERFLOW_WRAPS (TREE_TYPE (@1)))
> (op @1 (convert @2))
> (with { tree utype = unsigned_type_for (TREE_TYPE (@1)); }
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/cntb_1.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/cntb_1.c
> new file mode 100644
> index 000..b43fcf0ed6d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/cntb_1.c
> @@ -0,0 +1,14 @@
> +/* { dg-options "-O -fdump-tree-optimized" } */
> +
> > +#include <arm_sve.h>
> +
> +unsigned int
> +foo (unsigned int x)
> +{
> +  unsigned long tmp = x;
> +  tmp += svcntb ();
> +  x = tmp;
> +  return x - svcntb ();
> +}
> +
> +/* { dg-final { scan-tree-dump-not { POLY_INT_CST } optimized } } */
> --
> 2.17.1
>


Re: [PATCH] ifcvt: Improve tests for predicated operations

2021-07-08 Thread Richard Biener via Gcc-patches
On Thu, Jul 8, 2021 at 2:04 PM Richard Sandiford via Gcc-patches
 wrote:
>
> -msve-vector-bits=128 causes the AArch64 port to list 128-bit Advanced
> SIMD as the first-choice mode for vectorisation, with SVE being used for
> things that Advanced SIMD can't handle as easily.  However, ifcvt would
> not then try to use SVE's predicated FP arithmetic, leading to tests
> like TSVC ControlFlow-flt failing to vectorise.
>
> The mask load/store code did try other vector modes, but could also be
> improved to make sure that SVEness sticks when computing derived modes.
>
> (Unlike mode_for_vector, related_vector_mode always returns a vector
> mode, so there's no need to check VECTOR_MODE_P as well.)
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

OK.

Richard.

> Richard
>
>
> gcc/
> * internal-fn.c (vectorized_internal_fn_supported_p): Handle
> vector types first.  For scalar types, consider both the preferred
> vector mode and the alternative vector modes.
> * optabs-query.c (can_vec_mask_load_store_p): Use the same
> structure as above, in particular using related_vector_mode
> for modes provided by autovectorize_vector_modes.
>
> gcc/testsuite/
> * gcc.target/aarch64/sve/cond_arith_6.c: New test.
> ---
>  gcc/internal-fn.c | 28 +++
>  gcc/optabs-query.c| 23 +--
>  .../gcc.target/aarch64/sve/cond_arith_6.c | 14 ++
>  3 files changed, 43 insertions(+), 22 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_arith_6.c
>
> diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> index fb8b43d1ce2..cd5e63f9acd 100644
> --- a/gcc/internal-fn.c
> +++ b/gcc/internal-fn.c
> @@ -4109,16 +4109,32 @@ expand_internal_call (gcall *stmt)
>  bool
>  vectorized_internal_fn_supported_p (internal_fn ifn, tree type)
>  {
> +  if (VECTOR_MODE_P (TYPE_MODE (type)))
> +return direct_internal_fn_supported_p (ifn, type, OPTIMIZE_FOR_SPEED);
> +
>scalar_mode smode;
> -  if (!VECTOR_TYPE_P (type) && is_a <scalar_mode> (TYPE_MODE (type), &smode))
> +  if (!is_a <scalar_mode> (TYPE_MODE (type), &smode))
> +return false;
> +
> +  machine_mode vmode = targetm.vectorize.preferred_simd_mode (smode);
> +  if (VECTOR_MODE_P (vmode))
>  {
> -  machine_mode vmode = targetm.vectorize.preferred_simd_mode (smode);
> -  if (VECTOR_MODE_P (vmode))
> -   type = build_vector_type_for_mode (type, vmode);
> +  tree vectype = build_vector_type_for_mode (type, vmode);
> +  if (direct_internal_fn_supported_p (ifn, vectype, OPTIMIZE_FOR_SPEED))
> +   return true;
>  }
>
> -  return (VECTOR_MODE_P (TYPE_MODE (type))
> - && direct_internal_fn_supported_p (ifn, type, OPTIMIZE_FOR_SPEED));
> +  auto_vector_modes vector_modes;
> +  targetm.vectorize.autovectorize_vector_modes (&vector_modes, true);
> +  for (machine_mode base_mode : vector_modes)
> +if (related_vector_mode (base_mode, smode).exists (&vmode))
> +  {
> +   tree vectype = build_vector_type_for_mode (type, vmode);
> +   if (direct_internal_fn_supported_p (ifn, vectype, OPTIMIZE_FOR_SPEED))
> + return true;
> +  }
> +
> +  return false;
>  }
>
>  void
> diff --git a/gcc/optabs-query.c b/gcc/optabs-query.c
> index 3248ce2c06e..05ee5f517da 100644
> --- a/gcc/optabs-query.c
> +++ b/gcc/optabs-query.c
> @@ -582,27 +582,18 @@ can_vec_mask_load_store_p (machine_mode mode,
>  return false;
>
>vmode = targetm.vectorize.preferred_simd_mode (smode);
> -  if (!VECTOR_MODE_P (vmode))
> -return false;
> -
> -  if (targetm.vectorize.get_mask_mode (vmode).exists (&mask_mode)
> +  if (VECTOR_MODE_P (vmode)
> +  && targetm.vectorize.get_mask_mode (vmode).exists (&mask_mode)
>&& convert_optab_handler (op, vmode, mask_mode) != CODE_FOR_nothing)
>  return true;
>
>auto_vector_modes vector_modes;
>targetm.vectorize.autovectorize_vector_modes (&vector_modes, true);
> -  for (unsigned int i = 0; i < vector_modes.length (); ++i)
> -{
> -  poly_uint64 cur = GET_MODE_SIZE (vector_modes[i]);
> -  poly_uint64 nunits;
> -  if (!multiple_p (cur, GET_MODE_SIZE (smode), &nunits))
> -   continue;
> -  if (mode_for_vector (smode, nunits).exists (&vmode)
> - && VECTOR_MODE_P (vmode)
> - && targetm.vectorize.get_mask_mode (vmode).exists (&mask_mode)
> - && convert_optab_handler (op, vmode, mask_mode) != CODE_FOR_nothing)
> -   return true;
> -}
> +  for (machine_mode base_mode : vector_modes)
> +if (related_vector_mode (base_mode, smode).exists (&vmode)
> +   && targetm.vectorize.get_mask_mode (vmode).exists (&mask_mode)
> +   && convert_optab_handler (op, vmode, mask_mode) != CODE_FOR_nothing)
> +  return true;
>return false;
>  }
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_arith_6.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/cond_arith_6.c
> new file mode 100644
> inde

Re: [committed] match.pd: Relax rule to include POLY_INT_CSTs

2021-07-08 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches  writes:
> On Thu, Jul 8, 2021 at 1:52 PM Richard Sandiford via Gcc-patches
>  wrote:
>>
>> match.pd has a rule to simplify an extension, operation and truncation
>> back to the original type:
>>
>>  (simplify
>>(convert (op:s@0 (convert1?@3 @1) (convert2?@4 @2)))
>>
>> Currently it handles cases in which @2 is an INTEGER_CST, but it
>> also works for POLY_INT_CSTs.[*]
>>
>> For INTEGER_CST it doesn't matter whether we test @2 or @4,
>> but for POLY_INT_CST it is possible to have unfolded (convert …)s.
>
> But if it is an unfolded conversion then @4 is the conversion and of
> course not POLY_INT_CST_P, so I'm not sure what you say makes
> sense.  But maybe you want to _not_ simplify the unfolded
> conversion case?

Yeah, exactly that.  Extensions of POLY_INT_CSTs won't be folded because
extension doesn't distribute over (modulo) +.  If an unfolded POLY_INT_CST
has the same type as @1 then the match will succeed thanks to
types_match (@1, @2).  So the new pattern handles both that case
and the case in which POLY_INT_CST occurs without a conversion.

If an unfolded POLY_INT_CST has a different type from @1 then we'd need
a more complicated check for validity.  Maybe that would be useful,
but it would no longer be a one-line change :-)

Thanks,
Richard

>
>> Originally I saw this leading to some bad ivopts decisions, because
>> we weren't folding away redundancies from candidate iv expressions.
>> It's also possible to test the fold directly using the SVE ACLE.
>>
>> Tested on aarch64-linux-gnu and x86_64-linux-gnu, pushed as obvious.
>>
>> Richard
>>
>> [*] Not all INTEGER_CST rules work for POLY_INT_CSTs, since extensions
>> don't necessarily distribute over the internals of the POLY_INT_CST.
>> But in this case that isn't an issue.
>>
>>
>> gcc/
>> * match.pd: Simplify an extend-operate-truncate sequence involving
>> a POLY_INT_CST.
>>
>> gcc/testsuite/
>> * gcc.target/aarch64/sve/acle/general/cntb_1.c: New test.
>> ---
>>  gcc/match.pd   |  2 +-
>>  .../gcc.target/aarch64/sve/acle/general/cntb_1.c   | 14 ++
>>  2 files changed, 15 insertions(+), 1 deletion(-)
>>  create mode 100644 
>> gcc/testsuite/gcc.target/aarch64/sve/acle/general/cntb_1.c
>>
>> diff --git a/gcc/match.pd b/gcc/match.pd
>> index 334e8cc0496..30680d488ab 100644
>> --- a/gcc/match.pd
>> +++ b/gcc/match.pd
>> @@ -6175,7 +6175,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>> && (types_match (@1, @2)
>> /* Or the second operand is const integer or converted const
>>integer from valueize.  */
>> -   || TREE_CODE (@2) == INTEGER_CST))
>> +   || poly_int_tree_p (@4)))
>>   (if (TYPE_OVERFLOW_WRAPS (TREE_TYPE (@1)))
>> (op @1 (convert @2))
>> (with { tree utype = unsigned_type_for (TREE_TYPE (@1)); }
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/cntb_1.c 
>> b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/cntb_1.c
>> new file mode 100644
>> index 000..b43fcf0ed6d
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/cntb_1.c
>> @@ -0,0 +1,14 @@
>> +/* { dg-options "-O -fdump-tree-optimized" } */
>> +
>> +#include <arm_sve.h>
>> +
>> +unsigned int
>> +foo (unsigned int x)
>> +{
>> +  unsigned long tmp = x;
>> +  tmp += svcntb ();
>> +  x = tmp;
>> +  return x - svcntb ();
>> +}
>> +
>> +/* { dg-final { scan-tree-dump-not { POLY_INT_CST } optimized } } */
>> --
>> 2.17.1
>>


[PATCH 00/10] vect: Reuse reduction accumulators between loops

2021-07-08 Thread Richard Sandiford via Gcc-patches
Quoting from the final patch in the series:


This patch adds support for reusing a main loop's reduction accumulator
in an epilogue loop.  This in turn lets the loops share a single piece
of vector->scalar reduction code.

The patch has the following restrictions:

(1) The epilogue reduction can only operate on a single vector
(e.g. ncopies must be 1 for non-SLP reductions, and the group size
must be <= the element count for SLP reductions).

(2) Both loops must use the same vector mode for their accumulators.
This means that the patch is restricted to targets that support
--param vect-partial-vector-usage=1.

(3) The reduction must be a standard “tree code” reduction.

However, these restrictions could be lifted in future.  For example,
if the main loop operates on 128-bit vectors and the epilogue loop
operates on 64-bit vectors, we could in future reduce the 128-bit
vector by one stage and use the 64-bit result as the starting point
for the epilogue result.

The patch tries to handle chained SLP reductions, unchained SLP
reductions and non-SLP reductions.  It also handles cases in which
the epilogue loop is entered directly (rather than via the main loop)
and cases in which the epilogue loop can be skipped.


However, it ended up being difficult to implement that final patch without
some preparatory clean-ups.  Some of them could probably stand on their own,
but others are a bit “meh” without the final patch to justify them.

The diff below shows the effect of the patch when compiling:

  unsigned short __attribute__((noipa))
  add_loop (unsigned short *x, int n)
  {
unsigned short res = 0;
for (int i = 0; i < n; ++i)
  res += x[i];
return res;
  }

with -O3 --param vect-partial-vector-usage=1 on an SVE target:

add_loop:                                add_loop:
.LFB0:                                   .LFB0:
        .cfi_startproc                           .cfi_startproc
        mov     x4, x0                 <
        cmp     w1, 0                            cmp     w1, 0
        ble     .L7                              ble     .L7
        cnth    x0                     |         cnth    x4
        sub     w2, w1, #1                       sub     w2, w1, #1
        sub     w3, w0, #1             |         sub     w3, w4, #1
        cmp     w2, w3                           cmp     w2, w3
        bcc     .L8                              bcc     .L8
        sub     w0, w1, w0             |         sub     w4, w1, w4
        mov     x3, 0                            mov     x3, 0
        cnth    x5                               cnth    x5
        mov     z0.b, #0                         mov     z0.b, #0
        ptrue   p0.b, all                        ptrue   p0.b, all
        .p2align 3,,7                            .p2align 3,,7
.L4:                                     .L4:
        ld1h    z1.h, p0/z, [x4, x3,   |         ld1h    z1.h, p0/z, [x0, x3, 
        mov     x2, x3                           mov     x2, x3
        add     x3, x3, x5                       add     x3, x3, x5
        add     z0.h, z0.h, z1.h                 add     z0.h, z0.h, z1.h
        cmp     w0, w3                 |         cmp     w4, w3
        bcs     .L4                              bcs     .L4
        uaddv   d0, p0, z0.h           <
        umov    w0, v0.h[0]            <
        inch    x2                               inch    x2
        and     w0, w0, 65535          <
        cmp     w1, w2                           cmp     w1, w2
        beq     .L2                    |         beq     .L6
.L3:                                     .L3:
        sub     w1, w1, w2                       sub     w1, w1, w2
        mov     z1.b, #0               |         add     x2, x0, w2, uxtw 1
        whilelo p0.h, wzr, w1                    whilelo p0.h, wzr, w1
        add     x2, x4, w2, uxtw 1     |         ld1h    z1.h, p0/z, [x2]
        ptrue   p1.b, all              |         add     z0.h, p0/m, z0.h, z1.
        ld1h    z0.h, p0/z, [x2]       | .L6:
        sel     z0.h, p0, z0.h, z1.h   |         ptrue   p0.b, all
        uaddv   d0, p1, z0.h           |         uaddv   d0, p0, z0.h
        fmov    x1, d0                 |         umov    w0, v0.h[0]
        add     w0, w0, w1, uxth       <
        and     w0, w0, 65535                    and     w0, w0, 65535
.L2:                                   <
        ret                                      ret
        .p2align 2,,3                            .p2align 2,,3
.L7:                                     .L7:
        mov     w0, 0                            mov     w0, 0
        ret                                      ret
.L8:                                     .L8:
        mov     w2, 0                            mov     w2, 0
        mov     w0, 0                  |         mov     z0.b, #0
        b       .L3                              b       .L3
        .cfi_endproc                             .cfi_endproc

[PATCH 01/10] vect: Simplify epilogue reduction code

2021-07-08 Thread Richard Sandiford via Gcc-patches
vect_create_epilog_for_reduction only handles two cases: single-loop
reductions and double reductions.  “nested cycles” (i.e. reductions
in the inner loop when vectorising an outer loop) are handled elsewhere
and don't need a vector->scalar reduction.

The function had variables called nested_in_vect_loop and double_reduc
and asserted that nested_in_vect_loop implied double_reduc, but it
still had code to handle nested_in_vect_loop && !double_reduc.
This patch removes that and uses double_reduc everywhere.

gcc/
* tree-vect-loop.c (vect_create_epilog_for_reduction): Remove
nested_in_vect_loop and use double_reduc everywhere.  Remove dead
assignment to "loop".
---
 gcc/tree-vect-loop.c | 30 --
 1 file changed, 4 insertions(+), 26 deletions(-)

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index bc523d151c6..7c3e3352b43 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -5005,7 +5005,6 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
   imm_use_iterator imm_iter, phi_imm_iter;
   use_operand_p use_p, phi_use_p;
   gimple *use_stmt;
-  bool nested_in_vect_loop = false;
   auto_vec new_phis;
   int j, i;
   auto_vec scalar_results;
@@ -5023,10 +5022,8 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
 {
   outer_loop = loop;
   loop = loop->inner;
-  nested_in_vect_loop = true;
-  gcc_assert (!slp_node);
+  gcc_assert (!slp_node && double_reduc);
 }
-  gcc_assert (!nested_in_vect_loop || double_reduc);
 
   vectype = STMT_VINFO_REDUC_VECTYPE (reduc_info);
   gcc_assert (vectype);
@@ -5049,8 +5046,6 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
induc_val = STMT_VINFO_VEC_INDUC_COND_INITIAL_VAL (reduc_info);
   else if (double_reduc)
;
-  else if (nested_in_vect_loop)
-   ;
   else
adjustment_def = STMT_VINFO_REDUC_EPILOGUE_ADJUSTMENT (reduc_info);
 }
@@ -5923,7 +5918,7 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
 {
   gcc_assert (!slp_reduc);
   gimple_seq stmts = NULL;
-  if (nested_in_vect_loop)
+  if (double_reduc)
{
   new_phi = new_phis[0];
  gcc_assert (VECTOR_TYPE_P (TREE_TYPE (adjustment_def)));
@@ -5942,21 +5937,12 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
 
   epilog_stmt = gimple_seq_last_stmt (stmts);
   gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
-  if (nested_in_vect_loop)
-{
-  if (!double_reduc)
-scalar_results.quick_push (new_temp);
-  else
-scalar_results[0] = new_temp;
-}
-  else
-scalar_results[0] = new_temp;
-
+  scalar_results[0] = new_temp;
   new_phis[0] = epilog_stmt;
 }
 
   if (double_reduc)
-loop = loop->inner;
+loop = outer_loop;
 
   /* 2.6  Handle the loop-exit phis.  Replace the uses of scalar loop-exit
   phis with new adjusted scalar results, i.e., replace use 
@@ -6017,14 +6003,6 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
  scalar_dest = gimple_assign_lhs (scalar_stmt_info->stmt);
 }
 
-  if (nested_in_vect_loop)
-{
-  if (double_reduc)
-loop = outer_loop;
-  else
-   gcc_unreachable ();
-}
-
   phis.create (3);
   /* Find the loop-closed-use at the loop exit of the original scalar
  result.  (The reduction result is expected to have two immediate uses,


[PATCH 02/10] vect: Create array_slice of live-out stmts

2021-07-08 Thread Richard Sandiford via Gcc-patches
This patch constructs an array_slice of the scalar statements that
produce live-out reduction results in the original unvectorised loop.
There are three cases:

- SLP reduction chains: the final SLP stmt is live-out
- full SLP reductions: all SLP stmts are live-out
- non-SLP reductions: the single scalar stmt is live-out

This is a slight simplification on its own, mostly because it means
“group_size” has a consistent meaning throughout the function.
The main justification though is that it helps with later patches.
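
As a rough illustration (these loops are illustrative only, not taken from
the testsuite), the three shapes are:

  /* SLP reduction chain: only the final statement is live-out.  */
  int chain (int *a, int n)
  {
    int s = 0;
    for (int i = 0; i < n; i += 2)
      { s += a[i]; s += a[i + 1]; }
    return s;
  }

  /* Full SLP reduction: both scalar statements are live-out.  */
  void full (int *a, int n, int *r0, int *r1)
  {
    int s0 = 0, s1 = 0;
    for (int i = 0; i < n; i += 2)
      { s0 += a[i]; s1 += a[i + 1]; }
    *r0 = s0; *r1 = s1;
  }

  /* Non-SLP reduction: the single scalar statement is live-out.  */
  int plain (int *a, int n)
  {
    int s = 0;
    for (int i = 0; i < n; ++i)
      s += a[i];
    return s;
  }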

gcc/
* tree-vect-loop.c (vect_create_epilog_for_reduction): Truncate
scalar_results to group_size elements after reducing down from
N*group_size elements.  Construct an array_slice of the live-out
stmts and assert that there is one stmt per scalar result.
---
 gcc/tree-vect-loop.c | 61 +++-
 1 file changed, 21 insertions(+), 40 deletions(-)

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 7c3e3352b43..8390ac80ca0 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -5010,7 +5010,12 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
   auto_vec scalar_results;
   unsigned int group_size = 1, k;
   auto_vec phis;
-  bool slp_reduc = false;
+  /* SLP reduction without reduction chain, e.g.,
+ # a1 = phi 
+ # b1 = phi 
+ a2 = operation (a1)
+ b2 = operation (b1)  */
+  bool slp_reduc = (slp_node && !REDUC_GROUP_FIRST_ELEMENT (stmt_info));
   bool direct_slp_reduc;
   tree new_phi_result;
   tree induction_index = NULL_TREE;
@@ -5050,6 +5055,16 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
adjustment_def = STMT_VINFO_REDUC_EPILOGUE_ADJUSTMENT (reduc_info);
 }
 
+  stmt_vec_info single_live_out_stmt[] = { stmt_info };
+  array_slice live_out_stmts = single_live_out_stmt;
+  if (slp_reduc)
+/* All statements produce live-out values.  */
+live_out_stmts = SLP_TREE_SCALAR_STMTS (slp_node);
+  else if (slp_node)
+/* The last statement in the reduction chain produces the live-out
+   value.  */
+single_live_out_stmt[0] = SLP_TREE_SCALAR_STMTS (slp_node)[group_size - 1];
+
   unsigned vec_num;
   int ncopies;
   if (slp_node)
@@ -5248,13 +5263,6 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
   new_scalar_dest = vect_create_destination_var (scalar_dest, NULL);
   bitsize = TYPE_SIZE (scalar_type);
 
-  /* SLP reduction without reduction chain, e.g.,
- # a1 = phi 
- # b1 = phi 
- a2 = operation (a1)
- b2 = operation (b1)  */
-  slp_reduc = (slp_node && !REDUC_GROUP_FIRST_ELEMENT (stmt_info));
-
   /* True if we should implement SLP_REDUC using native reduction operations
  instead of scalar operations.  */
   direct_slp_reduc = (reduc_fn != IFN_LAST
@@ -5877,6 +5885,7 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
  first_res, res);
   scalar_results[j % group_size] = new_res;
 }
+ scalar_results.truncate (group_size);
  for (k = 0; k < group_size; k++)
scalar_results[k] = gimple_convert (&stmts, scalar_type,
scalar_results[k]);
@@ -5969,39 +5978,11 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
   use   
   use  */
 
-
-  /* In SLP reduction chain we reduce vector results into one vector if
- necessary, hence we set here REDUC_GROUP_SIZE to 1.  SCALAR_DEST is the
- LHS of the last stmt in the reduction chain, since we are looking for
- the loop exit phi node.  */
-  if (REDUC_GROUP_FIRST_ELEMENT (stmt_info))
-{
-  stmt_vec_info dest_stmt_info
-   = vect_orig_stmt (SLP_TREE_SCALAR_STMTS (slp_node)[group_size - 1]);
-  scalar_dest = gimple_assign_lhs (dest_stmt_info->stmt);
-  group_size = 1;
-}
-
-  /* In SLP we may have several statements in NEW_PHIS and REDUCTION_PHIS (in
- case that REDUC_GROUP_SIZE is greater than vectorization factor).
- Therefore, we need to match SCALAR_RESULTS with corresponding statements.
- The first (REDUC_GROUP_SIZE / number of new vector stmts) scalar results
- correspond to the first vector stmt, etc.
- (RATIO is equal to (REDUC_GROUP_SIZE / number of new vector stmts)).  */
-  if (group_size > new_phis.length ())
-gcc_assert (!(group_size % new_phis.length ()));
-
-  for (k = 0; k < group_size; k++)
+  gcc_assert (live_out_stmts.size () == scalar_results.length ());
+  for (k = 0; k < live_out_stmts.size (); k++)
 {
-  if (slp_reduc)
-{
- stmt_vec_info scalar_stmt_info = SLP_TREE_SCALAR_STMTS (slp_node)[k];
-
- orig_stmt_info = STMT_VINFO_RELATED_STMT (scalar_stmt_info);
- /* SLP statements can't participate in patterns.  */
- gcc_assert (!orig_stmt_info);
- scalar_dest = gimple_assign_lhs (scalar_stmt_info->stmt);
-}
+  stmt_vec

[PATCH 03/10] vect: Remove new_phis from vect_create_epilog_for_reduction

2021-07-08 Thread Richard Sandiford via Gcc-patches
vect_create_epilog_for_reduction had a variable called new_phis.
It collected the statements that produce the exit block definitions
of the vector reduction accumulators.  Although those statements
are indeed phis initially, they are often replaced with normal
statements later, leading to puzzling code like:

  FOR_EACH_VEC_ELT (new_phis, i, new_phi)
{
  int bit_offset;
  if (gimple_code (new_phi) == GIMPLE_PHI)
vec_temp = PHI_RESULT (new_phi);
  else
vec_temp = gimple_assign_lhs (new_phi);

Also, although the array collects statements, in practice all users want
the lhs instead.

This patch therefore replaces new_phis with a vector of gimple values
called “reduc_inputs”.

Also, reduction chains and ncopies>1 were handled with identical code
(and there was a comment saying so).  The patch unites them into
a single “if”.

gcc/
* tree-vect-loop.c (vect_create_epilog_for_reduction): Replace
the new_phis vector with a reduc_inputs vector.  Combine handling
of reduction chains and ncopies > 1.
---
 gcc/tree-vect-loop.c | 113 ---
 1 file changed, 41 insertions(+), 72 deletions(-)

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 8390ac80ca0..b7f73ca52c7 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -5005,7 +5005,7 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
   imm_use_iterator imm_iter, phi_imm_iter;
   use_operand_p use_p, phi_use_p;
   gimple *use_stmt;
-  auto_vec new_phis;
+  auto_vec reduc_inputs;
   int j, i;
   auto_vec scalar_results;
   unsigned int group_size = 1, k;
@@ -5017,7 +5017,6 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
  b2 = operation (b1)  */
   bool slp_reduc = (slp_node && !REDUC_GROUP_FIRST_ELEMENT (stmt_info));
   bool direct_slp_reduc;
-  tree new_phi_result;
   tree induction_index = NULL_TREE;
 
   if (slp_node)
@@ -5215,7 +5214,7 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
   if (double_reduc)
 loop = outer_loop;
   exit_bb = single_exit (loop)->dest;
-  new_phis.create (slp_node ? vec_num : ncopies);
+  reduc_inputs.create (slp_node ? vec_num : ncopies);
   for (unsigned i = 0; i < vec_num; i++)
 {
   if (slp_node)
@@ -5223,19 +5222,14 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
   else
def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[0]);
   for (j = 0; j < ncopies; j++)
-{
+   {
  tree new_def = copy_ssa_name (def);
-  phi = create_phi_node (new_def, exit_bb);
-  if (j == 0)
-new_phis.quick_push (phi);
-  else
-   {
- def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]);
- new_phis.quick_push (phi);
-   }
-
-  SET_PHI_ARG_DEF (phi, single_exit (loop)->dest_idx, def);
-}
+ phi = create_phi_node (new_def, exit_bb);
+ if (j)
+   def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]);
+ SET_PHI_ARG_DEF (phi, single_exit (loop)->dest_idx, def);
+ reduc_inputs.quick_push (new_def);
+   }
 }
 
   exit_gsi = gsi_after_labels (exit_bb);
@@ -5274,52 +5268,32 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
  a2 = operation (a1)
  a3 = operation (a2),
 
- we may end up with more than one vector result.  Here we reduce them to
- one vector.  */
-  if (REDUC_GROUP_FIRST_ELEMENT (stmt_info) || direct_slp_reduc)
+ we may end up with more than one vector result.  Here we reduce them
+ to one vector.
+
+ The same is true if we couldn't use a single defuse cycle.  */
+  if (REDUC_GROUP_FIRST_ELEMENT (stmt_info)
+  || direct_slp_reduc
+  || ncopies > 1)
 {
   gimple_seq stmts = NULL;
-  tree first_vect = PHI_RESULT (new_phis[0]);
-  first_vect = gimple_convert (&stmts, vectype, first_vect);
-  for (k = 1; k < new_phis.length (); k++)
+  tree first_vect = gimple_convert (&stmts, vectype, reduc_inputs[0]);
+  for (k = 1; k < reduc_inputs.length (); k++)
 {
- gimple *next_phi = new_phis[k];
-  tree second_vect = PHI_RESULT (next_phi);
- second_vect = gimple_convert (&stmts, vectype, second_vect);
+ tree second_vect = gimple_convert (&stmts, vectype, reduc_inputs[k]);
   first_vect = gimple_build (&stmts, code, vectype,
 first_vect, second_vect);
 }
   gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
 
-  new_phi_result = first_vect;
-  new_phis.truncate (0);
-  new_phis.safe_push (SSA_NAME_DEF_STMT (first_vect));
+  reduc_inputs.truncate (0);
+  reduc_inputs.safe_push (first_vect);
 }
-  /* Likewise if we couldn't use a single defuse cycle.  */
-  else if (ncopies > 1)
-{
-  gimple_seq stmts = NULL;
-  tree first_vect = PHI

[PATCH 04/10] vect: Ensure reduc_inputs always have vectype

2021-07-08 Thread Richard Sandiford via Gcc-patches
Vector reduction accumulators can differ in signedness from the
final scalar result.  The conversions to handle that case were
distributed through vect_create_epilog_for_reduction; this patch
does the conversion up-front instead.
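
For instance (an illustrative example only), in

  int sum (int *x, int n)
  {
    int s = 0;
    for (int i = 0; i < n; ++i)
      s += x[i];        /* signed addition, undefined on overflow */
    return s;
  }

the vectorized accumulator may be carried in an unsigned vector type, so
the epilogue has to convert between the unsigned vector element type and
the signed scalar result.  With this patch that conversion happens once,
when the exit-block phis are created.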

gcc/
* tree-vect-loop.c (vect_create_epilog_for_reduction): Convert
the phi results to vectype after creating them.  Remove later
conversion code that thus becomes redundant.
---
 gcc/tree-vect-loop.c | 28 +++-
 1 file changed, 11 insertions(+), 17 deletions(-)

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index b7f73ca52c7..1bd9a6ea52c 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -5214,9 +5214,11 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
   if (double_reduc)
 loop = outer_loop;
   exit_bb = single_exit (loop)->dest;
+  exit_gsi = gsi_after_labels (exit_bb);
   reduc_inputs.create (slp_node ? vec_num : ncopies);
   for (unsigned i = 0; i < vec_num; i++)
 {
+  gimple_seq stmts = NULL;
   if (slp_node)
def = vect_get_slp_vect_def (slp_node, i);
   else
@@ -5228,12 +5230,12 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
  if (j)
def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]);
  SET_PHI_ARG_DEF (phi, single_exit (loop)->dest_idx, def);
+ new_def = gimple_convert (&stmts, vectype, new_def);
  reduc_inputs.quick_push (new_def);
}
+  gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
 }
 
-  exit_gsi = gsi_after_labels (exit_bb);
-
   /* 2.2 Get the relevant tree-code to use in the epilog for schemes 2,3
  (i.e. when reduc_fn is not available) and in the final adjustment
 code (if needed).  Also get the original scalar reduction variable as
@@ -5277,17 +5279,14 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
   || ncopies > 1)
 {
   gimple_seq stmts = NULL;
-  tree first_vect = gimple_convert (&stmts, vectype, reduc_inputs[0]);
+  tree single_input = reduc_inputs[0];
   for (k = 1; k < reduc_inputs.length (); k++)
-{
- tree second_vect = gimple_convert (&stmts, vectype, reduc_inputs[k]);
-  first_vect = gimple_build (&stmts, code, vectype,
-first_vect, second_vect);
-}
+   single_input = gimple_build (&stmts, code, vectype,
+single_input, reduc_inputs[k]);
   gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
 
   reduc_inputs.truncate (0);
-  reduc_inputs.safe_push (first_vect);
+  reduc_inputs.safe_push (single_input);
 }
 
   if (STMT_VINFO_REDUC_TYPE (reduc_info) == COND_REDUCTION
@@ -5323,10 +5322,6 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
   /* Vector of {0, 0, 0,...}.  */
   tree zero_vec = build_zero_cst (vectype);
 
-  gimple_seq stmts = NULL;
-  reduc_inputs[0] = gimple_convert (&stmts, vectype, reduc_inputs[0]);
-  gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
-
   /* Find maximum value from the vector of found indexes.  */
   tree max_index = make_ssa_name (index_scalar_type);
   gcall *max_index_stmt = gimple_build_call_internal (IFN_REDUC_MAX,
@@ -5394,7 +5389,7 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
 
   /* Convert the reduced value back to the result type and set as the
 result.  */
-  stmts = NULL;
+  gimple_seq stmts = NULL;
   new_temp = gimple_build (&stmts, VIEW_CONVERT_EXPR, scalar_type,
   data_reduc);
   gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
@@ -5412,7 +5407,7 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
 val = data_reduc[i], idx_val = induction_index[i];
 return val;  */
 
-  tree data_eltype = TREE_TYPE (TREE_TYPE (reduc_inputs[0]));
+  tree data_eltype = TREE_TYPE (vectype);
   tree idx_eltype = TREE_TYPE (TREE_TYPE (induction_index));
   unsigned HOST_WIDE_INT el_size = tree_to_uhwi (TYPE_SIZE (idx_eltype));
   poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE (induction_index));
@@ -5488,8 +5483,7 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
 "Reduce using direct vector reduction.\n");
 
   gimple_seq stmts = NULL;
-  reduc_inputs[0] = gimple_convert (&stmts, vectype, reduc_inputs[0]);
-  vec_elem_type = TREE_TYPE (TREE_TYPE (reduc_inputs[0]));
+  vec_elem_type = TREE_TYPE (vectype);
   new_temp = gimple_build (&stmts, as_combined_fn (reduc_fn),
   vec_elem_type, reduc_inputs[0]);
   new_temp = gimple_convert (&stmts, scalar_type, new_temp);


[PATCH 05/10] vect: Add a vect_phi_initial_value helper function

2021-07-08 Thread Richard Sandiford via Gcc-patches
This patch adds a helper function called vect_phi_initial_value
for returning the incoming value of a given loop phi.  The main
reason for adding it is to ensure that the right preheader edge
is used when vectorising nested loops.  (PHI_ARG_DEF_FROM_EDGE
itself doesn't assert that the given edge is for the right block,
although I guess that would be good to add separately.)
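
The tree-vectorizer.h hunk is not reproduced below, but the helper is
essentially a thin wrapper along these lines (a sketch only; the exact
definition may differ in detail):

  /* Return the value of PHI coming in from the preheader of the loop
     that PHI belongs to.  */
  inline tree
  vect_phi_initial_value (gphi *phi)
  {
    basic_block bb = gimple_bb (phi);
    edge pe = loop_preheader_edge (bb->loop_father);
    gcc_assert (pe->dest == bb);
    return PHI_ARG_DEF_FROM_EDGE (phi, pe);
  }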

gcc/
* tree-vectorizer.h: Include tree-ssa-operands.h.
(vect_phi_initial_value): New function.
* tree-vect-loop.c (neutral_op_for_slp_reduction): Use it.
(get_initial_defs_for_reduction, info_for_reduction): Likewise.
(vect_create_epilog_for_reduction, vectorizable_reduction): Likewise.
(vect_transform_cycle_phi, vectorizable_induction): Likewise.
---
 gcc/tree-vect-loop.c  | 29 +
 gcc/tree-vectorizer.h | 21 -
 2 files changed, 29 insertions(+), 21 deletions(-)

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 1bd9a6ea52c..a31d7621c3b 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -3288,8 +3288,7 @@ neutral_op_for_slp_reduction (slp_tree slp_node, tree 
vector_type,
 has only a single initial value, so that value is neutral for
 all statements.  */
   if (reduc_chain)
-   return PHI_ARG_DEF_FROM_EDGE (stmt_vinfo->stmt,
- loop_preheader_edge (loop));
+   return vect_phi_initial_value (stmt_vinfo);
   return NULL_TREE;
 
 default:
@@ -4829,13 +4828,13 @@ get_initial_defs_for_reduction (vec_info *vinfo,
   /* Get the def before the loop.  In reduction chain we have only
 one initial value.  Else we have as many as PHIs in the group.  */
   if (reduc_chain)
-   op = j != 0 ? neutral_op : PHI_ARG_DEF_FROM_EDGE (stmt_vinfo->stmt, pe);
+   op = j != 0 ? neutral_op : vect_phi_initial_value (stmt_vinfo);
   else if (((vec_oprnds->length () + 1) * nunits
- number_of_places_left_in_vector >= group_size)
   && neutral_op)
op = neutral_op;
   else
-   op = PHI_ARG_DEF_FROM_EDGE (stmt_vinfo->stmt, pe);
+   op = vect_phi_initial_value (stmt_vinfo);
 
   /* Create 'vect_ = {op0,op1,...,opn}'.  */
   number_of_places_left_in_vector--;
@@ -4906,9 +4905,7 @@ info_for_reduction (vec_info *vinfo, stmt_vec_info 
stmt_info)
 }
   else if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle)
 {
-  edge pe = loop_preheader_edge (gimple_bb (phi)->loop_father);
-  stmt_vec_info info
- = vinfo->lookup_def (PHI_ARG_DEF_FROM_EDGE (phi, pe));
+  stmt_vec_info info = vinfo->lookup_def (vect_phi_initial_value (phi));
   if (info && STMT_VINFO_DEF_TYPE (info) == vect_double_reduction_def)
stmt_info = info;
 }
@@ -5042,8 +5039,7 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
 {
   /* Get at the scalar def before the loop, that defines the initial value
 of the reduction variable.  */
-  initial_def = PHI_ARG_DEF_FROM_EDGE (reduc_def_stmt,
-  loop_preheader_edge (loop));
+  initial_def = vect_phi_initial_value (reduc_def_stmt);
   /* Optimize: for induction condition reduction, if we can't use zero
  for induc_val, use initial_def.  */
   if (STMT_VINFO_REDUC_TYPE (reduc_info) == INTEGER_INDUC_COND_REDUCTION)
@@ -5558,9 +5554,7 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
 for MIN and MAX reduction, for example.  */
  if (!neutral_op)
{
- tree scalar_value
-   = PHI_ARG_DEF_FROM_EDGE (orig_phis[i]->stmt,
-loop_preheader_edge (loop));
+ tree scalar_value = vect_phi_initial_value (orig_phis[i]);
  scalar_value = gimple_convert (&seq, TREE_TYPE (vectype),
 scalar_value);
  vector_identity = gimple_build_vector_from_val (&seq, vectype,
@@ -6752,10 +6746,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   else if (cond_reduc_dt == vect_constant_def)
{
  enum vect_def_type cond_initial_dt;
- tree cond_initial_val
-   = PHI_ARG_DEF_FROM_EDGE (reduc_def_phi, loop_preheader_edge (loop));
-
- gcc_assert (cond_reduc_val != NULL_TREE);
+ tree cond_initial_val = vect_phi_initial_value (reduc_def_phi);
  vect_is_simple_use (cond_initial_val, loop_vinfo, &cond_initial_dt);
  if (cond_initial_dt == vect_constant_def
  && types_compatible_p (TREE_TYPE (cond_initial_val),
@@ -7528,8 +7519,7 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo,
 {
   /* Get at the scalar def before the loop, that defines the initial
 value of the reduction variable.  */
-  tree initial_def = PHI_ARG_DEF_FROM_EDGE (phi,
-   loop_preheader_edge (loop))

[PATCH 06/10] vect: Pass reduc_info to get_initial_defs_for_reduction

2021-07-08 Thread Richard Sandiford via Gcc-patches
This patch passes the reduc_info to get_initial_defs_for_reduction,
so that the function can get general information from there rather
than from the first SLP statement.  This isn't a win on its own,
but it becomes important with later patches.

gcc/
* tree-vect-loop.c (get_initial_defs_for_reduction): Take the
reduc_info as an additional parameter.
(vect_transform_cycle_phi): Update accordingly.
---
 gcc/tree-vect-loop.c | 23 ++-
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index a31d7621c3b..565c2859477 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -4764,32 +4764,28 @@ get_initial_def_for_reduction (loop_vec_info loop_vinfo,
   return init_def;
 }
 
-/* Get at the initial defs for the reduction PHIs in SLP_NODE.
-   NUMBER_OF_VECTORS is the number of vector defs to create.
-   If NEUTRAL_OP is nonnull, introducing extra elements of that
-   value will not change the result.  */
+/* Get at the initial defs for the reduction PHIs for REDUC_INFO, whose
+   associated SLP node is SLP_NODE.  NUMBER_OF_VECTORS is the number of vector
+   defs to create.  If NEUTRAL_OP is nonnull, introducing extra elements of
+   that value will not change the result.  */
 
 static void
 get_initial_defs_for_reduction (vec_info *vinfo,
+   stmt_vec_info reduc_info,
slp_tree slp_node,
vec *vec_oprnds,
unsigned int number_of_vectors,
bool reduc_chain, tree neutral_op)
 {
   vec stmts = SLP_TREE_SCALAR_STMTS (slp_node);
-  stmt_vec_info stmt_vinfo = stmts[0];
   unsigned HOST_WIDE_INT nunits;
   unsigned j, number_of_places_left_in_vector;
-  tree vector_type;
+  tree vector_type = STMT_VINFO_VECTYPE (reduc_info);
   unsigned int group_size = stmts.length ();
   unsigned int i;
   class loop *loop;
 
-  vector_type = STMT_VINFO_VECTYPE (stmt_vinfo);
-
-  gcc_assert (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def);
-
-  loop = (gimple_bb (stmt_vinfo->stmt))->loop_father;
+  loop = (gimple_bb (reduc_info->stmt))->loop_father;
   gcc_assert (loop);
   edge pe = loop_preheader_edge (loop);
 
@@ -4823,7 +4819,7 @@ get_initial_defs_for_reduction (vec_info *vinfo,
 {
   tree op;
   i = j % group_size;
-  stmt_vinfo = stmts[i];
+  stmt_vec_info stmt_vinfo = stmts[i];
 
   /* Get the def before the loop.  In reduction chain we have only
 one initial value.  Else we have as many as PHIs in the group.  */
@@ -7510,7 +7506,8 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo,
  = neutral_op_for_slp_reduction (slp_node, vectype_out,
  STMT_VINFO_REDUC_CODE 
(reduc_info),
  first != NULL);
- get_initial_defs_for_reduction (loop_vinfo, 
slp_node_instance->reduc_phis,
+ get_initial_defs_for_reduction (loop_vinfo, reduc_info,
+ slp_node_instance->reduc_phis,
  &vec_initial_defs, vec_num,
  first != NULL, neutral_op);
}


[PATCH 07/10] vect: Pass reduc_info to get_initial_def_for_reduction

2021-07-08 Thread Richard Sandiford via Gcc-patches
Similarly to the previous patch, this one passes the reduc_info
to get_initial_def_for_reduction, rather than a stmt_vec_info that
lacks the metadata.  This again becomes useful later.

gcc/
* tree-vect-loop.c (get_initial_def_for_reduction): Take the
reduc_info instead of the original stmt_vec_info.
(vect_transform_cycle_phi): Update accordingly.
---
 gcc/tree-vect-loop.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 565c2859477..a67036f92e0 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -4625,7 +4625,7 @@ vect_model_reduction_cost (loop_vec_info loop_vinfo,
 /* Function get_initial_def_for_reduction
 
Input:
-   STMT_VINFO - a stmt that performs a reduction operation in the loop.
+   REDUC_INFO - the info_for_reduction
INIT_VAL - the initial value of the reduction variable
 
Output:
@@ -4667,7 +4667,7 @@ vect_model_reduction_cost (loop_vec_info loop_vinfo,
 
 static tree
 get_initial_def_for_reduction (loop_vec_info loop_vinfo,
-  stmt_vec_info stmt_vinfo,
+  stmt_vec_info reduc_info,
   enum tree_code code, tree init_val,
tree *adjustment_def)
 {
@@ -4685,8 +4685,8 @@ get_initial_def_for_reduction (loop_vec_info loop_vinfo,
   gcc_assert (POINTER_TYPE_P (scalar_type) || INTEGRAL_TYPE_P (scalar_type)
  || SCALAR_FLOAT_TYPE_P (scalar_type));
 
-  gcc_assert (nested_in_vect_loop_p (loop, stmt_vinfo)
- || loop == (gimple_bb (stmt_vinfo->stmt))->loop_father);
+  gcc_assert (nested_in_vect_loop_p (loop, reduc_info)
+ || loop == (gimple_bb (reduc_info->stmt))->loop_father);
 
   /* ADJUSTMENT_DEF is NULL when called from
  vect_create_epilog_for_reduction to vectorize double reduction.  */
@@ -7556,7 +7556,7 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo,
  if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def)
adjustment_defp = NULL;
  vec_initial_def
-   = get_initial_def_for_reduction (loop_vinfo, reduc_stmt_info, code,
+   = get_initial_def_for_reduction (loop_vinfo, reduc_info, code,
 initial_def, adjustment_defp);
  STMT_VINFO_REDUC_EPILOGUE_ADJUSTMENT (reduc_info) = adjustment_def;
  vec_initial_defs.create (ncopies);


[PATCH 08/10] vect: Generalise neutral_op_for_slp_reduction

2021-07-08 Thread Richard Sandiford via Gcc-patches
This patch generalises the interface to neutral_op_for_slp_reduction
so that it can be used for non-SLP reductions too.  This isn't much
of a win on its own, but it helps later patches.

gcc/
* tree-vect-loop.c (neutral_op_for_slp_reduction): Replace with...
(neutral_op_for_reduction): ...this, providing a more general
interface.
(vect_create_epilog_for_reduction): Update accordingly.
(vectorizable_reduction): Likewise.
(vect_transform_cycle_phi): Likewise.
---
 gcc/tree-vect-loop.c | 59 +++-
 1 file changed, 26 insertions(+), 33 deletions(-)

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index a67036f92e0..744645d8bad 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -3248,23 +3248,15 @@ reduction_fn_for_scalar_code (enum tree_code code, 
internal_fn *reduc_fn)
 }
 }
 
-/* If there is a neutral value X such that SLP reduction NODE would not
-   be affected by the introduction of additional X elements, return that X,
-   otherwise return null.  CODE is the code of the reduction and VECTOR_TYPE
-   is the vector type that would hold element X.  REDUC_CHAIN is true if
-   the SLP statements perform a single reduction, false if each statement
-   performs an independent reduction.  */
+/* If there is a neutral value X such that a reduction would not be affected
+   by the introduction of additional X elements, return that X, otherwise
+   return null.  CODE is the code of the reduction and SCALAR_TYPE is type
+   of the scalar elements.  If the reduction has just a single initial value
+   then INITIAL_VALUE is that value, otherwise it is null.  */
 
 static tree
-neutral_op_for_slp_reduction (slp_tree slp_node, tree vector_type,
- tree_code code, bool reduc_chain)
+neutral_op_for_reduction (tree scalar_type, tree_code code, tree initial_value)
 {
-  vec stmts = SLP_TREE_SCALAR_STMTS (slp_node);
-  stmt_vec_info stmt_vinfo = stmts[0];
-  tree scalar_type = TREE_TYPE (vector_type);
-  class loop *loop = gimple_bb (stmt_vinfo->stmt)->loop_father;
-  gcc_assert (loop);
-
   switch (code)
 {
 case WIDEN_SUM_EXPR:
@@ -3284,12 +3276,7 @@ neutral_op_for_slp_reduction (slp_tree slp_node, tree 
vector_type,
 
 case MAX_EXPR:
 case MIN_EXPR:
-  /* For MIN/MAX the initial values are neutral.  A reduction chain
-has only a single initial value, so that value is neutral for
-all statements.  */
-  if (reduc_chain)
-   return vect_phi_initial_value (stmt_vinfo);
-  return NULL_TREE;
+  return initial_value;
 
 default:
   return NULL_TREE;
@@ -5535,10 +5522,11 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
   tree neutral_op = NULL_TREE;
   if (slp_node)
{
- stmt_vec_info first = REDUC_GROUP_FIRST_ELEMENT (stmt_info);
- neutral_op
-   = neutral_op_for_slp_reduction (slp_node_instance->reduc_phis,
-   vectype, code, first != NULL);
+ tree initial_value = NULL_TREE;
+ if (REDUC_GROUP_FIRST_ELEMENT (stmt_info))
+   initial_value = vect_phi_initial_value (orig_phis[0]);
+ neutral_op = neutral_op_for_reduction (TREE_TYPE (vectype), code,
+initial_value);
}
   if (neutral_op)
vector_identity = gimple_build_vector_from_val (&seq, vectype,
@@ -6935,9 +6923,13 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   /* For SLP reductions, see if there is a neutral value we can use.  */
   tree neutral_op = NULL_TREE;
   if (slp_node)
-neutral_op = neutral_op_for_slp_reduction
-  (slp_node_instance->reduc_phis, vectype_out, orig_code,
-   REDUC_GROUP_FIRST_ELEMENT (stmt_info) != NULL);
+{
+  tree initial_value = NULL_TREE;
+  if (REDUC_GROUP_FIRST_ELEMENT (stmt_info) != NULL)
+   initial_value = vect_phi_initial_value (reduc_def_phi);
+  neutral_op = neutral_op_for_reduction (TREE_TYPE (vectype_out),
+orig_code, initial_value);
+}
 
   if (double_reduc && reduction_type == FOLD_LEFT_REDUCTION)
 {
@@ -7501,15 +7493,16 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo,
   else
{
  gcc_assert (slp_node == slp_node_instance->reduc_phis);
- stmt_vec_info first = REDUC_GROUP_FIRST_ELEMENT (reduc_stmt_info);
- tree neutral_op
- = neutral_op_for_slp_reduction (slp_node, vectype_out,
- STMT_VINFO_REDUC_CODE 
(reduc_info),
- first != NULL);
+ tree initial_value = NULL_TREE;
+ if (REDUC_GROUP_FIRST_ELEMENT (reduc_stmt_info))
+   initial_value = vect_phi_initial_value (phi);
+ tree_code code = STMT_VINFO_REDUC_CODE (reduc_info);
+ tree neutral_op = neutral_op_for_reduction (TREE_TYPE (vectype

[PATCH 09/10] vect: Simplify get_initial_def_for_reduction

2021-07-08 Thread Richard Sandiford via Gcc-patches
After previous patches, we can now easily provide the neutral op
as an argument to get_initial_def_for_reduction.  This in turn
allows the adjustment calculation to be moved outside of
get_initial_def_for_reduction, which is the main motivation
of the patch.
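
As a concrete illustration (not from the patch itself): for

  s = 10;
  for (i = 0; i < n; i++)
    s += x[i];

the "adjust in epilog" scheme starts the vector accumulator at {0,0,...,0}
and adds 10 to the final scalar result, whereas the alternative starts it
at {10,0,...,0} and needs no adjustment.  After this patch the choice of
scheme and the adjustment value are made by the caller
(vect_transform_cycle_phi) rather than inside
get_initial_def_for_reduction.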

gcc/
* tree-vect-loop.c (get_initial_def_for_reduction): Remove
adjustment handling.  Take the neutral value as an argument,
in place of the code argument.
(vect_transform_cycle_phi): Update accordingly.  Handle the
initial values of cond reductions separately from code reductions.
Choose the adjustment here rather than in
get_initial_def_for_reduction.  Sink the splat of vec_initial_def.
---
 gcc/tree-vect-loop.c | 177 +++
 1 file changed, 59 insertions(+), 118 deletions(-)

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 744645d8bad..fe7e73f655f 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -4614,57 +4614,26 @@ vect_model_reduction_cost (loop_vec_info loop_vinfo,
Input:
REDUC_INFO - the info_for_reduction
INIT_VAL - the initial value of the reduction variable
+   NEUTRAL_OP - a value that has no effect on the reduction, as per
+   neutral_op_for_reduction
 
Output:
-   ADJUSTMENT_DEF - a tree that holds a value to be added to the final result
-of the reduction (used for adjusting the epilog - see below).
Return a vector variable, initialized according to the operation that
STMT_VINFO performs. This vector will be used as the initial value
of the vector of partial results.
 
-   Option1 (adjust in epilog): Initialize the vector as follows:
- add/bit or/xor:[0,0,...,0,0]
- mult/bit and:  [1,1,...,1,1]
- min/max/cond_expr: [init_val,init_val,..,init_val,init_val]
-   and when necessary (e.g. add/mult case) let the caller know
-   that it needs to adjust the result by init_val.
-
-   Option2: Initialize the vector as follows:
- add/bit or/xor:[init_val,0,0,...,0]
- mult/bit and:  [init_val,1,1,...,1]
- min/max/cond_expr: [init_val,init_val,...,init_val]
-   and no adjustments are needed.
-
-   For example, for the following code:
-
-   s = init_val;
-   for (i=0;istmt))->loop_father);
 
-  /* ADJUSTMENT_DEF is NULL when called from
- vect_create_epilog_for_reduction to vectorize double reduction.  */
-  if (adjustment_def)
-*adjustment_def = NULL;
-
-  switch (code)
+  if (operand_equal_p (init_val, neutral_op))
 {
-case WIDEN_SUM_EXPR:
-case DOT_PROD_EXPR:
-case SAD_EXPR:
-case PLUS_EXPR:
-case MINUS_EXPR:
-case BIT_IOR_EXPR:
-case BIT_XOR_EXPR:
-case MULT_EXPR:
-case BIT_AND_EXPR:
-  {
-if (code == MULT_EXPR)
-  {
-real_init_val = dconst1;
-int_init_val = 1;
-  }
-
-if (code == BIT_AND_EXPR)
-  int_init_val = -1;
-
-if (SCALAR_FLOAT_TYPE_P (scalar_type))
-  def_for_init = build_real (scalar_type, real_init_val);
-else
-  def_for_init = build_int_cst (scalar_type, int_init_val);
-
-   if (adjustment_def || operand_equal_p (def_for_init, init_val, 0))
- {
-   /* Option1: the first element is '0' or '1' as well.  */
-   if (!operand_equal_p (def_for_init, init_val, 0))
- *adjustment_def = init_val;
-   init_def = gimple_build_vector_from_val (&stmts, vectype,
-def_for_init);
- }
-   else if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant ())
- {
-   /* Option2 (variable length): the first element is INIT_VAL.  */
-   init_def = gimple_build_vector_from_val (&stmts, vectype,
-def_for_init);
-   init_def = gimple_build (&stmts, CFN_VEC_SHL_INSERT,
-vectype, init_def, init_val);
- }
-   else
- {
-   /* Option2: the first element is INIT_VAL.  */
-   tree_vector_builder elts (vectype, 1, 2);
-   elts.quick_push (init_val);
-   elts.quick_push (def_for_init);
-   init_def = gimple_build_vector (&stmts, &elts);
- }
-  }
-  break;
-
-case MIN_EXPR:
-case MAX_EXPR:
-case COND_EXPR:
-  {
-   init_val = gimple_convert (&stmts, TREE_TYPE (vectype), init_val);
-   init_def = gimple_build_vector_from_val (&stmts, vectype, init_val);
-  }
-  break;
-
-default:
-  gcc_unreachable ();
+  /* If both elements are equal then the vector described above is
+just a splat.  */
+  neutral_op = gimple_convert (&stmts, TREE_TYPE (vectype), neutral_op);
+  init_def = gimple_build_vector_from_val (&stmts, vectype, neutral_op);
+}
+  else
+{
+  neutral_op = gimple_convert (&stmts, TREE_TYPE (vectype), neutral_op);
+  init_val = gimple_convert 

[PATCH 10/10] vect: Reuse reduction accumulators between loops

2021-07-08 Thread Richard Sandiford via Gcc-patches
This patch adds support for reusing a main loop's reduction accumulator
in an epilogue loop.  This in turn lets the loops share a single piece
of vector->scalar reduction code.

The patch has the following restrictions:

(1) The epilogue reduction can only operate on a single vector
(e.g. ncopies must be 1 for non-SLP reductions, and the group size
must be <= the element count for SLP reductions).

(2) Both loops must use the same vector mode for their accumulators.
This means that the patch is restricted to targets that support
--param vect-partial-vector-usage=1.

(3) The reduction must be a standard “tree code” reduction.

However, these restrictions could be lifted in future.  For example,
if the main loop operates on 128-bit vectors and the epilogue loop
operates on 64-bit vectors, we could in future reduce the 128-bit
vector by one stage and use the 64-bit result as the starting point
for the epilogue result.

The patch tries to handle chained SLP reductions, unchained SLP
reductions and non-SLP reductions.  It also handles cases in which
the epilogue loop is entered directly (rather than via the main loop)
and cases in which the epilogue loop can be skipped.

vect_get_main_loop_result is a bit more general than the current
patch needs.

gcc/
* tree-vectorizer.h (vect_reusable_accumulator): New structure.
(_loop_vec_info::main_loop_edge): New field.
(_loop_vec_info::skip_main_loop_edge): Likewise.
(_loop_vec_info::skip_this_loop_edge): Likewise.
(_loop_vec_info::reusable_accumulators): Likewise.
(_stmt_vec_info::reduc_scalar_results): Likewise.
(_stmt_vec_info::reused_accumulator): Likewise.
(vect_get_main_loop_result): Declare.
* tree-vectorizer.c (vec_info::new_stmt_vec_info): Initialize
reduc_scalar_inputs.
(vec_info::free_stmt_vec_info): Free reduc_scalar_inputs.
* tree-vect-loop-manip.c (vect_get_main_loop_result): New function.
(vect_do_peeling): Fill an epilogue loop's main_loop_edge,
skip_main_loop_edge and skip_this_loop_edge fields.
* tree-vect-loop.c (INCLUDE_ALGORITHM): Define.
(vect_emit_reduction_init_stmts): New function.
(get_initial_def_for_reduction): Use it.
(get_initial_defs_for_reduction): Likewise.  Change the vinfo
parameter to a loop_vec_info.
(vect_create_epilog_for_reduction): Store the scalar results
in the reduc_info.  If an epilogue loop is reusing an accumulator
from the main loop, and if the epilogue loop can also be skipped,
try to place the reduction code in the join block.  Record
accumulators that could potentially be reused by epilogue loops.
(vect_transform_cycle_phi): When vectorizing epilogue loops,
try to reuse accumulators from the main loop.  Record the initial
value in reduc_info for non-SLP reductions too.

gcc/testsuite/
* gcc.target/aarch64/sve/reduc_9.c: New test.
* gcc.target/aarch64/sve/reduc_9_run.c: Likewise.
* gcc.target/aarch64/sve/reduc_10.c: Likewise.
* gcc.target/aarch64/sve/reduc_10_run.c: Likewise.
* gcc.target/aarch64/sve/reduc_11.c: Likewise.
* gcc.target/aarch64/sve/reduc_11_run.c: Likewise.
* gcc.target/aarch64/sve/reduc_12.c: Likewise.
* gcc.target/aarch64/sve/reduc_12_run.c: Likewise.
* gcc.target/aarch64/sve/reduc_13.c: Likewise.
* gcc.target/aarch64/sve/reduc_13_run.c: Likewise.
* gcc.target/aarch64/sve/reduc_14.c: Likewise.
* gcc.target/aarch64/sve/reduc_14_run.c: Likewise.
* gcc.target/aarch64/sve/reduc_15.c: Likewise.
* gcc.target/aarch64/sve/reduc_15_run.c: Likewise.
---
 .../gcc.target/aarch64/sve/reduc_10.c |  77 +
 .../gcc.target/aarch64/sve/reduc_10_run.c |  49 +++
 .../gcc.target/aarch64/sve/reduc_11.c |  71 
 .../gcc.target/aarch64/sve/reduc_11_run.c |  34 ++
 .../gcc.target/aarch64/sve/reduc_12.c |  71 
 .../gcc.target/aarch64/sve/reduc_12_run.c |  66 
 .../gcc.target/aarch64/sve/reduc_13.c | 101 ++
 .../gcc.target/aarch64/sve/reduc_13_run.c |  61 
 .../gcc.target/aarch64/sve/reduc_14.c | 107 ++
 .../gcc.target/aarch64/sve/reduc_14_run.c | 187 +++
 .../gcc.target/aarch64/sve/reduc_15.c |  16 +
 .../gcc.target/aarch64/sve/reduc_15_run.c |  22 ++
 .../gcc.target/aarch64/sve/reduc_9.c  |  77 +
 .../gcc.target/aarch64/sve/reduc_9_run.c  |  29 ++
 gcc/tree-vect-loop-manip.c|  29 ++
 gcc/tree-vect-loop.c  | 309 ++
 gcc/tree-vectorizer.c |   4 +
 gcc/tree-vectorizer.h |  51 ++-
 18 files changed, 1297 insertions(+), 64 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/reduc_10.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/reduc_10_run

Re: [PATCH] PR tree-optimization/38943: Preserve trapping instructions with -fnon-call-exceptions

2021-07-08 Thread Eric Botcazou
> This patch has been tested on x86_64-pc-linux-gnu with a "make
> bootstrap" and "make -k check" with no new failures.  This should
> be relatively safe, as there are no changes in behaviour unless
> the user explicitly specifies -fnon-call-exceptions, when the C
> compiler then behaves more like the C++/Ada compiler.

I think this will pessimize Ada, which defaults to -fnon-call-exceptions but 
where we do *not* want to preserve trapping instructions just because they may 
trap (i.e. -fdelete-dead-exceptions is enabled by default).
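
To make the distinction concrete (an illustrative example, not from the
patch):

  int f (int x, int y)
  {
    int dead = x / y;  /* may trap at run time; the result is unused */
    return x;
  }

With -fnon-call-exceptions the division can throw, so preserving every
potentially-trapping statement would keep it alive.  With
-fdelete-dead-exceptions (the Ada default) the compiler is still allowed
to delete it, because raising the exception is its only observable effect.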

And, as noticed by Richard, EH is orthogonal to side effects and pure/const.

-- 
Eric Botcazou




Re: [x86_64 PATCH]: Improvement to signed division of integer constant.

2021-07-08 Thread Uros Bizjak via Gcc-patches
On Thu, Jul 8, 2021 at 10:25 AM Roger Sayle  wrote:
>
>
> This patch tweaks the way GCC handles 32-bit integer division on
> x86_64, when the numerator is constant.  Currently the function
>
> int foo (int x) {
>   return 100/x;
> }
>
> generates the code:
> foo:movl$100, %eax
> cltd
> idivl   %edi
> ret
>
> where the sign-extension instruction "cltd" creates a long
> dependency chain, as it depends on the "mov" before it, and
> is depended upon by "idivl" after it.
>
> With this patch, GCC now matches both icc and LLVM and
> uses an xor instead, generating:
> foo:    xorl    %edx, %edx
>         movl    $100, %eax
>         idivl   %edi
>         ret
>
> Microbenchmarking confirms that this is faster on Intel
> processors (Kaby lake), and no worse on AMD processors (Zen2),
> which agrees with intuition, but oddly disagrees with the
> llvm-mca cycle count prediction on godbolt.org.
>
> The tricky bit is that this sign-extension instruction is only
> produced by late (postreload) splitting, and unfortunately none
> of the subsequent passes (e.g. cprop_hardreg) is able to
> propagate and simplify its constant argument.  The solution
> here is to introduce a define_insn_and_split that allows the
> constant numerator operand to be captured (by combine) and
> then split into an optimal form after reload.
>
> The above microbenchmarking also shows that eliminating the
> sign extension of negative values (using movl $-1,%edx) is also
> a performance improvement, as performed by icc but not by LLVM.
> Both the xor and movl sign-extensions are larger than cltd,
> so this transformation is prevented for -Os.
>
>
> This patch has been tested on x86_64-pc-linux-gnu with a "make
> bootstrap" and "make -k check" with no new failures.
>
> Ok for mainline?
>
>
> 2021-07-08  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386.md (*divmodsi4_const): Optimize SImode
> divmod of a constant numerator with new define_insn_and_split.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/divmod-9.c: New test case.

+  if (INTVAL (operands[2]) < 0)
+    emit_move_insn (operands[1], constm1_rtx);
+  else
+    ix86_expand_clear (operands[1]);

No need to call ix86_expand_clear,

emit_move_insn (operands[1], const0_rtx);

will result in xor, too.

OK with the above change.

Thanks,
Uros.

>
>
> Roger
> --
> Roger Sayle
> NextMove Software
> Cambridge, UK
>


Re: [PATCH 01/10] vect: Simplify epilogue reduction code

2021-07-08 Thread Richard Biener via Gcc-patches
On Thu, Jul 8, 2021 at 2:41 PM Richard Sandiford via Gcc-patches
 wrote:
>
> vect_create_epilog_for_reduction only handles two cases: single-loop
> reductions and double reductions.  “nested cycles” (i.e. reductions
> in the inner loop when vectorising an outer loop) are handled elsewhere
> and don't need a vector->scalar reduction.
>
> The function had variables called nested_in_vect_loop and double_reduc
> and asserted that nested_in_vect_loop implied double_reduc, but it
> still had code to handle nested_in_vect_loop && !double_reduc.
> This patch removes that and uses double_reduc everywhere.

OK.

(cleaning up after the GCC 10-era refactoring was still on my list :/)

> gcc/
> * tree-vect-loop.c (vect_create_epilog_for_reduction): Remove
> nested_in_vect_loop and use double_reduc everywhere.  Remove dead
> assignment to "loop".
> ---
>  gcc/tree-vect-loop.c | 30 --
>  1 file changed, 4 insertions(+), 26 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index bc523d151c6..7c3e3352b43 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -5005,7 +5005,6 @@ vect_create_epilog_for_reduction (loop_vec_info 
> loop_vinfo,
>imm_use_iterator imm_iter, phi_imm_iter;
>use_operand_p use_p, phi_use_p;
>gimple *use_stmt;
> -  bool nested_in_vect_loop = false;
>auto_vec new_phis;
>int j, i;
>auto_vec scalar_results;
> @@ -5023,10 +5022,8 @@ vect_create_epilog_for_reduction (loop_vec_info 
> loop_vinfo,
>  {
>outer_loop = loop;
>loop = loop->inner;
> -  nested_in_vect_loop = true;
> -  gcc_assert (!slp_node);
> +  gcc_assert (!slp_node && double_reduc);
>  }
> -  gcc_assert (!nested_in_vect_loop || double_reduc);
>
>vectype = STMT_VINFO_REDUC_VECTYPE (reduc_info);
>gcc_assert (vectype);
> @@ -5049,8 +5046,6 @@ vect_create_epilog_for_reduction (loop_vec_info 
> loop_vinfo,
> induc_val = STMT_VINFO_VEC_INDUC_COND_INITIAL_VAL (reduc_info);
>else if (double_reduc)
> ;
> -  else if (nested_in_vect_loop)
> -   ;
>else
> adjustment_def = STMT_VINFO_REDUC_EPILOGUE_ADJUSTMENT (reduc_info);
>  }
> @@ -5923,7 +5918,7 @@ vect_create_epilog_for_reduction (loop_vec_info 
> loop_vinfo,
>  {
>gcc_assert (!slp_reduc);
>gimple_seq stmts = NULL;
> -  if (nested_in_vect_loop)
> +  if (double_reduc)
> {
>new_phi = new_phis[0];
>   gcc_assert (VECTOR_TYPE_P (TREE_TYPE (adjustment_def)));
> @@ -5942,21 +5937,12 @@ vect_create_epilog_for_reduction (loop_vec_info 
> loop_vinfo,
>
>epilog_stmt = gimple_seq_last_stmt (stmts);
>gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
> -  if (nested_in_vect_loop)
> -{
> -  if (!double_reduc)
> -scalar_results.quick_push (new_temp);
> -  else
> -scalar_results[0] = new_temp;
> -}
> -  else
> -scalar_results[0] = new_temp;
> -
> +  scalar_results[0] = new_temp;
>new_phis[0] = epilog_stmt;
>  }
>
>if (double_reduc)
> -loop = loop->inner;
> +loop = outer_loop;
>
>/* 2.6  Handle the loop-exit phis.  Replace the uses of scalar loop-exit
>phis with new adjusted scalar results, i.e., replace use 
> @@ -6017,14 +6003,6 @@ vect_create_epilog_for_reduction (loop_vec_info 
> loop_vinfo,
>   scalar_dest = gimple_assign_lhs (scalar_stmt_info->stmt);
>  }
>
> -  if (nested_in_vect_loop)
> -{
> -  if (double_reduc)
> -loop = outer_loop;
> -  else
> -   gcc_unreachable ();
> -}
> -
>phis.create (3);
>/* Find the loop-closed-use at the loop exit of the original scalar
>   result.  (The reduction result is expected to have two immediate 
> uses,


Re: [PATCH 02/10] vect: Create array_slice of live-out stmts

2021-07-08 Thread Richard Biener via Gcc-patches
On Thu, Jul 8, 2021 at 2:42 PM Richard Sandiford via Gcc-patches
 wrote:
>
> This patch constructs an array_slice of the scalar statements that
> produce live-out reduction results in the original unvectorised loop.
> There are three cases:
>
> - SLP reduction chains: the final SLP stmt is live-out
> - full SLP reductions: all SLP stmts are live-out
> - non-SLP reductions: the single scalar stmt is live-out
>
> This is a slight simplification on its own, mostly because it means
> “group_size” has a consistent meaning throughout the function.
> The main justification though is that it helps with later patches.

OK

> gcc/
> * tree-vect-loop.c (vect_create_epilog_for_reduction): Truncate
> scalar_results to group_size elements after reducing down from
> N*group_size elements.  Construct an array_slice of the live-out
> stmts and assert that there is one stmt per scalar result.
> ---
>  gcc/tree-vect-loop.c | 61 +++-
>  1 file changed, 21 insertions(+), 40 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 7c3e3352b43..8390ac80ca0 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -5010,7 +5010,12 @@ vect_create_epilog_for_reduction (loop_vec_info 
> loop_vinfo,
>auto_vec scalar_results;
>unsigned int group_size = 1, k;
>auto_vec phis;
> -  bool slp_reduc = false;
> +  /* SLP reduction without reduction chain, e.g.,
> + # a1 = phi 
> + # b1 = phi 
> + a2 = operation (a1)
> + b2 = operation (b1)  */
> +  bool slp_reduc = (slp_node && !REDUC_GROUP_FIRST_ELEMENT (stmt_info));
>bool direct_slp_reduc;
>tree new_phi_result;
>tree induction_index = NULL_TREE;
> @@ -5050,6 +5055,16 @@ vect_create_epilog_for_reduction (loop_vec_info 
> loop_vinfo,
> adjustment_def = STMT_VINFO_REDUC_EPILOGUE_ADJUSTMENT (reduc_info);
>  }
>
> +  stmt_vec_info single_live_out_stmt[] = { stmt_info };
> +  array_slice live_out_stmts = single_live_out_stmt;
> +  if (slp_reduc)
> +/* All statements produce live-out values.  */
> +live_out_stmts = SLP_TREE_SCALAR_STMTS (slp_node);
> +  else if (slp_node)
> +/* The last statement in the reduction chain produces the live-out
> +   value.  */
> +single_live_out_stmt[0] = SLP_TREE_SCALAR_STMTS (slp_node)[group_size - 
> 1];
> +
>unsigned vec_num;
>int ncopies;
>if (slp_node)
> @@ -5248,13 +5263,6 @@ vect_create_epilog_for_reduction (loop_vec_info 
> loop_vinfo,
>new_scalar_dest = vect_create_destination_var (scalar_dest, NULL);
>bitsize = TYPE_SIZE (scalar_type);
>
> -  /* SLP reduction without reduction chain, e.g.,
> - # a1 = phi 
> - # b1 = phi 
> - a2 = operation (a1)
> - b2 = operation (b1)  */
> -  slp_reduc = (slp_node && !REDUC_GROUP_FIRST_ELEMENT (stmt_info));
> -
>/* True if we should implement SLP_REDUC using native reduction operations
>   instead of scalar operations.  */
>direct_slp_reduc = (reduc_fn != IFN_LAST
> @@ -5877,6 +5885,7 @@ vect_create_epilog_for_reduction (loop_vec_info 
> loop_vinfo,
>   first_res, res);
>scalar_results[j % group_size] = new_res;
>  }
> + scalar_results.truncate (group_size);
>   for (k = 0; k < group_size; k++)
> scalar_results[k] = gimple_convert (&stmts, scalar_type,
> scalar_results[k]);
> @@ -5969,39 +5978,11 @@ vect_create_epilog_for_reduction (loop_vec_info 
> loop_vinfo,
>use 
>use  */
>
> -
> -  /* In SLP reduction chain we reduce vector results into one vector if
> - necessary, hence we set here REDUC_GROUP_SIZE to 1.  SCALAR_DEST is the
> - LHS of the last stmt in the reduction chain, since we are looking for
> - the loop exit phi node.  */
> -  if (REDUC_GROUP_FIRST_ELEMENT (stmt_info))
> -{
> -  stmt_vec_info dest_stmt_info
> -   = vect_orig_stmt (SLP_TREE_SCALAR_STMTS (slp_node)[group_size - 1]);
> -  scalar_dest = gimple_assign_lhs (dest_stmt_info->stmt);
> -  group_size = 1;
> -}
> -
> -  /* In SLP we may have several statements in NEW_PHIS and REDUCTION_PHIS (in
> - case that REDUC_GROUP_SIZE is greater than vectorization factor).
> - Therefore, we need to match SCALAR_RESULTS with corresponding 
> statements.
> - The first (REDUC_GROUP_SIZE / number of new vector stmts) scalar results
> - correspond to the first vector stmt, etc.
> - (RATIO is equal to (REDUC_GROUP_SIZE / number of new vector stmts)).  */
> -  if (group_size > new_phis.length ())
> -gcc_assert (!(group_size % new_phis.length ()));
> -
> -  for (k = 0; k < group_size; k++)
> +  gcc_assert (live_out_stmts.size () == scalar_results.length ());
> +  for (k = 0; k < live_out_stmts.size (); k++)
>  {
> -  if (slp_reduc)
> -{
> - stmt_vec_info scalar_stmt_info = SLP

Re: [PATCH 03/10] vect: Remove new_phis from vect_create_epilog_for_reduction

2021-07-08 Thread Richard Biener via Gcc-patches
On Thu, Jul 8, 2021 at 2:43 PM Richard Sandiford via Gcc-patches
 wrote:
>
> vect_create_epilog_for_reduction had a variable called new_phis.
> It collected the statements that produce the exit block definitions
> of the vector reduction accumulators.  Although those statements
> are indeed phis initially, they are often replaced with normal
> statements later, leading to puzzling code like:
>
>   FOR_EACH_VEC_ELT (new_phis, i, new_phi)
> {
>   int bit_offset;
>   if (gimple_code (new_phi) == GIMPLE_PHI)
> vec_temp = PHI_RESULT (new_phi);
>   else
> vec_temp = gimple_assign_lhs (new_phi);
>
> Also, although the array collects statements, in practice all users want
> the lhs instead.
>
> This patch therefore replaces new_phis with a vector of gimple values
> called “reduc_inputs”.
>
> Also, reduction chains and ncopies>1 were handled with identical code
> (and there was a comment saying so).  The patch unites them into
> a single “if”.

OK.

Thanks,
Richard.

> gcc/
> * tree-vect-loop.c (vect_create_epilog_for_reduction): Replace
> the new_phis vector with a reduc_inputs vector.  Combine handling
> of reduction chains and ncopies > 1.
> ---
>  gcc/tree-vect-loop.c | 113 ---
>  1 file changed, 41 insertions(+), 72 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 8390ac80ca0..b7f73ca52c7 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -5005,7 +5005,7 @@ vect_create_epilog_for_reduction (loop_vec_info 
> loop_vinfo,
>imm_use_iterator imm_iter, phi_imm_iter;
>use_operand_p use_p, phi_use_p;
>gimple *use_stmt;
> -  auto_vec new_phis;
> +  auto_vec reduc_inputs;
>int j, i;
>auto_vec scalar_results;
>unsigned int group_size = 1, k;
> @@ -5017,7 +5017,6 @@ vect_create_epilog_for_reduction (loop_vec_info 
> loop_vinfo,
>   b2 = operation (b1)  */
>bool slp_reduc = (slp_node && !REDUC_GROUP_FIRST_ELEMENT (stmt_info));
>bool direct_slp_reduc;
> -  tree new_phi_result;
>tree induction_index = NULL_TREE;
>
>if (slp_node)
> @@ -5215,7 +5214,7 @@ vect_create_epilog_for_reduction (loop_vec_info 
> loop_vinfo,
>if (double_reduc)
>  loop = outer_loop;
>exit_bb = single_exit (loop)->dest;
> -  new_phis.create (slp_node ? vec_num : ncopies);
> +  reduc_inputs.create (slp_node ? vec_num : ncopies);
>for (unsigned i = 0; i < vec_num; i++)
>  {
>if (slp_node)
> @@ -5223,19 +5222,14 @@ vect_create_epilog_for_reduction (loop_vec_info 
> loop_vinfo,
>else
> def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[0]);
>for (j = 0; j < ncopies; j++)
> -{
> +   {
>   tree new_def = copy_ssa_name (def);
> -  phi = create_phi_node (new_def, exit_bb);
> -  if (j == 0)
> -new_phis.quick_push (phi);
> -  else
> -   {
> - def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]);
> - new_phis.quick_push (phi);
> -   }
> -
> -  SET_PHI_ARG_DEF (phi, single_exit (loop)->dest_idx, def);
> -}
> + phi = create_phi_node (new_def, exit_bb);
> + if (j)
> +   def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]);
> + SET_PHI_ARG_DEF (phi, single_exit (loop)->dest_idx, def);
> + reduc_inputs.quick_push (new_def);
> +   }
>  }
>
>exit_gsi = gsi_after_labels (exit_bb);
> @@ -5274,52 +5268,32 @@ vect_create_epilog_for_reduction (loop_vec_info 
> loop_vinfo,
>   a2 = operation (a1)
>   a3 = operation (a2),
>
> - we may end up with more than one vector result.  Here we reduce them to
> - one vector.  */
> -  if (REDUC_GROUP_FIRST_ELEMENT (stmt_info) || direct_slp_reduc)
> + we may end up with more than one vector result.  Here we reduce them
> + to one vector.
> +
> + The same is true if we couldn't use a single defuse cycle.  */
> +  if (REDUC_GROUP_FIRST_ELEMENT (stmt_info)
> +  || direct_slp_reduc
> +  || ncopies > 1)
>  {
>gimple_seq stmts = NULL;
> -  tree first_vect = PHI_RESULT (new_phis[0]);
> -  first_vect = gimple_convert (&stmts, vectype, first_vect);
> -  for (k = 1; k < new_phis.length (); k++)
> +  tree first_vect = gimple_convert (&stmts, vectype, reduc_inputs[0]);
> +  for (k = 1; k < reduc_inputs.length (); k++)
>  {
> - gimple *next_phi = new_phis[k];
> -  tree second_vect = PHI_RESULT (next_phi);
> - second_vect = gimple_convert (&stmts, vectype, second_vect);
> + tree second_vect = gimple_convert (&stmts, vectype, 
> reduc_inputs[k]);
>first_vect = gimple_build (&stmts, code, vectype,
>  first_vect, second_vect);
>  }
>gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
>
> -  new_phi_result = fi

Re: [PATCH 04/10] vect: Ensure reduc_inputs always have vectype

2021-07-08 Thread Richard Biener via Gcc-patches
On Thu, Jul 8, 2021 at 2:44 PM Richard Sandiford via Gcc-patches
 wrote:
>
> Vector reduction accumulators can differ in signedness from the
> final scalar result.  The conversions to handle that case were
> distributed through vect_create_epilog_for_reduction; this patch
> does the conversion up-front instead.

But is that still correct?  The conversions should be unsigned -> signed,
that is, we've performed the reduction in unsigned because we reassociated
a signed reduction whose overflow would otherwise be undefined.  But the final
reduction of the vector lanes in the epilogue still needs to be done
unsigned.
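
As a purely illustrative example of the situation (not taken from the patch or
its testcases): a signed sum reduction that the vectorizer carries out with an
unsigned accumulator, so that reassociating the additions does not introduce
undefined overflow:

  int sum (const int *a, int n)
  {
    int s = 0;
    for (int i = 0; i < n; i++)
      s += a[i];   /* vectorized with unsigned vector adds */
    return s;      /* the vector lanes must still be combined in unsigned,
                      then converted back to the signed scalar result */
  }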

So it's just not obvious that the patch preserves this - if it does then
the patch is OK.

Richard.

> gcc/
> * tree-vect-loop.c (vect_create_epilog_for_reduction): Convert
> the phi results to vectype after creating them.  Remove later
> conversion code that thus becomes redundant.
> ---
>  gcc/tree-vect-loop.c | 28 +++-
>  1 file changed, 11 insertions(+), 17 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index b7f73ca52c7..1bd9a6ea52c 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -5214,9 +5214,11 @@ vect_create_epilog_for_reduction (loop_vec_info 
> loop_vinfo,
>if (double_reduc)
>  loop = outer_loop;
>exit_bb = single_exit (loop)->dest;
> +  exit_gsi = gsi_after_labels (exit_bb);
>reduc_inputs.create (slp_node ? vec_num : ncopies);
>for (unsigned i = 0; i < vec_num; i++)
>  {
> +  gimple_seq stmts = NULL;
>if (slp_node)
> def = vect_get_slp_vect_def (slp_node, i);
>else
> @@ -5228,12 +5230,12 @@ vect_create_epilog_for_reduction (loop_vec_info 
> loop_vinfo,
>   if (j)
> def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]);
>   SET_PHI_ARG_DEF (phi, single_exit (loop)->dest_idx, def);
> + new_def = gimple_convert (&stmts, vectype, new_def);
>   reduc_inputs.quick_push (new_def);
> }
> +  gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
>  }
>
> -  exit_gsi = gsi_after_labels (exit_bb);
> -
>/* 2.2 Get the relevant tree-code to use in the epilog for schemes 2,3
>   (i.e. when reduc_fn is not available) and in the final adjustment
>  code (if needed).  Also get the original scalar reduction variable as
> @@ -5277,17 +5279,14 @@ vect_create_epilog_for_reduction (loop_vec_info 
> loop_vinfo,
>|| ncopies > 1)
>  {
>gimple_seq stmts = NULL;
> -  tree first_vect = gimple_convert (&stmts, vectype, reduc_inputs[0]);
> +  tree single_input = reduc_inputs[0];
>for (k = 1; k < reduc_inputs.length (); k++)
> -{
> - tree second_vect = gimple_convert (&stmts, vectype, 
> reduc_inputs[k]);
> -  first_vect = gimple_build (&stmts, code, vectype,
> -first_vect, second_vect);
> -}
> +   single_input = gimple_build (&stmts, code, vectype,
> +single_input, reduc_inputs[k]);
>gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
>
>reduc_inputs.truncate (0);
> -  reduc_inputs.safe_push (first_vect);
> +  reduc_inputs.safe_push (single_input);
>  }
>
>if (STMT_VINFO_REDUC_TYPE (reduc_info) == COND_REDUCTION
> @@ -5323,10 +5322,6 @@ vect_create_epilog_for_reduction (loop_vec_info 
> loop_vinfo,
>/* Vector of {0, 0, 0,...}.  */
>tree zero_vec = build_zero_cst (vectype);
>
> -  gimple_seq stmts = NULL;
> -  reduc_inputs[0] = gimple_convert (&stmts, vectype, reduc_inputs[0]);
> -  gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
> -
>/* Find maximum value from the vector of found indexes.  */
>tree max_index = make_ssa_name (index_scalar_type);
>gcall *max_index_stmt = gimple_build_call_internal (IFN_REDUC_MAX,
> @@ -5394,7 +5389,7 @@ vect_create_epilog_for_reduction (loop_vec_info 
> loop_vinfo,
>
>/* Convert the reduced value back to the result type and set as the
>  result.  */
> -  stmts = NULL;
> +  gimple_seq stmts = NULL;
>new_temp = gimple_build (&stmts, VIEW_CONVERT_EXPR, scalar_type,
>data_reduc);
>gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
> @@ -5412,7 +5407,7 @@ vect_create_epilog_for_reduction (loop_vec_info 
> loop_vinfo,
>  val = data_reduc[i], idx_val = induction_index[i];
>  return val;  */
>
> -  tree data_eltype = TREE_TYPE (TREE_TYPE (reduc_inputs[0]));
> +  tree data_eltype = TREE_TYPE (vectype);
>tree idx_eltype = TREE_TYPE (TREE_TYPE (induction_index));
>unsigned HOST_WIDE_INT el_size = tree_to_uhwi (TYPE_SIZE (idx_eltype));
>poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE 
> (induction_index));
> @@ -5488,8 +5483,7 @@ vect_create_epilog_for_reduction (loop_vec_info 
> loop_vinfo,
> 

Re: [PATCH 05/10] vect: Add a vect_phi_initial_value helper function

2021-07-08 Thread Richard Biener via Gcc-patches
On Thu, Jul 8, 2021 at 2:45 PM Richard Sandiford via Gcc-patches
 wrote:
>
> This patch adds a helper function called vect_phi_initial_value
> for returning the incoming value of a given loop phi.  The main
> reason for adding it is to ensure that the right preheader edge
> is used when vectorising nested loops.  (PHI_ARG_DEF_FROM_EDGE
> itself doesn't assert that the given edge is for the right block,
> although I guess that would be good to add separately.)

We were sometimes (most of the time?) using an explicit
loop where you now get it from the PHI - that makes the
assert somewhat pointless to some extent - of course it
makes sense on its own that the loop is the same as that
of the PHI def.  I just wonder if you think any of the existing
code might have been wrong?  If so the new assert doesn't
catch all originally wrong cases.

Otherwise OK,
Richard.
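
For reference, the helper being discussed boils down to roughly the following
sketch (based only on the description above; the actual declaration added to
tree-vectorizer.h may differ in signature and detail):

  /* Sketch: return the value that PHI takes on the preheader edge of the
     loop containing it, checking that the edge really leads into the block
     holding PHI.  */
  static inline tree
  vect_phi_initial_value (gphi *phi)
  {
    basic_block bb = gimple_bb (phi);
    edge pe = loop_preheader_edge (bb->loop_father);
    gcc_assert (pe->dest == bb);
    return PHI_ARG_DEF_FROM_EDGE (phi, pe);
  }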

> gcc/
> * tree-vectorizer.h: Include tree-ssa-operands.h.
> (vect_phi_initial_value): New function.
> * tree-vect-loop.c (neutral_op_for_slp_reduction): Use it.
> (get_initial_defs_for_reduction, info_for_reduction): Likewise.
> (vect_create_epilog_for_reduction, vectorizable_reduction): Likewise.
> (vect_transform_cycle_phi, vectorizable_induction): Likewise.
> ---
>  gcc/tree-vect-loop.c  | 29 +
>  gcc/tree-vectorizer.h | 21 -
>  2 files changed, 29 insertions(+), 21 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 1bd9a6ea52c..a31d7621c3b 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -3288,8 +3288,7 @@ neutral_op_for_slp_reduction (slp_tree slp_node, tree 
> vector_type,
>  has only a single initial value, so that value is neutral for
>  all statements.  */
>if (reduc_chain)
> -   return PHI_ARG_DEF_FROM_EDGE (stmt_vinfo->stmt,
> - loop_preheader_edge (loop));
> +   return vect_phi_initial_value (stmt_vinfo);
>return NULL_TREE;
>
>  default:
> @@ -4829,13 +4828,13 @@ get_initial_defs_for_reduction (vec_info *vinfo,
>/* Get the def before the loop.  In reduction chain we have only
>  one initial value.  Else we have as many as PHIs in the group.  */
>if (reduc_chain)
> -   op = j != 0 ? neutral_op : PHI_ARG_DEF_FROM_EDGE (stmt_vinfo->stmt, 
> pe);
> +   op = j != 0 ? neutral_op : vect_phi_initial_value (stmt_vinfo);
>else if (((vec_oprnds->length () + 1) * nunits
> - number_of_places_left_in_vector >= group_size)
>&& neutral_op)
> op = neutral_op;
>else
> -   op = PHI_ARG_DEF_FROM_EDGE (stmt_vinfo->stmt, pe);
> +   op = vect_phi_initial_value (stmt_vinfo);
>
>/* Create 'vect_ = {op0,op1,...,opn}'.  */
>number_of_places_left_in_vector--;
> @@ -4906,9 +4905,7 @@ info_for_reduction (vec_info *vinfo, stmt_vec_info 
> stmt_info)
>  }
>else if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle)
>  {
> -  edge pe = loop_preheader_edge (gimple_bb (phi)->loop_father);
> -  stmt_vec_info info
> - = vinfo->lookup_def (PHI_ARG_DEF_FROM_EDGE (phi, pe));
> +  stmt_vec_info info = vinfo->lookup_def (vect_phi_initial_value (phi));
>if (info && STMT_VINFO_DEF_TYPE (info) == vect_double_reduction_def)
> stmt_info = info;
>  }
> @@ -5042,8 +5039,7 @@ vect_create_epilog_for_reduction (loop_vec_info 
> loop_vinfo,
>  {
>/* Get at the scalar def before the loop, that defines the initial 
> value
>  of the reduction variable.  */
> -  initial_def = PHI_ARG_DEF_FROM_EDGE (reduc_def_stmt,
> -  loop_preheader_edge (loop));
> +  initial_def = vect_phi_initial_value (reduc_def_stmt);
>/* Optimize: for induction condition reduction, if we can't use zero
>   for induc_val, use initial_def.  */
>if (STMT_VINFO_REDUC_TYPE (reduc_info) == INTEGER_INDUC_COND_REDUCTION)
> @@ -5558,9 +5554,7 @@ vect_create_epilog_for_reduction (loop_vec_info 
> loop_vinfo,
>  for MIN and MAX reduction, for example.  */
>   if (!neutral_op)
> {
> - tree scalar_value
> -   = PHI_ARG_DEF_FROM_EDGE (orig_phis[i]->stmt,
> -loop_preheader_edge (loop));
> + tree scalar_value = vect_phi_initial_value (orig_phis[i]);
>   scalar_value = gimple_convert (&seq, TREE_TYPE (vectype),
>  scalar_value);
>   vector_identity = gimple_build_vector_from_val (&seq, vectype,
> @@ -6752,10 +6746,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>else if (cond_reduc_dt == vect_constant_def)
> {
>   enum vect_def_type cond_initial_dt;
> - tree cond_initial_val
> -   = PHI_ARG_DEF_FROM_EDGE (reduc_def_phi, loop_preheader_edge 
>

RE: [PATCH] testsuite: Add arm_arch_v7a_ok effective-target to pr57351.c

2021-07-08 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Gcc-patches  bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Christophe
> Lyon via Gcc-patches
> Sent: 07 July 2021 13:24
> To: Christophe LYON 
> Cc: gcc Patches 
> Subject: Re: [PATCH] testsuite: Add arm_arch_v7a_ok effective-target to
> pr57351.c
> 
> ping?
> 
> On Wed, Jun 30, 2021 at 3:58 PM Christophe LYON via Gcc-patches <
> gcc-patches@gcc.gnu.org> wrote:
> 
> > I've noticed that overriding cpu/arch flags when running the testsuite
> > can cause this test to fail rather than being skipped because of
> > incompatible flags combination.
> >
> > Since the test forces -march=armv7-a, make sure it is accepted in
> > combination with the current runtestflags.

Ok, I would have counted it as obvious I suppose.
Thanks,
Kyrill

> >
> > 2021-06-30  Christophe Lyon  
> >
> >  gcc/testsuite/
> >  * gcc.dg/debug/pr57351.c: Require arm_arch_v7a_ok
> >  effective-target.
> >
> >
> >
> >
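
Presumably the fix amounts to adding a single directive near the top of the
test, along the lines of (exact form not shown in this thread):

  /* { dg-require-effective-target arm_arch_v7a_ok } */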


Re: [PATCH 06/10] vect: Pass reduc_info to get_initial_defs_for_reduction

2021-07-08 Thread Richard Biener via Gcc-patches
On Thu, Jul 8, 2021 at 2:46 PM Richard Sandiford via Gcc-patches
 wrote:
>
> This patch passes the reduc_info to get_initial_defs_for_reduction,
> so that the function can get general information from there rather
> than from the first SLP statement.  This isn't a win on its own,
> but it becomes important with later patches.

So the original code should have used SLP_TREE_REPRESENTATIVE
instead of SLP_TREE_SCALAR_STMTS ()[0] (there might have been
issues with doing that - my recollection is weak here).

I'm not sure if reduc_info is actually better - only the representative
will have STMT_VINFO_VECTYPE set; for the reduc_info
there's STMT_VINFO_REDUC_VECTYPE (and STMT_VINFO_REDUC_VECTYPE_IN).

So I think if you want to use reduc_info then you want to use
STMT_VINFO_REDUC_VECTYPE?

> gcc/
> * tree-vect-loop.c (get_initial_defs_for_reduction): Take the
> reduc_info as an additional parameter.
> (vect_transform_cycle_phi): Update accordingly.
> ---
>  gcc/tree-vect-loop.c | 23 ++-
>  1 file changed, 10 insertions(+), 13 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index a31d7621c3b..565c2859477 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -4764,32 +4764,28 @@ get_initial_def_for_reduction (loop_vec_info 
> loop_vinfo,
>return init_def;
>  }
>
> -/* Get at the initial defs for the reduction PHIs in SLP_NODE.
> -   NUMBER_OF_VECTORS is the number of vector defs to create.
> -   If NEUTRAL_OP is nonnull, introducing extra elements of that
> -   value will not change the result.  */
> +/* Get at the initial defs for the reduction PHIs for REDUC_INFO, whose
> +   associated SLP node is SLP_NODE.  NUMBER_OF_VECTORS is the number of 
> vector
> +   defs to create.  If NEUTRAL_OP is nonnull, introducing extra elements of
> +   that value will not change the result.  */
>
>  static void
>  get_initial_defs_for_reduction (vec_info *vinfo,
> +   stmt_vec_info reduc_info,
> slp_tree slp_node,
> vec *vec_oprnds,
> unsigned int number_of_vectors,
> bool reduc_chain, tree neutral_op)
>  {
>vec stmts = SLP_TREE_SCALAR_STMTS (slp_node);
> -  stmt_vec_info stmt_vinfo = stmts[0];
>unsigned HOST_WIDE_INT nunits;
>unsigned j, number_of_places_left_in_vector;
> -  tree vector_type;
> +  tree vector_type = STMT_VINFO_VECTYPE (reduc_info);
>unsigned int group_size = stmts.length ();
>unsigned int i;
>class loop *loop;
>
> -  vector_type = STMT_VINFO_VECTYPE (stmt_vinfo);
> -
> -  gcc_assert (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def);
> -
> -  loop = (gimple_bb (stmt_vinfo->stmt))->loop_father;
> +  loop = (gimple_bb (reduc_info->stmt))->loop_father;
>gcc_assert (loop);
>edge pe = loop_preheader_edge (loop);
>
> @@ -4823,7 +4819,7 @@ get_initial_defs_for_reduction (vec_info *vinfo,
>  {
>tree op;
>i = j % group_size;
> -  stmt_vinfo = stmts[i];
> +  stmt_vec_info stmt_vinfo = stmts[i];
>
>/* Get the def before the loop.  In reduction chain we have only
>  one initial value.  Else we have as many as PHIs in the group.  */
> @@ -7510,7 +7506,8 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo,
>   = neutral_op_for_slp_reduction (slp_node, vectype_out,
>   STMT_VINFO_REDUC_CODE 
> (reduc_info),
>   first != NULL);
> - get_initial_defs_for_reduction (loop_vinfo, 
> slp_node_instance->reduc_phis,
> + get_initial_defs_for_reduction (loop_vinfo, reduc_info,
> + slp_node_instance->reduc_phis,
>   &vec_initial_defs, vec_num,
>   first != NULL, neutral_op);
> }


Re: [PATCH 05/10] vect: Add a vect_phi_initial_value helper function

2021-07-08 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Thu, Jul 8, 2021 at 2:45 PM Richard Sandiford via Gcc-patches
>  wrote:
>>
>> This patch adds a helper function called vect_phi_initial_value
>> for returning the incoming value of a given loop phi.  The main
>> reason for adding it is to ensure that the right preheader edge
>> is used when vectorising nested loops.  (PHI_ARG_DEF_FROM_EDGE
>> itself doesn't assert that the given edge is for the right block,
>> although I guess that would be good to add separately.)
>
> We were sometimes (most of the time?) using an explicit
> loop where you now get it from the PHI - that makes the
> assert somewhat pointless to some extent - of course it
> makes sense on its own that the loop is the same as that
> of the PHI def.  I just wonder if you think any of the existing
> code might have been wrong?  If so the new assert doesn't
> catch all originally wrong cases.

I don't remember seeing a case where the existing code got it wrong,
but I think one of the patches in the series did initially use the
wrong loop's preheader.

But yeah, the function and assert only help to avoid using
PHI_ARG_DEF_FROM_EDGE with the wrong edge.  If the problem was instead
passing the wrong phi then the patch doesn't help to catch that.

The edge mistake is more likely to be a silent failure though,
since the edge indices for both loops might happen to be the same
(but might not).

Thanks,
Richard

>
> Otherwise OK,
> Richard.
>
>> gcc/
>> * tree-vectorizer.h: Include tree-ssa-operands.h.
>> (vect_phi_initial_value): New function.
>> * tree-vect-loop.c (neutral_op_for_slp_reduction): Use it.
>> (get_initial_defs_for_reduction, info_for_reduction): Likewise.
>> (vect_create_epilog_for_reduction, vectorizable_reduction): Likewise.
>> (vect_transform_cycle_phi, vectorizable_induction): Likewise.
>> ---
>>  gcc/tree-vect-loop.c  | 29 +
>>  gcc/tree-vectorizer.h | 21 -
>>  2 files changed, 29 insertions(+), 21 deletions(-)
>>
>> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>> index 1bd9a6ea52c..a31d7621c3b 100644
>> --- a/gcc/tree-vect-loop.c
>> +++ b/gcc/tree-vect-loop.c
>> @@ -3288,8 +3288,7 @@ neutral_op_for_slp_reduction (slp_tree slp_node, tree 
>> vector_type,
>>  has only a single initial value, so that value is neutral for
>>  all statements.  */
>>if (reduc_chain)
>> -   return PHI_ARG_DEF_FROM_EDGE (stmt_vinfo->stmt,
>> - loop_preheader_edge (loop));
>> +   return vect_phi_initial_value (stmt_vinfo);
>>return NULL_TREE;
>>
>>  default:
>> @@ -4829,13 +4828,13 @@ get_initial_defs_for_reduction (vec_info *vinfo,
>>/* Get the def before the loop.  In reduction chain we have only
>>  one initial value.  Else we have as many as PHIs in the group.  */
>>if (reduc_chain)
>> -   op = j != 0 ? neutral_op : PHI_ARG_DEF_FROM_EDGE (stmt_vinfo->stmt, 
>> pe);
>> +   op = j != 0 ? neutral_op : vect_phi_initial_value (stmt_vinfo);
>>else if (((vec_oprnds->length () + 1) * nunits
>> - number_of_places_left_in_vector >= group_size)
>>&& neutral_op)
>> op = neutral_op;
>>else
>> -   op = PHI_ARG_DEF_FROM_EDGE (stmt_vinfo->stmt, pe);
>> +   op = vect_phi_initial_value (stmt_vinfo);
>>
>>/* Create 'vect_ = {op0,op1,...,opn}'.  */
>>number_of_places_left_in_vector--;
>> @@ -4906,9 +4905,7 @@ info_for_reduction (vec_info *vinfo, stmt_vec_info 
>> stmt_info)
>>  }
>>else if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle)
>>  {
>> -  edge pe = loop_preheader_edge (gimple_bb (phi)->loop_father);
>> -  stmt_vec_info info
>> - = vinfo->lookup_def (PHI_ARG_DEF_FROM_EDGE (phi, pe));
>> +  stmt_vec_info info = vinfo->lookup_def (vect_phi_initial_value (phi));
>>if (info && STMT_VINFO_DEF_TYPE (info) == vect_double_reduction_def)
>> stmt_info = info;
>>  }
>> @@ -5042,8 +5039,7 @@ vect_create_epilog_for_reduction (loop_vec_info 
>> loop_vinfo,
>>  {
>>/* Get at the scalar def before the loop, that defines the initial 
>> value
>>  of the reduction variable.  */
>> -  initial_def = PHI_ARG_DEF_FROM_EDGE (reduc_def_stmt,
>> -  loop_preheader_edge (loop));
>> +  initial_def = vect_phi_initial_value (reduc_def_stmt);
>>/* Optimize: for induction condition reduction, if we can't use zero
>>   for induc_val, use initial_def.  */
>>if (STMT_VINFO_REDUC_TYPE (reduc_info) == 
>> INTEGER_INDUC_COND_REDUCTION)
>> @@ -5558,9 +5554,7 @@ vect_create_epilog_for_reduction (loop_vec_info 
>> loop_vinfo,
>>  for MIN and MAX reduction, for example.  */
>>   if (!neutral_op)
>> {
>> - tree scalar_value
>> -   = PHI_ARG_DEF_FROM_EDGE (orig_phis[i]->stmt,
>> -

Re: [PATCH 08/10] vect: Generalise neutral_op_for_slp_reduction

2021-07-08 Thread Richard Biener via Gcc-patches
On Thu, Jul 8, 2021 at 2:48 PM Richard Sandiford via Gcc-patches
 wrote:
>
> This patch generalises the interface to neutral_op_for_slp_reduction
> so that it can be used for non-SLP reductions too.  This isn't much
> of a win on its own, but it helps later patches.

I guess that makes sense - OK.

Richard.

> gcc/
> * tree-vect-loop.c (neutral_op_for_slp_reduction): Replace with...
> (neutral_op_for_reduction): ...this, providing a more general
> interface.
> (vect_create_epilog_for_reduction): Update accordingly.
> (vectorizable_reduction): Likewise.
> (vect_transform_cycle_phi): Likewise.
> ---
>  gcc/tree-vect-loop.c | 59 +++-
>  1 file changed, 26 insertions(+), 33 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index a67036f92e0..744645d8bad 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -3248,23 +3248,15 @@ reduction_fn_for_scalar_code (enum tree_code code, 
> internal_fn *reduc_fn)
>  }
>  }
>
> -/* If there is a neutral value X such that SLP reduction NODE would not
> -   be affected by the introduction of additional X elements, return that X,
> -   otherwise return null.  CODE is the code of the reduction and VECTOR_TYPE
> -   is the vector type that would hold element X.  REDUC_CHAIN is true if
> -   the SLP statements perform a single reduction, false if each statement
> -   performs an independent reduction.  */
> +/* If there is a neutral value X such that a reduction would not be affected
> +   by the introduction of additional X elements, return that X, otherwise
> +   return null.  CODE is the code of the reduction and SCALAR_TYPE is type
> +   of the scalar elements.  If the reduction has just a single initial value
> +   then INITIAL_VALUE is that value, otherwise it is null.  */
>
>  static tree
> -neutral_op_for_slp_reduction (slp_tree slp_node, tree vector_type,
> - tree_code code, bool reduc_chain)
> +neutral_op_for_reduction (tree scalar_type, tree_code code, tree 
> initial_value)
>  {
> -  vec stmts = SLP_TREE_SCALAR_STMTS (slp_node);
> -  stmt_vec_info stmt_vinfo = stmts[0];
> -  tree scalar_type = TREE_TYPE (vector_type);
> -  class loop *loop = gimple_bb (stmt_vinfo->stmt)->loop_father;
> -  gcc_assert (loop);
> -
>switch (code)
>  {
>  case WIDEN_SUM_EXPR:
> @@ -3284,12 +3276,7 @@ neutral_op_for_slp_reduction (slp_tree slp_node, tree 
> vector_type,
>
>  case MAX_EXPR:
>  case MIN_EXPR:
> -  /* For MIN/MAX the initial values are neutral.  A reduction chain
> -has only a single initial value, so that value is neutral for
> -all statements.  */
> -  if (reduc_chain)
> -   return vect_phi_initial_value (stmt_vinfo);
> -  return NULL_TREE;
> +  return initial_value;
>
>  default:
>return NULL_TREE;
> @@ -5535,10 +5522,11 @@ vect_create_epilog_for_reduction (loop_vec_info 
> loop_vinfo,
>tree neutral_op = NULL_TREE;
>if (slp_node)
> {
> - stmt_vec_info first = REDUC_GROUP_FIRST_ELEMENT (stmt_info);
> - neutral_op
> -   = neutral_op_for_slp_reduction (slp_node_instance->reduc_phis,
> -   vectype, code, first != NULL);
> + tree initial_value = NULL_TREE;
> + if (REDUC_GROUP_FIRST_ELEMENT (stmt_info))
> +   initial_value = vect_phi_initial_value (orig_phis[0]);
> + neutral_op = neutral_op_for_reduction (TREE_TYPE (vectype), code,
> +initial_value);
> }
>if (neutral_op)
> vector_identity = gimple_build_vector_from_val (&seq, vectype,
> @@ -6935,9 +6923,13 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>/* For SLP reductions, see if there is a neutral value we can use.  */
>tree neutral_op = NULL_TREE;
>if (slp_node)
> -neutral_op = neutral_op_for_slp_reduction
> -  (slp_node_instance->reduc_phis, vectype_out, orig_code,
> -   REDUC_GROUP_FIRST_ELEMENT (stmt_info) != NULL);
> +{
> +  tree initial_value = NULL_TREE;
> +  if (REDUC_GROUP_FIRST_ELEMENT (stmt_info) != NULL)
> +   initial_value = vect_phi_initial_value (reduc_def_phi);
> +  neutral_op = neutral_op_for_reduction (TREE_TYPE (vectype_out),
> +orig_code, initial_value);
> +}
>
>if (double_reduc && reduction_type == FOLD_LEFT_REDUCTION)
>  {
> @@ -7501,15 +7493,16 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo,
>else
> {
>   gcc_assert (slp_node == slp_node_instance->reduc_phis);
> - stmt_vec_info first = REDUC_GROUP_FIRST_ELEMENT (reduc_stmt_info);
> - tree neutral_op
> - = neutral_op_for_slp_reduction (slp_node, vectype_out,
> - STMT_VINFO_REDUC_CODE 
> (reduc_info),
> -  

Re: [PATCH 09/10] vect: Simplify get_initial_def_for_reduction

2021-07-08 Thread Richard Biener via Gcc-patches
On Thu, Jul 8, 2021 at 2:49 PM Richard Sandiford via Gcc-patches
 wrote:
>
> After previous patches, we can now easily provide the neutral op
> as an argument to get_initial_def_for_reduction.  This in turn
> allows the adjustment calculation to be moved outside of
> get_initial_def_for_reduction, which is the main motivation
> of the patch.

OK.

> gcc/
> * tree-vect-loop.c (get_initial_def_for_reduction): Remove
> adjustment handling.  Take the neutral value as an argument,
> in place of the code argument.
> (vect_transform_cycle_phi): Update accordingly.  Handle the
> initial values of cond reductions separately from code reductions.
> Choose the adjustment here rather than in
> get_initial_def_for_reduction.  Sink the splat of vec_initial_def.
> ---
>  gcc/tree-vect-loop.c | 177 +++
>  1 file changed, 59 insertions(+), 118 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 744645d8bad..fe7e73f655f 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -4614,57 +4614,26 @@ vect_model_reduction_cost (loop_vec_info loop_vinfo,
> Input:
> REDUC_INFO - the info_for_reduction
> INIT_VAL - the initial value of the reduction variable
> +   NEUTRAL_OP - a value that has no effect on the reduction, as per
> +   neutral_op_for_reduction
>
> Output:
> -   ADJUSTMENT_DEF - a tree that holds a value to be added to the final result
> -of the reduction (used for adjusting the epilog - see below).
> Return a vector variable, initialized according to the operation that
> STMT_VINFO performs. This vector will be used as the initial value
> of the vector of partial results.
>
> -   Option1 (adjust in epilog): Initialize the vector as follows:
> - add/bit or/xor:[0,0,...,0,0]
> - mult/bit and:  [1,1,...,1,1]
> - min/max/cond_expr: [init_val,init_val,..,init_val,init_val]
> -   and when necessary (e.g. add/mult case) let the caller know
> -   that it needs to adjust the result by init_val.
> -
> -   Option2: Initialize the vector as follows:
> - add/bit or/xor:[init_val,0,0,...,0]
> - mult/bit and:  [init_val,1,1,...,1]
> - min/max/cond_expr: [init_val,init_val,...,init_val]
> -   and no adjustments are needed.
> -
> -   For example, for the following code:
> -
> -   s = init_val;
> -   for (i=0;i<n;i++)
> -     s = s + a[i];
> -
> -   STMT_VINFO is 's = s + a[i]', and the reduction variable is 's'.
> -   For a vector of 4 units, we want to return either [0,0,0,init_val],
> -   or [0,0,0,0] and let the caller know that it needs to adjust
> -   the result at the end by 'init_val'.
> -
> -   FORNOW, we are using the 'adjust in epilog' scheme, because this way the
> -   initialization vector is simpler (same element in all entries), if
> -   ADJUSTMENT_DEF is not NULL, and Option2 otherwise.
> -
> -   A cost model should help decide between these two schemes.  */
> +   The value we need is a vector in which element 0 has value INIT_VAL
> +   and every other element has value NEUTRAL_OP.  */
>
>  static tree
>  get_initial_def_for_reduction (loop_vec_info loop_vinfo,
>stmt_vec_info reduc_info,
> -  enum tree_code code, tree init_val,
> -   tree *adjustment_def)
> +  tree init_val, tree neutral_op)
>  {
>class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>tree scalar_type = TREE_TYPE (init_val);
>tree vectype = get_vectype_for_scalar_type (loop_vinfo, scalar_type);
> -  tree def_for_init;
>tree init_def;
> -  REAL_VALUE_TYPE real_init_val = dconst0;
> -  int int_init_val = 0;
>gimple_seq stmts = NULL;
>
>gcc_assert (vectype);
> @@ -4675,75 +4644,34 @@ get_initial_def_for_reduction (loop_vec_info 
> loop_vinfo,
>gcc_assert (nested_in_vect_loop_p (loop, reduc_info)
>   || loop == (gimple_bb (reduc_info->stmt))->loop_father);
>
> -  /* ADJUSTMENT_DEF is NULL when called from
> - vect_create_epilog_for_reduction to vectorize double reduction.  */
> -  if (adjustment_def)
> -*adjustment_def = NULL;
> -
> -  switch (code)
> +  if (operand_equal_p (init_val, neutral_op))
>  {
> -case WIDEN_SUM_EXPR:
> -case DOT_PROD_EXPR:
> -case SAD_EXPR:
> -case PLUS_EXPR:
> -case MINUS_EXPR:
> -case BIT_IOR_EXPR:
> -case BIT_XOR_EXPR:
> -case MULT_EXPR:
> -case BIT_AND_EXPR:
> -  {
> -if (code == MULT_EXPR)
> -  {
> -real_init_val = dconst1;
> -int_init_val = 1;
> -  }
> -
> -if (code == BIT_AND_EXPR)
> -  int_init_val = -1;
> -
> -if (SCALAR_FLOAT_TYPE_P (scalar_type))
> -  def_for_init = build_real (scalar_type, real_init_val);
> -else
> -  def_for_init = build_int_cst (scalar_type, int_init_val);
> -
> -   if (adjustment_

Re: [PATCH] ipa-sra: Fix thinko when overriding safe_to_import_accesses (PR 101066)

2021-07-08 Thread Jan Hubicka
Hi,
> 2021-06-16  Martin Jambor  
> 
>   PR ipa/101066
>   * ipa-sra.c (class isra_call_summary): New member
>   m_before_any_store, initialize it in the constructor.
>   (isra_call_summary::dump): Dump the new field.
>   (ipa_sra_call_summaries::duplicate): Copy it.
>   (process_scan_results): Set it.
>   (isra_write_edge_summary): Stream it.
>   (isra_read_edge_summary): Likewise.
>   (param_splitting_across_edge): Only override
>   safe_to_import_accesses if m_before_any_store is set.
> 
> gcc/testsuite/ChangeLog:
> 
> 2021-06-16  Martin Jambor  
> 
>   PR ipa/101066
>   * gcc.dg/ipa/pr101066.c: New test.
OK, thanks!

The analysis disabling the transformation on any memory store is overly
conservative.  We have a pointer (which is a parameter and comes from
the outer world) and no type information; however, the alias oracle will
still be able to disambiguate when a memory access is to non-escaping
local memory or a malloc-allocated memory block, etc.
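
A minimal, made-up example of a store the alias oracle can still see through:

  /* Illustration only: the store through 'tmp' is to local, non-escaping
     memory, so it cannot clobber whatever the parameter pointer refers to,
     even though the current "any store" rule would give up here.  */
  int f (int *p)
  {
    int tmp[2];      /* local array whose address never escapes */
    tmp[0] = 42;     /* store that cannot alias *p */
    return *p + tmp[0];
  }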

Honza


Re: [patch][version 4]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-07-08 Thread Martin Jambor
Hi,

On Wed, Jul 07 2021, Qing Zhao via Gcc-patches wrote:
> Hi, 
>
> This is the 4th version of the patch for the new security feature for GCC.

I have been following the threads about this feature only very lightly,
so please accept my apologies if my comments are about something which
has been already discussed, but...

[...]

> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index c05d22f3e8f1..35051d7c6b96 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -384,6 +384,13 @@ static struct
>  
>/* Numbber of components created when splitting aggregate parameters.  */
>int param_reductions_created;
> +
> +  /* Number of deferred_init calls that are modified.  */
> +  int deferred_init;
> +
> +  /* Number of deferred_init calls that are created by
> + generate_subtree_deferred_init.  */
> +  int subtree_deferred_init;
>  } sra_stats;
>  
>  static void
> @@ -4096,6 +4103,110 @@ get_repl_default_def_ssa_name (struct access *racc, 
> tree reg_type)
>return get_or_create_ssa_default_def (cfun, racc->replacement_decl);
>  }
>  
> +
> +/* Generate statements to call .DEFERRED_INIT to initialize scalar 
> replacements
> +   of accesses within a subtree ACCESS; all its children, siblings and their
> +   children are to be processed.
> +   GSI is a statement iterator used to place the new statements.  */
> +static void
> +generate_subtree_deferred_init (struct access *access,
> + tree init_type,
> + tree is_vla,
> + gimple_stmt_iterator *gsi,
> + location_t loc)
> +{
> +  do
> +{
> +  if (access->grp_to_be_replaced)
> + {
> +   tree repl = get_access_replacement (access);
> +   gimple *call
> + = gimple_build_call_internal (IFN_DEFERRED_INIT, 3,
> +   TYPE_SIZE_UNIT (TREE_TYPE (repl)),
> +   init_type, is_vla);
> +   gimple_call_set_lhs (call, repl);
> +   gsi_insert_before (gsi, call, GSI_SAME_STMT);
> +   update_stmt (call);
> +   gimple_set_location (call, loc);
> +   sra_stats.subtree_deferred_init++;
> + }
> +  else if (access->grp_to_be_debug_replaced)
> + {
> +   tree drepl = get_access_replacement (access);
> +   tree call = build_call_expr_internal_loc
> +  (UNKNOWN_LOCATION, IFN_DEFERRED_INIT,
> +   TREE_TYPE (drepl), 3,
> +   TYPE_SIZE_UNIT (TREE_TYPE (drepl)),
> +   init_type, is_vla);
> +   gdebug *ds = gimple_build_debug_bind (drepl, call,
> + gsi_stmt (*gsi));
> +   gsi_insert_before (gsi, ds, GSI_SAME_STMT);

Is handling of grp_to_be_debug_replaced accesses necessary here?  If so,
why?  grp_to_be_debug_replaced accesses are there only to facilitate
debug information about a part of an aggregate decl that is likely
going to be entirely removed - so that debuggers can sometimes show
users information about what it would have contained had it not been removed.
It seems strange you need to mark them as uninitialized because they
should not have any consumers.  (But perhaps it is also harmless.)

On a related note, if the intent of the feature is for optimizers to
behave (almost?) as if it was not taking place, I believe you need to
handle specially, and probably just ignore, calls to IFN_DEFERRED_INIT
in scan_function in tree-sra.c.  Otherwise the generated SRA access
structures will have extra write flags turned on in them and that will
lead to different behavior of the pass.

Martin



> + }
> +  if (access->first_child)
> + generate_subtree_deferred_init (access->first_child, init_type,
> + is_vla, gsi, loc);
> +
> +  access = access ->next_sibling;
> +}
> +  while (access);
> +}
> +
> +/* For a call to .DEFERRED_INIT:
> +   var = .DEFERRED_INIT (size_of_var, init_type, is_vla);
> +   examine the LHS variable VAR and replace it with a scalar replacement if
> +   there is one, also replace the RHS call to a call to .DEFERRED_INIT of
> +   the corresponding scalar relacement variable.  Examine the subtree and
> +   do the scalar replacements in the subtree too.  STMT is the call, GSI is
> +   the statment iterator to place newly created statement.  */
> +
> +static enum assignment_mod_result
> +sra_modify_deferred_init (gimple *stmt, gimple_stmt_iterator *gsi)
> +{
> +  tree lhs = gimple_call_lhs (stmt);
> +  tree init_type = gimple_call_arg (stmt, 1);
> +  tree is_vla = gimple_call_arg (stmt, 2);
> +
> +  struct access *lhs_access = get_access_for_expr (lhs);
> +  if (!lhs_access)
> +return SRA_AM_NONE;
> +
> +  location_t loc = gimple_location (stmt);
> +
> +  if (lhs_access->grp_to_be_replaced)
> +{
> +  tree lhs_repl = get_access_replacement (lhs_access);
> +  gimple_call_set_lhs (stmt, lhs_repl);
> +  tree arg0_repl = TYPE_SIZE_UNIT (TREE_T

Re: [PATCH] c++: Fix noexcept with unevaluated operand [PR101087]

2021-07-08 Thread Jason Merrill via Gcc-patches

On 7/7/21 9:40 PM, Marek Polacek wrote:

It sounds plausible that this assert

   int f();
   static_assert(noexcept(sizeof(f())));

should pass: sizeof produces a std::size_t and its operand is not
evaluated, so it can't throw.  noexcept should only evaluate to
false for potentially evaluated operands.  Therefore I think that
check_noexcept_r shouldn't walk into operands of sizeof/decltype/
alignof/typeof.  Only checking cp_unevaluated_operand therein does
not work, because expr_noexcept_p can be called in an unevaluated
context, so I resorted to the following cp_evaluated hack.  Does
that seem acceptable?
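
Put another way (an illustrative contrast, not part of the testcase):

   int f();                                   // potentially-throwing: no noexcept
   static_assert(!noexcept(f()), "");         // the call is potentially evaluated
   static_assert(noexcept(sizeof(f())), "");  // the operand of sizeof is unevaluated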


I suppose, but why not check for SIZEOF_EXPR/ALIGNOF_EXPR/NOEXCEPT_EXPR 
directly?



Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR c++/101087

gcc/cp/ChangeLog:

* except.c (check_noexcept_r): Don't walk into unevaluated
operands.
(expr_noexcept_p): Use cp_evaluated.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept70.C: New test.
---
  gcc/cp/except.c | 14 +++---
  gcc/testsuite/g++.dg/cpp0x/noexcept70.C |  5 +
  2 files changed, 16 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept70.C

diff --git a/gcc/cp/except.c b/gcc/cp/except.c
index a8cea53cf91..6f97ac40b4b 100644
--- a/gcc/cp/except.c
+++ b/gcc/cp/except.c
@@ -1033,12 +1033,15 @@ check_handlers (tree handlers)
   expression whose type is a polymorphic class type (10.3).  */
  
  static tree

-check_noexcept_r (tree *tp, int * /*walk_subtrees*/, void * /*data*/)
+check_noexcept_r (tree *tp, int *walk_subtrees, void *)
  {
tree t = *tp;
enum tree_code code = TREE_CODE (t);
-  if ((code == CALL_EXPR && CALL_EXPR_FN (t))
-  || code == AGGR_INIT_EXPR)
+
+  if (cp_unevaluated_operand)
+*walk_subtrees = false;
+  else if ((code == CALL_EXPR && CALL_EXPR_FN (t))
+  || code == AGGR_INIT_EXPR)
  {
/* We can only use the exception specification of the called function
 for determining the value of a noexcept expression; we can't use
@@ -1155,6 +1158,11 @@ expr_noexcept_p (tree expr, tsubst_flags_t complain)
if (expr == error_mark_node)
  return false;
  
+  /* Even though the operand of noexcept is an _unevaluated_ operand,

+ temporarily clearing cp_unevaluated_operand allows us to check it
+ in check_noexcept_r, to handle noexcept(sizeof(f())).  It could be
+ set when we are called in the context of synthesized_method_walk.  */
+  cp_evaluated ev;
fn = cp_walk_tree_without_duplicates (&expr, check_noexcept_r, 0);
if (fn)
  {
diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept70.C 
b/gcc/testsuite/g++.dg/cpp0x/noexcept70.C
new file mode 100644
index 000..45a6137dd6f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/noexcept70.C
@@ -0,0 +1,5 @@
+// PR c++/101087
+// { dg-do compile { target c++11 } }
+
+int f();
+static_assert(noexcept(sizeof(f())), "");

base-commit: a110855667782dac7b674d3e328b253b3b3c919b





Re: [PATCH] c++: Fix noexcept with unevaluated operand [PR101087]

2021-07-08 Thread Marek Polacek via Gcc-patches
On Thu, Jul 08, 2021 at 09:30:27AM -0400, Jason Merrill wrote:
> On 7/7/21 9:40 PM, Marek Polacek wrote:
> > It sounds plausible that this assert
> > 
> >int f();
> >static_assert(noexcept(sizeof(f())));
> > 
> > should pass: sizeof produces a std::size_t and its operand is not
> > evaluated, so it can't throw.  noexcept should only evaluate to
> > false for potentially evaluated operands.  Therefore I think that
> > check_noexcept_r shouldn't walk into operands of sizeof/decltype/
> > alignof/typeof.  Only checking cp_unevaluated_operand therein does
> > not work, because expr_noexcept_p can be called in an unevaluated
> > context, so I resorted to the following cp_evaluated hack.  Does
> > that seem acceptable?
> 
> I suppose, but why not check for SIZEOF_EXPR/ALIGNOF_EXPR/NOEXCEPT_EXPR
> directly?

I thought I would, but then it occurred to me that it might be better to
rely on cp_walk_subtrees which ++/--s cp_unevaluated_operand for those
codes.  I'd be happy to change the patch to check those codes directly;
maybe I'm overthinking things here.

--
Marek Polacek • Red Hat, Inc. • 300 A St, Boston, MA



Re: PING 2 [PATCH] correct handling of variable offset minus constant in -Warray-bounds (PR 100137)

2021-07-08 Thread Christophe Lyon via Gcc-patches
On Thu, 8 Jul 2021 at 12:42, Andreas Schwab  wrote:
>
> On Jul 07 2021, Marek Polacek via Gcc-patches wrote:
>
> > On Wed, Jul 07, 2021 at 02:38:11PM -0600, Martin Sebor via Gcc-patches 
> > wrote:
> >> I certainly will.  Pushed in r12-2132.
> >
> > I think this patch breaks bootstrap on x86_64:
>
> It also breaks bootstrap on aarch64 and ia64 in stage2.
>
> In file included from ../../gcc/c-family/c-common.h:26,
>  from ../../gcc/cp/cp-tree.h:40,
>  from ../../gcc/cp/module.cc:209:
> In function 'tree_node* identifier(const cpp_hashnode*)',
> inlined from 'bool module_state::read_macro_maps()' at 
> ../../gcc/cp/module.cc:16305:10:
> ../../gcc/tree.h:1089:58: error: array subscript -1 is outside array bounds 
> of 'cpp_hashnode [288230376151711743]' [-Werror=array-bounds]
>  1089 |   ((tree) ((char *) (NODE) - sizeof (struct tree_common)))
>   |  ^
> ../../gcc/cp/module.cc:277:10: note: in expansion of macro 
> 'HT_IDENT_TO_GCC_IDENT'
>   277 |   return HT_IDENT_TO_GCC_IDENT (HT_NODE (const_cast 
> (node)));
>   |  ^
> In file included from ../../gcc/tree.h:23,
>  from ../../gcc/c-family/c-common.h:26,
>  from ../../gcc/cp/cp-tree.h:40,
>  from ../../gcc/cp/module.cc:209:
> ../../gcc/tree-core.h: In member function 'bool 
> module_state::read_macro_maps()':
> ../../gcc/tree-core.h:1445:24: note: at offset -24 into object 
> 'tree_identifier::id' of size 16
>  1445 |   struct ht_identifier id;
>   |^~
>
> Andreas.
>

on arm-linux-gnueabi, it breaks in:
libatomic/config/linux/arm/host-config.h:42:34: error: array subscript
0 is outside array bounds of 'unsigned int[0]' [-Werror=array-bounds]

Christophe

> --
> Andreas Schwab, sch...@linux-m68k.org
> GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
> "And now for something completely different."


[Ada] Simplify string manipulation related to preprocessing

2021-07-08 Thread Pierre-Marie de Rodat
Code cleanup; semantics is unaffected.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sinput-l.adb (Load_File): Simplify foreword manipulation with
concatenation; similar for filename with preprocessed output.diff --git a/gcc/ada/sinput-l.adb b/gcc/ada/sinput-l.adb
--- a/gcc/ada/sinput-l.adb
+++ b/gcc/ada/sinput-l.adb
@@ -551,19 +551,10 @@ package body Sinput.L is
 Set_Source_File_Index_Table (X);
 
 if Opt.List_Preprocessing_Symbols then
-   Get_Name_String (N);
-
declare
-  Foreword : String (1 .. Foreword_Start'Length +
-  Name_Len + Foreword_End'Length);
-
+  Foreword : constant String :=
+Foreword_Start & Get_Name_String (N) & Foreword_End;
begin
-  Foreword (1 .. Foreword_Start'Length) := Foreword_Start;
-  Foreword (Foreword_Start'Length + 1 ..
-  Foreword_Start'Length + Name_Len) :=
-Name_Buffer (1 .. Name_Len);
-  Foreword (Foreword'Last - Foreword_End'Length + 1 ..
-  Foreword'Last) := Foreword_End;
   Prep.List_Symbols (Foreword);
end;
 end if;
@@ -654,14 +645,13 @@ package body Sinput.L is
 NB : Integer;
 Status : Boolean;
 
- begin
-Get_Name_String (N);
-Add_Str_To_Name_Buffer (Prep_Suffix);
+Prep_Filename : constant String :=
+  Get_Name_String (N) & Prep_Suffix;
 
-Delete_File (Name_Buffer (1 .. Name_Len), Status);
+ begin
+Delete_File (Prep_Filename, Status);
 
-FD :=
-  Create_New_File (Name_Buffer (1 .. Name_Len), Text);
+FD := Create_New_File (Prep_Filename, Text);
 
 Status := FD /= Invalid_FD;
 




[Ada] Avoid linear search when ensuring dependency on System

2021-07-08 Thread Pierre-Marie de Rodat
Replace a linear search with a hash table query.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* lib-writ.adb (Ensure_System_Dependency): Replace search in
Lib.Units with a search in Lib.Unit_Names.diff --git a/gcc/ada/lib-writ.adb b/gcc/ada/lib-writ.adb
--- a/gcc/ada/lib-writ.adb
+++ b/gcc/ada/lib-writ.adb
@@ -137,7 +137,8 @@ package body Lib.Writ is
--
 
procedure Ensure_System_Dependency is
-  System_Uname : Unit_Name_Type;
+  System_Uname : constant Unit_Name_Type :=
+Name_To_Unit_Name (Name_System);
   --  Unit name for system spec if needed for dummy entry
 
   System_Fname : File_Name_Type;
@@ -146,11 +147,9 @@ package body Lib.Writ is
begin
   --  Nothing to do if we already compiled System
 
-  for Unum in Units.First .. Last_Unit loop
- if Source_Index (Unum) = System_Source_File_Index then
-return;
- end if;
-  end loop;
+  if Unit_Names.Get (System_Uname) /= No_Unit then
+ return;
+  end if;
 
   --  If no entry for system.ads in the units table, then add a entry
   --  to the units table for system.ads, which will be referenced when
@@ -158,7 +157,6 @@ package body Lib.Writ is
   --  on system as a result of Targparm scanning the system.ads file to
   --  determine the target dependent parameters for the compilation.
 
-  System_Uname := Name_To_Unit_Name (Name_System);
   System_Fname := File_Name (System_Source_File_Index);
 
   Units.Increment_Last;




[Ada] Make tools compatible with No_Dynamic_Accessibility_Checks

2021-07-08 Thread Pierre-Marie de Rodat
To help experiment with this new model.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* make.adb, osint.adb: Make code compatible with
No_Dynamic_Accessibility_Checks restriction.diff --git a/gcc/ada/make.adb b/gcc/ada/make.adb
--- a/gcc/ada/make.adb
+++ b/gcc/ada/make.adb
@@ -2364,7 +2364,7 @@ package body Make is
 Osint.Full_Source_Name
   (Source.File,
Full_File => Full_Source_File,
-   Attr  => Source_File_Attr'Access);
+   Attr  => Source_File_Attr'Unchecked_Access);
 
 Lib_File := Osint.Lib_File_Name (Source.File, Source.Index);
 
@@ -2392,7 +2392,7 @@ package body Make is
   Get_Name_String (Full_Lib_File);
   Name_Buffer (Name_Len + 1) := ASCII.NUL;
   Read_Only := not Is_Writable_File
-(Name_Buffer'Address, Lib_File_Attr'Access);
+(Name_Buffer'Address, Lib_File_Attr'Unchecked_Access);
else
   Read_Only := False;
end if;
@@ -2460,7 +2460,7 @@ package body Make is
  The_Args   => Args,
  Lib_File   => Lib_File,
  Full_Lib_File  => Full_Lib_File,
- Lib_File_Attr  => Lib_File_Attr'Access,
+ Lib_File_Attr  => Lib_File_Attr'Unchecked_Access,
  Read_Only  => Read_Only,
  ALI=> ALI,
  O_File => Obj_File,
@@ -2630,7 +2630,8 @@ package body Make is
 
   Text :=
 Read_Library_Info_From_Full
-  (Data.Full_Lib_File, Data.Lib_File_Attr'Access);
+  (Data.Full_Lib_File,
+   Data.Lib_File_Attr'Unchecked_Access);
 
   --  Restore Check_Object_Consistency to its initial value
 


diff --git a/gcc/ada/osint.adb b/gcc/ada/osint.adb
--- a/gcc/ada/osint.adb
+++ b/gcc/ada/osint.adb
@@ -1915,7 +1915,8 @@ package body Osint is
   begin
  if Opt.Look_In_Primary_Dir then
 Locate_File
-  (N, Source, Primary_Directory, File_Name, File, Attr'Access);
+  (N, Source, Primary_Directory, File_Name, File,
+   Attr'Unchecked_Access);
 
 if File /= No_File and then T = File_Stamp (N) then
return File;
@@ -1925,7 +1926,7 @@ package body Osint is
  Last_Dir := Src_Search_Directories.Last;
 
  for D in Primary_Directory + 1 .. Last_Dir loop
-Locate_File (N, Source, D, File_Name, File, Attr'Access);
+Locate_File (N, Source, D, File_Name, File, Attr'Unchecked_Access);
 
 if File /= No_File and then T = File_Stamp (File) then
return File;




[Ada] Revert meaning of -gnatd_b

2021-07-08 Thread Pierre-Marie de Rodat
As part of experimenting with No_Dynamic_Accessibility_Checks, it seems
that reverting the meaning of -gnatd_b is a better default for this
experiment.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* debug.adb, sem_util.adb: Revert meaning of -gnatd_b.
* sem_res.adb: Minor reformatting.diff --git a/gcc/ada/debug.adb b/gcc/ada/debug.adb
--- a/gcc/ada/debug.adb
+++ b/gcc/ada/debug.adb
@@ -140,7 +140,7 @@ package body Debug is
--  d.Z  Do not enable expansion in configurable run-time mode
 
--  d_a  Stop elaboration checks on accept or select statement
-   --  d_b  Use compatibility model under No_Dynamic_Accessibility_Checks
+   --  d_b  Use designated type model under No_Dynamic_Accessibility_Checks
--  d_c  CUDA compilation : compile for the host
--  d_d
--  d_e  Ignore entry calls and requeue statements for elaboration
@@ -956,6 +956,10 @@ package body Debug is
--   behavior is similar to that of No_Entry_Calls_In_Elaboration_Code,
--   but does not penalize actual entry calls in elaboration code.
 
+   --  d_b  When the restriction No_Dynamic_Accessibility_Checks is enabled,
+   --   use the simple "designated type" accessibility model, instead of
+   --   using the implicit level of the anonymous access type declaration.
+
--  d_e  The compiler ignores simple entry calls, asynchronous transfer of
--   control, conditional entry calls, timed entry calls, and requeue
--   statements in both the static and dynamic elaboration models.


diff --git a/gcc/ada/sem_res.adb b/gcc/ada/sem_res.adb
--- a/gcc/ada/sem_res.adb
+++ b/gcc/ada/sem_res.adb
@@ -13738,8 +13738,7 @@ package body Sem_Res is
 Deepest_Type_Access_Level (Target_Type)
   and then (Nkind (Associated_Node_For_Itype (Opnd_Type)) /=
  N_Function_Specification
-or else Ekind (Target_Type) in
-  Anonymous_Access_Kind)
+or else Ekind (Target_Type) in Anonymous_Access_Kind)
 
   --  Check we are not in a return value ???
 


diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -410,17 +410,18 @@ package body Sem_Util is
   and then No_Dynamic_Accessibility_Checks_Enabled (N)
   and then Is_Anonymous_Access_Type (Etype (N))
 then
-   --  In the alternative model the level is that of the subprogram
+   --  In the alternative model the level is that of the
+   --  designated type.
 
if Debug_Flag_Underscore_B then
+  return Make_Level_Literal (Typ_Access_Level (Etype (N)));
+
+   --  Otherwise the level is that of the subprogram
+
+   else
   return Make_Level_Literal
(Subprogram_Access_Level (Current_Subprogram));
end if;
-
-   --  Otherwise the level is that of the designated type
-
-   return Make_Level_Literal
-(Typ_Access_Level (Etype (N)));
 end if;
 
 if Nkind (N) = N_Function_Call then
@@ -659,24 +660,22 @@ package body Sem_Util is
if Allow_Alt_Model
  and then No_Dynamic_Accessibility_Checks_Enabled (E)
then
-  --  In the alternative model the level depends on the
-  --  entity's context.
+  --  In the alternative model the level is that of the
+  --  designated type entity's context.
 
   if Debug_Flag_Underscore_B then
- if Is_Formal (E) then
-return Make_Level_Literal
- (Subprogram_Access_Level
-   (Enclosing_Subprogram (E)));
- end if;
+ return Make_Level_Literal (Typ_Access_Level (Etype (E)));
+
+  --  Otherwise the level depends on the entity's context
 
+  elsif Is_Formal (E) then
+ return Make_Level_Literal
+  (Subprogram_Access_Level
+(Enclosing_Subprogram (E)));
+  else
  return Make_Level_Literal
   (Scope_Depth (Enclosing_Dynamic_Scope (E)));
   end if;
-
-  --  Otherwise the level is that of the designated type
-
-  return Make_Level_Literal
-   (Typ_Access_Level (Etype (E)));
end if;
 
--  Return the dynamic level in the normal case
@@ -701,10 +700,11 @@ package body Sem_Util is
 
 elsif Is_Type (E) then
--  When restriction No_Dynamic_Accessibility_Checks is active
+   --  along with -gnatd_b.
 

[Ada] Incorrect iteration over hashed containers after multiple Inserts

2021-07-08 Thread Pierre-Marie de Rodat
Cursors for Hashed maps and hashed sets include a component that speeds
up iteration over these containers. However, in the presence of multiple
insertions into the corresponding hash-tables, this component may become
unreliable when a cursor obtained before an iteration is compared with a
cursor denoting the same element but obtained during a loop over the
container. To prevent these anomalies, we introduce an explicit equality
operator for the corresponding Cursor types, which ignores the
additional component. This patch assumes that the mention of
"predefined" equality in the sections of the RM that discuss these
cursors is in fact an overspecification.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* libgnat/a-cohama.ads: Introduce an equality operator over
cursors.
* libgnat/a-cohase.ads: Ditto.
* libgnat/a-cohama.adb: Add body for "=" over cursors.
(Insert): Do not set the Position component of the cursor that
denotes the inserted element.
* libgnat/a-cohase.adb: Ditto.diff --git a/gcc/ada/libgnat/a-cohama.adb b/gcc/ada/libgnat/a-cohama.adb
--- a/gcc/ada/libgnat/a-cohama.adb
+++ b/gcc/ada/libgnat/a-cohama.adb
@@ -116,6 +116,13 @@ is
-- "=" --
-
 
+   function "=" (Left, Right : Cursor) return Boolean is
+   begin
+  return
+   Left.Container = Right.Container
+ and then Left.Node = Right.Node;
+   end "=";
+
function "=" (Left, Right : Map) return Boolean is
begin
   return Is_Equal (Left.HT, Right.HT);
@@ -636,7 +643,11 @@ is
   end if;
 
   Position.Container := Container'Unrestricted_Access;
-  Position.Position := HT_Ops.Index (HT, Position.Node);
+
+  --  Note that we do not set the Position component of the cursor,
+  --  because it may become incorrect on subsequent insertions/deletions
+  --  from the container. This will lose some optimizations but prevents
+  --  anomalies when the underlying hash-table is expanded or shrunk.
end Insert;
 
procedure Insert
@@ -679,7 +690,6 @@ is
   end if;
 
   Position.Container := Container'Unrestricted_Access;
-  Position.Position := HT_Ops.Index (HT, Position.Node);
end Insert;
 
procedure Insert


diff --git a/gcc/ada/libgnat/a-cohama.ads b/gcc/ada/libgnat/a-cohama.ads
--- a/gcc/ada/libgnat/a-cohama.ads
+++ b/gcc/ada/libgnat/a-cohama.ads
@@ -110,6 +110,14 @@ is
type Cursor is private;
pragma Preelaborable_Initialization (Cursor);
 
+   function "=" (Left, Right : Cursor) return Boolean;
+   --  The representation of cursors includes a component used to optimize
+   --  iteration over maps. This component may become unreliable after
+   --  multiple map insertions, and must be excluded from cursor equality,
+   --  so we need to provide an explicit definition for it, instead of
+   --  using predefined equality (as implied by a questionable comment
+   --  in the RM).
+
Empty_Map : constant Map;
--  Map objects declared without an initialization expression are
--  initialized to the value Empty_Map.


diff --git a/gcc/ada/libgnat/a-cohase.adb b/gcc/ada/libgnat/a-cohase.adb
--- a/gcc/ada/libgnat/a-cohase.adb
+++ b/gcc/ada/libgnat/a-cohase.adb
@@ -145,6 +145,13 @@ is
-- "=" --
-
 
+   function "=" (Left, Right : Cursor) return Boolean is
+   begin
+  return
+   Left.Container = Right.Container
+ and then Left.Node = Right.Node;
+   end "=";
+
function "=" (Left, Right : Set) return Boolean is
begin
   return Is_Equal (Left.HT, Right.HT);
@@ -763,11 +770,14 @@ is
   Position  : out Cursor;
   Inserted  : out Boolean)
is
-  HT : Hash_Table_Type renames Container'Unrestricted_Access.HT;
begin
   Insert (Container.HT, New_Item, Position.Node, Inserted);
   Position.Container := Container'Unchecked_Access;
-  Position.Position := HT_Ops.Index (HT, Position.Node);
+
+  --  Note that we do not set the Position component of the cursor,
+  --  because it may become incorrect on subsequent insertions/deletions
+  --  from the container. This will lose some optimizations but prevents
+  --  anomalies when the underlying hash-table is expanded or shrunk.
end Insert;
 
procedure Insert


diff --git a/gcc/ada/libgnat/a-cohase.ads b/gcc/ada/libgnat/a-cohase.ads
--- a/gcc/ada/libgnat/a-cohase.ads
+++ b/gcc/ada/libgnat/a-cohase.ads
@@ -69,6 +69,15 @@ is
type Cursor is private;
pragma Preelaborable_Initialization (Cursor);
 
+   function "=" (Left, Right : Cursor) return Boolean;
+   --  The representation of cursors includes a component used to optimize
+   --  iteration over sets. This component may become unreliable after
+   --  multiple set insertions, and must be excluded from cursor equality,
+   --  so we need to provide an explicit definition for it, instead of
+   --  using predefined equality (as implied by a questionable comment
+   --  in the RM). This is also the c

[Ada] Add No_Tasking restriction is system.ads for bootstrap

2021-07-08 Thread Pierre-Marie de Rodat
Make it explicit that tasking is not used in the compiler, which also
allows generating simpler and more efficient code.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* gcc-interface/system.ads: Add No_Tasking restriction.diff --git a/gcc/ada/gcc-interface/system.ads b/gcc/ada/gcc-interface/system.ads
--- a/gcc/ada/gcc-interface/system.ads
+++ b/gcc/ada/gcc-interface/system.ads
@@ -50,6 +50,10 @@ pragma Restrictions (No_Finalization);
 --  access type on incomplete type Perm_Tree_Wrapper (which is required for
 --  defining a recursive type).
 
+pragma Restrictions (No_Tasking);
+--  Make it explicit that tasking is not used in the compiler, which also
+--  allows generating simpler and more efficient code.
+
 package System is
pragma Pure;
--  Note that we take advantage of the implementation permission to make




[Ada] Unsynchronized concurrent access to a Boolean variable

2021-07-08 Thread Pierre-Marie de Rodat
If an exception declaration occurs in a nonstatic scope (for example,
within the body of a task type),
System.Exception_Table.Register_Exception is to be called the first (and
*only* the first) time the declaration is elaborated.  A library-level
"this exception has been registered" Boolean flag was being used to
accomplish this, but this solution introduces potential problems with
concurrency. So instead of Boolean, use the type
System.Atomic_Operations.Test_And_Set.Test_And_Set_Flag if this option
is available and concurrent access via tasking is a possibility;
otherwise, stick with the old Boolean-based approach.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* rtsfind.ads, rtsfind.adb: Add support for finding the packages
System.Atomic_Operations and
System.Atomic_Operations.Test_And_Set and the declarations
within that latter package of the type Test_And_Set_Flag and the
function Atomic_Test_And_Set.
* exp_ch11.adb (Expand_N_Exception_Declaration): If an exception
is declared other than at library level, then we need to call
Register_Exception the first time (and only the first time) the
declaration is elaborated.  In order to decide whether to
perform this call for a given elaboration of the declaration, we
used to unconditionally use a (library-level) Boolean variable.
Now we instead use a variable of type
System.Atomic_Operations.Test_And_Set.Test_And_Set_Flag unless
either that type is unavailable or a No_Tasking restriction is
in effect (in which case we use a Boolean variable as before).diff --git a/gcc/ada/exp_ch11.adb b/gcc/ada/exp_ch11.adb
--- a/gcc/ada/exp_ch11.adb
+++ b/gcc/ada/exp_ch11.adb
@@ -1088,10 +1088,19 @@ package body Exp_Ch11 is
 
--  (protecting test only needed if not at library level)
 
-   -- exceptF : Boolean := True --  static data
+   -- exceptF : aliased System.Atomic_Operations.Test_And_Set.
+   -- .Test_And_Set_Flag := 0; --  static data
+   -- if not Atomic_Test_And_Set (exceptF) then
+   --Register_Exception (except'Unrestricted_Access);
+   -- end if;
+
+   --  If a No_Tasking restriction is in effect, or if Test_And_Set_Flag
+   --  is unavailable, then use Boolean instead. In that case, we generate:
+   --
+   -- exceptF : Boolean := True; --  static data
-- if exceptF then
-   --exceptF := False;
-   --Register_Exception (except'Unchecked_Access);
+   --ExceptF := False;
+   --Register_Exception (except'Unrestricted_Access);
-- end if;
 
procedure Expand_N_Exception_Declaration (N : Node_Id) is
@@ -1275,7 +1284,7 @@ package body Exp_Ch11 is
 
   Force_Static_Allocation_Of_Referenced_Objects (Expression (N));
 
-  --  Register_Exception (except'Unchecked_Access);
+  --  Register_Exception (except'Unrestricted_Access);
 
   if not No_Exception_Handlers_Set
 and then not Restriction_Active (No_Exception_Registration)
@@ -1296,27 +1305,59 @@ package body Exp_Ch11 is
 Flag_Id :=
   Make_Defining_Identifier (Loc,
 Chars => New_External_Name (Chars (Id), 'F'));
-
-Insert_Action (N,
-  Make_Object_Declaration (Loc,
-Defining_Identifier => Flag_Id,
-Object_Definition   =>
-  New_Occurrence_Of (Standard_Boolean, Loc),
-Expression  =>
-  New_Occurrence_Of (Standard_True, Loc)));
-
 Set_Is_Statically_Allocated (Flag_Id);
 
-Append_To (L,
-  Make_Assignment_Statement (Loc,
-Name   => New_Occurrence_Of (Flag_Id, Loc),
-Expression => New_Occurrence_Of (Standard_False, Loc)));
+declare
+   Use_Test_And_Set_Flag : constant Boolean :=
+ (not Global_No_Tasking)
+ and then RTE_Available (RE_Test_And_Set_Flag);
+
+   Flag_Decl : Node_Id;
+   Condition : Node_Id;
+begin
+   if Use_Test_And_Set_Flag then
+  Flag_Decl :=
+Make_Object_Declaration (Loc,
+  Defining_Identifier => Flag_Id,
+  Aliased_Present => True,
+  Object_Definition   =>
+New_Occurrence_Of (RTE (RE_Test_And_Set_Flag), Loc),
+  Expression  =>
+Make_Integer_Literal (Loc, 0));
+   else
+  Flag_Decl :=
+Make_Object_Declaration (Loc,
+  Defining_Identifier => Flag_Id,
+  Object_Definition   =>
+New_Occurrence_Of (Standard_Boolean, Loc),
+  Expression  =>
+New_Occurrence_Of (Standard_True, Loc));
+

[Ada] Compute sizes when possible for packed array with Component_Size

2021-07-08 Thread Pierre-Marie de Rodat
For a packed constrained array type with a Component_Size clause, it
may be possible to compute both its RM_Size and Esize. Do this, as it
benefits GNATprove when checking the validity of overlays.
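
As an assumed example (not taken from the patch; names are made up), consider
a packed one-dimensional constrained array with an explicit Component_Size:

type Small is range 0 .. 31;
type Vec is array (1 .. 8) of Small
  with Pack, Component_Size => 5;
--  RM_Size (Vec) = 8 * 5 = 40 bits; Esize is that value rounded up to the
--  next multiple of the alignment expressed in bits.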

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* layout.adb (Layout_Type): Special case when RM_Size and Esize
can be computed for packed arrays.diff --git a/gcc/ada/layout.adb b/gcc/ada/layout.adb
--- a/gcc/ada/layout.adb
+++ b/gcc/ada/layout.adb
@@ -487,6 +487,48 @@ package body Layout is
  then
 Set_Alignment (E, Alignment (Component_Type (E)));
  end if;
+
+ --  If packing was requested, the one-dimensional array is constrained
+ --  with static bounds, the component size was set explicitly, and
+ --  the alignment is known, we can set (if not set explicitly) the
+ --  RM_Size and the Esize of the array type, as RM_Size is equal to
+ --  (arr'length * arr'component_size) and Esize is the same value
+ --  rounded to the next multiple of arr'alignment. This is not
+ --  applicable to packed arrays that are implemented specially
+ --  in GNAT, i.e. when Packed_Array_Impl_Type is set.
+
+ if Is_Array_Type (E)
+   and then Number_Dimensions (E) = 1
+   and then not Present (Packed_Array_Impl_Type (E))
+   and then Has_Pragma_Pack (E)
+   and then Is_Constrained (E)
+   and then Compile_Time_Known_Bounds (E)
+   and then Known_Component_Size (E)
+   and then Known_Alignment (E)
+ then
+declare
+   Abits : constant Int := UI_To_Int (Alignment (E)) * SSU;
+   Lo, Hi : Node_Id;
+   Siz : Uint;
+
+begin
+   Get_Index_Bounds (First_Index (E), Lo, Hi);
+   Siz := (Expr_Value (Hi) - Expr_Value (Lo) + 1)
+ * Component_Size (E);
+
+   --  Do not overwrite a different value of 'Size specified
+   --  explicitly by the user. In that case, also do not set Esize.
+
+   if Unknown_RM_Size (E) or else RM_Size (E) = Siz then
+  Set_RM_Size (E, Siz);
+
+  if Unknown_Esize (E) then
+ Siz := ((Siz + (Abits - 1)) / Abits) * Abits;
+ Set_Esize (E, Siz);
+  end if;
+   end if;
+end;
+ end if;
   end if;
 
   --  Even if the backend performs the layout, we still do a little in




[Ada] Make runtime code compatible with No_Dynamic_Accessibility_Checks

2021-07-08 Thread Pierre-Marie de Rodat
To help experiment with this new model.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* libgnat/a-cbdlli.adb, libgnat/a-cbhama.adb,
libgnat/a-cbhase.adb, libgnat/a-cbmutr.adb,
libgnat/a-cborma.adb, libgnat/a-cborse.adb,
libgnat/a-cobove.adb, libgnat/a-textio.adb,
libgnat/a-witeio.adb, libgnat/a-ztexio.adb: Make code compatible
with No_Dynamic_Accessibility_Checks restriction.diff --git a/gcc/ada/libgnat/a-cbdlli.adb b/gcc/ada/libgnat/a-cbdlli.adb
--- a/gcc/ada/libgnat/a-cbdlli.adb
+++ b/gcc/ada/libgnat/a-cbdlli.adb
@@ -312,7 +312,7 @@ is
Container.TC'Unrestricted_Access;
   begin
  return R : constant Constant_Reference_Type :=
-   (Element => N.Element'Access,
+   (Element => N.Element'Unchecked_Access,
 Control => (Controlled with TC))
  do
 Busy (TC.all);
@@ -1608,7 +1608,7 @@ is
Container.TC'Unrestricted_Access;
   begin
  return R : constant Reference_Type :=
-   (Element => N.Element'Access,
+   (Element => N.Element'Unchecked_Access,
 Control => (Controlled with TC))
  do
 Busy (TC.all);


diff --git a/gcc/ada/libgnat/a-cbhama.adb b/gcc/ada/libgnat/a-cbhama.adb
--- a/gcc/ada/libgnat/a-cbhama.adb
+++ b/gcc/ada/libgnat/a-cbhama.adb
@@ -213,7 +213,7 @@ is
Container.TC'Unrestricted_Access;
   begin
  return R : constant Constant_Reference_Type :=
-   (Element => N.Element'Access,
+   (Element => N.Element'Unchecked_Access,
 Control => (Controlled with TC))
  do
 Busy (TC.all);
@@ -239,7 +239,7 @@ is
Container.TC'Unrestricted_Access;
   begin
  return R : constant Constant_Reference_Type :=
-   (Element => N.Element'Access,
+   (Element => N.Element'Unchecked_Access,
 Control => (Controlled with TC))
  do
 Busy (TC.all);
@@ -1028,7 +1028,7 @@ is
Container.TC'Unrestricted_Access;
   begin
  return R : constant Reference_Type :=
-   (Element => N.Element'Access,
+   (Element => N.Element'Unchecked_Access,
 Control => (Controlled with TC))
  do
 Busy (TC.all);
@@ -1053,7 +1053,7 @@ is
Container.TC'Unrestricted_Access;
   begin
  return R : constant Reference_Type :=
-   (Element => N.Element'Access,
+   (Element => N.Element'Unchecked_Access,
 Control => (Controlled with TC))
  do
 Busy (TC.all);


diff --git a/gcc/ada/libgnat/a-cbhase.adb b/gcc/ada/libgnat/a-cbhase.adb
--- a/gcc/ada/libgnat/a-cbhase.adb
+++ b/gcc/ada/libgnat/a-cbhase.adb
@@ -232,7 +232,7 @@ is
Container.TC'Unrestricted_Access;
   begin
  return R : constant Constant_Reference_Type :=
-   (Element => N.Element'Access,
+   (Element => N.Element'Unchecked_Access,
 Control => (Controlled with TC))
  do
 Busy (TC.all);
@@ -1643,7 +1643,7 @@ is
   Container.TC'Unrestricted_Access;
  begin
 return R : constant Constant_Reference_Type :=
-  (Element => N.Element'Access,
+  (Element => N.Element'Unchecked_Access,
Control => (Controlled with TC))
 do
Busy (TC.all);


diff --git a/gcc/ada/libgnat/a-cbmutr.adb b/gcc/ada/libgnat/a-cbmutr.adb
--- a/gcc/ada/libgnat/a-cbmutr.adb
+++ b/gcc/ada/libgnat/a-cbmutr.adb
@@ -600,7 +600,7 @@ is
Container.TC'Unrestricted_Access;
   begin
  return R : constant Constant_Reference_Type :=
-   (Element => Container.Elements (Position.Node)'Access,
+   (Element => Container.Elements (Position.Node)'Unchecked_Access,
 Control => (Controlled with TC))
  do
 Busy (TC.all);
@@ -2533,7 +2533,7 @@ is
Container.TC'Unrestricted_Access;
   begin
  return R : constant Reference_Type :=
-   (Element => Container.Elements (Position.Node)'Access,
+   (Element => Container.Elements (Position.Node)'Unchecked_Access,
 Control => (Controlled with TC))
  do
 Busy (TC.all);


diff --git a/gcc/ada/libgnat/a-cborma.adb b/gcc/ada/libgnat/a-cborma.adb
--- a/gcc/ada/libgnat/a-cborma.adb
+++ b/gcc/ada/libgnat/a-cborma.adb
@@ -420,7 +420,7 @@ is
Container.TC'Unrestricted_Access;
   begin
  return R : constant Constant_Reference_Type :=
-   (Element => N.Element'Access,
+   (Element => N.Element'Unchecked_Access,
 Control => (Controlled with TC))
  do
 Busy (TC.all);
@@ -445,7 +445,7 @@ is
Container.TC'Unrestricted_Access;
   begin
  return R : constant Constant_Reference_Type :=
-   (Element => N.Element'Access,
+   (Element => N.Element'Unchecked_Access,

[Ada] Fix on computation of packed array size in case of error

2021-07-08 Thread Pierre-Marie de Rodat
In case of compilation error, the low and high bounds of the array
type might have been replaced by an error node. Deal with this case
by checking that the bounds are known at compile time.
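
A minimal, hypothetical reproducer (names are made up) is an array whose
bound names an undeclared entity, compiled with -gnatq so that layout still
runs:

type Bits is array (1 .. Undeclared_Bound) of Boolean
  with Pack, Component_Size => 1;
--  Undeclared_Bound is in error, so a bound of Bits may be an error node;
--  the new guard keeps Layout_Type from calling Expr_Value on it.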

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* layout.adb (Layout_Type): Add guard before calling Expr_Value.diff --git a/gcc/ada/layout.adb b/gcc/ada/layout.adb
--- a/gcc/ada/layout.adb
+++ b/gcc/ada/layout.adb
@@ -513,18 +513,28 @@ package body Layout is
 
 begin
Get_Index_Bounds (First_Index (E), Lo, Hi);
-   Siz := (Expr_Value (Hi) - Expr_Value (Lo) + 1)
- * Component_Size (E);
 
-   --  Do not overwrite a different value of 'Size specified
-   --  explicitly by the user. In that case, also do not set Esize.
+   --  Even if the bounds are known at compile time, they could
+   --  have been replaced by an error node. Check each bound
+   --  explicitly.
 
-   if Unknown_RM_Size (E) or else RM_Size (E) = Siz then
-  Set_RM_Size (E, Siz);
+   if Compile_Time_Known_Value (Lo)
+ and then Compile_Time_Known_Value (Hi)
+   then
+  Siz := (Expr_Value (Hi) - Expr_Value (Lo) + 1)
+* Component_Size (E);
+
+  --  Do not overwrite a different value of 'Size specified
+  --  explicitly by the user. In that case, also do not set
+  --  Esize.
 
-  if Unknown_Esize (E) then
- Siz := ((Siz + (Abits - 1)) / Abits) * Abits;
- Set_Esize (E, Siz);
+  if Unknown_RM_Size (E) or else RM_Size (E) = Siz then
+ Set_RM_Size (E, Siz);
+
+ if Unknown_Esize (E) then
+Siz := ((Siz + (Abits - 1)) / Abits) * Abits;
+Set_Esize (E, Siz);
+ end if;
   end if;
end if;
 end;




[Ada] Prevent crash on inspection point for unfrozen entity

2021-07-08 Thread Pierre-Marie de Rodat
Before this patch, the following program would make GNAT crash:

procedure P is
   Unused_Var : Integer with Shared => False;
   pragma Inspection_Point;
begin
   null;
end P;

This was because the Shared aspect resulted in a freeze node being
inserted after the Inspection_Point pragma. This made Gigi delay the
translation of the declaration of Unused_Var to the freeze node.
This delaying resulted in a reference to an undeclared entity when
trying to translate Inspection_Point from gnat to gnu.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* exp_prag.adb (Expand_Pragma_Inspection_Point): After expansion
of the Inspection_Point pragma, check if referenced entities
that have a freeze node are already frozen. If they aren't, emit
a warning and turn the pragma into a no-op.diff --git a/gcc/ada/exp_prag.adb b/gcc/ada/exp_prag.adb
--- a/gcc/ada/exp_prag.adb
+++ b/gcc/ada/exp_prag.adb
@@ -2361,6 +2361,7 @@ package body Exp_Prag is
   S : Entity_Id;
   E : Entity_Id;
 
+  Remove_Inspection_Point : Boolean := False;
begin
   if No (Pragma_Argument_Associations (N)) then
  A := New_List;
@@ -2400,6 +2401,36 @@ package body Exp_Prag is
  Expand (Expression (Assoc));
  Next (Assoc);
   end loop;
+
+  --  If any of the references have a freeze node, it must appear before
+  --  pragma Inspection_Point, otherwise the entity won't be available when
+  --  Gigi processes Inspection_Point.
+  --  When this requirement isn't met, turn the pragma into a no-op.
+
+  Assoc := First (Pragma_Argument_Associations (N));
+  while Present (Assoc) loop
+
+ if Present (Freeze_Node (Entity (Expression (Assoc)))) and then
+   not Is_Frozen (Entity (Expression (Assoc)))
+ then
+Error_Msg_NE ("?inspection point references unfrozen object &",
+  Assoc,
+  Entity (Expression (Assoc)));
+Remove_Inspection_Point := True;
+ end if;
+
+ Next (Assoc);
+  end loop;
+
+  if Remove_Inspection_Point then
+ Error_Msg_N ("\pragma will be ignored", N);
+
+ --  We can't just remove the pragma from the tree as it might be
+ --  iterated over by the caller. Turn it into a null statement
+ --  instead.
+
+ Rewrite (N, Make_Null_Statement (Sloc (N)));
+  end if;
end Expand_Pragma_Inspection_Point;
 
--




[Ada] Skip types in error for test to compute array size

2021-07-08 Thread Pierre-Marie de Rodat
After a syntax error, if the code is compiled with -gnatq, semantic
analysis should still proceed without internal errors if possible. Add
special case to recognize ill-formed array type.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* layout.adb (Layout_Type): Do not call Number_Dimensions if the
type does not have First_Index set.diff --git a/gcc/ada/layout.adb b/gcc/ada/layout.adb
--- a/gcc/ada/layout.adb
+++ b/gcc/ada/layout.adb
@@ -498,6 +498,7 @@ package body Layout is
  --  in GNAT, i.e. when Packed_Array_Impl_Type is set.
 
  if Is_Array_Type (E)
+   and then Present (First_Index (E))  --  Skip types in error
and then Number_Dimensions (E) = 1
and then not Present (Packed_Array_Impl_Type (E))
and then Has_Pragma_Pack (E)




[Ada] Fix use of single question mark in error message

2021-07-08 Thread Pierre-Marie de Rodat
Single question marks are deprecated.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* exp_prag.adb (Expand_Pragma_Inspection_Point): Fix error
message.diff --git a/gcc/ada/exp_prag.adb b/gcc/ada/exp_prag.adb
--- a/gcc/ada/exp_prag.adb
+++ b/gcc/ada/exp_prag.adb
@@ -2413,7 +2413,7 @@ package body Exp_Prag is
   if Present (Freeze_Node (Entity (Expression (Assoc)))) and then
not Is_Frozen (Entity (Expression (Assoc)))
  then
-Error_Msg_NE ("?inspection point references unfrozen object &",
+Error_Msg_NE ("??inspection point references unfrozen object &",
   Assoc,
   Entity (Expression (Assoc)));
 Remove_Inspection_Point := True;




[Ada] Fix style in comments and code related to compilation units

2021-07-08 Thread Pierre-Marie de Rodat
Only style fixes; comments and code themselves are unchanged.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* lib-load.adb (Load_Unit): Fix style in comment.
* par-load.adb (Load): Likewise.
* scng.adb (Initialize_Scanner): Fix whitespace.diff --git a/gcc/ada/lib-load.adb b/gcc/ada/lib-load.adb
--- a/gcc/ada/lib-load.adb
+++ b/gcc/ada/lib-load.adb
@@ -823,7 +823,7 @@ package body Lib.Load is
  Units.Table (Calling_Unit).Fatal_Error := Error_Detected;
 
   --  If with'ed unit had an ignored error, then propagate it
-  --  but do not overide an existring setting.
+  --  but do not overide an existing setting.
 
   when Error_Ignored =>
  if Units.Table (Calling_Unit).Fatal_Error = None then
@@ -900,7 +900,7 @@ package body Lib.Load is
Remove_Unit (Unum);
 
 --  If unit not required, remove load stack entry and the junk
---  file table entry, and return No_Unit to indicate not found,
+--  file table entry, and return No_Unit to indicate not found.
 
 else
Load_Stack.Decrement_Last;


diff --git a/gcc/ada/par-load.adb b/gcc/ada/par-load.adb
--- a/gcc/ada/par-load.adb
+++ b/gcc/ada/par-load.adb
@@ -129,8 +129,8 @@ begin
Save_Style_Check_Options (Save_Style_Checks);
Save_Style_Check := Opt.Style_Check;
 
-   --  If main unit, set Main_Unit_Entity (this will get overwritten if
-   --  the main unit has a separate spec, that happens later on in Load)
+   --  If main unit, set Main_Unit_Entity (this will get overwritten if the
+   --  main unit has a separate spec, that happens later on in Load).
 
if Cur_Unum = Main_Unit then
   Main_Unit_Entity := Cunit_Entity (Main_Unit);


diff --git a/gcc/ada/scng.adb b/gcc/ada/scng.adb
--- a/gcc/ada/scng.adb
+++ b/gcc/ada/scng.adb
@@ -230,16 +230,16 @@ package body Scng is
 
   --  Initialize scan control variables
 
-  Current_Source_File   := Index;
-  Source:= Source_Text (Current_Source_File);
-  Scan_Ptr  := Source_First (Current_Source_File);
-  Token := No_Token;
-  Token_Ptr := Scan_Ptr;
-  Current_Line_Start:= Scan_Ptr;
-  Token_Node:= Empty;
-  Token_Name:= No_Name;
-  Start_Column  := Set_Start_Column;
-  First_Non_Blank_Location  := Scan_Ptr;
+  Current_Source_File  := Index;
+  Source   := Source_Text (Current_Source_File);
+  Scan_Ptr := Source_First (Current_Source_File);
+  Token:= No_Token;
+  Token_Ptr:= Scan_Ptr;
+  Current_Line_Start   := Scan_Ptr;
+  Token_Node   := Empty;
+  Token_Name   := No_Name;
+  Start_Column := Set_Start_Column;
+  First_Non_Blank_Location := Scan_Ptr;
 
   Initialize_Checksum;
   Wide_Char_Byte_Count := 0;




[Ada] Prevent infinite recursion when there is no expected unit

2021-07-08 Thread Pierre-Marie de Rodat
The comment in Par.Load says "... or we are in big trouble, and abandon
the compilation", but the code merely emitted errors and kept going. Now
it emits errors, flags the problem in the unit table and gives up. Also,
it was wrong for this routine to remove the unit, because the callers
who add entries to the unit table assume those entries to be filled by
the parser and not removed, even when irrecoverable errors happen.

This prevents an infinite recursion that happened when parsing a file
with multiple compilation units and wrong indexes: the compiler would scan
unit X, follow its WITH Y clause, but instead of unit Y it would get
unit X again and scan it over and over...

Also, it fixes a crash when compiling a program with a subunit that
contains an unexpected program unit (previously the compiler only took
care to avoid such a crash when the -gnatc switch was used).

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* par-load.adb (Load): Don't remove unit, but flag it as
erroneous and return.diff --git a/gcc/ada/par-load.adb b/gcc/ada/par-load.adb
--- a/gcc/ada/par-load.adb
+++ b/gcc/ada/par-load.adb
@@ -234,9 +234,10 @@ begin
  Error_Msg ("\\found unit $!", Loc);
   end if;
 
-  --  In both cases, remove the unit so that it is out of the way later
+  --  In both cases, flag the fatal error and give up
 
-  Remove_Unit (Cur_Unum);
+  Set_Fatal_Error (Cur_Unum, Error_Detected);
+  return;
end if;
 
--  If current unit is a body, load its corresponding spec




[Ada] Replace low-level condition with a high-level call

2021-07-08 Thread Pierre-Marie de Rodat
Code cleanup; semantics is unaffected.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* lib-writ.adb (Ensure_System_Dependency): Simplify condition.diff --git a/gcc/ada/lib-writ.adb b/gcc/ada/lib-writ.adb
--- a/gcc/ada/lib-writ.adb
+++ b/gcc/ada/lib-writ.adb
@@ -147,7 +147,7 @@ package body Lib.Writ is
begin
   --  Nothing to do if we already compiled System
 
-  if Unit_Names.Get (System_Uname) /= No_Unit then
+  if Is_Loaded (System_Uname) then
  return;
   end if;
 




[Ada] Restore context on failure in loading of renamed child unit

2021-07-08 Thread Pierre-Marie de Rodat
When loading of renamed child unit failed, we didn't properly restore
the value of a global Parsing_Main_Extended_Source variable.

This is primarily a cleanup change; behaviour is not affected (except
perhaps for errors reported on complicated code that is illegal anyway).

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* lib-load.adb (Load): Replace early return with goto to properly
restore context on failure.diff --git a/gcc/ada/lib-load.adb b/gcc/ada/lib-load.adb
--- a/gcc/ada/lib-load.adb
+++ b/gcc/ada/lib-load.adb
@@ -451,8 +451,8 @@ package body Lib.Load is
   With_Node  => With_Node);
 
  if Unump = No_Unit then
-Parsing_Main_Extended_Source := Save_PMES;
-return No_Unit;
+Unum := No_Unit;
+goto Done;
  end if;
 
  --  If parent is a renaming, then we use the renamed package as




[Ada] Remove redundant condition for listing compilation units

2021-07-08 Thread Pierre-Marie de Rodat
There is only one call to Unit_Display and it is guarded by the
List_Units global variable. There is no need to retest this variable
inside the Unit_Display routine.

Code cleanup; semantics is unaffected.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* par-ch10.adb (Unit_Display): Remove redundant condition; fix
whitespace.diff --git a/gcc/ada/par-ch10.adb b/gcc/ada/par-ch10.adb
--- a/gcc/ada/par-ch10.adb
+++ b/gcc/ada/par-ch10.adb
@@ -1162,24 +1162,22 @@ package body Ch10 is
   Loc: Source_Ptr;
   SR_Present : Boolean)
is
-  Unum : constant Unit_Number_Type:= Get_Cunit_Unit_Number (Cunit);
-  Sind : constant Source_File_Index   := Source_Index (Unum);
-  Unam : constant Unit_Name_Type  := Unit_Name (Unum);
+  Unum : constant Unit_Number_Type  := Get_Cunit_Unit_Number (Cunit);
+  Sind : constant Source_File_Index := Source_Index (Unum);
+  Unam : constant Unit_Name_Type:= Unit_Name (Unum);
 
begin
-  if List_Units then
- Write_Str ("Unit ");
- Write_Unit_Name (Unit_Name (Unum));
- Unit_Location (Sind, Loc);
+  Write_Str ("Unit ");
+  Write_Unit_Name (Unit_Name (Unum));
+  Unit_Location (Sind, Loc);
 
- if SR_Present then
-Write_Str (", SR");
- end if;
-
- Write_Str (", file name ");
- Write_Name (Get_File_Name (Unam, Nkind (Unit (Cunit)) = N_Subunit));
- Write_Eol;
+  if SR_Present then
+ Write_Str (", SR");
   end if;
+
+  Write_Str (", file name ");
+  Write_Name (Get_File_Name (Unam, Nkind (Unit (Cunit)) = N_Subunit));
+  Write_Eol;
end Unit_Display;
 
---




[Ada] Simplify redundant checks for non-empty lists

2021-07-08 Thread Pierre-Marie de Rodat
Simplify "Present (L) and then not Is_Empty_List (L)" into "not
Is_Empty_List (L)", since Is_Empty_List can be called on No_List
and returns True.

Code cleanup; semantics is unaffected.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_ch12.adb, sem_ch6.adb, sem_ch9.adb, sprint.adb: Simplify
checks for non-empty lists.diff --git a/gcc/ada/sem_ch12.adb b/gcc/ada/sem_ch12.adb
--- a/gcc/ada/sem_ch12.adb
+++ b/gcc/ada/sem_ch12.adb
@@ -9724,7 +9724,6 @@ package body Sem_Ch12 is
 
  if Nkind (Par_N) = N_Package_Specification
and then Decls = Visible_Declarations (Par_N)
-   and then Present (Private_Declarations (Par_N))
and then not Is_Empty_List (Private_Declarations (Par_N))
  then
 Decls := Private_Declarations (Par_N);


diff --git a/gcc/ada/sem_ch6.adb b/gcc/ada/sem_ch6.adb
--- a/gcc/ada/sem_ch6.adb
+++ b/gcc/ada/sem_ch6.adb
@@ -549,7 +549,6 @@ package body Sem_Ch6 is
 else
if Nkind (Par) = N_Package_Specification
  and then Decls = Visible_Declarations (Par)
- and then Present (Private_Declarations (Par))
  and then not Is_Empty_List (Private_Declarations (Par))
then
   Decls := Private_Declarations (Par);


diff --git a/gcc/ada/sem_ch9.adb b/gcc/ada/sem_ch9.adb
--- a/gcc/ada/sem_ch9.adb
+++ b/gcc/ada/sem_ch9.adb
@@ -1955,9 +1955,7 @@ package body Sem_Ch9 is
   Tasking_Used := True;
   Analyze_Declarations (Visible_Declarations (N));
 
-  if Present (Private_Declarations (N))
-and then not Is_Empty_List (Private_Declarations (N))
-  then
+  if not Is_Empty_List (Private_Declarations (N)) then
  Last_Id := Last_Entity (Prot_Typ);
  Analyze_Declarations (Private_Declarations (N));
 


diff --git a/gcc/ada/sprint.adb b/gcc/ada/sprint.adb
--- a/gcc/ada/sprint.adb
+++ b/gcc/ada/sprint.adb
@@ -1065,16 +1065,12 @@ package body Sprint is
if Present (Expressions (Node)) then
   Sprint_Comma_List (Expressions (Node));
 
-  if Present (Component_Associations (Node))
-and then not Is_Empty_List (Component_Associations (Node))
-  then
+  if not Is_Empty_List (Component_Associations (Node)) then
  Write_Str (", ");
   end if;
end if;
 
-   if Present (Component_Associations (Node))
- and then not Is_Empty_List (Component_Associations (Node))
-   then
+   if not Is_Empty_List (Component_Associations (Node)) then
   Indent_Begin;
 
   declare




[Ada] Fix violation of No_Implicit_Loops restriction for enumeration type

2021-07-08 Thread Pierre-Marie de Rodat
The perfect hash function generated by the compiler to speed up the Value
attribute of an enumeration type contains an implicit loop and, therefore,
violates the No_Implicit_Loops restriction when it is active.
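
As a hypothetical illustration (not from the patch; names are made up), a
restricted partition can keep using the Value attribute without violating
the restriction:

pragma Restrictions (No_Implicit_Loops);

package Shades is
   --  Assume Shade has more literals than the compiler's internal
   --  Threshold, so a perfect hash function would normally be generated
   --  for Shade'Value.
   type Shade is (S01, S02, S03, S04, S05, S06, S07, S08, S09, S10,
                  S11, S12, S13, S14, S15, S16, S17, S18, S19, S20);

   function Parse (S : String) return Shade is (Shade'Value (S));
   --  'Value still works; the compiler simply falls back to the
   --  non-hashed lookup, so no implicit loop is generated.
end Shades;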

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* exp_imgv.adb: Add with and use clause for Restrict and Rident.
(Build_Enumeration_Image_Tables): Do not generate the hash function
if the No_Implicit_Loops restriction is active.diff --git a/gcc/ada/exp_imgv.adb b/gcc/ada/exp_imgv.adb
--- a/gcc/ada/exp_imgv.adb
+++ b/gcc/ada/exp_imgv.adb
@@ -37,6 +37,8 @@ with Namet;  use Namet;
 with Nmake;  use Nmake;
 with Nlists; use Nlists;
 with Opt;use Opt;
+with Restrict;   use Restrict;
+with Rident; use Rident;
 with Rtsfind;use Rtsfind;
 with Sem_Aux;use Sem_Aux;
 with Sem_Res;use Sem_Res;
@@ -160,6 +162,8 @@ package body Exp_Imgv is
  Expression  => Make_Aggregate (Loc, Expressions => V)));
   end Append_Table_To;
 
+   --  Start of Build_Enumeration_Image_Tables
+
begin
   --  Nothing to do for types other than a root enumeration type
 
@@ -247,7 +251,7 @@ package body Exp_Imgv is
   Append_Table_To (Act, Eind, Nlit, Ityp, Ind);
 
   --  If the number of literals is not greater than Threshold, then we are
-  --  done. Otherwise we compute a (perfect) hash function for use by the
+  --  done. Otherwise we generate a (perfect) hash function for use by the
   --  Value attribute.
 
   if Nlit > Threshold then
@@ -283,11 +287,12 @@ package body Exp_Imgv is
 
  --  If the unit where the type is declared is the main unit, and the
  --  number of literals is greater than Threshold_For_Size when we are
- --  optimizing for size, and -gnatd_h is not specified, try to compute
- --  the hash function.
+ --  optimizing for size, and the restriction No_Implicit_Loops is not
+ --  active, and -gnatd_h is not specified, generate the hash function.
 
  if In_Main_Unit
and then (Optimize_Size = 0 or else Nlit > Threshold_For_Size)
+   and then not Restriction_Active (No_Implicit_Loops)
and then not Debug_Flag_Underscore_H
  then
 declare




[Ada] Spurious warning in generic instance

2021-07-08 Thread Pierre-Marie de Rodat
In the case of complex generic instantiations, the warning about a
component not being present can be spurious (it corresponds to dead code
for the given instance), so we disable it.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_util.ads, sem_util.adb
(Apply_Compile_Time_Constraint_Error): New parameter
Emit_Message.
* sem_ch4.adb (Analyze_Selected_Component): Disable warning
within an instance.diff --git a/gcc/ada/sem_ch4.adb b/gcc/ada/sem_ch4.adb
--- a/gcc/ada/sem_ch4.adb
+++ b/gcc/ada/sem_ch4.adb
@@ -5471,7 +5471,9 @@ package body Sem_Ch4 is
  Apply_Compile_Time_Constraint_Error
(N, "component not present in }??",
 CE_Discriminant_Check_Failed,
-Ent => Prefix_Type);
+Ent  => Prefix_Type,
+Emit_Message =>
+  SPARK_Mode = On or not In_Instance_Not_Visible);
  return;
   end if;
 


diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -1510,13 +1510,14 @@ package body Sem_Util is
-
 
procedure Apply_Compile_Time_Constraint_Error
- (N  : Node_Id;
-  Msg: String;
-  Reason : RT_Exception_Code;
-  Ent: Entity_Id  := Empty;
-  Typ: Entity_Id  := Empty;
-  Loc: Source_Ptr := No_Location;
-  Warn   : Boolean:= False)
+ (N: Node_Id;
+  Msg  : String;
+  Reason   : RT_Exception_Code;
+  Ent  : Entity_Id  := Empty;
+  Typ  : Entity_Id  := Empty;
+  Loc  : Source_Ptr := No_Location;
+  Warn : Boolean:= False;
+  Emit_Message : Boolean:= True)
is
   Stat   : constant Boolean := Is_Static_Expression (N);
   R_Stat : constant Node_Id :=
@@ -1530,8 +1531,10 @@ package body Sem_Util is
  Rtyp := Typ;
   end if;
 
-  Discard_Node
-(Compile_Time_Constraint_Error (N, Msg, Ent, Loc, Warn => Warn));
+  if Emit_Message then
+ Discard_Node
+   (Compile_Time_Constraint_Error (N, Msg, Ent, Loc, Warn => Warn));
+  end if;
 
   --  Now we replace the node by an N_Raise_Constraint_Error node
   --  This does not need reanalyzing, so set it as analyzed now.


diff --git a/gcc/ada/sem_util.ads b/gcc/ada/sem_util.ads
--- a/gcc/ada/sem_util.ads
+++ b/gcc/ada/sem_util.ads
@@ -161,13 +161,14 @@ package Sem_Util is
--  part of the current package.
 
procedure Apply_Compile_Time_Constraint_Error
- (N  : Node_Id;
-  Msg: String;
-  Reason : RT_Exception_Code;
-  Ent: Entity_Id  := Empty;
-  Typ: Entity_Id  := Empty;
-  Loc: Source_Ptr := No_Location;
-  Warn   : Boolean:= False);
+ (N: Node_Id;
+  Msg  : String;
+  Reason   : RT_Exception_Code;
+  Ent  : Entity_Id  := Empty;
+  Typ  : Entity_Id  := Empty;
+  Loc  : Source_Ptr := No_Location;
+  Warn : Boolean:= False;
+  Emit_Message : Boolean:= True);
--  N is a subexpression that will raise Constraint_Error when evaluated
--  at run time. Msg is a message that explains the reason for raising the
--  exception. The last character is ? if the message is always a warning,
@@ -189,6 +190,7 @@ package Sem_Util is
--  when the caller wants to parameterize whether an error or warning is
--  given), or when the message should be treated as a warning even when
--  SPARK_Mode is On (which otherwise would force an error).
+   --  If Emit_Message is False, then do not emit any message.
 
function Async_Readers_Enabled (Id : Entity_Id) return Boolean;
--  Id should be the entity of a state abstraction, an object, or a type.




[Ada] AI12-0156 Use subtype indication in generalized iterators

2021-07-08 Thread Pierre-Marie de Rodat
Add syntax and semantic support for this new Ada 2022 feature.
Support for proper accessibility levels to be investigated in a second step.
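
An assumed usage example of the new syntax (not taken from the patch; names
are made up): the loop parameter of an element iterator may now carry an
access definition when the components are of an anonymous access type.

procedure Sum_All is
   type Ref_Array is array (1 .. 3) of access Integer;
   A   : Ref_Array := (new Integer'(1), new Integer'(2), new Integer'(3));
   Sum : Integer := 0;
begin
   for R : access Integer of A loop   --  Ada 2022 access_definition
      Sum := Sum + R.all;
   end loop;
end Sum_All;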

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* par-ch5.adb (P_Iterator_Specification): Add support for access
definition in loop parameter.
* sem_ch5.adb (Check_Subtype_Indication): Renamed...
(Check_Subtype_Definition): ... into this and check for conformance
on access definitions, and improve error messages.
(Analyze_Iterator_Specification): Add support for access definition
in loop parameter.diff --git a/gcc/ada/par-ch5.adb b/gcc/ada/par-ch5.adb
--- a/gcc/ada/par-ch5.adb
+++ b/gcc/ada/par-ch5.adb
@@ -1741,7 +1741,15 @@ package body Ch5 is
 
   if Token = Tok_Colon then
  Scan;  --  past :
- Set_Subtype_Indication (Node1, P_Subtype_Indication);
+
+ if Token = Tok_Access then
+Error_Msg_Ada_2022_Feature
+  ("access definition in loop parameter", Token_Ptr);
+Set_Subtype_Indication (Node1, P_Access_Definition (False));
+
+ else
+Set_Subtype_Indication (Node1, P_Subtype_Indication);
+ end if;
   end if;
 
   if Token = Tok_Of then
@@ -1761,7 +1769,7 @@ package body Ch5 is
  Set_Of_Present (Node1);
  Error_Msg_N
("subtype indication is only legal on an element iterator",
-  Subtype_Indication (Node1));
+Subtype_Indication (Node1));
 
   else
  return Error;


diff --git a/gcc/ada/sem_ch5.adb b/gcc/ada/sem_ch5.adb
--- a/gcc/ada/sem_ch5.adb
+++ b/gcc/ada/sem_ch5.adb
@@ -2176,9 +2176,11 @@ package body Sem_Ch5 is
   --  indicator, verify that the container type has an Iterate aspect that
   --  implements the reversible iterator interface.
 
-  procedure Check_Subtype_Indication (Comp_Type : Entity_Id);
+  procedure Check_Subtype_Definition (Comp_Type : Entity_Id);
   --  If a subtype indication is present, verify that it is consistent
   --  with the component type of the array or container name.
+  --  In Ada 2022, the subtype indication may be an access definition,
+  --  if the array or container has elements of an anonymous access type.
 
   function Get_Cursor_Type (Typ : Entity_Id) return Entity_Id;
   --  For containers with Iterator and related aspects, the cursor is
@@ -2209,24 +2211,46 @@ package body Sem_Ch5 is
   end Check_Reverse_Iteration;
 
   ---
-  --  Check_Subtype_Indication --
+  --  Check_Subtype_Definition --
   ---
 
-  procedure Check_Subtype_Indication (Comp_Type : Entity_Id) is
+  procedure Check_Subtype_Definition (Comp_Type : Entity_Id) is
   begin
- if Present (Subt)
-   and then (not Covers (Base_Type ((Bas)), Comp_Type)
+ if not Present (Subt) then
+return;
+ end if;
+
+ if Is_Anonymous_Access_Type (Entity (Subt)) then
+if not Is_Anonymous_Access_Type (Comp_Type) then
+   Error_Msg_NE
+ ("component type& is not an anonymous access",
+  Subt, Comp_Type);
+
+elsif not Conforming_Types
+(Designated_Type (Entity (Subt)),
+ Designated_Type (Comp_Type),
+ Fully_Conformant)
+then
+   Error_Msg_NE
+ ("subtype indication does not match component type&",
+  Subt, Comp_Type);
+end if;
+
+ elsif Present (Subt)
+   and then (not Covers (Base_Type (Bas), Comp_Type)
   or else not Subtypes_Statically_Match (Bas, Comp_Type))
  then
 if Is_Array_Type (Typ) then
-   Error_Msg_N
- ("subtype indication does not match component type", Subt);
+   Error_Msg_NE
+ ("subtype indication does not match component type&",
+  Subt, Comp_Type);
 else
-   Error_Msg_N
- ("subtype indication does not match element type", Subt);
+   Error_Msg_NE
+ ("subtype indication does not match element type&",
+  Subt, Comp_Type);
 end if;
  end if;
-  end Check_Subtype_Indication;
+  end Check_Subtype_Definition;
 
   -
   -- Get_Cursor_Type --
@@ -2288,6 +2312,39 @@ package body Sem_Ch5 is
Analyze (Decl);
Rewrite (Subt, New_Occurrence_Of (S, Sloc (Subt)));
 end;
+
+ --  Ada 2022: the subtype definition may be for an anonymous
+ --  access type.
+
+ elsif Nkind (Subt) = N_Access_Definition then
+declare
+   S: constant Entity_Id := Make_Temporary (Sloc (Subt), 'S');
+   Decl : Node_Id;
+begin
+   if Present

[Ada] Spurious style message on missing overriding indicator

2021-07-08 Thread Pierre-Marie de Rodat
In the presence of style switch -gnatyO, the compiler emits a spurious
style violation message naming an inherited operation that does not come
from an explicit subprogram declaration.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* style.adb (Missing_Overriding): Do not emit message when
parent of subprogram is a full type declaration.diff --git a/gcc/ada/style.adb b/gcc/ada/style.adb
--- a/gcc/ada/style.adb
+++ b/gcc/ada/style.adb
@@ -265,11 +265,15 @@ package body Style is
   --  indicators were introduced in Ada 2005. We apply Comes_From_Source
   --  to Original_Node to catch the case of a procedure body declared with
   --  "is null" that has been rewritten as a normal empty body.
+  --  We do not emit a warning on an inherited operation that comes from
+  --  a type derivation.
 
   if Style_Check_Missing_Overriding
 and then (Comes_From_Source (Original_Node (N))
or else Is_Generic_Instance (E))
 and then Ada_Version_Explicit >= Ada_2005
+and then Present (Parent (E))
+and then Nkind (Parent (E)) /= N_Full_Type_Declaration
   then
  --  If the subprogram is an instantiation,  its declaration appears
  --  within a wrapper package that precedes the instance node. Place




[Ada] Duplicated D lines in ali files

2021-07-08 Thread Pierre-Marie de Rodat
GNATcoverage possibly relies on the presence of the duplicate D lines in ALI
files for its Source Coverage Obligation tables among different instantiations
of the same generic. Mention this in comments.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* lib-writ.ads: Mention SCOs dependency as reason for duplicates.
* lib.ads (Units): Update documentation to mention duplicated
units.diff --git a/gcc/ada/lib-writ.ads b/gcc/ada/lib-writ.ads
--- a/gcc/ada/lib-writ.ads
+++ b/gcc/ada/lib-writ.ads
@@ -1053,6 +1053,9 @@ package Lib.Writ is
--  The Object parameter is true if an object file is created, and false
--  otherwise. Note that the pseudo-object file generated in GNATprove mode
--  does count as an object file from this point of view.
+   --  May output duplicate D lines caused by generic instantiations. This is
+   --  by design as GNATcoverage relies on them for its coverage of generic
+   --  instantiations, although this may be revisited in the future.
 
procedure Add_Preprocessing_Dependency (S : Source_File_Index);
--  Indicate that there is a dependency to be added on a preprocessing data


diff --git a/gcc/ada/lib.ads b/gcc/ada/lib.ads
--- a/gcc/ada/lib.ads
+++ b/gcc/ada/lib.ads
@@ -926,7 +926,9 @@ private
--  The following table records a mapping between a name and the entry in
--  the units table whose Unit_Name is this name. It is used to speed up
--  the Is_Loaded function, whose original implementation (linear search)
-   --  could account for 2% of the time spent in the front end. Note that, in
+   --  could account for 2% of the time spent in the front end. When the unit
+   --  is an instance of a generic, the unit might get duplicated in the unit
+   --  table - see Make_Instance_Unit for more information. Note that, in
--  the case of source files containing multiple units, the units table may
--  temporarily contain two entries with the same Unit_Name during parsing,
--  which means that the mapping must be to the first entry in the table.




[Ada] Rename sigtramp-vxworks-target.inc to sigtramp-vxworks-target.h

2021-07-08 Thread Pierre-Marie de Rodat
The .inc extension isn't recognized by gprconfig. The original
motivation for using this extension was to match the convention
of putting code in .inc ala unwind.inc.  However it's easier in this
situation to just rename it to a .h file.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sigtramp-vxworks-target.inc: Rename to...
* sigtramp-vxworks-target.h: ... this.
* sigtramp-vxworks.c, Makefile.rtl: Likewise.diff --git a/gcc/ada/Makefile.rtl b/gcc/ada/Makefile.rtl
--- a/gcc/ada/Makefile.rtl
+++ b/gcc/ada/Makefile.rtl
@@ -1043,7 +1043,7 @@ EXTRA_GNATRTL_NONTASKING_OBJS=
 EXTRA_GNATRTL_TASKING_OBJS=
 
 # Subsets of extra libgnat sources that always go together
-VX_SIGTRAMP_EXTRA_SRCS=sigtramp.h sigtramp-vxworks-target.inc
+VX_SIGTRAMP_EXTRA_SRCS=sigtramp.h sigtramp-vxworks-target.h
 
 # Additional object files that should go in the same directory as libgnat,
 # aside the library itself. Typically useful for crtbegin/crtend kind of files.


diff --git a/gcc/ada/sigtramp-vxworks-target.inc b/gcc/ada/sigtramp-vxworks-target.h
--- a/gcc/ada/sigtramp-vxworks-target.inc
+++ b/gcc/ada/sigtramp-vxworks-target.h
@@ -6,7 +6,7 @@
  *  *
  * Asm Implementation Include File  *
  *  *
- * Copyright (C) 2011-2018, Free Software Foundation, Inc.  *
+ * Copyright (C) 2011-2021, Free Software Foundation, Inc.  *
  *  *
  * GNAT is free software;  you can  redistribute it  and/or modify it under *
  * terms of the  GNU General Public License as published  by the Free Soft- *


diff --git a/gcc/ada/sigtramp-vxworks.c b/gcc/ada/sigtramp-vxworks.c
--- a/gcc/ada/sigtramp-vxworks.c
+++ b/gcc/ada/sigtramp-vxworks.c
@@ -180,7 +180,7 @@ void __gnat_sigtramp (int signo, void *si, void *sc,
 }
 
 /* Include the target specific bits.  */
-#include "sigtramp-vxworks-target.inc"
+#include "sigtramp-vxworks-target.h"
 
 /* sigtramp stub for common registers.  */
 




[Ada] Transient scope cleanup

2021-07-08 Thread Pierre-Marie de Rodat
Misc cleanups found while working on transient scopes.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* comperr.adb (Compiler_Abort): Call Sinput.Unlock, because if
this is called late, Source_Dump would otherwise crash.
* debug.adb: Correct documentation of the -gnatd.9 switch.
* exp_ch4.adb (Expand_Allocator_Expression): Add a comment.
* exp_ch6.adb: Minor comment fixes.  Add assertion.
* exp_ch6.ads (Is_Build_In_Place_Result_Type): Correct comment.
* exp_ch7.adb, checks.ads: Minor comment fixes.diff --git a/gcc/ada/checks.ads b/gcc/ada/checks.ads
--- a/gcc/ada/checks.ads
+++ b/gcc/ada/checks.ads
@@ -851,7 +851,7 @@ package Checks is
--are not following the flow graph (more properly the flow of actual
--processing only corresponds to the flow graph for local assignments).
--For non-local variables, we preserve the current setting, i.e. a
-   --validity check is performed when assigning to a knonwn valid global.
+   --validity check is performed when assigning to a known valid global.
 
--  Note: no validity checking is required if range checks are suppressed
--  regardless of the setting of the validity checking mode.


diff --git a/gcc/ada/comperr.adb b/gcc/ada/comperr.adb
--- a/gcc/ada/comperr.adb
+++ b/gcc/ada/comperr.adb
@@ -404,6 +404,7 @@ package body Comperr is
  Set_Standard_Output;
 
  Tree_Dump;
+ Sinput.Unlock; -- so Source_Dump can modify it
  Source_Dump;
  raise Unrecoverable_Error;
   end if;


diff --git a/gcc/ada/debug.adb b/gcc/ada/debug.adb
--- a/gcc/ada/debug.adb
+++ b/gcc/ada/debug.adb
@@ -1101,7 +1101,7 @@ package body Debug is
--   issues (e.g., assuming that a low bound of an array parameter
--   of an unconstrained subtype belongs to the index subtype).
 
-   --  d.9  Enable build-in-place for function calls returning some nonlimited
+   --  d.9  Disable build-in-place for function calls returning nonlimited
--   types.
 
--


diff --git a/gcc/ada/exp_ch4.adb b/gcc/ada/exp_ch4.adb
--- a/gcc/ada/exp_ch4.adb
+++ b/gcc/ada/exp_ch4.adb
@@ -1166,6 +1166,9 @@ package body Exp_Ch4 is
  --  secondary stack). In that case, the object will be moved, so we do
  --  want to Adjust. However, if it's a nonlimited build-in-place
  --  function call, Adjust is not wanted.
+ --
+ --  Needs_Finalization (DesigT) can differ from Needs_Finalization (T)
+ --  if one of the two types is class-wide, and the other is not.
 
  if Needs_Finalization (DesigT)
and then Needs_Finalization (T)


diff --git a/gcc/ada/exp_ch6.adb b/gcc/ada/exp_ch6.adb
--- a/gcc/ada/exp_ch6.adb
+++ b/gcc/ada/exp_ch6.adb
@@ -4913,7 +4913,7 @@ package body Exp_Ch6 is
   --  Optimization, if the returned value (which is on the sec-stack) is
   --  returned again, no need to copy/readjust/finalize, we can just pass
   --  the value thru (see Expand_N_Simple_Return_Statement), and thus no
-  --  attachment is needed
+  --  attachment is needed.
 
   if Nkind (Parent (N)) = N_Simple_Return_Statement then
  return;
@@ -7310,15 +7310,16 @@ package body Exp_Ch6 is
 
  Set_Enclosing_Sec_Stack_Return (N);
 
- --  Optimize the case where the result is a function call. In this
- --  case the result is already on the secondary stack and no further
- --  processing is required except to set the By_Ref flag to ensure
- --  that gigi does not attempt an extra unnecessary copy. (Actually
- --  not just unnecessary but wrong in the case of a controlled type,
- --  where gigi does not know how to do a copy.)
+ --  Optimize the case where the result is a function call that also
+ --  returns on the secondary stack. In this case the result is already
+ --  on the secondary stack and no further processing is required
+ --  except to set the By_Ref flag to ensure that gigi does not attempt
+ --  an extra unnecessary copy. (Actually not just unnecessary but
+ --  wrong in the case of a controlled type, where gigi does not know
+ --  how to do a copy.)
 
- if Requires_Transient_Scope (Exp_Typ)
-   and then Exp_Is_Function_Call
+ pragma Assert (Requires_Transient_Scope (R_Type));
+ if Exp_Is_Function_Call and then Requires_Transient_Scope (Exp_Typ)
  then
 Set_By_Ref (N);
 
@@ -7849,7 +7850,7 @@ package body Exp_Ch6 is
 
   Compute_Returns_By_Ref (Subp);
 
-  --  Wnen freezing a null procedure, analyze its delayed aspects now
+  --  When freezing a null procedure, analyze its delayed aspects now
   --  because we may not have reached the end of the declarative list when
   --  delayed aspects are normally analyzed. This ensures that dispatching
   -

[Ada] Use encoded names only with -fgnat-encodings=all

2021-07-08 Thread Pierre-Marie de Rodat
This disables the last special encoding done in Get_Encoded_Name, except
when -fgnat-encodings=all is passed on the command line.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* exp_dbug.adb (Get_Encoded_Name): Do not encode names of discrete
types with custom bounds, except with -fgnat-encodings=all.
* exp_pakd.adb (Create_Packed_Array_Impl_Type): Adjust comment.diff --git a/gcc/ada/exp_dbug.adb b/gcc/ada/exp_dbug.adb
--- a/gcc/ada/exp_dbug.adb
+++ b/gcc/ada/exp_dbug.adb
@@ -655,10 +655,10 @@ package body Exp_Dbug is
 
   Has_Suffix := True;
 
-  --  Fixed-point case: generate GNAT encodings when asked to
+  --  Generate GNAT encodings when asked to for fixed-point case
 
-  if Is_Fixed_Point_Type (E)
-and then GNAT_Encodings = DWARF_GNAT_Encodings_All
+  if GNAT_Encodings = DWARF_GNAT_Encodings_All
+and then Is_Fixed_Point_Type (E)
   then
  Get_External_Name (E, True, "XF_");
  Add_Real_To_Buffer (Delta_Value (E));
@@ -668,10 +668,9 @@ package body Exp_Dbug is
 Add_Real_To_Buffer (Small_Value (E));
  end if;
 
-  --  Discrete case where bounds do not match size. Not necessary if we can
-  --  emit standard DWARF.
+  --  Likewise for discrete case where bounds do not match size
 
-  elsif GNAT_Encodings /= DWARF_GNAT_Encodings_Minimal
+  elsif GNAT_Encodings = DWARF_GNAT_Encodings_All
 and then Is_Discrete_Type (E)
 and then not Bounds_Match_Size (E)
   then


diff --git a/gcc/ada/exp_pakd.adb b/gcc/ada/exp_pakd.adb
--- a/gcc/ada/exp_pakd.adb
+++ b/gcc/ada/exp_pakd.adb
@@ -828,8 +828,8 @@ package body Exp_Pakd is
 
   elsif not Is_Constrained (Typ) then
 
- --  When generating standard DWARF (i.e when GNAT_Encodings is
- --  DWARF_GNAT_Encodings_Minimal), the ___XP suffix will be stripped
+ --  When generating standard DWARF (i.e when GNAT_Encodings is not
+ --  DWARF_GNAT_Encodings_All), the ___XP suffix will be stripped
  --  by the back-end but generate it anyway to ease compiler debugging.
  --  This will help to distinguish implementation types from original
  --  packed arrays.




[Ada] Diagnose properly illegal uses of Target_Name

2021-07-08 Thread Pierre-Marie de Rodat
Ada_2022 introduces the notion of Target_Name, written @, to be used in
assignment statements, where it denotes the value of the left-hand side
prior to the assignment. This patch diagnoses illegal uses of the target
name outside of its legal context.
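
Assumed examples (not taken from the patch; names are made up) of what is
accepted and what is now rejected:

procedure Demo is
   Total  : Integer := 10;
   Buffer : array (1 .. 3) of Integer := (others => 0);
begin
   Total := @ + 1;         --  legal: @ denotes Total's prior value
   Buffer (@) := 0;        --  rejected: @ within the left-hand side
   pragma Assert (@ = 0);  --  rejected: @ outside an assignment
end Demo;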

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_ch5.adb (Analyze_Target_Name): Properly reject a
Target_Name when it appears outside of an assignment statement,
or within the left-hand side of one.diff --git a/gcc/ada/sem_ch5.adb b/gcc/ada/sem_ch5.adb
--- a/gcc/ada/sem_ch5.adb
+++ b/gcc/ada/sem_ch5.adb
@@ -4233,10 +4233,50 @@ package body Sem_Ch5 is
-
 
procedure Analyze_Target_Name (N : Node_Id) is
+  procedure Report_Error;
+
+  --
+  -- Report_Error --
+  --
+
+  procedure Report_Error is
+  begin
+ Error_Msg_N
+   ("must appear in the right-hand side of an assignment statement",
+ N);
+ Rewrite (N, New_Occurrence_Of (Any_Id, Sloc (N)));
+  end Report_Error;
+
begin
   --  A target name has the type of the left-hand side of the enclosing
   --  assignment.
 
+  --  First, verify that the context is the right-hand side of an
+  --  assignment statement.
+
+  if No (Current_Assignment) then
+ Report_Error;
+ return;
+
+  else
+ declare
+P : Node_Id := N;
+ begin
+while Present (P)
+  and then Nkind (Parent (P)) /= N_Assignment_Statement
+loop
+   P := Parent (P);
+end loop;
+
+if No (P)
+  or else P /= Expression (Parent (P))
+then
+   Report_Error;
+   return;
+end if;
+ end;
+  end if;
+
   Set_Etype (N, Etype (Name (Current_Assignment)));
end Analyze_Target_Name;
 




[Ada] Tune detection of illegal occurrences of target_name

2021-07-08 Thread Pierre-Marie de Rodat
Prevent AST climbing from going outside of the current program unit;
tune style; add comments. Also, only set the Current_Assignment global
variable when needed and clear it once the analysis of an assignment
statement is done.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_ch5.adb (Analyze_Assignment): Clear Current_Assignment at
exit.
(Analyze_Target_Name): Prevent AST climbing from going too far.diff --git a/gcc/ada/sem_ch5.adb b/gcc/ada/sem_ch5.adb
--- a/gcc/ada/sem_ch5.adb
+++ b/gcc/ada/sem_ch5.adb
@@ -480,12 +480,11 @@ package body Sem_Ch5 is
   Mark_And_Set_Ghost_Assignment (N);
 
   if Has_Target_Names (N) then
+ pragma Assert (No (Current_Assignment));
  Current_Assignment := N;
  Expander_Mode_Save_And_Set (False);
  Save_Full_Analysis := Full_Analysis;
  Full_Analysis  := False;
-  else
- Current_Assignment := Empty;
   end if;
 
   Analyze (Lhs);
@@ -1302,6 +1301,7 @@ package body Sem_Ch5 is
  if Has_Target_Names (N) then
 Expander_Mode_Restore;
 Full_Analysis := Save_Full_Analysis;
+Current_Assignment := Empty;
  end if;
 
  pragma Assert (not Should_Transform_BIP_Assignment (Typ => T1));
@@ -4234,6 +4234,8 @@ package body Sem_Ch5 is
 
procedure Analyze_Target_Name (N : Node_Id) is
   procedure Report_Error;
+  --  Complain about illegal use of target_name and rewrite it into unknown
+  --  identifier.
 
   --
   -- Report_Error --
@@ -4247,6 +4249,8 @@ package body Sem_Ch5 is
  Rewrite (N, New_Occurrence_Of (Any_Id, Sloc (N)));
   end Report_Error;
 
+   --  Start of processing for Analyze_Target_Name
+
begin
   --  A target name has the type of the left-hand side of the enclosing
   --  assignment.
@@ -4257,27 +4261,39 @@ package body Sem_Ch5 is
   if No (Current_Assignment) then
  Report_Error;
  return;
+  end if;
 
-  else
- declare
-P : Node_Id := N;
- begin
-while Present (P)
-  and then Nkind (Parent (P)) /= N_Assignment_Statement
-loop
-   P := Parent (P);
-end loop;
+  declare
+ Current : Node_Id := N;
+ Context : Node_Id := Parent (N);
+  begin
+ while Present (Context) loop
 
-if No (P)
-  or else P /= Expression (Parent (P))
-then
+--  Check if target_name appears in the expression of the enclosing
+--  assignment.
+
+if Nkind (Context) = N_Assignment_Statement then
+   if Current = Expression (Context) then
+  pragma Assert (Context = Current_Assignment);
+  Set_Etype (N, Etype (Name (Current_Assignment)));
+   else
+  Report_Error;
+   end if;
+   return;
+
+--  Prevent the search from going too far
+
+elsif Is_Body_Or_Package_Declaration (Context) then
Report_Error;
return;
 end if;
- end;
-  end if;
 
-  Set_Etype (N, Etype (Name (Current_Assignment)));
+Current := Context;
+Context := Parent (Context);
+ end loop;
+
+ Report_Error;
+  end;
end Analyze_Target_Name;
 





[Ada] Remove Unknown_ functions

2021-07-08 Thread Pierre-Marie de Rodat
Remove the Unknown_ type representation attribute predicates from
Einfo.Utils. "not Known_Alignment (...)" is at least as readable as
"Unknown_Alignment (...)" -- we don't need a bunch of functions that
just do a "not".

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* einfo-utils.ads, einfo-utils.adb (Unknown_Alignment,
Unknown_Component_Bit_Offset, Unknown_Component_Size,
Unknown_Esize, Unknown_Normalized_First_Bit,
Unknown_Normalized_Position, Unknown_Normalized_Position_Max,
Unknown_RM_Size): Remove these functions.
* exp_pakd.adb, exp_util.adb, fe.h, freeze.adb, layout.adb,
repinfo.adb, sem_ch13.adb, sem_ch3.adb, sem_util.adb: Remove
calls to these functions; do "not Known_..." instead.
* gcc-interface/decl.c, gcc-interface/trans.c
(Unknown_Alignment, Unknown_Component_Size, Unknown_Esize,
Unknown_RM_Size): Remove calls to these functions; do
"!Known_..." instead.diff --git a/gcc/ada/einfo-utils.adb b/gcc/ada/einfo-utils.adb
--- a/gcc/ada/einfo-utils.adb
+++ b/gcc/ada/einfo-utils.adb
@@ -597,46 +597,6 @@ package body Einfo.Utils is
 and then not Is_Generic_Type (E);
end Known_Static_RM_Size;
 
-   function Unknown_Alignment (E : Entity_Id) return B is
-   begin
-  return not Known_Alignment (E);
-   end Unknown_Alignment;
-
-   function Unknown_Component_Bit_Offset  (E : Entity_Id) return B is
-   begin
-  return not Known_Component_Bit_Offset (E);
-   end Unknown_Component_Bit_Offset;
-
-   function Unknown_Component_Size(E : Entity_Id) return B is
-   begin
-  return not Known_Component_Size (E);
-   end Unknown_Component_Size;
-
-   function Unknown_Esize (E : Entity_Id) return B is
-   begin
-  return not Known_Esize (E);
-   end Unknown_Esize;
-
-   function Unknown_Normalized_First_Bit  (E : Entity_Id) return B is
-   begin
-  return not Known_Normalized_First_Bit (E);
-   end Unknown_Normalized_First_Bit;
-
-   function Unknown_Normalized_Position   (E : Entity_Id) return B is
-   begin
-  return not Known_Normalized_Position (E);
-   end Unknown_Normalized_Position;
-
-   function Unknown_Normalized_Position_Max   (E : Entity_Id) return B is
-   begin
-  return not Known_Normalized_Position_Max (E);
-   end Unknown_Normalized_Position_Max;
-
-   function Unknown_RM_Size   (E : Entity_Id) return B is
-   begin
-  return not Known_RM_Size (E);
-   end Unknown_RM_Size;
-

-- Address_Clause --



diff --git a/gcc/ada/einfo-utils.ads b/gcc/ada/einfo-utils.ads
--- a/gcc/ada/einfo-utils.ads
+++ b/gcc/ada/einfo-utils.ads
@@ -314,12 +314,11 @@ package Einfo.Utils is
-- Type Representation Attribute Predicates --
--
 
-   --  These predicates test the setting of the indicated attribute. If the
-   --  value has been set, then Known is True, and Unknown is False. If no
-   --  value is set, then Known is False and Unknown is True. The Known_Static
-   --  predicate is true only if the value is set (Known) and is set to a
-   --  compile time known value. Note that in the case of Alignment and
-   --  Normalized_First_Bit, dynamic values are not possible, so we do not
+   --  These predicates test the setting of the indicated attribute. The
+   --  Known predicate is True if and only if the value has been set. The
+   --  Known_Static predicate is True only if the value is set (Known) and is
+   --  set to a compile time known value. Note that in the case of Alignment
+   --  and Normalized_First_Bit, dynamic values are not possible, so we do not
--  need a separate Known_Static calls in these cases. The not set (unknown)
--  values are as follows:
 
@@ -364,15 +363,6 @@ package Einfo.Utils is
function Known_Static_Normalized_Position_Max  (E : Entity_Id) return B;
function Known_Static_RM_Size  (E : Entity_Id) return B;
 
-   function Unknown_Alignment (E : Entity_Id) return B;
-   function Unknown_Component_Bit_Offset  (E : Entity_Id) return B;
-   function Unknown_Component_Size(E : Entity_Id) return B;
-   function Unknown_Esize (E : Entity_Id) return B;
-   function Unknown_Normalized_First_Bit  (E : Entity_Id) return B;
-   function Unknown_Normalized_Position   (E : Entity_Id) return B;
-   function Unknown_Normalized_Position_Max   (E : Entity_Id) return B;
-   function Unknown_RM_Size   (E : Entity_Id) return B;
-
pragma Inline (Known_Alignment);
pragma Inline (Known_Component_Bit_Offset);
pragma Inline (Known_Component_Size);
@@ -390,15 +380,6 @@ package Einfo.Utils is
pragma Inline (Known_Static_Normalized_Position_Max);
pragma Inline (Known_Static_RM_Size);
 
-   pragma Inline 

Re: [patch][version 4]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-07-08 Thread Qing Zhao via Gcc-patches
Hi, Martin,

Thank you for the review and comment.

On Jul 8, 2021, at 8:29 AM, Martin Jambor <mjam...@suse.cz> wrote:

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index c05d22f3e8f1..35051d7c6b96 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -384,6 +384,13 @@ static struct

  /* Numbber of components created when splitting aggregate parameters.  */
  int param_reductions_created;
+
+  /* Number of deferred_init calls that are modified.  */
+  int deferred_init;
+
+  /* Number of deferred_init calls that are created by
+ generate_subtree_deferred_init.  */
+  int subtree_deferred_init;
} sra_stats;

static void
@@ -4096,6 +4103,110 @@ get_repl_default_def_ssa_name (struct access *racc, 
tree reg_type)
  return get_or_create_ssa_default_def (cfun, racc->replacement_decl);
}

+
+/* Generate statements to call .DEFERRED_INIT to initialize scalar replacements
+   of accesses within a subtree ACCESS; all its children, siblings and their
+   children are to be processed.
+   GSI is a statement iterator used to place the new statements.  */
+static void
+generate_subtree_deferred_init (struct access *access,
+ tree init_type,
+ tree is_vla,
+ gimple_stmt_iterator *gsi,
+ location_t loc)
+{
+  do
+{
+  if (access->grp_to_be_replaced)
+ {
+  tree repl = get_access_replacement (access);
+  gimple *call
+= gimple_build_call_internal (IFN_DEFERRED_INIT, 3,
+  TYPE_SIZE_UNIT (TREE_TYPE (repl)),
+  init_type, is_vla);
+  gimple_call_set_lhs (call, repl);
+  gsi_insert_before (gsi, call, GSI_SAME_STMT);
+  update_stmt (call);
+  gimple_set_location (call, loc);
+  sra_stats.subtree_deferred_init++;
+ }
+  else if (access->grp_to_be_debug_replaced)
+ {
+  tree drepl = get_access_replacement (access);
+  tree call = build_call_expr_internal_loc
+ (UNKNOWN_LOCATION, IFN_DEFERRED_INIT,
+  TREE_TYPE (drepl), 3,
+  TYPE_SIZE_UNIT (TREE_TYPE (drepl)),
+  init_type, is_vla);
+  gdebug *ds = gimple_build_debug_bind (drepl, call,
+ gsi_stmt (*gsi));
+  gsi_insert_before (gsi, ds, GSI_SAME_STMT);

Is handling of grp_to_be_debug_replaced accesses necessary here?  If so,
why?  grp_to_be_debug_replaced accesses are there only to facilitate
debug information about a part of an aggregate decl that is likely
going to be entirely removed - so that debuggers can sometimes show to
users information about what they would contain had they not been removed.
It seems strange you need to mark them as uninitialized because they
should not have any consumers.  (But perhaps it is also harmless.)

This part has been discussed during the 2nd version of the patch, but I think 
that more discussion might be necessary.

In the previous discussion, Richard Sandiford mentioned: 
(https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568620.html):

=

I guess the thing we need to decide here is whether -ftrivial-auto-var-init
should affect debug-only constructs too.  If it doesn't, examining removed
components in a debugger might show uninitialised values in cases where
the user was expecting initialised ones.  There would be no security
concern, but it might be surprising.

I think in principle the DRHS can contain a call to DEFERRED_INIT.
Doing that would probably require further handling elsewhere though.

=

I am still not very confident now for this part of the change.

My questions:

1. If we don’t handle grp_to_be_debug_replaced at all, what will happen?  ( the 
user of the debugger will see uninitialized values in
the removed part of the aggregate?  Or something else?)
2. On the other hand, if we handle grp_to_be_debug_replaced as the current 
patch, what will the user of the debugger see?



On a related note, if the intent of the feature is for optimizers to
behave (almost?) as if it was not taking place,

What do you mean by “it” here?
I believe you need to
handle specially, and probably just ignore, calls to IFN_DEFERRED_INIT
in scan_function in tree-sra.c.

Do you mean to let tree-sra phase ignore IFN_DEFERRED_INIT calls completely?

My main purpose in changing the tree-sra.c phase is:

Change:

tmp = .DEFERRED_INIT (24, 2, 0)

To

tmp1 = .DEFERRED_INIT (8, 2, 0);
tmp2 = .DEFERRED_INIT (8, 2, 0);
tmp3 = .DEFERRED_INIT (8, 2, 0);

Doing this is to reduce the stack usage.
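
For illustration only (a hypothetical example, not taken from the patch or
the PR): the kind of local aggregate the 24-byte .DEFERRED_INIT above could
come from is something like the following, assuming LP64 and
-ftrivial-auto-var-init=zero or =pattern:

  struct S { long a, b, c; };       /* 24 bytes */

  long
  use (int i)
  {
    struct S s;                     /* never fully initialized */
    if (i)
      s.a = i;
    return s.a + s.b + s.c;         /* may read uninitialized members */
  }

Once SRA splits s into three scalar replacements, each replacement gets its
own 8-byte .DEFERRED_INIT and no stack slot for s is required.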

 Otherwise the generated SRA access
structures will have extra write flags turned on in them and that will
lead to different behavior of the pass.

Could you please explain this more?

thanks.

Qing

Martin



+ }
+  if (access->first_child)
+ generate_subtree_deferred_init (access->first_child, init_type,
+ is_vla, gsi, loc);
+
+  access = access ->next_sibling;
+}
+  while (access);
+}
+
+/* For a call to .DEFERRED_INIT:
+   var = .DEFERRED_INIT (size_of_var, init_type, is_vla);
+   examine the LHS variable VAR and replace it with a scalar replacement if
+   there is one, also replace the RHS call to a call to .DEFERRED_INIT of
+   the corresponding scalar relacement variable.  Examine the subtree and
+   do the s

[PATCH] c++: requires-expr with dependent extra args [PR101181]

2021-07-08 Thread Patrick Palka via Gcc-patches
Here we're crashing ultimately because the mechanism for delaying
substitution into a requires-expression (or constexpr if) doesn't
expect to see dependent args.  But we end up capturing dependent
args here when substituting into the default template argument during
coerce_template_parms for the dependent specialization p.

This patch enables the commented out code in add_extra_args for
handling this situation.  It turns out we also need to make a copy of
the captured arguments so that coerce_template_parms doesn't later
add to the argument, which would form an unexpected cycle.  And we
need to make tsubst_template_args more forgiving about missing template
arguments, since the arguments we capture from coerce_template_parms are
incomplete.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/11?

PR c++/101181

gcc/cp/ChangeLog:

* constraint.cc (tsubst_requires_expr): Pass complain/in_decl to
add_extra_args.
* cp-tree.h (add_extra_args): Add complain/in_decl parameters.
* pt.c (build_extra_args): Make a copy of args.
(add_extra_args): Add complain/in_decl parameters.  Handle the
case where the extra arguments are dependent.
(tsubst_pack_expansion): Pass complain/in_decl to
add_extra_args.
(tsubst_template_args): Handle missing template arguments.
(tsubst_expr) : Pass complain/in_decl to
add_extra_args.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-requires26.C: New test.
* g++.dg/cpp2a/lambda-uneval16.C: New test.
---
 gcc/cp/constraint.cc  |  3 +-
 gcc/cp/cp-tree.h  |  2 +-
 gcc/cp/pt.c   | 31 +--
 .../g++.dg/cpp2a/concepts-requires26.C| 18 +++
 gcc/testsuite/g++.dg/cpp2a/lambda-uneval16.C  | 22 +
 5 files changed, 58 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-requires26.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-uneval16.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 99d3ccc6998..4ee5215df50 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -2266,7 +2266,8 @@ tsubst_requires_expr (tree t, tree args, sat_info info)
   /* A requires-expression is an unevaluated context.  */
   cp_unevaluated u;
 
-  args = add_extra_args (REQUIRES_EXPR_EXTRA_ARGS (t), args);
+  args = add_extra_args (REQUIRES_EXPR_EXTRA_ARGS (t), args,
+info.complain, info.in_decl);
   if (processing_template_decl)
 {
   /* We're partially instantiating a generic lambda.  Substituting into
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 58da7460001..0a5f13489cc 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7289,7 +7289,7 @@ extern void add_mergeable_specialization(bool 
is_decl, bool is_alias,
 tree outer, unsigned);
 extern tree add_to_template_args   (tree, tree);
 extern tree add_outermost_template_args(tree, tree);
-extern tree add_extra_args (tree, tree);
+extern tree add_extra_args (tree, tree, tsubst_flags_t, 
tree);
 extern tree build_extra_args   (tree, tree, tsubst_flags_t);
 
 /* in rtti.c */
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 06116d16887..e4bdac087ad 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -12928,7 +12928,9 @@ extract_local_specs (tree pattern, tsubst_flags_t 
complain)
 tree
 build_extra_args (tree pattern, tree args, tsubst_flags_t complain)
 {
-  tree extra = args;
+  /* Make a copy of the extra arguments so that they won't get changed
+ from under us.  */
+  tree extra = copy_template_args (args);
   if (local_specializations)
 if (tree locals = extract_local_specs (pattern, complain))
   extra = tree_cons (NULL_TREE, extra, locals);
@@ -12939,7 +12941,7 @@ build_extra_args (tree pattern, tree args, 
tsubst_flags_t complain)
normal template args to ARGS.  */
 
 tree
-add_extra_args (tree extra, tree args)
+add_extra_args (tree extra, tree args, tsubst_flags_t complain, tree in_decl)
 {
   if (extra && TREE_CODE (extra) == TREE_LIST)
 {
@@ -12959,20 +12961,14 @@ add_extra_args (tree extra, tree args)
   gcc_assert (!TREE_PURPOSE (extra));
   extra = TREE_VALUE (extra);
 }
-#if 1
-  /* I think we should always be able to substitute dependent args into the
- pattern.  If that turns out to be incorrect in some cases, enable the
- alternate code (and add complain/in_decl parms to this function).  */
-  gcc_checking_assert (!uses_template_parms (extra));
-#else
-  if (!uses_template_parms (extra))
+  if (uses_template_parms (extra))
 {
-  gcc_unreachable ();
+  /* This can happen during dependent substitution into a requires-expr
+or a lambda that uses constexpr if.  */
   extra = tsubst_template_args (extra

Re: [PATCH 06/10] vect: Pass reduc_info to get_initial_defs_for_reduction

2021-07-08 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Thu, Jul 8, 2021 at 2:46 PM Richard Sandiford via Gcc-patches
>  wrote:
>>
>> This patch passes the reduc_info to get_initial_defs_for_reduction,
>> so that the function can get general information from there rather
>> than from the first SLP statement.  This isn't a win on its own,
>> but it becomes important with later patches.
>
> So the original code should have used SLP_TREE_REPRESENTATIVE
> instead of SLP_TREE_SCALAR_STMTS ()[0] (there might have been
> issues with doing that - my recollection is weak here).
>
> I'm not sure if reduc_info is actually better - only the representative
> will have STMT_VINFO_VECTYPE set, for the reduc_info
> there's STMT_VINFO_REDUC_VECTYPE (and STMT_VINFO_REDUC_VECTYPE_IN).
>
> So I think if you want to use reduc_info then you want to use
> STMT_VINFO_REDUC_VECTYPE?

I guess I'm a bit fuzzy on the details, but AIUI STMT_VINFO_REDUC_VECTYPE
is the type that we do the arithmetic in, which might be different from
the types of the phis.  Is that right?

In this context we want the types of the phis, since the routine is
providing the initial values.  Using STMT_VINFO_REDUC_VECTYPE gives
things like:

---
gcc.dg/torture/pr92345.c:8:1: error: incompatible types in 'PHI' argument 1
vector(4) int

vector(4) unsigned int

vect_fr_lsm.11_58 = PHI 
---

Thanks,
Richard

>
>> gcc/
>> * tree-vect-loop.c (get_initial_defs_for_reduction): Take the
>> reduc_info as an additional parameter.
>> (vect_transform_cycle_phi): Update accordingly.
>> ---
>>  gcc/tree-vect-loop.c | 23 ++-
>>  1 file changed, 10 insertions(+), 13 deletions(-)
>>
>> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>> index a31d7621c3b..565c2859477 100644
>> --- a/gcc/tree-vect-loop.c
>> +++ b/gcc/tree-vect-loop.c
>> @@ -4764,32 +4764,28 @@ get_initial_def_for_reduction (loop_vec_info 
>> loop_vinfo,
>>return init_def;
>>  }
>>
>> -/* Get at the initial defs for the reduction PHIs in SLP_NODE.
>> -   NUMBER_OF_VECTORS is the number of vector defs to create.
>> -   If NEUTRAL_OP is nonnull, introducing extra elements of that
>> -   value will not change the result.  */
>> +/* Get at the initial defs for the reduction PHIs for REDUC_INFO, whose
>> +   associated SLP node is SLP_NODE.  NUMBER_OF_VECTORS is the number of 
>> vector
>> +   defs to create.  If NEUTRAL_OP is nonnull, introducing extra elements of
>> +   that value will not change the result.  */
>>
>>  static void
>>  get_initial_defs_for_reduction (vec_info *vinfo,
>> +   stmt_vec_info reduc_info,
>> slp_tree slp_node,
>> vec<tree> *vec_oprnds,
>> unsigned int number_of_vectors,
>> bool reduc_chain, tree neutral_op)
>>  {
>>  vec<stmt_vec_info> stmts = SLP_TREE_SCALAR_STMTS (slp_node);
>> -  stmt_vec_info stmt_vinfo = stmts[0];
>>unsigned HOST_WIDE_INT nunits;
>>unsigned j, number_of_places_left_in_vector;
>> -  tree vector_type;
>> +  tree vector_type = STMT_VINFO_VECTYPE (reduc_info);
>>unsigned int group_size = stmts.length ();
>>unsigned int i;
>>class loop *loop;
>>
>> -  vector_type = STMT_VINFO_VECTYPE (stmt_vinfo);
>> -
>> -  gcc_assert (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def);
>> -
>> -  loop = (gimple_bb (stmt_vinfo->stmt))->loop_father;
>> +  loop = (gimple_bb (reduc_info->stmt))->loop_father;
>>gcc_assert (loop);
>>edge pe = loop_preheader_edge (loop);
>>
>> @@ -4823,7 +4819,7 @@ get_initial_defs_for_reduction (vec_info *vinfo,
>>  {
>>tree op;
>>i = j % group_size;
>> -  stmt_vinfo = stmts[i];
>> +  stmt_vec_info stmt_vinfo = stmts[i];
>>
>>/* Get the def before the loop.  In reduction chain we have only
>>  one initial value.  Else we have as many as PHIs in the group.  */
>> @@ -7510,7 +7506,8 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo,
>>   = neutral_op_for_slp_reduction (slp_node, vectype_out,
>>   STMT_VINFO_REDUC_CODE 
>> (reduc_info),
>>   first != NULL);
>> - get_initial_defs_for_reduction (loop_vinfo, 
>> slp_node_instance->reduc_phis,
>> + get_initial_defs_for_reduction (loop_vinfo, reduc_info,
>> + slp_node_instance->reduc_phis,
>>   &vec_initial_defs, vec_num,
>>   first != NULL, neutral_op);
>> }


[committed] Use Object Size Type zero for -Warray-bounds [PR101374]

2021-07-08 Thread Martin Sebor via Gcc-patches

I have committed the attached patch to unblock bootstrap, which was failing
with errors due to the tightening up of the -Warray-bounds checking in r12-213.

I have also temporarily disabled a couple of instances of the warning
in gcc/cp/module.cc.  They don't appear to be caused by the same
tighter checking but I haven't determined the root cause yet.  I'll
submit another patch and/or bug when I do.

I tested this change by configuring with --enable-checking=release
and --enable-checking=yes,extra and successfully bootstrapping all
languages but libgo.  Libgo fails with a couple of new instances
of -Warray-bounds where it writes into an invalid address.  I have
a patch that suppresses these two -Warray-bounds instances but it
doesn't look like I can commit it myself so I'll forward the patch
to Ian separately.

Martin
Use Object Size Type zero for -Warray-bounds [PR101374].

PR bootstrap/101374 - -Warray-bounds accessing a member subobject as derived

gcc/cp/ChangeLog:

	* module.cc (module_state::read_macro_maps): Temporarily disable
	-Warray-bounds.
	(module_state::install_macros): Same.

gcc/ChangeLog:

	* gimple-array-bounds.cc (array_bounds_checker::check_mem_ref):
	Use Object Size Type 0 instead of 1.

gcc/testsuite/ChangeLog:

	* c-c++-common/Warray-bounds-3.c: Xfail assertion.
	* c-c++-common/Warray-bounds-4.c: Same.

libgo/ChangeLog:
	* runtime/proc.c (runtime_mstart): Suppress -Warray-bounds.
	* runtime/runtime_c.c (runtime_signalstack): Same.

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index f259515a498..8a890c167cf 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -16301,11 +16301,18 @@ module_state::read_macro_maps ()
 	}
   if (count)
 	sec.set_overrun ();
+
+  /* FIXME: Re-enable or fix after root causing.  */
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Warray-bounds"
+
   dump (dumper::LOCATION)
 	&& dump ("Macro:%u %I %u/%u*2 locations [%u,%u)",
 		 ix, identifier (node), runs, n_tokens,
 		 MAP_START_LOCATION (macro),
 		 MAP_START_LOCATION (macro) + n_tokens);
+
+#pragma GCC diagnostic pop
 }
   location_t lwm = sec.u ();
   macro_locs.first = lwm - slurp->loc_deltas.second;
@@ -16911,6 +16918,10 @@ module_state::install_macros ()
   macro_import::slot &slot = imp.append (mod, flags);
   slot.offset = sec.u ();
 
+  /* FIXME: Re-enable or fix after root causing.  */
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Warray-bounds"
+
   dump (dumper::MACRO)
 	&& dump ("Read %s macro %s%s%s %I at %u",
 		 imp.length () > 1 ? "add" : "new",
@@ -16931,6 +16942,8 @@ module_state::install_macros ()
 	exp.def = cur;
 	dump (dumper::MACRO)
 	  && dump ("Saving current #define %I", identifier (node));
+
+#pragma GCC diagnostic pop
 	  }
 }
 
diff --git a/gcc/gimple-array-bounds.cc b/gcc/gimple-array-bounds.cc
index 83b8db9755e..8dfd6f9500a 100644
--- a/gcc/gimple-array-bounds.cc
+++ b/gcc/gimple-array-bounds.cc
@@ -427,7 +427,7 @@ array_bounds_checker::check_mem_ref (location_t location, tree ref,
 	axssize = wi::to_offset (access_size);
 
   access_ref aref;
-  if (!compute_objsize (ref, 1, &aref, ranges))
+  if (!compute_objsize (ref, 0, &aref, ranges))
 return false;
 
   if (aref.offset_in_range (axssize))
diff --git a/gcc/testsuite/c-c++-common/Warray-bounds-3.c b/gcc/testsuite/c-c++-common/Warray-bounds-3.c
index 3d7c7687374..75f9a496eae 100644
--- a/gcc/testsuite/c-c++-common/Warray-bounds-3.c
+++ b/gcc/testsuite/c-c++-common/Warray-bounds-3.c
@@ -178,7 +178,7 @@ void test_memcpy_bounds_memarray_range (void)
 
   TM (ma.a5, ma.a5 + i, ma.a5, 1);
   TM (ma.a5, ma.a5 + i, ma.a5, 3);
-  TM (ma.a5, ma.a5 + i, ma.a5, 5); /* { dg-warning "\\\[-Warray-bounds" } */
+  TM (ma.a5, ma.a5 + i, ma.a5, 5); /* { dg-warning "\\\[-Warray-bounds" "pr101374" { xfail *-*-* } } */
   TM (ma.a5, ma.a5 + i, ma.a5, 7); /* diagnosed with -Warray-bounds=2 */
 }
 
diff --git a/gcc/testsuite/c-c++-common/Warray-bounds-4.c b/gcc/testsuite/c-c++-common/Warray-bounds-4.c
index 1f73f11943f..835c634fd27 100644
--- a/gcc/testsuite/c-c++-common/Warray-bounds-4.c
+++ b/gcc/testsuite/c-c++-common/Warray-bounds-4.c
@@ -52,7 +52,7 @@ void test_memcpy_bounds_memarray_range (void)
  = MEM  [(char * {ref-all})&ma];
  and could be improved.  Just verify that one is issued but not its
  full text.  */
-  TM (ma.a5, ma.a5 + j, ma.a5, 5);/* { dg-warning "\\\[-Warray-bounds" } */
+  TM (ma.a5, ma.a5 + j, ma.a5, 5);/* { dg-warning "\\\[-Warray-bounds" "pr101374" { xfail *-*-* } } */
 
   TM (ma.a5, ma.a5 + j, ma.a5, 7);/* { dg-warning "offset \\\[5, 7] from the object at .ma. is out of the bounds of referenced subobject .\(MA::\)?a5. with type .char ?\\\[5]. at offset 0" } */
   TM (ma.a5, ma.a5 + j, ma.a5, 9);/* { dg-warning "offset \\\[5, 9] from the object at .ma. is out of the bounds of referenced subobject .\(MA::\)?a5. with type .char ?\\\[5]. at offset 0" } */
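
As an aside (illustrative only, not part of the change): the distinction the
gimple-array-bounds.cc hunk relies on, object size type 0 versus type 1, can
be seen with __builtin_object_size on a member array like the one in these
tests:

  struct MA { char a5[5]; char b; } ma;

  /* Type 0: bytes remaining in the whole enclosing object.      */
  /*   __builtin_object_size (&ma.a5, 0) == 6                    */
  /* Type 1: bytes remaining in the closest enclosing subobject. */
  /*   __builtin_object_size (&ma.a5, 1) == 5                    */

With type 0 the checker only flags accesses past the end of the enclosing
object, which is why the two subobject overflows above are now xfailed.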


disable -Warray-bounds in libgo (PR 101374)

2021-07-08 Thread Martin Sebor via Gcc-patches

Hi Ian,

Yesterday's enhancement to -Warray-bounds has exposed a couple of
issues in libgo where the code writes into an invalid constant
address that the warning is designed to flag.

On the assumption that those invalid addresses are deliberate,
the attached patch suppresses these instances by using #pragma
GCC diagnostic but I don't think I'm supposed to commit it (at
least Git won't let me).  To avoid Go bootstrap failures please
either apply the patch or otherwise suppress the warning (e.g.,
by using a volatile pointer temporary).

Thanks
Martin
Use Object Size Type zero for -Warray-bounds [PR101374].

PR bootstrap/101374 - -Warray-bounds accessing a member subobject as derived

libgo/ChangeLog:
	PR bootstrap/101374
	* runtime/proc.c (runtime_mstart): Suppress -Warray-bounds.
	* runtime/runtime_c.c (runtime_signalstack): Same.

diff --git a/libgo/runtime/proc.c b/libgo/runtime/proc.c
index 38bf7a6b255..61635e6c1ea 100644
--- a/libgo/runtime/proc.c
+++ b/libgo/runtime/proc.c
@@ -594,7 +594,14 @@ runtime_mstart(void *arg)
 		gp->entry = nil;
 		gp->param = nil;
 		__builtin_call_with_static_chain(pfn(gp1), fv);
+
+		/* Writing to an invalid address is detected.  */
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Warray-bounds"
+
 		*(int*)0x21 = 0x21;
+
+#pragma GCC diagnostic push
 	}
 
 	if(mp->exiting) {
@@ -662,7 +669,12 @@ setGContext(void)
 		gp->entry = nil;
 		gp->param = nil;
 		__builtin_call_with_static_chain(pfn(gp1), fv);
+
+		/* Writing to an invalid address is detected.  */
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Warray-bounds"
 		*(int*)0x22 = 0x22;
+#pragma GCC diagnostic pop
 	}
 }
 
diff --git a/libgo/runtime/runtime_c.c b/libgo/runtime/runtime_c.c
index 18222c14465..53feaa075c7 100644
--- a/libgo/runtime/runtime_c.c
+++ b/libgo/runtime/runtime_c.c
@@ -116,7 +116,11 @@ runtime_signalstack(byte *p, uintptr n)
 	if(p == nil)
 		st.ss_flags = SS_DISABLE;
 	if(sigaltstack(&st, nil) < 0)
+	  /* Writing to an invalid address is detected.  */
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Warray-bounds"
 		*(int *)0xf1 = 0xf1;
+#pragma GCC diagnostic push
 }
 
 int32 go_open(char *, int32, int32)


Re: PING 2 [PATCH] correct handling of variable offset minus constant in -Warray-bounds (PR 100137)

2021-07-08 Thread Martin Sebor via Gcc-patches

On 7/8/21 4:41 AM, Andreas Schwab wrote:

On Jul 07 2021, Marek Polacek via Gcc-patches wrote:


On Wed, Jul 07, 2021 at 02:38:11PM -0600, Martin Sebor via Gcc-patches wrote:

I certainly will.  Pushed in r12-2132.


I think this patch breaks bootstrap on x86_64:


It also breaks bootstrap on aarch64 and ia64 in stage2.

In file included from ../../gcc/c-family/c-common.h:26,
  from ../../gcc/cp/cp-tree.h:40,
  from ../../gcc/cp/module.cc:209:
In function 'tree_node* identifier(const cpp_hashnode*)',
 inlined from 'bool module_state::read_macro_maps()' at 
../../gcc/cp/module.cc:16305:10:
../../gcc/tree.h:1089:58: error: array subscript -1 is outside array bounds of 
'cpp_hashnode [288230376151711743]' [-Werror=array-bounds]
  1089 |   ((tree) ((char *) (NODE) - sizeof (struct tree_common)))
   |  ^
../../gcc/cp/module.cc:277:10: note: in expansion of macro 
'HT_IDENT_TO_GCC_IDENT'
   277 |   return HT_IDENT_TO_GCC_IDENT (HT_NODE (const_cast 
(node)));
   |  ^
In file included from ../../gcc/tree.h:23,
  from ../../gcc/c-family/c-common.h:26,
  from ../../gcc/cp/cp-tree.h:40,
  from ../../gcc/cp/module.cc:209:
../../gcc/tree-core.h: In member function 'bool 
module_state::read_macro_maps()':
../../gcc/tree-core.h:1445:24: note: at offset -24 into object 
'tree_identifier::id' of size 16
  1445 |   struct ht_identifier id;
   |^~


Thanks.  This is a different issue than what triggered the other
warnings.  I have temporarily suppressed these two instances until
I root cause them.  Bootstrap should now be restored (at least on
x86_64).  If there are any outstanding warnings that are causing
failures please either update pr101374 or open new bugs.

Martin


Re: [RFA] Attach MEM_EXPR information when flushing BLKmode args to the stack

2021-07-08 Thread Jeff Law via Gcc-patches




On 7/5/2021 5:17 AM, Richard Biener via Gcc-patches wrote:

On Fri, Jul 2, 2021 at 6:13 PM Jeff Law  wrote:


This is a minor missed optimization we found with our internal port.

Given this code:

typedef struct {short a; short b;} T;

extern void g1();

void f(T s)
{
  if (s.a < 0)
  g1();
}


"s" is passed in a register, but it's still a BLKmode object because the
alignment of T is smaller than the alignment that an integer of the same
size would have (16 bits vs 32 bits).


Because "s" is BLKmode function.c is going to store it into a stack slot
and we'll load it from the that slot for each reference.  So on the v850
(just to pick a port that likely has the same behavior we're seeing) we
have this RTL from CSE2:


(insn 2 4 3 2 (set (mem/c:SI (plus:SI (reg/f:SI 34 .fp)
  (const_int -4 [0xfffc])) [2 S4 A32])
  (reg:SI 6 r6)) "j.c":6:1 7 {*movsi_internal}
   (expr_list:REG_DEAD (reg:SI 6 r6)
  (nil)))
(note 3 2 8 2 NOTE_INSN_FUNCTION_BEG)
(insn 8 3 9 2 (set (reg:HI 44 [ s.a ])
  (mem/c:HI (plus:SI (reg/f:SI 34 .fp)
  (const_int -4 [0xfffc])) [1 s.a+0 S2 A32]))
"j.c":7:5 3 {*movhi_internal}
   (nil))
(insn 9 8 10 2 (parallel [
  (set (reg:SI 45)
  (ashift:SI (subreg:SI (reg:HI 44 [ s.a ]) 0)
  (const_int 16 [0x10])))
  (clobber (reg:CC 32 psw))
  ]) "j.c":7:5 94 {ashlsi3_clobber_flags}
   (expr_list:REG_DEAD (reg:HI 44 [ s.a ])
  (expr_list:REG_UNUSED (reg:CC 32 psw)
  (nil
(insn 10 9 11 2 (parallel [
  (set (reg:SI 43)
  (ashiftrt:SI (reg:SI 45)
  (const_int 16 [0x10])))
  (clobber (reg:CC 32 psw))
  ]) "j.c":7:5 104 {ashrsi3_clobber_flags}
   (expr_list:REG_DEAD (reg:SI 45)
  (expr_list:REG_UNUSED (reg:CC 32 psw)
  (nil


Insn 2 is the store into the stack. insn 8 is the load for s.a in the
conditional.  DSE1 replaces the MEM in insn 8 with (reg 6) since (reg 6)
has the value we want.  After that the store at insn 2 is dead.  Sadly
DSE never removes the store.

The problem is RTL DSE considers a store with no MEM_EXPR as escaping,
which keeps the MEM live.  The lack of a MEM_EXPR is due to call to
change_address to twiddle the mode on the MEM for the store at insn 2.
It should be safe to copy the MEM_EXPR (which should always be a
PARM_DECL) from the original memory to the memory returned by
change_address.  Doing so results in DSE1 removing the store at insn 2.
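
(A sketch of what copying the MEM_EXPR amounts to; this is an assumption
about the shape of the fix, not the exact hunk from the patch:

   /* Preserve the attributes, including MEM_EXPR, by adjusting
      the existing MEM instead of rebuilding it...  */
   mem = adjust_address (orig_mem, new_mode, 0);

   /* ...or, if change_address stays, copy the expression explicitly.  */
   set_mem_expr (new_mem, MEM_EXPR (orig_mem));

As discussed below, the adjust_address route turned out to be the simpler
one.)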

It would be nice to remove the stack setup/teardown.   I'm not offhand
aware of mechanisms to remove the setup/teardown after we've already
allocated a slot, even if the slot is no longer used.

Bootstrapped and regression tested on x86, though I don't think that's a
particularly useful test.  So I also ran it through my tester across
those pesky embedded targets without regressions as well.

I didn't include a test simply because I didn't want to have an insane
target selector.  I guess if we really wanted a test we could look after
DSE1 is done and verify there aren't any MEMs left at all.  Willing to
try that if the consensus is we want this tested.

OK for the trunk?

I wonder why the code doesn't use adjust_address instead?  That
handles most cases already and the code doesn't change the
address but just the mode (and access size)?
Yea, adjust_address seems to work fine.  I'm spinning that in my tester 
at the moment.


Jeff



Re: [PATCH v2] c++: Fix noexcept with unevaluated operand [PR101087]

2021-07-08 Thread Marek Polacek via Gcc-patches
On Thu, Jul 08, 2021 at 09:35:02AM -0400, Marek Polacek wrote:
> On Thu, Jul 08, 2021 at 09:30:27AM -0400, Jason Merrill wrote:
> > On 7/7/21 9:40 PM, Marek Polacek wrote:
> > > It sounds plausible that this assert
> > > 
> > >int f();
   static_assert(noexcept(sizeof(f())));
> > > 
> > > should pass: sizeof produces a std::size_t and its operand is not
> > > evaluated, so it can't throw.  noexcept should only evaluate to
> > > false for potentially evaluated operands.  Therefore I think that
> > > check_noexcept_r shouldn't walk into operands of sizeof/decltype/
> > > alignof/typeof.  Only checking cp_unevaluated_operand therein does
> > > not work, because expr_noexcept_p can be called in an unevaluated
> > > context, so I resorted to the following cp_evaluated hack.  Does
> > > that seem acceptable?
> > 
> > I suppose, but why not check for SIZEOF_EXPR/ALIGNOF_EXPR/NOEXCEPT_EXPR
> > directly?
> 
> I thought I would, but then it occurred to me that it might be better to
> rely on cp_walk_subtrees which ++/--s cp_unevaluated_operand for those
> codes.  I'd be happy to change the patch to check those codes directly;
> maybe I'm overthinking things here.

So here's v2 which checks the codes directly, via a new inline:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
It sounds plausible that this assert

  int f();
  static_assert(noexcept(sizeof(f())));

should pass: sizeof produces a std::size_t and its operand is not
evaluated, so it can't throw.  noexcept should only evaluate to
false for potentially evaluated operands.  Therefore I think that
check_noexcept_r shouldn't walk into operands of sizeof/decltype/
alignof/typeof.
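
As an aside (not part of the patch), a minimal illustration of the intended
semantics, assuming a C++11 compiler:

  int f();                                    // not noexcept
  static_assert(!noexcept(f()), "");          // potentially-evaluated call may throw
  static_assert(noexcept(sizeof(f())), "");   // unevaluated operand cannot throw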

PR c++/101087

gcc/cp/ChangeLog:

* cp-tree.h (unevaluated_p): New.
* except.c (check_noexcept_r): Use it.  Don't walk into
unevaluated operands.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept70.C: New test.
---
 gcc/cp/cp-tree.h| 13 +
 gcc/cp/except.c |  9 ++---
 gcc/testsuite/g++.dg/cpp0x/noexcept70.C |  5 +
 3 files changed, 24 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept70.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index b4501576b26..d4810c0c986 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -8465,6 +8465,19 @@ is_constrained_auto (const_tree t)
   return is_auto (t) && PLACEHOLDER_TYPE_CONSTRAINTS_INFO (t);
 }
 
+/* True if CODE, a tree code, denotes a tree whose operand is not evaluated
+   as per [expr.context], i.e., an operand to sizeof, typeof, decltype, or
+   alignof.  */
+
+inline bool
+unevaluated_p (tree_code code)
+{
+  return (code == DECLTYPE_TYPE
+ || code == ALIGNOF_EXPR
+ || code == SIZEOF_EXPR
+ || code == NOEXCEPT_EXPR);
+}
+
 /* RAII class to push/pop the access scope for T.  */
 
 struct push_access_scope_guard
diff --git a/gcc/cp/except.c b/gcc/cp/except.c
index a8cea53cf91..a8acbc4b7b2 100644
--- a/gcc/cp/except.c
+++ b/gcc/cp/except.c
@@ -1033,12 +1033,15 @@ check_handlers (tree handlers)
  expression whose type is a polymorphic class type (10.3).  */
 
 static tree
-check_noexcept_r (tree *tp, int * /*walk_subtrees*/, void * /*data*/)
+check_noexcept_r (tree *tp, int *walk_subtrees, void *)
 {
   tree t = *tp;
   enum tree_code code = TREE_CODE (t);
-  if ((code == CALL_EXPR && CALL_EXPR_FN (t))
-  || code == AGGR_INIT_EXPR)
+
+  if (unevaluated_p (code))
+*walk_subtrees = false;
+  else if ((code == CALL_EXPR && CALL_EXPR_FN (t))
+  || code == AGGR_INIT_EXPR)
 {
   /* We can only use the exception specification of the called function
 for determining the value of a noexcept expression; we can't use
diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept70.C 
b/gcc/testsuite/g++.dg/cpp0x/noexcept70.C
new file mode 100644
index 000..45a6137dd6f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/noexcept70.C
@@ -0,0 +1,5 @@
+// PR c++/101087
+// { dg-do compile { target c++11 } }
+
+int f();
+static_assert(noexcept(sizeof(f())), "");

base-commit: 763121ccd908f52bc666f277ea2cf42110b3aad9
-- 
2.31.1



Re: disable -Warray-bounds in libgo (PR 101374)

2021-07-08 Thread Rainer Orth
Hi Martin,

> Yesterday's enhancement to -Warray-bounds has exposed a couple of
> issues in libgo where the code writes into an invalid constant
> address that the warning is designed to flag.
>
> On the assumption that those invalid addresses are deliberate,
> the attached patch suppresses these instances by using #pragma
> GCC diagnostic but I don't think I'm supposed to commit it (at
> least Git won't let me).  To avoid Go bootstrap failures please
> either apply the patch or otherwise suppress the warning (e.g.,
> by using a volatile pointer temporary).

while this patch does fix the libgo bootstrap failure, Go is completely
broken: almost 1000 go.test failures and all libgo tests FAIL as well.
Seen on both i386-pc-solaris2.11 and sparc-sun-solaris2.11.

Please fix.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [patch][version 4]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-07-08 Thread Qing Zhao via Gcc-patches
(Resending this email since the previous one didn't quote properly; I changed
one setting in my mail client, and hopefully that fixes the issue.)

Hi, Martin,

Thank you for the review and comment.

> On Jul 8, 2021, at 8:29 AM, Martin Jambor  wrote:
>> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
>> index c05d22f3e8f1..35051d7c6b96 100644
>> --- a/gcc/tree-sra.c
>> +++ b/gcc/tree-sra.c
>> @@ -384,6 +384,13 @@ static struct
>> 
>>   /* Numbber of components created when splitting aggregate parameters.  */
>>   int param_reductions_created;
>> +
>> +  /* Number of deferred_init calls that are modified.  */
>> +  int deferred_init;
>> +
>> +  /* Number of deferred_init calls that are created by
>> + generate_subtree_deferred_init.  */
>> +  int subtree_deferred_init;
>> } sra_stats;
>> 
>> static void
>> @@ -4096,6 +4103,110 @@ get_repl_default_def_ssa_name (struct access *racc, 
>> tree reg_type)
>>   return get_or_create_ssa_default_def (cfun, racc->replacement_decl);
>> }
>> 
>> +
>> +/* Generate statements to call .DEFERRED_INIT to initialize scalar 
>> replacements
>> +   of accesses within a subtree ACCESS; all its children, siblings and their
>> +   children are to be processed.
>> +   GSI is a statement iterator used to place the new statements.  */
>> +static void
>> +generate_subtree_deferred_init (struct access *access,
>> +tree init_type,
>> +tree is_vla,
>> +gimple_stmt_iterator *gsi,
>> +location_t loc)
>> +{
>> +  do
>> +{
>> +  if (access->grp_to_be_replaced)
>> +{
>> +  tree repl = get_access_replacement (access);
>> +  gimple *call
>> += gimple_build_call_internal (IFN_DEFERRED_INIT, 3,
>> +  TYPE_SIZE_UNIT (TREE_TYPE (repl)),
>> +  init_type, is_vla);
>> +  gimple_call_set_lhs (call, repl);
>> +  gsi_insert_before (gsi, call, GSI_SAME_STMT);
>> +  update_stmt (call);
>> +  gimple_set_location (call, loc);
>> +  sra_stats.subtree_deferred_init++;
>> +}
>> +  else if (access->grp_to_be_debug_replaced)
>> +{
>> +  tree drepl = get_access_replacement (access);
>> +  tree call = build_call_expr_internal_loc
>> + (UNKNOWN_LOCATION, IFN_DEFERRED_INIT,
>> +  TREE_TYPE (drepl), 3,
>> +  TYPE_SIZE_UNIT (TREE_TYPE (drepl)),
>> +  init_type, is_vla);
>> +  gdebug *ds = gimple_build_debug_bind (drepl, call,
>> +gsi_stmt (*gsi));
>> +  gsi_insert_before (gsi, ds, GSI_SAME_STMT);
> 
> Is handling of grp_to_be_debug_replaced accesses necessary here?  If so,
> why?  grp_to_be_debug_replaced accesses are there only to facilitate
> debug information about a part of an aggregate decl that is likely
> going to be entirely removed - so that debuggers can sometimes show to
> users information about what they would contain had they not been removed.
> It seems strange you need to mark them as uninitialized because they
> should not have any consumers.  (But perhaps it is also harmless.)

This part has been discussed during the 2nd version of the patch, but I think 
that more discussion might be necessary.

In the previous discussion, Richard Sandiford mentioned: 
(https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568620.html):

=

I guess the thing we need to decide here is whether -ftrivial-auto-var-init
should affect debug-only constructs too.  If it doesn't, examining removed
components in a debugger might show uninitialised values in cases where
the user was expecting initialised ones.  There would be no security
concern, but it might be surprising.

I think in principle the DRHS can contain a call to DEFERRED_INIT.
Doing that would probably require further handling elsewhere though.

=

I am still not very confident now for this part of the change.

My questions:

1. If we don’t handle grp_to_be_debug_replaced at all, what will happen?  ( the 
user of the debugger will see uninitialized values in
the removed part of the aggregate?  Or something else?)
2. On the other hand, if we handle grp_to_be_debug_replaced as the current 
patch, what will the user of the debugger see?

> 
> On a related note, if the intent of the feature is for optimizers to
> behave (almost?) as if it was not taking place,

What do you mean by “it” here?

> I believe you need to
> handle specially, and probably just ignore, calls to IFN_DEFERRED_INIT
> in scan_function in tree-sra.c.


Do you mean to let tree-sra phase ignore IFN_DEFERRED_INIT calls completely?

My main purpose in changing the tree-sra.c phase is:

Change:

tmp = .DEFERRED_INIT (24, 2, 0)

To

tmp1 = .DEFERRED_INIT (8, 2, 0);
tmp2 = .DEFERRED_INIT (8, 2, 0);
tmp3 = .DEFERRED_INIT (8, 2, 0);

Doing this is to reduce the stack usage.


>  Otherwise the generated SRA access
> structures will have extra write flags tu

[committed] Further improvements to H8 variable shift patterns

2021-07-08 Thread Jeff Law via Gcc-patches
And another installment in optimizing a dead architecture.   This builds 
on prior patches to improve compare/test elimination for shifts.  
Specifically for the older chips in the H8 family we have to handle 
variable shifts with a loop.


Right now the splitter generates (set (pc) (if_then_else (lt (countreg) 
(const_int 0))) to test the shift count.  That will get lowered into a 
compare and a conditional branch using CC_REG. However, this lowering 
happens after the compare-elim pass, so we don't get much benefit.


Instead we can lower directly to the cc exposing form and remove the 
unnecessary test ourselves (particularly for the case where the shift 
count pseudo does not die).  Essentially we know that the copy into the 
scratch register is going to set the condition codes in a useful way.  
So we expose the condition codes on that copy and emit a condition code 
exposed conditional branch and no longer generate the comparison.


Built & tested in the usual way on the H8 without regressions.  
Committed to the trunk.


Probably the last H8 patch before going on vacation :-)

Jeff
commit b14ac7b29c9a05c94f62fe065c219bbaa83653db
Author: Jeff Law 
Date:   Thu Jul 8 17:09:36 2021 -0400

Further improvements to H8 variable shift patterns

gcc/

* config/h8300/shiftrotate.md (variable shifts): Expose condition
code handling for the test before the loop.

diff --git a/gcc/config/h8300/shiftrotate.md b/gcc/config/h8300/shiftrotate.md
index 485303cb906..d3aa6bea064 100644
--- a/gcc/config/h8300/shiftrotate.md
+++ b/gcc/config/h8300/shiftrotate.md
@@ -377,8 +377,10 @@
(clobber (reg:CC CC_REG))]
   "epilogue_completed
&& find_regno_note (insn, REG_DEAD, REGNO (operands[1]))"
-  [(set (pc)
-(if_then_else (le (match_dup 1) (const_int 0))
+  [(set (reg:CCZN CC_REG)
+(compare:CCZN (match_dup 1) (const_int 0)))
+   (set (pc)
+(if_then_else (le (reg:CCZN CC_REG)  (const_int 0))
  (label_ref (match_dup 5))
  (pc)))
(match_dup 4)
@@ -411,10 +413,12 @@
(clobber (reg:CC CC_REG))]
   "epilogue_completed
&& !find_regno_note (insn, REG_DEAD, REGNO (operands[1]))"
-  [(set (match_dup 3)
-   (match_dup 1))
+  [(parallel
+ [(set (reg:CCZN CC_REG)
+  (compare:CCZN (match_dup 1) (const_int 0)))
+  (set (match_dup 3) (match_dup 1))])
(set (pc)
-(if_then_else (le (match_dup 3) (const_int 0))
+(if_then_else (le (reg:CCZN CC_REG) (const_int 0))
  (label_ref (match_dup 5))
  (pc)))
(match_dup 4)


[PATCH] Fix for powerpc64 long double complex divide failure

2021-07-08 Thread Patrick McGehearty via Gcc-patches
This patch resolves the failure of powerpc64 long double complex divide
in native ibm long double format after the patch "Practical improvement
to libgcc complex divide".
See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101104

The new code uses the following macros which are intended to be mapped
to appropriate values according to the underlying hardware representation.

RBIG     a value near the maximum representation
RMIN     a value near the minimum representation
         (but not in the subnormal range)
RMIN2    a value moderately less than 1
RMINSCAL the inverse of RMIN2
RMAX2    RBIG * RMIN2 - a value to limit scaling to not overflow

When "long double" values were not using the IEEE 128-bit format but
the traditional IBM 128-bit, the previous code used the LDBL values
which caused overflow for RMINSCAL. The new code uses the DBL values.

RBIG  LDBL_MAX = 0x1.f800p+1022
  DBL_MAX  = 0x1.f000p+1022

RMIN  LDBL_MIN = 0x1.p-969
RMIN  DBL_MIN  = 0x1.p-1022

RMIN2 LDBL_EPSILON = 0x0.1000p-1022 = 0x1.0p-1074
RMIN2 DBL_EPSILON  = 0x1.p-52

RMINSCAL 1/LDBL_EPSILON = inf (1.0p+1074 does not fit in IBM 128-bit).
 1/DBL_EPSILON  = 0x1.p+52

RMAX2 = RBIG * RMIN2 = 0x1.f800p-52
RBIG * RMIN2 = 0x1.f000p+970

The MAX and MIN values have only modest changes since the exponent
field for IBM 128-bit floating point values is the same size as
the exponent field for IBM 64-bit floating point values. However
the EPSILON field is considerably different. Due to how small
values can be represented in the lower 64 bits of the IBM 128-bit
floating point, EPSILON is extremely small, so far beyond the
desired value that inversion of the value overflows and even
without the overflow, the RMAX2 is so small as to eliminate
most usage of the test.
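
(To make the overflow concrete, a quick check that can be run on a powerpc64
host, assuming the traditional IBM 128-bit long double rather than
-mabi=ieeelongdouble:

  #include <float.h>
  #include <stdio.h>

  int
  main (void)
  {
    /* For IBM double-double, LDBL_EPSILON is 0x1p-1074, so its inverse
       (2^1074) is not representable and overflows to infinity.  */
    printf ("1/LDBL_EPSILON = %Lg\n", 1.0L / LDBL_EPSILON);
    /* DBL_EPSILON is 2^-52, so its inverse is exactly 2^52.  */
    printf ("1/DBL_EPSILON  = %g\n", 1.0 / DBL_EPSILON);
    return 0;
  }

This mirrors why RMINSCAL has to be derived from the DF values.)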

In addition, the gcc support for the KF fields (IBM native long double
format) does not exist on older gcc compilers such as the default
compilers on the gcc compiler farm. That adds build complexity
for users whose environment is only a few years out of date.
Instead of just replacing the use of KF_EPSILON with DF_EPSILON,
we replace all uses of KF_* with DF_*.

The change has been tested on gcc135.fsffrance.org and gains the
expected improvements in accuracy for long double complex divide.

libgcc/
* config/rs6000/_divkc3.c (RBIG, RMIN, RMIN2, RMINSCAL, RMAX2):
Fix long double complex divide for native IBM 128-bit
---
 libgcc/config/rs6000/_divkc3.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/libgcc/config/rs6000/_divkc3.c b/libgcc/config/rs6000/_divkc3.c
index a1d29d2..2b229c8 100644
--- a/libgcc/config/rs6000/_divkc3.c
+++ b/libgcc/config/rs6000/_divkc3.c
@@ -38,10 +38,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
If not, see
 #endif
 
 #ifndef __LONG_DOUBLE_IEEE128__
-#define RBIG   (__LIBGCC_KF_MAX__ / 2)
-#define RMIN   (__LIBGCC_KF_MIN__)
-#define RMIN2  (__LIBGCC_KF_EPSILON__)
-#define RMINSCAL (1 / __LIBGCC_KF_EPSILON__)
+#define RBIG   (__LIBGCC_DF_MAX__ / 2)
+#define RMIN   (__LIBGCC_DF_MIN__)
+#define RMIN2  (__LIBGCC_DF_EPSILON__)
+#define RMINSCAL (1 / __LIBGCC_DF_EPSILON__)
 #define RMAX2  (RBIG * RMIN2)
 #else
 #define RBIG   (__LIBGCC_TF_MAX__ / 2)
-- 
1.8.3.1



Re: [PATCH v2] c++: Fix noexcept with unevaluated operand [PR101087]

2021-07-08 Thread Jason Merrill via Gcc-patches

On 7/8/21 4:26 PM, Marek Polacek wrote:

On Thu, Jul 08, 2021 at 09:35:02AM -0400, Marek Polacek wrote:

On Thu, Jul 08, 2021 at 09:30:27AM -0400, Jason Merrill wrote:

On 7/7/21 9:40 PM, Marek Polacek wrote:

It sounds plausible that this assert

int f();
static_assert(noexcept(sizeof(f())));

should pass: sizeof produces a std::size_t and its operand is not
evaluated, so it can't throw.  noexcept should only evaluate to
false for potentially evaluated operands.  Therefore I think that
check_noexcept_r shouldn't walk into operands of sizeof/decltype/
alignof/typeof.  Only checking cp_unevaluated_operand therein does
not work, because expr_noexcept_p can be called in an unevaluated
context, so I resorted to the following cp_evaluated hack.  Does
that seem acceptable?


I suppose, but why not check for SIZEOF_EXPR/ALIGNOF_EXPR/NOEXCEPT_EXPR
directly?


I thought I would, but then it occurred to me that it might be better to
rely on cp_walk_subtrees which ++/--s cp_unevaluated_operand for those
codes.  I'd be happy to change the patch to check those codes directly;
maybe I'm overthinking things here.


So here's v2 which checks the codes directly, via a new inline:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK for trunk and 11, at least.  I lean toward putting it on older 
release branches as well, but it doesn't seem urgent.



-- >8 --
It sounds plausible that this assert

   int f();
   static_assert(noexcept(sizeof(f())));

should pass: sizeof produces a std::size_t and its operand is not
evaluated, so it can't throw.  noexcept should only evaluate to
false for potentially evaluated operands.  Therefore I think that
check_noexcept_r shouldn't walk into operands of sizeof/decltype/
alignof/typeof.

PR c++/101087

gcc/cp/ChangeLog:

* cp-tree.h (unevaluated_p): New.
* except.c (check_noexcept_r): Use it.  Don't walk into
unevaluated operands.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept70.C: New test.
---
  gcc/cp/cp-tree.h| 13 +
  gcc/cp/except.c |  9 ++---
  gcc/testsuite/g++.dg/cpp0x/noexcept70.C |  5 +
  3 files changed, 24 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept70.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index b4501576b26..d4810c0c986 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -8465,6 +8465,19 @@ is_constrained_auto (const_tree t)
return is_auto (t) && PLACEHOLDER_TYPE_CONSTRAINTS_INFO (t);
  }
  
+/* True if CODE, a tree code, denotes a tree whose operand is not evaluated

+   as per [expr.context], i.e., an operand to sizeof, typeof, decltype, or
+   alignof.  */
+
+inline bool
+unevaluated_p (tree_code code)
+{
+  return (code == DECLTYPE_TYPE
+ || code == ALIGNOF_EXPR
+ || code == SIZEOF_EXPR
+ || code == NOEXCEPT_EXPR);
+}
+
  /* RAII class to push/pop the access scope for T.  */
  
  struct push_access_scope_guard

diff --git a/gcc/cp/except.c b/gcc/cp/except.c
index a8cea53cf91..a8acbc4b7b2 100644
--- a/gcc/cp/except.c
+++ b/gcc/cp/except.c
@@ -1033,12 +1033,15 @@ check_handlers (tree handlers)
   expression whose type is a polymorphic class type (10.3).  */
  
  static tree

-check_noexcept_r (tree *tp, int * /*walk_subtrees*/, void * /*data*/)
+check_noexcept_r (tree *tp, int *walk_subtrees, void *)
  {
tree t = *tp;
enum tree_code code = TREE_CODE (t);
-  if ((code == CALL_EXPR && CALL_EXPR_FN (t))
-  || code == AGGR_INIT_EXPR)
+
+  if (unevaluated_p (code))
+*walk_subtrees = false;
+  else if ((code == CALL_EXPR && CALL_EXPR_FN (t))
+  || code == AGGR_INIT_EXPR)
  {
/* We can only use the exception specification of the called function
 for determining the value of a noexcept expression; we can't use
diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept70.C 
b/gcc/testsuite/g++.dg/cpp0x/noexcept70.C
new file mode 100644
index 000..45a6137dd6f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/noexcept70.C
@@ -0,0 +1,5 @@
+// PR c++/101087
+// { dg-do compile { target c++11 } }
+
+int f();
+static_assert(noexcept(sizeof(f())), "");

base-commit: 763121ccd908f52bc666f277ea2cf42110b3aad9





Re: [PATCH v2] c++: Fix noexcept with unevaluated operand [PR101087]

2021-07-08 Thread Marek Polacek via Gcc-patches
On Thu, Jul 08, 2021 at 05:34:24PM -0400, Jason Merrill wrote:
> On 7/8/21 4:26 PM, Marek Polacek wrote:
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> 
> OK for trunk and 11, at least.  I lean toward putting it on older release
> branches as well, but it doesn't seem urgent.

Ok, I'll backport to 11 and 10, it seems very safe.  Thanks,

Marek



Re: [PATCH] c++: requires-expr with dependent extra args [PR101181]

2021-07-08 Thread Jason Merrill via Gcc-patches

On 7/8/21 11:28 AM, Patrick Palka wrote:

Here we're crashing ultimately because the mechanism for delaying
substitution into a requires-expression (or constexpr if) doesn't
expect to see dependent args.  But we end up capturing dependent
args here when substituting into the default template argument during
coerce_template_parms for the dependent specialization p.

This patch enables the commented out code in add_extra_args for
handling this situation.  It turns out we also need to make a copy of
the captured arguments so that coerce_template_parms doesn't later
add to the argument, which would form an unexpected cycle.  And we
need to make tsubst_template_args more forgiving about missing template
arguments, since the arguments we capture from coerce_template_parms are
incomplete.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/11?

PR c++/101181

gcc/cp/ChangeLog:

* constraint.cc (tsubst_requires_expr): Pass complain/in_decl to
add_extra_args.
* cp-tree.h (add_extra_args): Add complain/in_decl parameters.
* pt.c (build_extra_args): Make a copy of args.
(add_extra_args): Add complain/in_decl parameters.  Handle the
case where the extra arguments are dependent.
(tsubst_pack_expansion): Pass complain/in_decl to
add_extra_args.
(tsubst_template_args): Handle missing template arguments.
(tsubst_expr) : Pass complain/in_decl to
add_extra_args.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-requires26.C: New test.
* g++.dg/cpp2a/lambda-uneval16.C: New test.
---
  gcc/cp/constraint.cc  |  3 +-
  gcc/cp/cp-tree.h  |  2 +-
  gcc/cp/pt.c   | 31 +--
  .../g++.dg/cpp2a/concepts-requires26.C| 18 +++
  gcc/testsuite/g++.dg/cpp2a/lambda-uneval16.C  | 22 +
  5 files changed, 58 insertions(+), 18 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-requires26.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-uneval16.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 99d3ccc6998..4ee5215df50 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -2266,7 +2266,8 @@ tsubst_requires_expr (tree t, tree args, sat_info info)
/* A requires-expression is an unevaluated context.  */
cp_unevaluated u;
  
-  args = add_extra_args (REQUIRES_EXPR_EXTRA_ARGS (t), args);

+  args = add_extra_args (REQUIRES_EXPR_EXTRA_ARGS (t), args,
+info.complain, info.in_decl);
if (processing_template_decl)
  {
/* We're partially instantiating a generic lambda.  Substituting into
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 58da7460001..0a5f13489cc 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7289,7 +7289,7 @@ extern void add_mergeable_specialization(bool 
is_decl, bool is_alias,
 tree outer, unsigned);
  extern tree add_to_template_args  (tree, tree);
  extern tree add_outermost_template_args   (tree, tree);
-extern tree add_extra_args (tree, tree);
+extern tree add_extra_args (tree, tree, tsubst_flags_t, 
tree);
  extern tree build_extra_args  (tree, tree, tsubst_flags_t);
  
  /* in rtti.c */

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 06116d16887..e4bdac087ad 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -12928,7 +12928,9 @@ extract_local_specs (tree pattern, tsubst_flags_t 
complain)
  tree
  build_extra_args (tree pattern, tree args, tsubst_flags_t complain)
  {
-  tree extra = args;
+  /* Make a copy of the extra arguments so that they won't get changed
+ from under us.  */
+  tree extra = copy_template_args (args);
if (local_specializations)
  if (tree locals = extract_local_specs (pattern, complain))
extra = tree_cons (NULL_TREE, extra, locals);
@@ -12939,7 +12941,7 @@ build_extra_args (tree pattern, tree args, 
tsubst_flags_t complain)
 normal template args to ARGS.  */
  
  tree

-add_extra_args (tree extra, tree args)
+add_extra_args (tree extra, tree args, tsubst_flags_t complain, tree in_decl)
  {
if (extra && TREE_CODE (extra) == TREE_LIST)
  {
@@ -12959,20 +12961,14 @@ add_extra_args (tree extra, tree args)
gcc_assert (!TREE_PURPOSE (extra));
extra = TREE_VALUE (extra);
  }
-#if 1
-  /* I think we should always be able to substitute dependent args into the
- pattern.  If that turns out to be incorrect in some cases, enable the
- alternate code (and add complain/in_decl parms to this function).  */


Ah, because these cases aren't pack expansions, so we aren't trying to 
do the substitution; I wonder if it would be feasible to do so.  But 
this approach is probably simpler.  OK.



-  gcc_checking_assert (!uses_template_parms (extra));
-#else
-  if (!use

rs6000: Generate an lxvp instead of two adjacent lxv instructions

2021-07-08 Thread Peter Bergner via Gcc-patches
The MMA build built-ins currently use individual lxv instructions to
load up the registers of a __vector_pair or __vector_quad.  If the
memory addresses of the built-in operands refer to adjacent locations,
then we could use an lxvp in some cases to load up two registers at once.
The patch below adds support for checking whether memory addresses are
adjacent and emitting an lxvp instead of two lxv instructions.
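
For illustration only (assuming the GCC 11 names of the MMA/vector-pair
built-ins), a case that can now be loaded with a single lxvp is a
__vector_pair built from two adjacent 16-byte locations:

  #include <altivec.h>

  void
  build (__vector_pair *dst, vector unsigned char *src)
  {
    /* src[0] and src[1] are adjacent in memory.  */
    __builtin_vsx_build_pair (dst, src[0], src[1]);
  }

compiled with -mcpu=power10; the new test presumably exercises cases of this
shape.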

This passed bootstrap and regtesting on powerpc64le-linux with no regressions.
Ok for trunk?

This seems simple enough that I'd like to backport it to GCC 11
after some burn in on trunk, if that is ok?

Given the MMA redesign from GCC 10 to GCC 11, I have no plans to
backport this to GCC 10.

Peter


gcc/
* config/rs6000/rs6000.c (consecutive_mem_locations): New function.
(rs6000_split_multireg_move): Handle MMA build built-ins with operands
in consecutive memory locations.
(adjacent_mem_locations): Return the lower addressed memory rtx, if any.
(power6_sched_reorder2): Update for adjacent_mem_locations change.

gcc/testsuite/
* gcc.target/powerpc/mma-builtin-9.c: New test.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 9a5db63d0ef..de36c5ecd91 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -293,6 +293,8 @@ bool cpu_builtin_p = false;
don't link in rs6000-c.c, so we can't call it directly.  */
 void (*rs6000_target_modify_macros_ptr) (bool, HOST_WIDE_INT, HOST_WIDE_INT);
 
+static bool consecutive_mem_locations (rtx, rtx);
+
 /* Simplfy register classes into simpler classifications.  We assume
GPR_REG_TYPE - FPR_REG_TYPE are ordered so that we can use a simple range
check for standard register classes (gpr/floating/altivec/vsx) and
@@ -16841,8 +16843,35 @@ rs6000_split_multireg_move (rtx dst, rtx src)
  for (int i = 0; i < nvecs; i++)
{
  int index = WORDS_BIG_ENDIAN ? i : nvecs - 1 - i;
- rtx dst_i = gen_rtx_REG (reg_mode, reg + index);
- emit_insn (gen_rtx_SET (dst_i, XVECEXP (src, 0, i)));
+ int index_next = WORDS_BIG_ENDIAN ? index + 1 : index - 1;
+ rtx dst_i;
+ int regno = reg + i;
+
+ /* If we are loading an even VSX register and our memory location
+is adjacent to the next register's memory location (if any),
+then we can load them both with one LXVP instruction.  */
+ if ((regno & 1) == 0
+ && VSX_REGNO_P (regno)
+ && MEM_P (XVECEXP (src, 0, index))
+ && MEM_P (XVECEXP (src, 0, index_next)))
+   {
+ rtx base = WORDS_BIG_ENDIAN ? XVECEXP (src, 0, index)
+ : XVECEXP (src, 0, index_next);
+ rtx next = WORDS_BIG_ENDIAN ? XVECEXP (src, 0, index_next)
+ : XVECEXP (src, 0, index);
+
+ if (consecutive_mem_locations (base, next))
+   {
+ dst_i = gen_rtx_REG (OOmode, regno);
+ emit_move_insn (dst_i, adjust_address (base, OOmode, 0));
+ /* Skip the next register, since we just loaded it.  */
+ i++;
+ continue;
+   }
+   }
+
+ dst_i = gen_rtx_REG (reg_mode, reg + i);
+ emit_insn (gen_rtx_SET (dst_i, XVECEXP (src, 0, index)));
}
 
  /* We are writing an accumulator register, so we have to
@@ -18427,23 +18456,37 @@ get_memref_parts (rtx mem, rtx *base, HOST_WIDE_INT 
*offset,
   return true;
 }
 
-/* The function returns true if the target storage location of
-   mem1 is adjacent to the target storage location of mem2 */
-/* Return 1 if memory locations are adjacent.  */
+/* If the target storage locations of arguments MEM1 and MEM2 are
+   adjacent, then return the argument that has the lower address.
+   Otherwise, return NULL_RTX.  */
 
-static bool
+static rtx
 adjacent_mem_locations (rtx mem1, rtx mem2)
 {
   rtx reg1, reg2;
   HOST_WIDE_INT off1, size1, off2, size2;
 
   if (get_memref_parts (mem1, &reg1, &off1, &size1)
-  && get_memref_parts (mem2, &reg2, &off2, &size2))
-return ((REGNO (reg1) == REGNO (reg2))
-   && ((off1 + size1 == off2)
-   || (off2 + size2 == off1)));
+  && get_memref_parts (mem2, &reg2, &off2, &size2)
+  && REGNO (reg1) == REGNO (reg2))
+{
+  if (off1 + size1 == off2)
+   return mem1;
+  else if (off2 + size2 == off1)
+   return mem2;
+}
 
-  return false;
+  return NULL_RTX;
+}
+
+/* The function returns true if the target storage location of
+   MEM1 is adjacent to the target storage location of MEM2 and
+   MEM1 has a lower address then MEM2.  */
+
+static bool
+consecutive_mem_locations (rtx mem1, rtx mem2)
+{
+  return adjacent_mem_locations (mem1, mem2) == mem1;
 }
 
 /* This fun

[committed] avoid including <new> to ease cross-compiler testing

2021-07-08 Thread Martin Sebor via Gcc-patches

I have committed the attached change to ease testing with bare
bones cross-compilers with no libstdc++ headers.

Tested on x86_64 and with a powerpc64 cross-compiler.

Martin
commit c68cac900ab4ccaf6b1a31168bc9a302ebc46428
Author: Martin Sebor 
Date:   Thu Jul 8 16:02:01 2021 -0600

Avoid including <new> to make cross-compiler testing easy.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Warray-bounds-11.C: Avoid including <new>.
* g++.dg/warn/Warray-bounds-13.C: Same.

diff --git a/gcc/testsuite/g++.dg/warn/Warray-bounds-11.C b/gcc/testsuite/g++.dg/warn/Warray-bounds-11.C
index 70b39122f78..9670898770f 100644
--- a/gcc/testsuite/g++.dg/warn/Warray-bounds-11.C
+++ b/gcc/testsuite/g++.dg/warn/Warray-bounds-11.C
@@ -4,7 +4,24 @@
{ dg-do compile }
{ dg-options "-O2 -Wall -Warray-bounds -ftrack-macro-expansion=0" } */
 
-#include <new>
+#if 0
+// Avoid including <new> to make cross-compiler testing easy.
+// #include <new>
+#else
+namespace std {
+
+typedef __SIZE_TYPE__ size_t;
+struct nothrow_t { };
+extern const nothrow_t nothrow;
+
+}
+
+void* operator new (std::size_t, const std::nothrow_t &) throw ()
+  __attribute__  ((__alloc_size__ (1), __malloc__));
+void* operator new[] (std::size_t, const std::nothrow_t &) throw ()
+__attribute__  ((__alloc_size__ (1), __malloc__));
+
+#endif
 
 typedef __INT32_TYPE__ int32_t;
 
diff --git a/gcc/testsuite/g++.dg/warn/Warray-bounds-13.C b/gcc/testsuite/g++.dg/warn/Warray-bounds-13.C
index 2d3e9dcfd68..449324a315d 100644
--- a/gcc/testsuite/g++.dg/warn/Warray-bounds-13.C
+++ b/gcc/testsuite/g++.dg/warn/Warray-bounds-13.C
@@ -4,7 +4,24 @@
{ dg-do compile }
{ dg-options "-O2 -Wall -Warray-bounds -ftrack-macro-expansion=0" } */
 
-#include <new>
+#if 0
+// Avoid including <new> to make cross-compiler testing easy.
+// #include <new>
+#else
+namespace std {
+
+typedef __SIZE_TYPE__ size_t;
+struct nothrow_t { };
+extern const nothrow_t nothrow;
+
+}
+
+void* operator new (std::size_t, const std::nothrow_t &) throw ()
+  __attribute__  ((__alloc_size__ (1), __malloc__));
+void* operator new[] (std::size_t, const std::nothrow_t &) throw ()
+__attribute__  ((__alloc_size__ (1), __malloc__));
+
+#endif
 
 typedef __INT32_TYPE__ int32_t;
 


[committed] adjust expected test output to LP32 (PR100451)

2021-07-08 Thread Martin Sebor via Gcc-patches

I have committed the attached change adjusting the expected test
output to account for the difference between LP64 and ILP32.

Tested in both modes on x86_64 and with a powerpc64 cross-compiler.

Martin
Adjust expected output for LP32 [PR100451].

gcc/testsuite/ChangeLog:

	PR testsuite/100451
	* g++.dg/warn/Warray-bounds-20.C: Adjust expected output for LP32.

diff --git a/gcc/testsuite/g++.dg/warn/Warray-bounds-20.C b/gcc/testsuite/g++.dg/warn/Warray-bounds-20.C
index a65b29e6269..f4876d8a269 100644
--- a/gcc/testsuite/g++.dg/warn/Warray-bounds-20.C
+++ b/gcc/testsuite/g++.dg/warn/Warray-bounds-20.C
@@ -27,7 +27,7 @@ struct D1: virtual B, virtual C
  to the opening brace.  */
   D1 ()
   {   // { dg-warning "\\\[-Warray-bounds" "brace" }
-ci = 0;   // { dg-warning "\\\[-Warray-bounds" "assign" { xfail *-*-* } }
+ci = 0;   // { dg-warning "\\\[-Warray-bounds" "assign" { xfail lp64 } }
   }
 };
 
@@ -35,7 +35,8 @@ void sink (void*);
 
 void warn_derived_ctor_access_new_decl ()
 {
-  char a[sizeof (D1)];// { dg-message "at offset 1 into object 'a' of size 40" "note" }
+  char a[sizeof (D1)];// { dg-message "at offset 1 into object 'a' of size 40" "LP64 note" { target lp64} }
+  // { dg-message "at offset 1 into object 'a' of size 20" "LP64 note" { target ilp32} .-1 }
   char *p = a;
   ++p;
   D1 *q = new (p) D1;
@@ -52,7 +53,8 @@ void warn_derived_ctor_access_new_alloc ()
 
 void warn_derived_ctor_access_new_array_decl ()
 {
-  char b[sizeof (D1) * 2];// { dg-message "at offset \\d+ into object 'b' of size 80" "note" }
+  char b[sizeof (D1) * 2];// { dg-message "at offset \\d+ into object 'b' of size 80" "LP64 note" { target lp64 } }
+  // { dg-message "at offset \\d+ into object 'b' of size 40" "LP64 note" { target ilp32 } .-1 }
   char *p = b;
   ++p;
   D1 *q = new (p) D1[2];


[PATCH] [wwwdocs] Update description of GM2 and document branch

2021-07-08 Thread Gaius Mulley via Gcc-patches


Hello Gerald,

Here are two proposed patches to wwwdocs:

htdocs/frontends.html: Update the description of GNU Modula-2.
htdocs/git.html: Document the new devel/modula-2 branch.

regards,
Gaius

=

diff --git a/htdocs/frontends.html b/htdocs/frontends.html
index bec33b7b..60f08aa4 100644
--- a/htdocs/frontends.html
+++ b/htdocs/frontends.html
@@ -42,10 +42,10 @@ has a back end that generates assembler directly, using the 
GCC back end.
 
 http://www.nongnu.org/gm2/";>GNU Modula-2 implements
 the PIM2, PIM3, PIM4 and ISO dialects of the language.  The compiler
-is fully operational with the GCC 4.1.2 back end (on GNU/Linux x86
-systems).  Work is in progress to move the front end to the GCC trunk.
-The front end is mostly written in Modula-2, but includes a bootstrap
-procedure via a heavily modified version of p2c.
+is fully operational with the GCC 10 and GCC 11 back ends (on
+GNU/Linux x86 systems).  Work is in progress to move the front end to
+the GCC trunk.  The front end is mostly written in Modula-2, but
+includes a bootstrap procedure using mc.
 
 Modula-3 (for links see http://www.modula3.org/";>www.modula3.org); SRC M3 is based on an old
diff --git a/htdocs/git.html b/htdocs/git.html
index 2bbfc334..4fea5224 100644
--- a/htdocs/git.html
+++ b/htdocs/git.html
@@ -471,6 +471,17 @@ in Git.
   Further information can be found on the
   https://github.com/Intrepid/GUPC";>GNU UPC page.
 
+  modula-2
+  This branch is for the
+http://nongnu.org/gm2/homepage.html";>GNU Modula-2
+front end to gcc prior to its integration with the mainline.  The
+branch will be regularly rebased against the mainline.  It is
+maintained by
+mailto:gaius.mul...@southwales.ac.uk";>Gaius Mulley.
+Patches should be
+prefixed with [modula-2] in the subject line.
+  
+
   pph
   This branch implements https://gcc.gnu.org/wiki/pph";> Pre-Parsed
   Headers for C++.  It is maintained by 

[RFC,PATCH] Allow means for targets to opt out of CTF/BTF support

2021-07-08 Thread Indu Bhagat via Gcc-patches
Hello,

It was brought up when discussing PR debug/101283 (Several tests fail on
Darwin with -gctf/-gbtf) that it would be good to provide a means for targets
to opt out of CTF/BTF support.

By and large, it seems to me that CTF/BTF debug formats can be safely enabled
for all ELF-based targets by default in GCC.

So, at a high level:
  - By default, CTF/BTF formats can be enabled for all ELF-based targets.
  - By default, CTF/BTF formats can be disabled for all non-ELF-based targets.
  - If the user passes -gctf but CTF is not enabled for the target, GCC
  issues an error to the user (as is done currently with other debug formats):
  "target system does not support the 'ctf' debug format".

This is a makeshift patch which fulfills the above requirements and is based on
the approach taken for DWARF via DWARF2_DEBUGGING_INFO (I still have to see if
I need some specific handling in common_handle_option in opts.c). On minimal
testing, the patch works as desired on x86_64-pc-linux-gnu and a darwin-based
target.

My question is: looking around in config.gcc etc., it seems that defining
these macros in elfos.h gives targets/platforms a means to override them, by
virtue of the recommended order of #includes in $tm_file. What I cannot say
for certain is whether this holds in practice. On first look, I believe this
could work fine. What do you think?
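
If that include ordering does hold, a target that lists elfos.h in $tm_file
but cannot support CTF/BTF could opt back out in a later target header; a
minimal sketch under that assumption (not something the patch proposes):

/* Hypothetical override in a target-specific header included after
   elfos.h in $tm_file: drop the defaults so that -gctf/-gbtf are
   rejected with "target system does not support the ... debug format".  */
#undef CTF_DEBUGGING_INFO
#undef BTF_DEBUGGING_INFO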

If you think this approach could work, I will continue on this track and
test/refine the patch.

Thanks
Indu

-

gcc/ChangeLog:

* config/elfos.h (CTF_DEBUGGING_INFO): New definition.
(BTF_DEBUGGING_INFO): Likewise.
* toplev.c: Guard initialization of debug hooks.

gcc/testsuite/ChangeLog:

* gcc.dg/debug/btf/btf.exp: Do not run BTF testsuite if target does not
support BTF format.
* gcc.dg/debug/ctf/ctf.exp: Do not run CTF testsuite if target does not
support CTF format.
---
 gcc/config/elfos.h |  8 
 gcc/testsuite/gcc.dg/debug/btf/btf.exp | 11 +--
 gcc/testsuite/gcc.dg/debug/ctf/ctf.exp | 11 +--
 gcc/toplev.c   | 11 +--
 4 files changed, 35 insertions(+), 6 deletions(-)

diff --git a/gcc/config/elfos.h b/gcc/config/elfos.h
index 7a736cc..e5cb487 100644
--- a/gcc/config/elfos.h
+++ b/gcc/config/elfos.h
@@ -68,6 +68,14 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 
 #define DWARF2_DEBUGGING_INFO 1
 
+/* All ELF targets can support CTF.  */
+
+#define CTF_DEBUGGING_INFO 1
+
+/* All ELF targets can support BTF.  */
+
+#define BTF_DEBUGGING_INFO 1
+
 /* The GNU tools operate better with dwarf2, and it is required by some
psABI's.  Since we don't have any native tools to be compatible with,
default to dwarf2.  */
diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf.exp b/gcc/testsuite/gcc.dg/debug/btf/btf.exp
index e173515..a3e680c 100644
--- a/gcc/testsuite/gcc.dg/debug/btf/btf.exp
+++ b/gcc/testsuite/gcc.dg/debug/btf/btf.exp
@@ -39,8 +39,15 @@ if ![info exists DEFAULT_CFLAGS] then {
 dg-init
 
 # Main loop.
-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cS\] ]] \
-   "" $DEFAULT_CFLAGS
+set comp_output [gcc_target_compile \
+"$srcdir/$subdir/../trivial.c" "trivial.S" assembly \
+"additional_flags=-gbtf"]
+if { ! [string match "*: target system does not support the * debug format*" \
+$comp_output] } {
+remove-build-file "trivial.S"
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cS\] ]] \
+   "" $DEFAULT_CFLAGS
+}
 
 # All done.
 dg-finish
diff --git a/gcc/testsuite/gcc.dg/debug/ctf/ctf.exp b/gcc/testsuite/gcc.dg/debug/ctf/ctf.exp
index 0b650ed..c53cd8b 100644
--- a/gcc/testsuite/gcc.dg/debug/ctf/ctf.exp
+++ b/gcc/testsuite/gcc.dg/debug/ctf/ctf.exp
@@ -39,8 +39,15 @@ if ![info exists DEFAULT_CFLAGS] then {
 dg-init
 
 # Main loop.
-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cS\] ]] \
-   "" $DEFAULT_CFLAGS
+set comp_output [gcc_target_compile \
+"$srcdir/$subdir/../trivial.c" "trivial.S" assembly \
+"additional_flags=-gctf"]
+if { ! [string match "*: target system does not support the * debug format*" \
+$comp_output] } {
+remove-build-file "trivial.S"
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cS\] ]] \
+   "" $DEFAULT_CFLAGS
+}
 
 # All done.
 dg-finish
diff --git a/gcc/toplev.c b/gcc/toplev.c
index 43f1f7d..8103812 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -1463,8 +1463,15 @@ process_options (void)
 debug_hooks = &xcoff_debug_hooks;
 #endif
 #ifdef DWARF2_DEBUGGING_INFO
-  else if (dwarf_debuginfo_p ()
-  || dwarf_based_debuginfo_p ())
+  else if (dwarf_debuginfo_p ())
+debug_hooks = &dwarf2_debug_hooks;
+#endif
+#ifdef CTF_DEBUGGING_INFO
+  else if (write_symbols & CTF_DEBUG)
+debug_hooks = &dwarf2_debug_hooks;
+#endif
+#ifdef BTF_DEBUGGING_INFO
+  else if (btf_debuginfo_p ())
 debug_hooks = &dwarf2_debug_hooks;
 #endif
 #ifdef VMS_DEBUGGING_INFO
-- 
1.8.3.1



[committed] remove an xfail

2021-07-08 Thread Martin Sebor via Gcc-patches

The test xfailed for ILP32 has apparently been passing for some time.
I've removed the xfail after confirming it with -m32 on x86_64 and
powerpc64.

Martin
commit 68b938fada4c728c0b850b44125d9a173c01c8fb
Author: Martin Sebor 
Date:   Thu Jul 8 16:22:25 2021 -0600

testsuite: Remove an xfail.

gcc/testsuite/ChangeLog:

* gcc.dg/Wstringop-overflow-43.c: Remove an xfail.

diff --git a/gcc/testsuite/gcc.dg/Wstringop-overflow-43.c b/gcc/testsuite/gcc.dg/Wstringop-overflow-43.c
index 14ab925afdc..6d045c58bf6 100644
--- a/gcc/testsuite/gcc.dg/Wstringop-overflow-43.c
+++ b/gcc/testsuite/gcc.dg/Wstringop-overflow-43.c
@@ -167,9 +167,7 @@ void warn_memset_reversed_range (void)
   /* The following are represented as ordinary ranges with reversed bounds
  and those are handled. */
   T1 (p, SAR (INT_MIN,  11), n11);  // { dg-warning "writing 11 or more bytes into a region of size 0" }
-  /* In ILP32 the offset in the following has no range info associated
- with it.  */
-  T1 (p, SAR (INT_MIN,   1), n11);  // { dg-warning "writing 11 or more bytes into a region of size 0" "pr?" { xfail ilp32 } }
+  T1 (p, SAR (INT_MIN,   1), n11);  // { dg-warning "writing 11 or more bytes into a region of size 0" }
   T1 (p, SAR (INT_MIN,   0), n11);  // { dg-warning "writing 11 or more bytes into a region of size 0" }
   /* Also represented as a true anti-range.  */
   T1 (p, SAR (-12, -11), n11);  // { dg-warning "writing 11 or more bytes into a region of size \\d+" }


Re: [PATCH] [wwwdocs] Update description of GM2 and document branch

2021-07-08 Thread Gerald Pfeifer
Hi Gaius,

On Thu, 8 Jul 2021, Gaius Mulley wrote:
> Here are two proposed patches to wwwdocs:

thank you for thinking of updating the web pages, too!

> diff --git a/htdocs/frontends.html b/htdocs/frontends.html
:
>  http://www.nongnu.org/gm2/";>GNU Modula-2 implements
>  the PIM2, PIM3, PIM4 and ISO dialects of the language.  The compiler
> +is fully operational with the GCC 10 and GCC 11 back ends (on
> +GNU/Linux x86 systems).

I realize this predates your patch (which merely changes version numbers),
but a reference to back ends could be misunderstood. I assume GNU Modula-2
doesn't just use the back ends (x86, aarch64,...), but also the middle-end
and tree optimizers etc.?

What do you think about just saying "with GCC 10 and GCC 11".

>  Work is in progress to move the front end to
> +the GCC trunk.  The front end is mostly written in Modula-2, but
> +includes a bootstrap procedure using mc.

On my system mc refers to Midnight Commander :-), whereas I guess mc
here is about "Modula Compiler"?  Can you rephrase this for the sake
of those not so closely involved?


> --- a/htdocs/git.html
> +++ b/htdocs/git.html
> +  This branch is for the
> +http://nongnu.org/gm2/homepage.html";>GNU Modula-2
> +front end to gcc prior to its integration with the mainline.  The

GCC (all uppercase)

> +branch will be regularly rebased against the mainline.  It is
> +maintained by
> +mailto:gaius.mul...@southwales.ac.uk";>Gaius Mulley.
> +Patches should be
> +prefixed with [modula-2] in the subject line.

Usually I'd just say "subject", which is a header in our mail systems;
the term "subject line" isn't widely used.


Thanks (and okay considering the above),
Gerald


[committed] move warning suppression closer to invalid access (PR101372)

2021-07-08 Thread Martin Sebor via Gcc-patches

To unblock this morning's bootstrap, which was failing due to stricter
array bounds checking, I suppressed two -Warray-bounds instances
in cp/module.cc without analyzing them, tracking the to-do in
pr101372.  Now that I understand what's going on -- the warning
is behaving as designed, flagging accesses to one member via
a pointer derived from another -- I believe the suppression is
still appropriate but can be moved to the inline function that
does the access.  Thanks to the recent improvements to warning
suppression (r12-1992 and related) this more targeted fix should
work reliably while also avoiding a recurrence of the warning in
future uses of the function.  I have committed the attached patch
to make this change after testing it on x86_64-linux.
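
For reference, the targeted form keeps the diagnostic pragmas inside the one
inline helper that performs the access, so the suppression travels with it to
every present and future caller; a generic sketch of the pattern, using a
hypothetical struct and helper rather than the module.cc code itself:

/* A pointer derived from one member is offset to reach a different
   member, which -Warray-bounds treats as an out-of-bounds access;
   scoping the suppression to this helper keeps it with the access.  */
struct two_ints { int first; int second; };

static inline int
second_via_first (struct two_ints *p)
{
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Warray-bounds"
  return *(&p->first + 1);   /* reads p->second via &p->first */
#pragma GCC diagnostic pop
}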

Martin
commit 79d3378c7d73814442eb468c562ab8aa572f9c43
Author: Martin Sebor 
Date:   Thu Jul 8 16:36:15 2021 -0600

Move warning suppression to the ultimate callee.

Resolves:
PR bootstrap/101372 - -Warray-bounds in gcc/cp/module.cc causing bootstrap failure

gcc/cp/ChangeLog:

PR bootstrap/101372
* module.cc (identifier): Suppress warning.
(module_state::read_macro_maps): Remove warning suppression.
(module_state::install_macros): Ditto.

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 8a890c167cf..ccbde292c22 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -274,7 +274,14 @@ static inline cpp_hashnode *cpp_node (tree id)
 
 static inline tree identifier (const cpp_hashnode *node)
 {
+  /* HT_NODE() expands to node->ident that HT_IDENT_TO_GCC_IDENT()
+ then subtracts a nonzero constant, deriving a pointer to
+ a different member than ident.  That's strictly undefined
+ and detected by -Warray-bounds.  Suppress it.  See PR 101372.  */
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Warray-bounds"
   return HT_IDENT_TO_GCC_IDENT (HT_NODE (const_cast (node)));
+#pragma GCC diagnostic pop
 }
 
 /* Id for dumping module information.  */
@@ -16301,18 +16308,11 @@ module_state::read_macro_maps ()
 	}
   if (count)
 	sec.set_overrun ();
-
-  /* FIXME: Re-enable or fix after root causing.  */
-#pragma GCC diagnostic push
-#pragma GCC diagnostic ignored "-Warray-bounds"
-
   dump (dumper::LOCATION)
 	&& dump ("Macro:%u %I %u/%u*2 locations [%u,%u)",
 		 ix, identifier (node), runs, n_tokens,
 		 MAP_START_LOCATION (macro),
 		 MAP_START_LOCATION (macro) + n_tokens);
-
-#pragma GCC diagnostic pop
 }
   location_t lwm = sec.u ();
   macro_locs.first = lwm - slurp->loc_deltas.second;
@@ -16918,10 +16918,6 @@ module_state::install_macros ()
   macro_import::slot &slot = imp.append (mod, flags);
   slot.offset = sec.u ();
 
-  /* FIXME: Re-enable or fix after root causing.  */
-#pragma GCC diagnostic push
-#pragma GCC diagnostic ignored "-Warray-bounds"
-
   dump (dumper::MACRO)
 	&& dump ("Read %s macro %s%s%s %I at %u",
 		 imp.length () > 1 ? "add" : "new",
@@ -16942,8 +16938,6 @@ module_state::install_macros ()
 	exp.def = cur;
 	dump (dumper::MACRO)
 	  && dump ("Saving current #define %I", identifier (node));
-
-#pragma GCC diagnostic pop
 	  }
 }
 


Re: PING: [PATCH] mips: check MSA support for vector modes [PR100760,PR100761,PR100762]

2021-07-08 Thread Jeff Law via Gcc-patches




On 7/5/2021 8:04 PM, Paul Hua wrote:

> Looks good to me, but I have no right to approve.

But your opinions are well respected :-)

I'll go ahead and ACK, though in general I'm stepping away from 
reviewing target specific work.


jeff



Re: rs6000: Generate an lxvp instead of two adjacent lxv instructions

2021-07-08 Thread Segher Boessenkool
Hi!

On Thu, Jul 08, 2021 at 05:01:05PM -0500, Peter Bergner wrote:
> The MMA build built-ins currently use individual lxv instructions to
> load up the registers of a __vector_pair or __vector_quad.  If the
> memory addresses of the built-in operands refer to adjacent locations,
> then we could use an lxvp in some cases to load up two registers at once.
> The patch below adds support for checking whether memory addresses are
> adjacent and emitting an lxvp instead of two lxv instructions.
> 
> This passed bootstrap and regtesting on powerpc64le-linux with no regressions.
> Ok for trunk?

It needs testing on BE.

> +static bool consecutive_mem_locations (rtx, rtx);

Please don't; just move functions to somewhere earlier than where they
are used.

> @@ -16841,8 +16843,35 @@ rs6000_split_multireg_move (rtx dst, rtx src)
> for (int i = 0; i < nvecs; i++)
>   {
> int index = WORDS_BIG_ENDIAN ? i : nvecs - 1 - i;
> -   rtx dst_i = gen_rtx_REG (reg_mode, reg + index);
> -   emit_insn (gen_rtx_SET (dst_i, XVECEXP (src, 0, i)));
> +   int index_next = WORDS_BIG_ENDIAN ? index + 1 : index - 1;

What does index_next mean?  The machine instructions do the same thing
in any endianness.

> +   rtx dst_i;
> +   int regno = reg + i;
> +
> +   /* If we are loading an even VSX register and our memory location
> +  is adjacent to the next register's memory location (if any),
> +  then we can load them both with one LXVP instruction.  */
> +   if ((regno & 1) == 0
> +   && VSX_REGNO_P (regno)
> +   && MEM_P (XVECEXP (src, 0, index))
> +   && MEM_P (XVECEXP (src, 0, index_next)))
> + {
> +   rtx base = WORDS_BIG_ENDIAN ? XVECEXP (src, 0, index)
> +   : XVECEXP (src, 0, index_next);
> +   rtx next = WORDS_BIG_ENDIAN ? XVECEXP (src, 0, index_next)
> +   : XVECEXP (src, 0, index);

Please get rid of index_next, if you still have to do different code for
LE here -- it doesn't make the code any clearer (in fact I cannot follow
it at all anymore :-( )

So this converts pairs of lxv to an lxvp in only a very limited case,
right?  Can we instead do it more generically?  And what about stxvp?


Segher
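
For context, the source-level pattern the patch targets looks roughly like the
following sketch; it assumes the __builtin_vsx_build_pair built-in and an
-mcpu=power10 toolchain, and is not taken from the patch itself:

/* Two vector operands of a build built-in loaded from adjacent
   16-byte locations -- the case where the two lxv instructions can
   become a single lxvp.  */
#include <altivec.h>

void
build_pair_from_adjacent (__vector_pair *dst, vector unsigned char *p)
{
  /* p[0] and p[1] occupy adjacent memory locations.  */
  __builtin_vsx_build_pair (dst, p[0], p[1]);
}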


[r12-2132 Regression] FAIL: g++.dg/warn/Warray-bounds-20.C -std=gnu++98 note (test for warnings, line 55) on Linux/x86_64

2021-07-08 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

a110855667782dac7b674d3e328b253b3b3c919b is the first bad commit
commit a110855667782dac7b674d3e328b253b3b3c919b
Author: Martin Sebor 
Date:   Wed Jul 7 14:05:25 2021 -0600

    Correct handling of variable offset minus constant in -Warray-bounds [PR100137]

caused

FAIL: gcc.dg/Wstringop-overflow-47.c pr97027 (test for warnings, line 34)
FAIL: gcc.dg/Wstringop-overflow-47.c pr97027 (test for warnings, line 37)
FAIL: gcc.dg/Wstringop-overflow-47.c pr97027 (test for warnings, line 42)
FAIL: g++.dg/warn/Warray-bounds-20.C  -std=gnu++14 note (test for warnings, line 38)
FAIL: g++.dg/warn/Warray-bounds-20.C  -std=gnu++14 note (test for warnings, line 55)
FAIL: g++.dg/warn/Warray-bounds-20.C  -std=gnu++17 note (test for warnings, line 38)
FAIL: g++.dg/warn/Warray-bounds-20.C  -std=gnu++17 note (test for warnings, line 55)
FAIL: g++.dg/warn/Warray-bounds-20.C  -std=gnu++2a note (test for warnings, line 38)
FAIL: g++.dg/warn/Warray-bounds-20.C  -std=gnu++2a note (test for warnings, line 55)
FAIL: g++.dg/warn/Warray-bounds-20.C  -std=gnu++98 note (test for warnings, line 38)
FAIL: g++.dg/warn/Warray-bounds-20.C  -std=gnu++98 note (test for warnings, line 55)

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-2132/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/Wstringop-overflow-47.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/Wstringop-overflow-47.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/warn/Warray-bounds-20.C --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/warn/Warray-bounds-20.C --target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)

