[PATCH] s390: Make use of new copysign RTL

2023-10-05 Thread Stefan Schulze Frielinghaus
gcc/ChangeLog:

* config/s390/s390.md: Make use of new copysign RTL.
---
 gcc/config/s390/s390.md | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 9631b2a8c60..3f29ba21442 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -124,7 +124,6 @@
 
; Byte-wise Population Count
UNSPEC_POPCNT
-   UNSPEC_COPYSIGN
 
; Load FP Integer
UNSPEC_FPINT_FLOOR
@@ -11918,9 +11917,8 @@
 
 (define_insn "copysign<mode>3"
   [(set (match_operand:FP 0 "register_operand" "=f")
-  (unspec:FP [(match_operand:FP 1 "register_operand" "")
-  (match_operand:FP 2 "register_operand" "f")]
-  UNSPEC_COPYSIGN))]
+   (copysign:FP (match_operand:FP 1 "register_operand" "")
+(match_operand:FP 2 "register_operand" "f")))]
   "TARGET_Z196"
   "cpsdr\t%0,%2,%1"
   [(set_attr "op_type"  "RRF")
-- 
2.41.0



[PATCH] Avoid left around copies when value-numbering BBs

2023-10-05 Thread Richard Biener
The following makes sure to treat values whose definition we did not
visit as available, since those by definition must dominate the entry
of the region.  That avoids leaving unpropagated copies behind after
if-conversion and the resulting SLP discovery failures (SLP discovery
does not handle plain copies).

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-ssa-sccvn.cc (rpo_elim::eliminate_avail): Not visited
value numbers are themselves available.
---
 gcc/tree-ssa-sccvn.cc | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index e46498568cb..d2aab38c2d2 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -7688,7 +7688,11 @@ rpo_elim::eliminate_avail (basic_block bb, tree op)
 {
   if (SSA_NAME_IS_DEFAULT_DEF (valnum))
return valnum;
-  vn_avail *av = VN_INFO (valnum)->avail;
+  vn_ssa_aux_t valnum_info = VN_INFO (valnum);
+  /* See above.  */
+  if (!valnum_info->visited)
+   return valnum;
+  vn_avail *av = valnum_info->avail;
   if (!av)
return NULL_TREE;
   if (av->location == bb->index)
-- 
2.35.3


Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-05 Thread Robin Dapp
Hi Tamar,

> So in the
> 
>   if (slp_node)
> {
> 
> Add something like:
> 
> If (is_cond_op)
> {
>   if (dump_enabled_p ())
>   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>"left fold reduction on SLP not supported.\n");
>   return false;
> }

Yes, seems reasonable, added.

> The only comment I have is whether you actually need this helper function?
> It looks like all the uses of it are in cases you have, or will call 
> conditional_internal_fn_code
> directly.
> 
> e.g. in vect_transform_reduction you can replace it by 
> 
> bool cond_fn_p = cond_fn != ERROR_MARK;
> 
> and in 
> 
>   if (cond_fn_p (orig_code))
>   orig_code = conditional_internal_fn_code (internal_fn(orig_code));
> 
> just 
> 
> internal_fn new_fn = conditional_internal_fn_code (internal_fn(orig_code));
> if (new_fn != ERROR_MARK)
>   orig_code = new_fn;
> 
> which would save the repeated testing of the condition.

I see what you mean.  One complication is that we want to disambiguate
(among others):

 (1) code = IFN_COND_ADD, cond_fn = IFN_LAST.   (new case)
 (2) code = IFN_MAX, cond_fn = IFN_COND_MAX.
 (3) code = IFN_SOMETHING, cond_fn = IFN_LAST.

So just checking cond_fn is not enough (even if we made
get_conditional_internal_fn (IFN_COND_ADD) return IFN_COND_ADD).
We need to know if the initial code already was an IFN_COND.

It's a bit of a mess but I didn't dare untangle it.  Well, actually, I
tried but made it worse ;)  The cond_fn_p check seemed least
intrusive to me.  Maybe you have another idea?
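
To make the three cases concrete, here is a small self-contained model
(illustrative only, not the real GCC code; cond_variant and is_cond_op
below merely stand in for get_conditional_internal_fn and the
cond_fn_p/is_cond_op test being discussed):

  #include <cassert>

  enum fn { IFN_LAST, IFN_MAX, IFN_COND_ADD, IFN_COND_MAX, IFN_SOMETHING };

  // Stand-in for get_conditional_internal_fn: conditional variant or IFN_LAST.
  static fn cond_variant (fn f) { return f == IFN_MAX ? IFN_COND_MAX : IFN_LAST; }

  // Stand-in for the is_cond_op check: was the code already an IFN_COND_*?
  static bool is_cond_op (fn f) { return f == IFN_COND_ADD || f == IFN_COND_MAX; }

  int main ()
  {
    // (1) new case: already a conditional op, no further conditional variant.
    assert (cond_variant (IFN_COND_ADD) == IFN_LAST && is_cond_op (IFN_COND_ADD));
    // (2) unconditional op that has a conditional counterpart.
    assert (cond_variant (IFN_MAX) == IFN_COND_MAX && !is_cond_op (IFN_MAX));
    // (3) anything else.
    assert (cond_variant (IFN_SOMETHING) == IFN_LAST && !is_cond_op (IFN_SOMETHING));
    // (1) and (3) agree on cond_variant, so a cond_fn check alone cannot
    // distinguish them; the extra is_cond_op/cond_fn_p test is what does.
    return 0;
  }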

Regards
 Robin


Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-05 Thread Robin Dapp
Ah, sorry, read your remark incorrectly.  Will try again.

Regards
 Robin


[X86 PATCH] Split lea into shorter left shift by 2 or 3 bits with -Oz.

2023-10-05 Thread Roger Sayle

This patch avoids long lea instructions for performing x<<2 and x<<3
by splitting them into shorter sal and move (or xchg) instructions.
Because this increases the number of instructions but reduces the
total size, it's suitable for -Oz (but not -Os).

The impact can be seen in the new test case:

int foo(int x) { return x<<2; }
int bar(int x) { return x<<3; }
long long fool(long long x) { return x<<2; }
long long barl(long long x) { return x<<3; }

where with -O2 we generate:

foo:lea0x0(,%rdi,4),%eax// 7 bytes
retq
bar:lea0x0(,%rdi,8),%eax// 7 bytes
retq
fool:   lea0x0(,%rdi,4),%rax// 8 bytes
retq
barl:   lea0x0(,%rdi,8),%rax// 8 bytes
retq

and with -Oz we now generate:

foo:xchg   %eax,%edi// 1 byte
shl$0x2,%eax// 3 bytes
retq
bar:xchg   %eax,%edi// 1 byte
shl$0x3,%eax// 3 bytes
retq
fool:   xchg   %rax,%rdi// 2 bytes
shl$0x2,%rax// 4 bytes
retq
barl:   xchg   %rax,%rdi// 2 bytes
shl$0x3,%rax// 4 bytes
retq

Over the entirety of the CSiBE code size benchmark this saves 1347
bytes (0.037%) for x86_64, and 1312 bytes (0.036%) with -m32.
Conveniently, there's already a backend function in i386.cc for
deciding whether to split an lea into its component instructions,
ix86_avoid_lea_for_addr; all that's required is an additional clause
checking for -Oz (i.e. optimize_size > 1).

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board='unix{-m32}'
with no new failures.  Additional testing was performed by repeating
these steps after removing the "optimize_size > 1" condition, so that
suitable lea instructions were always split [-Oz is not heavily
tested, so this invoked the new code during the bootstrap and
regression testing], again with no regressions.  Ok for mainline?


2023-10-05  Roger Sayle  

gcc/ChangeLog
* config/i386/i386.cc (ix86_avoid_lea_for_addr): Split LEAs used
to perform left shifts into shorter instructions with -Oz.

gcc/testsuite/ChangeLog
* gcc.target/i386/lea-2.c: New test case.

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 477e6ce..9557bff 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -15543,6 +15543,13 @@ ix86_avoid_lea_for_addr (rtx_insn *insn, rtx 
operands[])
   && (regno0 == regno1 || regno0 == regno2))
 return true;
 
+  /* Split with -Oz if the encoding requires fewer bytes.  */
+  if (optimize_size > 1
+  && parts.scale > 1
+  && !parts.base
+  && (!parts.disp || parts.disp == const0_rtx)) 
+return true;
+
   /* Check we need to optimize.  */
   if (!TARGET_AVOID_LEA_FOR_ADDR || optimize_function_for_size_p (cfun))
 return false;
diff --git a/gcc/testsuite/gcc.target/i386/lea-2.c 
b/gcc/testsuite/gcc.target/i386/lea-2.c
new file mode 100644
index 000..20aded8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/lea-2.c
@@ -0,0 +1,7 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-Oz" } */
+int foo(int x) { return x<<2; }
+int bar(int x) { return x<<3; }
+long long fool(long long x) { return x<<2; }
+long long barl(long long x) { return x<<3; }
+/* { dg-final { scan-assembler-not "lea\[lq\]" } } */


Re: [PATCH] wwwdocs: Add ADL to C++ non-bugs

2023-10-05 Thread Jonathan Wakely
On Wed, 4 Oct 2023 at 20:17, Jason Merrill  wrote:
>
> On 10/3/23 10:45, Jonathan Wakely wrote:
> > We have a long history of INVALID bugs about std functions being
> > available in the global namespace (PRs 27846, 67566, 82619, 99865,
> > 110602, 111553, probably others). Let's document it.
> >
> > Also de-prioritize the C++98-only bugs, which are unlikely to affect
> > anybody nowadays.
> >
> > OK for wwwdocs?
>
> OK, thanks.
>
> Jason


After pushing it I realised the formatting looks bad compared to the
other items in the list, so I've pushed the attached follow-up as
obvious.
commit 1b1a0cf29826ce9287a203cde00fd1512918fc17
Author: Jonathan Wakely 
Date:   Thu Oct 5 10:09:54 2023 +0100

    Add <p> to new item in C++ non-bugs list

diff --git a/htdocs/bugs/index.html b/htdocs/bugs/index.html
index 41edc561..da3d4c0d 100644
--- a/htdocs/bugs/index.html
+++ b/htdocs/bugs/index.html
@@ -541,12 +541,14 @@ for details.
 
 Functions can be called without qualifying them with their namespace.
 
+<p>
 Argument Dependent Lookup (ADL) means that functions can be found in namespaces
 associated with their arguments. This means that move(arg) can
 call std::move if arg is a type defined in namespace
 std, such as std::string or std::vector.
 If std::move is not the function you intended to call, use a
 qualified name such as ::move(arg) or foo::move(arg).
+</p>
 
 
 Nested classes can access private members and types of the containing
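
For readers less familiar with ADL, a small self-contained C++ example of
the behaviour the new paragraph documents (the foo namespace and its move
helper are hypothetical, used only to show the qualified-call workaround):

  #include <string>
  #include <utility>

  namespace foo {
    // Hypothetical user-defined helper that happens to share its name
    // with std::move.
    template <typename T> T move (T t) { return t; }
  }

  int main ()
  {
    std::string s = "hello";
    // Unqualified call: argument-dependent lookup searches namespace std
    // (the namespace of std::string) and finds std::move.
    auto a = move (s);
    // Qualified call: names exactly the function the author intended.
    auto b = foo::move (std::string ("world"));
    (void) a; (void) b;
    return 0;
  }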


Re: [X86 PATCH] Split lea into shorter left shift by 2 or 3 bits with -Oz.

2023-10-05 Thread Uros Bizjak
On Thu, Oct 5, 2023 at 11:06 AM Roger Sayle  wrote:
>
>
> This patch avoids long lea instructions for performing x<<2 and x<<3
> by splitting them into shorter sal and move (or xchg instructions).
> Because this increases the number of instructions, but reduces the
> total size, its suitable for -Oz (but not -Os).
>
> The impact can be seen in the new test case:
>
> int foo(int x) { return x<<2; }
> int bar(int x) { return x<<3; }
> long long fool(long long x) { return x<<2; }
> long long barl(long long x) { return x<<3; }
>
> where with -O2 we generate:
>
> foo:lea0x0(,%rdi,4),%eax// 7 bytes
> retq
> bar:lea0x0(,%rdi,8),%eax// 7 bytes
> retq
> fool:   lea0x0(,%rdi,4),%rax// 8 bytes
> retq
> barl:   lea0x0(,%rdi,8),%rax// 8 bytes
> retq
>
> and with -Oz we now generate:
>
> foo:xchg   %eax,%edi// 1 byte
> shl$0x2,%eax// 3 bytes
> retq
> bar:xchg   %eax,%edi// 1 byte
> shl$0x3,%eax// 3 bytes
> retq
> fool:   xchg   %rax,%rdi// 2 bytes
> shl$0x2,%rax// 4 bytes
> retq
> barl:   xchg   %rax,%rdi// 2 bytes
> shl$0x3,%rax// 4 bytes
> retq
>
> Over the entirety of the CSiBE code size benchmark this saves 1347
> bytes (0.037%) for x86_64, and 1312 bytes (0.036%) with -m32.
> Conveniently, there's already a backend function in i386.cc for
> deciding whether to split an lea into its component instructions,
> ix86_avoid_lea_for_addr, all that's required is an additional clause
> checking for -Oz (i.e. optimize_size > 1).
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board='unix{-m32}'
> with no new failures.  Additional testing was performed by repeating
> these steps after removing the "optimize_size > 1" condition, so that
> suitable lea instructions were always split [-Oz is not heavily
> tested, so this invoked the new code during the bootstrap and
> regression testing], again with no regressions.  Ok for mainline?
>
>
> 2023-10-05  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386.cc (ix86_avoid_lea_for_addr): Split LEAs used
> to perform left shifts into shorter instructions with -Oz.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/lea-2.c: New test case.
>

OK, but ...

@@ -0,0 +1,7 @@
+/* { dg-do compile { target { ! ia32 } } } */

Is there a reason to avoid 32-bit targets? I'd expect that the
optimization also triggers on x86_32 for 32bit integers.

+/* { dg-options "-Oz" } */
+int foo(int x) { return x<<2; }
+int bar(int x) { return x<<3; }
+long long fool(long long x) { return x<<2; }
+long long barl(long long x) { return x<<3; }
+/* { dg-final { scan-assembler-not "lea\[lq\]" } } */

Uros.


[PATCH] Fix SIMD call SLP discovery

2023-10-05 Thread Richard Biener
When we do SLP discovery of SIMD calls we run into the issue that,
when the call is neither a builtin nor an internal function, we have
cfn == CFN_LAST but internal_fn_p of that returns true.  Since
IFN_LAST isn't vectorizable we fail spuriously.

Fixed by checking for cfn != CFN_LAST && internal_fn_p (cfn)
instead.
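
A self-contained model of such a combined code space (the numeric layout
below is made up; the real enums are GCC's combined_fn and internal_fn)
shows why the sentinel trips the old check and how the new one avoids it:

  #include <cassert>

  // Illustrative layout: built-in codes first, internal-fn codes after them,
  // with CFN_LAST as the sentinel one past the end.
  enum { N_BUILTINS = 100, N_IFNS = 50 };
  enum { CFN_FIRST_IFN = N_BUILTINS, CFN_LAST = N_BUILTINS + N_IFNS };

  static bool internal_fn_p (int cfn) { return cfn >= CFN_FIRST_IFN; }
  static int as_internal_fn (int cfn) { return cfn - CFN_FIRST_IFN; }

  int main ()
  {
    // A call that is neither a builtin nor an internal function is reported
    // as CFN_LAST, yet the range check still classifies it as internal and
    // maps it to the non-vectorizable "IFN_LAST" slot.
    int cfn = CFN_LAST;
    assert (internal_fn_p (cfn) && as_internal_fn (cfn) == N_IFNS);
    // The fix: only treat cfn as an internal function when it isn't CFN_LAST.
    bool considered_internal = cfn != CFN_LAST && internal_fn_p (cfn);
    assert (!considered_internal);
    return 0;
  }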

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-vect-slp.cc (vect_build_slp_tree_1): Do not
ask for internal_fn_p (CFN_LAST).
---
 gcc/tree-vect-slp.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 4dd899404d9..08e8418b33e 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1084,7 +1084,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
*swap,
  ldst_p = true;
  rhs_code = CFN_MASK_STORE;
}
- else if ((internal_fn_p (cfn)
+ else if ((cfn != CFN_LAST
+   && internal_fn_p (cfn)
&& !vectorizable_internal_fn_p (as_internal_fn (cfn)))
   || gimple_call_tail_p (call_stmt)
   || gimple_call_noreturn_p (call_stmt)
-- 
2.35.3


Re: [PATCH 1/6] aarch64: Sync system register information with Binutils

2023-10-05 Thread Richard Earnshaw




On 03/10/2023 16:18, Victor Do Nascimento wrote:

This patch adds the `aarch64-sys-regs.def' file to GCC, teaching
the compiler about system registers known to the assembler and how
these can be used.

The macros used to hold system register information reflect those in
use by binutils, a design choice made to facilitate the sharing of data
between different parts of the toolchain.

By aligning the representation of data common to different parts of
the toolchain we can greatly reduce the duplication of work,
facilitating the maintenance of the aarch64 back-end across different
parts of the toolchain; any `SYSREG (...)' that is added in one
project can just as easily be added to its counterpart.

GCC does not implement the full range of ISA flags present in
Binutils.  Where this is the case, aliases must be added to aarch64.h
with the unknown architectural extension being mapped to its
associated base architecture, such that any flag present in Binutils
and used in system register definitions is understood in GCC.  Again,
this is done such that flags can be used interchangeably between
projects making use of the aarch64-system-regs.def file.  This is done
in the next patch in the series.

`.arch' directives missing from the emitted assembly files as a
consequence of this aliasing are accounted for by the compiler using
the S<op0>_<op1>_<Cn>_<Cm>_<op2> encoding of system registers when
issuing mrs/msr instructions.  This design choice ensures the
assembler will accept anything that was deemed acceptable by the
compiler.
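
As a sketch of how such an X-macro style .def file is typically consumed
(illustrative only, not taken from the patch: the CPENC packing, sysreg_t
type and AARCH64_NO_FEATURES stand-in below are assumptions, and the two
entries are copied from the .def file further down in the patch):

  #include <stdio.h>

  /* Illustrative stand-ins; the real definitions live in the aarch64 headers.  */
  #define CPENC(op0, op1, crn, crm, op2) \
    (((op0) << 14) | ((op1) << 11) | ((crn) << 7) | ((crm) << 3) | (op2))
  #define AARCH64_NO_FEATURES 0u

  typedef struct
  {
    const char *name;        /* register name as written in assembly */
    unsigned encoding;       /* packed op0/op1/CRn/CRm/op2 */
    unsigned flags;          /* behaviour bits (read-only, alias, ...) */
    unsigned long features;  /* required architectural features */
  } sysreg_t;

  /* Normally the table body would come from #include "aarch64-sys-regs.def";
     two entries from that file are inlined here so the sketch stands alone.  */
  static const sysreg_t sysregs[] = {
  #define SYSREG(NAME, ENC, FLAGS, FEATURES) { NAME, ENC, FLAGS, FEATURES },
    SYSREG ("accdata_el1", CPENC (3,0,13,0,5), 0, AARCH64_NO_FEATURES)
    SYSREG ("actlr_el1",   CPENC (3,0,1,0,1),  0, AARCH64_NO_FEATURES)
  #undef SYSREG
  };

  int main (void)
  {
    for (unsigned i = 0; i < sizeof sysregs / sizeof sysregs[0]; i++)
      printf ("%-12s -> %#x\n", sysregs[i].name, sysregs[i].encoding);
    return 0;
  }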

gcc/ChangeLog:

* gcc/config/aarch64/aarch64-system-regs.def: New.
---
  gcc/config/aarch64/aarch64-sys-regs.def | 1059 +++
  1 file changed, 1059 insertions(+)
  create mode 100644 gcc/config/aarch64/aarch64-sys-regs.def


This file is supposed to be /identical/ to the one in GNU Binutils, 
right?  If so, I think it needs to continue to say that it is part of 
GNU Binutils, not part of GCC.  Ramana, has this happened before?  If 
not, does the SC have a position here?


R.


diff --git a/gcc/config/aarch64/aarch64-sys-regs.def 
b/gcc/config/aarch64/aarch64-sys-regs.def
new file mode 100644
index 000..d77fee1d5e3
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-sys-regs.def
@@ -0,0 +1,1059 @@
+/* Copyright (C) 2023 Free Software Foundation, Inc.
+   Contributed by Arm Ltd
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   .  */
+
+/* Array of system registers and their associated arch features.
+
+   Before using #include to read this file, define a macro:
+
+ SYSREG (name, encoding, flags, features)
+
+  The NAME is the system register name, as recognized by the
+  assembler.  ENCODING provides the necessary information for the binary
+  encoding of the system register.  The FLAGS field is a bitmask of
+  relevant behavior information pertaining to the particular register.
+  For example: is it read/write-only? does it alias another register?
+  The FEATURES field maps onto ISA flags and specifies the architectural
+  feature requirements of the system register.  */
+
+  SYSREG ("accdata_el1", CPENC (3,0,13,0,5), 0,  
AARCH64_NO_FEATURES)
+  SYSREG ("actlr_el1",   CPENC (3,0,1,0,1),  0,
  AARCH64_NO_FEATURES)
+  SYSREG ("actlr_el2",   CPENC (3,4,1,0,1),  0,
  AARCH64_NO_FEATURES)
+  SYSREG ("actlr_el3",   CPENC (3,6,1,0,1),  0,
  AARCH64_NO_FEATURES)
+  SYSREG ("afsr0_el1",   CPENC (3,0,5,1,0),  0,
  AARCH64_NO_FEATURES)
+  SYSREG ("afsr0_el12",  CPENC (3,5,5,1,0),  F_ARCHEXT,
  AARCH64_FEATURE (V8_1A))
+  SYSREG ("afsr0_el2",   CPENC (3,4,5,1,0),  0,
  AARCH64_NO_FEATURES)
+  SYSREG ("afsr0_el3",   CPENC (3,6,5,1,0),  0,
  AARCH64_NO_FEATURES)
+  SYSREG ("afsr1_el1",   CPENC (3,0,5,1,1),  0,
  AARCH64_NO_FEATURES)
+  SYSREG ("afsr1_el12",  CPENC (3,5,5,1,1),  F_ARCHEXT,
  AARCH64_FEATURE (V8_1A))
+  SYSREG ("afsr1_el2",   CPENC (3,4,5,1,1),  0,
  AARCH64_NO_FEATURES)
+  SYSREG ("afsr1_el3",   CPENC (3,6,5,1,1),  0,
  AARCH64_NO_FEATURES)
+  SYSREG ("aidr_el1",   

[committed 1/5] arc: Remove unused/incomplete alignment assembly annotation.

2023-10-05 Thread Claudiu Zissulescu
Remove the '&' print operand punct character, disable the
-mannotate-align option, and clean up the port.

gcc/

* config/arc/arc-protos.h (arc_clear_unalign): Remove.
(arc_toggle_unalign): Likewise.
* config/arc/arc.cc (machine_function): Remove unalign.
(arc_init): Remove `&` punct character.
(arc_print_operand): Remove `&` related functions.
(arc_verify_short): Update function's number of parameters.
(output_short_suffix): Update function.
(arc_short_long): Likewise.
(arc_clear_unalign): Remove.
(arc_toggle_unalign): Likewise.
* config/arc/arc.h (ASM_OUTPUT_CASE_END): Remove.
(ASM_OUTPUT_ALIGN): Update.
* config/arc/arc.md: Remove all `%&` references.
* config/arc/arc.opt (mannotate-align): Ignore option.
* doc/invoke.texi (mannotate-align): Update description.

Signed-off-by: Claudiu Zissulescu 
---
 gcc/config/arc/arc-protos.h |   2 -
 gcc/config/arc/arc.cc   |  33 ++
 gcc/config/arc/arc.h|  16 -
 gcc/config/arc/arc.md   | 125 ++--
 gcc/config/arc/arc.opt  |   4 +-
 gcc/doc/invoke.texi |   3 +-
 6 files changed, 70 insertions(+), 113 deletions(-)

diff --git a/gcc/config/arc/arc-protos.h b/gcc/config/arc/arc-protos.h
index 5ce92ba261f..0e89ac7ae33 100644
--- a/gcc/config/arc/arc-protos.h
+++ b/gcc/config/arc/arc-protos.h
@@ -83,8 +83,6 @@ extern void arc_expand_prologue (void);
 extern void arc_expand_epilogue (int);
 extern void arc_init_expanders (void);
 extern int arc_check_millicode (rtx op, int offset, int load_p);
-extern void arc_clear_unalign (void);
-extern void arc_toggle_unalign (void);
 extern void split_subsi (rtx *);
 extern void arc_split_move (rtx *);
 extern const char *arc_short_long (rtx_insn *insn, const char *, const char *);
diff --git a/gcc/config/arc/arc.cc b/gcc/config/arc/arc.cc
index 2a59618ab6a..5e597d1bfeb 100644
--- a/gcc/config/arc/arc.cc
+++ b/gcc/config/arc/arc.cc
@@ -423,8 +423,6 @@ typedef struct GTY (()) machine_function
 {
   unsigned int fn_type;
   struct arc_frame_info frame_info;
-  /* To keep track of unalignment caused by short insns.  */
-  int unalign;
   struct arc_ccfsm ccfsm_current;
   /* Map from uid to ccfsm state during branch shortening.  */
   rtx ccfsm_current_insn;
@@ -1133,7 +1131,6 @@ arc_init (void)
   arc_punct_chars['?'] = 1;
   arc_punct_chars['!'] = 1;
   arc_punct_chars['^'] = 1;
-  arc_punct_chars['&'] = 1;
   arc_punct_chars['+'] = 1;
   arc_punct_chars['_'] = 1;
 }
@@ -5011,10 +5008,7 @@ arc_print_operand (FILE *file, rtx x, int code)
  return;
}
   break;
-case '&':
-  if (TARGET_ANNOTATE_ALIGN)
-   fprintf (file, "; unalign: %d", cfun->machine->unalign);
-  return;
+
 case '+':
   if (TARGET_V2)
fputs ("m", file);
@@ -5682,7 +5676,7 @@ arc_ccfsm_cond_exec_p (void)
If CHECK_ATTR is greater than 0, check the iscompact attribute first.  */
 
 static int
-arc_verify_short (rtx_insn *insn, int, int check_attr)
+arc_verify_short (rtx_insn *insn, int check_attr)
 {
   enum attr_iscompact iscompact;
 
@@ -5697,8 +5691,7 @@ arc_verify_short (rtx_insn *insn, int, int check_attr)
 }
 
 /* When outputting an instruction (alternative) that can potentially be short,
-   output the short suffix if the insn is in fact short, and update
-   cfun->machine->unalign accordingly.  */
+   output the short suffix if the insn is in fact short.  */
 
 static void
 output_short_suffix (FILE *file)
@@ -5707,10 +5700,9 @@ output_short_suffix (FILE *file)
   if (!insn)
 return;
 
-  if (arc_verify_short (insn, cfun->machine->unalign, 1))
+  if (arc_verify_short (insn, 1))
 {
   fprintf (file, "_s");
-  cfun->machine->unalign ^= 2;
 }
   /* Restore recog_operand.  */
   extract_insn_cached (insn);
@@ -10056,21 +10048,6 @@ arc_check_millicode (rtx op, int offset, int load_p)
   return 1;
 }
 
-/* Accessor functions for cfun->machine->unalign.  */
-
-void
-arc_clear_unalign (void)
-{
-  if (cfun)
-cfun->machine->unalign = 0;
-}
-
-void
-arc_toggle_unalign (void)
-{
-  cfun->machine->unalign ^= 2;
-}
-
 /* Operands 0..2 are the operands of a subsi which uses a 12 bit
constant in operand 1, but which would require a LIMM because of
operand mismatch.
@@ -10309,7 +10286,7 @@ arc_split_move (rtx *operands)
 const char *
 arc_short_long (rtx_insn *insn, const char *s_tmpl, const char *l_tmpl)
 {
-  int is_short = arc_verify_short (insn, cfun->machine->unalign, -1);
+  int is_short = arc_verify_short (insn, -1);
 
   extract_constrain_insn_cached (insn);
   return is_short ? s_tmpl : l_tmpl;
diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
index 8daae41ff5b..5877389a10d 100644
--- a/gcc/config/arc/arc.h
+++ b/gcc/config/arc/arc.h
@@ -1312,20 +1312,6 @@ do { 
\
 /* Defined to also emit an .align in elfos.h.  We don't want that.  */
 #undef ASM_OUT

[X86 PATCH] Implement doubleword shift left by 1 bit using add+adc.

2023-10-05 Thread Roger Sayle


This patch tweaks the i386 back-end's ix86_split_ashl to implement
doubleword left shifts by 1 bit, using an add followed by an add-with-carry
(i.e. a doubleword x+x) instead of using the x86's shld instruction.
The replacement sequence both requires fewer bytes and is faster on
both Intel and AMD architectures (from Agner Fog's latency tables and
confirmed by my own microbenchmarking).

For the test case:
__int128 foo(__int128 x) { return x << 1; }

with -O2 we previously generated:

foo:movq%rdi, %rax
movq%rsi, %rdx
shldq   $1, %rdi, %rdx
addq%rdi, %rax
ret

with this patch we now generate:

foo:movq%rdi, %rax
movq%rsi, %rdx
addq%rdi, %rax
adcq%rsi, %rdx
ret
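
The equivalence the patch relies on -- a doubleword left shift by one is a
doubleword addition of the value to itself, with the carry out of the low
half feeding the high half -- can be checked with a small, self-contained
C sketch (illustrative only):

  #include <assert.h>
  #include <stdint.h>

  static void shl1_via_add (uint64_t lo, uint64_t hi, uint64_t *olo, uint64_t *ohi)
  {
    uint64_t new_lo = lo + lo;      /* the "add": double the low half */
    uint64_t carry = new_lo < lo;   /* carry out of the low addition */
    *olo = new_lo;
    *ohi = hi + hi + carry;         /* the "adc": double the high half plus carry */
  }

  int main (void)
  {
    uint64_t lo, hi;
    shl1_via_add (0x8000000000000001ull, 0x5ull, &lo, &hi);
    /* (0x5 : 0x8000000000000001) << 1 == (0xb : 0x0000000000000002) */
    assert (lo == 0x2ull && hi == 0xbull);
    return 0;
  }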


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-10-05  Roger Sayle  

gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_split_ashl): Split shifts by
one into add<mode>3_cc_overflow_1 followed by add<mode>3_carry.
* config/i386/i386.md (@add<mode>3_cc_overflow_1): Renamed from
"*add<mode>3_cc_overflow_1" to provide generator function.

gcc/testsuite/ChangeLog
* gcc.target/i386/ashldi3-2.c: New 32-bit test case.
* gcc.target/i386/ashlti3-3.c: New 64-bit test case.


Thanks in advance,
Roger
--




[committed 2/5] arc: Update/remove ARC specific tests

2023-10-05 Thread Claudiu Zissulescu
Update tests and remove old mtune-* tests.

gcc/testsuite

* gcc.target/arc/add_n-combine.c: Recognize add2 instruction.
* gcc.target/arc/firq-4.c: FP register is a temp reg. Update test.
* gcc.target/arc/firq-6.c: Likewise.
* gcc.target/arc/mtune-ARC600.c: Remove test.
* gcc.target/arc/mtune-ARC601.c: Likewise.
* gcc.target/arc/mtune-ARC700-xmac: Likewise.
* gcc.target/arc/mtune-ARC700.c: Likewise.
* gcc.target/arc/mtune-ARC725D.c: Likewise.
* gcc.target/arc/mtune-ARC750D.c: Likewise.
* gcc.target/arc/uncached-7.c: Set it to XFAIL.

Signed-off-by: Claudiu Zissulescu 
---
 gcc/testsuite/gcc.target/arc/add_n-combine.c   | 2 +-
 gcc/testsuite/gcc.target/arc/firq-4.c  | 1 -
 gcc/testsuite/gcc.target/arc/firq-6.c  | 1 -
 gcc/testsuite/gcc.target/arc/mtune-ARC600.c| 4 
 gcc/testsuite/gcc.target/arc/mtune-ARC601.c| 4 
 gcc/testsuite/gcc.target/arc/mtune-ARC700-xmac | 4 
 gcc/testsuite/gcc.target/arc/mtune-ARC700.c| 4 
 gcc/testsuite/gcc.target/arc/mtune-ARC725D.c   | 4 
 gcc/testsuite/gcc.target/arc/mtune-ARC750D.c   | 4 
 gcc/testsuite/gcc.target/arc/uncached-7.c  | 2 +-
 10 files changed, 2 insertions(+), 28 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.target/arc/mtune-ARC600.c
 delete mode 100644 gcc/testsuite/gcc.target/arc/mtune-ARC601.c
 delete mode 100644 gcc/testsuite/gcc.target/arc/mtune-ARC700-xmac
 delete mode 100644 gcc/testsuite/gcc.target/arc/mtune-ARC700.c
 delete mode 100644 gcc/testsuite/gcc.target/arc/mtune-ARC725D.c
 delete mode 100644 gcc/testsuite/gcc.target/arc/mtune-ARC750D.c

diff --git a/gcc/testsuite/gcc.target/arc/add_n-combine.c 
b/gcc/testsuite/gcc.target/arc/add_n-combine.c
index 84e261ece8f..fd311b3839c 100644
--- a/gcc/testsuite/gcc.target/arc/add_n-combine.c
+++ b/gcc/testsuite/gcc.target/arc/add_n-combine.c
@@ -46,5 +46,5 @@ void f() {
 }
 
 /* { dg-final { scan-assembler "@at1\\+1" } } */
-/* { dg-final { scan-assembler "@at2\\+2" } } */
+/* { dg-final { scan-assembler "add2" } } */
 /* { dg-final { scan-assembler "add3" } } */
diff --git a/gcc/testsuite/gcc.target/arc/firq-4.c 
b/gcc/testsuite/gcc.target/arc/firq-4.c
index 969ee796f03..cd939bf8ca3 100644
--- a/gcc/testsuite/gcc.target/arc/firq-4.c
+++ b/gcc/testsuite/gcc.target/arc/firq-4.c
@@ -28,4 +28,3 @@ handler1 (void)
 
 /* { dg-final { scan-assembler-not "fp,\\\[sp" } } */
 /* { dg-final { scan-assembler-not "push.*fp" } } */
-/* { dg-final { scan-assembler "mov_s.*fp,sp" } } */
diff --git a/gcc/testsuite/gcc.target/arc/firq-6.c 
b/gcc/testsuite/gcc.target/arc/firq-6.c
index 9421200d630..df04e46dd31 100644
--- a/gcc/testsuite/gcc.target/arc/firq-6.c
+++ b/gcc/testsuite/gcc.target/arc/firq-6.c
@@ -18,4 +18,3 @@ handler1 (void)
  "r25", "fp");
 }
 /* { dg-final { scan-assembler-not 
"(s|l)(t|d)d.*r\[0-9\]+,\\\[sp,\[0-9\]+\\\]" } } */
-/* { dg-final { scan-assembler "mov_s.*fp,sp" } } */
diff --git a/gcc/testsuite/gcc.target/arc/mtune-ARC600.c 
b/gcc/testsuite/gcc.target/arc/mtune-ARC600.c
deleted file mode 100644
index a483d1435ca..000
--- a/gcc/testsuite/gcc.target/arc/mtune-ARC600.c
+++ /dev/null
@@ -1,4 +0,0 @@
-/* { dg-do compile } */
-/* { dg-options "-mtune=ARC600" } */
-
-/* { dg-final { scan-assembler ".cpu ARC700" } } */
diff --git a/gcc/testsuite/gcc.target/arc/mtune-ARC601.c 
b/gcc/testsuite/gcc.target/arc/mtune-ARC601.c
deleted file mode 100644
index ed57bd7092d..000
--- a/gcc/testsuite/gcc.target/arc/mtune-ARC601.c
+++ /dev/null
@@ -1,4 +0,0 @@
-/* { dg-do compile } */
-/* { dg-options "-mtune=ARC601" } */
-
-/* { dg-final { scan-assembler ".cpu ARC700" } } */
diff --git a/gcc/testsuite/gcc.target/arc/mtune-ARC700-xmac 
b/gcc/testsuite/gcc.target/arc/mtune-ARC700-xmac
deleted file mode 100644
index 2f1e137be4d..000
--- a/gcc/testsuite/gcc.target/arc/mtune-ARC700-xmac
+++ /dev/null
@@ -1,4 +0,0 @@
-/* { dg-do compile } */
-/* { dg-options "-mtune=ARC700-xmac" } */
-
-/* { dg-final { scan-assembler ".cpu ARC700" } } */
diff --git a/gcc/testsuite/gcc.target/arc/mtune-ARC700.c 
b/gcc/testsuite/gcc.target/arc/mtune-ARC700.c
deleted file mode 100644
index 851ea7305e0..000
--- a/gcc/testsuite/gcc.target/arc/mtune-ARC700.c
+++ /dev/null
@@ -1,4 +0,0 @@
-/* { dg-do compile } */
-/* { dg-options "-mtune=ARC700" } */
-
-/* { dg-final { scan-assembler ".cpu ARC700" } } */
diff --git a/gcc/testsuite/gcc.target/arc/mtune-ARC725D.c 
b/gcc/testsuite/gcc.target/arc/mtune-ARC725D.c
deleted file mode 100644
index e2aa4846291..000
--- a/gcc/testsuite/gcc.target/arc/mtune-ARC725D.c
+++ /dev/null
@@ -1,4 +0,0 @@
-/* { dg-do compile } */
-/* { dg-options "-mtune=ARC725D" } */
-
-/* { dg-final { scan-assembler ".cpu ARC700" } } */
diff --git a/gcc/testsuite/gcc.target/arc/mtune-ARC750D.c 
b/gcc/testsuite/gcc.target/arc/mtune-ARC750D.c
deleted file mode 100644
index 20923300ee1..000
--- a/gcc/testsuite/gcc.target/arc/mtune-ARC750D.

[committed 3/5] arc: Remove '^' print punct character

2023-10-05 Thread Claudiu Zissulescu
The '^' was used to print the '@' character in the output assembly.  This
is no longer required by the ARC binutils.  Remove it.

gcc/

* config/arc/arc.cc (arc_init): Remove '^' punct char.
(arc_print_operand): Remove related code.
* config/arc/arc.md: Update patterns which use '%^'.

gcc/testsuite/

* gcc.target/arc/loop-3.c: Update test.

Signed-off-by: Claudiu Zissulescu 
---
 gcc/config/arc/arc.cc |  9 -
 gcc/config/arc/arc.md | 18 +-
 gcc/testsuite/gcc.target/arc/loop-3.c |  2 +-
 3 files changed, 10 insertions(+), 19 deletions(-)

diff --git a/gcc/config/arc/arc.cc b/gcc/config/arc/arc.cc
index 5e597d1bfeb..a1428eb41c3 100644
--- a/gcc/config/arc/arc.cc
+++ b/gcc/config/arc/arc.cc
@@ -1130,7 +1130,6 @@ arc_init (void)
   arc_punct_chars['*'] = 1;
   arc_punct_chars['?'] = 1;
   arc_punct_chars['!'] = 1;
-  arc_punct_chars['^'] = 1;
   arc_punct_chars['+'] = 1;
   arc_punct_chars['_'] = 1;
 }
@@ -4529,7 +4528,6 @@ static int output_sdata = 0;
 'V': cache bypass indicator for volatile
 'P'
 'F'
-'^'
 'O': Operator
 'o': original symbol - no @ prepending.  */
 
@@ -4953,14 +4951,7 @@ arc_print_operand (FILE *file, rtx x, int code)
 case 'F':
   fputs (reg_names[REGNO (x)]+1, file);
   return;
-case '^':
-   /* This punctuation character is needed because label references are
-   printed in the output template using %l. This is a front end
-   character, and when we want to emit a '@' before it, we have to use
-   this '^'.  */
 
-   fputc('@',file);
-   return;
 case 'O':
   /* Output an operator.  */
   switch (GET_CODE (x))
diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index 2a3ff05b66b..945cc4042d1 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -3934,9 +3934,9 @@ (define_insn "*branch_insn"
 {
   arc_ccfsm_record_condition (operands[1], false, insn, 0);
   if (get_attr_length (insn) == 2)
-return \"b%d1%? %^%l0\";
+return \"b%d1%?\\t%l0\";
   else
-return \"b%d1%# %^%l0\";
+return \"b%d1%#\\t%l0\";
 }
 }"
   [(set_attr "type" "branch")
@@ -3984,9 +3984,9 @@ (define_insn "*rev_branch_insn"
 {
   arc_ccfsm_record_condition (operands[1], true, insn, 0);
   if (get_attr_length (insn) == 2)
-return \"b%D1%? %^%l0\";
+return \"b%D1%?\\t%l0\";
   else
-return \"b%D1%# %^%l0\";
+return \"b%D1%#\\t%l0\";
 }
 }"
   [(set_attr "type" "branch")
@@ -4026,7 +4026,7 @@ (define_expand "jump"
 (define_insn "jump_i"
   [(set (pc) (label_ref (match_operand 0 "" "")))]
   "!TARGET_LONG_CALLS_SET || !CROSSING_JUMP_P (insn)"
-  "b%!%* %^%l0"
+  "b%!%*\\t%l0"
   [(set_attr "type" "uncond_branch")
(set (attr "iscompact")
(if_then_else (match_test "get_attr_length (insn) == 2")
@@ -4990,13 +4990,13 @@ (define_insn "cbranchsi4_scratch"
"*
  switch (get_attr_length (insn))
  {
-   case 2: return \"br%d0%? %1, %2, %^%l3\";
-   case 4: return \"br%d0%* %1, %B2, %^%l3\";
+   case 2: return \"br%d0%?\\t%1,%2,%l3\";
+   case 4: return \"br%d0%*\\t%1,%B2,%l3\";
case 8: if (!brcc_nolimm_operator (operands[0], VOIDmode))
-return \"br%d0%* %1, %B2, %^%l3\";
+return \"br%d0%*\\t%1,%B2,%l3\";
/* FALLTHRU */
case 6: case 10:
-   case 12:return \"cmp%? %1, %B2\\n\\tb%d0%* %^%l3 ;br%d0 out of range\";
+   case 12:return \"cmp%? %1, %B2\\n\\tb%d0%*\\t%l3 ;br%d0 out of range\";
default: fprintf (stderr, \"unexpected length %d\\n\", get_attr_length 
(insn)); fflush (stderr); gcc_unreachable ();
  }
"
diff --git a/gcc/testsuite/gcc.target/arc/loop-3.c 
b/gcc/testsuite/gcc.target/arc/loop-3.c
index 7f55e2f43fa..ae0d6110f18 100644
--- a/gcc/testsuite/gcc.target/arc/loop-3.c
+++ b/gcc/testsuite/gcc.target/arc/loop-3.c
@@ -23,5 +23,5 @@ void fn1(void)
   }
 }
 
-/* { dg-final { scan-assembler "bne.*@.L2" } } */
+/* { dg-final { scan-assembler "bne.*\\.L2" } } */
 /* { dg-final { scan-assembler-not "add.eq" } } */
-- 
2.30.2



[committed 5/5] arc: Update tests predicates when using linux toolchain.

2023-10-05 Thread Claudiu Zissulescu
gcc/testsuite:

* gcc.target/arc/enter-dw2-1.c: Remove tests when using linux
build.
* gcc.target/arc/tls-ld.c: Update test.
* gcc.target/arc/tls-le.c: Likewise.

Signed-off-by: Claudiu Zissulescu 
---
 gcc/testsuite/gcc.target/arc/enter-dw2-1.c | 18 +-
 gcc/testsuite/gcc.target/arc/tls-ld.c  |  3 +--
 gcc/testsuite/gcc.target/arc/tls-le.c  |  2 +-
 3 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arc/enter-dw2-1.c 
b/gcc/testsuite/gcc.target/arc/enter-dw2-1.c
index 25d03562198..653ea7231be 100644
--- a/gcc/testsuite/gcc.target/arc/enter-dw2-1.c
+++ b/gcc/testsuite/gcc.target/arc/enter-dw2-1.c
@@ -16,13 +16,13 @@ void foo (void)
 }
 
 
-/* { dg-final { scan-assembler-times "enter_s" 1 } } */
+/* { dg-final { scan-assembler-times "enter_s" 1 {xfail *-linux-* } } } */
 /* { dg-final { scan-assembler-times "\.cfi_def_cfa_offset 32" 1 } } */
-/* { dg-final { scan-assembler-times "\.cfi_offset 31, -32" 1 } } */
-/* { dg-final { scan-assembler-times "\.cfi_offset 13, -28" 1 } } */
-/* { dg-final { scan-assembler-times "\.cfi_offset 14, -24" 1 } } */
-/* { dg-final { scan-assembler-times "\.cfi_offset 15, -20" 1 } } */
-/* { dg-final { scan-assembler-times "\.cfi_offset 16, -16" 1 } } */
-/* { dg-final { scan-assembler-times "\.cfi_offset 17, -12" 1 } } */
-/* { dg-final { scan-assembler-times "\.cfi_offset 18, -8" 1 } } */
-/* { dg-final { scan-assembler-times "\.cfi_offset 19, -4" 1 } } */
+/* { dg-final { scan-assembler-times "\.cfi_offset 31, -32" 1 {xfail *-linux-* 
} } } */
+/* { dg-final { scan-assembler-times "\.cfi_offset 13, -28" 1 {xfail *-linux-* 
} } } */
+/* { dg-final { scan-assembler-times "\.cfi_offset 14, -24" 1 {xfail *-linux-* 
} } } */
+/* { dg-final { scan-assembler-times "\.cfi_offset 15, -20" 1 {xfail *-linux-* 
} } } */
+/* { dg-final { scan-assembler-times "\.cfi_offset 16, -16" 1 {xfail *-linux-* 
} } } */
+/* { dg-final { scan-assembler-times "\.cfi_offset 17, -12" 1 {xfail *-linux-* 
} } } */
+/* { dg-final { scan-assembler-times "\.cfi_offset 18, -8" 1 {xfail *-linux-* 
} } } */
+/* { dg-final { scan-assembler-times "\.cfi_offset 19, -4" 1 {xfail *-linux-* 
} } } */
diff --git a/gcc/testsuite/gcc.target/arc/tls-ld.c 
b/gcc/testsuite/gcc.target/arc/tls-ld.c
index 68ab9bf809c..47c71f5d273 100644
--- a/gcc/testsuite/gcc.target/arc/tls-ld.c
+++ b/gcc/testsuite/gcc.target/arc/tls-ld.c
@@ -13,6 +13,5 @@ int *ae2 (void)
   return &e2;
 }
 
-/* { dg-final { scan-assembler "add\\s+r0,pcl,@.tbss@tlsgd" } } */
+/* { dg-final { scan-assembler "add\\s+r0,pcl,@e2@tlsgd" } } */
 /* { dg-final { scan-assembler "bl\\s+@__tls_get_addr@plt" } } */
-/* { dg-final { scan-assembler "add_s\\s+r0,r0,@e2@dtpoff" } } */
diff --git a/gcc/testsuite/gcc.target/arc/tls-le.c 
b/gcc/testsuite/gcc.target/arc/tls-le.c
index ae3089b5070..6deca1a133d 100644
--- a/gcc/testsuite/gcc.target/arc/tls-le.c
+++ b/gcc/testsuite/gcc.target/arc/tls-le.c
@@ -13,4 +13,4 @@ int *ae2 (void)
   return &e2;
 }
 
-/* { dg-final { scan-assembler "add r0,r25,@e2@tpoff" } } */
+/* { dg-final { scan-assembler "add\\sr0,r25,@e2@tpoff" } } */
-- 
2.30.2



RE: [X86 PATCH] Implement doubleword shift left by 1 bit using add+adc.

2023-10-05 Thread Roger Sayle
Doh! ENOPATCH.

> -Original Message-
> From: Roger Sayle 
> Sent: 05 October 2023 12:44
> To: 'gcc-patches@gcc.gnu.org' 
> Cc: 'Uros Bizjak' 
> Subject: [X86 PATCH] Implement doubleword shift left by 1 bit using
add+adc.
> 
> 
> This patch tweaks the i386 back-end's ix86_split_ashl to implement
doubleword
> left shifts by 1 bit, using an add followed by an add-with-carry (i.e. a
doubleword
> x+x) instead of using the x86's shld instruction.
> The replacement sequence both requires fewer bytes and is faster on both
Intel
> and AMD architectures (from Agner Fog's latency tables and confirmed by my
> own microbenchmarking).
> 
> For the test case:
> __int128 foo(__int128 x) { return x << 1; }
> 
> with -O2 we previously generated:
> 
> foo:movq%rdi, %rax
> movq%rsi, %rdx
> shldq   $1, %rdi, %rdx
> addq%rdi, %rax
> ret
> 
> with this patch we now generate:
> 
> foo:movq%rdi, %rax
> movq%rsi, %rdx
> addq%rdi, %rax
> adcq%rsi, %rdx
> ret
> 
> 
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and
> make -k check, both with and without --target_board=unix{-m32} with no new
> failures.  Ok for mainline?
> 
> 
> 2023-10-05  Roger Sayle  
> 
> gcc/ChangeLog
> * config/i386/i386-expand.cc (ix86_split_ashl): Split shifts by
> one into add3_cc_overflow_1 followed by add3_carry.
> * config/i386/i386.md (@add3_cc_overflow_1): Renamed from
> "*add3_cc_overflow_1" to provide generator function.
> 
> gcc/testsuite/ChangeLog
> * gcc.target/i386/ashldi3-2.c: New 32-bit test case.
> * gcc.target/i386/ashlti3-3.c: New 64-bit test case.
> 
> 
> Thanks in advance,
> Roger
> --

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index e42ff27..09e41c8 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -6342,6 +6342,18 @@ ix86_split_ashl (rtx *operands, rtx scratch, 
machine_mode mode)
  if (count > half_width)
ix86_expand_ashl_const (high[0], count - half_width, mode);
}
+  else if (count == 1)
+   {
+ if (!rtx_equal_p (operands[0], operands[1]))
+   emit_move_insn (operands[0], operands[1]);
+ rtx x3 = gen_rtx_REG (CCCmode, FLAGS_REG);
+ rtx x4 = gen_rtx_LTU (mode, x3, const0_rtx);
+ half_mode = mode == DImode ? SImode : DImode;
+ emit_insn (gen_add3_cc_overflow_1 (half_mode, low[0],
+low[0], low[0]));
+ emit_insn (gen_add3_carry (half_mode, high[0], high[0], high[0],
+x3, x4));
+   }
   else
{
  gen_shld = mode == DImode ? gen_x86_shld : gen_x86_64_shld;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index eef8a0e..6a5bc16 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -8864,7 +8864,7 @@
   [(set_attr "type" "alu")
 (set_attr "mode" "<MODE>")])
 
-(define_insn "*add<mode>3_cc_overflow_1"
+(define_insn "@add<mode>3_cc_overflow_1"
   [(set (reg:CCC FLAGS_REG)
(compare:CCC
(plus:SWI
diff --git a/gcc/testsuite/gcc.target/i386/ashldi3-2.c 
b/gcc/testsuite/gcc.target/i386/ashldi3-2.c
new file mode 100644
index 000..053389d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/ashldi3-2.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O2 -mno-stv" } */
+
+long long foo(long long x)
+{
+  return x << 1;
+}
+
+/* { dg-final { scan-assembler "adcl" } } */
+/* { dg-final { scan-assembler-not "shldl" } } */
diff --git a/gcc/testsuite/gcc.target/i386/ashlti3-3.c 
b/gcc/testsuite/gcc.target/i386/ashlti3-3.c
new file mode 100644
index 000..4f14ca0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/ashlti3-3.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2" } */
+
+__int128 foo(__int128 x)
+{
+  return x << 1;
+}
+
+/* { dg-final { scan-assembler "adcq" } } */
+/* { dg-final { scan-assembler-not "shldq" } } */


[committed 4/5] arc: Remove obsolete ccfsm instruction predication mechanism

2023-10-05 Thread Claudiu Zissulescu
Remove the old ccfsm machinery responsible for conditional execution
support in ARC.  It is no longer needed, as the current GCC conditional
execution support is mature.

gcc/

* config/arc/arc-passes.def: Remove arc_ifcvt pass.
* config/arc/arc-protos.h (arc_ccfsm_branch_deleted_p): Remove.
(arc_ccfsm_record_branch_deleted): Likewise.
(arc_ccfsm_cond_exec_p): Likewise.
(arc_ccfsm): Likewise.
(arc_ccfsm_record_condition): Likewise.
(make_pass_arc_ifcvt): Likewise.
* config/arc/arc.cc (arc_ccfsm): Remove.
(arc_ccfsm_current): Likewise.
(ARC_CCFSM_BRANCH_DELETED_P): Likewise.
(ARC_CCFSM_RECORD_BRANCH_DELETED): Likewise.
(ARC_CCFSM_COND_EXEC_P): Likewise.
(CCFSM_ISCOMPACT): Likewise.
(CCFSM_DBR_ISCOMPACT): Likewise.
(machine_function): Remove ccfsm related fields.
(arc_ifcvt): Remove pass.
(arc_print_operand): Remove `#` punct operand and other ccfsm
related code.
(arc_ccfsm_advance): Remove.
(arc_ccfsm_at_label): Likewise.
(arc_ccfsm_record_condition): Likewise.
(arc_ccfsm_post_advance): Likewise.
(arc_ccfsm_branch_deleted_p): Likewise.
(arc_ccfsm_record_branch_deleted): Likewise.
(arc_ccfsm_cond_exec_p): Likewise.
(arc_get_ccfsm_cond): Likewise.
(arc_final_prescan_insn): Remove ccfsm references.
(arc_internal_label): Likewise.
(arc_reorg): Likewise.
(arc_output_libcall): Likewise.
* config/arc/arc.md: Remove ccfsm references and update related
instruction patterns.

Signed-off-by: Claudiu Zissulescu 
---
 gcc/config/arc/arc-passes.def |   6 -
 gcc/config/arc/arc-protos.h   |   7 -
 gcc/config/arc/arc.cc | 830 +-
 gcc/config/arc/arc.md | 118 +
 4 files changed, 41 insertions(+), 920 deletions(-)

diff --git a/gcc/config/arc/arc-passes.def b/gcc/config/arc/arc-passes.def
index 0cb5d56a6d4..3f9222a8099 100644
--- a/gcc/config/arc/arc-passes.def
+++ b/gcc/config/arc/arc-passes.def
@@ -17,12 +17,6 @@
along with GCC; see the file COPYING3.  If not see
.  */
 
-/* First target dependent ARC if-conversion pass.  */
-INSERT_PASS_AFTER (pass_delay_slots, 1, pass_arc_ifcvt);
-
-/* Second target dependent ARC if-conversion pass.  */
-INSERT_PASS_BEFORE (pass_shorten_branches, 1, pass_arc_ifcvt);
-
 /* Find annulled delay insns and convert them to use the appropriate
predicate.  This allows branch shortening to size up these
instructions properly.  */
diff --git a/gcc/config/arc/arc-protos.h b/gcc/config/arc/arc-protos.h
index 0e89ac7ae33..026ea99c9c6 100644
--- a/gcc/config/arc/arc-protos.h
+++ b/gcc/config/arc/arc-protos.h
@@ -52,8 +52,6 @@ extern bool arc_can_use_return_insn (void);
 extern bool arc_split_move_p (rtx *);
 #endif /* RTX_CODE */
 
-extern bool arc_ccfsm_branch_deleted_p (void);
-extern void arc_ccfsm_record_branch_deleted (void);
 
 void arc_asm_output_aligned_decl_local (FILE *, tree, const char *,
unsigned HOST_WIDE_INT,
@@ -67,7 +65,6 @@ extern bool arc_raw_symbolic_reference_mentioned_p (rtx, 
bool);
 extern bool arc_is_longcall_p (rtx);
 extern bool arc_is_shortcall_p (rtx);
 extern bool valid_brcc_with_delay_p (rtx *);
-extern bool arc_ccfsm_cond_exec_p (void);
 extern rtx disi_highpart (rtx);
 extern int arc_adjust_insn_length (rtx_insn *, int, bool);
 extern int arc_corereg_hazard (rtx, rtx);
@@ -76,9 +73,6 @@ extern int arc_write_ext_corereg (rtx);
 extern rtx gen_acc1 (void);
 extern rtx gen_acc2 (void);
 extern bool arc_branch_size_unknown_p (void);
-struct arc_ccfsm;
-extern void arc_ccfsm_record_condition (rtx, bool, rtx_insn *,
-   struct arc_ccfsm *);
 extern void arc_expand_prologue (void);
 extern void arc_expand_epilogue (int);
 extern void arc_init_expanders (void);
@@ -104,5 +98,4 @@ extern bool arc_is_jli_call_p (rtx);
 extern void arc_file_end (void);
 extern bool arc_is_secure_call_p (rtx);
 
-rtl_opt_pass * make_pass_arc_ifcvt (gcc::context *ctxt);
 rtl_opt_pass * make_pass_arc_predicate_delay_insns (gcc::context *ctxt);
diff --git a/gcc/config/arc/arc.cc b/gcc/config/arc/arc.cc
index a1428eb41c3..ecc681cff61 100644
--- a/gcc/config/arc/arc.cc
+++ b/gcc/config/arc/arc.cc
@@ -101,16 +101,6 @@ HARD_REG_SET overrideregs;
 /* Array of valid operand punctuation characters.  */
 char arc_punct_chars[256];
 
-/* State used by arc_ccfsm_advance to implement conditional execution.  */
-struct GTY (()) arc_ccfsm
-{
-  int state;
-  int cc;
-  rtx cond;
-  rtx_insn *target_insn;
-  int target_label;
-};
-
 /* Status of the IRQ_CTRL_AUX register.  */
 typedef struct irq_ctrl_saved_t
 {
@@ -143,36 +133,6 @@ static irq_ctrl_saved_t irq_ctrl_saved;
 /* Number of registers in second bank for FIRQ support.  */
 static int rgf_banked_register_count;
 
-#define arc_ccfsm_current cfun->

RE: [PATCH v1] Mode-Switching: Add optional EMIT_AFTER hook

2023-10-05 Thread Li, Pan2
Thanks Jeff and Robin for the comments, and sorry for the late reply.

> Conceptually the rounding mode is just a property.  The call, in effect, 
> should demand a "normal" rounding mode and set the rounding mode to 
> unknown if I understand how this is supposed to work.  If my 
> understanding is wrong, then maybe that's where we should start -- with 
> a good description of the problem ;-)

I think we are on the same page about how it works; I may need to take a look at
how x86 takes care of this.

> That's probably dead code at this point.  IIRC rth did further work in 
> this space because inserting in the end of the block with the abnormal 
> edge isn't semantically correct.

> It's been 20+ years, but IIRC he adjusted the PRE bitmaps so that we 
> never would need to do an insertion on an abnormal edge.  Search for 
> EDGE_ABNORMAL in gcse.cc.

That is quite old at this point; I will give the EDGE_ABNORMAL case a try.

> Having said that, it looks like Pan's patch just tries to move some of
> the dirty work from the backend to the mode-switching pass by making it
> easier to do something after a call.  I believe I asked for that back in
> one of the reviews even?

Yes, that is what I would like to do in this patch, following up on some of
Robin's comments from a previous review.

Pan

-Original Message-
From: Robin Dapp  
Sent: Monday, October 2, 2023 4:26 PM
To: Jeff Law ; Li, Pan2 ; 
gcc-patches@gcc.gnu.org
Cc: rdapp@gmail.com; juzhe.zh...@rivai.ai; Wang, Yanzhang 
; kito.ch...@gmail.com
Subject: Re: [PATCH v1] Mode-Switching: Add optional EMIT_AFTER hook

> Conceptually the rounding mode is just a property.  The call, in
> effect, should demand a "normal" rounding mode and set the rounding
> mode to unknown if I understand how this is supposed to work.  If my
> understanding is wrong, then maybe that's where we should start --
> with a good description of the problem ;-)

That's also what I struggled with last time this was discussed.

Normally, mode switching is used to switch to a requested mode for
an insn or a call and potentially switch back afterwards.

For those riscv intrinsics that specify a variable, non-default rounding
mode we have two options:

- Save and restore before and after each mode-changing intrinsic
 fegetround old_rounding
 fesetround new_rounding 
 actual instruction
 fesetround old_rounding

- Have mode switching do it for us (lazily) to avoid most of the
storing of the old rounding mode by storing an (e.g.) function-level
rounding-mode backup value.  The backup value is used to lazily
restore the currently valid rounding mode.

The problem with this now is that whenever fesetround gets called
our backup is outdated.  Therefore we need to update our backup after
each function call (as fesetround can of course be present anywhere)
and this is where most of the complications come from.

So in that case the callee _does_ impact the caller via the backup
clobbering.  That was one of my complaints about the whole procedure
last time.  Besides, I didn't see the need for those intrinsics
anyway and would much rather have explicit fesetround calls but well :)
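
For reference, the explicit pattern from the first option above looks
roughly like the following plain-C sketch (illustrative only; with GCC one
would additionally need something like -frounding-math for the compiler to
honour the dynamic rounding mode):

  #include <fenv.h>
  #include <stdio.h>

  static double round_up_add (double a, double b)
  {
    int old_rounding = fegetround ();  /* save the current rounding mode */
    fesetround (FE_UPWARD);            /* switch to the requested mode */
    double r = a + b;                  /* the operation that needs it */
    fesetround (old_rounding);         /* restore the previous mode */
    return r;
  }

  int main (void)
  {
    /* 1.0 + 1e-18 is not exactly representable, so rounding upward yields
       the next double above 1.0 instead of 1.0 itself.  */
    printf ("%.17g\n", round_up_add (1.0, 1e-18));
    return 0;
  }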

Having said that, it looks like Pan's patch just tries to move some of
the dirty work from the backend to the mode-switching pass by making it
easier to do something after a call.  I believe I asked for that back in
one of the reviews even?

Regards
 Robin


Re: [PATCH v2] ipa-utils: avoid uninitialized probabilities on ICF [PR111559]

2023-10-05 Thread Jan Hubicka
> From: Sergei Trofimovich 
> 
> r14-3459-g0c78240fd7d519 "Check that passes do not forget to define profile"
> exposed check failures in cases when gcc produces uninitialized profile
> probabilities. In case of PR/111559 uninitialized profile is generated
> by edges executed 0 times reported by IPA profile:
> 
> $ gcc -O2 -fprofile-generate pr111559.c -o b -fopt-info
> $ ./b
> $ gcc -O2 -fprofile-use -fprofile-correction pr111559.c -o b -fopt-info
> 
> pr111559.c: In function 'rule1':
> pr111559.c:6:13: error: probability of edge 3->4 not initialized
> 6 | static void rule1(void) { if (p) edge(); }
>   | ^
> during GIMPLE pass: fixup_cfg
> pr111559.c:6:13: internal compiler error: verify_flow_info failed
> 
> The change conservatively ignores updates with uninitialized values and
> uses initially assigned probabilities (`always` probability in case of
> the example).
> 
>   PR ipa/111283
>   PR gcov-profile/111559
> 
> gcc/
>   * ipa-utils.cc (ipa_merge_profiles): Avoid producing
>   uninitialized probabilities when merging counters with zero
>   denominators.
> 
> gcc/testsuite/
>   * gcc.dg/tree-prof/pr111559.c: New test.
> ---
>  gcc/ipa-utils.cc  |  6 +-
>  gcc/testsuite/gcc.dg/tree-prof/pr111559.c | 16 
>  2 files changed, 21 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-prof/pr111559.c
> 
> diff --git a/gcc/ipa-utils.cc b/gcc/ipa-utils.cc
> index 956c6294fd7..7c53ae9dd45 100644
> --- a/gcc/ipa-utils.cc
> +++ b/gcc/ipa-utils.cc
> @@ -651,13 +651,17 @@ ipa_merge_profiles (struct cgraph_node *dst,
>   {
> edge srce = EDGE_SUCC (srcbb, i);
> edge dste = EDGE_SUCC (dstbb, i);
> -   dste->probability = 
> +   profile_probability merged =
>   dste->probability * dstbb->count.ipa ().probability_in
>(dstbb->count.ipa ()
> + srccount.ipa ())
>   + srce->probability * srcbb->count.ipa ().probability_in
>(dstbb->count.ipa ()
> + srccount.ipa ());
> +   /* We produce uninitialized probabilities when
> +  denominator is zero: https://gcc.gnu.org/PR111559.  */
> +   if (merged.initialized_p ())
> + dste->probability = merged;

Thanks for the patch.
We usually avoid the uninitialized value here by simply checking that the
parameter of probability_in satisfies nonzero_p.  So I think it would be
more consistent to do the same here:

  profile_probability sum = dstbb->count.ipa () + srccount.ipa ()
  if (sum.nonzero_p ())
  {
 dste->probability = .
  }

OK with this change.
Honza
>   }
> dstbb->count = dstbb->count.ipa () + srccount.ipa ();
>   }
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/pr111559.c 
> b/gcc/testsuite/gcc.dg/tree-prof/pr111559.c
> new file mode 100644
> index 000..43202c6c888
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-prof/pr111559.c
> @@ -0,0 +1,16 @@
> +/* { dg-options "-O2" } */
> +
> +__attribute__((noipa)) static void edge(void) {}
> +
> +int p = 0;
> +
> +__attribute__((noinline))
> +static void rule1(void) { if (p) edge(); }
> +
> +__attribute__((noinline))
> +static void rule1_same(void) { if (p) edge(); }
> +
> +__attribute__((noipa)) int main(void) {
> +rule1();
> +rule1_same();
> +}
> -- 
> 2.42.0
> 
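
For completeness, a small toy model of the guard Honza suggests above (not
GCC code: the profile_count/profile_probability arithmetic is only mimicked
with doubles, so the shape of the check is what matters here):

  #include <cassert>

  // Model of the merge: the two edge probabilities are weighted by the two
  // block counts; when both counts are zero the weights are undefined, so
  // the guard keeps the previously assigned probability instead.
  static double merge_prob (double dst_prob, double src_prob,
                            double dst_count, double src_count)
  {
    double sum = dst_count + src_count;
    if (!(sum > 0))              // the nonzero_p-style guard
      return dst_prob;           // keep the existing (initialized) value
    return dst_prob * (dst_count / sum) + src_prob * (src_count / sum);
  }

  int main ()
  {
    assert (merge_prob (0.25, 0.75, 3.0, 1.0) == 0.25 * 0.75 + 0.75 * 0.25);
    assert (merge_prob (0.25, 0.75, 0.0, 0.0) == 0.25);   // guard path taken
    return 0;
  }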


[pushed] wwwdocs: conduct: Use instead of

2023-10-05 Thread Gerald Pfeifer
On the way, break overly long lines.

Pushed.

Gerald
---
 htdocs/conduct-faq.html  | 3 ++-
 htdocs/conduct-report.html   | 3 ++-
 htdocs/conduct-response.html | 3 ++-
 htdocs/conduct.html  | 3 ++-
 4 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/htdocs/conduct-faq.html b/htdocs/conduct-faq.html
index 380e9166..5b7a82a3 100644
--- a/htdocs/conduct-faq.html
+++ b/htdocs/conduct-faq.html
@@ -63,4 +63,5 @@ your question either,
 email mailto:cond...@gcc.gnu.org";>cond...@gcc.gnu.org with any
 additional questions or feedback.
 
-http://creativecommons.org/licenses/by-sa/4.0/";>https://i.creativecommons.org/l/by-sa/4.0/88x31.png"; />This work 
is licensed under a http://creativecommons.org/licenses/by-sa/4.0/";>Creative Commons 
Attribution-ShareAlike 4.0 International License.
+http://creativecommons.org/licenses/by-sa/4.0/";>https://i.creativecommons.org/l/by-sa/4.0/88x31.png"; />
+This work is licensed under a http://creativecommons.org/licenses/by-sa/4.0/";>Creative Commons 
Attribution-ShareAlike 4.0 International License.
diff --git a/htdocs/conduct-report.html b/htdocs/conduct-report.html
index 87758745..5f3fae90 100644
--- a/htdocs/conduct-report.html
+++ b/htdocs/conduct-report.html
@@ -113,7 +113,8 @@ directly to another member, or to a member of the Steering 
Committee.
 of the committee's decision. To make such a request, contact a member of the
 Steering Committee with your request and motivation.
 
-http://creativecommons.org/licenses/by-sa/4.0/";>https://i.creativecommons.org/l/by-sa/4.0/88x31.png"; />This work 
is licensed under a http://creativecommons.org/licenses/by-sa/4.0/";>Creative Commons 
Attribution-ShareAlike 4.0 International License.
+http://creativecommons.org/licenses/by-sa/4.0/";>https://i.creativecommons.org/l/by-sa/4.0/88x31.png"; />
+This work is licensed under a http://creativecommons.org/licenses/by-sa/4.0/";>Creative Commons 
Attribution-ShareAlike 4.0 International License.
 
 Text derived from
 the https://www.djangoproject.com/conduct/reporting/";>Django project
diff --git a/htdocs/conduct-response.html b/htdocs/conduct-response.html
index c67e8b0b..a25f6ae4 100644
--- a/htdocs/conduct-response.html
+++ b/htdocs/conduct-response.html
@@ -132,7 +132,8 @@ excluded from the response process. For these cases, anyone 
can make a report
 directly to any of the committee members, as documented in the reporting
 guidelines.
 
-http://creativecommons.org/licenses/by-sa/4.0/";>https://i.creativecommons.org/l/by-sa/4.0/88x31.png"; />This work 
is licensed under a http://creativecommons.org/licenses/by-sa/4.0/";>Creative Commons 
Attribution-ShareAlike 4.0 International License.
+http://creativecommons.org/licenses/by-sa/4.0/";>https://i.creativecommons.org/l/by-sa/4.0/88x31.png"; />
+This work is licensed under a http://creativecommons.org/licenses/by-sa/4.0/";>Creative Commons 
Attribution-ShareAlike 4.0 International License.
 
 Text derived from
 the https://www.djangoproject.com/conduct/enforcement-manual/";>Django
diff --git a/htdocs/conduct.html b/htdocs/conduct.html
index 736e2f6d..87bd01bf 100644
--- a/htdocs/conduct.html
+++ b/htdocs/conduct.html
@@ -114,7 +114,8 @@ email mailto:cond...@gcc.gnu.org";>cond...@gcc.gnu.org.
 that doesn't answer your questions, feel free
 to mailto:cond...@gcc.gnu.org";>contact us.
 
-http://creativecommons.org/licenses/by-sa/4.0/";>https://i.creativecommons.org/l/by-sa/4.0/88x31.png"; />This work 
is licensed under a http://creativecommons.org/licenses/by-sa/4.0/";>Creative Commons 
Attribution-ShareAlike 4.0 International License.
+http://creativecommons.org/licenses/by-sa/4.0/";>https://i.creativecommons.org/l/by-sa/4.0/88x31.png"; />
+This work is licensed under a http://creativecommons.org/licenses/by-sa/4.0/";>Creative Commons 
Attribution-ShareAlike 4.0 International License.
 
 Text derived from the https://www.djangoproject.com/conduct/";>Django
 project Code of Conduct, used under
-- 
2.42.0


[PATCH] Revert "ipa: Self-DCE of uses of removed call LHSs (PR 108007)"

2023-10-05 Thread Martin Jambor
Hello,

I am going to commit the following patch to fix PR 111688 (bootstrap on
ppc64le broken) and will re-fix 108007 (issues with IPA-SRA when user
explicitly turns off DCE) when I figure out what's going wrong.

Sorry for the breakage,

Martin



[PATCH] Revert "ipa: Self-DCE of uses of removed call LHSs (PR 108007)"

This reverts commit 1be18ea110a2d69570dbc494588a7c73173883be.

As reported in PR bootstrap/111688, it broke ppc64le bootstrap because
of a debug-compare failure.
---
 gcc/cgraph.cc   | 10 +---
 gcc/cgraph.h|  9 +--
 gcc/ipa-param-manipulation.cc   | 88 -
 gcc/ipa-param-manipulation.h|  3 +-
 gcc/testsuite/gcc.dg/ipa/pr108007.c | 32 ---
 gcc/tree-inline.cc  | 28 -
 6 files changed, 38 insertions(+), 132 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.dg/ipa/pr108007.c

diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
index b82367ac342..e41e5ad3ae7 100644
--- a/gcc/cgraph.cc
+++ b/gcc/cgraph.cc
@@ -1403,17 +1403,11 @@ cgraph_edge::redirect_callee (cgraph_node *n)
speculative indirect call, remove "speculative" of the indirect call and
also redirect stmt to it's final direct target.
 
-   When called from within tree-inline, KILLED_SSAs has to contain the pointer
-   to killed_new_ssa_names within the copy_body_data structure and SSAs
-   discovered to be useless (if LHS is removed) will be added to it, otherwise
-   it needs to be NULL.
-
It is up to caller to iteratively transform each "speculative"
direct call as appropriate.  */
 
 gimple *
-cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e,
-  hash_set  *killed_ssas)
+cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e)
 {
   tree decl = gimple_call_fndecl (e->call_stmt);
   gcall *new_stmt;
@@ -1533,7 +1527,7 @@ cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e,
remove_stmt_from_eh_lp (e->call_stmt);
 
   tree old_fntype = gimple_call_fntype (e->call_stmt);
-  new_stmt = padjs->modify_call (e, false, killed_ssas);
+  new_stmt = padjs->modify_call (e, false);
   cgraph_node *origin = e->callee;
   while (origin->clone_of)
origin = origin->clone_of;
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index d7162efeeb4..cedaaac3a45 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1833,16 +1833,9 @@ public:
  speculative indirect call, remove "speculative" of the indirect call and
  also redirect stmt to it's final direct target.
 
- When called from within tree-inline, KILLED_SSAs has to contain the
- pointer to killed_new_ssa_names within the copy_body_data structure and
- SSAs discovered to be useless (if LHS is removed) will be added to it,
- otherwise it needs to be NULL.
-
  It is up to caller to iteratively transform each "speculative"
  direct call as appropriate.  */
-  static gimple *redirect_call_stmt_to_callee (cgraph_edge *e,
-  hash_set <tree>
-  *killed_ssas = nullptr);
+  static gimple *redirect_call_stmt_to_callee (cgraph_edge *e);
 
   /* Create clone of edge in the node N represented
  by CALL_EXPR the callgraph.  */
diff --git a/gcc/ipa-param-manipulation.cc b/gcc/ipa-param-manipulation.cc
index 014939bf754..ae52f17b2c9 100644
--- a/gcc/ipa-param-manipulation.cc
+++ b/gcc/ipa-param-manipulation.cc
@@ -593,66 +593,14 @@ isra_get_ref_base_and_offset (tree expr, tree *base_p, 
unsigned *unit_offset_p)
   return true;
 }
 
-/* Remove all statements that use NAME and transitively those that use the
-   result of such statements.  KILLED_SSAS contains the SSA_NAMEs that are
-   already being or have been processed and new ones need to be added to it.
-   The funtction only has to process situations handled by
-   ssa_name_only_returned_p in ipa-sra.cc with the exception that it can assume
-   it must never reach a use in a return statement.  */
-
-static void
-purge_transitive_uses (tree name, hash_set <tree> *killed_ssas)
-{
-  imm_use_iterator imm_iter;
-  gimple *stmt;
-  auto_vec <tree> worklist;
-
-  worklist.safe_push (name);
-  while (!worklist.is_empty ())
-{
-  tree cur_name = worklist.pop ();
-  FOR_EACH_IMM_USE_STMT (stmt, imm_iter, cur_name)
-   {
- if (gimple_debug_bind_p (stmt))
-   {
- /* When runing within tree-inline, we will never end up here but
-adding the SSAs to killed_ssas will do the trick in this case
-and the respective debug statements will get reset. */
- gimple_debug_bind_reset_value (stmt);
- update_stmt (stmt);
- continue;
-   }
-
- tree lhs = NULL_TREE;
- if (is_gimple_assign (stmt))
-   lhs = gimple_assign_lhs (stmt);
- else if (gimple_code (stmt) == GIMPLE_PHI)
-   lhs = gimple_phi_result (stmt);
- gcc_assert (l

RE: [PATCH] RISC-V: Remove @ of vec_series

2023-10-05 Thread Li, Pan2
Committed, thanks Jeff and Robin.

Pan

-Original Message-
From: Jeff Law  
Sent: Wednesday, October 4, 2023 11:40 PM
To: Robin Dapp ; Juzhe-Zhong ; 
gcc-patches@gcc.gnu.org
Cc: kito.ch...@gmail.com; kito.ch...@sifive.com
Subject: Re: [PATCH] RISC-V: Remove @ of vec_series



On 10/4/23 09:06, Robin Dapp wrote:
> I'm currently in the process of removing some unused @s.
> This is OK.
Agreed.  And if you or Juzhe have other @ cases that are unused, such 
changes should be considered pre-approved.

Jeff


Re: [PATCH 2/6] aarch64: Add support for aarch64-sys-regs.def

2023-10-05 Thread Richard Earnshaw




On 03/10/2023 16:18, Victor Do Nascimento wrote:

This patch defines the structure of a new .def file used for
representing the aarch64 system registers, what information it should
hold and the basic framework in GCC to process this file.

Entries in the aarch64-system-regs.def file should be as follows:

   SYSREG (NAME, CPENC (sn,op1,cn,cm,op2), FLAG1 | ... | FLAGn, ARCH)

Where the arguments to SYSREG correspond to:
   - NAME:  The system register name, as used in the assembly language.
   - CPENC: The system register encoding, mapping to:

   s<sn>_<op1>_c<cn>_c<cm>_<op2>

   - FLAG: The entries in the FLAGS field are bitwise-OR'd together to
  encode extra information required to ensure proper use of
  the system register.  For example, a read-only system
  register will have the flag F_REG_READ, while write-only
  registers will be labeled F_REG_WRITE.  Such flags are
  tested against at compile-time.
   - ARCH: The architectural features the system register is associated
  with.  This is encoded via one of three possible macros:
  1. When a system register is universally implemented, we say
  it has no feature requirements, so we tag it with the
  AARCH64_NO_FEATURES macro.
  2. When a register is only implemented for a single
  architectural extension EXT, the AARCH64_FEATURE (EXT), is
  used.
  3. When a given system register is made available by any of N
  possible architectural extensions, the AARCH64_FEATURES(N, ...)
  macro is used to combine them accordingly.

In order to enable proper interpretation of the SYSREG entries by the
compiler, flags defining system register behavior such as `F_REG_READ'
and `F_REG_WRITE' are also defined here, so they can later be used for
the validation of system register properties.

Finally, any architectural feature flags from Binutils missing from GCC
have appropriate aliases defined here so as to ensure
cross-compatibility of SYSREG entries across the toolchain.

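For illustration only, an entry following this scheme might look as
below; the register name and encoding here are hypothetical and not
taken from the actual .def file:

   SYSREG ("dummyreg_el1", CPENC (3,0,11,0,2), F_REG_READ, AARCH64_NO_FEATURES)
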
gcc/ChangeLog:

* gcc/config/aarch64/aarch64.cc (sysreg_names): New.
(sysreg_names_generic): Likewise.
(sysreg_reqs): Likewise.
(sysreg_properties): Likewise.
(nsysreg): Likewise.
* gcc/config/aarch64/aarch64.h (AARCH64_ISA_V8A): Add missing
ISA flag.
(AARCH64_ISA_V8_1A): Likewise.
(AARCH64_ISA_V8_7A): Likewise.
(AARCH64_ISA_V8_8A): Likewise.
(AARCH64_NO_FEATURES): Likewise.
(AARCH64_FL_RAS): New ISA flag alias.
(AARCH64_FL_LOR): Likewise.
(AARCH64_FL_PAN): Likewise.
(AARCH64_FL_AMU): Likewise.
(AARCH64_FL_SCXTNUM): Likewise.
(AARCH64_FL_ID_PFR2): Likewise.
(F_DEPRECATED): New.
(F_REG_READ): Likewise.
(F_REG_WRITE): Likewise.
(F_ARCHEXT): Likewise.
(F_REG_ALIAS): Likewise.
---
  gcc/config/aarch64/aarch64.cc | 55 +++
  gcc/config/aarch64/aarch64.h  | 36 +++
  2 files changed, 91 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 9fbfc548a89..030b39ded1a 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -89,6 +89,8 @@
  /* This file should be included last.  */
  #include "target-def.h"
  
+#include "aarch64.h"


This shouldn't be needed.  target.h (included further up) includes tm.h 
which includes this file.


Otherwise OK.

Reviewed-by: rearn...@arm.com


+
  /* Defined for convenience.  */
  #define POINTER_BYTES (POINTER_SIZE / BITS_PER_UNIT)
  
@@ -2807,6 +2809,59 @@ static const struct processor all_cores[] =

{NULL, aarch64_none, aarch64_none, aarch64_no_arch, 0, NULL}
  };
  
+/* Database of system register names.  */

+const char *sysreg_names[] =
+{
+#define SYSREG(NAME, ENC, FLAGS, ARCH) NAME,
+#include "aarch64-sys-regs.def"
+#undef SYSREG
+};
+
+const char *sysreg_names_generic[] =
+{
+#define CPENC(SN, OP1, CN, CM, OP2) "s"#SN"_"#OP1"_c"#CN"_c"#CM"_"#OP2
+#define SYSREG(NAME, ENC, FLAGS, ARCH) ENC,
+#include "aarch64-sys-regs.def"
+#undef SYSREG
+};
+
+/* An aarch64_feature_set initializer for a single feature,
+   AARCH64_FEATURE_.  */
+#define AARCH64_FEATURE(FEAT) AARCH64_FL_##FEAT
+
+/* Used by AARCH64_FEATURES.  */
+#define AARCH64_OR_FEATURES_1(X, F1) \
+  AARCH64_FEATURE (F1)
+#define AARCH64_OR_FEATURES_2(X, F1, F2) \
+  (AARCH64_FEATURE (F1) | AARCH64_OR_FEATURES_1 (X, F2))
+#define AARCH64_OR_FEATURES_3(X, F1, ...) \
+  (AARCH64_FEATURE (F1) | AARCH64_OR_FEATURES_2 (X, __VA_ARGS__))
+
+/* An aarch64_feature_set initializer for the N features listed in "...".  */
+#define AARCH64_FEATURES(N, ...) \
+  AARCH64_OR_FEATURES_##N (0, __VA_ARGS__)
+
+/* Database of system register architectural requirements.  */
+const unsigned long long sysreg_reqs[] =
+{
+#define SYSREG(NAME, ENC, FLAGS, ARCH) ARCH,
+#include "aarch64-sys-regs.def"
+#undef SYSREG
+};
+
+/* Database 

Re: [X86 PATCH] Implement doubleword shift left by 1 bit using add+adc.

2023-10-05 Thread Uros Bizjak
On Thu, Oct 5, 2023 at 1:45 PM Roger Sayle  wrote:
>
> Doh! ENOPATCH.
>
> > -Original Message-
> > From: Roger Sayle 
> > Sent: 05 October 2023 12:44
> > To: 'gcc-patches@gcc.gnu.org' 
> > Cc: 'Uros Bizjak' 
> > Subject: [X86 PATCH] Implement doubleword shift left by 1 bit using
> add+adc.
> >
> >
> > This patch tweaks the i386 back-end's ix86_split_ashl to implement
> doubleword
> > left shifts by 1 bit, using an add followed by an add-with-carry (i.e. a
> doubleword
> > x+x) instead of using the x86's shld instruction.
> > The replacement sequence both requires fewer bytes and is faster on both
> Intel
> > and AMD architectures (from Agner Fog's latency tables and confirmed by my
> > own microbenchmarking).
> >
> > For the test case:
> > __int128 foo(__int128 x) { return x << 1; }
> >
> > with -O2 we previously generated:
> >
> > foo:    movq    %rdi, %rax
> >         movq    %rsi, %rdx
> >         shldq   $1, %rdi, %rdx
> >         addq    %rdi, %rax
> >         ret
> >
> > with this patch we now generate:
> >
> > foo:    movq    %rdi, %rax
> >         movq    %rsi, %rdx
> >         addq    %rdi, %rax
> >         adcq    %rsi, %rdx
> >         ret
> >
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and
> > make -k check, both with and without --target_board=unix{-m32} with no new
> > failures.  Ok for mainline?
> >
> >
> > 2023-10-05  Roger Sayle  
> >
> > gcc/ChangeLog
> > * config/i386/i386-expand.cc (ix86_split_ashl): Split shifts by
> > one into add3_cc_overflow_1 followed by add3_carry.
> > * config/i386/i386.md (@add3_cc_overflow_1): Renamed from
> > "*add3_cc_overflow_1" to provide generator function.
> >
> > gcc/testsuite/ChangeLog
> > * gcc.target/i386/ashldi3-2.c: New 32-bit test case.
> > * gcc.target/i386/ashlti3-3.c: New 64-bit test case.

OK.

Thanks,
Uros.

> >
> >
> > Thanks in advance,
> > Roger
> > --
>


RE: [X86 PATCH] Split lea into shorter left shift by 2 or 3 bits with -Oz.

2023-10-05 Thread Roger Sayle


Hi Uros,
Very many thanks for the speedy reviews.

Uros Bizjak wrote:
> On Thu, Oct 5, 2023 at 11:06 AM Roger Sayle 
> wrote:
> >
> >
> > This patch avoids long lea instructions for performing x<<2 and x<<3
> > by splitting them into shorter sal and move (or xchg instructions).
> > Because this increases the number of instructions, but reduces the
> > total size, its suitable for -Oz (but not -Os).
> >
> > The impact can be seen in the new test case:
> >
> > int foo(int x) { return x<<2; }
> > int bar(int x) { return x<<3; }
> > long long fool(long long x) { return x<<2; } long long barl(long long
> > x) { return x<<3; }
> >
> > where with -O2 we generate:
> >
> > foo:    lea     0x0(,%rdi,4),%eax   // 7 bytes
> >         retq
> > bar:    lea     0x0(,%rdi,8),%eax   // 7 bytes
> >         retq
> > fool:   lea     0x0(,%rdi,4),%rax   // 8 bytes
> >         retq
> > barl:   lea     0x0(,%rdi,8),%rax   // 8 bytes
> >         retq
> >
> > and with -Oz we now generate:
> >
> > foo:    xchg    %eax,%edi           // 1 byte
> >         shl     $0x2,%eax           // 3 bytes
> >         retq
> > bar:    xchg    %eax,%edi           // 1 byte
> >         shl     $0x3,%eax           // 3 bytes
> >         retq
> > fool:   xchg    %rax,%rdi           // 2 bytes
> >         shl     $0x2,%rax           // 4 bytes
> >         retq
> > barl:   xchg    %rax,%rdi           // 2 bytes
> >         shl     $0x3,%rax           // 4 bytes
> >         retq
> >
> > Over the entirety of the CSiBE code size benchmark this saves 1347
> > bytes (0.037%) for x86_64, and 1312 bytes (0.036%) with -m32.
> > Conveniently, there's already a backend function in i386.cc for
> > deciding whether to split an lea into its component instructions,
> > ix86_avoid_lea_for_addr, all that's required is an additional clause
> > checking for -Oz (i.e. optimize_size > 1).
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board='unix{-m32}'
> > with no new failures.  Additional testing was performed by repeating
> > these steps after removing the "optimize_size > 1" condition, so that
> > suitable lea instructions were always split [-Oz is not heavily
> > tested, so this invoked the new code during the bootstrap and
> > regression testing], again with no regressions.  Ok for mainline?
> >
> >
> > 2023-10-05  Roger Sayle  
> >
> > gcc/ChangeLog
> > * config/i386/i386.cc (ix86_avoid_lea_for_addr): Split LEAs used
> > to perform left shifts into shorter instructions with -Oz.
> >
> > gcc/testsuite/ChangeLog
> > * gcc.target/i386/lea-2.c: New test case.
> >
> 
> OK, but ...
> 
> @@ -0,0 +1,7 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> 
> Is there a reason to avoid 32-bit targets? I'd expect that the optimization 
> also
> triggers on x86_32 for 32bit integers.

Good catch.  You're 100% correct; because the test case just checks that an LEA
is not used, and not for the specific sequence of shift instructions used 
instead,
this test also passes with --target_board='unix{-m32}'.  I'll remove the target 
clause
from the dg-do compile directive.

> +/* { dg-options "-Oz" } */
> +int foo(int x) { return x<<2; }
> +int bar(int x) { return x<<3; }
> +long long fool(long long x) { return x<<2; } long long barl(long long
> +x) { return x<<3; }
> +/* { dg-final { scan-assembler-not "lea\[lq\]" } } */

Thanks again.
Roger
--




[PATCH 1/3] ipa-cp: Templatize filtering of m_agg_values

2023-10-05 Thread Martin Jambor
PR 111157 points to another place where IPA-CP collected aggregate
compile-time constants need to be filtered, in addition to the one
place that already does this in ipa-sra.  In order to re-use code,
this patch turns the common bit into a template.

The functionality is still covered by testcase gcc.dg/ipa/pr108959.c.

gcc/ChangeLog:

2023-09-13  Martin Jambor  

PR ipa/111157
* ipa-prop.h (ipcp_transformation): New member function template
remove_argaggs_if.
* ipa-sra.cc (zap_useless_ipcp_results): Use remove_argaggs_if to
filter aggregate constants.
---
 gcc/ipa-prop.h | 33 +
 gcc/ipa-sra.cc | 33 -
 2 files changed, 37 insertions(+), 29 deletions(-)

diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
index 7e033d2a7b8..815855006e8 100644
--- a/gcc/ipa-prop.h
+++ b/gcc/ipa-prop.h
@@ -966,6 +966,39 @@ struct GTY(()) ipcp_transformation
 
   void maybe_create_parm_idx_map (tree fndecl);
 
+  /* Remove all elements in m_agg_values on which PREDICATE returns true.  */
+
+  template<typename pred_function>
+  void remove_argaggs_if (pred_function &&predicate)
+  {
+unsigned ts_len = vec_safe_length (m_agg_values);
+if (ts_len == 0)
+  return;
+
+bool removed_item = false;
+unsigned dst_index = 0;
+
+for (unsigned i = 0; i < ts_len; i++)
+  {
+   ipa_argagg_value *v = &(*m_agg_values)[i];
+   if (!predicate (*v))
+ {
+   if (removed_item)
+ (*m_agg_values)[dst_index] = *v;
+   dst_index++;
+ }
+   else
+ removed_item = true;
+  }
+if (dst_index == 0)
+  {
+   ggc_free (m_agg_values);
+   m_agg_values = NULL;
+  }
+else if (removed_item)
+  m_agg_values->truncate (dst_index);
+  }
+
   /* Known aggregate values.  */
   vec<ipa_argagg_value, va_gc> *m_agg_values;
   /* Known bits information.  */
diff --git a/gcc/ipa-sra.cc b/gcc/ipa-sra.cc
index edba364f56e..1551b694679 100644
--- a/gcc/ipa-sra.cc
+++ b/gcc/ipa-sra.cc
@@ -4047,35 +4047,10 @@ mark_callers_calls_comdat_local (struct cgraph_node 
*node, void *)
 static void
 zap_useless_ipcp_results (const isra_func_summary *ifs, ipcp_transformation 
*ts)
 {
-  unsigned ts_len = vec_safe_length (ts->m_agg_values);
-
-  if (ts_len == 0)
-return;
-
-  bool removed_item = false;
-  unsigned dst_index = 0;
-
-  for (unsigned i = 0; i < ts_len; i++)
-{
-  ipa_argagg_value *v = &(*ts->m_agg_values)[i];
-  const isra_param_desc *desc = &(*ifs->m_parameters)[v->index];
-
-  if (!desc->locally_unused)
-   {
- if (removed_item)
-   (*ts->m_agg_values)[dst_index] = *v;
- dst_index++;
-   }
-  else
-   removed_item = true;
-}
-  if (dst_index == 0)
-{
-  ggc_free (ts->m_agg_values);
-  ts->m_agg_values = NULL;
-}
-  else if (removed_item)
-ts->m_agg_values->truncate (dst_index);
+  ts->remove_argaggs_if ([ifs](const ipa_argagg_value &v)
+  {
+return (*ifs->m_parameters)[v.index].locally_unused;
+  });
 
   bool useful_bits = false;
   unsigned count = vec_safe_length (ts->bits);
-- 
2.42.0



[PATCH 3/3] ipa: Limit pruning of IPA-CP aggregate constants if there are loads

2023-10-05 Thread Martin Jambor
This patch makes the previous one less conservative by checking whether
there are known ipa-modref loads from areas covered by the IPA-CP
aggregate constant entry in question.  Because ipa-modref relies on
alias information which IPA-CP does not have (yet), the test is much
more crude and only reports overlapping accesses with known offsets
and max_size.

I was not able to put together a testcase which would fail without
this patch however.  It basically needs to be a combination of
testcases for PR 92497 (so that IPA-CP transformation phase is not
enough), PR 111157 (to get a load) and PR 103669 (to get a
clobber/kill) in a way that ipa-modref can still track things.
Therefore I am not sure if we actually want this patch.

gcc/ChangeLog:

2023-10-04  Martin Jambor  

* ipa-modref.cc (ipcp_argagg_and_access_must_overlap_p): New function.
(ipcp_argagg_and_modref_tree_must_overlap_p): Likewise.
(update_signature): Use ipcp_argagg_and_modref_tree_must_overlap_p.

Combined third step
---
 gcc/ipa-modref.cc | 65 +--
 1 file changed, 63 insertions(+), 2 deletions(-)

diff --git a/gcc/ipa-modref.cc b/gcc/ipa-modref.cc
index a8fcf159259..d2bfca3445d 100644
--- a/gcc/ipa-modref.cc
+++ b/gcc/ipa-modref.cc
@@ -4090,6 +4090,64 @@ ipcp_argagg_and_kill_overlap_p (const ipa_argagg_value 
&v,
   return false;
 }
 
+/* Return true if V overlaps with ACCESS_NODE.  When in doubt, return
+   false.  */
+
+static bool
+ipcp_argagg_and_access_must_overlap_p (const ipa_argagg_value &v,
+  const modref_access_node &access_node)
+{
+  if (access_node.parm_index == MODREF_GLOBAL_MEMORY_PARM
+  || access_node.parm_index == MODREF_UNKNOWN_PARM
+  || access_node.parm_index == MODREF_GLOBAL_MEMORY_PARM)
+  return false;
+
+  if (access_node.parm_index == v.index)
+{
+  if (!access_node.parm_offset_known)
+   return false;
+
+  poly_int64 repl_size;
+  bool ok = poly_int_tree_p (TYPE_SIZE (TREE_TYPE (v.value)),
+&repl_size);
+  gcc_assert (ok);
+  poly_int64 repl_offset (v.unit_offset);
+  repl_offset <<= LOG2_BITS_PER_UNIT;
+  poly_int64 combined_offset
+   = (access_node.parm_offset << LOG2_BITS_PER_UNIT) + access_node.offset;
+  if (ranges_maybe_overlap_p (repl_offset, repl_size,
+ combined_offset, access_node.max_size))
+   return true;
+}
+  return false;
+}
+
+/* Return true if MT contains an access that certainly overlaps with V even
+   when we cannot evaluate alias references.  When in doubt, return false.  */
+
+template <typename T>
+static bool
+ipcp_argagg_and_modref_tree_must_overlap_p (const ipa_argagg_value &v,
+   const modref_tree<T> &mt)
+{
+  for (auto base_node : mt.bases)
+{
+  if (base_node->every_ref)
+   return false;
+  for (auto ref_node : base_node->refs)
+   {
+ if (ref_node->every_access)
+   return false;
+ for (auto access_node : ref_node->accesses)
+   {
+ if (ipcp_argagg_and_access_must_overlap_p (v, access_node))
+   return true;
+   }
+   }
+}
+  return false;
+}
+
 /* If signature changed, update the summary.  */
 
 static void
@@ -4111,14 +4169,17 @@ update_signature (struct cgraph_node *node)
  continue;
if (r)
  for (const modref_access_node &kill : r->kills)
-   if (ipcp_argagg_and_kill_overlap_p (v, kill))
+   if (ipcp_argagg_and_kill_overlap_p (v, kill)
+   && !ipcp_argagg_and_modref_tree_must_overlap_p (v, *r->loads))
  {
v.killed = true;
break;
  }
if (!v.killed && r_lto)
  for (const modref_access_node &kill : r_lto->kills)
-   if (ipcp_argagg_and_kill_overlap_p (v, kill))
+   if (ipcp_argagg_and_kill_overlap_p (v, kill)
+   && !ipcp_argagg_and_modref_tree_must_overlap_p (v,
+   *r_lto->loads))
  {
v.killed = 1;
break;
-- 
2.42.0


[PATCH 2/3] ipa: Prune any IPA-CP aggregate constants known by modref to be killed (111157)

2023-10-05 Thread Martin Jambor
PR 111157 shows that IPA-modref and IPA-CP (when plugged into value
numbering) can optimize out a store both before a call (because the
call will overwrite it) and in the call (because the store is of the
same value) and by eliminating both create miscompilation.

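As a rough illustration only (this is not the testcase from the PR),
the problematic pattern looks something like the following, with the
two stores of the same constant living in different translation units
compiled with LTO:

  /* a.c */
  struct S { int x; };
  void take (struct S *p);

  int foo (void)
  {
    struct S s;
    s.x = 1;     /* modref kill info says take () overwrites *p,
                    so this store looks removable...  */
    take (&s);
    return s.x;
  }

  /* b.c */
  void take (struct S *p)
  {
    p->x = 1;    /* ...while IPA-CP says *p already holds 1 here,
                    so this store looks removable too.  */
  }

Removing both stores leaves s.x uninitialized.
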
This patch fixes that by pruning any constants from the list of IPA-CP
aggregate value constants that it knows the contents of the memory can
be "killed."  Unfortunately, doing so is tricky.  First, IPA-modref
loads override kills and so only stores not loaded are truly not
necessary.  Looking stuff up there means doing most of what
modref_may_alias may do, but doing exactly what it does is tricky
because it also takes aliasing into account and has bail-out counters.

To err on the side of caution in order to avoid this miscompilation we
have to prune a constant when in doubt.  However, pruning can
interfere with the mechanism of how clone materialization
distinguishes between the cases when a parameter was entirely removed
and when it was both IPA-CPed and IPA-SRAed (in order to make up for
the removal in debug info, which can bump into an assert when
compiling g++.dg/torture/pr103669.C when we are not careful).

Therefore this patch:

  1) marks constants that IPA-modref has in its kill list with a new
 "killed" flag, and
  2) prunes the list from entries with this flag after materialization
 and IPA-CP transformation is done using the template introduced in
 the previous patch

It does not try to look up anything in the load lists, this will be
done as a follow-up in order to ease review.

gcc/ChangeLog:

2023-09-19  Martin Jambor  

PR ipa/111157
* ipa-prop.h (struct ipa_argagg_value): New flag killed.
* ipa-modref.cc (ipcp_argagg_and_kill_overlap_p): New function.
(update_signature): Mark any IPA-CP aggregate constants at
positions known to be killed as killed.  Move check that there is
clone_info after this pruning.
* ipa-cp.cc (ipa_argagg_value_list::dump): Dump the killed flag.
(ipa_argagg_value_list::push_adjusted_values): Clear the new flag.
(push_agg_values_from_plats): Likewise.
(ipa_push_agg_values_from_jfunc): Likewise.
(estimate_local_effects): Likewise.
(push_agg_values_for_index_from_edge): Likewise.
* ipa-prop.cc (write_ipcp_transformation_info): Stream the killed
flag.
(read_ipcp_transformation_info): Likewise.
(ipcp_get_aggregate_const): Update comment, assert that encountered
record does not have killed flag set.
(ipcp_transform_function): Prune all aggregate constants with killed
set.

gcc/testsuite/ChangeLog:

2023-09-18  Martin Jambor  

PR ipa/111157
* gcc.dg/lto/pr111157_0.c: New test.
* gcc.dg/lto/pr111157_1.c: Second file of the same new test.
---
 gcc/ipa-cp.cc |  8 
 gcc/ipa-modref.cc | 58 +--
 gcc/ipa-prop.cc   | 17 +++-
 gcc/ipa-prop.h|  4 ++
 gcc/testsuite/gcc.dg/lto/pr111157_0.c | 24 +++
 gcc/testsuite/gcc.dg/lto/pr111157_1.c | 10 +
 6 files changed, 115 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr111157_0.c
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr111157_1.c

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index 071c607fbe8..bb49a1b2959 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -1271,6 +1271,8 @@ ipa_argagg_value_list::dump (FILE *f)
   print_generic_expr (f, av.value);
   if (av.by_ref)
fprintf (f, "(by_ref)");
+  if (av.killed)
+   fprintf (f, "(killed)");
   comma = true;
 }
   fprintf (f, "\n");
@@ -1437,6 +1439,8 @@ ipa_argagg_value_list::push_adjusted_values (unsigned 
src_index,
  new_av.unit_offset = av->unit_offset - unit_delta;
  new_av.index = dest_index;
  new_av.by_ref = av->by_ref;
+ gcc_assert (!av->killed);
+ new_av.killed = false;
 
  /* Quick check that the offsets we push are indeed increasing.  */
  gcc_assert (first
@@ -1473,6 +1477,7 @@ push_agg_values_from_plats (ipcp_param_lattices *plats, 
int dest_index,
iav.unit_offset = aglat->offset / BITS_PER_UNIT - unit_delta;
iav.index = dest_index;
iav.by_ref = plats->aggs_by_ref;
+   iav.killed = false;
 
gcc_assert (first
|| iav.unit_offset > prev_unit_offset);
@@ -2139,6 +2144,7 @@ ipa_push_agg_values_from_jfunc (ipa_node_params *info, 
cgraph_node *node,
   iav.unit_offset = item.offset / BITS_PER_UNIT;
   iav.index = dst_index;
   iav.by_ref = agg_jfunc->by_ref;
+  iav.killed = 0;
 
   gcc_assert (first
  || iav.unit_offset > prev_unit_offset);
@@ -3970,6 +3976,7 @@ estimate_local_effects (struct cgraph_node *node)
  avals.m_known_aggs[j].unit_offset = unit_offset;
  avals.m_kn

Re: [PATCH 3/6] aarch64: Implement system register validation tools

2023-10-05 Thread Richard Earnshaw




On 03/10/2023 16:18, Victor Do Nascimento wrote:

Given the implementation of a mechanism of encoding system registers
into GCC, this patch provides the mechanism of validating their use by
the compiler.  In particular, this involves:

   1. Ensuring a supplied string corresponds to a known system
  register name.  System registers can be accessed either via their
  name (e.g. `SPSR_EL1') or their encoding (e.g. `S3_0_C4_C0_0').
  Register names are validated using a binary search of the
  `sysreg_names' structure populated from the
  `aarch64_system_regs.def' file via `match_reg'.
  The encoding naming convention is validated via a parser
  implemented in this patch - `is_implem_def_reg'.
   2. Once a given register name is deemed to be valid, it is checked
  against a further 2 criteria:
a. Is the referenced register implemented in the target
   architecture?  This is achieved by comparing the ARCH field
  in the relevant SYSREG entry from `aarch64_system_regs.def'
  against `aarch64_feature_flags' flags set at compile-time.
b. Is the register being used correctly?  Check the requested
  operation against the FLAGS specified in SYSREG.
  This prevents operations like writing to a read-only system
  register.
NOTE: For registers specified via their encoding
(e.g. `S3_0_C4_C0_0'), once the encoding value is deemed valid
(as per step 1) no further checks such as read/write support or
architectural feature requirements are done and this second step
is skipped, as is done in gas.

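As an illustration only (using the register name and encoding quoted
above, and the __arm_rsr64 intrinsic introduced later in this series):

   long long a = __arm_rsr64 ("spsr_el1");      /* Name: looked up and checked
                                                   against FLAGS and ARCH.  */
   long long b = __arm_rsr64 ("s3_0_c4_c0_0");  /* Encoding: accepted once it
                                                   parses; no further checks.  */
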
gcc/ChangeLog:

* gcc/config/aarch64/aarch64-protos.h (aarch64_valid_sysreg_name_p): 
New.
(aarch64_retrieve_sysreg): Likewise.
* gcc/config/aarch64/aarch64.cc (match_reg): Likewise.
(is_implem_def_reg): Likewise.
(aarch64_valid_sysreg_name_p): Likewise.
(aarch64_retrieve_sysreg): Likewise.
(aarch64_sysreg_valid_for_rw_p): Likewise.
* gcc/config/aarch64/predicates.md (aarch64_sysreg_string): New.
---
  gcc/config/aarch64/aarch64-protos.h |   2 +
  gcc/config/aarch64/aarch64.cc   | 121 
  gcc/config/aarch64/predicates.md|   4 +
  3 files changed, 127 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 60a55f4bc19..a134e2fcf8e 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -830,6 +830,8 @@ bool aarch64_simd_shift_imm_p (rtx, machine_mode, bool);
  bool aarch64_sve_ptrue_svpattern_p (rtx, struct simd_immediate_info *);
  bool aarch64_simd_valid_immediate (rtx, struct simd_immediate_info *,
enum simd_immediate_check w = AARCH64_CHECK_MOV);
+bool aarch64_valid_sysreg_name_p (const char *);
+const char *aarch64_retrieve_sysreg (char *, bool);
  rtx aarch64_check_zero_based_sve_index_immediate (rtx);
  bool aarch64_sve_index_immediate_p (rtx);
  bool aarch64_sve_arith_immediate_p (machine_mode, rtx, bool);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 030b39ded1a..dd5ac1cbc8d 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -28070,6 +28070,127 @@ aarch64_pars_overlap_p (rtx par1, rtx par2)
return false;
  }
  
+/* Binary search of a user-supplied system register name against

+   a database of known register names.  Upon match the index of
+   hit in database is returned, else return -1.  */


Given that we expect the number of explicit sysregs in a single 
compilation unit to be small, this is probably OK.  An alternative would 
be to build a hashmap of the register names the first time this routine 
is called and then do a lookup in that.  That would also avoid the need 
for the list to be maintained alphabetically.

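A rough sketch of that alternative (purely illustrative, not part of
the patch; the exact hash traits used here are an assumption):

  /* Lazily-built cache mapping register names to their index in the
     tables generated from aarch64-sys-regs.def.  */
  static hash_map<nofree_string_hash, int> *sysreg_map;

  static int
  lookup_sysreg (const char *name)
  {
    if (!sysreg_map)
      {
        sysreg_map = new hash_map<nofree_string_hash, int> ();
        for (unsigned i = 0; i < ARRAY_SIZE (sysreg_names); i++)
          sysreg_map->put (sysreg_names[i], i);
      }
    if (int *slot = sysreg_map->get (name))
      return *slot;
    return -1;
  }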


+int
+match_reg (const char *ref, const char *database[], int db_len)
+{
+  /* Check for named system registers.  */
+  int imin = 0, imax = db_len - 1, mid, cmp_res;
+  while (imin <= imax)
+{
+  mid = (imin + imax) / 2;
+
+  cmp_res = strcmp (ref, database[mid]);
+  if (cmp_res == 0)
+   return mid;
+  else if (cmp_res > 0)
+   imin = mid+1;
+  else
+   imax = mid-1;
+}
+  return -1;
+}
+
+/* Parse an implementation-defined system register name of
+   the form S[0-3]_[0-7]_C[0-15]_C[0-15]_[1-7].
+   Return true if name matched against above pattern, false
+   otherwise.  */


Another advantage of using a hash map above would be that we could then 
add registers matched by this routine to the map and therefore optimize 
rescanning for them (on the basis that if they are used once, there's a 
good chance of them being used again).



+bool
+is_implem_def_reg (const char *regname)
+{
+/* Check for implementation-defined system registers.  */
+  int name_len = strlen (regname);
+  if (name_len < 12 || name_len > 14)
+return false;
+
+  in

[PATCH] ipa: Remove ipa_bits

2023-10-05 Thread Jakub Jelinek
Hi!

The following patch removes ipa_bits struct pointer/vector from ipa
jump functions and ipa cp transformations.

The reason is that the struct uses widest_int to represent
mask/value pair, which in the RFC patches to allow larger precisions
for wide_int/widest_int is GC unfriendly because those types become
non-trivially default constructible/copyable/destructible.
One option would be to use trailing_wide_int for that instead, but
as pointed out by Aldy, irange_storage which we already use under
the hood for ipa_vr when type of parameter is integral or pointer
already stores the mask/value pair because VRP now does the bit cp
as well.
So, this patch just uses m_vr to store both the value range and
the bitmask.  There is still separate propagation of the
ipcp_bits_lattice from propagation of the ipcp_vr_lattice, but
when storing we merge the two into the same container.

I've bootstrapped/regtested a slightly older version of this
patch on x86_64-linux and i686-linux and that version regressed
+FAIL: gcc.dg/ipa/propalign-3.c scan-ipa-dump-not cp "align:"
+FAIL: gcc.dg/ipa/propalign-3.c scan-tree-dump optimized "fail_the_test"
+FAIL: gcc.dg/ipa/propbits-1.c scan-ipa-dump cp "Adjusting mask for param 0 to 
0x7"
+FAIL: gcc.dg/ipa/propbits-2.c scan-ipa-dump cp "Adjusting mask for param 0 to 
0xf"
The last 2 were solely about the earlier patch not actually copying
the if (dump_file) dumping of message that we set some mask for some
parameter (since then added in the @@ -5985,6 +5741,77 @@ hunk).
The first testcase is a test for -fno-ipa-bit-cp disabling bit cp
for alignments.  For integral types I'm afraid it is a lost case
when -fno-ipa-bit-cp -fipa-vrp is on when value ranges track bit cp
as well, but for pointer alignments I've added
  && opt_for_fn (cs->caller->decl, flag_ipa_bit_cp)
and
  && opt_for_fn (node->decl, flag_ipa_bit_cp)
guards such that even just -fno-ipa-bit-cp disables it (alternatively
we could just add -fno-ipa-vrp to propalign-3.c dg-options).

Ok for trunk if this passes another bootstrap/regtest?
Or defer until it is really needed (when the wide_int/widest_int
changes are about to be committed)?

2023-10-05  Jakub Jelinek  

* ipa-prop.h (ipa_bits): Remove.
(struct ipa_jump_func): Remove bits member.
(struct ipcp_transformation): Remove bits member, adjust
ctor and dtor.
(ipa_get_ipa_bits_for_value): Remove.
* ipa-prop.cc (struct ipa_bit_ggc_hash_traits): Remove.
(ipa_bits_hash_table): Remove.
(ipa_print_node_jump_functions_for_edge): Don't print bits.
(ipa_get_ipa_bits_for_value): Remove.
(ipa_set_jfunc_bits): Remove.
(ipa_compute_jump_functions_for_edge): For pointers query
pointer alignment before ipa_set_jfunc_vr and update_bitmask
in there.  For integral types, just rely on bitmask already
being handled in value ranges.
(ipa_check_create_edge_args): Don't create ipa_bits_hash_table.
(ipcp_transformation_initialize): Neither here.
(ipcp_transformation_t::duplicate): Don't copy bits vector.
(ipa_write_jump_function): Don't stream bits here.
(ipa_read_jump_function): Neither here.
(useful_ipcp_transformation_info_p): Don't test bits vec.
(write_ipcp_transformation_info): Don't stream bits here.
(read_ipcp_transformation_info): Neither here.
(ipcp_get_parm_bits): Get mask and value from m_vr rather
than bits.
(ipcp_update_bits): Remove.
(ipcp_update_vr): For pointers, set_ptr_info_alignment from
bitmask stored in value range.
(ipcp_transform_function): Don't test bits vector, don't call
ipcp_update_bits.
* ipa-cp.cc (propagate_bits_across_jump_function): Don't use
jfunc->bits, instead get mask and value from jfunc->m_vr.
(ipcp_store_bits_results): Remove.
(ipcp_store_vr_results): Incorporate parts of
ipcp_store_bits_results here, merge the bitmasks with value
range if both are supplied.
(ipcp_driver): Don't call ipcp_store_bits_results.
* ipa-sra.cc (zap_useless_ipcp_results): Remove *ts->bits
clearing.

--- gcc/ipa-prop.h.jj   2023-10-05 11:32:40.172739988 +0200
+++ gcc/ipa-prop.h  2023-10-05 11:36:45.405378086 +0200
@@ -292,18 +292,6 @@ public:
   array_slice<const ipa_argagg_value> m_elts;
 };
 
-/* Information about zero/non-zero bits.  */
-class GTY(()) ipa_bits
-{
-public:
-  /* The propagated value.  */
-  widest_int value;
-  /* Mask corresponding to the value.
- Similar to ccp_lattice_t, if xth bit of mask is 0,
- implies xth bit of value is constant.  */
-  widest_int mask;
-};
-
 /* Info about value ranges.  */
 
 class GTY(()) ipa_vr
@@ -342,11 +330,6 @@ struct GTY (()) ipa_jump_func
  and its description.  */
   struct ipa_agg_jump_function agg;
 
-  /* Information about zero/non-zero bits.  The pointed to structure is shared
- betweed different jump functions.  Use ipa_set

Re: [PATCH 4/6] aarch64: Add basic target_print_operand support for CONST_STRING

2023-10-05 Thread Richard Earnshaw




On 03/10/2023 16:18, Victor Do Nascimento wrote:

Motivated by the need to print system register names in output
assembly, this patch adds the required logic to
`aarch64_print_operand' to accept rtxs of type CONST_STRING and
process these accordingly.

Consequently, an rtx such as:

   (set (reg/i:DI 0 x0)
  (unspec:DI [(const_string ("amcgcr_el0"))])

can now be output correctly using the following output pattern when
composing `define_insn's:

   "mrs\t%x0, %1"

gcc/ChangeLog

* gcc/config/aarch64/aarch64.cc (aarch64_print_operand): Add
support for CONST_STRING.
---
  gcc/config/aarch64/aarch64.cc | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index dd5ac1cbc8d..d6dd0586ac1 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -12400,6 +12400,12 @@ aarch64_print_operand (FILE *f, rtx x, int code)
  
switch (GET_CODE (x))

{
+   case CONST_STRING:
+ {
+   const char *output_op = XSTR (x, 0);
+   asm_fprintf (f, "%s", output_op);
+   break;
+ }
case REG:
  if (aarch64_sve_data_mode_p (GET_MODE (x)))
{


Didn't we discuss (off list) always printing out the generic register 
names, so that there was less dependency on having a specific assembler 
version that knows about newer sysregs?


R.


[committed] sreal: Fix typo in function name

2023-10-05 Thread Jakub Jelinek
Hi!

My earlier version of the ipa_bits removal patch resulted in self-test
failures in sreal.  When debugging it, I was really confused that I couldn't
find the verify_arithmetics function in the source.  Turns out it had bad
spelling...

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2023-10-05  Jakub Jelinek  

* sreal.cc (verify_aritmetics): Rename to ...
(verify_arithmetics): ... this.
(sreal_verify_arithmetics): Adjust caller.

--- gcc/sreal.cc.jj 2023-08-08 15:55:08.366138409 +0200
+++ gcc/sreal.cc2023-10-05 10:20:20.528806377 +0200
@@ -323,7 +323,7 @@ sreal_verify_basics (void)
of given arguments A and B.  */
 
 static void
-verify_aritmetics (int64_t a, int64_t b)
+verify_arithmetics (int64_t a, int64_t b)
 {
   ASSERT_EQ (a, -(-(sreal (a))).to_int ());
   ASSERT_EQ (a < b, sreal (a) < sreal (b));
@@ -356,7 +356,7 @@ sreal_verify_arithmetics (void)
int a = values[i];
int b = values[j];
 
-   verify_aritmetics (a, b);
+   verify_arithmetics (a, b);
   }
 }
 

Jakub



Re: [PATCH 5/6] aarch64: Implement system register r/w arm ACLE intrinsic functions

2023-10-05 Thread Richard Earnshaw




On 03/10/2023 16:18, Victor Do Nascimento wrote:

Implement the aarch64 intrinsics for reading and writing system
registers with the following signatures:

uint32_t __arm_rsr(const char *special_register);
uint64_t __arm_rsr64(const char *special_register);
void* __arm_rsrp(const char *special_register);
float __arm_rsrf(const char *special_register);
double __arm_rsrf64(const char *special_register);
void __arm_wsr(const char *special_register, uint32_t value);
void __arm_wsr64(const char *special_register, uint64_t value);
void __arm_wsrp(const char *special_register, const void *value);
void __arm_wsrf(const char *special_register, float value);
void __arm_wsrf64(const char *special_register, double value);

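A minimal usage sketch (the register encoding below is arbitrary and
chosen only for the example):

  #include <arm_acle.h>

  unsigned long long
  bump (void)
  {
    unsigned long long v = __arm_rsr64 ("s3_0_c11_c0_1");
    __arm_wsr64 ("s3_0_c11_c0_1", v + 1);
    return v;
  }
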
gcc/ChangeLog:

* gcc/config/aarch64/aarch64-builtins.cc (enum aarch64_builtins):
Add enums for new builtins.
(aarch64_init_rwsr_builtins): New.
(aarch64_general_init_builtins): Call aarch64_init_rwsr_builtins.
(aarch64_expand_rwsr_builtin):  New.
(aarch64_general_expand_builtin): Call aarch64_expand_rwsr_builtin.
* gcc/config/aarch64/aarch64.md (read_sysregdi): New insn_and_split.
(write_sysregdi): Likewise.
* gcc/config/aarch64/arm_acle.h (__arm_rsr): New.
(__arm_rsrp): Likewise.
(__arm_rsr64): Likewise.
(__arm_rsrf): Likewise.
(__arm_rsrf64): Likewise.
(__arm_wsr): Likewise.
(__arm_wsrp): Likewise.
(__arm_wsr64): Likewise.
(__arm_wsrf): Likewise.
(__arm_wsrf64): Likewise.

gcc/testsuite/ChangeLog:

* gcc/testsuite/gcc.target/aarch64/acle/rwsr.c: New.
* gcc/testsuite/gcc.target/aarch64/acle/rwsr-1.c: Likewise.
---
  gcc/config/aarch64/aarch64-builtins.cc| 200 ++
  gcc/config/aarch64/aarch64.md |  17 ++
  gcc/config/aarch64/arm_acle.h |  30 +++
  .../gcc.target/aarch64/acle/rwsr-1.c  |  20 ++
  gcc/testsuite/gcc.target/aarch64/acle/rwsr.c  | 144 +
  5 files changed, 411 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr-1.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr.c

diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index 04f59fd9a54..d8bb2a989a5 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -808,6 +808,17 @@ enum aarch64_builtins
AARCH64_RBIT,
AARCH64_RBITL,
AARCH64_RBITLL,
+  /* System register builtins.  */
+  AARCH64_RSR,
+  AARCH64_RSRP,
+  AARCH64_RSR64,
+  AARCH64_RSRF,
+  AARCH64_RSRF64,
+  AARCH64_WSR,
+  AARCH64_WSRP,
+  AARCH64_WSR64,
+  AARCH64_WSRF,
+  AARCH64_WSRF64,
AARCH64_BUILTIN_MAX
  };
  
@@ -1798,6 +1809,65 @@ aarch64_init_rng_builtins (void)

   AARCH64_BUILTIN_RNG_RNDRRS);
  }
  
+/* Add builtins for reading system register.  */

+static void
+aarch64_init_rwsr_builtins (void)
+{
+  tree fntype = NULL;
+  tree const_char_ptr_type
+= build_pointer_type (build_type_variant (char_type_node, true, false));
+
+#define AARCH64_INIT_RWSR_BUILTINS_DECL(F, N, T) \
+  aarch64_builtin_decls[AARCH64_##F] \
+= aarch64_general_add_builtin ("__builtin_aarch64_"#N, T, AARCH64_##F);
+
+  fntype
+= build_function_type_list (uint32_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSR, rsr, fntype);
+
+  fntype
+= build_function_type_list (ptr_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRP, rsrp, fntype);
+
+  fntype
+= build_function_type_list (uint64_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSR64, rsr64, fntype);
+
+  fntype
+= build_function_type_list (float_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRF, rsrf, fntype);
+
+  fntype
+= build_function_type_list (double_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRF64, rsrf64, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   uint32_type_node, NULL);
+
+  AARCH64_INIT_RWSR_BUILTINS_DECL (WSR, wsr, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   const_ptr_type_node, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (WSRP, wsrp, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   uint64_type_node, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (WSR64, wsr64, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   float_type_node, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (WSRF, wsrf, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+ 

Re: [PATCH v2] ipa-utils: avoid uninitialized probabilities on ICF [PR111559]

2023-10-05 Thread Sergei Trofimovich
On Thu, Oct 05, 2023 at 01:52:30PM +0200, Jan Hubicka wrote:
> > From: Sergei Trofimovich 
> > 
> > r14-3459-g0c78240fd7d519 "Check that passes do not forget to define profile"
> > exposed check failures in cases when gcc produces uninitialized profile
> > probabilities. In case of PR/111559 uninitialized profile is generated
> > by edges executed 0 times reported by IPA profile:
> > 
> > $ gcc -O2 -fprofile-generate pr111559.c -o b -fopt-info
> > $ ./b
> > $ gcc -O2 -fprofile-use -fprofile-correction pr111559.c -o b -fopt-info
> > 
> > pr111559.c: In function 'rule1':
> > pr111559.c:6:13: error: probability of edge 3->4 not initialized
> > 6 | static void rule1(void) { if (p) edge(); }
> >   | ^
> > during GIMPLE pass: fixup_cfg
> > pr111559.c:6:13: internal compiler error: verify_flow_info failed
> > 
> > The change conservatively ignores updates with uninitialized values and
> > uses initially assigned probabilities (`always` probability in case of
> > the example).
> > 
> > PR ipa/111283
> > PR gcov-profile/111559
> > 
> > gcc/
> > * ipa-utils.cc (ipa_merge_profiles): Avoid producing
> > uninitialized probabilities when merging counters with zero
> > denominators.
> > 
> > gcc/testsuite/
> > * gcc.dg/tree-prof/pr111559.c: New test.
> > ---
> >  gcc/ipa-utils.cc  |  6 +-
> >  gcc/testsuite/gcc.dg/tree-prof/pr111559.c | 16 
> >  2 files changed, 21 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-prof/pr111559.c
> > 
> > diff --git a/gcc/ipa-utils.cc b/gcc/ipa-utils.cc
> > index 956c6294fd7..7c53ae9dd45 100644
> > --- a/gcc/ipa-utils.cc
> > +++ b/gcc/ipa-utils.cc
> > @@ -651,13 +651,17 @@ ipa_merge_profiles (struct cgraph_node *dst,
> > {
> >   edge srce = EDGE_SUCC (srcbb, i);
> >   edge dste = EDGE_SUCC (dstbb, i);
> > - dste->probability = 
> > + profile_probability merged =
> > dste->probability * dstbb->count.ipa ().probability_in
> >  (dstbb->count.ipa ()
> >   + srccount.ipa ())
> > + srce->probability * srcbb->count.ipa ().probability_in
> >  (dstbb->count.ipa ()
> >   + srccount.ipa ());
> > + /* We produce uninitialized probabilities when
> > +denominator is zero: https://gcc.gnu.org/PR111559.  */
> > + if (merged.initialized_p ())
> > +   dste->probability = merged;
> 
> Thanks for the patch.
> We usually avoid the uninitialized value here by simply checking that
> parameter of probability_in satifies nonzero_p.  So I think it would be
> more consistent doing it here to:
> 
>   profile_probability sum = dstbb->count.ipa () + srccount.ipa ()
>   if (sum.nonzero_p ())
>   {
>  dste->probability = .
>   }

Aha, sounds good! I had to do `s/profile_probability/profile_count/` as
it's a denominator value for probability.

Attached v3 just in case.

> OK with this change.
> Honza
> > }
> >   dstbb->count = dstbb->count.ipa () + srccount.ipa ();
> > }
> > diff --git a/gcc/testsuite/gcc.dg/tree-prof/pr111559.c 
> > b/gcc/testsuite/gcc.dg/tree-prof/pr111559.c
> > new file mode 100644
> > index 000..43202c6c888
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-prof/pr111559.c
> > @@ -0,0 +1,16 @@
> > +/* { dg-options "-O2" } */
> > +
> > +__attribute__((noipa)) static void edge(void) {}
> > +
> > +int p = 0;
> > +
> > +__attribute__((noinline))
> > +static void rule1(void) { if (p) edge(); }
> > +
> > +__attribute__((noinline))
> > +static void rule1_same(void) { if (p) edge(); }
> > +
> > +__attribute__((noipa)) int main(void) {
> > +rule1();
> > +rule1_same();
> > +}
> > -- 
> > 2.42.0
> > 

-- 

  Sergei
From 97122ebae5a7ed43b6c31574c761a54bee3a96ec Mon Sep 17 00:00:00 2001
From: Sergei Trofimovich 
Date: Wed, 27 Sep 2023 14:29:12 +0100
Subject: [PATCH v3] ipa-utils: avoid uninitialized probabilities on ICF
 [PR111559]

r14-3459-g0c78240fd7d519 "Check that passes do not forget to define profile"
exposed check failures in cases when gcc produces uninitialized profile
probabilities. In case of PR/111559 uninitialized profile is generated
by edges executed 0 times reported by IPA profile:

$ gcc -O2 -fprofile-generate pr111559.c -o b -fopt-info
$ ./b
$ gcc -O2 -fprofile-use -fprofile-correction pr111559.c -o b -fopt-info

pr111559.c: In function 'rule1':
pr111559.c:6:13: error: probability of edge 3->4 not initialized
6 | static void rule1(void) { if (p) edge(); }
  | ^
during GIMPLE pass: fixup_cfg
pr111559.c:6:13: internal compiler error: verify_flow_info failed

The change conservatively ignores updates with zero execu

Re: [PATCH 4/6] aarch64: Add basic target_print_operand support for CONST_STRING

2023-10-05 Thread Victor Do Nascimento

On 10/5/23 13:26, Richard Earnshaw wrote:



On 03/10/2023 16:18, Victor Do Nascimento wrote:

Motivated by the need to print system register names in output
assembly, this patch adds the required logic to
`aarch64_print_operand' to accept rtxs of type CONST_STRING and
process these accordingly.

Consequently, an rtx such as:

   (set (reg/i:DI 0 x0)
  (unspec:DI [(const_string ("amcgcr_el0"))])

can now be output correctly using the following output pattern when
composing `define_insn's:

   "mrs\t%x0, %1"

gcc/ChangeLog

* gcc/config/aarch64/aarch64.cc (aarch64_print_operand): Add
support for CONST_STRING.
---
  gcc/config/aarch64/aarch64.cc | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.cc 
b/gcc/config/aarch64/aarch64.cc

index dd5ac1cbc8d..d6dd0586ac1 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -12400,6 +12400,12 @@ aarch64_print_operand (FILE *f, rtx x, int code)
    switch (GET_CODE (x))
  {
+    case CONST_STRING:
+  {
+    const char *output_op = XSTR (x, 0);
+    asm_fprintf (f, "%s", output_op);
+    break;
+  }
  case REG:
    if (aarch64_sve_data_mode_p (GET_MODE (x)))
  {


Didn't we discuss (off list) always printing out the generic register 
names, so that there was less dependency on having a specific assembler 
version that knows about newer sysregs?


R.


That's right, Richard.

We did settle on generic register names.

The example above is unfortunate and can, nay should, be amended to be 
less misleading.


It's not wrong to say that an rtx such as

>>(set (reg/i:DI 0 x0)
>>   (unspec:DI [(const_string ("amcgcr_el0"))])

would now be understood by the `aarch64_print_operand' function, but 
we'd never see that being generated as a result of this work.


For "amcgcr_el0" what we'd expect to see reaching the back-end is, in 
fact, the following:


(set (reg/i:DI 0 x0)
   (unspec:DI [(const_string ("s3_3_c13_c2_2"))])

Thanks for picking up on this and bringing it to my attention!

V.


Re: [PATCH 6/6] aarch64: Add front-end argument type checking for target builtins

2023-10-05 Thread Richard Earnshaw




On 03/10/2023 16:18, Victor Do Nascimento wrote:

In implementing the ACLE read/write system register builtins it was
observed that leaving argument type checking to be done at expand-time
meant that poorly-formed function calls were being "fixed" by certain
optimization passes, meaning bad code wasn't being properly picked up
in checking.

Example:

   const char *regname = "amcgcr_el0";
   long long a = __builtin_aarch64_rsr64 (regname);

is reduced by the ccp1 pass to

   long long a = __builtin_aarch64_rsr64 ("amcgcr_el0");

As these functions require an argument of STRING_CST type, there needs
to be a check carried out by the front-end capable of picking this up.

The introduced `check_general_builtin_call' function will be called by
the TARGET_CHECK_BUILTIN_CALL hook whenever a call to a builtin
belonging to the AARCH64_BUILTIN_GENERAL category is encountered,
carrying out any appropriate checks associated with a particular
builtin function code.


Doesn't this prevent reasonable wrapping of the __builtin... names with 
something more palatable?  Eg:


static inline __attribute__((always_inline)) long long get_sysreg_ll 
(const char *regname)

{
  return __builtin_aarch64_rsr64 (regname);
}

...
  long long x = get_sysreg_ll("amcgcr_el0");
...

?

R.



gcc/ChangeLog:

* gcc/config/aarch64/aarch64-builtins.cc (check_general_builtin_call):
New.
* gcc/config/aarch64/aarch64-c.cc (aarch64_check_builtin_call):
Add check_general_builtin_call call.
* gcc/config/aarch64/aarch64-protos.h (check_general_builtin_call):
New.

gcc/testsuite/ChangeLog:

* gcc/testsuite/gcc.target/aarch64/acle/rwsr-2.c: New.
---
  gcc/config/aarch64/aarch64-builtins.cc| 33 +++
  gcc/config/aarch64/aarch64-c.cc   |  4 +--
  gcc/config/aarch64/aarch64-protos.h   |  3 ++
  .../gcc.target/aarch64/acle/rwsr-2.c  | 15 +
  4 files changed, 53 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr-2.c

diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index d8bb2a989a5..6734361f4f4 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -2126,6 +2126,39 @@ aarch64_general_builtin_decl (unsigned code, bool)
return aarch64_builtin_decls[code];
  }
  
+bool

+check_general_builtin_call (location_t location, vec<location_t>,
+   unsigned int code, tree fndecl,
+   unsigned int nargs ATTRIBUTE_UNUSED, tree *args)
+{
+  switch (code)
+{
+case AARCH64_RSR:
+case AARCH64_RSRP:
+case AARCH64_RSR64:
+case AARCH64_RSRF:
+case AARCH64_RSRF64:
+case AARCH64_WSR:
+case AARCH64_WSRP:
+case AARCH64_WSR64:
+case AARCH64_WSRF:
+case AARCH64_WSRF64:
+  if (TREE_CODE (args[0]) == VAR_DECL
+ || TREE_CODE (TREE_TYPE (args[0])) != POINTER_TYPE
+ || TREE_CODE (TREE_OPERAND (TREE_OPERAND (args[0], 0) , 0))
+ != STRING_CST)
+   {
+ const char  *fn_name, *err_msg;
+ fn_name = IDENTIFIER_POINTER (DECL_NAME (fndecl));
+ err_msg = "first argument to %<%s%> must be a string literal";
+ error_at (location, err_msg, fn_name);
+ return false;
+   }
+}
+  /* Default behavior.  */
+  return true;
+}
+
  typedef enum
  {
SIMD_ARG_COPY_TO_REG,
diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
index 578ec6f45b0..6e2b83b8308 100644
--- a/gcc/config/aarch64/aarch64-c.cc
+++ b/gcc/config/aarch64/aarch64-c.cc
@@ -338,8 +338,8 @@ aarch64_check_builtin_call (location_t loc, vec<location_t> arg_loc,
switch (code & AARCH64_BUILTIN_CLASS)
  {
  case AARCH64_BUILTIN_GENERAL:
-  return true;
-
+  return check_general_builtin_call (loc, arg_loc, subcode, orig_fndecl,
+nargs, args);
  case AARCH64_BUILTIN_SVE:
return aarch64_sve::check_builtin_call (loc, arg_loc, subcode,
  orig_fndecl, nargs, args);
diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index a134e2fcf8e..9ef96ff511f 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -990,6 +990,9 @@ tree aarch64_general_builtin_rsqrt (unsigned int);
  void handle_arm_acle_h (void);
  void handle_arm_neon_h (void);
  
+bool check_general_builtin_call (location_t, vec<location_t>, unsigned int,

+ tree, unsigned int, tree *);
+
  namespace aarch64_sve {
void init_builtins ();
void handle_arm_sve_h ();
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/rwsr-2.c 
b/gcc/testsuite/gcc.target/aarch64/acle/rwsr-2.c
new file mode 100644
index 000..72e5fb75b21
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/acle/rwsr-2.c
@@ -0,0 +1,15 @@
+/* Test the __arm_[r,w]sr ACLE intrinsics fami

Re: [PATCH 4/6] aarch64: Add basic target_print_operand support for CONST_STRING

2023-10-05 Thread Richard Earnshaw




On 05/10/2023 13:26, Richard Earnshaw wrote:



On 03/10/2023 16:18, Victor Do Nascimento wrote:

Motivated by the need to print system register names in output
assembly, this patch adds the required logic to
`aarch64_print_operand' to accept rtxs of type CONST_STRING and
process these accordingly.

Consequently, an rtx such as:

   (set (reg/i:DI 0 x0)
  (unspec:DI [(const_string ("amcgcr_el0"))])

can now be output correctly using the following output pattern when
composing `define_insn's:

   "mrs\t%x0, %1"

gcc/ChangeLog

* gcc/config/aarch64/aarch64.cc (aarch64_print_operand): Add
support for CONST_STRING.
---
  gcc/config/aarch64/aarch64.cc | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.cc 
b/gcc/config/aarch64/aarch64.cc

index dd5ac1cbc8d..d6dd0586ac1 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -12400,6 +12400,12 @@ aarch64_print_operand (FILE *f, rtx x, int code)
    switch (GET_CODE (x))
  {
+    case CONST_STRING:
+  {
+    const char *output_op = XSTR (x, 0);
+    asm_fprintf (f, "%s", output_op);
+    break;
+  }
  case REG:
    if (aarch64_sve_data_mode_p (GET_MODE (x)))
  {


Didn't we discuss (off list) always printing out the generic register 
names, so that there was less dependency on having a specific assembler 
version that knows about newer sysregs?




You can ignore this.  I've just seen that the tests show that is happening.

Reviewed-by: rearn...@arm.com


R.


Re: [PATCH 01/22] Add condition coverage profiling

2023-10-05 Thread Jan Hubicka
> 
> Like Wahlen et al this implementation records coverage in fixed-size
> bitsets which gcov knows how to interpret. This is very fast, but
> introduces a limit on the number of terms in a single boolean
> expression, the number of bits in a gcov_unsigned_type (which is
> typedef'd to uint64_t), so for most practical purposes this would be
> acceptable. This limitation is in the implementation and not the
> algorithm, so support for more conditions can be added by also
> introducing arbitrary-sized bitsets.

This should not be too hard to do - if the conditional is more complex you
simply introduce more than one counter for it, right?
How many times does this trigger on GCC sources?
> 
> For space overhead, the instrumentation needs two accumulators
> (gcov_unsigned_type) per condition in the program which will be written
> to the gcov file. In addition, every function gets a pair of local
> accumulators, but these accumulators are reused between conditions in the
> same function.
> 
> For time overhead, there is a zeroing of the local accumulators for
> every condition and one or two bitwise operations on every edge taken in
> the expression.
> 
> In action it looks pretty similar to the branch coverage. The -g short
> opt carries no significance, but was chosen because it was an available
> option with the upper-case variant free too.
> 
> gcov --conditions:
> 
> 3:   17:void fn (int a, int b, int c, int d) {
> 3:   18:if ((a && (b || c)) && d)
> conditions covered 3/8
> condition  0 not covered (true)
> condition  0 not covered (false)
> condition  1 not covered (true)
> condition  2 not covered (true)
> condition  3 not covered (true)
It seems understandable, but for bigger conditionals I guess it will be
a bit hard to relate condition numbers to the actual source
code.  We could probably also show the conditions as ranges in the
conditional?  I am adding David Malcolm to CC, he may have some ideas.

I wonder how much this information is confused by early optimizations
happening before coverage profiling?
> 
> Some expressions, mostly those without else-blocks, are effectively
> "rewritten" in the CFG construction making the algorithm unable to
> distinguish them:
> 
> and.c:
> 
> if (a && b && c)
> x = 1;
> 
> ifs.c:
> 
> if (a)
> if (b)
> if (c)
> x = 1;
> 
> gcc will build the same graph for both these programs, and gcov will
> report both as 3-term expressions. It is vital that it is not
> interpreted the other way around (which is consistent with the shape of
> the graph) because otherwise the masking would be wrong for the and.c
> program which is a more severe error. While surprising, users would
> probably expect some minor rewriting of semantically-identical
> expressions.
> 
> and.c.gcov:
> #:2:if (a && b && c)
> conditions covered 6/6
> #:3:x = 1;
> 
> ifs.c.gcov:
> #:2:if (a)
> #:3:if (b)
> #:4:if (c)
> #:5:x = 1;
> conditions covered 6/6

Maybe one can use location information to distinguish those cases?
Don't we store discriminator info about individual statements that is also
used for auto-FDO?
> 
> gcc/ChangeLog:
> 
>   * builtins.cc (expand_builtin_fork_or_exec): Check
>   profile_condition_flag.
> * collect2.cc (main): Add -fno-profile-conditions to OBSTACK.
>   * common.opt: Add new options -fprofile-conditions and
>   * doc/gcov.texi: Add --conditions documentation.
>   * doc/invoke.texi: Add -fprofile-conditions documentation.
>   * gcc.cc: Link gcov on -fprofile-conditions.
>   * gcov-counter.def (GCOV_COUNTER_CONDS): New.
>   * gcov-dump.cc (tag_conditions): New.
>   * gcov-io.h (GCOV_TAG_CONDS): New.
>   (GCOV_TAG_CONDS_LENGTH): Likewise.
>   (GCOV_TAG_CONDS_NUM): Likewise.
>   * gcov.cc (class condition_info): New.
>   (condition_info::condition_info): New.
>   (condition_info::popcount): New.
>   (struct coverage_info): New.
>   (add_condition_counts): New.
>   (output_conditions): New.
>   (print_usage): Add -g, --conditions.
>   (process_args): Likewise.
>   (output_intermediate_json_line): Output conditions.
>   (read_graph_file): Read conditions counters.
>   (read_count_file): Read conditions counters.
>   (file_summary): Print conditions.
>   (accumulate_line_info): Accumulate conditions.
>   (output_line_details): Print conditions.
>   * ipa-inline.cc (can_early_inline_edge_p): Check
>   profile_condition_flag.
>   * ipa-split.cc (pass_split_functions::gate): Likewise.
>   * passes.cc (finish_optimization_passes): Likewise.
>   * profile.cc (find_conditions): New declaration.
>   (cov_length): Likewise.
>   (cov_blocks): Likewise.
>   (cov_masks): Likewise.
>   (cov_free): Likewise.
>   (instrument_decisions): New.
>   (read_thunk

Re: [PATCH 06/22] Use popcount_hwi rather than builtin

2023-10-05 Thread Jan Hubicka
Hi,
can you please also squash those changes which fix patch #1
so it is easier to review?
Honza
> From: Jørgen Kvalsvik 
> 
> ---
>  gcc/gcov.cc | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/gcov.cc b/gcc/gcov.cc
> index 274f2fc5d9f..35be97cf5ac 100644
> --- a/gcc/gcov.cc
> +++ b/gcc/gcov.cc
> @@ -46,6 +46,7 @@ along with Gcov; see the file COPYING3.  If not see
>  #include "color-macros.h"
>  #include "pretty-print.h"
>  #include "json.h"
> +#include "hwint.h"
>  
>  #include 
>  #include 
> @@ -159,7 +160,7 @@ condition_info::condition_info (): truev (0), falsev (0), 
> n_terms (0)
>  
>  int condition_info::popcount () const
>  {
> -return __builtin_popcountll (truev) + __builtin_popcountll (falsev);
> +return popcount_hwi (truev) + popcount_hwi (falsev);
>  }
>  
>  /* Describes a basic block. Contains lists of arcs to successor and
> -- 
> 2.30.2
> 


Re: [PATCH v2] ipa-utils: avoid uninitialized probabilities on ICF [PR111559]

2023-10-05 Thread Jan Hubicka
> diff --git a/gcc/ipa-utils.cc b/gcc/ipa-utils.cc
> index 956c6294fd7..1355ccac6f0 100644
> --- a/gcc/ipa-utils.cc
> +++ b/gcc/ipa-utils.cc
> @@ -651,13 +651,16 @@ ipa_merge_profiles (struct cgraph_node *dst,
>   {
> edge srce = EDGE_SUCC (srcbb, i);
> edge dste = EDGE_SUCC (dstbb, i);
> -   dste->probability = 
> - dste->probability * dstbb->count.ipa ().probability_in
> -  (dstbb->count.ipa ()
> -   + srccount.ipa ())
> - + srce->probability * srcbb->count.ipa ().probability_in
> -  (dstbb->count.ipa ()
> -   + srccount.ipa ());
> +   profile_count sum =
> + dstbb->count.ipa () + srccount.ipa ();
> +   if (sum.nonzero_p ())
> + dste->probability =
> +   dste->probability * dstbb->count.ipa ().probability_in
> +(dstbb->count.ipa ()
> + + srccount.ipa ())
> +   + srce->probability * srcbb->count.ipa ().probability_in
> +(dstbb->count.ipa ()
> + + srccount.ipa ());

looks good.  You can use probability_in (sum) 
in both of the places.
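i.e. something like (a sketch of the suggested form; it mirrors what the v4
patch later in this digest ends up doing):

	      profile_count sum = dstbb->count.ipa () + srccount.ipa ();
	      if (sum.nonzero_p ())
		dste->probability
		  = dste->probability
		      * dstbb->count.ipa ().probability_in (sum)
		    + srce->probability
		      * srcbb->count.ipa ().probability_in (sum);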

Honza


Re: [PATCH 1/6] aarch64: Sync system register information with Binutils

2023-10-05 Thread Victor Do Nascimento




On 10/5/23 12:42, Richard Earnshaw wrote:



On 03/10/2023 16:18, Victor Do Nascimento wrote:

This patch adds the `aarch64-sys-regs.def' file to GCC, teaching
the compiler about system registers known to the assembler and how
these can be used.

The macros used to hold system register information reflect those in
use by binutils, a design choice made to facilitate the sharing of data
between different parts of the toolchain.

By aligning the representation of data common to different parts of
the toolchain we can greatly reduce the duplication of work,
facilitating the maintenance of the aarch64 back-end across different
parts of the toolchain; any `SYSREG (...)' that is added in one
project can just as easily be added to its counterpart.

GCC does not implement the full range of ISA flags present in
Binutils.  Where this is the case, aliases must be added to aarch64.h
with the unknown architectural extension being mapped to its
associated base architecture, such that any flag present in Binutils
and used in system register definitions is understood in GCC.  Again,
this is done such that flags can be used interchangeably between
projects making use of the aarch64-system-regs.def file.  This is done
in the next patch in the series.

`.arch' directives missing from the emitted assembly files as a
consequence of this aliasing are accounted for by the compiler using
the S encoding of system registers when
issuing mrs/msr instructions.  This design choice ensures the
assembler will accept anything that was deemed acceptable by the
compiler.

gcc/ChangeLog:

* gcc/config/aarch64/aarch64-system-regs.def: New.
---
  gcc/config/aarch64/aarch64-sys-regs.def | 1059 +++
  1 file changed, 1059 insertions(+)
  create mode 100644 gcc/config/aarch64/aarch64-sys-regs.def


This file is supposed to be /identical/ to the one in GNU Binutils, 
right?


You're right Richard.

We want the same file to be compatible with both parts of the toolchain 
and, consequently, there is no compelling reason as to why the copy of 
the file found in GCC should in any way diverge from its Binutils 
counterpart.


If so, I think it needs to continue to say that it is part of 
GNU Binutils, not part of GCC.  Ramana, has this happened before?  If 
not, does the SC have a position here?


R.


This does raise a very interesting question on the intellectual property 
front and one that is well beyond my competence to opine about.


Nonetheless, this is a question which may arise again if we abstract 
away more target description data into such .def files, as has been 
discussed for architectural feature flags (for example).


So what might be nice (but not necessarily tenable) is to have
appropriate provisions in place for files that are shared across
different parts of the toolchain.


Something like "This file is a shared resource of GCC and Binutils."

Anyway, that's my two cents on the matter :).

Let's see what Ramana has to say on the matter.

V.


diff --git a/gcc/config/aarch64/aarch64-sys-regs.def 
b/gcc/config/aarch64/aarch64-sys-regs.def

new file mode 100644
index 000..d77fee1d5e3
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-sys-regs.def
@@ -0,0 +1,1059 @@
+/* Copyright (C) 2023 Free Software Foundation, Inc.
+   Contributed by Arm Ltd
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   .  */
+
+/* Array of system registers and their associated arch features.
+
+   Before using #include to read this file, define a macro:
+
+ SYSREG (name, encoding, flags, features)
+
+  The NAME is the system register name, as recognized by the
+  assembler.  ENCODING provides the necessary information for the binary
+  encoding of the system register.  The FLAGS field is a bitmask of
+  relevant behavior information pertaining to the particular register.
+  For example: is it read/write-only? does it alias another register?
+  The FEATURES field maps onto ISA flags and specifies the architectural
+  feature requirements of the system register.  */
+
+  SYSREG ("accdata_el1",    CPENC (3,0,13,0,5),    0,
AARCH64_NO_FEATURES)
+  SYSREG ("actlr_el1",    CPENC (3,0,1,0,1),    0,
AARCH64_NO_FEATURES)
+  SYSREG ("actlr_el2",    CPENC (3,4,1,0,1),    0,
AARCH64_NO_FEATURES)
+  SYSREG ("actlr_el3",    CPENC (3,6,1,0,1),    0,
AARCH64_NO_FEAT

[avr,committed] Use monic denominator polynomials to save a multiplication.

2023-10-05 Thread Georg-Johann Lay

This is a small tweak in LibF7 to save one multiplication in the computation
of denominator polynomials.  The polynomials are monic now, so
f7_horner needs one multiplication less.
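As a small standalone illustration (not taken from LibF7, which works on
f7_t rather than double) of where the saved multiplication comes from:
with a monic denominator q(x) = x^n + c[n-1]*x^(n-1) + ... + c[0] the first
Horner step degenerates from a multiply-add into a plain addition, which is
what the new F7_FLAG_plusx flag on the highest stored coefficient requests.

    /* Illustrative only.  */
    double horner_monic (const double *c, int n, double x)
    {
      double y = x + c[n - 1];      /* was: y = c[n] * x + c[n - 1] */
      for (int i = n - 2; i >= 0; i--)
        y = y * x + c[i];
      return y;
    }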

Johann

--

LibF7: Use monic denominator polynomials to save a multiplication.

libgcc/config/avr/libf7/
* libf7.h (F7_FLAGNO_plusx, F7_FLAG_plusx): New macros.
* libf7.c (f7_horner): Handle F7_FLAG_plusx in highest coefficient.
* libf7-const.def [F7MOD_atan_]: Denominator: Set F7_FLAG_plusx
and omit highest term.
[F7MOD_asinacos_]: Use rational function with normalized denominator.


diff --git a/libgcc/config/avr/libf7/libf7-const.def 
b/libgcc/config/avr/libf7/libf7-const.def

index 8764c81ffa4..0e4c4d8701e 100644
--- a/libgcc/config/avr/libf7/libf7-const.def
+++ b/libgcc/config/avr/libf7/libf7-const.def
@@ -121,8 +121,7 @@ F7_CONST_DEF (X, 0, 
0xd6,0xa5,0x2d,0x73,0x34,0xd8,0x60, 11)

 F7_CONST_DEF (X, 0, 0xe5,0x08,0xb8,0x24,0x20,0x81,0xe7, 11)
 F7_CONST_DEF (X, 0, 0xe3,0xb3,0x35,0xfa,0xbf,0x1f,0x81, 10)
 F7_CONST_DEF (X, 0, 0xd3,0x89,0x2b,0xb6,0x3e,0x2e,0x05, 8)
-F7_CONST_DEF (X, 0, 0x9f,0xab,0xe9,0xd9,0x35,0xed,0x27, 5)
-F7_CONST_DEF (X, 0, 0x80,0x00,0x00,0x00,0x00,0x00,0x00, 0)
+F7_CONST_DEF (X, 8, 0x9f,0xab,0xe9,0xd9,0x35,0xed,0x27, 5)
 #endif

 #elif defined (SWIFT_3_4)
@@ -147,24 +146,22 @@ F7_CONST_DEF (pi_6, 0, 
0x86,0x0a,0x91,0xc1,0x6b,0x9b,0x2c, -1)

 #endif // which MiniMax

 #elif defined (F7MOD_asinacos_)
-// Relative error < 5.6E-18, quality = 1.0037 (ideal = 1).
+// f(x) = asin(w) / w,  w = sqrt(x/2),  w in [0, 0.5].
+// Relative error < 4.9E-18, Q10 = 21.7
 #if defined (FOR_NUMERATOR)
-// 0.9442491073135027586203 - 
1.035234033892197627842731209x + 
0.35290206232981519813422591897720574012x^2 - 
0.04333483170641685705612351801x^3 + 
0.0012557428614630796315205218507940285622x^4 + 
0.084705471128435769021718764878041684288x^5
-// p = Poly ([Decimal('0.9442491073135027586203'), 
Decimal('-1.0352340338921976278427312087167692142'), 
Decimal('0.35290206232981519813422591897720574012'), 
Decimal('-0.043334831706416857056123518013656946650'), 
Decimal('0.0012557428614630796315205218507940285622'), 
Decimal('0.084705471128435769021718764878041684288')])

-F7_CONST_DEF (X, 0, 0x80,0x00,0x00,0x00,0x00,0x00,0x00, 0)
-F7_CONST_DEF (X, 1, 0x84,0x82,0x8c,0x7f,0xa2,0xf6,0x65, 0)
-F7_CONST_DEF (X, 0, 0xb4,0xaf,0x94,0x40,0xcb,0x86,0x69, -2)
-F7_CONST_DEF (X, 1, 0xb1,0x7f,0xdd,0x4f,0x4e,0xbe,0x1d, -5)
-F7_CONST_DEF (X, 0, 0xa4,0x97,0xbd,0x0b,0x59,0xc9,0x25, -10)
-F7_CONST_DEF (X, 0, 0x8e,0x1c,0xb9,0x0b,0x50,0x6c,0xce, -17)
+// -41050.4389591195072042579 + 43293.8985171424974364797 x - 
15230.0535110759003163511 x^2 + 1996.35047839480810448269 x^3 - 
72.2973010025603956782375 x^4

+F7_CONST_DEF (X, 1, 0xa0,0x5a,0x70,0x5f,0x9f,0xf6,0x90, 15)
+F7_CONST_DEF (X, 0, 0xa9,0x1d,0xe6,0x05,0x38,0x2d,0xec, 15)
+F7_CONST_DEF (X, 1, 0xed,0xf8,0x36,0xcb,0x9b,0x83,0xdd, 13)
+F7_CONST_DEF (X, 0, 0xf9,0x8b,0x37,0x1e,0x77,0x74,0xf9, 10)
+F7_CONST_DEF (X, 1, 0x90,0x98,0x37,0xd6,0x46,0x21,0x3c, 6)
 #elif defined (FOR_DENOMINATOR)
-// 1 - 1.118567367225532923662371649x + 
0.42736600959872448854098334016758333519x^2 - 
0.06355588484963171659942148390x^3 + 
0.0028820878185134035637440105959294542908x^4
-// q = Poly ([Decimal('1'), 
Decimal('-1.1185673672255329236623716486696411533'), 
Decimal('0.42736600959872448854098334016758333519'), 
Decimal('-0.063555884849631716599421483898013782858'), 
Decimal('0.0028820878185134035637440105959294542908')])

-F7_CONST_DEF (X, 0, 0x80,0x00,0x00,0x00,0x00,0x00,0x00, 0)
-F7_CONST_DEF (X, 1, 0x8f,0x2d,0x37,0x2a,0x4d,0xa1,0x57, 0)
-F7_CONST_DEF (X, 0, 0xda,0xcf,0xb7,0xb5,0x4c,0x0d,0xee, -2)
-F7_CONST_DEF (X, 1, 0x82,0x29,0x96,0x77,0x2e,0x19,0xc7, -4)
-F7_CONST_DEF (X, 0, 0xbc,0xe1,0x68,0xec,0xba,0x20,0x29, -9)
+// -41050.4389591195074048679 + 46714.7684304025268691353 x - 
18353.2551497967388796235 x^2 + 2878.9626098308300020834 x^3 - 
150.822900775648362380508 x^4 + x^5

+F7_CONST_DEF (X, 1, 0xa0,0x5a,0x70,0x5f,0x9f,0xf6,0x91, 15)
+F7_CONST_DEF (X, 0, 0xb6,0x7a,0xc4,0xb7,0xda,0xd8,0x1b, 15)
+F7_CONST_DEF (X, 1, 0x8f,0x62,0x82,0xa2,0xfe,0x81,0x26, 14)
+F7_CONST_DEF (X, 0, 0xb3,0xef,0x66,0xd9,0x90,0xe3,0x91, 11)
+F7_CONST_DEF (X, 9, 0x96,0xd2,0xa9,0xa0,0x0f,0x43,0x44, 7)
 #endif

 #elif defined (F7MOD_sincos_)
diff --git a/libgcc/config/avr/libf7/libf7.c 
b/libgcc/config/avr/libf7/libf7.c

index 8fb57ef90cc..373a8a55d90 100644
--- a/libgcc/config/avr/libf7/libf7.c
+++ b/libgcc/config/avr/libf7/libf7.c
@@ -1527,6 +1527,9 @@ void f7_horner (f7_t *cc, const f7_t *xx, uint8_t 
n_coeff, const f7_t *coeff,


   f7_copy_flash (yy, pcoeff);

+  if (yy->flags & F7_FLAG_plusx)
+f7_Iadd (yy, xx);
+
   while (1)
 {
   --pcoeff;
diff --git a/libgcc/config/avr/libf7/libf7.h 
b/libgcc/config/avr/libf7/libf7.h

index 03fe6abe839..3f81b5f1f88 100644
--- a/libgcc/config/avr/libf7/libf7.h
+++ b/libgcc/config/avr/libf7/libf7.h
@@ -47,6 +47,1

Re: [PATCH 1/3] ipa-cp: Templatize filtering of m_agg_values

2023-10-05 Thread Jan Hubicka
> PR 57 points to another place where IPA-CP collected aggregate
> compile-time constants need to be filtered, in addition to the one
> place that already does this in ipa-sra.  In order to re-use code,
> this patch turns the common bit into a template.
> 
> The functionality is still covered by testcase gcc.dg/ipa/pr108959.c.
> 
> gcc/ChangeLog:
> 
> 2023-09-13  Martin Jambor  
> 
>   PR ipa/57
>   * ipa-prop.h (ipcp_transformation): New member function template
>   remove_argaggs_if.
>   * ipa-sra.cc (zap_useless_ipcp_results): Use remove_argaggs_if to
>   filter aggregate constants.
OK,
Honza


[PATCH v4] ipa-utils: avoid uninitialized probabilities on ICF [PR111559]

2023-10-05 Thread Sergei Trofimovich
On Thu, Oct 05, 2023 at 03:04:55PM +0200, Jan Hubicka wrote:
> > diff --git a/gcc/ipa-utils.cc b/gcc/ipa-utils.cc
> > index 956c6294fd7..1355ccac6f0 100644
> > --- a/gcc/ipa-utils.cc
> > +++ b/gcc/ipa-utils.cc
> > @@ -651,13 +651,16 @@ ipa_merge_profiles (struct cgraph_node *dst,
> > {
> >   edge srce = EDGE_SUCC (srcbb, i);
> >   edge dste = EDGE_SUCC (dstbb, i);
> > - dste->probability = 
> > -   dste->probability * dstbb->count.ipa ().probability_in
> > -(dstbb->count.ipa ()
> > - + srccount.ipa ())
> > -   + srce->probability * srcbb->count.ipa ().probability_in
> > -(dstbb->count.ipa ()
> > - + srccount.ipa ());
> > + profile_count sum =
> > +   dstbb->count.ipa () + srccount.ipa ();
> > + if (sum.nonzero_p ())
> > +   dste->probability =
> > + dste->probability * dstbb->count.ipa ().probability_in
> > +  (dstbb->count.ipa ()
> > +   + srccount.ipa ())
> > + + srce->probability * srcbb->count.ipa ().probability_in
> > +  (dstbb->count.ipa ()
> > +   + srccount.ipa ());
> 
> looks good.  You can use probability_in (sum) 
> in both of the places.

Oh, great point! Completely forgot about it. Attached v4.

If it still looks reasonable I'll check again if `python` and
`profiledbootstrap` still survives it and will push.

-- 

  Sergei
From cb9852216b5b2524f72964b399c133557ec98df0 Mon Sep 17 00:00:00 2001
From: Sergei Trofimovich 
Date: Wed, 27 Sep 2023 14:29:12 +0100
Subject: [PATCH v4] ipa-utils: avoid uninitialized probabilities on ICF
 [PR111559]

r14-3459-g0c78240fd7d519 "Check that passes do not forget to define profile"
exposed check failures in cases when gcc produces uninitialized profile
probabilities. In case of PR/111559 uninitialized profile is generated
by edges executed 0 times reported by IPA profile:

$ gcc -O2 -fprofile-generate pr111559.c -o b -fopt-info
$ ./b
$ gcc -O2 -fprofile-use -fprofile-correction pr111559.c -o b -fopt-info

pr111559.c: In function 'rule1':
pr111559.c:6:13: error: probability of edge 3->4 not initialized
6 | static void rule1(void) { if (p) edge(); }
  | ^
during GIMPLE pass: fixup_cfg
pr111559.c:6:13: internal compiler error: verify_flow_info failed

The change conservatively ignores updates with zero execution counts and
uses initially assigned probabilities (`always` probability in case of
the example).

PR ipa/111283
PR gcov-profile/111559

gcc/
* ipa-utils.cc (ipa_merge_profiles): Avoid producing
uninitialized probabilities when merging counters with zero
denominators.

gcc/testsuite/
* gcc.dg/tree-prof/pr111559.c: New test.
---
 gcc/ipa-utils.cc  | 15 ---
 gcc/testsuite/gcc.dg/tree-prof/pr111559.c | 16 
 2 files changed, 24 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-prof/pr111559.c

diff --git a/gcc/ipa-utils.cc b/gcc/ipa-utils.cc
index 956c6294fd7..6024ac69cc2 100644
--- a/gcc/ipa-utils.cc
+++ b/gcc/ipa-utils.cc
@@ -651,13 +651,14 @@ ipa_merge_profiles (struct cgraph_node *dst,
{
  edge srce = EDGE_SUCC (srcbb, i);
  edge dste = EDGE_SUCC (dstbb, i);
- dste->probability = 
-   dste->probability * dstbb->count.ipa ().probability_in
-(dstbb->count.ipa ()
- + srccount.ipa ())
-   + srce->probability * srcbb->count.ipa ().probability_in
-(dstbb->count.ipa ()
- + srccount.ipa ());
+ profile_count sum =
+   dstbb->count.ipa () + srccount.ipa ();
+ if (sum.nonzero_p ())
+   dste->probability =
+ dste->probability * dstbb->count.ipa ().probability_in
+  (sum)
+ + srce->probability * srcbb->count.ipa ().probability_in
+  (sum);
}
  dstbb->count = dstbb->count.ipa () + srccount.ipa ();
}
diff --git a/gcc/testsuite/gcc.dg/tree-prof/pr111559.c 
b/gcc/testsuite/gcc.dg/tree-prof/pr111559.c
new file mode 100644
index 000..43202c6c888
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-prof/pr111559.c
@@ -0,0 +1,16 @@
+/* { dg-options "-O2" } */
+
+__attribute__((noipa

Re: [PATCH 2/3] ipa: Prune any IPA-CP aggregate constants known by modref to be killed (111157)

2023-10-05 Thread Jan Hubicka
> gcc/ChangeLog:
> 
> 2023-09-19  Martin Jambor  
> 
>   PR ipa/57
>   * ipa-prop.h (struct ipa_argagg_value): New flag killed.
>   * ipa-modref.cc (ipcp_argagg_and_kill_overlap_p): New function.
>   (update_signature): Mark any IPA-CP aggregate constants at
>   positions known to be killed as killed.  Move check that there is
>   clone_info after this pruning.
>   * ipa-cp.cc (ipa_argagg_value_list::dump): Dump the killed flag.
>   (ipa_argagg_value_list::push_adjusted_values): Clear the new flag.
>   (push_agg_values_from_plats): Likewise.
>   (ipa_push_agg_values_from_jfunc): Likewise.
>   (estimate_local_effects): Likewise.
>   (push_agg_values_for_index_from_edge): Likewise.
>   * ipa-prop.cc (write_ipcp_transformation_info): Stream the killed
>   flag.
>   (read_ipcp_transformation_info): Likewise.
>   (ipcp_get_aggregate_const): Update comment, assert that encountered
>   record does not have killed flag set.
>   (ipcp_transform_function): Prune all aggregate constants with killed
>   set.
> 
> gcc/testsuite/ChangeLog:
> 
> 2023-09-18  Martin Jambor  
> 
>   PR ipa/57
>   * gcc.dg/lto/pr57_0.c: New test.
>   * gcc.dg/lto/pr57_1.c: Second file of the same new test.

> diff --git a/gcc/ipa-modref.cc b/gcc/ipa-modref.cc
> index c04f9f44c06..a8fcf159259 100644
> --- a/gcc/ipa-modref.cc
> +++ b/gcc/ipa-modref.cc
> @@ -4065,21 +4065,71 @@ remap_kills (vec  &kills, const 
> vec  &map)
>i++;
>  }
>  
> +/* Return true if the V can overlap with KILL.  */
> +
> +static bool
> +ipcp_argagg_and_kill_overlap_p (const ipa_argagg_value &v,
> + const modref_access_node &kill)
> +{
> +  if (kill.parm_index == v.index)
> +{
> +  gcc_assert (kill.parm_offset_known);
> +  gcc_assert (known_eq (kill.max_size, kill.size));
> +  poly_int64 repl_size;
> +  bool ok = poly_int_tree_p (TYPE_SIZE (TREE_TYPE (v.value)),
> +  &repl_size);
> +  gcc_assert (ok);
> +  poly_int64 repl_offset (v.unit_offset);
> +  repl_offset <<= LOG2_BITS_PER_UNIT;
> +  poly_int64 combined_offset
> + = (kill.parm_offset << LOG2_BITS_PER_UNIT) + kill.offset;
parm_offset may be negative, which I think will confuse ranges_maybe_overlap_p.
I think you need to test for this and, if it is negative, adjust
repl_offset instead of kill.offset.
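One possible shape for that (untested, only illustrating the suggestion:
when parm_offset is negative, shift the replacement range instead of the
kill range so neither offset goes negative; whether known_lt or maybe_lt is
the right predicate for a poly offset still needs checking):

      poly_int64 kill_offset = kill.offset;
      if (known_lt (kill.parm_offset, 0))
	/* Shifting both ranges by -parm_offset keeps the overlap test
	   equivalent.  */
	repl_offset -= kill.parm_offset << LOG2_BITS_PER_UNIT;
      else
	kill_offset += kill.parm_offset << LOG2_BITS_PER_UNIT;
      if (ranges_maybe_overlap_p (repl_offset, repl_size,
				  kill_offset, kill.size))
	return true;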
> +  if (ranges_maybe_overlap_p (repl_offset, repl_size,
> +   combined_offset, kill.size))
> + return true;
> +}
> +  return false;
> +}
> +
>  /* If signature changed, update the summary.  */
>  
>  static void
>  update_signature (struct cgraph_node *node)
>  {
> -  clone_info *info = clone_info::get (node);
> -  if (!info || !info->param_adjustments)
> -return;
> -
>modref_summary *r = optimization_summaries
> ? optimization_summaries->get (node) : NULL;
>modref_summary_lto *r_lto = summaries_lto
> ? summaries_lto->get (node) : NULL;
>if (!r && !r_lto)
>  return;
> +
> +  ipcp_transformation *ipcp_ts = ipcp_get_transformation_summary (node);
Please add comment on why this is necessary.
> +  if (ipcp_ts)
> +{
> +for (auto &v : ipcp_ts->m_agg_values)
> +  {
> + if (!v.by_ref)
> +   continue;
> + if (r)
> +   for (const modref_access_node &kill : r->kills)
> + if (ipcp_argagg_and_kill_overlap_p (v, kill))
> +   {
> + v.killed = true;
> + break;
> +   }
> + if (!v.killed && r_lto)
> +   for (const modref_access_node &kill : r_lto->kills)
> + if (ipcp_argagg_and_kill_overlap_p (v, kill))
> +   {
> + v.killed = 1;
 = true?
> + break;
> +   }
> +  }
> +}
> +
> +  clone_info *info = clone_info::get (node);
> +  if (!info || !info->param_adjustments)
> +return;
> +
OK.
Honza


Re: [PATCH v4] ipa-utils: avoid uninitialized probabilities on ICF [PR111559]

2023-10-05 Thread Jan Hubicka
> On Thu, Oct 05, 2023 at 03:04:55PM +0200, Jan Hubicka wrote:
> > > diff --git a/gcc/ipa-utils.cc b/gcc/ipa-utils.cc
> > > index 956c6294fd7..1355ccac6f0 100644
> > > --- a/gcc/ipa-utils.cc
> > > +++ b/gcc/ipa-utils.cc
> > > @@ -651,13 +651,16 @@ ipa_merge_profiles (struct cgraph_node *dst,
> > >   {
> > > edge srce = EDGE_SUCC (srcbb, i);
> > > edge dste = EDGE_SUCC (dstbb, i);
> > > -   dste->probability = 
> > > - dste->probability * dstbb->count.ipa ().probability_in
> > > -  (dstbb->count.ipa ()
> > > -   + srccount.ipa ())
> > > - + srce->probability * srcbb->count.ipa ().probability_in
> > > -  (dstbb->count.ipa ()
> > > -   + srccount.ipa ());
> > > +   profile_count sum =
> > > + dstbb->count.ipa () + srccount.ipa ();
> > > +   if (sum.nonzero_p ())
> > > + dste->probability =
> > > +   dste->probability * dstbb->count.ipa ().probability_in
> > > +(dstbb->count.ipa ()
> > > + + srccount.ipa ())
> > > +   + srce->probability * srcbb->count.ipa ().probability_in
> > > +(dstbb->count.ipa ()
> > > + + srccount.ipa ());
> > 
> > looks good.  You can use probability_in (sum) 
> > in both of the places.
> 
> Oh, great point! Completely forgot about it. Attached v4.
> 
> If it still looks reasonable I'll check again if `python` and
> `profiledbootstrap` still survives it and will push.
Looks good, thanks!
Honza


[avr,committed] Remove all uses of attribute pure from LibF7.

2023-10-05 Thread Georg-Johann Lay

Applied the following patch.

Johann


LibF7: Remove uses of attribute pure.

libgcc/config/avr/libf7/
* libf7.h (F7_PURE): Remove all occurrences.
* libf7.c: Same.

diff --git a/libgcc/config/avr/libf7/libf7.c 
b/libgcc/config/avr/libf7/libf7.c

index 373a8a55d90..0d9e4c325b2 100644
--- a/libgcc/config/avr/libf7/libf7.c
+++ b/libgcc/config/avr/libf7/libf7.c
@@ -352,7 +352,7 @@ float f7_get_float (const f7_t *aa)

   return make_float (mant);
 }
-F7_PURE ALIAS (f7_get_float, f7_truncdfsf2)
+ALIAS (f7_get_float, f7_truncdfsf2)
 #endif // F7MOD_get_float_

 #define DBL_DIG_EXP   11
@@ -572,7 +572,7 @@ int32_t f7_get_s32 (const f7_t *aa)
   extern int32_t to_s32 (const f7_t*, uint8_t) F7ASM(f7_to_integer_asm);
   return to_s32 (aa, 0x1f);
 }
-F7_PURE ALIAS (f7_get_s32, f7_fixdfsi)
+ALIAS (f7_get_s32, f7_fixdfsi)
 #endif // F7MOD_get_s32_


@@ -583,7 +583,7 @@ F7_PURE ALIAS (f7_get_s32, f7_fixdfsi)
   extern int64_t to_s64 (const f7_t*, uint8_t) F7ASM(f7_to_integer_asm);
   return to_s64 (aa, 0x3f);
 }
-F7_PURE ALIAS (f7_get_s64, f7_fixdfdi)
+ALIAS (f7_get_s64, f7_fixdfdi)
 #endif // F7MOD_get_s64_

 #ifdef F7MOD_get_u16_
@@ -603,7 +603,7 @@ uint32_t f7_get_u32 (const f7_t *aa)
   extern uint32_t to_u32 (const f7_t*, uint8_t) F7ASM(f7_to_unsigned_asm);
   return to_u32 (aa, 0x1f);
 }
-F7_PURE ALIAS (f7_get_u32, f7_fixunsdfsi)
+ALIAS (f7_get_u32, f7_fixunsdfsi)
 #endif // F7MOD_get_u32_


@@ -614,7 +614,7 @@ uint64_t f7_get_u64 (const f7_t *aa)
   extern int64_t to_u64 (const f7_t*, uint8_t) F7ASM(f7_to_unsigned_asm);
   return to_u64 (aa, 0x3f);
 }
-F7_PURE ALIAS (f7_get_u64, f7_fixunsdfdi)
+ALIAS (f7_get_u64, f7_fixunsdfdi)
 #endif // F7MOD_get_u64_


diff --git a/libgcc/config/avr/libf7/libf7.h 
b/libgcc/config/avr/libf7/libf7.h

index 3f81b5f1f88..f692854dced 100644
--- a/libgcc/config/avr/libf7/libf7.h
+++ b/libgcc/config/avr/libf7/libf7.h
@@ -36,7 +36,7 @@
 --  Inline asm
 --  Setting assembler names by means of __asm (GNU-C).
 --  Attributes: alias, always_inline, const, noinline, unused,
-progmem, pure, weak, warning
+   progmem, weak, warning
 --  GCC built-ins: __builtin_abort, __builtin_constant_p
 --  AVR built-ins: __builtin_avr_bitsr, __builtin_avr_rbits
 */
@@ -112,7 +112,6 @@ extern "C" {
 #define F7_INLINE   inline __attribute__((__always_inline__))
 #define F7_NOINLINE __attribute__((__noinline__))
 #define F7_WEAK __attribute__((__weak__))
-#define F7_PURE __attribute__((__pure__))
 #define F7_UNUSED   __attribute__((__unused__))
 #define F7_CONST__attribute__((__const__))

@@ -150,7 +149,7 @@ typedef uint64_t f7_double_t;
 #define F7_MANT_HI2(X) \
   (*(uint16_t*) & (X)->mant[F7_MANT_BYTES - 2])

-static F7_INLINE F7_PURE
+static F7_INLINE
 uint8_t f7_classify (const f7_t *aa)
 {
   extern void f7_classify_asm (void);
@@ -361,14 +360,14 @@ f7_t* f7_abs (f7_t *cc, const f7_t *aa)
 }


-F7_PURE extern int8_t f7_cmp (const f7_t*, const f7_t*);
-F7_PURE extern bool f7_lt_impl (const f7_t*, const f7_t*);
-F7_PURE extern bool f7_le_impl (const f7_t*, const f7_t*);
-F7_PURE extern bool f7_gt_impl (const f7_t*, const f7_t*);
-F7_PURE extern bool f7_ge_impl (const f7_t*, const f7_t*);
-F7_PURE extern bool f7_ne_impl (const f7_t*, const f7_t*);
-F7_PURE extern bool f7_eq_impl (const f7_t*, const f7_t*);
-F7_PURE extern bool f7_unord_impl (const f7_t*, const f7_t*);
+extern int8_t f7_cmp (const f7_t*, const f7_t*);
+extern bool f7_lt_impl (const f7_t*, const f7_t*);
+extern bool f7_le_impl (const f7_t*, const f7_t*);
+extern bool f7_gt_impl (const f7_t*, const f7_t*);
+extern bool f7_ge_impl (const f7_t*, const f7_t*);
+extern bool f7_ne_impl (const f7_t*, const f7_t*);
+extern bool f7_eq_impl (const f7_t*, const f7_t*);
+extern bool f7_unord_impl (const f7_t*, const f7_t*);

 static F7_INLINE
 bool f7_lt (const f7_t *aa, const f7_t *bb)
@@ -541,14 +540,14 @@ extern f7_t* f7_set_u32 (f7_t*, uint32_t);
 extern void f7_set_float (f7_t*, float);
 extern void f7_set_pdouble (f7_t*, const f7_double_t*);

-F7_PURE extern int16_t f7_get_s16 (const f7_t*);
-F7_PURE extern int32_t f7_get_s32 (const f7_t*);
-F7_PURE extern int64_t f7_get_s64 (const f7_t*);
-F7_PURE extern uint16_t f7_get_u16 (const f7_t*);
-F7_PURE extern uint32_t f7_get_u32 (const f7_t*);
-F7_PURE extern uint64_t f7_get_u64 (const f7_t*);
-F7_PURE extern float f7_get_float (const f7_t*);
-F7_PURE extern f7_double_t f7_get_double (const f7_t*);
+extern int16_t f7_get_s16 (const f7_t*);
+extern int32_t f7_get_s32 (const f7_t*);
+extern int64_t f7_get_s64 (const f7_t*);
+extern uint16_t f7_get_u16 (const f7_t*);
+extern uint32_t f7_get_u32 (const f7_t*);
+extern uint64_t f7_get_u64 (const f7_t*);
+extern float f7_get_float (const f7_t*);
+extern f7_double_t f7_get_double (const f7_t*);

 #if USE_LPM == 1
   #define F7_PGMSPACE __attribute__((__progmem__))
@@ -639,10 +638,10 @@ extern void f7_horner (f7_t*, const f7_t*, 
uint8_t, const f7_t *coeff, f7_t*);

 ex

Re: [RFC] expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg [target/111466]

2023-10-05 Thread Robin Dapp
> So I think Kenner's code is trying to prevent having a value in a
> SUBREG that is inconsistent with the SUBREG_PROMOTED* flag bits.  But
> I think it's been unnecessary since Matz's rewrite in 2009.

I couldn't really tell what the rewrite does entirely so I tried creating
a case where we would require the SUBREG_PROMOTED_VAR but couldn't come
up with any.  At least for the most common path through expr I believe
I know why:

So our case is when we have an SI subreg from a DI reg that is originally
a sign-extended SI.  Then we NOP-convert the SI subreg from signed to
unsigned.  We only perform implicit sign extensions therefore we can
omit the implicit zero-extension case here.
The way the result of the signed->unsigned conversion is used determines
whether we can use SUBREG_PROMOTED_VAR.  There are two possibilities
(1) and (2).

 void foo (int bar)
 {
unsigned int u = (unsigned int) bar;


(1) unsigned long long ul = (unsigned long long) u;

As long as the result is used unsigned, we will always perform a zero
extension no matter the "Kenner hunk" (because whether the subreg has
SRP_SIGNED or !SUBREG_PROMOTED_VAR does not change the need for a
zero_extend).


(2) long long l = (long long) u;

SUBREG_PROMOTED is checked by the following in convert_move:

  scalar_int_mode to_int_mode;
  if (GET_CODE (from) == SUBREG
  && SUBREG_PROMOTED_VAR_P (from)
  && is_a <scalar_int_mode> (to_mode, &to_int_mode)
  && (GET_MODE_PRECISION (subreg_promoted_mode (from))
  >= GET_MODE_PRECISION (to_int_mode))
  && SUBREG_CHECK_PROMOTED_SIGN (from, unsignedp))

The SUBREG_CHECK_PROMOTED_SIGN (from, unsignedp) is decisive
as far as I can tell.  unsignedp = 1 comes from treeop0 so our
"from" (i.e. unsigned int u).
With the "Kenner hunk" SUBREG_PROMOTED_VAR is unset, so we don't
strip the extension.  Without it, SUBREG_PROMOTED_VAR () == SRP_SIGNED
which is != unsignedp, so no stripping either.

Now there are several other paths that would need auditing as well
but at least this one is safe.  An interesting test target would be
a backend that does implicit zero extensions but as we haven't seen
fallout so far chances to find a trigger are slim.

Does that make sense?

Regards
 Robin



Re: [PATCH 01/22] Add condition coverage profiling

2023-10-05 Thread Jørgen Kvalsvik

On 05/10/2023 21:59, Jan Hubicka wrote:


Like Whalen et al this implementation records coverage in fixed-size
bitsets which gcov knows how to interpret. This is very fast, but
introduces a limit on the number of terms in a single boolean
expression, the number of bits in a gcov_unsigned_type (which is
typedef'd to uint64_t), so for most practical purposes this would be
acceptable. This limitation is in the implementation and not the
algorithm, so support for more conditions can be added by also
introducing arbitrary-sized bitsets.


This should not be too hard to do - if the conditional is more complex you
simply introduce more than one counter for it, right?
How many times does this trigger on the GCC sources?


It shouldn't be, no. But when dynamic bitsets are on the table it would 
be much better to length-encode in smaller multiples than the 64-bit 
counters. Most expressions are small (<4 terms), so the savings would be 
substantial. I opted for the fixed-size approach to start with because it
is much simpler and would not introduce any branching or decisions in
the instrumentation.




For space overhead, the instrumentation needs two accumulators
(gcov_unsigned_type) per condition in the program which will be written
to the gcov file. In addition, every function gets a pair of local
accumulators, but these accumulators are reused between conditions in the
same function.

For time overhead, there is a zeroing of the local accumulators for
every condition and one or two bitwise operations on every edge taken in
an expression.

In action it looks pretty similar to the branch coverage. The -g short
opt carries no significance, but was chosen because it was an available
option with the upper-case free too.

gcov --conditions:

 3:   17:void fn (int a, int b, int c, int d) {
 3:   18:if ((a && (b || c)) && d)
conditions covered 3/8
condition  0 not covered (true)
condition  0 not covered (false)
condition  1 not covered (true)
condition  2 not covered (true)
condition  3 not covered (true)

It seems understandable, but for bigger conditionals I guess it will be
a bit hard to relate the condition numbers to the actual source
code.  We could probably also show the conditions as ranges in the
conditional?  I am adding David Malcolm to CC, he may have some ideas.

I wonder how much this information is confused by early optimizations
happening before coverage profiling?


Some expressions, mostly those without else-blocks, are effectively
"rewritten" in the CFG construction making the algorithm unable to
distinguish them:

and.c:

 if (a && b && c)
 x = 1;

ifs.c:

 if (a)
 if (b)
 if (c)
 x = 1;

gcc will build the same graph for both these programs, and gcov will
report boths as 3-term expressions. It is vital that it is not
interpreted the other way around (which is consistent with the shape of
the graph) because otherwise the masking would be wrong for the and.c
program which is a more severe error. While surprising, users would
probably expect some minor rewriting of semantically-identical
expressions.

and.c.gcov:
 #:2:if (a && b && c)
conditions covered 6/6
 #:3:x = 1;

ifs.c.gcov:
 #:2:if (a)
 #:3:if (b)
 #:4:if (c)
 #:5:x = 1;
conditions covered 6/6


Maybe one can use location information to distinguish those cases?
Don't we store discriminator info about individual statements that is also used 
for
auto-FDO?


That is one possibility, which I tried for a bit, but abandoned to focus 
on getting the rest of the algorithm right. I am sure it can be 
revisited (possibly as a future improvement) and weighed against always 
emitting an else block (see 
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631254.html)




gcc/ChangeLog:

* builtins.cc (expand_builtin_fork_or_exec): Check
profile_condition_flag.
 * collect2.cc (main): Add -fno-profile-conditions to OBSTACK.
* common.opt: Add new options -fprofile-conditions and
* doc/gcov.texi: Add --conditions documentation.
* doc/invoke.texi: Add -fprofile-conditions documentation.
* gcc.cc: Link gcov on -fprofile-conditions.
* gcov-counter.def (GCOV_COUNTER_CONDS): New.
* gcov-dump.cc (tag_conditions): New.
* gcov-io.h (GCOV_TAG_CONDS): New.
(GCOV_TAG_CONDS_LENGTH): Likewise.
(GCOV_TAG_CONDS_NUM): Likewise.
* gcov.cc (class condition_info): New.
(condition_info::condition_info): New.
(condition_info::popcount): New.
(struct coverage_info): New.
(add_condition_counts): New.
(output_conditions): New.
(print_usage): Add -g, --conditions.
(process_args): Likewise.
(output_intermediate_json_line): Output conditions.
(read_graph_file): Read conditions counters.
(read_count_file): Read

Re: [PATCH] RISC-V: Fix the riscv_legitimize_poly_move issue on targets where the minimal VLEN exceeds 512.

2023-10-05 Thread Kito Cheng
Hi Robin:

Your suggested code seems work fine, let me run more test and send v2, I
guess I just don’t know how to explain why it work in comment :p

Robin Dapp 於 2023年10月5日 週四,03:57寫道:

> >> I think the "max poly value" is the LMUL 1 mode coeffs[1]
> >>
> >> See int vlenb = BYTES_PER_RISCV_VECTOR.coeffs[1];
> >>
> >> So I think bump max_power to exact_log2 (64); is not enough.
> >> since we adjust the LMUL 1 mode size according to TARGET_MIN_VLEN.
> >>
> >> I suspect the testcase you append in this patch will fail with
> -march=rv64gcv_zvl4096b.
> >
> >
> > There is no type smaller than  [64, 64] in zvl4096b, RVVMF64BI is [64,
> > 64], its smallest type, and RVVFM1BI is [512, 512] (size of single
> > vector reg.) which at most 64x for zvl4096b, so my understanding is
> > log2(64) is enough :)
> >
> > and of course, verified the testcase works with -march=rv64gcv_zvl4096b
>
> I was wondering if the whole hunk couldn't be condensed into something
> like (untested):
>
>   div_factor = wi::ctz (factor) - wi::ctz (vlenb);
>   if (div_factor >= 0)
> div_factor = 1;
>   else
> div_factor = 1 << -div_factor;
>
> This would avoid the loop as well.  An assert for the div_factor (not
> exceeding a value) could still be added.
>
> Regards
>  Robin
>


RE: [PATCH]middle-end: Recursively check is_trivially_copyable_or_pair in vec.h

2023-10-05 Thread Tamar Christina
> On Tue, Oct 03, 2023 at 11:41:01AM +, Tamar Christina wrote:
> > > We have stablesort method instead of qsort but that would require
> > > consistent ordering in the vector (std::sort doesn't ensure stable
> > > sorting either).
> > >
> > > If it is a non-issue, the patch is ok with the above nits fixed.
> > > Otherwise perhaps we'd need to push in the first loop into the vector (but
> that
> > >   if (!phi_arg_map.get (arg))
> > >   args.quick_push (arg);
> > >   phi_arg_map.get_or_insert (arg).safe_push (i); in there was
> > > quite inefficient, better would be
> > >   bool existed;
> > >   phi_arg_map.get_or_insert (arg, &existed).safe_push (i);
> > >   if (!existed)
> > >   args.safe_push (ifcvt_arg_entry { arg, 0, 0, vNULL }); or something
> > > similar), plus use stablesort.  Or add another compared member which
> > > would be the first position.
> >
> > Hmm the problem here is that it would make the second loop that fills
> > in the len quadratic as it has to search for arg in the list.  I
> > suppose I could push a pointer to the struct instead of `i` in the
> > hashmap and the element into args and update the pointer as we go along?
> Would that work?
> 
> Only if the second loop traverses the hashmap elements and for each tries to
> find the corresponding vector element.
> If instead you do what you've done before in the second loop, walk the vector
> and for each arg in there lookup phi_args_map.get (v.arg) (but please just
> once, vanilla trunk looks it up twice in
>   for (int index : phi_arg_map.get (args[i]))
> {
>   edge e = gimple_phi_arg_edge (phi, index);
>   len += get_bb_num_predicate_stmts (e->src);
> }
> 
>   unsigned occur = phi_arg_map.get (args[i])->length (); ), then I don't 
> think
> it would be quadratic.

Fair enough, here's the updated patch. It should address all the concerns
and clean up the code 😊

Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-linux-gnu
and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-if-conv.cc (INCLUDE_ALGORITHM): Remove.
(typedef struct ifcvt_arg_entry): New.
(cmp_arg_entry): New.
(gen_phi_arg_condition, gen_phi_nest_statement,
predicate_scalar_phi): Use them.

--- inline copy of patch ---

diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 
f76e0d8f2e6e0f59073fa8484b0b2c7a6cdc9783..635fce7a69af254dbc5aa9f829e6a053671d1d2c
 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -80,7 +80,6 @@ along with GCC; see the file COPYING3.  If not see
  :;
 */
 
-#define INCLUDE_ALGORITHM
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
@@ -1937,11 +1936,32 @@ gen_simplified_condition (tree cond, 
scalar_cond_masked_set_type &cond_set)
   return cond;
 }
 
+/* Structure used to track meta-data on PHI arguments used to generate
+   the most efficient comparison sequence to flatten a PHI node.  */
+
+typedef struct ifcvt_arg_entry
+{
+  /* The PHI node argument value.  */
+  tree arg;
+
+  /* The number of compares required to reach this PHI node from start of the
+ BB being if-converted.  */
+  unsigned num_compares;
+
+  /* The number of times this PHI node argument appears in the current PHI
+ node.  */
+  unsigned occurs;
+
+  /* The indices at which this PHI arg occurs inside the PHI node.  */
+  vec <int> *indexes;
+} ifcvt_arg_entry_t;
+
 /* Produce condition for all occurrences of ARG in PHI node.  Set *INVERT
as to whether the condition is inverted.  */
 
 static tree
-gen_phi_arg_condition (gphi *phi, vec<int> *occur, gimple_stmt_iterator *gsi,
+gen_phi_arg_condition (gphi *phi, ifcvt_arg_entry_t &arg,
+  gimple_stmt_iterator *gsi,
   scalar_cond_masked_set_type &cond_set, bool *invert)
 {
   int len;
@@ -1951,11 +1971,11 @@ gen_phi_arg_condition (gphi *phi, vec *occur, 
gimple_stmt_iterator *gsi,
   edge e;
 
   *invert = false;
-  len = occur->length ();
+  len = arg.indexes->length ();
   gcc_assert (len > 0);
   for (i = 0; i < len; i++)
 {
-  e = gimple_phi_arg_edge (phi, (*occur)[i]);
+  e = gimple_phi_arg_edge (phi, (*arg.indexes)[i]);
   c = bb_predicate (e->src);
   if (is_true_predicate (c))
{
@@ -2020,22 +2040,21 @@ gen_phi_arg_condition (gphi *phi, vec *occur, 
gimple_stmt_iterator *gsi,
 static tree
 gen_phi_nest_statement (gphi *phi, gimple_stmt_iterator *gsi,
scalar_cond_masked_set_type &cond_set, tree type,
-   hash_map<tree_operand_hash, auto_vec<int>> &phi_arg_map,
-   gimple **res_stmt, tree lhs0, vec &args,
-   unsigned idx)
+   gimple **res_stmt, tree lhs0,
+   vec &args, unsigned idx)
 {
   if (idx == args.length ())
-return args[idx - 1];
+return args[idx - 1].arg;
 
-  vec<int> *indexes = phi_arg_map.get (args[idx - 1]);
   bool invert;
-  tree cond = gen_phi_arg_condition (phi, indexes, gsi, cond_se

Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-05 Thread Robin Dapp
Hi Tamar,

> The only comment I have is whether you actually need this helper
> function? It looks like all the uses of it are in cases you have, or
> will call conditional_internal_fn_code directly.
removed the cond_fn_p entirely in the attached v3.

Bootstrapped and regtested on x86_64, aarch64 and power10.

Regards
 Robin

Subject: [PATCH v3] ifcvt/vect: Emit COND_ADD for conditional scalar
 reduction.

As described in PR111401 we currently emit a COND and a PLUS expression
for conditional reductions.  This makes it difficult to combine both
into a masked reduction statement later.
This patch improves that by directly emitting a COND_ADD during ifcvt and
adjusting some vectorizer code to handle it.
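Roughly (GIMPLE-style pseudocode with made-up SSA names, not lifted from the
patch), for a loop body like "if (cond[i]) res += a[i];" this means emitting

  res_1 = .COND_ADD (cond_i, res_0, a_i, res_0);  /* cond_i ? res_0 + a_i : res_0 */

instead of the previous pair

  _t    = cond_i ? a_i : 0.0;
  res_1 = res_0 + _t;

so the mask is already attached to the addition by the time the vectorizer
looks at the reduction.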

It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS
is true.
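The -0.0 choice matters because, with signed zeros honored, 0.0 is not
neutral for addition: (-0.0) + 0.0 yields +0.0 under round-to-nearest,
whereas adding -0.0 leaves every value unchanged, including -0.0 itself.
The new signed-zero runtime test below exercises exactly that corner.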

gcc/ChangeLog:

PR middle-end/111401
* tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_ADD
if supported.
(predicate_scalar_phi): Add whitespace.
* tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_ADD.
(neutral_op_for_reduction): Return -0 for PLUS.
(vect_is_simple_reduction): Don't count else operand in
COND_ADD.
(vect_create_epilog_for_reduction): Fix whitespace.
(vectorize_fold_left_reduction): Add COND_ADD handling.
(vectorizable_reduction): Don't count else operand in COND_ADD.
(vect_transform_reduction): Add COND_ADD handling.
* tree-vectorizer.h (neutral_op_for_reduction): Add default
parameter.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
* gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
---
 .../vect-cond-reduc-in-order-2-signed-zero.c  | 141 
 .../riscv/rvv/autovec/cond/pr111401.c | 139 
 gcc/tree-if-conv.cc   |  63 ++--
 gcc/tree-vect-loop.cc | 150 ++
 gcc/tree-vectorizer.h |   2 +-
 5 files changed, 451 insertions(+), 44 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c 
b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
new file mode 100644
index 000..7b46e7d8a2a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
@@ -0,0 +1,141 @@
+/* Make sure a -0 stays -0 when we perform a conditional reduction.  */
+/* { dg-do run } */
+/* { dg-require-effective-target vect_double } */
+/* { dg-add-options ieee } */
+/* { dg-additional-options "-std=gnu99 -fno-fast-math" } */
+
+#include "tree-vect.h"
+
+#include 
+
+#define N (VECTOR_BITS * 17)
+
+double __attribute__ ((noinline, noclone))
+reduc_plus_double (double *restrict a, double init, int *cond, int n)
+{
+  double res = init;
+  for (int i = 0; i < n; i++)
+if (cond[i])
+  res += a[i];
+  return res;
+}
+
+double __attribute__ ((noinline, noclone, optimize ("0")))
+reduc_plus_double_ref (double *restrict a, double init, int *cond, int n)
+{
+  double res = init;
+  for (int i = 0; i < n; i++)
+if (cond[i])
+  res += a[i];
+  return res;
+}
+
+double __attribute__ ((noinline, noclone))
+reduc_minus_double (double *restrict a, double init, int *cond, int n)
+{
+  double res = init;
+  for (int i = 0; i < n; i++)
+if (cond[i])
+  res -= a[i];
+  return res;
+}
+
+double __attribute__ ((noinline, noclone, optimize ("0")))
+reduc_minus_double_ref (double *restrict a, double init, int *cond, int n)
+{
+  double res = init;
+  for (int i = 0; i < n; i++)
+if (cond[i])
+  res -= a[i];
+  return res;
+}
+
+int __attribute__ ((optimize (1)))
+main ()
+{
+  int n = 19;
+  double a[N];
+  int cond1[N], cond2[N];
+
+  for (int i = 0; i < N; i++)
+{
+  a[i] = (i * 0.1) * (i & 1 ? 1 : -1);
+  cond1[i] = 0;
+  cond2[i] = i & 4 ? 1 : 0;
+  asm volatile ("" ::: "memory");
+}
+
+  double res1 = reduc_plus_double (a, -0.0, cond1, n);
+  double ref1 = reduc_plus_double_ref (a, -0.0, cond1, n);
+  double res2 = reduc_minus_double (a, -0.0, cond1, n);
+  double ref2 = reduc_minus_double_ref (a, -0.0, cond1, n);
+  double res3 = reduc_plus_double (a, -0.0, cond1, n);
+  double ref3 = reduc_plus_double_ref (a, -0.0, cond1, n);
+  double res4 = reduc_minus_double (a, -0.0, cond1, n);
+  double ref4 = reduc_minus_double_ref (a, -0.0, cond1, n);
+
+  if (res1 != ref1 || signbit (res1) != signbit (ref1))
+__builtin_abort ();
+  if (res2 != ref2 || signbit (res2) != signbit (ref2))
+__builtin_abort ();
+  if (res3 != ref3 || signbit (res3) != signbit (ref3))
+__builtin_abort ();
+  if (res4 != ref4 || signbit (res4) != signbit (ref4))
+__builtin_abort ();
+
+  res1 = reduc_plus_double (a, 0.0, cond1, n);
+  ref1 = reduc_plus_double_r

[PATCH v5] Add condition coverage profiling

2023-10-05 Thread Jørgen Kvalsvik
This patch adds support in gcc+gcov for modified condition/decision
coverage (MC/DC) with the -fprofile-conditions flag. MC/DC is a type of
test/code coverage and it is particularly important in the aviation and
automotive industries for safety-critical applications. MC/DC is
required for or recommended by:

* DO-178C for the most critical software (Level A) in avionics
* IEC 61508 for SIL 4
* ISO 26262-6 for ASIL D

From the SQLite webpage:

Two methods of measuring test coverage were described above:
"statement" and "branch" coverage. There are many other test
coverage metrics besides these two. Another popular metric is
"Modified Condition/Decision Coverage" or MC/DC. Wikipedia defines
MC/DC as follows:

* Each decision tries every possible outcome.
* Each condition in a decision takes on every possible outcome.
* Each entry and exit point is invoked.
* Each condition in a decision is shown to independently affect
  the outcome of the decision.

In the C programming language where && and || are "short-circuit"
operators, MC/DC and branch coverage are very nearly the same thing.
The primary difference is in boolean vector tests. One can test for
any of several bits in bit-vector and still obtain 100% branch test
coverage even though the second element of MC/DC - the requirement
that each condition in a decision take on every possible outcome -
might not be satisfied.

https://sqlite.org/testing.html#mcdc
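As a tiny worked illustration of the independence requirement (not part of
the patch or of the SQLite text): for the two-term decision "a && b" the
three tests

    a=1 b=1  ->  true
    a=0 b=-  ->  false   (b short-circuited)
    a=1 b=0  ->  false

achieve MC/DC, since the first and second differ only in "a" and flip the
outcome, and the first and third differ only in "b" and flip the outcome;
in general N+1 tests suffice for N short-circuited terms.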

Whalen, Heimdahl, and De Silva "Efficient Test Coverage Measurement for
MC/DC" describes an algorithm for adding instrumentation by carrying
over information from the AST, but my algorithm analyses the control
flow graph to instrument for coverage. This has the benefit of being
programming language independent and faithful to compiler decisions
and transformations. I have primarily tested it on C and C++, see
testsuite/gcc.misc-tests and testsuite/g++.dg, and run some manual tests
using D, Rust, and Go. D and Rust mostly behave as you would expect (I
found D would sometimes put the conditions for lambdas into the module).
It does not work as expected for Go as the go front-end evaluates
multi-conditional expressions by folding results into temporaries.

Like Whalen et al this implementation records coverage in fixed-size
bitsets which gcov knows how to interpret. This is very fast, but
introduces a limit on the number of terms in a single boolean
expression, the number of bits in a gcov_unsigned_type (which is
typedef'd to uint64_t), so for most practical purposes this would be
acceptable. This limitation is in the implementation and not the
algorithm, so support for more conditions can be added by also
introducing arbitrary-sized bitsets.

For space overhead, the instrumentation needs two accumulators
(gcov_unsigned_type) per condition in the program which will be written
to the gcov file. In addition, every function gets a pair of local
accumulators, but these accumulators are reused between conditions in the
same function.

For time overhead, there is a zeroing of the local accumulators for
every condition and one or two bitwise operations on every edge taken in
an expression.
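In rough C-like pseudocode (illustration only; the real instrumentation is
emitted as GIMPLE, the names conds_true, id and k are made up here, and the
masking step described further down is omitted), the bookkeeping for one
decision looks like:

    uint64_t conds_true[NCONDS], conds_false[NCONDS]; /* written to .gcda */

    /* Per function: one pair of locals, reused for every decision.  */
    uint64_t t = 0, f = 0;          /* zeroed when a decision is entered */
    /* on the edge where term k evaluates to true:  t |= 1ULL << k;  */
    /* on the edge where term k evaluates to false: f |= 1ULL << k;  */
    /* when the decision is left: */
    conds_true[id]  |= t;
    conds_false[id] |= f;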

In action it looks pretty similar to the branch coverage. The -g short
opt carries no significance, but was chosen because it was an available
option with the upper-case free too.

gcov --conditions:

3:   17:void fn (int a, int b, int c, int d) {
3:   18:if ((a && (b || c)) && d)
conditions covered 3/8
condition  0 not covered (true)
condition  0 not covered (false)
condition  1 not covered (true)
condition  2 not covered (true)
condition  3 not covered (true)
1:   19:x = 1;
-:   20:else
2:   21:x = 2;
3:   22:}

gcov --conditions --json-format:

"conditions": [
{
"not_covered_false": [
0
],
"count": 8,
"covered": 3,
"not_covered_true": [
0,
1,
2,
3
]
}
],

Some expressions, mostly those without else-blocks, are effectively
"rewritten" in the CFG construction making the algorithm unable to
distinguish them:

and.c:

if (a && b && c)
x = 1;

ifs.c:

if (a)
if (b)
if (c)
x = 1;

gcc will build the same graph for both these programs, and gcov will
report both as 3-term expressions. It is vital that it is not
interpreted the other way around (which is consistent with the shape of
the graph) because otherwise the masking would be wrong for and.c which
is a more severe error. While surprising, users would probably expect
some minor rewriting of semantically-identical expressions. I think this
is something that can be improved on later.

and.c.gcov:
#:2:if (a && b && c)
conditions covered 6/6
#:3:   

Re: [PATCH]middle-end: Recursively check is_trivially_copyable_or_pair in vec.h

2023-10-05 Thread Jakub Jelinek
On Thu, Oct 05, 2023 at 02:01:40PM +, Tamar Christina wrote:
> gcc/ChangeLog:
> 
>   * tree-if-conv.cc (INCLUDE_ALGORITHM): Remove.
>   (typedef struct ifcvt_arg_entry): New.
>   (cmp_arg_entry): New.
>   (gen_phi_arg_condition, gen_phi_nest_statement,
>   predicate_scalar_phi): Use them.

> -  /* Compute phi_arg_map.  */
> +  /* Compute phi_arg_map, determine the list of unique PHI args and the 
> indices
> + where they are in the PHI node.  The indices will be used to determine
> + the conditions to apply and their complexity.  */
> +  auto_vec<tree> unique_args (num_args);
>for (i = 0; i < num_args; i++)
>  {
>tree arg;
>  
>arg = gimple_phi_arg_def (phi, i);
>if (!phi_arg_map.get (arg))
> - args.quick_push (arg);
> + unique_args.quick_push (arg);
>phi_arg_map.get_or_insert (arg).safe_push (i);
>  }

I meant instead of using another vector (unique_args) just do
args.quick_push ({ arg, 0, 0, NULL });
above (to avoid needing another allocation etc.).
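i.e. (a sketch of the suggested shape, untested) the first loop becomes

	  for (i = 0; i < num_args; i++)
	    {
	      tree arg = gimple_phi_arg_def (phi, i);
	      if (!phi_arg_map.get (arg))
		args.quick_push ({ arg, 0, 0, NULL });
	      phi_arg_map.get_or_insert (arg).safe_push (i);
	    }

and the second loop then fills in num_compares/occurs/indexes on the entries
already sitting in "args", so unique_args and its extra allocation go away.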

> -  /* Determine element with max number of occurrences and complexity.  
> Looking at only
> - number of occurrences as a measure for complexity isn't enough as all 
> usages can
> - be unique but the comparisons to reach the PHI node differ per branch.  
> */
> -  typedef std::pair <tree, std::pair <unsigned, unsigned>> ArgEntry;
> -  auto_vec<ArgEntry> argsKV;
> -  for (i = 0; i < args.length (); i++)
> +  /* Determine element with max number of occurrences and complexity.  
> Looking
> + at only number of occurrences as a measure for complexity isn't enough 
> as
> + all usages can be unique but the comparisons to reach the PHI node 
> differ
> + per branch.  */
> +  for (auto arg : unique_args)

And then
  for (auto &entry : args)
here with entry.arg instead of arg and

>  {
>unsigned int len = 0;
> -  for (int index : phi_arg_map.get (args[i]))
> +  vec<int> *indices = phi_arg_map.get (arg);
> +  for (int index : *indices)
>   {
> edge e = gimple_phi_arg_edge (phi, index);
> len += get_bb_num_predicate_stmts (e->src);
>   }
>  
> -  unsigned occur = phi_arg_map.get (args[i])->length ();
> +  unsigned occur = indices->length ();
>if (dump_file && (dump_flags & TDF_DETAILS))
>   fprintf (dump_file, "Ranking %d as len=%d, idx=%d\n", i, len, occur);
> -  argsKV.safe_push ({ args[i], { len, occur }});
> +  args.safe_push ({ arg, len, occur, indices });

either
  entry.num_compares = len;
  entry.occur = occur;
  entry.indices = indices;
here or just using entry.{num_occurrences,occur,indices} directly
instead of the extra automatic vars.

>  }
>  
> +  unique_args.release ();

Plus drop this.

Though, if Richi or Jeff think this is ok as is, I won't stand against it.

Jakub



RE: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-05 Thread Tamar Christina
Hi Robin,

> -Original Message-
> From: Robin Dapp 
> Sent: Thursday, October 5, 2023 3:06 PM
> To: Tamar Christina ; gcc-patches  patc...@gcc.gnu.org>; Richard Biener 
> Cc: rdapp@gmail.com
> Subject: Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar
> reduction.
> 
> Hi Tamar,
> 
> > The only comment I have is whether you actually need this helper
> > function? It looks like all the uses of it are in cases you have, or
> > will call conditional_internal_fn_code directly.
> removed the cond_fn_p entirely in the attached v3.
> 
> Bootstrapped and regtested on x86_64, aarch64 and power10.
> 

Changes look good to me thanks! I'll leave it up to Richi for final approval.

Regards,
Tamar

> Regards
>  Robin
> 
> Subject: [PATCH v3] ifcvt/vect: Emit COND_ADD for conditional scalar
> reduction.
> 
> As described in PR111401 we currently emit a COND and a PLUS expression
> for conditional reductions.  This makes it difficult to combine both into a
> masked reduction statement later.
> This patch improves that by directly emitting a COND_ADD during ifcvt and
> adjusting some vectorizer code to handle it.
> 
> It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS is
> true.
> 
> gcc/ChangeLog:
> 
>   PR middle-end/111401
>   * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_ADD
>   if supported.
>   (predicate_scalar_phi): Add whitespace.
>   * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_ADD.
>   (neutral_op_for_reduction): Return -0 for PLUS.
>   (vect_is_simple_reduction): Don't count else operand in
>   COND_ADD.
>   (vect_create_epilog_for_reduction): Fix whitespace.
>   (vectorize_fold_left_reduction): Add COND_ADD handling.
>   (vectorizable_reduction): Don't count else operand in COND_ADD.
>   (vect_transform_reduction): Add COND_ADD handling.
>   * tree-vectorizer.h (neutral_op_for_reduction): Add default
>   parameter.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
>   * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
> ---
>  .../vect-cond-reduc-in-order-2-signed-zero.c  | 141 
>  .../riscv/rvv/autovec/cond/pr111401.c | 139 
>  gcc/tree-if-conv.cc   |  63 ++--
>  gcc/tree-vect-loop.cc | 150 ++
>  gcc/tree-vectorizer.h |   2 +-
>  5 files changed, 451 insertions(+), 44 deletions(-)  create mode 100644
> gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
>  create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-
> zero.c b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> new file mode 100644
> index 000..7b46e7d8a2a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> @@ -0,0 +1,141 @@
> +/* Make sure a -0 stays -0 when we perform a conditional reduction.  */
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_double } */
> +/* { dg-add-options ieee } */
> +/* { dg-additional-options "-std=gnu99 -fno-fast-math" } */
> +
> +#include "tree-vect.h"
> +
> +#include 
> +
> +#define N (VECTOR_BITS * 17)
> +
> +double __attribute__ ((noinline, noclone)) reduc_plus_double (double
> +*restrict a, double init, int *cond, int n) {
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +if (cond[i])
> +  res += a[i];
> +  return res;
> +}
> +
> +double __attribute__ ((noinline, noclone, optimize ("0")))
> +reduc_plus_double_ref (double *restrict a, double init, int *cond, int
> +n) {
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +if (cond[i])
> +  res += a[i];
> +  return res;
> +}
> +
> +double __attribute__ ((noinline, noclone)) reduc_minus_double (double
> +*restrict a, double init, int *cond, int n) {
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +if (cond[i])
> +  res -= a[i];
> +  return res;
> +}
> +
> +double __attribute__ ((noinline, noclone, optimize ("0")))
> +reduc_minus_double_ref (double *restrict a, double init, int *cond, int
> +n) {
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +if (cond[i])
> +  res -= a[i];
> +  return res;
> +}
> +
> +int __attribute__ ((optimize (1)))
> +main ()
> +{
> +  int n = 19;
> +  double a[N];
> +  int cond1[N], cond2[N];
> +
> +  for (int i = 0; i < N; i++)
> +{
> +  a[i] = (i * 0.1) * (i & 1 ? 1 : -1);
> +  cond1[i] = 0;
> +  cond2[i] = i & 4 ? 1 : 0;
> +  asm volatile ("" ::: "memory");
> +}
> +
> +  double res1 = reduc_plus_double (a, -0.0, cond1, n);  double ref1 =
> + reduc_plus_double_ref (a, -0.0, cond1, n);  double res2 =
> + reduc_minus_double (a, -0.0, cond1, n);  double ref2 =
> + reduc_minus_double_ref (a, -0.0, cond1, n);  double res3 =
> + reduc_plu

Re: [PATCH 01/22] Add condition coverage profiling

2023-10-05 Thread Jørgen Kvalsvik

On 05/10/2023 22:39, Jørgen Kvalsvik wrote:

On 05/10/2023 21:59, Jan Hubicka wrote:


Like Wahlen et al this implementation records coverage in fixed-size
bitsets which gcov knows how to interpret. This is very fast, but
introduces a limit on the number of terms in a single boolean
expression, the number of bits in a gcov_unsigned_type (which is
typedef'd to uint64_t), so for most practical purposes this would be
acceptable. This limitation is in the implementation and not the
algorithm, so support for more conditions can be added by also
introducing arbitrary-sized bitsets.


This should not be too hard to do - if a conditional is more complex you
simply introduce more than one counter for it, right?
How many times does this trigger on GCC sources?


It shouldn't be, no. But when dynamic bitsets are on the table it would 
be much better to length-encode in smaller multiples than the 64-bit 
counters. Most expressions are small (<4 terms), so the savings would be 
substantial. I opted for the simpler fixed-size to start with because it 
is much simpler and would not introduce any branching or decisions in 
the instrumentation.


Oh, and I forgot - I have never seen a real-world case that has had 
more than 64 conditions, but I suppose it may happen with generated code.






For space overhead, the instrumentation needs two accumulators
(gcov_unsigned_type) per condition in the program which will be written
to the gcov file. In addition, every function gets a pair of local
accumulators, but these accumulators are reused between conditions in the
same function.

For time overhead, there is a zeroing of the local accumulators for
every condition and one or two bitwise operations on every edge taken in
an expression.

In action it looks pretty similar to the branch coverage. The -g short
opt carries no significance, but was chosen because it was an available
option with the upper-case free too.

gcov --conditions:

 3:   17:void fn (int a, int b, int c, int d) {
 3:   18:    if ((a && (b || c)) && d)
conditions covered 3/8
condition  0 not covered (true)
condition  0 not covered (false)
condition  1 not covered (true)
condition  2 not covered (true)
condition  3 not covered (true)

It seems understandable, but for bigger conditionals I guess it will be
a bit hard to map the condition numbers to the actual source
code.  We could probably also show the conditions as ranges in the
conditional?  I am adding David Malcolm to CC, he may have some ideas.

I wonder how much this information is confused by early optimizations
happening before coverage profiling?


Some expressions, mostly those without else-blocks, are effectively
"rewritten" in the CFG construction making the algorithm unable to
distinguish them:

and.c:

 if (a && b && c)
 x = 1;

ifs.c:

 if (a)
 if (b)
 if (c)
 x = 1;

gcc will build the same graph for both these programs, and gcov will
report both as 3-term expressions. It is vital that it is not
interpreted the other way around (which is consistent with the shape of
the graph) because otherwise the masking would be wrong for the and.c
program which is a more severe error. While surprising, users would
probably expect some minor rewriting of semantically-identical
expressions.

and.c.gcov:
 #:    2:    if (a && b && c)
conditions covered 6/6
 #:    3:    x = 1;

ifs.c.gcov:
 #:    2:    if (a)
 #:    3:    if (b)
 #:    4:    if (c)
 #:    5:    x = 1;
conditions covered 6/6


Maybe one can use location information to distinguish those cases?
Don't we store discriminator info about individual statements that is
also used for auto-FDO?


That is one possibility, which I tried for a bit, but abandoned to focus 
on getting the rest of the algorithm right. I am sure it can be 
revisited (possibly as a future improvement) and weighted against always 
emitting an else block (see 
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631254.html)




gcc/ChangeLog:

* builtins.cc (expand_builtin_fork_or_exec): Check
profile_condition_flag.
 * collect2.cc (main): Add -fno-profile-conditions to OBSTACK.
* common.opt: Add new options -fprofile-conditions and
* doc/gcov.texi: Add --conditions documentation.
* doc/invoke.texi: Add -fprofile-conditions documentation.
* gcc.cc: Link gcov on -fprofile-conditions.
* gcov-counter.def (GCOV_COUNTER_CONDS): New.
* gcov-dump.cc (tag_conditions): New.
* gcov-io.h (GCOV_TAG_CONDS): New.
(GCOV_TAG_CONDS_LENGTH): Likewise.
(GCOV_TAG_CONDS_NUM): Likewise.
* gcov.cc (class condition_info): New.
(condition_info::condition_info): New.
(condition_info::popcount): New.
(struct coverage_info): New.
(add_condition_counts): New.
(output_conditions): New.
(print_usage): Add -g, --conditions.
(process_args): Likewise.
(output_intermediate_

Re: [PATCH] ipa: Remove ipa_bits

2023-10-05 Thread Aldy Hernandez
On Thu, Oct 5, 2023, 8:26 a.m. Jakub Jelinek  wrote:

> Hi!
>
> The following patch removes ipa_bits struct pointer/vector from ipa
> jump functions and ipa cp transformations.
>
> The reason is because the struct uses widest_int to represent
> mask/value pair, which in the RFC patches to allow larger precisions
> for wide_int/widest_int is GC unfriendly because those types become
> non-trivially default constructible/copyable/destructible.
> One option would be to use trailing_wide_int for that instead, but
> as pointed out by Aldy, irange_storage which we already use under
> the hood for ipa_vr when type of parameter is integral or pointer
> already stores the mask/value pair because VRP now does the bit cp
> as well.
> So, this patch just uses m_vr to store both the value range and
> the bitmask.  There is still separate propagation of the
> ipcp_bits_lattice from propagation of the ipcp_vr_lattice, but
> when storing we merge the two into the same container.
>
> I've bootstrapped/regtested a slightly older version of this
> patch on x86_64-linux and i686-linux and that version regressed
> +FAIL: gcc.dg/ipa/propalign-3.c scan-ipa-dump-not cp "align:"
> +FAIL: gcc.dg/ipa/propalign-3.c scan-tree-dump optimized "fail_the_test"
> +FAIL: gcc.dg/ipa/propbits-1.c scan-ipa-dump cp "Adjusting mask for param
> 0 to 0x7"
> +FAIL: gcc.dg/ipa/propbits-2.c scan-ipa-dump cp "Adjusting mask for param
> 0 to 0xf"
> The last 2 were solely about the earlier patch not actually copying
> the if (dump_file) dumping of message that we set some mask for some
> parameter (since then added in the @@ -5985,6 +5741,77 @@ hunk).
> The first testcase is a test for -fno-ipa-bit-cp disabling bit cp
> for alignments.  For integral types I'm afraid it is a lost case
> when -fno-ipa-bit-cp -fipa-vrp is on when value ranges track bit cp
> as well, but for pointer alignments I've added
>   && opt_for_fn (cs->caller->decl, flag_ipa_bit_cp)
> and
>   && opt_for_fn (node->decl, flag_ipa_bit_cp)
> guards such that even just -fno-ipa-bit-cp disables it (alternatively
> we could just add -fno-ipa-vrp to propalign-3.c dg-options).
>
> Ok for trunk if this passes another bootstrap/regtest?
> Or defer until it is really needed (when the wide_int/widest_int
> changes are about to be committed)?
>

Up to the maintainers, but this looks like a nice cleanup that has merit on
its own. It's exactly what I had in mind when I worked on IPA earlier this
cycle.

Thanks.
Aldy


> 2023-10-05  Jakub Jelinek  
>
> * ipa-prop.h (ipa_bits): Remove.
> (struct ipa_jump_func): Remove bits member.
> (struct ipcp_transformation): Remove bits member, adjust
> ctor and dtor.
> (ipa_get_ipa_bits_for_value): Remove.
> * ipa-prop.cc (struct ipa_bit_ggc_hash_traits): Remove.
> (ipa_bits_hash_table): Remove.
> (ipa_print_node_jump_functions_for_edge): Don't print bits.
> (ipa_get_ipa_bits_for_value): Remove.
> (ipa_set_jfunc_bits): Remove.
> (ipa_compute_jump_functions_for_edge): For pointers query
> pointer alignment before ipa_set_jfunc_vr and update_bitmask
> in there.  For integral types, just rely on bitmask already
> being handled in value ranges.
> (ipa_check_create_edge_args): Don't create ipa_bits_hash_table.
> (ipcp_transformation_initialize): Neither here.
> (ipcp_transformation_t::duplicate): Don't copy bits vector.
> (ipa_write_jump_function): Don't stream bits here.
> (ipa_read_jump_function): Neither here.
> (useful_ipcp_transformation_info_p): Don't test bits vec.
> (write_ipcp_transformation_info): Don't stream bits here.
> (read_ipcp_transformation_info): Neither here.
> (ipcp_get_parm_bits): Get mask and value from m_vr rather
> than bits.
> (ipcp_update_bits): Remove.
> (ipcp_update_vr): For pointers, set_ptr_info_alignment from
> bitmask stored in value range.
> (ipcp_transform_function): Don't test bits vector, don't call
> ipcp_update_bits.
> * ipa-cp.cc (propagate_bits_across_jump_function): Don't use
> jfunc->bits, instead get mask and value from jfunc->m_vr.
> (ipcp_store_bits_results): Remove.
> (ipcp_store_vr_results): Incorporate parts of
> ipcp_store_bits_results here, merge the bitmasks with value
> range if both are supplied.
> (ipcp_driver): Don't call ipcp_store_bits_results.
> * ipa-sra.cc (zap_useless_ipcp_results): Remove *ts->bits
> clearing.
>
> --- gcc/ipa-prop.h.jj   2023-10-05 11:32:40.172739988 +0200
> +++ gcc/ipa-prop.h  2023-10-05 11:36:45.405378086 +0200
> @@ -292,18 +292,6 @@ public:
>array_slice m_elts;
>  };
>
> -/* Information about zero/non-zero bits.  */
> -class GTY(()) ipa_bits
> -{
> -public:
> -  /* The propagated value.  */
> -  widest_int value;
> -  /* Mask corresponding to the value.
> - Similar to ccp_latt

Re: [PATCH 01/22] Add condition coverage profiling

2023-10-05 Thread Jan Hubicka
> On 05/10/2023 22:39, Jørgen Kvalsvik wrote:
> > On 05/10/2023 21:59, Jan Hubicka wrote:
> > > > 
> > > > Like Wahlen et al this implementation records coverage in fixed-size
> > > > bitsets which gcov knows how to interpret. This is very fast, but
> > > > introduces a limit on the number of terms in a single boolean
> > > > expression, the number of bits in a gcov_unsigned_type (which is
> > > > typedef'd to uint64_t), so for most practical purposes this would be
> > > > acceptable. This limitation is in the implementation and not the
> > > > algorithm, so support for more conditions can be added by also
> > > > introducing arbitrary-sized bitsets.
> > > 
> > > This should not be too hard to do - if a conditional is more complex you
> > > simply introduce more than one counter for it, right?
> > > How many times does this trigger on GCC sources?
> > 
> > It shouldn't be, no. But when dynamic bitsets are on the table it would
> > be much better to length-encode in smaller multiples than the 64-bit
> > counters. Most expressions are small (<4 terms), so the savings would be
> > substantial. I opted for the simpler fixed-size to start with because it
> > is much simpler and would not introduce any branching or decisions in
> > the instrumentation.
> 
> Oh, and I forgot - I have never seen a real-world case that has had more
> than 64 conditions, but I suppose it may happen with generated code.

reload.cc has some long hand-written conditionals in it.  The first one
I counted had 38 conditions. Some of them are macros that may expand to
sub-conditions :)

But I agree that such code should not be common and probably the
conditional should be factored to multiple predicates.

Honza


Re: [PATCH] ipa: Remove ipa_bits

2023-10-05 Thread Jan Hubicka
> Hi!
> 
> The following patch removes ipa_bits struct pointer/vector from ipa
> jump functions and ipa cp transformations.
> 
> The reason is because the struct uses widest_int to represent
> mask/value pair, which in the RFC patches to allow larger precisions
> for wide_int/widest_int is GC unfriendly because those types become
> non-trivially default constructible/copyable/destructible.
> One option would be to use trailing_wide_int for that instead, but
> as pointed out by Aldy, irange_storage which we already use under
> the hood for ipa_vr when type of parameter is integral or pointer
> already stores the mask/value pair because VRP now does the bit cp
> as well.
> So, this patch just uses m_vr to store both the value range and
> the bitmask.  There is still separate propagation of the
> ipcp_bits_lattice from propagation of the ipcp_vr_lattice, but
> when storing we merge the two into the same container.
> 
> I've bootstrapped/regtested a slightly older version of this
> patch on x86_64-linux and i686-linux and that version regressed
> +FAIL: gcc.dg/ipa/propalign-3.c scan-ipa-dump-not cp "align:"
> +FAIL: gcc.dg/ipa/propalign-3.c scan-tree-dump optimized "fail_the_test"
> +FAIL: gcc.dg/ipa/propbits-1.c scan-ipa-dump cp "Adjusting mask for param 0 
> to 0x7"
> +FAIL: gcc.dg/ipa/propbits-2.c scan-ipa-dump cp "Adjusting mask for param 0 
> to 0xf"
> The last 2 were solely about the earlier patch not actually copying
> the if (dump_file) dumping of message that we set some mask for some
> parameter (since then added in the @@ -5985,6 +5741,77 @@ hunk).
> The first testcase is a test for -fno-ipa-bit-cp disabling bit cp
> for alignments.  For integral types I'm afraid it is a lost case
> when -fno-ipa-bit-cp -fipa-vrp is on when value ranges track bit cp
> as well, but for pointer alignments I've added
>   && opt_for_fn (cs->caller->decl, flag_ipa_bit_cp)
> and
>   && opt_for_fn (node->decl, flag_ipa_bit_cp)
> guards such that even just -fno-ipa-bit-cp disables it (alternatively
> we could just add -fno-ipa-vrp to propalign-3.c dg-options).
> 
> Ok for trunk if this passes another bootstrap/regtest?
> Or defer until it is really needed (when the wide_int/widest_int
> changes are about to be committed)?

It does look like a nice cleanup to me.
I wonder if you did a comparison of the bit information propagated with
the new code and the old code?  Theoretically they should be equivalent?

Honza


Re: [PATCH v6] Implement new RTL optimizations pass: fold-mem-offsets.

2023-10-05 Thread Jeff Law




On 10/3/23 05:45, Manolis Tsamis wrote:

This is a new RTL pass that tries to optimize memory offset calculations



+
+/* If INSN is a root memory instruction then compute a potentially new offset
+   for it and test if the resulting instruction is valid.  */
+static void
+do_check_validity (rtx_insn *insn, fold_mem_info *info)
+{
+  rtx mem, reg;
+  HOST_WIDE_INT cur_offset;
+  if (!get_fold_mem_root (insn, &mem, ®, &cur_offset))
+return;
+
+  HOST_WIDE_INT new_offset = cur_offset + info->added_offset;
+
+  /* Test if it is valid to change MEM's address offset to NEW_OFFSET.  */
+  int icode = INSN_CODE (insn);
+  rtx mem_addr = XEXP (mem, 0);
+  machine_mode mode = GET_MODE (mem_addr);
+  if (new_offset != 0)
+XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, gen_int_mode (new_offset, mode));
+  else
+XEXP (mem, 0) = reg;
+
+  bool illegal = insn_invalid_p (insn, false)
+|| !memory_address_addr_space_p (mode, XEXP (mem, 0),
+ MEM_ADDR_SPACE (mem));
+
+  /* Restore the instruction.  */
+  XEXP (mem, 0) = mem_addr;
+  INSN_CODE (insn) = icode;
+
+  if (illegal)
+bitmap_ior_into (&cannot_fold_insns, info->fold_insns);
+  else
+bitmap_ior_into (&candidate_fold_insns, info->fold_insns);
+}
+
So overnight testing with the latest version of your patch triggered a 
fault on the sh3-linux-gnu target with this code at -O2:



enum
{
  _ISspace = ((5) < 8 ? ((1 << (5)) << 8) : ((1 << (5)) >> 8)),
};
extern const unsigned short int **__ctype_b_loc (void)
 __attribute__ ((__nothrow__ )) __attribute__ ((__const__));
void
read_alias_file (const char *fname, int fname_len)
{
  char buf[400];
  char *cp;
  cp = buf;
  while (((*__ctype_b_loc ())[(int) (((unsigned char) cp[0]))] & (unsigned short int) _ISspace))
    ++cp;
}




The problem is we need to clear the INSN_CODE before we call recog.  In 
this specific case we had (mem (plus (reg) (offset))); after f-m-o does 
its job, the offset went to zero, so we changed the structure of the RTL 
to (mem (reg)).  But we had the old INSN_CODE still in place, which 
caused us to reference operands that no longer exist.


A simple INSN_CODE (insn) = -1 before calling insn_invalid_p is the 
right fix.
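
For illustration, a minimal sketch of how that could look in
do_check_validity (based on the code quoted above; the exact placement is
my assumption, not the committed change):

  /* Reset the cached insn code so the modified RTL is re-recognized.  */
  INSN_CODE (insn) = -1;
  bool illegal = insn_invalid_p (insn, false)
                 || !memory_address_addr_space_p (mode, XEXP (mem, 0),
                                                  MEM_ADDR_SPACE (mem));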


jeff


Re: [RFC] expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg [target/111466]

2023-10-05 Thread Richard Kenner
> At that particular time I think Kenner was mostly focused on the alpha 
> and ppc ports, but I think he was also still poking around with romp and 
> a29k.  I think romp is an unlikely target for this because it didn't 
> promote modes and it wasn't even building for several months 
> (April->late July).

Obviously, I have no recollection of that change at all.

In July of 1994, I don't believe I was actively working on much in the
way of ports, though I could be misremembering.  My guess is that this
change was to fix some bug, but I'm a bit mystified why I'd have
batched so many different changes together in one ChangeLog entry like
that.  I think you're correct that it was most likely the Alpha that
showed the bug. 


Re: [PATCH] ipa: Remove ipa_bits

2023-10-05 Thread Jakub Jelinek
On Thu, Oct 05, 2023 at 04:42:42PM +0200, Jan Hubicka wrote:
> It does look like a nice cleanup to me.
> I wonder if you did some compare of the bit information propagated with
> new code and old code?  Theoretically they should be equivalent?

Beyond testsuite, I've tried
__attribute__((noinline, noclone)) static int
foo (int x, int y, int *p)
{
  return p[x + y];
}

__attribute__((noinline, noclone)) static int
bar (int x, int y, int *p)
{
  return foo (x, y & 0xff, p);
}

int
baz (int x, int y, int *p)
{
  return bar ((x & 0x) | 0x12345678, (x & 0x) | 0x87654321, 
__builtin_assume_aligned (p, 32, 16));
}
and -fdump-tree-ccp2-alias was identical before/after the patch,
so the expected
  # RANGE [irange] int [305419896, +INF] MASK 0x45410105 VALUE 0x12345678
  int x_5(D) = x;
  # RANGE [irange] int [0, 255] MASK 0x8a VALUE 0x21
  int y_6(D) = y;
  # PT = nonlocal null
  # ALIGN = 32, MISALIGN = 16
  int * p_7(D) = p;
in foo (-O2).  With -O2 -fno-ipa-vrp
  # RANGE [irange] int [-INF, +INF] MASK 0x45410105 VALUE 0x12345678
  int x_5(D) = x;
  # RANGE [irange] int [-INF, +INF] MASK 0x8a VALUE 0x21
  int y_6(D) = y;
  # PT = nonlocal null
  # ALIGN = 32, MISALIGN = 16
  int * p_7(D) = p;
and -O2 -fno-ipa-bit-cp
  # RANGE [irange] int [305419896, +INF] MASK 0x45410105 VALUE 0x12345678
  int x_5(D) = x;
  # RANGE [irange] int [0, 255] MASK 0x8a VALUE 0x21
  int y_6(D) = y;
  # PT = nonlocal null
  int * p_7(D) = p;
All that is the same as before.

Jakub



Re: [RFC] expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg [target/111466]

2023-10-05 Thread Jeff Law




On 10/5/23 08:56, Richard Kenner wrote:

At that particular time I think Kenner was mostly focused on the alpha
and ppc ports, but I think he was also still poking around with romp and
a29k.  I think romp is an unlikely target for this because it didn't
promote modes and it wasn't even building for several months
(April->late July).


Obviously, I have no recollection of that change at all.

That's the assumption I made :-)



In July of 1994, I don't believe I was actively working on much in the
way of ports, though I could be misremembering.  My guess is that this
change was to fix some bug, but I'm a bit mystified why I'd have
batched so many different changes together in one ChangeLog entry like
that.  I think you're correct that it was most likely the Alpha that
showed the bug.
The alpha was a combination of my memory and reviewing patches/email 
messages in that time span.


I agree this was almost certainly meant to be a bugfix and I suspect the 
bug was expanding directly into a promoted subreg target and ending up 
with inconsistency between the value and the promoted subreg state bits.


Jeff




Re: [PATCH 01/22] Add condition coverage profiling

2023-10-05 Thread Jørgen Kvalsvik

On 05/10/2023 21:59, Jan Hubicka wrote:


Like Wahlen et al this implementation records coverage in fixed-size
bitsets which gcov knows how to interpret. This is very fast, but
introduces a limit on the number of terms in a single boolean
expression, the number of bits in a gcov_unsigned_type (which is
typedef'd to uint64_t), so for most practical purposes this would be
acceptable. This limitation is in the implementation and not the
algorithm, so support for more conditions can be added by also
introducing arbitrary-sized bitsets.


This should not be too hard to do - if a conditional is more complex you
simply introduce more than one counter for it, right?
How many times does this trigger on GCC sources?


For space overhead, the instrumentation needs two accumulators
(gcov_unsigned_type) per condition in the program which will be written
to the gcov file. In addition, every function gets a pair of local
accumulators, but these accumulators are reused between conditions in the
same function.

For time overhead, there is a zeroing of the local accumulators for
every condition and one or two bitwise operations on every edge taken in
an expression.

In action it looks pretty similar to the branch coverage. The -g short
opt carries no significance, but was chosen because it was an available
option with the upper-case free too.

gcov --conditions:

 3:   17:void fn (int a, int b, int c, int d) {
 3:   18:if ((a && (b || c)) && d)
conditions covered 3/8
condition  0 not covered (true)
condition  0 not covered (false)
condition  1 not covered (true)
condition  2 not covered (true)
condition  3 not covered (true)

It seems understandable, but for bigger conditionals I guess it will be
a bit hard to map the condition numbers to the actual source
code.  We could probably also show the conditions as ranges in the
conditional?  I am adding David Malcolm to CC, he may have some ideas.

I wonder how much this information is confused by early optimizations
happening before coverage profiling?


Some expressions, mostly those without else-blocks, are effectively
"rewritten" in the CFG construction making the algorithm unable to
distinguish them:

and.c:

 if (a && b && c)
 x = 1;

ifs.c:

 if (a)
 if (b)
 if (c)
 x = 1;

gcc will build the same graph for both these programs, and gcov will
report both as 3-term expressions. It is vital that it is not
interpreted the other way around (which is consistent with the shape of
the graph) because otherwise the masking would be wrong for the and.c
program which is a more severe error. While surprising, users would
probably expect some minor rewriting of semantically-identical
expressions.

and.c.gcov:
 #:2:if (a && b && c)
conditions covered 6/6
 #:3:x = 1;

ifs.c.gcov:
 #:2:if (a)
 #:3:if (b)
 #:4:if (c)
 #:5:x = 1;
conditions covered 6/6


Maybe one can use location information to distinguish those cases?
Don't we store discriminator info about individual statements that is also used
for auto-FDO?


gcc/ChangeLog:

* builtins.cc (expand_builtin_fork_or_exec): Check
profile_condition_flag.
 * collect2.cc (main): Add -fno-profile-conditions to OBSTACK.
* common.opt: Add new options -fprofile-conditions and
* doc/gcov.texi: Add --conditions documentation.
* doc/invoke.texi: Add -fprofile-conditions documentation.
* gcc.cc: Link gcov on -fprofile-conditions.
* gcov-counter.def (GCOV_COUNTER_CONDS): New.
* gcov-dump.cc (tag_conditions): New.
* gcov-io.h (GCOV_TAG_CONDS): New.
(GCOV_TAG_CONDS_LENGTH): Likewise.
(GCOV_TAG_CONDS_NUM): Likewise.
* gcov.cc (class condition_info): New.
(condition_info::condition_info): New.
(condition_info::popcount): New.
(struct coverage_info): New.
(add_condition_counts): New.
(output_conditions): New.
(print_usage): Add -g, --conditions.
(process_args): Likewise.
(output_intermediate_json_line): Output conditions.
(read_graph_file): Read conditions counters.
(read_count_file): Read conditions counters.
(file_summary): Print conditions.
(accumulate_line_info): Accumulate conditions.
(output_line_details): Print conditions.
* ipa-inline.cc (can_early_inline_edge_p): Check
profile_condition_flag.
* ipa-split.cc (pass_split_functions::gate): Likewise.
* passes.cc (finish_optimization_passes): Likewise.
* profile.cc (find_conditions): New declaration.
(cov_length): Likewise.
(cov_blocks): Likewise.
(cov_masks): Likewise.
(cov_free): Likewise.
(instrument_decisions): New.
(read_thunk_profile): Control output to file.
(branch_prob): Call find_conditions, instr

Re: [PATCH] RISC-V: Fix the riscv_legitimize_poly_move issue on targets where the minimal VLEN exceeds 512.

2023-10-05 Thread Robin Dapp
> Your suggested code seems to work fine, let me run more tests and send
> v2, I guess I just don’t know how to explain why it works in the comment
> :p

If it's too convoluted maybe we should rather not use it :D

The idea is for
  factor % (vlenb / potential_div) == 0
we're actually looking for the largest power of 2 that factor
and vlenb are both a multiple of.

(Individually) for factor the largest pow2 can be calculated
by ctz (factor).  Same for vlenb via ctz (vlenb).

If ctz (factor) >= ctz (vlenb), vlenb already divides factor
evenly.

Otherwise, i.e. when ctz (factor) = 1 (divisible by 2) and
ctz (vlenb) = 4 (divisible by 16) we need to divide vlenb by the
pow2 difference 1 << (4 - 1).
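
Expressed as code, the idea is roughly this (a sketch that assumes vlenb
is a power of two; ctz_hwi is the trailing-zero helper from hwint.h):

  int diff = ctz_hwi (vlenb) - ctz_hwi (factor);
  HOST_WIDE_INT div = diff > 0 ? HOST_WIDE_INT_1 << diff : 1;
  /* vlenb / div is then the largest power of two dividing both factor
     and vlenb, so factor % (vlenb / div) == 0 holds.  */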

I just realized that the explanation is longer than the code before,
maybe not a good sign ;)  

Regards
 Robin



[COMMITTED] i386: Improve memory copy from named address space [PR111657]

2023-10-05 Thread Uros Bizjak
The stringop strategy selection algorithm falls back to a libcall strategy
when it exhausts its pool of available strategies.  The memory area copy
function (memcpy) is not available from the system library for non-default
address spaces, so the compiler emits the most trivial byte-at-a-time
copy loop instead.

The compiler should instead emit an optimized copy loop as a fallback for
non-default address spaces.

PR target/111657

gcc/ChangeLog:

* config/i386/i386-expand.cc (alg_usable_p): Reject libcall
strategy for non-default address spaces.
(decide_alg): Use loop strategy as a fallback strategy for
non-default address spaces.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr111657.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index e42ff27c6ef..9a988347200 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -8320,6 +8320,11 @@ alg_usable_p (enum stringop_alg alg, bool memset, bool 
have_as)
 {
   if (alg == no_stringop)
 return false;
+  /* It is not possible to use a library call if we have non-default
+ address space.  We can do better than the generic byte-at-a-time
+ loop, used as a fallback.  */
+  if (alg == libcall && have_as)
+return false;
   if (alg == vector_loop)
 return TARGET_SSE || TARGET_AVX;
   /* Algorithms using the rep prefix want at least edi and ecx;
@@ -8494,8 +8499,12 @@ decide_alg (HOST_WIDE_INT count, HOST_WIDE_INT 
expected_size,
gcc_assert (alg != libcall);
   return alg;
 }
+
+  /* Try to use some reasonable fallback algorithm.  Note that for
+ non-default address spaces we default to a loop instead of
+ a libcall.  */
   return (alg_usable_p (algs->unknown_size, memset, have_as)
- ? algs->unknown_size : libcall);
+ ? algs->unknown_size : have_as ? loop : libcall);
 }
 
 /* Decide on alignment.  We know that the operand is already aligned to ALIGN
diff --git a/gcc/testsuite/gcc.target/i386/pr111657.c 
b/gcc/testsuite/gcc.target/i386/pr111657.c
new file mode 100644
index 000..fe54fcae8cc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr111657.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-expand -mno-sse" } */
+
+struct a { long arr[30]; };
+
+__seg_gs struct a m;
+void bar (struct a *dst) { *dst = m; }
+
+/* { dg-final { scan-rtl-dump-not "libcall" "expand" } } */


[committed] contrib: add mdcompact

2023-10-05 Thread Andrea Corallo
Hello all,

this patch checks in mdcompact, the tool written in elisp that I used
to mass-convert all the multi-choice patterns in the aarch64 back-end to
the new compact syntax.

I tested it on Emacs 29 (it might run on older versions as well, not
sure); I also verified it runs cleanly on a few other back-ends (arm,
loongarch).

The tool can be used to convert a single pattern, an open buffer or
all md files in a directory.

The tool might need further adjustment to run on some specific
back-end; in that case I am very happy to help.

This patch was pre-approved here [1].

Best Regards

  Andrea Corallo

[1] 

contrib/ChangeLog

* mdcompact/mdcompact-testsuite.el: New file.
* mdcompact/mdcompact.el: Likewise.
* mdcompact/tests/1.md: Likewise.
* mdcompact/tests/1.md.out: Likewise.
* mdcompact/tests/2.md: Likewise.
* mdcompact/tests/2.md.out: Likewise.
* mdcompact/tests/3.md: Likewise.
* mdcompact/tests/3.md.out: Likewise.
* mdcompact/tests/4.md: Likewise.
* mdcompact/tests/4.md.out: Likewise.
* mdcompact/tests/5.md: Likewise.
* mdcompact/tests/5.md.out: Likewise.
* mdcompact/tests/6.md: Likewise.
* mdcompact/tests/6.md.out: Likewise.
* mdcompact/tests/7.md: Likewise.
* mdcompact/tests/7.md.out: Likewise.
---
 contrib/mdcompact/mdcompact-testsuite.el |  56 +
 contrib/mdcompact/mdcompact.el   | 296 +++
 contrib/mdcompact/tests/1.md |  36 +++
 contrib/mdcompact/tests/1.md.out |  32 +++
 contrib/mdcompact/tests/2.md |  25 ++
 contrib/mdcompact/tests/2.md.out |  21 ++
 contrib/mdcompact/tests/3.md |  16 ++
 contrib/mdcompact/tests/3.md.out |  17 ++
 contrib/mdcompact/tests/4.md |  17 ++
 contrib/mdcompact/tests/4.md.out |  17 ++
 contrib/mdcompact/tests/5.md |  12 +
 contrib/mdcompact/tests/5.md.out |  11 +
 contrib/mdcompact/tests/6.md |  11 +
 contrib/mdcompact/tests/6.md.out |  11 +
 contrib/mdcompact/tests/7.md |  11 +
 contrib/mdcompact/tests/7.md.out |  11 +
 16 files changed, 600 insertions(+)
 create mode 100644 contrib/mdcompact/mdcompact-testsuite.el
 create mode 100644 contrib/mdcompact/mdcompact.el
 create mode 100644 contrib/mdcompact/tests/1.md
 create mode 100644 contrib/mdcompact/tests/1.md.out
 create mode 100644 contrib/mdcompact/tests/2.md
 create mode 100644 contrib/mdcompact/tests/2.md.out
 create mode 100644 contrib/mdcompact/tests/3.md
 create mode 100644 contrib/mdcompact/tests/3.md.out
 create mode 100644 contrib/mdcompact/tests/4.md
 create mode 100644 contrib/mdcompact/tests/4.md.out
 create mode 100644 contrib/mdcompact/tests/5.md
 create mode 100644 contrib/mdcompact/tests/5.md.out
 create mode 100644 contrib/mdcompact/tests/6.md
 create mode 100644 contrib/mdcompact/tests/6.md.out
 create mode 100644 contrib/mdcompact/tests/7.md
 create mode 100644 contrib/mdcompact/tests/7.md.out

diff --git a/contrib/mdcompact/mdcompact-testsuite.el 
b/contrib/mdcompact/mdcompact-testsuite.el
new file mode 100644
index 000..494c0b5cd68
--- /dev/null
+++ b/contrib/mdcompact/mdcompact-testsuite.el
@@ -0,0 +1,56 @@
+;;; -*- lexical-binding: t; -*-
+
+;; This file is part of GCC.
+
+;; GCC is free software: you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation, either version 3 of the License, or
+;; (at your option) any later version.
+
+;; GCC is distributed in the hope that it will be useful, but WITHOUT
+;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+;; License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC.  If not, see .
+
+;;; Commentary:
+
+;;; Usage:
+;; $ emacs -batch -l mdcompact.el -l mdcompact-testsuite.el -f 
ert-run-tests-batch-and-exit 
+
+;;; Code:
+
+(require 'mdcompact)
+(require 'ert)
+
+(defconst mdcompat-test-directory (concat (file-name-directory
+  (or load-file-name
+   buffer-file-name))
+ "tests/"))
+
+(defun mdcompat-test-run (f)
+  (with-temp-buffer
+(insert-file-contents f)
+(mdcomp-run-at-point)
+(let ((a (buffer-string))
+ (b (with-temp-buffer
+  (insert-file-contents (concat f ".out"))
+  (buffer-string
+  (should (string= a b)
+
+(defmacro mdcompat-gen-tests ()
+  `(progn
+ ,@(cl-loop
+  for f in (directory-files mdcompat-test-directory t "md$")
+  collect
+  `(ert-deftest ,(intern (concat "mdcompat-test-"
+(file-n

[committed 1/2] secpol: add grammatically missing commas / remove one excess instance

2023-10-05 Thread Siddhesh Poyarekar
From: Jan Engelhardt 

Signed-off-by: Jan Engelhardt 

ChangeLog:

* SECURITY.txt: Fix up commas.
---
 SECURITY.txt | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/SECURITY.txt b/SECURITY.txt
index b65f24cfc2a..93792923583 100644
--- a/SECURITY.txt
+++ b/SECURITY.txt
@@ -3,12 +3,12 @@ What is a GCC security bug?
 
 A security bug is one that threatens the security of a system or
 network, or might compromise the security of data stored on it.
-In the context of GCC there are multiple ways in which this might
+In the context of GCC, there are multiple ways in which this might
 happen and some common scenarios are detailed below.
 
 If you're reporting a security issue and feel like it does not fit
 into any of the descriptions below, you're encouraged to reach out
-through the GCC bugzilla or if needed, privately, by following the
+through the GCC bugzilla or, if needed, privately, by following the
 instructions in the last two sections of this document.
 
 Compiler drivers, programs, libgccjit and support libraries
@@ -24,11 +24,11 @@ Compiler drivers, programs, libgccjit and support libraries
 
 The libgccjit library can, despite the name, be used both for
 ahead-of-time compilation and for just-in-compilation.  In both
-cases it can be used to translate input representations (such as
-source code) in the application context; in the latter case the
+cases, it can be used to translate input representations (such as
+source code) in the application context; in the latter case, the
 generated code is also run in the application context.
 
-Limitations that apply to the compiler driver, apply here too in
+Limitations that apply to the compiler driver apply here too in
 terms of trusting inputs and it is recommended that both the
 compilation *and* execution context of the code are appropriately
 sandboxed to contain the effects of any bugs in libgccjit, the
@@ -43,7 +43,7 @@ Compiler drivers, programs, libgccjit and support libraries
 
 Libraries such as zlib that are bundled with GCC to build it will be
 treated the same as the compiler drivers and programs as far as
-security coverage is concerned.  However if you find an issue in
+security coverage is concerned.  However, if you find an issue in
 these libraries independent of their use in GCC, you should reach
 out to their upstream projects to report them.
 
@@ -97,7 +97,7 @@ Language runtime libraries
 * libssp
 * libstdc++
 
-These libraries are intended to be used in arbitrary contexts and as
+These libraries are intended to be used in arbitrary contexts and, as
 a result, bugs in these libraries may be evaluated for security
 impact.  However, some of these libraries, e.g. libgo, libphobos,
 etc.  are not maintained in the GCC project, due to which the GCC
@@ -145,7 +145,7 @@ GCC plugins
 
 It should be noted that GCC may execute arbitrary code loaded by a
 user through the GCC plugin mechanism or through system preloading
-mechanism.  Such custom code should be vetted by the user for safety
+mechanism.  Such custom code should be vetted by the user for safety,
 as bugs exposed through such code will not be considered security
 issues.
 
-- 
2.41.0



[committed 0/2] SECURITY.txt: Trivial fixups

2023-10-05 Thread Siddhesh Poyarekar
Committed some trivial comma and indentation fixups that Jan shared with
me off-list.

Jan Engelhardt (2):
  secpol: add grammatically missing commas / remove one excess instance
  secpol: consistent indentation

 SECURITY.txt | 48 
 1 file changed, 24 insertions(+), 24 deletions(-)

-- 
2.41.0



[committed 2/2] secpol: consistent indentation

2023-10-05 Thread Siddhesh Poyarekar
From: Jan Engelhardt 

86% of the document have 4 spaces; adjust the remaining 14%.

Signed-off-by: Jan Engelhardt 

ChangeLog:

* SECURITY.txt: Fix up indentation.
---
 SECURITY.txt | 32 
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/SECURITY.txt b/SECURITY.txt
index 93792923583..b3e2bbfda90 100644
--- a/SECURITY.txt
+++ b/SECURITY.txt
@@ -173,33 +173,33 @@ Security features implemented in GCC
 Reporting private security bugs
 ===
 
-   *All bugs reported in the GCC Bugzilla are public.*
+*All bugs reported in the GCC Bugzilla are public.*
 
-   In order to report a private security bug that is not immediately
-   public, please contact one of the downstream distributions with
-   security teams.  The following teams have volunteered to handle
-   such bugs:
+In order to report a private security bug that is not immediately
+public, please contact one of the downstream distributions with
+security teams.  The following teams have volunteered to handle
+such bugs:
 
   Debian:  secur...@debian.org
   Red Hat: secal...@redhat.com
   SUSE:secur...@suse.de
   AdaCore: product-secur...@adacore.com
 
-   Please report the bug to just one of these teams.  It will be shared
-   with other teams as necessary.
+Please report the bug to just one of these teams.  It will be shared
+with other teams as necessary.
 
-   The team contacted will take care of details such as vulnerability
-   rating and CVE assignment (http://cve.mitre.org/about/).  It is likely
-   that the team will ask to file a public bug because the issue is
-   sufficiently minor and does not warrant an embargo.  An embargo is not
-   a requirement for being credited with the discovery of a security
-   vulnerability.
+The team contacted will take care of details such as vulnerability
+rating and CVE assignment (http://cve.mitre.org/about/).  It is likely
+that the team will ask to file a public bug because the issue is
+sufficiently minor and does not warrant an embargo.  An embargo is not
+a requirement for being credited with the discovery of a security
+vulnerability.
 
 Reporting public security bugs
 ==
 
-   It is expected that critical security bugs will be rare, and that most
-   security bugs can be reported in GCC, thus making
-   them public immediately.  The system can be found here:
+It is expected that critical security bugs will be rare, and that most
+security bugs can be reported in GCC, thus making
+them public immediately.  The system can be found here:
 
   https://gcc.gnu.org/bugzilla/
-- 
2.41.0



Re: [PATCH v2] Add a GCC Security policy

2023-10-05 Thread Richard Earnshaw (lists)
On 28/09/2023 12:55, Siddhesh Poyarekar wrote:
> +Security features implemented in GCC
> +
> +
[...]
> +
> +Similarly, GCC may transform code in a way that the correctness of
> +the expressed algorithm is preserved, but supplementary properties
> +that are not specifically expressible in a high-level language
> +are not preserved. Examples of such supplementary properties
> +include absence of sensitive data in the program's address space
> +after an attempt to wipe it, or data-independent timing of code.
> +When the source code attempts to express such properties, failure
> +to preserve them in resulting machine code is not a security issue
> +in GCC.

I think it would be worth mentioning here that compilers interpret source code 
according to an abstract machine defined by the source language.  Properties of 
a program that cannot be described in the abstract machine may not be 
translated into the generated machine code.

This is, fundamentally, describing the 'as if' rule.
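
A classic illustration (my own example, not proposed wording for the
document): under the as-if rule the compiler may drop a final memset that
was meant to wipe a secret, because the abstract machine cannot observe
the difference:

#include <string.h>

void use_key (char *);   /* hypothetical consumer of the secret */

void f (void)
{
  char key[32];
  use_key (key);
  /* May be elided as a dead store under the as-if rule.  */
  memset (key, 0, sizeof key);
}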

R.


Re: [PATCH] ira: Scale save/restore costs of callee save registers with block frequency

2023-10-05 Thread Vladimir Makarov



On 10/3/23 10:07, Surya Kumari Jangala wrote:

ira: Scale save/restore costs of callee save registers with block frequency

In assign_hard_reg(), when computing the costs of the hard registers, the
cost of saving/restoring a callee-save hard register in prolog/epilog is
taken into consideration. However, this cost is not scaled with the entry
block frequency. Without scaling, the cost of saving/restoring is quite
small and this can result in a callee-save register being chosen by
assign_hard_reg() even though there are free caller-save registers
available. Assigning a callee save register to a pseudo that is live
in the entire function and across a call will cause shrink wrap to fail.


Thank you for addressing this part of the code.  Sometimes changes that look 
obvious have unexpected results.  I remember experimenting with 
different heuristics for this code a long time ago, when the 32-bit x86 target 
was the major one, and this was the best variant I found.  Since a lot of 
changes have happened since then, I decided to benchmark your change.


This change increases x86-64 spec2017 code size by 0.67% on 
average.  The increase is very stable across 20 spec2017 benchmarks.  Only 
the code for bwaves is smaller (by 0.01%).  The specfp2017 performance is 
the same.  There is one positive impact: specint2017 improved by 0.6% 
(8.59 vs 8.54), mainly because of improvements in xalancbmk (2.5%) and 
exchange (5%).


So I propose to make this change only when we are not optimizing for 
code size.  Also, please be prepared that there might be testsuite 
failures on other targets: some targets are overconstrained by tests 
expecting specific generated code.
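
A rough sketch of that restriction, as I read the suggestion (using the
usual optimize_function_for_speed_p predicate; untested, other names as
in the patch above):

  int scale = optimize_function_for_speed_p (cfun)
              ? REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun)) : 1;
  add_cost = ((ira_memory_move_cost[mode][rclass][0]
               + ira_memory_move_cost[mode][rclass][1])
              * saved_nregs / hard_regno_nregs (hard_regno, mode) - 1)
             * scale;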



2023-10-03  Surya Kumari Jangala  

gcc/
PR rtl-optimization/111673
* ira-color.cc (assign_hard_reg): Scale save/restore costs of
callee save registers with block frequency.

gcc/testsuite/
PR rtl-optimization/111673
	* gcc.target/powerpc/pr111673.c: New test.
---

diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index f2e8ea34152..eb20c52310d 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -2175,7 +2175,8 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
add_cost = ((ira_memory_move_cost[mode][rclass][0]
 + ira_memory_move_cost[mode][rclass][1])
* saved_nregs / hard_regno_nregs (hard_regno,
- mode) - 1);
+ mode) - 1)
+   * REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun));
cost += add_cost;
full_cost += add_cost;
  }
diff --git a/gcc/testsuite/gcc.target/powerpc/pr111673.c 
b/gcc/testsuite/gcc.target/powerpc/pr111673.c
new file mode 100644
index 000..e0c0f85460a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr111673.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O2 -fdump-rtl-pro_and_epilogue" } */
+
+/* Verify there is an early return without the prolog and shrink-wrap
+   the function. */
+
+int f (int);
+int
+advance (int dz)
+{
+  if (dz > 0)
+return (dz + dz) * dz;
+  else
+return dz * f (dz);
+}
+
+/* { dg-final { scan-rtl-dump-times "Performing shrink-wrapping" 1 
"pro_and_epilogue" } } */





Re: [RFC] expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg [target/111466]

2023-10-05 Thread Jeff Law




On 10/5/23 07:33, Robin Dapp wrote:

So I think Kenner's code is trying to prevent having a value in a
SUBREG that is inconsistent with the SUBREG_PROMOTED* flag bits.  But
I think it's been unnecessary since Matz's rewrite in 2009.


I couldn't really tell what the rewrite does entirely so I tried creating
a case where we would require the SUBREG_PROMOTED_VAR but couldn't come
up with any.  At least for the most common path through expr I believe
I know why:

So our case is when we have an SI subreg from a DI reg that is originally
a sign-extended SI.  Then we NOP-convert the SI subreg from signed to
unsigned.  We only perform implicit sign extensions therefore we can
omit the implicit zero-extension case here.
Right.  The extension into bits 32..63, whether it be zero or sign 
extension is essentially a don't care.  It's there because of 
PROMOTE_MODE forcing most operations to 64 bits to match the hardware, 
even if the upper 32 bits aren't ever relevant.







The way the result of the signed->unsigned conversion is used determines
whether we can use SUBREG_PROMOTED_VAR.  There are two possibilities
(1) and (2).

  void foo (int bar)
  {
 unsigned int u = (unsigned int) bar;


(1) unsigned long long ul = (unsigned long long) u;

As long as the result is used unsigned, we will always perform a zero
extension no matter the "Kenner hunk" (because whether the subreg has
SRP_SIGNED or !SUBREG_PROMOTED_VAR does not change the need for a
zero_extend).

Right.




(2) long long l = (long long) u;

SUBREG_PROMOTED is checked by the following in convert_move:

   scalar_int_mode to_int_mode;
   if (GET_CODE (from) == SUBREG
   && SUBREG_PROMOTED_VAR_P (from)
   && is_a  (to_mode, &to_int_mode)
   && (GET_MODE_PRECISION (subreg_promoted_mode (from))
  >= GET_MODE_PRECISION (to_int_mode))
   && SUBREG_CHECK_PROMOTED_SIGN (from, unsignedp))

The SUBREG_CHECK_PROMOTED_SIGN (from, unsignedp) is decisive
as far as I can tell.
Right.   We have already ensured the modes are either the same size or 
the PARM_DECL's mode is wider than the local VAR_DECL's mode.  So the 
check that FROM has the same promotion property as UNSIGNEDP is going to 
be decisive.


  unsignedp = 1 comes from treeop0 so our
Correct.  It comes from the TREE_TYPE (treeop0) where treeop0 is the 
incoming PARM_DECL.



"from" (i.e. unsigned int u).
With the "Kenner hunk" SUBREG_PROMOTED_VAR is unset, so we don't
strip the extension.  Without it, SUBREG_PROMOTED_VAR () == SRP_SIGNED
which is != unsignedp, so no stripping either.
Correct.  The Kenner hunk wipes SUBREG_PROMOTED_VAR, meaning the 
promotion state of the object is unknown.




Now there are several other paths that would need auditing as well
but at least this one is safe.  An interesting test target would be
a backend that does implicit zero extensions but as we haven't seen
fallout so far chances to find a trigger are slim.
I did some testing of the other paths yesterday, but didn't include them 
in my message.


First, if the PARM_DECL is a narrower type than the local VAR_DECL, then 
the path we're considering changing doesn't get used because the modes 
have different sizes.   Thus we need not worry about this case.


If the PARM_DECL is wider than the local VAR_DECL, then we downsize to 
the same size as the VAR_DECL via a SUBREG and it behaves the same as 
the Vineet's original when the sizes are the same, but they differ in 
signedness.  So if we conclude the same size cases are OK, then the case 
where the PARM_DECL is wider than the VAR_DECL, we're going to be safe 
as well.



Jeff


Re: [PATCH] RISC-V: xfail gcc.dg/pr90263.c for riscv_v

2023-10-05 Thread Patrick O'Neill



On 10/4/23 15:29, Jeff Law wrote:



On 10/4/23 16:21, Patrick O'Neill wrote:


On 10/4/23 15:14, Jeff Law wrote:



On 10/4/23 15:57, Patrick O'Neill wrote:

Since r14-4358-g9464e72bcc9 riscv_v targets use vector instructions to
perform a memcpy. We no longer expect memcpy for riscv_v targets.

gcc/testsuite/ChangeLog:

* gcc.dg/pr90263.c: xfail riscv_v targets.
Or rather than XFAIL skip the test?  XFAIL kind of implies it's 
something we'd like to fix.  But in this case we don't want a memcpy 
call as the inlined vector implementation is almost certainly better.

Ah. Since XFAIL notifies us if a test starts passing (via xpass) I
thought it would help us ensure the test doesn't start passing
on riscv_v. I didn't know it implied something needed to be fixed.

I'll rework it to skip riscv_v targets.

Hopefully that works.

If you wanted a test to verify that we don't go backwards and start 
emitting a memcpy, you can set up a test like


// dg-directives
#include "pr90263.c"

// dg directives for scanning

Where the scanning verifies that we don't have a call to memcpy. The 
kind of neat thing here is the dg directives in the included file are 
ignored, so you can use the same test sources in multiple ways.
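
Filled in, such a derived test might look roughly like this (the options,
the riscv_v target selector and the relative include path are assumptions
on my part):

/* { dg-do compile { target riscv_v } } */
/* { dg-options "-O2" } */

#include "pr90263.c"  /* adjust the relative path to where the test lives */

/* { dg-final { scan-assembler-not "memcpy" } } */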


Given this is kind of specific to RISC-V, it might make more sense in 
the riscv directory.


Jeff


Title changed/superseded by:
https://inbox.sourceware.org/gcc-patches/20231004225527.930610-1-patr...@rivosinc.com/T/#u

Patrick



Re: [PATCH V5 1/2] rs6000: optimize moving to sf from highpart di

2023-10-05 Thread David Edelsohn
On Thu, Oct 5, 2023 at 12:50 AM Jiufu Guo  wrote:

> Hi,
>
> Currently, we have the pattern "movsf_from_si2" which was trying
> to support moving high part DI to SF.
>
> But current pattern only accepts "ashiftrt":
> XX:SF=bitcast:SF(subreg(YY:DI>>32),0), but actually "lshiftrt" should
> also be ok.
> And current pattern only supports BE.
>
> This patch updates the pattern to support BE and "lshiftrt".
>
> Compare with previous version:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628790.html
> This version refines the code slightly and updates the test case
> according to review comments.
>
> Pass bootstrap and regtest on ppc64{,le}.
> Is this ok for trunk?
>

Okay.

Thanks, David


>
> BR,
> Jeff (Jiufu Guo)
>
> PR target/108338
>
> gcc/ChangeLog:
>
> * config/rs6000/predicates.md (lowpart_subreg_operator): New
> define_predicate.
> * config/rs6000/rs6000.md (any_rshift): New code_iterator.
> (movsf_from_si2): Rename to ...
> (movsf_from_si2_): ... this.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/powerpc/pr108338.c: New test.
>
> ---
>  gcc/config/rs6000/predicates.md |  5 +++
>  gcc/config/rs6000/rs6000.md | 12 ---
>  gcc/testsuite/gcc.target/powerpc/pr108338.c | 37 +
>  3 files changed, 49 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108338.c
>
> diff --git a/gcc/config/rs6000/predicates.md
> b/gcc/config/rs6000/predicates.md
> index 925f69cd3fc..ef7d3f214c4 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -2098,3 +2098,8 @@ (define_predicate "macho_pic_address"
>else
>  return false;
>  })
> +
> +(define_predicate "lowpart_subreg_operator"
> +  (and (match_code "subreg")
> +   (match_test "subreg_lowpart_offset (mode, GET_MODE (SUBREG_REG
> (op)))
> +   == SUBREG_BYTE (op)")))
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 1a9a7b1a479..56bd8bc1147 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -643,6 +643,9 @@ (define_code_iterator any_extend[sign_extend
> zero_extend])
>  (define_code_iterator any_fix  [fix unsigned_fix])
>  (define_code_iterator any_float[float unsigned_float])
>
> +; Shift right.
> +(define_code_iterator any_shiftrt  [ashiftrt lshiftrt])
> +
>  (define_code_attr u  [(sign_extend "")
>   (zero_extend  "u")
>   (fix  "")
> @@ -8303,14 +8306,13 @@ (define_insn_and_split "movsf_from_si"
>  ;; {%1:SF=unspec[r122:DI>>0x20#0] 86;clobber scratch;}
>  ;; split it before reload with "and mask" to avoid generating shift right
>  ;; 32 bit then shift left 32 bit.
> -(define_insn_and_split "movsf_from_si2"
> +(define_insn_and_split "movsf_from_si2_"
>[(set (match_operand:SF 0 "gpc_reg_operand" "=wa")
> (unspec:SF
> -[(subreg:SI
> -  (ashiftrt:DI
> +[(match_operator:SI 3 "lowpart_subreg_operator"
> +  [(any_shiftrt:DI
> (match_operand:DI 1 "input_operand" "r")
> -   (const_int 32))
> -  0)]
> +   (const_int 32))])]
>  UNSPEC_SF_FROM_SI))
>(clobber (match_scratch:DI 2 "=r"))]
>"TARGET_NO_SF_SUBREG"
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr108338.c
> b/gcc/testsuite/gcc.target/powerpc/pr108338.c
> new file mode 100644
> index 000..bd83c0b3ad8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr108338.c
> @@ -0,0 +1,37 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target hard_float } */
> +/* { dg-options "-O2 -save-temps" } */
> +
> +/* Under lp64, parameter 'v' is in DI regs, then bitcast sub DI to SF. */
> +/* { dg-final { scan-assembler-times {\mxscvspdpn\M} 1 { target { lp64 &&
> has_arch_pwr8 } } } } */
> +/* { dg-final { scan-assembler-times {\mmtvsrd\M} 1 { target { lp64 &&
> has_arch_pwr8 } } } } */
> +/* { dg-final { scan-assembler-times {\mrldicr\M} 1 { target { lp64 &&
> has_arch_pwr8 } } } } */
> +
> +struct di_sf_sf
> +{
> +  float f1; float f2; long long l;
> +};
> +
> +float __attribute__ ((noipa))
> +sf_from_high32bit_di (struct di_sf_sf v)
> +{
> +#ifdef __LITTLE_ENDIAN__
> +  return v.f2;
> +#else
> +  return v.f1;
> +#endif
> +}
> +
> +int main()
> +{
> +  struct di_sf_sf v;
> +  v.f1 = v.f2 = 0.0f;
> +#ifdef __LITTLE_ENDIAN__
> +  v.f2 = 2.0f;
> +#else
> +  v.f1 = 2.0f;
> +#endif
> +  if (sf_from_high32bit_di (v) != 2.0f)
> +__builtin_abort ();
> +  return 0;
> +}
> --
> 2.25.1
>
>


Re: [PATCH V5 2/2] rs6000: use mtvsrws to move sf from si p9

2023-10-05 Thread David Edelsohn
On Thu, Oct 5, 2023 at 12:14 AM Jiufu Guo  wrote:

> Hi,
>
> As mentioned in PR108338, on p9, we could use mtvsrws to implement
> the bitcast from SI to SF (or lowpart DI to SF).
>
> For example:
>   *(long long*)buff = di;
>   float f = *(float*)(buff);
>
> "sldi 9,3,32 ; mtvsrd 1,9 ; xscvspdpn 1,1" is generated.
> A better one would be "mtvsrws 1,3 ; xscvspdpn 1,1".
>
> Compare with previous patch:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628791.html
> According to review comments, this version refines commit message
> and words in comments, also updates the test case
>
> Pass bootstrap and regtest on ppc64{,le}.
> Is this ok for trunk?
>

Okay.

Thanks, David


>
> BR,
> Jeff (Jiufu Guo)
>
> PR target/108338
>
> gcc/ChangeLog:
>
> * config/rs6000/rs6000.md (movsf_from_si): Update to generate
> mtvsrws
> for P9.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/powerpc/pr108338.c: Updated to check mtvsrws for p9.
>
> ---
>  gcc/config/rs6000/rs6000.md | 25 -
>  gcc/testsuite/gcc.target/powerpc/pr108338.c | 21 ++---
>  2 files changed, 37 insertions(+), 9 deletions(-)
>
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 56bd8bc1147..d6dfb25cea0 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -8283,13 +8283,26 @@ (define_insn_and_split "movsf_from_si"
>  {
>rtx op0 = operands[0];
>rtx op1 = operands[1];
> -  rtx op2 = operands[2];
> -  rtx op1_di = gen_rtx_REG (DImode, REGNO (op1));
>
> -  /* Move SF value to upper 32-bits for xscvspdpn.  */
> -  emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));
> -  emit_insn (gen_p8_mtvsrd_sf (op0, op2));
> -  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
> +  /* Move lowpart 32-bits from register for SFmode.  */
> +  if (TARGET_P9_VECTOR)
> +{
> +  /* Using mtvsrws;xscvspdpn.  */
> +  rtx op0_v = gen_rtx_REG (V4SImode, REGNO (op0));
> +  emit_insn (gen_vsx_splat_v4si (op0_v, op1));
> +  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
> +}
> +  else
> +{
> +  rtx op2 = operands[2];
> +  rtx op1_di = gen_rtx_REG (DImode, REGNO (op1));
> +
> +  /* Using sldi;mtvsrd;xscvspdpn.  */
> +  emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));
> +  emit_insn (gen_p8_mtvsrd_sf (op0, op2));
> +  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
> +}
> +
>DONE;
>  }
>[(set_attr "length"
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr108338.c
> b/gcc/testsuite/gcc.target/powerpc/pr108338.c
> index bd83c0b3ad8..5f2f62866ee 100644
> --- a/gcc/testsuite/gcc.target/powerpc/pr108338.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr108338.c
> @@ -3,9 +3,12 @@
>  /* { dg-options "-O2 -save-temps" } */
>
>  /* Under lp64, parameter 'v' is in DI regs, then bitcast sub DI to SF. */
> -/* { dg-final { scan-assembler-times {\mxscvspdpn\M} 1 { target { lp64 &&
> has_arch_pwr8 } } } } */
> -/* { dg-final { scan-assembler-times {\mmtvsrd\M} 1 { target { lp64 &&
> has_arch_pwr8 } } } } */
> +/* { dg-final { scan-assembler-times {\mxscvspdpn\M} 2 { target { lp64 &&
> has_arch_pwr8 } } } } */
> +/* { dg-final { scan-assembler-times {\mmtvsrd\M} 2 { target { lp64 && {
> has_arch_pwr8 && { ! has_arch_pwr9 } } } } } } */
> +/* { dg-final { scan-assembler-times {\mmtvsrd\M} 1 { target { lp64 &&
> has_arch_pwr9 } } } } */
> +/* { dg-final { scan-assembler-times {\mmtvsrws\M} 1 { target { lp64 &&
> has_arch_pwr9 } } } } */
>  /* { dg-final { scan-assembler-times {\mrldicr\M} 1 { target { lp64 &&
> has_arch_pwr8 } } } } */
> +/* { dg-final { scan-assembler-times {\msldi\M} 1 { target { lp64 && {
> has_arch_pwr8 && { ! has_arch_pwr9 } } } } } } */
>
>  struct di_sf_sf
>  {
> @@ -22,16 +25,28 @@ sf_from_high32bit_di (struct di_sf_sf v)
>  #endif
>  }
>
> +float __attribute__ ((noipa))
> +sf_from_low32bit_di (struct di_sf_sf v)
> +{
> +#ifdef __LITTLE_ENDIAN__
> +  return v.f1;
> +#else
> +  return v.f2;
> +#endif
> +}
> +
>  int main()
>  {
>struct di_sf_sf v;
>v.f1 = v.f2 = 0.0f;
>  #ifdef __LITTLE_ENDIAN__
> +  v.f1 = 1.0f;
>v.f2 = 2.0f;
>  #else
>v.f1 = 2.0f;
> +  v.f2 = 1.0f;
>  #endif
> -  if (sf_from_high32bit_di (v) != 2.0f)
> +  if (sf_from_high32bit_di (v) != 2.0f || sf_from_low32bit_di (v) != 1.0f)
>  __builtin_abort ();
>return 0;
>  }
> --
> 2.25.1
>
>


[PATCH][_GLIBCXX_INLINE_VERSION] Add missing symbols

2023-10-05 Thread François Dumont

Here is a patch to fix the following test case in gcc:

gcc/testsuite/g++.dg/cpp23/ext-floating13.C

    libstdc++: [_GLIBCXX_INLINE_VERSION] Add missing float symbols

    libstdc++-v3/ChangeLog:

    * config/abi/pre/gnu-versioned-namespace.ver: Add missing symbols
    for _Float{16,32,64,128,32x,64x,128x}.

Ok to commit ?

François

diff --git a/libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver b/libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver
index 267ab8fc719..9fab8bead15 100644
--- a/libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver
+++ b/libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver
@@ -318,6 +318,15 @@ CXXABI_2.0 {
 _ZTIPD[fde];
 _ZTIPKD[fde];
 
+# typeinfo for _Float{16,32,64,128,32x,64x,128x} and
+# __bf16
+_ZTIDF[0-9]*[_bx];
+_ZTIPDF[0-9]*[_bx];
+_ZTIPKDF[0-9]*[_bx];
+_ZTIu6__bf16;
+_ZTIPu6__bf16;
+_ZTIPKu6__bf16;
+
 # typeinfo for decltype(nullptr)
 _ZTIDn;
 _ZTIPDn;


Re: [PATCH][_GLIBCXX_INLINE_VERSION] Add missing symbols

2023-10-05 Thread Jonathan Wakely
On Thu, 5 Oct 2023 at 18:04, François Dumont  wrote:
>
> Here is a patch to fix following test case in gcc:
>
> gcc/testsuite/g++.dg/cpp23/ext-floating13.C
>
>  libstdc++: [_GLIBCXX_INLINE_VERSION] Add missing float symbols
>
>  libstdc++-v3/ChangeLog:
>
>  * config/abi/pre/gnu-versioned-namespace.ver: Add missing
> symbols
>  for _Float{16,32,64,128,32x,64x,128x}.
>
> Ok to commit ?

OK, thanks.


>
> François
>



[Patch] libgomp.texi: Document some of the device-memory routines

2023-10-05 Thread Tobias Burnus

I was checking one of those functions - and now ended up documenting
some of them. Still to be documented are omp_target_{is_accessible,memcpy*}.

I did run into some possibly questionable code for corner cases and have
filed https://gcc.gnu.org/PR111707 for those. The documentation matches
the current implementation.
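
For reference, here is a minimal usage sketch of the routines documented
below (my illustration only, not a libgomp testcase; error handling kept to
a minimum):

  #include <omp.h>

  int
  main (void)
  {
    int dev = omp_get_default_device ();
    /* 1024 bytes in the device environment of DEV.  */
    void *p = omp_target_alloc (1024, dev);
    if (p == NULL)
      return 1;   /* Allocation failed (or size 0 on a non-host device).  */
    /* ... use P inside target regions via the is_device_ptr clause ...  */
    omp_target_free (p, dev);
    return 0;
  }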

Comments, suggestions, remarks?

Tobias
libgomp.texi: Document some of the device-memory routines

libgomp/ChangeLog:

	* libgomp.texi (Device Memory Routines): New.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index d24f590fd84..0d965f96d48 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -514,7 +514,7 @@ specification in version 5.2.
 * Tasking Routines::
 @c * Resource Relinquishing Routines::
 * Device Information Routines::
-@c * Device Memory Routines::
+* Device Memory Routines::
 * Lock Routines::
 * Timing Routines::
 * Event Routine::
@@ -1658,25 +1658,298 @@ For OpenMP 5.1, this must be equal to the value returned by the
 
 
 
-@c @node Device Memory Routines
-@c @section Device Memory Routines
-@c
-@c Routines related to memory allocation and managing corresponding
-@c pointers on devices. They have C linkage and do not throw exceptions.
-@c 
-@c @menu
-@c * omp_target_alloc:: 
-@c * omp_target_free:: 
-@c * omp_target_is_present:: 
+@node Device Memory Routines
+@section Device Memory Routines
+
+Routines related to memory allocation and managing corresponding
+pointers on devices. They have C linkage and do not throw exceptions.
+
+@menu
+* omp_target_alloc:: Allocate device memory
+* omp_target_free:: Free device memory
+* omp_target_is_present:: Check whether storage is mapped
 @c * omp_target_is_accessible:: 
 @c * omp_target_memcpy:: 
 @c * omp_target_memcpy_rect:: 
 @c * omp_target_memcpy_async:: 
 @c * omp_target_memcpy_rect_async:: 
-@c * omp_target_associate_ptr:: 
-@c * omp_target_disassociate_ptr:: 
-@c * omp_get_mapped_ptr:: 
-@c @end menu
+@c * omp_target_memset:: /TR12
+@c * omp_target_memset_async:: /TR12
+* omp_target_associate_ptr:: Associate a device pointer with a host pointer
+* omp_target_disassociate_ptr:: Remove device--host pointer association
+* omp_get_mapped_ptr:: Return device pointer to a host pointer
+@end menu
+
+
+
+@node omp_target_alloc
+@subsection @code{omp_target_alloc} -- Allocate device memory
+@table @asis
+@item @emph{Description}:
+This routine allocates @var{size} bytes of memory in the device environment
+associated with the device number @var{device_num}.  If successful, a device
+pointer is returned, otherwise a null pointer.
+
+In GCC, when the device is the host or the device shares memory with the host,
+the memory is allocated on the host; in that case, when @var{size} is zero,
+either NULL or a unique pointer value that can later be successfully passed to
+@code{omp_target_free} is returned.  When the allocation is not performed on
+the host, a null pointer is returned when @var{size} is zero; in that case,
+additionally a diagnostic might be printed to standard error (stderr).
+
+Running this routine in a @code{target} region except on the initial device
+is not supported.
+
+@item @emph{C/C++}
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void *omp_target_alloc(size_t size, int device_num)}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{type(c_ptr) function omp_target_alloc(size, device_num) bind(C)}
+@item   @tab @code{use, intrinsic :: iso_c_binding, only: c_ptr, c_int, c_size_t}
+@item   @tab @code{integer(c_size_t), value :: size}
+@item   @tab @code{integer(c_int), value :: device_num}
+@end multitable
+
+@item @emph{See also}:
+@ref{omp_target_free}, @ref{omp_target_associate_ptr}
+
+@item @emph{Reference}:
+@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 18.8.1
+@end table
+
+
+
+@node omp_target_free
+@subsection @code{omp_target_free} -- Free device memory
+@table @asis
+@item @emph{Description}:
+This routine frees memory allocated by the @code{omp_target_alloc} routine.
+The @var{device_ptr} argument must be either a null pointer or a device pointer
+returned by @code{omp_target_alloc} for the specified @code{device_num}.  The
+device number @var{device_num} must be a conforming device number.
+
+Running this routine in a @code{target} region except on the initial device
+is not supported.
+
+@item @emph{C/C++}
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void omp_target_free(void *device_ptr, int device_num)}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine omp_ta

RE: [PATCH]middle-end match.pd: optimize fneg (fabs (x)) to x | (1 << signbit(x)) [PR109154]

2023-10-05 Thread Tamar Christina
> > b17e1136600a 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -9476,3 +9476,57 @@ and,
> > }
> > (if (full_perm_p)
> > (vec_perm (op@3 @0 @1) @3 @2))
> > +
> > +/* Transform fneg (fabs (X)) -> X | 1 << signbit (X).  */
> > +
> > +(simplify
> > + (negate (abs @0))
> > + (if (FLOAT_TYPE_P (type)
> > +  /* We have to delay this rewriting till after forward prop because
> otherwise
> > +it's harder to do trigonometry optimizations. e.g. cos(-fabs(x)) is not
> > +matched in one go.  Instead cos (-x) is matched first followed by
> cos(|x|).
> > +The bottom op approach makes this rule match first and it's not untill
> > +fwdprop that we match top down.  There are manu such
> simplications
> > +so we
> Multiple typos this line.  fwdprop->fwprop manu->many
> simplications->simplifications.
> 
> OK with the typos fixed.

Ah, I think you missed the previous emails from Richi, who wanted this
canonicalized to copysign instead. I've just finished doing so and will send
the updated patch 😊

> 
> Thanks.  I meant to say hi at the Cauldron, but never seemed to get away long
> enough to find you..

Hehehe Indeed, I think I only saw you once and then *poof* like a ninja you 
were gone!

Next time 😊

Cheers,
Tamar

> 
> jeff



RE: [PATCH]middle-end match.pd: optimize fneg (fabs (x)) to x | (1 << signbit(x)) [PR109154]

2023-10-05 Thread Tamar Christina
> I suppose the idea is that -abs(x) might be easier to optimize with other
> patterns (consider a - copysign(x,...), optimizing to a + abs(x)).
> 
> For abs vs copysign it's a canonicalization, but (negate (abs @0)) is less
> canonical than copysign.
> 
> > Should I try removing this?
> 
> I'd say yes (and put the reverse canonicalization next to this pattern).
> 

This patch transforms fneg (fabs (x)) into copysign (x, -1) which is more
canonical and allows a target to expand this sequence efficiently.  Such
sequences are common in scientific code working with gradients.

Various optimizations in match.pd only happened on COPYSIGN but not COPYSIGN_ALL,
which means they exclude IFN_COPYSIGN.  COPYSIGN, however, is restricted to only
the C99 builtins and so doesn't work for vectors.

The patch expands these optimizations to work on COPYSIGN_ALL.

There is an existing canonicalization of copysign (x, -1) to fneg (fabs (x))
which I remove since this is a less efficient form.  The testsuite is also
updated in light of this.
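
As a source-level illustration (my example, not one of the patch's
testcases), the kind of expression this affects is:

  #include <math.h>

  /* With this patch the negated absolute value below is canonicalized in
     GIMPLE to .COPYSIGN (x, -1.0) instead of -ABS_EXPR <x>, which a target
     can expand as a single OR of the sign bit.  */
  double
  force_negative (double x)
  {
    return -fabs (x);
  }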

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimization/109154
* match.pd: Add new neg+abs rule, remove inverse copysign rule and
expand existing copysign optimizations.

gcc/testsuite/ChangeLog:

PR tree-optimization/109154
* gcc.dg/fold-copysign-1.c: Updated.
* gcc.dg/pr55152-2.c: Updated.
* gcc.dg/tree-ssa/abs-4.c: Updated.
* gcc.dg/tree-ssa/backprop-6.c: Updated.
* gcc.dg/tree-ssa/copy-sign-2.c: Updated.
* gcc.dg/tree-ssa/mult-abs-2.c: Updated.
* gcc.target/aarch64/fneg-abs_1.c: New test.
* gcc.target/aarch64/fneg-abs_2.c: New test.
* gcc.target/aarch64/fneg-abs_3.c: New test.
* gcc.target/aarch64/fneg-abs_4.c: New test.
* gcc.target/aarch64/sve/fneg-abs_1.c: New test.
* gcc.target/aarch64/sve/fneg-abs_2.c: New test.
* gcc.target/aarch64/sve/fneg-abs_3.c: New test.
* gcc.target/aarch64/sve/fneg-abs_4.c: New test.

--- inline copy of patch ---

diff --git a/gcc/match.pd b/gcc/match.pd
index 
4bdd83e6e061b16dbdb2845b9398fcfb8a6c9739..bd6599d36021e119f51a4928354f580ffe82c6e2
 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1074,45 +1074,43 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 
 /* cos(copysign(x, y)) -> cos(x).  Similarly for cosh.  */
 (for coss (COS COSH)
- copysigns (COPYSIGN)
- (simplify
-  (coss (copysigns @0 @1))
-   (coss @0)))
+ (for copysigns (COPYSIGN_ALL)
+  (simplify
+   (coss (copysigns @0 @1))
+(coss @0
 
 /* pow(copysign(x, y), z) -> pow(x, z) if z is an even integer.  */
 (for pows (POW)
- copysigns (COPYSIGN)
- (simplify
-  (pows (copysigns @0 @2) REAL_CST@1)
-  (with { HOST_WIDE_INT n; }
-   (if (real_isinteger (&TREE_REAL_CST (@1), &n) && (n & 1) == 0)
-(pows @0 @1)
+ (for copysigns (COPYSIGN_ALL)
+  (simplify
+   (pows (copysigns @0 @2) REAL_CST@1)
+   (with { HOST_WIDE_INT n; }
+(if (real_isinteger (&TREE_REAL_CST (@1), &n) && (n & 1) == 0)
+ (pows @0 @1))
 /* Likewise for powi.  */
 (for pows (POWI)
- copysigns (COPYSIGN)
- (simplify
-  (pows (copysigns @0 @2) INTEGER_CST@1)
-  (if ((wi::to_wide (@1) & 1) == 0)
-   (pows @0 @1
+ (for copysigns (COPYSIGN_ALL)
+  (simplify
+   (pows (copysigns @0 @2) INTEGER_CST@1)
+   (if ((wi::to_wide (@1) & 1) == 0)
+(pows @0 @1)
 
 (for hypots (HYPOT)
- copysigns (COPYSIGN)
- /* hypot(copysign(x, y), z) -> hypot(x, z).  */
- (simplify
-  (hypots (copysigns @0 @1) @2)
-  (hypots @0 @2))
- /* hypot(x, copysign(y, z)) -> hypot(x, y).  */
- (simplify
-  (hypots @0 (copysigns @1 @2))
-  (hypots @0 @1)))
+ (for copysigns (COPYSIGN)
+  /* hypot(copysign(x, y), z) -> hypot(x, z).  */
+  (simplify
+   (hypots (copysigns @0 @1) @2)
+   (hypots @0 @2))
+  /* hypot(x, copysign(y, z)) -> hypot(x, y).  */
+  (simplify
+   (hypots @0 (copysigns @1 @2))
+   (hypots @0 @1
 
-/* copysign(x, CST) -> [-]abs (x).  */
-(for copysigns (COPYSIGN_ALL)
- (simplify
-  (copysigns @0 REAL_CST@1)
-  (if (REAL_VALUE_NEGATIVE (TREE_REAL_CST (@1)))
-   (negate (abs @0))
-   (abs @0
+/* Transform fneg (fabs (X)) -> copysign (X, -1).  */
+
+(simplify
+ (negate (abs @0))
+ (IFN_COPYSIGN @0 { build_minus_one_cst (type); }))
 
 /* copysign(copysign(x, y), z) -> copysign(x, z).  */
 (for copysigns (COPYSIGN_ALL)
diff --git a/gcc/testsuite/gcc.dg/fold-copysign-1.c 
b/gcc/testsuite/gcc.dg/fold-copysign-1.c
index 
f17d65c24ee4dca9867827d040fe0a404c515e7b..f9cafd14ab05f5e8ab2f6f68e62801d21c2df6a6
 100644
--- a/gcc/testsuite/gcc.dg/fold-copysign-1.c
+++ b/gcc/testsuite/gcc.dg/fold-copysign-1.c
@@ -12,5 +12,5 @@ double bar (double x)
   return __builtin_copysign (x, minuszero);
 }
 
-/* { dg-final { scan-tree-dump-times "= -" 1 "cddce1" } } */
-/* { dg-final { scan-tree-dump-times "= ABS_EXPR" 2 "cddce1" } } */
+/* { dg-final { scan-tree-dump-times "__builtin_copysign" 1 "cddce1" } } */
+/* { 

RE: [PATCH]AArch64: Use SVE unpredicated LOGICAL expressions when Advanced SIMD inefficient [PR109154]

2023-10-05 Thread Tamar Christina
> >>
> >> The WIP SME patches add a %Z modifier for 'z' register prefixes,
> >> similarly to b/h/s/d for scalar FP.  With that I think the alternative can 
> >> be:
> >>
> >>  [w , 0 , ; * , sve ] \t%Z0., %Z0., #%2
> >>
> >> although it would be nice to keep the hex constant.
> >
> > My original patch added a %u for (undecorated) which just prints the
> > register number and changed %C to also accept a single constant instead of
> only a uniform vector.
> 
> Not saying no to %u in future, but %Z seems more consistent with the current
> approach.  And yeah, I'd also wondered about extending %C.
> The problem is guessing whether to print a 32-bit, 64-bit or 128-bit constant
> for negative immediates.
> 

I know we're waiting for the %Z but I've updated the remainder of the series 
and for
completeness and CI purposes I'm sending the updated patch before the change to
use %Z.

--

SVE has a much bigger immediate encoding range for bitmasks than Advanced SIMD,
so on an SVE-capable system, if we need an Advanced SIMD inclusive-OR by
immediate that would require a reload, we use an unpredicated SVE ORR
instead.

This has both speed and size improvements.
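
As a rough intrinsics-level sketch of the situation this targets (my
hypothetical example, not from the patch; whether the SVE form is picked
depends on the constant and register allocation):

  #include <arm_neon.h>

  /* OR-ing in the double sign bit: 0x8000000000000000 has no Advanced SIMD
     ORR-immediate encoding, but it is a valid SVE bitmask immediate, so on
     an SVE-capable core the patch can emit an unpredicated SVE ORR on the
     low 128 bits instead of constructing the constant separately.  */
  float64x2_t
  set_sign_bits (float64x2_t x)
  {
    uint64x2_t m = vdupq_n_u64 (0x8000000000000000ULL);
    return vreinterpretq_f64_u64 (vorrq_u64 (vreinterpretq_u64_f64 (x), m));
  }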

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimization/109154
* config/aarch64/aarch64.md (3): Add SVE split case.
* config/aarch64/aarch64-simd.md (ior3): Likewise.
* config/aarch64/iterators.md (VCONV, vconv): New.
* config/aarch64/predicates.md(aarch64_orr_imm_sve_advsimd): New.

gcc/testsuite/ChangeLog:

PR tree-optimization/109154
* gcc.target/aarch64/sve/fneg-abs_1.c: Updated.
* gcc.target/aarch64/sve/fneg-abs_2.c: Updated.
* gcc.target/aarch64/sve/fneg-abs_4.c: Updated.

--- inline copy of patch ---

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
33eceb436584ff73c7271f93639f2246d1af19e0..25a1e4e8ecf767636c0ff3cdab6cad6e1482f73e
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1216,14 +1216,29 @@ (define_insn "and3"
 )
 
 ;; For ORR (vector, register) and ORR (vector, immediate)
-(define_insn "ior3"
+(define_insn_and_split "ior3"
   [(set (match_operand:VDQ_I 0 "register_operand")
(ior:VDQ_I (match_operand:VDQ_I 1 "register_operand")
-  (match_operand:VDQ_I 2 "aarch64_reg_or_orr_imm")))]
+  (match_operand:VDQ_I 2 "aarch64_orr_imm_sve_advsimd")))]
   "TARGET_SIMD"
-  {@ [ cons: =0 , 1 , 2   ]
- [ w, w , w   ] orr\t%0., %1., %2.
- [ w, 0 , Do  ] << aarch64_output_simd_mov_immediate (operands[2], 
, AARCH64_CHECK_ORR);
+  {@ [ cons: =0 , 1 , 2; attrs: arch ]
+ [ w, w , w  ; simd  ] orr\t%0., %1., 
%2.
+ [ w, 0 , vsl; sve   ] #
+ [ w, 0 , Do ; simd  ] \
+   << aarch64_output_simd_mov_immediate (operands[2], , \
+AARCH64_CHECK_ORR);
+  }
+  "&& TARGET_SVE && rtx_equal_p (operands[0], operands[1])
+   && satisfies_constraint_vsl (operands[2])
+   && FP_REGNUM_P (REGNO (operands[0]))"
+  [(const_int 0)]
+  {
+rtx op1 = lowpart_subreg (mode, operands[1], mode);
+rtx op2 =
+  gen_const_vec_duplicate (mode,
+  unwrap_const_vec_duplicate (operands[2]));
+emit_insn (gen_ior3 (op1, op1, op2));
+DONE;
   }
   [(set_attr "type" "neon_logic")]
 )
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 
064d68ceb22533434468b22c4e5848e85a8c6eff..24349ecdbbab875f21975f116732a9e53762d4c1
 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4545,7 +4545,7 @@ (define_insn_and_split "*aarch64_and_imm2"
   }
 )
 
-(define_insn "3"
+(define_insn_and_split "3"
   [(set (match_operand:GPI 0 "register_operand")
(LOGICAL:GPI (match_operand:GPI 1 "register_operand")
 (match_operand:GPI 2 "aarch64_logical_operand")))]
@@ -4553,8 +4553,19 @@ (define_insn "3"
   {@ [ cons: =0 , 1  , 2; attrs: type , arch  ]
  [ r, %r , r; logic_reg   , * ] \t%0, 
%1, %2
  [ rk   , r  ,  ; logic_imm   , * ] \t%0, 
%1, %2
+ [ w, 0  ,  ; *   , sve   ] #
  [ w, w  , w; neon_logic  , simd  ] 
\t%0., %1., %2.
   }
+  "&& TARGET_SVE && rtx_equal_p (operands[0], operands[1])
+   && satisfies_constraint_ (operands[2])
+   && FP_REGNUM_P (REGNO (operands[0]))"
+  [(const_int 0)]
+  {
+rtx op1 = lowpart_subreg (mode, operands[1], mode);
+rtx op2 = gen_const_vec_duplicate (mode, operands[2]);
+emit_insn (gen_3 (op1, op1, op2));
+DONE;
+  }
 )
 
 ;; zero_extend version of above
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 
d17becc37e230684beaee3c69e2a0f0ce612eda5..5ec854a364e41b9827271ca6e870c8027336c7cd
 100644
--- a/gcc/config/aarc

RE: [PATCH]AArch64 Add special patterns for creating DI scalar and vector constant 1 << 63 [PR109154]

2023-10-05 Thread Tamar Christina
Hi,

> The lowpart_subreg should simplify this back into CONST0_RTX (mode),
> making it no different from:
> 
> emti_move_insn (target, CONST0_RTX (mode));
> 
> If the intention is to share zeros between modes (sounds good!), then I think
> the subreg needs to be on the lhs instead.
> 
> > +  rtx neg = lowpart_subreg (V2DFmode, target, mode);
> > +  emit_insn (gen_negv2df2 (neg, lowpart_subreg (V2DFmode, target,
> > + mode)));
> 
> The rhs seems simpler as copy_rtx (neg).  (Even the copy_rtx shouldn't be
> needed after RA, but it's probably more future-proof to keep it.)
> 
> > +  emit_move_insn (target, lowpart_subreg (mode, neg, V2DFmode));
> 
> This shouldn't be needed, since neg is already a reference to target.
> 
> Overall, looks like a nice change/framework.

Updated the patch, and in the process also realized this can be used for the
vector variants:

Hi All,

This adds a way to generate special sequences for the creation of constants for
which we don't have single-instruction sequences and which would normally have
led to a GP -> FP transfer or a literal load.

The patch starts out by adding support for creating 1 << 63 using fneg (mov 0).
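
For illustration (my sketch, not one of the patch's testcases), the
simplest constant this covers is the bit pattern of -0.0:

  /* 1ull << 63 is the IEEE bit pattern of -0.0.  With this change the
     constant can be materialised in an FP/SIMD register along the lines of
         movi  d0, #0
         fneg  d0, d0
     (register choice illustrative) rather than via a GP->FP transfer or a
     literal-pool load.  */
  double
  neg_zero (void)
  {
    return -0.0;
  }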

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimization/109154
* config/aarch64/aarch64-protos.h (aarch64_simd_special_constant_p,
aarch64_maybe_generate_simd_constant): New.
* config/aarch64/aarch64-simd.md (*aarch64_simd_mov,
*aarch64_simd_mov): Add new coden for special constants.
* config/aarch64/aarch64.cc (aarch64_extract_vec_duplicate_wide_int):
Take optional mode.
(aarch64_simd_special_constant_p,
aarch64_maybe_generate_simd_constant): New.
* config/aarch64/aarch64.md (*movdi_aarch64): Add new codegen for
special constants.
* config/aarch64/constraints.md (Dx): new.

gcc/testsuite/ChangeLog:

PR tree-optimization/109154
* gcc.target/aarch64/fneg-abs_1.c: Updated.
* gcc.target/aarch64/fneg-abs_2.c: Updated.
* gcc.target/aarch64/fneg-abs_4.c: Updated.
* gcc.target/aarch64/dbl_mov_immediate_1.c: Updated.

--- inline copy of patch ---

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 
60a55f4bc1956786ea687fc7cad7ec9e4a84e1f0..36d6c688bc888a51a9de174bd3665aebe891b8b1
 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -831,6 +831,8 @@ bool aarch64_sve_ptrue_svpattern_p (rtx, struct 
simd_immediate_info *);
 bool aarch64_simd_valid_immediate (rtx, struct simd_immediate_info *,
enum simd_immediate_check w = AARCH64_CHECK_MOV);
 rtx aarch64_check_zero_based_sve_index_immediate (rtx);
+bool aarch64_maybe_generate_simd_constant (rtx, rtx, machine_mode);
+bool aarch64_simd_special_constant_p (rtx, machine_mode);
 bool aarch64_sve_index_immediate_p (rtx);
 bool aarch64_sve_arith_immediate_p (machine_mode, rtx, bool);
 bool aarch64_sve_sqadd_sqsub_immediate_p (machine_mode, rtx, bool);
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
81ff5bad03d598fa0d48df93d172a28bc0d1d92e..33eceb436584ff73c7271f93639f2246d1af19e0
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -142,26 +142,35 @@ (define_insn "aarch64_dup_lane_"
   [(set_attr "type" "neon_dup")]
 )
 
-(define_insn "*aarch64_simd_mov"
+(define_insn_and_split "*aarch64_simd_mov"
   [(set (match_operand:VDMOV 0 "nonimmediate_operand")
(match_operand:VDMOV 1 "general_operand"))]
   "TARGET_FLOAT
&& (register_operand (operands[0], mode)
|| aarch64_simd_reg_or_zero (operands[1], mode))"
-  {@ [cons: =0, 1; attrs: type, arch]
- [w , m ; neon_load1_1reg , *   ] ldr\t%d0, %1
- [r , m ; load_8 , *   ] ldr\t%x0, %1
- [m , Dz; store_8, *   ] str\txzr, %0
- [m , w ; neon_store1_1reg, *   ] str\t%d1, %0
- [m , r ; store_8, *   ] str\t%x1, %0
- [w , w ; neon_logic  , simd] mov\t%0., %1.
- [w , w ; neon_logic  , *   ] fmov\t%d0, %d1
- [?r, w ; neon_to_gp  , simd] umov\t%0, %1.d[0]
- [?r, w ; neon_to_gp  , *   ] fmov\t%x0, %d1
- [?w, r ; f_mcr  , *   ] fmov\t%d0, %1
- [?r, r ; mov_reg, *   ] mov\t%0, %1
- [w , Dn; neon_move   , simd] << aarch64_output_simd_mov_immediate 
(operands[1], 64);
- [w , Dz; f_mcr  , *   ] fmov\t%d0, xzr
+  {@ [cons: =0, 1; attrs: type, arch, length]
+ [w , m ; neon_load1_1reg , *   , *] ldr\t%d0, %1
+ [r , m ; load_8 , *   , *] ldr\t%x0, %1
+ [m , Dz; store_8, *   , *] str\txzr, %0
+ [m , w ; neon_store1_1reg, *   , *] str\t%d1, %0
+ [m , r ; store_8, *   , *] str\t%x1, %0
+ [w , w ; neon_logic  , simd, *] mov\t%0., %1.
+ [w , w ; neon_logic  , *   , *] fmov\

[PATCH]middle-end ifcvt: Allow any const IFN in conditional blocks

2023-10-05 Thread Tamar Christina
Hi All,

When ifcvt was initially added masking was not a thing and as such it was
rather conservative in what it supported.

For builtins it only allowed C99 builtin functions which it knew it could fold
away.

These days the vectorizer is able to deal with needing to mask IFNs itself.
vectorizable_call is able to vectorize the IFN by emitting a VEC_PERM_EXPR after
the operation to emulate the masking.

This is then used by match.pd to convert the IFN into a masked variant if it's
available.

For these reasons the restriction in ifconvert is no longer required and we
needlessly block vectorization when we can effectively handle the operations.
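
As a hypothetical shape of the code this affects (my example; the real
tests live in the AArch64 patch mentioned below), consider a conditional
call to a const math function inside a vectorizable loop:

  #include <math.h>

  /* The call maps to a const combined/internal function (e.g. .COPYSIGN),
     so with this change if-conversion keeps the block and leaves the
     masking/selection to vectorizable_call and match.pd.  */
  void
  f (float *a, float *b, int n)
  {
    for (int i = 0; i < n; i++)
      if (b[i] > 0.0f)
        a[i] = copysignf (a[i], b[i]);
  }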

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Note: This patch is part of a test series and tests for it are added in the
AArch64 patch that adds support for the optab.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimization/109154
* tree-if-conv.cc (if_convertible_stmt_p): Allow any const IFN.

--- inline copy of patch -- 
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 
a8c915913aed267edfb3ebd2c530aeca7cf51832..f76e0d8f2e6e0f59073fa8484b0b2c7a6cdc9783
 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -1129,6 +1129,16 @@ if_convertible_stmt_p (gimple *stmt, 
vec refs)
return true;
  }
  }
+
+   /* There are some IFN_s that are used to replace builtins but have the
+  same semantics.  Even if MASK_CALL cannot handle them vectorable_call
+  will insert the proper selection, so do not block conversion.  */
+   int flags = gimple_call_flags (stmt);
+   if ((flags & ECF_CONST)
+   && !(flags & ECF_LOOPING_CONST_OR_PURE)
+   && gimple_call_combined_fn (stmt) != CFN_LAST)
+ return true;
+
return false;
   }
 




-- 
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 
a8c915913aed267edfb3ebd2c530aeca7cf51832..f76e0d8f2e6e0f59073fa8484b0b2c7a6cdc9783
 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -1129,6 +1129,16 @@ if_convertible_stmt_p (gimple *stmt, 
vec refs)
return true;
  }
  }
+
+   /* There are some IFN_s that are used to replace builtins but have the
+  same semantics.  Even if MASK_CALL cannot handle them vectorable_call
+  will insert the proper selection, so do not block conversion.  */
+   int flags = gimple_call_flags (stmt);
+   if ((flags & ECF_CONST)
+   && !(flags & ECF_LOOPING_CONST_OR_PURE)
+   && gimple_call_combined_fn (stmt) != CFN_LAST)
+ return true;
+
return false;
   }
 





[PATCH]AArch64 Handle copysign (x, -1) expansion efficiently

2023-10-05 Thread Tamar Christina
Hi All,

copysign (x, -1) is effectively fneg (abs (x)) which on AArch64 can be
most efficiently done by doing an OR of the signbit.

The middle-end will optimize fneg (abs (x)) now to copysign as the
canonical form and so this optimizes the expansion.

If the target has an inclusive-OR that takes an immediate, then the transformed
instruction is both shorter and faster.  For those that don't, the immediate
has to be separately constructed, but this still ends up being faster as the
immediate construction is not on the critical path.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Note that this is part of another patch series; the additional testcases
are mutually dependent on the match.pd patch.  As such the tests are added
there instead of here.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimization/109154
* config/aarch64/aarch64.md (copysign3): Handle
copysign (x, -1).
* config/aarch64/aarch64-simd.md (copysign3): Likewise.
* config/aarch64/aarch64-sve.md (copysign3): Likewise.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
25a1e4e8ecf767636c0ff3cdab6cad6e1482f73e..a78e77dcc3473445108b06c50f9c28a8369f3e3f
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -754,15 +754,33 @@ (define_insn 
"aarch64_dot_lane<
 (define_expand "copysign3"
   [(match_operand:VHSDF 0 "register_operand")
(match_operand:VHSDF 1 "register_operand")
-   (match_operand:VHSDF 2 "register_operand")]
+   (match_operand:VHSDF 2 "nonmemory_operand")]
   "TARGET_SIMD"
 {
-  rtx v_bitmask = gen_reg_rtx (mode);
+  machine_mode int_mode = mode;
+  rtx v_bitmask = gen_reg_rtx (int_mode);
   int bits = GET_MODE_UNIT_BITSIZE (mode) - 1;
 
   emit_move_insn (v_bitmask,
  aarch64_simd_gen_const_vector_dup (mode,
 HOST_WIDE_INT_M1U << 
bits));
+
+  /* copysign (x, -1) should instead be expanded as orr with the sign
+ bit.  */
+  if (!REG_P (operands[2]))
+{
+  auto r0
+   = CONST_DOUBLE_REAL_VALUE (unwrap_const_vec_duplicate (operands[2]));
+  if (-1 == real_to_integer (r0))
+   {
+ emit_insn (gen_ior3 (
+   lowpart_subreg (int_mode, operands[0], mode),
+   lowpart_subreg (int_mode, operands[1], mode), v_bitmask));
+ DONE;
+   }
+}
+
+  operands[2] = force_reg (mode, operands[2]);
   emit_insn (gen_aarch64_simd_bsl (operands[0], v_bitmask,
 operands[2], operands[1]));
   DONE;
diff --git a/gcc/config/aarch64/aarch64-sve.md 
b/gcc/config/aarch64/aarch64-sve.md
index 
5a652d8536a0ef9461f40da7b22834e683e73ceb..071400c820a5b106ddf9dc9faebb117975d74ea0
 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -6387,7 +6387,7 @@ (define_insn "*3"
 (define_expand "copysign3"
   [(match_operand:SVE_FULL_F 0 "register_operand")
(match_operand:SVE_FULL_F 1 "register_operand")
-   (match_operand:SVE_FULL_F 2 "register_operand")]
+   (match_operand:SVE_FULL_F 2 "nonmemory_operand")]
   "TARGET_SVE"
   {
 rtx sign = gen_reg_rtx (mode);
@@ -6398,11 +6398,26 @@ (define_expand "copysign3"
 rtx arg1 = lowpart_subreg (mode, operands[1], mode);
 rtx arg2 = lowpart_subreg (mode, operands[2], mode);
 
-emit_insn (gen_and3
-  (sign, arg2,
-   aarch64_simd_gen_const_vector_dup (mode,
-  HOST_WIDE_INT_M1U
-  << bits)));
+rtx v_sign_bitmask
+  = aarch64_simd_gen_const_vector_dup (mode,
+  HOST_WIDE_INT_M1U << bits);
+
+/* copysign (x, -1) should instead be expanded as orr with the sign
+   bit.  */
+if (!REG_P (operands[2]))
+  {
+   auto r0
+ = CONST_DOUBLE_REAL_VALUE (unwrap_const_vec_duplicate (operands[2]));
+   if (-1 == real_to_integer (r0))
+ {
+   emit_insn (gen_ior3 (int_res, arg1, v_sign_bitmask));
+   emit_move_insn (operands[0], gen_lowpart (mode, int_res));
+   DONE;
+ }
+  }
+
+operands[2] = force_reg (mode, operands[2]);
+emit_insn (gen_and3 (sign, arg2, v_sign_bitmask));
 emit_insn (gen_and3
   (mant, arg1,
aarch64_simd_gen_const_vector_dup (mode,
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 
24349ecdbbab875f21975f116732a9e53762d4c1..d6c581ad81615b4feb095391cbcf4f5b78fa72f1
 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -6940,12 +6940,25 @@ (define_expand "lrint2"
 (define_expand "copysign3"
   [(match_operand:GPF 0 "register_operand")
(match_operand:GPF 1 "register_operand")
-   (match_operand:GPF 2 "register_operand")]
+   (match_operand:GPF 2 "nonmemory_operand")]
   "TARGET_SIMD"
 {
-  

[PATCH]middle-end ifcvt: Add support for conditional copysign

2023-10-05 Thread Tamar Christina
Hi All,

This adds a masked variant of copysign.  Nothing very exciting, just the
general machinery to define and use a new masked IFN.
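
A sketch of where the conditional form shows up (my example; the actual
tests are added with the AArch64 optab patch): a copysign under a condition
in a vectorizable loop, which a masked vectorizer can now express as
IFN_COND_COPYSIGN instead of an unconditional copysign plus a select.

  #include <math.h>

  void
  g (double *a, double *b, int n)
  {
    for (int i = 0; i < n; i++)
      if (b[i] > 0.0)
        a[i] = copysign (a[i], b[i]);
      else
        a[i] = b[i];
  }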

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Note: This patch is part of a test series and tests for it are added in the
AArch64 patch that adds support for the optab.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimization/109154
* internal-fn.def (COPYSIGN): New.
* match.pd (UNCOND_BINARY, COND_BINARY): Map IFN_COPYSIGN to
IFN_COND_COPYSIGN.
* optabs.def (cond_copysign_optab, cond_len_copysign_optab): New.

--- inline copy of patch -- 
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 
a2023ab9c3d01c28f51eb8a59e08c59e4c39aa7f..d9e6bdef6977f7ab9c0290bf4f4568aad0380456
 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -268,6 +268,7 @@ DEF_INTERNAL_SIGNED_COND_FN (MOD, ECF_CONST, first, smod, 
umod, binary)
 DEF_INTERNAL_COND_FN (RDIV, ECF_CONST, sdiv, binary)
 DEF_INTERNAL_SIGNED_COND_FN (MIN, ECF_CONST, first, smin, umin, binary)
 DEF_INTERNAL_SIGNED_COND_FN (MAX, ECF_CONST, first, smax, umax, binary)
+DEF_INTERNAL_COND_FN (COPYSIGN, ECF_CONST, copysign, binary)
 DEF_INTERNAL_COND_FN (FMIN, ECF_CONST, fmin, binary)
 DEF_INTERNAL_COND_FN (FMAX, ECF_CONST, fmax, binary)
 DEF_INTERNAL_COND_FN (AND, ECF_CONST | ECF_NOTHROW, and, binary)
diff --git a/gcc/match.pd b/gcc/match.pd
index 
e12b508ce8ced64e62d94d6df82734cb630b8c1c..1e8d406e6c196b10b48d3c30dc29bffc1bc27bf4
 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -93,14 +93,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   plus minus
   mult trunc_div trunc_mod rdiv
   min max
-  IFN_FMIN IFN_FMAX
+  IFN_FMIN IFN_FMAX IFN_COPYSIGN
   bit_and bit_ior bit_xor
   lshift rshift)
 (define_operator_list COND_BINARY
   IFN_COND_ADD IFN_COND_SUB
   IFN_COND_MUL IFN_COND_DIV IFN_COND_MOD IFN_COND_RDIV
   IFN_COND_MIN IFN_COND_MAX
-  IFN_COND_FMIN IFN_COND_FMAX
+  IFN_COND_FMIN IFN_COND_FMAX IFN_COND_COPYSIGN
   IFN_COND_AND IFN_COND_IOR IFN_COND_XOR
   IFN_COND_SHL IFN_COND_SHR)
 
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 
2ccbe4197b7b700dcdb70e2c67cfcf12d7e381b1..93d4c63700cbaa9fea1177b3d6c7a3e12f609361
 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -256,6 +256,7 @@ OPTAB_D (cond_fms_optab, "cond_fms$a")
 OPTAB_D (cond_fnma_optab, "cond_fnma$a")
 OPTAB_D (cond_fnms_optab, "cond_fnms$a")
 OPTAB_D (cond_neg_optab, "cond_neg$a")
+OPTAB_D (cond_copysign_optab, "cond_copysign$F$a")
 OPTAB_D (cond_one_cmpl_optab, "cond_one_cmpl$a")
 OPTAB_D (cond_len_add_optab, "cond_len_add$a")
 OPTAB_D (cond_len_sub_optab, "cond_len_sub$a")
@@ -281,6 +282,7 @@ OPTAB_D (cond_len_fms_optab, "cond_len_fms$a")
 OPTAB_D (cond_len_fnma_optab, "cond_len_fnma$a")
 OPTAB_D (cond_len_fnms_optab, "cond_len_fnms$a")
 OPTAB_D (cond_len_neg_optab, "cond_len_neg$a")
+OPTAB_D (cond_len_copysign_optab, "cond_len_copysign$F$a")
 OPTAB_D (cond_len_one_cmpl_optab, "cond_len_one_cmpl$a")
 OPTAB_D (cmov_optab, "cmov$a6")
 OPTAB_D (cstore_optab, "cstore$a4")




-- 
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 
a2023ab9c3d01c28f51eb8a59e08c59e4c39aa7f..d9e6bdef6977f7ab9c0290bf4f4568aad0380456
 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -268,6 +268,7 @@ DEF_INTERNAL_SIGNED_COND_FN (MOD, ECF_CONST, first, smod, 
umod, binary)
 DEF_INTERNAL_COND_FN (RDIV, ECF_CONST, sdiv, binary)
 DEF_INTERNAL_SIGNED_COND_FN (MIN, ECF_CONST, first, smin, umin, binary)
 DEF_INTERNAL_SIGNED_COND_FN (MAX, ECF_CONST, first, smax, umax, binary)
+DEF_INTERNAL_COND_FN (COPYSIGN, ECF_CONST, copysign, binary)
 DEF_INTERNAL_COND_FN (FMIN, ECF_CONST, fmin, binary)
 DEF_INTERNAL_COND_FN (FMAX, ECF_CONST, fmax, binary)
 DEF_INTERNAL_COND_FN (AND, ECF_CONST | ECF_NOTHROW, and, binary)
diff --git a/gcc/match.pd b/gcc/match.pd
index 
e12b508ce8ced64e62d94d6df82734cb630b8c1c..1e8d406e6c196b10b48d3c30dc29bffc1bc27bf4
 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -93,14 +93,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   plus minus
   mult trunc_div trunc_mod rdiv
   min max
-  IFN_FMIN IFN_FMAX
+  IFN_FMIN IFN_FMAX IFN_COPYSIGN
   bit_and bit_ior bit_xor
   lshift rshift)
 (define_operator_list COND_BINARY
   IFN_COND_ADD IFN_COND_SUB
   IFN_COND_MUL IFN_COND_DIV IFN_COND_MOD IFN_COND_RDIV
   IFN_COND_MIN IFN_COND_MAX
-  IFN_COND_FMIN IFN_COND_FMAX
+  IFN_COND_FMIN IFN_COND_FMAX IFN_COND_COPYSIGN
   IFN_COND_AND IFN_COND_IOR IFN_COND_XOR
   IFN_COND_SHL IFN_COND_SHR)
 
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 
2ccbe4197b7b700dcdb70e2c67cfcf12d7e381b1..93d4c63700cbaa9fea1177b3d6c7a3e12f609361
 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -256,6 +256,7 @@ OPTAB_D (cond_fms_optab, "cond_fms$a")
 OPTAB_D (cond_fnma_optab, "cond_fnma$a")
 OPTAB_D (cond_fnms_optab, "cond_fnms$a")
 OPTAB_D (cond_neg_optab, "cond_neg$a")
+OPTAB_D (cond_copysign_optab, "cond_copysign$F$a")
 OPTAB_D (cond_one_cmpl_optab, "cond_one_cmpl$a")
 OPTAB_D (cond_len_add_optab, "cond_len_add

[PATCH]AArch64 Add SVE implementation for cond_copysign.

2023-10-05 Thread Tamar Christina
Hi All,

This adds an implementation for masked copysign along with an optimized
pattern for masked copysign (x, -1).

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimization/109154
* config/aarch64/aarch64-sve.md (cond_copysign): New.

gcc/testsuite/ChangeLog:

PR tree-optimization/109154
* gcc.target/aarch64/sve/fneg-abs_5.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-sve.md 
b/gcc/config/aarch64/aarch64-sve.md
index 
071400c820a5b106ddf9dc9faebb117975d74ea0..00ca30c24624dc661254568f45b61a14aa11c305
 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -6429,6 +6429,57 @@ (define_expand "copysign3"
   }
 )
 
+(define_expand "cond_copysign"
+  [(match_operand:SVE_FULL_F 0 "register_operand")
+   (match_operand: 1 "register_operand")
+   (match_operand:SVE_FULL_F 2 "register_operand")
+   (match_operand:SVE_FULL_F 3 "nonmemory_operand")
+   (match_operand:SVE_FULL_F 4 "aarch64_simd_reg_or_zero")]
+  "TARGET_SVE"
+  {
+rtx sign = gen_reg_rtx (mode);
+rtx mant = gen_reg_rtx (mode);
+rtx int_res = gen_reg_rtx (mode);
+int bits = GET_MODE_UNIT_BITSIZE (mode) - 1;
+
+rtx arg2 = lowpart_subreg (mode, operands[2], mode);
+rtx arg3 = lowpart_subreg (mode, operands[3], mode);
+rtx arg4 = lowpart_subreg (mode, operands[4], mode);
+
+rtx v_sign_bitmask
+  = aarch64_simd_gen_const_vector_dup (mode,
+  HOST_WIDE_INT_M1U << bits);
+
+/* copysign (x, -1) should instead be expanded as orr with the sign
+   bit.  */
+if (!REG_P (operands[3]))
+  {
+   auto r0
+ = CONST_DOUBLE_REAL_VALUE (unwrap_const_vec_duplicate (operands[3]));
+   if (-1 == real_to_integer (r0))
+ {
+   arg3 = force_reg (mode, v_sign_bitmask);
+   emit_insn (gen_cond_ior (int_res, operands[1], arg2,
+ arg3, arg4));
+   emit_move_insn (operands[0], gen_lowpart (mode, int_res));
+   DONE;
+ }
+  }
+
+operands[2] = force_reg (mode, operands[3]);
+emit_insn (gen_and3 (sign, arg3, v_sign_bitmask));
+emit_insn (gen_and3
+  (mant, arg2,
+   aarch64_simd_gen_const_vector_dup (mode,
+  ~(HOST_WIDE_INT_M1U
+<< bits;
+emit_insn (gen_cond_ior (int_res, operands[1], sign, mant,
+ arg4));
+emit_move_insn (operands[0], gen_lowpart (mode, int_res));
+DONE;
+  }
+)
+
 (define_expand "xorsign3"
   [(match_operand:SVE_FULL_F 0 "register_operand")
(match_operand:SVE_FULL_F 1 "register_operand")
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_5.c 
b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_5.c
new file mode 100644
index 
..f4ecbeecbe1290134e688f46a4389d17155e4a0a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_5.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#include 
+#include 
+
+/*
+** f1:
+** ...
+** orr z[0-9]+.s, p[0-9]+/m, z[0-9]+.s, z[0-9]+.s
+** ...
+*/
+void f1 (float32_t *a, int n)
+{
+  for (int i = 0; i < (n & -8); i++)
+   if (a[i] > n)
+ a[i] = -fabsf (a[i]);
+   else
+ a[i] = n;
+}
+
+/*
+** f2:
+** ...
+** orr z[0-9]+.d, p[0-9]+/m, z[0-9]+.d, z[0-9]+.d
+** ...
+*/
+void f2 (float64_t *a, int n)
+{
+  for (int i = 0; i < (n & -8); i++)
+   if (a[i] > n)
+ a[i] = -fabs (a[i]);
+   else
+ a[i] = n;
+}




-- 
diff --git a/gcc/config/aarch64/aarch64-sve.md 
b/gcc/config/aarch64/aarch64-sve.md
index 
071400c820a5b106ddf9dc9faebb117975d74ea0..00ca30c24624dc661254568f45b61a14aa11c305
 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -6429,6 +6429,57 @@ (define_expand "copysign3"
   }
 )
 
+(define_expand "cond_copysign"
+  [(match_operand:SVE_FULL_F 0 "register_operand")
+   (match_operand: 1 "register_operand")
+   (match_operand:SVE_FULL_F 2 "register_operand")
+   (match_operand:SVE_FULL_F 3 "nonmemory_operand")
+   (match_operand:SVE_FULL_F 4 "aarch64_simd_reg_or_zero")]
+  "TARGET_SVE"
+  {
+rtx sign = gen_reg_rtx (mode);
+rtx mant = gen_reg_rtx (mode);
+rtx int_res = gen_reg_rtx (mode);
+int bits = GET_MODE_UNIT_BITSIZE (mode) - 1;
+
+rtx arg2 = lowpart_subreg (mode, operands[2], mode);
+rtx arg3 = lowpart_subreg (mode, operands[3], mode);
+rtx arg4 = lowpart_subreg (mode, operands[4], mode);
+
+rtx v_sign_bitmask
+  = aarch64_simd_gen_const_vector_dup (mode,
+  HOST_WIDE_INT_M1U << bits);
+
+/* copysign (x, -1) should instead be expanded as orr

Re: [PATCH]AArch64 Handle copysign (x, -1) expansion efficiently

2023-10-05 Thread Andrew Pinski
On Thu, Oct 5, 2023 at 11:22 AM Tamar Christina  wrote:
>
> Hi All,
>
> copysign (x, -1) is effectively fneg (abs (x)) which on AArch64 can be
> most efficiently done by doing an OR of the signbit.
>
> The middle-end will optimize fneg (abs (x)) now to copysign as the
> canonical form and so this optimizes the expansion.
>
> If the target has an inclusive-OR that takes an immediate, then the 
> transformed
> instruction is both shorter and faster.  For those that don't, the immediate
> has to be separately constructed, but this still ends up being faster as the
> immediate construction is not on the critical path.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Note that this is part of another patch series, the additional testcases
> are mutually dependent on the match.pd patch.  As such the tests are added
> there insteadof here.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> PR tree-optimization/109154
> * config/aarch64/aarch64.md (copysign3): Handle
> copysign (x, -1).
> * config/aarch64/aarch64-simd.md (copysign3): Likewise.
> * config/aarch64/aarch64-sve.md (copysign3): Likewise.
>
> --- inline copy of patch --
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> 25a1e4e8ecf767636c0ff3cdab6cad6e1482f73e..a78e77dcc3473445108b06c50f9c28a8369f3e3f
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -754,15 +754,33 @@ (define_insn 
> "aarch64_dot_lane<
>  (define_expand "copysign3"
>[(match_operand:VHSDF 0 "register_operand")
> (match_operand:VHSDF 1 "register_operand")
> -   (match_operand:VHSDF 2 "register_operand")]
> +   (match_operand:VHSDF 2 "nonmemory_operand")]
>"TARGET_SIMD"
>  {
> -  rtx v_bitmask = gen_reg_rtx (mode);
> +  machine_mode int_mode = mode;
> +  rtx v_bitmask = gen_reg_rtx (int_mode);
>int bits = GET_MODE_UNIT_BITSIZE (mode) - 1;
>
>emit_move_insn (v_bitmask,
>   aarch64_simd_gen_const_vector_dup (mode,
>  HOST_WIDE_INT_M1U << 
> bits));
> +
> +  /* copysign (x, -1) should instead be expanded as orr with the sign
> + bit.  */
> +  if (!REG_P (operands[2]))
> +{
> +  auto r0
> +   = CONST_DOUBLE_REAL_VALUE (unwrap_const_vec_duplicate (operands[2]));
> +  if (-1 == real_to_integer (r0))

I think this should be REAL_VALUE_NEGATIVE (r0) instead. Just copying
the sign here is needed, right?
Also, it seems like this should double-check that it is a vec_duplicate of a
const and that the constant is a CONST_DOUBLE?


> +   {
> + emit_insn (gen_ior3 (
> +   lowpart_subreg (int_mode, operands[0], mode),
> +   lowpart_subreg (int_mode, operands[1], mode), v_bitmask));
> + DONE;
> +   }
> +}
> +
> +  operands[2] = force_reg (mode, operands[2]);
>emit_insn (gen_aarch64_simd_bsl (operands[0], v_bitmask,
>  operands[2], operands[1]));
>DONE;
> diff --git a/gcc/config/aarch64/aarch64-sve.md 
> b/gcc/config/aarch64/aarch64-sve.md
> index 
> 5a652d8536a0ef9461f40da7b22834e683e73ceb..071400c820a5b106ddf9dc9faebb117975d74ea0
>  100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -6387,7 +6387,7 @@ (define_insn "*3"
>  (define_expand "copysign3"
>[(match_operand:SVE_FULL_F 0 "register_operand")
> (match_operand:SVE_FULL_F 1 "register_operand")
> -   (match_operand:SVE_FULL_F 2 "register_operand")]
> +   (match_operand:SVE_FULL_F 2 "nonmemory_operand")]
>"TARGET_SVE"
>{
>  rtx sign = gen_reg_rtx (mode);
> @@ -6398,11 +6398,26 @@ (define_expand "copysign3"
>  rtx arg1 = lowpart_subreg (mode, operands[1], mode);
>  rtx arg2 = lowpart_subreg (mode, operands[2], mode);
>
> -emit_insn (gen_and3
> -  (sign, arg2,
> -   aarch64_simd_gen_const_vector_dup (mode,
> -  HOST_WIDE_INT_M1U
> -  << bits)));
> +rtx v_sign_bitmask
> +  = aarch64_simd_gen_const_vector_dup (mode,
> +  HOST_WIDE_INT_M1U << bits);
> +
> +/* copysign (x, -1) should instead be expanded as orr with the sign
> +   bit.  */
> +if (!REG_P (operands[2]))
> +  {
> +   auto r0
> + = CONST_DOUBLE_REAL_VALUE (unwrap_const_vec_duplicate 
> (operands[2]));
> +   if (-1 == real_to_integer (r0))

Likewise.

> + {
> +   emit_insn (gen_ior3 (int_res, arg1, v_sign_bitmask));
> +   emit_move_insn (operands[0], gen_lowpart (mode, int_res));
> +   DONE;
> + }
> +  }
> +
> +operands[2] = force_reg (mode, operands[2]);
> +emit_insn (gen_and3 (sign, arg2, v_sign_bitmask));
>  emit_insn (gen_and3
>(mant, arg1,
> aarch64_simd_gen_const_vector_dup (mode

Re: [V3][PATCH 1/3] Provide counted_by attribute to flexible array member field (PR108896)

2023-10-05 Thread Siddhesh Poyarekar

On 2023-08-25 11:24, Qing Zhao wrote:

Provide a new counted_by attribute to flexible array member field.


The obligatory "I can't ack the patch but here's a review" disclaimer :)



'counted_by (COUNT)'
  The 'counted_by' attribute may be attached to the flexible array
  member of a structure.  It indicates that the number of the
  elements of the array is given by the field named "COUNT" in the
  same structure as the flexible array member.  GCC uses this
  information to improve the results of the array bound sanitizer and
  the '__builtin_dynamic_object_size'.

  For instance, the following code:

   struct P {
 size_t count;
 char other;
 char array[] __attribute__ ((counted_by (count)));
   } *p;

  specifies that the 'array' is a flexible array member whose number
  of elements is given by the field 'count' in the same structure.

  The field that represents the number of the elements should have an
  integer type.  An explicit 'counted_by' annotation defines a
  relationship between two objects, 'p->array' and 'p->count', that
  'p->array' has _at least_ 'p->count' number of elements available.
  This relationship must hold even after any of these related objects
  are updated.  It's the user's responsibility to make sure this
  relationship to be kept all the time.  Otherwise the results of the
  array bound sanitizer and the '__builtin_dynamic_object_size' might
  be incorrect.

  For instance, in the following example, the allocated array has
  less elements than what's specified by the 'sbuf->count', this is
  an user error.  As a result, out-of-bounds access to the array
  might not be detected.

   #define SIZE_BUMP 10
   struct P *sbuf;
   void alloc_buf (size_t nelems)
   {
 sbuf = (struct P *) malloc (MAX (sizeof (struct P),
(offsetof (struct P, array[0])
 + nelems * sizeof (char;
 sbuf->count = nelems + SIZE_BUMP;
 /* This is invalid when the sbuf->array has less than sbuf->count
elements.  */
   }

  In the following example, the 2nd update to the field 'sbuf->count'
  of the above structure will permit out-of-bounds access to the
  array 'sbuf>array' as well.

   #define SIZE_BUMP 10
   struct P *sbuf;
   void alloc_buf (size_t nelems)
   {
 sbuf = (struct P *) malloc (MAX (sizeof (struct P),
(offsetof (struct P, array[0])
 + (nelems + SIZE_BUMP) * sizeof 
(char;
 sbuf->count = nelems;
 /* This is valid when the sbuf->array has at least sbuf->count
elements.  */
   }
   void use_buf (int index)
   {
 sbuf->count = sbuf->count + SIZE_BUMP + 1;
 /* Now the value of sbuf->count is larger than the number
of elements of sbuf->array.  */
 sbuf->array[index] = 0;
 /* then the out-of-bound access to this array
might not be detected.  */
   }

gcc/c-family/ChangeLog:

PR C/108896
* c-attribs.cc (handle_counted_by_attribute): New function.
(attribute_takes_identifier_p): Add counted_by attribute to the list.
* c-common.cc (c_flexible_array_member_type_p): ...To this.
* c-common.h (c_flexible_array_member_type_p): New prototype.

gcc/c/ChangeLog:

PR C/108896
* c-decl.cc (flexible_array_member_type_p): Renamed and moved to...
(add_flexible_array_elts_to_size): Use renamed function.
(is_flexible_array_member_p): Use renamed function.
(verify_counted_by_attribute): New function.
(finish_struct): Use renamed function and verify counted_by
attribute.

gcc/ChangeLog:

PR C/108896
* doc/extend.texi: Document attribute counted_by.
* tree.cc (get_named_field): New function.
* tree.h (get_named_field): New prototype.

gcc/testsuite/ChangeLog:

PR C/108896
* gcc.dg/flex-array-counted-by.c: New test.
---
  gcc/c-family/c-attribs.cc| 54 -
  gcc/c-family/c-common.cc | 13 
  gcc/c-family/c-common.h  |  1 +
  gcc/c/c-decl.cc  | 79 +++-
  gcc/doc/extend.texi  | 77 +++
  gcc/testsuite/gcc.dg/flex-array-counted-by.c | 40 ++
  gcc/tree.cc  | 40 ++
  gcc/tree.h   |  5 ++
  8 files changed, 291 insertions(+), 18 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by.c

diff --git a/gcc/c-family/c-attribs.c

[COMMITTED 2/3] Add a dom based ranger for fast VRP.

2023-10-05 Thread Andrew MacLeod
This patch adds a DOM based ranger that is intended to be used by a dom 
walk pass and provides basic ranges.


It utilizes the new GORI edge API to find outgoing ranges on edges, and 
combines these with any ranges calculated during the walk up to this 
point.  When a query is made for a range not defined in the current 
block, a quick dom walk is performed looking for a range either on a 
single-pred incoming edge or defined in the block.
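
In rough terms (hypothetical snippet, assuming STMT and USE are the
statement and operand currently being visited, inside a walk that calls
pre_bb/post_bb around each block), a client just queries as it goes:

  dom_ranger dr;
  int_range_max r;
  /* Range of USE at STMT: resolved from ranges recorded in this block or,
     failing that, by a quick walk up the dominator tree.  */
  if (dr.range_of_expr (r, use, stmt))
    r.dump (dump_file);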


It's about twice the speed of current EVRP, and although there is a bit 
of room to improve both memory usage and speed, I'll leave that until I 
either get around to it or we elect to use it and it becomes more 
important.  It also serves as a POC for anyone wanting to use the new 
GORI API for edge ranges, as well as for a potentially different fast VRP 
more similar to the old EVRP.  This version performs more folding of PHI 
nodes as it has all the info on incoming edges, but at a slight cost, 
mostly memory.  It does no relation processing as yet.


It has been bootstrapped running right after EVRP, and as a replacement 
for EVRP, and since it uses existing machinery, it should be reasonably 
solid.  It is currently not invoked from anywhere.


Pushed.

Andrew



From ad8cd713b4e489826e289551b8b8f8f708293a5b Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Fri, 28 Jul 2023 13:18:15 -0400
Subject: [PATCH 2/3] Add a dom based ranger for fast VRP.

Provide a dominator based implementation of a range query.

	* gimple_range.cc (dom_ranger::dom_ranger): New.
	(dom_ranger::~dom_ranger): New.
	(dom_ranger::range_of_expr): New.
	(dom_ranger::edge_range): New.
	(dom_ranger::range_on_edge): New.
	(dom_ranger::range_in_bb): New.
	(dom_ranger::range_of_stmt): New.
	(dom_ranger::maybe_push_edge): New.
	(dom_ranger::pre_bb): New.
	(dom_ranger::post_bb): New.
	* gimple-range.h (class dom_ranger): New.
---
 gcc/gimple-range.cc | 300 
 gcc/gimple-range.h  |  28 +
 2 files changed, 328 insertions(+)

diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index 13c3308d537..5e9bb397a20 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -928,3 +928,303 @@ assume_query::dump (FILE *f)
 }
   fprintf (f, "--\n");
 }
+
+// ---
+
+
+// Create a DOM based ranger for use by a DOM walk pass.
+
+dom_ranger::dom_ranger () : m_global (), m_out ()
+{
+  m_freelist.create (0);
+  m_freelist.truncate (0);
+  m_e0.create (0);
+  m_e0.safe_grow_cleared (last_basic_block_for_fn (cfun));
+  m_e1.create (0);
+  m_e1.safe_grow_cleared (last_basic_block_for_fn (cfun));
+  m_pop_list = BITMAP_ALLOC (NULL);
+  if (dump_file && (param_ranger_debug & RANGER_DEBUG_TRACE))
+tracer.enable_trace ();
+}
+
+// Dispose of a DOM ranger.
+
+dom_ranger::~dom_ranger ()
+{
+  if (dump_file && (dump_flags & TDF_DETAILS))
+{
+  fprintf (dump_file, "Non-varying global ranges:\n");
+  fprintf (dump_file, "=:\n");
+  m_global.dump (dump_file);
+}
+  BITMAP_FREE (m_pop_list);
+  m_e1.release ();
+  m_e0.release ();
+  m_freelist.release ();
+}
+
+// Implement range of EXPR on stmt S, and return it in R.
+// Return false if no range can be calculated.
+
+bool
+dom_ranger::range_of_expr (vrange &r, tree expr, gimple *s)
+{
+  unsigned idx;
+  if (!gimple_range_ssa_p (expr))
+return get_tree_range (r, expr, s);
+
+  if ((idx = tracer.header ("range_of_expr ")))
+{
+  print_generic_expr (dump_file, expr, TDF_SLIM);
+  if (s)
+	{
+	  fprintf (dump_file, " at ");
+	  print_gimple_stmt (dump_file, s, 0, TDF_SLIM);
+	}
+  else
+	  fprintf (dump_file, "\n");
+}
+
+  if (s)
+range_in_bb (r, gimple_bb (s), expr);
+  else
+m_global.range_of_expr (r, expr, s);
+
+  if (idx)
+tracer.trailer (idx, " ", true, expr, r);
+  return true;
+}
+
+
+// Return TRUE and the range if edge E has a range set for NAME in
+// block E->src.
+
+bool
+dom_ranger::edge_range (vrange &r, edge e, tree name)
+{
+  bool ret = false;
+  basic_block bb = e->src;
+
+  // Check if BB has any outgoing ranges on edge E.
+  ssa_lazy_cache *out = NULL;
+  if (EDGE_SUCC (bb, 0) == e)
+out = m_e0[bb->index];
+  else if (EDGE_SUCC (bb, 1) == e)
+out = m_e1[bb->index];
+
+  // If there is an edge vector and it has a range, pick it up.
+  if (out && out->has_range (name))
+ret = out->get_range (r, name);
+
+  return ret;
+}
+
+
+// Return the range of EXPR on edge E in R.
+// Return false if no range can be calculated.
+
+bool
+dom_ranger::range_on_edge (vrange &r, edge e, tree expr)
+{
+  basic_block bb = e->src;
+  unsigned idx;
+  if ((idx = tracer.header ("range_on_edge ")))
+{
+  fprintf (dump_file, "%d->%d for ",e->src->index, e->dest->index);
+  print_generic_expr (dump_file, expr, TDF_SLIM);
+  fputc ('\n',dump_file);
+}
+
+  if (!gimple_range_ssa_p (expr))
+return get_tree_range (r, exp

[COMMITTED 1/3] Add outgoing range vector calculation API.

2023-10-05 Thread Andrew MacLeod

This patch adds 2 routines that can be called to generate GORI information.

The primary API is:
bool gori_on_edge (class ssa_cache &r, edge e, range_query *query = 
NULL, gimple_outgoing_range *ogr = NULL);


This will populate an ssa-cache R with any ranges that are generated by 
edge E.  It will use QUERY, if provided, to satisfy any incoming 
values.  If OGR is provided, it is used to pick up hard edge values, 
like TRUE, FALSE, or switch edges.


It currently only works for TRUE/FALSE conditionals, and doesn't try to 
solve complex logical combinations, e.g. (a < 6 && b > 6) || (a > 10 || b < 
3), as those can get exponential and require multiple evaluations of the 
IL to satisfy.  It will fully utilize range-ops, however, and so comes up 
with many of the ranges ranger does.


It also provides the "raw" ranges on the edge, i.e. it doesn't try to 
figure out anything outside the current basic block, but rather reflects 
exactly what the edge indicates.


ie:

   :
  x.0_1 = (unsigned int) x_20(D);
  _2 = x.0_1 + 4294967292;
  if (_2 > 4)
    goto ; [INV]
  else
    goto ; [INV]

produces

Edge ranges BB 2->3
x.0_1  : [irange] unsigned int [0, 3][9, +INF]
_2  : [irange] unsigned int [5, +INF]
x_20(D)  : [irange] int [-INF, 3][9, +INF]

Edge ranges BB 2->4
x.0_1  : [irange] unsigned int [4, 8] MASK 0xf VALUE 0x0
_2  : [irange] unsigned int [0, 4]
x_20(D)  : [irange] int [4, 8] MASK 0xf VALUE 0x0

It performs a linear walk through just the required statements, so each 
of the above vectors is generated by visiting each of the 3 
statements exactly once, making it pretty quick.
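
A minimal, hypothetical caller (not part of the patch; E, NAME and
dump_file are assumed to be in scope) looks roughly like:

  /* Collect and dump every range generated by edge E.  */
  ssa_cache r;
  if (gori_on_edge (r, e))
    r.dump (dump_file);

  /* Or ask for a single name only.  */
  int_range_max nr;
  if (gori_name_on_edge (nr, name, e, get_range_query (cfun)))
    nr.dump (dump_file);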



The other entry point is:
bool gori_name_on_edge (vrange &r, tree name, edge e, range_query *q);

This does basically the same thing, except it only looks at whether NAME 
has a range, and returns it if it does, with no other overhead.


Pushed.
From 52c1e2c805bc2fd7a30583dce3608b738f3a5ce4 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Tue, 15 Aug 2023 17:29:58 -0400
Subject: [PATCH 1/3] Add outgoing range vector calcualtion API

Provide a GORI API which can produce a range vector for all outgoing
ranges on an edge without any of the other infratructure.

	* gimple-range-gori.cc (gori_stmt_info::gori_stmt_info): New.
	(gori_calc_operands): New.
	(gori_on_edge): New.
	(gori_name_helper): New.
	(gori_name_on_edge): New.
	* gimple-range-gori.h (gori_on_edge): New prototype.
	(gori_name_on_edge): New prototype.
---
 gcc/gimple-range-gori.cc | 213 +++
 gcc/gimple-range-gori.h  |  15 +++
 2 files changed, 228 insertions(+)

diff --git a/gcc/gimple-range-gori.cc b/gcc/gimple-range-gori.cc
index 2694e551d73..1b5eda43390 100644
--- a/gcc/gimple-range-gori.cc
+++ b/gcc/gimple-range-gori.cc
@@ -1605,3 +1605,216 @@ gori_export_iterator::get_name ()
 }
   return NULL_TREE;
 }
+
+// This is a helper class to set up STMT with a known LHS for further GORI
+// processing.
+
+class gori_stmt_info : public gimple_range_op_handler
+{
+public:
+  gori_stmt_info (vrange &lhs, gimple *stmt, range_query *q);
+  Value_Range op1_range;
+  Value_Range op2_range;
+  tree ssa1;
+  tree ssa2;
+};
+
+
+// Uses query Q to get the known ranges on STMT with a LHS range
+// for op1_range and op2_range and set ssa1 and ssa2 if either or both of
+// those operands are SSA_NAMES.
+
+gori_stmt_info::gori_stmt_info (vrange &lhs, gimple *stmt, range_query *q)
+  : gimple_range_op_handler (stmt)
+{
+  ssa1 = NULL;
+  ssa2 = NULL;
+  // Don't handle switches as yet for vector processing.
+  if (is_a <gswitch *> (stmt))
+return;
+
+  // No further processing for VARYING or undefined.
+  if (lhs.undefined_p () || lhs.varying_p ())
+return;
+
+  // If there is no range-op handler, we are also done.
+  if (!*this)
+return;
+
+  // Only evaluate logical cases if both operands must be the same as the LHS.
+  // Otherwise it becomes exponential in time, as well as more complicated.
+  if (is_gimple_logical_p (stmt))
+{
+  gcc_checking_assert (range_compatible_p (lhs.type (), boolean_type_node));
+  enum tree_code code = gimple_expr_code (stmt);
+  if (code == TRUTH_OR_EXPR ||  code == BIT_IOR_EXPR)
+	{
+	  // [0, 0] = x || y  means both x and y must be zero.
+	  if (!lhs.singleton_p () || !lhs.zero_p ())
+	return;
+	}
+  else if (code == TRUTH_AND_EXPR ||  code == BIT_AND_EXPR)
+	{
+	  // [1, 1] = x && y  means both x and y must be one.
+	  if (!lhs.singleton_p () || lhs.zero_p ())
+	return;
+	}
+}
+
+  tree op1 = operand1 ();
+  tree op2 = operand2 ();
+  ssa1 = gimple_range_ssa_p (op1);
+  ssa2 = gimple_range_ssa_p (op2);
+  // If both operands are the same, only process one of them.
+  if (ssa1 && ssa1 == ssa2)
+ssa2 = NULL_TREE;
+
+  // Extract current ranges for the operands.
+  fur_stmt src (stmt, q);
+  if (op1)
+{
+  op1_range.set_type (TREE_TYPE (op1));
+  src.get_operand (op1_range, op1);
+}
+
+  // And satisfy the second operand for single op statements.
+  if (op2)
+{
+  op2_

[COMMITTED 3/3] Create a fast VRP pass

2023-10-05 Thread Andrew MacLeod
This patch adds a fast VRP pass.  It is not invoked from anywhere, so 
should cause no issues.


If you want to utilize it, simply add a new pass, ie:

--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -92,6 +92,7 @@ along with GCC; see the file COPYING3.  If not see
  NEXT_PASS (pass_phiprop);
  NEXT_PASS (pass_fre, true /* may_iterate */);
  NEXT_PASS (pass_early_vrp);
+ NEXT_PASS (pass_fast_vrp);
  NEXT_PASS (pass_merge_phi);
   NEXT_PASS (pass_dse);
  NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);

it will generate a dump file with the extension .fvrp.


pushed.

From f4e2dac53fd62fbf2af95e0bf26d24e929fa1f66 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Mon, 2 Oct 2023 18:32:49 -0400
Subject: [PATCH 3/3] Create a fast VRP pass

	* timevar.def (TV_TREE_FAST_VRP): New.
	* tree-pass.h (make_pass_fast_vrp): New prototype.
	* tree-vrp.cc (class fvrp_folder): New.
	(fvrp_folder::fvrp_folder): New.
	(fvrp_folder::~fvrp_folder): New.
	(fvrp_folder::value_of_expr): New.
	(fvrp_folder::value_on_edge): New.
	(fvrp_folder::value_of_stmt): New.
	(fvrp_folder::pre_fold_bb): New.
	(fvrp_folder::post_fold_bb): New.
	(fvrp_folder::pre_fold_stmt): New.
	(fvrp_folder::fold_stmt): New.
	(execute_fast_vrp): New.
	(pass_data_fast_vrp): New.
	(pass_vrp:execute): Check for fast VRP pass.
	(make_pass_fast_vrp): New.
---
 gcc/timevar.def |   1 +
 gcc/tree-pass.h |   1 +
 gcc/tree-vrp.cc | 124 
 3 files changed, 126 insertions(+)

diff --git a/gcc/timevar.def b/gcc/timevar.def
index 9523598f60e..d21b08c030d 100644
--- a/gcc/timevar.def
+++ b/gcc/timevar.def
@@ -160,6 +160,7 @@ DEFTIMEVAR (TV_TREE_TAIL_MERGE   , "tree tail merge")
 DEFTIMEVAR (TV_TREE_VRP  , "tree VRP")
 DEFTIMEVAR (TV_TREE_VRP_THREADER , "tree VRP threader")
 DEFTIMEVAR (TV_TREE_EARLY_VRP, "tree Early VRP")
+DEFTIMEVAR (TV_TREE_FAST_VRP , "tree Fast VRP")
 DEFTIMEVAR (TV_TREE_COPY_PROP, "tree copy propagation")
 DEFTIMEVAR (TV_FIND_REFERENCED_VARS  , "tree find ref. vars")
 DEFTIMEVAR (TV_TREE_PTA		 , "tree PTA")
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index eba2d54ac76..9c4b1e4185c 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -470,6 +470,7 @@ extern gimple_opt_pass *make_pass_check_data_deps (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_copy_prop (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_isolate_erroneous_paths (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_early_vrp (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_fast_vrp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_vrp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_assumptions (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_uncprop (gcc::context *ctxt);
diff --git a/gcc/tree-vrp.cc b/gcc/tree-vrp.cc
index 4f8c7745461..19d8f995d70 100644
--- a/gcc/tree-vrp.cc
+++ b/gcc/tree-vrp.cc
@@ -1092,6 +1092,106 @@ execute_ranger_vrp (struct function *fun, bool warn_array_bounds_p,
   return 0;
 }
 
+// Implement a Fast VRP folder.  Not quite as effective but faster.
+
+class fvrp_folder : public substitute_and_fold_engine
+{
+public:
+  fvrp_folder (dom_ranger *dr) : substitute_and_fold_engine (),
+ m_simplifier (dr)
+  { m_dom_ranger = dr; }
+
+  ~fvrp_folder () { }
+
+  tree value_of_expr (tree name, gimple *s = NULL) override
+  {
+// Shortcircuit subst_and_fold callbacks for abnormal ssa_names.
+if (TREE_CODE (name) == SSA_NAME && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (name))
+  return NULL;
+return m_dom_ranger->value_of_expr (name, s);
+  }
+
+  tree value_on_edge (edge e, tree name) override
+  {
+// Shortcircuit subst_and_fold callbacks for abnormal ssa_names.
+if (TREE_CODE (name) == SSA_NAME && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (name))
+  return NULL;
+return m_dom_ranger->value_on_edge (e, name);
+  }
+
+  tree value_of_stmt (gimple *s, tree name = NULL) override
+  {
+// Shortcircuit subst_and_fold callbacks for abnormal ssa_names.
+if (TREE_CODE (name) == SSA_NAME && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (name))
+  return NULL;
+return m_dom_ranger->value_of_stmt (s, name);
+  }
+
+  void pre_fold_bb (basic_block bb) override
+  {
+m_dom_ranger->pre_bb (bb);
+// Now process the PHIs in advance.
+gphi_iterator psi = gsi_start_phis (bb);
+for ( ; !gsi_end_p (psi); gsi_next (&psi))
+  {
+	tree name = gimple_range_ssa_p (PHI_RESULT (psi.phi ()));
+	if (name)
+	  {
+	Value_Range vr(TREE_TYPE (name));
+	m_dom_ranger->range_of_stmt (vr, psi.phi (), name);
+	  }
+  }
+  }
+
+  void post_fold_bb (basic_block bb) override
+  {
+m_dom_ranger->post_bb (bb);
+  }
+
+  void pre_fold_stmt (gimple *s) override
+  {
+// Ensure range_of_stmt has been called.
+tree type = gimple_range_type (s);
+if (type)
+  {
+	Value_Range vr(type);
+	m_dom_ranger->range_of_stmt (vr, s);
+  }
+

[COMMITTED 0/3] Add a FAST VRP pass.

2023-10-05 Thread Andrew MacLeod
The following set of 3 patches provides the infrastructure for a fast VRP
pass.


The pass is currently not invoked anywhere, but I wanted to get the 
infrastructure bits in place now... just in case we want to use it 
somewhere.


It clearly bootstraps with no regressions since it isn't being invoked
:-)   I have, however, bootstrapped it with calls to the new fast-vrp pass
immediately following EVRP, and as an EVRP replacement.  This is
primarily to ensure it isn't doing anything harmful.  That is a test of
sorts :-).


I also ran it instead of EVRP, and it bootstraps, but does trigger a few 
regressions, all related to relation processing, which it doesn't do.


Patch one provides a new API for GORI which simply provides a list of
all the ranges that it can generate on an outgoing edge.  It utilizes the
sparse ssa-cache, and simply sets the outgoing range as determined by
the edge.  It's very efficient, only walking up the chain once and not
generating any other utility structures.  This provides fast and easy
access to any info an edge may provide.  There is a second API for
querying a specific name instead of asking for all the ranges.  It
should be pretty solid, as it simply invokes range-ops and other
components the same way the larger GORI engine does; it just puts them
together in a different way.


Patch 2 is the new DOM ranger.  It assumes it will be called in DOM
order, evaluates the statements, and tracks any ranges on outgoing
edges.  Queries for ranges walk the dom tree looking for a range until
they find one on an edge or hit the definition block.  There are
additional efficiencies that can be employed, and I'll eventually get
back to them.
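
Roughly, the query side amounts to something like the following sketch
(hypothetical pseudocode, not the actual implementation in patch 2;
range_recorded_on_edge_p is a made-up stand-in for the per-edge cache lookup):

// Walk the dominator tree from BB towards NAME's definition block,
// stopping at the first dominating block whose incoming edge recorded
// a range for NAME.
bool
range_on_entry_sketch (vrange &r, basic_block bb, tree name)
{
  basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (name));
  for (basic_block b = bb; b && b != def_bb;
       b = get_immediate_dominator (CDI_DOMINATORS, b))
    if (range_recorded_on_edge_p (r, b, name))   /* hypothetical helper */
      return true;
  // Hit the definition block (or the region entry) without an edge range.
  return false;
}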


Patch 3 is the FAST VRP pass and folder.  It's pretty straightforward,
invokes the new DOM ranger, and enables you to add
NEXT_PASS (pass_fast_vrp) in passes.def.


Timewise, it is currently about twice as fast as EVRP.  It does basic
range evaluation and folds PHIs, etc.  It does *not* do relation
processing or any of the fancier things we do (like statement side
effects).  A little additional work can reduce the memory footprint
further too.  I have done no experiments as yet as to the cost of adding
relations, but it would be pretty straightforward as it is just reusing
all the same components the main ranger does.


Andrew






Re: [PATCH]AArch64 Add SVE implementation for cond_copysign.

2023-10-05 Thread Richard Sandiford
Tamar Christina  writes:
> Hi All,
>
> This adds an implementation for masked copysign along with an optimized
> pattern for masked copysign (x, -1).

It feels like we're ending up with a lot of AArch64-specific code that
just hard-codes the observation that changing the sign is equivalent to
changing the top bit.  We then need to make sure that we choose the best
way of changing the top bit for any given situation.
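
(For reference, that observation in plain C -- a minimal scalar sketch of the
equivalence being discussed, not the proposed target expansion itself:)

#include <stdint.h>
#include <string.h>

/* copysign (x, y) on doubles is just "replace the top bit of x with the
   top bit of y".  */
static double
copysign_via_top_bit (double x, double y)
{
  uint64_t ux, uy;
  memcpy (&ux, &x, sizeof ux);
  memcpy (&uy, &y, sizeof uy);
  ux = (ux & 0x7fffffffffffffffULL) | (uy & 0x8000000000000000ULL);
  memcpy (&x, &ux, sizeof x);
  return x;
}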

Hard-coding the -1/negative case is one instance of that.  But it looks
like we also fail to use the best sequence for SVE2.  E.g.
[https://godbolt.org/z/ajh3MM5jv]:

#include 

void f(double *restrict a, double *restrict b) {
for (int i = 0; i < 100; ++i)
a[i] = __builtin_copysign(a[i], b[i]);
}

void g(uint64_t *restrict a, uint64_t *restrict b, uint64_t c) {
for (int i = 0; i < 100; ++i)
a[i] = (a[i] & ~c) | (b[i] & c);
}

gives:

f:
        mov     x2, 0
        mov     w3, 100
        whilelo p7.d, wzr, w3
.L2:
        ld1d    z30.d, p7/z, [x0, x2, lsl 3]
        ld1d    z31.d, p7/z, [x1, x2, lsl 3]
        and     z30.d, z30.d, #0x7fffffffffffffff
        and     z31.d, z31.d, #0x8000000000000000
        orr     z31.d, z31.d, z30.d
        st1d    z31.d, p7, [x0, x2, lsl 3]
        incd    x2
        whilelo p7.d, w2, w3
        b.any   .L2
        ret
g:
        mov     x3, 0
        mov     w4, 100
        mov     z29.d, x2
        whilelo p7.d, wzr, w4
.L6:
        ld1d    z30.d, p7/z, [x0, x3, lsl 3]
        ld1d    z31.d, p7/z, [x1, x3, lsl 3]
        bsl     z31.d, z31.d, z30.d, z29.d
        st1d    z31.d, p7, [x0, x3, lsl 3]
        incd    x3
        whilelo p7.d, w3, w4
        b.any   .L6
        ret

I saw that you originally tried to do this in match.pd and that the
decision was to fold to copysign instead.  But perhaps there's a compromise
where isel does something with the (new) copysign canonical form?
I.e. could we go with your new version of the match.pd patch, and add
some isel stuff as a follow-on?

Not saying no to this patch, just thought that the above was worth
considering.

[I agree with Andrew's comments FWIW.]

Thanks,
Richard

>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   PR tree-optimization/109154
>   * config/aarch64/aarch64-sve.md (cond_copysign): New.
>
> gcc/testsuite/ChangeLog:
>
>   PR tree-optimization/109154
>   * gcc.target/aarch64/sve/fneg-abs_5.c: New test.
>
> --- inline copy of patch -- 
> diff --git a/gcc/config/aarch64/aarch64-sve.md 
> b/gcc/config/aarch64/aarch64-sve.md
> index 
> 071400c820a5b106ddf9dc9faebb117975d74ea0..00ca30c24624dc661254568f45b61a14aa11c305
>  100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -6429,6 +6429,57 @@ (define_expand "copysign3"
>}
>  )
>  
> +(define_expand "cond_copysign"
> +  [(match_operand:SVE_FULL_F 0 "register_operand")
> +   (match_operand: 1 "register_operand")
> +   (match_operand:SVE_FULL_F 2 "register_operand")
> +   (match_operand:SVE_FULL_F 3 "nonmemory_operand")
> +   (match_operand:SVE_FULL_F 4 "aarch64_simd_reg_or_zero")]
> +  "TARGET_SVE"
> +  {
> +rtx sign = gen_reg_rtx (mode);
> +rtx mant = gen_reg_rtx (mode);
> +rtx int_res = gen_reg_rtx (mode);
> +int bits = GET_MODE_UNIT_BITSIZE (mode) - 1;
> +
> +rtx arg2 = lowpart_subreg (mode, operands[2], mode);
> +rtx arg3 = lowpart_subreg (mode, operands[3], mode);
> +rtx arg4 = lowpart_subreg (mode, operands[4], mode);
> +
> +rtx v_sign_bitmask
> +  = aarch64_simd_gen_const_vector_dup (mode,
> +HOST_WIDE_INT_M1U << bits);
> +
> +/* copysign (x, -1) should instead be expanded as orr with the sign
> +   bit.  */
> +if (!REG_P (operands[3]))
> +  {
> + auto r0
> +   = CONST_DOUBLE_REAL_VALUE (unwrap_const_vec_duplicate (operands[3]));
> + if (-1 == real_to_integer (r0))
> +   {
> + arg3 = force_reg (mode, v_sign_bitmask);
> + emit_insn (gen_cond_ior (int_res, operands[1], arg2,
> +   arg3, arg4));
> + emit_move_insn (operands[0], gen_lowpart (mode, int_res));
> + DONE;
> +   }
> +  }
> +
> +operands[2] = force_reg (mode, operands[3]);
> +emit_insn (gen_and3 (sign, arg3, v_sign_bitmask));
> +emit_insn (gen_and3
> +(mant, arg2,
> + aarch64_simd_gen_const_vector_dup (mode,
> +~(HOST_WIDE_INT_M1U
> +  << bits;
> +emit_insn (gen_cond_ior (int_res, operands[1], sign, mant,
> +   arg4));
> +emit_move_insn (operands[0], gen_lowpart (mode, int_res));
> +DONE;
> +  }
> +)
> +
>  (define_expand "xorsign3"
>[(match_operand:SVE_FULL_F 0 "register_operand")
> (match_operand:SVE_FULL_F 1 "register_operand")
> diff -

Re: [V3][PATCH 1/3] Provide counted_by attribute to flexible array member field (PR108896)

2023-10-05 Thread Siddhesh Poyarekar

On 2023-10-05 14:51, Siddhesh Poyarekar wrote:

On 2023-08-25 11:24, Qing Zhao wrote:

Provide a new counted_by attribute to flexible array member field.


The obligatory "I can't ack the patch but here's a review" disclaimer :)



'counted_by (COUNT)'
  The 'counted_by' attribute may be attached to the flexible array
  member of a structure.  It indicates that the number of the
  elements of the array is given by the field named "COUNT" in the
  same structure as the flexible array member.  GCC uses this
  information to improve the results of the array bound sanitizer and
  the '__builtin_dynamic_object_size'.

  For instance, the following code:

   struct P {
 size_t count;
 char other;
 char array[] __attribute__ ((counted_by (count)));
   } *p;

  specifies that the 'array' is a flexible array member whose number
  of elements is given by the field 'count' in the same structure.

  The field that represents the number of elements should have an
  integer type.  An explicit 'counted_by' annotation defines a
  relationship between two objects, 'p->array' and 'p->count':
  'p->array' has _at least_ 'p->count' elements available.
  This relationship must hold even after any of these related objects
  are updated.  It is the user's responsibility to make sure this
  relationship is kept at all times.  Otherwise the results of the
  array bound sanitizer and '__builtin_dynamic_object_size' might
  be incorrect.

  For instance, in the following example, the allocated array has
  fewer elements than what's specified by 'sbuf->count'; this is
  a user error.  As a result, out-of-bounds access to the array
  might not be detected.

   #define SIZE_BUMP 10
   struct P *sbuf;
   void alloc_buf (size_t nelems)
   {
     sbuf = (struct P *) malloc (MAX (sizeof (struct P),
                                      (offsetof (struct P, array[0])
                                       + nelems * sizeof (char))));
     sbuf->count = nelems + SIZE_BUMP;
     /* This is invalid when the sbuf->array has less than sbuf->count
        elements.  */
   }

  In the following example, the 2nd update to the field 'sbuf->count'
  of the above structure will permit out-of-bounds access to the
  array 'sbuf->array' as well.

   #define SIZE_BUMP 10
   struct P *sbuf;
   void alloc_buf (size_t nelems)
   {
     sbuf = (struct P *) malloc (MAX (sizeof (struct P),
                                      (offsetof (struct P, array[0])
                                       + (nelems + SIZE_BUMP) * sizeof (char))));
     sbuf->count = nelems;
     /* This is valid when the sbuf->array has at least sbuf->count
        elements.  */
   }
   void use_buf (int index)
   {
     sbuf->count = sbuf->count + SIZE_BUMP + 1;
     /* Now the value of sbuf->count is larger than the number
        of elements of sbuf->array.  */
     sbuf->array[index] = 0;
     /* then the out-of-bound access to this array
        might not be detected.  */
   }

gcc/c-family/ChangeLog:

PR C/108896
* c-attribs.cc (handle_counted_by_attribute): New function.
(attribute_takes_identifier_p): Add counted_by attribute to the list.
* c-common.cc (c_flexible_array_member_type_p): ...To this.
* c-common.h (c_flexible_array_member_type_p): New prototype.

gcc/c/ChangeLog:

PR C/108896
* c-decl.cc (flexible_array_member_type_p): Renamed and moved to...
(add_flexible_array_elts_to_size): Use renamed function.
(is_flexible_array_member_p): Use renamed function.
(verify_counted_by_attribute): New function.
(finish_struct): Use renamed function and verify counted_by
attribute.

gcc/ChangeLog:

PR C/108896
* doc/extend.texi: Document attribute counted_by.
* tree.cc (get_named_field): New function.
* tree.h (get_named_field): New prototype.

gcc/testsuite/ChangeLog:

PR C/108896
* gcc.dg/flex-array-counted-by.c: New test.
---
  gcc/c-family/c-attribs.cc    | 54 -
  gcc/c-family/c-common.cc | 13 
  gcc/c-family/c-common.h  |  1 +
  gcc/c/c-decl.cc  | 79 +++-
  gcc/doc/extend.texi  | 77 +++
  gcc/testsuite/gcc.dg/flex-array-counted-by.c | 40 ++
  gcc/tree.cc  | 40 ++
  gcc/tree.h   |  5 ++
  8 files changed, 291 insertions(+), 18 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by.c

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family

Re: [PATCH]AArch64 Add special patterns for creating DI scalar and vector constant 1 << 63 [PR109154]

2023-10-05 Thread Richard Sandiford
Tamar Christina  writes:
> Hi,
>
>> The lowpart_subreg should simplify this back into CONST0_RTX (mode),
>> making it no different from:
>> 
>> emit_move_insn (target, CONST0_RTX (mode));
>> 
>> If the intention is to share zeros between modes (sounds good!), then I think
>> the subreg needs to be on the lhs instead.
>> 
>> > +  rtx neg = lowpart_subreg (V2DFmode, target, mode);
>> > +  emit_insn (gen_negv2df2 (neg, lowpart_subreg (V2DFmode, target,
>> > + mode)));
>> 
>> The rhs seems simpler as copy_rtx (neg).  (Even the copy_rtx shouldn't be
>> needed after RA, but it's probably more future-proof to keep it.)
>> 
>> > +  emit_move_insn (target, lowpart_subreg (mode, neg, V2DFmode));
>> 
>> This shouldn't be needed, since neg is already a reference to target.
>> 
>> Overall, looks like a nice change/framework.
>
Updated the patch, and in the process also realized this can be used for the
vector variants:
>
> Hi All,
>
> This adds a way to generate special sequences for the creation of constants for
> which we don't have single instruction sequences, which would normally have
> led to a GP -> FP transfer or a literal load.
>
> The patch starts out by adding support for creating 1 << 63 using fneg (mov 
> 0).
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   PR tree-optimization/109154
>   * config/aarch64/aarch64-protos.h (aarch64_simd_special_constant_p,
>   aarch64_maybe_generate_simd_constant): New.
>   * config/aarch64/aarch64-simd.md (*aarch64_simd_mov,
>   *aarch64_simd_mov): Add new coden for special constants.
>   * config/aarch64/aarch64.cc (aarch64_extract_vec_duplicate_wide_int):
>   Take optional mode.
>   (aarch64_simd_special_constant_p,
>   aarch64_maybe_generate_simd_constant): New.
>   * config/aarch64/aarch64.md (*movdi_aarch64): Add new codegen for
>   special constants.
>   * config/aarch64/constraints.md (Dx): new.
>
> gcc/testsuite/ChangeLog:
>
>   PR tree-optimization/109154
>   * gcc.target/aarch64/fneg-abs_1.c: Updated.
>   * gcc.target/aarch64/fneg-abs_2.c: Updated.
>   * gcc.target/aarch64/fneg-abs_4.c: Updated.
>   * gcc.target/aarch64/dbl_mov_immediate_1.c: Updated.
>
> --- inline copy of patch ---
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 
> 60a55f4bc1956786ea687fc7cad7ec9e4a84e1f0..36d6c688bc888a51a9de174bd3665aebe891b8b1
>  100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -831,6 +831,8 @@ bool aarch64_sve_ptrue_svpattern_p (rtx, struct 
> simd_immediate_info *);
>  bool aarch64_simd_valid_immediate (rtx, struct simd_immediate_info *,
>   enum simd_immediate_check w = AARCH64_CHECK_MOV);
>  rtx aarch64_check_zero_based_sve_index_immediate (rtx);
> +bool aarch64_maybe_generate_simd_constant (rtx, rtx, machine_mode);
> +bool aarch64_simd_special_constant_p (rtx, machine_mode);
>  bool aarch64_sve_index_immediate_p (rtx);
>  bool aarch64_sve_arith_immediate_p (machine_mode, rtx, bool);
>  bool aarch64_sve_sqadd_sqsub_immediate_p (machine_mode, rtx, bool);
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> 81ff5bad03d598fa0d48df93d172a28bc0d1d92e..33eceb436584ff73c7271f93639f2246d1af19e0
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -142,26 +142,35 @@ (define_insn "aarch64_dup_lane_"
>[(set_attr "type" "neon_dup")]
>  )
>  
> -(define_insn "*aarch64_simd_mov"
> +(define_insn_and_split "*aarch64_simd_mov"
>[(set (match_operand:VDMOV 0 "nonimmediate_operand")
>   (match_operand:VDMOV 1 "general_operand"))]
>"TARGET_FLOAT
> && (register_operand (operands[0], mode)
> || aarch64_simd_reg_or_zero (operands[1], mode))"
> -  {@ [cons: =0, 1; attrs: type, arch]
> - [w , m ; neon_load1_1reg , *   ] ldr\t%d0, %1
> - [r , m ; load_8 , *   ] ldr\t%x0, %1
> - [m , Dz; store_8, *   ] str\txzr, %0
> - [m , w ; neon_store1_1reg, *   ] str\t%d1, %0
> - [m , r ; store_8, *   ] str\t%x1, %0
> - [w , w ; neon_logic  , simd] mov\t%0., %1.
> - [w , w ; neon_logic  , *   ] fmov\t%d0, %d1
> - [?r, w ; neon_to_gp  , simd] umov\t%0, %1.d[0]
> - [?r, w ; neon_to_gp  , *   ] fmov\t%x0, %d1
> - [?w, r ; f_mcr  , *   ] fmov\t%d0, %1
> - [?r, r ; mov_reg, *   ] mov\t%0, %1
> - [w , Dn; neon_move   , simd] << 
> aarch64_output_simd_mov_immediate (operands[1], 64);
> - [w , Dz; f_mcr  , *   ] fmov\t%d0, xzr
> +  {@ [cons: =0, 1; attrs: type, arch, length]
> + [w , m ; neon_load1_1reg , *   , *] ldr\t%d0, %1
> + [r , m ; load_8 , *   , *] ldr\t%x0, %1
> + [m , Dz; store_8, *   , *] str\txzr, %0
> + [m , w ; neo

RE: [PATCH]AArch64 Add SVE implementation for cond_copysign.

2023-10-05 Thread Tamar Christina
> -Original Message-
> From: Richard Sandiford 
> Sent: Thursday, October 5, 2023 8:29 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov 
> Subject: Re: [PATCH]AArch64 Add SVE implementation for cond_copysign.
> 
> Tamar Christina  writes:
> > Hi All,
> >
> > This adds an implementation for masked copysign along with an
> > optimized pattern for masked copysign (x, -1).
> 
> It feels like we're ending up with a lot of AArch64-specific code that just 
> hard-
> codes the observation that changing the sign is equivalent to changing the top
> bit.  We then need to make sure that we choose the best way of changing the
> top bit for any given situation.
> 
> Hard-coding the -1/negative case is one instance of that.  But it looks like 
> we
> also fail to use the best sequence for SVE2.  E.g.
> [https://godbolt.org/z/ajh3MM5jv]:
> 
> #include 
> 
> void f(double *restrict a, double *restrict b) {
> for (int i = 0; i < 100; ++i)
> a[i] = __builtin_copysign(a[i], b[i]); }
> 
> void g(uint64_t *restrict a, uint64_t *restrict b, uint64_t c) {
> for (int i = 0; i < 100; ++i)
> a[i] = (a[i] & ~c) | (b[i] & c); }
> 
> gives:
> 
> f:
> mov x2, 0
> mov w3, 100
> whilelo p7.d, wzr, w3
> .L2:
> ld1dz30.d, p7/z, [x0, x2, lsl 3]
> ld1dz31.d, p7/z, [x1, x2, lsl 3]
> and z30.d, z30.d, #0x7fff
> and z31.d, z31.d, #0x8000
> orr z31.d, z31.d, z30.d
> st1dz31.d, p7, [x0, x2, lsl 3]
> incdx2
> whilelo p7.d, w2, w3
> b.any   .L2
> ret
> g:
> mov x3, 0
> mov w4, 100
> mov z29.d, x2
> whilelo p7.d, wzr, w4
> .L6:
> ld1dz30.d, p7/z, [x0, x3, lsl 3]
> ld1dz31.d, p7/z, [x1, x3, lsl 3]
> bsl z31.d, z31.d, z30.d, z29.d
> st1dz31.d, p7, [x0, x3, lsl 3]
> incdx3
> whilelo p7.d, w3, w4
> b.any   .L6
> ret
> 
> I saw that you originally tried to do this in match.pd and that the decision 
> was
> to fold to copysign instead.  But perhaps there's a compromise where isel does
> something with the (new) copysign canonical form?
> I.e. could we go with your new version of the match.pd patch, and add some
> isel stuff as a follow-on?
> 

Sure, if that's what's desired.  But...

The example you posted above is, for instance, worse for x86
(https://godbolt.org/z/x9ccqxW6T), where the first operation has a dependency
chain of 2 and the latter of 3.  It's likely any open coding of this operation
is going to hurt a target.

So I'm unsure what isel should transform this into...

Tamar

> Not saying no to this patch, just thought that the above was worth
> considering.
> 
> [I agree with Andrew's comments FWIW.]
> 
> Thanks,
> Richard
> 
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > PR tree-optimization/109154
> > * config/aarch64/aarch64-sve.md (cond_copysign): New.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR tree-optimization/109154
> > * gcc.target/aarch64/sve/fneg-abs_5.c: New test.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/config/aarch64/aarch64-sve.md
> > b/gcc/config/aarch64/aarch64-sve.md
> > index
> >
> 071400c820a5b106ddf9dc9faebb117975d74ea0..00ca30c24624dc661254
> 568f45b6
> > 1a14aa11c305 100644
> > --- a/gcc/config/aarch64/aarch64-sve.md
> > +++ b/gcc/config/aarch64/aarch64-sve.md
> > @@ -6429,6 +6429,57 @@ (define_expand "copysign3"
> >}
> >  )
> >
> > +(define_expand "cond_copysign"
> > +  [(match_operand:SVE_FULL_F 0 "register_operand")
> > +   (match_operand: 1 "register_operand")
> > +   (match_operand:SVE_FULL_F 2 "register_operand")
> > +   (match_operand:SVE_FULL_F 3 "nonmemory_operand")
> > +   (match_operand:SVE_FULL_F 4 "aarch64_simd_reg_or_zero")]
> > +  "TARGET_SVE"
> > +  {
> > +rtx sign = gen_reg_rtx (mode);
> > +rtx mant = gen_reg_rtx (mode);
> > +rtx int_res = gen_reg_rtx (mode);
> > +int bits = GET_MODE_UNIT_BITSIZE (mode) - 1;
> > +
> > +rtx arg2 = lowpart_subreg (mode, operands[2],
> mode);
> > +rtx arg3 = lowpart_subreg (mode, operands[3],
> mode);
> > +rtx arg4 = lowpart_subreg (mode, operands[4],
> > + mode);
> > +
> > +rtx v_sign_bitmask
> > +  = aarch64_simd_gen_const_vector_dup (mode,
> > +  HOST_WIDE_INT_M1U << bits);
> > +
> > +/* copysign (x, -1) should instead be expanded as orr with the sign
> > +   bit.  */
> > +if (!REG_P (operands[3]))
> > +  {
> > +   auto r0
> > + = CONST_DOUBLE_REAL_VALUE (unwrap_const_vec_duplicate
> (operands[3]));
> > +   if (-1 == real_to_integer (r0))
> > + {
> > +   arg3 = force_reg (mode, v_sign_bitmask);
> > +   emit_insn (gen_cond_ior (int_res, operands[1], arg2,
>

Re: [V3][PATCH 2/3] Use the counted_by atribute info in builtin object size [PR108896]

2023-10-05 Thread Siddhesh Poyarekar




On 2023-08-25 11:24, Qing Zhao wrote:

Use the counted_by atribute info in builtin object size to compute the
subobject size for flexible array members.

gcc/ChangeLog:

PR C/108896
* tree-object-size.cc (addr_object_size): Use the counted_by
attribute info.
* tree.cc (component_ref_has_counted_by_p): New function.
(component_ref_get_counted_by): New function.
* tree.h (component_ref_has_counted_by_p): New prototype.
(component_ref_get_counted_by): New prototype.

gcc/testsuite/ChangeLog:

PR C/108896
* gcc.dg/flex-array-counted-by-2.c: New test.
* gcc.dg/flex-array-counted-by-3.c: New test.
---
  .../gcc.dg/flex-array-counted-by-2.c  |  74 ++
  .../gcc.dg/flex-array-counted-by-3.c  | 210 ++
  gcc/tree-object-size.cc   |  37 ++-
  gcc/tree.cc   |  95 +++-
  gcc/tree.h|  10 +
  5 files changed, 418 insertions(+), 8 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-2.c
  create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-3.c

diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-2.c 
b/gcc/testsuite/gcc.dg/flex-array-counted-by-2.c
new file mode 100644
index ..ec580c1f1f01
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-2.c
@@ -0,0 +1,74 @@
+/* test the attribute counted_by and its usage in
+ * __builtin_dynamic_object_size.  */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include "builtin-object-size-common.h"
+
+#define expect(p, _v) do { \
+size_t v = _v; \
+if (p == v) \
+   __builtin_printf ("ok:  %s == %zd\n", #p, p); \
+else \
+   {  \
+ __builtin_printf ("WAT: %s == %zd (expected %zd)\n", #p, p, v); \
+ FAIL (); \
+   } \
+} while (0);


You're using this in a bunch of tests already; does it make sense to 
consolidate it into builtin-object-size-common.h?



+
+struct flex {
+  int b;
+  int c[];
+} *array_flex;
+
+struct annotated {
+  int b;
+  int c[] __attribute__ ((counted_by (b)));
+} *array_annotated;
+
+struct nested_annotated {
+  struct {
+union {
+  int b;
+  float f; 
+};
+int n;
+  };
+  int c[] __attribute__ ((counted_by (b)));
+} *array_nested_annotated;
+
+void __attribute__((__noinline__)) setup (int normal_count, int attr_count)
+{
+  array_flex
+= (struct flex *)malloc (sizeof (struct flex)
++ normal_count *  sizeof (int));
+  array_flex->b = normal_count;
+
+  array_annotated
+= (struct annotated *)malloc (sizeof (struct annotated)
+ + attr_count *  sizeof (int));
+  array_annotated->b = attr_count;
+
+  array_nested_annotated
+= (struct nested_annotated *)malloc (sizeof (struct nested_annotated)
++ attr_count *  sizeof (int));
+  array_nested_annotated->b = attr_count;
+
+  return;
+}
+
+void __attribute__((__noinline__)) test ()
+{
+expect(__builtin_dynamic_object_size(array_flex->c, 1), -1);
+expect(__builtin_dynamic_object_size(array_annotated->c, 1),
+  array_annotated->b * sizeof (int));
+expect(__builtin_dynamic_object_size(array_nested_annotated->c, 1),
+  array_nested_annotated->b * sizeof (int));
+}


Maybe another test where the allocation, size assignment and __bdos call 
happen in the same function, where the allocator is not recognized by gcc:


void *
__attribute__ ((noinline))
alloc (size_t sz)
{
  return __builtin_malloc (sz);
}

size_t test (size_t sz)
{
  array_annotated = alloc (sz);
  array_annotated->b = sz;
  return __builtin_dynamic_object_size (array_annotated->c, 1);
}

The interesting thing to test (and ensure in the codegen) is that the 
assignment to array_annotated->b does not get reordered to below the 
__builtin_dynamic_object_size call since technically there is no data 
dependency between the two.



+
+int main(int argc, char *argv[])
+{
+  setup (10,10);
+  test ();
+  DONE ();
+}
diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c 
b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
new file mode 100644
index ..a0c3cb88ec71
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
@@ -0,0 +1,210 @@
+/* test the attribute counted_by and its usage in
+__builtin_dynamic_object_size: what's the correct behavior when the
+allocation size is mismatched with the value of the counted_by attribute?  */


If the behaviour is undefined, does it make sense to add tests for this? 
 Maybe once you have a -Wmismatched-counted-by or similar, we could 
have tests for that.  I guess the counter-argument is that we keep track 
of this behaviour but not necessarily guarantee it.



+/* { dg-do run } */
+/* { dg-options "-O -fstrict-flex-arrays=3" } */
+
+#include "builtin-object-size-common.h"
+
+struct annotated {
+  size_t foo;
+  char others;
+  char array[] __attrib

Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-05 Thread Siddhesh Poyarekar

On 2023-08-25 11:24, Qing Zhao wrote:

This is the 3rd version of the patch, per our discussion based on the
review comments for the 1st and 2nd version, the major changes in this
version are:


Hi Qing,

I hope the review was helpful.  Overall, a couple of things to consider:

1. How would you handle potential reordering between assignment of the 
size to the counted_by field with the __bdos call that may consume it? 
You'll probably need to express some kind of dependency there or in the 
worst case, insert a barrier to disallow reordering.


2. How would you handle signedness of the size field?  The size gets 
converted to sizetype everywhere it is used and overflows/underflows may 
produce interesting results.  Do you want to limit the types to unsigned 
or do you want to add a disclaimer in the docs?  The former seems like 
the *right* thing to do given that it is a new feature; best to enforce 
the cleaner habit at the outset.
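
A small sketch of the concern (the struct and field names are made up, and it
assumes the attribute from this series currently accepts a signed count field,
which is exactly the question):

#include <stddef.h>

struct annotated {
  int count;                                     /* deliberately signed */
  char array[] __attribute__ ((counted_by (count)));
};

size_t
array_size (struct annotated *p)
{
  /* If p->count is ever set to a negative value, converting it to sizetype
     in the size computation wraps to a huge unsigned number, so this call
     could report an enormous size instead of something sensible.
     Restricting counted_by to unsigned fields avoids that surprise.  */
  return __builtin_dynamic_object_size (p->array, 1);
}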


Thanks,
Sid



***Against 1st version:
1. change the name "element_count" to "counted_by";
2. change the parameter for the attribute from a STRING to an
Identifier;
3. Add logic and testing cases to handle anonymous structure/unions;
4. Clarify documentation to permit the situation when the allocation
size is larger than what's specified by "counted_by"; at the same time,
it's a user error if the allocation size is smaller than what's specified
by "counted_by";
5. Add a complete testing case for using the counted_by attribute in
__builtin_dynamic_object_size when there is a mismatch between the
allocation size and the value of "counted_by", with the expected behavior
for each case and an explanation of why in the comments.

***Against 2nd version:
1. Identify a tree node sharing issue and fix it in the routine
"component_ref_get_counted_ty" of tree.cc;
2. Update the documentation and testing cases with the clear usage
of the formula to compute the allocation size:
MAX (sizeof (struct A), offsetof (struct A, array[0]) + counted_by * sizeof (element))
(the algorithm used in tree-object-size.cc is correct).

In this set of patches, the major functionality provided is:

1. a new attribute "counted_by";
2. use this new attribute in bound sanitizer;
3. use this new attribute in dynamic object size for subobject size;

As discussed, I plan to add two more separate patches sets after this initial
patch set is approved and committed.

set 1. A new warning option and a new sanitizer option for the user error
   when the allocation size is smaller than the value of "counted_by".
set 2. An improvement to __builtin_dynamic_object_size for whole-object
   size of the structure with FAM annotated with counted_by.

There are also some existing bugs in tree-object-size.cc identified
during the study, and PRs were filed to record them.  These bugs will
be fixed separately with individual patches:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111030
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111040

Bootstrapped and regression tested on both aarch64 and X86, no issue.

Please see more details on the description of this work on:

https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619708.html

and more discussions on
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626376.html

Okay for committing?

thanks.

Qing

Qing Zhao (3):
   Provide counted_by attribute to flexible array member field (PR108896)
   Use the counted_by atribute info in builtin object size [PR108896]
   Use the counted_by attribute information in bound sanitizer[PR108896]

  gcc/c-family/c-attribs.cc |  54 -
  gcc/c-family/c-common.cc  |  13 ++
  gcc/c-family/c-common.h   |   1 +
  gcc/c-family/c-ubsan.cc   |  16 ++
  gcc/c/c-decl.cc   |  79 +--
  gcc/doc/extend.texi   |  77 +++
  .../gcc.dg/flex-array-counted-by-2.c  |  74 ++
  .../gcc.dg/flex-array-counted-by-3.c  | 210 ++
  gcc/testsuite/gcc.dg/flex-array-counted-by.c  |  40 
  .../ubsan/flex-array-counted-by-bounds-2.c|  27 +++
  .../ubsan/flex-array-counted-by-bounds.c  |  46 
  gcc/tree-object-size.cc   |  37 ++-
  gcc/tree.cc   | 133 +++
  gcc/tree.h|  15 ++
  14 files changed, 797 insertions(+), 25 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-2.c
  create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
  create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by.c
  create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
  create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds.c



[committed] hppa: Delete MALLOC_ABI_ALIGNMENT define from pa32-linux.h

2023-10-05 Thread John David Anglin
In spite of what the comment says, the MALLOC_ALIGNMENT in glibc
for 32-bit hppa-linux has always been 8, not 16.  There is no
reason to increase it as the old linux threads implementation
has been removed.  So, we can use the default in pa.h.

Dave
---

Delete MALLOC_ABI_ALIGNMENT define from pa32-linux.h

2023-10-05  John David Anglin  

* config/pa/pa32-linux.h (MALLOC_ABI_ALIGNMENT): Delete.

diff --git a/gcc/config/pa/pa32-linux.h b/gcc/config/pa/pa32-linux.h
index bdd13ce492e..f48e45374f4 100644
--- a/gcc/config/pa/pa32-linux.h
+++ b/gcc/config/pa/pa32-linux.h
@@ -68,11 +68,6 @@ call_ ## FUNC (void) \
 #undef  WCHAR_TYPE_SIZE
 #define WCHAR_TYPE_SIZE BITS_PER_WORD
 
-/* POSIX types such as pthread_mutex_t require 16-byte alignment to retain
-   layout compatibility with the original linux thread implementation.  */
-#undef MALLOC_ABI_ALIGNMENT
-#define MALLOC_ABI_ALIGNMENT 128
-
 /* Place jump tables in the text section except when generating non-PIC
code.  When generating non-PIC code, the relocations needed to load the
address of the jump table result in a text label in the final executable




Re: [PATCH]AArch64 Add SVE implementation for cond_copysign.

2023-10-05 Thread Richard Sandiford
Tamar Christina  writes:
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Thursday, October 5, 2023 8:29 PM
>> To: Tamar Christina 
>> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
>> ; Marcus Shawcroft
>> ; Kyrylo Tkachov 
>> Subject: Re: [PATCH]AArch64 Add SVE implementation for cond_copysign.
>> 
>> Tamar Christina  writes:
>> > Hi All,
>> >
>> > This adds an implementation for masked copysign along with an
>> > optimized pattern for masked copysign (x, -1).
>> 
>> It feels like we're ending up with a lot of AArch64-specific code that just 
>> hard-
>> codes the observation that changing the sign is equivalent to changing the 
>> top
>> bit.  We then need to make sure that we choose the best way of changing the
>> top bit for any given situation.
>> 
>> Hard-coding the -1/negative case is one instance of that.  But it looks like 
>> we
>> also fail to use the best sequence for SVE2.  E.g.
>> [https://godbolt.org/z/ajh3MM5jv]:
>> 
>> #include 
>> 
>> void f(double *restrict a, double *restrict b) {
>> for (int i = 0; i < 100; ++i)
>> a[i] = __builtin_copysign(a[i], b[i]); }
>> 
>> void g(uint64_t *restrict a, uint64_t *restrict b, uint64_t c) {
>> for (int i = 0; i < 100; ++i)
>> a[i] = (a[i] & ~c) | (b[i] & c); }
>> 
>> gives:
>> 
>> f:
>> mov x2, 0
>> mov w3, 100
>> whilelo p7.d, wzr, w3
>> .L2:
>> ld1dz30.d, p7/z, [x0, x2, lsl 3]
>> ld1dz31.d, p7/z, [x1, x2, lsl 3]
>> and z30.d, z30.d, #0x7fff
>> and z31.d, z31.d, #0x8000
>> orr z31.d, z31.d, z30.d
>> st1dz31.d, p7, [x0, x2, lsl 3]
>> incdx2
>> whilelo p7.d, w2, w3
>> b.any   .L2
>> ret
>> g:
>> mov x3, 0
>> mov w4, 100
>> mov z29.d, x2
>> whilelo p7.d, wzr, w4
>> .L6:
>> ld1dz30.d, p7/z, [x0, x3, lsl 3]
>> ld1dz31.d, p7/z, [x1, x3, lsl 3]
>> bsl z31.d, z31.d, z30.d, z29.d
>> st1dz31.d, p7, [x0, x3, lsl 3]
>> incdx3
>> whilelo p7.d, w3, w4
>> b.any   .L6
>> ret
>> 
>> I saw that you originally tried to do this in match.pd and that the decision 
>> was
>> to fold to copysign instead.  But perhaps there's a compromise where isel 
>> does
>> something with the (new) copysign canonical form?
>> I.e. could we go with your new version of the match.pd patch, and add some
>> isel stuff as a follow-on?
>> 
>
> Sure if that's what's desired But..
>
> The example you posted above is for instance worse for x86 
> https://godbolt.org/z/x9ccqxW6T
> where the first operation has a dependency chain of 2 and the latter of 3.  
> It's likely any
> open coding of this operation is going to hurt a target.
>
> So I'm unsure what isel transform this into...

I didn't mean that we should go straight to using isel for the general
case, just for the new case.  The example above was instead trying to
show the general point that hiding the logic ops in target code is a
double-edged sword.

The x86_64 example for the -1 case would be https://godbolt.org/z/b9s6MaKs8
where the isel change would be an improvement.  Without that, I guess
x86_64 will need to have a similar patch to the AArch64 one.

That said, https://godbolt.org/z/e6nqoqbMh suggests that powerpc64
is probably relying on the current copysign -> neg/abs transform.
(Not sure why the second function uses different IVs from the first.)

Personally, I wouldn't be against a target hook that indicated whether
float bit manipulation is "free" for a given mode, if it comes to that.
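
E.g. something along these lines in target.def -- purely a hypothetical
sketch; the hook name, documentation and default are invented:

/* Hypothetical only -- this hook does not exist today.  */
DEFHOOK
(float_bit_manipulation_free_p,
 "Return true if open-coding sign-bit manipulation of floating-point values\n\
of mode @var{mode} in the integer domain is as cheap as a dedicated\n\
floating-point instruction.",
 bool, (machine_mode mode),
 hook_bool_mode_false)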

Thanks,
Richard

