Re: [PATCH] Remove poly_int_pod

2023-09-28 Thread Jeff Law




On 9/28/23 11:26, Jason Merrill wrote:

On 9/28/23 05:55, Richard Sandiford wrote:

poly_int was written before the switch to C++11 and so couldn't
use explicit default constructors.  This led to an awkward split
between poly_int_pod and poly_int.  poly_int simply inherited from
poly_int_pod and added constructors, with the argumentless constructor
having an empty body.  But inheritance meant that poly_int had to
repeat the assignment operators from poly_int_pod (again, no C++11,
so no "using" to inherit base-class implementations).

All that goes away if we switch to using default constructors.

The main complication is ensuring that braced initialisation still
gives a constexpr, so that static variables can be initialised without
runtime code.  The two problems here are:

(1) When initialising a poly_int with fewer than N
 coefficients, the other coefficients need to be a zero of
 the same precision as the explicit coefficients.  This was
 previously done in a for loop using wi::ints_for<...>::zero,
 but C++11 constexpr constructors can't have function bodies.
 The patch instead uses a series of delegated initialisers to
 fill in the implicit coefficients.


Perhaps it's time to update the bootstrap requirement to C++14 (i.e. GCC 
5, from eight years ago).  Not that this would affect this particular 
patch.
IIRC the primary reason we settled on gcc-4.8.x was RHEL7/Centos7.  With 
RHEL 7 approaching EOL moving the baseline forward would seem to make sense.


I'd want to know if this affects folks using SuSE's enterprise distro 
before actually making the change, but I'm broadly in favor of moving 
forward it it's not going to have a major impact on users that are using 
enterprise distros.


jeff


Re: [RFC] expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg [target/111466]

2023-09-28 Thread Jeff Law




On 9/28/23 15:43, Vineet Gupta wrote:

RISC-V suffers from extraneous sign extensions, despite/given the ABI
guarantee that 32-bit quantities are sign-extended into 64-bit registers,
meaning incoming SI function args need not be explicitly sign extended
(so do SI return values as most ALU insns implicitly sign-extend too.)

Existing REE doesn't seem to handle this well and there are various ideas
floating around to smarten REE about it.

RISC-V also seems to correctly implement middle-end hook PROMOTE_MODE
etc.

Another approach would be to prevent EXPAND from generating the
sign_extend in the first place which this patch tries to do.

The hunk being removed was introduced way back in 1994 as
5069803972 ("expand_expr, case CONVERT_EXPR .. clear the promotion flag")

This survived full testsuite run for RISC-V rv64gc with surprisingly no
fallouts: test results before/after are exactly same.

|   | # of unexpected case / # of unique unexpected 
case
|   |  gcc |  g++ | gfortran |
| rv64imafdc_zba_zbb_zbs_zicond/|  264 /87 |5 / 2 |   72 /12 |
|lp64d/medlow

Granted for something so old to have survived, there must be a valid
reason. Unfortunately the original change didn't have additional
commentary or a test case. That is not to say it can't/won't possibly
break things on other arches/ABIs, hence the RFC for someone to scream
that this is just bonkers, don't do this :-)

I've explicitly CC'ed Jakub and Roger who have last touched subreg
promoted notes in expr.cc for insight and/or screaming ;-)

Thanks to Robin for narrowing this down in an amazing debugging session
@ GNU Cauldron.
So I scoured my old gcc2 archives to see if there was anything that 
might hint as to why this was changed.  Sadly (but not unexpectedly), 
nothing.  The relevant ChangeLog entry is;




Fri Jul  8 11:46:50 1994  Richard Kenner  (ken...@vlsi1.ultra.nyu.edu)

* varasm.c (record_constant_rtx, force_const_mem): Ensure everything
is in saveable_obstack, not current_obstack.

* combine.c (force_to_mode): OP_MODE must be MODE if MODE and
mode of X are of different classes.
(nonzero_bits, num_sign_bit_copies): Say nothing known for
floating-point modes.

* function.c (instantiate_virtual_regs_1, case SET):
If DEST is virtual_stack_vars_rtx, replace with hardware
frame pointer.

* expr.c (expand_expr, case CONVERT_EXPR): If changing signedness
and we have a promoted SUBREG, clear the promotion flag.

* c-decl.c (finish_decl): Put RTL and other stuff in
permanent_obstack if DECL is.

* combine.c (gen_unary): Add new arg, OP0_MODE.
All callers changed.


So standard practice back then was to re-use the header and have a blank 
line between conceptual changes if the same author made a series of 
changes.  So it's reasonable to assume the expr.c change was considered 
independent of the other changes.


At that particular time I think Kenner was mostly focused on the alpha 
and ppc ports, but I think he was also still poking around with romp and 
a29k.  I think romp is an unlikely target for this because it didn't 
promote modes and it wasn't even building for several months 
(April->late July).


PPC and a29k were both 32 bit ports and while they did promotions, I 
would hazard a guess the alpha was actually more sensitive to this 
stuff.  Which suggests a possible path forward.


I can bootstrap & regression test alpha using QEMU user mode emulation. 
So we might be able to trigger something that way.  It'll take some 
time, but might prove fruitful.



Jeff


Re: [RFC] expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg [target/111466]

2023-09-29 Thread Jeff Law




On 9/28/23 21:49, Vineet Gupta wrote:


On 9/28/23 20:17, Jeff Law wrote:
I can bootstrap & regression test alpha using QEMU user mode 
emulation. So we might be able to trigger something that way. It'll 
take some time, but might prove fruitful. 


That would be awesome. It's not like this this is burning or anything 
and one of the things in the long tail of things we need to do in this 
area.
Absolutely true.  In fact, I doubt this particular quirk is all that 
important in the extension removal space.  But we've got enough data to 
do a bit of an experiment.  While it takes a long time to run, it 
doesn't require any kind of regular human intervention.  Better to fire 
it off now while it's still fresh in our minds.  If we wait a week or 
two I'll have forgotten everything.


Jeff


Re: [RFC] expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg [target/111466]

2023-09-29 Thread Jeff Law




On 9/29/23 04:40, Roger Sayle wrote:


I agree that this looks dubious.  Normally, if the middle-end/optimizers
wish to reuse a SUBREG in a context where the flags are not valid, it
should create a new one with the desired flags, rather than "mutate"
an existing (and possibly shared) RTX.
SUBREGs aren't shared, though I don't think that changes any of your 
conclusions.



jeff


Re: committed [RISC-V]: Harden test scan patterns

2023-09-29 Thread Jeff Law




On 9/27/23 17:21, Vineet Gupta wrote:



On 9/27/23 13:14, Jeff Law wrote:

It would help to describe how these patterns were under specified so
that folks don't continue to make the same mistake as new tests get 
added.


dg-final scan-assembler, scan-assembler-not, and scan-assembler-times
use a tcl regular expression (often referred to abbreviated as RE), as
described in https://www.tcl.tk/man/tcl8.4/TclCmd/re_syntax.html .

If your RE is not specific enough, it can match LTO information that the
compiler places into its assembly output when the relevant options are
provided, which is common when running tests where the test harness
iterates over a number of optimization option combinations.
Note that '.' is an atom that can match any character.  If you want to
match a dot specifically, you have to escape it with a backslash: '\.' .
When you are matching an instruction mnemonic, an effective way to
avoid matching in LTO information is to enforce matching of word start
(\m) and/or word end (\M) .
Note also that the backslash has to be quoted.  If the RE is enclosed in
'"' quotes, extra backslashes are needed.  That is not necessary when it
is enclosed in curly braces.

For example, "ld.w" will be matched in:

.ascii "h\227\022\212ld@w\251jr\254'\320\255vwj\252\026\016\364"

If you write {\mld\.w\M} instead, you avoid this problem.
OK.  So that naturally leads to the question, why aren't others seeing 
this, both in the RISC-V world and more generally.  I'm not aware of 
any case where I've run the testsuite and tripped over this issue, nor 
am I aware of anyone else tripping over it.


Actually I did run into it. See commit ecfa870ff29d979bd2c ("RISC-V: 
optim const DF +0.0 store to mem [PR/110748]") where a false failure was 
triggered due to these random LTO strings and needed adjusting.

Ah!  Good (I suppose).



-/* { dg-final { scan-assembler-not "sw" } } */
-/* { dg-final { scan-assembler-not "fld" } } */
-/* { dg-final { scan-assembler-not "fsd" } } */
-/* { dg-final { scan-assembler-not "lw" } } */
+/* { dg-final { scan-assembler-not "\tsw\t" } } */
+/* { dg-final { scan-assembler-not "\tfld\t" } } */
+/* { dg-final { scan-assembler-not "\tfsd\t" } } */
+/* { dg-final { scan-assembler-not "\tlw\t" } } */
The downside of this approach (which I nearly suggested to Joern) is 
that it relies on tabs before/after the mnemonic.  That would also imply 
a review policy to ensure we always use tabs -- and there's going to be 
cases where spotting a tab vs space is going to be tough in a patch 
review, particularly after the mnemonic.


We could instead match a regexp that allows spaces or tabs, but at that 
point I doubt it's simpler than Joern's approach.


So I recommend we go forward with Joern's approach (so consider that an 
ACK for the trunk).   Joern  can you post a follow-up manual twiddle so 
that other ports can follow your example and avoid this problem?


THanks,

jeff


Re: [PING] [PATCH] Harmonize headers between both dg-extract-results scripts

2023-09-29 Thread Jeff Law




On 9/29/23 02:19, Paul Iannetta wrote:

On Tue, Sep 26, 2023 at 08:29:11AM -0600, Jeff Law wrote:



On 9/25/23 03:55, Paul Iannetta wrote:

On Mon, Sep 18, 2023 at 08:39:34AM +0200, Paul Iannetta wrote:

On Thu, Sep 14, 2023 at 04:24:33PM +0200, Paul Iannetta wrote:

Hi,

This is a small patch so that both dg-extract-results.py and
dg-extract-results.sh share the same header.  In particular, it fixes
the fact that the regexp r'^Test Run By (\S+) on (.*)$' was never
matched in the python file.


By the way, the bash script dg-extract-results.sh checks whether
python is available by invoking python.  However, it seems that the
policy on newer machines is to not provide python as a symlink (at
least on Ubuntu 22.04 and above; and RHEL 8).  Therefore, we might
want to also check against python3 so that the bash script does not
fail to find python even though it is available.

Thanks,
Paul



Author: Paul Iannetta 
Date:   Thu Sep 14 15:43:58 2023 +0200

  Harmonize headers between both dg-extract-results scripts

  The header of the python version looked like:
  Target is ...
  Host   is ...
  The header of the bash version looked like:
  Test run by ... on ...
  Target is ...

  After this change both headers look like:
  Test run by ... on ...
  Target is ...
  Host   is ...

  The order of the tests is not the same but since dg-cmp-results.sh it
  does not matter much.

  contrib/ChangeLog:

  2023-09-14  Paul Iannetta  

  * dg-extract-results.py: Print the "Test run" line.
  * dg-extract-results.sh: Print the "Host" line.

OK
jeff


Thanks,
May I ask to you commit it to trunk on my behalf?

Done.  Thanks for letting me know you didn't have write access.

Jeff


Re: [PATCH]middle-end match.pd: optimize fneg (fabs (x)) to x | (1 << signbit(x)) [PR109154]

2023-09-29 Thread Jeff Law




On 9/26/23 18:50, Tamar Christina wrote:

Hi All,

For targets that allow conversion between int and float modes this adds a new
optimization transforming fneg (fabs (x)) into x | (1 << signbit(x)).  Such
sequences are common in scientific code working with gradients.

The transformed instruction if the target has an inclusive-OR that takes an
immediate is both shorter an faster.  For those that don't the immediate has
to be seperate constructed but this still ends up being faster as the immediate
construction is not on the critical path.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimization/109154
* match.pd: Add new neg+abs rule.

gcc/testsuite/ChangeLog:

PR tree-optimization/109154
* gcc.target/aarch64/fneg-abs_1.c: New test.
* gcc.target/aarch64/fneg-abs_2.c: New test.
* gcc.target/aarch64/fneg-abs_3.c: New test.
* gcc.target/aarch64/fneg-abs_4.c: New test.
* gcc.target/aarch64/sve/fneg-abs_1.c: New test.
* gcc.target/aarch64/sve/fneg-abs_2.c: New test.
* gcc.target/aarch64/sve/fneg-abs_3.c: New test.
* gcc.target/aarch64/sve/fneg-abs_4.c: New test.

--- inline copy of patch --
diff --git a/gcc/match.pd b/gcc/match.pd
index 
39c7ea1088f25538ed8bd26ee89711566141a71f..8ebde06dcd4b26d694826cffad0fb17e1136600a
 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -9476,3 +9476,57 @@ and,
}
(if (full_perm_p)
(vec_perm (op@3 @0 @1) @3 @2))
+
+/* Transform fneg (fabs (X)) -> X | 1 << signbit (X).  */
+
+(simplify
+ (negate (abs @0))
+ (if (FLOAT_TYPE_P (type)
+  /* We have to delay this rewriting till after forward prop because 
otherwise
+it's harder to do trigonometry optimizations. e.g. cos(-fabs(x)) is not
+matched in one go.  Instead cos (-x) is matched first followed by 
cos(|x|).
+The bottom op approach makes this rule match first and it's not untill
+fwdprop that we match top down.  There are manu such simplications so 
we
Multiple typos this line.  fwdprop->fwprop manu->many 
simplications->simplifications.


OK with the typos fixed.

Thanks.  I meant to say hi at the Cauldron, but never seemed to get away 
long enough to find you..


jeff



Re: [PATCH] Testsuite, DWARF2: adjust regexp to match darwin output

2023-09-29 Thread Jeff Law




On 9/20/23 07:53, FX Coudert wrote:

ping**2



Hi,

This was a painful one to fix, because I hate regexps, especially when they are 
quoted. On darwin, we have this failure:

FAIL: gcc.dg/debug/dwarf2/inline4.c scan-assembler 
DW_TAG_inlined_subroutine[^(]*([^)]*)[^(]*(DIE 
(0x[0-9a-f]*) DW_TAG_formal_parameter[^(]*(DIE 
(0x[0-9a-f]*) DW_TAG_variable

That hideous regexp is trying to match (generated on Linux):


.uleb128 0x4# (DIE (0x5c) DW_TAG_inlined_subroutine)
.long   0xa0# DW_AT_abstract_origin
.quad   .LBI4   # DW_AT_entry_pc
.byte   .LVU2   # DW_AT_GNU_entry_view
.quad   .LBB4   # DW_AT_low_pc
.quad   .LBE4-.LBB4 # DW_AT_high_pc
.byte   0x1 # DW_AT_call_file (u.c)
.byte   0xf # DW_AT_call_line
.byte   0x14# DW_AT_call_column
.uleb128 0x5# (DIE (0x7d) DW_TAG_formal_parameter)
.long   0xad# DW_AT_abstract_origin
.long   .LLST0  # DW_AT_location
.long   .LVUS0  # DW_AT_GNU_locviews
.uleb128 0x6# (DIE (0x8a) DW_TAG_variable)


It is using the parentheses to check what is between  DW_TAG_inlined_subroutine, 
DW_TAG_formal_parameter and DW_TAG_variable. There’s only one block of parentheses 
in the middle, that "(u.c)”. However, on darwin, the generated output is more 
compact:


.uleb128 0x4; (DIE (0x188) DW_TAG_inlined_subroutine)
.long   0x1b8   ; DW_AT_abstract_origin
.quad   LBB4; DW_AT_low_pc
.quad   LBE4; DW_AT_high_pc
.uleb128 0x5; (DIE (0x19d) DW_TAG_formal_parameter)
.long   0x1c6   ; DW_AT_abstract_origin
.uleb128 0x6; (DIE (0x1a2) DW_TAG_variable)


I think that’s valid as well, and the test should pass (what the test really 
wants to check is that there is no DW_TAG_lexical_block emitted there, see 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37801 for its origin). It could be 
achieved in two ways:

1. making darwin emit the DW_AT_call_file
2. adjusting the regexp to match, making the internal block of parentheses 
optional

I chose the second approach. It makes the test pass on darwin. If someone can 
test it on linux, it’d be appreciated :) I don’t have ready access to such a 
system right now.

Once that passes, OK to commit?
I actually tested the change on a few embedded targets (rx-elf, 
bfin-elf, visium-elf and lm32-elf).  For this change, that should be 
sufficient.


OK for the trunk, sorry for the delay.

jeff



Re: [RFC 1/2] RISC-V: Add support for _Bfloat16.

2023-09-29 Thread Jeff Law




On 9/19/23 02:44, Jin Ma wrote:

gcc/ChangeLog:

* config/riscv/iterators.md (HFBF): New.
* config/riscv/riscv-builtins.cc (riscv_init_builtin_types):
Initialize data type_Bfloat16.
* config/riscv/riscv-modes.def (FLOAT_MODE): New.
(ADJUST_FLOAT_FORMAT): New.
* config/riscv/riscv.cc (riscv_mangle_type): Support for BFmode.
(riscv_scalar_mode_supported_p): Ditto.
(riscv_libgcc_floating_mode_supported_p): Ditto.
(riscv_block_arith_comp_libfuncs_for_mode): New.
(riscv_init_libfuncs): Opening and closing some libfuncs for BFmode.
* config/riscv/riscv.md (mode" ): Add BF.
(truncdfbf2): New.
(movhf): Support for BFmode.
(mov): Ditto.
(*mov_softfloat):  Ditto.
(fix_truncbf2): New.
(fixuns_truncbf2): New.
(floatbf2): New.
(floatunsbf2): New.

libgcc/ChangeLog:

* config/riscv/sfp-machine.h (_FP_NANFRAC_B): New.
(_FP_NANSIGN_B): New.
* config/riscv/t-softfp32: Add support for BF libfuncs.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/bf16_arithmetic.c: New test.
* gcc.target/riscv/bf16_call.c: New test.
* gcc.target/riscv/bf16_comparisons.c: New test.
* gcc.target/riscv/bf16_convert-1.c: New test.
* gcc.target/riscv/bf16_convert-2.c: New test.
* gcc.target/riscv/bf16_convert_run.c: New test.
So this can't go in the tree until the extension has moved into a frozen 
state.  Hopefully that'll happen before we close stage1 development in Nov.





diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index e00b8ee3579..5048628c784 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -1631,6 +1631,20 @@ (define_insn "truncdfhf2"
[(set_attr "type" "fcvt")
 (set_attr "mode" "HF")])
  
+;; The conversion of DF to BF needs to be done with SF if there is a

+;; chance to generate at least one instruction, otherwise just using
+;; libfunc __truncdfbf2.
+(define_expand "truncdfbf2"
+  [(set (match_operand:BF 0 "register_operand" "=f")
+   (float_truncate:BF
+   (match_operand:DF 1 "register_operand" " f")))]
+  "TARGET_DOUBLE_FLOAT || TARGET_ZDINX"
+  {
+convert_move (operands[0],
+ convert_modes (SFmode, DFmode, operands[1], 0), 0);
+DONE;
+  })
So for conversions to/from BFmode, doesn't generic code take care of 
this for us?  Search for convert_mode_scalar in expr.cc. That code will 
utilize SFmode as an intermediate step just like your expander.   Is 
there some reason that generic code is insufficient?


Similarly for the the other conversions.

Otherwise it looks pretty good.

Jeff


Re: [RFC 2/2] RISC-V: Add 'Zfbfmin' extension.

2023-09-29 Thread Jeff Law




On 9/19/23 02:46, Jin Ma wrote:

This patch adds the 'Zfbfmin' extension for riscv, which is based on spec of 
bfloat16:
https://github.com/riscv/riscv-bfloat16/commit/5578e34e15a44e9ad13246072a29f51274b4d999

The 'Zfbfmin' extension of binutils-gdb (REVIEW ONLY):
https://sourceware.org/pipermail/binutils/2023-August/128773.html

The 'Zfbfmin' extension of qemu:
https://github.com/qemu/qemu/commit/5d1270caac2ef7b8c887d4cb5a2444ba6d237516

Because the binutils does not yet support the 'Zfbfmin' extension, test case
zfbfmin_convert_run.c is invalidated with '#if 0' and '#endif'.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add 'Zfbfmin' extension.
* config/riscv/riscv-opts.h (MASK_ZFBFMIN): New.
(TARGET_ZFBFMIN): New.
* config/riscv/riscv.cc (riscv_output_move): Enable FMV.X.H, and FMV.H.X
for 'Zfbfmin' extension.
(riscv_excess_precision): Likewise.
* config/riscv/riscv.md (truncsfbf2): New.
(extendbfsf2):  New.
(*mov_hardfloat): Support for BFmode.
(*mov_softfloat): Disable for BFmode  when 'Zfbfmin' extension is
enabled.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zfbfmin_arithmetic.c: New test.
* gcc.target/riscv/zfbfmin_call.c: New test.
* gcc.target/riscv/zfbfmin_comparisons.c: New test.
* gcc.target/riscv/zfbfmin_convert.c: New test.
* gcc.target/riscv/zfbfmin_convert_run.c: New test.
* gcc.target/riscv/zfbfmin_fsh_and_flh.c: New test.
So as with 1/2 in this series, it can't go into the trunk until the 
relevant spec reaches a frozen state.




+/* { dg-final { scan-assembler-times "fcvt.s.bf16" 14 } } */
+/* { dg-final { scan-assembler-times "fcvt.bf16.s" 10 } } */
So I think these have the potential to run afoul of unexpected matching 
of LTO bits.  Joern has an approach to tackle this problem that was 
recently pushed into the tree:



https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631485.html


The gist is wrap the assembly instruction inside a {\m \M} construct. 
So concretely


> +/* { dg-final { scan-assembler-times {\mfcvt.s.bf16\M} 14 } } */
> +/* { dg-final { scan-assembler-times {\mfcvt.bf16.s\M} 10 } } */

Similarly for the other new tests where you actually match an instruction.


Overall it looks pretty good.

Jeff


Re: [PATCH v6] RISC-V:Optimize the MASK opt generation

2023-09-29 Thread Jeff Law




On 9/29/23 12:05, Kito Cheng wrote:

Hi Jeff:

Could you take a look for this? RISC-V part is ok to me.

Thanks :)
Yea, I've got two things on my list to review from a doc standpoint. 
This is one of 'em.


jeff


Re: PING^5: [PATCH] rtl-optimization/110939 Really fix narrow comparison of memory and constant

2023-09-29 Thread Jeff Law




On 9/19/23 10:06, Stefan Schulze Frielinghaus wrote:

Since this patch is sitting in the queue for quite some time and (more
importantly?) solves a bootstrap problem let me reiterate:

While writing the initial commit 7cdd0860949c6c3232e6cff1d7ca37bb5234074c
and the subsequent (potential) fix 41ef5a34161356817807be3a2e51fbdbe575ae85
I was not aware of the fact that the normal form of a CONST_INT,
representing an unsigned integer with fewer bits than in HOST_WIDE_INT,
is a sign-extended version of the unsigned integer.  This invariant is
checked in rtl.h where we have at line 2297:

 case CONST_INT:
   if (precision < HOST_BITS_PER_WIDE_INT)
 /* Nonzero BImodes are stored as STORE_FLAG_VALUE, which on many
targets is 1 rather than -1.  */
 gcc_checking_assert (INTVAL (x.first)
  == sext_hwi (INTVAL (x.first), precision)
  || (x.second == BImode && INTVAL (x.first) == 1));

This was pretty surprising and frankly speaking unintuitive to me which


At the heart of the matter (as I think Eric recently mentioned) is that 
CONST_INT objects do not have a mode.  So consider something like 
(const_int 32768) on a 32bit host.


In modes wider than HImode is does exactly what you'd expect.  But if 
you tried to use it in a HImode context, then it's actually invalid as 
it needs to be sign extended resulting in (const_int -32768).  That's 
what gen_int_mode handles for you -- you provide a mode for the context 
in which the constant will be used and you get back suitable RTL.


This actually mimicks what some targets do in hardware for 32 bit 
objects being held in 64bit registers.


Anyway, that's the background.  It is highly likely there are bugs where 
we fail to use gen_int_mode when we should or in some tests for when an 
optimization may or may not be applicable.  So I wouldn't be surprised 
at all if you were to grub around and find cases that aren't properly 
handled -- particularly of the missed-optimization variety.






Independent of why such a normal form was chosen, this patch restores
the normal form and solves the bootstrap problem for Loongarch.
Digging it out and looking at it momentarily.  It just keep falling off 
the end of the TODO list day-to-day.  Sorry for that, particularly since 
you're probably getting bugged by folks looking to restore builds/test 
results.


jeff


Re: [PATCH] rtl-optimization/110939 Really fix narrow comparison of memory and constant

2023-09-29 Thread Jeff Law




On 8/10/23 07:04, Stefan Schulze Frielinghaus via Gcc-patches wrote:

In the former fix in commit 41ef5a34161356817807be3a2e51fbdbe575ae85 I
completely missed the fact that the normal form of a generated constant for a
mode with fewer bits than in HOST_WIDE_INT is a sign extended version of the
actual constant.  This even holds true for unsigned constants.

Fixed by masking out the upper bits for the incoming constant and sign
extending the resulting unsigned constant.

Bootstrapped and regtested on x64 and s390x.  Ok for mainline?


[ snip]



gcc/ChangeLog:

* combine.cc (simplify_compare_const): Properly handle unsigned
constants while narrowing comparison of memory and constants.

OK.



---
@@ -12051,15 +12052,15 @@ simplify_compare_const (enum rtx_code code, 
machine_mode mode,
HOST_WIDE_INT_PRINT_HEX ") to (MEM %s "
HOST_WIDE_INT_PRINT_HEX ").\n", GET_MODE_NAME (int_mode),
GET_MODE_NAME (narrow_mode_iter), GET_RTX_NAME (code),
-   (unsigned HOST_WIDE_INT)const_op, GET_RTX_NAME (adjusted_code),
-   n);
+   (unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK (int_mode),
+   GET_RTX_NAME (adjusted_code), n);
}
  poly_int64 offset = (BYTES_BIG_ENDIAN
   ? 0
   : (GET_MODE_SIZE (int_mode)
  - GET_MODE_SIZE (narrow_mode_iter)));
  *pop0 = adjust_address_nv (op0, narrow_mode_iter, offset);
- *pop1 = GEN_INT (n);
+ *pop1 = gen_int_mode (n, narrow_mode_iter);
  return adjusted_code;
FWIW, I should definitely have caught this hunk earlier -- we've gone 
the rounds in this same space (GEN_INT vs gen_int_mode) elsewhere.


Again, sorry for the long wait.

jeff


Re: [PATCH] check_GNU_style.py: Skip .md square bracket linting

2023-09-29 Thread Jeff Law




On 9/12/23 12:54, Patrick O'Neill wrote:

This testcase causes lots of false-positives for machine description files.

contrib/ChangeLog:

* check_GNU_style_lib.py: Skip machine description file bracket linting.

OK.  We probably need a completely separate checker for for .md files.


jeff


Re: [PATCH v5] Implement new RTL optimizations pass: fold-mem-offsets.

2023-09-29 Thread Jeff Law




On 9/12/23 04:13, Manolis Tsamis wrote:


+
+/* Get the single reaching definition of an instruction inside a BB.
+   The definition is desired for REG used in INSN.
+   Return the definition insn or NULL if there's no definition with
+   the desired criteria.  */
+static rtx_insn*
+get_single_def_in_bb (rtx_insn *insn, rtx reg)
+{
+  df_ref use;
+  struct df_link *ref_chain, *ref_link;
+
+  FOR_EACH_INSN_USE (use, insn)
+{
+  if (GET_CODE (DF_REF_REG (use)) == SUBREG)
+ return NULL;
+  if (REGNO (DF_REF_REG (use)) == REGNO (reg))
+ break;
+}
+
+  if (!use)
+return NULL;
+
+  ref_chain = DF_REF_CHAIN (use);

So what if there's two uses of REG in INSN?  I don't think it's be
common at all, but probably better safe and reject than sorry, right? Or
is that case filtered out earlier?


If the REG is the same won't the definitions be the same even if that
REG appears multiple times in INSN?

Yes.


fold_offsets_1 should be able to handle the folding with multiple uses
of REG just fine, for example add R1, R1 or add (ashift R1, 1), R1.
If there's no other issue here I assume we want to keep that as-is in
order to not reduce the propagation power (Which I assume is similar
to ree which uses the same logic).
OK.  I was primarily concerned about the folding and rewriting aspects. 
It probably can only show up on targets with LEA like instructions, and 
even on such targets it's probably rate.






+/* Test if INSN is a memory load / store that can have an offset folded to it.
+   Return true iff INSN is such an instruction and return through MEM_OUT,
+   REG_OUT and OFFSET_OUT the RTX that has a MEM code, the register that is
+   used as a base address and the offset accordingly.
+   All of the out pointers may be NULL in which case they will be ignored.  */
+bool
+get_fold_mem_root (rtx_insn* insn, rtx *mem_out, rtx *reg_out,
+HOST_WIDE_INT *offset_out)
+{
+  rtx set = single_set (insn);
+  rtx mem = NULL_RTX;
+
+  if (set != NULL_RTX)
+{
+  rtx src = SET_SRC (set);
+  rtx dest = SET_DEST (set);
+
+  /* Don't fold when we have unspec / volatile.  */
+  if (GET_CODE (src) == UNSPEC
+   || GET_CODE (src) == UNSPEC_VOLATILE
+   || GET_CODE (dest) == UNSPEC
+   || GET_CODE (dest) == UNSPEC_VOLATILE)
+ return false;
+
+  if (MEM_P (src))
+ mem = src;
+  else if (MEM_P (dest))
+ mem = dest;
+  else if ((GET_CODE (src) == SIGN_EXTEND
+ || GET_CODE (src) == ZERO_EXTEND)
+&& MEM_P (XEXP (src, 0)))

Note some architectures allow both a source and destination memory.  It
looks like your code will prefer the source operand in that case.
That's fine, just pointing it out.


Thanks for pointing that possibility out. I thought for a moment that
this would be a bug with multiple mentions of the address register.
but it should be fine due to:
/* Special case: A foldable memory store is not foldable if it
mentions DEST outside of the address calculation. */

ACK.

The other thing I keep pondering is autoincrement style addressing. 
Though I think at some point I convinced myself they weren't a problem. 
I think your checks only allow specific kinds of expressions for the 
memory address and I don't think {PRE,POST}_{INC,DEC.MODIFY} were in the 
list of valid ops.




+
+  int max_iters = 5;
+  for (int i = 0; i < max_iters; i++)
+{
+  bool made_changes = false;
+  for (fold_info_map::iterator iter = fold_info->begin ();
+iter != fold_info->end (); ++iter)
+ {
+   fold_mem_info *info = (*iter).second;
+   if (bitmap_intersect_p (&cannot_fold_insns, info->fold_insns))
+ made_changes |= bitmap_ior_into (&cannot_fold_insns,
+  info->fold_insns);
+ }
+
+  if (!made_changes)
+ return true;
+}
+
+  return false;

So how was the magic value of "5" determined here?  In general we try
not to have magic #s like that and instead find a better way to control
iterations, falling back to a PARAM when all else fails.


It's indeed just a magic value :)
In general even two iterations of this should be quite rare. One
iteration will handle the majority of interesting cases. I chose 5 to
limit the worst case execution time for degenerate cases.

I'll be happy to change that to something better but I couldn't
figure out how "falling back to a PARAM when all else fails"  should
work here. Is there a similar example somewhere else in the code that
I can look at?
If we expect two iterations to be quite rare, then I doubt a param is 
really worth the effort.  I might suggest using a loop like


for (pass = 0; pass < flag_expensive_optimizations + 1; pass++)

No magic numbers :-)   We use similar loops elsewhere for this kind of 
scenario.



If you ever need to create a param, the procedure is to add the option 
to params.opt.  If you look in that file you'll see that you define a 
variable name in the specification that you can then 

Re: [PATCH v6] RISC-V:Optimize the MASK opt generation

2023-09-29 Thread Jeff Law




On 9/12/23 03:18, Feng Wang wrote:

New patch add some comments and update docs for this new usage.
---
Accoring to Kito's advice, using "MASK(name) Var(other_flag_name)"
to generate MASK and TARGET MACRO automatically.
This patch improve the MACRO generation of MASK_* and TARGET_*.
Due to the more and more riscv extensions are added, the default target_flag
is full.
Before this patch,if you want to add new MACRO,you should define the
MACRO in the riscv-opts.h manually.
After this patch, you just need two steps:
1.Define the new TargetVariable.
2.Define "MASK(name) Var(new_target_flag).

gcc/ChangeLog:

 * config/riscv/riscv-opts.h (MASK_ZICSR):
 (MASK_ZIFENCEI): Delete;
 (MASK_ZIHINTNTL):Ditto;
 (MASK_ZIHINTPAUSE):  Ditto;
 (TARGET_ZICSR):  Ditto;
 (TARGET_ZIFENCEI):   Ditto;
 (TARGET_ZIHINTNTL):  Ditto;
 (TARGET_ZIHINTPAUSE):Ditto;
 (MASK_ZAWRS):Ditto;
 (TARGET_ZAWRS):  Ditto;
 (MASK_ZBA):  Ditto;
 (MASK_ZBB):  Ditto;
 (MASK_ZBC):  Ditto;
 (MASK_ZBS):  Ditto;
 (TARGET_ZBA):Ditto;
 (TARGET_ZBB):Ditto;
 (TARGET_ZBC):Ditto;
 (TARGET_ZBS):Ditto;
 (MASK_ZFINX):Ditto;
 (MASK_ZDINX):Ditto;
 (MASK_ZHINX):Ditto;
 (MASK_ZHINXMIN): Ditto;
 (TARGET_ZFINX):  Ditto;
 (TARGET_ZDINX):  Ditto;
 (TARGET_ZHINX):  Ditto;
 (TARGET_ZHINXMIN):   Ditto;
 (MASK_ZBKB): Ditto;
 (MASK_ZBKC): Ditto;
 (MASK_ZBKX): Ditto;
 (MASK_ZKNE): Ditto;
 (MASK_ZKND): Ditto;
 (MASK_ZKNH): Ditto;
 (MASK_ZKR):  Ditto;
 (MASK_ZKSED):Ditto;
 (MASK_ZKSH): Ditto;
 (MASK_ZKT):  Ditto;
 (TARGET_ZBKB):   Ditto;
 (TARGET_ZBKC):   Ditto;
 (TARGET_ZBKX):   Ditto;
 (TARGET_ZKNE):   Ditto;
 (TARGET_ZKND):   Ditto;
 (TARGET_ZKNH):   Ditto;
 (TARGET_ZKR):Ditto;
 (TARGET_ZKSED):  Ditto;
 (TARGET_ZKSH):   Ditto;
 (TARGET_ZKT):Ditto;
 (MASK_ZTSO): Ditto;
 (TARGET_ZTSO):   Ditto;
 (MASK_VECTOR_ELEN_32):   Ditto;
 (MASK_VECTOR_ELEN_64):   Ditto;
 (MASK_VECTOR_ELEN_FP_32):Ditto;
 (MASK_VECTOR_ELEN_FP_64):Ditto;
 (MASK_VECTOR_ELEN_FP_16):Ditto;
 (TARGET_VECTOR_ELEN_32): Ditto;
 (TARGET_VECTOR_ELEN_64): Ditto;
 (TARGET_VECTOR_ELEN_FP_32):Ditto;
 (TARGET_VECTOR_ELEN_FP_64):Ditto;
 (TARGET_VECTOR_ELEN_FP_16):Ditto;
  (MASK_ZVBB):   Ditto;
 (MASK_ZVBC):   Ditto;
 (TARGET_ZVBB): Ditto;
 (TARGET_ZVBC): Ditto;
 (MASK_ZVKG):   Ditto;
 (MASK_ZVKNED): Ditto;
 (MASK_ZVKNHA): Ditto;
 (MASK_ZVKNHB): Ditto;
 (MASK_ZVKSED): Ditto;
 (MASK_ZVKSH):  Ditto;
 (MASK_ZVKN):   Ditto;
 (MASK_ZVKNC):  Ditto;
 (MASK_ZVKNG):  Ditto;
 (MASK_ZVKS):   Ditto;
 (MASK_ZVKSC):  Ditto;
 (MASK_ZVKSG):  Ditto;
 (MASK_ZVKT):   Ditto;
 (TARGET_ZVKG): Ditto;
 (TARGET_ZVKNED):   Ditto;
 (TARGET_ZVKNHA):   Ditto;
 (TARGET_ZVKNHB):   Ditto;
 (TARGET_ZVKSED):   Ditto;
 (TARGET_ZVKSH):Ditto;
 (TARGET_ZVKN): Ditto;
 (TARGET_ZVKNC):Ditto;
 (TARGET_ZVKNG):Ditto;
 (TARGET_ZVKS): Ditto;
 (TARGET_ZVKSC):Ditto;
 (TARGET_ZVKSG):Ditto;
 (TARGET_ZVKT): Ditto;
 (MASK_ZVL32B): Ditto;
 (MASK_ZVL64B): Ditto;
 (MASK_ZVL128B):Ditto;
 (MASK_ZVL256B):Ditto;
 (MASK_ZVL512B):Ditto;
 (MASK_ZVL1024B):   Ditto;
 (MASK_ZVL2048B):   Ditto;
 (MASK_ZVL4096B):   Ditto;
 (MASK_ZVL8192B):   Ditto;
 (MASK_ZVL16384B):  Ditto;
 (MASK_ZVL32768B):  Ditto;
 (MASK_ZVL65536B):  Ditto;
 (TARGET_ZVL32B):   Ditto;
 (TARGET_ZVL64B):   Ditto;
 (TARGET_ZVL128B):  Ditto;
 (TARGET_ZVL256B):  Ditto;
 (TARGET_ZVL512B):  Ditto;
 (TARGET_ZVL1024B): Ditto;
 (TARGET_ZVL2048B): Ditto;
 (TARGET_ZVL4096B): Ditto;
 (TARGET_ZVL8192B): Ditto;
 (TARGET_ZVL16384B):Ditto;
 (TARGET_ZVL32768B):Ditto;
 (TARGET_ZVL65536B):Ditto;
 (MASK_ZICBOZ): Ditto;
 (MASK_ZICBOM):

Re: [V2] RISC-V: Replace not + bitwise_imm with li + bitwise_not

2023-09-29 Thread Jeff Law




On 9/12/23 13:09, Jivan Hakobyan via Gcc-patches wrote:

In the case when we have C code like this

int foo (int a) {
return 100 & ~a;
}

GCC generates the following instruction sequence

foo:
  not a0,a0
  andia0,a0,100
  ret

This patch replaces that with this sequence
foo:
  li a5,100
  andn a0,a5,a0
  ret

The profitability comes from an out-of-order processor being able to
issue the "li a5, 100" at any time after it's fetched while "not a0, a0" has
to wait until any prior setter of a0 has reached completion.


gcc/ChangeLog:
 * config/riscv/bitmanip.md (*_not_const): New split
pattern.

gcc/testsuite/ChangeLog:
 * gcc.target/riscv/zbb-andn-orn-01.c: New test.
 * gcc.target/riscv/zbb-andn-orn-02.c: Likewise.

Thanks for the updated patch.  I've pushed this to the trunk.
jeff


Re: [PATCH] ssa_name_has_boolean_range vs signed-boolean:31 types

2023-09-29 Thread Jeff Law




On 9/5/23 01:46, Andrew Pinski wrote:

On Tue, Sep 5, 2023 at 12:09 AM Jeff Law via Gcc-patches
 wrote:




On 9/1/23 20:32, Andrew Pinski via Gcc-patches wrote:

This turns out to be a latent bug in ssa_name_has_boolean_range
where it would return true for all boolean types but all of the
uses of ssa_name_has_boolean_range was expecting 0/1 as the range
rather than [-1,0].
So when I fixed vector lower to do all comparisons in boolean_type
rather than still in the signed-boolean:31 type (to fix a different issue),
the pattern in match for `-(type)!A -> (type)A - 1.` would assume A (which
was signed-boolean:31) had a range of [0,1] which broke down and sometimes
gave us -1/-2 as values rather than what we were expecting of -1/0.

This was the simpliest patch I found while testing.

We have another way of matching [0,1] range which we could use instead
of ssa_name_has_boolean_range except that uses only the global ranges
rather than the local range (during VRP).
I tried to clean this up slightly by using gimple_match_zero_one_valuedp
inside ssa_name_has_boolean_range but that failed because due to using
only the global ranges. I then tried to change get_nonzero_bits to use
the local ranges at the optimization time but that failed also because
we would remove branches to __builtin_unreachable during evrp and lose
information as we don't set the global ranges during evrp.

OK? Bootstrapped and tested on x86_64-linux-gnu.

   PR 110817

gcc/ChangeLog:

   * tree-ssanames.cc (ssa_name_has_boolean_range): Remove the
   check for boolean type as they don't have "[0,1]" range.

gcc/testsuite/ChangeLog:

   * gcc.c-torture/execute/pr110817-1.c: New test.
   * gcc.c-torture/execute/pr110817-2.c: New test.
   * gcc.c-torture/execute/pr110817-3.c: New test.

I'm a bit surprised this didn't trigger any regressions.  Though maybe
all the existing testcases were capturing cases where non-boolean types
were known to have a 0/1 value.


Well except ssa_name_has_boolean_range will return true for `An
[unsigned] integral type with a single bit of precision` which the
normal boolean type for C is. So the only case where this makes a
difference is signed booleans. Vectors and Ada are the only 2 places I
know of which use signed booleans even.
Ah.  That would explain things well.  When we added that stuff we were 
mostly focused on cases that would have fallen under unsigned integral 
types with a single bit of precision.






The other uses of ssa_name_has_boolean_range are in DOM.

Right.  That was the original client of this code.


The first 2 uses of ssa_name_has_boolean_range use
build_one_cst/build_one_cst which is definitely wrong there. should
have been constant_boolean_node for N-bit signed boolean types.
ACK.  Feel free to fix these.  Consider it pre-approved if it passes 
testing.




jeff


Re: [PATCH 2/2] VR-VALUES: Rewrite test_for_singularity using range_op_handler

2023-09-29 Thread Jeff Law




On 9/5/23 01:12, Andrew Pinski wrote:

On Mon, Sep 4, 2023 at 11:06 PM Jeff Law via Gcc-patches
 wrote:




On 9/1/23 11:30, Andrew Pinski via Gcc-patches wrote:

So it turns out there was a simplier way of starting to
improve VRP to start to fix PR 110131, PR 108360, and PR 108397.
That was rewrite test_for_singularity to use range_op_handler
and Value_Range.

This patch implements that and

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

   * vr-values.cc (test_for_singularity): Add edge argument
   and rewrite using range_op_handler.
   (simplify_compare_using_range_pairs): Use Value_Range
   instead of value_range and update test_for_singularity call.

gcc/testsuite/ChangeLog:

   * gcc.dg/tree-ssa/vrp124.c: New test.
   * gcc.dg/tree-ssa/vrp125.c: New test.
---



diff --git a/gcc/vr-values.cc b/gcc/vr-values.cc
index 52ab4fe6109..2474e57ee90 100644
--- a/gcc/vr-values.cc
+++ b/gcc/vr-values.cc
@@ -904,69 +904,33 @@ simplify_using_ranges::simplify_bit_ops_using_ranges
   }

   /* We are comparing trees OP1 and OP2 using COND_CODE.  OP1 has
-   a known value range VR.
+   a known value range OP1_RANGE.

  If there is one and only one value which will satisfy the
-   conditional, then return that value.  Else return NULL.
-
-   If signed overflow must be undefined for the value to satisfy
-   the conditional, then set *STRICT_OVERFLOW_P to true.  */
+   conditional on the EDGE, then return that value.
+   Else return NULL.  */

   static tree
   test_for_singularity (enum tree_code cond_code, tree op1,
-   tree op2, const value_range *vr)
+   tree op2, const int_range_max &op1_range, bool edge)
   {
-  tree min = NULL;
-  tree max = NULL;
-
-  /* Extract minimum/maximum values which satisfy the conditional as it was
- written.  */
-  if (cond_code == LE_EXPR || cond_code == LT_EXPR)
+  /* This is already a singularity.  */
+  if (cond_code == NE_EXPR || cond_code == EQ_EXPR)
+return NULL;

I don't think this is necessarily the right thing to do for NE.

Consider if op1 has the range [0,1] and op2 has the value 1.  If the
code is NE, then we should be able to return a singularity of 0 since
that's the only value for x where x ne 1 is true given the range for x.


The "false" edge singularity is already known when NE is supplied. I
don't think changing it to the "true" edge singularity will be helpful
all of the time; preferring the value of 0 is a different story.
But that is a different patch and for a different location rather than
inside VRP; it should be in either isel or expand (more likely isel).
I forgot something critically important here.  Specifically, this code 
is supposed to be subsumed by Ranger.


The whole point of this routine is to rewrite to EQ/NE comparisons so 
that we can expose equivalences on the true/false arm of the conditional 
(and NE is just as important as EQ).  It's not really about preferring 
any particular value like 0.


The problem with this routine is it loses information after the code has 
been transformed.  Let's say we had a test x < 4.  If we assume that VRP 
is able to prove X doesn't have any of the values [MIN..3], then we can 
change the test to x == 4 and propagate 4 for any uses of X in the true arm.


But on the false ARM we end up with x != 4 which is a wider range than 
[5..MAX].  So if we were to instantiate a new Ranger after the 
transformation we'd lose information on the false arm.  More 
importantly, I think the transformation was bad for either SCEV or loop 
iteration analysis.


When Andrew, Aldy and I kicked this problem around the consensus was 
that Ranger should find and propagate the equivalence, including making 
it visible to jump threading.  That should make the rewriting totally 
unnecessary.


So the net is we really ought to be doing here is looking for cases 
where this code actually helps code generation and if it does we need to 
understand how/why as this code is supposed to go away.


Given you're already poking around in here, you might have such cases 
handy :-)  If you do, I'm sure Andrew, Aldy and I would love to see them.


jeff



Re: [PATCH v1] Mode-Switching: Add optional EMIT_AFTER hook

2023-09-29 Thread Jeff Law




On 8/25/23 06:44, Li, Pan2 wrote:

Hi Jeff,


You might also peek at the RTL gcse/pre code which is also LCM based and
has the same class of problems.


I found a similar approach to take care of this in gcse.cc/pre_edge_insert with 
some comments as below.

   /* We can't insert anything on an abnormal and
critical edge, so we insert the insn at the end of
the previous block. There are several alternatives
detailed in Morgans book P277 (sec 10.5) for
handling this situation.  This one is easiest for
now.  */

if (eg->flags & EDGE_ABNORMAL)
   insert_insn_end_basic_block (index_map[j], bb);
else
   {
   insn = process_insert_insn (index_map[j]);
   insert_insn_on_edge (insn, eg);
   }

It looks the insert_insn_end_basic_block is designed to handle the ABNORMAL 
edge by inserting at end of previous block from the comments.
That's probably dead code at this point.  IIRC rth did further work in 
this space because inserting in the end of the block with the abnormal 
edge isn't semantically correct.


It's been 20+ years, but IIRC he adjusted the PRE bitmaps so that we 
never would need to do an insertion on an abnormal edge.  Search for 
EDGE_ABNORMAL in gcse.cc.


Jeff


Re: [PATCH v1] Mode-Switching: Add optional EMIT_AFTER hook

2023-09-29 Thread Jeff Law




On 8/23/23 22:53, Li, Pan2 wrote:

Thanks Jeff.


That implies a save/restore pair around the call (possibly optimized so
that we minimize the number of save/restores).  I would have expected
x86 to already be doing this.  But maybe there's some ABI thing around
mmx vs x86 state that allows it to be avoided


Very similar to save/restore but optional.
If no static rounding mode instrinsic here, it is unnecessary to add 
save/restore
pair around the call. I bet mode-switching take care of this already.

But I still fail to see how this is relevant.

If we go back to the x86 mode switching case.  We ultimately still have 
to ensure that the caller does not impact the callee and the callee does 
not impact the caller.  That implies there must be a mechanism to 
save/restore the mode at call sites unless the x86 ABI somehow defines 
that problem away.


Conceptually the rounding mode is just a property.  The call, in effect, 
should demand a "normal" rounding mode and set the rounding mode to 
unknown if I understand how this is supposed to work.  If my 
understanding is wrong, then maybe that's where we should start -- with 
a good description of the problem ;-)


jeff


Re: [PATCH] RISC-V/testsuite: Fix ILP32 RVV failures from missing

2023-09-29 Thread Jeff Law




On 9/28/23 03:46, Maciej W. Rozycki wrote:

On Wed, 27 Sep 2023, Jeff Law wrote:


IMO this is one of those places where we should just be as normal as
possible.  So if the other big ports allow system headers then we should,
otherwise we should move everyone over to testing in some way we'll catch
these before commit.

Exactly.  I think the dance we've been doing with stdint-gcc.h is a bit silly,
but I haven't pushed on it at all.

No other port does anything similar.  When they need stdint.h, they include
it.  It does mean you have to have the appropriate headers installed for each
multilib configuration, but that's the way every other port handles this
problem.  There's no good reason I'm aware of for RISC-V to be different.


  I agree that using standard system headers where required is a reasonable
expectation, but I maintain that when using a non-default ABI/multilib by
an explicit request of a test case, it is the responsibility of our test
framework to verify that the chosen ABI/multilib is compatible with the
environment.  I think it should apply equally to all the tests whether
they are run, link, or compilation tests.
No other target enforces this kind of requirement and I would maintain 
that we shouldn't either.





  I think a requirement for a verifier to have headers for all the possible
ABIs/multilibs installed is unreasonable.  For a hosted target such as
Linux/*BSD/whatever it may yet be feasible.  For a bare-metal target it
may not be possible, and in particular such a target may possibly support
one specific ABI only and #error if an incompatible configuration is
detected.  This must not cause a test to FAIL, because GIGO.
If you're testing options that you don't have headers/libraries for, 
then that's a testsuite bug irrespective of bare metal vs linux.  It's 
been like that for about 30 years at this point.  I fail to see why 
RISC-V should be special in this regard.


Jeff


Re: [PATCH] RISC-V: Specify -mabi=lp64d in wredsum_vlmax.c testcase

2023-09-29 Thread Jeff Law




On 9/29/23 15:37, Patrick O'Neill wrote:

Resolves this error on rv32gcv:
cc1: error: ABI requires '-march=rv32'
compiler exited with status 1
FAIL: gcc.target/riscv/rvv/vsetvl/wredsum_vlmax.c   -O0  (test for excess 
errors)

Tested for regressions using glibc rv32gcv/rv64gcv multilib on
r14-4339-geaa41a6dc12.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/wredsum_vlmax.c: Specify -mabi=lp64d.

OK
jeff


Re: [ARC PATCH] Use rlc r0, 0 to implement scc_ltu (i.e. carry_flag ? 1 : 0)

2023-09-29 Thread Jeff Law




On 9/29/23 15:11, Roger Sayle wrote:


Hi Claudiu,

The patch looks sane. Have you run dejagnu test suite?


I've not yet managed to set up an emulator or compile the entire toolchain,
so my dejagnu results are only useful for catching (serious) problems in the
compile only tests:

 === gcc Summary ===

# of expected passes91875
# of unexpected failures23768
# of unexpected successes   23
# of expected failures  1038
# of unresolved testcases   19490
# of unsupported tests  3819
/home/roger/GCC/arc-linux/gcc/xgcc  version 14.0.0 20230828 (experimental)
(GCC)

If someone could double check there are no issues on real hardware that
would be great.  I'm not sure if ARC is one of the targets covered by Jeff
Law's compile farm?
It is :-)  Runs daily, about 4:30 am UTC.  So if the bits go in we'd 
have data within 24hrs.



Jeff



[committed] RISC-V: Fix INSN costing and more zicond tests

2023-09-29 Thread Jeff Law


So this ends up looking a lot like the bits that I had to revert several 
weeks ago :-)


The core issue we have is given an INSN the generic code will cost the 
SET_SRC and SET_DEST and sum them.  But that's far from ideal on a RISC 
target.


For a register destination, the cost can be determined be looking at 
just the SET_SRC.  Which is precisely what this patch does.  When the 
outer code is an INSN and we're presented with a SET we take one of two 
paths.


If the destination is a register, then we recurse just on the SET_SRC 
and we're done.  Otherwise we fall back to the existing code which sums 
the cost of the SET_SRC and SET_DEST.  That fallback path isn't great 
and probably could be further improved (just costing SET_DEST in that 
case is probably quite reasonable).


The difference between this version and the bits that slipped through by 
accident several weeks ago is that old version mis-used the API due to a 
thinko on my part.


This tightens up various zicond tests to avoid undesirable matching.

This has been tested on rv64gc -- the only difference it makes on the 
testsuite is the new tests (included in this patch) flip from failing to 
passing.


Pushed to the trunk.

Jeff
commit 44efc743acc01354b6b9eb1939aedfdcc44e71f3
Author: Xiao Zeng 
Date:   Fri Sep 29 16:29:02 2023 -0600

Fix INSN costing and more zicond tests

So this ends up looking a lot like the bits that I had to revert several 
weeks
ago :-)

The core issue we have is given an INSN the generic code will cost the 
SET_SRC
and SET_DEST and sum them.  But that's far from ideal on a RISC target.

For a register destination, the cost can be determined be looking at just 
the
SET_SRC.  Which is precisely what this patch does.  When the outer code is 
an
INSN and we're presented with a SET we take one of two paths.

If the destination is a register, then we recurse just on the SET_SRC and 
we're
done.  Otherwise we fall back to the existing code which sums the cost of 
the
SET_SRC and SET_DEST.  That fallback path isn't great and probably could be
further improved (just costing SET_DEST in that case is probably quite
reasonable).

The difference between this version and the bits that slipped through by
accident several weeks ago is that old version mis-used the API due to a 
thinko
on my part.

This tightens up various zicond tests to avoid undesirable matching.

This has been tested on rv64gc -- the only difference it makes on the 
testsuite
is the new tests (included in this patch) flip from failing to passing.

Pushed to the trunk.

gcc/
* config/riscv/riscv.cc (riscv_rtx_costs): Better handle costing
SETs when the outer code is INSN.

gcc/testsuite
* gcc.target/riscv/zicond-primitiveSemantics_compare_imm.c: New 
test.
* 
gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_0_imm.c:
Likewise.
* 
gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_imm_imm.c:
Likewise.
* 
gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_imm_reg.c:
Likewise.
* 
gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_reg_reg.c:
Likewise.
* gcc.target/riscv/zicond-primitiveSemantics_compare_reg.c: 
Likewise.
* 
gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_0_imm.c:
Likewise.
* 
gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_imm_imm.c:
Likewise.
* 
gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_imm_reg.c:
Likewise.
* 
gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_reg_reg.c:
Likewise.
* gcc.target/riscv/zicond-primitiveSemantics.c: Tighten expected 
regexp.
* gcc.target/riscv/zicond-primitiveSemantics_return_0_imm.c: 
Likewise.
* gcc.target/riscv/zicond-primitiveSemantics_return_imm_imm.c: 
Likewise.
* gcc.target/riscv/zicond-primitiveSemantics_return_imm_reg.c: 
Likewise.
* gcc.target/riscv/zicond-primitiveSemantics_return_reg_reg.c: 
Likewise.
* gcc.target/riscv/zicond-xor-01.c: Likewise.

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 6e7a719e7a0..d5446b63dbf 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2768,6 +2768,19 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
 
   switch (GET_CODE (x))
 {
+case SET:
+  /* If we are called for an INSN that's a simple set of a register,
+then cost based on the SET_SRC alone.  */
+  if (outer_code == INSN && REG_P (SET_DEST (x)))
+   {
+ riscv_rtx_costs (SET_SRC (x), mode, outer_code, opno, total, speed);
+ return true;
+   }
+
+  /* Ot

Re: RISC-V: Add type attribute in *_not_const pattern

2023-09-29 Thread Jeff Law




On 9/29/23 16:53, Jivan Hakobyan wrote:

After f088b768d01a commit riscv_sched_variable_issue function requires
that all insns should have a type attribute.

When I sent my previous patch there was no such limitation.
Currently, I have regressions on my tests. This patch fixes them.

gcc/ChangeLog:
         * config/riscv/bitmanip.md (*_not_const): Added 
type attribute
I should have remembered that we've got a policy that all insns need a 
type.  My bad.


Pushed to the trunk,
jeff


Re: [PATCH] rtl-optimization/110939 Really fix narrow comparison of memory and constant

2023-10-01 Thread Jeff Law




On 10/1/23 08:26, Stefan Schulze Frielinghaus wrote:


FWIW, I should definitely have caught this hunk earlier -- we've gone the
rounds in this same space (GEN_INT vs gen_int_mode) elsewhere.

Again, sorry for the long wait.

jeff


No worries at all.  At least I have learned something new :)

Thanks Jeff and Eric for clarification.  This matches with my intuition,
now, so I've pushed the patch.
BTW, in a completely different context I need to do some testing on the 
alpha port, which hasn't bootstrapped in a couple months.  I'd been 
assuming the failure was completely due to ongoing changes I'm making in 
the infrastructure I used to bootstrap QEMU emulated systems.


As it turns out, the infrastructure changes and this minor combine issue 
were both playing a role!


Jeff


Re: [PATCH v3] RISC-V:Optimize the MASK opt generation

2023-10-02 Thread Jeff Law




On 10/2/23 11:20, David Edelsohn wrote:

Wang,

The AWK portions of this patch broke bootstrap on AIX.

Also, the AWK portions are common code, not RISC-V specific.  I don't 
see anywhere that the common portions of the patch were reviewed or 
approved by anyone with authority to approve the changes to the AWK files.


This patch should not have been committed without approval by a reviewer 
with authority for that portion of the compiler and should have been 
tested on targets other than RISC-V if common parts of the compiler were 
changed.
I acked the generic bits.  So the lack of testing on another target is 
on me.


jeff


Re: [PATCH v3] RISC-V:Optimize the MASK opt generation

2023-10-02 Thread Jeff Law




On 10/2/23 12:03, David Edelsohn wrote:
On Mon, Oct 2, 2023 at 1:59 PM Jeff Law <mailto:jeffreya...@gmail.com>> wrote:




On 10/2/23 11:20, David Edelsohn wrote:
 > Wang,
 >
 > The AWK portions of this patch broke bootstrap on AIX.
 >
 > Also, the AWK portions are common code, not RISC-V specific.  I
don't
 > see anywhere that the common portions of the patch were reviewed or
 > approved by anyone with authority to approve the changes to the
AWK files.
 >
 > This patch should not have been committed without approval by a
reviewer
 > with authority for that portion of the compiler and should have been
 > tested on targets other than RISC-V if common parts of the
compiler were
 > changed.
I acked the generic bits.  So the lack of testing on another target is
on me.


Hi, Jeff

Sorry. I didn't see a comment from a global reviewer in the V3 thread.

NP.



I am using Gawk on AIX.  After the change, I see a parse error from 
gawk.  I'm rebuilding with a checkout just before the change to confirm 
that it was the source of the error, and it seems to be past that 
failure location.  I didn't keep the exact error.  Once I get past this 
build cycle, I'll reproduce it.
I think there's already a patch circulating which fixes this.  It broke 
at least one other platform.  Hopefully it'll all be sorted out today.



jeff


Re: [committed] Require target lra in gcc.dg/pr108095.c

2023-10-02 Thread Jeff Law




On 10/2/23 14:42, John David Anglin wrote:

Committed to trunk.

Dave
---

Require target lra in gcc.dg/pr108095.c

2023-10-02  John David Anglin  

gcc/testsuite/ChangeLog:

* gcc.dg/pr108095.c: Require target lra.

Thanks.  I already had this in my local tree.

jeff


Re: PING: PR rtl-optimization/110701

2023-10-03 Thread Jeff Law




On 10/3/23 09:55, Roger Sayle wrote:

There are a small handful of middle-end maintainers/reviewers that

understand and appreciate the difference between the RTL statements:

(set (subreg:HI (reg:SI x)) (reg:HI y))

and

(set (strict_lowpart:HI (reg:SI x)) (reg:HI y))

If one (or more) of them could please take a look at

https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625532.html 



I’d very much appreciate it (one less wrong-code regression).

This definitely fell through the cracks.

The patch is OK for the trunk.

Thanks again,
jeff


Re: [PATCH v3] RISC-V:Optimize the MASK opt generation

2023-10-03 Thread Jeff Law




On 10/2/23 20:38, Kito Cheng wrote:

Proposed fix, and verified with "mawk" and "gawk -P" (gawk with posix
mode) on my linux also some other report it work on freebsd, just wait
review :)

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631785.html

OK
jeff


Re: mvconst_internal splitter gated with !@ira_in_progess (was Re: Yet Another IRA question)

2023-10-03 Thread Jeff Law




On 10/2/23 18:12, Vineet Gupta wrote:



On 9/28/23 12:52, Vineet Gupta wrote:


On 9/28/23 05:53, Jeff Law wrote:
Vineet -- assuming Vlad's patch goes in, the other obvious candidate 
for this would be the mvconst_internal define_insn_and_split where 
we'd probably want to reject the insn as a whole once IRA has started. 


Good point, although currently we've kind of papered over it with 
-fsched-pressure, but I'm sure there are way more cases that this will 
improve still.
I will spin up a full multilib test with that, hopefully with no 
fallout :-)


I have the results finally. This is testsuite neutral. Same results 
before/after


    = Summary of gcc testsuite =
     | # of unexpected case / # of unique 
unexpected case

     |  gcc |  g++ | gfortran |
    rv32imac/  ilp32/ medlow |  168 /    70 |   13 / 6 |   67 /    12 |
  rv32imafdc/ ilp32d/ medlow |  168 /    70 |   13 / 6 |   24 / 4 |
    rv64imac/   lp64/ medlow |  161 /    70 |    9 / 3 |   67 /    12 |
  rv64imafdc/  lp64d/ medlow |  160 /    69 |    5 / 2 |    6 / 1 |

But the SPEC runs are not affected at all, if anything it seems to be 
way under noise for 5 workloads.

Not sure if we still want to add the gate, your call
I'd tend to go forward with it as I think it'll help -O1 builds as those 
don't inherently turn on the scheduler.


Jeff


Re: [PATCH] RISC-V: Unescape chars in pr111566.f90 test

2023-10-03 Thread Jeff Law




On 10/3/23 14:19, Patrick O'Neill wrote:

Some characters are escaped which causes the testcase to fail. This
patch restores the original characters.

Tested for regressions using multilib rv32gcv-ilp32d, rv64gcv-lp64d.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/fortran/pr111566.f90: Restore escaped
characters.

LOL.  Yea, this is OK.

jeff


Re: [RFC] expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg [target/111466]

2023-10-04 Thread Jeff Law




On 9/28/23 15:43, Vineet Gupta wrote:

RISC-V suffers from extraneous sign extensions, despite/given the ABI
guarantee that 32-bit quantities are sign-extended into 64-bit registers,
meaning incoming SI function args need not be explicitly sign extended
(so do SI return values as most ALU insns implicitly sign-extend too.)

Existing REE doesn't seem to handle this well and there are various ideas
floating around to smarten REE about it.

RISC-V also seems to correctly implement middle-end hook PROMOTE_MODE
etc.

Another approach would be to prevent EXPAND from generating the
sign_extend in the first place which this patch tries to do.

The hunk being removed was introduced way back in 1994 as
5069803972 ("expand_expr, case CONVERT_EXPR .. clear the promotion flag")

This survived full testsuite run for RISC-V rv64gc with surprisingly no
fallouts: test results before/after are exactly same.

|   | # of unexpected case / # of unique unexpected 
case
|   |  gcc |  g++ | gfortran |
| rv64imafdc_zba_zbb_zbs_zicond/|  264 /87 |5 / 2 |   72 /12 |
|lp64d/medlow

Granted for something so old to have survived, there must be a valid
reason. Unfortunately the original change didn't have additional
commentary or a test case. That is not to say it can't/won't possibly
break things on other arches/ABIs, hence the RFC for someone to scream
that this is just bonkers, don't do this :-)

I've explicitly CC'ed Jakub and Roger who have last touched subreg
promoted notes in expr.cc for insight and/or screaming ;-)

Thanks to Robin for narrowing this down in an amazing debugging session
@ GNU Cauldron.

[ ... ]
So the data for Alpha was -- no change.  I also put the patch into my 
tester in the hopes that some target, any target would show a difference 
in test results.  It's churned about halfway through the embedded 
targets so far with no regressions.


Given the data in your followup message, this clearly looks valuable and 
so far we don't have any evidence Kenner's old change is useful or 
necessary anymore.


I'm not (yet) comfortable committing this patch, I think the easy 
avenues to getting a case where it's needed are not likely to prove 
fruitful.  So next steps here are for me to spend a bit more time trying 
to understand what cases Kenner was trying to handle.


Jeff


Re: [PATCH] RISC-V: Remove @ of vec_series

2023-10-04 Thread Jeff Law




On 10/4/23 09:06, Robin Dapp wrote:

I'm currently in the process of removing some unused @s.
This is OK.
Agreed.  And if you or Juzhe have other @ cases that are unused, such 
changes should be considered pre-approved.


Jeff


Re: [RFC gcc13 backport 0/3] Add Ztso atomic mappings

2023-10-04 Thread Jeff Law




On 10/3/23 16:26, Patrick O'Neill wrote:

I vaugely recall some discussion about backporting the Ztso mappings
along with the RVWMO mappings. Now that the RVWMO mappings have been
backported for 13.3, is there interest in also backporting the Ztso
mappings?

Tested using for regressions using rv32gc/rv64gc glibc.

Jeff Law (1):
   [RISCV][committed] Remove spurious newline in ztso sequence

Patrick O'Neill (2):
   RISC-V: Add Ztso atomic mappings
   RISC-V: Specify -mabi for ztso testcases
I recall discussing Ztso mappings, but not the final conclusion.  I 
think the final decision comes down to the motivation behind the changes.


If they're primarily optimized sequences utilizing the stronger ordering 
guarantees from Ztso, then it's probably not a good candidate for 
gcc-13.  If the primary motivation is to make it easier to port code 
from targets with stronger memory models (ie x86), then the Ztso work is 
a reasonable candidate for backporting.


Jeff


Re: [RFC] expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg [target/111466]

2023-10-04 Thread Jeff Law




On 9/28/23 15:43, Vineet Gupta wrote:

RISC-V suffers from extraneous sign extensions, despite/given the ABI
guarantee that 32-bit quantities are sign-extended into 64-bit registers,
meaning incoming SI function args need not be explicitly sign extended
(so do SI return values as most ALU insns implicitly sign-extend too.)

Existing REE doesn't seem to handle this well and there are various ideas
floating around to smarten REE about it.

RISC-V also seems to correctly implement middle-end hook PROMOTE_MODE
etc.

Another approach would be to prevent EXPAND from generating the
sign_extend in the first place which this patch tries to do.

The hunk being removed was introduced way back in 1994 as
5069803972 ("expand_expr, case CONVERT_EXPR .. clear the promotion flag")

This survived full testsuite run for RISC-V rv64gc with surprisingly no
fallouts: test results before/after are exactly same.

|   | # of unexpected case / # of unique unexpected 
case
|   |  gcc |  g++ | gfortran |
| rv64imafdc_zba_zbb_zbs_zicond/|  264 /87 |5 / 2 |   72 /12 |
|lp64d/medlow

Granted for something so old to have survived, there must be a valid
reason. Unfortunately the original change didn't have additional
commentary or a test case. That is not to say it can't/won't possibly
break things on other arches/ABIs, hence the RFC for someone to scream
that this is just bonkers, don't do this :-)

I've explicitly CC'ed Jakub and Roger who have last touched subreg
promoted notes in expr.cc for insight and/or screaming ;-)

Thanks to Robin for narrowing this down in an amazing debugging session
@ GNU Cauldron.

```
foo2:
sext.w  a6,a1 <-- this goes away
beq a1,zero,.L4
li  a5,0
li  a0,0
.L3:
addwa4,a2,a5
addwa5,a3,a5
addwa0,a4,a0
bltua5,a6,.L3
ret
.L4:
li  a0,0
ret
```

Signed-off-by: Vineet Gupta 
Co-developed-by: Robin Dapp 
---
  gcc/expr.cc   |  7 ---
  gcc/testsuite/gcc.target/riscv/pr111466.c | 15 +++
  2 files changed, 15 insertions(+), 7 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/riscv/pr111466.c
So mcore-elf is showing something interesting.  With that hunk of Kenner 
code removed, it actually has a few failing tests that flip to passes.



Tests that now work, but didn't before (11 tests):

mcore-sim: gcc.c-torture/execute/pr109986.c   -O1  execution test
mcore-sim: gcc.c-torture/execute/pr109986.c   -O2  execution test
mcore-sim: gcc.c-torture/execute/pr109986.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  execution test
mcore-sim: gcc.c-torture/execute/pr109986.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  execution test
mcore-sim: gcc.c-torture/execute/pr109986.c   -O3 -g  execution test
mcore-sim: gcc.c-torture/execute/pr109986.c   -Os  execution test
mcore-sim: gcc.c-torture/execute/pr84524.c   -O2  execution test
mcore-sim: gcc.c-torture/execute/pr84524.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  execution test
mcore-sim: gcc.c-torture/execute/pr84524.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  execution test
mcore-sim: gcc.dg/tree-ssa/pr84436-5.c execution test
mcore-sim: gcc.dg/tree-ssa/pr84436-5.c execution test


So that's a really interesting result.   If further analysis doesn't 
point the finger at a simulator bug or something like that, then we'll 
have strong evidence that Kenner's change is actively harmful from a 
correctness standpoint.  That would change the calculus here significantly.


Sadly, mcore-elf doesn't have a working gdb IIRC (don't ask how I know 
that!), so I'm going to have to analyze this further with less efficient 
techniques.  BUt definitely interesting news/results.


Jeff


Re: [RFC] expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg [target/111466]

2023-10-04 Thread Jeff Law




On 10/4/23 12:14, Vineet Gupta wrote:

On 10/4/23 10:59, Jeff Law wrote:



On 9/28/23 15:43, Vineet Gupta wrote:

RISC-V suffers from extraneous sign extensions, despite/given the ABI
guarantee that 32-bit quantities are sign-extended into 64-bit 
registers,

meaning incoming SI function args need not be explicitly sign extended
(so do SI return values as most ALU insns implicitly sign-extend too.)

Existing REE doesn't seem to handle this well and there are various 
ideas

floating around to smarten REE about it.

RISC-V also seems to correctly implement middle-end hook PROMOTE_MODE
etc.

Another approach would be to prevent EXPAND from generating the
sign_extend in the first place which this patch tries to do.

The hunk being removed was introduced way back in 1994 as
    5069803972 ("expand_expr, case CONVERT_EXPR .. clear the 
promotion flag")


This survived full testsuite run for RISC-V rv64gc with surprisingly no
fallouts: test results before/after are exactly same.

|   | # of unexpected case / # of unique 
unexpected case
|   |  gcc |  g++ | 
gfortran |
| rv64imafdc_zba_zbb_zbs_zicond/|  264 /    87 |    5 / 2 |   72 
/    12 |

|    lp64d/medlow

Granted for something so old to have survived, there must be a valid
reason. Unfortunately the original change didn't have additional
commentary or a test case. That is not to say it can't/won't possibly
break things on other arches/ABIs, hence the RFC for someone to scream
that this is just bonkers, don't do this :-)

I've explicitly CC'ed Jakub and Roger who have last touched subreg
promoted notes in expr.cc for insight and/or screaming ;-)

Thanks to Robin for narrowing this down in an amazing debugging session
@ GNU Cauldron.

```
foo2:
sext.w    a6,a1 <-- this goes away
beq    a1,zero,.L4
li    a5,0
li    a0,0
.L3:
addw    a4,a2,a5
addw    a5,a3,a5
addw    a0,a4,a0
bltu    a5,a6,.L3
ret
.L4:
li    a0,0
ret
```

Signed-off-by: Vineet Gupta 
Co-developed-by: Robin Dapp 
---
  gcc/expr.cc   |  7 ---
  gcc/testsuite/gcc.target/riscv/pr111466.c | 15 +++
  2 files changed, 15 insertions(+), 7 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/riscv/pr111466.c
So mcore-elf is showing something interesting.  With that hunk of 
Kenner code removed, it actually has a few failing tests that flip to 
passes.



Tests that now work, but didn't before (11 tests):

mcore-sim: gcc.c-torture/execute/pr109986.c   -O1  execution test
mcore-sim: gcc.c-torture/execute/pr109986.c   -O2  execution test
mcore-sim: gcc.c-torture/execute/pr109986.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none  execution test
mcore-sim: gcc.c-torture/execute/pr109986.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects  execution test

mcore-sim: gcc.c-torture/execute/pr109986.c   -O3 -g  execution test
mcore-sim: gcc.c-torture/execute/pr109986.c   -Os  execution test
mcore-sim: gcc.c-torture/execute/pr84524.c   -O2  execution test
mcore-sim: gcc.c-torture/execute/pr84524.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none  execution test
mcore-sim: gcc.c-torture/execute/pr84524.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects  execution test

mcore-sim: gcc.dg/tree-ssa/pr84436-5.c execution test
mcore-sim: gcc.dg/tree-ssa/pr84436-5.c execution test


So that's a really interesting result.   If further analysis doesn't 
point the finger at a simulator bug or something like that, then we'll 
have strong evidence that Kenner's change is actively harmful from a 
correctness standpoint.  That would change the calculus here 
significantly.


Sadly, mcore-elf doesn't have a working gdb IIRC (don't ask how I know 
that!), so I'm going to have to analyze this further with less 
efficient techniques.  BUt definitely interesting news/results.


Really interesting. Can't thank you enough for spending time in chasing 
this down.
Turns out this is a simulator bug.  The simulator assumed that "long" 
types were 32 bits and implemented sextb as x <<= 24; x >>= 24;  This is 
a fairly common error.  If "long" is 64 bits on the host, then that 
approach doesn't work well.




Jeff


Re: [PATCH v6] Implement new RTL optimizations pass: fold-mem-offsets.

2023-10-04 Thread Jeff Law




On 10/3/23 05:45, Manolis Tsamis wrote:

This is a new RTL pass that tries to optimize memory offset calculations
by moving them from add immediate instructions to the memory loads/stores.
For example it can transform this:

   addi t4,sp,16
   add  t2,a6,t4
   shl  t3,t2,1
   ld   a2,0(t3)
   addi a2,1
   sd   a2,8(t2)

into the following (one instruction less):

   add  t2,a6,sp
   shl  t3,t2,1
   ld   a2,32(t3)
   addi a2,1
   sd   a2,24(t2)

Although there are places where this is done already, this pass is more
powerful and can handle the more difficult cases that are currently not
optimized. Also, it runs late enough and can optimize away unnecessary
stack pointer calculations.

gcc/ChangeLog:

* Makefile.in: Add fold-mem-offsets.o.
* passes.def: Schedule a new pass.
* tree-pass.h (make_pass_fold_mem_offsets): Declare.
* common.opt: New options.
* doc/invoke.texi: Document new option.
* fold-mem-offsets.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/fold-mem-offsets-1.c: New test.
* gcc.target/riscv/fold-mem-offsets-2.c: New test.
* gcc.target/riscv/fold-mem-offsets-3.c: New test.

Signed-off-by: Manolis Tsamis 



So I was ready to ACK, but realized there weren't any testresults for a 
primary platform mentioned.  So I ran this on x86.


It's triggering one regression (code quality).

Specifically gcc.target/i386/pr52146.c

The f-m-o code is slightly worse than without f-m-o.

Without f-m-o we get this:

   9  B88000E0  movl$-18874240, %eax
   9  FE
  10 0005 67C7  movl$0, (%eax)
  10  00
  11 000c C3ret

With f-m-o we get this:

   9  B800  movl$0, %eax
   9  00
  10 0005 67C78080  movl$0, -18874240(%eax)
  10  00E0FE00
  10  00
  11 0010 C3ret


The key being that we don't get rid of the original move instruction, 
nor does the original move instruction get smaller due to simplification 
of its constant.  Additionally, the memory store gets larger.  The net 
is a 4 byte increase in code size.



This is probably a fairly rare scenario and the original bug report was 
for a correctness issue in using addresses in the range 
0x8000..0x in x32.  So I wouldn't lose any sleep if we 
adjusted the test to pass -fno-fold-mem-offsets.  But before doing that 
I wanted to give you the chance to ponder if this is something you'd 
prefer to improve in f-m-o itself.   At some level if the base register 
collapses down to 0, then we could take the offset as a constant address 
and try to recognize that form.  If that fails, then just consider the 
change unprofitable rather than trying to recognize it as reg+d.


Anyway, waiting to hear your thoughts...

If we do a V7, then we need to fix one spelling issue that shows up in 
several places (if we go with the v6 we can just fix it prior to 
committing).  Specifically in several places we need to replace 
"recognised" with "recognized".



jeff


Re: [PATCH] RISC-V: xfail gcc.dg/pr90263.c for riscv_v

2023-10-04 Thread Jeff Law




On 10/4/23 15:57, Patrick O'Neill wrote:

Since r14-4358-g9464e72bcc9 riscv_v targets use vector instructions to
perform a memcpy. We no longer expect memcpy for riscv_v targets.

gcc/testsuite/ChangeLog:

* gcc.dg/pr90263.c: xfail riscv_v targets.
Or rather than XFAIL skip the test?  XFAIL kind of implies its something 
we'd like to fix.  But in this case we don't want a memcpy call as the 
inlined vector implementation is almost certainly better.


You might be able to use riscv_v in a dg-skip-if directive.  Not sure.


Jeff


Re: [PATCH] RISC-V: xfail gcc.dg/pr90263.c for riscv_v

2023-10-04 Thread Jeff Law




On 10/4/23 16:21, Patrick O'Neill wrote:


On 10/4/23 15:14, Jeff Law wrote:



On 10/4/23 15:57, Patrick O'Neill wrote:

Since r14-4358-g9464e72bcc9 riscv_v targets use vector instructions to
perform a memcpy. We no longer expect memcpy for riscv_v targets.

gcc/testsuite/ChangeLog:

* gcc.dg/pr90263.c: xfail riscv_v targets.
Or rather than XFAIL skip the test?  XFAIL kind of implies its 
something we'd like to fix.  But in this case we don't want a memcpy 
call as the inlined vector implementation is almost certainly better.

Ah. Since XFAIL notifies us if a test starts passing (via xpass) I
thought it would help us ensure the test doesn't start passing
on riscv_v. I didn't know it implied something needed to be fixed.

I'll rework it to skip riscv_v targets.

Hopefully that works.

If you wanted a test to verify that we don't go backwards and start 
emitting a memcpy, you can set up a test like


// dg-directives
#include "pr90263.c"

// dg directives for scanning

Where the scanning verifies that we don't have a call to memcpy.  The 
kind of neat thing here is the dg directives in the included file are 
ignored, so you can use the same test sources in multiple ways.


Given this is kindof specific to risc-v, it might make more sense in the 
riscv directory.


Jeff


Re: [RFC] expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg [target/111466]

2023-10-04 Thread Jeff Law




On 9/28/23 15:43, Vineet Gupta wrote:

RISC-V suffers from extraneous sign extensions, despite/given the ABI
guarantee that 32-bit quantities are sign-extended into 64-bit registers,
meaning incoming SI function args need not be explicitly sign extended
(so do SI return values as most ALU insns implicitly sign-extend too.)

Existing REE doesn't seem to handle this well and there are various ideas
floating around to smarten REE about it.

RISC-V also seems to correctly implement middle-end hook PROMOTE_MODE
etc.

Another approach would be to prevent EXPAND from generating the
sign_extend in the first place which this patch tries to do.

The hunk being removed was introduced way back in 1994 as
5069803972 ("expand_expr, case CONVERT_EXPR .. clear the promotion flag")

This survived full testsuite run for RISC-V rv64gc with surprisingly no
fallouts: test results before/after are exactly same.

|   | # of unexpected case / # of unique unexpected 
case
|   |  gcc |  g++ | gfortran |
| rv64imafdc_zba_zbb_zbs_zicond/|  264 /87 |5 / 2 |   72 /12 |
|lp64d/medlow

Granted for something so old to have survived, there must be a valid
reason. Unfortunately the original change didn't have additional
commentary or a test case. That is not to say it can't/won't possibly
break things on other arches/ABIs, hence the RFC for someone to scream
that this is just bonkers, don't do this :-)

I've explicitly CC'ed Jakub and Roger who have last touched subreg
promoted notes in expr.cc for insight and/or screaming ;-)

Thanks to Robin for narrowing this down in an amazing debugging session
@ GNU Cauldron.
So I think Kenner's code is trying to prevent having a value in a SUBREG 
that is inconsistent with the SUBREG_PROMOTED* flag bits.  But I think 
it's been unnecessary since Matz's rewrite in 2009.


The key change in Matz's work is that when the target is a promoted 
subreg expression we pass NULL down to expand_expr_real_2 which forces 
that routine to expand operand 0 (the incoming PARM_DECL object) into a 
temporary object (in this case another promoted subreg expression). 
It's that temporary object that Kenner's change clears the promoted 
subreg state on.


After expand_expr_real_returns, we call convert_move to move the data 
from that temporary object into the actual destination.  That routine 
(and its children) seem to be doing the right things WRT promoted subregs.



Prior to Matz's change I'm pretty sure we would do expansion directly 
into the destination (we'd generate appropriate tree nodes, then 
ultimately pass things off to store_expr which would pass down the final 
target to expand_expr).  Meaning that if the signedness differed then we 
potentially needed to reset the subreg promotion state on the 
destination object to ensure we were conservatively correct, hence 
Kenner's change.



I'm largely convinced we can drop this code.  I'm going to give it a few 
days to run some of the emulated native targets (they're long running 
jobs, so they only fire once a week).But my plan is to move forward 
with the patch relatively soon.


jeff


Re: [PATCH v6] Implement new RTL optimizations pass: fold-mem-offsets.

2023-10-05 Thread Jeff Law




On 10/3/23 05:45, Manolis Tsamis wrote:

This is a new RTL pass that tries to optimize memory offset calculations



+
+/* If INSN is a root memory instruction then compute a potentially new offset
+   for it and test if the resulting instruction is valid.  */
+static void
+do_check_validity (rtx_insn *insn, fold_mem_info *info)
+{
+  rtx mem, reg;
+  HOST_WIDE_INT cur_offset;
+  if (!get_fold_mem_root (insn, &mem, ®, &cur_offset))
+return;
+
+  HOST_WIDE_INT new_offset = cur_offset + info->added_offset;
+
+  /* Test if it is valid to change MEM's address offset to NEW_OFFSET.  */
+  int icode = INSN_CODE (insn);
+  rtx mem_addr = XEXP (mem, 0);
+  machine_mode mode = GET_MODE (mem_addr);
+  if (new_offset != 0)
+XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, gen_int_mode (new_offset, mode));
+  else
+XEXP (mem, 0) = reg;
+
+  bool illegal = insn_invalid_p (insn, false)
+|| !memory_address_addr_space_p (mode, XEXP (mem, 0),
+ MEM_ADDR_SPACE (mem));
+
+  /* Restore the instruction.  */
+  XEXP (mem, 0) = mem_addr;
+  INSN_CODE (insn) = icode;
+
+  if (illegal)
+bitmap_ior_into (&cannot_fold_insns, info->fold_insns);
+  else
+bitmap_ior_into (&candidate_fold_insns, info->fold_insns);
+}
+
So overnight testing with the latest version of your patch triggered a 
fault on the sh3-linux-gnu target with this code at -O2:



enum
{
  _ISspace = ((5) < 8 ? ((1 << (5)) << 8) : ((1 << (5)) >> 8)),
};
extern const unsigned short int **__ctype_b_loc (void)
 __attribute__ ((__nothrow__ )) __attribute__ ((__const__));
void
read_alias_file (const char *fname, int fname_len)
{
  char buf[400];
  char *cp;
  cp = buf;
  while (((*__ctype_b_loc ())[(int) (((unsigned char) cp[0]))] & (unsigned 
short int) _ISspace))
   ++cp;
}




The problem is we need to clear the INSN_CODE before we call recog.  In 
this specific case we had (mem (plus (reg) (offset)) after f-m-o does 
its job, the offset went to zero so we changed the structure of the RTL 
to (mem (reg)).  But we had the old INSN_CODE still in place which 
caused us to reference operands that no longer exist.


A simple INSN_CODE (insn) = -1 before calling_insn_invalid_p is the 
right fix.


jeff


Re: [RFC] expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg [target/111466]

2023-10-05 Thread Jeff Law




On 10/5/23 08:56, Richard Kenner wrote:

At that particular time I think Kenner was mostly focused on the alpha
and ppc ports, but I think he was also still poking around with romp and
a29k.  I think romp is an unlikely target for this because it didn't
promote modes and it wasn't even building for several months
(April->late July).


Obviously, I have no recollection of that change at all.

That's the assumption I made :-)



In July of 1994, I don't believe I was actively working on much in the
way of ports, though I could be misremembering.  My guess is that this
change was to fix some bug, but I'm a bit mystified why I'd have
batched so many different changes together in one ChangeLog entry like
that.  I think you're correct that it was most likely the Alpha that
showed the bug.
The alpha was a combination of my memory and reviewing patches/email 
messages in that time span.


I agree this was almost certainly meant to be a bugfix and I suspect the 
bug was expanding directly into a promoted subreg target and ending up 
with inconsistency between the value and the promoted subreg state bits.


Jeff




Re: [RFC] expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg [target/111466]

2023-10-05 Thread Jeff Law




On 10/5/23 07:33, Robin Dapp wrote:

So I think Kenner's code is trying to prevent having a value in a
SUBREG that is inconsistent with the SUBREG_PROMOTED* flag bits.  But
I think it's been unnecessary since Matz's rewrite in 2009.


I couldn't really tell what the rewrite does entirely so I tried creating
a case where we would require the SUBREG_PROMOTED_VAR but couldn't come
up with any.  At least for the most common path through expr I believe
I know why:

So our case is when we have an SI subreg from a DI reg that is originally
a sign-extended SI.  Then we NOP-convert the SI subreg from signed to
unsigned.  We only perform implicit sign extensions therefore we can
omit the implicit zero-extension case here.
Right.  The extension into bits 32..63, whether it be zero or sign 
extension is essentially a don't care.  It's there because of 
PROMOTE_MODE forcing most operations to 64 bits to match the hardware, 
even if the upper 32 bits aren't ever relevant.







The way the result of the signed->unsigned conversion is used determines
whether we can use SUBREG_PROMOTED_VAR.  There are two possibilities
(1) and (2).

  void foo (int bar)
  {
 unsigned int u = (unsigned int) bar;


(1) unsigned long long ul = (unsigned long long) u;

As long as the result is used unsigned, we will always perform a zero
extension no matter the "Kenner hunk" (because whether the subreg has
SRP_SIGNED or !SUBREG_PROMOTED_VAR does not change the need for a
zero_extend).

Right.




(2) long long l = (long long) u;

SUBREG_PROMOTED is checked by the following in convert_move:

   scalar_int_mode to_int_mode;
   if (GET_CODE (from) == SUBREG
   && SUBREG_PROMOTED_VAR_P (from)
   && is_a  (to_mode, &to_int_mode)
   && (GET_MODE_PRECISION (subreg_promoted_mode (from))
  >= GET_MODE_PRECISION (to_int_mode))
   && SUBREG_CHECK_PROMOTED_SIGN (from, unsignedp))

The SUBREG_CHECK_PROMOTED_SIGN (from, unsignedp) is decisive
as far as I can tell.
Right.   We have already ensured the modes are either the same size or 
the PARM_DECL's mode is wider than the local VAR_DECL's mode.  So the 
check that FROM has the same promotion property as UNSIGNEDP is going to 
be decisive.


  unsignedp = 1 comes from treeop0 so our
Correct.  It comes from the TREE_TYPE (treeop0) where treeop0 is the 
incoming PARM_DECL.



"from" (i.e. unsigned int u).
With the "Kenner hunk" SUBREG_PROMOTED_VAR is unset, so we don't
strip the extension.  Without it, SUBREG_PROMOTED_VAR () == SRP_SIGNED
which is != unsignedp, so no stripping either.
Correct.  The Kenner hunk wipes SUBREG_PROMOTED_VAR, meaning the 
promotion state of the object is unknown.




Now there are several other paths that would need auditing as well
but at least this one is safe.  An interesting test target would be
a backend that does implicit zero extensions but as we haven't seen
fallout so far chances to find a trigger are slim.
I did some testing of the other paths yesterday, but didn't include them 
in my message.


First, if the PARM_DECL is a narrower type than the local VAR_DECL, then 
the path we're considering changing doesn't get used because the modes 
have different sizes.   Thus we need not worry about this case.


If the PARM_DECL is wider than the local VAR_DECL, then we downsize to 
the same size as the VAR_DECL via a SUBREG and it behaves the same as 
the Vineet's original when the sizes are the same, but they differ in 
signedness.  So if we conclude the same size cases are OK, then the case 
where the PARM_DECL is wider than the VAR_DECL, we're going to be safe 
as well.



Jeff


Re: [PATCH v2] RISC-V: Test memcpy inlined on riscv_v

2023-10-05 Thread Jeff Law




On 10/4/23 16:55, Patrick O'Neill wrote:

Since r14-4358-g9464e72bcc9 riscv_v targets use vector instructions to
perform a memcpy. We no longer expect memcpy for riscv_v targets.

gcc/testsuite/ChangeLog:

* gcc.dg/pr90263.c: Skip riscv_v targets.
* gcc.target/riscv/rvv/base/pr90263.c: New test.

OK
jeff


Re: [PATCH V2] Emit funcall external declarations only if actually used.

2023-10-05 Thread Jeff Law




On 10/5/23 16:17, Richard Sandiford wrote:

"Jose E. Marchesi"  writes:

ping


I don't know this code very well, and have AFAIR haven't worked
with an assembler that requires external declarations, but since
it's at a second ping :)




ping


[Differences from V1:
- Prototype for call_from_call_insn moved before comment block.
- Reuse the `call' flag for SYMBOL_REF_LIBCALL.
- Fallback to check REG_CALL_DECL in non-direct calls.
- New test to check correct behavior for non-direct calls.]

There are many places in GCC where alternative local sequences are
tried in order to determine what is the cheapest or best alternative
to use in the current target.  When any of these sequences involve a
libcall, the current implementation of emit_library_call_value_1
introduce a side-effect consisting on emitting an external declaration
for the funcall (such as __divdi3) which is thus emitted even if the
sequence that does the libcall is not retained.

This is problematic in targets such as BPF, because the kernel loader
chokes on the spurious symbol __divdi3 and makes the resulting BPF
object unloadable.  Note that BPF objects are not linked before being
loaded.

This patch changes emit_library_call_value_1 to mark the target
SYMBOL_REF as a libcall.  Then, the emission of the external
declaration is done in the first loop of final.cc:shorten_branches.
This happens only if the corresponding sequence has been kept.

Regtested in x86_64-linux-gnu.
Tested with host x86_64-linux-gnu with target bpf-unknown-none.


I'm not sure that shorten_branches is a natural place to do this.
It isn't something that would normally emit asm text.

Would it be OK to emit the declaration at the same point as for decls,
which IIUC is process_pending_assemble_externals?  If so, how about
making assemble_external_libcall add the symbol to a list when
!SYMBOL_REF_USED, instead of calling targetm.asm_out.external_libcall
directly?  assemble_external_libcall could then also call get_identifier
on the name (perhaps after calling strip_name_encoding -- can't
remember whether assemble_external_libcall sees the encoded or
unencoded name).

All being well, the call to get_identifier should cause
assemble_name_resolve to record when the name is used, via
TREE_SYMBOL_REFERENCED.  Then process_pending_assemble_externals could
go through the list of libcalls recorded by assemble_external_libcall
and check whether TREE_SYMBOL_REFERENCED is set on the get_identifier.

Not super elegant, but it seems to fit within the existing scheme.
And I don't there should be any problem with using get_identifier
for libcalls, since it isn't valid to use libcall names for other
types of symbol.
It might be worth looking at the PA port.  There's a fair number of 
similarities.  Essentially it has to "import" every external symbol, 
including libcall symbols.   A symbol which has been exported, can not 
then also be imported as that can change the underlying type.


IIRC we implemented all the magic to handle this inside 
ASM_OUTPUT_EXTERNAL and ASM_OUTPUT_EXTERNAL_LIBCALL with a bit of magic 
to handle "plabel" symbols which we have to defer until the end of the 
compilation unit via pa_end_file or whatever it's called since some 
cases of symbol marking couldn't be determined until we were actually 
emitting code.


jeff



Re: [PATCH] RISC-V: Use stdint-gcc.h in rvv testsuite

2023-10-05 Thread Jeff Law




On 10/3/23 14:58, Patrick O'Neill wrote:

On 10/2/23 06:57, Kito Cheng wrote:


On Tue, Sep 26, 2023 at 10:59 AM Patrick O'Neill  wrote:

stdint.h can be replaced with stdint-gcc.h to resolve some missing
system headers in non-multilib installations.

Tested using glibc rv32gcv and rv64gcv on r14-4258-gc9837443075.

gcc/ChangeLog:

  * config/riscv/riscv_vector.h (__RISCV_VECTOR_H): Replace
  stdint.h with stdint-gcc.h

I don't think this will work when testing an installed compiler which I do.

Thanks,
Andrew

In the riscv target testsuite (gcc.target/riscv) all occurrences of
#include  are currently constrained to the rvv/ subdirectory.
All non-vector tests use #include  rather than
#include . Have you encountered any issues when testing
installations with non-vector tests?

I think the concern is to replace stdint.h with stdint-gcc.h for riscv_vector.h,
that means users MAY include stdint-gcc.h *AND* stdint.h, stdint.h the later
one generally is provided by libc, and stdint-gcc.h typically are not included.

Other than the changes in "riscv_vector.h", everything else looks fine to me.


Ah okay, I'll retest and send a v2 that omits the riscv_vector.h change. 
Thanks.  The general consensus Tuesday was for this patch to go forward, 
even though we've got additional issues in this space that need to be 
addressed.


jeff


Re: [PATCH v6] Implement new RTL optimizations pass: fold-mem-offsets.

2023-10-06 Thread Jeff Law




On 10/6/23 08:17, Manolis Tsamis wrote:
SNIP

So I was ready to ACK, but realized there weren't any testresults for a
primary platform mentioned.  So I ran this on x86.

It's triggering one regression (code quality).

Specifically gcc.target/i386/pr52146.c

The f-m-o code is slightly worse than without f-m-o.

Without f-m-o we get this:

 9  B88000E0  movl$-18874240, %eax
 9  FE
10 0005 67C7  movl$0, (%eax)
10  00
11 000c C3ret

With f-m-o we get this:

 9  B800  movl$0, %eax
 9  00
10 0005 67C78080  movl$0, -18874240(%eax)
10  00E0FE00
10  00
11 0010 C3ret


The key being that we don't get rid of the original move instruction,
nor does the original move instruction get smaller due to simplification
of its constant.  Additionally, the memory store gets larger.  The net
is a 4 byte increase in code size.


Yes, this case is not good for f-m-o. In theory there could be a cost
calculation step that tries to estimate the benefit of a
transformation, but given that f-m-o cannot transform code in a way
that we have big regressions it's unclear to me whether complicating
the code is worth it. At least if we can solve the issues in other
ways (also see discussion below).



This is probably a fairly rare scenario and the original bug report was
for a correctness issue in using addresses in the range
0x8000..0x in x32.  So I wouldn't lose any sleep if we
adjusted the test to pass -fno-fold-mem-offsets.  But before doing that
I wanted to give you the chance to ponder if this is something you'd
prefer to improve in f-m-o itself.   At some level if the base register
collapses down to 0, then we could take the offset as a constant address
and try to recognize that form.  If that fails, then just consider the
change unprofitable rather than trying to recognize it as reg+d.

Anyway, waiting to hear your thoughts...


Yes, this testcase has been bugging me too, I have brought that up in
previous iterations as well.

I must have missed that in the earlier discussion.


I'm also not sure whether this is a code quality or a correctness
issue? From what I understand from the relevant ticket, if we emit
movl $0, -18874240 then it's wrong code?
It's a code quality issue as long as we don't transform the code into 
movl $0, -18874240, at which point it would become a correctness issue.





With regards to the "recognize that the base register is 0", that
would be nice but how would we recognise that? f-m-o can only
calculate the folded offset but that is not enough to prove that the
base register is zero or not.
It's a chain of insns that produce an address and use it in the memory 
reference.  We essentially changed the first insn in the chain from movl 
-18874240, %eax into movl 0, %eax.  So we'd have to someone note that 
the base register in the memory reference has the value zero in the 
chain of instructions.  That may or may not be reasonable to do.




One thought that I've had is that this started being an issue on x86
when I enabled folding of mv REG, INT in addition to the existing ADD
REG, REG, INT. The idea was that a move will be folded to mv REG, 0
and on targets that we have a zero register that can be beneficial for
a number of reasons... but on x86 we don't have a zero register so the
benefit is much more limited anyway. So maybe we could disable folding
of moves on targets that don't have a zero register? That would solve
the issue and I believe it also makes sense in general. If so, is
there a way to query wether the target has such register?
We don't have a generalized query to do that.  You might be able to ask 
what the cost to load 0 into a register is, but many targets 
artificially decrease that value.


You could use the costing model to cost the entire sequence 
before/after.  There's an interface to walk a sequence and return a 
cost.  In the case of f-m-o the insns are part of the larger chain, so 
we'd need a different API.


The other option would be to declare this is known, but not important 
issue.  I would think that it's going to be rare to have absolute 
addresses and x32 isn't really used much.  The combination of the two 
should be exceedingly rare.  Hence my willingness to use 
-fno-fold-mem-offsets in the test.


Jeff


Re: [PATCH v2] RISC-V: const: hide mvconst splitter from IRA

2023-10-06 Thread Jeff Law




On 10/6/23 11:49, Vineet Gupta wrote:

Vlad recently introduced a new gate @ira_in_progress, similar to
counterparts @{reload,lra}_in_progress.

Use this to hide the constant synthesis splitter from being recog* ()
by IRA register equivalence logic which is eager to undo the splits,
generating worse code for constants (and sometimes no code at all).

See PR/109279 (large constant), PR/110748 (const -0.0) ...

Granted the IRA logic is subsided with -fsched-pressure which is now
enabled for RISC-V backend, the gate makes this future-proof in
addition to helping with -O1 etc.

This fixes 1 addition test

= Summary of gcc testsuite =
 | # of unexpected case / # of unique unexpected 
case
 |  gcc |  g++ | gfortran |

rv32imac/  ilp32/ medlow |  416 /   103 |   13 / 6 |   67 /12 |
  rv32imafdc/ ilp32d/ medlow |  416 /   103 |   13 / 6 |   24 / 4 |
rv64imac/   lp64/ medlow |  417 /   104 |9 / 3 |   67 /12 |
  rv64imafdc/  lp64d/ medlow |  416 /   103 |5 / 2 |6 / 1 |

Also similar to v1, this doesn't move RISC-V SPEC scores at all.

gcc/ChangeLog:
* config/riscv/riscv.md (mvconst_internal): Add !ira_in_progress.

OK
jeff


Re: [PATCH v1] RISC-V: Bugfix for legitimize address PR/111634

2023-10-06 Thread Jeff Law




On 10/6/23 22:49, pan2...@intel.com wrote:

From: Pan Li 

Given we have RTL as below.

(plus:DI (mult:DI (reg:DI 138 [ g.4_6 ])
   (const_int 8 [0x8]))
  (lo_sum:DI (reg:DI 167)
 (symbol_ref:DI ("f") [flags 0x86] )
))

When handling (plus (plus (mult (a) (mem_shadd_constant)) (fp)) (C)) case,
the fp will be the lo_sum operand as above. We have assumption that the fp
is reg but actually not here. It will have ICE when building with option
--enable-checking=rtl.

This patch would like to fix it by adding the REG_P to ensure the operand
is a register. The test case gcc/testsuite/gcc.dg/pr109417.c covered this
fix when build with --enable-checking=rtl.

PR target/111634

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_legitimize_address): Bugfix.

OK, though the ChangeLog entry could be better.  Perhaps

* config/riscv/riscv.cc (riscv_legitimize_address): Ensure
object is a REG before extracting its register number.


Jeff


Re: [PATCH] TEST: Fix XPASS of TSVC testsuites for RVV

2023-10-07 Thread Jeff Law




On 10/7/23 03:23, Juzhe-Zhong wrote:

Fix these following XPASS FAILs of TSVC for RVV:

XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1161.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1161.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s124.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s124.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1279.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1279.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s161.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s161.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s253.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s253.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s271.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s271.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2711.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2711.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2712.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2712.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s272.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s272.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s273.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s273.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s274.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s274.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s276.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s276.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s3111.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s3111.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s353.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s353.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s441.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s441.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s443.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s443.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-vif.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-vif.c scan-tree-dump vect "vectorized 1 loops"

gcc/testsuite/ChangeLog:

* gcc.dg/vect/tsvc/vect-tsvc-s1115.c: Fix TSVC XPASS.
* gcc.dg/vect/tsvc/vect-tsvc-s114.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s1161.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s1232.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s124.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s1279.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s161.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s253.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s257.c: Di

Re: [PATCH] RISC-V: Enable more tests of "vect" for RVV

2023-10-07 Thread Jeff Law




On 10/7/23 01:04, Juzhe-Zhong wrote:

This patch enables almost full coverage vectorization tests for RVV, except 
these
following tests (not enabled yet):

1. Will enable soon:

check_effective_target_vect_call_lrint
check_effective_target_vect_call_btrunc
check_effective_target_vect_call_btruncf
check_effective_target_vect_call_ceil
check_effective_target_vect_call_ceilf
check_effective_target_vect_call_floor
check_effective_target_vect_call_floorf
check_effective_target_vect_call_lceil
check_effective_target_vect_call_lfloor
check_effective_target_vect_call_nearbyint
check_effective_target_vect_call_nearbyintf
check_effective_target_vect_call_round
check_effective_target_vect_call_roundf

2. Not sure we will need to enable or not:

check_effective_target_vect_complex_*
check_effective_target_vect_simd_clones
check_effective_target_vect_bswap
check_effective_target_vect_widen_shift
check_effective_target_vect_widen_mult_*
check_effective_target_vect_widen_sum_*
check_effective_target_vect_unpack
check_effective_target_vect_interleave
check_effective_target_vect_extract_even_odd
check_effective_target_vect_pack_trunc
check_effective_target_vect_check_ptrs
check_effective_target_vect_sdiv_pow2_si
check_effective_target_vect_usad_*
check_effective_target_vect_udot_*
check_effective_target_vect_sdot_*
check_effective_target_vect_gather_load_ifn

After this patch, we will have these following additional FAILs:
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1161.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1161.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s124.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s124.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1279.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1279.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s161.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s161.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s253.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s253.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s271.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s271.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2711.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2711.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2712.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2712.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s272.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s272.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s273.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s273.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s274.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s274.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s276.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s276.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c scan-tree-dump vect "vectorized 1 
loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s3111.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
XPASS

Re: [PATCH] TEST: Fix vect_cond_arith_* dump checks for RVV

2023-10-07 Thread Jeff Law




On 10/7/23 05:45, Juzhe-Zhong wrote:

This patch fixes the following dumple FAILs:
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects  scan-tree-dump optimized 
" = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects  scan-tree-dump vect " 
= \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump vect " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump optimized 
" = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump optimized 
" = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump optimized 
" = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump optimized 
" = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump optimized 
" = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump optimized 
" = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump optimized 
" = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump optimized 
" = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  scan-tree-dump-times 
optimized " = \\.COND_ADD" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  scan-tree-dump-times 
optimized " = \\.COND_MUL" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  scan-tree-dump-times 
optimized " = \\.COND_RDIV" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  scan-tree-dump-times 
optimized " = \\.COND_SUB" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = 
\\.COND_ADD" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = 
\\.COND_MUL" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = 
\\.COND_RDIV" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = 
\\.COND_SUB" 1

For RVV, the expected dumple IR is COND_LEN_* pattern.

Also, we are still failing at this check:

FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = 
\\.COND_LEN_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects  scan-tree-dump optimized 
" = \\.COND_LEN_SUB"

Since we have a known bug in GIMPLE_FOLD that Robin is working on it.

@Robin: Plz make sure vect-cond-arith-2.c passes with this patch and your bug 
fix patch.

Ok for trunk ?

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-cond-arith-2.c: Fix dump check for RVV.
* gcc.dg/vect/vect-cond-arith-4.c: Ditto.
* gcc.dg/vect/vect-cond-arith-5.c: Ditto.
* gcc.dg/vect/vect-cond-arith-6.c: Ditto.
Would it make more sense to adjust the regexp so that it matched the 
standard form as well as the LEN form?  So for example we could have a 
regexp that matched COND_ADD and COND_LEN_ADD.


Just wondering if that'll be better from a long term maintenance standpoint.

Jeff


Re: [PATCH] RISC-V: add static-pie support

2023-10-07 Thread Jeff Law




On 10/7/23 05:32, yanzhang.w...@intel.com wrote:

From: Yanzhang Wang 

We only need to pass options to the linker when static-pie is passed.
There's another patch to enable static-pie in glibc. And we need to
enable in GCC first.

gcc/ChangeLog:

* config/riscv/linux.h: Pass the static-pie specific options to
  the linker.

OK.
jeff


Re: [PATCH] Support g++ 4.8 as a host compiler.

2023-10-07 Thread Jeff Law




On 10/4/23 16:19, Roger Sayle wrote:


The recent patch to remove poly_int_pod triggers a bug in g++ 4.8.5's
C++ 11 support which mistakenly believes poly_uint16 has a non-trivial
constructor.  This in turn prohibits it from being used as a member in
a union (rtxunion) that constructed statically, resulting in a (fatal)
error during stage 1.  A workaround is to add an explicit constructor
to the problematic union, which allows mainline to be bootstrapped with
the system compiler on older RedHat 7 systems.

This patch has been tested on x86_64-pc-linux-gnu where it allows a
bootstrap to complete when using g++ 4.8.5 as the host compiler.
Ok for mainline?


2023-10-04  Roger Sayle  

gcc/ChangeLog
* rtl.h (rtx_def::u): Add explicit constructor to workaround
issue using g++ 4.8 as a host compiler.
I think the bigger question is whether or not we're going to step 
forward on the minimum build requirements.


My recollection was we settled on gcc-4.8 for the benefit of RHEL 7 and 
Centos 7 which are rapidly approaching EOL (June 2024).


I would certainly support stepping forward to a more modern compiler for 
the build requirements, which might make this patch obsolete.


Jeff


Re: [PATCH] Support g++ 4.8 as a host compiler.

2023-10-07 Thread Jeff Law




On 10/7/23 15:30, Sam James wrote:


Jeff Law  writes:


On 10/4/23 16:19, Roger Sayle wrote:

The recent patch to remove poly_int_pod triggers a bug in g++
4.8.5's
C++ 11 support which mistakenly believes poly_uint16 has a non-trivial
constructor.  This in turn prohibits it from being used as a member in
a union (rtxunion) that constructed statically, resulting in a (fatal)
error during stage 1.  A workaround is to add an explicit constructor
to the problematic union, which allows mainline to be bootstrapped with
the system compiler on older RedHat 7 systems.
This patch has been tested on x86_64-pc-linux-gnu where it allows a
bootstrap to complete when using g++ 4.8.5 as the host compiler.
Ok for mainline?
2023-10-04  Roger Sayle  
gcc/ChangeLog
* rtl.h (rtx_def::u): Add explicit constructor to workaround
issue using g++ 4.8 as a host compiler.

I think the bigger question is whether or not we're going to step
forward on the minimum build requirements.

My recollection was we settled on gcc-4.8 for the benefit of RHEL 7
and Centos 7 which are rapidly approaching EOL (June 2024).

I would certainly support stepping forward to a more modern compiler
for the build requirements, which might make this patch obsolete.


See also richi and jakub's comments at 
https://inbox.sourceware.org/gcc-patches/mpt5y3ppio0@arm.com/T/#m985295bedaadb47aa0b9ba63b7cb69a660a108bb.
Yea.  As Jakub notes, there's the cfarm situation, but I've had good 
success with DTS on Centos 7 systems (I have to support some of those 
internally within Ventana).  It quite literally "just works" though 
users would have to enable it.


Alternately, update the cfarm hosts?

Jeff


Re: [PATCH] TEST: Fix dump FAIL of vect-multitypes-16.c for RVV

2023-10-08 Thread Jeff Law




On 10/8/23 05:35, Juzhe-Zhong wrote:

RVV (RISC-V Vector) doesn't enable vect_unpack, but we still vectorize this 
case well.
So, adjust dump check for RVV.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-multitypes-16.c: Fix dump FAIL of RVV.
I'd hoped to avoid a bunch of risc-v special casing in the generic part 
of the testsuite.  Basically the more we have target specific 
conditionals rather than conditionals using properties, the more likely 
we are to keep revisiting this stuff over time and possibly for other 
architectures as well.


What is it about risc-v's vector support that allows it to optimize this 
case?  Is it the same property that allows us to handle the outer loop 
vectorization tests that you changed in another patch?


Neither an ACK nor NAK right now.

Jeff


Re: [PATCH] TEST: Fix vect_cond_arith_* dump checks for RVV

2023-10-08 Thread Jeff Law




On 10/7/23 16:02, 钟居哲 wrote:

Do you mean change it like this ?

/* { dg-final { scan-tree-dump-times { = \.COND_L?E?N?_?RDIV} 1 "optimized" { 
target vect_double_cond_arith } } } */

I was thinking something more like
COND(_LEN)?_ADD

The idea being we match _LEN conditionally as a group.

jeff


Re: [PATCH] RISC-V: THead: Fix missing CFI directives for th.sdd in prologue.

2023-10-09 Thread Jeff Law




On 10/4/23 01:49, Xianmiao Qu wrote:

From: quxm 

When generating CFI directives for the store-pair instruction,
if we add two parallel REG_FRAME_RELATED_EXPR expr_lists like
   (expr_list:REG_FRAME_RELATED_EXPR (set (mem/c:DI (plus:DI (reg/f:DI 2 sp)
 (const_int 8 [0x8])) [1  S8 A64])
 (reg:DI 1 ra))
   (expr_list:REG_FRAME_RELATED_EXPR (set (mem/c:DI (reg/f:DI 2 sp) [1  S8 A64])
 (reg:DI 8 s0))
only the first expr_list will be recognized by dwarf2out_frame_debug
funciton. So, here we generate a SEQUENCE expression of REG_FRAME_RELATED_EXPR,
which includes two sub-expressions of RTX_FRAME_RELATED_P. Then the
dwarf2out_frame_debug_expr function will iterate through all the sub-expressions
and generate the corresponding CFI directives.

gcc/
* config/riscv/thead.cc (th_mempair_save_regs): Fix missing CFI
directives for store-pair instruction.

gcc/testsuite/
* gcc.target/riscv/xtheadmempair-4.c: New test.

THanks.  I pushed this to the trunk.
jeff



Re: [PATCH] RISC-V Regression test: Adapt SLP tests like ARM SVE

2023-10-09 Thread Jeff Law




On 10/9/23 07:37, Juzhe-Zhong wrote:

Like ARM SVE, RVV is vectorizing these 2 cases in the same way.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-23.c: Add RVV like ARM SVE.
* gcc.dg/vect/slp-perm-10.c: Ditto.

OK
jeff


Re: [PATCH] RISC-V Regression test: Fix slp-perm-4.c FAIL for RVV

2023-10-09 Thread Jeff Law




On 10/9/23 07:39, Juzhe-Zhong wrote:

RVV vectorize it with stride5 load_lanes.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-perm-4.c: Adapt test for stride5 load_lanes.

OK.

As a follow-up, would it make sense to test the .vect dump for something 
else in the ! {vec_load_lanes && vect_strided5 } case to verify that it 
does and continues to be vectorized for that configuration?


jeff


Re: [PATCH] RISC-V Regression test: Fix FAIL of slp-reduc-4.c for RVV

2023-10-09 Thread Jeff Law




On 10/9/23 07:41, Juzhe-Zhong wrote:

RVV vectortizes this case with stride8 load_lanes.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-reduc-4.c: Adapt test for stride8 load_lanes.
OK.  Similar question as my last ack.  Do we want a follow-up here which 
tests the .vect dump for the ! { vect_load_lanes && vec_strided8 } case?


jeff


Re: [PATCH] RISC-V Regression test: Fix FAIL of slp-12a.c

2023-10-09 Thread Jeff Law




On 10/9/23 07:35, Juzhe-Zhong wrote:

This case is vectorized by stride8 load_lanes.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-12a.c: Adapt for stride 8 load_lanes.

OK.  Same question as last two ACKs.

jeff


Re: [PATCH] RISC-V Regression tests: Fix FAIL of pr97832* for RVV

2023-10-09 Thread Jeff Law




On 10/9/23 07:15, Juzhe-Zhong wrote:

These cases are vectorized by vec_load_lanes with strided = 8 instead of SLP
with -fno-vect-cost-model.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/pr97832-2.c: Adapt dump check for target supports 
load_lanes with stride = 8.
* gcc.dg/vect/pr97832-3.c: Ditto.
* gcc.dg/vect/pr97832-4.c: Ditto.

OK.  Same question as last 3 acks.

jeff


Re: [PATCH v1 1/4] options: Define TARGET__P and TARGET__OPTS_P macro for Mask and InverseMask

2023-10-09 Thread Jeff Law




On 10/3/23 03:09, Kito Cheng wrote:

We TARGET__P marcro to test a Mask and InverseMask with user
specified target_variable, however we may want to test with specific
gcc_options variable rather than target_variable.

Like RISC-V has defined lots of Mask with TargetVariable, which is not
easy to use, because that means we need to known which Mask are associate with
which TargetVariable, so take a gcc_options variable is a better interface
for such use case.

gcc/ChangeLog:

* doc/options.texi (Mask): Document TARGET__P and
TARGET__OPTS_P.
(InverseMask): Ditto.
* opth-gen.awk (Mask): Generate TARGET__P and
TARGET__OPTS_P macro.
(InverseMask): Ditto.
Doesn't this need to be updated to avoid multi-dimensional arrays in awk 
and rebased?


Jeff


Re: [PATCH v1 2/4] RISC-V: Refactor riscv_option_override and riscv_convert_vector_bits. [NFC]

2023-10-09 Thread Jeff Law




On 10/3/23 03:09, Kito Cheng wrote:

Allow those funciton apply from a local gcc_options rather than the
global options.

Preparatory for target attribute, sperate this change for eaiser reivew
since it's a NFC.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_convert_vector_bits): Get setting
from argument rather than get setting from global setting.
(riscv_override_options_internal): New, splited from
riscv_override_options, also take a gcc_options argument.
(riscv_option_override): Splited most part to
riscv_override_options_internal.

OK once prerequisites are approved and installed.

jeff


Re: [PATCH] RISC-V Regression test: Fix slp-perm-4.c FAIL for RVV

2023-10-09 Thread Jeff Law




On 10/9/23 08:21, juzhe.zhong wrote:

Do you mean add a check whether it is vectorized or not?

Yes.



Sounds reasonable, I can add that in another patch.

Sounds good.  Thanks.

jeff


Re: [RFC 1/2] RISC-V: Add support for _Bfloat16.

2023-10-09 Thread Jeff Law




On 10/9/23 00:18, Jin Ma wrote:


+;; The conversion of DF to BF needs to be done with SF if there is a
+;; chance to generate at least one instruction, otherwise just using
+;; libfunc __truncdfbf2.
+(define_expand "truncdfbf2"
+  [(set (match_operand:BF 0 "register_operand" "=f")
+   (float_truncate:BF
+   (match_operand:DF 1 "register_operand" " f")))]
+  "TARGET_DOUBLE_FLOAT || TARGET_ZDINX"
+  {
+convert_move (operands[0],
+ convert_modes (SFmode, DFmode, operands[1], 0), 0);
+DONE;
+  })

So for conversions to/from BFmode, doesn't generic code take care of
this for us?  Search for convert_mode_scalar in expr.cc. That code will
utilize SFmode as an intermediate step just like your expander.   Is
there some reason that generic code is insufficient?

Similarly for the the other conversions.


As far as I can see, the function 'convert_mode_scalar' doesn't seem to be 
perfect for
dealing with the conversions to/from BFmode. It can only handle BF to HF, SF, 
DF and
SF to BF well, but the rest of the conversion without any processing, directly 
using
the libcall.

Maybe I should choose to enhance its functionality? This seems to be a
good choice, I'm not sure.My recollection was that BF could be converted to/from SF trivially and 

if we wanted BF->DF we'd first convert to SF, then to DF.

Direct BF<->DF conversions aren't actually important from a performance 
standpoint.  So it's OK if they have an extra step IMHO.


jeff


Re: xthead regression with [COMMITTED] RISC-V: const: hide mvconst splitter from IRA

2023-10-09 Thread Jeff Law




On 10/9/23 14:36, Vineet Gupta wrote:

Hi Christoph,

On 10/9/23 12:06, Patrick O'Neill wrote:


Hi Vineet,

We're seeing a regression on all riscv targets after this patch:|

FAIL: gcc.target/riscv/xtheadcondmov-indirect.c -O2 
check-function-bodies ConNmv_imm_imm_reg||
FAIL: gcc.target/riscv/xtheadcondmov-indirect.c -O3 -g 
check-function-bodies ConNmv_imm_imm_reg


Debug log output:
body: \taddi    a[0-9]+,a[0-9]+,-1000+
\tli    a[0-9]+,9998336+
\taddi    a[0-9]+,a[0-9]+,1664+
\tth.mveqz    a[0-9]+,a[0-9]+,a[0-9]+
\tret

against:     li    a5,9998336
    addi    a4,a0,-1000
    addi    a0,a5,1664
    th.mveqz    a0,a1,a4
    ret|

https://github.com/patrick-rivos/gcc-postcommit-ci/issues/8
https://github.com/ewlu/riscv-gnu-toolchain/issues/286



It seems with my patch, exactly same instructions get out of order (for 
-O2/-O3) tripping up the test results and differ from say O1 for exact 
same build.


-O2 w/ patch
ConNmv_imm_imm_reg:
     li    a5,9998336
     addi    a4,a0,-1000
     addi    a0,a5,1664
     th.mveqz    a0,a1,a4
     ret

-O1 w/ patch
ConNmv_imm_imm_reg:
     addi    a4,a0,-1000
     li    a5,9998336
     addi    a0,a5,1664
     th.mveqz    a0,a1,a4
     ret

I'm not sure if there is an easy way to handle that.
Is there a real reason for testing the full sequences verbatim, or is 
testing number of occurrences of th.mv{eqz,nez} enough.
It seems Jeff recently added -fno-sched-pressure to avoid similar issues 
but that apparently is no longer sufficient.

I'd suggest doing a count test rather than an exact match.

Verify you get a single li, two addis and one th.mveqz

Jeff


Re: [PATCH] RISC-V Regression: Fix FAIL of pr65947-8.c for RVV

2023-10-10 Thread Jeff Law




On 10/10/23 06:55, Juzhe-Zhong wrote:

This test is testing fold_extract_last pattern so it's more reasonable use
vect_fold_extract_last instead of specifying targets.

This is the vect_fold_extract_last property:
proc check_effective_target_vect_fold_extract_last { } {
 return [expr { [check_effective_target_aarch64_sve]
   || [istarget amdgcn*-*-*]
   || [check_effective_target_riscv_v] }]
}

include ARM SVE/GCN/RVV.

It perfectly matches what we want and more reasonable, better maintainment.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/pr65947-8.c: Use vect_fold_extract_last.

OK
jeff


Re: [PATCH] RISC-V/testsuite: Enable `vect_pack_trunc'

2023-10-10 Thread Jeff Law




On 10/9/23 19:13, juzhe.zh...@rivai.ai wrote:

Oh. I realize this patch increase FAIL that I recently fixed:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632247.html 



This fail because RVV doesn't have vec_pack_trunc_optab (Loop vectorizer 
will failed at first time but succeed at 2nd time),
then RVV will dump 4 times FOLD_EXTRACT_LAST instead of 2  (ARM SVE 2 
times because they have vec_pack_trunc_optab).


I think the root cause of RVV failing at multiple tests of "vect" is 
that we don't enable vec_pack/vec_unpack/... stuff,

we still succeed at vectorizations and we want to enable tests of them
(Mostly just using different approach to vectorize it (cause dump FAIL) 
because of some changing I have done previously in the middle-end).


So enabling "vec_pack" for RVV will fix some FAILs but increase some 
other FAILs.


CC to Richi to see more reasonable suggestions.
So what is the summary on Maciej's patch to enable vec_pack_trunc?  ie, 
is it something we should move forward with as-is, is it superceded by 
your work in this space or does it need further investigation because of 
differences in testing methodologies or something else?


jeff


Re: [PATCH] RISC-V Regression: Fix dump check of bb-slp-68.c

2023-10-10 Thread Jeff Law




On 10/9/23 19:16, Juzhe-Zhong wrote:

Like GCN, RVV also has 64 bytes vectors (512 bits) which cause FAIL in this 
test.

It's more reasonable to use "vect512" instead of AMDGCN.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/bb-slp-68.c: Use vect512.
Just a note for the record.  At this time the only target obviously 
advertising 512 bit vectors in available_vector_sizes is amdgcn -- 
avx512 doesn't signal 512 bit vectors right now.  SVE might, but it's a 
bit hard to tell easily and I don't have a cross handy.  I'd hazard a 
guess it would via -msve-vector-bits.


Anyway, OK for the trunk.  Just keep an eye out for reports of any 
issues with this test on aarch64.


Thanks,
Jeff



Re: [PATCH] RISC-V Regression: Make match patterns more accurate

2023-10-10 Thread Jeff Law




On 10/9/23 20:47, Juzhe-Zhong wrote:

This patch fixes following 2 FAILs in RVV regression since the check is not 
accurate.

It's inspired by Robin's previous patch:
https://patchwork.sourceware.org/project/gcc/patch/dde89b9e-49a0-d70b-0906-fb3022cac...@gmail.com/

gcc/testsuite/ChangeLog:

* gcc.dg/vect/no-scevccp-outer-7.c: Adjust regex pattern.
* gcc.dg/vect/no-scevccp-vect-iv-3.c: Ditto.
OK.   We might see other ports flipping to a pass if they were 
exhibiting the same behavior with failing to vectorize with the first 
selected type, but passing on the second type.


Jeff


Re: [PATCH] RISC-V Regression: Fix FAIL of predcom-2.c

2023-10-10 Thread Jeff Law




On 10/9/23 20:58, Juzhe-Zhong wrote:

Like GCN, add -fno-tree-vectorize.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/predcom-2.c: Add riscv.

OK.
jeff


Re: [PATCH v2 1/4] options: Define TARGET__P and TARGET__OPTS_P macro for Mask and InverseMask

2023-10-10 Thread Jeff Law




On 10/9/23 22:13, Kito Cheng wrote:

We TARGET__P marcro to test a Mask and InverseMask with user
specified target_variable, however we may want to test with specific
gcc_options variable rather than target_variable.

Like RISC-V has defined lots of Mask with TargetVariable, which is not
easy to use, because that means we need to known which Mask are associate with
which TargetVariable, so take a gcc_options variable is a better interface
for such use case.

gcc/ChangeLog:

* doc/options.texi (Mask): Document TARGET__P and
TARGET__OPTS_P.
(InverseMask): Ditto.
* opth-gen.awk (Mask): Generate TARGET__P and
TARGET__OPTS_P macro.
(InverseMask): Ditto.

OK assuming it passes a build cycle on x86 or some other common target.

jeff


Re: [PATCH v2 2/4] RISC-V: Refactor riscv_option_override and riscv_convert_vector_bits. [NFC]

2023-10-10 Thread Jeff Law




On 10/9/23 22:13, Kito Cheng wrote:

Allow those funciton apply from a local gcc_options rather than the
global options.

Preparatory for target attribute, sperate this change for eaiser reivew
since it's a NFC.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_convert_vector_bits): Get setting
from argument rather than get setting from global setting.
(riscv_override_options_internal): New, splited from
riscv_override_options, also take a gcc_options argument.
(riscv_option_override): Splited most part to
riscv_override_options_internal.

OK.
jeff


Re: [PATCH v2 3/4] RISC-V: Extend riscv_subset_list, preparatory for target attribute support

2023-10-10 Thread Jeff Law




On 10/9/23 22:13, Kito Cheng wrote:

riscv_subset_list only accept a full arch string before, but we need to
parse single extension when supporting target attribute, also we may set
a riscv_subset_list directly rather than re-parsing the ISA string
again.

gcc/ChangeLog:

* config/riscv/riscv-subset.h (riscv_subset_list::parse_single_std_ext):
New.
(riscv_subset_list::parse_single_multiletter_ext): Ditto.
(riscv_subset_list::clone): Ditto.
(riscv_subset_list::parse_single_ext): Ditto.
(riscv_subset_list::set_loc): Ditto.
(riscv_set_arch_by_subset_list): Ditto.
* common/config/riscv/riscv-common.cc
(riscv_subset_list::parse_single_std_ext): New.
(riscv_subset_list::parse_single_multiletter_ext): Ditto.
(riscv_subset_list::clone): Ditto.
(riscv_subset_list::parse_single_ext): Ditto.
(riscv_subset_list::set_loc): Ditto.
(riscv_set_arch_by_subset_list): Ditto.
---
  gcc/common/config/riscv/riscv-common.cc | 203 
  gcc/config/riscv/riscv-subset.h |  11 ++
  2 files changed, 214 insertions(+)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 9a0a68fe5db..25630d5923e 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -1138,6 +1173,102 @@ riscv_subset_list::handle_combine_ext ()
  }
  }
  
+/* Parsing function for multi-letter extensions.

+
+   Return Value:
+ Points to the end of extensions.
+
+   Arguments:
+ `p`: Current parsing position.
+ `ext_type`: What kind of extensions, 's', 'z' or 'x'.
+ `ext_type_str`: Full name for kind of extension.  */
+
+
+const char *
+riscv_subset_list::parse_single_multiletter_ext (const char *p,
+const char *ext_type,
+const char *ext_type_str)

[ ... ]





+
+  if (end_of_version == NULL)
+return NULL;

I think when we hit this path we leak SUBSET.




  
  std::string

@@ -1498,6 +1673,34 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
{NULL, NULL, 0}
  };
  
+void

+riscv_set_arch_by_subset_list (riscv_subset_list *subset_list,
+  struct gcc_options *opts)

Needs a function comment.

OK with those two minor issues fixed.

jeff


Re: [PATCH] RISC-V/testsuite: Enable `vect_pack_trunc'

2023-10-10 Thread Jeff Law




On 10/10/23 07:53, juzhe.zhong wrote:
I am working on it. Currently,  we have about 50+ additional FAILs after 
enabling vectorization.


some of them need fixed on middle-end. E.g richard fixed a missed cse 
optimization.


Some need fix on test case.

I am analyzing each fail one by one.

I prefer postpone this patch since it will cause some additional fails 
and I will handle that eventually after full coverage analysis.
OK.  We can defer.  I just want to make sure we're all know where things 
stand.


jeff


Re: [PATCH v6] Implement new RTL optimizations pass: fold-mem-offsets.

2023-10-10 Thread Jeff Law




On 10/10/23 05:59, Manolis Tsamis wrote:


It's a code quality issue as long as we don't transform the code into
movl $0, -18874240, at which point it would become a correctness issue.


Ok, thanks for pointing that out as I thought that movl $0, -18874240
and movl $0, -18874240(eax) with eax == 0 were the same.
It's a quirk of x32 related to sign extension of the address.  Certainly 
not obvious!  But it's better than the PA where:


(mem (plus (reg A) (reg B)))

Is semantically different than

(mem (plus (reg B) (reg A)))


Due to the implicit segment register selection that happens on the high 
bits of the base register rather than the high bits of the effective 
address.









With regards to the "recognize that the base register is 0", that
would be nice but how would we recognise that? f-m-o can only
calculate the folded offset but that is not enough to prove that the
base register is zero or not.

It's a chain of insns that produce an address and use it in the memory
reference.  We essentially changed the first insn in the chain from movl
-18874240, %eax into movl 0, %eax.  So we'd have to someone note that
the base register in the memory reference has the value zero in the
chain of instructions.  That may or may not be reasonable to do.


Ok, do you believe it would be worthwhile to add a REG_EQ note or
something similar? I assume some REG_EQ notes will be added from the
following cprop-hardreg pass too?
I think it really depends on the degree of difficulty and whether or not 
we think there are other cases like this one lurking.


Given this scenario can only happen with an absolute address, I would 
probably explore if there's a way to trivially detect this case, but I 
wouldn't spend a lot of time on it.


And I wasn't necessarily thinking of a note in the RTL, just a bit of 
state within f-m-o when updating an arithmetic chain so that we could 
determine that the final instruction was a memory reference to an 
absolute address.








You could use the costing model to cost the entire sequence
before/after.  There's an interface to walk a sequence and return a
cost.  In the case of f-m-o the insns are part of the larger chain, so
we'd need a different API.

The other option would be to declare this is known, but not important
issue.  I would think that it's going to be rare to have absolute
addresses and x32 isn't really used much.  The combination of the two
should be exceedingly rare.  Hence my willingness to use
-fno-fold-mem-offsets in the test.


Yes, I now better understand why you suggested adding
-fno-fold-mem-offsets to the testcase. Or we could also make the test
not fail in this case where the memory access has the base register,
as there's no correctness issue.

:)




Then, going back to the code quality regression, wouldn't things be
much better if we would emit xor eax, eax instead of mov eax, 0? In
that case xor eax, eax should be 2 bytes instead of 5 for mov eax, 0
and hence the code size difference should be trivial. Then we can keep
f-m-o as is, including this testcase.
The selection of xor vs mov imm is handled elsewhere.  On some uarchs 
the former is preferred, on others the latter.


If you think about xor %eax,%eax as an example.  Unless you special case 
that scenario in your cpu front-end, the xor will have a data dependency 
on any prior set of %eax which inhibits ILP.





Is there a way to emit a taret-specific optimized register zero insn?
If so I'll use that and adjust the testcase as needed, and I think
we're done with this one.

I can live with just adjusting the testcase.  Let's go with that.

jeff


Re: [PATCH] Optimize (ne:SI (subreg:QI (ashift:SI x 7) 0) 0) as (and:SI x 1).

2023-10-10 Thread Jeff Law




On 10/10/23 06:28, Roger Sayle wrote:


This patch is the middle-end piece of an improvement to PRs 101955 and
106245, that adds a missing simplification to the RTL optimizers.
This transformation is to simplify (char)(x << 7) != 0 as x & 1.
Technically, the cast can be any truncation, where shift is by one
less than the narrower type's precision, setting the most significant
(only) bit from the least significant bit.

This transformation applies to any target, but it's easy to see
(and add a new test case) on x86, where the following function:

int f(int a) { return (a << 31) >> 31; }

currently gets compiled with -O2 to:

foo:movl%edi, %eax
 sall$7, %eax
 sarb$7, %al
 movsbl  %al, %eax
 ret

but with this patch, we now generate the slightly simpler.

foo:movl%edi, %eax
 sall$31, %eax
 sarl$31, %eax
 ret


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures.  Ok for mainline?


2023-10-10  Roger Sayle  

gcc/ChangeLog
 PR middle-end/101955
 PR tree-optimization/106245
 * simplify-rtx.c (simplify_relational_operation_1): Simplify
 the RTL (ne:SI (subreg:QI (ashift:SI x 7) 0) 0) to (and:SI x 1).

gcc/testsuite/ChangeLog
 * gcc.target/i386/pr106245-1.c: New test case.

OK.  Thanks!  I must admit, I'm a bit surprised this wasn't already handled.

jeff


Re: [PATCH] Optimize (ne:SI (subreg:QI (ashift:SI x 7) 0) 0) as (and:SI x 1).

2023-10-10 Thread Jeff Law




On 10/10/23 08:41, Michael Matz wrote:


On Tue, 10 Oct 2023, Roger Sayle wrote:



This patch is the middle-end piece of an improvement to PRs 101955 and
106245, that adds a missing simplification to the RTL optimizers.
This transformation is to simplify (char)(x << 7) != 0 as x & 1.


Random observation:

So, why restrict to shifts of LEN-1 and mask 1?  It's always the case that
(type-of-LEN)(x << S)) != 0  ===  (x & ((1 << (LEN - S)) - 1)) != 0.

E.g. (char)(x << 5) != 0  ===  (x & 7) != 0.

Yea, it probably could be extended as a followup.



(Eventually the mask will be a constant that's too costly to compute if S
is target-dependendly too small, but all else being equal avoiding shifts
seems sensible)

Agreed, though it's nowhere near as important as it was 20+ years ago ;-)

jeff


Re: [PATCH v2 4/4] RISC-V: Implement target attribute

2023-10-10 Thread Jeff Law




On 10/9/23 22:13, Kito Cheng wrote:

The target attribute which proposed in [1], target attribute allow user
to specify a local setting per-function basis.

The syntax of target attribute is `__attribute__((target("")))`.

and the syntax of `` describes below:
```
ATTR-STRING := ATTR-STRING ';' ATTR
  | ATTR

ATTR:= ARCH-ATTR
  | CPU-ATTR
  | TUNE-ATTR

ARCH-ATTR   := 'arch=' EXTENSIONS-OR-FULLARCH

EXTENSIONS-OR-FULLARCH := 
 | 

EXTENSIONS :=  ',' 
 | 

FULLARCHSTR:= 

EXTENSION  :=   

OP := '+'

VERSION:= [0-9]+ 'p' [0-9]+
 | [1-9][0-9]*
 |

EXTENSION-NAME := Naming rule is defined in RISC-V ISA manual

CPU-ATTR:= 'cpu=' 
TUNE-ATTR   := 'tune=' 
```

[1] https://github.com/riscv-non-isa/riscv-c-api-doc/pull/35

gcc/ChangeLog:

* config.gcc (riscv): Add riscv-target-attr.o.
* config/riscv/riscv-opts.h (TARGET_MIN_VLEN_OPTS): New.
* config/riscv/riscv-protos.h (riscv_declare_function_size) New.
(riscv_option_valid_attribute_p): New.
(riscv_override_options_internal): New.
(struct riscv_tune_info): New.
(riscv_parse_tune): New.
* config/riscv/riscv-target-attr.cc
(class riscv_target_attr_parser): New.
(struct riscv_attribute_info): New.
(riscv_attributes): New.
(riscv_target_attr_parser::parse_arch):
(riscv_target_attr_parser::handle_arch):
(riscv_target_attr_parser::handle_cpu):
(riscv_target_attr_parser::handle_tune):
(riscv_target_attr_parser::update_settings):
(riscv_process_one_target_attr):
(num_occurences_in_str):
(riscv_process_target_attr):
(riscv_option_valid_attribute_p):
* config/riscv/riscv.cc: Include target-globals.h and
riscv-subset.h.
(struct riscv_tune_info): Move to riscv-protos.h.
(get_tune_str):
(riscv_parse_tune):
(riscv_declare_function_size):
(riscv_option_override): Build target_option_default_node and
target_option_current_node.
(riscv_save_restore_target_globals):
(riscv_option_restore):
(riscv_previous_fndecl):
(riscv_set_current_function): Apply the target attribute.
(TARGET_OPTION_RESTORE): Define.
(TARGET_OPTION_VALID_ATTRIBUTE_P): Ditto.
* config/riscv/riscv.h (SWITCHABLE_TARGET): Define to 1.
(ASM_DECLARE_FUNCTION_SIZE) Define.
* config/riscv/riscv.opt (mtune=): Add Save attribute.
(mcpu=): Ditto.
(mcmodel=): Ditto.
* config/riscv/t-riscv: Add build rule for riscv-target-attr.o
* doc/extend.texi: Add doc for target attribute.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/target-attr-01.c: New.
* gcc.target/riscv/target-attr-02.c: Ditto.
* gcc.target/riscv/target-attr-03.c: Ditto.
* gcc.target/riscv/target-attr-04.c: Ditto.
* gcc.target/riscv/target-attr-05.c: Ditto.
* gcc.target/riscv/target-attr-06.c: Ditto.
* gcc.target/riscv/target-attr-07.c: Ditto.
* gcc.target/riscv/target-attr-bad-01.c: Ditto.
* gcc.target/riscv/target-attr-bad-02.c: Ditto.
* gcc.target/riscv/target-attr-bad-03.c: Ditto.
* gcc.target/riscv/target-attr-bad-04.c: Ditto.
* gcc.target/riscv/target-attr-bad-05.c: Ditto.
* gcc.target/riscv/target-attr-bad-06.c: Ditto.
* gcc.target/riscv/target-attr-bad-07.c: Ditto.
* gcc.target/riscv/target-attr-warning-01.c: Ditto.
* gcc.target/riscv/target-attr-warning-02.c: Ditto.
* gcc.target/riscv/target-attr-warning-03.c: Ditto.
I probably would have split the save/restore state out as a separate 
patch, but it's not a big deal.  A bit surprised we didn't already have 
that (and thus SWITCHABLE_TARGET) enabled already.  But no worries about 
that.




And just so I can do the right thing WRT RISE, what's the state of an 
implementation for LLVM?






diff --git a/gcc/config/riscv/riscv-target-attr.cc 
b/gcc/config/riscv/riscv-target-attr.cc
new file mode 100644
index 000..33cff2c222f
--- /dev/null
+++ b/gcc/config/riscv/riscv-target-attr.cc



+  /* Parsing the extension list like "+[,+]*".  */
+  size_t len = strlen (str);
+  char *str_to_check = (char *) alloca (len + 1);
If STR is under user control, then this could be a potential security 
issue if they were to provide a crazy long string (at least until we 
implement stack-clash protection -- next year).


In general we should avoid alloca unless we know the amount of memory 
consumed is relatively small and we have a high degree of confidence the 
code isn't going to be called inside a loop (and not inlined into a loop 
in the caller).  I might even go so far as to say no programmer should 
ever use alloca.


Cl

Re: [PATCH] RISC-V Regression: Make pattern match more accurate of vect-live-2.c

2023-10-10 Thread Jeff Law




On 10/10/23 08:57, Juzhe-Zhong wrote:

Like previous patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632400.html
https://patchwork.sourceware.org/project/gcc/patch/dde89b9e-49a0-d70b-0906-fb3022cac...@gmail.com/

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-live-2.c: Make pattern match more accurate.

OK
jeff


Re: [PATCH] RISC-V Regression: Fix FAIL of vect-multitypes-16.c for RVV

2023-10-10 Thread Jeff Law




On 10/10/23 08:49, Juzhe-Zhong wrote:

As Richard suggested: 
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632288.html

Add vect_ext_char_longlong to fix FAIL for RVV.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-multitypes-16.c: Adapt check for RVV.
* lib/target-supports.exp: Add vect_ext_char_longlong property.

OK
jeff


Re: [RFC] RISC-V: Handle new types in scheduling descriptions

2023-10-10 Thread Jeff Law




On 10/9/23 15:02, Edwin Lu wrote:

Now that every insn is guaranteed a type, we want to ensure the types are
handled by the existing scheduling descriptions.

There are 2 approaches I see:
1. Create a new pipeline intended to eventually abort (sifive-7.md)
2. Add the types to an existing pipeline (generic.md)

Which approach do we want to go with? If there is a different approach we
want to take instead, please let me know as well.

Additionally, should types associated with specific extensions
(vector, crypto, etc) have specific pipelines dedicated to them?

* config/riscv/generic.md: update pipeline
* config/riscv/sifive-7.md (sifive_7): update pipeline
(sifive_7_other):
I'd largely expect that we look at an unhandled type and first look to 
see if its properties roughly fit into an existing define_insn_unit.  If 
so, just add it to the existing unit.  Otherwise we end up needing to 
create another unit.


What would be really interesting would be to see if we can get the 
scheduler to indicate that it's trying to schedule an insn that doesn't 
have a reservation.ie, our backend tells us if we have an insn with 
no type, the next step is to see if we have a type with no 
units/reservations.



Jeff


Re: [PATCH] RFC: Add late-combine pass [PR106594]

2023-10-10 Thread Jeff Law




On 10/7/23 06:58, Richard Sandiford wrote:



Yeah, that'd probably be best.  I need to split the patch up into a
proper submission sequence, do more testing, and make it RFA quality.
Jeff has also found a couple of regressions that I need to look at.
When you've got updates, just let me know.  I can drop them into the CI 
system and see what pops out.


Jeff


[committed] [PR target/93062] RISC-V: Handle long conditional branches for RISC-V

2023-10-10 Thread Jeff Law


Ventana has had a variant of this patch from Andrew W. in its tree for 
at least a year.   I'm dusting it off and submitting it on Andrew's behalf.


There's multiple approaches we could be using here.

First we could make $ra fixed and use it as the scratch register for the 
long branch sequences.


Second, we could add a match_scratch to all the conditional branch 
patterns and allow the register allocator to assign the scratch register 
from the pool of GPRs.


Third we could do register scavenging.  This can usually work, though it 
can get complex in some scenarios.


Forth we could use trampolines for extended reach.

Andrew's original patch did a bit of the first approach (make $ra fixed) 
and mostly the second approach.  The net is it was probably the worst in 
terms of impacting code generation -- we lost a register *and* forced 
every branch instruction to get a scratch register allocated.


I had expected the second approach to produce better code than the 
first, but that wasn't actually the case in practice.  It's probably a 
combination of allocating a GPR at every branch point (even with a life 
of a single insn, there's a cost) and perhaps the additional operands on 
conditional branches spoiling simplistic pattern matching in one or more 
passes.


In addition to performing better based on dynamic instruction counts, 
the first approach is significantly simpler to implement.  Given those 
two positives, that's what I've chosen to go with.  Yes it does remove 
$ra from the set of registers available, but the impact of that is *tiny*.


If someone wanted to dive into one of the other approaches to address a 
real world impact, that's great.  If that happens I would strongly 
suggest also evaluating perlbench from spec2017.  It seems particularly 
sensitive to this issue in terms of approach #2's impact on code generation.


I've built & regression tested this variant on the vt1 configuration 
without regressions.  Earlier versions have been bootstrapped as well.


Pushed to the trunk,

Jeff

commit 71f906498ada9ec2780660b03bd6e27a93ad350c
Author: Andrew Waterman 
Date:   Tue Oct 10 12:34:04 2023 -0600

RISC-V: far-branch: Handle far jumps and branches for functions larger than 
1MB

On RISC-V, branches further than +/-1MB require a longer instruction
sequence (3 instructions): we can reuse the jump-construction in the
assmbler (which clobbers $ra) and a temporary to set up the jump
destination.

gcc/ChangeLog:

* config/riscv/riscv.cc (struct machine_function): Track if a
far-branch/jump is used within a function (and $ra needs to be
saved).
(riscv_print_operand): Implement 'N' (inverse integer branch).
(riscv_far_jump_used_p): Implement.
(riscv_save_return_addr_reg_p): New function.
(riscv_save_reg_p): Use riscv_save_return_addr_reg_p.
* config/riscv/riscv.h (FIXED_REGISTERS): Update $ra.
(CALL_USED_REGISTERS): Update $ra.
* config/riscv/riscv.md: Add new types "ret" and "jalr".
(length attribute): Handle long conditional and unconditional
branches.
(conditional branch pattern): Handle case where jump can not
reach the intended target.
(indirect_jump, tablejump): Use new "jalr" type.
(simple_return): Use new "ret" type.
(simple_return_internal, eh_return_internal): Likewise.
(gpr_restore_return, riscv_mret): Likewise.
(riscv_uret, riscv_sret): Likewise.
* config/riscv/generic.md (generic_branch): Also recognize jalr & 
ret
types.
    * config/riscv/sifive-7.md (sifive_7_jump): Likewise.

Co-authored-by: Philipp Tomsich 
Co-authored-by: Jeff Law 

diff --git a/gcc/config/riscv/generic.md b/gcc/config/riscv/generic.md
index 57d3c3b4adc..88940483829 100644
--- a/gcc/config/riscv/generic.md
+++ b/gcc/config/riscv/generic.md
@@ -47,7 +47,7 @@ (define_insn_reservation "generic_xfer" 3
 
 (define_insn_reservation "generic_branch" 1
   (and (eq_attr "tune" "generic")
-   (eq_attr "type" "branch,jump,call"))
+   (eq_attr "type" "branch,jump,call,jalr"))
   "alu")
 
 (define_insn_reservation "generic_imul" 10
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index b7acf836d02..d17139e945e 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -183,6 +183,9 @@ struct GTY(())  machine_function {
   /* True if attributes on current function have been checked.  */
   bool attributes_checked_p;
 
+  /* True if RA must be saved because of a far jump.  */
+  bool far_jump_used;
+
   /* The current frame information, calculated by riscv_comput

Re: [committed] [PR target/93062] RISC-V: Handle long conditional branches for RISC-V

2023-10-10 Thread Jeff Law




On 10/10/23 18:24, Andrew Waterman wrote:

I remembered another concern since we discussed this patch privately.
Using ra for long calls results in a sequence that will corrupt the
return-address stack.
Yup.  We've actually got data on that internally, it's not showing up in 
a significant way in practice.



  I know nothing

about the complexity of register scavenging, but it would be nice to
opportunistically use a scratch register (other than t0), falling back
to ra only when necessary.
The nice thing about making $ra fixed is some can add a register 
scavenging approach, then fall back to $ra if they're unable to find a 
register to reuse.




Tangentially, I noticed the patch uses `jump label, ra' for far
branches but uses `call label' for far jumps.  These corrupt the RAS
in opposite ways (the former pops the RAS and the latter pushes it.
Any reason for using a different sequence in one than the other?
I'd noticed it as well -- that's the way it was in the patch that was 
already in Ventana's tree ;-)  My plan was to address that separately 
after dropping in enough infrastructure to allow me to force everything 
to be far branches for testing purposes.


jeff


Re: [PATCH v4 0/2] RISC-V: Support CORE-V XCVMAC and XCVALU extensions

2023-10-11 Thread Jeff Law




On 10/11/23 06:06, Mary Bennett wrote:

This patch series presents the comprehensive implementation of the MAC and ALU
extension for CORE-V.

Tested with riscv-gnu-toolchain on binutils, ld, gas and gcc testsuites to
ensure its correctness and compatibility with the existing codebase.
However, your input, reviews, and suggestions are invaluable in making this
extension even more robust.

The CORE-V builtins are described in the specification [1] and work can be
found in the OpenHW group's Github repository [2].

[1] 
github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md

[2] github.com/openhwgroup/corev-gcc

Contributors:
 Mary Bennett 
 Nandni Jamnadas 
 Pietra Ferreira 
 Charlie Keaney
 Jessica Mills
 Craig Blackmore 
 Simon Cook 
 Jeremy Bennett 
 Helene Chelin 

   RISC-V: Add support for XCValu extension in CV32E40P
   RISC-V: Add support for XCVmac extension in CV32E40P

Per yesterday's discussion, I've pushed both rebased patches to the trunk.

Thanks!

jeff


Re: Principles of the C99 testsuite conversion

2023-10-11 Thread Jeff Law




On 10/11/23 04:39, Florian Weimer wrote:

I've started to look at what it is required to convert the testsuite to
C99 (without implicit ints, without implicit function declarations, and
a few other legacy language features).
I bet those older tests originating from c-torture will be a bit 
painful.  Torbjorn liked having them minimized, to the point of 
squashing out nearly everything he considered extraneous.  I'd bet many 
of those older tests are going to need lots of changes.




I plan to bundle a bunch of fixes together and submit patches
incrementally.  So far, I have identified the following change
categories:

* C89 test cases that appear to make explicit use of C89-only features.
   I plan to add /* { dg-additional-options "-std=gnu89" } */ to those
   (or add -std=gnu89 to an existing dg-options line).  These fixes can
   be submitted early.

Sounds quite sensible.



* Overly reduced test cases where I can verify that the less-reduced
   test case still reproduces the bug.  Can be submitted early.

Seems like this could be a huge time sink, but that's your call.



* Similar test cases where I cannot verify that the test case is still
   reproducing the bug, but where I assume that the unreduction is
   harmless as far as the test objective is concerned.  Can be submitted
   early (but should be in a separate patch from the previous one).

I suspect many of the older tests will fall into this bucket.



* Test cases which have to build with -fpermissive.  These include
   expected-warnings tests, and over-reduced test cases where unreducing
   might conceivably interfere with the test objective.  These fixes have
   to wait until -fpermissive for C lands (or they could wait until the
   first relevant use of permerror).

Also seems quite sensible.



* Test cases which check (for example) error locations from diagnostics
   that used be warnings, but that are now errors, and other test cases
   where it seems to make sense to test for errors now.  Here I plan to
   change dg-warning into dg-error as appropriate.  These changes would
   be submitted along the patch that introduces the relevant permerror
   calls.  In some cases, I plan to duplicate tests into warning variants
   (now with -fpermissive) and error variants (with dg-warning changed to
   dg-error).

Sure, this seems quite sensible as well.



* There's the matter of target-specific tests that I don't know how to
   cover.  I can probably run the testsuite on x86-64, POWER9 and
   AArch64, but beyond that, I'm pretty much out of options.  I will have
   to rely on help from the community, and perhaps target maintainers
   adjusting their tests after the permerror changes according to the
   principles above.
The good news is we'll see these pop in the tester for all those 
embedded targets.  So I'm not too concerned here.


Seems like a lot of (long needed) maintenance work.   I can't drive it, 
but I can certainly help with identification and fixing of the embedded 
targets when we get to that point.


jeff


Re: Principles of the C99 testsuite conversion

2023-10-11 Thread Jeff Law




On 10/11/23 08:10, Richard Earnshaw (lists) wrote:

On 11/10/2023 14:56, Jeff Law wrote:



On 10/11/23 04:39, Florian Weimer wrote:

I've started to look at what it is required to convert the testsuite to
C99 (without implicit ints, without implicit function declarations, and
a few other legacy language features).

I bet those older tests originating from c-torture will be a bit painful.  
Torbjorn liked having them minimized, to the point of squashing out nearly 
everything he considered extraneous.  I'd bet many of those older tests are 
going to need lots of changes.



I've often wondered just how much of the original c-torture suite is still 
relevant today.  Most of those tests were written at a  time when the compiler 
expanded tree directly into RTL and I suspect that today the tests never get 
even close to tickling the original bug they were intended to validate.
I'm sure it's a mixed bag.  Some I do see pop up regularly during 
development testing.


The real way to try and answer that question would be with gcov. 
Something like take an existing coverage report, run test and see if it 
added any additional coverage.  If not, flag it as potentially 
irrelevant.  Do this on x86 or aarch64.


Given that list of potentially irrelevant tests, then look at them from 
a target standpoint, potentially testing with the same methodology on 
several targets, perhaps a standardized set meant to cover 16->64 bit 
targets, big/little endian and a few other key features.


jeff


[committed] RISC-V: Adjust long unconditional branch sequence

2023-10-11 Thread Jeff Law
Andrew and I independently noted the long unconditional branch sequence 
was using the "call" pseudo op.  Technically it works, but it's a bit 
odd.  This patch flips it to use the "jump" pseudo-op.


This was tested with a hacked-up local compiler which forced all 
branches/jumps to be long jumps.  Naturally it triggered some failures 
for scan-asm tests but no execution regressions (which is mostly what I 
was testing for).


I've updated the long branch support item in the RISE wiki to indicate 
that we eventually want a register scavenging approach with a fallback 
to $ra in the future so that we don't muck up the return address 
predictors.  It's not super-high priority and shouldn't be terrible to 
implement given we've got the $ra fallback when a suitable register can 
not be found.



Pushed to the trunk,

Jeffcommit a3e50ee96dc3e25ca52608e58c4e653f9976cb4e
Author: Jeff Law 
Date:   Wed Oct 11 16:18:22 2023 -0600

RISC-V Adjust long unconditional branch sequence

Andrew and I independently noted the long unconditional branch sequence was
using the "call" pseudo op.  Technically it works, but it's a bit odd.  This
patch flips it to use the "jump" pseudo-op.

This was tested with a hacked-up local compiler which forced all 
branches/jumps
to be long jumps.  Naturally it triggered some failures for scan-asm tests 
but
no execution regressions (which is mostly what I was testing for).

I've updated the long branch support item in the RISE wiki to indicate that 
we
eventually want a register scavenging approach with a fallback to $ra in the
future so that we don't muck up the return address predictors.  It's not
super-high priority and shouldn't be terrible to implement given we've got 
the
$ra fallback when a suitable register can not be found.

gcc/
* config/riscv/riscv.md (jump): Adjust sequence to use a "jump"
pseudo op instead of a "call" pseudo op.

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index c26541bd5ce..23d91331290 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2924,7 +2924,7 @@ (define_insn "jump"
   /* Hopefully this does not happen often as this is going
  to clobber $ra and muck up the return stack predictors.  */
   if (get_attr_length (insn) == 8)
-return "call\t%l0";
+return "jump\t%l0,ra";
 
   return "j\t%l0";
 }


Re: RISC-V: Support CORE-V XCVMAC and XCVALU extensions

2023-10-11 Thread Jeff Law




On 10/11/23 19:05, juzhe.zh...@rivai.ai wrote:

Plz revert it. It blocks development of all targets.
We have specific policies for reversion.  In general we want to give 
folks time to fix the problem rather than reverting, resubmitting, etc.


Mary, the issue is the doc changes are apparently dependent on specific 
versions of texinfo being installed resulting in this error:



../../../../gcc/gcc/doc/extend.texi:21708: warning: node next `RISC-V Vector 
Intrinsics' in menu `CORE-V Built-in Functions' and in sectioning `RX Built-in 
Functions' differ
../../../../gcc/gcc/doc/extend.texi:21716: warning: node `RX Built-in 
Functions' is next for `CORE-V Built-in Functions' in menu but not in sectioning
../../../../gcc/gcc/doc/extend.texi:21716: warning: node `RISC-V Vector 
Intrinsics' is prev for `CORE-V Built-in Functions' in menu but not in 
sectioning
../../../../gcc/gcc/doc/extend.texi:21716: warning: node up `CORE-V Built-in 
Functions' in menu `Target Builtins' and in sectioning `RISC-V Vector 
Intrinsics' differ
../../../../gcc/gcc/doc/extend.texi:21708: node `RISC-V Vector Intrinsics' 
lacks menu item for `CORE-V Built-in Functions' despite being its Up target
../../../../gcc/gcc/doc/extend.texi:21889: warning: node prev `RX Built-in 
Functions' in menu `CORE-V Built-in Functions' and in sectioning `RISC-V Vector 
Intrinsics' differ



Jeff


Re: [PATCH] RISCV: Bugfix for incorrect documentation heading nesting

2023-10-12 Thread Jeff Law




On 10/12/23 04:05, Mary Bennett wrote:

gcc/ChangeLog:
 * doc/extend.texi: Change subsubsection to subsection for
   CORE-V built-ins.

This is OK.  I'll commit it shortly it.

jeff


Re: [PATCH] RISCV: Bugfix for incorrect documentation heading nesting

2023-10-12 Thread Jeff Law




On 10/12/23 04:05, Mary Bennett wrote:

gcc/ChangeLog:
 * doc/extend.texi: Change subsubsection to subsection for
   CORE-V built-ins.
Thanks for jumping on it quickly.  I added the PR marker to the 
ChangeLog entry (bugzilla integration) and pushed this to the trunk.


jeff


Re: [PATCH] reg-notes.def: Fix up description of REG_NOALIAS

2023-10-12 Thread Jeff Law




On 10/12/23 03:41, Alex Coplan wrote:

Hi,

The description of the REG_NOALIAS note in reg-notes.def isn't quite
right. It describes it as being attached to call insns, but it is
instead attached to a move insn receiving the return value from a call.

This can be seen by looking at the code in calls.cc:expand_call which
attaches the note:

   emit_move_insn (temp, valreg);

   /* The return value from a malloc-like function cannot alias
  anything else.  */
   last = get_last_insn ();
   add_reg_note (last, REG_NOALIAS, temp);

Bootstrapped on aarch64-linux-gnu, OK for trunk?

Thanks,
Alex

gcc/ChangeLog:

 * reg-notes.def (NOALIAS): Correct comment.

OK
jeff


Re: Ping: [PATCH v2 1/2] testsuite: Add dg-require-atomic-cmpxchg-word

2023-10-12 Thread Jeff Law




On 10/12/23 08:38, Christophe Lyon wrote:

LGTM but I'm not a maintainer ;-)
LGTM to as well -- I usually try to stay out of libstdc++, but this 
looks simple enough.  Both patches in this series are OK.


jeff


Re: [PATCH v2] RISC-V: Make xtheadcondmov-indirect tests robust against instruction reordering

2023-10-12 Thread Jeff Law




On 10/12/23 07:06, Christoph Muellner wrote:

From: Christoph Müllner 

Fixes: c1bc7513b1d7 ("RISC-V: const: hide mvconst splitter from IRA")

A recent change broke the xtheadcondmov-indirect tests, because the order of
emitted instructions changed. Since the test is too strict when testing for
a fixed instruction order, let's change the tests to simply count instruction,
like it is done for similar tests.

Reported-by: Patrick O'Neill 
Signed-off-by: Christoph Müllner 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadcondmov-indirect.c: Make robust against
instruction reordering.

OK for the trunk.

jeff


  1   2   3   4   5   6   7   8   9   10   >