Re: [PATCH v2 00/13] aarch64: CMPBR fixes

2025-08-11 Thread Richard Henderson
On 8/8/25 21:30, Richard Sandiford wrote: Richard Henderson writes: Version 1 regressed the expansion of atomics, which means the addition of CC clobber to all conditional branches is flawed. Version 2 goes the other way: remove CC clobber from all conditional branches. This requires the out

Re: [PATCH v2 13/13] aarch64: Fix condition accepted by movcc

2025-08-08 Thread Richard Henderson
On 8/8/25 21:21, Richard Sandiford wrote: +if (GET_MODE_CLASS (ccmode) == MODE_CC) + gcc_assert (XEXP (operands[1], 1) == const0_rtx); Sorry for the formatting nit, but: too much indentation. Whoops, incomplete removal of braces. :-) r~

Re: [PATCH v2 10/13] aarch64: Fix gcc.target/aarch64/cmpbr.c

2025-08-08 Thread Richard Henderson
On 8/8/25 21:08, Richard Sandiford wrote: Let's change it to: /* { dg-do assemble { target aarch64_asm_cmpbr_ok } } */ /* { dg-do compile { target { ! aarch64_asm_cmpbr_ok } } } */ That was the original plan, and is used extensively in other aarch64 tests. We changed it to use dg-do-if after Ri

Re: [PATCH v2 12/13] aarch64: CMPBR branches must be invertable

2025-08-08 Thread Richard Henderson
On 8/8/25 21:18, Richard Sandiford wrote: +(define_insn "*aarch64_cb" + [(set (pc) (if_then_else + (INT_CMP + (match_operand:GPI 0 "register_operand" "r") + (match_operand:GPI 1 + "" "r")) An alternative to adding a new code attri

Re: [PATCH v2 05/13] aarch64: Fix gcs save/restore_stack_nonlocal

2025-08-08 Thread Richard Henderson
On 8/8/25 20:39, Richard Sandiford wrote: Richard Henderson writes: The save/restore_stack_nonlocal patterns passed a DImode rtx to gen_tbranch_neqi3 for a QImode compare. The tbranch expander did not do what it said on the tin, that is: emit TBNZ. It only made it as far as AND+CMP+B.cond

[PATCH v2 12/13] aarch64: CMPBR branches must be invertable

2025-08-07 Thread Richard Henderson
Restrict the immediate range to the intersection of LT/GE and GT/LE so that cfglayout can invert the condition to redirect any branch. gcc: * config/aarch64/aarch64.cc (aarch64_cb_rhs): Restrict the range of LT/GE and GT/LE to their intersections. * config/aarch64/aarch64.m
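
The restriction described in this patch can be illustrated with a small C sketch. This is not the code in aarch64_cb_rhs: the struct, the function names and the example bounds used in the usage note are assumptions made for illustration. The point is only that an immediate is safe to use when it is valid both for a condition and for the inverse condition, so that the branch can always be inverted.

```c
#include <assert.h>
#include <stdbool.h>

/* Inclusive range of immediates (hypothetical type, for illustration).  */
struct imm_range { long lo, hi; };

/* Intersection of two inclusive ranges.  The result is empty when
   lo > hi.  */
static struct imm_range
range_intersect (struct imm_range a, struct imm_range b)
{
  struct imm_range r;
  assert (a.lo <= a.hi && b.lo <= b.hi);
  r.lo = a.lo > b.lo ? a.lo : b.lo;
  r.hi = a.hi < b.hi ? a.hi : b.hi;
  return r;
}

/* True if V is acceptable for both a condition and its inverse, i.e. it
   lies in the intersection of the two per-condition ranges.  */
static bool
imm_ok_for_both (struct imm_range cond, struct imm_range inverse, long v)
{
  struct imm_range r = range_intersect (cond, inverse);
  return r.lo <= v && v <= r.hi;
}
```

For example, if one condition accepted 0..63 and its inverse accepted 1..64 (assumed values), only 1..63 would remain usable for either of them.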

[PATCH v2 11/13] aarch64: Consider TARGET_CMPBR in rtx costs

2025-08-07 Thread Richard Henderson
gcc: * config/aarch64/aarch64.cc (aarch64_if_then_else_costs): Use aarch64_cb_rhs to match CB insns. --- gcc/config/aarch64/aarch64.cc | 9 + 1 file changed, 9 insertions(+) diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index 650da2ff95d..e7b17

[PATCH v2 10/13] aarch64: Fix gcc.target/aarch64/cmpbr.c

2025-08-07 Thread Richard Henderson
The enable for the test was wrong, so it never ran. gcc/testsuite: * gcc.target/aarch64/cmpbr.c: Use dg-require-effective-target. --- gcc/testsuite/gcc.target/aarch64/cmpbr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.target/aarch64/cmpbr.c b/gc

[PATCH v2 06/13] aarch64: Rename and improve aarch64_split_imm24

2025-08-07 Thread Richard Henderson
Two of the three uses of aarch64_imm24 included the important follow-up tests vs aarch64_move_imm and aarch64_plus_operand. Lack of the exclusion within aarch64_if_then_else_costs produced incorrect costing. Since aarch64_split_imm24 has already matched a non-negative CONST_INT, drill down from a

[PATCH v2 09/13] aarch64: Remove cc clobber from *aarch64_tbz1

2025-08-07 Thread Richard Henderson
There is a conflict between aarch64_tbzltdi1 and aarch64_cbltdi with respect to pnum_clobbers, resulting in a recog failure: 0xa1fffe fancy_abort(char const*, int, char const*) ../../gcc/diagnostics/context.cc:1640 0x81340e patch_jump_insn ../../gcc/cfgrtl.cc:1303 0xc0eafe redirect

[PATCH v2 02/13] aarch64: Remove an indentation level from aarch64_if_then_else_costs

2025-08-07 Thread Richard Henderson
gcc: * config/aarch64/aarch64.cc (aarch64_if_then_else_costs): Remove else after return and re-indent. --- gcc/config/aarch64/aarch64.cc | 52 +-- 1 file changed, 25 insertions(+), 27 deletions(-) diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/con

[PATCH v2 05/13] aarch64: Fix gcs save/restore_stack_nonlocal

2025-08-07 Thread Richard Henderson
The save/restore_stack_nonlocal patterns passed a DImode rtx to gen_tbranch_neqi3 for a QImode compare. The tbranch expander did not do what it said on the tin, that is: emit TBNZ. It only made it as far as AND+CMP+B.cond. But since we're seeding r16 with 1, GCSEnabled will clear the only set bit

[PATCH v2 03/13] aarch64: Reorg aarch64_if_then_else_costs, conditional branch

2025-08-07 Thread Richard Henderson
gcc: * config/aarch64/aarch64.cc (aarch64_if_then_else_costs): Reorg to include the cost of inner within TBZ sign-bit test, only match CBZ/CBNZ with valid modes, and both for the aarch64_imm24 test. --- gcc/config/aarch64/aarch64.cc | 55 ---

[PATCH v2 04/13] aarch64: Use aarch64_gen_compare_zero_and_branch in aarch64_restore_za

2025-08-07 Thread Richard Henderson
With -mtrack-speculation, the pattern that was directly expanded by aarch64_restore_za is disabled. Use the helper function instead. gcc: * config/aarch64/aarch64.cc (aarch64_gen_compare_zero_and_branch): Export. * config/aarch64/aarch64-protos.h (aarch64_gen_compa

[PATCH v2 13/13] aarch64: Fix condition accepted by movcc

2025-08-07 Thread Richard Henderson
Reject QI/HImode conditions, which would require extension in order to compare. Fixes z.c:10:1: error: unrecognizable insn: 10 | } | ^ (insn 23 22 24 2 (set (reg:CC 66 cc) (compare:CC (reg:HI 128) (reg:HI 127))) "z.c":6:6 -1 (nil)) during RTL pass: vregs gcc:

[PATCH v2 07/13] aarch64: Fix aarch64_split_imm24 patterns

2025-08-07 Thread Richard Henderson
Both patterns used !reload_completed as a condition, which is questionable at best. The branch pattern failed to include a clobber of CC_REGNUM. Both problems were unlikely to trigger in practice, due to how the optimization pipeline is organized, but let's fix them anyway. gcc: * config

[PATCH v2 00/13] aarch64: CMPBR fixes

2025-08-07 Thread Richard Henderson
ll-designed helper function. I don't know if I'll get around to doing any of these, since I'm supposed to be working on 128-bit page tables for qemu. :-) r~ Richard Henderson (13): aarch64: Fix spelling of BRANCH_LEN_N_1KiB aarch64: Remove an indentation level from aarch64_if

[PATCH v2 08/13] aarch64: Disable TARGET_CMPBR with aarch64_track_speculation

2025-08-07 Thread Richard Henderson
With -mtrack-speculation, CC_REGNUM must be used at every conditional branch. gcc: * config/aarch64/aarch64.h (TARGET_CMPBR): False when aarch64_track_speculation is true. --- gcc/config/aarch64/aarch64.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/gcc

[PATCH v2 01/13] aarch64: Fix spelling of BRANCH_LEN_N_1KiB

2025-08-07 Thread Richard Henderson
One kilobyte not one kilobit. gcc: * config/aarch64/aarch64.md (BRANCH_LEN_N_1KiB): Rename from BRANCH_LEN_N_1Kib. --- gcc/config/aarch64/aarch64.md | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/confi

Re: [PATCH v9 9/9] Update `cmpbr.c` tests

2025-08-07 Thread Richard Henderson
On 8/7/25 23:35, Karl Meakin wrote: I have updated the tests in `cmpbr.c` to reflect the fixes. There are a few regressions, but they can be fixed later; let's just make GCC crash-free first. I should have replied sooner. The patch set breaks atomics. I am re-working it not to do so and will

Re: [PATCH 6/8] aarch64: Add cc clobber to compare-and-branch patterns

2025-08-06 Thread Richard Henderson
On 8/6/25 00:45, Karl Meakin wrote: Now that the body of `cbranch` and `cbranch` are the same, could we merge them into one rule? No, the bodies are the same but the predicates are not. r~

Re: [PATCH 3/8] aarch64: Drop cbranch4 expander

2025-08-05 Thread Richard Henderson
On 8/6/25 00:52, Karl Meakin wrote:   ;; Emit a `CBB (register)` or `CBH (register)` instruction. -(define_insn "aarch64_cb" +(define_insn "*aarch64_cb"     [(set (pc) (if_then_else (INT_CMP    (match_operand:SHORT 0 "register_operand" "r")    (match_operand:SHORT

Re: [PATCH 6/8] aarch64: Add cc clobber to compare-and-branch patterns

2025-08-05 Thread Richard Henderson
On 8/5/25 19:43, Richard Henderson wrote: That said, I'm a little confused why we'd want to use SUBS+B.{EQ,NE} instead of SUB+CB{Z,NZ}. The answer to that is that B.{EQ,NE} converts easily to CSEL/CSINC/CSINV. r~

Re: [PATCH 8/8] aarch64: Use cc when CB/CBB/CBH is out-of-range

2025-08-05 Thread Richard Henderson
On 8/5/25 19:24, Richard Sandiford wrote: + output_asm_insn ("cmn\t%0, %1", operands); It looks like this should be "cmn\t%0, #%n1", since GAS "helpfully" converts cmn w0, #-1 to cmp w0, #1. Whoops, yes. r~
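
The review comment above is about printing the negated immediate when a compare against a negative constant is emitted as cmn. A minimal C sketch of that selection follows; it is an illustration only, not the GCC output code, and the names select_compare and format_compare are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* Which mnemonic to use and which (non-negative) immediate to print.  */
struct cmp_insn { bool use_cmn; unsigned long imm; };

/* Comparing against a negative constant C is done as "cmn reg, #-C",
   so the printed immediate must be the negation of C.  */
static struct cmp_insn
select_compare (long imm)
{
  struct cmp_insn r;
  r.use_cmn = imm < 0;
  /* Negate in unsigned arithmetic to avoid overflow on LONG_MIN.  */
  r.imm = imm < 0 ? 0UL - (unsigned long) imm : (unsigned long) imm;
  return r;
}

/* Format the instruction text; returns the snprintf result.  */
static int
format_compare (char *buf, size_t len, const char *reg, long imm)
{
  struct cmp_insn c = select_compare (imm);
  return snprintf (buf, len, "%s\t%s, #%lu",
                   c.use_cmn ? "cmn" : "cmp", reg, c.imm);
}
```

With this sketch, a compare of w0 against -1 is formatted as cmn with immediate #1 rather than #-1, which is the same effect as the "#%n1" suggested in the review.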

Re: [PATCH 6/8] aarch64: Add cc clobber to compare-and-branch patterns

2025-08-05 Thread Richard Henderson
On 8/5/25 19:15, Richard Sandiford wrote: Should we also add a clobber to: ;; For a 24-bit immediate CST we can optimize the compare for equality ;; and branch sequence from: ;; mov x0, #imm1 ;; movk x0, #imm2, lsl 16 /* x0 contains CST. */ ;; cmp x1, x0 ;; b .Lab

[PATCH 8/8] aarch64: Use cc when CB/CBB/CBH is out-of-range

2025-08-04 Thread Richard Henderson
Middle distance branches between 1KiB and 1MiB may be implemented with cmp+branch instead of branch+branch. gcc: * config/aarch64/aarch64.cc (*aarch64_cb): Fall back to cmp/cmn + bcond if !far_branch. Adjust far_branch to 1MiB. (*aarch64_cb, operands[1])" { -
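
The three-way choice described in this patch can be sketched in C. This is a simplified model, not the length computation in aarch64.md: the enum, the function name and the symmetric byte bounds are assumptions, and the real code may measure the offset from a different point.

```c
/* How a compare-and-branch to a target at byte OFFSET might be emitted
   (hypothetical classification, for illustration).  */
enum branch_form
{
  BRANCH_CB,         /* single compare-and-branch instruction */
  BRANCH_CMP_BCOND,  /* cmp/cmn followed by b.cond */
  BRANCH_FAR         /* inverted branch over an unconditional b */
};

static enum branch_form
choose_branch_form (long offset)
{
  const long kib = 1024L;
  const long mib = 1024L * 1024L;

  if (offset >= -kib && offset < kib)
    return BRANCH_CB;
  if (offset >= -mib && offset < mib)
    return BRANCH_CMP_BCOND;
  return BRANCH_FAR;
}
```

The middle case is the one the patch adds: between the two limits, a compare that sets the flags plus a conditional branch replaces the branch-over-branch sequence.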

[PATCH 6/8] aarch64: Add cc clobber to compare-and-branch patterns

2025-08-04 Thread Richard Henderson
Some of the compare-and-branch patterns rely on CC for scratch in some of the alternative expansions. This is fine, because when the combined compare-and-branch patterns are formed by combine, we will be eliminating a write to CC, so CC is dead anyway. Standardize on the cc clobber for all such p

[PATCH 7/8] aarch64: Consider TARGET_CMPBR in rtx costs

2025-08-04 Thread Richard Henderson
gcc: * config/aarch64/aarch64.cc (aarch64_if_then_else_costs): Use aarch64_cb_rhs to match CB insns. --- gcc/config/aarch64/aarch64.cc | 5 + 1 file changed, 5 insertions(+) diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index ff9243ea732..df01ef3fe

[PATCH 5/8] aarch64: Use aarch64_gen_compare_zero_and_branch in aarch64_restore_za

2025-08-04 Thread Richard Henderson
Do not directly expand a pattern from aarch64.md; use the helper function provided. gcc: * config/aarch64/aarch64.cc (aarch64_gen_compare_zero_and_branch): Export. * config/aarch64/aarch64-protos.h (aarch64_gen_compare_zero_and_branch): Declare it. * config/

[PATCH 4/8] aarch64: Disable TARGET_CMPBR with aarch64_track_speculation

2025-08-04 Thread Richard Henderson
With -mtrack-speculation, CC_REGNUM must be used at every conditional branch. gcc: * config/aarch64/aarch64.h (TARGET_CMPBR): False when aarch64_track_speculation is true. --- gcc/config/aarch64/aarch64.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/gcc

[PATCH 2/8] aarch64: Fix spelling of BRANCH_LEN_N_1KiB

2025-08-04 Thread Richard Henderson
One kilobyte not one kilobit. gcc: * config/aarch64/aarch64.md (BRANCH_LEN_N_1KiB): Rename from BRANCH_LEN_N_1Kib. --- gcc/config/aarch64/aarch64.md | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/confi

[PATCH 3/8] aarch64: Drop cbranch4 expander

2025-08-04 Thread Richard Henderson
If we implement bare QI/HImode cbranch, movcc will ask aarch64_gen_compare_reg for a QI/HImode compare, which we cannot provide without modification elsewhere. However, we can usually get the extensions for free from surrounding operations. So e.g. CBcond in SImode is more generally compact than

[PATCH 1/8] aarch64: Drop label format argument from aarch64_gen_far_branch

2025-08-04 Thread Richard Henderson
There's no need for each branch-over-branch to choose its own label format. gcc: * config/aarch64/aarch64.cc (aarch64_gen_far_branch): Drop dest argument; always use "L". * config/aarch64/aarch64.md: Update to match. * config/aarch64/aarch64-protos.h: Update to matc

[PATCH 0/8] aarch64: CMPBR fixes

2025-08-04 Thread Richard Henderson
overse-n1, so I don't seem to have broken anything obvious. r~ Richard Henderson (8): aarch64: Drop label format argument from aarch64_gen_far_branch aarch64: Fix spelling of BRANCH_LEN_N_1KiB aarch64: Drop cbranch4 expander aarch64: Disable TARGET_CMPBR with aarch64_track_specul

Re: [PATCH] libatomic: Improve ifunc selection on AArch64

2023-08-10 Thread Richard Henderson via Gcc-patches
On 8/10/23 02:50, Wilco Dijkstra wrote: Hi Richard, Why would HWCAP_USCAT not be set by the kernel? Failing that, I would think you would check ID_AA64MMFR2_EL1.AT. Answering my own question, N1 does not officially have FEAT_LSE2. It doesn't indeed. However most cores support atomic 128-bi

Re: [PATCH] libatomic: Improve ifunc selection on AArch64

2023-08-09 Thread Richard Henderson via Gcc-patches
On 8/9/23 19:11, Richard Henderson wrote: On 8/4/23 08:05, Wilco Dijkstra via Gcc-patches wrote: +#ifdef HWCAP_USCAT + +#define MIDR_IMPLEMENTOR(midr)    (((midr) >> 24) & 255) +#define MIDR_PARTNUM(midr)    (((midr) >> 4) & 0xfff) + +static inline bool +ifunc1 (unsigned

Re: [PATCH] libatomic: Improve ifunc selection on AArch64

2023-08-09 Thread Richard Henderson via Gcc-patches
On 8/4/23 08:05, Wilco Dijkstra via Gcc-patches wrote: +#ifdef HWCAP_USCAT + +#define MIDR_IMPLEMENTOR(midr) (((midr) >> 24) & 255) +#define MIDR_PARTNUM(midr) (((midr) >> 4) & 0xfff) + +static inline bool +ifunc1 (unsigned long hwcap) +{ + if (hwcap & HWCAP_USCAT) +return true; + if (!
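
The two MIDR macros quoted above extract bit fields from the main ID register value. A minimal C sketch of how they decode a value follows; the macros are copied from the quoted patch, while the wrapper functions and the sample value in the usage note are assumptions added for illustration.

```c
#include <stdint.h>

/* Macros as quoted in the patch under review.  */
#define MIDR_IMPLEMENTOR(midr)  (((midr) >> 24) & 255)
#define MIDR_PARTNUM(midr)      (((midr) >> 4) & 0xfff)

/* Hypothetical helpers wrapping the macros with fixed-width types.  */
static inline unsigned
midr_implementor (uint64_t midr)
{
  return (unsigned) MIDR_IMPLEMENTOR (midr);
}

static inline unsigned
midr_partnum (uint64_t midr)
{
  return (unsigned) MIDR_PARTNUM (midr);
}
```

For a sample value of 0x414fd0c1, the implementer field decodes to 0x41 and the part number to 0xd0c.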

[PATCH] MAINTAINERS: Update my email address.

2022-04-19 Thread Richard Henderson via Gcc-patches
2022-04-19 Richard Henderson * MAINTAINERS: Update my email address. --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 30f81b3dd52..15973503722 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -53,7 +53,7 @@ aarch64 port

Re: [PATCH v2 05/11] aarch64: Use UNSPEC_SBCS for subtract-with-borrow + output flags

2020-04-09 Thread Richard Henderson
On 4/9/20 2:52 PM, Segher Boessenkool wrote: > Hi! > > On Thu, Apr 02, 2020 at 11:53:47AM -0700, Richard Henderson wrote: >> The rtl description of signed/unsigned overflow from subtract >> was fine, as far as it goes -- we have CC_Cmode and CC_Vmode >> that indicate

[PATCH v4 11/12] aarch64: Accept 0 as first argument to compares

2020-04-09 Thread Richard Henderson via Gcc-patches
While cmp (extended register) and cmp (immediate) uses , cmp (shifted register) uses . So we can perform cmp xzr, x0. For ccmp, we only have as an input. * config/aarch64/aarch64.md (cmp): For operand 0, use aarch64_reg_or_zero. Shuffle reg/reg to last alternative and a

[PATCH v4 12/12] aarch64: Implement TImode comparisons

2020-04-09 Thread Richard Henderson via Gcc-patches
* config/aarch64/aarch64-modes.def (CC_NV): New. * config/aarch64/aarch64.c (aarch64_gen_compare_reg): Expand all of the comparisons for TImode, not just NE. (aarch64_select_cc_mode): Recognize cmp_carryin. (aarch64_get_condition_code_1): Handle CC_NVmode.

[PATCH v4 07/12] aarch64: Rename CC_ADCmode to CC_NOTCmode

2020-04-09 Thread Richard Henderson via Gcc-patches
We are about to use !C in more contexts than add-with-carry. Choose a more generic name. * config/aarch64/aarch64-modes.def (CC_NOTC): Rename CC_ADC. * config/aarch64/aarch64.c (aarch64_select_cc_mode): Update. (aarch64_get_condition_code_1): Likewise. * config/aarc

[PATCH v4 10/12] aarch64: Adjust result of aarch64_gen_compare_reg

2020-04-09 Thread Richard Henderson via Gcc-patches
Return the entire comparison expression, not just the cc_reg. This will allow the routine to adjust the comparison code as needed for TImode comparisons. Note that some users were passing e.g. EQ to aarch64_gen_compare_reg and then using gen_rtx_NE. Pass the proper code in the first place.

[PATCH v4 01/12] aarch64: Provide expander for sub3_compare1

2020-04-09 Thread Richard Henderson via Gcc-patches
In one place we open-code a special case of this pattern into the more specific sub3_compare1_imm, and miss this special case in other places. Centralize that special case into an expander. * config/aarch64/aarch64.md (*sub3_compare1): Rename from sub3_compare1. (sub3_comp

[PATCH v4 09/12] aarch64: Use CC_NOTCmode for double-word subtract

2020-04-09 Thread Richard Henderson via Gcc-patches
We have been using CCmode, which is not correct for this case. Mirror the same code from the arm target. * config/aarch64/aarch64.c (aarch64_select_cc_mode): Recognize usub*_carryinC patterns. * config/aarch64/aarch64.md (usubvti4): Use CC_NOTC. (usub3_carryinC): Li

[PATCH v4 06/12] aarch64: Introduce aarch64_expand_addsubti

2020-04-09 Thread Richard Henderson via Gcc-patches
Modify aarch64_expand_subvti into a form that handles all addition and subtraction, modulo, signed or unsigned overflow. Use expand_insn to put the operands into the proper form, and do not force values into register if not required. * config/aarch64/aarch64.c (aarch64_ti_split) New.

[PATCH v4 08/12] arm: Merge CC_ADC and CC_B to CC_NOTC

2020-04-09 Thread Richard Henderson via Gcc-patches
These CC_MODEs are identical, merge them into a more generic name. * config/arm/arm-modes.def (CC_NOTC): New. (CC_ADC, CC_B): Remove. * config/arm/arm.c (arm_select_cc_mode): Update to match. (arm_gen_dicompare_reg): Likewise. (maybe_get_arm_condition_code):

[PATCH v4 05/12] aarch64: Improvements to aarch64_select_cc_mode from arm

2020-04-09 Thread Richard Henderson via Gcc-patches
The arm target has some improvements over aarch64 for double-word arithmetic and comparisons. * config/aarch64/aarch64.c (aarch64_select_cc_mode): Check for swapped operands to CC_Cmode; check for zero_extend to CC_ADCmode; check for swapped operands to CC_Vmode. --- gcc/c

[PATCH v4 03/12] aarch64: Add cset, csetm, cinc patterns for carry/borrow

2020-04-09 Thread Richard Henderson via Gcc-patches
Some implementations have a higher cost for the csel insn (and its specializations) than they do for adc/sbc. * config/aarch64/aarch64.md (*cstore_carry): New. (*cstoresi_carry_uxtw): New. (*cstore_borrow): New. (*cstoresi_borrow_uxtw): New. (*csinc2_carry):

[PATCH v4 04/12] aarch64: Add const_dword_umaxp1

2020-04-09 Thread Richard Henderson via Gcc-patches
Rather than duplicating the rather verbose integral test, pull it out to a predicate. * config/aarch64/predicates.md (const_dword_umaxp1): New. * config/aarch64/aarch64.c (aarch64_select_cc_mode): Use it. * config/aarch64/aarch64.md (add*add3_carryinC): Likewise. (*

[PATCH v4 00/12] aarch64: Implement TImode comparisons

2020-04-09 Thread Richard Henderson via Gcc-patches
ven't put enough thought into the problem. r~ Richard Henderson (12): aarch64: Provide expander for sub3_compare1 aarch64: Match add3_carryin expander and insn aarch64: Add cset, csetm, cinc patterns for carry/borrow aarch64: Add const_dword_umaxp1 aarch64: Impro

[PATCH v4 02/12] aarch64: Match add3_carryin expander and insn

2020-04-09 Thread Richard Henderson via Gcc-patches
The expander and insn predicates do not match, which can lead to insn recognition errors. * config/aarch64/aarch64.md (add3_carryin): Use register_operand instead of aarch64_reg_or_zero. --- gcc/config/aarch64/aarch64.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) d

Re: [PATCH v2 00/11] aarch64: Implement TImode comparisons

2020-04-07 Thread Richard Henderson via Gcc-patches
On 4/7/20 4:58 PM, Segher Boessenkool wrote: >> I wonder if it would be helpful to have >> >> (uoverflow_plus x y carry) >> (soverflow_plus x y carry) >> >> etc. > > Those have three operands, which is nasty to express. How so? It's a perfectly natural operation. > On rs6000 we have the car

Re: [PATCH v2 00/11] aarch64: Implement TImode comparisons

2020-04-07 Thread Richard Henderson
On 4/7/20 1:27 PM, Segher Boessenkool wrote: > On Mon, Apr 06, 2020 at 12:19:42PM +0100, Richard Sandiford wrote: >> The reason I'm not keen on using special modes for this case is that >> they'd describe one way in which the result can be used rather than >> describing what the instruction actuall

Re: [PATCH v2 00/11] aarch64: Implement TImode comparisons

2020-04-07 Thread Richard Henderson via Gcc-patches
On 4/7/20 9:32 AM, Richard Sandiford wrote: > It's not really reversibility that I'm after (at least not for its > own sake). > > If we had a three-input compare_cc rtx_code that described a comparison > involving a carry input, we'd certainly be using it here, because that's > what the instructio

Re: [PATCH v2 00/11] aarch64: Implement TImode comparisons

2020-04-02 Thread Richard Henderson
On 4/2/20 11:53 AM, Richard Henderson via Gcc-patches wrote: > This is attacking case 3 of PR 94174. > > In v2, I unify the various subtract-with-borrow and add-with-carry > patterns that also output flags with unspecs. As suggested by > Richard Sandiford during review of v1

[PATCH v2 09/11] aarch64: Adjust result of aarch64_gen_compare_reg

2020-04-02 Thread Richard Henderson via Gcc-patches
Return the entire comparison expression, not just the cc_reg. This will allow the routine to adjust the comparison code as needed for TImode comparisons. Note that some users were passing e.g. EQ to aarch64_gen_compare_reg and then using gen_rtx_NE. Pass the proper code in the first place.

[PATCH v2 04/11] aarch64: Introduce aarch64_expand_addsubti

2020-04-02 Thread Richard Henderson via Gcc-patches
Modify aarch64_expand_subvti into a form that handles all addition and subtraction, modulo, signed or unsigned overflow. Use expand_insn to put the operands into the proper form, and do not force values into register if not required. * config/aarch64/aarch64.c (aarch64_ti_split) New.

[PATCH v2 05/11] aarch64: Use UNSPEC_SBCS for subtract-with-borrow + output flags

2020-04-02 Thread Richard Henderson via Gcc-patches
The rtl description of signed/unsigned overflow from subtract was fine, as far as it goes -- we have CC_Cmode and CC_Vmode that indicate that only those particular bits are valid. However, it's not clear how to extend that description to handle signed comparison, where N == V (GE) N != V (LT) are

[PATCH v2 07/11] aarch64: Remove CC_ADCmode

2020-04-02 Thread Richard Henderson via Gcc-patches
Now that we're using UNSPEC_ADCS instead of rtl, there's no reason to distinguish CC_ADCmode from CC_Cmode. Both examine only the C bit. Within uaddvti4, using CC_Cmode is clearer, since it's the carry-out that's relevant. * config/aarch64/aarch64-modes.def (CC_ADC): Remove. * con

[PATCH v2 11/11] aarch64: Implement absti2

2020-04-02 Thread Richard Henderson via Gcc-patches
* config/aarch64/aarch64.md (absti2): New. --- gcc/config/aarch64/aarch64.md | 29 + 1 file changed, 29 insertions(+) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index cf716f815a1..4a30d4cca93 100644 --- a/gcc/config/aarch64/aarch

[PATCH v2 08/11] aarch64: Accept -1 as second argument to add3_carryin

2020-04-02 Thread Richard Henderson via Gcc-patches
* config/aarch64/predicates.md (aarch64_reg_or_minus1): New. * config/aarch64/aarch64.md (add3_carryin): Use it. (*add3_carryin): Likewise. (*addsi3_carryin_uxtw): Likewise. --- gcc/config/aarch64/aarch64.md| 26 +++--- gcc/config/aarch64/pre

[PATCH v2 06/11] aarch64: Use UNSPEC_ADCS for add-with-carry + output flags

2020-04-02 Thread Richard Henderson via Gcc-patches
Similar to UNSPEC_SBCS, we can unify the signed/unsigned overflow paths by using an unspec. Accept -1 for the second input by using SBCS. * config/aarch64/aarch64.md (UNSPEC_ADCS): New. (addvti4, uaddvti4): Use adddi_carryin_cmp. (add3_carryinC): Remove. (*add3_car

[PATCH v2 10/11] aarch64: Implement TImode comparisons

2020-04-02 Thread Richard Henderson via Gcc-patches
Use ccmp to perform all TImode comparisons branchless. * config/aarch64/aarch64.c (aarch64_gen_compare_reg): Expand all of the comparisons for TImode, not just NE. * config/aarch64/aarch64.md (cbranchti4, cstoreti4): New. --- gcc/config/aarch64/aarch64.c | 122 +++
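
As background for this patch, a double-word comparison decomposes into comparisons of the two 64-bit halves. The C sketch below shows that decomposition only; it is not the expansion in aarch64_gen_compare_reg, and the struct and function names are hypothetical. A compiler may be able to turn the combined conditions into a compare followed by a conditional compare instead of branches.

```c
#include <stdbool.h>
#include <stdint.h>

/* A 128-bit value as two 64-bit halves (hypothetical type).  */
struct u128 { uint64_t lo, hi; };

/* Equal when both halves are equal.  */
static bool
eq128 (struct u128 a, struct u128 b)
{
  return a.lo == b.lo && a.hi == b.hi;
}

/* Unsigned less-than: the high halves decide unless they are equal.  */
static bool
ltu128 (struct u128 a, struct u128 b)
{
  return a.hi < b.hi || (a.hi == b.hi && a.lo < b.lo);
}

/* Signed less-than: high halves compare as signed, low halves as
   unsigned.  The casts assume the usual two's complement conversion.  */
static bool
lts128 (struct u128 a, struct u128 b)
{
  int64_t ah = (int64_t) a.hi, bh = (int64_t) b.hi;
  return ah < bh || (ah == bh && a.lo < b.lo);
}
```

Note that only the high half carries the sign, which is why the low halves are always compared as unsigned values.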

[PATCH v2 01/11] aarch64: Accept 0 as first argument to compares

2020-04-02 Thread Richard Henderson via Gcc-patches
While cmp (extended register) and cmp (immediate) uses , cmp (shifted register) uses . So we can perform cmp xzr, x0. For ccmp, we only have as an input. * config/aarch64/aarch64.md (cmp): For operand 0, use aarch64_reg_or_zero. Shuffle reg/reg to last alternative and a

[PATCH v2 00/11] aarch64: Implement TImode comparisons

2020-04-02 Thread Richard Henderson via Gcc-patches
This is attacking case 3 of PR 94174. In v2, I unify the various subtract-with-borrow and add-with-carry patterns that also output flags with unspecs. As suggested by Richard Sandiford during review of v1. It does seem cleaner. r~ Richard Henderson (11): aarch64: Accept 0 as first

[PATCH v2 02/11] aarch64: Accept zeros in add3_carryin

2020-04-02 Thread Richard Henderson via Gcc-patches
The expander and the insn pattern did not match, leading to recognition failures in expand. * config/aarch64/aarch64.md (*add3_carryin): Accept zeros. --- gcc/config/aarch64/aarch64.md | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/gcc/config/aarch64/aarch64.

[PATCH v2 03/11] aarch64: Provide expander for sub3_compare1

2020-04-02 Thread Richard Henderson via Gcc-patches
In one place we open-code a special case of this pattern into the more specific sub3_compare1_imm, and miss this special case in other places. Centralize that special case into an expander. * config/aarch64/aarch64.md (*sub3_compare1): Rename from sub3_compare1. (sub3_comp

Re: [PATCH v2 3/9] aarch64: Add cmp_*_carryinC patterns

2020-04-01 Thread Richard Henderson via Gcc-patches
On 4/1/20 9:28 AM, Richard Sandiford wrote: > How important is it to describe the flags operation as a compare though? > Could we instead use an unspec with three inputs, and keep it as :CC? > That would still allow special-case matching for zero operands. I'm not sure. My guess is that the only

Re: [PATCH v2 3/9] aarch64: Add cmp_*_carryinC patterns

2020-03-31 Thread Richard Henderson via Gcc-patches
On 3/31/20 11:34 AM, Richard Sandiford wrote: >> +(define_insn "*cmp3_carryinC" >> + [(set (reg:CC CC_REGNUM) >> +(compare:CC >> + (ANY_EXTEND: >> +(match_operand:GPI 0 "register_operand" "r")) >> + (plus: >> +(ANY_EXTEND: >> + (match_operand:GPI 1 "register_

Re: [PATCH v2 1/9] aarch64: Accept 0 as first argument to compares

2020-03-31 Thread Richard Henderson via Gcc-patches
On 3/31/20 9:55 AM, Richard Sandiford wrote: >> (define_insn "cmp" >>[(set (reg:CC CC_REGNUM) >> -(compare:CC (match_operand:GPI 0 "register_operand" "rk,rk,rk") >> -(match_operand:GPI 1 "aarch64_plus_operand" "r,I,J")))] >> +(compare:CC (match_operand:GPI 0 "aarch64_re

Re: [PATCH v2 7/9] aarch64: Adjust result of aarch64_gen_compare_reg

2020-03-22 Thread Richard Henderson
On 3/22/20 2:55 PM, Segher Boessenkool wrote: > Maybe this stuff would be simpler (and more obviously correct) if it > was more explicit CC_REGNUM is a fixed register, and the code would use > it directly everywhere? Indeed the biggest issue I have in this patch is what CC_MODE to expose from the

Re: [PATCH v2 3/9] aarch64: Add cmp_*_carryinC patterns

2020-03-22 Thread Richard Henderson via Gcc-patches
On 3/22/20 12:30 PM, Segher Boessenkool wrote: > Hi! > > On Fri, Mar 20, 2020 at 07:42:25PM -0700, Richard Henderson via Gcc-patches > wrote: >> Duplicate all usub_*_carryinC, but use xzr for the output when we >> only require the flags output. The signed versions use s

[PATCH v2 7/9] aarch64: Adjust result of aarch64_gen_compare_reg

2020-03-20 Thread Richard Henderson via Gcc-patches
Return the entire comparison expression, not just the cc_reg. This will allow the routine to adjust the comparison code as needed for TImode comparisons. Note that some users were passing e.g. EQ to aarch64_gen_compare_reg and then using gen_rtx_NE. Pass the proper code in the first place.

[PATCH v2 8/9] aarch64: Implement TImode comparisons

2020-03-20 Thread Richard Henderson via Gcc-patches
Use ccmp to perform all TImode comparisons branchless. * config/aarch64/aarch64.c (aarch64_gen_compare_reg): Expand all of the comparisons for TImode, not just NE. * config/aarch64/aarch64.md (cbranchti4, cstoreti4): New. --- gcc/config/aarch64/aarch64.c | 130 +++

[PATCH v2 6/9] aarch64: Introduce aarch64_expand_addsubti

2020-03-20 Thread Richard Henderson via Gcc-patches
Modify aarch64_expand_subvti into a form that handles all addition and subtraction, modulo, signed or unsigned overflow. Use expand_insn to put the operands into the proper form, and do not force values into register if not required. * config/aarch64/aarch64.c (aarch64_ti_split) New.

[PATCH v2 1/9] aarch64: Accept 0 as first argument to compares

2020-03-20 Thread Richard Henderson via Gcc-patches
While cmp (extended register) and cmp (immediate) uses , cmp (shifted register) uses . So we can perform cmp xzr, x0. For ccmp, we only have as an input. * config/aarch64/aarch64.md (cmp): For operand 0, use aarch64_reg_or_zero. Shuffle reg/reg to last alternative and a

[PATCH v2 9/9] aarch64: Implement absti2

2020-03-20 Thread Richard Henderson via Gcc-patches
* config/aarch64/aarch64.md (absti2): New. --- gcc/config/aarch64/aarch64.md | 30 ++ 1 file changed, 30 insertions(+) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 284a8038e28..7a112f89487 100644 --- a/gcc/config/aarch64/aarc

[PATCH v2 5/9] aarch64: Provide expander for sub3_compare1

2020-03-20 Thread Richard Henderson via Gcc-patches
In a couple of places we open-code a special case of this pattern into the more specific sub3_compare1_imm. Centralize that special case into an expander. * config/aarch64/aarch64.md (*sub3_compare1): Rename from sub3_compare1. (sub3_compare1): New expander. --- gcc/config

[PATCH v2 3/9] aarch64: Add cmp_*_carryinC patterns

2020-03-20 Thread Richard Henderson via Gcc-patches
Duplicate all usub_*_carryinC, but use xzr for the output when we only require the flags output. The signed versions use sign_extend instead of zero_extend for combine's benefit. These will be used shortly for TImode comparisons. * config/aarch64/aarch64.md (cmp3_carryinC): New.

[PATCH v2 4/9] aarch64: Add cmp_carryinC_m2

2020-03-20 Thread Richard Henderson via Gcc-patches
Combine will fold immediate -1 differently than the other *cmp*_carryinC* patterns. In this case we can use adcs with an xzr input, and it occurs frequently when comparing 128-bit values to small negative constants. * config/aarch64/aarch64.md (cmp_carryinC_m2): New. --- gcc/config/aarch

[PATCH v2 2/9] aarch64: Accept zeros in add3_carryin

2020-03-20 Thread Richard Henderson via Gcc-patches
The expander and the insn pattern did not match, leading to recognition failures in expand. * config/aarch64/aarch64.md (*add3_carryin): Accept zeros. --- gcc/config/aarch64/aarch64.md | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/gcc/config/aarch64/aarch64.

[PATCH v2 0/9] aarch64: Implement TImode comparisons

2020-03-20 Thread Richard Henderson via Gcc-patches
! .L1: ret .p2align 2,,3 - .L6: - bne .L4 - cmp x0, 1 - bhi .L1 .L4: b doit --- 11,19 subs x0, x0, x2 sbc x1, x1, xzr ! cmp x0, 2 ! sbcs xzr, x1, xzr ! blt .L4 ret .

Re: [PATCH 0/6] aarch64: Implement TImode comparisons

2020-03-19 Thread Richard Henderson via Gcc-patches
On 3/19/20 8:47 AM, Wilco Dijkstra wrote: > Hi Richard, > > Thanks for these patches - yes TI mode expansions can certainly be improved! > So looking at your expansions for signed compares, why not copy the optimal > sequence from 32-bit Arm? > > Any compare can be done in at most 2 instructions:

[PATCH 4/6] aarch64: Simplify @ccmp operands

2020-03-18 Thread Richard Henderson via Gcc-patches
The first two arguments were "reversed", in that operand 0 was not the output, but the input cc_reg. Remove operand 0 entirely, since we can get the input cc_reg from within the operand 3 comparison expression. This moves the output operand to index 0. * config/aarch64/aarch64.md (@ccmpc

[PATCH 2/6] aarch64: Adjust result of aarch64_gen_compare_reg

2020-03-18 Thread Richard Henderson via Gcc-patches
Return the entire comparison expression, not just the cc_reg. This will allow the routine to adjust the comparison code as needed for TImode comparisons. Note that some users were passing e.g. EQ to aarch64_gen_compare_reg and then using gen_rtx_NE. Pass the proper code in the first place.

[PATCH 6/6] aarch64: Implement TImode comparisons

2020-03-18 Thread Richard Henderson via Gcc-patches
Use ccmp to perform all TImode comparisons branchless. * config/aarch64/aarch64.c (aarch64_gen_compare_reg): Expand all of the comparisons for TImode, not just NE. * config/aarch64/aarch64.md (cbranchti4, cstoreti4): New. --- gcc/config/aarch64/aarch64.c | 182 +++
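
As a rough illustration of a branchless two-word comparison with a conditional compare, the C sketch below models 128-bit equality as a cmp on the low halves followed by a ccmp on the high halves. The helper names (`nzcv`, `cmp64`, `ccmp64`, `eq128`) are hypothetical, and the exact sequences the patch emits may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The four condition flags. */
struct nzcv {
    bool n, z, c, v;
};

/* CMP a, b: flags of a - b. */
static struct nzcv cmp64(uint64_t a, uint64_t b)
{
    struct nzcv f;
    uint64_t diff = a - b;

    f.n = (diff >> 63) != 0;
    f.z = diff == 0;
    f.c = a >= b;                                   /* no borrow */
    f.v = (((a ^ b) & (a ^ diff)) >> 63) != 0;      /* signed overflow */
    return f;
}

/* CCMP a, b, #fallback, cond: compare if cond held, else load fallback. */
static struct nzcv ccmp64(uint64_t a, uint64_t b,
                          struct nzcv fallback, bool cond)
{
    return cond ? cmp64(a, b) : fallback;
}

/* 128-bit equality without branching on the intermediate result. */
static bool eq128(uint64_t alo, uint64_t ahi, uint64_t blo, uint64_t bhi)
{
    const struct nzcv all_clear = { false, false, false, false };

    struct nzcv f = cmp64(alo, blo);            /* cmp  alo, blo         */
    f = ccmp64(ahi, bhi, all_clear, f.z);       /* ccmp ahi, bhi, #0, eq */
    return f.z;                                 /* EQ condition          */
}
```

If the low halves differ, the fallback flags leave Z clear, so the final EQ test fails without the high halves ever deciding the result.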

[PATCH 5/6] aarch64: Improve nzcv argument to ccmp

2020-03-18 Thread Richard Henderson via Gcc-patches
Currently we use %k to interpret an aarch64_cond_code value. This interpretation is done via an array, aarch64_nzcv_codes. The rtl is neither hindered nor harmed by using the proper nzcv value itself, so index the array earlier than later. This makes it easier to compare the rtl to the assembly. I

[PATCH 1/6] aarch64: Add ucmp_*_carryinC patterns for all usub_*_carryinC

2020-03-18 Thread Richard Henderson via Gcc-patches
Use xzr for the output when we only require the flags output. This will be used shortly for TImode comparisons. * config/aarch64/aarch64.md (ucmp3_carryinC): New. (*ucmp3_carryinC_z1): New. (*ucmp3_carryinC_z2): New. (*ucmp3_carryinC): New. --- gcc/config/aarch64/a

[PATCH 3/6] aarch64: Accept 0 as first argument to compares

2020-03-18 Thread Richard Henderson via Gcc-patches
While cmp (extended register) and cmp (immediate) uses <Wn|WSP>, cmp (shifted register) uses <Wn>. So we can perform cmp xzr, x0. For ccmp, we only have <Wn> as an input. * config/aarch64/aarch64.md (cmp): For operand 0, use aarch64_reg_or_zero. Shuffle reg/reg to last alternative and a

[PATCH 0/6] aarch64: Implement TImode comparisons

2020-03-18 Thread Richard Henderson via Gcc-patches
b.ne 20 // b.any + 1c: d65f03c0 ret 20: 1400b 0 r~ Richard Henderson (6): aarch64: Add ucmp_*_carryinC patterns for all usub_*_carryinC aarch64: Adjust result of aarch64_gen_compare_reg aarch64: Accept 0 as first argument to compares aarch

[arm, v3] Follow up for asm-flags (thumb1, ilp32)

2019-11-19 Thread Richard Henderson
I'm not sure what happened to v2. I can see it in my sent email, but it never made it to the mailing list, and possibly not to Richard E. either. So resending, with an extra testsuite fix for ilp32, spotted by Christophe. Re thumb1, rather than an ifdef in config/arm/aarch-common.c, as I did in

Re: [PATCH v2 6/6] aarch64: Add testsuite checks for asm-flag

2019-11-19 Thread Richard Henderson
On 11/19/19 9:29 AM, Christophe Lyon wrote: > On Mon, 18 Nov 2019 at 20:54, Richard Henderson > wrote: >> >> On 11/18/19 1:30 PM, Christophe Lyon wrote: >>> I'm sorry to notice that the last test (asm-flag-6.c) fails to execute >>> when compiling with -mab

Re: [PATCH v2 6/6] aarch64: Add testsuite checks for asm-flag

2019-11-18 Thread Richard Henderson
On 11/18/19 1:30 PM, Christophe Lyon wrote: > I'm sorry to notice that the last test (asm-flag-6.c) fails to execute > when compiling with -mabi=ilp32. I have less details than for Arm, > because here I'm using the Foundation Model as simulator instead of > Qemu. In addition, I'm using an old versi

Re: [PATCH v2 5/6] arm: Add testsuite checks for asm-flag

2019-11-18 Thread Richard Henderson
On 11/18/19 1:25 PM, Christophe Lyon wrote: > Hi Richard > > On Thu, 14 Nov 2019 at 11:08, Richard Henderson > wrote: >> >> Inspired by the tests in gcc.target/i386. Testing code generation, >> diagnostics, and execution. >> >>

[arm] Follow up for asm-flags vs thumb1

2019-11-14 Thread Richard Henderson
What I committed today does in fact ICE for thumb1, as you suspected. I'm currently testing the following vs arm-sim/ arm-sim/-mthumb arm-sim/-mcpu=cortex-a15/-mthumb. which, with the default cpu for arm-elf-eabi, should test all of arm, thumb1, thumb2. I'm not thrilled about the ifdef in

Re: [PATCH v2 4/6] arm, aarch64: Add support for __GCC_ASM_FLAG_OUTPUTS__

2019-11-14 Thread Richard Henderson
On 11/14/19 3:48 PM, Richard Earnshaw (lists) wrote: > On 14/11/2019 10:07, Richard Henderson wrote: >> Since all but a couple of lines is shared between the two targets, >> enable them both at once. >> >> * config/arm/aarch-common-protos.h (arm_md_asm_adjust): D

Re: [PATCH v2 4/6] arm, aarch64: Add support for __GCC_ASM_FLAG_OUTPUTS__

2019-11-14 Thread Richard Henderson
On 11/14/19 3:39 PM, Richard Earnshaw (lists) wrote: > Not had a chance to look at this in detail, but I don't see any support for > > 1) Thumb1 where we do not expose the condition codes at all > 2) Thumb2 where we need IT instructions along-side the conditional > instructions > themselves. > >

Re: [PATCH 0/4] Eliminate cc0 from m68k

2019-11-14 Thread Richard Henderson
On 11/13/19 8:35 PM, Jeff Law wrote: > On 11/13/19 6:04 AM, Bernd Schmidt wrote: >> The cc0 machinery allows for eliminating unnecessary comparisons by >> examining the effect instructions have on the flags registers. I have >> replicated that mechanism with a relatively modest amount of code based

Re: [PATCH v2 0/6] Implement asm flag outputs for arm + aarch64

2019-11-14 Thread Richard Henderson
On 11/14/19 2:08 PM, Kyrill Tkachov wrote: > Hi Richard, > > On 11/14/19 10:07 AM, Richard Henderson wrote: >> I've put the implementation into config/arm/aarch-common.c, so >> that it can be shared between the two targets.  This required >> a little bit of cleanup
