Re: [V2 PATCH] Handle bitop with INTEGER_CST in analyze_and_compute_bitop_with_inv_effect.

2023-11-08 Thread Hongtao Liu
On Wed, Nov 8, 2023 at 3:53 PM Richard Biener
 wrote:
>
> On Wed, Nov 8, 2023 at 2:18 AM Hongtao Liu  wrote:
> >
> > On Tue, Nov 7, 2023 at 10:34 PM Richard Biener
> >  wrote:
> > >
> > > On Tue, Nov 7, 2023 at 2:03 PM Hongtao Liu  wrote:
> > > >
> > > > On Tue, Nov 7, 2023 at 4:10 PM Richard Biener
> > > >  wrote:
> > > > >
> > > > > On Tue, Nov 7, 2023 at 7:08 AM liuhongt  wrote:
> > > > > >
> > > > > > analyze_and_compute_bitop_with_inv_effect assumes the first
> > > > > > operand is loop invariant, which is not the case when it's
> > > > > > INTEGER_CST.
> > > > > >
> > > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > > > > > Ok for trunk?
> > > > >
> > > > > So this addresses a missed optimization, right?  It seems to me that
> > > > > even with two SSA names we are only "lucky" when rhs1 is the invariant
> > > > > one.  So instead of swapping this way I'd do
> > > > Yes, it's a missed optimization.
> > > > And I think expr_invariant_in_loop_p (loop, match_op[1]) should be
> > > > enough: if match_op[1] is a loop invariant, the checks below must
> > > > reject it (there couldn't be any header_phi from its definition).
> > >
> > > Yes, all I said is that when you now care for op1 being INTEGER_CST
> > > it could also be an invariant SSA name and thus only after swapping 
> > > op0/op1
> > > we could have a successful match, no?
> > Sorry, the commit message is a little bit misleading.
> > At first, I just wanted to handle the INTEGER_CST case (with TREE_CODE
> > (match_op[1]) == INTEGER_CST), but then I realized that this could
> > probably be extended to the normal SSA_NAME case as well, so I used
> > expr_invariant_in_loop_p, which should theoretically be able to handle
> > the SSA_NAME case as well.
> >
> > If expr_invariant_in_loop_p (loop, match_op[1]) is true, then without
> > swapping it must return NULL_TREE because of the conditions below.
> > If expr_invariant_in_loop_p (loop, match_op[1]) is false, then with
> > swapping it must return NULL_TREE too.
> > So it covers both cases you mentioned; there's no need for a loop
> > iterating over the 2 match_ops for all conditions.
>
> Sorry if it appears we're going in circles ;)
>
> > 3692  if (TREE_CODE (match_op[1]) != SSA_NAME
> > 3693  || !expr_invariant_in_loop_p (loop, match_op[0])
> > 3694  || !(header_phi = dyn_cast <gphi *> (SSA_NAME_DEF_STMT (match_op[1])))
>
> but this only checks match_op[1] (an SSA name at this point) for being defined
> by the header PHI.  What if expr_invariant_in_loop_p (loop, match_op[1])
> and header_phi = dyn_cast <gphi *> (SSA_NAME_DEF_STMT (match_op[0])),
> which I think can happen when both ops are SSA names?
The whole condition is like

3692  if (TREE_CODE (match_op[1]) != SSA_NAME
3693  || !expr_invariant_in_loop_p (loop, match_op[0])
3694  || !(header_phi = dyn_cast <gphi *> (SSA_NAME_DEF_STMT (match_op[1])))
3695  || gimple_bb (header_phi) != loop->header  <- this would be true
if match_op[1] is an SSA_NAME and expr_invariant_in_loop_p is true
3696  || gimple_phi_num_args (header_phi) != 2)

If expr_invariant_in_loop_p (loop, match_op[1]) is true and it's an SSA_NAME,
then according to the code in expr_invariant_in_loop_p, the def_bb of its
defining statement is either NULL or does not belong to this loop; either
case will make gimple_bb (header_phi) != loop->header true.

1857  if (TREE_CODE (expr) == SSA_NAME)
1858{
1859  def_bb = gimple_bb (SSA_NAME_DEF_STMT (expr));
1860  if (def_bb
1861  && flow_bb_inside_loop_p (loop, def_bb))  -- def_bb is
NULL or it doesn't belong to the loop
1862return false;
1863
1864  return true;
1865}
1866
1867  if (!EXPR_P (expr))
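
For reference, the canonicalization being discussed could look roughly
like this (a sketch against the quoted code, not the committed patch):

/* Put the invariant operand (INTEGER_CST or invariant SSA name) into
   match_op[0], so the checks above see the header PHI in match_op[1].
   expr_invariant_in_loop_p is true for constants, so one test covers
   both cases.  */
if (expr_invariant_in_loop_p (loop, match_op[1]))
  std::swap (match_op[0], match_op[1]);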

>
> The only canonicalization we have is that constant operands are put second so
> it would have been more natural to write the matching with the other operand
> order (but likely you'd have been unlucky for the existing testcases then).
>
> > 3695  || gimple_bb (header_phi) != loop->header
> > 3696  || gimple_phi_num_args (header_phi) != 2)
> > 3697return NULL_TREE;
> > 3698
> > 3699  if (PHI_ARG_DEF_FROM_EDGE (header_phi, loop_latch_edge (loop)) != 
> > phidef)
> > 3700return NULL_TREE;
> >
> >
> > >
> > > > >
> > > > >  unsigned i;
> > > > >  for (i = 0; i < 2; ++i)
> > > > >if (TREE_CODE (match_op[i]) == SSA_NAME
> > > > >&& ...)
> > > > > break; /* found! */
> > > > >
> > > > >   if (i == 2)
> > > > > return NULL_TREE;
> > > > >   if (i == 0)
> > > > > std::swap (match_op[0], match_op[1]);
> > > > >
> > > > > to also handle a "swapped" pair of SSA names?
> > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > > PR tree-optimization/105735
> > > > > > PR tree-optimization/111972
> > > > > > * tree-scalar-evolution.cc
> > > > > > (analyze_and_compute_bitop_with_inv_effect): Handle bitop 
> > > > > > with
> > > > > > INTEGER_CST.
> > > > > >
> > > > > > gcc/testsuite/ChangeLog:
> > > > > >
> > > > > > * gcc.target/i386/pr105735-3.c: New test.
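
For illustration, the shape of loop this transform targets (a sketch in
the spirit of the PRs, not the actual new testcase):

/* The loop-carried AND with an invariant converges after the first
   iteration, so SCEV can compute the final value as a & 0xff whenever
   the loop runs at least once.  Before the patch, the INTEGER_CST
   operand defeated the matching.  */
unsigned
f (unsigned a, int n)
{
  for (int i = 0; i < n; i++)
    a &= 0xff;
  return a;
}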

Re: [PATCH] LoongArch: Remove redundant barrier instructions before LL-SC loops

2023-11-08 Thread Xi Ruoyao
On Wed, 2023-11-08 at 09:49 +0800, chenglulu wrote:
> 
> 在 2023/11/6 下午7:36, Xi Ruoyao 写道:
> > This is isomorphic to the LLVM changes [1-2].
> > 
> > On LoongArch, the LL and SC instructions have memory barrier semantics:
> > 
> > - LL:  + 
> > - SC:  + 
> > 
> > But the compare-and-swap operation is allowed to fail, and if it fails
> > the SC instruction is not executed, so acquire semantics cannot be
> > guaranteed.  Therefore, an acquire barrier needs to be generated when
> > failure_memorder includes an acquire operation.
> > 
> > On CPUs implementing LoongArch v1.10 or later, "dbar 0b10100" is an
> > acquire barrier; on CPUs implementing LoongArch v1.00, it is a full
> > barrier.  So it's always enough for acquire semantics.  OTOH if
> > acquire semantics are not needed, we still need the "dbar 0x700"
> > load-load barrier, like all LL-SC loops.
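
For readers unfamiliar with the sequence: a compare-and-swap loop on
LoongArch looks roughly like the sketch below (illustrative assembly
only, not the exact expansion the patch generates):

1:  ll.w   $t0, $a0, 0     # load-linked; has barrier semantics
    bne    $t0, $a1, 2f    # expected value mismatch: CAS fails and
                           # the sc.w below is never executed
    move   $t1, $a2
    sc.w   $t1, $a0, 0     # store-conditional; has barrier semantics
    beqz   $t1, 1b
    b      3f
2:  dbar   0b10100         # failure path: acquire barrier when
                           # failure_memorder includes acquire;
                           # otherwise the "dbar 0x700" load-load
                           # barrier is enough
3:  ...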
> 
> I don't think there's a problem with the logic.  I'm also working on
> correcting the content of the atomic functions now, and I'm running a
> correctness test that includes this modification; I'll email you back
> once the test is completed.

Ok.  I'd like to note that we now have only 10 days before GCC 14 stage
1 ends, so we'd better hurry.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH 1/7] ira: Refactor the handling of register conflicts to make it more general

2023-11-08 Thread Lehua Ding

Hi Richard,

Thanks for taking the time to review the code.

On 2023/11/8 15:57, Richard Biener wrote:

On Wed, Nov 8, 2023 at 4:48 AM Lehua Ding  wrote:


This patch does not make any functional changes. It mainly refactors two parts:

1. The ira_allocno's objects field is expanded into a scalable array, and
multi-word pseudo registers are split and tracked only when necessary.
2. Since the objects array has been expanded, more subreg objects can pass
through later, rather than the previous fixed two. Therefore, the detection
of whether two objects conflict must be modified; the check works by
shifting the registers occupied by an object back to the first register of
the allocno before comparing.


Did you profile this before/after?  RA performance is critical ...


Based on the data I ran earlier, the performance changes on spec2017 
were very slight. I'll run it again and give you the data. Based on my 
expectations, the impact on existing performance should be minimal, 
except for examples like the ones I put up.



diff --git a/gcc/hard-reg-set.h b/gcc/hard-reg-set.h
index b0bb9bce074..760eadba186 100644
--- a/gcc/hard-reg-set.h
+++ b/gcc/hard-reg-set.h
@@ -113,6 +113,39 @@ struct HARD_REG_SET
  return !operator== (other);
}

+  HARD_REG_SET
+  operator>> (unsigned int shift_amount) const


This is a quite costly operation, why do we need it instead
of keeping an "offset" for set queries?


Because there are logic operations after the shift. For a multi-hardreg 
pseudo register, we record the conflicting physical registers of each 
part separately, and the offsets of the different parts differ, so we 
need to normalize these differences to a conflict against the first 
single reg of the pseudo register. That is to say, first we convert each 
part to a conflict against the first_single_reg, and then we collect all 
the conflicting registers (by an OR operation), like this:


*start_conflict_regs |= OBJECT_TOTAL_CONFLICT_HARD_REGS (obj) >> 
(OBJECT_START (obj) + j)
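
For context, a word-wise shift over the whole set costs a loop over all
words per query; a hedged sketch of what such an operator amounts to
(the actual hunk is elided above):

HARD_REG_SET
operator>> (unsigned int n) const
{
  HARD_REG_SET res = {};
  const unsigned int bits = HOST_BITS_PER_WIDE_INT;
  unsigned int w = n / bits, b = n % bits;
  for (unsigned int i = 0; i + w < ARRAY_SIZE (elts); i++)
    {
      res.elts[i] = elts[i + w] >> b;
      if (b != 0 && i + w + 1 < ARRAY_SIZE (elts))
	res.elts[i] |= elts[i + w + 1] << (bits - b);
    }
  return res;
}

This per-query cost is what motivates Richard's suggestion to keep an
offset for set queries instead.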



+/* Return the object in allocno A which matches START and NREGS.  */
+ira_object_t
+find_object (ira_allocno_t a, int start, int nregs)
+{
+  for (ira_object_t obj : a->objects)


linear search?  really?


I was thinking about the fact that most allocnos have only one object, 
and most of the others don't have more than 10, so I chose the easiest 
way to find them. Thanks for the heads up; it's really not very good 
here, and I'll see if there's a faster way.
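
One possible direction (a sketch only; it assumes the series keeps
a->objects sorted by OBJECT_START and provides an OBJECT_NREGS accessor):

#include <algorithm>

ira_object_t
find_object (ira_allocno_t a, int start, int nregs)
{
  /* Binary search instead of the linear scan quoted above.  */
  auto it = std::lower_bound (a->objects.begin (), a->objects.end (),
			      start,
			      [] (ira_object_t obj, int s)
			      { return OBJECT_START (obj) < s; });
  if (it != a->objects.end ()
      && OBJECT_START (*it) == start
      && OBJECT_NREGS (*it) == nregs)
    return *it;
  return NULL;
}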


--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai



[PATCH] gcc.dg/Wmissing-parameter-type*: Test the intended warning

2023-11-08 Thread Florian Weimer
gcc/testsuite/ChangeLog:

* gcc.dg/Wmissing-parameter-type-Wextra.c: Build with -std=gnu89
to trigger the -Wmissing-parameter-type warning
and not the default -Wimplicit warning.  Also match
against -Wmissing-parameter-type.
* gcc.dg/Wmissing-parameter-type.c: Likewise.

---
 gcc/testsuite/gcc.dg/Wmissing-parameter-type-Wextra.c | 4 ++--
 gcc/testsuite/gcc.dg/Wmissing-parameter-type.c| 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/Wmissing-parameter-type-Wextra.c 
b/gcc/testsuite/gcc.dg/Wmissing-parameter-type-Wextra.c
index 37e1a571bda..2cd28a2ecd1 100644
--- a/gcc/testsuite/gcc.dg/Wmissing-parameter-type-Wextra.c
+++ b/gcc/testsuite/gcc.dg/Wmissing-parameter-type-Wextra.c
@@ -1,7 +1,7 @@
 /* Test -Wmissing-parameter-type is enabled by -Wextra */
 /* { dg-do compile } */
-/* { dg-options "-Wextra" } */
+/* { dg-options "-std=gnu89 -Wextra" } */
 
-int foo(bar) { return bar;} /* { dg-warning "type of 'bar' defaults to 'int'" 
} */
+int foo(bar) { return bar;} /* { dg-warning "type of 'bar' defaults to 'int' 
\\\[-Wmissing-parameter-type\\\]" } */
 
 
diff --git a/gcc/testsuite/gcc.dg/Wmissing-parameter-type.c 
b/gcc/testsuite/gcc.dg/Wmissing-parameter-type.c
index 8ec94e2caf7..b25e8d21602 100644
--- a/gcc/testsuite/gcc.dg/Wmissing-parameter-type.c
+++ b/gcc/testsuite/gcc.dg/Wmissing-parameter-type.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
-/* { dg-options "-Wmissing-parameter-type" } */
+/* { dg-options "-std=gnu89 -Wmissing-parameter-type" } */
 
-int foo(bar) { return bar; } /* { dg-warning "type of 'bar' defaults to 'int'" 
} */
+int foo(bar) { return bar; } /* { dg-warning "type of 'bar' defaults to 'int' 
\\\[-Wmissing-parameter-type\\\]" } */
 
 

base-commit: e9107464bb24f77038ad042ba858abed4ca060c0



Re: [PATCH 1/3] tree-ssa-sink: do not sink to in front of setjmp

2023-11-08 Thread Florian Weimer
* Alexander Monakov via Gcc-patches:

> diff --git a/gcc/testsuite/gcc.dg/setjmp-7.c b/gcc/testsuite/gcc.dg/setjmp-7.c
> new file mode 100644
> index 0..44b5bcbfa
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/setjmp-7.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-guess-branch-probability -w" } */
> +/* { dg-require-effective-target indirect_jumps } */
> +
> +struct __jmp_buf_tag { };
> +typedef struct __jmp_buf_tag jmp_buf[1];
> +struct globals { jmp_buf listingbuf; };
> +extern struct globals *const ptr_to_globals;
> +void foo()
> +{
> +if ( _setjmp ( ((*ptr_to_globals).listingbuf )))
> + ;
> +}

Is the implicit declaration of _setjmp important to this test?
Could we declare it explicitly instead?

Thanks,
Florian



[PATCH] RISC-V: Eliminate unused parameter warning.

2023-11-08 Thread Li Xu
From: xuli 

The parameter orig_fndecl is not used, so make it anonymous.

../.././gcc/gcc/config/riscv/riscv-c.cc: In function ‘bool 
riscv_check_builtin_call(location_t, vec, tree, tree, unsigned 
int, tree_node**)’:
../.././gcc/gcc/config/riscv/riscv-c.cc:207:11: warning: unused parameter 
‘orig_fndecl’ [-Wunused-parameter]
  tree orig_fndecl, unsigned int nargs, tree *args)
   ^~~

gcc/ChangeLog:

* config/riscv/riscv-c.cc (riscv_check_builtin_call): Eliminate warning.
---
 gcc/config/riscv/riscv-c.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index bedf7217390..b7f9ba204f7 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -204,7 +204,7 @@ riscv_pragma_intrinsic (cpp_reader *)
 /* Implement TARGET_CHECK_BUILTIN_CALL.  */
 static bool
 riscv_check_builtin_call (location_t loc, vec arg_loc, tree fndecl,
- tree orig_fndecl, unsigned int nargs, tree *args)
+ tree, unsigned int nargs, tree *args)
 {
   unsigned int code = DECL_MD_FUNCTION_CODE (fndecl);
   unsigned int subcode = code >> RISCV_BUILTIN_SHIFT;
-- 
2.17.1



Re: [PATCH] RISC-V: Eliminate unused parameter warning.

2023-11-08 Thread juzhe.zh...@rivai.ai
OK



juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2023-11-08 17:09
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; xuli
Subject: [PATCH] RISC-V: Eliminate unused parameter warning.


Re: [PATCH 0/7] ira/lra: Support subreg coalesce

2023-11-08 Thread Richard Sandiford
Lehua Ding  writes:
> Hi,
>
> These patches try to support the subreg coalescing feature in the
> register allocation passes (ira and lra).

Thanks a lot for the series.  This is definitely something we've
needed for a while.

I probably won't be able to look at it in detail for a couple of weeks
(and the real review should come from Vlad anyway), but one initial
comment:

Tracking subreg liveness will sometimes expose dead code that
wasn't obvious without it.  PR89606 has an example of this.
There the dead code was introduced by init-regs, and there's a
debate about (a) whether init-regs should still be run and (b) if it
should still be run, whether it should use subreg liveness tracking too.

But I think such dead code is possible even without init-regs.
So for the purpose of this series, I think the init-regs behaviour
in that PR creates a helpful example.

I agree with Richi of course that compile-time is a concern.
The patch seems to add quite a bit of new data to ira_allocno,
but perhaps that's OK.  ira_object + ira_allocno is already quite big.

However:

@@ -387,8 +398,8 @@ struct ira_allocno
   /* An array of structures describing conflict information and live
  ranges for each object associated with the allocno.  There may be
  more than one such object in cases where the allocno represents a
- multi-word register.  */
-  ira_object_t objects[2];
+ multi-hardreg pseudo.  */
+  std::vector objects;
   /* Registers clobbered by intersected calls.  */
HARD_REG_SET crossed_calls_clobbered_regs;
   /* Array of usage costs (accumulated and the one updated during

adds an extra level of indirection (and separate extra storage) for
every allocno, not just multi-hardreg ones.  It'd be worth optimising
the data structures' representation of single-hardreg pseudos even if
that slows down the multi-hardreg code, since single-hardreg pseudos are
so much more common.  And the different single-hardreg and multi-hardreg
representations could be hidden behind accessors, to make life easier
for consumers.  (Of course, performance of the accessors is also then
an issue. :))
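
A sketch of the kind of layout that suggestion implies (illustrative
names, not from the patch):

/* Keep the common single-object case inline; only multi-hardreg
   pseudos pay for a separately allocated array.  */
struct allocno_objects
{
  unsigned int n;
  union
  {
    ira_object_t one;    /* n == 1: the overwhelmingly common case.  */
    ira_object_t *many;  /* n > 1: heap array of n entries.  */
  } u;
};

static inline ira_object_t
allocno_object (const allocno_objects &objs, unsigned int i)
{
  return objs.n == 1 ? objs.u.one : objs.u.many[i];
}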

Richard


Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-08 Thread Richard Sandiford
"Kewen.Lin"  writes:
> Hi,
>
> Gentle ping this:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634201.html

Sorry for the lack of review on this.  Personally, I've never looked
at this part of code base in detail, so I don't think I can do a proper
review.  I'll try to have a look in stage 3 if no one more qualified
beats me to it.

Thanks,
Richard

>
> BR,
> Kewen
>
> on 2023/10/25 10:45, Kewen.Lin wrote:
>> Hi,
>> 
>> This is almost a repost for v2 which was posted at[1] in March
>> excepting for:
>>   1) rebased from r14-4810 which is relatively up-to-date,
>>  some conflicts on "int to bool" return type change have
>>  been resolved;
>>   2) adjust commit log a bit;
>>   3) fix misspelled "articial" with "artificial" somewhere;
>> 
>> --
>> *v2 comments*:
>> 
>> By addressing Alexander's comments, against v1 this
>> patch v2 mainly:
>> 
>>   - Rename no_real_insns_p to no_real_nondebug_insns_p;
>>   - Introduce enum rgn_bb_deps_free_action for three
>> kinds of actions to free deps;
>>   - Change function free_deps_for_bb_no_real_insns_p to
>> resolve_forw_deps which only focuses on forward deps;
>>   - Extend the handlings to cover dbg-cnt sched_block,
>> add one test case for it;
>>   - Move free_trg_info call in schedule_region to an
>> appropriate place.
>> 
>> One thing I'm not sure about is the change in function
>> sched_rgn_local_finish, currently the invocation to
>> sched_rgn_local_free is guarded with !sel_sched_p (),
>> so I just follow it, but the initialization of those
>> structures (in sched_rgn_local_init) isn't guarded
>> with !sel_sched_p (), it looks odd.
>> 
>> --
>> 
>> As PR108273 shows, when there is one block which only has
>> NOTE_P and LABEL_P insns at non-debug mode while has some
>> extra DEBUG_INSN_P insns at debug mode, after scheduling
>> it, the DFA states would be different between debug mode
>> and non-debug mode.  Since at non-debug mode, the block
>> meets no_real_insns_p, it gets skipped; while at debug
>> mode, it gets scheduled, even it only has NOTE_P, LABEL_P
>> and DEBUG_INSN_P, the call of function advance_one_cycle
>> will change the DFA state.  PR108519 also shows this issue
>> can be exposed by some scheduler changes.
>> 
>> This patch is to change function no_real_insns_p to
>> function no_real_nondebug_insns_p by taking debug insn into
>> account, which makes us not try to schedule the block
>> having only NOTE_P, LABEL_P and DEBUG_INSN_P insns,
>> resulting in consistent DFA states between non-debug and
>> debug mode.
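
The rename amounts to something like the sketch below, modelled on the
existing no_real_insns_p (an illustration, not the patch text):

/* Return true if the block delimited by HEAD and TAIL contains nothing
   the scheduler must model: only notes, labels and, with this change,
   debug insns.  */
bool
no_real_nondebug_insns_p (const rtx_insn *head, const rtx_insn *tail)
{
  while (head != NEXT_INSN (tail))
    {
      if (!(NOTE_P (head) || LABEL_P (head) || DEBUG_INSN_P (head)))
	return false;
      head = NEXT_INSN (head);
    }
  return true;
}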
>> 
>> Changing no_real_insns_p to no_real_nondebug_insns_p caused an
>> ICE when doing free_block_dependencies.  The root cause is
>> that we create dependencies for debug insns; those
>> dependencies are expected to be resolved during scheduling
>> insns, but they get skipped after this change.
>> By checking the code, it looks reasonable to skip computing
>> block dependences for no_real_nondebug_insns_p
>> blocks.  There is also another issue, exposed when building
>> the SPEC2017 benchmarks at -O2 -g: we could skip
>> scheduling some block which already has its dependency
>> graph built, so it has dependencies computed and rgn_n_insns
>> accumulated; then the later verification that the graph
>> becomes exhausted by scheduling would fail as follows:
>> 
>>   /* Sanity check: verify that all region insns were
>>  scheduled.  */
>> gcc_assert (sched_rgn_n_insns == rgn_n_insns);
>> 
>> , and also some forward deps aren't resolved.
>> 
>> As Alexander pointed out, the current debug count handling
>> also suffers the similar issue, so this patch handles these
>> two cases together: one is for some block gets skipped by
>> !dbg_cnt (sched_block), the other is for some block which
>> is not no_real_nondebug_insns_p initially but becomes
>> no_real_nondebug_insns_p due to speculative scheduling.
>> 
>> This patch can be bootstrapped and regress-tested on
>> x86_64-redhat-linux, aarch64-linux-gnu and
>> powerpc64{,le}-linux-gnu.
>> 
>> I also verified this patch can pass SPEC2017 both intrate
>> and fprate bmks building at -g -O2/-O3.
>> 
>> Any thoughts?  Is it ok for trunk?
>> 
>> [1] v2: https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614818.html
>> [2] v1: https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614224.html
>> 
>> BR,
>> Kewen
>> -
>>  PR rtl-optimization/108273
>> 
>> gcc/ChangeLog:
>> 
>>  * haifa-sched.cc (no_real_insns_p): Rename to ...
>>  (no_real_nondebug_insns_p): ... this, and consider DEBUG_INSN_P insn.
>>  * sched-ebb.cc (schedule_ebb): Replace no_real_insns_p with
>>  no_real_nondebug_insns_p.
>>  * sched-int.h (no_real_insns_p): Rename to ...
>>  (no_real_nondebug_insns_p): ... this.
>>  * sched-rgn.cc (enum rgn_bb_deps_free_action): New enum.
>>  (bb_deps_free_actions): New static variable.
>>  (compute_block_dependences): Skip for no_real_nondebug_insns_p.
>>  (resolve_forw_deps): New function.
>>  (f

Re: [PATCH 5/5] aarch64: Add rsr128 and wsr128 ACLE tests

2023-11-08 Thread Christophe Lyon




On 11/7/23 23:51, Richard Sandiford wrote:

Victor Do Nascimento  writes:

Extend existing unit tests for the ACLE system register manipulation
functions to include 128-bit tests.

gcc/testsuite/ChangeLog:

* gcc/testsuite/gcc.target/aarch64/acle/rwsr.c (get_rsr128): New.
(set_wsr128): Likewise.
---
  gcc/testsuite/gcc.target/aarch64/acle/rwsr.c | 30 +++-
  1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/acle/rwsr.c 
b/gcc/testsuite/gcc.target/aarch64/acle/rwsr.c
index 3af4b960306..e7725022316 100644
--- a/gcc/testsuite/gcc.target/aarch64/acle/rwsr.c
+++ b/gcc/testsuite/gcc.target/aarch64/acle/rwsr.c
@@ -1,11 +1,15 @@
  /* Test the __arm_[r,w]sr ACLE intrinsics family.  */
  /* Check that function variants for different data types handle types 
correctly.  */
  /* { dg-do compile } */
-/* { dg-options "-O1 -march=armv8.4-a" } */
+/* { dg-options "-O1 -march=armv9.4-a+d128" } */
  /* { dg-final { check-function-bodies "**" "" } } */


I'm nervous about having our only tests for 64-bit reads and writes
using such a high minimum version.  Could the file instead be compiled
without any minimum architecture and have tests that work with plain
-march=armv8-a?  Then the test could switch to other architectures
where necessary using #pragma GCC target.  This test...


  #include 
  
+#ifndef __ARM_FEATURE_SYSREG128

+#error "__ARM_FEATURE_SYSREG128 feature macro not defined."
+#endif
+


...would still work with a #pragma GCC target.
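
Something along these lines (a sketch of that suggestion; the exact
pragma string is an assumption):

/* { dg-do compile } */
/* { dg-options "-O1 -march=armv8-a" } */

#include <arm_acle.h>

/* ... 64-bit tests compiled at the baseline architecture ... */

#pragma GCC target ("arch=armv9.4-a+d128")

#ifndef __ARM_FEATURE_SYSREG128
#error "__ARM_FEATURE_SYSREG128 feature macro not defined."
#endif

__uint128_t
get_rsr128 ()
{
  return __arm_rsr128 ("par_el1");
}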



Or maybe add a new test file for 128 bit sysregs, and thus have two test 
files, the existing one for 64 bit sysregs, and the new one for 128 bit 
sysregs?


Thanks,

Christophe




Thanks,
Richard


  /*
  ** get_rsr:
  ** ...
@@ -66,6 +70,17 @@ get_rsrf64 ()
return __arm_rsrf64("trcseqstr");
  }
  
+/*

+** get_rsr128:
+** mrrs x0, x1, s3_0_c7_c4_0
+** ...
+*/
+__uint128_t
+get_rsr128 ()
+{
+  return __arm_rsr128 ("par_el1");
+}
+
  /*
  ** set_wsr32:
  ** ...
@@ -129,6 +144,18 @@ set_wsrf64(double a)
__arm_wsrf64("trcseqstr", a);
  }
  
+/*

+** set_wsr128:
+** ...
+** msrr s3_0_c7_c4_0, x0, x1
+** ...
+*/
+void
+set_wsr128 (__uint128_t c)
+{
+  __arm_wsr128 ("par_el1", c);
+}
+
  /*
  ** set_custom:
  ** ...
@@ -142,3 +169,4 @@ void set_custom()
__uint64_t b = __arm_rsr64("S1_2_C3_C4_5");
__arm_wsr64("S1_2_C3_C4_5", b);
  }
+


Re: [PATCH 1/3] tree-ssa-sink: do not sink to in front of setjmp

2023-11-08 Thread Richard Biener



> Am 08.11.2023 um 10:04 schrieb Florian Weimer :
> 
> * Alexander Monakov via Gcc-patches:
> 
>> diff --git a/gcc/testsuite/gcc.dg/setjmp-7.c 
>> b/gcc/testsuite/gcc.dg/setjmp-7.c
>> new file mode 100644
>> index 0..44b5bcbfa
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/setjmp-7.c
>> @@ -0,0 +1,13 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fno-guess-branch-probability -w" } */
>> +/* { dg-require-effective-target indirect_jumps } */
>> +
>> +struct __jmp_buf_tag { };
>> +typedef struct __jmp_buf_tag jmp_buf[1];
>> +struct globals { jmp_buf listingbuf; };
>> +extern struct globals *const ptr_to_globals;
>> +void foo()
>> +{
>> +if ( _setjmp ( ((*ptr_to_globals).listingbuf )))
>> +;
>> +}
> 
> Is the implicit declaration of _setjmp important to this test?
> Could we declare it explicitly instead?

It shouldn’t be important.

> Thanks,
> Florian
> 


Re: [PATCH 5/5] aarch64: Add rsr128 and wsr128 ACLE tests

2023-11-08 Thread Richard Sandiford
Christophe Lyon  writes:
> On 11/7/23 23:51, Richard Sandiford wrote:
>> Victor Do Nascimento  writes:
>>> Extend existing unit tests for the ACLE system register manipulation
>>> functions to include 128-bit tests.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * gcc/testsuite/gcc.target/aarch64/acle/rwsr.c (get_rsr128): New.
>>> (set_wsr128): Likewise.
>>> ---
>>>   gcc/testsuite/gcc.target/aarch64/acle/rwsr.c | 30 +++-
>>>   1 file changed, 29 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/acle/rwsr.c 
>>> b/gcc/testsuite/gcc.target/aarch64/acle/rwsr.c
>>> index 3af4b960306..e7725022316 100644
>>> --- a/gcc/testsuite/gcc.target/aarch64/acle/rwsr.c
>>> +++ b/gcc/testsuite/gcc.target/aarch64/acle/rwsr.c
>>> @@ -1,11 +1,15 @@
>>>   /* Test the __arm_[r,w]sr ACLE intrinsics family.  */
>>>   /* Check that function variants for different data types handle types 
>>> correctly.  */
>>>   /* { dg-do compile } */
>>> -/* { dg-options "-O1 -march=armv8.4-a" } */
>>> +/* { dg-options "-O1 -march=armv9.4-a+d128" } */
>>>   /* { dg-final { check-function-bodies "**" "" } } */
>> 
>> I'm nervous about having our only tests for 64-bit reads and writes
>> using such a high minimum version.  Could the file instead be compiled
>> without any minimum architecture and have tests that work with plain
>> -march=armv8-a?  Then the test could switch to other architectures
>> where necessary using #pragma GCC target.  This test...
>> 
>>>   #include 
>>>   
>>> +#ifndef __ARM_FEATURE_SYSREG128
>>> +#error "__ARM_FEATURE_SYSREG128 feature macro not defined."
>>> +#endif
>>> +
>> 
>> ...would still work with a #pragma GCC target.
>> 
>
> Or maybe add a new test file for 128 bit sysregs, and thus have two test 
> files, the existing one for 64 bit sysregs, and the new one for 128 bit 
> sysregs?

Yeah, that would be ok too, but what I was suggesting would lead to
at least three groups of tests rather than two.

Thanks,
Richard




[PATCH] Improve C99 compatibility of gcc.dg/setjmp-7.c test

2023-11-08 Thread Florian Weimer
gcc/testsuite/ChangeLog:

* gcc.dg/setjmp-7.c (_setjmp): Declare.

---
 gcc/testsuite/gcc.dg/setjmp-7.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/setjmp-7.c b/gcc/testsuite/gcc.dg/setjmp-7.c
index 44b5bcbfa9d..579542380ba 100644
--- a/gcc/testsuite/gcc.dg/setjmp-7.c
+++ b/gcc/testsuite/gcc.dg/setjmp-7.c
@@ -6,6 +6,7 @@ struct __jmp_buf_tag { };
 typedef struct __jmp_buf_tag jmp_buf[1];
 struct globals { jmp_buf listingbuf; };
 extern struct globals *const ptr_to_globals;
+int _setjmp (struct __jmp_buf_tag __env[1]);
 void foo()
 {
 if ( _setjmp ( ((*ptr_to_globals).listingbuf )))

base-commit: 0610f88cc6b23393b43541b73e50446cb8ab9c57



Re: [PATCH] Improve C99 compatibility of gcc.dg/setjmp-7.c test

2023-11-08 Thread Jakub Jelinek
On Wed, Nov 08, 2023 at 11:19:10AM +0100, Florian Weimer wrote:
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/setjmp-7.c (_setjmp): Declare.

Ok.

Jakub



[PATCH] Middle-end: Fix bug of induction variable vectorization for RVV

2023-11-08 Thread Juzhe-Zhong
PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438

The SELECT_VL result is not necessarily VF in non-final iterations.

Current GIMPLE IR is wrong:

# vect_vec_iv_.21_25 = PHI <_24(4), { 0, 1, 2, ... }(3)>
...
_24 = vect_vec_iv_.21_25 + { POLY_INT_CST [4, 4], ... };

After this patch which is correct for SELECT_VL:

# vect_vec_iv_.8_22 = PHI <_21(4), { 0, 1, 2, ... }(3)>
...
_35 = .SELECT_VL (ivtmp_33, POLY_INT_CST [4, 4]);
_21 = vect_vec_iv_.8_22 + { POLY_INT_CST [4, 4], ... };
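
A scalar model of the problem (illustrative pseudo-C; select_vl stands
in for the .SELECT_VL internal function):

/* With SELECT_VL an iteration may process fewer than VF elements even
   before the final iteration, so the IV must advance by the returned
   length rather than by a constant VF step.  */
for (int i = 0; i < n; )
  {
    int vl = select_vl (n - i, VF);  /* may be < VF */
    /* ... process elements [i, i + vl) ... */
    i += vl;                         /* i += VF would be wrong */
  }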

kito, could you give more explanation ?

PR middle-end/112438

gcc/ChangeLog:

* tree-vect-loop.cc (vectorizable_induction): Fix bug.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr112438.c: New test.

---
 .../gcc.target/riscv/rvv/autovec/pr112438.c   | 35 +
 gcc/tree-vect-loop.cc | 39 +++
 2 files changed, 67 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112438.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112438.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112438.c
new file mode 100644
index 000..b326d56a52c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112438.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -fno-vect-cost-model 
-ffast-math -fdump-tree-optimized-details" } */
+
+void
+foo (int n, int *__restrict in, int *__restrict out)
+{
+  for (int i = 0; i < n; i += 1)
+{
+  out[i] = in[i] + i;
+}
+}
+
+void
+foo2 (int n, float * __restrict in, 
+float * __restrict out)
+{
+  for (int i = 0; i < n; i += 1)
+{
+  out[i] = in[i] + i;
+}
+}
+
+void
+foo3 (int n, float * __restrict in, 
+float * __restrict out, float x)
+{
+  for (int i = 0; i < n; i += 1)
+{
+  out[i] = in[i] + i* i;
+}
+}
+
+/* We don't want to see vect_vec_iv_.21_25 + { POLY_INT_CST [4, 4], ... }.  */
+/* { dg-final { scan-tree-dump-not "\\+ \{ POLY_INT_CST" "optimized" } } */
+
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index a544bc9b059..3e103946168 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -10309,10 +10309,30 @@ vectorizable_induction (loop_vec_info loop_vinfo,
 new_name = step_expr;
   else
 {
+  gimple_seq seq = NULL;
+  if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
+   {
+ /* When we're using loop_len produced by SELECT_VL, the non-final
+iterations are not always processing VF elements.  So vectorize
+induction variable instead of
+
+  _21 = vect_vec_iv_.6_22 + { VF, ... };
+
+We should generate:
+
+  _35 = .SELECT_VL (ivtmp_33, VF);
+  vect_cst__22 = [vec_duplicate_expr] _35;
+  _21 = vect_vec_iv_.6_22 + vect_cst__22;  */
+ vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
+ tree len
+   = vect_get_loop_len (loop_vinfo, NULL, lens, 1, vectype, 0, 0);
+ expr = force_gimple_operand (fold_convert (TREE_TYPE (step_expr),
+unshare_expr (len)),
+  &seq, true, NULL_TREE);
+   }
   /* iv_loop is the loop to be vectorized. Generate:
  vec_step = [VF*S, VF*S, VF*S, VF*S]  */
-  gimple_seq seq = NULL;
-  if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (step_expr)))
+  else if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (step_expr)))
{
  expr = build_int_cst (integer_type_node, vf);
  expr = gimple_build (&seq, FLOAT_EXPR, TREE_TYPE (step_expr), expr);
@@ -10323,8 +10343,13 @@ vectorizable_induction (loop_vec_info loop_vinfo,
   expr, step_expr);
   if (seq)
{
- new_bb = gsi_insert_seq_on_edge_immediate (pe, seq);
- gcc_assert (!new_bb);
+ if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
+   gsi_insert_seq_before (&si, seq, GSI_SAME_STMT);
+ else
+   {
+ new_bb = gsi_insert_seq_on_edge_immediate (pe, seq);
+ gcc_assert (!new_bb);
+   }
}
 }
 
@@ -10332,9 +10357,9 @@ vectorizable_induction (loop_vec_info loop_vinfo,
   gcc_assert (CONSTANT_CLASS_P (new_name)
  || TREE_CODE (new_name) == SSA_NAME);
   new_vec = build_vector_from_val (step_vectype, t);
-  vec_step = vect_init_vector (loop_vinfo, stmt_info,
-  new_vec, step_vectype, NULL);
-
+  vec_step
+= vect_init_vector (loop_vinfo, stmt_info, new_vec, step_vectype,
+   LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo) ? &si : NULL);
 
   /* Create the following def-use cycle:
  loop prolog:
-- 
2.36.3



Re: [PATCH] Middle-end: Fix bug of induction variable vectorization for RVV

2023-11-08 Thread juzhe.zh...@rivai.ai
Sorry for the wrong description in the log:

After this patch, the IR is:

  _36 = .SELECT_VL (ivtmp_34, POLY_INT_CST [4, 4]);
  _22 = (int) _36;
  vect_cst__21 = [vec_duplicate_expr] _22;



juzhe.zh...@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-11-08 18:53
To: gcc-patches
CC: richard.sandiford; rguenther; kito.cheng; kito.cheng; Juzhe-Zhong
Subject: [PATCH] Middle-end: Fix bug of induction variable vectorization for RVV


[PATCH 0/3] RISC-V: Support CORE-V XCVELW and XCVBI extensions

2023-11-08 Thread Mary Bennett
This patch series presents a comprehensive implementation of the ELW and BI
extensions for CORE-V.

Tested with riscv-gnu-toolchain on binutils, ld, gas and gcc testsuites to
ensure its correctness and compatibility with the existing codebase.
However, your input, reviews, and suggestions are invaluable in making this
extension even more robust.

The CORE-V builtins are described in the specification [1] and work can be
found in the OpenHW group's Github repository [2].

[1] 
github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md

[2] github.com/openhwgroup/corev-gcc

Contributors:
  Mary Bennett 
  Nandni Jamnadas 
  Pietra Ferreira 
  Charlie Keaney
  Jessica Mills
  Craig Blackmore 
  Simon Cook 
  Jeremy Bennett 
  Helene Chelin 

RISC-V: Update XCValu constraints to match other vendors
RISC-V: Add support for XCVelw extension in CV32E40P
RISC-V: Add support for XCVbi extension in CV32E40P

 gcc/common/config/riscv/riscv-common.cc   |  4 ++
 gcc/config/riscv/constraints.md   | 21 +---
 gcc/config/riscv/corev.def|  3 ++
 gcc/config/riscv/corev.md | 33 -
 gcc/config/riscv/predicates.md|  4 ++
 gcc/config/riscv/riscv-builtins.cc|  2 +
 gcc/config/riscv/riscv-ftypes.def |  1 +
 gcc/config/riscv/riscv.md |  9 +++-
 gcc/config/riscv/riscv.opt|  4 ++
 gcc/doc/extend.texi   |  8 
 gcc/doc/sourcebuild.texi  |  6 +++
 .../gcc.target/riscv/cv-bi-beqimm-compile-1.c | 17 +++
 .../gcc.target/riscv/cv-bi-beqimm-compile-2.c | 48 +++
 .../gcc.target/riscv/cv-bi-bneimm-compile-1.c | 17 +++
 .../gcc.target/riscv/cv-bi-bneimm-compile-2.c | 48 +++
 .../gcc.target/riscv/cv-elw-elw-compile-1.c   | 11 +
 gcc/testsuite/lib/target-supports.exp | 26 ++
 17 files changed, 252 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bi-beqimm-compile-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bi-beqimm-compile-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bi-bneimm-compile-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bi-bneimm-compile-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-elw-elw-compile-1.c

-- 
2.34.1



[PATCH 1/3] RISC-V: Add support for XCVelw extension in CV32E40P

2023-11-08 Thread Mary Bennett
Spec: 
github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md

Contributors:
  Mary Bennett 
  Nandni Jamnadas 
  Pietra Ferreira 
  Charlie Keaney
  Jessica Mills
  Craig Blackmore 
  Simon Cook 
  Jeremy Bennett 
  Helene Chelin 

gcc/ChangeLog:
* common/config/riscv/riscv-common.cc: Add XCVelw.
* config/riscv/corev.def: Likewise.
* config/riscv/corev.md: Likewise.
* config/riscv/riscv-builtins.cc (AVAIL): Likewise.
* config/riscv/riscv-ftypes.def: Likewise.
* config/riscv/riscv.opt: Likewise.
* doc/extend.texi: Add XCVelw builtin documentation.
* doc/sourcebuild.texi: Likewise.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/cv-elw-elw-compile-1.c: Create test for cv.elw.
* lib/target-supports.exp: Add proc for the XCVelw extension.
---
 gcc/common/config/riscv/riscv-common.cc   |  2 ++
 gcc/config/riscv/corev.def|  3 +++
 gcc/config/riscv/corev.md | 15 +++
 gcc/config/riscv/riscv-builtins.cc|  2 ++
 gcc/config/riscv/riscv-ftypes.def |  1 +
 gcc/config/riscv/riscv.opt|  2 ++
 gcc/doc/extend.texi   |  8 
 gcc/doc/sourcebuild.texi  |  3 +++
 .../gcc.target/riscv/cv-elw-elw-compile-1.c   | 11 +++
 gcc/testsuite/lib/target-supports.exp | 13 +
 10 files changed, 60 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-elw-elw-compile-1.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 526dbb7603b..6a1978bd0e4 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -312,6 +312,7 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
 
   {"xcvmac", ISA_SPEC_CLASS_NONE, 1, 0},
   {"xcvalu", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"xcvelw", ISA_SPEC_CLASS_NONE, 1, 0},
 
   {"xtheadba", ISA_SPEC_CLASS_NONE, 1, 0},
   {"xtheadbb", ISA_SPEC_CLASS_NONE, 1, 0},
@@ -1667,6 +1668,7 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
 
   {"xcvmac",&gcc_options::x_riscv_xcv_subext, MASK_XCVMAC},
   {"xcvalu",&gcc_options::x_riscv_xcv_subext, MASK_XCVALU},
+  {"xcvelw",&gcc_options::x_riscv_xcv_subext, MASK_XCVELW},
 
   {"xtheadba",  &gcc_options::x_riscv_xthead_subext, MASK_XTHEADBA},
   {"xtheadbb",  &gcc_options::x_riscv_xthead_subext, MASK_XTHEADBB},
diff --git a/gcc/config/riscv/corev.def b/gcc/config/riscv/corev.def
index 17580df3c41..3b9ec029d06 100644
--- a/gcc/config/riscv/corev.def
+++ b/gcc/config/riscv/corev.def
@@ -41,3 +41,6 @@ RISCV_BUILTIN (cv_alu_subN, "cv_alu_subN", 
RISCV_BUILTIN_DIRECT, RISCV_SI_FT
 RISCV_BUILTIN (cv_alu_subuN,"cv_alu_subuN", RISCV_BUILTIN_DIRECT, 
RISCV_USI_FTYPE_USI_USI_UQI,  cvalu),
 RISCV_BUILTIN (cv_alu_subRN,"cv_alu_subRN", RISCV_BUILTIN_DIRECT, 
RISCV_SI_FTYPE_SI_SI_UQI, cvalu),
 RISCV_BUILTIN (cv_alu_subuRN,   "cv_alu_subuRN",RISCV_BUILTIN_DIRECT, 
RISCV_USI_FTYPE_USI_USI_UQI,  cvalu),
+
+// XCVELW
+RISCV_BUILTIN (cv_elw_elw_si, "cv_elw_elw", RISCV_BUILTIN_DIRECT, 
RISCV_USI_FTYPE_VOID_PTR, cvelw),
diff --git a/gcc/config/riscv/corev.md b/gcc/config/riscv/corev.md
index 1350bd4b81e..be66b1428a7 100644
--- a/gcc/config/riscv/corev.md
+++ b/gcc/config/riscv/corev.md
@@ -24,6 +24,9 @@
   UNSPEC_CV_ALU_CLIPR
   UNSPEC_CV_ALU_CLIPU
   UNSPEC_CV_ALU_CLIPUR
+
+  ;;CORE-V EVENT LOAD
+  UNSPECV_CV_ELW
 ])
 
 ;; XCVMAC extension.
@@ -691,3 +694,15 @@
   cv.suburnr\t%0,%2,%3"
   [(set_attr "type" "arith")
   (set_attr "mode" "SI")])
+
+;; XCVELW builtins
+(define_insn "riscv_cv_elw_elw_si"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+  (unspec_volatile [(mem:SI (match_operand:SI 1 "address_operand" "p"))]
+  UNSPECV_CV_ELW))]
+
+  "TARGET_XCVELW && !TARGET_64BIT"
+  "cv.elw\t%0,%a1"
+
+  [(set_attr "type" "load")
+  (set_attr "mode" "SI")])
diff --git a/gcc/config/riscv/riscv-builtins.cc 
b/gcc/config/riscv/riscv-builtins.cc
index fc3976f3ba1..5ee11ebe3bc 100644
--- a/gcc/config/riscv/riscv-builtins.cc
+++ b/gcc/config/riscv/riscv-builtins.cc
@@ -128,6 +128,7 @@ AVAIL (hint_pause, (!0))
 // CORE-V AVAIL
 AVAIL (cvmac, TARGET_XCVMAC && !TARGET_64BIT)
 AVAIL (cvalu, TARGET_XCVALU && !TARGET_64BIT)
+AVAIL (cvelw, TARGET_XCVELW && !TARGET_64BIT)
 
 /* Construct a riscv_builtin_description from the given arguments.
 
@@ -168,6 +169,7 @@ AVAIL (cvalu, TARGET_XCVALU && !TARGET_64BIT)
 #define RISCV_ATYPE_HI intHI_type_node
 #define RISCV_ATYPE_SI intSI_type_node
 #define RISCV_ATYPE_VOID_PTR ptr_type_node
+#define RISCV_ATYPE_INT_PTR integer_ptr_type_node
 
 /* RISCV_FTYPE_ATYPESN takes N RISCV_FTYPES-like type codes and lists
their associated RISCV_ATYPEs.  */
diff --git a/gcc/config/riscv/riscv-ftypes.def 
b/gcc/config/riscv/riscv-ftypes.def
index 0d1e4dd061e..3e7d5c69503 100644
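
For illustration, user code reaches the new pattern through the builtin
declared in corev.def above (a usage sketch; the option string is an
assumption):

/* Compile with, e.g., -march=rv32i_xcvelw.  */
unsigned int
poll_event (void *addr)
{
  /* Expands to the riscv_cv_elw_elw_si insn, i.e. "cv.elw".  */
  return __builtin_riscv_cv_elw_elw (addr);
}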

[PATCH 2/3] RISC-V: Update XCValu constraints to match other vendors

2023-11-08 Thread Mary Bennett
gcc/ChangeLog:
* config/riscv/constraints.md: CVP2 -> CV_alu_pow2.
* config/riscv/corev.md: Likewise.
---
 gcc/config/riscv/constraints.md | 15 ---
 gcc/config/riscv/corev.md   |  4 ++--
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md
index 68be4515c04..2711efe68c5 100644
--- a/gcc/config/riscv/constraints.md
+++ b/gcc/config/riscv/constraints.md
@@ -151,13 +151,6 @@
 (define_register_constraint "zmvr" "(TARGET_ZFA || TARGET_XTHEADFMV) ? GR_REGS 
: NO_REGS"
   "An integer register for  ZFA or XTheadFmv.")
 
-;; CORE-V Constraints
-(define_constraint "CVP2"
-  "Checking for CORE-V ALU clip if ival plus 1 is a power of 2"
-  (and (match_code "const_int")
-   (and (match_test "IN_RANGE (ival, 0, 1073741823)")
-(match_test "exact_log2 (ival + 1) != -1"
-
 ;; Vector constraints.
 
 (define_register_constraint "vr" "TARGET_VECTOR ? V_REGS : NO_REGS"
@@ -246,3 +239,11 @@
A MEM with a valid address for th.[l|s]*ur* instructions."
   (and (match_code "mem")
(match_test "th_memidx_legitimate_index_p (op, true)")))
+
+;; CORE-V Constraints
+(define_constraint "CV_alu_pow2"
+  "@internal
+   Checking for CORE-V ALU clip if ival plus 1 is a power of 2"
+  (and (match_code "const_int")
+   (and (match_test "IN_RANGE (ival, 0, 1073741823)")
+(match_test "exact_log2 (ival + 1) != -1"
diff --git a/gcc/config/riscv/corev.md b/gcc/config/riscv/corev.md
index be66b1428a7..0109e1836cf 100644
--- a/gcc/config/riscv/corev.md
+++ b/gcc/config/riscv/corev.md
@@ -516,7 +516,7 @@
 (define_insn "riscv_cv_alu_clip"
   [(set (match_operand:SI 0 "register_operand" "=r,r")
(unspec:SI [(match_operand:SI 1 "register_operand" "r,r")
-   (match_operand:SI 2 "immediate_register_operand" "CVP2,r")]
+   (match_operand:SI 2 "immediate_register_operand" 
"CV_alu_pow2,r")]
 UNSPEC_CV_ALU_CLIP))]
 
   "TARGET_XCVALU && !TARGET_64BIT"
@@ -529,7 +529,7 @@
 (define_insn "riscv_cv_alu_clipu"
   [(set (match_operand:SI 0 "register_operand" "=r,r")
(unspec:SI [(match_operand:SI 1 "register_operand" "r,r")
-   (match_operand:SI 2 "immediate_register_operand" "CVP2,r")]
+   (match_operand:SI 2 "immediate_register_operand" 
"CV_alu_pow2,r")]
 UNSPEC_CV_ALU_CLIPU))]
 
   "TARGET_XCVALU && !TARGET_64BIT"
-- 
2.34.1



[PATCH 3/3] RISC-V: Add support for XCVbi extension in CV32E40P

2023-11-08 Thread Mary Bennett
Spec: 
github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md

Contributors:
  Mary Bennett 
  Nandni Jamnadas 
  Pietra Ferreira 
  Charlie Keaney
  Jessica Mills
  Craig Blackmore 
  Simon Cook 
  Jeremy Bennett 
  Helene Chelin 


gcc/ChangeLog:
* common/config/riscv/riscv-common.cc: Create XCVbi extension
  support.
* config/riscv/riscv.opt: Likewise.
* config/riscv/corev.md: Implement cv_branch pattern
  for cv.beqimm and cv.bneimm.
* config/riscv/riscv.md: Change pattern priority so corev.md
  patterns run before riscv.md patterns.
* config/riscv/constraints.md: Implement constraints
  cv_bi_s5 - signed 5-bit immediate.
* config/riscv/predicates.md: Implement predicate
  const_int5s_operand - signed 5 bit immediate.
* doc/sourcebuild.texi: Add XCVbi documentation.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/cv-bi-beqimm-compile-1.c: New test.
* gcc.target/riscv/cv-bi-beqimm-compile-2.c: New test.
* gcc.target/riscv/cv-bi-bneimm-compile-1.c: New test.
* gcc.target/riscv/cv-bi-bneimm-compile-2.c: New test.
* lib/target-supports.exp: Add proc for XCVbi.
---
 gcc/common/config/riscv/riscv-common.cc   |  2 +
 gcc/config/riscv/constraints.md   |  6 +++
 gcc/config/riscv/corev.md | 14 ++
 gcc/config/riscv/predicates.md|  4 ++
 gcc/config/riscv/riscv.md |  9 +++-
 gcc/config/riscv/riscv.opt|  2 +
 gcc/doc/sourcebuild.texi  |  3 ++
 .../gcc.target/riscv/cv-bi-beqimm-compile-1.c | 17 +++
 .../gcc.target/riscv/cv-bi-beqimm-compile-2.c | 48 +++
 .../gcc.target/riscv/cv-bi-bneimm-compile-1.c | 17 +++
 .../gcc.target/riscv/cv-bi-bneimm-compile-2.c | 48 +++
 gcc/testsuite/lib/target-supports.exp | 13 +
 12 files changed, 182 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bi-beqimm-compile-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bi-beqimm-compile-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bi-bneimm-compile-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bi-bneimm-compile-2.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 6a1978bd0e4..04631e007f0 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -313,6 +313,7 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
   {"xcvmac", ISA_SPEC_CLASS_NONE, 1, 0},
   {"xcvalu", ISA_SPEC_CLASS_NONE, 1, 0},
   {"xcvelw", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"xcvbi", ISA_SPEC_CLASS_NONE, 1, 0},
 
   {"xtheadba", ISA_SPEC_CLASS_NONE, 1, 0},
   {"xtheadbb", ISA_SPEC_CLASS_NONE, 1, 0},
@@ -1669,6 +1670,7 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
   {"xcvmac",&gcc_options::x_riscv_xcv_subext, MASK_XCVMAC},
   {"xcvalu",&gcc_options::x_riscv_xcv_subext, MASK_XCVALU},
   {"xcvelw",&gcc_options::x_riscv_xcv_subext, MASK_XCVELW},
+  {"xcvbi", &gcc_options::x_riscv_xcv_subext, MASK_XCVBI},
 
   {"xtheadba",  &gcc_options::x_riscv_xthead_subext, MASK_XTHEADBA},
   {"xtheadbb",  &gcc_options::x_riscv_xthead_subext, MASK_XTHEADBB},
diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md
index 2711efe68c5..718b4bd77df 100644
--- a/gcc/config/riscv/constraints.md
+++ b/gcc/config/riscv/constraints.md
@@ -247,3 +247,9 @@
   (and (match_code "const_int")
(and (match_test "IN_RANGE (ival, 0, 1073741823)")
 (match_test "exact_log2 (ival + 1) != -1"
+
+(define_constraint "CV_bi_sign5"
+  "@internal
+   A 5-bit signed immediate for CORE-V Immediate Branch."
+  (and (match_code "const_int")
+   (match_test "IN_RANGE (ival, -16, 15)")))
diff --git a/gcc/config/riscv/corev.md b/gcc/config/riscv/corev.md
index 0109e1836cf..7d7b952d817 100644
--- a/gcc/config/riscv/corev.md
+++ b/gcc/config/riscv/corev.md
@@ -706,3 +706,17 @@
 
   [(set_attr "type" "load")
   (set_attr "mode" "SI")])
+
+;; XCVBI Builtins
+(define_insn "cv_branch"
+  [(set (pc)
+   (if_then_else
+(match_operator 1 "equality_operator"
+[(match_operand:X 2 "register_operand" "r")
+ (match_operand:X 3 "const_int5s_operand" 
"CV_bi_sign5")])
+(label_ref (match_operand 0 "" ""))
+(pc)))]
+  "TARGET_XCVBI"
+  "cv.b%C1imm\t%2,%3,%0"
+  [(set_attr "type" "branch")
+   (set_attr "mode" "none")])
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index a37d035fa61..69a6319c2c8 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -400,6 +400,10 @@
   (ior (match_operand 0 "register_operand")
(match_code "const_int")))
 
+(define_predicate "const_int5s_operand"
+  (and (match_code 
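
For illustration, the kind of source the new cv_branch pattern matches
(a hypothetical example, not one of the new tests):

extern void g (void);

void
f (int x)
{
  /* 10 fits CV_bi_sign5's [-16, 15] range, so with -march=..._xcvbi
     this can emit cv.beqimm/cv.bneimm instead of first loading 10
     into a register.  */
  if (x != 10)
    g ();
}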

[Committed] RISC-V: Fix VSETVL VL check condition bug

2023-11-08 Thread Juzhe-Zhong
When fixing the induction variable vectorization bug, I noticed there is an
ICE bug in the VSETVL pass:

0x178015b rtl_check_failed_code1(rtx_def const*, rtx_code, char const*, int, 
char const*)
../../../../gcc/gcc/rtl.cc:770
0x1079cdd rhs_regno(rtx_def const*)
../../../../gcc/gcc/rtl.h:1934
0x1dab360 vsetvl_info::parse_insn(rtl_ssa::insn_info*)
../../../../gcc/gcc/config/riscv/riscv-vsetvl.cc:1070
0x1daa272 vsetvl_info::vsetvl_info(rtl_ssa::insn_info*)
../../../../gcc/gcc/config/riscv/riscv-vsetvl.cc:746
0x1da5d98 pre_vsetvl::fuse_local_vsetvl_info()
../../../../gcc/gcc/config/riscv/riscv-vsetvl.cc:2708
0x1da94d9 pass_vsetvl::lazy_vsetvl()
../../../../gcc/gcc/config/riscv/riscv-vsetvl.cc:3444
0x1da977c pass_vsetvl::execute(function*)
../../../../gcc/gcc/config/riscv/riscv-vsetvl.cc:3504

Committed as it is obvious.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc: Fix ICE.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/vl-use-ice.c: New test.

---
 gcc/config/riscv/riscv-vsetvl.cc |  2 +-
 gcc/testsuite/gcc.target/riscv/rvv/base/vl-use-ice.c | 11 +++
 2 files changed, 12 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vl-use-ice.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 77dbf159d41..3fa25a6404d 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -1067,7 +1067,7 @@ public:
break;
  }
rtx avl = ::get_avl (rinsn);
-   if (!avl || REGNO (get_vl ()) != REGNO (avl))
+   if (!avl || !REG_P (avl) || REGNO (get_vl ()) != REGNO (avl))
  {
m_vl_used_by_non_rvv_insn = true;
break;
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/vl-use-ice.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/vl-use-ice.c
new file mode 100644
index 000..715c7e0cad2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/vl-use-ice.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=rv64gcv -mabi=lp64d" } */
+
+#include "riscv_vector.h"
+
+void foo(void *in1, void *out, size_t avl) {
+
+  size_t vl = __riscv_vsetvl_e32m1(avl);
+  vint32m1_t v = __riscv_vmv_v_x_i32m1 (vl, 16);
+  __riscv_vse32_v_i32m1 (out, v, 16);
+}
-- 
2.36.3



[avr,committed] Tweak IEEE double multiplication

2023-11-08 Thread Georg-Johann Lay

Applied this patch that improves IEEE double multiplication.
The old code consumed time for calling local helpers and for
preparing arguments.

Functions that use multiplication, like expl or sinl, are around 5%...9%
faster now.  The code size did not increase.

Johann

--

LibF7: Tweak IEEE double multiplication.

libgcc/config/avr/libf7/
* libf7-asm.sx (mul_mant) [AVR_HAVE_MUL]: Tweak code.


diff --git a/libgcc/config/avr/libf7/libf7-asm.sx 
b/libgcc/config/avr/libf7/libf7-asm.sx

index 4505764c126..01d1fa3e876 100644
--- a/libgcc/config/avr/libf7/libf7-asm.sx
+++ b/libgcc/config/avr/libf7/libf7-asm.sx
@@ -877,10 +877,14 @@ DEFUN ashldi3
 ;; R18.0 = 1: No rounding.

 DEFUN mul_mant
+;; 10 = Y, R17...R10
 do_prologue_saves 10
+;; T = R18.0: Skip rounding?
 bst r18,0
+;; Save result address for later.
 pushr25
 pushr24
+;; Load A's mantissa.
 movwZL, r22
 LDD A0, Z+0+Off
 LDD A1, Z+1+Off
@@ -913,26 +917,15 @@ DEFUN mul_mant
 adc C6, ZERO
 ;; Done B6

-;; 3 * 3 -> 0:a
-;; 4 * 4 -> 2:1
-;; 5 * 5 -> 4:3
-ldd BB, Z+3+Off $   mul A3, BB  $   movwTT0, r0
-ldd BB, Z+4+Off $   mul A4, BB  $   movwTT2, r0
-ldd BB, Z+5+Off $   mul A5, BB
-
-ADD CA, TT0 $   adc C0, TT1
-adc C1, TT2 $   adc C2, TT3
-adc C3, r0  $   adc C4, r1
-brcc .+2
-adiwC5, 1
-
 ;; 6 * 5 -> 5:4
 ;; 4 * 5 -> 3:2
 ;; 2 * 5 -> 1:0
 ;; 0 * 5 -> a:-
+ldd BB, Z+5+Off
 mul A0, BB
-;; A0 done
+;; Done A0
 #define Atmp A0
+#define Null A0

 mov Atmp, r1
 mul A6, BB  $   movwTT2, r0
@@ -942,82 +935,127 @@ DEFUN mul_mant
 ADD CA, Atmp
 adc C0, r0  $   adc C1, r1
 adc C2, TT0 $   adc C3, TT1
-adc C4, TT2 $   adc C5, TT3 $   clr ZERO
-adc C6, ZERO
+adc C4, TT2 $   adc C5, TT3 $   clr Null
+adc C6, Null

 ;; 1 * 5 -> 0:a
 ;; 3 * 5 -> 2:1
-;; 6 * 4 -> 4:3
+;; 5 * 5 -> 4:3
 mul A1, BB  $   movwTT0, r0
 mul A3, BB  $   movwTT2, r0
+mul A5, BB
+
+ADD CA, TT0 $   adc C0, TT1
+adc C1, TT2 $   adc C2, TT3
+adc C3, r0  $   adc C4, r1
+adc C5, Null$   adc C6, Null
+;; Done B5
+
+;; 2 * 4 -> 0:a
+;; 4 * 4 -> 2:1
+;; 6 * 4 -> 4:3
 ldd BB, Z+4+Off
+mul A2, BB  $   movwTT0, r0
+mul A4, BB  $   movwTT2, r0
 mul A6, BB

 ADD CA, TT0 $   adc C0, TT1
 adc C1, TT2 $   adc C2, TT3
-adc C3, r0  $   adc C4, r1  $   clr ZERO
-adc C5, ZERO$   adc C6, ZERO
-;; B5 done
+adc C3, r0  $   adc C4, r1
+adc C5, Null$   adc C6, Null

+;; 1 * 4 -> a:-
+;; 3 * 4 -> 1:0
+;; 5 * 4 -> 3:2
+mul A1, BB  $   mov TT1, r1
+mul A3, BB  $   movwTT2, r0
+mul A5, BB
+;; Done A1
+;; Done B4
+ADD CA, TT1
+adc C0, TT2 $   adc C1, TT3
+adc C2, r0  $   adc C3, r1
+;; Accumulate carry for C3 in TT1.
+;; Accumulate carry for C4 in A1.
+#define Cry3 TT1
+#define Cry4 A1
+clr Cry3
+clr Cry4
+rol Cry4
+
+;; 6 * 2 -> 2:1
 ;; 6 * 3 -> 3:2
-;; 6 * 1 -> 1:0
-;; 4 * 1 -> a:-
-mov TT0, A6 $   ldd TMP,  Z+3+Off
-mov BB,  A4 $   ldd Atmp, Z+1+Off
-rcall   .Lmul.help.3
+;; 5 * 3 -> 2:1
+ldd BB, Z+2+Off
+mul A6, BB
+add C1, r0
+adc C2, r1
+adc Cry3, Null

-;; 5 * 4 -> 3:2
-;; 5 * 2 -> 1:0
-;; 3 * 2 -> a:-
-mov TT0, A5 $   ldd TMP,  Z+4+Off
-mov BB,  A3 $   ldd Atmp, Z+2+Off
-rcall   .Lmul.help.3
+ldd BB, Z+3+Off
+mul A6, BB
+add C2, r0
+adc C3, r1
+adc Cry4, Null
+
+mul A5, BB
+add C1, r0
+adc C2, r1
+adc Cry3, Null

-;; 4 *   -> 3:2 (=0)
+;; Perform the remaining 11 multiplications in 4 loop iterations:
 ;; 4 * 3 -> 1:0
+;; 3 * 3 -> 0:a
 ;; 2 * 3 -> a:-
-mov TT0, A4 $   clr TMP
-mov BB,  A2 $   ldd Atmp, Z+3+Off
-rcall   .Lmul.help.3
-
-;; 3 * . -> 3:2 (=0)
-;; 3 * 4 -> 1:0
-;; 1 * 4 -> a:-
-mov TT0, A3 $   clr TMP
-mov BB,  A1 $   ldd Atmp, Z+4+Off
-rcall   .Lmul.help.3
-
-;; . * ? -> 3:2 (=0)
-;; . * 0 -> 1:0 (=0)
+;;
+;; 5 * 2 -> 1:0
+;; 4 * 2 -> 0:a
+;; 3 * 2 -> a:-
+;;
+;; 6 * 1 -> 1:0
+;; 5 * 1 -> 0:a
+;; 4 * 1 -> a:-
+;;
+;; . * 0 -> 1:0  (=0)
+;; 6 * 0 -> 0:a
 ;; 5 * 0 -> a:-
-clr TT0
-mov BB,  A5 $   ldd Atmp, Z+0+Off
-rcall   .Lmul.help.3

-clr TT3  ;; Asserted by .Lmul.help.2
-

Re: [PATCH v2 0/3] libgfortran: empty array fixes

2023-11-08 Thread Mikael Morin

Le 07/11/2023 à 19:16, Harald Anlauf a écrit :

Hi Mikael,

this is OK.

Thanks for the patches!

Harald



Patches pushed.
Thanks for the (fruitful) review.


Re: [PATCH 1/3] tree-ssa-sink: do not sink to in front of setjmp

2023-11-08 Thread Alexander Monakov

On Wed, 8 Nov 2023, Richard Biener wrote:

> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/setjmp-7.c
> >> @@ -0,0 +1,13 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-O2 -fno-guess-branch-probability -w" } */
> >> +/* { dg-require-effective-target indirect_jumps } */
> >> +
> >> +struct __jmp_buf_tag { };
> >> +typedef struct __jmp_buf_tag jmp_buf[1];
> >> +struct globals { jmp_buf listingbuf; };
> >> +extern struct globals *const ptr_to_globals;
> >> +void foo()
> >> +{
> >> +if ( _setjmp ( ((*ptr_to_globals).listingbuf )))
> >> +;
> >> +}
> > 
> > Is the implicit declaration of _setjmp important to this test?
> > Could we declare it explicitly instead?
> 
> It shouldn’t be important.

Yes, it's an artifact from testcase minimization, sorry about that.

Florian, I see you've sent a patch to fix this up — thank you!

Alexander

[PATCH] minimal support for xtheadv

2023-11-08 Thread chenyixuan
From: XYenChi 

This patch adds minimal support for xtheadv.

gcc/ChangeLog:

2023-11-08  Chen Yixuan  

* common/config/riscv/riscv-common.cc: Add minimal xtheadv support.

gcc/config/ChangeLog:

2023-11-08  Chen Yixuan  

* riscv/riscv.opt: Add minimal xtheadv support.
---
 gcc/common/config/riscv/riscv-common.cc | 2 ++
 gcc/config/riscv/riscv.opt  | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 526dbb7603b..d5ea0ee9b70 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -325,6 +325,7 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
   {"xtheadmemidx", ISA_SPEC_CLASS_NONE, 1, 0},
   {"xtheadmempair", ISA_SPEC_CLASS_NONE, 1, 0},
   {"xtheadsync", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"xtheadv",ISA_SPEC_CLASS_NONE, 0, 7},
 
   {"xventanacondops", ISA_SPEC_CLASS_NONE, 1, 0},
 
@@ -1680,6 +1681,7 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
   {"xtheadmemidx",  &gcc_options::x_riscv_xthead_subext, MASK_XTHEADMEMIDX},
   {"xtheadmempair", &gcc_options::x_riscv_xthead_subext, MASK_XTHEADMEMPAIR},
   {"xtheadsync",&gcc_options::x_riscv_xthead_subext, MASK_XTHEADSYNC},
+  {"xtheadv",   &gcc_options::x_riscv_xthead_subext, MASK_XTHEADV},
 
   {"xventanacondops", &gcc_options::x_riscv_xventana_subext, 
MASK_XVENTANACONDOPS},
 
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 70d78151cee..2bbdf680fa2 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -438,6 +438,8 @@ Mask(XTHEADMEMPAIR) Var(riscv_xthead_subext)
 
 Mask(XTHEADSYNC)Var(riscv_xthead_subext)
 
+Mask(XTHEADV)   Var(riscv_xthead_subext)
+
 TargetVariable
 int riscv_xventana_subext
 
-- 
2.42.0



[PATCH] RISC-V: Removed unnecessary sign-extend for vsetvl

2023-11-08 Thread Lehua Ding
Hi,

This patch tries to combine the two insns below and then remove the
unnecessary sign_extend operation. This optimization is borrowed
from LLVM (https://godbolt.org/z/4f6v56xej):
  (set (reg:DI 134 [ _1 ])
   (unspec:DI [
   (const_int 19 [0x13])
   (const_int 8 [0x8])
   (const_int 5 [0x5])
   (const_int 2 [0x2]) repeated x2
   ] UNSPEC_VSETVL))
  (set (reg/v:DI 135 [  ])
  (sign_extend:DI (subreg:SI (reg:DI 134 [ _1 ]) 0)))

The reason we can remove the sign_extend is that currently the vl value
returned by the vsetvl instruction ranges from 0 to 65536, so bits 17 to
63 (including bit 31) are always 0 and nothing changes after
sign_extend. Note that for HI and QI modes we cannot do this.
Of course, if the range of values returned by vsetvl is later extended
to the full 32 bits, then this combine pattern needs to be removed. But
that could be a long time from now.
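
As a quick sanity check of that argument (my own sketch, not part of the
patch): sign-extending a DImode value through a 32-bit truncation is an
identity operation whenever bit 31 is clear, which holds for every vl
the instruction can currently produce.

#include <stdint.h>
#include <assert.h>

/* Models (sign_extend:DI (subreg:SI x 0)).  */
static int64_t
sext_si_to_di (int64_t x)
{
  return (int64_t) (int32_t) x;
}

int
main (void)
{
  /* vl never exceeds 65536, so bit 31 is 0 and the extension
     changes nothing.  */
  for (int64_t vl = 0; vl <= 65536; vl++)
    assert (sext_si_to_di (vl) == vl);
  return 0;
}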

gcc/ChangeLog:

* config/riscv/vector.md (*vsetvldi_no_side_effects_si_extend):
New combine pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/vsetvl_int.c: New test.

---
 gcc/config/riscv/vector.md| 41 +++
 .../gcc.target/riscv/rvv/vsetvl/vsetvl_int.c  | 31 ++
 2 files changed, 72 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_int.c

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index e23f64938b7..d1499d330ff 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -1604,6 +1604,47 @@
   [(set_attr "type" "vsetvl")
(set_attr "mode" "SI")])
 
+;; This pattern is used to combine the two insns below and then remove the
+;; unnecessary sign_extend operation:
+;;   (set (reg:DI 134 [ _1 ])
+;;(unspec:DI [
+;;(const_int 19 [0x13])
+;;(const_int 8 [0x8])
+;;(const_int 5 [0x5])
+;;(const_int 2 [0x2]) repeated x2
+;;] UNSPEC_VSETVL))
+;;   (set (reg/v:DI 135 [  ])
+;;   (sign_extend:DI (subreg:SI (reg:DI 134 [ _1 ]) 0)))
+;;
+;; The reason we can remove the sign_extend is that currently the vl value
+;; returned by the vsetvl instruction ranges from 0 to 65536, so bits 17 to
+;; 63 (including bit 31) are always 0 and nothing changes after
+;; sign_extend.  Note that for HI and QI modes we cannot do this.
+;; Of course, if the range of values returned by vsetvl is later extended
+;; to the full 32 bits, then this combine pattern needs to be removed.  But
+;; that could be a long time from now.
+(define_insn_and_split "*vsetvldi_no_side_effects_si_extend"
+  [(set (match_operand:DI 0 "register_operand")
+(sign_extend:DI
+  (subreg:SI
+   (unspec:DI [(match_operand:P 1 "csr_operand")
+   (match_operand 2 "const_int_operand")
+   (match_operand 3 "const_int_operand")
+   (match_operand 4 "const_int_operand")
+   (match_operand 5 "const_int_operand")] UNSPEC_VSETVL) 
0)))]
+  "TARGET_VECTOR && TARGET_64BIT"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+(unspec:DI [(match_dup 1)
+(match_dup 2)
+(match_dup 3)
+(match_dup 4)
+(match_dup 5)] UNSPEC_VSETVL))]
+  ""
+  [(set_attr "type" "vsetvl")
+   (set_attr "mode" "SI")])
+
 ;; RVV machine description matching format
 ;; (define_insn ""
 ;;   [(set (match_operand:MODE 0)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_int.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_int.c
new file mode 100644
index 000..4cdd5877742
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_int.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d" } */
+
+#include "riscv_vector.h"
+
+void bar1 (int32_t a);
+
+int32_t
+foo1 ()
+{
+  int32_t a = __riscv_vsetvl_e8mf8(19);
+  bar1 (a);
+  return a;
+}
+
+void bar2 (uint32_t a);
+
+uint32_t
+foo2 ()
+{
+  uint32_t a = __riscv_vsetvl_e8mf8(19);
+  bar2 (a);
+  return a;
+}
+
+int32_t foo3 ()
+{
+  return __riscv_vsetvl_e8mf8(19);
+}
+
+/* { dg-final { scan-assembler-not {sext\.w} { target { no-opts "-O0" "-g" } } 
} } */
-- 
2.36.3



Re: [PATCH] RISC-V: Removed unnecessary sign-extend for vsetvl

2023-11-08 Thread juzhe.zhong
lgtm


Re: [PATCH] RISC-V: Removed unnecessary sign-extend for vsetvl

2023-11-08 Thread Lehua Ding

Committed, thanks Juzhe.

On 2023/11/8 21:29, juzhe.zhong wrote:

lgtm



--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai



RE: [PATCH]AArch64: Use SVE unpredicated LOGICAL expressions when Advanced SIMD inefficient [PR109154]

2023-11-08 Thread Tamar Christina
> >> > +  "&& TARGET_SVE && rtx_equal_p (operands[0], operands[1])
> >> > +   && satisfies_constraint_ (operands[2])
> >> > +   && FP_REGNUM_P (REGNO (operands[0]))"
> >> > +  [(const_int 0)]
> >> > +  {
> >> > +rtx op1 = lowpart_subreg (mode, operands[1],
> mode);
> >> > +rtx op2 = gen_const_vec_duplicate (mode, operands[2]);
> >> > +emit_insn (gen_3 (op1, op1, op2));
> >> > +DONE;
> >> > +  }
> >> >  )
> >>
> >> The WIP SME patches add a %Z modifier for 'z' register prefixes,
> >> similarly to b/h/s/d for scalar FP.  With that I think the alternative can 
> >> be:
> >>
> >>  [w , 0 , ; * , sve ] \t%Z0., %Z0., #%2
> >>
> >> although it would be nice to keep the hex constant.
> >
> > My original patch added a %u for (undecorated) which just prints the
> > register number and changed %C to also accept a single constant instead of
> only a uniform vector.
> 
> Not saying no to %u in future, but %Z seems more consistent with the current
> approach.  And yeah, I'd also wondered about extending %C.
> The problem is guessing whether to print a 32-bit, 64-bit or 128-bit constant
> for negative immediates.
> 

Rebased patch,

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimization/109154
* config/aarch64/aarch64.md (3): Add SVE case.
* config/aarch64/aarch64-simd.md (ior3): Likewise.
* config/aarch64/iterators.md (VCONV, vconv): New.
* config/aarch64/predicates.md (aarch64_orr_imm_sve_advsimd): New.

gcc/testsuite/ChangeLog:

PR tree-optimization/109154
* gcc.target/aarch64/sve/fneg-abs_1.c: Updated.
* gcc.target/aarch64/sve/fneg-abs_2.c: Updated.
* gcc.target/aarch64/sve/fneg-abs_4.c: Updated.
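
For reference, the kind of kernel the fneg-abs tests exercise (my
illustration, not taken from the patch; the exact instruction named in
the comment is an assumption): clearing the sign bit of a double that
already lives in a SIMD/FP register.  With this change the AND can stay
on the FP side as an unpredicated SVE logical instruction instead of
bouncing through a general-purpose register.

#include <stdint.h>
#include <string.h>

/* With the patch, when 'x' is in a SIMD/FP register this can become
   something like "and z0.d, z0.d, #0x7fffffffffffffff" on SVE targets
   rather than an fmov to a GP register and back.  */
double
my_fabs (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  bits &= 0x7fffffffffffffffULL;  /* clear the IEEE sign bit */
  memcpy (&x, &bits, sizeof bits);
  return x;
}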

--- inline copy of patch --

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
33eceb436584ff73c7271f93639f2246d1af19e0..98c418c54a82a348c597310caa23916f9c16f9b6
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1219,11 +1219,14 @@ (define_insn "and3"
 (define_insn "ior3"
   [(set (match_operand:VDQ_I 0 "register_operand")
(ior:VDQ_I (match_operand:VDQ_I 1 "register_operand")
-  (match_operand:VDQ_I 2 "aarch64_reg_or_orr_imm")))]
-  "TARGET_SIMD"
-  {@ [ cons: =0 , 1 , 2   ]
- [ w, w , w   ] orr\t%0., %1., %2.
- [ w, 0 , Do  ] << aarch64_output_simd_mov_immediate (operands[2], 
, AARCH64_CHECK_ORR);
+  (match_operand:VDQ_I 2 "aarch64_orr_imm_sve_advsimd")))]
+  "TARGET_SIMD"
+  {@ [ cons: =0 , 1 , 2; attrs: arch ]
+ [ w, w , w  ; simd  ] orr\t%0., %1., 
%2.
+ [ w, 0 , vsl; sve   ] orr\t%Z0., %Z0., #%2
+ [ w, 0 , Do ; simd  ] \
+   << aarch64_output_simd_mov_immediate (operands[2], , \
+AARCH64_CHECK_ORR);
   }
   [(set_attr "type" "neon_logic")]
 )
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 
4fcd71a2e9d1e8c35f35593255c4f66a68856a79..c6b1506fe7b47dd40741f26ef0cc92692008a631
 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4599,7 +4599,8 @@ (define_insn "3"
   ""
   {@ [ cons: =0 , 1  , 2; attrs: type , arch  ]
  [ r, %r , r; logic_reg   , * ] \t%0, 
%1, %2
- [ rk   , r  ,  ; logic_imm   , * ] \t%0, 
%1, %2
+ [ rk   , ^r ,  ; logic_imm   , * ] \t%0, 
%1, %2
+ [ w, 0  ,  ; *   , sve   ] \t%Z0., 
%Z0., #%2
  [ w, w  , w; neon_logic  , simd  ] 
\t%0., %1., %2.
   }
 )
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 
1593a8fd04f91259295d0e393cbc7973daf7bf73..d24109b4fe6a867125b9474d34d616155bc36b3f
 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -1435,6 +1435,19 @@ (define_mode_attr VCONQ [(V8QI "V16QI") (V16QI "V16QI")
 (HI   "V8HI") (QI   "V16QI")
 (SF   "V4SF") (DF   "V2DF")])
 
+;; 128-bit container modes for the lower part of an SVE vector to the inner or
+;; neon source mode.
+(define_mode_attr VCONV [(SI   "VNx4SI")  (DI"VNx2DI")
+(V8QI "VNx16QI") (V16QI "VNx16QI")
+(V4HI "VNx8HI")  (V8HI  "VNx8HI")
+(V2SI "VNx4SI")  (V4SI  "VNx4SI")
+(V2DI "VNx2DI")])
+(define_mode_attr vconv [(SI   "vnx4si")  (DI"vnx2di")
+(V8QI "vnx16qi") (V16QI "vnx16qi")
+(V4HI "vnx8hi")  (V8HI  "vnx8hi")
+(V2SI "vnx4si")  (V4SI  "vnx4si")
+(V2DI "vnx2di")])
+
 ;; Half modes of all vector modes.
 (define_mode_attr VHALF [(V8QI "V4QI")  (V16QI "V8QI")
 (V4HI "V2HI")  (V8HI  "V4HI")
diff --git a/gcc/config/aarch64/predicates.md b/

Re: [PATCH][_Hashtable] Add missing destructor call

2023-11-08 Thread Jonathan Wakely
On Wed, 8 Nov 2023 at 05:39, François Dumont  wrote:
>
>
> On 07/11/2023 00:28, Jonathan Wakely wrote:
> > On Mon, 6 Nov 2023 at 21:39, François Dumont  wrote:
> >> Noticed looking for other occasion to replace __try/__catch with RAII
> >> helper.
> >>
> >>   libstdc++: [_Hashtable] Add missing node destructor call
> >>
> >>   libstdc++-v3/ChangeLog:
> >>
> >>   * include/bits/hashtable_policy.h
> >>   (_Hashtable_alloc<>::_M_allocate_node): Add missing call to
> >> node destructor
> >>   on construct exception.
> >>
> >> Tested under Linux x64, ok to commit ?
> > OK.
> >
> > Is this missing on any branches too?
> Clearly all maintained branches.
> > I don't think it's actually a problem, since it's a trivial destructor 
> > anyway.
>
> Yes, me neither; I was only thinking about sanity-checker tools when
> doing this, so there's no plan for backports.

OK, that seems fine, thanks.



[PATCH] i386: Fix C99 compatibility issues in the x86-64 AVX ABI test suite

2023-11-08 Thread Florian Weimer
* gcc.target/x86_64/abi/avx/avx-check.h (main): Call
__builtin_printf instead of printf.
* gcc.target/x86_64/abi/avx/test_passing_m256.c
(fun_check_passing_m256_8_values): Add missing void return
type.
* gcc.target/x86_64/abi/avx512f/avx512f-check.h (main): Call
__builtin_printf instead of printf.
* gcc.target/x86_64/abi/avx512f/test_passing_m512.c
(fun_check_passing_m512_8_values): Add missing void return
type.

---
 gcc/testsuite/gcc.target/x86_64/abi/avx/avx-check.h | 4 ++--
 gcc/testsuite/gcc.target/x86_64/abi/avx/test_passing_m256.c | 1 +
 gcc/testsuite/gcc.target/x86_64/abi/avx512f/avx512f-check.h | 6 +++---
 gcc/testsuite/gcc.target/x86_64/abi/avx512f/test_passing_m512.c | 1 +
 4 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx/avx-check.h 
b/gcc/testsuite/gcc.target/x86_64/abi/avx/avx-check.h
index e66a27e9afd..a04d0777637 100644
--- a/gcc/testsuite/gcc.target/x86_64/abi/avx/avx-check.h
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx/avx-check.h
@@ -16,12 +16,12 @@ main ()
 {
   avx_test ();
 #ifdef DEBUG
-  printf ("PASSED\n");
+  __builtin_printf ("PASSED\n");
 #endif
 }
 #ifdef DEBUG
   else
-printf ("SKIPPED\n");
+__builtin_printf ("SKIPPED\n");
 #endif
 
   return 0;
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx/test_passing_m256.c 
b/gcc/testsuite/gcc.target/x86_64/abi/avx/test_passing_m256.c
index ffc3ec36bf7..f739670431b 100644
--- a/gcc/testsuite/gcc.target/x86_64/abi/avx/test_passing_m256.c
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx/test_passing_m256.c
@@ -24,6 +24,7 @@ int failed = 0;
   assert (memcmp (&X1, &X2, sizeof (T)) == 0); \
 } while (0)
 
+void
 fun_check_passing_m256_8_values (__m256 i0 ATTRIBUTE_UNUSED, __m256 i1 
ATTRIBUTE_UNUSED, __m256 i2 ATTRIBUTE_UNUSED, __m256 i3 ATTRIBUTE_UNUSED, 
__m256 i4 ATTRIBUTE_UNUSED, __m256 i5 ATTRIBUTE_UNUSED, __m256 i6 
ATTRIBUTE_UNUSED, __m256 i7 ATTRIBUTE_UNUSED)
 {
   /* Check argument values.  */
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512f/avx512f-check.h 
b/gcc/testsuite/gcc.target/x86_64/abi/avx512f/avx512f-check.h
index 25ce544c4a3..00a7578d2b5 100644
--- a/gcc/testsuite/gcc.target/x86_64/abi/avx512f/avx512f-check.h
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512f/avx512f-check.h
@@ -24,17 +24,17 @@ main ()
{
  avx512f_test ();
 #ifdef DEBUG
- printf ("PASSED\n");
+ __builtin_printf ("PASSED\n");
 #endif
}
 #ifdef DEBUG
   else
-   printf ("SKIPPED\n");
+   __builtin_printf ("SKIPPED\n");
 #endif
 }
 #ifdef DEBUG
   else
-printf ("SKIPPED\n");
+__builtin_printf ("SKIPPED\n");
 #endif
 
   return 0;
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512f/test_passing_m512.c 
b/gcc/testsuite/gcc.target/x86_64/abi/avx512f/test_passing_m512.c
index ead9c6797e1..1c88a55fb4b 100644
--- a/gcc/testsuite/gcc.target/x86_64/abi/avx512f/test_passing_m512.c
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512f/test_passing_m512.c
@@ -24,6 +24,7 @@ int failed = 0;
   assert (memcmp (&X1, &X2, sizeof (T)) == 0); \
 } while (0)
 
+void
 fun_check_passing_m512_8_values (__m512 i0 ATTRIBUTE_UNUSED, __m512 i1 
ATTRIBUTE_UNUSED, __m512 i2 ATTRIBUTE_UNUSED, __m512 i3 ATTRIBUTE_UNUSED, 
__m512 i4 ATTRIBUTE_UNUSED, __m512 i5 ATTRIBUTE_UNUSED, __m512 i6 
ATTRIBUTE_UNUSED, __m512 i7 ATTRIBUTE_UNUSED)
 {
   /* Check argument values.  */

base-commit: d72492cca7a66af166b8dcf58d1166d8b0c59c59



Re: [PATCH] i386: Fix C99 compatibility issues in the x86-64 AVX ABI test suite

2023-11-08 Thread Jakub Jelinek
On Wed, Nov 08, 2023 at 03:55:17PM +0100, Florian Weimer wrote:
>   * gcc.target/x86_64/abi/avx/avx-check.h (main): Call
>   __builtin_printf instead of printf.
>   * gcc.target/x86_64/abi/avx/test_passing_m256.c
>   (fun_check_passing_m256_8_values): Add missing void return
>   type.
>   * gcc.target/x86_64/abi/avx512f/avx512f-check.h (main): Call
>   __builtin_printf instead of printf.
>   * gcc.target/x86_64/abi/avx512f/test_passing_m512.c
>   (fun_check_passing_m512_8_values): Add missing void return
>   type.

LGTM.

Jakub



[PATCH] tree-ssa-loop-ivopts : Add live analysis in regs used in decision making

2023-11-08 Thread Ajit Agarwal
tree-ssa-loop-ivopts : Add live analysis in regs used in decision making.

Add live analysis to the regs-used calculation that feeds the
decision making when selecting ivopts candidates.
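
As a rough illustration of what a liveness-based count captures (my
example, not from the patch): invariants that are live across the whole
loop body each occupy a register for the entire loop, which is what the
register-pressure estimate in ivopts cares about.

/* 'scale', 'base' and 'limit' are live across every iteration, so a
   liveness-based estimate counts them toward register pressure in
   addition to the induction variable itself.  */
void
saxpy_like (float *restrict out, const float *restrict in,
            float scale, float base, int limit)
{
  for (int i = 0; i < limit; i++)
    out[i] = in[i] * scale + base;
}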

2023-11-08  Ajit Kumar Agarwal  

gcc/ChangeLog:

* tree-ssa-loop-ivopts.cc (get_regs_used): New function.
(determine_set_costs): Call get_regs_used to use live
analysis.
---
 gcc/tree-ssa-loop-ivopts.cc | 73 +++--
 1 file changed, 70 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
index c3336603778..e02fe7d434b 100644
--- a/gcc/tree-ssa-loop-ivopts.cc
+++ b/gcc/tree-ssa-loop-ivopts.cc
@@ -6160,6 +6160,68 @@ ivopts_estimate_reg_pressure (struct ivopts_data *data, 
unsigned n_invs,
   return cost + n_cands;
 }
 
+/* Return regs used based on live-in and live-out of the given SSA name.  */
+static unsigned
+get_regs_used (tree ssa_name)
+{
+  unsigned regs_used = 0;
+  gimple *stmt;
+  use_operand_p use;
+  basic_block def_bb = NULL;
+  imm_use_iterator imm_iter;
+
+  stmt = SSA_NAME_DEF_STMT (ssa_name);
+  if (stmt)
+{
+  def_bb = gimple_bb (stmt);
+  /* Count the definition as a register use.  */
+  if (def_bb)
+   regs_used++;
+}
+  else
+def_bb = ENTRY_BLOCK_PTR_FOR_FN (cfun);
+
+  /* An undefined local variable does not need to be very alive.  */
+  if (virtual_operand_p (ssa_name)
+  || ssa_undefined_value_p (ssa_name, false))
+return 0;
+
+  /* Visit each use of SSA_NAME and if it isn't in the same block as the def,
+ add it to the list of live on entry blocks.  */
+  FOR_EACH_IMM_USE_FAST (use, imm_iter, ssa_name)
+{
+  gimple *use_stmt = USE_STMT (use);
+  basic_block add_block = NULL;
+
+  if (gimple_code (use_stmt) == GIMPLE_PHI)
+   {
+ /* Uses in PHI's are considered to be live at exit of the SRC block
+as this is where a copy would be inserted.  Check to see if it is
+defined in that block, or whether its live on entry.  */
+ int index = PHI_ARG_INDEX_FROM_USE (use);
+ edge e = gimple_phi_arg_edge (as_a  (use_stmt), index);
+ if (e->src != def_bb)
+   add_block = e->src;
+   }
+  else if (is_gimple_debug (use_stmt))
+   continue;
+  else
+   {
+ /* If its not defined in this block, its live on entry.  */
+ basic_block use_bb = gimple_bb (use_stmt);
+ if (use_bb != def_bb)
+   add_block = use_bb;
+   }
+
+  /* If there was a live on entry use, increment register used.  */
+  if (add_block)
+   {
+ regs_used++;
+   }
+}
+  return regs_used;
+}
+
 /* For each size of the induction variable set determine the penalty.  */
 
 static void
@@ -6200,15 +6262,20 @@ determine_set_costs (struct ivopts_data *data)
   n++;
 }
 
+  unsigned max = 0;
   EXECUTE_IF_SET_IN_BITMAP (data->relevant, 0, j, bi)
 {
   struct version_info *info = ver_info (data, j);
-
   if (info->inv_id && info->has_nonlin_use)
-   n++;
+   {
+ tree ssa_name = ssa_name (j);
+ n = get_regs_used (ssa_name);
+ if (n >= max)
+   max = n;
+   }
 }
 
-  data->regs_used = n;
+  data->regs_used = max;
   if (dump_file && (dump_flags & TDF_DETAILS))
 fprintf (dump_file, "  regs_used %d\n", n);
 
-- 
2.39.3




Re: [PATCH] gcc.dg/Wmissing-parameter-type*: Test the intended warning

2023-11-08 Thread Jeff Law




On 11/8/23 01:53, Florian Weimer wrote:

gcc/testsuite/ChangeLog:

* gcc.dg/Wmissing-parameter-type.c: Build with -std=gnu89
 to trigger the -Wmissing-parameter-type warning
and not the default -Wimplicit warning.  Also match
against -Wmissing-parameter-type.
* gcc.dg/Wmissing-parameter-type.c: Likewise.

OK
jeff


[PATCH 1/4] Fix SLP of masked loads

2023-11-08 Thread Richard Biener
The following adjusts things to use the correct mask operand for
the SLP of masked loads and gathers.  Test coverage comes from
runtime failures of i386-specific AVX512 tests when enabling
single-lane SLP.
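
A conditional-load kernel of the sort that exercises this path (my
reduction sketch, not the actual failing test):

/* Under AVX512 the load of b[i] if-converts to a masked load; with
   single-lane SLP the mask for copy j, lane i must be taken from
   vec_masks[vec_num * j + i], which is what this patch fixes.  */
void
masked_copy (int *restrict a, const int *restrict b,
             const int *restrict c, int n)
{
  for (int i = 0; i < n; i++)
    if (c[i])
      a[i] = b[i];
}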

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

* tree-vect-stmts.cc (vectorizable_load): Use the correct
vectorized mask operand.
---
 gcc/tree-vect-stmts.cc | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 65883e04ad7..096a857f2dd 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -10920,9 +10920,6 @@ vectorizable_load (vec_info *vinfo,
   gsi, stmt_info, bump);
}
 
- if (mask && !costing_p)
-   vec_mask = vec_masks[j];
-
  gimple *new_stmt = NULL;
  for (i = 0; i < vec_num; i++)
{
@@ -10931,6 +10928,8 @@ vectorizable_load (vec_info *vinfo,
  tree bias = NULL_TREE;
  if (!costing_p)
{
+ if (mask)
+   vec_mask = vec_masks[vec_num * j + i];
  if (loop_masks)
final_mask
  = vect_get_loop_mask (loop_vinfo, gsi, loop_masks,
@@ -11285,8 +11284,6 @@ vectorizable_load (vec_info *vinfo,
  at_loop,
  offset, &dummy, gsi, &ptr_incr,
  simd_lane_access_p, bump);
- if (mask)
-   vec_mask = vec_masks[0];
}
   else if (!costing_p)
{
@@ -11297,8 +11294,6 @@ vectorizable_load (vec_info *vinfo,
  else
dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr, gsi,
   stmt_info, bump);
- if (mask)
-   vec_mask = vec_masks[j];
}
 
   if (grouped_load || slp_perm)
@@ -11312,6 +11307,8 @@ vectorizable_load (vec_info *vinfo,
  tree bias = NULL_TREE;
  if (!costing_p)
{
+ if (mask)
+   vec_mask = vec_masks[vec_num * j + i];
  if (loop_masks)
final_mask = vect_get_loop_mask (loop_vinfo, gsi, loop_masks,
 vec_num * ncopies, vectype,
-- 
2.35.3



[PATCH 2/4] TLC to vect_check_store_rhs and vect_slp_child_index_for_operand

2023-11-08 Thread Richard Biener
This prepares us for the SLP of scatters.  We have to tell
vect_slp_child_index_for_operand whether we are dealing with
a scatter/gather stmt, so this adds an argument similar to
the one we have for vect_get_operand_map.  This also refactors
vect_check_store_rhs to get the actual rhs and the associated
SLP node instead of leaving that to the caller.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

* tree-vectorizer.h (vect_slp_child_index_for_operand):
Add gather_scatter_p argument.
* tree-vect-slp.cc (vect_slp_child_index_for_operand): Likewise.
Pass it on.
* tree-vect-stmts.cc (vect_check_store_rhs): Turn the rhs
argument into an output, also output the SLP node associated
with it.
(vectorizable_simd_clone_call): Adjust.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
---
 gcc/tree-vect-slp.cc   |  5 ++--
 gcc/tree-vect-stmts.cc | 52 ++
 gcc/tree-vectorizer.h  |  2 +-
 3 files changed, 31 insertions(+), 28 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 13137ede8d4..176aaf270f4 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -589,9 +589,10 @@ vect_get_operand_map (const gimple *stmt, bool 
gather_scatter_p = false,
 /* Return the SLP node child index for operand OP of STMT.  */
 
 int
-vect_slp_child_index_for_operand (const gimple *stmt, int op)
+vect_slp_child_index_for_operand (const gimple *stmt, int op,
+ bool gather_scatter_p)
 {
-  const int *opmap = vect_get_operand_map (stmt);
+  const int *opmap = vect_get_operand_map (stmt, gather_scatter_p);
   if (!opmap)
 return op;
   for (int i = 1; i < 1 + opmap[0]; ++i)
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 096a857f2dd..61e23b29516 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2486,42 +2486,33 @@ vect_check_scalar_mask (vec_info *vinfo, stmt_vec_info 
stmt_info,
   return true;
 }
 
-/* Return true if stored value RHS is suitable for vectorizing store
-   statement STMT_INFO.  When returning true, store the type of the
-   definition in *RHS_DT_OUT, the type of the vectorized store value in
+/* Return true if stored value is suitable for vectorizing store
+   statement STMT_INFO.  When returning true, store the scalar stored
+   in *RHS and *RHS_NODE, the type of the definition in *RHS_DT_OUT,
+   the type of the vectorized store value in
*RHS_VECTYPE_OUT and the type of the store in *VLS_TYPE_OUT.  */
 
 static bool
 vect_check_store_rhs (vec_info *vinfo, stmt_vec_info stmt_info,
- slp_tree slp_node, tree rhs,
+ slp_tree slp_node, tree *rhs, slp_tree *rhs_node,
  vect_def_type *rhs_dt_out, tree *rhs_vectype_out,
  vec_load_store_type *vls_type_out)
 {
-  /* In the case this is a store from a constant make sure
- native_encode_expr can handle it.  */
-  if (CONSTANT_CLASS_P (rhs) && native_encode_expr (rhs, NULL, 64) == 0)
-{
-  if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"cannot encode constant as a byte sequence.\n");
-  return false;
-}
-
   int op_no = 0;
   if (gcall *call = dyn_cast  (stmt_info->stmt))
 {
   if (gimple_call_internal_p (call)
  && internal_store_fn_p (gimple_call_internal_fn (call)))
op_no = internal_fn_stored_value_index (gimple_call_internal_fn (call));
-  if (slp_node)
-   op_no = vect_slp_child_index_for_operand (call, op_no);
 }
+  if (slp_node)
+op_no = vect_slp_child_index_for_operand
+ (stmt_info->stmt, op_no, STMT_VINFO_GATHER_SCATTER_P (stmt_info));
 
   enum vect_def_type rhs_dt;
   tree rhs_vectype;
-  slp_tree slp_op;
   if (!vect_is_simple_use (vinfo, stmt_info, slp_node, op_no,
-  &rhs, &slp_op, &rhs_dt, &rhs_vectype))
+  rhs, rhs_node, &rhs_dt, &rhs_vectype))
 {
   if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -2529,6 +2520,16 @@ vect_check_store_rhs (vec_info *vinfo, stmt_vec_info 
stmt_info,
   return false;
 }
 
+  /* In the case this is a store from a constant make sure
+ native_encode_expr can handle it.  */
+  if (CONSTANT_CLASS_P (*rhs) && native_encode_expr (*rhs, NULL, 64) == 0)
+{
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"cannot encode constant as a byte sequence.\n");
+  return false;
+}
+
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
   if (rhs_vectype && !useless_type_conversion_p (vectype, rhs_vectype))
 {
@@ -4052,7 +4053,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, 
stmt_vec_info stmt_info,
 
   int op_no = i + masked_call_offset;
   if (slp_node)
-   op_no = vect_slp_child_index_

[PATCH 3/4] Fix SLP of emulated gathers

2023-11-08 Thread Richard Biener
The following fixes an error in the SLP of emulated gathers,
discovered by x86-specific tests when enabling single-lane SLP.
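
A typical emulated-gather kernel for reference (my sketch, not the
failing test itself):

/* An indexed load; without a native gather instruction this is
   emulated from the offset vector, which is where the per-copy,
   per-lane indexing of vec_offsets fixed here matters.  */
void
gather (double *restrict out, const double *restrict in,
        const int *restrict idx, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = in[idx[i]];
}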

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

* tree-vect-stmts.cc (vectorizable_load): Adjust offset
vector gathering for SLP of emulated gathers.
---
 gcc/tree-vect-stmts.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 61e23b29516..913a4fb08ed 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -11163,7 +11163,7 @@ vectorizable_load (vec_info *vinfo,
 than the data vector for now.  */
  unsigned HOST_WIDE_INT factor
= const_offset_nunits / const_nunits;
- vec_offset = vec_offsets[j / factor];
+ vec_offset = vec_offsets[(vec_num * j + i) / factor];
  unsigned elt_offset = (j % factor) * const_nunits;
  tree idx_type = TREE_TYPE (TREE_TYPE (vec_offset));
  tree scale = size_int (gs_info.scale);
-- 
2.35.3



[PATCH 4/4] Refactor x86 decl based scatter vectorization, prepare SLP

2023-11-08 Thread Richard Biener
The following refactors the x86 decl-based scatter vectorization
similarly to what I did for the gather path.  This prepares scatters
for SLP as well, mainly single-lane, since there are multiple
missing bits to support multi-lane scatters.
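
For reference, a single-lane scatter kernel of the kind this prepares
for (my sketch):

/* An indexed store; with AVX512 this maps onto the scatter builtins
   that are now emitted from the common scatter vectorization path.  */
void
scatter (double *restrict out, const int *restrict idx,
         const double *restrict in, int n)
{
  for (int i = 0; i < n; i++)
    out[idx[i]] = in[i];
}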

Tested extensively on the SLP-only branch which has the ability
to force SLP even for single lanes.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

PR tree-optimization/33
* tree-vect-stmts.cc (vect_build_scatter_store_calls):
Remove and refactor to ...
(vect_build_one_scatter_store_call): ... this new function.
(vectorizable_store): Use vect_check_scalar_mask to record
the SLP node for the mask operand.  Code generate scatters
with builtin decls from the main scatter vectorization
path and prepare that for SLP.
---
 gcc/tree-vect-stmts.cc | 683 -
 1 file changed, 326 insertions(+), 357 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 913a4fb08ed..f41b4825a6a 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2703,238 +2703,87 @@ vect_build_one_gather_load_call (vec_info *vinfo, 
stmt_vec_info stmt_info,
 }
 
 /* Build a scatter store call while vectorizing STMT_INFO.  Insert new
-   instructions before GSI and add them to VEC_STMT.  GS_INFO describes
-   the scatter store operation.  If the store is conditional, MASK is the
-   unvectorized condition, otherwise MASK is null.  */
+   instructions before GSI.  GS_INFO describes the scatter store operation.
+   PTR is the base pointer, OFFSET the vectorized offsets and OPRND the
+   vectorized data to store.
+   If the store is conditional, MASK is the vectorized condition, otherwise
+   MASK is null.  */
 
-static void
-vect_build_scatter_store_calls (vec_info *vinfo, stmt_vec_info stmt_info,
-   gimple_stmt_iterator *gsi, gimple **vec_stmt,
-   gather_scatter_info *gs_info, tree mask,
-   stmt_vector_for_cost *cost_vec)
+static gimple *
+vect_build_one_scatter_store_call (vec_info *vinfo, stmt_vec_info stmt_info,
+  gimple_stmt_iterator *gsi,
+  gather_scatter_info *gs_info,
+  tree ptr, tree offset, tree oprnd, tree mask)
 {
-  loop_vec_info loop_vinfo = dyn_cast (vinfo);
-  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
-  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
-  int ncopies = vect_get_num_copies (loop_vinfo, vectype);
-  enum { NARROW, NONE, WIDEN } modifier;
-  poly_uint64 scatter_off_nunits
-= TYPE_VECTOR_SUBPARTS (gs_info->offset_vectype);
-
-  /* FIXME: Keep the previous costing way in vect_model_store_cost by
- costing N scalar stores, but it should be tweaked to use target
- specific costs on related scatter store calls.  */
-  if (cost_vec)
-{
-  tree op = vect_get_store_rhs (stmt_info);
-  enum vect_def_type dt;
-  gcc_assert (vect_is_simple_use (op, vinfo, &dt));
-  unsigned int inside_cost, prologue_cost = 0;
-  if (dt == vect_constant_def || dt == vect_external_def)
-   prologue_cost += record_stmt_cost (cost_vec, 1, scalar_to_vec,
-  stmt_info, 0, vect_prologue);
-  unsigned int assumed_nunits = vect_nunits_for_cost (vectype);
-  inside_cost = record_stmt_cost (cost_vec, ncopies * assumed_nunits,
- scalar_store, stmt_info, 0, vect_body);
-
-  if (dump_enabled_p ())
-   dump_printf_loc (MSG_NOTE, vect_location,
-"vect_model_store_cost: inside_cost = %d, "
-"prologue_cost = %d .\n",
-inside_cost, prologue_cost);
-  return;
-}
-
-  tree perm_mask = NULL_TREE, mask_halfvectype = NULL_TREE;
-  if (known_eq (nunits, scatter_off_nunits))
-modifier = NONE;
-  else if (known_eq (nunits * 2, scatter_off_nunits))
-{
-  modifier = WIDEN;
-
-  /* Currently gathers and scatters are only supported for
-fixed-length vectors.  */
-  unsigned int count = scatter_off_nunits.to_constant ();
-  vec_perm_builder sel (count, count, 1);
-  for (unsigned i = 0; i < (unsigned int) count; ++i)
-   sel.quick_push (i | (count / 2));
-
-  vec_perm_indices indices (sel, 1, count);
-  perm_mask = vect_gen_perm_mask_checked (gs_info->offset_vectype, 
indices);
-  gcc_assert (perm_mask != NULL_TREE);
-}
-  else if (known_eq (nunits, scatter_off_nunits * 2))
-{
-  modifier = NARROW;
-
-  /* Currently gathers and scatters are only supported for
-fixed-length vectors.  */
-  unsigned int count = nunits.to_constant ();
-  vec_perm_builder sel (count, count, 1);
-  for (unsigned i = 0; i < (unsigned int) count; ++i)
-   sel.quick_push (i | (count / 2));
-
-  vec_perm_indices indices (s

Re: testsuite: introduce hostedlib effective target

2023-11-08 Thread Alexandre Oliva
On Nov  7, 2023, Jonathan Wakely  wrote:

> An alternative approach for the g++ testsuite would be to provide a
> set of dummy headers for the non-freestanding ones, so that all the
> hosted-only headers are provided by the testsuite itself, but consist
> of a single line:

> #error not available in freestanding

> Then match on that and XFAIL. So the individual tests themselves
> wouldn't need the dg-skip-if added to them, they would just
> automatically XFAIL if they use a hosted-only header.

*nod*.  That wouldn't cover all the circumstances, alas: there are tests
that fail in freestanding mode not because of headers, but because
-fcontracts (currently?) links libstdc++exp in, and that library is not
even built in freestanding mode.

> The difficulty would be where to add those dummy headers for
> ,  etc. so that they're only found when testing a
> non-hosted build. Maybe libstdc++ could provide them in the build dir
> for the purposes of the testsuite, but not install them?

We run install-tree testing, so that wouldn't quite work for us.  If the
headers were in some subdirectory of the source tree that we (or the
testsuite machinery) could just add to the -I set, that would help.

-- 
Alexandre Oliva, happy hacker    https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


[PATCH] Fix SIMD clone SLP a bit more

2023-11-08 Thread Richard Biener
The following fixes an omission that mangled the non-SLP and SLP
simd-clone info.
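
The affected metadata is recorded for linear clone arguments, as in this
illustrative declaration (mine, not from the testsuite; compile with
-fopenmp-simd):

/* A simd clone with a linear pointer argument: its base value, linear
   step and simd-lane-linear flag are the three entries pushed into
   simd_clone_info per argument.  */
#pragma omp declare simd linear(p:1) notinbranch
int
load_next (const int *p)
{
  return *p;
}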

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

* tree-vect-stmts.cc (vectorizable_simd_clone_call): Record
to the correct simd_clone_info.
---
 gcc/tree-vect-stmts.cc | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index b4c2ae31c92..ee89f47c468 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4237,16 +4237,15 @@ vectorizable_simd_clone_call (vec_info *vinfo, 
stmt_vec_info stmt_info,
case SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP:
case SIMD_CLONE_ARG_TYPE_LINEAR_REF_CONSTANT_STEP:
  {
-   auto &clone_info = STMT_VINFO_SIMD_CLONE_INFO (stmt_info);
-   clone_info.safe_grow_cleared (i * 3 + 1, true);
-   clone_info.safe_push (arginfo[i].op);
+   simd_clone_info.safe_grow_cleared (i * 3 + 1, true);
+   simd_clone_info.safe_push (arginfo[i].op);
tree lst = POINTER_TYPE_P (TREE_TYPE (arginfo[i].op))
   ? size_type_node : TREE_TYPE (arginfo[i].op);
tree ls = build_int_cst (lst, arginfo[i].linear_step);
-   clone_info.safe_push (ls);
+   simd_clone_info.safe_push (ls);
tree sll = arginfo[i].simd_lane_linear
   ? boolean_true_node : boolean_false_node;
-   clone_info.safe_push (sll);
+   simd_clone_info.safe_push (sll);
  }
  break;
case SIMD_CLONE_ARG_TYPE_MASK:
-- 
2.35.3


Re: [PATCH] Do not prepend target triple to -fuse-ld=lld,mold.

2023-11-08 Thread Tatsuyuki Ishi
> On Nov 7, 2023, at 23:37, Richard Biener  wrote:
> 
> On Tue, 7 Nov 2023, Tatsuyuki Ishi wrote:
> 
>>> On Oct 16, 2023, at 18:16, Richard Biener  wrote:
>>> 
>>> On Mon, 16 Oct 2023, Tatsuyuki Ishi wrote:
>>> 
 
 
> On Oct 16, 2023, at 17:55, Richard Biener  wrote:
> 
> On Mon, 16 Oct 2023, Tatsuyuki Ishi wrote:
> 
>> 
>> 
>>> On Oct 16, 2023, at 17:39, Richard Biener  wrote:
>>> 
>>> On Mon, 16 Oct 2023, Tatsuyuki Ishi wrote:
>>> 
 lld and mold are platform-agnostic and not prefixed with target triple.
 Prepending the target triple makes it less likely to find the intended
 linker executable.
 
 A potential breaking change is that we no longer try to search for
 triple-prefixed lld/mold binaries anymore. However, since there doesn't
 seem to be support to build LLVM or mold with triple-prefixed 
 executable
 names, it seems better to just not bother with that case.
 
PR driver/111605
 
 gcc/Changelog:
 
* collect2.cc (main): Do not prepend target triple to
-fuse-ld=lld,mold.
 ---
 gcc/collect2.cc | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)
 
 diff --git a/gcc/collect2.cc b/gcc/collect2.cc
 index 63b9a0c233a..c943f9f577c 100644
 --- a/gcc/collect2.cc
 +++ b/gcc/collect2.cc
 @@ -865,12 +865,15 @@ main (int argc, char **argv)
 int i;
 
 for (i = 0; i < USE_LD_MAX; i++)
 -full_ld_suffixes[i]
 #ifdef CROSS_DIRECTORY_STRUCTURE
 -  = concat (target_machine, "-", ld_suffixes[i], NULL);
 -#else
 -  = ld_suffixes[i];
 -#endif
 +/* lld and mold are platform-agnostic and not prefixed with target
 +   triple.  */
 +if (!(i == USE_LLD_LD || i == USE_MOLD_LD))
 +  full_ld_suffixes[i] = concat (target_machine, "-", 
 ld_suffixes[i],
 +  NULL);
 +else
 +#endif
 +  full_ld_suffixes[i] = ld_suffixes[i];
 
 p = argv[0] + strlen (argv[0]);
 while (p != argv[0] && !IS_DIR_SEPARATOR (p[-1]))
>>> 
>>> Since we later do
>>> 
>>> /* Search the compiler directories for `ld'.  We have protection against
>>>  recursive calls in find_a_file.  */
>>> if (ld_file_name == 0)
>>> ld_file_name = find_a_file (&cpath, ld_suffixes[selected_linker], 
>>> X_OK);
>>> /* Search the ordinary system bin directories
>>>  for `ld' (if native linking) or `TARGET-ld' (if cross).  */
>>> if (ld_file_name == 0)
>>> ld_file_name = find_a_file (&path, full_ld_suffixes[selected_linker], 
>>> X_OK);
>>> 
>>> I wonder how having full_ld_suffixes[LLD|MOLD] == ld_suffixes[LLD|MOLD]
>>> fixes anything?
>> 
>> Per the linked PR, the intended use case for this is when one wants to 
>> use their system lld/mold with a separately packaged cross toolchain, 
>> without requiring them to symlink their system lld/mold into the cross 
>> toolchain bin directory.
>> 
>> (Note that the first search is against COMPILER_PATH while the latter is 
>> against PATH).
> 
> Ah.  So what about instead adding here
> 
> /* Search the ordinary system bin directories for mold/lld even in
>a cross configuration.  */
> if (ld_file_name == 0
> && selected_linker == ...)
>   ld_file_name = find_a_file (&path, ld_suffixes[selected_linker], X_OK);
> 
> instead?  That would keep things working in case the user has a
> xyz-arch-mold in the system dir but uses GNU ld on the host
> otherwise, lacking a 'mold' binary there?
> 
> That is, we'd only add, not change what we search for.
 
 I considered that, but as described in the commit message, it doesn't seem 
 anyone has created stuff named xyz-arch-lld or xyz-arch-mold. Closest is 
 Gentoo's symlink mentioned in this thread, but that's xyz-arch-ld -> 
 ld.lld/mold.
 As such, this feels like a quirk, not something we need to keep 
 compatibility for.
>>> 
>>> I don't have a good idea whether this is the case or not unfortunately
>>> so if it's my call I would err on the safe side.
>>> 
>>> We seem to recognize mold and lld only since GCC 12 which both are
>>> still maintained so I think we might want to do the change on all
>>> those branches?
>>> 
>>> If you feel confident there's indeed no such installs then let's go
>>> with your original patch.
>>> 
>>> Thus, OK for trunk and the affected branches after a while of no
>>> reported issues.
>> 
>> Hi,
>> 
>> Can I consider this an approval for this patch to be applied to trunk?
> 
> Yes.
> 
>> I would appreciate if this patch could be tested in GCC 14 prereleases.
>> 
>> I suppose backporti

Re: testsuite: introduce hostedlib effective target

2023-11-08 Thread Jonathan Wakely
On Wed, 8 Nov 2023 at 15:30, Alexandre Oliva  wrote:
>
> On Nov  7, 2023, Jonathan Wakely  wrote:
>
> > An alternative approach for the g++ testsuite would be to provide a
> > set of dummy headers for the non-freestanding ones, so that all the
> > hosted-only headers are provided by the testsuite itself, but consist
> > of a single line:
>
> > #error not available in freestanding
>
> > Then match on that and XFAIL. So the individual tests themselves
> > wouldn't need the dg-skip-if added to them, they would just
> > automatically XFAIL if they use a hosted-only header.
>
> *nod*.  That wouldn't cover all the circumstances, alas: there are tests
> that fail in freestanding mode not because of headers, but because
> -fcontracts (currently?) links libstdc++exp in, and that library is not
> even built in freestanding mode.

Hmm, yes, that seems like a bug. Either we should provide
libstdc++exp.a for freestanding builds (with a simplified contract
violation handler that doesn't print to stdout), or the front end
should not add -lstdc++exp when -ffreestanding is used (which would
require the user, or the testsuite in your case, to provide a custom
contract violation handler), or it should be an error to use
-fcontracts and -ffreestanding together.

The libstdc++-v3/src/experimental/contract.cc file *already* supports
freestanding, we just don't actually build it for freestanding. We can
do that.



Re: testsuite: introduce hostedlib effective target

2023-11-08 Thread Jonathan Wakely
On Wed, 8 Nov 2023 at 15:48, Jonathan Wakely  wrote:
>
> On Wed, 8 Nov 2023 at 15:30, Alexandre Oliva  wrote:
> >
> > On Nov  7, 2023, Jonathan Wakely  wrote:
> >
> > > An alternative approach for the g++ testsuite would be to provide a
> > > set of dummy headers for the non-freestanding ones, so that all the
> > > hosted-only headers are provided by the testsuite itself, but consist
> > > of a single line:
> >
> > > #error not available in freestanding
> >
> > > Then match on that and XFAIL. So the individual tests themselves
> > > wouldn't need the dg-skip-if added to them, they would just
> > > automatically XFAIL if they use a hosted-only header.
> >
> > *nod*.  That wouldn't cover all the circumstances, alas: there are tests
> > that fail in freestanding mode not because of headers, but because
> > -fcontracts (currently?) links libstdc++exp in, and that library is not
> > even built in freestanding mode.
>
> Hmm, yes, that seems like a bug. Either we should provide
> libstdc++exp.a for freestanding builds (with a simplified contract
> violation handler that doesn't print to stdout), or the front end
> should not add -lstdc++exp when -ffreestanding is used (which would
> require the user, or the testsuite in your case, to provide a custom
> contract violation handler), or it should be an error to use
> -fcontracts and -ffreestanding together.
>
> The libstdc++-v3/src/experimental/contract.cc file *already* supports
> freestanding, we just don't actually build it for freestanding. We can
> do that.

Which might be as simple as:

--- a/libstdc++-v3/src/Makefile.am
+++ b/libstdc++-v3/src/Makefile.am
@@ -34,14 +34,13 @@ backtrace_dir = libbacktrace
else
backtrace_dir =
endif
-
-experimental_dir = experimental
else
filesystem_dir =
backtrace_dir =
-experimental_dir =
endif

+experimental_dir = experimental
+
## Keep this list sync'd with acinclude.m4:GLIBCXX_CONFIGURE.
SUBDIRS = c++98 c++11 c++17 c++20 c++23 \
   $(filesystem_dir) $(backtrace_dir) $(experimental_dir)



[PATCH] skip debug stmts when assigning locus discriminators

2023-11-08 Thread Alexandre Oliva


c-c++-common/goacc/kernels-loop-g.c has been failing (compare-debug)
on i686-linux-gnu since r13-3172, because the implementation enabled
debug stmts to cause discriminators to be assigned differently, and
the discriminators are printed in the .gkd dumps that -fcompare-debug
compares.

This patch prevents debug stmts from affecting the discriminators in
nondebug stmts, but enables debug stmts to get discriminators just as
nondebug stmts would if their line numbers match.

I suppose we could arrange for discriminators to be omitted from the
-fcompare-debug dumps, but keeping discriminators in sync is probably
good to avoid other potential sources of divergence between debug and
nondebug.
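
For context, a minimal illustration of my own: discriminators
distinguish basic blocks that share a single source line, and the point
of the patch is that interleaved debug stmts must not shift those
assignments.

/* Both arms of the conditional live on one line; each gets the same
   line number but a different discriminator.  With -g, debug stmts
   interleaved here must not change which discriminator each arm
   receives.  */
int pick (int a, int b) { return a ? b + 1 : b - 1; }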

Regstrapped on x86_64-linux-gnu, also tested with gcc-13 on i686- and
x86_64-.  Ok to install?

(Eugene, I suppose what's special about this testcase, that may not
apply to most other uses of assign_discriminators, is that goacc creates
new functions out of already optimized code.  I think
assign_discriminators may not be suitable for new functions, with code
that isn't exactly pristinely in-order.  WDYT?)


for  gcc/ChangeLog

* tree-cfg.cc (assign_discriminators): Handle debug stmts.
---
 gcc/tree-cfg.cc |   16 
 1 file changed, 16 insertions(+)

diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index 40a6f2a3b529f..a30a2de33a106 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -1214,6 +1214,22 @@ assign_discriminators (void)
{
  gimple *stmt = gsi_stmt (gsi);
 
+ /* Don't allow debug stmts to affect discriminators, but
+allow them to take discriminators when they're on the
+same line as the preceding nondebug stmt.  */
+ if (is_gimple_debug (stmt))
+   {
+ if (curr_locus != UNKNOWN_LOCATION
+ && same_line_p (curr_locus, &curr_locus_e,
+ gimple_location (stmt)))
+   {
+ location_t loc = gimple_location (stmt);
+ location_t dloc = location_with_discriminator (loc,
+curr_discr);
+ gimple_set_location (stmt, dloc);
+   }
+ continue;
+   }
  if (curr_locus == UNKNOWN_LOCATION)
{
  curr_locus = gimple_location (stmt);

-- 
Alexandre Oliva, happy hacker    https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


[PATCH] testsuite: arg-pushing reqs -mno-accumulate-outgoing-args

2023-11-08 Thread Alexandre Oliva


gcc.target/i386/pr95126-m32-[34].c expect push instructions that are
only present with -mno-accumulate-outgoing-args, so make that option
explicit rather than dependent on tuning.
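
For reference, the shape of code the tests check (mirroring pr95126, my
sketch): passing a small struct by value, which on ia32 uses push
instructions only when outgoing arguments are not accumulated.

struct small { short a; };
void bar (struct small);

/* With -m32 -mno-accumulate-outgoing-args the argument is pushed;
   with -maccumulate-outgoing-args it is stored into a pre-allocated
   argument slot instead, so no push instruction appears.  */
void
foo (void)
{
  struct small s = { 42 };
  bar (s);
}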

Regstrapped on x86_64-linux-gnu, also tested with gcc-13 on i686- and
x86_64-.  Ok to install?


for  gcc/testsuite/ChangeLog

* gcc.target/i386/pr95126-m32-3.c: Add
-mno-accumulate-outgoing-args.
* gcc.target/i386/pr95126-m32-4.c: Likewise.
---
 gcc/testsuite/gcc.target/i386/pr95126-m32-3.c |2 +-
 gcc/testsuite/gcc.target/i386/pr95126-m32-4.c |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr95126-m32-3.c 
b/gcc/testsuite/gcc.target/i386/pr95126-m32-3.c
index cc2fe9480093b..91608f86206d2 100644
--- a/gcc/testsuite/gcc.target/i386/pr95126-m32-3.c
+++ b/gcc/testsuite/gcc.target/i386/pr95126-m32-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { ia32 } } } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -mno-accumulate-outgoing-args" } */
 
 struct small{ short a; };
 
diff --git a/gcc/testsuite/gcc.target/i386/pr95126-m32-4.c 
b/gcc/testsuite/gcc.target/i386/pr95126-m32-4.c
index e82933525450c..85b30f69eca3c 100644
--- a/gcc/testsuite/gcc.target/i386/pr95126-m32-4.c
+++ b/gcc/testsuite/gcc.target/i386/pr95126-m32-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { ia32 } } } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -mno-accumulate-outgoing-args" } */
 
 struct small{ short a,b; };
 

-- 
Alexandre Oliva, happy hacker    https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


[PATCH] testsuite: adjust gomp test for x86 -m32

2023-11-08 Thread Alexandre Oliva


declare-target-3.C expects .quad for entries in offload_var_table, but
the entries are pointer-wide, so 32-bit targets use .long instead.
Accept both.

Regstrapped on x86_64-linux-gnu, also tested with gcc-13 on i686- and
x86_64-.  Ok to install?

for  gcc/testsuite/ChangeLog

* g++.dg/gomp/declare-target-3.C: Adjust for 32-bit targets.
---
 gcc/testsuite/g++.dg/gomp/declare-target-3.C |   14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/g++.dg/gomp/declare-target-3.C 
b/gcc/testsuite/g++.dg/gomp/declare-target-3.C
index 1e23c8633acac..b0a90d8d31f50 100644
--- a/gcc/testsuite/g++.dg/gomp/declare-target-3.C
+++ b/gcc/testsuite/g++.dg/gomp/declare-target-3.C
@@ -22,10 +22,10 @@ int *g = &f;// Explicitly marked
 // { dg-final { scan-tree-dump "__attribute__\\\(\\\(omp declare 
target\\\)\\\)\\\nint bar \\\(\\\)" "gimple" } }
 // { dg-final { scan-tree-dump "__attribute__\\\(\\\(omp declare 
target\\\)\\\)\\\nint baz \\\(\\\)" "gimple" } }
 // { dg-final { scan-tree-dump "__attribute__\\\(\\\(omp declare 
target\\\)\\\)\\\nint qux \\\(\\\)" "gimple" } }
-// { dg-final { scan-assembler-not "\\\.offload_var_table:\\n.+\\\.quad\\s+a" 
{ target { offloading_enabled } } } }
-// { dg-final { scan-assembler "\\\.offload_var_table:\\n.+\\\.quad\\s+b" { 
target { offloading_enabled } } } }
-// { dg-final { scan-assembler "\\\.offload_var_table:\\n.+\\\.quad\\s+c" { 
target { offloading_enabled } } } }
-// { dg-final { scan-assembler "\\\.offload_var_table:\\n.+\\\.quad\\s+d" { 
target { offloading_enabled } } } }
-// { dg-final { scan-assembler "\\\.offload_var_table:\\n.+\\\.quad\\s+e" { 
target { offloading_enabled } } } }
-// { dg-final { scan-assembler "\\\.offload_var_table:\\n.+\\\.quad\\s+f" { 
target { offloading_enabled } } } }
-// { dg-final { scan-assembler "\\\.offload_var_table:\\n.+\\\.quad\\s+g" { 
target { offloading_enabled } } } }
+// { dg-final { scan-assembler-not 
"\\\.offload_var_table:\\n.+\\\.(quad|long)\\s+a" { target { offloading_enabled 
} } } }
+// { dg-final { scan-assembler 
"\\\.offload_var_table:\\n.+\\\.(quad|long)\\s+b" { target { offloading_enabled 
} } } }
+// { dg-final { scan-assembler 
"\\\.offload_var_table:\\n.+\\\.(quad|long)\\s+c" { target { offloading_enabled 
} } } }
+// { dg-final { scan-assembler 
"\\\.offload_var_table:\\n.+\\\.(quad|long)\\s+d" { target { offloading_enabled 
} } } }
+// { dg-final { scan-assembler 
"\\\.offload_var_table:\\n.+\\\.(quad|long)\\s+e" { target { offloading_enabled 
} } } }
+// { dg-final { scan-assembler 
"\\\.offload_var_table:\\n.+\\\.(quad|long)\\s+f" { target { offloading_enabled 
} } } }
+// { dg-final { scan-assembler 
"\\\.offload_var_table:\\n.+\\\.(quad|long)\\s+g" { target { offloading_enabled 
} } } }

-- 
Alexandre Oliva, happy hacker    https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


[PATCH] testsuite: force PIC/PIE off for pr58245-1.C

2023-11-08 Thread Alexandre Oliva


This test expects a single mention of stack_chk_fail, as part of a
call sequence, but when e.g. PIE is enabled by default, we output
.hidden stack_chk_fail_local, which makes for a count mismatch.

Disable PIC/PIE so as to not depend on the configurable default.

Regstrapped on x86_64-linux-gnu, also tested with gcc-13 on i686- and
x86_64-.  Ok to install?


for  gcc/testsuite/ChangeLog

* g++.dg/pr58245-1.C: Disable PIC/PIE.
---
 gcc/testsuite/g++.dg/pr58245-1.C |4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/testsuite/g++.dg/pr58245-1.C b/gcc/testsuite/g++.dg/pr58245-1.C
index 1439bc62e710e..71d4736ddf610 100644
--- a/gcc/testsuite/g++.dg/pr58245-1.C
+++ b/gcc/testsuite/g++.dg/pr58245-1.C
@@ -8,3 +8,7 @@ bar (void)
 }
 
 /* { dg-final { scan-assembler-times "stack_chk_fail" 1 } } */
+
+/* When compiling for PI[EC], we issue a .hidden stack_chk_fail_local,
+   that causes the above to fail the expected match count.  */
+/* { dg-additional-options "-fno-PIC" } */

-- 
Alexandre Oliva, happy hacker    https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


RFA: make scan-assembler* ignore LTO sections (Was: Re: committed [RISC-V]: Harden test scan patterns)

2023-11-08 Thread Joern Rennecke
On Fri, 29 Sept 2023 at 14:54, Jeff Law  wrote:
> ...  Joern  can you post a follow-up manual twiddle so
> that other ports can follow your example and avoid this problem?
>
> THanks,
>
> jeff

The attached patch makes the scan-assembler* directives ignore the LTO
sections.
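
To illustrate what can go wrong without this (my sketch, modeled on
gcc.dg/pr61868.c): with -flto -ffat-lto-objects the .s file carries
.gnu.lto_* sections whose payload can spuriously match scan-assembler
patterns that were written against the real code.

/* Compiled with -O2 -flto -ffat-lto-objects, the assembler output
   contains .section .gnu.lto_* blobs alongside the actual insns; a
   scan-assembler pattern such as "12345" could match inside such a
   blob even if the constant never survives into the code itself.  */
static inline int foo (int x) { return x + 12345; }
int bar (int y) { return foo (y); }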

Regression tested (using QEMU) for
riscv-sim

riscv-sim/-march=rv32gcv_zfh/-mabi=ilp32d/-ftree-vectorize/--param=riscv-autovec-preference=scalable
riscv-sim/-march=rv32imac/-mabi=ilp32

riscv-sim/-march=rv64gcv_zfh_zvfh_zba_zbb_zbc_zicond_zicboz_zawrs/-mabi=lp64d/-ftree-vectorize/--param=riscv-autovec-preference=scalable
riscv-sim/-march=rv64imac/-mabi=lp64
2023-11-08  Joern Rennecke  

gcc/testsuite/
* lib/scanasm.exp (scan-assembler-times): Disregard LTO sections.
(scan-assembler-dem, scan-assembler-dem-not): Likewise.
(dg-scan): Likewise, if name starts with scan-assembler.
(scan-raw-assembler): New proc.
* gcc.dg/pr61868.c: Use scan-raw-assembler.
* gcc.dg/scantest-lto.c: New test.
gcc/
* doc/sourcebuild.texi (Scan the assembly output): Document change.

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 8bf701461ec..5a34a10e6c2 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -3276,21 +3276,28 @@ Passes if @var{regexp} does not match text in the file 
generated by
 
 @table @code
 @item scan-assembler @var{regex} [@{ target/xfail @var{selector} @}]
-Passes if @var{regex} matches text in the test's assembler output.
+Passes if @var{regex} matches text in the test's assembler output,
+excluding LTO sections.
+
+@item scan-raw-assembler @var{regex} [@{ target/xfail @var{selector} @}]
+Passes if @var{regex} matches text in the test's assembler output,
+including LTO sections.
 
 @item scan-assembler-not @var{regex} [@{ target/xfail @var{selector} @}]
-Passes if @var{regex} does not match text in the test's assembler output.
+Passes if @var{regex} does not match text in the test's assembler output,
+excluding LTO sections.
 
 @item scan-assembler-times @var{regex} @var{num} [@{ target/xfail 
@var{selector} @}]
 Passes if @var{regex} is matched exactly @var{num} times in the test's
-assembler output.
+assembler output, excluding LTO sections.
 
 @item scan-assembler-dem @var{regex} [@{ target/xfail @var{selector} @}]
-Passes if @var{regex} matches text in the test's demangled assembler output.
+Passes if @var{regex} matches text in the test's demangled assembler output,
+excluding LTO sections.
 
 @item scan-assembler-dem-not @var{regex} [@{ target/xfail @var{selector} @}]
 Passes if @var{regex} does not match text in the test's demangled assembler
-output.
+output, excluding LTO sections.
 
 @item scan-assembler-symbol-section @var{functions} @var{section} [@{ 
target/xfail @var{selector} @}]
 Passes if @var{functions} are all in @var{section}.  The caller needs to
diff --git a/gcc/testsuite/gcc.dg/pr61868.c b/gcc/testsuite/gcc.dg/pr61868.c
index 4a7e8f6ae2d..52ab7838643 100644
--- a/gcc/testsuite/gcc.dg/pr61868.c
+++ b/gcc/testsuite/gcc.dg/pr61868.c
@@ -7,4 +7,4 @@ int main ()
   foo (100);
   return 0;
 }
-/* { dg-final { scan-assembler "\.gnu\.lto.*.12345" } } */
+/* { dg-final { scan-raw-assembler "\.gnu\.lto.*.12345" } } */
diff --git a/gcc/testsuite/lib/scanasm.exp b/gcc/testsuite/lib/scanasm.exp
index 5df80325dff..16b5198d38b 100644
--- a/gcc/testsuite/lib/scanasm.exp
+++ b/gcc/testsuite/lib/scanasm.exp
@@ -79,6 +79,12 @@ proc dg-scan { name positive testcase output_file orig_args } {
 }
 set text [read $fd]
 close $fd
+if { [string compare -length 14 $name scan-assembler] == 0 } {
+  # Remove LTO sections.
+  # ??? Somehow, .*? is still greedy.
+  # regsub -all {(^|\n)[[:space:]]*\.section[[:space:]]*\.gnu\.lto_.*?\n(?=[[:space:]]*\.text\n)} $text {\1} text
+  regsub -all {(^|\n)[[:space:]]*\.section[[:space:]]*\.gnu\.lto_(?:[^\n]*\n(?![[:space:]]*\.(section|text|data|bss)))*[^\n]*\n} $text {\1} text
+}
 
 set match [regexp -- $pattern $text]
 if { $match == $positive } {
@@ -108,6 +114,16 @@ proc scan-assembler { args } {
 
 set_required_options_for scan-assembler
 
+proc scan-raw-assembler { args } {
+set testcase [testname-for-summary]
+# The name might include a list of options; extract the file name.
+set filename [lindex $testcase 0]
+set output_file "[file rootname [file tail $filename]].s"
+dg-scan "scan-raw-assembler" 1 $testcase $output_file $args
+}
+
+set_required_options_for scan-raw-assembler
+
 # Check that a pattern is not present in the .s file produced by the
 # compiler.  See dg-scan for details.
 
@@ -487,6 +503,7 @@ proc scan-assembler-times { args } {
 set fd [open $output_file r]
 set text [read $fd]
 close $fd
+regsub -all {(^|\n)[[:space:]]*\.section[[:space:]]*\.gnu\.lto_(?:[^\n]*\n(?![[:space:]]*\.(section|text|data|bss)))*[^\n]*\n} $text {\1} text
 
 set result_count [llength [regexp -inline -a

[PATCH] testsuite: xfail scev-[35].c on ia32

2023-11-08 Thread Alexandre Oliva


These gimplefe tests never got the desired optimization on ia32, but
they only started visibly failing when the representation of MEMs in
dumps changed from printing 'symbol: a' to '&a'.

The transformation is not considered profitable on ia32, which is why it
doesn't take place.  Maybe that's a bug in itself, but it's not a
regression, and not something to be noisy about.

Regstrapped on x86_64-linux-gnu, also tested with gcc-13 on i686- and
x86_64-.  Ok to install?

(Richi, is the non-optimization choice on ia32 something unexpected that
ought to be looked into?  I could file a PR, and maybe even look into it
a bit further.)


for  gcc/testsuite/ChangeLog

* gcc.dg/tree-ssa/scev-3.c: xfail on ia32.
* gcc.dg/tree-ssa/scev-5.c: Likewise.

Issue: gcc#155
TN: W517-007
---
 gcc/testsuite/gcc.dg/tree-ssa/scev-3.c |2 +-
 gcc/testsuite/gcc.dg/tree-ssa/scev-5.c |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
index 4babd33f5c062..ac8c8d4519e30 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
@@ -40,4 +40,4 @@ __BB(6):
 
 }
 
-/* { dg-final { scan-tree-dump-times "&a" 1 "ivopts" } } */
+/* { dg-final { scan-tree-dump-times "&a" 1 "ivopts" { xfail ia32 } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-5.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-5.c
index c2feebdfc2489..c911a9298866f 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/scev-5.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-5.c
@@ -40,4 +40,4 @@ __BB(6):
 
 }
 
-/* { dg-final { scan-tree-dump-times "&a" 1 "ivopts" } } */
+/* { dg-final { scan-tree-dump-times "&a" 1 "ivopts" { xfail ia32 } } } */

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


Re: [PATCH] testsuite: adjust gomp test for x86 -m32

2023-11-08 Thread Jakub Jelinek
On Wed, Nov 08, 2023 at 12:56:44PM -0300, Alexandre Oliva wrote:
> 
> declare-target-3.C expects .quad for entries in offload_var_table, but
> the entries are pointer-wide, so 32-bit targets use .long instead.
> Accept both.
> 
> Regstrapped on x86_64-linux-gnu, also tested with gcc-13 on i686- and
> x86_64-.  Ok to install?
> 
> for  gcc/testsuite/ChangeLog
> 
>   * g++.dg/gomp/declare-target-3.C: Adjust for 32-bit targets.

Ok (though, I wonder if it isn't say .4byte or .8byte etc. even on other
targets).

> --- a/gcc/testsuite/g++.dg/gomp/declare-target-3.C
> +++ b/gcc/testsuite/g++.dg/gomp/declare-target-3.C
> @@ -22,10 +22,10 @@ int *g = &f;  // Explicitly marked
>  // { dg-final { scan-tree-dump "__attribute__\\\(\\\(omp declare target\\\)\\\)\\\nint bar \\\(\\\)" "gimple" } }
>  // { dg-final { scan-tree-dump "__attribute__\\\(\\\(omp declare target\\\)\\\)\\\nint baz \\\(\\\)" "gimple" } }
>  // { dg-final { scan-tree-dump "__attribute__\\\(\\\(omp declare target\\\)\\\)\\\nint qux \\\(\\\)" "gimple" } }
> -// { dg-final { scan-assembler-not "\\\.offload_var_table:\\n.+\\\.quad\\s+a" { target { offloading_enabled } } } }
> -// { dg-final { scan-assembler "\\\.offload_var_table:\\n.+\\\.quad\\s+b" { target { offloading_enabled } } } }
> -// { dg-final { scan-assembler "\\\.offload_var_table:\\n.+\\\.quad\\s+c" { target { offloading_enabled } } } }
> -// { dg-final { scan-assembler "\\\.offload_var_table:\\n.+\\\.quad\\s+d" { target { offloading_enabled } } } }
> -// { dg-final { scan-assembler "\\\.offload_var_table:\\n.+\\\.quad\\s+e" { target { offloading_enabled } } } }
> -// { dg-final { scan-assembler "\\\.offload_var_table:\\n.+\\\.quad\\s+f" { target { offloading_enabled } } } }
> -// { dg-final { scan-assembler "\\\.offload_var_table:\\n.+\\\.quad\\s+g" { target { offloading_enabled } } } }
> +// { dg-final { scan-assembler-not "\\\.offload_var_table:\\n.+\\\.(quad|long)\\s+a" { target { offloading_enabled } } } }
> +// { dg-final { scan-assembler "\\\.offload_var_table:\\n.+\\\.(quad|long)\\s+b" { target { offloading_enabled } } } }
> +// { dg-final { scan-assembler "\\\.offload_var_table:\\n.+\\\.(quad|long)\\s+c" { target { offloading_enabled } } } }
> +// { dg-final { scan-assembler "\\\.offload_var_table:\\n.+\\\.(quad|long)\\s+d" { target { offloading_enabled } } } }
> +// { dg-final { scan-assembler "\\\.offload_var_table:\\n.+\\\.(quad|long)\\s+e" { target { offloading_enabled } } } }
> +// { dg-final { scan-assembler "\\\.offload_var_table:\\n.+\\\.(quad|long)\\s+f" { target { offloading_enabled } } } }
> +// { dg-final { scan-assembler "\\\.offload_var_table:\\n.+\\\.(quad|long)\\s+g" { target { offloading_enabled } } } }

Jakub



Revert: [PATCH] Power10: Add options to disable load and store vector pair.

2023-11-08 Thread Michael Meissner
I discovered a shortcoming in the patch I proposed to add
-mno-load-vector-pair and -mno-store-vector-pair tuning options.  I will submit
a new patch shortly.

| Date: Fri, 13 Oct 2023 19:41:13 -0400
| From: Michael Meissner 
| Subject: [PATCH] Power10: Add options to disable load and store vector pair.
| Message-ID: 

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


[PATCH] libstdc++: optimize bit iterators assuming normalization [PR110807]

2023-11-08 Thread Alexandre Oliva


The representation of bit iterators, using a pointer into an array of
words, and an unsigned bit offset into that word, makes for some
optimization challenges: because the compiler doesn't know that the
offset is always in a certain narrow range, beginning at zero and
ending before the word bitwidth, when a function loads an offset that
it hasn't normalized itself, it may fail to derive certain reasonable
conclusions, even to the point of retaining useless calls that elicit
incorrect warnings.

Case at hand: The 110807.cc testcase for bit vectors assigns a 1-bit
list to a global bit vector variable.  Based on the compile-time
constant length of the list, we decide in _M_insert_range whether to
use the existing storage or to allocate new storage for the vector.
After allocation, we decide in _M_copy_aligned how to copy any
preexisting portions of the vector to the newly-allocated storage.
When copying two or more words, we use __builtin_memmove.

However, because we compute the available room using bit offsets
without range information, even comparing them with constants, we fail
to infer ranges for the preexisting vector depending on word size, and
may thus retain the memmove call despite knowing we've only allocated
one word.

Other parts of the compiler then detect the mismatch between the
constant allocation size and the much larger range that could
theoretically be copied into the newly-allocated storage if we could
reach the call.

Ensuring the compiler is aware of the constraints on the offset range
enables it to do a much better job at optimizing.  The challenge is to
do so without runtime overhead, because this is not about checking
that it's in range, it's only about telling the compiler about it.

This patch introduces a __GLIBCXX_BUILTIN_ASSUME macro that, when
optimizing, expands to code that invokes undefined behavior in case
the expression doesn't hold, so that the compiler optimizes out the
test and the entire branch containing it, while retaining enough
information about the paths that shouldn't be taken, so that on the
remaining paths it optimizes based on the assumption.
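
(For concreteness, a minimal sketch of the pattern such a macro can
expand to; the name here is invented, and the actual definition in the
patch differs in its guards and details:

#if defined __OPTIMIZE__
/* If EXPR does not hold, invoke undefined behavior: the compiler then
   deletes the test and the dead branch, but keeps the range
   information that EXPR implies on the surviving paths.  */
# define SKETCH_BUILTIN_ASSUME(expr) \
  do { if (!(expr)) __builtin_unreachable (); } while (0)
#else
# define SKETCH_BUILTIN_ASSUME(expr) ((void) 0)
#endif

A use like SKETCH_BUILTIN_ASSUME (_M_offset < unsigned (_S_word_bit))
then lets the optimizers reason about the offset range at no runtime
cost.)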

I also introduce a member function in bit iterators that conveys to
the compiler the information that the assumption is supposed to hold,
and various calls throughout member functions of bit iterators that
might not otherwise know that the offsets have to be in range,
making pessimistic decisions and failing to optimize out cases that it
could.

With the explicit assumptions, the compiler can correlate the test for
available storage in the vector with the test for how much storage
might need to be copied, and determine that, if we're not asking for
enough room for two or more words, we can omit entirely the code to
copy two or more words, without any runtime overhead whatsoever: no
traces remain of the undefined behavior or of the tests that inform
the compiler about the assumptions that must hold.

Regstrapped on x86_64-linux-gnu, also tested with gcc-13 on i686- and
x86_64-.  Ok to install?

(It was later found to fix 23_containers/vector/bool/allocator/copy.cc
on x86_64-linux-gnu as well, that fails on gcc-13 with the same warning.)

(The constant_evaluated/static_assert bit is disabled because expr is
not a constant according to some libstdc++ build errors, but there
doesn't seem to be a problem with the other bits.  I haven't really
thought that bit through, it was something I started out as potentially
desirable, but that turned out to be not required.  I suppose I could
just drop it.)

(I suppose __GLIBCXX_BUILTIN_ASSUME could be moved to a more general
place and put to more general uses, but I didn't feel that bold ;-)


for  libstdc++-v3/ChangeLog

PR libstdc++/110807
* include/bits/stl_bvector.h (__GLIBCXX_BUILTIN_ASSUME): New.
(_Bit_iterator_base): Add _M_normalized_p and
_M_assume_normalized.  Use them in _M_bump_up, _M_bump_down,
_M_incr, operator==, operator<=>, operator<, and operator-.
(_Bit_iterator): Also use them in operator*.
(_Bit_const_iterator): Likewise.
---
 libstdc++-v3/include/bits/stl_bvector.h |   75 ++-
 1 file changed, 72 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_bvector.h b/libstdc++-v3/include/bits/stl_bvector.h
index 8d18bcaffd434..81b316846454b 100644
--- a/libstdc++-v3/include/bits/stl_bvector.h
+++ b/libstdc++-v3/include/bits/stl_bvector.h
@@ -177,6 +177,55 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 _Bit_type * _M_p;
 unsigned int _M_offset;
 
+#if __OPTIMIZE__ && !__GLIBCXX_DISABLE_ASSUMPTIONS
+// If the assumption (EXPR) fails, invoke undefined behavior, so that
+// the test and the failure block gets optimized out, but the compiler
+// still recalls that (expr) can be taken for granted.  Use this only
+// for expressions that are simple and explicit enough that the
+// compiler can optimize based on them.  When not optimizing, the
+// expression is still compiled, but i

[PATCH] vect: Use statement vectype for conditional mask.

2023-11-08 Thread Robin Dapp
Hi,

as Tamar reported in PR112406 we still ICE on aarch64 in SPEC2017
when creating COND_OPs in ifcvt.

The problem is that we fail to deduce the mask's type from the statement
vectype and then end up with a non-matching mask in expand.  This patch
checks if the current op is equal to the mask operand and, if so, uses
the truth type from the stmt_vectype.  Is that a valid approach?  

Bootstrapped and regtested on aarch64, x86 is running.

Besides, the testcase is Tamar's reduced example, originally from
SPEC.  I hope it's ok to include it as is (as imagick is open source
anyway).

Regards
 Robin

gcc/ChangeLog:

PR middle-end/112406

* tree-vect-stmts.cc (vect_get_vec_defs_for_operand): Handle
masks of conditional ops.

gcc/testsuite/ChangeLog:

* gcc.dg/pr112406.c: New test.
---
 gcc/testsuite/gcc.dg/pr112406.c | 37 +
 gcc/tree-vect-stmts.cc  | 20 +-
 2 files changed, 56 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr112406.c

diff --git a/gcc/testsuite/gcc.dg/pr112406.c b/gcc/testsuite/gcc.dg/pr112406.c
new file mode 100644
index 000..46459c68c4a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr112406.c
@@ -0,0 +1,37 @@
+/* { dg-do compile { target { aarch64*-*-* } } } */
+/* { dg-options "-march=armv8-a+sve -w -Ofast" } */
+
+typedef struct {
+  int red;
+} MagickPixelPacket;
+
+GetImageChannelMoments_image, GetImageChannelMoments_image_0,
+GetImageChannelMoments___trans_tmp_1, GetImageChannelMoments_M11_0,
+GetImageChannelMoments_pixel_3, GetImageChannelMoments_y,
+GetImageChannelMoments_p;
+
+double GetImageChannelMoments_M00_0, GetImageChannelMoments_M00_1,
+GetImageChannelMoments_M01_1;
+
+MagickPixelPacket GetImageChannelMoments_pixel;
+
+SetMagickPixelPacket(int color, MagickPixelPacket *pixel) {
+  pixel->red = color;
+}
+
+GetImageChannelMoments() {
+  for (; GetImageChannelMoments_y; GetImageChannelMoments_y++) {
+SetMagickPixelPacket(GetImageChannelMoments_p,
+ &GetImageChannelMoments_pixel);
+GetImageChannelMoments_M00_1 += GetImageChannelMoments_pixel.red;
+if (GetImageChannelMoments_image)
+  GetImageChannelMoments_M00_1++;
+GetImageChannelMoments_M01_1 +=
+GetImageChannelMoments_y * GetImageChannelMoments_pixel_3;
+if (GetImageChannelMoments_image_0)
+  GetImageChannelMoments_M00_0++;
+GetImageChannelMoments_M01_1 +=
+GetImageChannelMoments_y * GetImageChannelMoments_p++;
+  }
+  GetImageChannelMoments___trans_tmp_1 = atan(GetImageChannelMoments_M11_0);
+}
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 65883e04ad7..6793b01bf44 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1238,10 +1238,28 @@ vect_get_vec_defs_for_operand (vec_info *vinfo, stmt_vec_info stmt_vinfo,
   tree stmt_vectype = STMT_VINFO_VECTYPE (stmt_vinfo);
   tree vector_type;
 
+  /* For a COND_OP the mask operand's type must not be deduced from the
+scalar type but from the statement's vectype.  */
+  bool use_stmt_vectype = false;
+  gcall *call;
+  if ((call = dyn_cast <gcall *> (STMT_VINFO_STMT (stmt_vinfo)))
+ && gimple_call_internal_p (call))
+   {
+ internal_fn ifn = gimple_call_internal_fn (call);
+ int mask_idx = -1;
+ if (ifn != IFN_LAST
+ && (mask_idx = internal_fn_mask_index (ifn)) != -1)
+   {
+ tree maskop = gimple_call_arg (call, mask_idx);
+ if (op == maskop)
+   use_stmt_vectype = true;
+   }
+   }
+
   if (vectype)
vector_type = vectype;
   else if (VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (op))
-  && VECTOR_BOOLEAN_TYPE_P (stmt_vectype))
+  && (use_stmt_vectype || VECTOR_BOOLEAN_TYPE_P (stmt_vectype)))
vector_type = truth_type_for (stmt_vectype);
   else
vector_type = get_vectype_for_scalar_type (loop_vinfo, TREE_TYPE (op));
-- 
2.41.0


Re: PR111754

2023-11-08 Thread Prathamesh Kulkarni
On Thu, 26 Oct 2023 at 09:43, Prathamesh Kulkarni
 wrote:
>
> On Thu, 26 Oct 2023 at 04:09, Richard Sandiford
>  wrote:
> >
> > Prathamesh Kulkarni  writes:
> > > On Wed, 25 Oct 2023 at 02:58, Richard Sandiford
> > >  wrote:
> > >>
> > >> Hi,
> > >>
> > >> Sorry the slow review.  I clearly didn't think this through properly
> > >> when doing the review of the original patch, so I wanted to spend
> > >> some time working on the code to get a better understanding of
> > >> the problem.
> > >>
> > >> Prathamesh Kulkarni  writes:
> > >> > Hi,
> > >> > For the following test-case:
> > >> >
> > >> > typedef float __attribute__((__vector_size__ (16))) F;
> > >> > F foo (F a, F b)
> > >> > {
> > >> >   F v = (F) { 9 };
> > >> >   return __builtin_shufflevector (v, v, 1, 0, 1, 2);
> > >> > }
> > >> >
> > >> > Compiling with -O2 results in following ICE:
> > >> > foo.c: In function ‘foo’:
> > >> > foo.c:6:10: internal compiler error: in decompose, at rtl.h:2314
> > >> > 6 |   return __builtin_shufflevector (v, v, 1, 0, 1, 2);
> > >> >   |  ^~
> > >> > 0x7f3185 wi::int_traits
> > >> >>::decompose(long*, unsigned int, std::pair
> > >> > const&)
> > >> > ../../gcc/gcc/rtl.h:2314
> > >> > 0x7f3185 wide_int_ref_storage > >> > false>::wide_int_ref_storage
> > >> >>(std::pair const&)
> > >> > ../../gcc/gcc/wide-int.h:1089
> > >> > 0x7f3185 generic_wide_int
> > >> >>::generic_wide_int
> > >> >>(std::pair const&)
> > >> > ../../gcc/gcc/wide-int.h:847
> > >> > 0x7f3185 poly_int<1u, generic_wide_int > >> > false> > >::poly_int
> > >> >>(poly_int_full, std::pair const&)
> > >> > ../../gcc/gcc/poly-int.h:467
> > >> > 0x7f3185 poly_int<1u, generic_wide_int > >> > false> > >::poly_int
> > >> >>(std::pair const&)
> > >> > ../../gcc/gcc/poly-int.h:453
> > >> > 0x7f3185 wi::to_poly_wide(rtx_def const*, machine_mode)
> > >> > ../../gcc/gcc/rtl.h:2383
> > >> > 0x7f3185 rtx_vector_builder::step(rtx_def*, rtx_def*) const
> > >> > ../../gcc/gcc/rtx-vector-builder.h:122
> > >> > 0xfd4e1b vector_builder > >> > rtx_vector_builder>::elt(unsigned int) const
> > >> > ../../gcc/gcc/vector-builder.h:253
> > >> > 0xfd4d11 rtx_vector_builder::build()
> > >> > ../../gcc/gcc/rtx-vector-builder.cc:73
> > >> > 0xc21d9c const_vector_from_tree
> > >> > ../../gcc/gcc/expr.cc:13487
> > >> > 0xc21d9c expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
> > >> > expand_modifier, rtx_def**, bool)
> > >> > ../../gcc/gcc/expr.cc:11059
> > >> > 0xaee682 expand_expr(tree_node*, rtx_def*, machine_mode, 
> > >> > expand_modifier)
> > >> > ../../gcc/gcc/expr.h:310
> > >> > 0xaee682 expand_return
> > >> > ../../gcc/gcc/cfgexpand.cc:3809
> > >> > 0xaee682 expand_gimple_stmt_1
> > >> > ../../gcc/gcc/cfgexpand.cc:3918
> > >> > 0xaee682 expand_gimple_stmt
> > >> > ../../gcc/gcc/cfgexpand.cc:4044
> > >> > 0xaf28f0 expand_gimple_basic_block
> > >> > ../../gcc/gcc/cfgexpand.cc:6100
> > >> > 0xaf4996 execute
> > >> > ../../gcc/gcc/cfgexpand.cc:6835
> > >> >
> > >> > IIUC, the issue is that fold_vec_perm returns a vector having float 
> > >> > element
> > >> > type with res_nelts_per_pattern == 3, and later ICE's when it tries
> > >> > to derive element v[3], not present in the encoding, while trying to
> > >> > build rtx vector
> > >> > in rtx_vector_builder::build():
> > >> >  for (unsigned int i = 0; i < nelts; ++i)
> > >> > RTVEC_ELT (v, i) = elt (i);
> > >> >
> > >> > The attached patch tries to fix this by returning false from
> > >> > valid_mask_for_fold_vec_perm_cst if sel has a stepped sequence and
> > >> > input vector has non-integral element type, so for VLA vectors, it
> > >> > will only build result with dup sequence (nelts_per_pattern < 3) for
> > >> > non-integral element type.
> > >> >
> > >> > For VLS vectors, this will still work for stepped sequence since it
> > >> > will then use the "VLS exception" in fold_vec_perm_cst, and set:
> > >> > res_npattern = res_nelts and
> > >> > res_nelts_per_pattern = 1
> > >> >
> > >> > and fold the above case to:
> > >> > F foo (F a, F b)
> > >> > {
> > >> >[local count: 1073741824]:
> > >> >   return { 0.0, 9.0e+0, 0.0, 0.0 };
> > >> > }
> > >> >
> > >> > But I am not sure if this is entirely correct, since:
> > >> > tree res = out_elts.build ();
> > >> > will canonicalize the encoding and may result in a stepped sequence
> > >> > (vector_builder::finalize() may reduce npatterns at the cost of 
> > >> > increasing
> > >> > nelts_per_pattern)  ?
> > >> >
> > >> > PS: This issue is now latent after PR111648 fix, since
> > >> > valid_mask_for_fold_vec_perm_cst with  sel = {1, 0, 1, ...} returns
> > >> > false because the corresponding pattern in arg0 is not a natural
> > >> > stepped sequence, and folds correctly using VLS exception. However, I
> > >> > guess the underlying issue of dealing with non-integral element types
> >

Re: testsuite: introduce hostedlib effective target

2023-11-08 Thread Alexandre Oliva
On Nov  5, 2023, Mike Stump  wrote:

> that, otherwise, I'll approve this version.

FWIW, this version is not usable as is.  Something went wrong in my
testing, and several regressions only visible in hosted mode made it into
the version I posted, which was missing some end-of-comment markers for the
added dg directives and needed the new dg directive moved to the end so as
to not disturb line numbers.  I've got a fully fixed and properly tested
version, but since it's about as big as the original patch, I'll only
post it upon request.

(in case anyone's interested, it's one of the changes in
refs/users/aoliva/heads/testme that's not in
refs/users/aoliva/heads/testbase right now)

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


[PATCH v2] i386 PIE: accept @GOTOFF in load/store multi base address

2023-11-08 Thread Alexandre Oliva
Ping?
https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598872.html

Looking at the code generated for sse2-{load,store}-multi.c with PIE,
I realized we could use UNSPEC_GOTOFF as a base address, and that this
would enable the test to use the vector insns expected by the tests
even with PIC, so I extended the base + offset logic used by the SSE2
multi-load/store peepholes to accept reg + symbolic base + offset too,
so that the test generated the expected insns even with PIE.

Regstrapped on x86_64-linux-gnu, also tested with gcc-13 on i686- and
x86_64-.  Ok to install?


for  gcc/ChangeLog

* config/i386/i386.cc (symbolic_base_address_p,
base_address_p): New, factored out from...
(extract_base_offset_in_addr): ... here and extended to
recognize REG+GOTOFF, as in gcc.target/i386/sse2-load-multi.c
and sse2-store-multi.c with PIE enabled by default.
---
 gcc/config/i386/i386.cc |   89 ---
 1 file changed, 75 insertions(+), 14 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index c2bd07fced7b1..eec9b42396e0a 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -25198,11 +25198,40 @@ ix86_reloc_rw_mask (void)
 }
 #endif
 
-/* If MEM is in the form of [base+offset], extract the two parts
-   of address and set to BASE and OFFSET, otherwise return false.  */
+/* Return true iff ADDR can be used as a symbolic base address.  */
 
 static bool
-extract_base_offset_in_addr (rtx mem, rtx *base, rtx *offset)
+symbolic_base_address_p (rtx addr)
+{
+  if (GET_CODE (addr) == SYMBOL_REF)
+return true;
+
+  if (GET_CODE (addr) == UNSPEC && XINT (addr, 1) == UNSPEC_GOTOFF)
+return true;
+
+  return false;
+}
+
+/* Return true iff ADDR can be used as a base address.  */
+
+static bool
+base_address_p (rtx addr)
+{
+  if (REG_P (addr))
+return true;
+
+  if (symbolic_base_address_p (addr))
+return true;
+
+  return false;
+}
+
+/* If MEM is in the form of [(base+symbase)+offset], extract the three
+   parts of address and set to BASE, SYMBASE and OFFSET, otherwise
+   return false.  */
+
+static bool
+extract_base_offset_in_addr (rtx mem, rtx *base, rtx *symbase, rtx *offset)
 {
   rtx addr;
 
@@ -25213,21 +25242,52 @@ extract_base_offset_in_addr (rtx mem, rtx *base, rtx *offset)
   if (GET_CODE (addr) == CONST)
 addr = XEXP (addr, 0);
 
-  if (REG_P (addr) || GET_CODE (addr) == SYMBOL_REF)
+  if (base_address_p (addr))
 {
   *base = addr;
+  *symbase = const0_rtx;
   *offset = const0_rtx;
   return true;
 }
 
   if (GET_CODE (addr) == PLUS
-  && (REG_P (XEXP (addr, 0))
- || GET_CODE (XEXP (addr, 0)) == SYMBOL_REF)
-  && CONST_INT_P (XEXP (addr, 1)))
+  && base_address_p (XEXP (addr, 0)))
 {
-  *base = XEXP (addr, 0);
-  *offset = XEXP (addr, 1);
-  return true;
+  rtx addend = XEXP (addr, 1);
+
+  if (GET_CODE (addend) == CONST)
+   addend = XEXP (addend, 0);
+
+  if (CONST_INT_P (addend))
+   {
+ *base = XEXP (addr, 0);
+ *symbase = const0_rtx;
+ *offset = addend;
+ return true;
+   }
+
+  /* Also accept REG + symbolic ref, with or without a CONST_INT
+offset.  */
+  if (REG_P (XEXP (addr, 0)))
+   {
+ if (symbolic_base_address_p (addend))
+   {
+ *base = XEXP (addr, 0);
+ *symbase = addend;
+ *offset = const0_rtx;
+ return true;
+   }
+
+ if (GET_CODE (addend) == PLUS
+ && symbolic_base_address_p (XEXP (addend, 0))
+ && CONST_INT_P (XEXP (addend, 1)))
+   {
+ *base = XEXP (addr, 0);
+ *symbase = XEXP (addend, 0);
+ *offset = XEXP (addend, 1);
+ return true;
+   }
+   }
 }
 
   return false;
@@ -25242,7 +25302,8 @@ ix86_operands_ok_for_move_multiple (rtx *operands, bool load,
machine_mode mode)
 {
   HOST_WIDE_INT offval_1, offval_2, msize;
-  rtx mem_1, mem_2, reg_1, reg_2, base_1, base_2, offset_1, offset_2;
+  rtx mem_1, mem_2, reg_1, reg_2, base_1, base_2,
+    symbase_1, symbase_2, offset_1, offset_2;
 
   if (load)
 {
@@ -25265,13 +25326,13 @@ ix86_operands_ok_for_move_multiple (rtx *operands, bool load,
 return false;
 
   /* Check if the addresses are in the form of [base+offset].  */
-  if (!extract_base_offset_in_addr (mem_1, &base_1, &offset_1))
+  if (!extract_base_offset_in_addr (mem_1, &base_1, &symbase_1, &offset_1))
 return false;
-  if (!extract_base_offset_in_addr (mem_2, &base_2, &offset_2))
+  if (!extract_base_offset_in_addr (mem_2, &base_2, &symbase_2, &offset_2))
 return false;
 
   /* Check if the bases are the same.  */
-  if (!rtx_equal_p (base_1, base_2))
+  if (!rtx_equal_p (base_1, base_2) || !rtx_equal_p (symbase_1, symbase_2))
 return false;
 
   offval_1 = INTVAL (offset_1);


-- 

[PATCH v2] [PR83782] ifunc: back-propagate ifunc_resolver to aliases

2023-11-08 Thread Alexandre Oliva
Ping?
https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599453.html

gcc.target/i386/mvc10.c fails with -fPIE on ia32 because we omit the
@PLT mark when calling an alias to an indirect function.  Such aliases
aren't marked as ifunc_resolvers in the cgraph, so the test that would
have forced the PLT call fails.

I've arranged for ifunc_resolver to be back-propagated to aliases, and
relaxed the test that required the ifunc attribute to be attached
directly to the decl, rather than taken from an aliased decl, when the
ifunc_resolver bit is set.
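
(A hedged sketch of the shape that trips this, with invented names --
the actual failure comes from the target_clones dispatchers in mvc10.c
rather than from explicit attributes:

static int impl (void) { return 0; }
static void *resolver (void) { return (void *) impl; }
int target_fn (void) __attribute__ ((ifunc ("resolver")));
int alias_fn (void) __attribute__ ((alias ("target_fn")));
int call_it (void) { return alias_fn (); } /* needs @PLT under PIE */

Without the back-propagation, alias_fn lacks the ifunc_resolver bit and
the call above could be emitted without the PLT indirection.)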

Regstrapped on x86_64-linux-gnu, also tested with gcc-13 on i686- and
x86_64-.  Ok to install?

(in the initial patchset for PR83782 and mvc10, I also needed
https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598873.html but I'm
not getting that fail any more with gcc-13, apparently because a
different patch was put in to address that part)


for  gcc/ChangeLog

PR target/83782
* cgraph.h (symtab_node::set_ifunc_resolver): New, overloaded.
Back-propagate flag to aliases.
* cgraph.cc (cgraph_node::create): Use set_ifunc_resolver.
(cgraph_node::create_alias): Likewise.
* lto-cgraph.cc (input_node): Likewise.
* multiple_target.cc (create_dispatcher_calls): Propagate to
aliases when redirecting them.
* symtab.cc (symtab_node::verify_base): Accept ifunc_resolver
set in an alias to another ifunc_resolver nodes.
(symtab_node::resolve_alias): Propagate ifunc_resolver from
resolved target to alias.
* varasm.cc (do_assemble_alias): Checking for the attribute.
---
 gcc/cgraph.cc  |4 ++--
 gcc/cgraph.h   |   13 +
 gcc/lto-cgraph.cc  |2 +-
 gcc/multiple_target.cc |2 ++
 gcc/symtab.cc  |   15 ++-
 gcc/varasm.cc  |5 -
 6 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
index e41e5ad3ae74d..046dadf53af93 100644
--- a/gcc/cgraph.cc
+++ b/gcc/cgraph.cc
@@ -518,7 +518,7 @@ cgraph_node::create (tree decl)
 }
 
   if (lookup_attribute ("ifunc", DECL_ATTRIBUTES (decl)))
-node->ifunc_resolver = true;
+node->set_ifunc_resolver ();
 
   node->register_symbol ();
   maybe_record_nested_function (node);
@@ -576,7 +576,7 @@ cgraph_node::create_alias (tree alias, tree target)
   if (lookup_attribute ("weakref", DECL_ATTRIBUTES (alias)) != NULL)
 alias_node->transparent_alias = alias_node->weakref = true;
   if (lookup_attribute ("ifunc", DECL_ATTRIBUTES (alias)))
-alias_node->ifunc_resolver = true;
+alias_node->set_ifunc_resolver ();
   return alias_node;
 }
 
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index cedaaac3a45b7..e118ac75121ac 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -471,6 +471,19 @@ public:
 return decl->decl_with_vis.symtab_node;
   }
 
+  /* Worker for the nonstatic set_ifunc_resolver, to back-propagate
+ ifunc_resolver in the alias chain.  */
+  static bool set_ifunc_resolver (symtab_node *n, void * = NULL)
+  {
+n->ifunc_resolver = true;
+return false;
+  }
+
+  /* Set the ifunc_resolver bit in this node and in any aliases thereof.  */
+  void set_ifunc_resolver () {
+call_for_symbol_and_aliases (set_ifunc_resolver, NULL, true);
+  }
+
   /* Try to find a symtab node for declaration DECL and if it does not
  exist or if it corresponds to an inline clone, create a new one.  */
   static inline symtab_node * get_create (tree node);
diff --git a/gcc/lto-cgraph.cc b/gcc/lto-cgraph.cc
index 32c0f5ac6dbc1..e7f77ca72242a 100644
--- a/gcc/lto-cgraph.cc
+++ b/gcc/lto-cgraph.cc
@@ -1294,7 +1294,7 @@ input_node (struct lto_file_decl_data *file_data,
   node = symtab->create_empty ();
   node->decl = fn_decl;
   if (lookup_attribute ("ifunc", DECL_ATTRIBUTES (fn_decl)))
-   node->ifunc_resolver = 1;
+   node->set_ifunc_resolver ();
   node->register_symbol ();
 }
 
diff --git a/gcc/multiple_target.cc b/gcc/multiple_target.cc
index a2ed048d7dd28..26c73d6a1e4cf 100644
--- a/gcc/multiple_target.cc
+++ b/gcc/multiple_target.cc
@@ -160,6 +160,8 @@ create_dispatcher_calls (struct cgraph_node *node)
  source->create_reference (inode, IPA_REF_ALIAS);
  if (inode->get_comdat_group ())
source->add_to_same_comdat_group (inode);
+ if (!source->ifunc_resolver)
+   source->set_ifunc_resolver ();
}
  else
gcc_unreachable ();
diff --git a/gcc/symtab.cc b/gcc/symtab.cc
index 0470509a98d2a..b35b879028def 100644
--- a/gcc/symtab.cc
+++ b/gcc/symtab.cc
@@ -1109,9 +1109,19 @@ symtab_node::verify_base (void)
   error ("function symbol is not function");
   error_found = true;
}
+  /* If the ifunc attribute is present, the node must be marked as
+ifunc_resolver, but it may also be marked on a node that
+doesn't have the attribute, if it's an alias to another
+ma

Re: [PATCH 0/7] ira/lra: Support subreg coalesce

2023-11-08 Thread Dimitar Dimitrov
On Wed, Nov 08, 2023 at 11:47:33AM +0800, Lehua Ding wrote:
> Hi,
> 
> These patches try to support the subreg coalesce feature in
> register allocation passes (ira and lra).

Hi Lehua,

This patch set breaks the build for at least three embedded targets. See
below.

For avr the GCC build fails with:
/mnt/nvme/dinux/local-workspace/gcc/gcc/ira-lives.cc:149:39: error: call of 
overloaded ‘set_subreg_conflict_hard_regs(ira_allocno*&, int&)’ is ambiguous
  149 | set_subreg_conflict_hard_regs (OBJECT_ALLOCNO (obj), regno);


For arm-none-eabi the newlib build fails with:
/mnt/nvme/dinux/local-workspace/newlib/newlib/libm/math/e_jn.c:279:1: internal 
compiler error: Floating point exception
  279 | }
  | ^
0x1176e0f crash_signal
/mnt/nvme/dinux/local-workspace/gcc/gcc/toplev.cc:316
0xf6008d get_range_hard_regs(int, subreg_range const&)
/mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:609
0xf6008d get_range_hard_regs(int, subreg_range const&)
/mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:601
0xf60312 new_insn_reg
/mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:658
0xf6064d add_regs_to_insn_regno_info
/mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1623
0xf62909 lra_update_insn_regno_info(rtx_insn*)
/mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1769
0xf62e46 lra_update_insn_regno_info(rtx_insn*)
/mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1762
0xf62e46 lra_push_insn_1
/mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1919
0xf62f2d lra_push_insn(rtx_insn*)
/mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1927
0xf62f2d push_insns
/mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1970
0xf63302 push_insns
/mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1966
0xf63302 lra(_IO_FILE*)
/mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:2511
0xf0e399 do_reload 
/mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:5960
0xf0e399 execute
/mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:6148


For pru-elf the GCC build fails with:
/mnt/nvme/dinux/local-workspace/gcc/libgcc/unwind-dw2-fde.c: In function 
'linear_search_fdes':
/mnt/nvme/dinux/local-workspace/gcc/libgcc/unwind-dw2-fde.c:1035:1: internal 
compiler error: Floating point exception
 1035 | }
  | ^
0x1694f2e crash_signal
/mnt/nvme/dinux/local-workspace/gcc/gcc/toplev.cc:316
0x1313178 get_range_hard_regs(int, subreg_range const&)
/mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:609
0x131343a new_insn_reg
/mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:658
0x13174f0 add_regs_to_insn_regno_info
/mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1608
0x1318479 lra_update_insn_regno_info(rtx_insn*)
/mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1769
0x13196ab lra_push_insn_1
/mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1919
0x13196de lra_push_insn(rtx_insn*)
/mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1927
0x13197da push_insns
/mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1970
0x131b6dc lra(_IO_FILE*)
/mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:2511
0x129f237 do_reload
/mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:5960
0x129f6c6 execute
/mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:6148


The divide by zero error above is interesting. I'm not sure why 
ira_reg_class_max_nregs[] yields 0 for the pseudo register 168 in the following 
rtx:
(debug_insn 168 167 169 19 (var_location:SI encoding (reg/v:SI 168 [ encoding 
])) -1
 (nil))

Regards,
Dimitar


[patch] OpenMP/Fortran: Implement omp allocators/allocate for ptr/allocatables

2023-11-08 Thread Tobias Burnus

Hi all,

Comment to reviewers:
* Fortran: Except for ensuring that the version field in array descriptors
  is always set to the default (zero), the generated code should only be
  affected when -fopenmp-allocators is set, even though several files are
  touched.
* Middle-end: BUILT_IN_GOMP_REALLOC has been added - otherwise untouched.
* Otherwise, smaller libgomp changes, conditions to (de)allocation code in
  fortran/trans*.cc and and some checking updates (mostly openmp.cc)

* * *

GCC supports OpenMP's allocators, which work typically as:

  my_c_ptr = omp_alloc (byte_size, my_allocator)
  ...
  call omp_free (my_c_ptr, omp_null_allocator)

where (if called as such) the runtime has to find the used allocator in
order to handle the 'free' (and likewise: omp_realloc) correctly. libgomp
implements this by allocating a bit more bytes - and using the first bytes
to store the handle for the allocator such that 'my_c_ptr minus size of handle'
will be the address. See also OpenMP spec and:
  https://gcc.gnu.org/onlinedocs/libgomp/OMP_005fALLOCATOR.html
  https://gcc.gnu.org/onlinedocs/libgomp/Memory-Management-Routines.html
  https://gcc.gnu.org/onlinedocs/libgomp/Memory-allocation.html
and https://gcc.gnu.org/wiki/cauldron2023 (OpenMP BoF; video recordings not
yet available, slide is)
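
To make that concrete, here is a hedged sketch of the layout trick -- not
libgomp's actual code (which among other things must honor the requested
alignment); the names are invented:

#include <stdlib.h>

typedef void *alloc_handle_t; /* stand-in for omp_allocator_handle_t */

void *
sketch_alloc (size_t size, alloc_handle_t h)
{
  /* Over-allocate and stash the handle just before the user pointer.  */
  alloc_handle_t *base = (alloc_handle_t *) malloc (sizeof *base + size);
  if (base == NULL)
    return NULL;
  *base = h;
  return base + 1;
}

void
sketch_free (void *ptr)
{
  /* Step back to recover the handle; a real version would dispatch on
     it to pick the matching deallocator.  */
  alloc_handle_t *base = (alloc_handle_t *) ptr - 1;
  free (base);
}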


FOR FORTRAN, OpenMP permits to allocate ALLOCATABLES and POINTERS also as
follows:

  !$omp allocators allocate(allocator(my_alloc), align(128) : A)
   allocate(A(10), B)
   A = [1,2,3]   ! reallocate with same allocator
   call intent_out_function(B)  ! Has to use proper deallocator
   deallocate(A)   ! Likewise.
   ! end of scope deallocation: Likewise.

(Side remark: In 5.{1,2}, '!$omp allocate(A,B) allocator(my_alloc) align(123)'
is the syntax to use - which has nearly the same effect, except that for
non-specified variables, 'omp allocators' uses the normal Fortran allocation
while for a 'omp allocate' without a variable list uses that OpenMP allocator
for nonlisted variables.)

* * *

The problem is really that 'malloc'ed memory has to be freed/realloced by 'free'
and 'realloc' while 'omp_alloc'ed memory has to be handled by 'omp_free'
and 'omp_realloc' - getting this wrong will nearly always crash the program.

I assume that the propagation depth is rather shallow, i.e. most likely all
deallocation will happen in the same file as the allocation, but that's not
guaranteed and I bet that a few "leaks" to other files are likely in every
other software package.

* * *

ASSUMPTIONS for the attached implementation:

* Most OpenMP code will not use '!$omp allocators'
  (Note: Using the API routines or 'allocate' clauses on block-associated
   directives (like: omp parallel firstprivate(a) allocate(allocator(my_alloc) 
:a)')
   or 'omp allocate' for stack variables are separate and pose no problems.)

* The (de,re)allocation will not happen in a hot code

* And, if used, the number of scalar variables of this kind will be small


SOLUTION as implemented:

* All code that uses 'omp allocator' and all code that might deallocate such 
memory
  must be compiled by a special flag:
 -fopenmp-allocators
  This solves the issues:
  - Always having an overhead even if -fopenmp code does not need it
  - Permitting (de,re)allocation of such a variable from code which is not 
compiled
with -fopenmp

  While -fopenmp-allocators could be auto-enabled when 'omp allocators' shows 
up in
  a file, I decided to require it explicitly by the user in order to highlight 
that
  other files might require the same flag as they might do (de,re)allocation on 
such
  memory.

* For ARRAYS, we fortunately can encode it in the descriptor. I (mis)use the 
version
  field for this: version = 0 means standard Fortran way while version = 1 
means using
  omp_alloc and friends.

* For SCALARS, there is no way to store this. As (see assumptions) this is 
neither in a
  hot path nor are there very many variables, we simply keep track of such 
variables in
  a separate way. (O (log2 N)) in libgomp - by keekping track of the pointer 
address in
  libgomp.


Disclaimer:
* I am not 100% sure that I have caught all corner cases for 
deallocation/reallocation;
  however, it should cover most.

* One area that is probably not fully covered is BIND(C). A Fortran actual to a 
BIND(C)
  intent(out) should work (dealloced on the caller side); once converted to a 
CFI descriptor,
  all deallocations will likely fail, be it a later intrinsic-assignment 
realloc,
  cfi_deallocate or 'deallocate' after conversion to Fortran.

  This can be fixed but requires (a) adding the how-allocated to the CFI 
descriptor but not
  as version (as that is user visible) and (b) handling it in CFI_deallocate.
  The latter will add a dependency on 'omp_free', which somehow has to be 
resolved.
  (Like weak symbols, which is likely not supported on all platforms.)

  Thus, as very special case, it has been left out - but it could be added. If 
a user
  code hits it, it should cause a repro

Re: [PATCH] libgcc: Add {unsigned , }__int128 <-> _Decimal{32, 64, 128} conversion support [PR65833]

2023-11-08 Thread Joseph Myers
On Wed, 8 Nov 2023, Jakub Jelinek wrote:

> Hi!
> 
> The following patch adds the missing
> {unsigned ,}__int128 <-> _Decimal{32,64,128}
> conversion support into libgcc.a on top of the _BitInt support
> (doing it without that would be larger amount of code and I hope all
> the targets which support __int128 will eventually support _BitInt,
> after all it is a required part of C23) and because it is in libgcc.a
> only, it doesn't hurt that much if it is added for some architectures
> only in GCC 15.
> Initially I thought about doing this on the compiler side, but doing
> it on the library side seems to be easier and more -Os friendly.
> The tests currently require bitint effective target, that can be
> removed when all the int128 targets support bitint.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH v2] c: Add -Wreturn-mismatch warning, split from -Wreturn-type

2023-11-08 Thread Joseph Myers
On Wed, 8 Nov 2023, Florian Weimer wrote:

> > v2: Update comment in gcc.dg/noncompile/pr55976-2.c.  Do not produce
> > an error in C90 pedantic-error mode for return; in a function
> > returning non-void.  Add gcc.dg/Wreturn-mismatch-6.c to demonstrate
> > this behavior.
> 
> Ping?  Original patch:
> 
> 

This patch is OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 0/7] ira/lra: Support subreg coalesce

2023-11-08 Thread Jeff Law




On 11/8/23 02:40, Richard Sandiford wrote:

Lehua Ding  writes:

Hi,

These patches try to support the subreg coalesce feature in
register allocation passes (ira and lra).


Thanks a lot for the series.  This is definitely something we've
needed for a while.

I probably won't be able to look at it in detail for a couple of weeks
(and the real review should come from Vlad anyway), but one initial
comment:

Absolutely agreed on the above.

The other thing to ponder.  Jivan and I have been banging on Joern's 
sub-object tracking bits for a totally different problem in the RISC-V 
space.  But there may be some overlap.


Essentially Joern's code tracks liveness for a few chunks in registers:
bits 0..7, bits 8..15, bits 16..31 and bits 32..63.  This includes
propagating liveness from the destination through to the sources.  So
for example if we have


(set (reg:SI dest) (plus:SI (srcreg1:SI) (srcreg2:SI)))

If we had previously determined that only bits 0..15 were live in DEST, 
then we'll propagate that into the source registers.


The goal is to ultimately transform something like

(set (dest:mode) (any_extend:mode (reg:narrower_mode)))

into

(set (dest:mode) (subreg:mode (reg:narrower_mode)))

Where the latter typically will get simplified and propagated away.


Joern's code is a bit of a mess, but Jivan and I are slowly untangling 
it from a correctness standpoint.  It'll also need the usual cleanups.


Anyway, point being I think it'll be worth looking at Lehua's bits and 
Joern's bits to see if there's anything that can and should be shared. 
Given I'm getting fairly familiar with Joern's bits, that likely falls 
to me.


Jeff



Tracking subreg liveness will sometimes expose dead code that
wasn't obvious without it.  PR89606 has an example of this.
There the dead code was introduced by init-regs, and there's a
debate about (a) whether init-regs should still be run and (b) if it
should still be run, whether it should use subreg liveness tracking too.

But I think such dead code is possible even without init-regs.
So for the purpose of this series, I think the init-regs behaviour
in that PR creates a helpful example.

I agree with Richi of course that compile-time is a concern.
The patch seems to add quite a bit of new data to ira_allocno,
but perhaps that's OK.  ira_object + ira_allocno is already quite big.

However:

@@ -387,8 +398,8 @@ struct ira_allocno
/* An array of structures describing conflict information and live
   ranges for each object associated with the allocno.  There may be
   more than one such object in cases where the allocno represents a
- multi-word register.  */
-  ira_object_t objects[2];
+ multi-hardreg pseudo.  */
+  std::vector<ira_object_t> objects;
/* Registers clobbered by intersected calls.  */
 HARD_REG_SET crossed_calls_clobbered_regs;
/* Array of usage costs (accumulated and the one updated during

adds an extra level of indirection (and separate extra storage) for
every allocno, not just multi-hardreg ones.  It'd be worth optimising
the data structures' representation of single-hardreg pseudos even if
that slows down the multi-hardreg code, since single-hardreg pseudos are
so much more common.  And the different single-hardreg and multi-hardreg
representations could be hidden behind accessors, to make life easier
for consumers.  (Of course, performance of the accessors is also then
an issue. :))

Richard


Re: [PATCH] libstdc++: optimize bit iterators assuming normalization [PR110807]

2023-11-08 Thread Jonathan Wakely

On 08/11/23 13:10 -0300, Alexandre Oliva wrote:

The representation of bit iterators, using a pointer into an array of
words, and an unsigned bit offset into that word, makes for some
optimization challenges: because the compiler doesn't know that the
offset is always in a certain narrow range, beginning at zero and
ending before the word bitwidth, when a function loads an offset that
it hasn't normalized itself, it may fail to derive certain reasonable
conclusions, even to the point of retaining useless calls that elicit
incorrect warnings.

Case at hand: The 110807.cc testcase for bit vectors assigns a 1-bit
list to a global bit vector variable.  Based on the compile-time
constant length of the list, we decide in _M_insert_range whether to
use the existing storage or to allocate new storage for the vector.
After allocation, we decide in _M_copy_aligned how to copy any
preexisting portions of the vector to the newly-allocated storage.
When copying two or more words, we use __builtin_memmove.

However, because we compute the available room using bit offsets
without range information, even comparing them with constants, we fail
to infer ranges for the preexisting vector depending on word size, and
may thus retain the memmove call despite knowing we've only allocated
one word.

Other parts of the compiler then detect the mismatch between the
constant allocation size and the much larger range that could
theoretically be copied into the newly-allocated storage if we could
reach the call.

Ensuring the compiler is aware of the constraints on the offset range
enables it to do a much better job at optimizing.  The challenge is to
do so without runtime overhead, because this is not about checking
that it's in range, it's only about telling the compiler about it.

This patch introduces a __GLIBCXX_BUILTIN_ASSUME macro that, when
optimizing, expands to code that invokes undefined behavior in case
the expression doesn't hold, so that the compiler optimizes out the
test and the entire branch containing it, while retaining enough
information about the paths that shouldn't be taken, so that on the
remaining paths it optimizes based on the assumption.

I also introduce a member function in bit iterators that conveys to
the compiler the information that the assumption is supposed to hold,
and various calls throughout member functions of bit iterators that
might not otherwise know that the offsets have to be in range,
making pessimistic decisions and failing to optimize out cases that it
could.

With the explicit assumptions, the compiler can correlate the test for
available storage in the vector with the test for how much storage
might need to be copied, and determine that, if we're not asking for
enough room for two or more words, we can omit entirely the code to
copy two or more words, without any runtime overhead whatsoever: no
traces remain of the undefined behavior or of the tests that inform
the compiler about the assumptions that must hold.

Regstrapped on x86_64-linux-gnu, also tested with gcc-13 on i686- and
x86_64-.  Ok to install?

(It was later found to fix 23_containers/vector/bool/allocator/copy.cc
on x86_64-linux-gnu as well, that fails on gcc-13 with the same warning.)

(The constant_evaluated/static_assert bit is disabled because expr is
not a constant according to some libstdc++ build errors, but there
doesn't seem to be a problem with the other bits.  I haven't really
thought that bit through, it was something I started out as potentially
desirable, but that turned out to be not required.  I suppose I could
just drop it.)

(I suppose __GLIBCXX_BUILTIN_ASSUME could be moved to a more general
place and put to more general uses, but I didn't feel that bold ;-)


for  libstdc++-v3/ChangeLog

PR libstdc++/110807
* include/bits/stl_bvector.h (__GLIBCXX_BUILTIN_ASSUME): New.
(_Bit_iterator_base): Add _M_normalized_p and
_M_assume_normalized.  Use them in _M_bump_up, _M_bump_down,
_M_incr, operator==, operator<=>, operator<, and operator-.
(_Bit_iterator): Also use them in operator*.
(_Bit_const_iterator): Likewise.
---
libstdc++-v3/include/bits/stl_bvector.h |   75 ++-
1 file changed, 72 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_bvector.h b/libstdc++-v3/include/bits/stl_bvector.h
index 8d18bcaffd434..81b316846454b 100644
--- a/libstdc++-v3/include/bits/stl_bvector.h
+++ b/libstdc++-v3/include/bits/stl_bvector.h
@@ -177,6 +177,55 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
_Bit_type * _M_p;
unsigned int _M_offset;

+#if __OPTIMIZE__ && !__GLIBCXX_DISABLE_ASSUMPTIONS
+// If the assumption (EXPR) fails, invoke undefined behavior, so that
+// the test and the failure block gets optimized out, but the compiler
+// still recalls that (expr) can be taken for granted.  Use this only
+// for expressions that are simple and explicit enough that the
+// compiler can optimize based on them.  When not optimizing, 

[PATCH][_Hahstable] Use RAII to guard node pointer while constructing

2023-11-08 Thread François Dumont

Another proposal to use RAII rather than a __try/__catch block.

libstdc++: [_Hashtable] Use RAII type to guard node while constructing value

libstdc++-v3/ChangeLog:

    * include/bits/hashtable_policy.h
    (struct _NodePtrGuard<_HashtableAlloc, _NodePtr>): New.
    (_ReuseAllocNode::operator()(_Args&&...)): Use latter to guard 
allocated node

    pointer while constructing in place the value_type instance.

Tested under Linux x64, ok to commit?

François
diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
index cd8943d8d05..c67eebd3b2b 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -173,6 +173,19 @@ namespace __detail
{ return __node_gen(std::forward<_Kt>(__k)); }
 };
 
+  template<typename _HashtableAlloc, typename _NodePtr>
+struct _NodePtrGuard
+{
+  _HashtableAlloc& _M_h;
+  _NodePtr _M_ptr;
+
+  ~_NodePtrGuard()
+  {
+   if (_M_ptr)
+ _M_h._M_deallocate_node_ptr(_M_ptr);
+  }
+};
+
  template<typename _NodeAlloc>
 struct _Hashtable_alloc;
 
@@ -201,27 +214,20 @@ namespace __detail
__node_ptr
operator()(_Args&&... __args) const
{
- if (_M_nodes)
-   {
+ if (!_M_nodes)
+   return _M_h._M_allocate_node(std::forward<_Args>(__args)...);
+
  __node_ptr __node = _M_nodes;
  _M_nodes = _M_nodes->_M_next();
  __node->_M_nxt = nullptr;
  auto& __a = _M_h._M_node_allocator();
  __node_alloc_traits::destroy(__a, __node->_M_valptr());
- __try
-   {
+ _NodePtrGuard<__hashtable_alloc, __node_ptr> __guard { _M_h, __node };
  __node_alloc_traits::construct(__a, __node->_M_valptr(),
 std::forward<_Args>(__args)...);
-   }
- __catch(...)
-   {
- _M_h._M_deallocate_node_ptr(__node);
- __throw_exception_again;
-   }
+ __guard._M_ptr = nullptr;
  return __node;
}
- return _M_h._M_allocate_node(std::forward<_Args>(__args)...);
-   }
 
 private:
   mutable __node_ptr _M_nodes;


[committed] i386: Apply LRA reload workaround to insns with high registers [PR82524]

2023-11-08 Thread Uros Bizjak
LRA is not able to reload zero_extracted in-out operand with matched input
operand in the same way as strict_low_part in-out operand.  The patch
applies the strict_low_part workaround, where we allow LRA to generate
an instruction with non-matched input operand, which is split post reload
to the instruction that inserts non-matched input operand to an in-out
operand and the instruction that uses matched operand, also to
zero_extracted in-out operand case.

The generated code from the pr82524.c testcase improves from:

movl    %esi, %ecx
movl    %edi, %eax
movsbl  %ch, %esi
addl    %esi, %edx
movb    %dl, %ah

to:
movl    %edi, %eax
movl    %esi, %ecx
movb    %ch, %ah
addb    %dl, %ah

The compiler is now also able to handle non-commutative operations:

movl    %edi, %eax
movl    %esi, %ecx
movb    %ch, %ah
subb    %dl, %ah

and unary operations:

movl    %edi, %eax
movl    %esi, %edx
movb    %dh, %ah
negb    %ah
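
For reference, a hedged sketch of source with the shape that exercises
these patterns (this is not the actual pr82524.c testcase, which is not
quoted here):

unsigned int
test (unsigned int a, unsigned int b, unsigned int c)
{
  /* Inserting a computed byte into bits 8..15 becomes a zero_extract
     in-out destination with a matched input operand.  */
  return (a & ~0xff00u) | ((((b >> 8) + c) << 8) & 0xff00u);
}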

The patch also robustifies the split condition of the splitters to ensure that
only alternatives with unmatched operands are split.

PR target/82524

gcc/ChangeLog:

* config/i386/i386.md (*add_1_slp):
Split insn only for unmatched operand 0.
(*sub_1_slp): Ditto.
(*_1_slp): Merge pattern from "*and_1_slp"
and "*_1_slp" using any_logic code iterator.
Split insn only for unmatched operand 0.
(*neg1_slp): Split insn only for unmatched operand 0.
(*one_cmpl_1_slp): Ditto.
(*ashl3_1_slp): Ditto.
(*_1_slp): Ditto.
(*_1_slp): Ditto.
(*addqi_ext_1): Redefine as define_insn_and_split.  Add
alternative 1 and split insn after reload for unmatched operand 0.
(*qi_ext_2): Merge pattern from
"*addqi_ext_2" and "*subqi_ext_2" using plusminus code
iterator. Redefine as define_insn_and_split.  Add alternative 1
and split insn after reload for unmatched operand 0.
(*subqi_ext_1): Redefine as define_insn_and_split.  Add
alternative 1 and split insn after reload for unmatched operand 0.
(*qi_ext_0): Merge pattern from
"*andqi_ext_0" and and "*qi_ext_0" using
any_logic code iterator.
(*qi_ext_1): Merge pattern from
"*andqi_ext_1" and "*qi_ext_1" using
any_logic code iterator. Redefine as define_insn_and_split.  Add
alternative 1 and split insn after reload for unmatched operand 0.
(*qi_ext_1_cc): Merge pattern from
"*andqi_ext_1_cc" and "*xorqi_ext_1_cc" using any_logic
code iterator. Redefine as define_insn_and_split.  Add alternative 1
and split insn after reload for unmatched operand 0.
(*qi_ext_2): Merge pattern from
"*andqi_ext_2" and "*qi_ext_2" using
any_logic code iterator. Redefine as define_insn_and_split.  Add
alternative 1 and split insn after reload for unmatched operand 0.
(*qi_ext_3): Redefine as define_insn_and_split.
Add alternative 1 and split insn after reload for unmatched operand 0.
(*negqi_ext_1): Rename from "*negqi_ext_2".  Add
alternative 1 and split insn after reload for unmatched operand 0.
(*one_cmplqi_ext_1): Ditto.
(*ashlqi_ext_1): Ditto.
(*qi_ext_1): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr78904-1.c (test_sub): New test.
* gcc.target/i386/pr78904-1a.c (test_sub): Ditto.
* gcc.target/i386/pr78904-1b.c (test_sub): Ditto.
* gcc.target/i386/pr78904-2.c (test_sub): Ditto.
* gcc.target/i386/pr78904-2a.c (test_sub): Ditto.
* gcc.target/i386/pr78904-2b.c (test_sub): Ditto.
* gcc.target/i386/pr78952-4.c (test_sub): Ditto.
* gcc.target/i386/pr82524.c: New test.
* gcc.target/i386/pr82524-1.c: New test.
* gcc.target/i386/pr82524-2.c: New test.
* gcc.target/i386/pr82524-3.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 99022990377..ce7102af44f 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -6596,7 +6596,9 @@ (define_insn_and_split "*add_1_slp"
   return "add{}\t{%2, %0|%0, %2}";
 }
 }
-  "&& reload_completed"
+  "&& reload_completed
+   && !(rtx_equal_p (operands[0], operands[1])
+   || rtx_equal_p (operands[0], operands[2]))"
   [(set (strict_low_part (match_dup 0)) (match_dup 1))
(parallel
  [(set (strict_low_part (match_dup 0))
@@ -7001,38 +7003,58 @@ (define_expand "addqi_ext_1"
   (match_operand:QI 2 "const_int_operand")) 0))
   (clobber (reg:CC FLAGS_REG))])])
 
-(define_insn "*addqi_ext_1"
+;; Alternative 1 is needed to work around LRA limitation, see PR82524.
+(define_insn_and_split "*addqi_ext_1"
   [(set (zero_extract:SWI248
- (match_operand 0 "int248_register_operand" "+Q")
+ (match_operand 0 "int248_register_operand" "+Q,&Q")
  (const_int 8)
  (const_int 8))
(subreg:SWI248
  (plus:QI
(subreg:QI
  (match_operator:SWI248 3 "extract_operator"
-   [(match_operand 

Re: [PATCH] testsuite: force PIC/PIE off for pr58245-1.C

2023-11-08 Thread Jeff Law




On 11/8/23 08:57, Alexandre Oliva wrote:


This test expects a single mention of stack_chk_fail, as part of a
call sequence, but when e.g. PIE is enabled by default, we output
.hidden stack_chk_fail_local, which makes for a count mismatch.

Disable PIC/PIE so as to not depend on the configurable default.

Regstrapped on x86_64-linux-gnu, also tested with gcc-13 on i686- and
x86_64-.  Ok to install?


for  gcc/testsuite/ChangeLog

* g++.dg/pr58245-1.C: Disable PIC/PIE.

OK
jeff


Re: [PATCH] skip debug stmts when assigning locus discriminators

2023-11-08 Thread Jeff Law




On 11/8/23 08:51, Alexandre Oliva wrote:


c-c++-common/goacc/kernels-loop-g.c has been failing (compare-debug)
on i686-linux-gnu since r13-3172, because the implementation enabled
debug stmts to cause discriminators to be assigned differently, and
the discriminators are printed in the .gkd dumps that -fcompare-debug
compares.

This patch prevents debug stmts from affecting the discriminators in
nondebug stmts, but enables debug stmts to get discriminators just as
nondebug stmts would if their line numbers match.
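
As a hedged illustration (not taken from the failing testcase),
consider two calls sharing one source line:

  if (p) foo (); else bar ();  /* same line, different basic blocks */

Discriminators exist to tell such same-line statements apart.  Before
this fix, interleaved debug stmts could change how discriminators were
assigned, so the nondebug calls could end up with different values in
the debug and nondebug compilations; after it, a debug stmt on a
matching line merely inherits the current discriminator without
affecting its neighbors.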

I suppose we could arrange for discriminators to be omitted from the
-fcompare-debug dumps, but keeping discriminators in sync is probably
good to avoid other potential sources of divergence between debug and
nondebug.

Regstrapped on x86_64-linux-gnu, also tested with gcc-13 on i686- and
x86_64-.  Ok to install?

(Eugene, I suppose what's special about this testcase, that may not
apply to most other uses of assign_discriminators, is that goacc creates
new functions out of already optimized code.  I think
assign_discriminators may not be suitable for new functions, with code
that isn't exactly pristinely in-order.  WDYT?)


for  gcc/ChangeLog

* tree-cfg.cc (assign_discriminators): Handle debug stmts.

OK
jeff


[PATCH] c++: non-dependent .* folding [PR112427]

2023-11-08 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

-- >8 --

Here when building up the non-dependent .* expression, we crash from
fold_convert on 'b.a' due to this (templated) COMPONENT_REF having an
IDENTIFIER_NODE instead of FIELD_DECL operand that middle-end routines
expect.  Like in r14-4899-gd80a26cca02587, this patch fixes this by
replacing the problematic piecemeal folding with a single call to
cp_fully_fold.

PR c++/112427

gcc/cp/ChangeLog:

* typeck2.cc (build_m_component_ref): Use cp_convert, build2 and
cp_fully_fold instead of fold_build_pointer_plus and fold_convert.

gcc/testsuite/ChangeLog:

* g++.dg/template/non-dependent29.C: New test.
---
 gcc/cp/typeck2.cc   |  5 -
 gcc/testsuite/g++.dg/template/non-dependent29.C | 13 +
 2 files changed, 17 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/template/non-dependent29.C

diff --git a/gcc/cp/typeck2.cc b/gcc/cp/typeck2.cc
index 309903afed8..208004221da 100644
--- a/gcc/cp/typeck2.cc
+++ b/gcc/cp/typeck2.cc
@@ -2378,7 +2378,10 @@ build_m_component_ref (tree datum, tree component, 
tsubst_flags_t complain)
   /* Build an expression for "object + offset" where offset is the
 value stored in the pointer-to-data-member.  */
   ptype = build_pointer_type (type);
-  datum = fold_build_pointer_plus (fold_convert (ptype, datum), component);
+  datum = cp_convert (ptype, datum, complain);
+  datum = build2 (POINTER_PLUS_EXPR, ptype,
+ datum, convert_to_ptrofftype (component));
+  datum = cp_fully_fold (datum);
   datum = cp_build_fold_indirect_ref (datum);
   if (datum == error_mark_node)
return error_mark_node;
diff --git a/gcc/testsuite/g++.dg/template/non-dependent29.C 
b/gcc/testsuite/g++.dg/template/non-dependent29.C
new file mode 100644
index 000..41bd11ae6b4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/non-dependent29.C
@@ -0,0 +1,13 @@
+// PR c++/112427
+
+struct A { int m; void f(); };
+struct B { A a; };
+
+template
+void f(B b) {
+  int A::*pd = &A::m;
+  b.a.*pd;
+
+  void (A::*pf)() = &A::f;
+  (b.a.*pf)();
+}
-- 
2.43.0.rc1



Re: [PATCH][_Hashtable] Use RAII to guard node pointer while constructing

2023-11-08 Thread Jonathan Wakely
On Wed, 8 Nov 2023 at 20:00, François Dumont  wrote:
>
> Another proposal to use RAII rather than __try/__catch block.
>
> libstdc++: [_Hashtable] Use RAII type to guard node while constructing value
>
> libstdc++-v3/ChangeLog:
>
>  * include/bits/hashtable_policy.h
>  (struct _NodePtrGuard<_HashtableAlloc, _NodePtr>): New.
>  (_ReuseAllocNode::operator()(_Args&&...)): Use latter to guard
> allocated node
>  pointer while constructing in place the value_type instance.
>
> Tested under Linux x64, ok to commit?

Looks good, OK.
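
For context, the guard described in the ChangeLog is an RAII type of
roughly this shape (a hedged sketch reconstructed from the ChangeLog
text, not the committed code; the _M_h member and the
_M_deallocate_node_ptr call are assumptions):

  template<typename _HashtableAlloc, typename _NodePtr>
  struct _NodePtrGuard
  {
    _HashtableAlloc& _M_h;
    _NodePtr _M_ptr;

    ~_NodePtrGuard()
    {
      if (_M_ptr)  // construction threw: return the node to the allocator
        _M_h._M_deallocate_node_ptr(_M_ptr);
    }
  };

_ReuseAllocNode::operator() arms such a guard around the in-place
construction of the value_type and clears _M_ptr once construction
succeeds, replacing the previous __try/__catch block.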



[PATCH v2] libstdc++: optimize bit iterators assuming normalization [PR110807]

2023-11-08 Thread Alexandre Oliva
On Nov  8, 2023, Jonathan Wakely  wrote:

> A single underscore prefix on __GLIBCXX_BUILTIN_ASSUME and
> __GLIBCXX_DISABLE_ASSUMPTIONS please.

That's entirely gone now.

>> +do  \
>> +  if (std::is_constant_evaluated ())\
>> +static_assert(expr);\

> This can never be valid.

*nod*

> This already works fine in constant evaluation anyway.

Yeah, that's what I figured.

> But what's the null dereference for?

The idea was to clearly trigger undefined behavior.  Maybe it wasn't
needed, it didn't occur to me that __builtin_unreachable() would be
enough.  I realize I was really trying to emulate attribute assume, even
without knowing it existed ;-)

>> +#define __GLIBCXX_BUILTIN_ASSUME(expr)  \
>> +(void)(false && (expr))

> What's the point of this, just to verify that (expr) is contextually
> convertible to bool?

I'd have phrased it as "avoid the case in which something compiles with
-O0 but not with -O", but yeah ;-)

> We don't use the _p suffix for predicates in the library.
> Please use just _M_normalized or _M_is_normalized.

ACK.  It's also gone now.

> But do we even need this function? It's not used anywhere else, can we
> just inline the condition into _M_assume_normalized() ?

I had other uses for it in earlier versions of the patch, but it makes
no sense any more indeed.

>> +_GLIBCXX20_CONSTEXPR
>> +void
>> +_M_assume_normalized() const

> I think this should use _GLIBCXX_ALWAYS_INLINE

*nod*, thanks

>> +{
>> +  __GLIBCXX_BUILTIN_ASSUME (_M_normalized_p ());

> Is there even any benefit to this macro?

I just thought it could have other uses, without being aware that the
entire concept was available as a statement attribute.  Funny, I'd even
searched for it among the existing attributes and builtins, but somehow
I managed to miss it.  Thanks for getting me back down that path.

> __attribute__((__assume__(_M_offset < unsigned(_S_word_bit))));

That unfortunately doesn't work, because the assume lowering doesn't go
as far as dereferencing the implicit this and making an SSA_NAME out of
the loaded _M_offset, which we'd need to be able to optimize based on
it.  But that only took me a while to figure out and massage into
something that had the desired effect.  Now, maybe the above *should*
have that effect already, but unfortunately it doesn't.

> Maybe even get rid of _M_assume_normalized() as a function and just
> put that attribute everywhere you currently use _M_assume_normalized.

Because of the slight kludge required to make the attribute have the
desired effect (namely ensuring the _M_offset reference is evaluated),
I've retained it as an inline function.
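
Concretely, the retained helper (it appears in full in the patch below)
boils down to copying the member into a local so the assumption applies
to a gimple register:

  _GLIBCXX20_CONSTEXPR _GLIBCXX_ALWAYS_INLINE
  void
  _M_assume_normalized() const
  {
    unsigned int __ofst = _M_offset;  // force the load to be evaluated
    __attribute__ ((__assume__ (__ofst < unsigned(_S_word_bit))));
  }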

Here's what I'm retesting now.  WDYT?


libstdc++: optimize bit iterators assuming normalization [PR110807]

The representation of bit iterators, using a pointer into an array of
words, and an unsigned bit offset into that word, makes for some
optimization challenges: because the compiler doesn't know that the
offset is always in a certain narrow range, beginning at zero and
ending before the word bitwidth, when a function loads an offset that
it hasn't normalized itself, it may fail to derive certain reasonable
conclusions, even to the point of retaining useless calls that elicit
incorrect warnings.

Case at hand: The 110807.cc testcase for bit vectors assigns a 1-bit
list to a global bit vector variable.  Based on the compile-time
constant length of the list, we decide in _M_insert_range whether to
use the existing storage or to allocate new storage for the vector.
After allocation, we decide in _M_copy_aligned how to copy any
preexisting portions of the vector to the newly-allocated storage.
When copying two or more words, we use __builtin_memmove.

However, because we compute the available room using bit offsets
without range information, even comparing them with constants, we fail
to infer ranges for the preexisting vector depending on word size, and
may thus retain the memmove call despite knowing we've only allocated
one word.

Other parts of the compiler then detect the mismatch between the
constant allocation size and the much larger range that could
theoretically be copied into the newly-allocated storage if we could
reach the call.

Ensuring the compiler is aware of the constraints on the offset range
enables it to do a much better job at optimizing.  Using attribute
assume (_M_offset <= ...) didn't work, because gimple lowered that to
something that vrp could only use to ensure 'this' was non-NULL.
Exposing _M_offset as an automatic variable/gimple register outside
the unevaluated assume operand enabled the optimizer to do its job.

Rather than placing such load-then-assume constructs all over, I
introduced an always-inline member function in bit iterators that does
the job of conveying to the compiler the information that the
assumption is supposed to hold, and various calls throughout functions
pertaining to bit iterators that might not otherwise know that the
offsets have to be in range, so that the compiler no longer needs to
make conservative assumptions that prevent optimizations.

Re: [PATCH v2] libstdc++: optimize bit iterators assuming normalization [PR110807]

2023-11-08 Thread Jonathan Wakely
On Thu, 9 Nov 2023, 01:17 Alexandre Oliva,  wrote:

> On Nov  8, 2023, Jonathan Wakely  wrote:
>
> > A single underscore prefix on __GLIBCXX_BUILTIN_ASSUME and
> > __GLIBCXX_DISABLE_ASSUMPTIONS please.
>
> That's entirely gone now.
>
> >> +do  \
> >> +  if (std::is_constant_evaluated ())\
> >> +static_assert(expr);\
>
> > This can never be valid.
>
> *nod*
>
> > This already works fine in constant evaluation anyway.
>
> Yeah, that's what I figured.
>
> > But what's the null dereference for?
>
> The idea was to clearly trigger undefined behavior.  Maybe it wasn't
> needed, it didn't occur to me that __builtin_unreachable() would be
> enough.  I realize I was really trying to emulate attribute assume, even
> without knowing it existed ;-)
>
> >> +#define __GLIBCXX_BUILTIN_ASSUME(expr)  \
> >> +(void)(false && (expr))
>
> > What's the point of this, just to verify that (expr) is contextually
> > convertible to bool?
>
> I'd have phrased it as "avoid the case in which something compiles with
> -O0 but not with -O", but yeah ;-)
>
> > We don't use the _p suffix for predicates in the library.
> > Please use just _M_normalized or _M_is_normalized.
>
> ACK.  It's also gone now.
>
> > But do we even need this function? It's not used anywhere else, can we
> > just inline the condition into _M_assume_normalized() ?
>
> I had other uses for it in earlier versions of the patch, but it makes
> no sense any more indeed.
>
> >> +_GLIBCXX20_CONSTEXPR
> >> +void
> >> +_M_assume_normalized() const
>
> > I think this should use _GLIBCXX_ALWAYS_INLINE
>
> *nod*, thanks
>
> >> +{
> >> +  __GLIBCXX_BUILTIN_ASSUME (_M_normalized_p ());
>
> > Is there even any benefit to this macro?
>
> I just thought it could have other uses, without being aware that the
> entire concept was available as a statement attribute.  Funny, I'd even
> searched for it among the existing attributes and builtins, but somehow
> I managed to miss it.  Thanks for getting me back down that path.
>
> > __attribute__((__assume__(_M_offset < unsigned(_S_word_bit))));
>
> That unfortunately doesn't work, because the assume lowering doesn't go
> as far as dereferencing the implicit this and making an SSA_NAME out of
> the loaded _M_offset, which we'd need to be able to optimize based on
> it.  But that only took me a while to figure out and massage into
> something that had the desired effect.  Now, maybe the above *should*
> have that effect already, but unfortunately it doesn't.
>
> > Maybe even get rid of _M_assume_normalized() as a function and just
> > put that attribute everywhere you currently use _M_assume_normalized.
>
> Because of the slight kludge required to make the attribute have the
> desired effect (namely ensuring the _M_offset reference is evaluated),
> I've retained it as an inline function.
>
> Here's what I'm retesting now.  WDYT?
>

ofst needs to be __ofst but OK for trunk with that change.

We probably want this on the gcc-13 branch too, but let's give it some time
on trunk in case the assume attribute isn't quite ready for prime time.



>
> libstdc++: optimize bit iterators assuming normalization [PR110807]
>
> The representation of bit iterators, using a pointer into an array of
> words, and an unsigned bit offset into that word, makes for some
> optimization challenges: because the compiler doesn't know that the
> offset is always in a certain narrow range, beginning at zero and
> ending before the word bitwidth, when a function loads an offset that
> it hasn't normalized itself, it may fail to derive certain reasonable
> conclusions, even to the point of retaining useless calls that elicit
> incorrect warnings.
>
> Case at hand: The 110807.cc testcase for bit vectors assigns a 1-bit
> list to a global bit vector variable.  Based on the compile-time
> constant length of the list, we decide in _M_insert_range whether to
> use the existing storage or to allocate new storage for the vector.
> After allocation, we decide in _M_copy_aligned how to copy any
> preexisting portions of the vector to the newly-allocated storage.
> When copying two or more words, we use __builtin_memmove.
>
> However, because we compute the available room using bit offsets
> without range information, even comparing them with constants, we fail
> to infer ranges for the preexisting vector depending on word size, and
> may thus retain the memmove call despite knowing we've only allocated
> one word.
>
> Other parts of the compiler then detect the mismatch between the
> constant allocation size and the much larger range that could
> theoretically be copied into the newly-allocated storage if we could
> reach the call.
>
> Ensuring the compiler is aware of the constraints on the offset range
> enables it to do a much better job at optimizing.  Using attribute
> assume (_M_offset <= ...) didn't work, because gimple lowered that to
> something that 

[PATCH-2v2, rs6000] Enable vector mode for by pieces equality compare [PR111449]

2023-11-08 Thread HAO CHEN GUI
Hi,
  This patch enables vector mode for by-pieces equality compares.  It
adds a new expand pattern - cbranchv16qi4 - and sets MOVE_MAX_PIECES
and COMPARE_MAX_PIECES to 16 bytes when P8 vector is enabled.  The
compare relies on both move and compare instructions, so both macros
are changed.  As the vector load/store might be unaligned, the 16-byte
move and compare are only enabled when VSX and EFFICIENT_UNALIGNED_VSX
are both enabled.
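
As a hedged illustration of what the by-pieces path covers (the eq16
function below is made up for exposition, not taken from the
testsuite), a fixed-size 16-byte equality test such as

  int eq16 (const char *a, const char *b)
  {
    return __builtin_memcmp (a, b, 16) == 0;
  }

can now be expanded inline as a pair of vector loads plus a
vcmpequb./CR6 test instead of a library call to memcmp.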

  This patch enables the 16-byte by-pieces move.  As vector modes are
not enabled for by-pieces moves, TImode is used for the move.  This
caused two regression cases.  The root cause is that a 16-byte array
can now be constructed by one load instruction and need not be put
into .LC0, so the SRA optimization is not applied.

  Compared to the previous version, the main change is to modify the
guard of the expand pattern and the compile options of the test case.
Also, the fix for the two regression cases caused by enabling the
16-byte move is moved into this patch.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen


ChangeLog
rs6000: Enable vector mode for by pieces equality compare

This patch adds a new expand pattern - cbranchv16qi4 to enable vector
mode by pieces equality compare on rs6000.  The macro MOVE_MAX_PIECES
(COMPARE_MAX_PIECES) is set to 16 bytes when VSX and
EFFICIENT_UNALIGNED_VSX are enabled, and otherwise kept unchanged.  The
macro STORE_MAX_PIECES is set to the same value as MOVE_MAX_PIECES by
default, so it is now explicitly defined to keep it unchanged.

gcc/
PR target/111449
* config/rs6000/altivec.md (cbranchv16qi4): New expand pattern.
* config/rs6000/rs6000.cc (rs6000_generate_compare): Generate
insn sequence for V16QImode equality compare.
* config/rs6000/rs6000.h (MOVE_MAX_PIECES): Define.
(STORE_MAX_PIECES): Define.

gcc/testsuite/
PR target/111449
* gcc.target/powerpc/pr111449-1.c: New.
* gcc.dg/tree-ssa/sra-17.c: Add additional options for 32-bit powerpc.
* gcc.dg/tree-ssa/sra-18.c: Likewise.

patch.diff
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index e8a596fb7e9..a1423c76451 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2605,6 +2605,48 @@ (define_insn "altivec_vupklpx"
 }
   [(set_attr "type" "vecperm")])

+/* The cbranch optab doesn't allow FAIL, so old CPUs which are
+   inefficient on unaligned VSX accesses are disabled, as the cost
+   of an unaligned load/store is high.  */
+(define_expand "cbranchv16qi4"
+  [(use (match_operator 0 "equality_operator"
+   [(match_operand:V16QI 1 "reg_or_mem_operand")
+(match_operand:V16QI 2 "reg_or_mem_operand")]))
+   (use (match_operand 3))]
+  "VECTOR_MEM_VSX_P (V16QImode)
+   && TARGET_EFFICIENT_UNALIGNED_VSX"
+{
+  /* Use direct move for P8 LE to skip the double-word swap, as byte
+     order doesn't matter for an equality compare.  If any operand is
+     an altivec indexed or indirect operand, the load can be done
+     directly by an altivec aligned load instruction and no swap is
+     needed.  */
+  if (!TARGET_P9_VECTOR
+  && !BYTES_BIG_ENDIAN
+  && MEM_P (operands[1])
+  && !altivec_indexed_or_indirect_operand (operands[1], V16QImode)
+  && MEM_P (operands[2])
+  && !altivec_indexed_or_indirect_operand (operands[2], V16QImode))
+{
+  rtx reg_op1 = gen_reg_rtx (V16QImode);
+  rtx reg_op2 = gen_reg_rtx (V16QImode);
+  rs6000_emit_le_vsx_permute (reg_op1, operands[1], V16QImode);
+  rs6000_emit_le_vsx_permute (reg_op2, operands[2], V16QImode);
+  operands[1] = reg_op1;
+  operands[2] = reg_op2;
+}
+  else
+{
+  operands[1] = force_reg (V16QImode, operands[1]);
+  operands[2] = force_reg (V16QImode, operands[2]);
+}
+
+  rtx_code code = GET_CODE (operands[0]);
+  operands[0] = gen_rtx_fmt_ee (code, V16QImode, operands[1], operands[2]);
+  rs6000_emit_cbranch (V16QImode, operands);
+  DONE;
+})
+
 ;; Compare vectors producing a vector result and a predicate, setting CR6 to
 ;; indicate a combined status
 (define_insn "altivec_vcmpequ_p"
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index cc24dd5301e..10279052636 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15472,6 +15472,18 @@ rs6000_generate_compare (rtx cmp, machine_mode mode)
  else
emit_insn (gen_stack_protect_testsi (compare_result, op0, op1b));
}
+  else if (mode == V16QImode)
+   {
+ gcc_assert (code == EQ || code == NE);
+
+ rtx result_vector = gen_reg_rtx (V16QImode);
+ rtx cc_bit = gen_reg_rtx (SImode);
+ emit_insn (gen_altivec_vcmpequb_p (result_vector, op0, op1));
+ emit_insn (gen_cr6_test_for_lt (cc_bit));
+ emit_insn (gen_rtx_SET (compare_result,
+ gen_rtx_COMPARE (comp_mode, cc_bit,
+  

[PATCH-3v3, rs6000] Fix regression cases caused by 16-byte by pieces move [PR111449]

2023-11-08 Thread HAO CHEN GUI
Hi,
  Originally, a 16-byte memory-to-memory move was expanded via a move
pattern.  expand_block_move does an optimization on P8 LE to leverage a
V2DI reversed load/store for the memory-to-memory move.  Now the move
is done by the 16-byte by-pieces infrastructure and the optimization is
lost.  This patch adds an insn_and_split pattern to retake the
optimization.

  Compared to the previous version, the main change is to move the fix
for the two regression cases to the former patch and to change the
condition of the pattern.
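
The shape of code this targets is a plain fixed-size copy, as in the
new testcase included below:

  void move1 (void *s1, void *s2)
  {
    __builtin_memcpy (s1, s2, 16);
  }

On P8 LE the new split emits a single lxvd2x/stxvd2x reversed-element
pair through V2DImode instead of the two DImode load/store pairs the
generic by-pieces split would otherwise produce.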

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Fix regression cases caused by 16-byte by pieces move

The previous patch enables the 16-byte by-pieces move.  Originally the
16-byte move was implemented via a move pattern.  expand_block_move
does an optimization on P8 LE to leverage a V2DI reversed load/store
for the memory-to-memory move.  Now the 16-byte move is implemented via
the by-pieces infrastructure and is finally split into two DImode
load/stores.  This patch creates an insn_and_split pattern to retake
the optimization.

gcc/
PR target/111449
* config/rs6000/vsx.md (*vsx_le_mem_to_mem_mov_ti): New.

gcc/testsuite/
PR target/111449
* gcc.target/powerpc/pr111449-2.c: New.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f3b40229094..3f71e96dc6b 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -414,6 +414,29 @@ (define_mode_attr VM3_char [(V2DI "d")

 ;; VSX moves

+;; TImode memory to memory move optimization on LE with p8vector
+(define_insn_and_split "*vsx_le_mem_to_mem_mov_ti"
+  [(set (match_operand:TI 0 "indexed_or_indirect_operand" "=Z")
+   (match_operand:TI 1 "indexed_or_indirect_operand" "Z"))]
+  "!BYTES_BIG_ENDIAN
+   && TARGET_VSX
+   && !TARGET_P9_VECTOR
+   && !MEM_VOLATILE_P (operands[0])
+   && !MEM_VOLATILE_P (operands[1])
+   && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  rtx tmp = gen_reg_rtx (V2DImode);
+  rtx src =  adjust_address (operands[1], V2DImode, 0);
+  emit_insn (gen_vsx_ld_elemrev_v2di (tmp, src));
+  rtx dest = adjust_address (operands[0], V2DImode, 0);
+  emit_insn (gen_vsx_st_elemrev_v2di (dest, tmp));
+  DONE;
+}
+  [(set_attr "length" "16")])
+
 ;; The patterns for LE permuted loads and stores come before the general
 ;; VSX moves so they match first.
 (define_insn_and_split "*vsx_le_perm_load_"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr111449-2.c 
b/gcc/testsuite/gcc.target/powerpc/pr111449-2.c
new file mode 100644
index 000..7003bdc0208
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr111449-2.c
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { has_arch_pwr8 } } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mvsx -O2" } */
+
+/* Ensure 16-byte by pieces move is enabled.  */
+
+void move1 (void *s1, void *s2)
+{
+  __builtin_memcpy (s1, s2, 16);
+}
+
+void move2 (void *s1)
+{
+  __builtin_memcpy (s1, "0123456789012345", 16);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvd2x\M|\mp?lxv\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mstxvd2x\M|\mstxv\M} 2 } } */



[PATCH] testsuite: tsan: add fallback overload for pthread_cond_clockwait

2023-11-08 Thread Alexandre Oliva


LTS GNU/Linux distros from 2018, still in use, don't have
pthread_cond_clockwait.  There's no trivial way to detect it so as to
make the test conditional, but there's an easy enough way to silence
the fail due to lack of the function in libc, and that has nothing to
do with the false positive that this is testing against.

Regstrapped on x86_64-linux-gnu, also tested with gcc-13 on i686- and
x86_64-, on distros that offer and that lack pthread_cond_clockwait.  Ok
to install?


for  gcc/testsuite/ChangeLog

* g++.dg/tsan/pthread_cond_clockwait.C: Add fallback overload.
---
 gcc/testsuite/g++.dg/tsan/pthread_cond_clockwait.C |   13 +
 1 file changed, 13 insertions(+)

diff --git a/gcc/testsuite/g++.dg/tsan/pthread_cond_clockwait.C 
b/gcc/testsuite/g++.dg/tsan/pthread_cond_clockwait.C
index 82d6a5c8329ed..b43f3ebf80e2c 100644
--- a/gcc/testsuite/g++.dg/tsan/pthread_cond_clockwait.C
+++ b/gcc/testsuite/g++.dg/tsan/pthread_cond_clockwait.C
@@ -4,6 +4,19 @@
 
 #include 
 
+// This overloaded version should only be selected on targets that
+// don't have a pthread_cond_clockwait in pthread.h, and it will wait
+// indefinitely for the cond_signal that, in this testcase, ought to
+// be delivered.
+static inline int
+pthread_cond_clockwait (pthread_cond_t *cv,
+   pthread_mutex_t *mtx,
+   __clockid_t,
+   void const /* struct timespec */ *)
+{
+  return pthread_cond_wait (cv, mtx);
+} 
+
 pthread_cond_t cv;
 pthread_mutex_t mtx;
 


-- 
Alexandre Oliva, happy hacker    https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


[Committed] RISC-V: Fix dynamic tests [NFC]

2023-11-08 Thread Juzhe-Zhong
This patch just adapts the dynamic LMUL tests for the following preparatory patches.

Committed.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-mixed-1.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-1.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-2.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-3.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-4.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-5.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-6.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-7.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-1.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-2.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-3.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-4.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-5.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-6.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-1.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-2.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-3.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-4.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-6.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-7.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-8.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-9.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-1.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-10.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-3.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-4.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-5.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-6.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-7.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-8.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-9.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/no-dynamic-lmul-1.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/pr111848.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/rvv-costmodel-vect.exp: Run all tests.

---
 .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-mixed-1.c   | 2 +-
 .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-1.c| 2 +-
 .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-2.c| 2 +-
 .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-3.c| 2 +-
 .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-4.c| 2 +-
 .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-5.c| 2 +-
 .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-6.c| 2 +-
 .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-7.c| 2 +-
 .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-1.c| 2 +-
 .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-2.c| 2 +-
 .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-3.c| 2 +-
 .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-4.c| 2 +-
 .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-5.c| 2 +-
 .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-6.c| 2 +-
 .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c| 2 +-
 .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-1.c| 2 +-
 .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-2.c| 2 +-
 .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-3.c| 2 +-
 .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-4.c| 2 +-
 .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c| 2 +-
 .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-6.c| 2 +-
 .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-7.c| 2 +-
 .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-8.c| 2 +-
 .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-9.c| 5 +++--
 .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-1.c| 2 +-
 .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-10.c   | 2 +-
 .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c| 2 +-
 .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-3.c| 2 +-
 .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-4.c| 2 +-
 .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-5.c| 2 +-
 .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-6.c| 2 +-
 .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-7.c| 2 +-
 .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-8.c| 2 +-
 .../gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-9.c| 2 

[PATCH] libsupc++: try cxa_thread_atexit_impl at runtime

2023-11-08 Thread Alexandre Oliva


g++.dg/tls/thread_local-order2.C fails when the toolchain is built for
a platform that lacks __cxa_thread_atexit_impl, even if the program is
built and run using that toolchain on a (later) platform that offers
__cxa_thread_atexit_impl.

This patch adds runtime testing for __cxa_thread_atexit_impl on
platforms that support weak symbols.

Regstrapped on x86_64-linux-gnu, also tested with gcc-13 on i686- and
x86_64-, and with ac_cv_func___cxa_thread_atexit_impl=no, that, on a
distro that lacks __cxa_thread_atexit in libc, forces the newly-added
code to be exercised, and that enabled thread_local-order2.C to pass
where the runtime libc has __cxa_thread_atexit_impl.  Ok to install?


for  libstdc++-v3/ChangeLog

* libsupc++/atexit_thread.cc [__GXX_WEAK__]: Add dynamic
detection of __cxa_thread_atexit_impl.
---
 libstdc++-v3/libsupc++/atexit_thread.cc |   15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/libsupc++/atexit_thread.cc 
b/libstdc++-v3/libsupc++/atexit_thread.cc
index 9346d50f5dafe..cabd7c0a4a057 100644
--- a/libstdc++-v3/libsupc++/atexit_thread.cc
+++ b/libstdc++-v3/libsupc++/atexit_thread.cc
@@ -138,11 +138,24 @@ namespace {
   }
 }
 
+#if __GXX_WEAK__
+extern "C"
+int __attribute__ ((__weak__))
+__cxa_thread_atexit_impl (void (_GLIBCXX_CDTOR_CALLABI *func) (void *),
+ void *arg, void *d);
+#endif
+
+// ??? We can't make it an ifunc, can we?
 extern "C" int
 __cxxabiv1::__cxa_thread_atexit (void (_GLIBCXX_CDTOR_CALLABI *dtor)(void *),
-void *obj, void */*dso_handle*/)
+void *obj, void *dso_handle)
   _GLIBCXX_NOTHROW
 {
+#if __GXX_WEAK__
+  if (__cxa_thread_atexit_impl)
+return __cxa_thread_atexit_impl (dtor, obj, dso_handle);
+#endif
+
   // Do this initialization once.
   if (__gthread_active_p ())
 {

-- 
Alexandre Oliva, happy hacker    https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


[PATCH] RISC-V: Fix dynamic LMUL cost model ICE

2023-11-08 Thread Juzhe-Zhong
When trying to use dynamic LMUL to compile benchmarks, we noticed a
bunch of ICEs.

This patch fixes those ICEs and appends tests.

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc (costs::preferred_new_lmul_p): Fix 
ICE.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-1.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-2.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-3.c: New test.

---
 gcc/config/riscv/riscv-vector-costs.cc| 11 +---
 .../costmodel/riscv/rvv/dynamic-lmul-ice-1.c  | 25 +++
 .../costmodel/riscv/rvv/dynamic-lmul-ice-2.c  | 22 
 .../costmodel/riscv/rvv/dynamic-lmul-ice-3.c  | 14 +++
 4 files changed, 69 insertions(+), 3 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-1.c
 create mode 100644 
gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-2.c
 create mode 100644 
gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-3.c

diff --git a/gcc/config/riscv/riscv-vector-costs.cc 
b/gcc/config/riscv/riscv-vector-costs.cc
index af87388a1e4..8036c9c40d7 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -231,7 +231,9 @@ compute_local_live_ranges (
 
 TODO: We may elide the cases that the unnecessary IMM in
 the future.  */
- if (is_gimple_val (var) && !POINTER_TYPE_P (TREE_TYPE (var)))
+ if (poly_int_tree_p (var)
+ || (is_gimple_val (var)
+ && !POINTER_TYPE_P (TREE_TYPE (var))))
{
  biggest_mode
= get_biggest_mode (biggest_mode,
@@ -416,7 +418,8 @@ static void
 update_local_live_ranges (
   vec_info *vinfo,
   hash_map> &program_points_per_bb,
-  hash_map> &live_ranges_per_bb)
+  hash_map> &live_ranges_per_bb,
+  machine_mode *biggest_mode)
 {
   loop_vec_info loop_vinfo = dyn_cast (vinfo);
   if (!loop_vinfo)
@@ -501,6 +504,8 @@ update_local_live_ranges (
   : get_store_value (gsi_stmt (si));
  tree sel_type = build_nonstandard_integer_type (
TYPE_PRECISION (TREE_TYPE (var)), 1);
+ *biggest_mode
+   = get_biggest_mode (*biggest_mode, TYPE_MODE (sel_type));
  tree sel = build_decl (UNKNOWN_LOCATION, VAR_DECL,
 get_identifier ("vect_perm"), sel_type);
  pair &live_range = live_ranges->get_or_insert (sel, &existed_p);
@@ -572,7 +577,7 @@ costs::preferred_new_lmul_p (const vector_costs 
*uncast_other) const
 
   /* Update live ranges according to PHI.  */
   update_local_live_ranges (other->m_vinfo, program_points_per_bb,
-   live_ranges_per_bb);
+   live_ranges_per_bb, &biggest_mode);
 
   /* TODO: We calculate the maximum live vars base on current STMTS
  sequence.  We can support live range shrink if it can give us
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-1.c 
b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-1.c
new file mode 100644
index 000..4f019ccae6b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-1.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 -O3 -ftree-vectorize --param 
riscv-autovec-lmul=dynamic" } */
+
+int a, *b[9], c, d, e; 
+
+static int
+fn1 ()
+{
+  for (c = 6; c >= 0; c--)
+for (d = 0; d < 2; d++)
+  {
+b[d * 2 + c] = 0;
+e = a > 1 ? : 0;
+if (e == 2) 
+  return 0;
+  }
+  return 0;
+}
+
+int
+main ()
+{
+  fn1 ();
+  return 0; 
+}
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-2.c 
b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-2.c
new file mode 100644
index 000..6fc8062f23b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-2.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 -Ofast -ftree-vectorize --param 
riscv-autovec-lmul=dynamic" } */
+
+typedef struct rtx_def *rtx;
+struct replacement {
+rtx *where;
+rtx *subreg_loc;
+int mode;
+};
+static struct replacement replacements[150];
+void move_replacements (rtx *x, rtx *y, int n_replacements)
+{
+  int i;
+  for (i = 0; i < n_replacements; i++)
+if (replacements[i].subreg_loc == x)
+  replacements[i].subreg_loc = y;
+else if (replacements[i].where == x) 
+  {
+   replacements[i].where = y;
+   replacements[i].subreg_loc = 0;
+  }
+}
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-3.c 
b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-3.c
new file mode 100644
index 000..c1f698b9a68
--- /dev/null
+++ b/gcc/testsuite/g

Re: [PATCH] RISC-V: Fix dynamic LMUL cost model ICE

2023-11-08 Thread Kito Cheng
LGTM, thanks :)

On Thu, Nov 9, 2023 at 10:39 AM Juzhe-Zhong  wrote:
>
> When trying to use dynamic LMUL to compile benchmarks, we noticed a
> bunch of ICEs.
>
> This patch fixes those ICEs and appends tests.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-costs.cc (costs::preferred_new_lmul_p): 
> Fix ICE.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-1.c: New test.
> * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-2.c: New test.
> * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-3.c: New test.
>
> ---
>  gcc/config/riscv/riscv-vector-costs.cc| 11 +---
>  .../costmodel/riscv/rvv/dynamic-lmul-ice-1.c  | 25 +++
>  .../costmodel/riscv/rvv/dynamic-lmul-ice-2.c  | 22 
>  .../costmodel/riscv/rvv/dynamic-lmul-ice-3.c  | 14 +++
>  4 files changed, 69 insertions(+), 3 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-1.c
>  create mode 100644 
> gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-2.c
>  create mode 100644 
> gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-3.c
>
> diff --git a/gcc/config/riscv/riscv-vector-costs.cc 
> b/gcc/config/riscv/riscv-vector-costs.cc
> index af87388a1e4..8036c9c40d7 100644
> --- a/gcc/config/riscv/riscv-vector-costs.cc
> +++ b/gcc/config/riscv/riscv-vector-costs.cc
> @@ -231,7 +231,9 @@ compute_local_live_ranges (
>
>  TODO: We may elide the cases that the unnecessary IMM in
>  the future.  */
> - if (is_gimple_val (var) && !POINTER_TYPE_P (TREE_TYPE 
> (var)))
> + if (poly_int_tree_p (var)
> + || (is_gimple_val (var)
> + && !POINTER_TYPE_P (TREE_TYPE (var))))
> {
>   biggest_mode
> = get_biggest_mode (biggest_mode,
> @@ -416,7 +418,8 @@ static void
>  update_local_live_ranges (
>vec_info *vinfo,
>hash_map> &program_points_per_bb,
> -  hash_map> &live_ranges_per_bb)
> +  hash_map> &live_ranges_per_bb,
> +  machine_mode *biggest_mode)
>  {
>loop_vec_info loop_vinfo = dyn_cast (vinfo);
>if (!loop_vinfo)
> @@ -501,6 +504,8 @@ update_local_live_ranges (
>: get_store_value (gsi_stmt (si));
>   tree sel_type = build_nonstandard_integer_type (
> TYPE_PRECISION (TREE_TYPE (var)), 1);
> + *biggest_mode
> +   = get_biggest_mode (*biggest_mode, TYPE_MODE (sel_type));
>   tree sel = build_decl (UNKNOWN_LOCATION, VAR_DECL,
>  get_identifier ("vect_perm"), sel_type);
>   pair &live_range = live_ranges->get_or_insert (sel, &existed_p);
> @@ -572,7 +577,7 @@ costs::preferred_new_lmul_p (const vector_costs 
> *uncast_other) const
>
>/* Update live ranges according to PHI.  */
>update_local_live_ranges (other->m_vinfo, program_points_per_bb,
> -   live_ranges_per_bb);
> +   live_ranges_per_bb, &biggest_mode);
>
>/* TODO: We calculate the maximum live vars base on current STMTS
>   sequence.  We can support live range shrink if it can give us
> diff --git 
> a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-1.c 
> b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-1.c
> new file mode 100644
> index 000..4f019ccae6b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-1.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32gcv -mabi=ilp32 -O3 -ftree-vectorize --param 
> riscv-autovec-lmul=dynamic" } */
> +
> +int a, *b[9], c, d, e;
> +
> +static int
> +fn1 ()
> +{
> +  for (c = 6; c >= 0; c--)
> +for (d = 0; d < 2; d++)
> +  {
> +b[d * 2 + c] = 0;
> +e = a > 1 ? : 0;
> +if (e == 2)
> +  return 0;
> +  }
> +  return 0;
> +}
> +
> +int
> +main ()
> +{
> +  fn1 ();
> +  return 0;
> +}
> diff --git 
> a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-2.c 
> b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-2.c
> new file mode 100644
> index 000..6fc8062f23b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-2.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32gcv -mabi=ilp32 -Ofast -ftree-vectorize --param 
> riscv-autovec-lmul=dynamic" } */
> +
> +typedef struct rtx_def *rtx;
> +struct replacement {
> +rtx *where;
> +rtx *subreg_loc;
> +int mode;
> +};
> +static struct replacement replacements[150];
> +void move_replacements (rtx *x, rtx *y, int n_replacements)
> +{
> +  int i;
> +  for (i = 0; i < n_replacements; i++)
> +if (replacements[i].subreg_loc == x)
> +  replacements[i].subreg_loc = y;
> +else if (replacements[i].where == x)
> +   

RE: [EXTERNAL] [PATCH] skip debug stmts when assigning locus discriminators

2023-11-08 Thread Eugene Rozenfeld
The fix looks good to me. Will this also fix 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107169 ? It was also a bad 
interaction of -gstatement-frontiers and discriminators.

Eugene

-Original Message-
From: Alexandre Oliva 
Sent: Wednesday, November 8, 2023 7:51 AM
To: gcc-patches@gcc.gnu.org
Cc: Eugene Rozenfeld 
Subject: [EXTERNAL] [PATCH] skip debug stmts when assigning locus discriminators


c-c++-common/goacc/kernels-loop-g.c has been failing (compare-debug)
on i686-linux-gnu since r13-3172, because the implementation enabled debug 
stmts to cause discriminators to be assigned differently, and the 
discriminators are printed in the .gkd dumps that -fcompare-debug compares.

This patch prevents debug stmts from affecting the discriminators in nondebug 
stmts, but enables debug stmts to get discriminators just as nondebug stmts 
would if their line numbers match.

I suppose we could arrange for discriminators to be omitted from the 
-fcompare-debug dumps, but keeping discriminators in sync is probably good to 
avoid other potential sources of divergence between debug and nondebug.

Regstrapped on x86_64-linux-gnu, also tested with gcc-13 on i686- and x86_64-.  
Ok to install?

(Eugene, I suppose what's special about this testcase, that may not apply to 
most other uses of assign_discriminators, is that goacc creates new functions 
out of already optimized code.  I think assign_discriminators may not be 
suitable for new functions, with code that isn't exactly pristinely in-order.  
WDYT?)


for  gcc/ChangeLog

* tree-cfg.cc (assign_discriminators): Handle debug stmts.
---
 gcc/tree-cfg.cc |   16 
 1 file changed, 16 insertions(+)

diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc index 
40a6f2a3b529f..a30a2de33a106 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -1214,6 +1214,22 @@ assign_discriminators (void)
{
  gimple *stmt = gsi_stmt (gsi);

+ /* Don't allow debug stmts to affect discriminators, but
+allow them to take discriminators when they're on the
+same line as the preceding nondebug stmt.  */
+ if (is_gimple_debug (stmt))
+   {
+ if (curr_locus != UNKNOWN_LOCATION
+ && same_line_p (curr_locus, &curr_locus_e,
+ gimple_location (stmt)))
+   {
+ location_t loc = gimple_location (stmt);
+ location_t dloc = location_with_discriminator (loc,
+curr_discr);
+ gimple_set_location (stmt, dloc);
+   }
+ continue;
+   }
  if (curr_locus == UNKNOWN_LOCATION)
{
  curr_locus = gimple_location (stmt);

--
Alexandre Oliva, happy hacker    https://fsfla.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


RE: [PATCH] RISC-V: Fix dynamic LMUL cost model ICE

2023-11-08 Thread Li, Pan2
Committed, thanks Kito.

Pan

-Original Message-
From: Kito Cheng  
Sent: Thursday, November 9, 2023 10:43 AM
To: Juzhe-Zhong 
Cc: gcc-patches@gcc.gnu.org; kito.ch...@gmail.com; jeffreya...@gmail.com; 
rdapp@gmail.com
Subject: Re: [PATCH] RISC-V: Fix dynamic LMUL cost model ICE

LGTM, thanks :)

On Thu, Nov 9, 2023 at 10:39 AM Juzhe-Zhong  wrote:
>
> When trying to use dynamic LMUL to compile benchmarks, we noticed a
> bunch of ICEs.
>
> This patch fixes those ICEs and appends tests.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-costs.cc (costs::preferred_new_lmul_p): 
> Fix ICE.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-1.c: New test.
> * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-2.c: New test.
> * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-3.c: New test.
>
> ---
>  gcc/config/riscv/riscv-vector-costs.cc| 11 +---
>  .../costmodel/riscv/rvv/dynamic-lmul-ice-1.c  | 25 +++
>  .../costmodel/riscv/rvv/dynamic-lmul-ice-2.c  | 22 
>  .../costmodel/riscv/rvv/dynamic-lmul-ice-3.c  | 14 +++
>  4 files changed, 69 insertions(+), 3 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-1.c
>  create mode 100644 
> gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-2.c
>  create mode 100644 
> gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-3.c
>
> diff --git a/gcc/config/riscv/riscv-vector-costs.cc 
> b/gcc/config/riscv/riscv-vector-costs.cc
> index af87388a1e4..8036c9c40d7 100644
> --- a/gcc/config/riscv/riscv-vector-costs.cc
> +++ b/gcc/config/riscv/riscv-vector-costs.cc
> @@ -231,7 +231,9 @@ compute_local_live_ranges (
>
>  TODO: We may elide the cases that the unnecessary IMM in
>  the future.  */
> - if (is_gimple_val (var) && !POINTER_TYPE_P (TREE_TYPE 
> (var)))
> + if (poly_int_tree_p (var)
> + || (is_gimple_val (var)
> + && !POINTER_TYPE_P (TREE_TYPE (var))))
> {
>   biggest_mode
> = get_biggest_mode (biggest_mode,
> @@ -416,7 +418,8 @@ static void
>  update_local_live_ranges (
>vec_info *vinfo,
>hash_map> &program_points_per_bb,
> -  hash_map> &live_ranges_per_bb)
> +  hash_map> &live_ranges_per_bb,
> +  machine_mode *biggest_mode)
>  {
>loop_vec_info loop_vinfo = dyn_cast (vinfo);
>if (!loop_vinfo)
> @@ -501,6 +504,8 @@ update_local_live_ranges (
>: get_store_value (gsi_stmt (si));
>   tree sel_type = build_nonstandard_integer_type (
> TYPE_PRECISION (TREE_TYPE (var)), 1);
> + *biggest_mode
> +   = get_biggest_mode (*biggest_mode, TYPE_MODE (sel_type));
>   tree sel = build_decl (UNKNOWN_LOCATION, VAR_DECL,
>  get_identifier ("vect_perm"), sel_type);
>   pair &live_range = live_ranges->get_or_insert (sel, &existed_p);
> @@ -572,7 +577,7 @@ costs::preferred_new_lmul_p (const vector_costs 
> *uncast_other) const
>
>/* Update live ranges according to PHI.  */
>update_local_live_ranges (other->m_vinfo, program_points_per_bb,
> -   live_ranges_per_bb);
> +   live_ranges_per_bb, &biggest_mode);
>
>/* TODO: We calculate the maximum live vars base on current STMTS
>   sequence.  We can support live range shrink if it can give us
> diff --git 
> a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-1.c 
> b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-1.c
> new file mode 100644
> index 000..4f019ccae6b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-1.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32gcv -mabi=ilp32 -O3 -ftree-vectorize --param 
> riscv-autovec-lmul=dynamic" } */
> +
> +int a, *b[9], c, d, e;
> +
> +static int
> +fn1 ()
> +{
> +  for (c = 6; c >= 0; c--)
> +for (d = 0; d < 2; d++)
> +  {
> +b[d * 2 + c] = 0;
> +e = a > 1 ? : 0;
> +if (e == 2)
> +  return 0;
> +  }
> +  return 0;
> +}
> +
> +int
> +main ()
> +{
> +  fn1 ();
> +  return 0;
> +}
> diff --git 
> a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-2.c 
> b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-2.c
> new file mode 100644
> index 000..6fc8062f23b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-2.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32gcv -mabi=ilp32 -Ofast -ftree-vectorize --param 
> riscv-autovec-lmul=dynamic" } */
> +
> +typedef struct rtx_def *rtx;
> +struct replacement {
> +rtx *where;
> +rtx *subreg_loc;
> +int mode;
> +};
> +static str

[PATCH v3] libstdc++: optimize bit iterators assuming normalization [PR110807]

2023-11-08 Thread Alexandre Oliva
On Nov  8, 2023, Jonathan Wakely  wrote:

> ofst needs to be __ofst but OK for trunk with that change.

Oh, doh, thanks for catching that last-minute tweak.

Retesting with that change completed successfully, so I've just pushed
the following:


libstdc++: optimize bit iterators assuming normalization [PR110807]

The representation of bit iterators, using a pointer into an array of
words, and an unsigned bit offset into that word, makes for some
optimization challenges: because the compiler doesn't know that the
offset is always in a certain narrow range, beginning at zero and
ending before the word bitwidth, when a function loads an offset that
it hasn't normalized itself, it may fail to derive certain reasonable
conclusions, even to the point of retaining useless calls that elicit
incorrect warnings.

Case at hand: The 110807.cc testcase for bit vectors assigns a 1-bit
list to a global bit vector variable.  Based on the compile-time
constant length of the list, we decide in _M_insert_range whether to
use the existing storage or to allocate new storage for the vector.
After allocation, we decide in _M_copy_aligned how to copy any
preexisting portions of the vector to the newly-allocated storage.
When copying two or more words, we use __builtin_memmove.

However, because we compute the available room using bit offsets
without range information, even comparing them with constants, we fail
to infer ranges for the preexisting vector depending on word size, and
may thus retain the memmove call despite knowing we've only allocated
one word.

Other parts of the compiler then detect the mismatch between the
constant allocation size and the much larger range that could
theoretically be copied into the newly-allocated storage if we could
reach the call.

Ensuring the compiler is aware of the constraints on the offset range
enables it to do a much better job at optimizing.  Using attribute
assume (_M_offset <= ...) didn't work, because gimple lowered that to
something that vrp could only use to ensure 'this' was non-NULL.
Exposing _M_offset as an automatic variable/gimple register outside
the unevaluated assume operand enabled the optimizer to do its job.

Rather than placing such load-then-assume constructs all over, I
introduced an always-inline member function in bit iterators that does
the job of conveying to the compiler the information that the
assumption is supposed to hold, and various calls throughout functions
pertaining to bit iterators that might not otherwise know that the
offsets have to be in range, so that the compiler no longer needs to
make conservative assumptions that prevent optimizations.

With the explicit assumptions, the compiler can correlate the test for
available storage in the vector with the test for how much storage
might need to be copied, and determine that, if we're not asking for
enough room for two or more words, we can omit entirely the code to
copy two or more words, without any runtime overhead whatsoever: no
traces remain of the undefined behavior or of the tests that inform
the compiler about the assumptions that must hold.


for  libstdc++-v3/ChangeLog

PR libstdc++/110807
* include/bits/stl_bvector.h (_Bit_iterator_base): Add
_M_assume_normalized member function.  Call it in _M_bump_up,
_M_bump_down, _M_incr, operator==, operator<=>, operator<, and
operator-.
(_Bit_iterator): Also call it in operator*.
(_Bit_const_iterator): Likewise.
---
 libstdc++-v3/include/bits/stl_bvector.h |   37 ---
 1 file changed, 34 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_bvector.h 
b/libstdc++-v3/include/bits/stl_bvector.h
index 8d18bcaffd434..2b91af2005f2d 100644
--- a/libstdc++-v3/include/bits/stl_bvector.h
+++ b/libstdc++-v3/include/bits/stl_bvector.h
@@ -56,6 +56,10 @@
 #ifndef _STL_BVECTOR_H
 #define _STL_BVECTOR_H 1
 
+#ifndef _GLIBCXX_ALWAYS_INLINE
+#define _GLIBCXX_ALWAYS_INLINE inline __attribute__((__always_inline__))
+#endif
+
 #if __cplusplus >= 201103L
 #include 
 #include 
@@ -177,6 +181,14 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 _Bit_type * _M_p;
 unsigned int _M_offset;
 
+_GLIBCXX20_CONSTEXPR _GLIBCXX_ALWAYS_INLINE
+void
+_M_assume_normalized() const
+{
+  unsigned int __ofst = _M_offset;
+  __attribute__ ((__assume__ (__ofst < unsigned(_S_word_bit))));
+}
+
 _GLIBCXX20_CONSTEXPR
 _Bit_iterator_base(_Bit_type * __x, unsigned int __y)
 : _M_p(__x), _M_offset(__y) { }
@@ -185,6 +197,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 void
 _M_bump_up()
 {
+  _M_assume_normalized();
   if (_M_offset++ == int(_S_word_bit) - 1)
{
  _M_offset = 0;
@@ -196,6 +209,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 void
 _M_bump_down()
 {
+  _M_assume_normalized();
   if (_M_offset-- == 0)
{
  _M_offset = int(_S_word_bit) - 1;
@@ -207,6 +221,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTA

[PATCH, expand] Call misaligned memory reference in expand_builtin_return [PR112417]

2023-11-08 Thread HAO CHEN GUI
Hi,
  This patch modifies expand_builtin_return and makes it call
expand_misaligned_mem_ref to load unaligned memory.  The memory
reference pointed to by the void* pointer might be unaligned, so
expanding it with unaligned move optabs is safe.

  The new test case illustrates the problem.  rs6000 doesn't have an
unaligned vector load instruction when VSX is disabled.  When expanding
builtin_return, it shouldn't load the memory into a vector register
with an unaligned load instruction directly.  It should store it to an
on-stack variable via extract_bit_field and then load it into the
return register from the stack with an aligned load instruction.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
expand: Call misaligned memory reference in expand_builtin_return

expand_builtin_return loads memory to return registers.  The memory might
be unaligned compared to the mode of the registers.  So it should be
expanded by unaligned move optabs if the memory reference is unaligned.

gcc/
PR target/112417
* builtins.cc (expand_builtin_return): Call
expand_misaligned_mem_ref for loading unaligned memory reference.
* builtins.h (expand_misaligned_mem_ref): Declare.
* expr.cc (expand_misaligned_mem_ref): No longer static.

gcc/testsuite/
PR target/112417
* gcc.target/powerpc/pr112417.c: New.

patch.diff
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index cb90bd03b3e..b879eb88b7c 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -1816,7 +1816,12 @@ expand_builtin_return (rtx result)
if (size % align != 0)
  size = CEIL (size, align) * align;
reg = gen_rtx_REG (mode, INCOMING_REGNO (regno));
-   emit_move_insn (reg, adjust_address (result, mode, size));
+   rtx tmp = adjust_address (result, mode, size);
+   unsigned int align = MEM_ALIGN (tmp);
+   if (align < GET_MODE_ALIGNMENT (mode))
+ tmp = expand_misaligned_mem_ref (tmp, mode, 1, align,
+  NULL, NULL);
+   emit_move_insn (reg, tmp);

push_to_sequence (call_fusage);
emit_use (reg);
diff --git a/gcc/builtins.h b/gcc/builtins.h
index 88a26d70cd5..a3d7954ee6e 100644
--- a/gcc/builtins.h
+++ b/gcc/builtins.h
@@ -157,5 +157,7 @@ extern internal_fn replacement_internal_fn (gcall *);

 extern bool builtin_with_linkage_p (tree);
 extern int type_to_class (tree);
+extern rtx expand_misaligned_mem_ref (rtx, machine_mode, int, unsigned int,
+ rtx, rtx *);

 #endif /* GCC_BUILTINS_H */
diff --git a/gcc/expr.cc b/gcc/expr.cc
index ed4dbb13d89..b0adb35a095 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -9156,7 +9156,7 @@ expand_cond_expr_using_cmove (tree treeop0 
ATTRIBUTE_UNUSED,
If the result can be stored at TARGET, and ALT_RTL is non-NULL,
then *ALT_RTL is set to TARGET (before legitimziation).  */

-static rtx
+rtx
 expand_misaligned_mem_ref (rtx temp, machine_mode mode, int unsignedp,
   unsigned int align, rtx target, rtx *alt_rtl)
 {
diff --git a/gcc/testsuite/gcc.target/powerpc/pr112417.c 
b/gcc/testsuite/gcc.target/powerpc/pr112417.c
new file mode 100644
index 000..ef82fc82033
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr112417.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { has_arch_pwr7 } } } */
+/* { dg-options "-mno-vsx -maltivec -O2" } */
+
+void * foo (void * p)
+{
+  if (p)
+__builtin_return (p);
+}
+
+/* Ensure that unaligned load is generated via stack load/store.  */
+/* { dg-final { scan-assembler {\mstw\M} { target { ! has_arch_ppc64 } } } } */
+/* { dg-final { scan-assembler {\mstd\M} { target has_arch_ppc64 } } } */


[PATCH v2] DSE: Allow vector type for get_stored_val when read < store

2023-11-08 Thread pan2 . li
From: Pan Li 

Update in v2:
* Move vector type support to get_stored_val.

Original log:

This patch allows the vector mode in get_stored_val in DSE.  Using
the vector mode for the read rtx is valid if and only if the read
bitsize is less than the stored bitsize.

Given the example code below with
--param=riscv-autovec-preference=fixed-vlmax.

vuint8m1_t test () {
  uint8_t arr[32] = {
1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9,
1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9,
  };

  return __riscv_vle8_v_u8m1(arr, 32);
}

Before this patch:
test:
  lui a5,%hi(.LANCHOR0)
  addi sp,sp,-32
  addi a5,a5,%lo(.LANCHOR0)
  li  a3,32
  vl2re64.v   v2,0(a5)
  vsetvli zero,a3,e8,m1,ta,ma
  vs2r.v  v2,0(sp) <== Unnecessary store to stack
  vle8.v  v1,0(sp) <== Ditto
  vs1r.v  v1,0(a0)
  addi sp,sp,32
  jr  ra

After this patch:
test:
  lui a5,%hi(.LANCHOR0)
  addi a5,a5,%lo(.LANCHOR0)
  li  a4,32
  addi sp,sp,-32
  vsetvli zero,a4,e8,m1,ta,ma
  vle8.v  v1,0(a5)
  vs1r.v  v1,0(a0)
  addi sp,sp,32
  jr  ra

The tests below pass with this patch:

* The x86 bootstrap and regression test.
* The aarch64 regression test.
* The risc-v regression test.

PR target/111720

gcc/ChangeLog:

* dse.cc (get_stored_val): Allow vector mode if the read
bitsize is less than stored bitsize.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr111720-0.c: New test.
* gcc.target/riscv/rvv/base/pr111720-1.c: New test.
* gcc.target/riscv/rvv/base/pr111720-10.c: New test.
* gcc.target/riscv/rvv/base/pr111720-2.c: New test.
* gcc.target/riscv/rvv/base/pr111720-3.c: New test.
* gcc.target/riscv/rvv/base/pr111720-4.c: New test.
* gcc.target/riscv/rvv/base/pr111720-5.c: New test.
* gcc.target/riscv/rvv/base/pr111720-6.c: New test.
* gcc.target/riscv/rvv/base/pr111720-7.c: New test.
* gcc.target/riscv/rvv/base/pr111720-8.c: New test.
* gcc.target/riscv/rvv/base/pr111720-9.c: New test.

Signed-off-by: Pan Li 
---
 gcc/dse.cc|  4 
 .../gcc.target/riscv/rvv/base/pr111720-0.c| 18 
 .../gcc.target/riscv/rvv/base/pr111720-1.c| 18 
 .../gcc.target/riscv/rvv/base/pr111720-10.c   | 18 
 .../gcc.target/riscv/rvv/base/pr111720-2.c| 18 
 .../gcc.target/riscv/rvv/base/pr111720-3.c| 18 
 .../gcc.target/riscv/rvv/base/pr111720-4.c| 18 
 .../gcc.target/riscv/rvv/base/pr111720-5.c| 18 
 .../gcc.target/riscv/rvv/base/pr111720-6.c| 18 
 .../gcc.target/riscv/rvv/base/pr111720-7.c| 21 +++
 .../gcc.target/riscv/rvv/base/pr111720-8.c| 18 
 .../gcc.target/riscv/rvv/base/pr111720-9.c| 15 +
 12 files changed, 202 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-9.c

diff --git a/gcc/dse.cc b/gcc/dse.cc
index 1a85dae1f8c..21004becd4a 100644
--- a/gcc/dse.cc
+++ b/gcc/dse.cc
@@ -1940,6 +1940,10 @@ get_stored_val (store_info *store_info, machine_mode read_mode,
   || GET_MODE_CLASS (read_mode) != GET_MODE_CLASS (store_mode)))
 read_reg = extract_low_bits (read_mode, store_mode,
 copy_rtx (store_info->const_rhs));
+  else if (VECTOR_MODE_P (read_mode) && VECTOR_MODE_P (store_mode)
+	   && known_lt (GET_MODE_BITSIZE (read_mode), GET_MODE_BITSIZE (store_mode))
+	   && targetm.modes_tieable_p (read_mode, store_mode))
+    read_reg = gen_lowpart (read_mode, copy_rtx (store_info->rhs));
   else
 read_reg = extract_low_bits (read_mode, store_mode,
 copy_rtx (store_info->rhs));
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-0.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-0.c
new file mode 100644
index 000..a61e94a6d98
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-0.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=rv64gcv -mabi=lp64d -ftree-vectorize --param=riscv-autovec-preference=fixed-vlmax -Wno-psabi" } */
+
+#include "riscv

[PATCH v1] RISC-V: Refine frm emit after bb end in succ edges

2023-11-08 Thread pan2 . li
From: Pan Li 

This patch refines the frm insn emission when we meet an abnormal
edge out of a block.  Conceptually, we only need to emit the frm
restore once per block in the abnormal case, instead of once for
every such edge.

This patch fixes this defect by performing insert_insn_end_basic_block
only once, when at least one succ edge is abnormal.
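
In outline, the new strategy is the following (a simplified sketch of
the hunk below; make_frrm_restore () is a hypothetical helper wrapping
the start_sequence/gen_frrmsi/end_sequence dance):

  bool has_abnormal_edge = false;

  FOR_EACH_EDGE (eg, eg_iterator, bb->succs)
    if (eg->flags & EDGE_ABNORMAL)
      /* Abnormal edges cannot carry on-edge insertions (they cannot
         be split), so just note them here and emit a single restore
         at the end of the block below.  */
      has_abnormal_edge = true;
    else
      insert_insn_on_edge (make_frrm_restore (), eg);

  if (has_abnormal_edge)
    insert_insn_end_basic_block (make_frrm_restore (), bb);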

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_frm_emit_after_bb_end): Only
emit once when at least one succ edge is abnormal.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv.cc | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 08ff05dcc3f..e25692b86fc 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -9348,20 +9348,33 @@ static void
 riscv_frm_emit_after_bb_end (rtx_insn *cur_insn)
 {
   edge eg;
+  bool abnormal_edge_p = false;
   edge_iterator eg_iterator;
   basic_block bb = BLOCK_FOR_INSN (cur_insn);
 
   FOR_EACH_EDGE (eg, eg_iterator, bb->succs)
+{
+  if (eg->flags & EDGE_ABNORMAL)
+   abnormal_edge_p = true;
+  else
+   {
+ start_sequence ();
+ emit_insn (gen_frrmsi (DYNAMIC_FRM_RTL (cfun)));
+ rtx_insn *backup_insn = get_insns ();
+ end_sequence ();
+
+ insert_insn_on_edge (backup_insn, eg);
+   }
+}
+
+  if (abnormal_edge_p)
 {
   start_sequence ();
   emit_insn (gen_frrmsi (DYNAMIC_FRM_RTL (cfun)));
   rtx_insn *backup_insn = get_insns ();
   end_sequence ();
 
-  if (eg->flags & EDGE_ABNORMAL)
-   insert_insn_end_basic_block (backup_insn, bb);
-  else
-   insert_insn_on_edge (backup_insn, eg);
+  insert_insn_end_basic_block (backup_insn, bb);
 }
 
   commit_edge_insertions ();
-- 
2.34.1



Re: [PATCH v1] RISC-V: Refine frm emit after bb end in succ edges

2023-11-08 Thread juzhe.zh...@rivai.ai
OK.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-11-09 14:50
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Refine frm emit after bb end in succ edges
From: Pan Li 
 


[PATCH] Avoid generating vblendps with ymm16+

2023-11-08 Thread Hu, Lin1
This patch aims to avoid generating vblendps with ymm16+.  It has been
bootstrapped and tested on x86_64-pc-linux-gnu{-m32,-m64}.  Ok for trunk?
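
Background: vblendps only has a VEX encoding, so it cannot encode the
EVEX-only registers ymm16-ymm31, while the vshuf{32x4,64x2} forms it
was short-circuiting can.  The guard uses x86_evex_reg_mentioned_p
from i386.cc; conceptually it answers the question sketched below (a
simplified illustration under an assumed helper name, not the real
implementation, which reportedly also considers gpr32):

/* Sketch: return true if any of the first NOPERANDS operands mentions
   an EVEX-only SSE register (xmm16-xmm31 and their ymm/zmm views).  */
static bool
evex_sse_reg_mentioned_p (rtx operands[], int noperands)
{
  for (int i = 0; i < noperands; i++)
    {
      subrtx_iterator::array_type array;
      FOR_EACH_SUBRTX (iter, array, operands[i], ALL)
        if (REG_P (*iter) && EXT_REX_SSE_REGNO_P (REGNO (*iter)))
          return true;
    }
  return false;
}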

gcc/ChangeLog:

PR target/112435
* config/i386/sse.md: Adding constraints to restrict the generation of
vblendps.

gcc/testsuite/ChangeLog:

PR target/112435
* gcc.target/i386/pr112435-1.c: New test.
* gcc.target/i386/pr112435-2.c: Ditto.
* gcc.target/i386/pr112435-3.c: Ditto.
---
 gcc/config/i386/sse.md | 28 +---
 gcc/testsuite/gcc.target/i386/pr112435-1.c | 14 
 gcc/testsuite/gcc.target/i386/pr112435-2.c | 64 ++
 gcc/testsuite/gcc.target/i386/pr112435-3.c | 79 ++
 4 files changed, 175 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr112435-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr112435-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr112435-3.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 33198756bb0..666f931c88d 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -19254,7 +19254,8 @@
   mask = INTVAL (operands[3]) / 2;
   mask |= (INTVAL (operands[5]) - 4) / 2 << 1;
   operands[3] = GEN_INT (mask);
-  if (INTVAL (operands[3]) == 2 && !<mask_applied>)
+  if (INTVAL (operands[3]) == 2 && !<mask_applied>
+      && !x86_evex_reg_mentioned_p (operands, 3))
     return "vblendps\t{$240, %2, %1, %0|%0, %1, %2, 240}";
   return "vshuf<shuffletype>64x2\t{%3, %2, %1, %0|%0, %1, %2, %3}";
 }
@@ -19414,7 +19415,8 @@
   mask |= (INTVAL (operands[7]) - 8) / 4 << 1;
   operands[3] = GEN_INT (mask);
 
-  if (INTVAL (operands[3]) == 2 && !<mask_applied>)
+  if (INTVAL (operands[3]) == 2 && !<mask_applied>
+      && !x86_evex_reg_mentioned_p (operands, 3))
     return "vblendps\t{$240, %2, %1, %0|%0, %1, %2, 240}";

   return "vshuf<shuffletype>32x4\t{%3, %2, %1, %0|%0, %1, %2, %3}";
@@ -26776,10 +26778,13 @@
else
  return "vmovaps\t{%2, %0|%0, %2}";
   }
-if ((mask & 0xbb) == 18)
-  return "vblendps\t{$15, %2, %1, %0|%0, %1, %2, 15}";
-if ((mask & 0xbb) == 48)
-  return "vblendps\t{$240, %2, %1, %0|%0, %1, %2, 240}";
+if (!x86_evex_reg_mentioned_p (operands, 3))
+  {
+   if ((mask & 0xbb) == 18)
+ return "vblendps\t{$15, %2, %1, %0|%0, %1, %2, 15}";
+   if ((mask & 0xbb) == 48)
+ return "vblendps\t{$240, %2, %1, %0|%0, %1, %2, 240}";
+  }
 return "vperm2i128\t{%3, %2, %1, %0|%0, %1, %2, %3}";
   }
   [(set_attr "type" "sselog")
@@ -27433,10 +27438,13 @@
&& avx_vperm2f128_parallel (operands[3], <MODE>mode)"
 {
   int mask = avx_vperm2f128_parallel (operands[3], <MODE>mode) - 1;
-  if ((mask & 0xbb) == 0x12)
-return "vblendps\t{$15, %2, %1, %0|%0, %1, %2, 15}";
-  if ((mask & 0xbb) == 0x30)
-return "vblendps\t{$240, %2, %1, %0|%0, %1, %2, 240}";
+  if (!x86_evex_reg_mentioned_p (operands, 3))
+{
+  if ((mask & 0xbb) == 0x12)
+   return "vblendps\t{$15, %2, %1, %0|%0, %1, %2, 15}";
+  if ((mask & 0xbb) == 0x30)
+   return "vblendps\t{$240, %2, %1, %0|%0, %1, %2, 240}";
+}
   if ((mask & 0xbb) == 0x20)
 return "vinsert\t{$1, %x2, %1, %0|%0, %1, %x2, 1}";
   operands[3] = GEN_INT (mask);
diff --git a/gcc/testsuite/gcc.target/i386/pr112435-1.c b/gcc/testsuite/gcc.target/i386/pr112435-1.c
new file mode 100644
index 000..ff56523b4e1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr112435-1.c
@@ -0,0 +1,14 @@
+/* PR target/112435 */
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-Ofast -march=sapphirerapids" } */
+/* { dg-final { scan-assembler-not "vblendps" } } */
+
+#include<immintrin.h>
+
+__m256i
+f(__m256i a, __m256i  b)
+{
+  register __m256i t __asm__("ymm17") = a;
+  asm("":"+v"(t));
+  return _mm256_shuffle_i32x4 (t, b, 2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr112435-2.c b/gcc/testsuite/gcc.target/i386/pr112435-2.c
new file mode 100644
index 000..27ba80b1e68
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr112435-2.c
@@ -0,0 +1,64 @@
+/* PR target/112435 */
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-Ofast -march=sapphirerapids" } */
+/* { dg-final { scan-assembler-not "vblendps.*ymm17\$" } } */
+
+#include<immintrin.h>
+
+/* Vpermi128/Vpermf128 */
+__m256i
+perm0 (__m256i a, __m256i b)
+{
+  register __m256i t __asm__("ymm17") = a;
+  asm("":"+v"(t));
+  return _mm256_permute2x128_si256 (t, b, 50);
+}
+
+__m256i
+perm1 (__m256i a, __m256i b)
+{
+  register __m256i t __asm__("ymm17") = a;
+  asm("":"+v"(t));
+  return _mm256_permute2x128_si256 (t, b, 18);
+}
+
+__m256i
+perm2 (__m256i a, __m256i b)
+{
+  register __m256i t __asm__("ymm17") = a;
+  asm("":"+v"(t));
+  return _mm256_permute2x128_si256 (t, b, 48);
+}
+
+/* vshuf{i,f}{32x4,64x2} ymm .*/
+__m256i
+shuff0 (__m256i a, __m256i b)
+{
+  register __m256i t __asm__("ymm17") = a;
+  asm("":"+v"(t));
+  return _mm256_shuffle_i32x4(t, b, 2);
+}
+
+__m256
+shuff1 (__m256 a, __m256 b)
+{
+  register __m256 t __asm__("ymm17") = a;
+  asm("":"+v"(t)

Re: [PATCH] Avoid generating vblendps with ymm16+

2023-11-08 Thread Hongtao Liu
On Thu, Nov 9, 2023 at 3:15 PM Hu, Lin1  wrote:
>
> This patch aims to avoid generating vblendps with ymm16+.  It has been
> bootstrapped and tested on x86_64-pc-linux-gnu{-m32,-m64}.  Ok for trunk?
>
> gcc/ChangeLog:
>
> PR target/112435
> * config/i386/sse.md: Adding constraints to restrict the generation of
> vblendps.
It should be "Don't output vblendps when evex sse reg or gpr32 is involved."
Others LGTM.

Re: [PATCH] testsuite: xfail scev-[35].c on ia32

2023-11-08 Thread Richard Biener
On Wed, 8 Nov 2023, Alexandre Oliva wrote:

> 
> These gimplefe tests never got the desired optimization on ia32, but
> they only started visibly failing when the representation of MEMs in
> dumps changed from printing 'symbol: a' to '&a'.
> 
> The transformation is not considered profitable on ia32; that's why it
> doesn't take place.  Maybe that's a bug in itself, but it's not a
> regression, and not something to be noisy about.
> 
> Regstrapped on x86_64-linux-gnu, also tested with gcc-13 on i686- and
> x86_64-.  Ok to install?

OK.

> (Richi, is the non-optimization choice on ia32 something unexpected that
> ought to be looked into?  I could file a PR, and maybe even look into it
> a bit further.)

There might even be a PR already.  The testcase expects that IVOPTs
chooses an IV that satisfies both

 a_p = &a[i_12];

and

 *&a[i_12] = 100;

basically code-generating a LEA, a store of the address and a
register-indirect memory access.  That's what happens for 64-bit
(and presumably on all other archs).  For some reason (I can only
guess costing), on ia32 we choose to prioritize using a single
induction variable (we need the original GIV for the exit test)
and so we get an obfuscated LEA for the address store and a
base with scaled index access for the store.

Note the testcase is a bit "bad" because we later sink the store
to a_p, so the generated assembly for ia32 actually looks better.
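
In source terms, the two shapes are roughly (hand-written
illustration, not actual compiler output):

  /* What the test expects: one IV, p, serves both uses.  */
  p = &a[i];     /* LEA */
  a_p = p;       /* store the address */
  *p = 100;      /* register-indirect store */

  /* What ia32 does: the original GIV, i, is kept for the exit test
     and reused, so the address store needs its own computation.  */
  a_p = &a[i];   /* obfuscated LEA */
  a[i] = 100;    /* base + scaled-index store */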

Richard.


> 
> for  gcc/testsuite/ChangeLog
> 
>   * gcc.dg/tree-ssa/scev-3.c: xfail on ia32.
>   * gcc.dg/tree-ssa/scev-5.c: Likewise.
> 
> Issue: gcc#155
> TN: W517-007
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/scev-3.c |2 +-
>  gcc/testsuite/gcc.dg/tree-ssa/scev-5.c |2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
> index 4babd33f5c062..ac8c8d4519e30 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
> @@ -40,4 +40,4 @@ __BB(6):
>  
>  }
>  
> -/* { dg-final { scan-tree-dump-times "&a" 1 "ivopts" } } */
> +/* { dg-final { scan-tree-dump-times "&a" 1 "ivopts" { xfail ia32 } } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-5.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/scev-5.c
> index c2feebdfc2489..c911a9298866f 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/scev-5.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-5.c
> @@ -40,4 +40,4 @@ __BB(6):
>  
>  }
>  
> -/* { dg-final { scan-tree-dump-times "&a" 1 "ivopts" } } */
> +/* { dg-final { scan-tree-dump-times "&a" 1 "ivopts" { xfail ia32 } } } */
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] minimal support for xtheadv

2023-11-08 Thread Kito Cheng
Hi Yi Xuan:

This patch is trivial and generally LGTM, but I would require putting
the spec into https://github.com/riscv-non-isa/riscv-toolchain-conventions
before merging this.  Also, don't forget to include "RISC-V:" in the
title; that makes it easier to track during the RISC-V GCC sync meeting :)

And I am a little bit confused by the author's info: is it from you or
from "XYenChi "?  Or is oriachi...@gmail.com also your
mail address?

cc Christoph since I believe you may know more about that process.
cc JoJo since you are T-head folk :P


On Wed, Nov 8, 2023 at 9:13 PM  wrote:
>
> From: XYenChi 
>
This patch adds minimal support for xtheadv.
>
> gcc/ChangeLog:
>
> 2023-11-08  Chen Yixuan  
>
* common/config/riscv/riscv-common.cc: Add minimal support for xtheadv.
>
> gcc/config/ChangeLog:
>
> 2023-11-08  Chen Yixuan  
>
* riscv/riscv.opt: Add minimal support for xtheadv.
> ---
>  gcc/common/config/riscv/riscv-common.cc | 2 ++
>  gcc/config/riscv/riscv.opt  | 2 ++
>  2 files changed, 4 insertions(+)
>
> diff --git a/gcc/common/config/riscv/riscv-common.cc 
> b/gcc/common/config/riscv/riscv-common.cc
> index 526dbb7603b..d5ea0ee9b70 100644
> --- a/gcc/common/config/riscv/riscv-common.cc
> +++ b/gcc/common/config/riscv/riscv-common.cc
> @@ -325,6 +325,7 @@ static const struct riscv_ext_version 
> riscv_ext_version_table[] =
>{"xtheadmemidx", ISA_SPEC_CLASS_NONE, 1, 0},
>{"xtheadmempair", ISA_SPEC_CLASS_NONE, 1, 0},
>{"xtheadsync", ISA_SPEC_CLASS_NONE, 1, 0},
> +  {"xtheadv",ISA_SPEC_CLASS_NONE, 0, 7},
>
>{"xventanacondops", ISA_SPEC_CLASS_NONE, 1, 0},
>
> @@ -1680,6 +1681,7 @@ static const riscv_ext_flag_table_t 
> riscv_ext_flag_table[] =
>{"xtheadmemidx",  &gcc_options::x_riscv_xthead_subext, MASK_XTHEADMEMIDX},
>{"xtheadmempair", &gcc_options::x_riscv_xthead_subext, MASK_XTHEADMEMPAIR},
>{"xtheadsync",&gcc_options::x_riscv_xthead_subext, MASK_XTHEADSYNC},
> +  {"xtheadv",   &gcc_options::x_riscv_xthead_subext, MASK_XTHEADV},
>
>{"xventanacondops", &gcc_options::x_riscv_xventana_subext, 
> MASK_XVENTANACONDOPS},
>
> diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
> index 70d78151cee..2bbdf680fa2 100644
> --- a/gcc/config/riscv/riscv.opt
> +++ b/gcc/config/riscv/riscv.opt
> @@ -438,6 +438,8 @@ Mask(XTHEADMEMPAIR) Var(riscv_xthead_subext)
>
>  Mask(XTHEADSYNC)Var(riscv_xthead_subext)
>
> +Mask(XTHEADV)   Var(riscv_xthead_subext)
> +
>  TargetVariable
>  int riscv_xventana_subext
>
> --
> 2.42.0
>


[PATCH] RISC-V: Fix the illegal operands for the XTheadMemidx extension.

2023-11-08 Thread Jin Ma
The patterns "*extend<SHORT:mode><SUPERQI:mode>2_bitmanip" and
"*zero_extendhi<GPR:mode>2_bitmanip" in bitmanip.md overlap with the
corresponding "*th_memidx_bb_extend"/"*th_memidx_bb_zero_extend"
patterns in thead.md.  When both Zbb and XTheadMemidx are enabled,
the bitmanip patterns can match an XTheadMemidx address but print a
plain load, so the wrong instruction is generated and binutils
reports the following error:
Assembler messages:
Error: illegal operands `lb a5,(a0),1,0'

In fact, the correct instruction is "th.lbia a5,(a0),1,0".

gcc/ChangeLog:

* config/riscv/bitmanip.md: Avoid the conflict between
zbb and xtheadmemidx in patterns.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadfmemidx-uindex-zbb.c: New test.
---
 gcc/config/riscv/bitmanip.md  |  4 +--
 .../riscv/xtheadfmemidx-uindex-zbb.c  | 30 +++
 2 files changed, 32 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadfmemidx-uindex-zbb.c

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index a9c8275fca7..878395c3ffa 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -290,7 +290,7 @@ (define_insn "*<bitmanip_optab>di2"
 (define_insn "*zero_extendhi2_bitmanip"
   [(set (match_operand:GPR 0 "register_operand" "=r,r")
 (zero_extend:GPR (match_operand:HI 1 "nonimmediate_operand" "r,m")))]
-  "TARGET_ZBB"
+  "TARGET_ZBB  && !TARGET_XTHEADMEMIDX"
   "@
zext.h\t%0,%1
lhu\t%0,%1"
@@ -301,7 +301,7 @@ (define_insn "*extend<SHORT:mode><SUPERQI:mode>2_bitmanip"
   [(set (match_operand:SUPERQI   0 "register_operand" "=r,r")
(sign_extend:SUPERQI
(match_operand:SHORT 1 "nonimmediate_operand" " r,m")))]
-  "TARGET_ZBB"
+  "TARGET_ZBB && !TARGET_XTHEADMEMIDX"
   "@
sext.<SHORT:size>\t%0,%1
l<SHORT:size>\t%0,%1"
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmemidx-uindex-zbb.c b/gcc/testsuite/gcc.target/riscv/xtheadfmemidx-uindex-zbb.c
new file mode 100644
index 000..a05bc220cba
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/xtheadfmemidx-uindex-zbb.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-Og" } } */
+/* { dg-options "-march=rv64gc_zbb_xtheadmemidx -mabi=lp64d" { target { rv64 } 
} } */
+/* { dg-options "-march=rv32imafc_zbb_xtheadmemidx -mabi=ilp32f" { target { 
rv32 } } } */
+
+const unsigned char *
+read_uleb128(const unsigned char *p, unsigned long *val)
+{
+  unsigned int shift = 0;
+  unsigned char byte;
+  unsigned long result;
+
+  result = 0;
+  do
+  {
+byte = *p++;
+result |= ((unsigned long)byte & 0x7f) << shift;
+shift += 7;
+  } while (byte & 0x80);
+
+  *val = result;
+  return p;
+}
+
+void test(const unsigned char *p, unsigned long utmp)
+{
+  p = read_uleb128(p, &utmp);
+}
+
+/* { dg-final { scan-assembler-not {\mlb\ta[0-9],\(a[0-9]\),1,0\M} } } */

base-commit: 04d8a47608dcae7f61805e3566e3a1571b574405
-- 
2.17.1