Re: [Patch, Fortran/90068] Add finalizer creation to array constructor for functions of derived type.

2024-06-07 Thread Paul Richard Thomas
Hi Andre,

I had been working in exactly the same area to correct the implementation
of finalization of function results in array constructors. However, I
couldn't see a light way of having the finalization occur at the correct
time; "If an executable construct references a nonpointer function, the
result is finalized after execution of the innermost executable construct
containing the reference." This caused all manner of difficulty with
assignment. I'll come back to this.

In the meantime, preventing memory leaks should take priority. This is fine
for mainline.

Thanks

Paul


On Wed, 5 Jun 2024 at 10:47, Andre Vehreschild  wrote:

> Hi Fortraneers,
>
> another patch to fix a memory leak. This time temporaries created during an
> array construction did not have their finalizers called. I.e. allocated
> memory was not freed. The attached patch addresses this issue.
>
> Regtested ok on x86_64/Fedora 39. Ok for trunk?
>
> Regards,
> Andre
> --
> Andre Vehreschild * Email: vehre ad gmx dot de
>


[PATCH] bitint: Fix up lower_addsub_overflow [PR115352]

2024-06-07 Thread Jakub Jelinek
Hi!

The following testcase is miscompiled because of a flawed optimization.
If one changes the 65 in the testcase to e.g. 66, one gets:
...
  _25 = .USUBC (0, _24, _14);
  _12 = IMAGPART_EXPR <_25>;
  _26 = REALPART_EXPR <_25>;
  if (_23 >= 1)
goto ; [80.00%]
  else
goto ; [20.00%]

   :
  if (_23 != 1)
goto ; [80.00%]
  else
goto ; [20.00%]

   :
  _27 = (signed long) _26;
  _28 = _27 >> 1;
  _29 = (unsigned long) _28;
  _31 = _29 + 1;
  _30 = _31 > 1;
  goto ; [100.00%]

   :
  _32 = _26 != _18;
  _33 = _22 | _32;

   :
  # _17 = PHI <_30(9), _22(7), _33(10)>
  # _19 = PHI <_29(9), _18(7), _18(10)>
...
so there is one path for limbs below the boundary (in this case there are
actually no limbs there, maybe we could consider optimizing that further,
say by simply folding that _23 >= 1 condition to 1 == 1 and letting
cfg cleanup handle it), another case where it is exactly the limb on the
boundary (that is the bb 9 handling, where it extracts the interesting
bits (the first 3 statements) and then checks if the result is zero or all
ones), and finally the case of limbs above that, where it compares the
current result limb against the previously recorded 0 or all ones and ORs
any differences into the accumulated result.

Now, the optimization which the first hunk removes was based on the idea
that for that case the extraction of the interesting bits from the limb
doesn't need anything special, so the _27/_28/_29 statements above aren't
needed, the whole limb is interesting bits, so it handled the >= 1
case like the bb 9 above without the first 3 statements and bb 10 wasn't
there at all.  There are 2 problems with that: for the higher limbs it
only checks if the result limb bits are all zeros or all ones, but
doesn't check if they are the same as the other extension bits, and
it forgets the previous flag whether there was an overflow.
First I wanted to fix it just by adding the _33 = _22 | _30; statement
to the end of bb 9 above, which fixed the originally filed huge testcase
and the first 2 foo calls in the testcase included in the patch, as it no
longer forgets about previously checked differences from 0/1.
But as the last 2 foo calls show, it still didn't check whether each
even (or each odd depending on the exact position) result limb is
equal to the first one, so every second limb it could choose some other
0 vs. all ones value and as long as it repeated in another limb above it
it would be ok.

So, the optimization just can't work properly and the following patch
removes it.
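
To make the three cases above concrete, here is a minimal C++ sketch of the
limb-wise check. It is not the code GCC emits: the function name
overflows_signed, the 64-bit limb size and the least-significant-limb-first
storage are assumptions made only for illustration. It mirrors the structure
of the dump: skip limbs below the boundary, extract the interesting bits of
the boundary limb with an arithmetic shift and record whether they are 0 or
all ones, then compare every higher limb against that recorded value while
ORing differences into the accumulated flag.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns true if the two's-complement value stored in
// `limbs` (least significant limb first, 64 bits each) does not fit into a
// signed integer of `prec` bits.  `prec` is assumed to be at least 1.
bool overflows_signed(const std::vector<std::uint64_t>& limbs, unsigned prec)
{
  const unsigned limb_prec = 64;
  const unsigned start = prec - 1;              // lowest interesting bit (sign bit)
  const std::size_t startlimb = start / limb_prec;
  bool ovf = false;
  std::uint64_t ext = 0;                        // recorded 0 or all-ones value

  // Limbs below startlimb carry no interesting bits and are skipped.
  for (std::size_t i = startlimb; i < limbs.size(); ++i)
    {
      if (i == startlimb)
        {
          // Boundary limb: drop the bits below the boundary.  The right
          // shift of a negative value is arithmetic in C++20 and on the
          // usual pre-C++20 implementations.
          std::int64_t s
            = static_cast<std::int64_t>(limbs[i]) >> (start % limb_prec);
          ext = static_cast<std::uint64_t>(s);
          ovf = ext + 1 > 1;                    // neither 0 nor all ones
        }
      else
        // Higher limb: must equal the recorded extension value, and the
        // earlier flag must not be forgotten.
        ovf |= limbs[i] != ext;
    }
  return ovf;
}
```

With prec 65, the limbs {0, ~0, 0} are flagged as an overflow by this sketch,
because the third limb is compared against the value recorded from the
second one rather than merely being tested for "0 or all ones" on its own.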

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/14.2?

2024-06-07  Jakub Jelinek  

PR middle-end/115352
* gimple-lower-bitint.cc (lower_addsub_overflow): Don't disable
single_comparison if cmp_code is GE_EXPR.

* gcc.dg/torture/bitint-71.c: New test.

--- gcc/gimple-lower-bitint.cc.jj   2024-04-12 10:59:48.233153262 +0200
+++ gcc/gimple-lower-bitint.cc  2024-06-06 12:06:57.065717651 +0200
@@ -4286,11 +4286,7 @@ bitint_large_huge::lower_addsub_overflow
  bool single_comparison
= (startlimb + 2 >= fin || (startlimb & 1) != (i & 1));
  if (!single_comparison)
-   {
- cmp_code = GE_EXPR;
- if (!check_zero && (start % limb_prec) == 0)
-   single_comparison = true;
-   }
+   cmp_code = GE_EXPR;
  else if ((startlimb & 1) == (i & 1))
cmp_code = EQ_EXPR;
  else
--- gcc/testsuite/gcc.dg/torture/bitint-71.c.jj 2024-06-06 12:20:55.824913276 +0200
+++ gcc/testsuite/gcc.dg/torture/bitint-71.c    2024-06-06 12:20:45.260044338 +0200
@@ -0,0 +1,28 @@
+/* PR middle-end/115352 */
+/* { dg-do run { target bitint } } */
+/* { dg-options "-std=c23" } */
+/* { dg-skip-if "" { ! run_expensive_tests }  { "*" } { "-O0" "-O2" } } */
+/* { dg-skip-if "" { ! run_expensive_tests } { "-flto" } { "" } } */
+
+#if __BITINT_MAXWIDTH__ >= 385
+int
+foo (_BitInt (385) b)
+{
+  return __builtin_sub_overflow_p (0, b, (_BitInt (65)) 0);
+}
+#endif
+
+int
+main ()
+{
+#if __BITINT_MAXWIDTH__ >= 385
+  if (!foo (-(_BitInt (385)) 0x0c377e8a3fd1881fff035bb487a51c9ed1f7350befa7ec445a3cf8d1ebb723981wb))
+    __builtin_abort ();
+  if (!foo (-0x1c377e8a3fd1881fff035bb487a51c9ed1f7350befa7ec445a3cf8d1ebb723981uwb))
+    __builtin_abort ();
+  if (!foo (-(_BitInt (385)) 0x0a3cf8d1ebb723981wb))
+    __builtin_abort ();
+  if (!foo (-0x1a3cf8d1ebb723981uwb))
+    __builtin_abort ();
+#endif
+}

Jakub



[RE] [v2] RISC-V: Add Zfbfmin extension

2024-06-07 Thread Jin Ma
Hi,
 
Is there a plan to implement Zvfbfmin and Zvfbfwma? Or how can I get the
relevant patches in advance for testing? By the way, LLVM seems to have
fully implemented them now :-)

Ref:

https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/293

https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/main/auto-generated/bfloat16/intrinsic_funcs.adoc



Thanks,
Jin


Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

2024-06-07 Thread Ajit Agarwal
Hello Richard:

On 07/06/24 4:24 am, Richard Sandiford wrote:
> Ajit Agarwal  writes:
>> On 06/06/24 8:03 pm, Richard Sandiford wrote:
>>> Ajit Agarwal  writes:
 On 06/06/24 2:28 pm, Richard Sandiford wrote:
> Hi,
>
> Just some comments on the fuseable_load_p part, since that's what
> we were discussing last time.
>
> It looks like this now relies on:
>
> Ajit Agarwal  writes:
>> +  /* We use DF data flow because we change location rtx
>> + which is easier to find and modify.
>> + We use mix of rtl-ssa def-use and DF data flow
>> + where it is easier.  */
>> +  df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
>> +  df_analyze ();
>> +  df_set_flags (DF_DEFER_INSN_RESCAN);
>
> But please don't do this!  For one thing, building DU/UD chains
> as well as rtl-ssa is really expensive in terms of compile time.
> But more importantly, modifications need to happen via rtl-ssa
> to ensure that the IL is kept up-to-date.  If we don't do that,
> later fuse attempts will be based on stale data and so could
> generate incorrect code.
>

 Sure I have made changes to use only rtl-ssa and not to use
 UD/DU chains. I will send the changes in separate subsequent
 patch.
>>>
>>> Thanks.  Before you send the patch though:
>>>
>> +// Check whether load can be fusable or not.
>> +// Return true if fuseable otherwise false.
>> +bool
>> +rs6000_pair_fusion::fuseable_load_p (insn_info *info)
>> +{
>> +  for (auto def : info->defs())
>> +{
>> +  auto set = dyn_cast (def);
>> +  for (auto use1 : set->nondebug_insn_uses ())
>> +use1->set_is_live_out_use (true);
>> +}
>
> What was the reason for adding this loop?
>

 The purpose of adding is to avoid assert failure in 
 gcc/rtl-ssa/changes.cc:252
>>>
>>> That assert is making sure that we don't delete a definition of a
>>> register (or memory) while a real insn still uses it.  If the assert
>>> is firing then something has gone wrong.
>>>
>>> Live-out uses are a particular kind of use that occur at the end of
>>> basic blocks.  It's incorrect to mark normal insn uses as live-out.
>>>
>>> When an assert fails, it's important to understand why the failure
>>> occurs, rather than brute-force the assert condition to true.
>>>
>>
>> The above assert failure occurs when there is a debug insn and its
>> use is not live-out.
> 
> Uses in debug insns are never live-out uses.
> 
> It sounds like the bug is that we're failing to update all debug uses of
> the original register.  We need to do that, or "reset" the debug insn if
> substitution fails for some reason.
> 
> See fixup_debug_uses for what the target-independent part of the pass
> does for debug insns that are affected by movement.  Hopefully the
> update needed here will be simpler than that.
> 

Sure. Thanks.

>> [...]
>> +
>> +  rtx addr = XEXP (SET_SRC (body), 0);
>> +
>> +  if (GET_CODE (addr) == PLUS
>> +  && XEXP (addr, 1) && CONST_INT_P (XEXP (addr, 1)))
>> +{
>> +  if (INTVAL (XEXP (addr, 1)) == -16)
>> +return false;
>> +  }
>
> What's special about -16?
>

 Tests like libgomp/for-8 fail with a fused load with offsets -16 and 0.
 That's why I have added this check.
>>>
>>> But why does it fail though?  It sounds like the testcase is pointing
>>> out a problem in the pass (or perhaps elsewhere).  It's important that
>>> we try to understand and fix the underlying problem.
>>>
>>
>> This check is not required anymore and will be removed from subsequent patches.
> 
> OK, great.
> 

Thanks.

>> +
>> +  df_ref use;
>> +  df_insn_info *insn_info = DF_INSN_INFO_GET (info->rtl ());
>> +  FOR_EACH_INSN_INFO_DEF (use, insn_info)
>> +{
>> +  struct df_link *def_link = DF_REF_CHAIN (use);
>> +
>> +  if (!def_link || !def_link->ref
>> +  || DF_REF_IS_ARTIFICIAL (def_link->ref))
>> +continue;
>> +
>> +  while (def_link && def_link->ref)
>> +{
>> +  rtx_insn *insn = DF_REF_INSN (def_link->ref);
>> +  if (GET_CODE (PATTERN (insn)) == PARALLEL)
>> +return false;
>
> Why do you need to skip PARALLELs?
>

 vec_select with parallel give failures final.cc "can't split-up with 
 subreg 128 (reg OO"
 Thats why I have added this.
>>>
>>> But in (vec_select ... (parallel ...)), the parallel won't be the 
>>> PATTERN (insn).  It'll instead be a suboperand of the vec_select.
>>>
>>> Here too it's important to understand why the final.cc failure occurs
>>> and what the correct fix is.
>>>
>>
>> subreg with vec_select operand already exists before fusion pass.
>> We overwrite them with subreg 128 bits from 256 OO mode operand.
> 
> But why is that wrong?  What was the full rtl o

Re: [PATCH 1/4] Relax COND_EXPR reduction vectorization SLP restriction

2024-06-07 Thread Kugan Vivekanandarajah
Hi Richard,

This seems to have introduced a regression. I am seeing ICE while
building TSVC_2 for AARCH64
with -O3 -flto -mcpu=neoverse-v2 -msve-vector-bits=128

tsvc.c: In function 's331':
tsvc.c:2744:8: internal compiler error: Segmentation fault
 2744 | real_t s331(struct args_t * func_args)
  |^
0xdfc23b crash_signal
/var/jenkins/workspace/GCC_Nightly/gcc/toplev.cc:319
0xa3a6f8 phi_nodes_ptr(basic_block_def*)
/var/jenkins/workspace/GCC_Nightly/gcc/gimple.h:4701
0xa3a6f8 gsi_start_phis(basic_block_def*)
/var/jenkins/workspace/GCC_Nightly/gcc/gimple-iterator.cc:937
0xa3a6f8 gsi_for_stmt(gimple*)
/var/jenkins/workspace/GCC_Nightly/gcc/gimple-iterator.cc:621
0x1e5f22f vectorizable_condition
/var/jenkins/workspace/GCC_Nightly/gcc/tree-vect-stmts.cc:12577
0x1e7a027 vect_transform_stmt(vec_info*, _stmt_vec_info*,
gimple_stmt_iterator*, _slp_tree*, _slp_instance*)
/var/jenkins/workspace/GCC_Nightly/gcc/tree-vect-stmts.cc:13467
0x1112653 vect_schedule_slp_node
/var/jenkins/workspace/GCC_Nightly/gcc/tree-vect-slp.cc:9729
0x1127757 vect_schedule_slp_node
/var/jenkins/workspace/GCC_Nightly/gcc/tree-vect-slp.cc:9522
0x1127757 vect_schedule_scc
/var/jenkins/workspace/GCC_Nightly/gcc/tree-vect-slp.cc:10017
0x11285ff vect_schedule_slp(vec_info*, vec<_slp_instance*, va_heap,
vl_ptr> const&)
/var/jenkins/workspace/GCC_Nightly/gcc/tree-vect-slp.cc:10110
0x10f56b7 vect_transform_loop(_loop_vec_info*, gimple*)
/var/jenkins/workspace/GCC_Nightly/gcc/tree-vect-loop.cc:12114
0x1138c7f vect_transform_loops
/var/jenkins/workspace/GCC_Nightly/gcc/tree-vectorizer.cc:1007
0x1139307 try_vectorize_loop_1
/var/jenkins/workspace/GCC_Nightly/gcc/tree-vectorizer.cc:1153
0x1139307 try_vectorize_loop
/var/jenkins/workspace/GCC_Nightly/gcc/tree-vectorizer.cc:1183
0x113967b execute
/var/jenkins/workspace/GCC_Nightly/gcc/tree-vectorizer.cc:1299
Please submit a full bug report, with preprocessed source (by using
-freport-bug).

Please let me know if you need a reduced testcase.

Thanks,
Kugan


[PATCH] Fix fold-left reduction vectorization with multiple stmt copies

2024-06-07 Thread Richard Biener
There's a typo when code generating the mask operand for conditional
fold-left reductions in the case we have multiple stmt copies.  The
latter is now allowed for SLP and possibly disabled for non-SLP by
accident.
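
As a rough illustration of why the index matters, here is a toy C++ model of
an in-order conditional reduction over several vector copies. This is a
sketch and not vectorizer code: the function name fold_left_cond_sum, the
`buggy` switch and the use of nested std::vector for the vector operands and
masks are all assumptions made for the example.

```cpp
#include <cstddef>
#include <vector>

// Toy model: each element of `vec_ops` is one vector copy, and
// `vec_opmask[i]` is the mask belonging to copy i.  The reduction adds the
// active lanes strictly in order.  With `buggy` set, every copy reuses the
// mask of copy 0, which is the kind of indexing typo described above.
double fold_left_cond_sum(const std::vector<std::vector<double>>& vec_ops,
                          const std::vector<std::vector<bool>>& vec_opmask,
                          double init, bool buggy)
{
  double acc = init;
  for (std::size_t i = 0; i < vec_ops.size(); ++i)
    {
      const std::vector<bool>& mask = buggy ? vec_opmask[0] : vec_opmask[i];
      for (std::size_t lane = 0; lane < vec_ops[i].size(); ++lane)
        if (mask[lane])
          acc += vec_ops[i][lane];
    }
  return acc;
}
```

With two copies {1, 2} and {4, 8} and masks {true, false} and {false, true},
the per-copy indexing sums 1 and 8, while reusing mask 0 sums 1 and 4, so
the two variants only agree when there is a single copy.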

This fixes the observed run-FAIL for
gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c with AVX512
and 256bit sized vectors.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

* tree-vect-loop.cc (vectorize_fold_left_reduction): Fix
mask vector operand indexing.
---
 gcc/tree-vect-loop.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index ceb92156b58..028692614bb 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7217,7 +7217,7 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
   if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
mask = vect_get_loop_mask (loop_vinfo, gsi, masks, vec_num, vectype_in, i);
   else if (is_cond_op)
-   mask = vec_opmask[0];
+   mask = vec_opmask[i];
   if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
{
  len = vect_get_loop_len (loop_vinfo, gsi, lens, vec_num, vectype_in,
-- 
2.35.3


Re: [PATCH 1/4] Relax COND_EXPR reduction vectorization SLP restriction

2024-06-07 Thread Richard Biener
On Fri, 7 Jun 2024, Kugan Vivekanandarajah wrote:

> Hi Richard,
> 
> This seems to have introduced a regression. I am seeing ICE while
> building TSVC_2 for AARCH64
> with -O3 -flto -mcpu=neoverse-v2 -msve-vector-bits=128
> 
> tsvc.c: In function 's331':
> tsvc.c:2744:8: internal compiler error: Segmentation fault
>  2744 | real_t s331(struct args_t * func_args)
>   |^
> 0xdfc23b crash_signal
> /var/jenkins/workspace/GCC_Nightly/gcc/toplev.cc:319
> 0xa3a6f8 phi_nodes_ptr(basic_block_def*)
> /var/jenkins/workspace/GCC_Nightly/gcc/gimple.h:4701
> 0xa3a6f8 gsi_start_phis(basic_block_def*)
> /var/jenkins/workspace/GCC_Nightly/gcc/gimple-iterator.cc:937
> 0xa3a6f8 gsi_for_stmt(gimple*)
> /var/jenkins/workspace/GCC_Nightly/gcc/gimple-iterator.cc:621
> 0x1e5f22f vectorizable_condition
> /var/jenkins/workspace/GCC_Nightly/gcc/tree-vect-stmts.cc:12577
> 0x1e7a027 vect_transform_stmt(vec_info*, _stmt_vec_info*,
> gimple_stmt_iterator*, _slp_tree*, _slp_instance*)
> /var/jenkins/workspace/GCC_Nightly/gcc/tree-vect-stmts.cc:13467
> 0x1112653 vect_schedule_slp_node
> /var/jenkins/workspace/GCC_Nightly/gcc/tree-vect-slp.cc:9729
> 0x1127757 vect_schedule_slp_node
> /var/jenkins/workspace/GCC_Nightly/gcc/tree-vect-slp.cc:9522
> 0x1127757 vect_schedule_scc
> /var/jenkins/workspace/GCC_Nightly/gcc/tree-vect-slp.cc:10017
> 0x11285ff vect_schedule_slp(vec_info*, vec<_slp_instance*, va_heap,
> vl_ptr> const&)
> /var/jenkins/workspace/GCC_Nightly/gcc/tree-vect-slp.cc:10110
> 0x10f56b7 vect_transform_loop(_loop_vec_info*, gimple*)
> /var/jenkins/workspace/GCC_Nightly/gcc/tree-vect-loop.cc:12114
> 0x1138c7f vect_transform_loops
> /var/jenkins/workspace/GCC_Nightly/gcc/tree-vectorizer.cc:1007
> 0x1139307 try_vectorize_loop_1
> /var/jenkins/workspace/GCC_Nightly/gcc/tree-vectorizer.cc:1153
> 0x1139307 try_vectorize_loop
> /var/jenkins/workspace/GCC_Nightly/gcc/tree-vectorizer.cc:1183
> 0x113967b execute
> /var/jenkins/workspace/GCC_Nightly/gcc/tree-vectorizer.cc:1299
> Please submit a full bug report, with preprocessed source (by using
> -freport-bug).
> 
> Please let me know if you need a reduced testcase.

Please open a bugzilla with a reduced testcase.

Thanks,
Richard.

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [Patch, PR Fortran/90072] Polymorphic Dispatch to Polymophic Return Type Memory Leak

2024-06-07 Thread Andre Vehreschild
Hi Paul,

thank you for the review. No need to apologize. I am happily working on and will
ping if I get no reviews.

Btw, Mikael, Nikolas and I are covered by the same funding and agreed not to
review each other's work to prevent any "smells", like "they follow their own
agenda". We can of course be triggered by the community to do a second review
of each other's work, when no one has enough expertise in the area worked on.

The patch has been committed to master as gcc-15-1090-g51046e46ae6

> PS That's good news about the funding. Maybe we will get to see "built in"
> coarrays soon?

You hopefully will see Nikolas work on the shared memory coarray support, if
that is what you mean by "built in" coarrays. I will be working on the
distributed memory coarray support esp. fixing the module issues and some other
team related things.

Thanks again for the review.

Regards,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de


Re: [PATCH] OpenMP: warn about iteration var modifications in loop body

2024-06-07 Thread Jakub Jelinek
On Wed, Mar 06, 2024 at 06:08:47PM +0100, Frederik Harwath wrote:
> Subject: [PATCH] OpenMP: warn about iteration var modifications in loop body

Note, the partially rewritten OpenMP loop transformations changes are now
in.
See below.

> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc
> @@ -235,6 +235,8 @@ struct gimplify_omp_ctx
>bool order_concurrent;
>bool has_depend;
>bool in_for_exprs;
> +  bool in_omp_for_body;
> +  bool is_doacross;
>int defaultmap[5];
>  };
>  
> @@ -456,6 +458,10 @@ new_omp_context (enum omp_region_type region_type)
>c->privatized_types = new hash_set;
>c->location = input_location;
>c->region_type = region_type;
> +  c->loop_iter_var.create (0);
> +  c->in_omp_for_body = false;
> +  c->is_doacross = false;

I'm not sure it is a good idea to reuse loop_iter_var for this.

>if ((region_type & ORT_TASK) == 0)
>  c->default_kind = OMP_CLAUSE_DEFAULT_SHARED;
>else
> @@ -6312,6 +6318,18 @@ gimplify_modify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
>gcc_assert (TREE_CODE (*expr_p) == MODIFY_EXPR
> || TREE_CODE (*expr_p) == INIT_EXPR);
>  
> +  if (gimplify_omp_ctxp && gimplify_omp_ctxp->in_omp_for_body)
> +{
> +  size_t num_vars = gimplify_omp_ctxp->loop_iter_var.length () / 2;
> +  for (size_t i = 0; i < num_vars; i++)
> + {
> +   if (*to_p == gimplify_omp_ctxp->loop_iter_var[2 * i + 1])
> + warning_at (input_location, OPT_Wopenmp,
> + "forbidden modification of iteration variable %qE in "
> + "OpenMP loop", *to_p);

I think the forbidden word doesn't belong there, just modification of ...

Note, your patch seems to handle just one gimplify_omp_ctxp, not all.
If I do:
#pragma omp for
for (int i = 0; i < 32; ++i)
{
  ++i; // This is warned about
  #pragma omp parallel shared (i)
  #pragma omp master
  ++i; // This is not
  #pragma omp parallel private (i)
  ++i; // This should not
  #pragma omp target map(tofrom:i)
  ++i; // This is not
  #pragma omp target firstprivate (i)
  ++i; // This should not
  #pragma omp simd
  for (i = 0; i < 32; ++i) // This is not
    ;
}
The question is if it isn't just too hard to figure out the data sharing
in nested constructs.  But to be useful, perhaps at least loop
transformation constructs which don't have any privatization on the
iterators (pending the resolution of the data sharing loop transformation
issue) should be handled.

> @@ -15380,23 +15398,22 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
>gcc_assert (DECL_P (decl));
>gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (decl))
> || POINTER_TYPE_P (TREE_TYPE (decl)));
> -  if (is_doacross)
> +
> +  if (TREE_CODE (for_stmt) == OMP_FOR && OMP_FOR_ORIG_DECLS (for_stmt))

There is nothing specific about OMP_FOR for the orig decls, the reason
why the check is (probably) there is that the simd construct has an extra
restriction:
"The only random access iterator types that are allowed for the associated
loops are pointer types."
and so there is no point in looking at the orig decls for, say, for simd
ordered(2) doacross loops.
I was worried your patch wouldn't handle
void bar (int &);

void
foo ()
{
  int i;
  #pragma omp for
  for (i = 0; i < 32; ++i)
    bar (i);
}
where because the IV is addressable we actually choose to use an artificial
IV and assign i = i.0; at the start of the loop body, but apparently that
works right (though maybe it should go into the testsuite), supposedly we
emit it in gimplify_omp_for in GIMPLE before actually gimplifying the actual
OMP_FOR_BODY (but it is an assignment in there).

Anyway, what the patch certainly doesn't handle is the loop transformations.
The tile/unroll partial as done right now have the inter-tile emitted into
the OMP_FOR body, so both the initial assignment and the increment in there
would trigger the warning.  I guess similarly for reverse construct when
implemented.  Furthermore, the generated loops together with associated
ORIG_DECLs move to whatever outer construct loop needs them.

So, I think instead of doing it during gimplification of actual statements,
we should do it through a walk_tree on the bodies, done perhaps from inside
of omp_maybe_apply_loop_xforms or better right before that and mark through some
new flag loops whose bodies were walked for the diagnostics so that we don't
do that again.  Just have one hash map based on say DECL_UID into which we
mark all the loop iterators which should be warned about,
*walk_subtrees = 0; for OpenMP constructs which could privatize stuff
because it would be too difficult to handle but walk using a separate
walk_tree the loop transformation constructs and normally walk say
OMP_CRITICAL, OMP_MASKED and other constructs which never privatize stuff.
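
A toy C++ sketch of that walking strategy, under heavy assumptions: the Node
type, the Kind enumeration and collect_modified_iterators are invented for
illustration and do not correspond to GCC's tree or walk_tree interfaces, and
a std::set of names stands in for the hash map keyed on DECL_UID.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for tree nodes.
enum class Kind
{
  Modify,       // a modification of variable `var`
  Privatizing,  // a construct that could privatize variables
  Transparent   // a construct that never privatizes (e.g. critical, masked)
};

struct Node
{
  Kind kind;
  std::string var;              // only meaningful for Kind::Modify
  std::vector<Node> children;
};

// Records every modification of a tracked iteration variable, but does not
// descend into constructs that might privatize it (the analogue of setting
// *walk_subtrees = 0).
void collect_modified_iterators(const Node& n,
                                const std::set<std::string>& iter_vars,
                                std::vector<std::string>& warnings)
{
  if (n.kind == Kind::Privatizing)
    return;
  if (n.kind == Kind::Modify && iter_vars.count(n.var) != 0)
    warnings.push_back(n.var);
  for (const Node& child : n.children)
    collect_modified_iterators(child, iter_vars, warnings);
}
```

In this model a modification directly in the loop body or inside a
non-privatizing construct is reported, while anything nested in a possibly
privatizing construct is left alone rather than risking a false positive.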
So, handle say
#pragma omp for
#pragma omp tile sizes (2, 2)
for (int i = 0; i < 32; ++i)
for (int j = 0; j < 32; ++j)
{
  ++i; // warn here; this is in the end generated loop of for, b

Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

2024-06-07 Thread Richard Sandiford
Ajit Agarwal  writes:
>>> +
>>> +  df_ref use;
>>> +  df_insn_info *insn_info = DF_INSN_INFO_GET (info->rtl ());
>>> +  FOR_EACH_INSN_INFO_DEF (use, insn_info)
>>> +{
>>> +  struct df_link *def_link = DF_REF_CHAIN (use);
>>> +
>>> +  if (!def_link || !def_link->ref
>>> + || DF_REF_IS_ARTIFICIAL (def_link->ref))
>>> +   continue;
>>> +
>>> +  while (def_link && def_link->ref)
>>> +   {
>>> + rtx_insn *insn = DF_REF_INSN (def_link->ref);
>>> + if (GET_CODE (PATTERN (insn)) == PARALLEL)
>>> +   return false;
>>
>> Why do you need to skip PARALLELs?
>>
>
> vec_select with parallel give failures final.cc "can't split-up with 
> subreg 128 (reg OO"
> Thats why I have added this.

 But in (vec_select ... (parallel ...)), the parallel won't be the 
 PATTERN (insn).  It'll instead be a suboperand of the vec_select.

 Here too it's important to understand why the final.cc failure occurs
 and what the correct fix is.

>>>
>>> subreg with vec_select operand already exists before fusion pass.
>>> We overwrite them with subreg 128 bits from 256 OO mode operand.
>> 
>> But why is that wrong?  What was the full rtl of the subreg before the
>> pass runs, what did the subreg look like after the pass, and why is the
>> change not correct?
>> 
>> In general, there are two main ways that an rtl change can be incorrect:
>> 
>> (1) The new rtl isn't well-formed (such as (subreg (subreg X A) B)).
>> In this case, the new rtl makes no inherent sense when viewed
>> in isolation: it isn't necessary to see the old rtl to tell that
>> the new rtl is wrong.
>> 
>> (2) The new rtl is well-formed (i.e. makes inherent sense when viewed in
>> isolation) but it does not have the same semantics as the old rtl.
>> In other words, the new rtl is describing a different operation
>> from the old rtl.
>> 
>> I think we need to talk about it in those terms, rather than where
>> the eventual ICE occurs.
>> 
> Before the fusion.
> old rtl looks like this:
>
> (vec_select:HI (subreg:V8HI (reg:V16QI 125 [ vect__29.38 ]) 0)
>
> After the fusion
> new rtl looks like this:
>
> (vec_select:HI (subreg:V16QI (reg:OO 125 [ vect__29.38 ]) 16)
>
> new rtl is not well formed.
>
> Thats why its failing.
>
> reg:v16QI 125 is the destination of the load that needs to be fused.

This indicates that there's a bug in the substitution code.

It's probably better to create a fresh OO register, rather than
change an existing 128-bit register to 256 bits.  If we do that,
and if reg:V16QI 125 is the destination of the second load
(which I assume it is from the 16 offset in the subreg),
then the new RTL should be:

  (vec_select:HI (subreg:V8HI (reg:OO NEW_REG) 16) ...)

It's possible to get this by using insn_propagation to replace
(reg:V16QI 125) with (subreg:V16QI (reg:OO NEW_REG) 16).
insn_propagation should then take care of the rest.
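
As a loose analogy for what the byte offset in such a subreg selects, here is
a small C++ model. It is only a sketch: the type aliases V16QI and OO and the
function subreg_v16qi are illustrative names, not GCC interfaces, and the
model ignores endianness and target-specific rules about which register of a
pair a given offset maps to.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

using V16QI = std::array<std::uint8_t, 16>;  // stand-in for a 128-bit vector
using OO = std::array<std::uint8_t, 32>;     // stand-in for the 256-bit pair

// Toy analogue of (subreg:V16QI (reg:OO pair) byte_offset): view 16 bytes of
// the 32-byte pair starting at `byte_offset` (assumed to be 0 or 16).
V16QI subreg_v16qi(const OO& pair, std::size_t byte_offset)
{
  V16QI out{};
  std::memcpy(out.data(), pair.data() + byte_offset, out.size());
  return out;
}
```

In this model, replacing a use of the second load's 128-bit destination with
subreg_v16qi(pair, 16) keeps the value the use sees unchanged, which is the
property the substitution has to preserve.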

There are no existing rtl-ssa routines for handling new registers
though.  (The idea was to add things as the need arose.)

>>> Due to this, final.cc couldn't split at line 2807 and bails
>>> out with fatal_insn.
>>>
>>> Currently we dont support already existing subreg vector operand
>>> to generate register pairs.
>>> We should bail out from fusion pass in this case.
>>> +
>>> + rtx set = single_set (insn);
>>> + if (set == NULL_RTX)
>>> +   return false;
>>> +
>>> + rtx op0 = SET_SRC (set);
>>> + rtx_code code = GET_CODE (op0);
>>> +
>>> + // This check is added as register pairs are not generated
>>> + // by RA for neg:V2DF (fma: V2DF (reg1)
>>> + //  (reg2)
>>> + //  (neg:V2DF (reg3)))
>>> + if (GET_RTX_CLASS (code) == RTX_UNARY)
>>> +   return false;
>>
>> What's special about (neg (fma ...))?
>>
>
> I am not sure why the register allocator fails to allocate register pairs
> for a NEG unary operation with an fma operand. I have not debugged the
> register allocator for this case.

>>>
>>> For neg (fma ...) cases because of subreg 128 bits from OOmode 256 bits are
>>> set correctly. 
>>> IRA marked them spill candidates as spill priority is zero.
>>>
>>> Due to this LRA reload pass couldn't allocate register pairs.
>> 
>> I think this is just restating the symptom though.  I suppose the same
>> kind of questions apply here too: what was the instruction before the
>> pass runs, what was the instruction after the pass runs, and why is
>> the rtl change incorrect (by the meaning above)?
>> 
>
> Original case where we dont do load fusion, spill happens, in that
> case we dont require sequential register pairs to be generated for 2 loads
> for. Hence it worked.
>
> rtl change is correct and there is no error.
>
>

Re: [PATCH] [libstdc++] drop workaround for clang<=7 (was: [libstdc++] add _GLIBCXX_CLANG to workaround predefined __clang__)

2024-06-07 Thread Jonathan Wakely
On Fri, 7 Jun 2024 at 05:43, Alexandre Oliva  wrote:
>
> On May 31, 2024, Alexandre Oliva  wrote:
>
> >> I think we could drop this kluge entirely, clang 7 is old now, we
> >> generally only support the most recent 3 or 4 clang versions.
>
> > Fine with me, but I'd do that in a separate later patch, so that this
> > goes in, and if it gets backported, it will cover this change, rather
> > than miss it.  Though, as you say, it doesn't matter much either way.
>
> In response to a request in the review of the patch that introduced
> _GLIBCXX_CLANG, this patch removes from std/variant an obsolete
> workaround for clang 7-.
>
> Regstrapping on x86_64-linux-gnu.  Ok to install?

OK for trunk, thanks.

I've tested it with Clang 17.0.6


>
>
> for  libstdc++-v3/ChangeLog
>
> * include/std/variant: Drop obsolete workaround.
> ---
>  libstdc++-v3/include/std/variant |5 -
>  1 file changed, 5 deletions(-)
>
> diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
> index 51aaa62085170..13ea1dd384965 100644
> --- a/libstdc++-v3/include/std/variant
> +++ b/libstdc++-v3/include/std/variant
> @@ -1758,11 +1758,6 @@ namespace __detail::__variant
>   }, __rhs);
>}
>
> -#if defined(_GLIBCXX_CLANG) && __clang_major__ <= 7
> -public:
> -  using _Base::_M_u; // See https://bugs.llvm.org/show_bug.cgi?id=31852
> -#endif
> -
>  private:
>template
> friend constexpr decltype(auto)
>
>
> --
> Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
>    Free Software Activist                   GNU Toolchain Engineer
> More tolerance and less prejudice are key for inclusion and diversity
> Excluding neuro-others for not behaving "normal" is *not* inclusive
>



Re: [PATCH v3] libstdc++: Fix std::ranges::iota not in numeric [PR108760]

2024-06-07 Thread Jonathan Wakely
On Thu, 6 Jun 2024 at 21:49, Michael Levine (BLOOMBERG/ 731 LEX)
 wrote:
>
> To test the theory that this issue was unrelated to my patch, I moved the 
> out_value_result definition into std/numeric and restored the version of 
> bits/ranges_algobase.h to the version in master. I kept the include line 
> "include " in std/numeric even though it wasn't being 
> used. With the include line I see the same error about __memcmp is not a 
> member of 'std'.

Thanks, I'll fix that separately and then apply your patch for iota.
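
For readers following along, the algorithm being moved is small. Below is a
simplified C++ sketch of it, not the libstdc++ code: the names
out_value_result_sketch and iota_sketch are made up for this example, and the
concepts constraints, the conversion operators and the range overload from
the quoted patch are deliberately left out.

```cpp
#include <utility>

// Simplified stand-in for the result type: the end iterator reached and the
// next value that would have been written.
template<typename Out, typename T>
struct out_value_result_sketch
{
  Out out;
  T value;
};

// Writes value, value + 1, ... into [first, last).
template<typename Out, typename Sent, typename T>
constexpr out_value_result_sketch<Out, T>
iota_sketch(Out first, Sent last, T value)
{
  while (first != last)
    {
      *first = static_cast<const T&>(value);
      ++first;
      ++value;
    }
  return {std::move(first), std::move(value)};
}
```

Filling a four-element int array starting from 10 with this sketch stores
10, 11, 12, 13 and returns the past-the-end pointer together with 14.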


>
> From: Michael Levine (BLOOMBERG/ 731 LEX) At: 05/30/24 13:43:58 UTC-4:00
> To: jwak...@redhat.com
> Cc: ppa...@redhat.com, gcc-patches@gcc.gnu.org, libstd...@gcc.gnu.org
> Subject: Re: [PATCH v3] libstdc++: Fix std::ranges::iota not in numeric 
> [PR108760]
>
> When I remove  for importing __memcmp (my apologies for 
> writing __memcpy) from libstdc++-v3/include/bits/ranges_algobase.h and try to 
> rerun the code, I get the following error:
>
> In file included from 
> $HOME/projects/objdirforgcc/_pfx/include/c++/15.0.0/numeric:69,
> from ranges-iota-fix.cpp:1:
> $HOME/projects/objdirforgcc/_pfx/include/c++/15.0.0/bits/ranges_algobase.h: 
> In member function ‘constexpr bool 
> std::ranges::__equal_fn::operator()(_Iter1, _Sent1, _Iter2, _Sent2, _Pred, 
> _Proj1, _Proj2) const’:
> $HOME/projects/objdirforgcc/_pfx/include/c++/15.0.0/bits/ranges_algobase.h:143:32:
>  error: ‘__memcmp’ is not a member of ‘std’; did you mean ‘__memcmpable’?
> 143 | return !std::__memcmp(__first1, __first2, __len);
> | ^~~~
> | __memcmpable
>
> From: jwak...@redhat.com At: 05/24/24 10:12:57 UTC-4:00
> To: Michael Levine (BLOOMBERG/ 731 LEX )
> Cc: ppa...@redhat.com, gcc-patches@gcc.gnu.org, libstd...@gcc.gnu.org
> Subject: Re: [PATCH v3] libstdc++: Fix std::ranges::iota not in numeric 
> [PR108760]
>
> On 24/05/24 13:56 -, Michael Levine (BLOOMBERG/ 731 LEX) wrote:
> >I've attached the v3 version of the patch as a single, squashed patch
> containing all of the changes. I manually prepended my sign off to the patch.
>
>
> >Signed-off-by: Michael Levine 
> >---
> >diff --git a/libstdc++-v3/include/bits/ranges_algo.h
> b/libstdc++-v3/include/bits/ranges_algo.h
> >index 62faff173bd..d258be0b93f 100644
> >--- a/libstdc++-v3/include/bits/ranges_algo.h
> >+++ b/libstdc++-v3/include/bits/ranges_algo.h
> >@@ -3521,58 +3521,6 @@ namespace ranges
> >
> > #endif // __glibcxx_ranges_contains
> >
> >-#if __glibcxx_ranges_iota >= 202202L // C++ >= 23
> >-
> >- template
> >- struct out_value_result
> >- {
> >- [[no_unique_address]] _Out out;
> >- [[no_unique_address]] _Tp value;
> >-
> >- template
> >- requires convertible_to
> >- && convertible_to
> >- constexpr
> >- operator out_value_result<_Out2, _Tp2>() const &
> >- { return {out, value}; }
> >-
> >- template
> >- requires convertible_to<_Out, _Out2>
> >- && convertible_to<_Tp, _Tp2>
> >- constexpr
> >- operator out_value_result<_Out2, _Tp2>() &&
> >- { return {std::move(out), std::move(value)}; }
> >- };
> >-
> >- template
> >- using iota_result = out_value_result<_Out, _Tp>;
> >-
> >- struct __iota_fn
> >- {
> >- template _Sent,
> weakly_incrementable _Tp>
> >- requires indirectly_writable<_Out, const _Tp&>
> >- constexpr iota_result<_Out, _Tp>
> >- operator()(_Out __first, _Sent __last, _Tp __value) const
> >- {
> >- while (__first != __last)
> >- {
> >- *__first = static_cast(__value);
> >- ++__first;
> >- ++__value;
> >- }
> >- return {std::move(__first), std::move(__value)};
> >- }
> >-
> >- template _Range>
> >- constexpr iota_result, _Tp>
> >- operator()(_Range&& __r, _Tp __value) const
> >- { return (*this)(ranges::begin(__r), ranges::end(__r),
> std::move(__value)); }
> >- };
> >-
> >- inline constexpr __iota_fn iota{};
> >-
> >-#endif // __glibcxx_ranges_iota
> >-
> > #if __glibcxx_ranges_find_last >= 202207L // C++ >= 23
> >
> > struct __find_last_fn
> >diff --git a/libstdc++-v3/include/bits/ranges_algobase.h
> b/libstdc++-v3/include/bits/ranges_algobase.h
> >index e26a73a27d6..965b36aed35 100644
> >--- a/libstdc++-v3/include/bits/ranges_algobase.h
> >+++ b/libstdc++-v3/include/bits/ranges_algobase.h
> >@@ -35,6 +35,7 @@
> > #include 
> > #include 
> > #include 
> >+#include  // __memcpy
>
> Why is this being added here? What is __memcpy?
>
> I don't think out_value_result requires any new headers to be included
> here, does it?
>
> > #include  // ranges::begin, ranges::range etc.
> > #include  // __invoke
> > #include  // __is_byte
> >@@ -70,6 +71,32 @@ namespace ranges
> > __is_move_iterator> = true;
> > } // namespace __detail
> >
> >+#if __glibcxx_ranges_iota >= 202202L // C++ >= 23
> >+
> >+ template
> >+ struct out_value_result
> >+ {
> >+ [[no_unique_address]] _Out out;
> >+ [[no_unique_address]] _Tp value;
> >+
> >+ template
> >+ requires convertible_to
> >+ && convertible_to
> >+ constexpr
> >+ operator out_value_result<_Out2, _Tp2>() const &
> >+ { return {out, value}; }
> >+
> >+ template
> >+ requires convertible_to<_Out, _Out2>
> >+ &&

Re: [PATCH] libstdc++: Optimize std::gcd

2024-06-07 Thread Sam James
Stephen Face  writes:

> This patch is to optimize the runtime execution of gcd. Mathematically,
> it computes with the same algorithm as before, but subtractions and
> branches are rearranged to encourage generation of code that can use
> flags from the subtractions for conditional moves. Additionally, most
> pairs of integers are coprime, so this patch also includes a check for
> one of the integers to be equal to 1, and then it will exit the loop
> early in this case.

Is it worth filing a bug for the missed optimisation? You shouldn't have
to write things in a specific order. Thanks.

>
> libstdc++-v3/ChangeLog:
>
>   * include/std/numeric(__gcd): Optimize.
> ---
> I have tested this on x86_64-linux and aarch64-linux. I have tested the
> timing with random distributions of small inputs and large inputs on a
> couple of machines with -O2 and found decreases in execution time from
> 20% to 60% depending on the machine and distribution of inputs.
>
>  libstdc++-v3/include/std/numeric | 21 +++--
>  1 file changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/libstdc++-v3/include/std/numeric 
> b/libstdc++-v3/include/std/numeric
> index c912db4a519..3c9e8387a0e 100644
> --- a/libstdc++-v3/include/std/numeric
> +++ b/libstdc++-v3/include/std/numeric
> @@ -148,19 +148,20 @@ namespace __detail
>  
>while (true)
>   {
> -   if (__m > __n)
> - {
> -   _Tp __tmp = __m;
> -   __m = __n;
> -   __n = __tmp;
> - }
> +   _Tp __m_minus_n = __m - __n;
> +   if (__m_minus_n == 0)
> + return __m << __k;
>  
> -   __n -= __m;
> +   _Tp __next_m = __m < __n ? __m : __n;
>  
> -   if (__n == 0)
> - return __m << __k;
> +   if (__next_m == 1)
> + return __next_m << __k;
> +
> +   _Tp __n_minus_m = __n - __m;
> +   __n = __n < __m ? __m_minus_n : __n_minus_m;
> +   __m = __next_m;
>  
> -   __n >>= std::__countr_zero(__n);
> +   __n >>= std::__countr_zero(__m_minus_n);
>   }
>  }
>  } // namespace __detail
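
For readers who want to try the rearranged loop outside libstdc++, here is a minimal standalone sketch of the quoted patch. The names binary_gcd and trailing_zeros are hypothetical, and the portable trailing_zeros loop stands in for the internal std::__countr_zero, so this is an illustration under those assumptions rather than the library code itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of trailing zero bits of a nonzero value.
// (The patch itself uses libstdc++'s internal std::__countr_zero.)
inline int trailing_zeros(std::uint64_t x)
{
  int n = 0;
  while ((x & 1u) == 0)
    {
      x >>= 1;
      ++n;
    }
  return n;
}

// Binary GCD with the loop body arranged as in the quoted patch.
inline std::uint64_t binary_gcd(std::uint64_t m, std::uint64_t n)
{
  if (m == 0)
    return n;
  if (n == 0)
    return m;

  // Strip the factors of two; k is the power of two common to both inputs.
  const int i = trailing_zeros(m);
  m >>= i;
  const int j = trailing_zeros(n);
  n >>= j;
  const int k = i < j ? i : j;

  while (true)
    {
      // Both m and n are odd here, so their difference is even.
      const std::uint64_t m_minus_n = m - n;
      if (m_minus_n == 0)
        return m << k;

      const std::uint64_t next_m = m < n ? m : n;

      // Early exit: once the smaller odd value is 1, the odd parts are coprime.
      if (next_m == 1)
        return next_m << k;

      const std::uint64_t n_minus_m = n - m;
      // n becomes |m - n|; m becomes min(m, n).
      n = n < m ? m_minus_n : n_minus_m;
      m = next_m;

      // m - n and n - m have the same number of trailing zeros.
      n >>= trailing_zeros(m_minus_n);
    }
}
```

Both differences are computed unconditionally so that the selects can reuse the flags of the subtractions, which is the stated intent of the patch; whether a given compiler actually emits conditional moves for this sketch is not guaranteed.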


[PATCH v2 2/2] testsuite: Fix expand-return CMSE test for Armv8.1-M [PR115253]

2024-06-07 Thread Torbjörn SVENSSON
For Armv8.1-M, the clearing of the registers is handled differently than
for Armv8-M, so update the test case accordingly.

gcc/testsuite/ChangeLog:

PR target/115253
* gcc.target/arm/cmse/extend-return.c: Update test case
condition for Armv8.1-M.

Signed-off-by: Torbjörn SVENSSON 
Co-authored-by: Yvan ROUX 
---
 .../gcc.target/arm/cmse/extend-return.c   | 62 +--
 1 file changed, 56 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/cmse/extend-return.c 
b/gcc/testsuite/gcc.target/arm/cmse/extend-return.c
index 081de0d699f..2288d166bd3 100644
--- a/gcc/testsuite/gcc.target/arm/cmse/extend-return.c
+++ b/gcc/testsuite/gcc.target/arm/cmse/extend-return.c
@@ -1,5 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-mcmse -fshort-enums" } */
+/* ARMv8-M expectation with target { ! arm_cmse_clear_ok }.  */
+/* ARMv8.1-M expectation with target arm_cmse_clear_ok.  */
 /* { dg-final { check-function-bodies "**" "" "" } } */
 
 #include 
@@ -20,7 +22,15 @@ typedef enum offset __attribute__ ((cmse_nonsecure_call)) 
ns_enum_foo_t (void);
 typedef bool __attribute__ ((cmse_nonsecure_call)) ns_bool_foo_t (void);
 
 /*
-**unsignNonsecure0:
+**unsignNonsecure0:  { target arm_cmse_clear_ok }
+** ...
+** blxns   r[0-3]
+** ...
+** uxtbr0, r0
+** ...
+*/
+/*
+**unsignNonsecure0: { target { ! arm_cmse_clear_ok } }
 ** ...
 ** bl  __gnu_cmse_nonsecure_call
 ** uxtbr0, r0
@@ -32,7 +42,15 @@ unsigned char unsignNonsecure0 (ns_unsign_foo_t * ns_foo_p)
 }
 
 /*
-**signNonsecure0:
+**signNonsecure0:  { target arm_cmse_clear_ok }
+** ...
+** blxns   r[0-3]
+** ...
+** sxtbr0, r0
+** ...
+*/
+/*
+**signNonsecure0: { target { ! arm_cmse_clear_ok } }
 ** ...
 ** bl  __gnu_cmse_nonsecure_call
 ** sxtbr0, r0
@@ -44,7 +62,15 @@ signed char signNonsecure0 (ns_sign_foo_t * ns_foo_p)
 }
 
 /*
-**shortUnsignNonsecure0:
+**shortUnsignNonsecure0:  { target arm_cmse_clear_ok }
+** ...
+** blxns   r[0-3]
+** ...
+** uxthr0, r0
+** ...
+*/
+/*
+**shortUnsignNonsecure0: { target { ! arm_cmse_clear_ok } }
 ** ...
 ** bl  __gnu_cmse_nonsecure_call
 ** uxthr0, r0
@@ -56,7 +82,15 @@ unsigned short shortUnsignNonsecure0 (ns_short_unsign_foo_t 
* ns_foo_p)
 }
 
 /*
-**shortSignNonsecure0:
+**shortSignNonsecure0:  { target arm_cmse_clear_ok }
+** ...
+** blxns   r[0-3]
+** ...
+** sxthr0, r0
+** ...
+*/
+/*
+**shortSignNonsecure0: { target { ! arm_cmse_clear_ok } }
 ** ...
 ** bl  __gnu_cmse_nonsecure_call
 ** sxthr0, r0
@@ -68,7 +102,15 @@ signed short shortSignNonsecure0 (ns_short_sign_foo_t * 
ns_foo_p)
 }
 
 /*
-**enumNonsecure0:
+**enumNonsecure0:  { target arm_cmse_clear_ok }
+** ...
+** blxns   r[0-3]
+** ...
+** uxtbr0, r0
+** ...
+*/
+/*
+**enumNonsecure0: { target { ! arm_cmse_clear_ok } }
 ** ...
 ** bl  __gnu_cmse_nonsecure_call
 ** uxtbr0, r0
@@ -80,7 +122,15 @@ unsigned char __attribute__((noipa)) enumNonsecure0 
(ns_enum_foo_t * ns_foo_p)
 }
 
 /*
-**boolNonsecure0:
+**boolNonsecure0:  { target arm_cmse_clear_ok }
+** ...
+** blxns   r[0-3]
+** ...
+** uxtbr0, r0
+** ...
+*/
+/*
+**boolNonsecure0: { target { ! arm_cmse_clear_ok } }
 ** ...
 ** bl  __gnu_cmse_nonsecure_call
 ** uxtbr0, r0
-- 
2.25.1



[PATCH 0/2] arm: Zero/Sign extends for CMSE security on

2024-06-07 Thread Torbjörn SVENSSON


Hi,

Updated the patch to also fix the Cortex-M55 issue reported in PR115253 and
updated the commit message to mention the PR number.

Initial issue reported at https://linaro.atlassian.net/browse/GNU-1205.

Ok for these branches?

- releases/gcc-11
- releases/gcc-12
- releases/gcc-13
- releases/gcc-14
- trunk


Kind regards,
Torbjörn and Yvan



Re: [Committed, Patch, Fortran/90068] Add finalizer creation to array constructor for functions of derived type.

2024-06-07 Thread Andre Vehreschild
Hi Paul,

and thanks for the review. Merged as gcc-15-1090-g51046e46ae6.

> I had been working in exactly the same area to correct the implementation
> of finalization of function results in array constructors. However, I
> couldn't see a light way of having the finalization occur at the correct
> time; "If an executable construct references a nonpointer function, the
> result is finalized after execution of the innermost executable construct
> containing the reference." This caused all manner of difficulty with
> assignment. I'll come back to this.

This sounds to me like another application for a try-finally, but that is just
a first guess without any deep thought on the details. If you would like a
rubber duck, feel free to contact me. Sometimes talking about it helps...

Thanks again and regards,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de


[PATCH v2 1/2] arm: Zero/Sign extends for CMSE security on Armv8-M.baseline [PR115253]

2024-06-07 Thread Torbjörn SVENSSON
Properly handle zero and sign extension for Armv8-M.baseline as
Cortex-M23 can have the security extension active.
Currently, there is an internal compiler error on Cortex-M23 for the
epilog processing of sign extension.

This patch addresses CVE-2024-0151 for Armv8-M.baseline.

gcc/ChangeLog:

PR target/115253
* config/arm/arm.cc (cmse_nonsecure_call_inline_register_clear):
Sign extend for Thumb1.
(thumb1_expand_prologue): Add zero/sign extend.

Signed-off-by: Torbjörn SVENSSON 
Co-authored-by: Yvan ROUX 
---
 gcc/config/arm/arm.cc | 68 ++-
 1 file changed, 60 insertions(+), 8 deletions(-)

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index ea0c963a4d6..d1bb173c135 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -19220,17 +19220,23 @@ cmse_nonsecure_call_inline_register_clear (void)
  || TREE_CODE (ret_type) == BOOLEAN_TYPE)
  && known_lt (GET_MODE_SIZE (TYPE_MODE (ret_type)), 4))
{
- machine_mode ret_mode = TYPE_MODE (ret_type);
+ rtx ret_mode = gen_rtx_REG (TYPE_MODE (ret_type), R0_REGNUM);
+ rtx si_mode = gen_rtx_REG (SImode, R0_REGNUM);
  rtx extend;
  if (TYPE_UNSIGNED (ret_type))
-   extend = gen_rtx_ZERO_EXTEND (SImode,
- gen_rtx_REG (ret_mode, 
R0_REGNUM));
+   extend = gen_rtx_SET (si_mode, gen_rtx_ZERO_EXTEND (SImode,
+   ret_mode));
+ else if (TARGET_THUMB1)
+   {
+ if (known_lt (GET_MODE_SIZE (TYPE_MODE (ret_type)), 2))
+   extend = gen_thumb1_extendqisi2 (si_mode, ret_mode);
+ else
+   extend = gen_thumb1_extendhisi2 (si_mode, ret_mode);
+   }
  else
-   extend = gen_rtx_SIGN_EXTEND (SImode,
- gen_rtx_REG (ret_mode, 
R0_REGNUM));
- emit_insn_after (gen_rtx_SET (gen_rtx_REG (SImode, R0_REGNUM),
-extend), insn);
-
+   extend = gen_rtx_SET (si_mode, gen_rtx_SIGN_EXTEND (SImode,
+   ret_mode));
+ emit_insn_after (extend, insn);
}
 
 
@@ -27250,6 +27256,52 @@ thumb1_expand_prologue (void)
   live_regs_mask = offsets->saved_regs_mask;
   lr_needs_saving = live_regs_mask & (1 << LR_REGNUM);
 
+  /* The AAPCS requires the callee to widen integral types narrower
+ than 32 bits to the full width of the register; but when handling
+ calls to non-secure space, we cannot trust the callee to have
+ correctly done so.  So forcibly re-widen the result here.  */
+  if (IS_CMSE_ENTRY (func_type))
+{
+  function_args_iterator args_iter;
+  CUMULATIVE_ARGS args_so_far_v;
+  cumulative_args_t args_so_far;
+  bool first_param = true;
+  tree arg_type;
+  tree fndecl = current_function_decl;
+  tree fntype = TREE_TYPE (fndecl);
+  arm_init_cumulative_args (&args_so_far_v, fntype, NULL_RTX, fndecl);
+  args_so_far = pack_cumulative_args (&args_so_far_v);
+  FOREACH_FUNCTION_ARGS (fntype, arg_type, args_iter)
+   {
+ rtx arg_rtx;
+
+ if (VOID_TYPE_P (arg_type))
+   break;
+
+ function_arg_info arg (arg_type, /*named=*/true);
+ if (!first_param)
+   /* We should advance after processing the argument and pass
+  the argument we're advancing past.  */
+   arm_function_arg_advance (args_so_far, arg);
+ first_param = false;
+ arg_rtx = arm_function_arg (args_so_far, arg);
+ gcc_assert (REG_P (arg_rtx));
+ if ((TREE_CODE (arg_type) == INTEGER_TYPE
+ || TREE_CODE (arg_type) == ENUMERAL_TYPE
+ || TREE_CODE (arg_type) == BOOLEAN_TYPE)
+ && known_lt (GET_MODE_SIZE (GET_MODE (arg_rtx)), 4))
+   {
+ rtx res_reg = gen_rtx_REG (SImode, REGNO (arg_rtx));
+ if (TYPE_UNSIGNED (arg_type))
+   emit_set_insn (res_reg, gen_rtx_ZERO_EXTEND (SImode, arg_rtx));
+ else if (known_lt (GET_MODE_SIZE (GET_MODE (arg_rtx)), 2))
+   emit_insn (gen_thumb1_extendqisi2 (res_reg, arg_rtx));
+ else
+   emit_insn (gen_thumb1_extendhisi2 (res_reg, arg_rtx));
+   }
+   }
+}
+
   /* Extract a mask of the ones we can give to the Thumb's push instruction.  
*/
   l_mask = live_regs_mask & 0x40ff;
   /* Then count how many other high registers will need to be pushed.  */
-- 
2.25.1
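
As a rough illustration of what "forcibly re-widen" means for a value narrower than 32 bits, the sketch below models the effect of the uxtb, uxth, sxtb and sxth instructions on a 32-bit register whose upper bits cannot be trusted. The function names are hypothetical and the code only models the arithmetic; it is not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Each function takes the raw 32-bit register contents and returns the
// value after re-widening from the narrow type, ignoring the upper bits.

// Zero extension from 8 bits (models uxtb).
inline std::uint32_t zero_extend8(std::uint32_t reg)
{
  return reg & 0xFFu;
}

// Zero extension from 16 bits (models uxth).
inline std::uint32_t zero_extend16(std::uint32_t reg)
{
  return reg & 0xFFFFu;
}

// Sign extension from 8 bits (models sxtb).  The xor/subtract pair copies
// bit 7 into bits 8..31 using only well-defined unsigned arithmetic.
inline std::uint32_t sign_extend8(std::uint32_t reg)
{
  return ((reg & 0xFFu) ^ 0x80u) - 0x80u;
}

// Sign extension from 16 bits (models sxth).
inline std::uint32_t sign_extend16(std::uint32_t reg)
{
  return ((reg & 0xFFFFu) ^ 0x8000u) - 0x8000u;
}
```

For example, a register holding 0xDEADBE01 where an unsigned char is expected is narrowed to 0x01, so whatever the untrusted side left in the upper 24 bits has no effect on later code.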



Re: [PATCH 1/4] Relax COND_EXPR reduction vectorization SLP restriction

2024-06-07 Thread Kugan Vivekanandarajah
Thanks Richard.
Created https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115383

Thanks,
Kugan

On Fri, Jun 7, 2024 at 5:51 PM Richard Biener  wrote:
>
> On Fri, 7 Jun 2024, Kugan Vivekanandarajah wrote:
>
> > Hi Richard,
> >
> > This seems to have introduced a regression. I am seeing ICE while
> > building TSVC_2 for AARCH64
> > with -O3 -flto -mcpu=neoverse-v2 -msve-vector-bits=128
> >
> > tsvc.c: In function 's331':
> > tsvc.c:2744:8: internal compiler error: Segmentation fault
> >  2744 | real_t s331(struct args_t * func_args)
> >   |^
> > 0xdfc23b crash_signal
> > /var/jenkins/workspace/GCC_Nightly/gcc/toplev.cc:319
> > 0xa3a6f8 phi_nodes_ptr(basic_block_def*)
> > /var/jenkins/workspace/GCC_Nightly/gcc/gimple.h:4701
> > 0xa3a6f8 gsi_start_phis(basic_block_def*)
> > /var/jenkins/workspace/GCC_Nightly/gcc/gimple-iterator.cc:937
> > 0xa3a6f8 gsi_for_stmt(gimple*)
> > /var/jenkins/workspace/GCC_Nightly/gcc/gimple-iterator.cc:621
> > 0x1e5f22f vectorizable_condition
> > /var/jenkins/workspace/GCC_Nightly/gcc/tree-vect-stmts.cc:12577
> > 0x1e7a027 vect_transform_stmt(vec_info*, _stmt_vec_info*,
> > gimple_stmt_iterator*, _slp_tree*, _slp_instance*)
> > /var/jenkins/workspace/GCC_Nightly/gcc/tree-vect-stmts.cc:13467
> > 0x1112653 vect_schedule_slp_node
> > /var/jenkins/workspace/GCC_Nightly/gcc/tree-vect-slp.cc:9729
> > 0x1127757 vect_schedule_slp_node
> > /var/jenkins/workspace/GCC_Nightly/gcc/tree-vect-slp.cc:9522
> > 0x1127757 vect_schedule_scc
> > /var/jenkins/workspace/GCC_Nightly/gcc/tree-vect-slp.cc:10017
> > 0x11285ff vect_schedule_slp(vec_info*, vec<_slp_instance*, va_heap,
> > vl_ptr> const&)
> > /var/jenkins/workspace/GCC_Nightly/gcc/tree-vect-slp.cc:10110
> > 0x10f56b7 vect_transform_loop(_loop_vec_info*, gimple*)
> > /var/jenkins/workspace/GCC_Nightly/gcc/tree-vect-loop.cc:12114
> > 0x1138c7f vect_transform_loops
> > /var/jenkins/workspace/GCC_Nightly/gcc/tree-vectorizer.cc:1007
> > 0x1139307 try_vectorize_loop_1
> > /var/jenkins/workspace/GCC_Nightly/gcc/tree-vectorizer.cc:1153
> > 0x1139307 try_vectorize_loop
> > /var/jenkins/workspace/GCC_Nightly/gcc/tree-vectorizer.cc:1183
> > 0x113967b execute
> > /var/jenkins/workspace/GCC_Nightly/gcc/tree-vectorizer.cc:1299
> > Please submit a full bug report, with preprocessed source (by using
> > -freport-bug).
> >
> > Please let me know if you need a reduced testcase.
>
> Please open a bugzilla with a reduced testcase.
>
> Thanks,
> Richard.
>
> --
> Richard Biener 
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] fixincludes: bypass some fixes for recent darwin headers

2024-06-07 Thread FX Coudert
Hi,

macOS SDKs sometimes contain non-standard constructs, and require fixes through 
fixincludes. However, they are typically fixed in later SDK versions, although 
the process can be slow. Fixes have accumulated, which may be needed only for 
some older versions of the SDKs. They should be bypassed on modern macOS, and 
this patch does that for 4 cases. This makes the compiler less fragile when 
switching between SDKs.

Before the patch, 8 headers are fixincluded on x86_64-apple-darwin23:

AvailabilityInternal.h math.h os/base.h  stdint.h
dispatch/object.h  objc/runtime.h os/trace.h stdio.h

After the patch, only 4:

AvailabilityInternal.h math.h objc/runtime.h stdio.h


Bootstrapped and regtested on x86_64-apple-darwin23.
OK to push?

FX




0001-fixincludes-bypass-some-fixes-for-recent-darwin-head.patch
Description: Binary data


Re: [PATCH] fixincludes: bypass some fixes for recent darwin headers

2024-06-07 Thread Iain Sandoe
Hi FX,

> On 7 Jun 2024, at 09:57, FX Coudert  wrote:
> 

> macOS SDKs sometimes contain non-standard constructs, and require fixes 
> through fixincludes. However, they are typically fixed in later SDK versions, 
> although the process can be slow. Fixes have accumulated, which may be needed 
> only for some older versions of the SDKs. They should be bypassed on modern 
> macOS, and this patch does that for 4 cases. This makes the compiler less 
> fragile when switching between SDKs.
> 
> Before the patch, 8 headers are fixincluded on x86_64-apple-darwin23:
> 
> AvailabilityInternal.h math.h os/base.h  stdint.h
> dispatch/object.h  objc/runtime.h os/trace.h stdio.h
> 
> After the patch, only 4:
> 
> AvailabilityInternal.h math.h objc/runtime.h stdio.h
> 
> 
> Bootstrapped and regtested on x86_64-apple-darwin23.
> OK to push?

OK for trunk, thanks for doing this,
maybe we should consider it for branches after some bake time.

Iain

> 
> FX
> 
> 
> <0001-fixincludes-bypass-some-fixes-for-recent-darwin-head.patch>



[x86 PATCH] PR target/115351: RTX costs for *concatditi3 and *insvti_highpart.

2024-06-07 Thread Roger Sayle

This patch addresses PR target/115351, which is a code quality regression
on x86 when passing floating point complex numbers.  The ABI considers
these arguments to have TImode, requiring interunit moves to place the
FP values (which are actually passed in SSE registers) into the upper
and lower parts of a TImode pseudo, and then similar moves back again
before they can be used.

The cause of the regression is that changes in how TImode initialization
is represented in RTL now prevent the RTL optimizers from eliminating
these redundant moves.  The specific cause is that the *concatditi3
pattern, (zext(hi)<<64)|zext(lo), has an inappropriately high (default)
rtx_cost, preventing fwprop1 from propagating it.  This pattern just
sets the hipart and lopart of a double-word register, typically two
instructions (fewer if reload can allocate things appropriately) but
the current ix86_rtx_costs actually returns COSTS_N_INSNS (13), i.e. 52.

propagating insn 5 into insn 6, replacing:
(set (reg:TI 110)
(ior:TI (and:TI (reg:TI 110)
(const_wide_int 0x0))
(ashift:TI (zero_extend:TI (subreg:DI (reg:DF 112 [ zD.2796+8 ]) 0))
(const_int 64 [0x40]
successfully matched this instruction to *concatditi3_3:
(set (reg:TI 110)
(ior:TI (ashift:TI (zero_extend:TI (subreg:DI (reg:DF 112 [ zD.2796+8 ])
0))
(const_int 64 [0x40]))
(zero_extend:TI (subreg:DI (reg:DF 111 [ zD.2796 ]) 0
change not profitable (cost 50 -> cost 52)

This issue is resolved by having ix86_rtx_costs return more reasonable
values for these (place-holder) patterns.
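
To make the shape of the two patterns concrete, here is a half-width model in plain C++: 32-bit halves combined into a 64-bit value instead of DImode halves combined into TImode. The function names are hypothetical, and the scaling factor of 4 in costs_n_insns is an assumption chosen to match the "13 insns, i.e. 52" figure above.

```cpp
#include <cassert>
#include <cstdint>

// Scaled cost of n instructions; the factor 4 is an assumption that is
// consistent with 13 instructions being reported as a cost of 52.
constexpr int costs_n_insns(int n)
{
  return n * 4;
}

// Half-width model of *concatditi3: (zext(hi) << 32) | zext(lo).
constexpr std::uint64_t concat_halves(std::uint32_t hi, std::uint32_t lo)
{
  return (static_cast<std::uint64_t>(hi) << 32)
         | static_cast<std::uint64_t>(lo);
}

// Half-width model of *insvti_highpart: keep the low half of x and
// replace its high half with hi.
constexpr std::uint64_t insert_highpart(std::uint64_t x, std::uint32_t hi)
{
  return (x & 0xFFFFFFFFull) | (static_cast<std::uint64_t>(hi) << 32);
}
```

Each operation amounts to placing one or two register halves, which is why a cost of about two instructions is a more plausible estimate than thirteen.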

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2024-06-07  Roger Sayle  

gcc/ChangeLog
PR target/115351
* config/i386/i386.cc (ix86_rtx_costs): Provide estimates for the
*concatditi3 and *insvti_highpart patterns, about two insns.

gcc/testsuite/ChangeLog
PR target/115351
* g++.target/i386/pr115351.C: New test case.


Thanks in advance (and sorry for any inconvenience),
Roger
--

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 69cd4ae..9d08dc7 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -21868,6 +21868,49 @@ ix86_rtx_costs (rtx x, machine_mode mode, int 
outer_code_i, int opno,
}
  *total = ix86_vec_cost (mode, cost->sse_op);
}
+  else if (TARGET_64BIT
+  && mode == TImode
+  && GET_CODE (XEXP (x, 0)) == ASHIFT
+  && GET_CODE (XEXP (XEXP (x, 0), 0)) == ZERO_EXTEND
+  && GET_MODE (XEXP (XEXP (XEXP (x, 0), 0), 0)) == DImode
+  && CONST_INT_P (XEXP (XEXP (x, 0), 1))
+  && INTVAL (XEXP (XEXP (x, 0), 1)) == 64
+  && GET_CODE (XEXP (x, 1)) == ZERO_EXTEND
+  && GET_MODE (XEXP (XEXP (x, 1), 0)) == DImode)
+   {
+ /* *concatditi3 is cheap.  */
+ rtx op0 = XEXP (XEXP (XEXP (x, 0), 0), 0);
+ rtx op1 = XEXP (XEXP (x, 1), 0);
+ *total = (SUBREG_P (op0) && GET_MODE (SUBREG_REG (op0)) == DFmode)
+  ? COSTS_N_INSNS (1)/* movq.  */
+  : set_src_cost (op0, DImode, speed);
+ *total += (SUBREG_P (op1) && GET_MODE (SUBREG_REG (op1)) == DFmode)
+   ? COSTS_N_INSNS (1)/* movq.  */
+   : set_src_cost (op1, DImode, speed);
+ return true;
+   }
+  else if (TARGET_64BIT
+  && mode == TImode
+  && GET_CODE (XEXP (x, 0)) == AND
+  && REG_P (XEXP (XEXP (x, 0), 0))
+  && CONST_WIDE_INT_P (XEXP (XEXP (x, 0), 1))
+  && CONST_WIDE_INT_NUNITS (XEXP (XEXP (x, 0), 1)) == 2
+  && CONST_WIDE_INT_ELT (XEXP (XEXP (x, 0), 1), 0) == -1
+  && CONST_WIDE_INT_ELT (XEXP (XEXP (x, 0), 1), 1) == 0
+  && GET_CODE (XEXP (x, 1)) == ASHIFT
+  && GET_CODE (XEXP (XEXP (x, 1), 0)) == ZERO_EXTEND
+  && GET_MODE (XEXP (XEXP (XEXP (x, 1), 0), 0)) == DImode
+  && CONST_INT_P (XEXP (XEXP (x, 1), 1))
+  && INTVAL (XEXP (XEXP (x, 1), 1)) == 64)
+   {
+ /* *insvti_highpart is cheap.  */
+ rtx op = XEXP (XEXP (XEXP (x, 1), 0), 0);
+ *total = COSTS_N_INSNS (1) + 1;
+ *total += (SUBREG_P (op) && GET_MODE (SUBREG_REG (op)) == DFmode)
+   ? COSTS_N_INSNS (1)/* movq.  */
+   : set_src_cost (op, DImode, speed);
+ return true;
+   }
   else if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
*total = cost->add * 2;
   else
diff --git a/gcc/testsuite/g++.target/i386/pr115351.C 
b/gcc/testsuite/g++.target/i386/pr115351.C
new file mode 100644
index 000..24132f3
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr115351.C
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -std=c+

Re: [PATCH] ifcvt: Clarify if_info.original_cost.

2024-06-07 Thread Robin Dapp
> Is there any way we can avoid using pattern_cost here?  Using it means
> that we can make use of targetm.insn_cost for the jump but circumvent
> it for the condition, giving a bit of a mixed metric.
> 
> (I realise there are existing calls to pattern_cost in ifcvt.cc,
> but if possible I think we should try to avoid adding more.)

Yes, I believe there is.  In addition, what I did with
if_info->cond wasn't what I intended to do.

The whole point of the exercise is that noce_convert_multiple_sets
can re-use the CC comparison that is already present (because it
is used in the jump pattern).  Therefore I want to split costs
into a jump part and a CC-setting part so the final costing
decision for multiple sets can be:

 insn_cost (jump) + n * insn_cost (set)
vs
 n * insn_cost ("cmov")

Still, the original costs should be:
 insn_cost (set_cc) + insn_cost (jump)
and with the split we can just remove insn_cost (set_cc) before
the multiple-set cost comparison and re-add it afterwards.

For non-CC targets this is not necessary.

So what I'd hope is better is to use
insn_cost (if_info.earliest_cond)
which is indeed the CC-set/comparison if it exists.
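
The cost split described above can be written out as a small arithmetic sketch. The struct, the function names and the numbers used with them are hypothetical; in GCC the individual values come from insn_cost and therefore from the target.

```cpp
#include <cassert>

// Hypothetical per-insn costs for one if-conversion candidate.
struct Costs
{
  unsigned set_cc;  // CC-setting compare; 0 if the target has no separate one
  unsigned jump;    // conditional jump
  unsigned set;     // one plain set in the branchy version
  unsigned cmov;    // one conditional move in the converted version
};

// Original cost of the block head: compare (if any) plus jump.
inline unsigned original_cost(const Costs& c)
{
  return c.set_cc + c.jump;
}

// The compare is needed by both versions, so drop it before comparing.
inline unsigned potential_cost(const Costs& c)
{
  return original_cost(c) - c.set_cc;
}

// Branchy side of the comparison: jump + n sets.
inline unsigned branchy_cost(const Costs& c, unsigned n)
{
  return potential_cost(c) + n * c.set;
}

// Converted side of the comparison: n conditional moves.
inline unsigned cmov_cost(const Costs& c, unsigned n)
{
  return n * c.cmov;
}
```

With made-up costs of 4 for the compare, jump and set and 5 for a cmov, three sets give 16 on the branchy side against 15 on the cmov side; leaving the compare in the branchy total would have skewed it to 20.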

The attached v2 was bootstrapped and regtested on x86, aarch64 and
power10 and regtested on riscv64.

Regards
 Robin

gcc/ChangeLog:

* ifcvt.cc (noce_process_if_block): Subtract condition pattern
cost if applicable.
(noce_find_if_block): Use insn_cost for original cost.
---
 gcc/ifcvt.cc | 31 ---
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 58ed42673e5..ebb838fd82c 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -3931,16 +3931,16 @@ noce_process_if_block (struct noce_if_info *if_info)
  to calculate a value for x.
  ??? For future expansion, further expand the "multiple X" rules.  */
 
-  /* First look for multiple SETS.  The original costs already include
- a base cost of COSTS_N_INSNS (2): one instruction for the compare
- (which we will be needing either way) and one instruction for the
- branch.  When comparing costs we want to use the branch instruction
- cost and the sets vs. the cmovs generated here.  Therefore subtract
- the costs of the compare before checking.
- ??? Actually, instead of the branch instruction costs we might want
- to use COSTS_N_INSNS (BRANCH_COST ()) as in other places.  */
-
-  unsigned potential_cost = if_info->original_cost - COSTS_N_INSNS (1);
+  /* First look for multiple SETS.
+ The original costs already include costs for the jump insn as well
+ as for a CC comparison if there is any.
+ We want to allow the backend to re-use the existing CC comparison
+ and therefore don't consider it for the cost comparison (as it is
+ then needed for both the jump as well as the cmov sequence).  */
+
+  unsigned potential_cost = if_info->original_cost;
+  if (if_info->cond_earliest && if_info->jump != if_info->cond_earliest)
+potential_cost -= insn_cost (if_info->cond_earliest, if_info->speed_p);
   unsigned old_cost = if_info->original_cost;
   if (!else_bb
   && HAVE_conditional_move
@@ -4703,11 +4703,12 @@ noce_find_if_block (basic_block test_bb, edge 
then_edge, edge else_edge,
 = targetm.max_noce_ifcvt_seq_cost (then_edge);
   /* We'll add in the cost of THEN_BB and ELSE_BB later, when we check
  that they are valid to transform.  We can't easily get back to the insn
- for COND (and it may not exist if we had to canonicalize to get COND),
- and jump_insns are always given a cost of 1 by seq_cost, so treat
- both instructions as having cost COSTS_N_INSNS (1).  */
-  if_info.original_cost = COSTS_N_INSNS (2);
-
+ for COND (and it may not exist if we had to canonicalize to get COND).
+ jump insn that is costed via insn_cost.  It is assumed that the
+ costs of a jump insn are dependent on the branch costs.  */
+  if (if_info.cond_earliest && if_info.jump != if_info.cond_earliest)
+if_info.original_cost = insn_cost (if_info.cond_earliest, if_info.speed_p);
+  if_info.original_cost += insn_cost (if_info.jump, if_info.speed_p);
 
   /* Do the real work.  */
 
-- 
2.45.1



[committed] libstdc++: Optimize std::to_address

2024-06-07 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

We can use if-constexpr and variable templates to simplify and optimize
std::to_address. This should compile faster (and run faster for -O0)
than dispatching to the pre-C++20 std::__to_address overloads.

libstdc++-v3/ChangeLog:

* include/bits/ptr_traits.h (to_address): Optimize.
* testsuite/20_util/to_address/1_neg.cc: Adjust dg-error text.
---
 libstdc++-v3/include/bits/ptr_traits.h| 47 +++
 .../testsuite/20_util/to_address/1_neg.cc |  2 +-
 2 files changed, 28 insertions(+), 21 deletions(-)

diff --git a/libstdc++-v3/include/bits/ptr_traits.h 
b/libstdc++-v3/include/bits/ptr_traits.h
index 6c65001cb74..ca67feecca3 100644
--- a/libstdc++-v3/include/bits/ptr_traits.h
+++ b/libstdc++-v3/include/bits/ptr_traits.h
@@ -200,36 +200,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 using __ptr_rebind = typename pointer_traits<_Ptr>::template rebind<_Tp>;
 
+#ifndef __glibcxx_to_address // C++ < 20
   template
+[[__gnu__::__always_inline__]]
 constexpr _Tp*
 __to_address(_Tp* __ptr) noexcept
 {
-  static_assert(!std::is_function<_Tp>::value, "not a function pointer");
+  static_assert(!std::is_function<_Tp>::value, "std::to_address argument "
+   "must not be a function pointer");
   return __ptr;
 }
 
-#ifndef __glibcxx_to_address // C++ < 20
   template
 constexpr typename std::pointer_traits<_Ptr>::element_type*
 __to_address(const _Ptr& __ptr)
 { return std::__to_address(__ptr.operator->()); }
 #else
-  template
-constexpr auto
-__to_address(const _Ptr& __ptr) noexcept
--> decltype(std::pointer_traits<_Ptr>::to_address(__ptr))
-{ return std::pointer_traits<_Ptr>::to_address(__ptr); }
-
-  template
-constexpr auto
-__to_address(const _Ptr& __ptr, _None...) noexcept
-{
-  if constexpr (is_base_of_v<__gnu_debug::_Safe_iterator_base, _Ptr>)
-   return std::__to_address(__ptr.base().operator->());
-  else
-   return std::__to_address(__ptr.operator->());
-}
-
   /**
* @brief Obtain address referenced by a pointer to an object
* @param __ptr A pointer to an object
@@ -237,9 +223,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
* @ingroup pointer_abstractions
   */
   template
+[[__gnu__::__always_inline__]]
 constexpr _Tp*
 to_address(_Tp* __ptr) noexcept
-{ return std::__to_address(__ptr); }
+{
+  static_assert(!is_function_v<_Tp>, "std::to_address argument "
+   "must not be a function pointer");
+  return __ptr;
+}
 
   /**
* @brief Obtain address referenced by a pointer to an object
@@ -251,7 +242,23 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 constexpr auto
 to_address(const _Ptr& __ptr) noexcept
-{ return std::__to_address(__ptr); }
+{
+  if constexpr (requires { pointer_traits<_Ptr>::to_address(__ptr); })
+   return pointer_traits<_Ptr>::to_address(__ptr);
+  else if constexpr (is_base_of_v<__gnu_debug::_Safe_iterator_base, _Ptr>)
+   return std::to_address(__ptr.base().operator->());
+  else
+   return std::to_address(__ptr.operator->());
+}
+
+  /// @cond undocumented
+  /// Compatibility for use in code that is also compiled as pre-C++20.
+  template
+[[__gnu__::__always_inline__]]
+constexpr auto
+__to_address(const _Ptr& __ptr) noexcept
+{ return std::to_address(__ptr); }
+  /// @endcond
 #endif // __glibcxx_to_address
 
 _GLIBCXX_END_NAMESPACE_VERSION
diff --git a/libstdc++-v3/testsuite/20_util/to_address/1_neg.cc 
b/libstdc++-v3/testsuite/20_util/to_address/1_neg.cc
index 7385f0f335c..10e919757bb 100644
--- a/libstdc++-v3/testsuite/20_util/to_address/1_neg.cc
+++ b/libstdc++-v3/testsuite/20_util/to_address/1_neg.cc
@@ -16,7 +16,7 @@
 // .
 
 // { dg-do compile { target c++20 } }
-// { dg-error "not a function pointer" "" { target *-*-* } 0 }
+// { dg-error "must not be a function pointer" "" { target *-*-* } 0 }
 
 #include 
 
-- 
2.45.1
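
The dispatch used by the new std::to_address can be tried in isolation with the sketch below. It is a C++17 adaptation, not the committed code: a detection variable template (has_traits_to_address) replaces the C++20 requires-expression, the debug-iterator branch is omitted, and sketch_to_address, ArrowPtr and TraitsPtr are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>

// True when std::pointer_traits<P>::to_address(p) is well-formed.
template<typename P, typename = void>
constexpr bool has_traits_to_address = false;

template<typename P>
constexpr bool has_traits_to_address<
  P, std::void_t<decltype(std::pointer_traits<P>::to_address(
       std::declval<const P&>()))>> = true;

// Raw pointers are returned unchanged.
template<typename T>
constexpr T* sketch_to_address(T* p) noexcept
{
  static_assert(!std::is_function_v<T>,
                "argument must not be a function pointer");
  return p;
}

// Pointer-like types: prefer pointer_traits::to_address, else operator->.
template<typename P>
constexpr auto sketch_to_address(const P& p) noexcept
{
  if constexpr (has_traits_to_address<P>)
    return std::pointer_traits<P>::to_address(p);
  else
    return sketch_to_address(p.operator->());
}

// Hypothetical fancy pointer that only provides operator->.
template<typename T>
struct ArrowPtr
{
  using element_type = T;
  T* raw;
  T* operator->() const noexcept { return raw; }
};

// Hypothetical fancy pointer that is handled through pointer_traits.
template<typename T>
struct TraitsPtr
{
  using element_type = T;
  T* raw;
};

namespace std
{
  template<typename T>
  struct pointer_traits<TraitsPtr<T>>
  {
    using pointer = TraitsPtr<T>;
    using element_type = T;
    using difference_type = std::ptrdiff_t;
    static T* to_address(const pointer& p) noexcept { return p.raw; }
  };
}
```

Because the branch is selected with if constexpr inside a single function template, no overload resolution between helper overloads is needed for the pointer-like case, which is the simplification the commit message describes.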



Re: [PATCH] libstdc++: Optimize std::gcd

2024-06-07 Thread Jonathan Wakely
On Fri, 7 Jun 2024 at 09:57, Sam James  wrote:
>
> Stephen Face  writes:
>
> > This patch is to optimize the runtime execution of gcd. Mathematically,
> > it computes with the same algorithm as before, but subtractions and
> > branches are rearranged to encourage generation of code that can use
> > flags from the subtractions for conditional moves. Additionally, most
> > pairs of integers are coprime, so this patch also includes a check for
> > one of the integers to be equal to 1, and then it will exit the loop
> > early in this case.
>
> Is it worth filing a bug for the missed optimisation? You shouldn't have
> to write things in a specific order. Thanks.

Yes, I think a bug report would be good. But a 20%-60% decrease in run
time seems significant enough that we should take the libstdc++ patch
now, rather than wait for a possible compiler fix to come later.

Stephen, could you please confirm whether you have a copyright
assignment in place for GCC, and if not whether you would be willing to
complete one, or alternatively contribute this under the DCO terms:
https://gcc.gnu.org/dco.html
Thanks!


>
> >
> > libstdc++-v3/ChangeLog:
> >
> >   * include/std/numeric(__gcd): Optimize.
> > ---
> > I have tested this on x86_64-linux and aarch64-linux. I have tested the
> > timing with random distributions of small inputs and large inputs on a
> > couple of machines with -O2 and found decreases in execution time from
> > 20% to 60% depending on the machine and distribution of inputs.
> >
> >  libstdc++-v3/include/std/numeric | 21 +++--
> >  1 file changed, 11 insertions(+), 10 deletions(-)
> >
> > diff --git a/libstdc++-v3/include/std/numeric 
> > b/libstdc++-v3/include/std/numeric
> > index c912db4a519..3c9e8387a0e 100644
> > --- a/libstdc++-v3/include/std/numeric
> > +++ b/libstdc++-v3/include/std/numeric
> > @@ -148,19 +148,20 @@ namespace __detail
> >
> >while (true)
> >   {
> > -   if (__m > __n)
> > - {
> > -   _Tp __tmp = __m;
> > -   __m = __n;
> > -   __n = __tmp;
> > - }
> > +   _Tp __m_minus_n = __m - __n;
> > +   if (__m_minus_n == 0)
> > + return __m << __k;
> >
> > -   __n -= __m;
> > +   _Tp __next_m = __m < __n ? __m : __n;
> >
> > -   if (__n == 0)
> > - return __m << __k;
> > +   if (__next_m == 1)
> > + return __next_m << __k;
> > +
> > +   _Tp __n_minus_m = __n - __m;
> > +   __n = __n < __m ? __m_minus_n : __n_minus_m;
> > +   __m = __next_m;
> >
> > -   __n >>= std::__countr_zero(__n);
> > +   __n >>= std::__countr_zero(__m_minus_n);
> >   }
> >  }
> >  } // namespace __detail
>



Re: [RE] [v2] RISC-V: Add Zfbfmin extension

2024-06-07 Thread Fei Gao




Hi Jin

We have completed zvfbfmin and zvfbfwma in GCC.
Wang Feng will post the patches after the Dragon Boat Festival.

BR, 
Fei
From: Jin Ma
Date: 2024-06-07 15:35
To: gcc-patches; zengxiao
CC: jeffreyalaw; kito.cheng; juzhe.zhong; jinma.contrib; Jin Ma
Subject: [RE] [v2] RISC-V: Add Zfbfmin extension
Hi,
Is there a plan to implement zvfbfmin and zvfbfwma? Or how can I get the
relevant patches in advance for testing? By the way, LLVM seems to have
implemented them fully now :-)

Ref:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/293
https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/main/auto-generated/bfloat16/intrinsic_funcs.adoc

Thanks,
Jin


[PATCH] fixincludes: bypass the math_exception fix on __cplusplus

2024-06-07 Thread FX Coudert
The fixincludes fix “math_exception” is being applied overly broadly, including
on many targets which don’t need it, like darwin (and probably all non-glibc
targets). I’m not sure if it is still needed on any target, but because I can’t
be absolutely positive about that, I don’t want to remove it. But it dates from
before 1998.

Later (in 2000) it was bypassed on glibc headers, as well as on Solaris
10. It was still needed on Solaris 8 and 9, which are (AFAICT) unsupported
nowadays. The fix was originally bypassed on __cplusplus, which is the correct
thing to do, but that bypass was neutralized to cater to a bug in the Solaris 8
and 9 headers. Now that those are gone… let’s revert to the previous bypass.


Bootstrapped and regtested on x86_64-apple-darwin23, where it no longer “fixes” 
the header unnecessarily.
OK to push?

FX





0001-fixincludes-bypass-the-math_exception-fix-on-__cplus.patch
Description: Binary data


[PATCH v2 0/6] Add DLL import/export implementation to AArch64

2024-06-07 Thread Evgeny Karpov
Hello,

Thank you for reviewing v1!
v2 addresses all comments on v1.

Changes in v2:
- Move winnt.h and winnt-dll.h to config.gcc.
- Resolve the issue with GCC GC in winnt-dll.cc.
- Add definitions for GOT_ALIAS_SET, PE_COFF_EXTERN_DECL_SHOULD_BE_LEGITIMIZED, 
and HAVE_64BIT_POINTERS to cygming.h.
- Replace intermediate functions for PECOFF with ifdef checks in ix86.
- Update the copyright date in winnt-dll.cc.
- Correct the style.
- Rebase from 7th June 2024

Regards,
Evgeny

Evgeny Karpov (6):
  Move mingw_* declarations to the mingw folder
  Extract ix86 dllimport implementation to mingw
  Rename functions for reuse in AArch64
  aarch64: Add selectany attribute handling
  Adjust DLL import/export implementation for AArch64
  aarch64: Add DLL import/export to AArch64 target

 gcc/config.gcc  |  20 ++-
 gcc/config/aarch64/aarch64-protos.h |   5 -
 gcc/config/aarch64/aarch64.cc   |  42 -
 gcc/config/aarch64/cygming.h|  33 +++-
 gcc/config/i386/cygming.h   |  16 +-
 gcc/config/i386/i386-expand.cc  |   4 +-
 gcc/config/i386/i386-expand.h   |   2 -
 gcc/config/i386/i386-protos.h   |  10 --
 gcc/config/i386/i386.cc | 205 ++--
 gcc/config/i386/i386.h  |   2 +
 gcc/config/mingw/mingw32.h  |   2 +-
 gcc/config/mingw/t-cygming  |   6 +
 gcc/config/mingw/winnt-dll.cc   | 231 
 gcc/config/mingw/winnt-dll.h|  30 
 gcc/config/mingw/winnt.cc   |  10 +-
 gcc/config/mingw/winnt.h|  38 +
 16 files changed, 423 insertions(+), 233 deletions(-)
 create mode 100644 gcc/config/mingw/winnt-dll.cc
 create mode 100644 gcc/config/mingw/winnt-dll.h
 create mode 100644 gcc/config/mingw/winnt.h

-- 
2.25.1



[PATCH v2 1/6] Move mingw_* declarations to the mingw folder

2024-06-07 Thread Evgeny Karpov
This patch refactors recent changes to move mingw-related
functionality to the mingw folder. More renamings to the mingw_
prefix will be done in follow-up commits.

This is the first commit in the second patch series to add DLL
import/export implementation to AArch64.

Coauthors: Zac Walker, Mark Harmstone and Ron Riddle

Refactored, prepared, and validated by
Radek Barton and Evgeny Karpov

gcc/ChangeLog:

* config.gcc: Move mingw_* declarations to mingw.
* config/aarch64/aarch64-protos.h
(mingw_pe_maybe_record_exported_symbol): Likewise.
(mingw_pe_section_type_flags): Likewise.
(mingw_pe_unique_section): Likewise.
(mingw_pe_encode_section_info): Likewise.
* config/aarch64/cygming.h
(mingw_pe_asm_named_section): Likewise.
(mingw_pe_declare_function_type): Likewise.
* config/i386/i386-protos.h
(mingw_pe_unique_section): Likewise.
(mingw_pe_declare_function_type): Likewise.
(mingw_pe_maybe_record_exported_symbol): Likewise.
(mingw_pe_encode_section_info): Likewise.
(mingw_pe_section_type_flags): Likewise.
(mingw_pe_asm_named_section): Likewise.
* config/mingw/winnt.h: New file.
---
 gcc/config.gcc  |  4 
 gcc/config/aarch64/aarch64-protos.h |  5 -
 gcc/config/aarch64/cygming.h|  4 
 gcc/config/i386/i386-protos.h   |  6 --
 gcc/config/mingw/winnt.h| 33 +
 5 files changed, 37 insertions(+), 15 deletions(-)
 create mode 100644 gcc/config/mingw/winnt.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index e500ba63e32..553a310f4bd 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1275,6 +1275,7 @@ aarch64-*-mingw*)
tm_file="${tm_file} aarch64/cygming.h"
tm_file="${tm_file} mingw/mingw32.h"
tm_file="${tm_file} mingw/mingw-stdint.h"
+   tm_file="${tm_file} mingw/winnt.h"
tmake_file="${tmake_file} aarch64/t-aarch64"
target_gtfiles="$target_gtfiles \$(srcdir)/config/mingw/winnt.cc"
extra_options="${extra_options} mingw/cygming.opt mingw/mingw.opt"
@@ -2175,6 +2176,7 @@ i[4567]86-wrs-vxworks*|x86_64-wrs-vxworks7*)
;;
 i[34567]86-*-cygwin*)
tm_file="${tm_file} i386/unix.h i386/bsd.h i386/gas.h i386/cygming.h 
i386/cygwin.h i386/cygwin-stdint.h"
+   tm_file="${tm_file} mingw/winnt.h"
xm_file=i386/xm-cygwin.h
tmake_file="${tmake_file} mingw/t-cygming t-slibgcc"
target_gtfiles="$target_gtfiles \$(srcdir)/config/mingw/winnt.cc"
@@ -2193,6 +2195,7 @@ i[34567]86-*-cygwin*)
 x86_64-*-cygwin*)
need_64bit_isa=yes
tm_file="${tm_file} i386/unix.h i386/bsd.h i386/gas.h i386/cygming.h 
i386/cygwin.h i386/cygwin-w64.h i386/cygwin-stdint.h"
+   tm_file="${tm_file} mingw/winnt.h"
xm_file=i386/xm-cygwin.h
tmake_file="${tmake_file} mingw/t-cygming t-slibgcc"
target_gtfiles="$target_gtfiles \$(srcdir)/config/mingw/winnt.cc"
@@ -2262,6 +2265,7 @@ i[34567]86-*-mingw* | x86_64-*-mingw*)
;;
esac
tm_file="${tm_file} mingw/mingw-stdint.h"
+   tm_file="${tm_file} mingw/winnt.h"
tmake_file="${tmake_file} t-winnt mingw/t-cygming t-slibgcc"
 case ${target} in
x86_64-w64-*)
diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 1d3f94c813e..42639e9efcf 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -1110,11 +1110,6 @@ extern void aarch64_output_patchable_area (unsigned int, 
bool);
 
 extern void aarch64_adjust_reg_alloc_order ();
 
-extern void mingw_pe_maybe_record_exported_symbol (tree, const char *, int);
-extern unsigned int mingw_pe_section_type_flags (tree, const char *, int);
-extern void mingw_pe_unique_section (tree, int);
-extern void mingw_pe_encode_section_info (tree, rtx, int);
-
 bool aarch64_optimize_mode_switching (aarch64_mode_entity);
 void aarch64_restore_za (rtx);
 
diff --git a/gcc/config/aarch64/cygming.h b/gcc/config/aarch64/cygming.h
index 2e7b01feb76..0d048879311 100644
--- a/gcc/config/aarch64/cygming.h
+++ b/gcc/config/aarch64/cygming.h
@@ -51,10 +51,6 @@ still needed for compilation.  */
 #include 
 #endif
 
-extern void mingw_pe_asm_named_section (const char *, unsigned int, tree);
-extern void mingw_pe_declare_function_type (FILE *file, const char *name,
-   int pub);
-
 #define TARGET_ASM_NAMED_SECTION  mingw_pe_asm_named_section
 
 /* Select attributes for named sections.  */
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index aa50b897b2b..65ef3d77c3a 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -301,16 +301,10 @@ extern void ix86_target_macros (void);
 extern void ix86_register_pragmas (void);
 
 /* In winnt.cc  */
-extern void mingw_pe_unique_section (tree, int);
-extern void mingw_pe_declare_function_type (FILE *, const ch

[PATCH v2 2/6] Extract ix86 dllimport implementation to mingw

2024-06-07 Thread Evgeny Karpov
This patch extracts the ix86 implementation for expanding a SYMBOL
into its corresponding dllimport, far-address, or refptr symbol.
It will be reused in the aarch64-w64-mingw32 target.
The implementation is copied as is from i386/i386.cc with
minor changes to follow the code style.

This patch also replaces the original DLL import/export
implementation in ix86 with the mingw one.

gcc/ChangeLog:

* config.gcc: Add winnt-dll.o, which contains the DLL
import/export implementation.
* config/i386/cygming.h (SUB_TARGET_RECORD_STUB): Remove the
old implementation. Rename the required function to MinGW.
Use MinGW implementation for COFF and nothing otherwise.
(GOT_ALIAS_SET): Likewise.
* config/i386/i386-expand.cc (ix86_expand_move): Likewise.
* config/i386/i386-expand.h (ix86_GOT_alias_set): Likewise.
(legitimize_pe_coff_symbol): Likewise.
* config/i386/i386-protos.h (i386_pe_record_stub): Likewise.
* config/i386/i386.cc (is_imported_p): Likewise.
(legitimate_pic_address_disp_p): Likewise.
(ix86_GOT_alias_set): Likewise.
(legitimize_pic_address): Likewise.
(legitimize_tls_address): Likewise.
(struct dllimport_hasher): Likewise.
(GTY): Likewise.
(get_dllimport_decl): Likewise.
(legitimize_pe_coff_extern_decl): Likewise.
(legitimize_dllimport_symbol): Likewise.
(legitimize_pe_coff_symbol): Likewise.
(ix86_legitimize_address): Likewise.
* config/i386/i386.h (GOT_ALIAS_SET): Likewise.
* config/mingw/winnt.cc (i386_pe_record_stub): Likewise.
(mingw_pe_record_stub): Likewise.
* config/mingw/winnt.h (mingw_pe_record_stub): Likewise.
* config/mingw/t-cygming: Add the winnt-dll.o compilation.
* config/mingw/winnt-dll.cc: New file.
* config/mingw/winnt-dll.h: New file.
---
 gcc/config.gcc |  12 +-
 gcc/config/i386/cygming.h  |   5 +-
 gcc/config/i386/i386-expand.cc |   4 +-
 gcc/config/i386/i386-expand.h  |   2 -
 gcc/config/i386/i386-protos.h  |   1 -
 gcc/config/i386/i386.cc| 205 ++---
 gcc/config/i386/i386.h |   2 +
 gcc/config/mingw/t-cygming |   6 +
 gcc/config/mingw/winnt-dll.cc  | 231 +
 gcc/config/mingw/winnt-dll.h   |  30 +
 gcc/config/mingw/winnt.cc  |   2 +-
 gcc/config/mingw/winnt.h   |   1 +
 12 files changed, 298 insertions(+), 203 deletions(-)
 create mode 100644 gcc/config/mingw/winnt-dll.cc
 create mode 100644 gcc/config/mingw/winnt-dll.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 553a310f4bd..d053b98efa8 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2177,11 +2177,13 @@ i[4567]86-wrs-vxworks*|x86_64-wrs-vxworks7*)
 i[34567]86-*-cygwin*)
tm_file="${tm_file} i386/unix.h i386/bsd.h i386/gas.h i386/cygming.h 
i386/cygwin.h i386/cygwin-stdint.h"
tm_file="${tm_file} mingw/winnt.h"
+   tm_file="${tm_file} mingw/winnt-dll.h"
xm_file=i386/xm-cygwin.h
tmake_file="${tmake_file} mingw/t-cygming t-slibgcc"
target_gtfiles="$target_gtfiles \$(srcdir)/config/mingw/winnt.cc"
+   target_gtfiles="$target_gtfiles \$(srcdir)/config/mingw/winnt-dll.cc"
extra_options="${extra_options} mingw/cygming.opt i386/cygwin.opt"
-   extra_objs="${extra_objs} winnt.o winnt-stubs.o"
+   extra_objs="${extra_objs} winnt.o winnt-stubs.o winnt-dll.o"
c_target_objs="${c_target_objs} msformat-c.o"
cxx_target_objs="${cxx_target_objs} winnt-cxx.o msformat-c.o"
d_target_objs="${d_target_objs} cygwin-d.o"
@@ -2196,11 +2198,13 @@ x86_64-*-cygwin*)
need_64bit_isa=yes
tm_file="${tm_file} i386/unix.h i386/bsd.h i386/gas.h i386/cygming.h 
i386/cygwin.h i386/cygwin-w64.h i386/cygwin-stdint.h"
tm_file="${tm_file} mingw/winnt.h"
+   tm_file="${tm_file} mingw/winnt-dll.h"
xm_file=i386/xm-cygwin.h
tmake_file="${tmake_file} mingw/t-cygming t-slibgcc"
target_gtfiles="$target_gtfiles \$(srcdir)/config/mingw/winnt.cc"
+   target_gtfiles="$target_gtfiles \$(srcdir)/config/mingw/winnt-dll.cc"
extra_options="${extra_options} mingw/cygming.opt i386/cygwin.opt"
-   extra_objs="${extra_objs} winnt.o winnt-stubs.o"
+   extra_objs="${extra_objs} winnt.o winnt-stubs.o winnt-dll.o"
c_target_objs="${c_target_objs} msformat-c.o"
cxx_target_objs="${cxx_target_objs} winnt-cxx.o msformat-c.o"
d_target_objs="${d_target_objs} cygwin-d.o"
@@ -2266,6 +2270,7 @@ i[34567]86-*-mingw* | x86_64-*-mingw*)
esac
tm_file="${tm_file} mingw/mingw-stdint.h"
tm_file="${tm_file} mingw/winnt.h"
+   tm_file="${tm_file} mingw/winnt-dll.h"
tmake_file="${tmake_file} t-winnt mingw/t-cygming t-slibgcc"
 case ${target} in
x86_64-w64-*)
@@ -2277,6 +2282,7 @@ i[34567]86-*-mingw* | x86_64-*-mingw*)
esac
 na

[PATCH v2 3/6] Rename functions for reuse in AArch64

2024-06-07 Thread Evgeny Karpov
This patch renames functions related to dllimport/dllexport
and selectany functionality. These functions will be reused
in the aarch64-w64-mingw32 target.

gcc/ChangeLog:

* config/i386/cygming.h (mingw_pe_record_stub):
Rename functions in mingw folder which will be reused for
aarch64.
(TARGET_ASM_FILE_END): Update to new target-independent name.
(SUBTARGET_ATTRIBUTE_TABLE): Likewise.
(TARGET_VALID_DLLIMPORT_ATTRIBUTE_P): Likewise.
(SUB_TARGET_RECORD_STUB): Likewise.
* config/i386/i386-protos.h (ix86_handle_selectany_attribute): Likewise.
(mingw_handle_selectany_attribute): Likewise.
(i386_pe_valid_dllimport_attribute_p): Likewise.
(mingw_pe_valid_dllimport_attribute_p): Likewise.
(i386_pe_file_end): Likewise.
(mingw_pe_file_end): Likewise.
(i386_pe_record_stub): Likewise.
(mingw_pe_record_stub): Likewise.
* config/mingw/winnt.cc (ix86_handle_selectany_attribute): Likewise.
(mingw_handle_selectany_attribute): Likewise.
(i386_pe_valid_dllimport_attribute_p): Likewise.
(mingw_pe_valid_dllimport_attribute_p): Likewise.
(i386_pe_record_stub): Likewise.
(mingw_pe_record_stub): Likewise.
(i386_pe_file_end): Likewise.
(mingw_pe_file_end): Likewise.
* config/mingw/winnt.h (mingw_handle_selectany_attribute):
Declare functionality that will be reused by multiple targets.
(mingw_pe_file_end): Likewise.
(mingw_pe_record_stub): Likewise.
(mingw_pe_valid_dllimport_attribute_p): Likewise.
---
 gcc/config/i386/cygming.h | 6 +++---
 gcc/config/i386/i386-protos.h | 3 ---
 gcc/config/mingw/winnt.cc | 8 
 gcc/config/mingw/winnt.h  | 6 +-
 4 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/gcc/config/i386/cygming.h b/gcc/config/i386/cygming.h
index 56945f00c11..4bb8d7f920c 100644
--- a/gcc/config/i386/cygming.h
+++ b/gcc/config/i386/cygming.h
@@ -344,7 +344,7 @@ do {\
 
 /* Output function declarations at the end of the file.  */
 #undef TARGET_ASM_FILE_END
-#define TARGET_ASM_FILE_END i386_pe_file_end
+#define TARGET_ASM_FILE_END mingw_pe_file_end
 
 /* Kludge because of missing PE-COFF support for early LTO debug.  */
 #undef  TARGET_ASM_LTO_START
@@ -445,7 +445,7 @@ do {\
 
 #define SUBTARGET_ATTRIBUTE_TABLE \
   { "selectany", 0, 0, true, false, false, false, \
-ix86_handle_selectany_attribute, NULL }
+mingw_handle_selectany_attribute, NULL }
   /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
affects_type_identity, handler, exclude } */
 
@@ -453,7 +453,7 @@ do {\
 #undef NO_PROFILE_COUNTERS
 #define NO_PROFILE_COUNTERS 1
 
-#define TARGET_VALID_DLLIMPORT_ATTRIBUTE_P i386_pe_valid_dllimport_attribute_p
+#define TARGET_VALID_DLLIMPORT_ATTRIBUTE_P mingw_pe_valid_dllimport_attribute_p
 #define TARGET_CXX_ADJUST_CLASS_AT_DEFINITION 
i386_pe_adjust_class_at_definition
 #define SUBTARGET_MANGLE_DECL_ASSEMBLER_NAME i386_pe_mangle_decl_assembler_name
 
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index b2e9b9e0d3d..08c69abb45f 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -264,7 +264,6 @@ extern unsigned int ix86_local_alignment (tree, 
machine_mode,
 extern unsigned int ix86_minimum_alignment (tree, machine_mode,
unsigned int);
 extern tree ix86_handle_shared_attribute (tree *, tree, tree, int, bool *);
-extern tree ix86_handle_selectany_attribute (tree *, tree, tree, int, bool *);
 extern int x86_field_alignment (tree, int);
 extern tree ix86_valid_target_attribute_tree (tree, tree,
  struct gcc_options *,
@@ -304,12 +303,10 @@ extern void ix86_register_pragmas (void);
 extern void i386_pe_record_external_function (tree, const char *);
 extern bool i386_pe_binds_local_p (const_tree);
 extern const char *i386_pe_strip_name_encoding_full (const char *);
-extern bool i386_pe_valid_dllimport_attribute_p (const_tree);
 extern void i386_pe_asm_output_aligned_decl_common (FILE *, tree,
const char *,
HOST_WIDE_INT,
HOST_WIDE_INT);
-extern void i386_pe_file_end (void);
 extern void i386_pe_asm_lto_start (void);
 extern void i386_pe_asm_lto_end (void);
 extern void i386_pe_start_function (FILE *, const char *, tree);
diff --git a/gcc/config/mingw/winnt.cc b/gcc/config/mingw/winnt.cc
index 9901576ade0..803e5f5ec85 100644
--- a/gcc/config/mingw/winnt.cc
+++ b/gcc/config/mingw/winnt.cc
@@ -71,8 +71,8 @@ ix86_handle_shared_attribute (tree *node, tree name, tree, 
int,
 /* Handle a "selectany" attribute;
arg

[PATCH v2 4/6] aarch64: Add selectany attribute handling

2024-06-07 Thread Evgeny Karpov
This patch extends the aarch64 attributes list with the selectany
attribute for the aarch64-w64-mingw32 target and reuses the mingw
implementation to handle it.
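
For readers unfamiliar with the attribute, a minimal user-level sketch of what selectany is for: it gives an initialized global with external linkage link-once semantics, so the same definition may appear in several objects and the linker keeps one. The macro SKETCH_LINK_ONCE and the variable below are hypothetical names, GCC or Clang is assumed, and on non-Windows targets the sketch falls back to the weak attribute as a rough stand-in so that it still compiles there.

```cpp
#include <cassert>

// SKETCH_LINK_ONCE is a hypothetical helper macro, not part of GCC or of
// this patch series.  Assumes GCC or Clang attribute syntax.
#if defined(_WIN32) || defined(__CYGWIN__)
#  define SKETCH_LINK_ONCE __attribute__((selectany))
#else
// Rough stand-in on ELF and other targets where selectany is not available.
#  define SKETCH_LINK_ONCE __attribute__((weak))
#endif

// selectany applies to initialized data with external linkage, so the
// variable is given an initializer and is not static.
SKETCH_LINK_ONCE int sketch_build_id = 42;

// Reads the link-once variable; every translation unit sees the same copy.
inline int read_sketch_build_id()
{
  return sketch_build_id;
}
```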

* config/aarch64/aarch64.cc:
Extend the aarch64 attributes list.
* config/aarch64/cygming.h (SUBTARGET_ATTRIBUTE_TABLE):
Define the selectany attribute.
---
 gcc/config/aarch64/aarch64.cc | 5 -
 gcc/config/aarch64/cygming.h  | 3 +++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 13191ec8e34..3418e57218f 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -859,7 +859,10 @@ static const attribute_spec aarch64_gnu_attributes[] =
  NULL },
   { "Advanced SIMD type", 1, 1, false, true,  false, true,  NULL, NULL },
   { "SVE type",  3, 3, false, true,  false, true,  NULL, NULL 
},
-  { "SVE sizeless type",  0, 0, false, true,  false, true,  NULL, NULL }
+  { "SVE sizeless type",  0, 0, false, true,  false, true,  NULL, NULL },
+#ifdef SUBTARGET_ATTRIBUTE_TABLE
+  SUBTARGET_ATTRIBUTE_TABLE
+#endif
 };
 
 static const scoped_attribute_specs aarch64_gnu_attribute_table =
diff --git a/gcc/config/aarch64/cygming.h b/gcc/config/aarch64/cygming.h
index 0d048879311..76623153080 100644
--- a/gcc/config/aarch64/cygming.h
+++ b/gcc/config/aarch64/cygming.h
@@ -154,6 +154,9 @@ still needed for compilation.  */
 flag_stack_check = STATIC_BUILTIN_STACK_CHECK; \
   } while (0)
 
+#define SUBTARGET_ATTRIBUTE_TABLE \
+  { "selectany", 0, 0, true, false, false, false, \
+mingw_handle_selectany_attribute, NULL }
 
 #define SUPPORTS_ONE_ONLY 1
 
-- 
2.25.1



[PATCH v2 5/6] Adjust DLL import/export implementation for AArch64

2024-06-07 Thread Evgeny Karpov
The DLL import/export mingw implementation, originally from ix86, requires
minor adjustments to be compatible with AArch64.

gcc/ChangeLog:

* config/i386/cygming.h (PE_COFF_EXTERN_DECL_SHOULD_BE_LEGITIMIZED):
Declare whether an external declaration should be legitimized.
(HAVE_64BIT_POINTERS): Define whether the target supports 64-bit
pointers.
* config/mingw/mingw32.h (defined): Use the correct DllMainCRTStartup
entry function.
* config/mingw/winnt-dll.cc (defined): Exclude ix86-related code.
---
 gcc/config/i386/cygming.h | 5 +
 gcc/config/mingw/mingw32.h| 2 +-
 gcc/config/mingw/winnt-dll.cc | 2 +-
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/cygming.h b/gcc/config/i386/cygming.h
index 4bb8d7f920c..0493b3be875 100644
--- a/gcc/config/i386/cygming.h
+++ b/gcc/config/i386/cygming.h
@@ -472,3 +472,8 @@ do {\
 
 #undef GOT_ALIAS_SET
 #define GOT_ALIAS_SET mingw_GOT_alias_set ()
+
+#define PE_COFF_EXTERN_DECL_SHOULD_BE_LEGITIMIZED \
+  (ix86_cmodel == CM_LARGE_PIC || ix86_cmodel == CM_MEDIUM_PIC)
+
+#define HAVE_64BIT_POINTERS TARGET_64BIT_DEFAULT
diff --git a/gcc/config/mingw/mingw32.h b/gcc/config/mingw/mingw32.h
index fa6e307476c..0c9d5424942 100644
--- a/gcc/config/mingw/mingw32.h
+++ b/gcc/config/mingw/mingw32.h
@@ -82,7 +82,7 @@ along with GCC; see the file COPYING3.  If not see
 #endif
 
 #undef SUB_LINK_ENTRY
-#if TARGET_64BIT_DEFAULT
+#if HAVE_64BIT_POINTERS
 #define SUB_LINK_ENTRY SUB_LINK_ENTRY64
 #else
 #define SUB_LINK_ENTRY SUB_LINK_ENTRY32
diff --git a/gcc/config/mingw/winnt-dll.cc b/gcc/config/mingw/winnt-dll.cc
index 1354402a959..66c445cba77 100644
--- a/gcc/config/mingw/winnt-dll.cc
+++ b/gcc/config/mingw/winnt-dll.cc
@@ -206,7 +206,7 @@ legitimize_pe_coff_symbol (rtx addr, bool inreg)
}
 }
 
-  if (ix86_cmodel != CM_LARGE_PIC && ix86_cmodel != CM_MEDIUM_PIC)
+  if (!PE_COFF_EXTERN_DECL_SHOULD_BE_LEGITIMIZED)
 return NULL_RTX;
 
   if (GET_CODE (addr) == SYMBOL_REF
-- 
2.25.1
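
A note on the macro used under `!` in winnt-dll.cc: a macro whose body is an `||` expression only negates correctly when that body is wrapped in parentheses, because `!` binds more tightly than `==` and `||`. The sketch below shows the difference with plain functions instead of macros, so that it compiles without warnings. The enum and its values are hypothetical stand-ins, not GCC's code-model definitions.

```cpp
#include <cassert>

// Hypothetical stand-ins for code-model values; not GCC's enum cmodel.
enum cmodel_sketch : int
{
  CM_SMALL_SK = 0,
  CM_MEDIUM_PIC_SK = 1,
  CM_LARGE_PIC_SK = 2
};

// Intended meaning: !(m == LARGE_PIC || m == MEDIUM_PIC).
inline bool skip_with_parens(cmodel_sketch m)
{
  return !(m == CM_LARGE_PIC_SK || m == CM_MEDIUM_PIC_SK);
}

// How `!m == LARGE_PIC || m == MEDIUM_PIC` parses when the macro body has
// no enclosing parentheses: the `!` applies only to the first `m`.
inline bool skip_without_parens(cmodel_sketch m)
{
  const int negated = (static_cast<int>(m) == 0) ? 1 : 0;  // value of `!m`
  return negated == static_cast<int>(CM_LARGE_PIC_SK)
         || m == CM_MEDIUM_PIC_SK;
}
```

With these stand-in values the two functions disagree for the small and medium-PIC models, which is the kind of silent inversion the parentheses prevent.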



[PATCH v2 6/6] aarch64: Add DLL import/export to AArch64 target

2024-06-07 Thread Evgeny Karpov
This patch reuses the MinGW implementation to enable DLL import/export
functionality for the aarch64-w64-mingw32 target. It also modifies
environment configurations for MinGW.

gcc/ChangeLog:

* config.gcc: Add winnt-dll.o, which contains the DLL
import/export implementation.
* config/aarch64/aarch64.cc (aarch64_legitimize_pe_coff_symbol):
Add a conditional function that reuses the MinGW implementation
for COFF and does nothing otherwise.
(aarch64_load_symref_appropriately): Add dllimport
implementation.
(aarch64_expand_call): Likewise.
(aarch64_legitimize_address): Likewise.
* config/aarch64/cygming.h (SYMBOL_FLAG_DLLIMPORT): Modify MinGW
environment to support DLL import/export.
(SYMBOL_FLAG_DLLEXPORT): Likewise.
(SYMBOL_REF_DLLIMPORT_P): Likewise.
(SYMBOL_FLAG_STUBVAR): Likewise.
(SYMBOL_REF_STUBVAR_P): Likewise.
(TARGET_VALID_DLLIMPORT_ATTRIBUTE_P): Likewise.
(TARGET_ASM_FILE_END): Likewise.
(SUB_TARGET_RECORD_STUB): Likewise.
(GOT_ALIAS_SET): Likewise.
(PE_COFF_EXTERN_DECL_SHOULD_BE_LEGITIMIZED): Likewise.
(HAVE_64BIT_POINTERS): Likewise.
---
 gcc/config.gcc|  4 +++-
 gcc/config/aarch64/aarch64.cc | 37 +++
 gcc/config/aarch64/cygming.h  | 26 ++--
 3 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index d053b98efa8..331285b7b6d 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1276,10 +1276,12 @@ aarch64-*-mingw*)
tm_file="${tm_file} mingw/mingw32.h"
tm_file="${tm_file} mingw/mingw-stdint.h"
tm_file="${tm_file} mingw/winnt.h"
+   tm_file="${tm_file} mingw/winnt-dll.h"
tmake_file="${tmake_file} aarch64/t-aarch64"
target_gtfiles="$target_gtfiles \$(srcdir)/config/mingw/winnt.cc"
+   target_gtfiles="$target_gtfiles \$(srcdir)/config/mingw/winnt-dll.cc"
extra_options="${extra_options} mingw/cygming.opt mingw/mingw.opt"
-   extra_objs="${extra_objs} winnt.o"
+   extra_objs="${extra_objs} winnt.o winnt-dll.o"
c_target_objs="${c_target_objs} msformat-c.o"
d_target_objs="${d_target_objs} winnt-d.o"
tmake_file="${tmake_file} mingw/t-cygming"
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 3418e57218f..5706b9aeb6b 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -860,6 +860,10 @@ static const attribute_spec aarch64_gnu_attributes[] =
   { "Advanced SIMD type", 1, 1, false, true,  false, true,  NULL, NULL },
   { "SVE type",  3, 3, false, true,  false, true,  NULL, NULL 
},
   { "SVE sizeless type",  0, 0, false, true,  false, true,  NULL, NULL },
+#if TARGET_DLLIMPORT_DECL_ATTRIBUTES
+  { "dllimport", 0, 0, false, false, false, false, handle_dll_attribute, NULL 
},
+  { "dllexport", 0, 0, false, false, false, false, handle_dll_attribute, NULL 
},
+#endif
 #ifdef SUBTARGET_ATTRIBUTE_TABLE
   SUBTARGET_ATTRIBUTE_TABLE
 #endif
@@ -2819,6 +2823,15 @@ tls_symbolic_operand_type (rtx addr)
   return tls_kind;
 }
 
+rtx aarch64_legitimize_pe_coff_symbol (rtx addr, bool inreg)
+{
+#if TARGET_PECOFF
+  return legitimize_pe_coff_symbol (addr, inreg);
+#else
+  return NULL_RTX;
+#endif
+}
+
 /* We'll allow lo_sum's in addresses in our legitimate addresses
so that combine would take care of combining addresses where
necessary, but for generation purposes, we'll generate the address
@@ -2865,6 +2878,17 @@ static void
 aarch64_load_symref_appropriately (rtx dest, rtx imm,
   enum aarch64_symbol_type type)
 {
+  /* If legitimize returns a value
+ copy it directly to the destination and return.  */
+
+  rtx tmp = aarch64_legitimize_pe_coff_symbol (imm, true);
+
+  if (tmp)
+{
+   emit_insn (gen_rtx_SET (dest, tmp));
+   return;
+}
+
   switch (type)
 {
 case SYMBOL_SMALL_ABSOLUTE:
@@ -11233,6 +11257,12 @@ aarch64_expand_call (rtx result, rtx mem, rtx cookie, 
bool sibcall)
 
   gcc_assert (MEM_P (mem));
   callee = XEXP (mem, 0);
+
+  tmp = aarch64_legitimize_pe_coff_symbol (callee, false);
+
+  if (tmp)
+callee = tmp;
+
   mode = GET_MODE (callee);
   gcc_assert (mode == Pmode);
 
@@ -12709,6 +12739,13 @@ aarch64_anchor_offset (HOST_WIDE_INT offset, 
HOST_WIDE_INT size,
 static rtx
 aarch64_legitimize_address (rtx x, rtx /* orig_x  */, machine_mode mode)
 {
+  if (TARGET_DLLIMPORT_DECL_ATTRIBUTES)
+{
+  rtx tmp = aarch64_legitimize_pe_coff_symbol (x, true);
+  if (tmp)
+   return tmp;
+}
+
   /* Try to split X+CONST into Y=X+(CONST & ~mask), Y+(CONST&mask),
  where mask is selected by alignment and size of the offset.
  We try to pick as large a range for the offset as possible to
diff --git a/gcc/config/aarch64/cygming.h b/gcc/config/aarch64/cygming.h
index 76623153080..e26488735db 

Re: [PATCH v2] RISC-V: Implement .SAT_SUB for unsigned scalar int

2024-06-07 Thread Robin Dapp
Hi Pan,

> +  /* Step-2: lt = x < y  */
> +  riscv_emit_binary (LTU, pmode_lt, pmode_x, pmode_y);
> +
> +  /* Step-3: lt = -lt  */
> +  riscv_emit_unary (NEG, pmode_lt, pmode_lt);
> +
> +  /* Step-4: lt = ~lt  */
> +  riscv_emit_unary (NOT, pmode_lt, pmode_lt);

Can we replace steps 3 and 4 with sub lt, -1 directly when
it's supposed to be optimized like that anyway?
I was a bit irritated when reading the code because I
figured we could surely save one instruction there, but then
realized that the cover letter has the shorter sequence.
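
To spell out the identity behind that question, here is a small sketch. It reads the suggestion as subtracting 1 from lt, which is an assumption about what the shorter sequence does, and it assumes 64-bit registers. The final AND with x - y is also an assumption, since that step is not part of the quoted hunk. All function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Mask built as in the quoted steps 2-4: lt = x < y; lt = -lt; lt = ~lt.
inline std::uint64_t mask_three_steps(std::uint64_t x, std::uint64_t y)
{
  std::uint64_t lt = x < y ? 1u : 0u;  // step 2: 1 if x - y would wrap
  lt = std::uint64_t{0} - lt;          // step 3: 0 or all ones
  lt = ~lt;                            // step 4: all ones or 0
  return lt;
}

// Same mask with steps 3 and 4 folded into a single subtraction of 1,
// using the identity ~(-lt) == lt - 1.
inline std::uint64_t mask_folded(std::uint64_t x, std::uint64_t y)
{
  const std::uint64_t lt = x < y ? 1u : 0u;
  return lt - 1;  // 1 - 1 gives 0, 0 - 1 wraps to all ones
}

// Assumed final step: keep x - y only when the subtraction did not wrap.
inline std::uint64_t sat_sub_sketch(std::uint64_t x, std::uint64_t y)
{
  return (x - y) & mask_folded(x, y);
}
```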

The rest LGTM.

When you say other variants are still to be implemented
does that also include variants for zbb with min/max
or zicond?

Regards
 Robin


[PATCH] tree-optimization/115383 - EXTRACT_LAST_REDUCTION with multiple stmt copies

2024-06-07 Thread Richard Biener
The EXTRACT_LAST_REDUCTION code isn't ready to deal with multiple stmt
copies but SLP no longer checks for this.  The following adjusts
code generation to handle the situation.
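
As a rough scalar model of what chaining means here (a sketch, not the vectorizer's code): .FOLD_EXTRACT_LAST yields the last element whose mask bit is set, or the else value if none is set, and with several vector copies per iteration each copy's result has to become the else value of the next copy. The function names, the lane count and the copy count below are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar model of .FOLD_EXTRACT_LAST: the last active element of `values`,
// or `else_value` when no mask bit is set.
inline int fold_extract_last_model(int else_value,
                                   const std::vector<bool> &mask,
                                   const std::vector<int> &values)
{
  int result = else_value;
  for (std::size_t lane = 0; lane < values.size(); ++lane)
    if (mask[lane])
      result = values[lane];
  return result;
}

// Models the loop from the new testcase with `copies` vector statements of
// `lanes` elements per vector iteration.
inline int last_negative_chained(int i, int n, std::size_t lanes,
                                 std::size_t copies)
{
  int j = 0;  // scalar else value entering the loop
  if (lanes == 0 || copies == 0)
    return j;  // nothing sensible to model

  while (i < n)
    for (std::size_t copy = 0; copy < copies && i < n; ++copy)
      {
        std::vector<int> values;
        std::vector<bool> mask;
        for (std::size_t lane = 0; lane < lanes && i < n; ++lane, ++i)
          {
            values.push_back(i);
            mask.push_back(static_cast<float>(i) < 0.0f);
          }
        // The result of this copy is the else value of the next one.
        j = fold_extract_last_model(j, mask, values);
      }
  return j;
}
```

If every copy used the original else value instead of the previous copy's result, a match found by an earlier copy would be lost whenever the last copy has no active lanes.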

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Testing on aarch64 would be appreciated.  Note that the testcase in the
testsuite likely doesn't exercise .FOLD_EXTRACT_LAST with default options;
the PR asks for an SVE-capable CPU but fixed-length vectorization.

PR tree-optimization/115383
* tree-vect-stmts.cc (vectorizable_condition): Handle
generating a chain of .FOLD_EXTRACT_LAST.

* gcc.dg/vect/pr115383.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr115383.c | 20 
 gcc/tree-vect-stmts.cc   | 20 +++-
 2 files changed, 35 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr115383.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr115383.c 
b/gcc/testsuite/gcc.dg/vect/pr115383.c
new file mode 100644
index 000..92c24699146
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr115383.c
@@ -0,0 +1,20 @@
+#include "tree-vect.h"
+
+int __attribute__((noipa))
+s331 (int i, int n)
+{
+  int j = 0;
+  for (; i < n; i++)
+if ((float)i < 0.)
+  j = i;
+  return j;
+}
+
+int main()
+{
+  check_vect ();
+  int j = s331(-13, 17);
+  if (j != -1)
+abort ();
+  return 0;
+}
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 69ba906a33f..0a76e81a0ea 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -12413,6 +12413,9 @@ vectorizable_condition (vec_info *vinfo,
   reduction_type != EXTRACT_LAST_REDUCTION
   ? else_clause : NULL, vectype, &vec_oprnds3);
 
+  if (reduction_type == EXTRACT_LAST_REDUCTION)
+vec_else_clause = else_clause;
+
   /* Arguments are ready.  Create the new vector stmt.  */
   FOR_EACH_VEC_ELT (vec_oprnds0, i, vec_cond_lhs)
 {
@@ -12555,17 +12558,24 @@ vectorizable_condition (vec_info *vinfo,
{
  gimple *old_stmt = vect_orig_stmt (stmt_info)->stmt;
  tree lhs = gimple_get_lhs (old_stmt);
+ if ((unsigned)i != vec_oprnds0.length () - 1)
+   lhs = copy_ssa_name (lhs);
  if (len)
new_stmt = gimple_build_call_internal
-   (IFN_LEN_FOLD_EXTRACT_LAST, 5, else_clause, vec_compare,
-vec_then_clause, len, bias);
+   (IFN_LEN_FOLD_EXTRACT_LAST, 5, vec_else_clause, vec_compare,
+vec_then_clause, len, bias);
  else
new_stmt = gimple_build_call_internal
-   (IFN_FOLD_EXTRACT_LAST, 3, else_clause, vec_compare,
-vec_then_clause);
+   (IFN_FOLD_EXTRACT_LAST, 3, vec_else_clause, vec_compare,
+vec_then_clause);
  gimple_call_set_lhs (new_stmt, lhs);
  SSA_NAME_DEF_STMT (lhs) = new_stmt;
- if (old_stmt == gsi_stmt (*gsi))
+ if ((unsigned)i != vec_oprnds0.length () - 1)
+   {
+ vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
+ vec_else_clause = lhs;
+   }
+ else if (old_stmt == gsi_stmt (*gsi))
vect_finish_replace_stmt (vinfo, stmt_info, new_stmt);
  else
{
-- 
2.35.3


Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

2024-06-07 Thread Ajit Agarwal
Hello Richard:

On 07/06/24 1:52 pm, Richard Sandiford wrote:
> Ajit Agarwal  writes:
 +
 +  df_ref use;
 +  df_insn_info *insn_info = DF_INSN_INFO_GET (info->rtl ());
 +  FOR_EACH_INSN_INFO_DEF (use, insn_info)
 +{
 +  struct df_link *def_link = DF_REF_CHAIN (use);
 +
 +  if (!def_link || !def_link->ref
 +|| DF_REF_IS_ARTIFICIAL (def_link->ref))
 +  continue;
 +
 +  while (def_link && def_link->ref)
 +  {
 +rtx_insn *insn = DF_REF_INSN (def_link->ref);
 +if (GET_CODE (PATTERN (insn)) == PARALLEL)
 +  return false;
>>>
>>> Why do you need to skip PARALLELs?
>>>
>>
>> vec_select with parallel gives failures in final.cc: "can't split-up with
>> subreg 128 (reg OO"
>> That's why I have added this.
>
> But in (vec_select ... (parallel ...)), the parallel won't be the 
> PATTERN (insn).  It'll instead be a suboperand of the vec_select.
>
> Here too it's important to understand why the final.cc failure occurs
> and what the correct fix is.
>

 subreg with vec_select operand already exists before fusion pass.
 We overwrite them with subreg 128 bits from 256 OO mode operand.
>>>
>>> But why is that wrong?  What was the full rtl of the subreg before the
>>> pass runs, what did the subreg look like after the pass, and why is the
>>> change not correct?
>>>
>>> In general, there are two main ways that an rtl change can be incorrect:
>>>
>>> (1) The new rtl isn't well-formed (such as (subreg (subreg X A) B)).
>>> In this case, the new rtl makes no inherent sense when viewed
>>> in isolation: it isn't necessary to see the old rtl to tell that
>>> the new rtl is wrong.
>>>
>>> (2) The new rtl is well-formed (i.e. makes inherent sense when viewed in
>>> isolation) but it does not have the same semantics as the old rtl.
>>> In other words, the new rtl is describing a different operation
>>> from the old rtl.
>>>
>>> I think we need to talk about it in those terms, rather than where
>>> the eventual ICE occurs.
>>>
>> Before the fusion.
>> old rtl looks like this:
>>
>> (vec_select:HI (subreg:V8HI (reg:V16QI 125 [ vect__29.38 ]) 0)
>>
>> After the fusion
>> new rtl looks like this:
>>
>> (vec_select:HI (subreg:V16QI (reg:OO 125 [ vect__29.38 ]) 16)
>>
>> The new rtl is not well-formed.
>>
>> That's why it's failing.
>>
>> reg:V16QI 125 is the destination of the load that needs to be fused.
> 
> This indicates that there's a bug in the substitution code.
> 
> It's probably better to create a fresh OO register, rather than
> change an existing 128-bit register to 256 bits.  If we do that,
> and if reg:V16QI 125 is the destination of the second load
> (which I assume it is from the 16 offset in the subreg),
> then the new RTL should be:
> 
>   (vec_select:HI (subreg:V8HI (reg:OO NEW_REG) 16) ...)
> 
> It's possible to get this by using insn_propagation to replace
> (reg:V16QI 125) with (subreg:V16QI (reg:OO NEW_REG) 16).
> insn_propagation should then take care of the rest.
> 
> There are no existing rtl-ssa routines for handling new registers
> though.  (The idea was to add things as the need arose.)
> 

Sure I will do that. Thanks.
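
To make the suggested substitution concrete, here is a minimal sketch of the byte layout it relies on.  Reg128, Reg256, fuse_loads and subreg128 are hypothetical names invented for this illustration, not GCC types or rtl-ssa routines, and the model ignores endianness and register-pair ordering.  It only shows that reading 16 bytes at byte offset 16 of the fused value gives back what the second load produced, which is what (subreg:V16QI (reg:OO NEW_REG) 16) is meant to express.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for a 128-bit vector register and a fresh
// 256-bit OO-mode register pair.  These are not GCC types.
using Reg128 = std::array<std::uint8_t, 16>;
using Reg256 = std::array<std::uint8_t, 32>;

// Model of the fused load: the first load fills bytes 0..15 and the
// second load fills bytes 16..31 of the pair.
Reg256
fuse_loads (const Reg128 &first, const Reg128 &second)
{
  Reg256 pair{};
  std::copy (first.begin (), first.end (), pair.begin ());
  std::copy (second.begin (), second.end (), pair.begin () + 16);
  return pair;
}

// Model of a 128-bit subreg of the pair at BYTE_OFFSET.  Only the
// offsets 0 and 16 are meaningful in this sketch.
Reg128
subreg128 (const Reg256 &pair, std::size_t byte_offset)
{
  Reg128 half{};
  std::copy_n (pair.begin () + byte_offset, half.size (), half.begin ());
  return half;
}
```

In this model every use of the old 128-bit destination keeps its own mode and simply reads through subreg128 (pair, 16), so nothing that used to be 128 bits wide silently becomes 256 bits wide.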

 Due to this, final.cc couldn't split at line 2807 and bails
 out with fatal_insn.

 Currently we don't support an already existing subreg vector operand
 when generating register pairs.
 We should bail out from the fusion pass in this case.
 +
 +rtx set = single_set (insn);
 +if (set == NULL_RTX)
 +  return false;
 +
 +rtx op0 = SET_SRC (set);
 +rtx_code code = GET_CODE (op0);
 +
 +// This check is added as register pairs are not generated
 +// by RA for neg:V2DF (fma: V2DF (reg1)
 +//  (reg2)
 +//  (neg:V2DF (reg3)))
 +if (GET_RTX_CLASS (code) == RTX_UNARY)
 +  return false;
>>>
>>> What's special about (neg (fma ...))?
>>>
>>
>> I am not sure why the register allocator fails to allocate register pairs
>> for a NEG unary operation with an fma operand. I have not debugged the
>> register allocator to find out why.
>

 For neg (fma ...) cases because of subreg 128 bits from OOmode 256 bits are
 set correctly. 
 IRA marked them as spill candidates because the spill priority is zero.

 Due to this, the LRA reload pass couldn't allocate register pairs.
>>>
>>> I think this is just restating the symptom though.  I suppose the same
>>> kind of questions apply here too: what was the instruction before the
>>> pass runs, what was the instruction after the pass runs, and why is
>>> the rtl change incorrect (by the meaning above)?
>>>
>>
>> Ori

Re: [x86 PATCH] PR target/115351: RTX costs for *concatditi3 and *insvti_highpart.

2024-06-07 Thread Uros Bizjak
On Fri, Jun 7, 2024 at 11:21 AM Roger Sayle  wrote:
>
>
> This patch addresses PR target/115351, which is a code quality regression
> on x86 when passing floating point complex numbers.  The ABI considers
> these arguments to have TImode, requiring interunit moves to place the
> FP values (which are actually passed in SSE registers) into the upper
> and lower parts of a TImode pseudo, and then similar moves back again
> before they can be used.
>
> The cause of the regression is that changes in how TImode initialization
> is represented in RTL now prevent the RTL optimizers from eliminating
> these redundant moves.  The specific cause is that the *concatditi3
> pattern, (zext(hi)<<64)|zext(lo), has an inappropriately high (default)
> rtx_cost, preventing fwprop1 from propagating it.  This pattern just
> sets the hipart and lopart of a double-word register, typically two
> instructions (fewer if reload can allocate things appropriately) but
> the current ix86_rtx_costs actually returns INSN_COSTS(13), i.e. 52.
>
> propagating insn 5 into insn 6, replacing:
> (set (reg:TI 110)
> (ior:TI (and:TI (reg:TI 110)
> (const_wide_int 0x0))
> (ashift:TI (zero_extend:TI (subreg:DI (reg:DF 112 [ zD.2796+8 ]) 0))
> (const_int 64 [0x40]
> successfully matched this instruction to *concatditi3_3:
> (set (reg:TI 110)
> (ior:TI (ashift:TI (zero_extend:TI (subreg:DI (reg:DF 112 [ zD.2796+8 ])
> 0))
> (const_int 64 [0x40]))
> (zero_extend:TI (subreg:DI (reg:DF 111 [ zD.2796 ]) 0
> change not profitable (cost 50 -> cost 52)
>
> This issue is resolved by having ix86_rtx_costs return more reasonable
> values for these (place-holder) patterns.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2024-06-07  Roger Sayle  
>
> gcc/ChangeLog
> PR target/115351
> * config/i386/i386.cc (ix86_rtx_costs): Provide estimates for the
> *concatditi3 and *insvti_highpart patterns, about two insns.
>
> gcc/testsuite/ChangeLog
> PR target/115351
> * g++.target/i386/pr115351.C: New test case.

LGTM.

Thanks,
Uros.

>
>
> Thanks in advance (and sorry for any inconvenience),
> Roger
> --
>


Re: [PATCH v2 2/6] Extract ix86 dllimport implementation to mingw

2024-06-07 Thread Uros Bizjak
On Fri, Jun 7, 2024 at 11:48 AM Evgeny Karpov
 wrote:
>
> This patch extracts the ix86 implementation for expanding a SYMBOL
> into its corresponding dllimport, far-address, or refptr symbol.
> It will be reused in the aarch64-w64-mingw32 target.
> The implementation is copied as is from i386/i386.cc with
> minor changes to follow the code style.
>
> Also this patch replaces the original DLL import/export
> implementation in ix86 with mingw.
>
> gcc/ChangeLog:
>
> * config.gcc: Add winnt-dll.o, which contains the DLL
> import/export implementation.
> * config/i386/cygming.h (SUB_TARGET_RECORD_STUB): Remove the
> old implementation. Rename the required function to MinGW.
> Use MinGW implementation for COFF and nothing otherwise.
> (GOT_ALIAS_SET): Likewise.
> * config/i386/i386-expand.cc (ix86_expand_move): Likewise.
> * config/i386/i386-expand.h (ix86_GOT_alias_set): Likewise.
> (legitimize_pe_coff_symbol): Likewise.
> * config/i386/i386-protos.h (i386_pe_record_stub): Likewise.
> * config/i386/i386.cc (is_imported_p): Likewise.
> (legitimate_pic_address_disp_p): Likewise.
> (ix86_GOT_alias_set): Likewise.
> (legitimize_pic_address): Likewise.
> (legitimize_tls_address): Likewise.
> (struct dllimport_hasher): Likewise.
> (GTY): Likewise.
> (get_dllimport_decl): Likewise.
> (legitimize_pe_coff_extern_decl): Likewise.
> (legitimize_dllimport_symbol): Likewise.
> (legitimize_pe_coff_symbol): Likewise.
> (ix86_legitimize_address): Likewise.
> * config/i386/i386.h (GOT_ALIAS_SET): Likewise.
> * config/mingw/winnt.cc (i386_pe_record_stub): Likewise.
> (mingw_pe_record_stub): Likewise.
> * config/mingw/winnt.h (mingw_pe_record_stub): Likewise.
> * config/mingw/t-cygming: Add the winnt-dll.o compilation.
> * config/mingw/winnt-dll.cc: New file.
> * config/mingw/winnt-dll.h: New file.

LGTM for generic x86 changes.

Thanks,
Uros.

> ---
>  gcc/config.gcc |  12 +-
>  gcc/config/i386/cygming.h  |   5 +-
>  gcc/config/i386/i386-expand.cc |   4 +-
>  gcc/config/i386/i386-expand.h  |   2 -
>  gcc/config/i386/i386-protos.h  |   1 -
>  gcc/config/i386/i386.cc| 205 ++---
>  gcc/config/i386/i386.h |   2 +
>  gcc/config/mingw/t-cygming |   6 +
>  gcc/config/mingw/winnt-dll.cc  | 231 +
>  gcc/config/mingw/winnt-dll.h   |  30 +
>  gcc/config/mingw/winnt.cc  |   2 +-
>  gcc/config/mingw/winnt.h   |   1 +
>  12 files changed, 298 insertions(+), 203 deletions(-)
>  create mode 100644 gcc/config/mingw/winnt-dll.cc
>  create mode 100644 gcc/config/mingw/winnt-dll.h
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 553a310f4bd..d053b98efa8 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -2177,11 +2177,13 @@ i[4567]86-wrs-vxworks*|x86_64-wrs-vxworks7*)
>  i[34567]86-*-cygwin*)
> tm_file="${tm_file} i386/unix.h i386/bsd.h i386/gas.h i386/cygming.h 
> i386/cygwin.h i386/cygwin-stdint.h"
> tm_file="${tm_file} mingw/winnt.h"
> +   tm_file="${tm_file} mingw/winnt-dll.h"
> xm_file=i386/xm-cygwin.h
> tmake_file="${tmake_file} mingw/t-cygming t-slibgcc"
> target_gtfiles="$target_gtfiles \$(srcdir)/config/mingw/winnt.cc"
> +   target_gtfiles="$target_gtfiles \$(srcdir)/config/mingw/winnt-dll.cc"
> extra_options="${extra_options} mingw/cygming.opt i386/cygwin.opt"
> -   extra_objs="${extra_objs} winnt.o winnt-stubs.o"
> +   extra_objs="${extra_objs} winnt.o winnt-stubs.o winnt-dll.o"
> c_target_objs="${c_target_objs} msformat-c.o"
> cxx_target_objs="${cxx_target_objs} winnt-cxx.o msformat-c.o"
> d_target_objs="${d_target_objs} cygwin-d.o"
> @@ -2196,11 +2198,13 @@ x86_64-*-cygwin*)
> need_64bit_isa=yes
> tm_file="${tm_file} i386/unix.h i386/bsd.h i386/gas.h i386/cygming.h 
> i386/cygwin.h i386/cygwin-w64.h i386/cygwin-stdint.h"
> tm_file="${tm_file} mingw/winnt.h"
> +   tm_file="${tm_file} mingw/winnt-dll.h"
> xm_file=i386/xm-cygwin.h
> tmake_file="${tmake_file} mingw/t-cygming t-slibgcc"
> target_gtfiles="$target_gtfiles \$(srcdir)/config/mingw/winnt.cc"
> +   target_gtfiles="$target_gtfiles \$(srcdir)/config/mingw/winnt-dll.cc"
> extra_options="${extra_options} mingw/cygming.opt i386/cygwin.opt"
> -   extra_objs="${extra_objs} winnt.o winnt-stubs.o"
> +   extra_objs="${extra_objs} winnt.o winnt-stubs.o winnt-dll.o"
> c_target_objs="${c_target_objs} msformat-c.o"
> cxx_target_objs="${cxx_target_objs} winnt-cxx.o msformat-c.o"
> d_target_objs="${d_target_objs} cygwin-d.o"
> @@ -2266,6 +2270,7 @@ i[34567]86-*-mingw* | x86_64-*-mingw*)
> esac
> tm_file="${tm_file} mingw/mingw-stdint.h"
> 

[PATCH] tree-optimization/114107 - avoid peeling for gaps in more cases

2024-06-07 Thread Richard Biener
The following refactors the code to detect necessary peeling for
gaps, in particular the PR103116 case when there is no gap but
the group size is smaller than the vector size.  The testcase in
PR114107 shows we fail to SLP

  for (int i=0; i

[PATCH] c++: Make *_cast<*> parsing more robust to errors [PR108438]

2024-06-07 Thread Simon Martin
We ICE upon the following when trying to emit a -Wlogical-not-parentheses
warning:

=== cut here ===
template  T foo (T arg, T& ref, T* ptr) {
  int a = 1;
  return static_cast(a);
}
=== cut here ===

This patch makes *_cast<*> parsing more robust by skipping to the closing '>'
upon error in the target type.

Successfully tested on x86_64-pc-linux-gnu.

(Note that I have a patch pending review that also adds g++.dg/parse/crash74.C;
I will obviously handle the name conflict at commit time)

PR c++/108438

gcc/cp/ChangeLog:

* parser.cc (cp_parser_postfix_expression): Skip to the closing '>'
upon error parsing the target type of *_cast<*> expressions.

gcc/testsuite/ChangeLog:

* g++.dg/parse/crash74.C: New test.

---
 gcc/cp/parser.cc | 3 ++-
 gcc/testsuite/g++.dg/parse/crash74.C | 9 +
 2 files changed, 11 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/parse/crash74.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index bc4a2359153..3516c2aa38b 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -7569,7 +7569,8 @@ cp_parser_postfix_expression (cp_parser *parser, bool 
address_p, bool cast_p,
  NULL);
parser->in_type_id_in_expr_p = saved_in_type_id_in_expr_p;
/* Look for the closing `>'.  */
-   cp_parser_require (parser, CPP_GREATER, RT_GREATER);
+   if (!cp_parser_require (parser, CPP_GREATER, RT_GREATER))
+ cp_parser_skip_to_end_of_template_parameter_list (parser);
/* Restore the old message.  */
parser->type_definition_forbidden_message = saved_message;
 
diff --git a/gcc/testsuite/g++.dg/parse/crash74.C 
b/gcc/testsuite/g++.dg/parse/crash74.C
new file mode 100644
index 000..81a16e35b14
--- /dev/null
+++ b/gcc/testsuite/g++.dg/parse/crash74.C
@@ -0,0 +1,9 @@
+// PR c++/108438
+// { dg-options "-Wlogical-not-parentheses" }
+
+template 
+T foo (T arg, T& ref, T* ptr)
+{
+  int a = 1;
+  return static_cast(a); // { dg-error "expected" }
+}
-- 
2.44.0
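
The recovery idea in this patch, skipping to the closing '>' once the target type fails to parse, can be sketched on a toy token stream.  This is not GCC's parser: skip_to_closing_angle and the string-based tokens are hypothetical, and the real routines have more cases to deal with than this sketch does.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy recovery helper.  POS points just inside an already opened '<'.
// Return the index just past the matching '>', or tokens.size () if
// the stream ends before one is found.
std::size_t
skip_to_closing_angle (const std::vector<std::string> &tokens,
                       std::size_t pos)
{
  int depth = 1;  // We are already inside one '<'.
  for (; pos < tokens.size (); ++pos)
    {
      if (tokens[pos] == "<")
        ++depth;
      else if (tokens[pos] == ">" && --depth == 0)
        return pos + 1;
    }
  return tokens.size ();
}
```

After such a skip the parser resumes at the '(' that follows the cast's angle brackets, so later code sees a consistent token position instead of whatever was left behind by the failed type parse.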




RE: [PATCH v2] RISC-V: Implement .SAT_SUB for unsigned scalar int

2024-06-07 Thread Li, Pan2
Thanks Robin for comments.

> Can we replace step 3 and 4 with sub lt, -1 directly when
> it's supposed to be optimized like that anyway?

Sure thing, will update in v3.

> When you say other variants are still to be implemented
> does that also include variants for zbb with min/max
> or zicond?

No, I mean that some other forms, such as the branch form, need improvements
in the middle end (aka widen_mul).

Pan

-Original Message-
From: Robin Dapp  
Sent: Friday, June 7, 2024 6:18 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: rdapp@gmail.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com
Subject: Re: [PATCH v2] RISC-V: Implement .SAT_SUB for unsigned scalar int

Hi Pan,

> +  /* Step-2: lt = x < y  */
> +  riscv_emit_binary (LTU, pmode_lt, pmode_x, pmode_y);
> +
> +  /* Step-3: lt = -lt  */
> +  riscv_emit_unary (NEG, pmode_lt, pmode_lt);
> +
> +  /* Step-4: lt = ~lt  */
> +  riscv_emit_unary (NOT, pmode_lt, pmode_lt);

Can we replace step 3 and 4 with sub lt, -1 directly when
it's supposed to be optimized like that anyway?
I was a bit irritated when reading the code because I
figured we could surely save one instruction there but then
realized that the cover letter has the shorter sequence.

The rest LGTM.

When you say other variants are still to be implemented
does that also include variants for zbb with min/max
or zicond?

Regards
 Robin


Re: [PATCH v2] RISC-V: Implement .SAT_SUB for unsigned scalar int

2024-06-07 Thread Robin Dapp
>> When you say other variants are still to be implemented
>> does that also include variants for zbb with min/max
>> or zicond?
> 
> No, I mean some other forms like branch need the improvement from the
> middle end(aka widen_mul).

Ah, I see, thanks.  Those can save one instruction and we want them
at some point.  No need to add them now but maybe add a TODO for later.

Regards
 Robin



[PATCH v3] RISC-V: Implement .SAT_SUB for unsigned scalar int

2024-06-07 Thread pan2 . li
From: Pan Li 

Now that the middle-end support for .SAT_SUB has been committed, implement
.SAT_SUB for unsigned scalar int in the riscv backend.  Consider the example
code below:

#define DEF_SAT_U_SUB_FMT_1(T)     \
T __attribute__((noinline))        \
sat_u_sub_##T##_fmt_1 (T x, T y)   \
{                                  \
  return (x - y) & (-(T)(x >= y)); \
}

#define DEF_SAT_U_SUB_FMT_2(T)    \
T __attribute__((noinline))       \
sat_u_sub_##T##_fmt_2 (T x, T y)  \
{                                 \
  return (x - y) & (-(T)(x > y)); \
}

DEF_SAT_U_SUB_FMT_1(uint64_t);
DEF_SAT_U_SUB_FMT_2(uint64_t);

Before this patch:
sat_u_sub_uint64_t_fmt_1:
        bltu    a0,a1,.L2
        sub     a0,a0,a1
        ret
.L2:
        li      a0,0
        ret

After this patch:
sat_u_sub_uint64_t_fmt_1:
        sltu    a5,a0,a1
        addi    a5,a5,-1
        sub     a0,a0,a1
        and     a0,a5,a0
        ret
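
As a cross-check of the branchless sequence, here is a small C++ sketch that mirrors it step by step next to the reference form from the description.  The function names are made up for the illustration and uint64_t is fixed for brevity.  The key step is that lt - 1 is zero when x < y and all ones otherwise, which is the same value as ~(-lt) in two's complement, so the separate negate and not steps of v2 collapse into one addi.

```cpp
#include <cstdint>

// Reference form from the description: (x - y) & (-(T)(x >= y)).
std::uint64_t
sat_u_sub_ref (std::uint64_t x, std::uint64_t y)
{
  return (x - y) & (-static_cast<std::uint64_t> (x >= y));
}

// Step by step, mirroring sltu / addi -1 / sub / and.
std::uint64_t
sat_u_sub_seq (std::uint64_t x, std::uint64_t y)
{
  std::uint64_t lt = x < y;     // sltu: 1 if x < y, else 0.
  std::uint64_t mask = lt - 1;  // addi -1: 0 if x < y, else all ones.
  std::uint64_t diff = x - y;   // sub: wraps modulo 2^64.
  return diff & mask;           // and: keep diff or clamp to 0.
}
```

Both functions agree on every input; the second one just makes the role of each of the four instructions visible.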

ToDo:
Only the above 2 forms of .SAT_SUB are supported for now; we will
support more forms of .SAT_SUB in the middle-end in the near future.

The test suites below pass with this patch.
* The rv64gcv full regression test.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_expand_ussub): Add new func
decl for ussub expanding.
* config/riscv/riscv.cc (riscv_expand_ussub): Ditto but for impl.
* config/riscv/riscv.md (ussub<mode>3): Add new pattern ussub
for scalar modes.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test macros and comments.
* gcc.target/riscv/sat_u_sub-1.c: New test.
* gcc.target/riscv/sat_u_sub-2.c: New test.
* gcc.target/riscv/sat_u_sub-3.c: New test.
* gcc.target/riscv/sat_u_sub-4.c: New test.
* gcc.target/riscv/sat_u_sub-5.c: New test.
* gcc.target/riscv/sat_u_sub-6.c: New test.
* gcc.target/riscv/sat_u_sub-7.c: New test.
* gcc.target/riscv/sat_u_sub-8.c: New test.
* gcc.target/riscv/sat_u_sub-run-1.c: New test.
* gcc.target/riscv/sat_u_sub-run-2.c: New test.
* gcc.target/riscv/sat_u_sub-run-3.c: New test.
* gcc.target/riscv/sat_u_sub-run-4.c: New test.
* gcc.target/riscv/sat_u_sub-run-5.c: New test.
* gcc.target/riscv/sat_u_sub-run-6.c: New test.
* gcc.target/riscv/sat_u_sub-run-7.c: New test.
* gcc.target/riscv/sat_u_sub-run-8.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv-protos.h   |  1 +
 gcc/config/riscv/riscv.cc | 35 +++
 gcc/config/riscv/riscv.md | 11 ++
 gcc/testsuite/gcc.target/riscv/sat_arith.h| 23 
 gcc/testsuite/gcc.target/riscv/sat_u_sub-1.c  | 18 ++
 gcc/testsuite/gcc.target/riscv/sat_u_sub-2.c  | 19 ++
 gcc/testsuite/gcc.target/riscv/sat_u_sub-3.c  | 18 ++
 gcc/testsuite/gcc.target/riscv/sat_u_sub-4.c  | 17 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-5.c  | 18 ++
 gcc/testsuite/gcc.target/riscv/sat_u_sub-6.c  | 19 ++
 gcc/testsuite/gcc.target/riscv/sat_u_sub-7.c  | 18 ++
 gcc/testsuite/gcc.target/riscv/sat_u_sub-8.c  | 17 +
 .../gcc.target/riscv/sat_u_sub-run-1.c| 25 +
 .../gcc.target/riscv/sat_u_sub-run-2.c| 25 +
 .../gcc.target/riscv/sat_u_sub-run-3.c| 25 +
 .../gcc.target/riscv/sat_u_sub-run-4.c| 25 +
 .../gcc.target/riscv/sat_u_sub-run-5.c| 25 +
 .../gcc.target/riscv/sat_u_sub-run-6.c| 25 +
 .../gcc.target/riscv/sat_u_sub-run-7.c| 25 +
 .../gcc.target/riscv/sat_u_sub-run-8.c| 25 +
 20 files changed, 414 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-8.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 0704968561b..09eb3a574e3 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -134,6 +134,7 @@ extern bool
 riscv_zcmp_valid_stack_adj_bytes_p (HOST_WIDE_INT, int);
 ex

Re: [PATCH v3] RISC-V: Implement .SAT_SUB for unsigned scalar int

2024-06-07 Thread Robin Dapp
LGTM.

Let's keep in mind that min/max will save us two insns(?)
and a conditional move would save us one.

Regards
 Robin
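
To spell out the two alternatives mentioned here, this is a sketch of the same unsigned saturating subtraction written in a max form and in a select form.  The function names are hypothetical, and whether a given compiler actually emits a Zbb maxu or a Zicond conditional-zero instruction for them is not claimed; the point is only that max plus sub is two operations and sub plus compare plus select is three, compared with four in the mask sequence.

```cpp
#include <algorithm>
#include <cstdint>

// max form: max (x, y) - y is x - y when x >= y, and y - y == 0 otherwise.
std::uint64_t
sat_u_sub_max (std::uint64_t x, std::uint64_t y)
{
  return std::max (x, y) - y;
}

// Select form: compute the wrapped difference, then pick it or zero.
std::uint64_t
sat_u_sub_select (std::uint64_t x, std::uint64_t y)
{
  std::uint64_t diff = x - y;
  return x < y ? 0 : diff;
}
```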


[PATCH] lto: Fix build on MacOS

2024-06-07 Thread Simon Martin
The build fails on x86_64-apple-darwin19.6.0 starting with 5b6d5a886ee because
<vector> is included after system.h and runs into poisoned identifiers.

This patch fixes this by defining INCLUDE_VECTOR before including system.h.

Validated by doing a full build on x86_64-apple-darwin19.6.0.

gcc/lto/ChangeLog:

* lto-partition.cc: Define INCLUDE_VECTOR to avoid running into
poisoned identifiers.

---
 gcc/lto/lto-partition.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/lto/lto-partition.cc b/gcc/lto/lto-partition.cc
index 44b457d0b2a..2238650fa0e 100644
--- a/gcc/lto/lto-partition.cc
+++ b/gcc/lto/lto-partition.cc
@@ -18,6 +18,7 @@ along with GCC; see the file COPYING3.  If not see
 .  */
 
 #include "config.h"
+#define INCLUDE_VECTOR
 #include "system.h"
 #include "coretypes.h"
 #include "target.h"
@@ -38,7 +39,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "lto-partition.h"
 
 #include 
-#include <vector>
 
 vec ltrans_partitions;
 
-- 
2.44.0
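
The ordering problem can be reproduced outside GCC with a few lines.  The sketch below is an illustration only: it uses malloc as an example of a poisoned identifier and a made-up function sum_of_squares, and it relies on the GCC/Clang "#pragma GCC poison" directive.  Standard headers that are lexed before the pragma are fine; a header that mentions the identifier and is first included after the pragma is rejected, which is why the patch asks system.h to pull in the header itself via INCLUDE_VECTOR.

```cpp
// Standard headers first, while no identifier is poisoned yet.
#include <cstdlib>
#include <vector>

// Stand-in for what a project-wide header does after its own includes.
// From here on, any new appearance of the token is a hard error.
#pragma GCC poison malloc

// Code after the pragma can still use the containers declared above,
// because their headers were lexed before the identifier was poisoned.
int
sum_of_squares (int n)
{
  std::vector<int> values;
  for (int i = 1; i <= n; ++i)
    values.push_back (i * i);
  int total = 0;
  for (int v : values)
    total += v;
  return total;
}
```

Moving the #include <cstdlib> line below the pragma would be expected to make the compilation fail, which is the same kind of failure as the one reported for Darwin.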




Re: [PATCH] lto: Fix build on MacOS

2024-06-07 Thread Richard Biener



> On 07.06.2024 at 16:30, Simon Martin wrote:
> 
> The build fails on x86_64-apple-darwin19.6.0 starting with 5b6d5a886ee because
> <vector> is included after system.h and runs into poisoned identifiers.
> 
> This patch fixes this by defining INCLUDE_VECTOR before including system.h.

Ok

> Validated by doing a full build on x86_64-apple-darwin19.6.0.
> 
> gcc/lto/ChangeLog:
> 
>* lto-partition.cc: Define INCLUDE_VECTOR to avoid running into
>poisoned identifiers.
> 
> ---
> gcc/lto/lto-partition.cc | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/lto/lto-partition.cc b/gcc/lto/lto-partition.cc
> index 44b457d0b2a..2238650fa0e 100644
> --- a/gcc/lto/lto-partition.cc
> +++ b/gcc/lto/lto-partition.cc
> @@ -18,6 +18,7 @@ along with GCC; see the file COPYING3.  If not see
> .  */
> 
> #include "config.h"
> +#define INCLUDE_VECTOR
> #include "system.h"
> #include "coretypes.h"
> #include "target.h"
> @@ -38,7 +39,6 @@ along with GCC; see the file COPYING3.  If not see
> #include "lto-partition.h"
> 
> #include 
> -#include <vector>
> 
> vec ltrans_partitions;
> 
> --
> 2.44.0
> 
> 


[PATCH] c++: lambda in pack expansion [PR115378]

2024-06-07 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/14?

-- >8 --

Here find_parameter_packs_r is incorrectly treating the 'auto' return
type of a lambda as a parameter pack due to Concepts-TS specific logic
added in r6-4517, leading to confusion later when expanding the pattern.

Since we intend to remove Concepts TS support soon anyway, this patch
fixes this by restricting the problematic logic with flag_concepts_ts.
Doing so revealed that add_capture was relying on this logic to set
TEMPLATE_TYPE_PARAMETER_PACK for the 'auto' type of an init-capture pack
expansion, which we now need to do explicitly.

PR c++/115378

gcc/cp/ChangeLog:

* lambda.cc (lambda_capture_field_type): Set
TEMPLATE_TYPE_PARAMETER_PACK on the auto type of an init-capture
pack expansion.
* pt.cc (find_parameter_packs_r) :
Restrict TEMPLATE_TYPE_PARAMETER_PACK promotion with
flag_concepts_ts.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/decltype-auto-103497.C: Adjust expected diagnostic.
* g++.dg/template/pr95672.C: Likewise.
* g++.dg/cpp2a/lambda-targ5.C: New test.
---
 gcc/cp/lambda.cc  |  3 ++-
 gcc/cp/pt.cc  |  2 +-
 gcc/testsuite/g++.dg/cpp1y/decltype-auto-103497.C |  2 +-
 gcc/testsuite/g++.dg/cpp2a/lambda-targ5.C | 15 +++
 gcc/testsuite/g++.dg/template/pr95672.C   |  2 +-
 5 files changed, 20 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-targ5.C

diff --git a/gcc/cp/lambda.cc b/gcc/cp/lambda.cc
index 630cc4eade1..0770417810e 100644
--- a/gcc/cp/lambda.cc
+++ b/gcc/cp/lambda.cc
@@ -223,7 +223,8 @@ lambda_capture_field_type (tree expr, bool explicit_init_p,
   outermost CV qualifiers of EXPR.  */
type = build_reference_type (type);
   if (uses_parameter_packs (expr))
-   /* Stick with 'auto' even if the type could be deduced.  */;
+   /* Stick with 'auto' even if the type could be deduced.  */
+   TEMPLATE_TYPE_PARAMETER_PACK (auto_node) = true;
   else
type = do_auto_deduction (type, expr, auto_node);
 }
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index dfce1b3c359..6ee27d6fa16 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -3940,7 +3940,7 @@ find_parameter_packs_r (tree *tp, int *walk_subtrees, 
void* data)
 parameter pack (14.6.3), or the type-specifier-seq of a type-id that
 is a pack expansion, the invented template parameter is a template
 parameter pack.  */
-  if (ppd->type_pack_expansion_p && is_auto (t)
+  if (flag_concepts_ts && ppd->type_pack_expansion_p && is_auto (t)
  && TEMPLATE_TYPE_LEVEL (t) != 0)
TEMPLATE_TYPE_PARAMETER_PACK (t) = true;
   if (TEMPLATE_TYPE_PARAMETER_PACK (t))
diff --git a/gcc/testsuite/g++.dg/cpp1y/decltype-auto-103497.C 
b/gcc/testsuite/g++.dg/cpp1y/decltype-auto-103497.C
index cedd661710c..4162361d14f 100644
--- a/gcc/testsuite/g++.dg/cpp1y/decltype-auto-103497.C
+++ b/gcc/testsuite/g++.dg/cpp1y/decltype-auto-103497.C
@@ -1,7 +1,7 @@
 // PR c++/103497
 // { dg-do compile { target c++14 } }
 
-void foo(decltype(auto)... args);  // { dg-error "cannot declare a parameter 
with .decltype.auto.." }
+void foo(decltype(auto)... args);  // { dg-error "contains no parameter packs" 
}
 
 int main() {
   foo();
diff --git a/gcc/testsuite/g++.dg/cpp2a/lambda-targ5.C 
b/gcc/testsuite/g++.dg/cpp2a/lambda-targ5.C
new file mode 100644
index 000..efd4bb45d58
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/lambda-targ5.C
@@ -0,0 +1,15 @@
+// PR c++/115378
+// { dg-do compile { target c++20 } }
+
+struct tt {};
+
+template
+constexpr auto __counter = 1;
+
+template 
+using _as_base = tt;
+
+template 
+struct env : _as_base>... {};
+
+env t;
diff --git a/gcc/testsuite/g++.dg/template/pr95672.C 
b/gcc/testsuite/g++.dg/template/pr95672.C
index c752b4a2c08..d97b8db2e97 100644
--- a/gcc/testsuite/g++.dg/template/pr95672.C
+++ b/gcc/testsuite/g++.dg/template/pr95672.C
@@ -1,3 +1,3 @@
 // PR c++/95672
 // { dg-do compile { target c++14 } }
-struct g_class : decltype  (auto) ... {  }; // { dg-error "invalid use of pack 
expansion" }
+struct g_class : decltype  (auto) ... {  }; // { dg-error "contains no 
parameter packs" }
-- 
2.45.2.409.g7b0defb391



Re: [PATCH] c++: lambda in pack expansion [PR115378]

2024-06-07 Thread Jason Merrill

On 6/7/24 10:44, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/14?


OK.


-- >8 --

Here find_parameter_packs_r is incorrectly treating the 'auto' return
type of a lambda as a parameter pack due to Concepts-TS specific logic
added in r6-4517, leading to confusion later when expanding the pattern.

Since we intend to remove Concepts TS support soon anyway, this patch
fixes this by restricting the problematic logic with flag_concepts_ts.
Doing so revealed that add_capture was relying on this logic to set
TEMPLATE_TYPE_PARAMETER_PACK for the 'auto' type of an init-capture pack
expansion, which we now need to do explicitly.

PR c++/115378

gcc/cp/ChangeLog:

* lambda.cc (lambda_capture_field_type): Set
TEMPLATE_TYPE_PARAMETER_PACK on the auto type of an init-capture
pack expansion.
* pt.cc (find_parameter_packs_r) :
Restrict TEMPLATE_TYPE_PARAMETER_PACK promotion with
flag_concepts_ts.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/decltype-auto-103497.C: Adjust expected diagnostic.
* g++.dg/template/pr95672.C: Likewise.
* g++.dg/cpp2a/lambda-targ5.C: New test.
---
  gcc/cp/lambda.cc  |  3 ++-
  gcc/cp/pt.cc  |  2 +-
  gcc/testsuite/g++.dg/cpp1y/decltype-auto-103497.C |  2 +-
  gcc/testsuite/g++.dg/cpp2a/lambda-targ5.C | 15 +++
  gcc/testsuite/g++.dg/template/pr95672.C   |  2 +-
  5 files changed, 20 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-targ5.C

diff --git a/gcc/cp/lambda.cc b/gcc/cp/lambda.cc
index 630cc4eade1..0770417810e 100644
--- a/gcc/cp/lambda.cc
+++ b/gcc/cp/lambda.cc
@@ -223,7 +223,8 @@ lambda_capture_field_type (tree expr, bool explicit_init_p,
   outermost CV qualifiers of EXPR.  */
type = build_reference_type (type);
if (uses_parameter_packs (expr))
-   /* Stick with 'auto' even if the type could be deduced.  */;
+   /* Stick with 'auto' even if the type could be deduced.  */
+   TEMPLATE_TYPE_PARAMETER_PACK (auto_node) = true;
else
type = do_auto_deduction (type, expr, auto_node);
  }
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index dfce1b3c359..6ee27d6fa16 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -3940,7 +3940,7 @@ find_parameter_packs_r (tree *tp, int *walk_subtrees, 
void* data)
 parameter pack (14.6.3), or the type-specifier-seq of a type-id that
 is a pack expansion, the invented template parameter is a template
 parameter pack.  */
-  if (ppd->type_pack_expansion_p && is_auto (t)
+  if (flag_concepts_ts && ppd->type_pack_expansion_p && is_auto (t)
  && TEMPLATE_TYPE_LEVEL (t) != 0)
TEMPLATE_TYPE_PARAMETER_PACK (t) = true;
if (TEMPLATE_TYPE_PARAMETER_PACK (t))
diff --git a/gcc/testsuite/g++.dg/cpp1y/decltype-auto-103497.C 
b/gcc/testsuite/g++.dg/cpp1y/decltype-auto-103497.C
index cedd661710c..4162361d14f 100644
--- a/gcc/testsuite/g++.dg/cpp1y/decltype-auto-103497.C
+++ b/gcc/testsuite/g++.dg/cpp1y/decltype-auto-103497.C
@@ -1,7 +1,7 @@
  // PR c++/103497
  // { dg-do compile { target c++14 } }
  
-void foo(decltype(auto)... args);  // { dg-error "cannot declare a parameter with .decltype.auto.." }

+void foo(decltype(auto)... args);  // { dg-error "contains no parameter packs" 
}
  
  int main() {

foo();
diff --git a/gcc/testsuite/g++.dg/cpp2a/lambda-targ5.C 
b/gcc/testsuite/g++.dg/cpp2a/lambda-targ5.C
new file mode 100644
index 000..efd4bb45d58
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/lambda-targ5.C
@@ -0,0 +1,15 @@
+// PR c++/115378
+// { dg-do compile { target c++20 } }
+
+struct tt {};
+
+template
+constexpr auto __counter = 1;
+
+template 
+using _as_base = tt;
+
+template 
+struct env : _as_base>... {};
+
+env t;
diff --git a/gcc/testsuite/g++.dg/template/pr95672.C 
b/gcc/testsuite/g++.dg/template/pr95672.C
index c752b4a2c08..d97b8db2e97 100644
--- a/gcc/testsuite/g++.dg/template/pr95672.C
+++ b/gcc/testsuite/g++.dg/template/pr95672.C
@@ -1,3 +1,3 @@
  // PR c++/95672
  // { dg-do compile { target c++14 } }
-struct g_class : decltype  (auto) ... {  }; // { dg-error "invalid use of pack 
expansion" }
+struct g_class : decltype  (auto) ... {  }; // { dg-error "contains no parameter 
packs" }




Re: [PATCH] c++: Handle erroneous DECL_LOCAL_DECL_ALIAS in duplicate_decls [PR107575]

2024-06-07 Thread Jason Merrill

On 6/5/24 05:20, Simon Martin wrote:

On 5 Jun 2024, at 10:34, Jakub Jelinek wrote:


On Wed, Jun 05, 2024 at 08:13:14AM +, Simon Martin wrote:

--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -2792,10 +2792,13 @@ duplicate_decls (tree newdecl, tree olddecl,
bool hiding, bool was_hidden)
  retrofit_lang_decl (newdecl);
  tree alias = DECL_LOCAL_DECL_ALIAS (newdecl)
= DECL_LOCAL_DECL_ALIAS (olddecl);
- DECL_ATTRIBUTES (alias)
-   = (*targetm.merge_decl_attributes) (alias, newdecl);
- if (TREE_CODE (newdecl) == FUNCTION_DECL)
-   merge_attribute_bits (newdecl, alias);
+ if (alias != error_mark_node)
+   {
+ DECL_ATTRIBUTES (alias) =
+   (*targetm.merge_decl_attributes) (alias, newdecl);


Formatting nit, = should be on the next line, not at the end of a
line.
See https://gcc.gnu.org/codingconventions.html and
https://gcc.gnu.org/codingconventions.html

Indeed, thanks. This is fixed in the attached updated patch.
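
For reference, the layout being asked for looks like the sketch below.  The two functions are hypothetical and exist only to give the statement something long enough to wrap; the point is the placement of '=' at the start of the continuation line.

```cpp
// Hypothetical helper standing in for a long call expression.
int
merge_bits_for_example (int alias_bits, int newdecl_bits)
{
  return alias_bits | newdecl_bits;
}

int
merged_bits_of_alias_and_newdecl (int alias_bits, int newdecl_bits)
{
  // When the statement has to wrap, break before the operator, so that
  // '=' starts the continuation line instead of ending the first one.
  int merged_attribute_bits_of_the_alias
    = merge_bits_for_example (alias_bits, newdecl_bits);
  return merged_attribute_bits_of_the_alias;
}
```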


OK.




Re: [PATCH] c++: Make *_cast<*> parsing more robust to errors [PR108438]

2024-06-07 Thread Jason Merrill

On 6/7/24 08:12, Simon Martin wrote:

We ICE upon the following when trying to emit a -Wlogical-not-parentheses
warning:

=== cut here ===
template  T foo (T arg, T& ref, T* ptr) {
   int a = 1;
   return static_cast(a);
}
=== cut here ===

This patch makes *_cast<*> parsing more robust by skipping to the closing '>'
upon error in the target type.

Successfully tested on x86_64-pc-linux-gnu.

(Note that I have a patch pending review that also adds g++.dg/parse/crash74.C;
I will obviously handle the name conflict at commit time)

PR c++/108438

gcc/cp/ChangeLog:

* parser.cc (cp_parser_postfix_expression): Skip to the closing '>'
upon error parsing the target type of *_cast<*> expressions.

gcc/testsuite/ChangeLog:

* g++.dg/parse/crash74.C: New test.

---
  gcc/cp/parser.cc | 3 ++-
  gcc/testsuite/g++.dg/parse/crash74.C | 9 +
  2 files changed, 11 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/parse/crash74.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index bc4a2359153..3516c2aa38b 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -7569,7 +7569,8 @@ cp_parser_postfix_expression (cp_parser *parser, bool 
address_p, bool cast_p,
  NULL);
parser->in_type_id_in_expr_p = saved_in_type_id_in_expr_p;
/* Look for the closing `>'.  */
-   cp_parser_require (parser, CPP_GREATER, RT_GREATER);
+   if (!cp_parser_require (parser, CPP_GREATER, RT_GREATER))
+ cp_parser_skip_to_end_of_template_parameter_list (parser);


Looks like this could use cp_parser_require_end_of_template_parameter_list.

OK with that change.

Jason



Re: [to-be-committed] [RISC-V] Use Zbkb for general 64 bit constants when profitable

2024-06-07 Thread Andreas Schwab
In file included from ../../gcc/rtl.h:3973,
 from ../../gcc/config/riscv/riscv.cc:31:
In function 'rtx_def* init_rtx_fmt_ee(rtx, machine_mode, rtx, rtx)',
inlined from 'rtx_def* gen_rtx_fmt_ee_stat(rtx_code, machine_mode, rtx, 
rtx)' at ./genrtl.h:50:26,
inlined from 'void riscv_move_integer(rtx, rtx, long int, machine_mode)' at 
../../gcc/config/riscv/riscv.cc:2786:10:
./genrtl.h:37:16: error: 'x' may be used uninitialized 
[-Werror=maybe-uninitialized]
   37 |   XEXP (rt, 0) = arg0;
../../gcc/config/riscv/riscv.cc: In function 'void riscv_move_integer(rtx, rtx, 
long int, machine_mode)':
../../gcc/config/riscv/riscv.cc:2723:7: note: 'x' was declared here
 2723 |   rtx x;
  |   ^
cc1plus: all warnings being treated as errors
make[3]: *** [Makefile:2563: riscv.o] Error 1

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


[pushed] c++: -include and header unit translation

2024-06-07 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

 Within a source file, #include is translated to import if a suitable header
 unit is available, but this wasn't working with -include.  This turned out
 to be because we suppressed the translation before the beginning of the
 main file.  After removing that, I had to tweak libcpp file handling to
 accommodate the way it moves from an -include to the main file.

gcc/ChangeLog:

* doc/invoke.texi (C++ Modules): Mention -include.

gcc/cp/ChangeLog:

* module.cc (maybe_translate_include): Allow before the main file.

libcpp/ChangeLog:

* files.cc (_cpp_stack_file): LC_ENTER for -include header unit.

gcc/testsuite/ChangeLog:

* g++.dg/modules/dashinclude-1_b.C: New test.
* g++.dg/modules/dashinclude-1_a.H: New test.
---
 gcc/doc/invoke.texi| 17 +
 gcc/cp/module.cc   |  4 
 gcc/testsuite/g++.dg/modules/dashinclude-1_b.C |  9 +
 libcpp/files.cc|  5 -
 gcc/testsuite/g++.dg/modules/dashinclude-1_a.H |  5 +
 5 files changed, 35 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/dashinclude-1_b.C
 create mode 100644 gcc/testsuite/g++.dg/modules/dashinclude-1_a.H

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e5a5d1d9335..ca2591ce2c3 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -37764,6 +37764,23 @@ installed.  Specifying the language as one of these 
variants also
 inhibits output of the object file, as header files have no associated
 object file.
 
+Header units can be used in much the same way as precompiled headers
+(@pxref{Precompiled Headers}), but with fewer restrictions: an
+#include that is translated to a header unit import can appear at any
+point in the source file, and multiple header units can be used
+together.  In particular, the @option{-include} strategy works: with
+the bits/stdc++.h header used for libstdc++ precompiled headers you
+can
+
+@smallexample
+g++ -fmodules-ts -x c++-system-header -c bits/stdc++.h
+g++ -fmodules-ts -include bits/stdc++.h mycode.C
+@end smallexample
+
+and any standard library #includes in mycode.C will be skipped,
+because the import brought in the whole library.  This can be a simple
+way to use modules to speed up compilation without any code changes.
+
 The @option{-fmodule-only} option disables generation of the
 associated object file for compiling a module interface.  Only the CMI
 is generated.  This option is implied when using the
diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index ed24814b601..21fc85150c9 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -19976,10 +19976,6 @@ maybe_translate_include (cpp_reader *reader, line_maps 
*lmaps, location_t loc,
   return nullptr;
 }
 
-  if (!spans.init_p ())
-/* Before the main file, don't divert.  */
-return nullptr;
-
   dump.push (NULL);
 
   dump () && dump ("Checking include translation '%s'", path);
diff --git a/gcc/testsuite/g++.dg/modules/dashinclude-1_b.C 
b/gcc/testsuite/g++.dg/modules/dashinclude-1_b.C
new file mode 100644
index 000..6e6a33407a4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/dashinclude-1_b.C
@@ -0,0 +1,9 @@
+// Test that include translation works with command-line -include.
+// { dg-additional-options "-fmodules-ts -fdump-lang-module -include 
$srcdir/g++.dg/modules/dashinclude-1_a.H" }
+
+int main ()
+{
+  return f();
+}
+
+// { dg-final { scan-lang-dump {Translating include to import} module } }
diff --git a/libcpp/files.cc b/libcpp/files.cc
index c61df339e20..78f56e30bde 100644
--- a/libcpp/files.cc
+++ b/libcpp/files.cc
@@ -1008,7 +1008,10 @@ _cpp_stack_file (cpp_reader *pfile, _cpp_file *file, 
include_type type,
   if (decrement)
 pfile->line_table->highest_location--;
 
-  if (file->header_unit <= 0)
+  /* Normally a header unit becomes an __import directive in the current file,
+ but with -include we need something to LC_LEAVE to trigger the file_change
+ hook and continue to the next -include or the main source file.  */
+  if (file->header_unit <= 0 || type == IT_CMDLINE)
 /* Add line map and do callbacks.  */
 _cpp_do_file_change (pfile, LC_ENTER, file->path,
   /* With preamble injection, start on line zero,
diff --git a/gcc/testsuite/g++.dg/modules/dashinclude-1_a.H 
b/gcc/testsuite/g++.dg/modules/dashinclude-1_a.H
new file mode 100644
index 000..c1b40a53924
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/dashinclude-1_a.H
@@ -0,0 +1,5 @@
+// { dg-module-do run }
+// { dg-additional-options "-fmodule-header" }
+// { dg-module-cmi {} }
+
+inline int f() { return 0; }

base-commit: 5c761395402a730535983a5e49ef1775561ebc61
-- 
2.44.0



[analyzer PATCH] Restore bootstrap with g++ 4.8.

2024-06-07 Thread Roger Sayle

This patch restores bootstrap when using g++ 4.8 as a host compiler.
Returning a std::unique_ptr requires a std::move on C++ compilers
(pre-C++17) that don't guarantee copy elision/return value optimization.
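
A minimal sketch of the pattern being patched, assuming (as the diff suggests) that each function builds a `std::unique_ptr` to a derived widget type and returns it as a `std::unique_ptr` to the base type. The names `widget`, `tree_widget_like` and `make_dump_widget_sketch` are placeholders, not the analyzer's classes. As far as I understand, older compilers such as g++ 4.8 do not treat the local as an rvalue when its type differs from the return type, so a plain `return w;` tries to copy and fails; the explicit `std::move` is accepted everywhere.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct widget {
  virtual ~widget() = default;
  virtual std::string name() const { return "widget"; }
};

struct tree_widget_like : widget {
  std::string name() const override { return "tree"; }
};

// The return type (pointer to base) differs from the local's type (pointer
// to derived), so the return needs a converting move of the unique_ptr.
std::unique_ptr<widget> make_dump_widget_sketch()
{
  std::unique_ptr<tree_widget_like> w(new tree_widget_like());
  return std::move(w);  // explicit move keeps old host compilers happy
}
```

Newer compilers may flag the `std::move` as redundant, which is the usual trade-off when the host compiler floor is this old.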

Bootstrapped on x86_64-pc-linux-gnu using both gcc 4.8.5 (system) and
gcc 10.2.1 (using "scl enable devtoolset-10") as host compilers.
Ok for mainline?


2024-06-07  Roger Sayle  

gcc/analyzer/ChangeLog
* constraint-manager.cc (equiv_class::make_dump_widget): Use
std::move to return a std::unique_ptr.
(bounded_ranges_constraint::make_dump_widget): Likewise.
(constraint_manager::make_dump_widget): Likewise.
* program-state.cc (sm_state_map::make_dump_widget): Likewise.
(program_state::make_dump_widget): Likewise.
* region-model.cc (region_to_value_map::make_dump_widget): Likewise.
(region_model::make_dump_widget): Likewise.
* region.cc (region::make_dump_widget): Likewise.
* store.cc (binding_cluster::make_dump_widget): Likewise.
(store::make_dump_widget): Likewise.
* svalue.cc (svalue::make_dump_widget): Likewise.

Thanks in advance,
Roger
--

diff --git a/gcc/analyzer/constraint-manager.cc 
b/gcc/analyzer/constraint-manager.cc
index 707385d..883f33b 100644
--- a/gcc/analyzer/constraint-manager.cc
+++ b/gcc/analyzer/constraint-manager.cc
@@ -1176,7 +1176,7 @@ equiv_class::make_dump_widget (const 
text_art::dump_widget_info &dwi,
   ec_widget->add_child (tree_widget::make (dwi, &pp));
 }
 
-  return ec_widget;
+  return std::move (ec_widget);
 }
 
 /* Generate a hash value for this equiv_class.
@@ -1500,7 +1500,7 @@ make_dump_widget (const text_art::dump_widget_info &dwi) 
const
 (tree_widget::from_fmt (dwi, nullptr,
"ec%i bounded ranges", m_ec_id.as_int ()));
   m_ranges->add_to_dump_widget (*brc_widget.get (), dwi);
-  return brc_widget;
+  return std::move (brc_widget);
 }
 
 bool
@@ -1853,7 +1853,7 @@ constraint_manager::make_dump_widget (const 
text_art::dump_widget_info &dwi) con
   if (cm_widget->get_num_children () == 0)
 return nullptr;
 
-  return cm_widget;
+  return std::move (cm_widget);
 }
 
 /* Attempt to add the constraint LHS OP RHS to this constraint_manager.
diff --git a/gcc/analyzer/program-state.cc b/gcc/analyzer/program-state.cc
index dc2d4bd..efaf569 100644
--- a/gcc/analyzer/program-state.cc
+++ b/gcc/analyzer/program-state.cc
@@ -382,7 +382,7 @@ sm_state_map::make_dump_widget (const 
text_art::dump_widget_info &dwi,
   state_widget->add_child (tree_widget::make (dwi, pp));
 }
 
-  return state_widget;
+  return std::move (state_widget);
 }
 
 /* Return true if no states have been set within this map
@@ -1247,7 +1247,7 @@ program_state::make_dump_widget (const 
text_art::dump_widget_info &dwi) const
state_widget->add_child (smap->make_dump_widget (dwi, m_region_model));
   }
 
-  return state_widget;
+  return std::move (state_widget);
 }
 
 /* Update this program_state to reflect a top-level call to FUN.
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index d142d85..4fbc970 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -288,7 +288,7 @@ make_dump_widget (const text_art::dump_widget_info &dwi) 
const
   sval->dump_to_pp (pp, true);
   w->add_child (text_art::tree_widget::make (dwi, pp));
 }
-  return w;
+  return std::move (w);
 }
 
 /* Attempt to merge THIS with OTHER, writing the result
@@ -556,7 +556,7 @@ region_model::make_dump_widget (const 
text_art::dump_widget_info &dwi) const
   m_mgr->get_store_manager ()));
   model_widget->add_child (m_constraints->make_dump_widget (dwi));
   model_widget->add_child (m_dynamic_extents.make_dump_widget (dwi));
-  return model_widget;
+  return std::move (model_widget);
 }
 
 /* Assert that this object is valid.  */
diff --git a/gcc/analyzer/region.cc b/gcc/analyzer/region.cc
index 71bae97..050feb6 100644
--- a/gcc/analyzer/region.cc
+++ b/gcc/analyzer/region.cc
@@ -1119,7 +1119,7 @@ region::make_dump_widget (const 
text_art::dump_widget_info &dwi,
   if (m_parent)
 w->add_child (m_parent->make_dump_widget (dwi, "parent"));
 
-  return w;
+  return std::move (w);
 }
 
 void
diff --git a/gcc/analyzer/store.cc b/gcc/analyzer/store.cc
index d14cfa3..b20bc29 100644
--- a/gcc/analyzer/store.cc
+++ b/gcc/analyzer/store.cc
@@ -1489,7 +1489,7 @@ binding_cluster::make_dump_widget (const 
text_art::dump_widget_info &dwi,
 
   m_map.add_to_tree_widget (*cluster_widget, dwi);
 
-  return cluster_widget;
+  return std::move (cluster_widget);
 }
 }
 
@@ -2766,7 +2766,7 @@ store::make_dump_widget (const text_art::dump_widget_info 
&dwi,
   store_widget->add_child (std::move (parent_reg_widget));
 }
 
-  return store_widget;
+  return std::move (store_widget);
 }
 
 /* Get any svalue bound to REG, or NULL.  */
diff --git a/gcc/analyzer/svalue.cc b/gcc/analyzer/svalue.cc
i

Re: [PATCH] libstdc++: Optimize std::gcd

2024-06-07 Thread Stephen Face
On 6/7/24 2:30 AM, Jonathan Wakely wrote:
> On Fri, 7 Jun 2024 at 09:57, Sam James  wrote:
>>
>> Stephen Face  writes:
>>
>>> This patch is to optimize the runtime execution of gcd. Mathematically,
>>> it computes with the same algorithm as before, but subtractions and
>>> branches are rearranged to encourage generation of code that can use
>>> flags from the subtractions for conditional moves. Additionally, most
>>> pairs of integers are coprime, so this patch also includes a check for
>>> one of the integers to be equal to 1, and then it will exit the loop
>>> early in this case.
>>
>> Is it worth filing a bug for the missed optimisation? You shouldn't have
>> to write things in a specific order. Thanks.

This patch is not a pure reordering. It has the added comparison with 1.
Also, the argument to the trailing zero count is now the result of the
subtraction before the comparison of m and n. This shortens the chain of
dependencies, which can help the loop run faster.
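
For readers who want to try the rearranged loop outside libstdc++, here is a standalone sketch of it. The function names `binary_gcd` and `ctz64` and the fixed 64-bit type are my own choices for illustration; the setup before the loop (stripping common factors of two) is assumed to match the existing `__gcd`, and the loop body follows the patch.

```cpp
#include <cassert>
#include <cstdint>

// Portable count of trailing zero bits; X must be nonzero.
inline int ctz64(std::uint64_t x)
{
  int n = 0;
  while ((x & 1u) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

std::uint64_t binary_gcd(std::uint64_t m, std::uint64_t n)
{
  if (m == 0) return n;
  if (n == 0) return m;

  // Make both operands odd and remember the shared power of two.
  const int i = ctz64(m); m >>= i;
  const int j = ctz64(n); n >>= j;
  const int k = i < j ? i : j;

  for (;;) {
    // Unsigned subtraction may wrap; m - n and n - m have the same number
    // of trailing zeros, so either can feed the shift below.
    const std::uint64_t m_minus_n = m - n;
    if (m_minus_n == 0)
      return m << k;

    const std::uint64_t next_m = m < n ? m : n;
    if (next_m == 1)          // early exit for coprime inputs
      return next_m << k;

    const std::uint64_t n_minus_m = n - m;
    n = n < m ? m_minus_n : n_minus_m;  // n = |m - n|
    m = next_m;                         // m = min(m, n)

    n >>= ctz64(m_minus_n);   // depends only on the first subtraction
  }
}
```

For example, `binary_gcd(12, 18)` reduces to the odd pair (3, 9) with k = 1, takes one loop iteration to reach (3, 3), and returns 3 << 1 = 6.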

> 
> Yes, I think a bug report would be good. But 20%-60% decreases in run
> time seems significant enough that we should take the libstdc++ patch
> now, rather than wait for a possible compiler fix to come later.
> 
> Stephen, could you please confirm whether you have a copyright
> assignment in place for GCC, and if not whether you would be willing to
> complete one, or alternatively contribute this under the DCO terms:
> https://gcc.gnu.org/dco.html
> Thanks!

I do not have a copyright assignment in place for GCC. I can certify to
the DCO terms for this contribution.

Signed-off-by: Stephen Face 

> 
> 
>>
>>>
>>> libstdc++-v3/ChangeLog:
>>>
>>>   * include/std/numeric (__gcd): Optimize.
>>> ---
>>> I have tested this on x86_64-linux and aarch64-linux. I have tested the
>>> timing with random distributions of small inputs and large inputs on a
>>> couple of machines with -O2 and found decreases in execution time from
>>> 20% to 60% depending on the machine and distribution of inputs.
>>>
>>>  libstdc++-v3/include/std/numeric | 21 +++--
>>>  1 file changed, 11 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/libstdc++-v3/include/std/numeric 
>>> b/libstdc++-v3/include/std/numeric
>>> index c912db4a519..3c9e8387a0e 100644
>>> --- a/libstdc++-v3/include/std/numeric
>>> +++ b/libstdc++-v3/include/std/numeric
>>> @@ -148,19 +148,20 @@ namespace __detail
>>>
>>>while (true)
>>>   {
>>> -   if (__m > __n)
>>> - {
>>> -   _Tp __tmp = __m;
>>> -   __m = __n;
>>> -   __n = __tmp;
>>> - }
>>> +   _Tp __m_minus_n = __m - __n;
>>> +   if (__m_minus_n == 0)
>>> + return __m << __k;
>>>
>>> -   __n -= __m;
>>> +   _Tp __next_m = __m < __n ? __m : __n;
>>>
>>> -   if (__n == 0)
>>> - return __m << __k;
>>> +   if (__next_m == 1)
>>> + return __next_m << __k;
>>> +
>>> +   _Tp __n_minus_m = __n - __m;
>>> +   __n = __n < __m ? __m_minus_n : __n_minus_m;
>>> +   __m = __next_m;
>>>
>>> -   __n >>= std::__countr_zero(__n);
>>> +   __n >>= std::__countr_zero(__m_minus_n);
>>>   }
>>>  }
>>>  } // namespace __detail
>>
> 



Re: [PATCH] libstdc++: Optimize std::gcd

2024-06-07 Thread Jonathan Wakely
On Fri, 7 Jun 2024 at 19:42, Stephen Face  wrote:
>
> On 6/7/24 2:30 AM, Jonathan Wakely wrote:
> > On Fri, 7 Jun 2024 at 09:57, Sam James  wrote:
> >>
> >> Stephen Face  writes:
> >>
> >>> This patch is to optimize the runtime execution of gcd. Mathematically,
> >>> it computes with the same algorithm as before, but subtractions and
> >>> branches are rearranged to encourage generation of code that can use
> >>> flags from the subtractions for conditional moves. Additionally, most
> >>> pairs of integers are coprime, so this patch also includes a check for
> >>> one of the integers to be equal to 1, and then it will exit the loop
> >>> early in this case.
> >>
> >> Is it worth filing a bug for the missed optimisation? You shouldn't have
> >> to write things in a specific order. Thanks.
>
> This patch is not a pure reordering. It has the added comparison with 1.
> Also, the argument to the trailing zero count is now the result of the
> subtraction before the comparison of m and n. This shortens the chain of
> dependencies, which can help the loop run faster.
>
> >
> > Yes, I think a bug report would be good. But 20%-60% decreases in run
> > time seems significant enough that we should take the libstdc++ patch
> > now, rather than wait for a possible compiler fix to come later.
> >
> > Stephen, could you please confirm whether you have a copyright
> > assignment in place for GCC, and if not whether you would be willing to
> > complete one, or alternatively contribute this under the DCO terms:
> > https://gcc.gnu.org/dco.html
> > Thanks!
>
> I do not have a copyright assignment in place for GCC. I can certify to
> the DCO terms for this contribution.
>
> Signed-off-by: Stephen Face 

Thank you! I'll take care of this patch next week.


>
> >
> >
> >>
> >>>
> >>> libstdc++-v3/ChangeLog:
> >>>
> >>>   * include/std/numeric (__gcd): Optimize.
> >>> ---
> >>> I have tested this on x86_64-linux and aarch64-linux. I have tested the
> >>> timing with random distributions of small inputs and large inputs on a
> >>> couple of machines with -O2 and found decreases in execution time from
> >>> 20% to 60% depending on the machine and distribution of inputs.
> >>>
> >>>  libstdc++-v3/include/std/numeric | 21 +++--
> >>>  1 file changed, 11 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/libstdc++-v3/include/std/numeric 
> >>> b/libstdc++-v3/include/std/numeric
> >>> index c912db4a519..3c9e8387a0e 100644
> >>> --- a/libstdc++-v3/include/std/numeric
> >>> +++ b/libstdc++-v3/include/std/numeric
> >>> @@ -148,19 +148,20 @@ namespace __detail
> >>>
> >>>while (true)
> >>>   {
> >>> -   if (__m > __n)
> >>> - {
> >>> -   _Tp __tmp = __m;
> >>> -   __m = __n;
> >>> -   __n = __tmp;
> >>> - }
> >>> +   _Tp __m_minus_n = __m - __n;
> >>> +   if (__m_minus_n == 0)
> >>> + return __m << __k;
> >>>
> >>> -   __n -= __m;
> >>> +   _Tp __next_m = __m < __n ? __m : __n;
> >>>
> >>> -   if (__n == 0)
> >>> - return __m << __k;
> >>> +   if (__next_m == 1)
> >>> + return __next_m << __k;
> >>> +
> >>> +   _Tp __n_minus_m = __n - __m;
> >>> +   __n = __n < __m ? __m_minus_n : __n_minus_m;
> >>> +   __m = __next_m;
> >>>
> >>> -   __n >>= std::__countr_zero(__n);
> >>> +   __n >>= std::__countr_zero(__m_minus_n);
> >>>   }
> >>>  }
> >>>  } // namespace __detail
> >>
> >
>



[committed] libstdc++: Add missing header to <bits/ranges_algobase.h> for std::__memcmp

2024-06-07 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

As noticed by Michael Levine.

libstdc++-v3/ChangeLog:

* include/bits/ranges_algobase.h: Include <bits/stl_algobase.h>.
---
 libstdc++-v3/include/bits/ranges_algobase.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libstdc++-v3/include/bits/ranges_algobase.h 
b/libstdc++-v3/include/bits/ranges_algobase.h
index e26a73a27d6..e1f00838818 100644
--- a/libstdc++-v3/include/bits/ranges_algobase.h
+++ b/libstdc++-v3/include/bits/ranges_algobase.h
@@ -38,6 +38,7 @@
 #include <bits/ranges_base.h> // ranges::begin, ranges::range etc.
 #include <bits/invoke.h>      // __invoke
 #include <bits/cpp_type_traits.h> // __is_byte
+#include <bits/stl_algobase.h> // __memcmp
 
 #if __cpp_lib_concepts
 namespace std _GLIBCXX_VISIBILITY(default)
-- 
2.45.1



Re: [RFC][PATCH] PR tree-optimization/109071 - -Warray-bounds false positive warnings due to code duplication from jump threading

2024-06-07 Thread Qing Zhao
Hi, Richard,

> On Jun 5, 2024, at 13:58, Qing Zhao  wrote:
>> 
> Like this?
> 
> diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h
> index e6e2b0897572..ee344f91333b 100644
> --- a/libcpp/include/line-map.h
> +++ b/libcpp/include/line-map.h
> @@ -761,8 +761,9 @@ struct GTY(()) maps_info_macro {
> struct GTY(()) location_adhoc_data {
> location_t locus;
> source_range src_range;
> -  void * GTY((skip)) data;
> unsigned discriminator;
> +  void * GTY((skip)) data;
> +  void * GTY((skip)) copy_history;
> };
> struct htab;
 
 Yes.
 
> How about the copy_history? Do we need a new data structure (like
> the following, or other suggestion) for this? Where should I add
> this new data structure?
 
 As it needs to be managed by libcpp it should be in this very same
 file.
 
> struct copy_history {
> location_t condition;
> bool is_true_path;
> };
 
 I think we want a pointer to the previous copy-of state as well in
 case a stmt
 was copied twice.  We'll see whether a single (condition) location
 plus edge flag
 is sufficient.  I'd say we should plan for an enum to indicate the
 duplication
 reason at least (jump threading, unswitching, unrolling come to my
 mind).  For
 jump threading being able to say "when  is true/false" is
 probably
 good enough, though it might not always be easy to identify a single
 condition
 here given a threading path starts at an incoming edge to a CFG merge
 and
 will usually end with the outgoing edge of a condition that we can
 statically
 evaluate.  The condition controlling the path entry might not be
 determined
 fully by a single condition location.
 
 Possibly building a full "diagnostic path" object at threading time
 might be
 the only way to record all the facts, but that's also going to be
 more
 expensive.
>>> 
>>> Note that a diagnostic_path represents a path through some kind of
>>> graph, whereas it sounds like you want to be storing the *nodes* in the
>>> graph, and later generating the diagnostic_path from that graph when we
>>> need it (which is trivial if the graph is actually just a tree: just
>>> follow the parent links backwards, then reverse it).
>> 
>> I think we are mixing two things - one is that a single transform like 
>> jump
>> threading produces a stmt copy and when we emit a diagnostic on that
>> copied statement we want to tell the user the condition under which the
>> copy is executed.  That "condition" can be actually a sequence of
>> conditionals.  I wanted to point out that a diagnostic_path instance 
>> could
>> be used to describe such complex condition.
>> 
>> But then the other thing I wanted to address with the link to a previous
>> copy_history - that's when a statement gets copied twice, for example
>> by two distinct jump threading optimizations.  Like when dumping
>> the inlining decisions for diagnostics we could dump the logical "and"
>> of the conditions of the two threadings.  Since we have a single
>> location per GIMPLE stmt we'd have to keep a "stack" of copy events
>> associated with it.  That's the linked list (I think a linked list should
>> work fine here).
> Yes, the linked list to keep the “stack” of copy events should be good 
> enough
> to form the sequence of conditionals event for the diagnostic_path 
> instance.
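
To make the linked-list idea above concrete, here is a rough sketch of how such a record and the walk over it might look. Everything here is hypothetical: `copy_history_sketch`, `copy_reason`, `location_sketch_t` and `collect_copy_events` are illustration-only names, the `prev` link and the reason enum are taken from the suggestions in this thread, and none of it is existing libcpp code.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for libcpp's location_t.
using location_sketch_t = unsigned;

// Why a statement was duplicated (the candidates named in the thread).
enum class copy_reason { jump_threading, unswitching, unrolling };

struct copy_history_sketch {
  location_sketch_t condition;      // location of the controlling condition
  bool is_true_path;                // copy lives on the true or false edge
  copy_reason reason;               // which transform made the copy
  const copy_history_sketch *prev;  // earlier copy event, or nullptr
};

// Follow the prev links from the most recent copy event back to the first,
// then reverse so the result lists events in the order they happened.
std::vector<const copy_history_sketch *>
collect_copy_events(const copy_history_sketch *latest)
{
  std::vector<const copy_history_sketch *> events;
  for (const copy_history_sketch *p = latest; p; p = p->prev)
    events.push_back(p);
  std::reverse(events.begin(), events.end());
  return events;
}
```

A diagnostic could then print one "when the condition at this location is true/false" event per element, in order.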
>> 
>> I realize things may get a bit "fat", but eventually we are not 
>> duplicating
>> statements that much.  I do hope we can share for example a big
>> diagnostic_path when we duplicate a basic block during threading
>> and use one instance for all stmts in such block copy (IIRC we never
>> release locations or their ad-hoc data, we just make sure to never
>> use locations with ad-hoc data pointing to BLOCKs that we released,
>> but the linemap data will still have pointers in "dead" location entries,
>> right?)
> Are you still suggesting to add two artificial stmts in the beginning and 
> the
> end of the duplicated block to carry the copy history information for all 
> the
> stmts in the block to save space?
> 
> Compared with the approach to carry such information to each duplicated 
> stmts (which I preferred),
> The major concerns with the approach are:
> 1. Optimizations might move stmts out of these two artificial stmts, 
> therefore we need add
> Some memory barrier on these two artificial stmts to prevent such 
> movements.
>This migh

Re: [analyzer PATCH] Restore bootstrap with g++ 4.8.

2024-06-07 Thread David Malcolm
On Fri, 2024-06-07 at 19:40 +0100, Roger Sayle wrote:
> 
> This patch restores bootstrap when using g++ 4.8 as a host compiler.
> Returning a std::unique_ptr requires a std::move on C++ compilers
> (pre-C++17) that don't guarantee copy elision/return value
> optimization.
> 
> Bootstrapped on x86_64-pc-linux-gnu using both gcc 4.8.5 (system) and
> gcc 10.2.1 (using "scl enable devtoolset-10") as host compilers.
> Ok for mainline?

Yes, thanks.  Sorry for the breakage.

Dave

> 
> 
> 2024-06-07  Roger Sayle  
> 
> gcc/analyzer/ChangeLog
>     * constraint-manager.cc (equiv_class::make_dump_widget): Use
>     std::move to return a std::unique_ptr.
>     (bounded_ranges_constraint::make_dump_widget): Likewise.
>     (constraint_manager::make_dump_widget): Likewise.
>     * program-state.cc (sm_state_map::make_dump_widget):
> Likewise.
>     (program_state::make_dump_widget): Likewise.
>     * region-model.cc (region_to_value_map::make_dump_widget):
> Likewise.
>     (region_model::make_dump_widget): Likewise.
>     * region.cc (region::make_dump_widget): Likewise.
>     * store.cc (binding_cluster::make_dump_widget): Likewise.
>     (store::make_dump_widget): Likewise.
>     * svalue.cc (svalue::make_dump_widget): Likewise.
> 
> Thanks in advance,
> Roger
> --
> 



[r15-1100 Regression] FAIL: gcc.target/i386/sse2-v1ti-vne.c scan-assembler-times pcmpeq 6 on Linux/x86_64

2024-06-07 Thread haochen.jiang
On Linux/x86_64,

ec985bc97a01577bca8307f986caba7ba7633cde is the first bad commit
commit ec985bc97a01577bca8307f986caba7ba7633cde
Author: Roger Sayle 
Date:   Fri Jun 7 13:57:23 2024 +0100

i386: Improve handling of ternlog instructions in i386/sse.md

caused

FAIL: gcc.target/i386/avx2-pr98461.c scan-assembler-times \tnotl\t 6
FAIL: gcc.target/i386/avx512f-copysign.c scan-assembler-times vpternlog[dq][ 
\\t]+\\$(?:216|228|0xd8|0xe4), 5
FAIL: gcc.target/i386/avx512f-vpternlogd-3.c scan-assembler-times vpternlogd[ 
\\t] 694
FAIL: gcc.target/i386/avx512f-vpternlogd-4.c scan-assembler-times vpternlogd[ 
\\t] 694
FAIL: gcc.target/i386/pr101989-broadcast-1.c scan-assembler-times \\{1to4\\} 4
FAIL: gcc.target/i386/sse2-v1ti-vne.c scan-assembler-times pcmpeq 6
FAIL: gcc.target/i386/sse2-v1ti-vne.c scan-assembler-times pxor 3

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-1100/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx2-pr98461.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx2-pr98461.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512f-copysign.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512f-vpternlogd-3.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512f-vpternlogd-3.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512f-vpternlogd-4.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512f-vpternlogd-4.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr101989-broadcast-1.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr101989-broadcast-1.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/sse2-v1ti-vne.c 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email. For questions about this report, contact me 
at haochen dot jiang at intel.com.)
(If you run into problems related to cascadelake, disabling AVX512F on the 
command line might help.)
(However, please make sure that there are no potential problems with AVX512.)


[pushed] analyzer: new warning: -Wanalyzer-undefined-behavior-ptrdiff (PR analyzer/105892)

2024-06-07 Thread David Malcolm
Add a new warning to complain about pointer subtraction involving
different chunks of memory.

For example, given:

  #include <stddef.h>

  int arr[42];
  int sentinel;

  ptrdiff_t
  test_invalid_calc_of_array_size (void)
  {
return &sentinel - arr;
  }

this emits:

demo.c: In function ‘test_invalid_calc_of_array_size’:
demo.c:9:20: warning: undefined behavior when subtracting pointers [CWE-469] 
[-Wanalyzer-undefined-behavior-ptrdiff]
9 |   return &sentinel - arr;
  |^
  events 1-2
│
│3 | int arr[42];
│  | ~~~
│  | |
│  | (2) underlying object for right-hand side of subtraction 
created here
│4 | int sentinel;
│  | ^~~~
│  | |
│  | (1) underlying object for left-hand side of subtraction 
created here
│
└──> ‘test_invalid_calc_of_array_size’: event 3
   │
   │9 |   return &sentinel - arr;
   │  |^
   │  ||
   │  |(3) ⚠️  subtraction of pointers has 
undefined behavior if they do not point into the same array object
   │

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Successful run of analyzer integration tests on x86_64-pc-linux-gnu.
Pushed to trunk as r15-1107-g13dcaf1bb6d4f1.
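
For contrast with the example above, here is a small sketch of a subtraction that stays within a single array object and is therefore well defined. The function name `valid_calc_of_array_size` is made up for illustration and is not part of the patch's testsuite.

```cpp
#include <cassert>
#include <cstddef>

int arr[42];

// Both operands point into (or one past the end of) the same array object,
// so the difference is well defined and equals the element count.
std::ptrdiff_t valid_calc_of_array_size()
{
  return (arr + 42) - arr;
}
```

The one-past-the-end pointer `arr + 42` may be formed and subtracted, though not dereferenced; a separate object such as `sentinel` gives no such guarantee, however the linker happens to lay things out.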

gcc/analyzer/ChangeLog:
PR analyzer/105892
* analyzer.opt (Wanalyzer-undefined-behavior-ptrdiff): New option.
* analyzer.opt.urls: Regenerate.
* region-model.cc (class undefined_ptrdiff_diagnostic): New.
(check_for_invalid_ptrdiff): New.
(region_model::get_gassign_result): Call it for POINTER_DIFF_EXPR.

gcc/ChangeLog:
* doc/invoke.texi: Add -Wanalyzer-undefined-behavior-ptrdiff.

gcc/testsuite/ChangeLog:
PR analyzer/105892
* c-c++-common/analyzer/out-of-bounds-pr110387.c: Add
expected warnings about pointer subtraction.
* c-c++-common/analyzer/ptr-subtraction-1.c: New test.
* c-c++-common/analyzer/ptr-subtraction-CWE-469-example.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/analyzer.opt |   4 +
 gcc/analyzer/analyzer.opt.urls|   3 +
 gcc/analyzer/region-model.cc  | 141 ++
 gcc/doc/invoke.texi   |  16 ++
 .../analyzer/out-of-bounds-pr110387.c |   4 +-
 .../c-c++-common/analyzer/ptr-subtraction-1.c |  46 ++
 .../ptr-subtraction-CWE-469-example.c |  81 ++
 7 files changed, 293 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/analyzer/ptr-subtraction-1.c
 create mode 100644 
gcc/testsuite/c-c++-common/analyzer/ptr-subtraction-CWE-469-example.c

diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
index bbf2ba670d8a..5335f7e1999e 100644
--- a/gcc/analyzer/analyzer.opt
+++ b/gcc/analyzer/analyzer.opt
@@ -222,6 +222,10 @@ Wanalyzer-tainted-size
 Common Var(warn_analyzer_tainted_size) Init(1) Warning
 Warn about code paths in which an unsanitized value is used as a size.
 
+Wanalyzer-undefined-behavior-ptrdiff
+Common Var(warn_analyzer_undefined_behavior_ptrdiff) Init(1) Warning
+Warn about code paths in which pointer subtraction involves undefined behavior.
+
 Wanalyzer-undefined-behavior-strtok
 Common Var(warn_analyzer_undefined_behavior_strtok) Init(1) Warning
 Warn about code paths in which a call is made to strtok with undefined 
behavior.
diff --git a/gcc/analyzer/analyzer.opt.urls b/gcc/analyzer/analyzer.opt.urls
index 5fcab7205823..18a0d6926de7 100644
--- a/gcc/analyzer/analyzer.opt.urls
+++ b/gcc/analyzer/analyzer.opt.urls
@@ -114,6 +114,9 @@ 
UrlSuffix(gcc/Static-Analyzer-Options.html#index-Wanalyzer-tainted-offset)
 Wanalyzer-tainted-size
 UrlSuffix(gcc/Static-Analyzer-Options.html#index-Wanalyzer-tainted-size)
 
+Wanalyzer-undefined-behavior-ptrdiff
+UrlSuffix(gcc/Static-Analyzer-Options.html#index-Wanalyzer-undefined-behavior-ptrdiff)
+
 Wanalyzer-undefined-behavior-strtok
 
UrlSuffix(gcc/Static-Analyzer-Options.html#index-Wanalyzer-undefined-behavior-strtok)
 
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index d142d851a26f..d6bcb8630cd6 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -841,6 +841,144 @@ private:
   tree m_count_cst;
 };
 
+/* A subclass of pending_diagnostic for complaining about pointer
+   subtractions involving unrelated buffers.  */
+
+class undefined_ptrdiff_diagnostic
+: public pending_diagnostic_subclass<undefined_ptrdiff_diagnostic>
+{
+public:
+  /* Region_creation_event subclass to give a custom wording when
+ talking about creation of buffers for LHS and RHS of the
+ subtraction.  */
+  class ptrdiff_region_creation_event : public region_creation_event
+  {
+  public:
+ptrdiff_region_creation_event (const event_loc_info &loc_info,
+  bool is_lhs)
+: region_creation_event (loc_info),
+   

[pushed] analyzer: eliminate cast_region::m_original_region

2024-06-07 Thread David Malcolm
cast_region had its own field m_original_region, rather than
simply using region::m_parent, leading to lots of pointless
special-casing of RK_CAST.

Remove the field and simply use the parent region.

Doing so revealed a bug (seen in gcc.dg/analyzer/taint-alloc-4.c)
where region_model::get_representative_path_var_1's RK_CAST case
was always failing, due to using the "parent region" (actually
that of the original region's parent), rather than the original region;
the patch fixes the bug by removing the distinction.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Successful run of analyzer integration tests on x86_64-pc-linux-gnu.
Pushed to trunk as r15-1108-g70f26314b62e2d.
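
The design point can be shown with a toy hierarchy. This is a sketch under loud assumptions: `region_sketch`, `cast_region_sketch` and `get_base` are invented for illustration and are far simpler than the analyzer's real classes. It only shows that once a cast region stores the region being cast as its ordinary parent, generic parent-walking code needs no special case for casts.

```cpp
#include <cassert>

enum class region_kind { decl, cast };

class region_sketch {
public:
  region_sketch(region_kind kind, const region_sketch *parent)
    : m_kind(kind), m_parent(parent) {}

  region_kind get_kind() const { return m_kind; }
  const region_sketch *get_parent() const { return m_parent; }

  // Generic walk to the outermost region; no per-kind special-casing.
  const region_sketch *get_base() const
  {
    const region_sketch *iter = this;
    while (iter->m_parent)
      iter = iter->m_parent;
    return iter;
  }

private:
  region_kind m_kind;
  const region_sketch *m_parent;
};

// A cast simply uses the region being cast as its parent, instead of
// keeping a second "original region" field alongside it.
class cast_region_sketch : public region_sketch {
public:
  explicit cast_region_sketch(const region_sketch *original)
    : region_sketch(region_kind::cast, original) {}
};
```

With a separate field, the parent and the "original" could silently disagree, which is the kind of mismatch described above for the RK_CAST case.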

gcc/analyzer/ChangeLog:
* call-summary.cc
(call_summary_replay::convert_region_from_summary_1): Update
for removal of cast_region::m_original_region.
* region-model-manager.cc
(region_model_manager::get_or_create_initial_value): Likewise.
* region-model.cc (region_model::get_store_value): Likewise.
* region.cc (region::get_base_region): Likewise.
(region::descendent_of_p): Likewise.
(region::maybe_get_frame_region): Likewise.
(region::get_memory_space): Likewise.
(region::calc_offset): Likewise.
(cast_region::accept): Delete.
(cast_region::dump_to_pp): Update for removal of
cast_region::m_original_region.
(cast_region::add_dump_widget_children): Delete.
* region.h (struct cast_region::key_t): Rename "original_region"
to "parent".
(cast_region::cast_region): Likewise.  Update for removal of
cast_region::m_original_region.
(cast_region::accept): Delete.
(cast_region::add_dump_widget_children): Delete.
(cast_region::get_original_region): Delete.
(cast_region::m_original_region): Delete.
* sm-taint.cc (region_model::check_region_for_taint): Remove
special-casing for RK_CAST.

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/taint-alloc-4.c: Update expected result to
reflect change in message due to
region_model::get_representative_path_var_1 now handling RK_CAST.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/call-summary.cc  | 11 ++--
 gcc/analyzer/region-model-manager.cc  |  2 +-
 gcc/analyzer/region-model.cc  |  2 +-
 gcc/analyzer/region.cc| 50 +++
 gcc/analyzer/region.h | 37 +-
 gcc/analyzer/sm-taint.cc  |  8 ---
 gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c |  4 +-
 7 files changed, 29 insertions(+), 85 deletions(-)

diff --git a/gcc/analyzer/call-summary.cc b/gcc/analyzer/call-summary.cc
index 60ca78a334da..46b4e2a3bbd7 100644
--- a/gcc/analyzer/call-summary.cc
+++ b/gcc/analyzer/call-summary.cc
@@ -726,13 +726,12 @@ call_summary_replay::convert_region_from_summary_1 (const 
region *summary_reg)
   {
const cast_region *summary_cast_reg
  = as_a  (summary_reg);
-   const region *summary_original_reg
- = summary_cast_reg->get_original_region ();
-   const region *caller_original_reg
- = convert_region_from_summary (summary_original_reg);
-   if (!caller_original_reg)
+   const region *summary_parent_reg = summary_reg->get_parent_region ();
+   const region *caller_parent_reg
+ = convert_region_from_summary (summary_parent_reg);
+   if (!caller_parent_reg)
  return NULL;
-   return mgr->get_cast_region (caller_original_reg,
+   return mgr->get_cast_region (caller_parent_reg,
 summary_reg->get_type ());
   }
   break;
diff --git a/gcc/analyzer/region-model-manager.cc 
b/gcc/analyzer/region-model-manager.cc
index b094b2f7e434..8154d914e81c 100644
--- a/gcc/analyzer/region-model-manager.cc
+++ b/gcc/analyzer/region-model-manager.cc
@@ -327,7 +327,7 @@ region_model_manager::get_or_create_initial_value (const 
region *reg,
   /* The initial value of a cast is a cast of the initial value.  */
   if (const cast_region *cast_reg = reg->dyn_cast_cast_region ())
 {
-  const region *original_reg = cast_reg->get_original_region ();
+  const region *original_reg = cast_reg->get_parent_region ();
   return get_or_create_cast (cast_reg->get_type (),
 get_or_create_initial_value (original_reg));
 }
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index d6bcb8630cd6..9f24011c17bf 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -2933,7 +2933,7 @@ region_model::get_store_value (const region *reg,
   /* Special-case: read the initial char of a STRING_CST.  */
   if (const cast_region *cast_reg = reg->dyn_cast_cast_region ())
 if (const string_region *str_reg
-   = cast_reg->get_original_region ()->dyn_cast_string_region ())
+   = cast_reg->get_pare

[pushed] analyzer: add logging to get_representative_path_var

2024-06-07 Thread David Malcolm
This was very helpful when debugging the cast_region::m_original_region
removal, but is probably too verbose to enable except by hand on
specific calls to get_representative_tree.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Successful run of analyzer integration tests on x86_64-pc-linux-gnu.
Pushed to trunk as r15-1109-gd039eef925878e.

gcc/analyzer/ChangeLog:
* engine.cc (impl_region_model_context::on_state_leak): Pass nullptr
to get_representative_path_var.
* region-model.cc (region_model::get_representative_path_var_1):
Add logger param and use it in both overloads.
(region_model::get_representative_path_var): Likewise.
(region_model::get_representative_tree): Likewise.
(selftest::test_get_representative_path_var): Pass nullptr to
get_representative_path_var.
* region-model.h (region_model::get_representative_tree): Add
optional logger param to both overloads.
(region_model::get_representative_path_var): Add logger param to
both overloads.
(region_model::get_representative_path_var_1): Likewise.
* store.cc (binding_cluster::get_representative_path_vars): Add
logger param and use it.
(store::get_representative_path_vars): Likewise.
* store.h (binding_cluster::get_representative_path_vars): Add
logger param.
(store::get_representative_path_vars): Likewise.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/engine.cc   |   3 +-
 gcc/analyzer/region-model.cc | 109 +++
 gcc/analyzer/region-model.h  |  18 --
 gcc/analyzer/store.cc|  12 +++-
 gcc/analyzer/store.h |   2 +
 5 files changed, 109 insertions(+), 35 deletions(-)

diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index 8b3706cdfa87..30c0913c861d 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -903,7 +903,8 @@ impl_region_model_context::on_state_leak (const 
state_machine &sm,
   svalue_set visited;
   path_var leaked_pv
 = m_old_state->m_region_model->get_representative_path_var (sval,
-   &visited);
+   &visited,
+   nullptr);
 
   /* Strip off top-level casts  */
   if (leaked_pv.m_tree && TREE_CODE (leaked_pv.m_tree) == NOP_EXPR)
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 9f24011c17bf..a25181f2a3ec 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -5343,7 +5343,8 @@ region_model::eval_condition (tree lhs,
 
 path_var
 region_model::get_representative_path_var_1 (const svalue *sval,
-svalue_set *visited) const
+svalue_set *visited,
+logger *logger) const
 {
   gcc_assert (sval);
 
@@ -5360,7 +5361,8 @@ region_model::get_representative_path_var_1 (const svalue 
*sval,
   /* Handle casts by recursion into get_representative_path_var.  */
   if (const svalue *cast_sval = sval->maybe_undo_cast ())
 {
-  path_var result = get_representative_path_var (cast_sval, visited);
+  path_var result = get_representative_path_var (cast_sval, visited,
+logger);
   tree orig_type = sval->get_type ();
   /* If necessary, wrap the result in a cast.  */
   if (result.m_tree && orig_type)
@@ -5369,7 +5371,7 @@ region_model::get_representative_path_var_1 (const svalue 
*sval,
 }
 
   auto_vec pvs;
-  m_store.get_representative_path_vars (this, visited, sval, &pvs);
+  m_store.get_representative_path_vars (this, visited, sval, logger, &pvs);
 
   if (tree cst = sval->maybe_get_constant ())
 pvs.safe_push (path_var (cst, 0));
@@ -5378,7 +5380,7 @@ region_model::get_representative_path_var_1 (const svalue 
*sval,
   if (const region_svalue *ptr_sval = sval->dyn_cast_region_svalue ())
 {
   const region *reg = ptr_sval->get_pointee ();
-  if (path_var pv = get_representative_path_var (reg, visited))
+  if (path_var pv = get_representative_path_var (reg, visited, logger))
return path_var (build1 (ADDR_EXPR,
 sval->get_type (),
 pv.m_tree),
@@ -5391,7 +5393,7 @@ region_model::get_representative_path_var_1 (const svalue 
*sval,
   const svalue *parent_sval = sub_sval->get_parent ();
   const region *subreg = sub_sval->get_subregion ();
   if (path_var parent_pv
-   = get_representative_path_var (parent_sval, visited))
+   = get_representative_path_var (parent_sval, visited, logger))
if (const field_region *field_reg = subreg->dyn_cast_field_region ())
  return path_var (build3 (COMPONENT_REF,
   sval->get_ty

[PATCH, obvious] rs6000: Update ELFv2 stack frame comment showing the correct ROP save location

2024-06-07 Thread Peter Bergner
I consider this one obvious, so I plan on pushing this soonish.

Peter


The ELFv2 stack frame layout comment in rs6000-logue.cc shows the ROP
hash save slot in the wrong location.  Update the comment to show the
correct ROP hash save location in the frame.
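
As a reading aid, the corrected offsets in the comment can be written out
as arithmetic.  This is only a sketch: the struct and function names are
made up for illustration and are not the fields of rs6000_stack_t; the
constant 32 and the letters P, A, L, R come from the comment itself.

```c
#include <assert.h>

/* Sizes, in bytes, of the variable-sized areas named in the comment.
   Field names are illustrative only.  */
struct frame_sizes
{
  long parm;		/* P: parameter save area (+padding) */
  long alloca_space;	/* A: alloca space */
  long locals;		/* L: local variable space */
  long rop;		/* R: optional ROP hash slot */
};

/* Offset from the stack pointer of each area, per the corrected comment.  */
long
alloca_area_offset (const struct frame_sizes *s)
{
  assert (s);
  return 32 + s->parm;
}

long
local_area_offset (const struct frame_sizes *s)
{
  assert (s);
  return 32 + s->parm + s->alloca_space;
}

long
rop_hash_slot_offset (const struct frame_sizes *s)
{
  assert (s);
  return 32 + s->parm + s->alloca_space + s->locals;
}

long
altivec_save_offset (const struct frame_sizes *s)
{
  assert (s);
  return 32 + s->parm + s->alloca_space + s->locals + s->rop;
}
```

The point of the comment fix is visible in rop_hash_slot_offset: the slot
sits above the alloca and local areas, not directly above the parameter
save area.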

gcc/
* config/rs6000/rs6000-logue.cc (rs6000_stack_info): Update comment.


diff --git a/gcc/config/rs6000/rs6000-logue.cc 
b/gcc/config/rs6000/rs6000-logue.cc
index bd5d56ba002..0ef309e043b 100644
--- a/gcc/config/rs6000/rs6000-logue.cc
+++ b/gcc/config/rs6000/rs6000-logue.cc
@@ -591,21 +591,21 @@ rs6000_savres_strategy (rs6000_stack_t *info,
+---+
| Parameter save area (+padding*) (P)   |  32
+---+
-   | Optional ROP hash slot (R)|  32+P
+   | Alloca space (A)  |  32+P
+---+
-   | Alloca space (A)  |  32+P+R
+   | Local variable space (L)  |  32+P+A
+---+
-   | Local variable space (L)  |  32+P+R+A
+   | Optional ROP hash slot (R)|  32+P+A+L
+---+
-   | Save area for AltiVec registers (W)   |  32+P+R+A+L
+   | Save area for AltiVec registers (W)   |  32+P+A+L+R
+---+
-   | AltiVec alignment padding (Y) |  32+P+R+A+L+W
+   | AltiVec alignment padding (Y) |  32+P+A+L+R+W
+---+
-   | Save area for GP registers (G)|  32+P+R+A+L+W+Y
+   | Save area for GP registers (G)|  32+P+A+L+R+W+Y
+---+
-   | Save area for FP registers (F)|  32+P+R+A+L+W+Y+G
+   | Save area for FP registers (F)|  32+P+A+L+R+W+Y+G
+---+
-   old SP->| back chain to caller's caller |  32+P+R+A+L+W+Y+G+F
+   old SP->| back chain to caller's caller |  32+P+A+L+R+W+Y+G+F
+---+
 
  * If the alloca area is present, the parameter save area is


[pushed] wwwdocs: simtest-howto: Use https to link to our install docs

2024-06-07 Thread Gerald Pfeifer


---
 htdocs/simtest-howto.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/simtest-howto.html b/htdocs/simtest-howto.html
index d9c027fd..3afbdb0b 100644
--- a/htdocs/simtest-howto.html
+++ b/htdocs/simtest-howto.html
@@ -115,7 +115,7 @@ cd gcc && find . -print | cpio -pdlmu ../combined 
&& cd ..
 Build it
 
 Make sure the
-http://gcc.gnu.org/install/prerequisites.html";>building
+https://gcc.gnu.org/install/prerequisites.html";>building
 prerequisites for GCC are met, for example a host GCC no earlier
 than 3.4 or later, with C++ support enabled.
 
-- 
2.45.1


Re: [PATCH v2] Target-independent store forwarding avoidance.

2024-06-07 Thread Jeff Law




On 6/6/24 4:10 AM, Manolis Tsamis wrote:

This pass detects cases of expensive store forwarding and tries to avoid them
by reordering the stores and using suitable bit insertion sequences.
For example it can transform this:

  strb w2, [x1, 1]
  ldr x0, [x1]  # Expensive store forwarding to larger load.

To:

  ldr x0, [x1]
  strb w2, [x1]
  bfi x0, x2, 0, 8

Assembly like this can appear with bitfields or type punning / unions.
On stress-ng when running the cpu-union microbenchmark the following speedups
have been observed.

   Neoverse-N1:  +29.4%
   Intel Coffeelake: +13.1%
   AMD 5950X:+17.5%
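
For readers who want to see the equivalence in C terms, here is a small
sketch of the two orderings.  It is not taken from the patch: the function
names are invented, the byte is stored at offset 0 to match the
"bfi x0, x2, 0, 8" in the transformed sequence, and the endianness probe is
only there so the sketch behaves the same on big-endian hosts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Original order: store one byte, then load the 64-bit word containing
   it.  The load has to wait for the narrower store to be forwarded.  */
uint64_t
store_then_load (unsigned char *buf, uint8_t byte)
{
  uint64_t word;
  assert (buf != NULL);
  buf[0] = byte;
  memcpy (&word, buf, sizeof word);
  return word;
}

/* Reordered: load the old word first, do the store, then patch the
   loaded value with a bit-field insert so the result is unchanged.  */
uint64_t
load_then_insert (unsigned char *buf, uint8_t byte)
{
  uint64_t word;
  const uint16_t probe = 1;
  unsigned char first;
  unsigned shift;

  assert (buf != NULL);
  memcpy (&word, buf, sizeof word);
  buf[0] = byte;

  /* Bit position of byte 0 in the word: 0 on little-endian hosts
     (the bfi case above), 56 on big-endian ones.  */
  memcpy (&first, &probe, 1);
  shift = first ? 0 : 56;

  word &= ~((uint64_t) 0xff << shift);
  word |= (uint64_t) byte << shift;
  return word;
}
```

Both functions leave the buffer in the same state and return the same
value; only the dependency of the load on the preceding store differs.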

gcc/ChangeLog:

* Makefile.in: Add avoid-store-forwarding.o.
* common.opt: New option -favoid-store-forwarding.
* params.opt: New param store-forwarding-max-distance.
* doc/invoke.texi: Document new pass.
* doc/passes.texi: Document new pass.
* passes.def: Schedule a new pass.
* tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare.
* avoid-store-forwarding.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/avoid-store-forwarding-1.c: New test.
* gcc.target/aarch64/avoid-store-forwarding-2.c: New test.
* gcc.target/aarch64/avoid-store-forwarding-3.c: New test.
* gcc.target/aarch64/avoid-store-forwarding-4.c: New test.
So this is getting a lot more interesting.  I think the first time I 
looked at this it was more concerned with stores feeding something like 
a load-pair and avoiding the store forwarding penalty for that case.  Am 
I mis-remembering, or did it get significantly more general?







+
+static unsigned int stats_sf_detected = 0;
+static unsigned int stats_sf_avoided = 0;
+
+static rtx
+get_load_mem (rtx expr)
Needs a function comment.  You should probably mention that EXPR must be 
a single_set in that comment.




 +

+  rtx dest;
+  if (eliminate_load)
+dest = gen_reg_rtx (load_inner_mode);
+  else
+dest = SET_DEST (load);
+
+  int move_to_front = -1;
+  int total_cost = 0;
+
+  /* Check if we can emit bit insert instructions for all forwarded stores.  */
+  FOR_EACH_VEC_ELT (stores, i, it)
+{
+  it->mov_reg = gen_reg_rtx (GET_MODE (it->store_mem));
+  rtx_insn *insns = NULL;
+
+  /* If we're eliminating the load then find the store with zero offset
+and use it as the base register to avoid a bit insert.  */
+  if (eliminate_load && it->offset == 0)
So how often is this triggering?  We have various bits of code in the gimple 
optimizers to detect a store followed by a load from the same address and 
do the forwarding.  If this is happening with any frequency, that would 
be a good sign that the code in DOM and elsewhere isn't working well.


The way these passes detect this case is to take the store, flip the 
operands around (i.e., so that it looks like a load) and enter that into the 
expression hash tables.  After that, standard redundancy elimination 
approaches will work.
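
A toy version of that idea, for anyone following along.  This is a sketch
under heavy simplification, not what DOM does: the table is a fixed-size
array keyed on the address only, all names are invented, and there is no
invalidation on aliasing stores or calls.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_AVAIL 16

/* One "available load": a load from ADDR is known to yield VALUE.  */
struct avail_load
{
  const void *addr;
  long value;
};

struct avail_table
{
  struct avail_load entries[MAX_AVAIL];
  size_t count;
};

/* Record the store "*addr = value" as if it were the load
   "value = *addr", i.e. with the operands flipped around.  */
void
record_store (struct avail_table *t, const void *addr, long value)
{
  size_t i;
  assert (t != NULL);
  for (i = 0; i < t->count; i++)
    if (t->entries[i].addr == addr)
      {
	t->entries[i].value = value;
	return;
      }
  if (t->count < MAX_AVAIL)
    {
      t->entries[t->count].addr = addr;
      t->entries[t->count].value = value;
      t->count++;
    }
}

/* Return 1 and set *VALUE if a load from ADDR is redundant with an
   earlier recorded store, 0 otherwise.  */
int
lookup_load (const struct avail_table *t, const void *addr, long *value)
{
  size_t i;
  assert (t != NULL && value != NULL);
  for (i = 0; i < t->count; i++)
    if (t->entries[i].addr == addr)
      {
	*value = t->entries[i].value;
	return 1;
      }
  return 0;
}
```

A later load that hits in the table can be replaced by the recorded value,
which is the same-size forwarding case the gimple passes are expected to
catch already.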




+   {
+ start_sequence ();
+
+ /* We can use a paradoxical subreg to force this to a wider mode, as
+the only use will be inserting the bits (i.e., we don't care about
+the value of the higher bits).  */
Which may be a good hint about the cases you're capturing -- if the 
modes/sizes differ that would make more sense since I don't think we're 
as likely to be capturing those cases.




diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4e8967fd8ab..c769744d178 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12657,6 +12657,15 @@ loop unrolling.
  This option is enabled by default at optimization levels @option{-O1},
  @option{-O2}, @option{-O3}, @option{-Os}.
  
+@opindex favoid-store-forwarding

+@item -favoid-store-forwarding
+@itemx -fno-avoid-store-forwarding
+Many CPUs will stall for many cycles when a load partially depends on previous
+smaller stores.  This pass tries to detect such cases and avoid the penalty by
+changing the order of the load and store and then fixing up the loaded value.
+
+Disabled by default.
Is there any particular reason why this would be off by default at -O1 
or higher?  It would seem to me that on modern cores this 
transformation should easily be a win.  Even on an old in-order core, 
avoiding the load with the bit insert is likely profitable, just not as 
much so.






diff --git a/gcc/params.opt b/gcc/params.opt
index d34ef545bf0..b8115f5c27a 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -1032,6 +1032,10 @@ Allow the store merging pass to introduce unaligned 
stores if it is legal to do
  Common Joined UInteger Var(param_store_merging_max_size) Init(65536) 
IntegerRange(1, 65536) Param Optimization
  Maximum size of a single store merging region in bytes.
  
+-param=store-forwarding-max-distance=

+Common Joined UInteger Var(param_store_forwarding_max_distance) Init(10) 
IntegerRange(1, 1000) Param Optimization
+Maximum number of instructio

Re: [PATCH v2 1/3] RISC-V: Add basic Zaamo and Zalrsc support

2024-06-07 Thread Jeff Law




On 6/3/24 3:53 PM, Patrick O'Neill wrote:

The A extension has been split into two parts: Zaamo and Zalrsc.
This patch adds basic support by making the A extension imply Zaamo and
Zalrsc.
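
Conceptually the implication is a one-way expansion of the extension set.
The sketch below shows that in C; the bit flags and the function name are
invented for illustration and do not correspond to the tables in
riscv-common.cc or to the arch-canonicalize script.

```c
#include <assert.h>

/* Illustrative extension flags, not GCC's representation.  */
enum
{
  EXT_A = 1 << 0,
  EXT_ZAAMO = 1 << 1,
  EXT_ZALRSC = 1 << 2,
  EXT_ALL = EXT_A | EXT_ZAAMO | EXT_ZALRSC
};

/* Expand implied extensions: A implies both Zaamo and Zalrsc.  The
   reverse does not hold, so Zaamo or Zalrsc alone is left untouched.  */
unsigned
apply_implications (unsigned exts)
{
  assert ((exts & ~(unsigned) EXT_ALL) == 0);
  if (exts & EXT_A)
    exts |= EXT_ZAAMO | EXT_ZALRSC;
  return exts;
}
```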

Zaamo/Zalrsc spec: https://github.com/riscv/riscv-zaamo-zalrsc/tags
Ratification: https://jira.riscv.org/browse/RVS-1995

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add Zaamo and Zalrsc.
* config/riscv/arch-canonicalize: Make A imply Zaamo and Zalrsc.
* config/riscv/riscv.opt: Add Zaamo and Zalrsc
* config/riscv/sync.md: Convert TARGET_ATOMIC to TARGET_ZAAMO and
TARGET_ZALRSC.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/attribute-15.c: Adjust expected arch string.
* gcc.target/riscv/attribute-16.c: Ditto.
* gcc.target/riscv/attribute-17.c: Ditto.
* gcc.target/riscv/attribute-18.c: Ditto.
* gcc.target/riscv/pr110696.c: Ditto.
* gcc.target/riscv/rvv/base/pr114352-1.c: Ditto.
* gcc.target/riscv/rvv/base/pr114352-3.c: Ditto.

OK
jeff



Re: [PATCH v2 2/3] RISC-V: Add Zalrsc and Zaamo testsuite support

2024-06-07 Thread Jeff Law




On 6/3/24 3:53 PM, Patrick O'Neill wrote:

Convert testsuite infrastructure to use Zalrsc and Zaamo rather than A.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo-table-a-6-amo-add-1.c: Use Zaamo rather than A.
* gcc.target/riscv/amo-table-a-6-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-a-6-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-a-6-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-a-6-amo-add-5.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-1.c: Use Zalrsc rather
than A.
* gcc.target/riscv/amo-table-a-6-compare-exchange-2.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-3.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-4.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-5.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-6.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-7.c: Ditto.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-1.c: Use Zaamo rather
than A.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-5.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-1.c: Add Zaamo option.
* gcc.target/riscv/amo-table-ztso-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-5.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-1.c: Use Zalrsc 
rather
than A.
* gcc.target/riscv/amo-table-ztso-compare-exchange-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-5.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-6.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-7.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-5.c: Ditto.
* lib/target-supports.exp: Add testsuite infrastructure support for
Zaamo and Zalrsc.
So there's a lot of whitespace changes going on in target-supports.exp 
that make it harder to find the real changes.


There's always a bit of a judgement call for that kind of thing.  This 
one probably goes past what I would generally recommend, meaning that the 
formatting stuff would be a separate patch.


A reasonable starting point would be if you're not changing the function 
in question, then fixing formatting in it probably should be a distinct 
patch.


You probably should update the docs in sourcebuild.texi for the new 
target-supports tests.


So OK for the trunk (including the whitespace fixes) with a suitable 
change to sourcebuild.texi.


jeff


Re: [PATCH v2 3/3] RISC-V: Add Zalrsc amo-op patterns

2024-06-07 Thread Jeff Law




On 6/3/24 3:53 PM, Patrick O'Neill wrote:

All amo patterns can be represented with lrsc sequences.
Add these patterns as a fallback when Zaamo is not enabled.
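
The shape of such a fallback can be sketched in portable C11.  This is an
analogy, not what the new sync.md patterns emit: a weak compare-exchange
stands in for the lr.w/sc.w pair because it, too, may fail and force a
retry.  The function name is invented.

```c
#include <assert.h>
#include <stdatomic.h>

/* Fetch-and-add built from a retry loop instead of a single AMO.
   Load the old value, try to store old + val, and go round again if
   another writer got in between (or the weak CAS failed spuriously).  */
int
fetch_add_retry_loop (_Atomic int *ptr, int val)
{
  int old;
  assert (ptr);
  old = atomic_load_explicit (ptr, memory_order_relaxed);
  while (!atomic_compare_exchange_weak_explicit (ptr, &old, old + val,
						 memory_order_relaxed,
						 memory_order_relaxed))
    ;	/* On failure OLD has been refreshed; just retry.  */
  return old;
}
```

Any read-modify-write operation fits this loop by swapping "old + val" for
the operation in question, which is why an lrsc sequence can cover every
amo pattern.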

gcc/ChangeLog:

* config/riscv/sync.md (atomic_): New expand 
pattern.
(amo_atomic_): Rename amo pattern.
(atomic_fetch_): New lrsc sequence pattern.
(lrsc_atomic_): New expand pattern.
(amo_atomic_fetch_): Rename amo pattern.
(lrsc_atomic_fetch_): New lrsc sequence pattern.
(atomic_exchange): New expand pattern.
(amo_atomic_exchange): Rename amo pattern.
(lrsc_atomic_exchange): New lrsc sequence pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo-zaamo-preferred-over-zalrsc.c: New test.
* gcc.target/riscv/amo-zalrsc-amo-add-1.c: New test.
* gcc.target/riscv/amo-zalrsc-amo-add-2.c: New test.
* gcc.target/riscv/amo-zalrsc-amo-add-3.c: New test.
* gcc.target/riscv/amo-zalrsc-amo-add-4.c: New test.
* gcc.target/riscv/amo-zalrsc-amo-add-5.c: New test.

Signed-off-by: Patrick O'Neill 
--
rv64imfdc_zalrsc has the same testsuite results as rv64imafdc after this
patch is applied.
---
AFAIK there isn't a way to subtract an extension similar to dg-add-options.
As a result I needed to specify a -march string for
amo-zaamo-preferred-over-zalrsc.c instead of using testsuite infra.

I believe you are correct.




diff --git a/gcc/testsuite/gcc.target/riscv/amo-zaamo-preferred-over-zalrsc.c 
b/gcc/testsuite/gcc.target/riscv/amo-zaamo-preferred-over-zalrsc.c
new file mode 100644
index 000..1c124c2b8b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/amo-zaamo-preferred-over-zalrsc.c

[ ... ]
Not a big fan of the function-bodies tests.  If we're going to use them, 
we need to be especially careful about requiring specific registers so 
that we're not stuck adjusting them all the time due to changes in the 
register allocator, optimizers, etc.



diff --git a/gcc/testsuite/gcc.target/riscv/amo-zalrsc-amo-add-1.c 
b/gcc/testsuite/gcc.target/riscv/amo-zalrsc-amo-add-1.c
new file mode 100644
index 000..3cd6ce04830
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/amo-zalrsc-amo-add-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* Verify that lrsc atomic op mappings match Table A.6's recommended mapping.  
*/
+/* { dg-options "-O3 -march=rv64id_zalrsc" } */
+/* { dg-skip-if "" { *-*-* } { "-g" "-flto"} } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/*
+** foo:
+** 1:
+** lr.w\ta5, 0\(a0\)
+** add\ta5, a5, a1
+** sc.w\ta5, a5, 0\(a0\)
+**  bnez\ta5, 1b
+** ret
+*/
+void foo (int* bar, int* baz)
+{
+  __atomic_add_fetch(bar, baz, __ATOMIC_RELAXED);
+}
This one is a good example.  We could just as easily use a variety of 
registers other than a5 for the temporary.


Obviously for registers that hold the incoming argument or an outgoing 
result, we can be more strict.


If you could take a look at the added tests and generalize the registers 
it'd be appreciated.  OK with that adjustment.
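
One possible way to generalize the temporary, shown on the test quoted
above.  This is a sketch and not the revised patch: the register class
[atx][0-9]+ is an assumption about which registers the allocator might
pick, and the second parameter is written as a plain int here (the quoted
test declares it as int*) so the snippet also compiles cleanly on its own.
The argument registers a0 and a1 stay fixed.

```c
/* { dg-do compile } */
/* { dg-options "-O3 -march=rv64id_zalrsc" } */
/* { dg-skip-if "" { *-*-* } { "-g" "-flto"} } */
/* { dg-final { check-function-bodies "**" "" } } */

/*
** foo:
** 1:
**	lr.w\t[atx][0-9]+, 0\(a0\)
**	add\t[atx][0-9]+, [atx][0-9]+, a1
**	sc.w\t[atx][0-9]+, [atx][0-9]+, 0\(a0\)
**	bnez\t[atx][0-9]+, 1b
**	ret
*/
void foo (int *bar, int baz)
{
  __atomic_add_fetch (bar, baz, __ATOMIC_RELAXED);
}
```

The pattern no longer checks that the same temporary is used on every
line, which is the price of not pinning a5.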


jeff




Re: [PATCH v1 0/6] Add DLL import/export implementation to AArch64

2024-06-07 Thread Jonathan Yong

On 6/6/24 09:23, Evgeny Karpov wrote:

Thursday, June 6, 2024 1:42 AM
Jonathan Yong <10wa...@gmail.com> wrote:


Where is HAVE_64BIT_POINTERS used?



Sorry, it was missed in the posted changes for review.

Regards,
Evgeny

diff --git a/gcc/config/mingw/mingw32.h b/gcc/config/mingw/mingw32.h
index 8a6f0e8e8a5..0c9d5424942 100644
--- a/gcc/config/mingw/mingw32.h
+++ b/gcc/config/mingw/mingw32.h
@@ -82,7 +82,7 @@ along with GCC; see the file COPYING3.  If not see
  #endif

  #undef SUB_LINK_ENTRY
-#if TARGET_64BIT_DEFAULT || defined (TARGET_AARCH64_MS_ABI)
+#if HAVE_64BIT_POINTERS
  #define SUB_LINK_ENTRY SUB_LINK_ENTRY64
  #else
  #define SUB_LINK_ENTRY SUB_LINK_ENTRY32



Looks OK to me.


[PATCH] aarch64: Add vector floating point trunc pattern

2024-06-07 Thread Pengxuan Zheng
This patch is a follow-up to r15-1079-g230d62a2cdd16c.  It adds a vector floating
point trunc pattern for V2DF->V2SF and V4SF->V4HF conversions by renaming the
existing aarch64_float_truncate_lo_ pattern to the standard
optab one, i.e., trunc2. This allows the vectorizer
to vectorize certain floating point narrowing operations for the aarch64 target.

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc (VAR1): Remap float_truncate_lo_
builtin codes to standard optab ones.
* config/aarch64/aarch64-simd.md 
(aarch64_float_truncate_lo_):
Rename to...
(trunc2): ... This.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/trunc-vec.c: New test.

Signed-off-by: Pengxuan Zheng 
---
 gcc/config/aarch64/aarch64-builtins.cc   |  7 +++
 gcc/config/aarch64/aarch64-simd.md   |  6 +++---
 gcc/testsuite/gcc.target/aarch64/trunc-vec.c | 21 
 3 files changed, 31 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/trunc-vec.c

diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index 25189888d17..d589e59defc 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -543,6 +543,13 @@ BUILTIN_VDQ_BHSI (uhadd, uavg, _floor, 0)
 VAR1 (float_extend_lo_, extend, v2sf, v2df)
 VAR1 (float_extend_lo_, extend, v4hf, v4sf)
 
+/* __builtin_aarch64_float_truncate_lo_ should be expanded through the
+   standard optabs CODE_FOR_trunc2. */
+constexpr insn_code CODE_FOR_aarch64_float_truncate_lo_v4hf
+= CODE_FOR_truncv4sfv4hf2;
+constexpr insn_code CODE_FOR_aarch64_float_truncate_lo_v2sf
+= CODE_FOR_truncv2dfv2sf2;
+
 #undef VAR1
 #define VAR1(T, N, MAP, FLAG, A) \
   {#N #A, UP (A), CF##MAP (N, A), 0, TYPES_##T, FLAG_##FLAG},
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index c5e2c9f00d0..f644bd1731e 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3197,7 +3197,7 @@ (define_expand "aarch64_float_trunc_rodd_hi_v4sf"
 }
 )
 
-(define_insn "aarch64_float_truncate_lo_"
+(define_insn "trunc2"
   [(set (match_operand:VDF 0 "register_operand" "=w")
   (float_truncate:VDF
(match_operand: 1 "register_operand" "w")))]
@@ -3256,7 +3256,7 @@ (define_expand "vec_pack_trunc_v2df"
 int lo = BYTES_BIG_ENDIAN ? 2 : 1;
 int hi = BYTES_BIG_ENDIAN ? 1 : 2;
 
-emit_insn (gen_aarch64_float_truncate_lo_v2sf (tmp, operands[lo]));
+emit_insn (gen_truncv2dfv2sf2 (tmp, operands[lo]));
 emit_insn (gen_aarch64_float_truncate_hi_v4sf (operands[0],
   tmp, operands[hi]));
 DONE;
@@ -3272,7 +3272,7 @@ (define_expand "vec_pack_trunc_df"
   {
 rtx tmp = gen_reg_rtx (V2SFmode);
 emit_insn (gen_aarch64_vec_concatdf (tmp, operands[1], operands[2]));
-emit_insn (gen_aarch64_float_truncate_lo_v2sf (operands[0], tmp));
+emit_insn (gen_truncv2dfv2sf2 (operands[0], tmp));
 DONE;
   }
 )
diff --git a/gcc/testsuite/gcc.target/aarch64/trunc-vec.c 
b/gcc/testsuite/gcc.target/aarch64/trunc-vec.c
new file mode 100644
index 000..05e8af7912d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/trunc-vec.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* { dg-final { scan-assembler-times {fcvtn\tv[0-9]+.2s, v[0-9]+.2d} 1 } } */
+void
+f (double *__restrict a, float *__restrict b)
+{
+  b[0] = a[0];
+  b[1] = a[1];
+}
+
+/* { dg-final { scan-assembler-times {fcvtn\tv[0-9]+.4h, v[0-9]+.4s} 1 } } */
+void
+f1 (float *__restrict a, _Float16 *__restrict b)
+{
+
+  b[0] = a[0];
+  b[1] = a[1];
+  b[2] = a[2];
+  b[3] = a[3];
+}
-- 
2.17.1



[PATCH] MIPS: Use signaling fcmp instructions for LT/LE/LTGT

2024-06-07 Thread YunQiang Su
LT/LE: c.lt.fmt/c.le.fmt on pre-R6 and cmp.lt.fmt/cmp.le.fmt on R6 have
different semantics:
   c.lt.fmt signals for all NaNs, including qNaN;
   cmp.lt.fmt signals only for sNaN, not for qNaN;
   cmp.slt.fmt has the same semantics as c.lt.fmt;
   lt/le of RTL signal for qNaN.

In `s__using_`, the RTL operations
`lt`/`le` are converted to c/cmp's lt/le, which is correct for C.cond.fmt
but not for CMP.cond.fmt. Let's convert them to slt/sle if ISA_HAS_CCF.

For LTGT, which signals for qNaN, `sne` of R6 has the same semantics, while
pre-R6 has only the inverse one, `ngl`.  Thus for RTL we have to use `uneq` as
the operator, and introduce a new CC mode, CCEmode, to mark it as signaling.

This patch can fix
   gcc.dg/torture/pr91323.c for pre-R6;
   gcc.dg/torture/builtin-iseqsig-* for R6.
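
The mnemonic selection described above can be restated as a small
standalone function.  This is only a sketch of the condition mapping: the
function name and its two int parameters are invented, with has_ccf
standing in for ISA_HAS_CCF and is_cce for "the compare was tagged with
CCEmode".  It deliberately says nothing about operand templates.

```c
#include <assert.h>
#include <string.h>

/* Pick the condition name to print for a floating-point compare.
   has_ccf: the ISA uses CMP.cond.fmt, where plain lt/le are quiet.
   is_cce:  pre-R6 compare marked as signaling via CCEmode.  */
const char *
select_fp_cond (const char *fcond, int has_ccf, int is_cce)
{
  assert (fcond != NULL);
  if (has_ccf)
    {
      /* cmp.lt.fmt is quiet; the signaling forms are slt/sle.  */
      if (strcmp (fcond, "lt") == 0)
	return "slt";
      if (strcmp (fcond, "le") == 0)
	return "sle";
    }
  else if (is_cce && strcmp (fcond, "ueq") == 0)
    /* This was LTGT, reversed to UNEQ; print the signaling inverse.  */
    return "ngl";
  return fcond;
}
```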

gcc:
* config/mips/mips-modes.def: New CC_MODE CCE.
* config/mips/mips-protos.h(mips_output_compare): New function.
* config/mips/mips.cc(mips_allocate_fcc): Set CCEmode count=1.
(mips_emit_compare): Use CCEmode for LTGT/LT/LE for pre-R6.
(mips_output_compare): New function. Convert lt/le to slt/sle
for R6; convert ueq to ngl for CCEmode.
(mips_hard_regno_mode_ok_uncached): Mention CCEmode.
* config/mips/mips.h: Mention CCEmode for LOAD_EXTEND_OP.
* config/mips/mips.md(FPCC): Add CCE.
(define_mode_attr reg): Add CCE with "z".
(define_mode_attr fpcmp): Add CCE with "c".
(define_code_attr fcond): ltgt should use sne instead of ne.
(s__using_): call mips_output_compare.
---
 gcc/config/mips/mips-modes.def |  1 +
 gcc/config/mips/mips-protos.h  |  2 ++
 gcc/config/mips/mips.cc| 48 +++---
 gcc/config/mips/mips.h |  2 +-
 gcc/config/mips/mips.md| 16 +++-
 5 files changed, 58 insertions(+), 11 deletions(-)

diff --git a/gcc/config/mips/mips-modes.def b/gcc/config/mips/mips-modes.def
index 323570928fc..21f50a22546 100644
--- a/gcc/config/mips/mips-modes.def
+++ b/gcc/config/mips/mips-modes.def
@@ -54,4 +54,5 @@ ADJUST_ALIGNMENT (CCV4, 16);
 CC_MODE (CCDSP);
 
 /* For floating point conditions in FP registers.  */
+CC_MODE (CCE);
 CC_MODE (CCF);
diff --git a/gcc/config/mips/mips-protos.h b/gcc/config/mips/mips-protos.h
index 835f42128b9..fcc0a0ae663 100644
--- a/gcc/config/mips/mips-protos.h
+++ b/gcc/config/mips/mips-protos.h
@@ -394,4 +394,6 @@ extern bool mips_bit_clear_p (enum machine_mode, unsigned 
HOST_WIDE_INT);
 extern void mips_bit_clear_info (enum machine_mode, unsigned HOST_WIDE_INT,
  int *, int *);
 
+extern const char *mips_output_compare (const char *fpcmp, const char *fcond,
+   const char *fmt, const char *fpcc_mode, bool swap);
 #endif /* ! GCC_MIPS_PROTOS_H */
diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index 278d9446482..b7acf041903 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -5659,7 +5659,7 @@ mips_allocate_fcc (machine_mode mode)
 
   gcc_assert (TARGET_HARD_FLOAT && ISA_HAS_8CC);
 
-  if (mode == CCmode)
+  if (mode == CCmode || mode == CCEmode)
 count = 1;
   else if (mode == CCV2mode)
 count = 2;
@@ -5788,17 +5788,57 @@ mips_emit_compare (enum rtx_code *code, rtx *op0, rtx 
*op1, bool need_eq_ne_p)
  /* Three FP conditions cannot be implemented by reversing the
 operands for C.cond.fmt, instead a reversed condition code is
 required and a test for false.  */
+ machine_mode ccmode = CCmode;
+ switch (*code)
+   {
+   case LTGT:
+   case LT:
+   case LE:
+ ccmode = CCEmode;
+ break;
+   default:
+ break;
+   }
  *code = mips_reversed_fp_cond (&cmp_code) ? EQ : NE;
  if (ISA_HAS_8CC)
-   *op0 = mips_allocate_fcc (CCmode);
+   *op0 = mips_allocate_fcc (ccmode);
  else
-   *op0 = gen_rtx_REG (CCmode, FPSW_REGNUM);
+   *op0 = gen_rtx_REG (ccmode, FPSW_REGNUM);
}
 
   *op1 = const0_rtx;
   mips_emit_binary (cmp_code, *op0, cmp_op0, cmp_op1);
 }
 }
+
+
+const char *
+mips_output_compare (const char *fpcmp, const char *fcond,
+   const char *fmt, const char *fpcc_mode, bool swap)
+{
+  const char *fc = fcond;
+
+  if (ISA_HAS_CCF)
+{
+  /* c.lt.fmt is signaling, while cmp.lt.fmt is quiet.  */
+  if (strcmp (fcond, "lt") == 0)
+   fc = "slt";
+  else if (strcmp (fcond, "le") == 0)
+   fc = "sle";
+}
+  else if (strcmp (fpcc_mode, "cce") == 0)
+{
+  /* It was LTGT, while we have only inverse one.  It was then converted
+to UNEQ by mips_reversed_fp_cond, and we used CCEmode to mark it.
+Lets convert it back to ngl now.  */
+  if (strcmp (fcond, "ueq") == 0)
+   fc = "ngl";
+}
+  if (swap)
+return concat(fpcmp, ".", fc, ".", fmt, "\t%Z0%2,%1", NULL);
+  return concat(fpcmp,

[PATCH] rs6000: Compute rop_hash_save_offset for non-Altivec compiles [PR115389]

2024-06-07 Thread Peter Bergner
We currently compute the offset for the ROP hash save location in the
stack frame only for Altivec compiles.  For non-Altivec compiles that
emit ROP mitigation instructions, we use a default offset of zero, which
corresponds to the backchain save location and will get clobbered on
any call.  The fix is to compute the ROP hash save location for all
compiles.
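
Spelled out as arithmetic, the non-Altivec path after the fix looks like
the sketch below.  The struct and function are invented for illustration;
only the three assignments mirror the patch, and the sample values in the
usage note are made up.

```c
#include <assert.h>

struct rop_offsets
{
  long rop_hash_save_offset;
  long ehrd_offset;
};

/* Non-Altivec case: the ROP hash slot goes directly below the GP save
   area, and the EH data register area is pushed down by the slot size.  */
struct rop_offsets
non_altivec_offsets (long gp_save_offset, long ehrd_size, long rop_hash_size)
{
  struct rop_offsets o;
  assert (ehrd_size >= 0 && rop_hash_size >= 0);
  o.ehrd_offset = gp_save_offset - ehrd_size;
  /* Adjust for ROP protection.  */
  o.rop_hash_save_offset = gp_save_offset - rop_hash_size;
  o.ehrd_offset -= rop_hash_size;
  return o;
}
```

For example, with a GP save offset of -8, no EH data registers and an
8-byte hash slot, both offsets come out as -16 rather than the zero that
would alias the backchain word.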

This passed bootstrap and regtesting on powerpc64le-linux.
Ok for trunk and backports after some burn-in time?

Peter


gcc/
PR target/115389
* config/rs6000/rs6000-logue.cc (rs6000_stack_info): Compute
rop_hash_save_offset for non-Altivec compiles.

gcc/testsuite/
PR target/115389
* gcc.target/powerpc/pr115389.c: New test.

diff --git a/gcc/config/rs6000/rs6000-logue.cc 
b/gcc/config/rs6000/rs6000-logue.cc
index d61a25a5126..cfa8a67a5f3 100644
--- a/gcc/config/rs6000/rs6000-logue.cc
+++ b/gcc/config/rs6000/rs6000-logue.cc
@@ -826,7 +826,14 @@ rs6000_stack_info (void)
  info->ehrd_offset -= info->rop_hash_size;
}
   else
-   info->ehrd_offset = info->gp_save_offset - ehrd_size;
+   {
+ info->ehrd_offset = info->gp_save_offset - ehrd_size;
+
+ /* Adjust for ROP protection.  */
+ info->rop_hash_save_offset
+   = info->gp_save_offset - info->rop_hash_size;
+ info->ehrd_offset -= info->rop_hash_size;
+   }
 
   info->ehcr_offset = info->ehrd_offset - ehcr_size;
   info->cr_save_offset = reg_size; /* first word when 64-bit.  */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr115389.c 
b/gcc/testsuite/gcc.target/powerpc/pr115389.c
new file mode 100644
index 000..a091ee8a1be
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr115389.c
@@ -0,0 +1,17 @@
+/* PR target/115389 */
+/* { dg-do assemble } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10 -mrop-protect -mno-vsx -mno-altivec 
-mabi=no-altivec -save-temps" } */
+/* { dg-require-effective-target rop_ok } */
+
+/* Verify we do not emit invalid offsets for our ROP insns.  */
+
+extern void foo (void);
+long
+bar (void)
+{
+  foo ();
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times {\mhashst\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mhashchk\M} 1 } } */