Re: [PATCH v2] [aarch64] [vxworks] mark x18 as fixed, adjust tests

2025-05-22 Thread Olivier Hainque
Thanks Alex & Richard!

On Thu, 22 May 2025 at 12:41, Richard Sandiford 
wrote:

> Alexandre Oliva  writes:
> > On May 21, 2025, Richard Sandiford  wrote:
> >
> >> I think this one shows a deeper issue, though.
> >> -fsanitize=shadow-call-stack is currently hardcoded to use x18:
> >
> > Oh, indeed!
> >
> >> and I assume this usage will be incompatible with the TCB usage.
> >
> >> So I think instead we should emit a sorry() if
> >> -fsanitize=shadow-call-stack is used on VxWorks.
> >
> > Agreed.  Here's a revised version that implements sorry(), introduces
> > TARGET_OS_USES_R18 to guard that and the fixed-register setting, and
> > skips the tests that exercise -fsanitize=shadow-call-stack.
> >
> > Tested with gcc-14 on aarch64-vxworks7r2.  Ok to install?
> >
> >
> > [aarch64] [vxworks] mark x18 as fixed, adjust tests
> >
> > VxWorks uses x18 as the TCB, so STATIC_CHAIN_REGNUM has long been set
> > (in gcc/config/aarch64/aarch64-vxworks.h) to use x9 instead.
> >
> > This patch marks x18 as fixed if the newly-introduced
> > TARGET_OS_USES_R18 is defined, so that it is not chosen by the
> > register allocator, rejects -fsanitize=shadow-call-stack due to the
> > register conflict, and adjusts tests that depend on x18 or on the
> > static chain register.
> >
> >
> > for  gcc/ChangeLog
> >
> >   * config/aarch64/aarch64-vxworks.h (TARGET_OS_USES_R18): Define.
> >   Update comments.
> >   * config/aarch64/aarch64.cc (aarch64_conditional_register_usage):
> >   Mark x18 as fixed on VxWorks.
> >   (aarch64_override_options_internal): Issue sorry message on
> >   -fsanitize=shadow-call-stack if TARGET_OS_USES_R18.
> >
> > for  gcc/testsuite/ChangeLog
> >
> >   * gcc.dg/cwsc1.c (CHAIN, aarch64): x9 instead of x18 for __vxworks.
> >   * gcc.target/aarch64/reg-alloc-4.c: Drop x18-assigned asm
> >   operand on vxworks.
> >   * gcc.target/aarch64/shadow_call_stack_1.c: Don't expect
> >   -ffixed-x18 error on vxworks, but rather the sorry message.
> >   * gcc.target/aarch64/shadow_call_stack_2.c: Skip on vxworks.
> >   * gcc.target/aarch64/shadow_call_stack_3.c: Likewise.
> >   * gcc.target/aarch64/shadow_call_stack_4.c: Likewise.
> >   * gcc.target/aarch64/shadow_call_stack_5.c: Likewise.
> >   * gcc.target/aarch64/shadow_call_stack_6.c: Likewise.
> >   * gcc.target/aarch64/shadow_call_stack_7.c: Likewise.
> >   * gcc.target/aarch64/shadow_call_stack_8.c: Likewise.
> >   * gcc.target/aarch64/stack-check-prologue-19.c: Likewise.
> >   * gcc.target/aarch64/stack-check-prologue-20.c: Likewise.
>
> OK, thanks.
>
> Richard
>
> > ---
> >  gcc/config/aarch64/aarch64-vxworks.h   |7 +++
> >  gcc/config/aarch64/aarch64.cc  |   21 +---
> >  gcc/testsuite/gcc.dg/cwsc1.c   |6 +-
> >  gcc/testsuite/gcc.target/aarch64/reg-alloc-4.c |2 ++
> >  .../gcc.target/aarch64/shadow_call_stack_1.c   |4 +++-
> >  .../gcc.target/aarch64/shadow_call_stack_2.c   |1 +
> >  .../gcc.target/aarch64/shadow_call_stack_3.c   |1 +
> >  .../gcc.target/aarch64/shadow_call_stack_4.c   |1 +
> >  .../gcc.target/aarch64/shadow_call_stack_5.c   |1 +
> >  .../gcc.target/aarch64/shadow_call_stack_6.c   |1 +
> >  .../gcc.target/aarch64/shadow_call_stack_7.c   |1 +
> >  .../gcc.target/aarch64/shadow_call_stack_8.c   |1 +
> >  .../gcc.target/aarch64/stack-check-prologue-19.c   |1 +
> >  .../gcc.target/aarch64/stack-check-prologue-20.c   |1 +
> >  14 files changed, 40 insertions(+), 9 deletions(-)
> >
> > diff --git a/gcc/config/aarch64/aarch64-vxworks.h b/gcc/config/aarch64/aarch64-vxworks.h
> > index 41adada9b1de3..7b4da934b6083 100644
> > --- a/gcc/config/aarch64/aarch64-vxworks.h
> > +++ b/gcc/config/aarch64/aarch64-vxworks.h
> > @@ -66,9 +66,8 @@ along with GCC; see the file COPYING3.  If not see
> >  #define VXWORKS_PERSONALITY "llvm"
> >
> >  /* VxWorks uses R18 as a TCB pointer.  We must pick something else as
> > -   the static chain and R18 needs to be claimed "fixed".  Until we
> > -   arrange to override the common parts of the port family to
> > -   acknowledge the latter, configure --with-specs="-ffixed-r18".  */
> > +   the static chain and R18 needs to be claimed "fixed" (TARGET_OS_USES_R18
> > +   does that in aarch64_conditional_register_usage).  */
> >  #undef  STATIC_CHAIN_REGNUM
> >  #define STATIC_CHAIN_REGNUM 9
> > -
> > +#define TARGET_OS_USES_R18
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index 1da615c8955a4..ec9da0ed60c6f 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -18819,9 +18819,16 @@ aarch64_override_options_internal (struct gcc_options *opts)
> >aarch64_stack_protector_guard_offset = offs;
> >  }
> >
> > -  if ((flag_sanitize & SANITIZE_SHADOW_CALL_STACK)
> > -  && !fixed_regs[R18_REGNUM])

Re: [PATCH 1/2] libstdc++: Fix concept checks for std::unique_copy [PR120384]

2025-05-22 Thread Tomasz Kaminski
On Thu, May 22, 2025 at 1:42 PM Jonathan Wakely  wrote:

> This looks to have been wrong since r0-125454-gea89b2482f97aa which
> introduced the predefined_ops.h. Since that change, the binary predicate
> passed to std::__unique_copy is _Iter_comp_iter, which takes arguments
> of the iterator type, not the iterator's value type.
>
> This removes the checks from the __unique_copy overloads and moves them
> into the second overload of std::unique_copy, where we have the original
> binary predicate, not the adapted one from predefined_ops.h.
>
> The third __unique_copy overload currently checks that the predicate is
> callable with the input range value type and the output range value
> type. This change alters that, so that we only ever check that the
> predicate can be called with two arguments of the same type. That is
> intentional, because calling the predicate with different types is a bug
> that will be fixed in a later commit (see PR libstdc++/120386).
>
> libstdc++-v3/ChangeLog:
>
> PR libstdc++/120384
> * include/bits/stl_algo.h (__unique_copy): Remove all
> _BinaryPredicateConcept concept checks.
> (unique_copy): Check _BinaryPredicateConcept in overload that
> takes a predicate.
> * testsuite/25_algorithms/unique_copy/120384.cc: New test.
> ---
>
> Tested x86_64-linux.
>
It took me a while to understand why the check uses the same value type
twice, but after reading the second commit it makes sense.
LGTM, thanks.
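
For reference, a minimal sketch of the call shape the relocated check is
meant to cover: the user's original predicate, invoked with two arguments
of the input range's value type. The names same_parity and dedup_by_parity
are made up for illustration and are not part of the patch or its test.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical predicate: takes two values of the input value type (int),
// not two iterators.
inline bool same_parity(int a, int b)
{ return (a % 2) == (b % 2); }

// Copies `in`, keeping only the first element of each run of elements
// that have the same parity.
inline std::vector<int> dedup_by_parity(const std::vector<int>& in)
{
  std::vector<int> out;
  std::unique_copy(in.begin(), in.end(), std::back_inserter(out),
                   same_parity);
  assert(out.size() <= in.size());
  return out;
}
```

The predicate here is what the user passes to std::unique_copy, which is
why checking it against the value type (twice) is the natural requirement.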

>
>  libstdc++-v3/include/bits/stl_algo.h| 17 +++--
>  .../25_algorithms/unique_copy/120384.cc | 12 
>  2 files changed, 15 insertions(+), 14 deletions(-)
>  create mode 100644
> libstdc++-v3/testsuite/25_algorithms/unique_copy/120384.cc
>
> diff --git a/libstdc++-v3/include/bits/stl_algo.h
> b/libstdc++-v3/include/bits/stl_algo.h
> index 71ead103d2bf..f5361aeab7e2 100644
> --- a/libstdc++-v3/include/bits/stl_algo.h
> +++ b/libstdc++-v3/include/bits/stl_algo.h
> @@ -932,11 +932,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>   _OutputIterator __result, _BinaryPredicate __binary_pred,
>   forward_iterator_tag, output_iterator_tag)
>  {
> -  // concept requirements -- iterators already checked
> -  __glibcxx_function_requires(_BinaryPredicateConcept<_BinaryPredicate,
> - typename iterator_traits<_ForwardIterator>::value_type,
> - typename iterator_traits<_ForwardIterator>::value_type>)
> -
>_ForwardIterator __next = __first;
>*__result = *__first;
>while (++__next != __last)
> @@ -962,11 +957,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>   _OutputIterator __result, _BinaryPredicate __binary_pred,
>   input_iterator_tag, output_iterator_tag)
>  {
> -  // concept requirements -- iterators already checked
> -  __glibcxx_function_requires(_BinaryPredicateConcept<_BinaryPredicate,
> - typename iterator_traits<_InputIterator>::value_type,
> - typename iterator_traits<_InputIterator>::value_type>)
> -
>typename iterator_traits<_InputIterator>::value_type __value =
> *__first;
>__decltype(__gnu_cxx::__ops::__iter_comp_val(__binary_pred))
> __rebound_pred
> @@ -995,10 +985,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>   _ForwardIterator __result, _BinaryPredicate
> __binary_pred,
>   input_iterator_tag, forward_iterator_tag)
>  {
> -  // concept requirements -- iterators already checked
> -  __glibcxx_function_requires(_BinaryPredicateConcept<_BinaryPredicate,
> - typename iterator_traits<_ForwardIterator>::value_type,
> - typename iterator_traits<_InputIterator>::value_type>)
>*__result = *__first;
>while (++__first != __last)
> if (!__binary_pred(__result, __first))
> @@ -4505,6 +4491,9 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
>__glibcxx_function_requires(_OutputIteratorConcept<_OutputIterator,
> typename iterator_traits<_InputIterator>::value_type>)
>__glibcxx_requires_valid_range(__first, __last);
> +  __glibcxx_function_requires(_BinaryPredicateConcept<_BinaryPredicate,
> + typename iterator_traits<_InputIterator>::value_type,
> + typename iterator_traits<_InputIterator>::value_type>)
>
>if (__first == __last)
> return __result;
> diff --git a/libstdc++-v3/testsuite/25_algorithms/unique_copy/120384.cc
> b/libstdc++-v3/testsuite/25_algorithms/unique_copy/120384.cc
> new file mode 100644
> index ..27cd3375acae
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/25_algorithms/unique_copy/120384.cc
> @@ -0,0 +1,12 @@
> +// { dg-options "-D_GLIBCXX_CONCEPT_CHECKS" }
> +// { dg-do compile }
> +
> +// PR 120384 _BinaryPredicateConcept checks in std::unique_copy are wrong
> +
> +#include <algorithm>
> +
> +void
> +test_pr120384(const int* first, const int* last, int* out)
> +{
> +  std::unique_copy(first, last, out);
> +}
> --
> 2.4

Re: [PATCH 2/2] libstdc++: Implement LWG 2439 for std::unique_copy [PR120386]

2025-05-22 Thread Tomasz Kaminski
On Thu, May 22, 2025 at 5:15 PM Tomasz Kaminski  wrote:

>
>
> On Thu, May 22, 2025 at 5:04 PM Jonathan Wakely 
> wrote:
>
>> On Thu, 22 May 2025 at 15:50, Tomasz Kaminski 
>> wrote:
>> >
>> >
>> >
>> > On Thu, May 22, 2025 at 1:42 PM Jonathan Wakely 
>> wrote:
>> >>
>> >> The current overload set for __unique_copy handles three cases:
>> >>
>> >> - The input range uses forward iterators, the output range does not.
>> >>   This is the simplest case, and can just compare adjacent elements of
>> >>   the input range.
>> >>
>> >> - Neither the input range nor output range use forward iterators.
>> >>   This requires a local variable copied from the input range and
>> updated
>> >>   by assigning each element to the local variable.
>> >>
>> >> - The output range uses forward iterators.
>> >>   For this case we compare the current element from the input range
>> with
>> >>   the element just written to the output range.
>> >>
>> >> There are two problems with this implementation. Firstly, the third
>> case
>> >> assumes that the value type of the output range can be compared to the
>> >> value type of the input range, which might not be possible at all, or
>> >> might be possible but give different results to comparing elements of
>> >> the input range. This is the problem identified in LWG 2439.
>> >>
>> >> Secondly, the third case is used when both ranges use forward
>> iterators,
>> >> even though the first case could (and should) be used. This means that
>> >> we compare elements from the output range instead of the input range,
>> >> with the problems described above (either not well-formed, or might
>> give
>> >> the wrong results).
>> >>
>> >> The cause of the second problem is that the overload for the first case
>> >> looks like:
>> >>
>> >> OutputIterator
>> >> __unique_copy(ForwardIter, ForwardIter, OutputIterator, BinaryPred,
>> >>   forward_iterator_tag, output_iterator_tag);
>> >>
>> >> When the output range uses forward iterators this overload cannot be
>> >> used, because forward_iterator_tag does not inherit from
>> >> output_iterator_tag, so is not convertible to it.
>> >>
>> >> To fix these problems we need to implement the resolution of LWG 2439
>> so
>> >> that the third case is only used when the value types of the two ranges
>> >> are the same. This ensures that the comparisons are well behaved. We
>> >> also need to ensure that the first case is used when both ranges use
>> >> forward iterators.
>> >>
>> >> This change replaces a single step of tag dispatching to choose between
>> >> three overloads with two steps of tag dispatching, choosing between two
>> >> overloads at each step. The first step dispatches based on the iterator
>> >> category of the input range, ignoring the category of the output range.
>> >> The second step only happens when the input range uses non-forward
>> >> iterators, and dispatches based on the category of the output range and
>> >> whether the value type of the two ranges is the same. So now the cases
>> >> that are handled are:
>> >>
>> >> - The input range uses forward iterators.
>> >> - The output range uses non-forward iterators or a different value
>> type.
>> >> - The output range uses forward iterators and has the same value type.
>> >>
>> >> For the second case, the old code used
>> __gnu_cxx::__ops::__iter_comp_val
>> >> to wrap the predicate in another level of indirection. That seems
>> >> unnecessary, as we can just use a pointer to the local variable instead
>> >> of an iterator referring to it.
>> >>
>> >> libstdc++-v3/ChangeLog:
>> >>
>> >> PR libstdc++/120386
>> >> * include/bits/stl_algo.h (__unique_copy_1): New overloads for
>> >> the case where the input range uses non-forward iterators.
>> >> (__unique_copy): Replace three overloads with two, depending
>> >> only on the iterator category of the input range. Dispatch to
>> >> __unique_copy_1 for the non-forward case.
>> >> (unique_copy): Only pass the input range category to
>> >> __unique_copy.
>> >> ---
>> >>
>> >> Tested x86_64-linux.
>> >
>> > LGTM. Only one small suggestion, regarding the change in the order of arguments.
>>
>> I forgot to say that I need to add tests for each of the cases,
>> especially the case that fails with the existing code!
>>
>> >>
>> >>
>> >>  libstdc++-v3/include/bits/stl_algo.h | 80 +++-
>> >>  1 file changed, 44 insertions(+), 36 deletions(-)
>> >>
>> >> diff --git a/libstdc++-v3/include/bits/stl_algo.h
>> b/libstdc++-v3/include/bits/stl_algo.h
>> >> index f5361aeab7e2..c0bb17f9c8b2 100644
>> >> --- a/libstdc++-v3/include/bits/stl_algo.h
>> >> +++ b/libstdc++-v3/include/bits/stl_algo.h
>> >> @@ -918,24 +918,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>> >>
>> __gnu_cxx::__ops::__iter_comp_iter(__binary_pred));
>> >>  }
>> >>
>> >> -  /**
>> >> -   *  This is an uglified
>> >> -   *  unique_copy(_InputIterator, _InputIterator, _OutputIterator,
>> >> -   *  

Re: [PATCH 2/2] libstdc++: Implement LWG 2439 for std::unique_copy [PR120386]

2025-05-22 Thread Tomasz Kaminski
On Thu, May 22, 2025 at 1:42 PM Jonathan Wakely  wrote:

> The current overload set for __unique_copy handles three cases:
>
> - The input range uses forward iterators, the output range does not.
>   This is the simplest case, and can just compare adjacent elements of
>   the input range.
>
> - Neither the input range nor output range use forward iterators.
>   This requires a local variable copied from the input range and updated
>   by assigning each element to the local variable.
>
> - The output range uses forward iterators.
>   For this case we compare the current element from the input range with
>   the element just written to the output range.
>
> There are two problems with this implementation. Firstly, the third case
> assumes that the value type of the output range can be compared to the
> value type of the input range, which might not be possible at all, or
> might be possible but give different results to comparing elements of
> the input range. This is the problem identified in LWG 2439.
>
> Secondly, the third case is used when both ranges use forward iterators,
> even though the first case could (and should) be used. This means that
> we compare elements from the output range instead of the input range,
> with the problems described above (either not well-formed, or might give
> the wrong results).
>
> The cause of the second problem is that the overload for the first case
> looks like:
>
> OutputIterator
> __unique_copy(ForwardIter, ForwardIter, OutputIterator, BinaryPred,
>   forward_iterator_tag, output_iterator_tag);
>
> When the output range uses forward iterators this overload cannot be
> used, because forward_iterator_tag does not inherit from
> output_iterator_tag, so is not convertible to it.
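
To make the overload resolution point concrete, here is a small sketch. The
static_asserts state facts about the standard tag hierarchy; the `pick`
overloads are hypothetical stand-ins that only mirror the tag parameters of
the first and third cases, not the real __unique_copy signatures.

```cpp
#include <cassert>
#include <iterator>
#include <type_traits>

// forward_iterator_tag derives from input_iterator_tag only, so a forward
// tag converts to an input tag but not to an output tag.
static_assert(std::is_convertible<std::forward_iterator_tag,
                                  std::input_iterator_tag>::value,
              "forward tag converts to input tag");
static_assert(!std::is_convertible<std::forward_iterator_tag,
                                   std::output_iterator_tag>::value,
              "forward tag does not convert to output tag");

// Hypothetical overloads taking the same tag pairs as cases one and three.
inline int pick(std::forward_iterator_tag, std::output_iterator_tag)
{ return 1; }
inline int pick(std::input_iterator_tag, std::forward_iterator_tag)
{ return 3; }

// Both ranges use forward iterators: only the second overload is viable.
inline int both_forward()
{ return pick(std::forward_iterator_tag(), std::forward_iterator_tag()); }
```

So with two forward ranges the call silently lands in the "output is
forward" case, which matches the description above.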
>
> To fix these problems we need to implement the resolution of LWG 2439 so
> that the third case is only used when the value types of the two ranges
> are the same. This ensures that the comparisons are well behaved. We
> also need to ensure that the first case is used when both ranges use
> forward iterators.
>
> This change replaces a single step of tag dispatching to choose between
> three overloads with two steps of tag dispatching, choosing between two
> overloads at each step. The first step dispatches based on the iterator
> category of the input range, ignoring the category of the output range.
> The second step only happens when the input range uses non-forward
> iterators, and dispatches based on the category of the output range and
> whether the value type of the two ranges is the same. So now the cases
> that are handled are:
>
> - The input range uses forward iterators.
> - The output range uses non-forward iterators or a different value type.
> - The output range uses forward iterators and has the same value type.
>
> For the second case, the old code used __gnu_cxx::__ops::__iter_comp_val
> to wrap the predicate in another level of indirection. That seems
> unnecessary, as we can just use a pointer to the local variable instead
> of an iterator referring to it.
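
As an illustration of the local-variable case, a rough sketch under my own
naming (unique_copy_via_local, close_enough and demo are hypothetical, and
this is not the code from the patch). The predicate only ever sees two
values of the input value type, even though the output stores ints.

```cpp
#include <cassert>
#include <cmath>
#include <iterator>
#include <sstream>
#include <vector>

// Keeps a local copy of the last element written and compares each new
// input element against that copy, never against the output range.
template<typename InputIt, typename OutputIt, typename BinaryPred>
OutputIt
unique_copy_via_local(InputIt first, InputIt last, OutputIt result,
                      BinaryPred pred)
{
  if (first == last)
    return result;
  typename std::iterator_traits<InputIt>::value_type value = *first;
  *result = value;
  ++result;
  while (++first != last)
    if (!pred(value, *first))
      {
        value = *first;
        *result = value;
        ++result;
      }
  return result;
}

// Hypothetical predicate on the input value type (double).
inline bool close_enough(double a, double b)
{ return std::fabs(a - b) < 0.6; }

// Reads doubles through single-pass iterators, writes truncated ints.
inline std::vector<int> demo()
{
  std::istringstream in("1.9 2.4 3.5");
  std::vector<int> out;
  unique_copy_via_local(std::istream_iterator<double>(in),
                        std::istream_iterator<double>(),
                        std::back_inserter(out), close_enough);
  return out;
}
```

Here 2.4 is dropped because it is within 0.6 of 1.9. Comparing against the
value already written to the output (1, after truncation) would give a
distance of 1.4 and keep it, which is the kind of discrepancy LWG 2439
describes.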
>
> libstdc++-v3/ChangeLog:
>
> PR libstdc++/120386
> * include/bits/stl_algo.h (__unique_copy_1): New overloads for
> the case where the input range uses non-forward iterators.
> (__unique_copy): Replace three overloads with two, depending
> only on the iterator category of the input range. Dispatch to
> __unique_copy_1 for the non-forward case.
> (unique_copy): Only pass the input range category to
> __unique_copy.
> ---
>
> Tested x86_64-linux.
>
LGTM. Only one small suggestion, regarding the change in the order of arguments.

>
>  libstdc++-v3/include/bits/stl_algo.h | 80 +++-
>  1 file changed, 44 insertions(+), 36 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/stl_algo.h
> b/libstdc++-v3/include/bits/stl_algo.h
> index f5361aeab7e2..c0bb17f9c8b2 100644
> --- a/libstdc++-v3/include/bits/stl_algo.h
> +++ b/libstdc++-v3/include/bits/stl_algo.h
> @@ -918,24 +918,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>  __gnu_cxx::__ops::__iter_comp_iter(__binary_pred));
>  }
>
> -  /**
> -   *  This is an uglified
> -   *  unique_copy(_InputIterator, _InputIterator, _OutputIterator,
> -   *  _BinaryPredicate)
> -   *  overloaded for forward iterators and output iterator as result.
> -  */
> +  // Implementation of std::unique_copy for forward iterators.
> +  // This case is easy, just compare *i with *(i-1).
>template<typename _ForwardIterator, typename _OutputIterator,
>	typename _BinaryPredicate>
>  _GLIBCXX20_CONSTEXPR
>  _OutputIterator
>  __unique_copy(_ForwardIterator __first, _ForwardIterator __last,
>   _OutputIterator __result, _BinaryPredicate __binary_pred,
> - forward_iterator_tag, output_iterator_tag)
> + forward_iterator_tag)
>  {
>_ForwardIterator __next = __first;
>*__result = *__first;
>while (++__next != __last)
> -   if (!__binary_pred(__first, __n

Re: [PATCH v3 0/9] Implement layouts from mdspan.

2025-05-22 Thread Tomasz Kaminski
On Wed, May 21, 2025 at 4:21 PM Luc Grosheintz 
wrote:

> It's missing the "registration" of the three new classes in
> std.cc.in.
>
Please remember to add it in the next revision.

>
> On 5/21/25 11:40, Luc Grosheintz wrote:
> > Follows up on:
> > https://gcc.gnu.org/pipermail/libstdc++/2025-May/061535.html
> >
> > To improve naming conventions, this series includes three new commits:
> >* Two commits to rename  _ExtentsStorage::_M_dynamic_extents, and
> >  extents::_M_dynamic_extents.
> >* One commit to cleanup whitespace errors in extents.
> >
> > The changes to the existing commits are:
> >* Fix division by zero bug.
> >* Rename subextents -> extents.
> >* Default arguments for __{static,dynamic}_extents.
> >* Default argument for __static_quotient.
> >* Four times: use range-based for.
> >* Eliminate __has_static_zero
> >* Short-circuit in __static_quotient.
> >* Optimize __exts_prod for rank == rank_dynamic.
> >
> > This review suggestion was intentionally skipped:
> >* Inline helper of __exts_prod, because with the additional
> >optimization for rank == rank_dynamic, having two separate
> >functions makes the highlevel structure a little bit more
> >obvious. Additionally, there's numerous changes planned that
> >might make one of the two functions much more verbose.
> >
> > Luc Grosheintz (9):
> >libstdc++: Rename _ExtentsStorage::_M_dynamic_extents.
> >libstdc++: Rename extents::_M_dynamic_extents.
> >libstdc++: Cleanup formatting in mdspan.
> >libstdc++: Implement layout_left from mdspan.
> >libstdc++: Add tests for layout_left.
> >libstdc++: Implement layout_right from mdspan.
> >libstdc++: Add tests for layout_right.
> >libstdc++: Implement layout_stride from mdspan.
> >libstdc++: Add tests for layout_stride.
> >
> >   libstdc++-v3/include/std/mdspan   | 692 +-
> >   .../mdspan/layouts/class_mandate_neg.cc   |  42 ++
> >   .../23_containers/mdspan/layouts/ctors.cc | 401 ++
> >   .../23_containers/mdspan/layouts/mapping.cc   | 569 ++
> >   .../23_containers/mdspan/layouts/stride.cc| 494 +
> >   5 files changed, 2185 insertions(+), 13 deletions(-)
> >   create mode 100644
> libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
> >   create mode 100644
> libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
> >   create mode 100644
> libstdc++-v3/testsuite/23_containers/mdspan/layouts/mapping.cc
> >   create mode 100644
> libstdc++-v3/testsuite/23_containers/mdspan/layouts/stride.cc
> >
>
>


Re: [PATCH] libgcc: PR target/116363 Fix SFtype to UDWtype conversion

2025-05-22 Thread Jeff Law




On 5/22/25 1:01 AM, Jan Dubiec wrote:
> On 23.02.2025 04:59, Jeff Law wrote:
> [...]
>> Thanks!  Just a note we're in stage4 of our development cycle
>> (regression bugfixes) as we prepare for gcc-15.  This doesn't look
>> like something we would typically make an exception for, it'll have to
>> wait for the next development window.  Meaning it probably won't get
>> any attention for a couple months.
>>
>> Jeff
>
> Just BUMP.

Not forgotten :-)  It's in my tester and on my list to sit down and
understand how it works as part of the review process.

jeff


Re: [AUTOFDO] Enable ipa-split for auto-profile

2025-05-22 Thread Jan Hubicka
> > On 9 May 2025, at 11:55 am, Kugan Vivekanandarajah 
> >  wrote:
> >
> > ipa-split is currently not run for auto-profile. IMO this was an oversight.
> > This patch enables it, as is already done for PGO runs.
> >
> > gcc/ChangeLog:
> >
> >* ipa-split.cc (pass_feedback_split_functions::clone): New.
> >* passes.def: Enable pass_feedback_split_functions for
> >pass_ipa_auto_profile.
OK,
thanks!
Honza
> >
> >
> > Regression tested on aarch64-linux-gnu with no new regressions.
> > Also successfully ran autoprofiledbootstrap with the relevant patch.
> >
> > Is this OK for trunk?
> > Thanks,
> > Kugan
> >
> > <0003-AUTOFDO-Enable-ips-split-for-auto-profile.patch>
> 




Re: [PATCH v3 9/9] libstdc++: Add tests for layout_stride.

2025-05-22 Thread Tomasz Kaminski
On Wed, May 21, 2025 at 11:56 AM Luc Grosheintz 
wrote:

> Implements the tests for layout_stride and for the features of the other
> two layouts that depend on layout_stride.
>
> libstdc++-v3/ChangeLog:
>
> * testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc: Add
> tests for layout_stride.
> * testsuite/23_containers/mdspan/layouts/ctors.cc: Add test for
> layout_stride and the interaction with other layouts.
> * testsuite/23_containers/mdspan/layouts/mapping.cc: Ditto.
> * testsuite/23_containers/mdspan/layouts/stride.cc: New test.
>
> Signed-off-by: Luc Grosheintz 
> ---
>  .../mdspan/layouts/class_mandate_neg.cc   |  19 +
>  .../23_containers/mdspan/layouts/ctors.cc |  99 
>  .../23_containers/mdspan/layouts/mapping.cc   |  75 ++-
>  .../23_containers/mdspan/layouts/stride.cc| 494 ++
>  4 files changed, 686 insertions(+), 1 deletion(-)
>  create mode 100644
> libstdc++-v3/testsuite/23_containers/mdspan/layouts/stride.cc
>
> diff --git
> a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
> b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
> index a41bad988d2..0e39bd3aab0 100644
> ---
> a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
> +++
> b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
> @@ -17,7 +17,26 @@ template
>  typename Layout::mapping m3; // { dg-error "required
> from" }
>};
>
> +template
> +  struct B // { dg-error "expansion of" }
> +  {
> +using Extents = std::extents;
> +using OExtents = std::extents;
> +
> +using Mapping = typename Layout::mapping;
> +using OMapping = typename Layout::mapping;
> +
> +Mapping m{OMapping{}};
> +  };
> +
>  A a_left; // { dg-error "required
> from" }
>  A a_right;   // { dg-error "required
> from" }
> +A a_stride; // { dg-error "required
> from" }
> +
> +B<1, std::layout_left, std::layout_right> blr; // { dg-error
> "required here" }
> +B<2, std::layout_left, std::layout_stride> bls;// { dg-error
> "required here" }
> +
> +B<3, std::layout_right, std::layout_left> brl; // { dg-error
> "required here" }
> +B<4, std::layout_right, std::layout_stride> brs;   // { dg-error
> "required here" }
>
>  // { dg-prune-output "must be representable as index_type" }
> diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
> b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
> index 4a7d2bffeef..89d1b3a01a0 100644
> --- a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
> +++ b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
> @@ -282,12 +282,111 @@ namespace from_left_or_right
>  }
>  }
>
> +// ctor: mapping(layout_stride::mapping)
> +namespace from_stride
> +{
> +  template
> +constexpr auto
> +strides(Mapping m)
> +{
> +  constexpr auto rank = Mapping::extents_type::rank();
> +  std::array s;
> +
> +  if constexpr (rank > 0)
> +   for(size_t i = 0; i < rank; ++i)
> + s[i] = m.stride(i);
> +  return s;
> +}
> +
> +  template
> +constexpr void
> +verify_convertible(OExtents oexts)
> +{
> +  using Mapping = typename Layout::mapping;
> +  using OMapping = std::layout_stride::mapping;
> +
> +  constexpr auto other = OMapping(oexts,
> strides(Mapping(Extents(oexts;
> +  if constexpr (std::is_same_v)
>
As I mentioned, implementations are allowed to add noexcept, and I would
add it here.

> +   ::verify_nothrow_convertible(other);
> +  else
> +   ::verify_convertible(other);
> +}
> +
> +  template
> +constexpr void
> +verify_constructible(OExtents oexts)
> +{
> +  using Mapping = typename Layout::mapping;
> +  using OMapping = std::layout_stride::mapping;
> +
> +  constexpr auto other = OMapping(oexts,
> strides(Mapping(Extents(oexts;
> +  if constexpr (std::is_same_v)
> +   ::verify_nothrow_constructible(other);
> +  else
> +   ::verify_constructible(other);
> +}
> +
> +  template
> +constexpr bool
> +test_ctor()
> +{
> +  assert_not_constructible<
> +   typename Layout::mapping>,
> +   std::layout_stride::mapping>>();
> +
> +  assert_not_constructible<
> +   typename Layout::mapping>,
> +   std::layout_stride::mapping>>();
> +
> +  assert_not_constructible<
> +   typename Layout::mapping>,
> +   std::layout_stride::mapping>>();
> +
> +  verify_convertible>(std::extents{});
> +
> +  verify_convertible>(
> +   std::extents{});
> +
> +  // Rank ==  0 doesn't check IndexType for convertibility.
> +  verify_convertible>(
> +   std::extents{});
> +
> +  verify_constructible>(
> +   std::extents{});
> +
> +  verify_constructible>(
> +   std::extents{});
> +
> +  verify_constructible>(
> +   

Re: [PATCH v3 0/9] Implement layouts from mdspan.

2025-05-22 Thread Tomasz Kaminski
Thanks for working on the patches; they look solid. A few comments follow.

Could you prepare a separate patch to fix the default-initialization of
extents that you noticed? The standard requires them to be
value-initialized. Please also add a corresponding test.

Similarly, we have a test for the default constructor of stride_mapping; I
would add tests for the other layouts as well,
and check:
  * all extents being static -> depends on mapping
  * first extent being dynamic, and rest static -> all strides are zero
  * middle extent being dynamic

The stride and product computations should be performed in
Extents::size_type, not index_type.
The latter may be signed, and we may hit UB in multiplying non-zero
extents, before reaching the zero.
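
A small sketch of what I mean, with a hypothetical helper (exts_product is
not the patch's __exts_prod, and the array interface is only for
illustration). The product is accumulated in an unsigned type, so an
intermediate value that does not fit wraps around instead of being
undefined behaviour, and a later zero extent still yields zero.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Multiplies the extents in the unsigned counterpart of IndexType.
template<typename IndexType, std::size_t N>
constexpr typename std::make_unsigned<IndexType>::type
exts_product(const IndexType (&exts)[N])
{
  using size_type = typename std::make_unsigned<IndexType>::type;
  // Accumulate in at least `unsigned`, so narrow types are not promoted
  // to (signed) int before the multiplication.
  using wide_type = typename std::common_type<size_type, unsigned>::type;
  wide_type prod = 1;
  for (std::size_t i = 0; i < N; ++i)
    prod *= static_cast<wide_type>(exts[i]);
  return static_cast<size_type>(prod);
}
```

For example, with a 32-bit int index_type and extents {70000, 70000, 0},
the first multiplication would overflow a signed int, while the unsigned
computation is well defined and ends at 0.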

For the is_exhaustive question, I will write to the LWG reflector to ask the
authors for their opinion.
I will keep you posted.

In a lot of tests we are doing the following, where I believe we could skip
the template parameters and deduce them from the argument:
  verify_nothrow_convertible>(
+   std::extents{});
Could you look into doing it?

Regards,
Tomasz

On Thu, May 22, 2025 at 2:21 PM Tomasz Kaminski  wrote:

>
>
> On Wed, May 21, 2025 at 4:21 PM Luc Grosheintz 
> wrote:
>
>> It's missing the "registration" of the three new classes in
>> std.cc.in.
>>
> Please remember to add it in the next revision.
>
>>
>> On 5/21/25 11:40, Luc Grosheintz wrote:
>> > Follows up on:
>> > https://gcc.gnu.org/pipermail/libstdc++/2025-May/061535.html
>> >
>> > To improve naming conventions, this series includes three new commits:
>> >* Two commits to rename  _ExtentsStorage::_M_dynamic_extents, and
>> >  extents::_M_dynamic_extents.
>> >* One commit to cleanup whitespace errors in extents.
>> >
>> > The changes to the existing commits are:
>> >* Fix division by zero bug.
>> >* Rename subextents -> extents.
>> >* Default arguments for __{static,dynamic}_extents.
>> >* Default argument for __static_quotient.
>> >* Four times: use range-based for.
>> >* Eliminate __has_static_zero
>> >* Short-circuit in __static_quotient.
>> >* Optimize __exts_prod for rank == rank_dynamic.
>> >
>> > This review suggestion was intentionally skipped:
>> >* Inline helper of __exts_prod, because with the additional
>> >optimization for rank == rank_dynamic, having two separate
>> >functions makes the highlevel structure a little bit more
>> >obvious. Additionally, there's numerous changes planned that
>> >might make one of the two functions much more verbose.
>> >
>> > Luc Grosheintz (9):
>> >libstdc++: Rename _ExtentsStorage::_M_dynamic_extents.
>> >libstdc++: Rename extents::_M_dynamic_extents.
>> >libstdc++: Cleanup formatting in mdspan.
>> >libstdc++: Implement layout_left from mdspan.
>> >libstdc++: Add tests for layout_left.
>> >libstdc++: Implement layout_right from mdspan.
>> >libstdc++: Add tests for layout_right.
>> >libstdc++: Implement layout_stride from mdspan.
>> >libstdc++: Add tests for layout_stride.
>> >
>> >   libstdc++-v3/include/std/mdspan   | 692 +-
>> >   .../mdspan/layouts/class_mandate_neg.cc   |  42 ++
>> >   .../23_containers/mdspan/layouts/ctors.cc | 401 ++
>> >   .../23_containers/mdspan/layouts/mapping.cc   | 569 ++
>> >   .../23_containers/mdspan/layouts/stride.cc| 494 +
>> >   5 files changed, 2185 insertions(+), 13 deletions(-)
>> >   create mode 100644
>> libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
>> >   create mode 100644
>> libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
>> >   create mode 100644
>> libstdc++-v3/testsuite/23_containers/mdspan/layouts/mapping.cc
>> >   create mode 100644
>> libstdc++-v3/testsuite/23_containers/mdspan/layouts/stride.cc
>> >
>>
>>


Re: [PATCH][RISC-V][PR target/70557] Improve storing 0 to memory on rv32

2025-05-22 Thread Jeff Law




On 5/21/25 11:41 PM, Shreya Munnangi wrote:
Patch is originally from Siarhei Volkau.


RISC-V has a zero register (x0) which we can use to store zero into memory
without loading the constant into a distinct register. Adjust the constraints
of the 32-bit movdi_32bit pattern to recognize that we can store 0.0 into
memory using x0 as the source register.

This patch only affects RISC-V. It has been regression tested on riscv64-elf.
Jeff has also tested this in his tester (riscv64-elf and riscv32-elf) with no
regressions.

         PR target/70557
gcc/
         * config/riscv/riscv.md (movdi_32bit): Add "J" constraint to allow
         storing 0 directly to memory.

Thanks.  I've pushed this to the trunk.

Jeff
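
As a minimal illustration of the kind of source this affects (not taken from
the patch or the PR testcase; the function name is hypothetical): a 64-bit
zero store on rv32 is a DImode move that becomes two word stores, and with
the "J" alternative both stores can presumably use x0 as the source instead
of first materializing zero in a temporary register.

```cpp
#include <cstdint>

// Hypothetical example: zero a 64-bit object through a pointer.
// On rv32 this is a DImode store, i.e. a candidate for movdi_32bit.
void clear64 (std::int64_t *p)
{
  *p = 0;
}
```

Compiling this with an rv32 cross compiler at -O2 and inspecting the
assembly would be one way to check which source register the stores use;
the exact output depends on the compiler version and flags.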



[PATCH] c++/modules: Fix merge of TLS import functions [PR120363]

2025-05-22 Thread Nathaniel Shead
Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk/15?

(Also is renaming the old test OK/appropriate? Or should I keep it
before and just name the new tests as tls1/2, with a comment referring
to pr113292?)

-- >8 --

The PR notes that we missed setting DECL_CONTEXT on the TLS init
function; we missed this initially because this function is not created
in header units, only named modules.

I also noticed that 'DECL_CONTEXT (fn) = DECL_CONTEXT (var)' was
incorrect: for class members, this ends up having the modules merging
machinery treat the decl as a member function, which breaks when
attempting to dedup against an existing completed class type.  Instead
we can just use the global_namespace as the context, because the name of
the function is already mangled appropriately so that we'll match the
correct duplicates.
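
For context, a minimal sketch of the kind of declaration involved (loosely
modelled on the class-member case; the names are hypothetical and this is
not one of the new tests): a thread_local static data member with a dynamic
initializer is accessed through a compiler-generated TLS wrapper function,
which runs the TLS init function on first use in each thread.

```cpp
struct counter
{
  int value;
  counter () : value (initial ()) {}
  static int initial () { return 42; }

  // Dynamic initialization: uses of 'instance' go through the TLS
  // wrapper, which runs the init function once per thread.
  static thread_local counter instance;
};

thread_local counter counter::instance;

int read_instance ()
{
  return counter::instance.value;
}
```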

PR c++/120363

gcc/cp/ChangeLog:

* decl2.cc (get_tls_init_fn): Set context as global_namespace.
(get_tls_wrapper_fn): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr113292_a.H: Move to...
* g++.dg/modules/tls-1_a.H: ...here.
* g++.dg/modules/pr113292_b.C: Move to...
* g++.dg/modules/tls-1_b.C: ...here.
* g++.dg/modules/pr113292_c.C: Move to...
* g++.dg/modules/tls-1_c.C: ...here.
* g++.dg/modules/tls-2_a.C: New test.
* g++.dg/modules/tls-2_b.C: New test.
* g++.dg/modules/tls-2_c.C: New test.
* g++.dg/modules/tls-3.h: New test.
* g++.dg/modules/tls-3_a.H: New test.
* g++.dg/modules/tls-3_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/decl2.cc   |  3 +-
 .../modules/{pr113292_a.H => tls-1_a.H}   |  0
 .../modules/{pr113292_b.C => tls-1_b.C}   |  2 +-
 .../modules/{pr113292_c.C => tls-1_c.C}   |  2 +-
 gcc/testsuite/g++.dg/modules/tls-2_a.C| 12 ++
 gcc/testsuite/g++.dg/modules/tls-2_b.C|  5 +++
 gcc/testsuite/g++.dg/modules/tls-2_c.C| 11 +
 gcc/testsuite/g++.dg/modules/tls-3.h  | 42 +++
 gcc/testsuite/g++.dg/modules/tls-3_a.H|  4 ++
 gcc/testsuite/g++.dg/modules/tls-3_b.C|  4 ++
 10 files changed, 82 insertions(+), 3 deletions(-)
 rename gcc/testsuite/g++.dg/modules/{pr113292_a.H => tls-1_a.H} (100%)
 rename gcc/testsuite/g++.dg/modules/{pr113292_b.C => tls-1_b.C} (93%)
 rename gcc/testsuite/g++.dg/modules/{pr113292_c.C => tls-1_c.C} (93%)
 create mode 100644 gcc/testsuite/g++.dg/modules/tls-2_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/tls-2_b.C
 create mode 100644 gcc/testsuite/g++.dg/modules/tls-2_c.C
 create mode 100644 gcc/testsuite/g++.dg/modules/tls-3.h
 create mode 100644 gcc/testsuite/g++.dg/modules/tls-3_a.H
 create mode 100644 gcc/testsuite/g++.dg/modules/tls-3_b.C

diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
index a08d173c0df..be82ccd8bc1 100644
--- a/gcc/cp/decl2.cc
+++ b/gcc/cp/decl2.cc
@@ -4028,6 +4028,7 @@ get_tls_init_fn (tree var)
   SET_DECL_LANGUAGE (fn, lang_c);
   TREE_PUBLIC (fn) = TREE_PUBLIC (var);
   DECL_ARTIFICIAL (fn) = true;
+  DECL_CONTEXT (fn) = global_namespace;
   DECL_COMDAT (fn) = DECL_COMDAT (var);
   DECL_EXTERNAL (fn) = DECL_EXTERNAL (var);
   if (DECL_ONE_ONLY (var))
@@ -4087,7 +4088,7 @@ get_tls_wrapper_fn (tree var)
   TREE_PUBLIC (fn) = TREE_PUBLIC (var);
   DECL_ARTIFICIAL (fn) = true;
   DECL_IGNORED_P (fn) = 1;
-  DECL_CONTEXT (fn) = DECL_CONTEXT (var);
+  DECL_CONTEXT (fn) = global_namespace;
   /* The wrapper is inline and emitted everywhere var is used.  */
   DECL_DECLARED_INLINE_P (fn) = true;
   if (TREE_PUBLIC (var))
diff --git a/gcc/testsuite/g++.dg/modules/pr113292_a.H 
b/gcc/testsuite/g++.dg/modules/tls-1_a.H
similarity index 100%
rename from gcc/testsuite/g++.dg/modules/pr113292_a.H
rename to gcc/testsuite/g++.dg/modules/tls-1_a.H
diff --git a/gcc/testsuite/g++.dg/modules/pr113292_b.C 
b/gcc/testsuite/g++.dg/modules/tls-1_b.C
similarity index 93%
rename from gcc/testsuite/g++.dg/modules/pr113292_b.C
rename to gcc/testsuite/g++.dg/modules/tls-1_b.C
index fc582a5a0cf..941bff2710a 100644
--- a/gcc/testsuite/g++.dg/modules/pr113292_b.C
+++ b/gcc/testsuite/g++.dg/modules/tls-1_b.C
@@ -1,7 +1,7 @@
 // PR c++/113292
 // { dg-additional-options "-fmodules-ts" }
 
-import "pr113292_a.H";
+import "tls-1_a.H";
 
 // provide a definition of 'instance' so things link
 thread_local test test::instance;
diff --git a/gcc/testsuite/g++.dg/modules/pr113292_c.C 
b/gcc/testsuite/g++.dg/modules/tls-1_c.C
similarity index 93%
rename from gcc/testsuite/g++.dg/modules/pr113292_c.C
rename to gcc/testsuite/g++.dg/modules/tls-1_c.C
index b5acf79db63..4568413bb22 100644
--- a/gcc/testsuite/g++.dg/modules/pr113292_c.C
+++ b/gcc/testsuite/g++.dg/modules/tls-1_c.C
@@ -4,7 +4,7 @@
 // { dg-add-options tls }
 // { dg-additional-options "-fmodules-ts" }
 
-import "pr113292_a.H";
+import "tls-1_a.H";
 
 int main() {
   auto& instance = test:

Re: [PATCH 2/2] libstdc++: Implement C++26 std::indirect [PR119152]

2025-05-22 Thread Daniel Krügler
On Thu, 22 May 2025 at 11:48, Tomasz Kamiński <tkami...@redhat.com> wrote:

> From: Jonathan Wakely 
>
> This patch implements C++26 std::indirect as specified
> in P3019 with amendment to move assignment from LWG 4251.
>
> PR libstdc++/119152
>
> libstdc++-v3/ChangeLog:
>
> * include/Makefile.am: Add new header.
> * include/Makefile.in: Regenerate.
> * include/bits/indirect.h: New file.
> * include/bits/version.def (indirect): Define.
> * include/bits/version.h: Regenerate.
> * include/std/memory: Include new header.
> * testsuite/std/memory/indirect/copy.cc
> * testsuite/std/memory/indirect/copy_alloc.cc
> * testsuite/std/memory/indirect/ctor.cc
> * testsuite/std/memory/indirect/incomplete.cc
> * testsuite/std/memory/indirect/invalid_neg.cc
> * testsuite/std/memory/indirect/move.cc
> * testsuite/std/memory/indirect/move_alloc.cc
> * testsuite/std/memory/indirect/relops.cc
>
> Co-Authored-By: Tomasz Kamiński 
> Signed-off-by: Tomasz Kamiński 
> ---
> Tested on x86_64-linux. OK for trunk?
>
>  libstdc++-v3/include/Makefile.am  |   1 +
>  libstdc++-v3/include/Makefile.in  |   1 +
>  libstdc++-v3/include/bits/indirect.h  | 459 ++
>  libstdc++-v3/include/bits/version.def |   9 +
>  libstdc++-v3/include/bits/version.h   |  10 +
>  libstdc++-v3/include/std/memory   |   5 +
>  .../testsuite/std/memory/indirect/copy.cc | 121 +
>  .../std/memory/indirect/copy_alloc.cc | 228 +
>  .../testsuite/std/memory/indirect/ctor.cc | 203 
>  .../std/memory/indirect/incomplete.cc |  38 ++
>  .../std/memory/indirect/invalid_neg.cc|  28 ++
>  .../testsuite/std/memory/indirect/move.cc | 144 ++
>  .../std/memory/indirect/move_alloc.cc | 296 +++
>  .../testsuite/std/memory/indirect/relops.cc   |  82 
>  14 files changed, 1625 insertions(+)
>  create mode 100644 libstdc++-v3/include/bits/indirect.h
>  create mode 100644 libstdc++-v3/testsuite/std/memory/indirect/copy.cc
>  create mode 100644
> libstdc++-v3/testsuite/std/memory/indirect/copy_alloc.cc
>  create mode 100644 libstdc++-v3/testsuite/std/memory/indirect/ctor.cc
>  create mode 100644
> libstdc++-v3/testsuite/std/memory/indirect/incomplete.cc
>  create mode 100644
> libstdc++-v3/testsuite/std/memory/indirect/invalid_neg.cc
>  create mode 100644 libstdc++-v3/testsuite/std/memory/indirect/move.cc
>  create mode 100644
> libstdc++-v3/testsuite/std/memory/indirect/move_alloc.cc
>  create mode 100644 libstdc++-v3/testsuite/std/memory/indirect/relops.cc
>
> diff --git a/libstdc++-v3/include/Makefile.am
> b/libstdc++-v3/include/Makefile.am
> index 3e5b6c4142e..b67d470c27e 100644
> --- a/libstdc++-v3/include/Makefile.am
> +++ b/libstdc++-v3/include/Makefile.am
> @@ -210,6 +210,7 @@ bits_headers = \
> ${bits_srcdir}/gslice_array.h \
> ${bits_srcdir}/hashtable.h \
> ${bits_srcdir}/hashtable_policy.h \
> +   ${bits_srcdir}/indirect.h \
> ${bits_srcdir}/indirect_array.h \
> ${bits_srcdir}/ios_base.h \
> ${bits_srcdir}/istream.tcc \
> diff --git a/libstdc++-v3/include/Makefile.in
> b/libstdc++-v3/include/Makefile.in
> index 3531162b5f7..6f7f2be68fd 100644
> --- a/libstdc++-v3/include/Makefile.in
> +++ b/libstdc++-v3/include/Makefile.in
> @@ -563,6 +563,7 @@ bits_freestanding = \
>  @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/gslice_array.h \
>  @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/hashtable.h \
>  @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/hashtable_policy.h \
> +@GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/indirect.h \
>  @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/indirect_array.h \
>  @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/ios_base.h \
>  @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/istream.tcc \
> diff --git a/libstdc++-v3/include/bits/indirect.h
> b/libstdc++-v3/include/bits/indirect.h
> new file mode 100644
> index 000..32b2af9117d
> --- /dev/null
> +++ b/libstdc++-v3/include/bits/indirect.h
> @@ -0,0 +1,459 @@
> +// Vocabulary Types for Composite Class Design -*- C++ -*-
> +
> +// Copyright The GNU Toolchain Authors.
> +//
> +// This file is part of the GNU ISO C++ Library.  This library is free
> +// software; you can redistribute it and/or modify it under the
> +// terms of the GNU General Public License as published by the
> +// Free Software Foundation; either version 3, or (at your option)
> +// any later version.
> +
> +// This library is distributed in the hope that it will be useful,
> +// but WITHOUT ANY WARRANTY; without even the implied warranty of
> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +// GNU General Public License for more details.
> +
> +// Under Section 7 of GPL version 3, you are granted additional
> +// permissions described in the GCC Runtime Library Exception, version
> +// 3.1, as published by the Free Software Foundation.

Re: [PATCH v3 4/9] libstdc++: Implement layout_left from mdspan.

2025-05-22 Thread Tomasz Kaminski
On Wed, May 21, 2025 at 11:53 AM Luc Grosheintz 
wrote:

> Implements the parts of layout_left that don't depend on any of the
> other layouts.
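
As a rough illustration of what a layout_left mapping computes (a standalone
sketch using std::array instead of the extents machinery; the function names
are hypothetical and only loosely mirror the helpers in the patch): the
stride of dimension r is the product of the extents before r, and the linear
index is the sum of each index times its stride.

```cpp
#include <array>
#include <cstddef>

// Product of exts[0..r), the analogue of a forward product.
template<std::size_t N>
constexpr std::size_t
fwd_prod (const std::array<std::size_t, N>& exts, std::size_t r)
{
  std::size_t ret = 1;
  for (std::size_t i = 0; i < r; ++i)
    ret *= exts[i];
  return ret;
}

// Column-major (layout_left style) linear index.
template<std::size_t N>
constexpr std::size_t
linear_index_left (const std::array<std::size_t, N>& exts,
		   const std::array<std::size_t, N>& idx)
{
  std::size_t ret = 0;
  for (std::size_t r = 0; r < N; ++r)
    ret += idx[r] * fwd_prod (exts, r);
  return ret;
}
```

With extents {2, 3, 4} the strides are 1, 2 and 6, so the index {1, 2, 3}
maps to 1*1 + 2*2 + 3*6 = 23.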
>
> libstdc++-v3/ChangeLog:
>
> * include/std/mdspan (layout_left): New class.
>
> Signed-off-by: Luc Grosheintz 
> ---
>  libstdc++-v3/include/std/mdspan | 307 +++-
>  1 file changed, 306 insertions(+), 1 deletion(-)
>
> diff --git a/libstdc++-v3/include/std/mdspan
> b/libstdc++-v3/include/std/mdspan
> index e5b1b2596d9..66c9d2cffac 100644
> --- a/libstdc++-v3/include/std/mdspan
> +++ b/libstdc++-v3/include/std/mdspan
> @@ -144,6 +144,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>   { return __exts[__i]; });
>   }
>
> +   static constexpr span
> +   _S_static_extents(size_t __begin, size_t __end) noexcept
> +   {
> + return {_Extents.data() + __begin, _Extents.data() + __end};
> +   }
> +
> +   constexpr span
> +   _M_dynamic_extents(size_t __begin, size_t __end) const noexcept
> +   requires (_Extents.size() > 0)
> +   {
> + return {_M_dyn_exts + _S_dynamic_index[__begin],
> + _M_dyn_exts + _S_dynamic_index[__end]};
> +   }
> +
>private:
> using _S_storage = __array_traits<_IndexType,
> _S_rank_dynamic>::_Type;
> [[no_unique_address]] _S_storage _M_dyn_exts;
> @@ -160,6 +174,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> || _Extent <= numeric_limits<_IndexType>::max();
>}
>
> +  namespace __mdspan
> +  {
> +template
> +  constexpr span
> +  __static_extents(size_t __begin = 0, size_t __end =
> _Extents::rank())
> +  { return _Extents::_S_storage::_S_static_extents(__begin, __end); }
> +
> +template
> +  constexpr span
> +  __dynamic_extents(const _Extents& __exts, size_t __begin = 0,
> +   size_t __end = _Extents::rank())
> +  {
> +   return __exts._M_exts._M_dynamic_extents(__begin, __end);
> +  }
> +  }
> +
>template
>  class extents
>  {
> @@ -251,7 +281,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> : _M_exts(span(__exts))
> { }
>
> -
>template<__mdspan::__valid_index_type _OIndexType,
> size_t _Nm>
> requires (_Nm == rank() || _Nm == rank_dynamic())
> constexpr explicit(_Nm != rank_dynamic())
> @@ -276,6 +305,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> }
>
>  private:
> +  friend span
> +  __mdspan::__static_extents(size_t, size_t);
> +
> +  friend span
> +  __mdspan::__dynamic_extents(const extents&, size_t,
> size_t);
> +
>using _S_storage = __mdspan::_ExtentsStorage<
> _IndexType, array{_Extents...}>;
>[[no_unique_address]] _S_storage _M_exts;
> @@ -286,6 +321,54 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>namespace __mdspan
>{
> +template
> +  constexpr size_t
> +  __static_extents_prod(size_t __begin, size_t __end)
> +  {
> +   auto __sta_exts = __static_extents<_Extents>(__begin, __end);
> +   size_t __ret = 1;
> +   for (auto __factor : __sta_exts)
> + if (__factor != dynamic_extent)
> +   __ret *= __factor;
> +   return __ret;
> +  }
> +
> +template
> +  constexpr size_t
> +  __dynamic_extents_prod(const _Extents& __exts, size_t __begin,
> +size_t __end)
> +  {
> +   auto __dyn_exts = __dynamic_extents<_Extents>(__exts, __begin,
> +__end);
> +   size_t __ret = 1;
> +   for (auto __factor : __dyn_exts)
> +   __ret *= __factor;
> +   return __ret;
> +  }
> +
> +template
> +  constexpr typename _Extents::index_type
> +  __exts_prod(const _Extents& __exts, size_t __begin, size_t __end)
> noexcept
> +  {
> +   using _IndexType = typename _Extents::index_type;
> +   _IndexType __ret = 1;
> +   if constexpr (_Extents::rank_dynamic() != _Extents::rank())
> + __ret = _IndexType(__static_extents_prod<_Extents>(__begin,
> __end));
> +   if constexpr (_Extents::rank_dynamic() > 0)
> + __ret *= __dynamic_extents_prod(__exts, __begin, __end);
> +   return __ret;
> +  }
> +
> +template
> +  constexpr typename _Extents::index_type
> +  __fwd_prod(const _Extents& __exts, size_t __r) noexcept
> +  { return __exts_prod(__exts, 0, __r); }
> +
> +template
> +  constexpr typename _Extents::index_type
> +  __rev_prod(const _Extents& __exts, size_t __r) noexcept
> +  { return __exts_prod(__exts, __r + 1, __exts.rank()); }
> +
>  template
>auto __build_dextents_type(integer_sequence)
> -> extents<_IndexType, ((void) _Counts, dynamic_extent)...>;
> @@ -304,6 +387,228 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  explicit extents(_Integrals...) ->
>extents()...>;
>
> +  struct layout_left
> +  {
> +template
> +  class mapping;
> +  };
> +
> +  namespace __mdspan
> +  {
> +templa

Re: [PATCH v3 8/9] libstdc++: Implement layout_stride from mdspan.

2025-05-22 Thread Tomasz Kaminski
On Wed, May 21, 2025 at 12:04 PM Luc Grosheintz 
wrote:

> Implements the remaining parts of layout_left and layout_right; and all
> of layout_stride.
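
A minimal standalone sketch of the stride-based index computation (assuming
plain std::array strides rather than a mapping object; the function name is
hypothetical): each index is multiplied by the stride of its dimension and
the products are summed, here with a fold over a stateful lambda in the same
spirit as the helper in the patch.

```cpp
#include <array>
#include <cstddef>

// Sum of indices[k] * strides[k] over all dimensions.
template<std::size_t N, typename... Indices>
constexpr std::size_t
linear_index_strides (const std::array<std::size_t, N>& strides,
		      Indices... indices)
{
  static_assert (sizeof...(Indices) == N, "one index per dimension");
  std::size_t res = 0;
  if constexpr (sizeof...(indices) > 0)
    {
      // 'pos' advances once per index as the fold is evaluated
      // from left to right.
      auto update = [&, pos = 0u] (std::size_t idx) mutable
	{
	  res += idx * strides[pos++];
	};
      (update (indices), ...);
    }
  return res;
}
```

With strides {12, 4, 1} the index (1, 2, 3) maps to 12 + 8 + 3 = 23.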
>
> libstdc++-v3/ChangeLog:
>
> * include/std/mdspan (layout_stride): New class.
>
> Signed-off-by: Luc Grosheintz 
> ---
>  libstdc++-v3/include/std/mdspan | 216 +++-
>  1 file changed, 213 insertions(+), 3 deletions(-)
>
> diff --git a/libstdc++-v3/include/std/mdspan
> b/libstdc++-v3/include/std/mdspan
> index 43676c3463c..732fc4eb1c2 100644
> --- a/libstdc++-v3/include/std/mdspan
> +++ b/libstdc++-v3/include/std/mdspan
> @@ -399,6 +399,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>class mapping;
>};
>
> +  struct layout_stride
> +  {
> +template
> +  class mapping;
> +  };
> +
>namespace __mdspan
>{
>  template
> @@ -499,7 +505,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>  template
>concept __standardized_mapping = __mapping_of<layout_left, _Mapping>
> -  || __mapping_of<layout_right, _Mapping>;
> +  || __mapping_of<layout_right, _Mapping>
> +  || __mapping_of<layout_stride, _Mapping>;
>
>  template
>concept __mapping_like = requires
> @@ -557,6 +564,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> : mapping(__other.extents(), __mdspan::__internal_ctor{})
> { }
>
> +  template
> +   requires (is_constructible_v)
> +   constexpr explicit(extents_type::rank() > 0)
> +   mapping(const layout_stride::mapping<_OExtents>& __other)
>
I think I would make it noexcept, as implementations can add noexcept
beyond what is specified in the standard. Just add an appropriate comment.
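
To illustrate the suggestion with a minimal, self-contained sketch (the
types below are hypothetical stand-ins, not the mdspan classes): an
implementation may strengthen the exception specification of a function the
standard declares without noexcept, and a comment records the deviation.

```cpp
#include <type_traits>

struct stride_mapping_like { int extent = 3; };

struct left_mapping_like
{
  int extent;

  // noexcept is an extension here: the standard does not require it,
  // but nothing in the body can throw.
  constexpr explicit
  left_mapping_like (const stride_mapping_like& other) noexcept
  : extent (other.extent)
  { }
};

static_assert (std::is_nothrow_constructible_v<left_mapping_like,
					       const stride_mapping_like&>);
```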

> +   : mapping(__other.extents(), __mdspan::__internal_ctor{})
> +   { __glibcxx_assert(*this == __other); }
> +
>constexpr mapping&
>operator=(const mapping&) noexcept = default;
>
> @@ -572,8 +586,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> constexpr index_type
> operator()(_Indices... __indices) const noexcept
> {
> - return __mdspan::__linear_index_left(
> -   this->extents(), static_cast(__indices)...);
> + return __mdspan::__linear_index_left(_M_extents,
> +   static_cast(__indices)...);
>
Could you move this change to the layout_left commit?

> }
>
>static constexpr bool
> @@ -687,6 +701,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> : mapping(__other.extents(), __mdspan::__internal_ctor{})
> { }
>
> +  template
> +   requires (is_constructible_v)
> +   constexpr explicit(extents_type::rank() > 0)
> +   mapping(const layout_stride::mapping<_OExtents>& __other) noexcept
> +   : mapping(__other.extents(), __mdspan::__internal_ctor{})
> +   { __glibcxx_assert(*this == __other); }
> +
>constexpr mapping&
>operator=(const mapping&) noexcept = default;
>
> @@ -760,6 +781,195 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> [[no_unique_address]] _Extents _M_extents;
>  };
>
> +  namespace __mdspan
> +  {
> +template
> +  constexpr typename _Mapping::index_type
> +  __offset(const _Mapping& __m) noexcept
> +  {
> +   using _IndexType = typename _Mapping::index_type;
> +
> +   auto __impl = [&__m]<size_t... _Counts>(index_sequence<_Counts...>)
> +   { return __m(((void) _Counts, _IndexType(0))...); };
> +   return
> __impl(make_index_sequence<_Mapping::extents_type::rank()>());
> +  }
> +
> +template
> +  constexpr typename _Mapping::index_type
> +  __linear_index_strides(const _Mapping& __m,
> +_Indices... __indices)
> +  {
> +   using _IndexType = typename _Mapping::index_type;
> +   _IndexType __res = 0;
> +   if constexpr (sizeof...(__indices) > 0)
> + {
> +   auto __update = [&, __pos = 0u](_IndexType __idx) mutable
> + {
> +   __res += __idx * __m.stride(__pos++);
> + };
> +   (__update(__indices), ...);
> + }
> +   return __res;
> +  }
> +  }
> +
> +  template
> +class layout_stride::mapping
> +{
> +public:
> +  using extents_type = _Extents;
> +  using index_type = typename extents_type::index_type;
> +  using size_type = typename extents_type::size_type;
> +  using rank_type = typename extents_type::rank_type;
> +  using layout_type = layout_stride;
> +
> +  static_assert(__mdspan::__representable_size<_Extents, index_type>,
> +   "The size of extents_type must be representable as index_type");
> +
> +  constexpr
> +  mapping() noexcept
> +  {
> +   auto __stride = index_type(1);
> +   for (size_t __i = extents_type::rank(); __i > 0; --__i)
> + {
> +   _M_strides[__i - 1] = __stride;
> +   __stride *= _M_extents.extent(__i - 1);
> + }
> +  }
> +
> +  constexpr
> +  mapping(const mapping&) noexcept = default;
> +
> +  template<__mdspan::__valid_index_type

Re: [PATCH] c++: Take downgraded errors into account in seen_error [PR118388]

2025-05-22 Thread Simon Martin
Hi,

On Fri May 9, 2025 at 5:37 PM CEST, Simon Martin wrote:
> Several gcc_assert calls throughout the C++ front end involve seen_error (),
> which does not take into account errors that were turned into warnings due to
> -fpermissive or -Wtemplate-body.
>
> Running the full C++ testsuite when forcing the use of -fpermissive
> leads to ICEs for 6 tests (see list in ticket); one could consider those
> as reject-valid cases.
>
> This patch keeps track of whether we tried to emit an error (whether it
> was eventually output as such or not) and uses this in seen_error.
>
> Successfully tested on x86_64-pc-linux-gnu.
Friendly ping.

Thanks!
  Simon
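
The idea can be sketched in isolation as follows. This is a simplified
model, not the GCC diagnostic machinery: the enum, the counters, and the
function names are all hypothetical.

```cpp
enum class kind { error, warning };

static int error_count = 0;       // errors actually reported as errors
static bool tried_error = false;  // set even if the error is downgraded

// Model of emitting a diagnostic that is downgraded to a warning
// when PERMISSIVE is set.
kind emit (kind k, bool permissive)
{
  if (k == kind::error)
    {
      tried_error = true;
      if (permissive)
	k = kind::warning;
      else
	++error_count;
    }
  return k;
}

// Analogue of the patched predicate: true if an error was attempted,
// whether or not it was finally output as one.
bool seen_error_model ()
{
  return error_count > 0 || tried_error;
}
```

An assertion guarded only by error_count would still fire after a
downgraded error, while one guarded by seen_error_model () would not.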

>   PR c++/118388
>
> gcc/cp/ChangeLog:
>
>   * error.cc (seen_error_raw): New counter to keep track of errors
>   including those downgraded to warnings.
>   (cp_seen_error): Take downgraded errors into account.
>   * typeck2.cc (merge_exception_specifiers): Use seen_error
>   instead of errorcount.
>
> gcc/testsuite/ChangeLog:
>
>   * g++.dg/cpp0x/lambda/lambda-ice5-fpermissive.C: New test.
>   * g++.dg/cpp0x/noexcept128-fpermissive.C: New test.
>
> ---
>  gcc/cp/error.cc   | 54 +--
>  gcc/cp/typeck2.cc |  2 +-
>  .../cpp0x/lambda/lambda-ice5-fpermissive.C| 14 +
>  .../g++.dg/cpp0x/noexcept128-fpermissive.C| 21 
>  4 files changed, 63 insertions(+), 28 deletions(-)
>  create mode 100644 
> gcc/testsuite/g++.dg/cpp0x/lambda/lambda-ice5-fpermissive.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept128-fpermissive.C
>
> diff --git a/gcc/cp/error.cc b/gcc/cp/error.cc
> index 75bf7dcef62..78ecafb0e02 100644
> --- a/gcc/cp/error.cc
> +++ b/gcc/cp/error.cc
> @@ -215,6 +215,11 @@ get_current_template ()
>
>  erroneous_templates_t *erroneous_templates;
>
> +/* SEEN_ERROR_RAW will be true if we tried to emit an error message, 
> regardless
> +   of whether it was actually output or downgraded to a warning.  */
> +
> +bool seen_error_raw = false;
> +
>  /* Callback function diagnostic_context::m_adjust_diagnostic_info.
>
> Errors issued when parsing a template are automatically treated like
> @@ -227,40 +232,35 @@ cp_adjust_diagnostic_info (diagnostic_context *context,
>  diagnostic_info *diagnostic)
>  {
>if (diagnostic->kind == DK_ERROR)
> -if (tree tmpl = get_current_template ())
> -  {
> - diagnostic->option_id = OPT_Wtemplate_body;
> -
> - if (context->m_permissive)
> -   diagnostic->kind = DK_WARNING;
> -
> - bool existed;
> - location_t &error_loc
> -   = hash_map_safe_get_or_insert (erroneous_templates,
> -tmpl, &existed);
> - if (!existed)
> -   /* Remember that this template had a parse-time error so
> -  that we'll ensure a hard error has been issued upon
> -  its instantiation.  */
> -   error_loc = diagnostic->richloc->get_loc ();
> -  }
> +{
> +  seen_error_raw = true;
> +  if (tree tmpl = get_current_template ())
> + {
> +   diagnostic->option_id = OPT_Wtemplate_body;
> +
> +   if (context->m_permissive)
> + diagnostic->kind = DK_WARNING;
> +
> +   bool existed;
> +   location_t &error_loc
> + = hash_map_safe_get_or_insert (erroneous_templates,
> +  tmpl, &existed);
> +   if (!existed)
> + /* Remember that this template had a parse-time error so
> +that we'll ensure a hard error has been issued upon
> +its instantiation.  */
> + error_loc = diagnostic->richloc->get_loc ();
> + }
> +}
>  }
>
>  /* A generalization of seen_error which also returns true if we've
> -   permissively downgraded an error to a warning inside a template.  */
> +   permissively downgraded an error to a warning.  */
>
>  bool
>  cp_seen_error ()
>  {
> -  if ((seen_error) ())
> -return true;
> -
> -  if (erroneous_templates)
> -if (tree tmpl = get_current_template ())
> -  if (erroneous_templates->get (tmpl))
> - return true;
> -
> -  return false;
> +  return (seen_error) () || seen_error_raw;
>  }
>
>  /* CONTEXT->printer is a basic pretty printer that was constructed
> diff --git a/gcc/cp/typeck2.cc b/gcc/cp/typeck2.cc
> index 45edd180173..a2d230461c4 100644
> --- a/gcc/cp/typeck2.cc
> +++ b/gcc/cp/typeck2.cc
> @@ -2726,7 +2726,7 @@ merge_exception_specifiers (tree list, tree add)
>  return add;
>noex = TREE_PURPOSE (list);
>gcc_checking_assert (!TREE_PURPOSE (add)
> -|| errorcount || !flag_exceptions
> +|| seen_error () || !flag_exceptions
>  || cp_tree_equal (noex, TREE_PURPOSE (add)));
>
>/* Combine the dynamic-exception-specifiers, if any.  */
> diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-ice5-fpermissive.C 
> b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-ice5-fpermissive.C
> new file

Re: [PATCH v3 8/9] libstdc++: Implement layout_stride from mdspan.

2025-05-22 Thread Tomasz Kaminski
On Thu, May 22, 2025 at 2:37 PM Tomasz Kaminski  wrote:

>
>
> On Wed, May 21, 2025 at 12:04 PM Luc Grosheintz 
> wrote:
>
>> Implements the remaining parts of layout_left and layout_right; and all
>> of layout_stride.
>>
>> libstdc++-v3/ChangeLog:
>>
>> * include/std/mdspan (layout_stride): New class.
>>
>> Signed-off-by: Luc Grosheintz 
>> ---
>>  libstdc++-v3/include/std/mdspan | 216 +++-
>>  1 file changed, 213 insertions(+), 3 deletions(-)
>>
>> diff --git a/libstdc++-v3/include/std/mdspan
>> b/libstdc++-v3/include/std/mdspan
>> index 43676c3463c..732fc4eb1c2 100644
>> --- a/libstdc++-v3/include/std/mdspan
>> +++ b/libstdc++-v3/include/std/mdspan
>> @@ -399,6 +399,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>class mapping;
>>};
>>
>> +  struct layout_stride
>> +  {
>> +template
>> +  class mapping;
>> +  };
>> +
>>namespace __mdspan
>>{
>>  template
>> @@ -499,7 +505,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>
>>  template
>>concept __standardized_mapping = __mapping_of<layout_left, _Mapping>
>> -  || __mapping_of<layout_right, _Mapping>;
>> +  || __mapping_of<layout_right, _Mapping>
>> +  || __mapping_of<layout_stride, _Mapping>;
>>
>>  template
>>concept __mapping_like = requires
>> @@ -557,6 +564,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>> : mapping(__other.extents(), __mdspan::__internal_ctor{})
>> { }
>>
>> +  template
>> +   requires (is_constructible_v)
>> +   constexpr explicit(extents_type::rank() > 0)
>> +   mapping(const layout_stride::mapping<_OExtents>& __other)
>>
> I think I would make it noexcept, as implementations can add noexcept
> beyond what is specified in the standard. Just add an appropriate comment.
>
>> +   : mapping(__other.extents(), __mdspan::__internal_ctor{})
>> +   { __glibcxx_assert(*this == __other); }
>> +
>>constexpr mapping&
>>operator=(const mapping&) noexcept = default;
>>
>> @@ -572,8 +586,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>> constexpr index_type
>> operator()(_Indices... __indices) const noexcept
>> {
>> - return __mdspan::__linear_index_left(
>> -   this->extents(), static_cast(__indices)...);
>> + return __mdspan::__linear_index_left(_M_extents,
>> +   static_cast(__indices)...);
>>
> Could you move this change to the layout_left commit?
>
>> }
>>
>>static constexpr bool
>> @@ -687,6 +701,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>> : mapping(__other.extents(), __mdspan::__internal_ctor{})
>> { }
>>
>> +  template
>> +   requires (is_constructible_v)
>> +   constexpr explicit(extents_type::rank() > 0)
>> +   mapping(const layout_stride::mapping<_OExtents>& __other) noexcept
>> +   : mapping(__other.extents(), __mdspan::__internal_ctor{})
>> +   { __glibcxx_assert(*this == __other); }
>> +
>>constexpr mapping&
>>operator=(const mapping&) noexcept = default;
>>
>> @@ -760,6 +781,195 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>> [[no_unique_address]] _Extents _M_extents;
>>  };
>>
>> +  namespace __mdspan
>> +  {
>> +template
>> +  constexpr typename _Mapping::index_type
>> +  __offset(const _Mapping& __m) noexcept
>> +  {
>> +   using _IndexType = typename _Mapping::index_type;
>> +
>> +   auto __impl = [&__m]<size_t... _Counts>(index_sequence<_Counts...>)
>> +   { return __m(((void) _Counts, _IndexType(0))...); };
>> +   return
>> __impl(make_index_sequence<_Mapping::extents_type::rank()>());
>> +  }
>> +
>> +template
>> +  constexpr typename _Mapping::index_type
>> +  __linear_index_strides(const _Mapping& __m,
>> +_Indices... __indices)
>> +  {
>> +   using _IndexType = typename _Mapping::index_type;
>> +   _IndexType __res = 0;
>> +   if constexpr (sizeof...(__indices) > 0)
>> + {
>> +   auto __update = [&, __pos = 0u](_IndexType __idx) mutable
>> + {
>> +   __res += __idx * __m.stride(__pos++);
>> + };
>> +   (__update(__indices), ...);
>> + }
>> +   return __res;
>> +  }
>> +  }
>> +
>> +  template
>> +class layout_stride::mapping
>> +{
>> +public:
>> +  using extents_type = _Extents;
>> +  using index_type = typename extents_type::index_type;
>> +  using size_type = typename extents_type::size_type;
>> +  using rank_type = typename extents_type::rank_type;
>> +  using layout_type = layout_stride;
>> +
>> +  static_assert(__mdspan::__representable_size<_Extents, index_type>,
>> +   "The size of extents_type must be representable as index_type");
>> +
>> +  constexpr
>> +  mapping() noexcept
>> +  {
>> +   auto __stride = index_type(1);
>> +   for (size_t __i = extents_type::rank(); __i > 0; --__i)
>> + {
>> +

[PATCH v2] libstdc++: Implement C++26 std::indirect [PR119152]

2025-05-22 Thread Tomasz Kamiński
From: Jonathan Wakely 

This patch implements C++26 std::indirect as specified
in P3019 with amendment to move assignment from LWG 4251.

PR libstdc++/119152

libstdc++-v3/ChangeLog:

* include/Makefile.am: Add new header.
* include/Makefile.in: Regenerate.
* include/bits/indirect.h: New file.
* include/bits/version.def (indirect): Define.
* include/bits/version.h: Regenerate.
* include/std/memory: Include new header.
* testsuite/std/memory/indirect/copy.cc
* testsuite/std/memory/indirect/copy_alloc.cc
* testsuite/std/memory/indirect/ctor.cc
* testsuite/std/memory/indirect/incomplete.cc
* testsuite/std/memory/indirect/invalid_neg.cc
* testsuite/std/memory/indirect/move.cc
* testsuite/std/memory/indirect/move_alloc.cc
* testsuite/std/memory/indirect/relops.cc

Co-authored-by: Tomasz Kamiński 
Signed-off-by: Tomasz Kamiński 
---
Changes in v2:
 - Fixed typos in commit messages as pointed out by Jakub
 - Removed stray comment in indirect.h header as pointed out by Daniel

 libstdc++-v3/include/Makefile.am  |   1 +
 libstdc++-v3/include/Makefile.in  |   1 +
 libstdc++-v3/include/bits/indirect.h  | 459 ++
 libstdc++-v3/include/bits/version.def |   9 +
 libstdc++-v3/include/bits/version.h   |  10 +
 libstdc++-v3/include/std/memory   |   5 +
 .../testsuite/std/memory/indirect/copy.cc | 121 +
 .../std/memory/indirect/copy_alloc.cc | 228 +
 .../testsuite/std/memory/indirect/ctor.cc | 203 
 .../std/memory/indirect/incomplete.cc |  38 ++
 .../std/memory/indirect/invalid_neg.cc|  28 ++
 .../testsuite/std/memory/indirect/move.cc | 144 ++
 .../std/memory/indirect/move_alloc.cc | 296 +++
 .../testsuite/std/memory/indirect/relops.cc   |  82 
 14 files changed, 1625 insertions(+)
 create mode 100644 libstdc++-v3/include/bits/indirect.h
 create mode 100644 libstdc++-v3/testsuite/std/memory/indirect/copy.cc
 create mode 100644 libstdc++-v3/testsuite/std/memory/indirect/copy_alloc.cc
 create mode 100644 libstdc++-v3/testsuite/std/memory/indirect/ctor.cc
 create mode 100644 libstdc++-v3/testsuite/std/memory/indirect/incomplete.cc
 create mode 100644 libstdc++-v3/testsuite/std/memory/indirect/invalid_neg.cc
 create mode 100644 libstdc++-v3/testsuite/std/memory/indirect/move.cc
 create mode 100644 libstdc++-v3/testsuite/std/memory/indirect/move_alloc.cc
 create mode 100644 libstdc++-v3/testsuite/std/memory/indirect/relops.cc

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 3e5b6c4142e..b67d470c27e 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -210,6 +210,7 @@ bits_headers = \
${bits_srcdir}/gslice_array.h \
${bits_srcdir}/hashtable.h \
${bits_srcdir}/hashtable_policy.h \
+   ${bits_srcdir}/indirect.h \
${bits_srcdir}/indirect_array.h \
${bits_srcdir}/ios_base.h \
${bits_srcdir}/istream.tcc \
diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index 3531162b5f7..6f7f2be68fd 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -563,6 +563,7 @@ bits_freestanding = \
 @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/gslice_array.h \
 @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/hashtable.h \
 @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/hashtable_policy.h \
+@GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/indirect.h \
 @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/indirect_array.h \
 @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/ios_base.h \
 @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/istream.tcc \
diff --git a/libstdc++-v3/include/bits/indirect.h 
b/libstdc++-v3/include/bits/indirect.h
new file mode 100644
index 000..3fd9807a8fd
--- /dev/null
+++ b/libstdc++-v3/include/bits/indirect.h
@@ -0,0 +1,459 @@
+// Vocabulary Types for Composite Class Design -*- C++ -*-
+
+// Copyright The GNU Toolchain Authors.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RU

Re: [PATCH] RISC-V: Add minimal support of double trap extension 1.0

2025-05-22 Thread Jeff Law




On 5/22/25 12:21 AM, Jerry Zhang Jian wrote:

Add support of double trap extension [1], enabling GCC
to recognize the following extensions at compile time.

New extensions:
 - ssdbltrp
 - smdbltrp

[1] 
https://github.com/riscv/riscv-double-trap/releases/download/v1.0/riscv-double-trap.pdf

gcc/ChangeLog:
 * config/riscv/riscv-ext.def: New extensions

gcc/testsuite/ChangeLog:
 * gcc/testsuite/gcc.target/riscv/arch-56.c: New test
 * gcc/testsuite/gcc.target/riscv/arch-57.c: New test

This fails to build.  See the logs here:



https://github.com/ewlu/gcc-precommit-ci/issues/3413#issuecomment-2900108005



Jeff


Re: [PATCH v4 2/2] aarch64: Fold lsl+lsr+orr to rev for half-width shifts

2025-05-22 Thread Dhruv Chawla

On 22/05/25 16:06, Richard Sandiford wrote:



 writes:

[...]
+;; The RTL combiners are able to combine "ior (ashift, ashiftrt)" to a "bswap".
+;; Match that as well.
+(define_insn_and_split "*v_revvnx8hi"
+  [(parallel
+[(set (match_operand:VNx8HI 0 "register_operand")
+   (bswap:VNx8HI (match_operand 1 "register_operand")))
+ (clobber (match_scratch:VNx8BI 2))])]


Sorry for not noticing last time, but operand 0 should have a "=w"
constraint, operand 1 should have a "w" constraint, and the match_scratch
should have a "=Upl" constraint.


Ah, thanks, sorry about forgetting to add those in the first place.




+  "TARGET_SVE"
+  "#"
+  ""


The last line should be "&& 1", since the TARGET_SVE test doesn't
automatically apply to the define_split.


+  [(set (match_dup 0)
+ (unspec:VNx8HI
+   [(match_dup 2)
+(unspec:VNx8HI
+  [(match_dup 1)]
+  UNSPEC_REVB)]
+   UNSPEC_PRED_X))]
+  {
+if (!can_create_pseudo_p ())
+  operands[2] = CONSTM1_RTX (VNx8BImode);
+else
+  operands[2] = aarch64_ptrue_reg (VNx8BImode);


This should be:

 if (!can_create_pseudo_p ())
   emit_move_insn (operands[2], CONSTM1_RTX (VNx8BImode));
 else
   operands[2] = aarch64_ptrue_reg (VNx8BImode);

That is, after register allocation, the pattern gives us a scratch
predicate register, but we need to initialise it to a ptrue.



Ah right, that makes sense, my bad - I had just copied the else-block
and forgot to think about it.


+  }
+)
+
  ;; Predicated integer unary operations.
  (define_insn "@aarch64_pred_"
[(set (match_operand:SVE_FULL_I 0 "register_operand")
[...]
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/shift_rev_1.c b/gcc/testsuite/gcc.target/aarch64/sve/shift_rev_1.c
new file mode 100644
index 000..3a30f80d152
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/shift_rev_1.c
@@ -0,0 +1,83 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=armv8.2-a+sve" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#include 
+
+/*
+** ror32_sve_lsl_imm:
+**   ptrue   p3.b, all
+**   revw    z0.d, p3/m, z0.d


There's no requirement to choose p3 for the predicate, so this would
be better as:

**  ptrue   (p[0-3]).b, all
**  revw    z0.d, \1/m, z0.d

Same for the others.

OK with those changes, thanks.


Here's a version of the patch with the changes applied - I will commit it after
receiving write-after-approval access and adding myself to the MAINTAINERS
file :) Thanks for sponsoring!

-- >8 --

[PATCH] aarch64: Fold lsl+lsr+orr to rev for half-width shifts

This patch folds the following pattern:

  lsl , , 
  lsr , , 
  orr , , 

to:

  revb/h/w , 

when the shift amount is equal to half the bitwidth of the 
register.
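
As a scalar illustration of why this fold is valid (a sketch only, not code
from the patch): rotating an element by half its width swaps its two halves,
which is what a byte, halfword or word reversal within that element does.
The helper names below are hypothetical, and the mapping to revb/revh/revw
in the comments is my reading of the description above.

```cpp
#include <cstdint>

// Hypothetical helpers, for illustration only: each one is the
// lsl + lsr + orr sequence with a shift amount of half the element width.
constexpr std::uint16_t rot_half16(std::uint16_t x)
{ return static_cast<std::uint16_t>((x << 8) | (x >> 8)); }

constexpr std::uint32_t rot_half32(std::uint32_t x)
{ return (x << 16) | (x >> 16); }

constexpr std::uint64_t rot_half64(std::uint64_t x)
{ return (x << 32) | (x >> 32); }

// 16-bit element: the two bytes swap (a byte reversal per halfword).
static_assert(rot_half16(0x1234) == 0x3412);
// 32-bit element: the two halfwords swap (a halfword reversal per word).
static_assert(rot_half32(0x12345678u) == 0x56781234u);
// 64-bit element: the two words swap (a word reversal per doubleword).
static_assert(rot_half64(0x0123456789abcdefull) == 0x89abcdef01234567ull);
```

The checks are compile-time only; any other shift amount would not reduce to
a pure reversal, which is why the fold is restricted to the half-width case.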

Bootstrapped and regtested on aarch64-linux-gnu.

Signed-off-by: Dhruv Chawla 
Co-authored-by: Richard Sandiford 

gcc/ChangeLog:

* expmed.cc (expand_rotate_as_vec_perm): Avoid a no-op move if the
target already provided the result in the expected register.
* config/aarch64/aarch64.cc (aarch64_vectorize_vec_perm_const):
Avoid forcing subregs into fresh registers unnecessarily.
* config/aarch64/aarch64-sve.md: Add define_split for rotate.
(*v_revvnx8hi): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/shift_rev_1.c: New test.
* gcc.target/aarch64/sve/shift_rev_2.c: Likewise.
* gcc.target/aarch64/sve/shift_rev_3.c: Likewise.
---
 gcc/config/aarch64/aarch64-sve.md | 55 
 gcc/config/aarch64/aarch64.cc | 10 ++-
 gcc/expmed.cc |  3 +-
 .../gcc.target/aarch64/sve/shift_rev_1.c  | 83 +++
 .../gcc.target/aarch64/sve/shift_rev_2.c  | 63 ++
 .../gcc.target/aarch64/sve/shift_rev_3.c  | 83 +++
 6 files changed, 294 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/shift_rev_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/shift_rev_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/shift_rev_3.c

diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index e1ec778b10d..c5d3e8cd3b3 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -3317,6 +3317,61 @@
 ;; - REVW
 ;; -
 
+(define_split

+  [(set (match_operand:SVE_FULL_HSDI 0 "register_operand")
+   (rotate:SVE_FULL_HSDI
+ (match_operand:SVE_FULL_HSDI 1 "register_operand")
+ (match_operand:SVE_FULL_HSDI 2 "aarch64_constant_vector_operand")))]
+  "TARGET_SVE && can_create_pseudo_p ()"
+  [(set (match_dup 3)
+   (ashift:SVE_FULL_HSDI (match_dup 1)
+ (match_dup 2)))
+   (set (match_dup 0)
+   (plus:SVE_FULL_HSDI
+ (lshiftrt:SVE_FUL

Re: [PATCH 2/2] libstdc++: Implement LWG 2439 for std::unique_copy [PR120386]

2025-05-22 Thread Jonathan Wakely
On Thu, 22 May 2025 at 16:25, Tomasz Kaminski  wrote:
>
>
>
> On Thu, May 22, 2025 at 5:15 PM Tomasz Kaminski  wrote:
>>
>>
>>
>> On Thu, May 22, 2025 at 5:04 PM Jonathan Wakely  wrote:
>>>
>>> On Thu, 22 May 2025 at 15:50, Tomasz Kaminski  wrote:
>>> >
>>> >
>>> >
>>> > On Thu, May 22, 2025 at 1:42 PM Jonathan Wakely  
>>> > wrote:
>>> >>
>>> >> The current overload set for __unique_copy handles three cases:
>>> >>
>>> >> - The input range uses forward iterators, the output range does not.
>>> >>   This is the simplest case, and can just compare adjacent elements of
>>> >>   the input range.
>>> >>
>>> >> - Neither the input range nor output range use forward iterators.
>>> >>   This requires a local variable copied from the input range and updated
>>> >>   by assigning each element to the local variable.
>>> >>
>>> >> - The output range uses forward iterators.
>>> >>   For this case we compare the current element from the input range with
>>> >>   the element just written to the output range.
>>> >>
>>> >> There are two problems with this implementation. Firstly, the third case
>>> >> assumes that the value type of the output range can be compared to the
>>> >> value type of the input range, which might not be possible at all, or
>>> >> might be possible but give different results to comparing elements of
>>> >> the input range. This is the problem identified in LWG 2439.
>>> >>
>>> >> Secondly, the third case is used when both ranges use forward iterators,
>>> >> even though the first case could (and should) be used. This means that
>>> >> we compare elements from the output range instead of the input range,
>>> >> with the problems described above (either not well-formed, or might give
>>> >> the wrong results).
>>> >>
>>> >> The cause of the second problem is that the overload for the first case
>>> >> looks like:
>>> >>
>>> >> OutputIterator
>>> >> __unique_copy(ForwardIter, ForwardIter, OutputIterator, BinaryPred,
>>> >>   forward_iterator_tag, output_iterator_tag);
>>> >>
>>> >> When the output range uses forward iterators this overload cannot be
>>> >> used, because forward_iterator_tag does not inherit from
>>> >> output_iterator_tag, so is not convertible to it.
>>> >>
>>> >> To fix these problems we need to implement the resolution of LWG 2439 so
>>> >> that the third case is only used when the value types of the two ranges
>>> >> are the same. This ensures that the comparisons are well behaved. We
>>> >> also need to ensure that the first case is used when both ranges use
>>> >> forward iterators.
>>> >>
>>> >> This change replaces a single step of tag dispatching to choose between
>>> >> three overloads with two steps of tag dispatching, choosing between two
>>> >> overloads at each step. The first step dispatches based on the iterator
>>> >> category of the input range, ignoring the category of the output range.
>>> >> The second step only happens when the input range uses non-forward
>>> >> iterators, and dispatches based on the category of the output range and
>>> >> whether the value type of the two ranges is the same. So now the cases
>>> >> that are handled are:
>>> >>
>>> >> - The input range uses forward iterators.
>>> >> - The output range uses non-forward iterators or a different value type.
>>> >> - The output range uses forward iterators and has the same value type.
>>> >>
>>> >> For the second case, the old code used __gnu_cxx::__ops::__iter_comp_val
>>> >> to wrap the predicate in another level of indirection. That seems
>>> >> unnecessary, as we can just use a pointer to the local variable instead
>>> >> of an iterator referring to it.
>>> >>
>>> >> libstdc++-v3/ChangeLog:
>>> >>
>>> >> PR libstdc++/120386
>>> >> * include/bits/stl_algo.h (__unique_copy_1): New overloads for
>>> >> the case where the input range uses non-forward iterators.
>>> >> (__unique_copy): Replace three overloads with two, depending
>>> >> only on the iterator category of the input range. Dispatch to
>>> >> __unique_copy_1 for the non-forward case.
>>> >> (unique_copy): Only pass the input range category to
>>> >> __unique_copy.
>>> >> ---
>>> >>
>>> >> Tested x86_64-linux.
>>> >
>>> > LGTM. Only small suggestion, regarding the change of order of arguments.
>>>
>>> I forgot to say that I need to add tests for each of the cases,
>>> especially the case that fails with the existing code!
>>>
>>> >>
>>> >>
>>> >>  libstdc++-v3/include/bits/stl_algo.h | 80 +++-
>>> >>  1 file changed, 44 insertions(+), 36 deletions(-)
>>> >>
>>> >> diff --git a/libstdc++-v3/include/bits/stl_algo.h 
>>> >> b/libstdc++-v3/include/bits/stl_algo.h
>>> >> index f5361aeab7e2..c0bb17f9c8b2 100644
>>> >> --- a/libstdc++-v3/include/bits/stl_algo.h
>>> >> +++ b/libstdc++-v3/include/bits/stl_algo.h
>>> >> @@ -918,24 +918,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>> >>
>>> >> __gnu_cxx::__ops::__iter_comp_iter(__b
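
The dispatch problem and the two-step scheme described in the quoted commit
message can be sketched as follows. This is not the code from the patch
(which is truncated above); pick, pick_1, choose and the strategy enum are
hypothetical names used only to show which of the three cases gets selected.

```cpp
#include <istream>
#include <iterator>
#include <type_traits>
#include <vector>

// forward_iterator_tag derives from input_iterator_tag only, so an overload
// that takes output_iterator_tag cannot accept a forward output iterator.
static_assert(std::is_convertible_v<std::forward_iterator_tag,
                                    std::input_iterator_tag>);
static_assert(!std::is_convertible_v<std::forward_iterator_tag,
                                     std::output_iterator_tag>);

// Hypothetical labels for the three cases handled after the change.
enum class strategy { compare_input, local_copy, compare_output };

// Step 2: only reached when the input range uses non-forward iterators.
template<typename In, typename Out>
constexpr strategy pick_1(std::true_type)   // forward output, same value type
{ return strategy::compare_output; }

template<typename In, typename Out>
constexpr strategy pick_1(std::false_type)  // anything else
{ return strategy::local_copy; }

// Step 1: dispatch on the input category alone.
template<typename In, typename Out>
constexpr strategy pick(std::forward_iterator_tag)
{ return strategy::compare_input; }

template<typename In, typename Out>
constexpr strategy pick(std::input_iterator_tag)
{
  using out_cat = typename std::iterator_traits<Out>::iterator_category;
  using in_val = typename std::iterator_traits<In>::value_type;
  using out_val = typename std::iterator_traits<Out>::value_type;
  constexpr bool fwd_same
    = std::is_convertible_v<out_cat, std::forward_iterator_tag>
      && std::is_same_v<in_val, out_val>;
  return pick_1<In, Out>(std::bool_constant<fwd_same>{});
}

template<typename In, typename Out>
constexpr strategy choose()
{
  using in_cat = typename std::iterator_traits<In>::iterator_category;
  return pick<In, Out>(in_cat{});
}

using in_it = std::istream_iterator<int>;
using back_it = std::back_insert_iterator<std::vector<int>>;

// Both ranges forward: the input range is compared, never the output.
static_assert(choose<const int*, int*>() == strategy::compare_input);
// Input-only source, forward output with the same value type.
static_assert(choose<in_it, int*>() == strategy::compare_output);
// Input-only source, different value type or non-forward output.
static_assert(choose<in_it, long*>() == strategy::local_copy);
static_assert(choose<in_it, back_it>() == strategy::local_copy);
```

Because step 1 ignores the output category entirely, the "both ranges use
forward iterators" case can no longer fall through to the overload that
compares against the output range.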

Re: [PATCH 2/2] libstdc++: Implement LWG 2439 for std::unique_copy [PR120386]

2025-05-22 Thread Jonathan Wakely
On Thu, 22 May 2025 at 16:44, Jonathan Wakely  wrote:
>
> On Thu, 22 May 2025 at 16:25, Tomasz Kaminski  wrote:
> >
> >
> >
> > On Thu, May 22, 2025 at 5:15 PM Tomasz Kaminski  wrote:
> >>
> >>
> >>
> >> On Thu, May 22, 2025 at 5:04 PM Jonathan Wakely  wrote:
> >>>
> >>> On Thu, 22 May 2025 at 15:50, Tomasz Kaminski  wrote:
> >>> >
> >>> >
> >>> >
> >>> > On Thu, May 22, 2025 at 1:42 PM Jonathan Wakely  
> >>> > wrote:
> >>> >>
> >>> >> The current overload set for __unique_copy handles three cases:
> >>> >>
> >>> >> - The input range uses forward iterators, the output range does not.
> >>> >>   This is the simplest case, and can just compare adjacent elements of
> >>> >>   the input range.
> >>> >>
> >>> >> - Neither the input range nor output range use forward iterators.
> >>> >>   This requires a local variable copied from the input range and 
> >>> >> updated
> >>> >>   by assigning each element to the local variable.
> >>> >>
> >>> >> - The output range uses forward iterators.
> >>> >>   For this case we compare the current element from the input range 
> >>> >> with
> >>> >>   the element just written to the output range.
> >>> >>
> >>> >> There are two problems with this implementation. Firstly, the third 
> >>> >> case
> >>> >> assumes that the value type of the output range can be compared to the
> >>> >> value type of the input range, which might not be possible at all, or
> >>> >> might be possible but give different results to comparing elements of
> >>> >> the input range. This is the problem identified in LWG 2439.
> >>> >>
> >>> >> Secondly, the third case is used when both ranges use forward 
> >>> >> iterators,
> >>> >> even though the first case could (and should) be used. This means that
> >>> >> we compare elements from the output range instead of the input range,
> >>> >> with the problems described above (either not well-formed, or might 
> >>> >> give
> >>> >> the wrong results).
> >>> >>
> >>> >> The cause of the second problem is that the overload for the first case
> >>> >> looks like:
> >>> >>
> >>> >> OutputIterator
> >>> >> __unique_copy(ForwardIter, ForwardIter, OutputIterator, BinaryPred,
> >>> >>   forward_iterator_tag, output_iterator_tag);
> >>> >>
> >>> >> When the output range uses forward iterators this overload cannot be
> >>> >> used, because forward_iterator_tag does not inherit from
> >>> >> output_iterator_tag, so is not convertible to it.
> >>> >>
> >>> >> To fix these problems we need to implement the resolution of LWG 2439 
> >>> >> so
> >>> >> that the third case is only used when the value types of the two ranges
> >>> >> are the same. This ensures that the comparisons are well behaved. We
> >>> >> also need to ensure that the first case is used when both ranges use
> >>> >> forward iterators.
> >>> >>
> >>> >> This change replaces a single step of tag dispatching to choose between
> >>> >> three overloads with two steps of tag dispatching, choosing between two
> >>> >> overloads at each step. The first step dispatches based on the iterator
> >>> >> category of the input range, ignoring the category of the output range.
> >>> >> The second step only happens when the input range uses non-forward
> >>> >> iterators, and dispatches based on the category of the output range and
> >>> >> whether the value type of the two ranges is the same. So now the cases
> >>> >> that are handled are:
> >>> >>
> >>> >> - The input range uses forward iterators.
> >>> >> - The output range uses non-forward iterators or a different value 
> >>> >> type.
> >>> >> - The output range uses forward iterators and has the same value type.
> >>> >>
> >>> >> For the second case, the old code used 
> >>> >> __gnu_cxx::__ops::__iter_comp_val
> >>> >> to wrap the predicate in another level of indirection. That seems
> >>> >> unnecessary, as we can just use a pointer to the local variable instead
> >>> >> of an iterator referring to it.
> >>> >>
> >>> >> libstdc++-v3/ChangeLog:
> >>> >>
> >>> >> PR libstdc++/120386
> >>> >> * include/bits/stl_algo.h (__unique_copy_1): New overloads for
> >>> >> the case where the input range uses non-forward iterators.
> >>> >> (__unique_copy): Replace three overloads with two, depending
> >>> >> only on the iterator category of the input range. Dispatch to
> >>> >> __unique_copy_1 for the non-forward case.
> >>> >> (unique_copy): Only pass the input range category to
> >>> >> __unique_copy.
> >>> >> ---
> >>> >>
> >>> >> Tested x86_64-linux.
> >>> >
> >>> > LGTM. Only small suggestion, regarding the change of order of arguments.
> >>>
> >>> I forgot to say that I need to add tests for each of the cases,
> >>> especially the case that fails with the existing code!
> >>>
> >>> >>
> >>> >>
> >>> >>  libstdc++-v3/include/bits/stl_algo.h | 80 +++-
> >>> >>  1 file changed, 44 insertions(+), 36 deletions(-)
> >>> >>
> >>> >> diff --git a/libstdc++-v3/include/bits/stl_a

RISC-V TLS Descriptors in GCC

2025-05-22 Thread Dongsheng Song
Hi Kito,

You mentioned that GCC 14 added TLSDESC support for RISC-V and that it
requires glibc 2.40 [1].

However, when I looked for relevant information, I found that
LoongArch and RISC-V both published TLSDESC patches for review last
year [2], but only the LoongArch patch was merged into glibc 2.40
[3], and I didn't see the RISC-V patch merged even in the latest glibc
development branch.

Is the information I found accurate? What is the current status of
glibc support for RISC-V TLSDESC?

1. 
https://gcc.gnu.org/git/?p=gcc-wwwdocs.git;a=commitdiff;h=edc6411ab81dde8a0621ee706e6ff951be645922
2. [RISC-V: Implement TLS
Descriptors](https://inbox.sourceware.org/libc-alpha/20240329061834.40019-1-ishitatsuy...@gmail.com/)
3. [LoongArch: Add support for TLS
Descriptors](https://github.com/bminor/glibc/commit/1dbf2bef7934cee9829d875f11968d6ff1fee77f)

Thanks,
Dongsheng


Re: [PATCH 2/2] libstdc++: Implement LWG 2439 for std::unique_copy [PR120386]

2025-05-22 Thread Jonathan Wakely
On Thu, 22 May 2025 at 16:48, Jonathan Wakely  wrote:
>
> On Thu, 22 May 2025 at 16:44, Jonathan Wakely  wrote:
> >
> > On Thu, 22 May 2025 at 16:25, Tomasz Kaminski  wrote:
> > >
> > >
> > >
> > > On Thu, May 22, 2025 at 5:15 PM Tomasz Kaminski  
> > > wrote:
> > >>
> > >>
> > >>
> > >> On Thu, May 22, 2025 at 5:04 PM Jonathan Wakely  
> > >> wrote:
> > >>>
> > >>> On Thu, 22 May 2025 at 15:50, Tomasz Kaminski  
> > >>> wrote:
> > >>> >
> > >>> >
> > >>> >
> > >>> > On Thu, May 22, 2025 at 1:42 PM Jonathan Wakely  
> > >>> > wrote:
> > >>> >>
> > >>> >> The current overload set for __unique_copy handles three cases:
> > >>> >>
> > >>> >> - The input range uses forward iterators, the output range does not.
> > >>> >>   This is the simplest case, and can just compare adjacent elements 
> > >>> >> of
> > >>> >>   the input range.
> > >>> >>
> > >>> >> - Neither the input range nor output range use forward iterators.
> > >>> >>   This requires a local variable copied from the input range and 
> > >>> >> updated
> > >>> >>   by assigning each element to the local variable.
> > >>> >>
> > >>> >> - The output range uses forward iterators.
> > >>> >>   For this case we compare the current element from the input range 
> > >>> >> with
> > >>> >>   the element just written to the output range.
> > >>> >>
> > >>> >> There are two problems with this implementation. Firstly, the third 
> > >>> >> case
> > >>> >> assumes that the value type of the output range can be compared to 
> > >>> >> the
> > >>> >> value type of the input range, which might not be possible at all, or
> > >>> >> might be possible but give different results to comparing elements of
> > >>> >> the input range. This is the problem identified in LWG 2439.
> > >>> >>
> > >>> >> Secondly, the third case is used when both ranges use forward 
> > >>> >> iterators,
> > >>> >> even though the first case could (and should) be used. This means 
> > >>> >> that
> > >>> >> we compare elements from the output range instead of the input range,
> > >>> >> with the problems described above (either not well-formed, or might 
> > >>> >> give
> > >>> >> the wrong results).
> > >>> >>
> > >>> >> The cause of the second problem is that the overload for the first 
> > >>> >> case
> > >>> >> looks like:
> > >>> >>
> > >>> >> OutputIterator
> > >>> >> __unique_copy(ForwardIter, ForwardIter, OutputIterator, BinaryPred,
> > >>> >>   forward_iterator_tag, output_iterator_tag);
> > >>> >>
> > >>> >> When the output range uses forward iterators this overload cannot be
> > >>> >> used, because forward_iterator_tag does not inherit from
> > >>> >> output_iterator_tag, so is not convertible to it.
> > >>> >>
> > >>> >> To fix these problems we need to implement the resolution of LWG 
> > >>> >> 2439 so
> > >>> >> that the third case is only used when the value types of the two 
> > >>> >> ranges
> > >>> >> are the same. This ensures that the comparisons are well behaved. We
> > >>> >> also need to ensure that the first case is used when both ranges use
> > >>> >> forward iterators.
> > >>> >>
> > >>> >> This change replaces a single step of tag dispatching to choose 
> > >>> >> between
> > >>> >> three overloads with two steps of tag dispatching, choosing between 
> > >>> >> two
> > >>> >> overloads at each step. The first step dispatches based on the 
> > >>> >> iterator
> > >>> >> category of the input range, ignoring the category of the output 
> > >>> >> range.
> > >>> >> The second step only happens when the input range uses non-forward
> > >>> >> iterators, and dispatches based on the category of the output range 
> > >>> >> and
> > >>> >> whether the value type of the two ranges is the same. So now the 
> > >>> >> cases
> > >>> >> that are handled are:
> > >>> >>
> > >>> >> - The input range uses forward iterators.
> > >>> >> - The output range uses non-forward iterators or a different value 
> > >>> >> type.
> > >>> >> - The output range uses forward iterators and has the same value 
> > >>> >> type.
> > >>> >>
> > >>> >> For the second case, the old code used 
> > >>> >> __gnu_cxx::__ops::__iter_comp_val
> > >>> >> to wrap the predicate in another level of indirection. That seems
> > >>> >> unnecessary, as we can just use a pointer to the local variable 
> > >>> >> instead
> > >>> >> of an iterator referring to it.
> > >>> >>
> > >>> >> libstdc++-v3/ChangeLog:
> > >>> >>
> > >>> >> PR libstdc++/120386
> > >>> >> * include/bits/stl_algo.h (__unique_copy_1): New overloads 
> > >>> >> for
> > >>> >> the case where the input range uses non-forward iterators.
> > >>> >> (__unique_copy): Replace three overloads with two, depending
> > >>> >> only on the iterator category of the input range. Dispatch to
> > >>> >> __unique_copy_1 for the non-forward case.
> > >>> >> (unique_copy): Only pass the input range category to
> > >>> >> __unique_copy.
> > >>> >> ---
> > >>> >>
> > >>> >> Tested x86_64-linux.
> >

Re: [PATCH][RFC] Allow the target to request a masked vector epilogue

2025-05-22 Thread Richard Sandiford
Richard Biener  writes:
> On Mon, 19 May 2025, Richard Sandiford wrote:
>
>> Richard Biener  writes:
>>> On Fri, 16 May 2025, Richard Sandiford wrote:
> The simple prototype below uses a separate flag from the epilogue
> mode, but I wonder how we more generally want to handle
> whether to use masking or not when iterating over modes.  Currently
> we mostly rely on --param vect-partial-vector-usage.  aarch64
> and riscv have both variable-length modes but also fixed-size modes
> where for the latter, like on x86, the target couldn't request
> a mode specifically with or without masking.  It seems both
> aarch64 and riscv fully rely on cost comparison and fully
> exploiting the mode iteration space (but not masked vs. non-masked?!)
> here?
>
> I was thinking of adding a vectorization_mode class that would
> encapsulate the mode and whether to allow masking or alternatively
> to make the vector_modes array (and the m_suggested_epilogue_mode)
> a std::pair of mode and mask flag?

 Predicated vs. non-predicated SVE is interesting for the main loop.
 The class sounds like it would be useful for that.

 I suppose predicated vs. non-predicated SVE is also potentially
 interesting for an unrolled epilogue, although there, it would in
 theory be better to predicate only the last vector iteration
 (i.e. part predicated, part unpredicated).
>>>
>>> Yes, the latter is what we want for AVX512, keep the main loop
>>> not predicated but have the epilog predicated (using the same VF).
>>
>> Reading it back, what I said was very ambiguous (as usual, unfortunately).
>> What I actually meant was that if we had, say, a 4x unrolled main loop
>> and a 2x unrolled first epilogue loop, we'd in theory want the 2x
>> unrolled epilogue loop to use unpredicated operations for the first
>> VF/2 elements and predicated operations for the second VF/2 elements.
>>
>> That way, we get the benefit of the 2x unrolling for residues of >VF
>> elements, but skip to a second epilogue if there are VF or fewer
>> remaining elements.
>>
>> That example assumes that the last quarter of each iteration of the
>> main loop is predicated in a similar way, with the rest of the iteration
>> being unpredicated.
>
> Yes, so this would work by requesting a fixed-size VF/2 first epilog
> and a VF/2 fixed-size but masked second epilog.  As you have distinct
> modes for masked/non-masked this should already work by means of the
> m_suggested_epilogue_mode field in the costs the target can set.

That sounds a bit different from what I was expecting though, in that
the predicated VF/2 portion would come after the unpredicated VF/2 version,
rather than be interleaved with it.  In:

>> Alternatively, we could have a fully-unpredicated 2x unrolled main
>> loop followed by the same kind of semi-predicated 2x unrolled
>> epilogue loop.
>>
>> So if U == unpredicated and P == predicated:
>>
>>  main loop: U U U P
>>  1st epilogue loop: U P
>>  2nd epilogue loop: P
>>
>>  1st and 2nd epilogues might both be used
>>
>> or:
>>
>>  main loop: U U
>>  1st epilogue loop: U P
>>  2nd epilogue loop: P
>>
>>  1st and 2nd epilogues are mutually exclusive

...the idea really would be to have a single 2x unrolled epilogue,
interleaved in the normal way.  The first instruction in each pair
would be unpredicated and the second instruction would be predicated.
The main loop in the second example would behave similarly.

(To be clear, this isn't an objection to the patch.  I'm just trying
to describe the use case.)

Thanks,
Richard
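
A scalar model of the second scheme above (main loop U U, first epilogue
U P, second epilogue P, with the two epilogues mutually exclusive) might
look like the sketch below. It only emulates vector operations with loops;
VF, add_u, add_p and add_arrays are hypothetical names, and nothing here
reflects how the vectorizer would actually emit the code.

```cpp
#include <cstddef>

constexpr std::size_t VF = 4;   // hypothetical number of lanes per vector

// U: one unpredicated vector add over all VF lanes.
constexpr void add_u(int* d, const int* a, const int* b)
{
  for (std::size_t i = 0; i < VF; ++i)
    d[i] = a[i] + b[i];
}

// P: one predicated vector add; only the first `active` lanes are written.
constexpr void add_p(int* d, const int* a, const int* b, std::size_t active)
{
  for (std::size_t i = 0; i < VF; ++i)
    if (i < active)
      d[i] = a[i] + b[i];
}

enum class epilogue { none, up, p };

// Adds n elements and reports which epilogue (if any) handled the residue.
constexpr epilogue
add_arrays(int* d, const int* a, const int* b, std::size_t n)
{
  std::size_t i = 0;
  // Main loop: U U, two full vectors per iteration.
  for (; n - i >= 2 * VF; i += 2 * VF)
    {
      add_u(d + i, a + i, b + i);
      add_u(d + i + VF, a + i + VF, b + i + VF);
    }
  const std::size_t rem = n - i;
  if (rem > VF)
    {
      // 1st epilogue: U P, interleaved as a single 2x unrolled iteration.
      add_u(d + i, a + i, b + i);
      add_p(d + i + VF, a + i + VF, b + i + VF, rem - VF);
      return epilogue::up;
    }
  if (rem > 0)
    {
      // 2nd epilogue: P, used only when VF or fewer elements remain.
      add_p(d + i, a + i, b + i, rem);
      return epilogue::p;
    }
  return epilogue::none;
}

// Compile-time self-check: exactly the first n outputs must be written.
constexpr bool check(std::size_t n, epilogue expected)
{
  int a[16] = {}, b[16] = {}, d[16] = {};
  for (int k = 0; k < 16; ++k)
    {
      a[k] = k;
      b[k] = 100;
    }
  if (add_arrays(d, a, b, n) != expected)
    return false;
  for (std::size_t k = 0; k < 16; ++k)
    if (d[k] != (k < n ? a[k] + b[k] : 0))
      return false;
  return true;
}

static_assert(check(16, epilogue::none));
static_assert(check(14, epilogue::up));
static_assert(check(11, epilogue::p));
```

In this model a residue of more than VF elements still gets one full
unpredicated vector before the predicated one, while smaller residues skip
straight to the single predicated vector.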



Re: [PATCH v3 8/9] libstdc++: Implement layout_stride from mdspan.

2025-05-22 Thread Tomasz Kaminski
On Thu, May 22, 2025 at 2:37 PM Tomasz Kaminski  wrote:

>
>
> On Wed, May 21, 2025 at 12:04 PM Luc Grosheintz 
> wrote:
>
>> Implements the remaining parts of layout_left and layout_right; and all
>> of layout_stride.
>>
>> libstdc++-v3/ChangeLog:
>>
>> * include/std/mdspan (layout_stride): New class.
>>
>> Signed-off-by: Luc Grosheintz 
>> ---
>>  libstdc++-v3/include/std/mdspan | 216 +++-
>>  1 file changed, 213 insertions(+), 3 deletions(-)
>>
>> diff --git a/libstdc++-v3/include/std/mdspan
>> b/libstdc++-v3/include/std/mdspan
>> index 43676c3463c..732fc4eb1c2 100644
>> --- a/libstdc++-v3/include/std/mdspan
>> +++ b/libstdc++-v3/include/std/mdspan
>> @@ -399,6 +399,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>class mapping;
>>};
>>
>> +  struct layout_stride
>> +  {
>> +template
>> +  class mapping;
>> +  };
>> +
>>namespace __mdspan
>>{
>>  template
>> @@ -499,7 +505,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>
>>  template
>>concept __standardized_mapping = __mapping_of> _Mapping>
>> -  || __mapping_of> _Mapping>;
>> +  || __mapping_of> _Mapping>
>> +  || __mapping_of> _Mapping>;
>>
>>  template
>>concept __mapping_like = requires
>> @@ -557,6 +564,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>> : mapping(__other.extents(), __mdspan::__internal_ctor{})
>> { }
>>
>> +  template
>> +   requires (is_constructible_v)
>> +   constexpr explicit(extents_type::rank() > 0)
>> +   mapping(const layout_stride::mapping<_OExtents>& __other)
>>
> I think I would make it noexcept, as implementations can add noexcept to
> what is specified in the standard.
> And just add an appropriate comment.
>
>> +   : mapping(__other.extents(), __mdspan::__internal_ctor{})
>> +   { __glibcxx_assert(*this == __other); }
>> +
>>constexpr mapping&
>>operator=(const mapping&) noexcept = default;
>>
>> @@ -572,8 +586,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>> constexpr index_type
>> operator()(_Indices... __indices) const noexcept
>> {
>> - return __mdspan::__linear_index_left(
>> -   this->extents(), static_cast(__indices)...);
>> + return __mdspan::__linear_index_left(_M_extents,
>> +   static_cast(__indices)...);
>>
> Could you move this change to the layout_left commit?
>
>> }
>>
>>static constexpr bool
>> @@ -687,6 +701,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>> : mapping(__other.extents(), __mdspan::__internal_ctor{})
>> { }
>>
>> +  template
>> +   requires (is_constructible_v)
>> +   constexpr explicit(extents_type::rank() > 0)
>> +   mapping(const layout_stride::mapping<_OExtents>& __other) noexcept
>> +   : mapping(__other.extents(), __mdspan::__internal_ctor{})
>> +   { __glibcxx_assert(*this == __other); }
>> +
>>constexpr mapping&
>>operator=(const mapping&) noexcept = default;
>>
>> @@ -760,6 +781,195 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>> [[no_unique_address]] _Extents _M_extents;
>>  };
>>
>> +  namespace __mdspan
>> +  {
>> +template
>> +  constexpr typename _Mapping::index_type
>> +  __offset(const _Mapping& __m) noexcept
>> +  {
>>
> Since layout_stride's operator==, which works on a generic mapping, will be
used with other layouts, I would add here:
if constexpr (__standardized_mapping<_Mapping>)
   return 0;
else
{
   // current impl.
}


> +   using _IndexType = typename _Mapping::index_type;
>> +
>> +   auto __impl = [&__m]> _Counts>(index_sequence<_Counts...>)
>> +   { return __m(((void) _Counts, _IndexType(0))...); };
>> +   return
>> __impl(make_index_sequence<_Mapping::extents_type::rank()>());
>> +  }
>> +
>> +template
>> +  constexpr typename _Mapping::index_type
>> +  __linear_index_strides(const _Mapping& __m,
>> +_Indices... __indices)
>> +  {
>> +   using _IndexType = typename _Mapping::index_type;
>> +   _IndexType __res = 0;
>> +   if constexpr (sizeof...(__indices) > 0)
>> + {
>> +   auto __update = [&, __pos = 0u](_IndexType __idx) mutable
>> + {
>> +   __res += __idx * __m.stride(__pos++);
>> + };
>> +   (__update(__indices), ...);
>> + }
>> +   return __res;
>> +  }
>> +  }
>> +
>> +  template
>> +class layout_stride::mapping
>> +{
>> +public:
>> +  using extents_type = _Extents;
>> +  using index_type = typename extents_type::index_type;
>> +  using size_type = typename extents_type::size_type;
>> +  using rank_type = typename extents_type::rank_type;
>> +  using layout_type = layout_stride;
>> +
>> +  static_assert(__mdspan::__representable_size<_Extents, index_type>,
>> +   "The size of extents_type must be representable as index_type");
>

Re: [PATCH v4] libstdc++: Implement C++26 features (P2546R5)

2025-05-22 Thread Jonathan Wakely
On Mon, 5 May 2025 at 10:16, Uros Bizjak  wrote:
>
> On Thu, May 1, 2025 at 12:59 PM Jonathan Wakely  wrote:
> >
> > This includes the P2810R4 (is_debugger_present is_replaceable) changes,
> > allowing std::is_debugger_present to be replaced by the program.
> >
> > It would be good to provide a macOS definition of is_debugger_present as
> > per https://developer.apple.com/library/archive/qa/qa1361/_index.html
> > but that isn't included in this change.
> >
> > The src/c++26/debugging.cc file defines a global volatile int which can
> > be set by debuggers to indicate when they are attached and detached from
> > a running process. This allows std::is_debugger_present() to give a
> > reliable answer, and additionally allows a debugger to choose how
> > std::breakpoint() should behave. Setting the global to a positive value
> > will cause std::breakpoint() to use that value as an argument to
> > std::raise, so debuggers that prefer SIGABRT for breakpoints can select
> > that. By default std::breakpoint() will use a platform-specific action
> > such as the INT3 instruction on x86, or GCC's __builtin_trap().
> >
> > On Linux the std::is_debugger_present() function checks whether the
> > process is being traced by a process named "gdb", "gdbserver" or
> > "lldb-server", to try to avoid interpreting other tracing processes
> > (such as strace) as a debugger. There have been comments suggesting this
> > isn't desirable and that std::is_debugger_present() should just return
> > true for any tracing process (which is the case for non-Linux targets
> > that support the ptrace system call).
> >
> > libstdc++-v3/ChangeLog:
> >
> > * config.h.in: Regenerate.
> > * configure: Regenerate.
> > * configure.ac: Check for facilities needed by <debugging>.
> > * include/Makefile.am: Add new header.
> > * include/Makefile.in: Regenerate.
> > * include/bits/version.def (debugging): Add.
> > * include/bits/version.h: Regenerate.
> > * include/precompiled/stdc++.h: Add new header.
> > * src/c++26/Makefile.am: Add new file.
> > * src/c++26/Makefile.in: Regenerate.
> > * include/std/debugging: New file.
> > * src/c++26/debugging.cc: New file.
> > * testsuite/19_diagnostics/debugging/breakpoint.cc: New test.
> > * testsuite/19_diagnostics/debugging/breakpoint_if_debugging.cc:
> > New test.
> > * testsuite/19_diagnostics/debugging/is_debugger_present.cc: New
> > test.
> > * testsuite/19_diagnostics/debugging/is_debugger_present-2.cc:
> > New test.
> > ---
>
> > +#elif defined(__i386__) || defined(__x86_64__)
> > +  __asm__ volatile ("int3; nop");
>
> Just a small nit - can this be written as:
>
> "int3\n\tnop"
>
> And perhaps add a small comment about the "nop", which is added due to gdb issues.

Thanks, I made that change locally.

I haven't pushed this patch yet because it breaks bootstrap on AIX.
The ptrace syscall has a different signature, so I needed to add a
proper configure test for it. With that fixed, it compiles
debugging.cc but can't assemble it:

Assembler:
/tmp//ccvib7J4.s: line 64: 1252-016 The specified opcode or pseudo-op
is not valid.
   Use supported instructions or pseudo-ops only.
/tmp//ccvib7J4.s: line 84: 1252-016 The specified opcode or pseudo-op
is not valid.
   Use supported instructions or pseudo-ops only.
gmake[3]: *** [Makefile:565: debugging.lo] Error 1
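
For reference, the Linux check described in the quoted commit message could look
roughly like the sketch below. It is only an illustration, not the code in
src/c++26/debugging.cc: it assumes the tracer is found through the TracerPid field
of /proc/self/status, and parse_tracer_pid, is_known_debugger and tracer_example
are hypothetical names. The parser takes a stream so it can be exercised on canned
input instead of a live /proc.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: extract the tracer PID from text in the format of
// /proc/<pid>/status. Returns 0 when no tracer is recorded.
inline long parse_tracer_pid(std::istream& status)
{
  const std::string key = "TracerPid:";
  std::string line;
  while (std::getline(status, line))
    if (line.compare(0, key.size(), key) == 0)
      {
        std::istringstream field(line.substr(key.size()));
        long pid = 0;
        field >> pid;   // leaves pid == 0 if the field cannot be parsed
        return pid;
      }
  return 0;
}

// Hypothetical helper: the process names listed in the commit message.
inline bool is_known_debugger(const std::string& comm)
{ return comm == "gdb" || comm == "gdbserver" || comm == "lldb-server"; }

// Usage on canned input, so the sketch does not depend on a live /proc.
inline void tracer_example()
{
  std::istringstream traced("Name:\tcat\nTracerPid:\t4242\nUid:\t1000\n");
  assert(parse_tracer_pid(traced) == 4242);
  assert(is_known_debugger("gdb") && !is_known_debugger("strace"));
}
```

A caller would then need to map the PID to a process name (for example via
/proc/<pid>/comm, again an assumption) before applying the name check.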



Re: [PATCH v3 4/9] libstdc++: Implement layout_left from mdspan.

2025-05-22 Thread Tomasz Kaminski
On Thu, May 22, 2025 at 1:29 PM Luc Grosheintz 
wrote:

> I think part of this didn't get incorporated because I was too hasty
> sending v3. The other I just didn't deem useful (I inline the function
> for v4).
>
> There's a default initialization bug I need to fix: _M_exts and
> _M_strides must be value initialized.
>
Good catch; the value-initialization for extents should be performed here:
   private:
using _S_storage = __array_traits<_IndexType,
_S_rank_dynamic>::_Type;
[[no_unique_address]] _S_storage _M_dynamic_extents;
Because the relevant member in the standard is declared as:
  array *dynamic-extents*{};

>
> Then also the registration in std.cc.in & I'll squash the first three
> commits.
>
> I'll send v4 later this afternoon, please let me know if you're still
> reviewing (so I don't make the same mistake again).
>
> On 5/22/25 12:43, Tomasz Kaminski wrote:
> > On Wed, May 21, 2025 at 11:53 AM Luc Grosheintz <
> luc.groshei...@gmail.com>
> > wrote:
> >
> >> Implements the parts of layout_left that don't depend on any of the
> >> other layouts.
> >>
> >> libstdc++-v3/ChangeLog:
> >>
> >>  * include/std/mdspan (layout_left): New class.
> >>
> >> Signed-off-by: Luc Grosheintz 
> >> ---
> >>   libstdc++-v3/include/std/mdspan | 307 +++-
> >>   1 file changed, 306 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/libstdc++-v3/include/std/mdspan
> >> b/libstdc++-v3/include/std/mdspan
> >> index e5b1b2596d9..66c9d2cffac 100644
> >> --- a/libstdc++-v3/include/std/mdspan
> >> +++ b/libstdc++-v3/include/std/mdspan
> >> @@ -144,6 +144,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>{ return __exts[__i]; });
> >>}
> >>
> >> +   static constexpr span
> >> +   _S_static_extents(size_t __begin, size_t __end) noexcept
> >> +   {
> >> + return {_Extents.data() + __begin, _Extents.data() + __end};
> >> +   }
> >> +
> >> +   constexpr span
> >> +   _M_dynamic_extents(size_t __begin, size_t __end) const noexcept
> >> +   requires (_Extents.size() > 0)
> >> +   {
> >> + return {_M_dyn_exts + _S_dynamic_index[__begin],
> >> + _M_dyn_exts + _S_dynamic_index[__end]};
> >> +   }
> >> +
> >> private:
> >>  using _S_storage = __array_traits<_IndexType,
> >> _S_rank_dynamic>::_Type;
> >>  [[no_unique_address]] _S_storage _M_dyn_exts;
> >> @@ -160,6 +174,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>  || _Extent <= numeric_limits<_IndexType>::max();
> >> }
> >>
> >> +  namespace __mdspan
> >> +  {
> >> +template
> >> +  constexpr span
> >> +  __static_extents(size_t __begin = 0, size_t __end =
> >> _Extents::rank())
> >> +  { return _Extents::_S_storage::_S_static_extents(__begin,
> __end); }
> >> +
> >> +template
> >> +  constexpr span
> >> +  __dynamic_extents(const _Extents& __exts, size_t __begin = 0,
> >> +   size_t __end = _Extents::rank())
> >> +  {
> >> +   return __exts._M_exts._M_dynamic_extents(__begin, __end);
> >> +  }
> >> +  }
> >> +
> >> template
> >>   class extents
> >>   {
> >> @@ -251,7 +281,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>  : _M_exts(span(__exts))
> >>  { }
> >>
> >> -
> >> template<__mdspan::__valid_index_type _OIndexType,
> >> size_t _Nm>
> >>  requires (_Nm == rank() || _Nm == rank_dynamic())
> >>  constexpr explicit(_Nm != rank_dynamic())
> >> @@ -276,6 +305,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>  }
> >>
> >>   private:
> >> +  friend span
> >> +  __mdspan::__static_extents(size_t, size_t);
> >> +
> >> +  friend span
> >> +  __mdspan::__dynamic_extents(const extents&, size_t,
> >> size_t);
> >> +
> >> using _S_storage = __mdspan::_ExtentsStorage<
> >>  _IndexType, array{_Extents...}>;
> >> [[no_unique_address]] _S_storage _M_exts;
> >> @@ -286,6 +321,54 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>
> >> namespace __mdspan
> >> {
> >> +template
> >>
> > I suggested in another e-mail that we could pass auto const&
> > and instantiate this with a reference to the array that is the NTTP of the storage.
> >
> >> +  constexpr size_t
> >> +  __static_extents_prod(size_t __begin, size_t __end)
> >> +  {
> >> +   auto __sta_exts = __static_extents<_Extents>(__begin, __end);
> >> +   size_t __ret = 1;
> >> +   for (auto __factor : __sta_exts)
> >> + if (__factor != dynamic_extent)
> >> +   __ret *= __factor;
> >> +   return __ret;
> >> +  }
> >> +
> >> +template
> >> +  constexpr size_t
> >> +  __dynamic_extents_prod(const _Extents& __exts, size_t __begin,
> >> +size_t __end)
> >> +  {
> >> +   auto __dyn_exts = __dynamic_extents<_Extents>(__exts, __begin,
> >> Template parameter is unnecessary, it can be deduced.
> >> +__

Re: [PATCH v3 4/9] libstdc++: Implement layout_left from mdspan.

2025-05-22 Thread Tomasz Kaminski
On Wed, May 21, 2025 at 11:53 AM Luc Grosheintz 
wrote:

> Implements the parts of layout_left that don't depend on any of the
> other layouts.
>
> libstdc++-v3/ChangeLog:
>
> * include/std/mdspan (layout_left): New class.
>
> Signed-off-by: Luc Grosheintz 
> ---
>  libstdc++-v3/include/std/mdspan | 307 +++-
>  1 file changed, 306 insertions(+), 1 deletion(-)
>
> diff --git a/libstdc++-v3/include/std/mdspan
> b/libstdc++-v3/include/std/mdspan
> index e5b1b2596d9..66c9d2cffac 100644
> --- a/libstdc++-v3/include/std/mdspan
> +++ b/libstdc++-v3/include/std/mdspan
> @@ -144,6 +144,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>   { return __exts[__i]; });
>   }
>
> +   static constexpr span
> +   _S_static_extents(size_t __begin, size_t __end) noexcept
> +   {
> + return {_Extents.data() + __begin, _Extents.data() + __end};
> +   }
> +
> +   constexpr span
> +   _M_dynamic_extents(size_t __begin, size_t __end) const noexcept
> +   requires (_Extents.size() > 0)
> +   {
> + return {_M_dyn_exts + _S_dynamic_index[__begin],
> + _M_dyn_exts + _S_dynamic_index[__end]};
> +   }
> +
>private:
> using _S_storage = __array_traits<_IndexType,
> _S_rank_dynamic>::_Type;
> [[no_unique_address]] _S_storage _M_dyn_exts;
> @@ -160,6 +174,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> || _Extent <= numeric_limits<_IndexType>::max();
>}
>
> +  namespace __mdspan
> +  {
> +template
> +  constexpr span
> +  __static_extents(size_t __begin = 0, size_t __end =
> _Extents::rank())
> +  { return _Extents::_S_storage::_S_static_extents(__begin, __end); }
> +
> +template
> +  constexpr span
> +  __dynamic_extents(const _Extents& __exts, size_t __begin = 0,
> +   size_t __end = _Extents::rank())
> +  {
> +   return __exts._M_exts._M_dynamic_extents(__begin, __end);
> +  }
> +  }
> +
>template
>  class extents
>  {
> @@ -251,7 +281,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> : _M_exts(span(__exts))
> { }
>
> -
>template<__mdspan::__valid_index_type _OIndexType,
> size_t _Nm>
> requires (_Nm == rank() || _Nm == rank_dynamic())
> constexpr explicit(_Nm != rank_dynamic())
> @@ -276,6 +305,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> }
>
>  private:
> +  friend span
> +  __mdspan::__static_extents(size_t, size_t);
> +
> +  friend span
> +  __mdspan::__dynamic_extents(const extents&, size_t,
> size_t);
> +
>using _S_storage = __mdspan::_ExtentsStorage<
> _IndexType, array{_Extents...}>;
>[[no_unique_address]] _S_storage _M_exts;
> @@ -286,6 +321,54 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>namespace __mdspan
>{
> +template
> +  constexpr size_t
> +  __static_extents_prod(size_t __begin, size_t __end)
> +  {
> +   auto __sta_exts = __static_extents<_Extents>(__begin, __end);
> +   size_t __ret = 1;
> +   for (auto __factor : __sta_exts)
> + if (__factor != dynamic_extent)
> +   __ret *= __factor;
> +   return __ret;
> +  }
> +
> +template
> +  constexpr size_t
> +  __dynamic_extents_prod(const _Extents& __exts, size_t __begin,
> +size_t __end)
> +  {
> +   auto __dyn_exts = __dynamic_extents<_Extents>(__exts, __begin,
> +__end);
> +   size_t __ret = 1;
> +   for (auto __factor : __dyn_exts)
> +   __ret *= __factor;
> +   return __ret;
> +  }
> +
> +template
> +  constexpr typename _Extents::index_type
> +  __exts_prod(const _Extents& __exts, size_t __begin, size_t __end)
> noexcept
> +  {
> +   using _IndexType = typename _Extents::index_type;
>
Perform this computation in _Extents::size_type, so that if the result overflows
before hitting a zero extent, there is no UB.
Also add a comment explaining why we are using size_type.
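
A minimal sketch of the idea, outside libstdc++ and with hypothetical names
(extents_product is not part of the patch): the product is accumulated in the
unsigned counterpart of the index type, where wrap-around is well defined, and
converted back at the end. The sketch assumes the index type is int or wider,
since narrower types are promoted to int before multiplying.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for __exts_prod: multiply extents in the unsigned
// counterpart of the index type, so intermediate overflow wraps instead of
// being undefined behaviour.
template<typename IndexType, std::size_t N>
IndexType extents_product(const IndexType (&exts)[N])
{
  // Narrower types would be promoted to int and could still overflow.
  static_assert(sizeof(IndexType) >= sizeof(int),
                "sketch assumes int or a wider index type");
  using SizeType = typename std::make_unsigned<IndexType>::type;
  SizeType ret = 1;
  for (std::size_t i = 0; i < N; ++i)
    ret *= static_cast<SizeType>(exts[i]);
  return static_cast<IndexType>(ret);
}

inline void extents_product_example()
{
  const int small[] = {2, 3, 4};
  assert(extents_product(small) == 24);

  // With a 32-bit int the first two factors already exceed INT_MAX, but the
  // trailing zero extent still makes the total size zero.
  const int with_zero[] = {65536, 65536, 0};
  assert(extents_product(with_zero) == 0);
}
```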

> +   _IndexType __ret = 1;
> +   if constexpr (_Extents::rank_dynamic() != _Extents::rank())
> + __ret = _IndexType(__static_extents_prod<_Extents>(__begin,
> __end));
> +   if constexpr (_Extents::rank_dynamic() > 0)
> + __ret *= __dynamic_extents_prod(__exts, __begin, __end);
> +   return __ret;
> +  }
> +
> +template
> +  constexpr typename _Extents::index_type
> +  __fwd_prod(const _Extents& __exts, size_t __r) noexcept
> +  { return __exts_prod(__exts, 0, __r); }
> +
> +template
> +  constexpr typename _Extents::index_type
> +  __rev_prod(const _Extents& __exts, size_t __r) noexcept
> +  { return __exts_prod(__exts, __r + 1, __exts.rank()); }
> +
>  template
>auto __build_dextents_type(integer_sequence)
> -> extents<_IndexType, ((void) _Counts, dynamic_extent)...>;
> @@ -304,6 +387,228 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  exp

Re: [PATCH 2/2] libstdc++: Implement C++26 std::indirect [PR119152]

2025-05-22 Thread Jonathan Wakely
On Thu, 22 May 2025 at 13:23, Daniel Krügler  wrote:
>
> Am Do., 22. Mai 2025 um 11:48 Uhr schrieb Tomasz Kamiński 
> :
>>
>> From: Jonathan Wakely 
>>
>> This patch implements C++26 std::indirect as specified
>> in P3019 with amendment to move assignment from LWG 4251.
>>
>> PR libstdc++/119152
>>
>> libstdc++-v3/ChangeLog:
>>
>> * include/Makefile.am: Add new header.
>> * include/Makefile.in: Regenerate.
>> * include/bits/indirect.h: New file.
>> * include/bits/version.def (indirect): Define.
>> * include/bits/version.h: Regenerate.
>> * include/std/memory: Include new header.
>> * testsuite/std/memory/indirect/copy.cc
>> * testsuite/std/memory/indirect/copy_alloc.cc
>> * testsuite/std/memory/indirect/ctor.cc
>> * testsuite/std/memory/indirect/incomplete.cc
>> * testsuite/std/memory/indirect/invalid_neg.cc
>> * testsuite/std/memory/indirect/move.cc
>> * testsuite/std/memory/indirect/move_alloc.cc
>> * testsuite/std/memory/indirect/relops.cc
>>
>> Co-Authored-By: Tomasz Kamiński 
>> Signed-off-by: Tomasz Kamiński 
>> ---
>> Tested on x86_64-linux. OK for trunk?
>>
>>  libstdc++-v3/include/Makefile.am  |   1 +
>>  libstdc++-v3/include/Makefile.in  |   1 +
>>  libstdc++-v3/include/bits/indirect.h  | 459 ++
>>  libstdc++-v3/include/bits/version.def |   9 +
>>  libstdc++-v3/include/bits/version.h   |  10 +
>>  libstdc++-v3/include/std/memory   |   5 +
>>  .../testsuite/std/memory/indirect/copy.cc | 121 +
>>  .../std/memory/indirect/copy_alloc.cc | 228 +
>>  .../testsuite/std/memory/indirect/ctor.cc | 203 
>>  .../std/memory/indirect/incomplete.cc |  38 ++
>>  .../std/memory/indirect/invalid_neg.cc|  28 ++
>>  .../testsuite/std/memory/indirect/move.cc | 144 ++
>>  .../std/memory/indirect/move_alloc.cc | 296 +++
>>  .../testsuite/std/memory/indirect/relops.cc   |  82 
>>  14 files changed, 1625 insertions(+)
>>  create mode 100644 libstdc++-v3/include/bits/indirect.h
>>  create mode 100644 libstdc++-v3/testsuite/std/memory/indirect/copy.cc
>>  create mode 100644 libstdc++-v3/testsuite/std/memory/indirect/copy_alloc.cc
>>  create mode 100644 libstdc++-v3/testsuite/std/memory/indirect/ctor.cc
>>  create mode 100644 libstdc++-v3/testsuite/std/memory/indirect/incomplete.cc
>>  create mode 100644 libstdc++-v3/testsuite/std/memory/indirect/invalid_neg.cc
>>  create mode 100644 libstdc++-v3/testsuite/std/memory/indirect/move.cc
>>  create mode 100644 libstdc++-v3/testsuite/std/memory/indirect/move_alloc.cc
>>  create mode 100644 libstdc++-v3/testsuite/std/memory/indirect/relops.cc
>>
>> diff --git a/libstdc++-v3/include/Makefile.am 
>> b/libstdc++-v3/include/Makefile.am
>> index 3e5b6c4142e..b67d470c27e 100644
>> --- a/libstdc++-v3/include/Makefile.am
>> +++ b/libstdc++-v3/include/Makefile.am
>> @@ -210,6 +210,7 @@ bits_headers = \
>> ${bits_srcdir}/gslice_array.h \
>> ${bits_srcdir}/hashtable.h \
>> ${bits_srcdir}/hashtable_policy.h \
>> +   ${bits_srcdir}/indirect.h \
>> ${bits_srcdir}/indirect_array.h \
>> ${bits_srcdir}/ios_base.h \
>> ${bits_srcdir}/istream.tcc \
>> diff --git a/libstdc++-v3/include/Makefile.in 
>> b/libstdc++-v3/include/Makefile.in
>> index 3531162b5f7..6f7f2be68fd 100644
>> --- a/libstdc++-v3/include/Makefile.in
>> +++ b/libstdc++-v3/include/Makefile.in
>> @@ -563,6 +563,7 @@ bits_freestanding = \
>>  @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/gslice_array.h \
>>  @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/hashtable.h \
>>  @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/hashtable_policy.h \
>> +@GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/indirect.h \
>>  @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/indirect_array.h \
>>  @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/ios_base.h \
>>  @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/istream.tcc \
>> diff --git a/libstdc++-v3/include/bits/indirect.h 
>> b/libstdc++-v3/include/bits/indirect.h
>> new file mode 100644
>> index 000..32b2af9117d
>> --- /dev/null
>> +++ b/libstdc++-v3/include/bits/indirect.h
>> @@ -0,0 +1,459 @@
>> +// Vocabulary Types for Composite Class Design -*- C++ -*-
>> +
>> +// Copyright The GNU Toolchain Authors.
>> +//
>> +// This file is part of the GNU ISO C++ Library.  This library is free
>> +// software; you can redistribute it and/or modify it under the
>> +// terms of the GNU General Public License as published by the
>> +// Free Software Foundation; either version 3, or (at your option)
>> +// any later version.
>> +
>> +// This library is distributed in the hope that it will be useful,
>> +// but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +// GNU General Public License for more details.
>> +
>> +// Under Section 7 of GPL version 3, you are granted add

Re: [PATCH 2/2] libstdc++: Implement LWG 2439 for std::unique_copy [PR120386]

2025-05-22 Thread Jonathan Wakely
On Thu, 22 May 2025 at 15:50, Tomasz Kaminski  wrote:
>
>
>
> On Thu, May 22, 2025 at 1:42 PM Jonathan Wakely  wrote:
>>
>> The current overload set for __unique_copy handles three cases:
>>
>> - The input range uses forward iterators, the output range does not.
>>   This is the simplest case, and can just compare adjacent elements of
>>   the input range.
>>
>> - Neither the input range nor output range use forward iterators.
>>   This requires a local variable copied from the input range and updated
>>   by assigning each element to the local variable.
>>
>> - The output range uses forward iterators.
>>   For this case we compare the current element from the input range with
>>   the element just written to the output range.
>>
>> There are two problems with this implementation. Firstly, the third case
>> assumes that the value type of the output range can be compared to the
>> value type of the input range, which might not be possible at all, or
>> might be possible but give different results to comparing elements of
>> the input range. This is the problem identified in LWG 2439.
>>
>> Secondly, the third case is used when both ranges use forward iterators,
>> even though the first case could (and should) be used. This means that
>> we compare elements from the output range instead of the input range,
>> with the problems described above (either not well-formed, or might give
>> the wrong results).
>>
>> The cause of the second problem is that the overload for the first case
>> looks like:
>>
>> OutputIterator
>> __unique_copy(ForwardIter, ForwardIter, OutputIterator, BinaryPred,
>>   forward_iterator_tag, output_iterator_tag);
>>
>> When the output range uses forward iterators this overload cannot be
>> used, because forward_iterator_tag does not inherit from
>> output_iterator_tag, so is not convertible to it.
>>
>> To fix these problems we need to implement the resolution of LWG 2439 so
>> that the third case is only used when the value types of the two ranges
>> are the same. This ensures that the comparisons are well behaved. We
>> also need to ensure that the first case is used when both ranges use
>> forward iterators.
>>
>> This change replaces a single step of tag dispatching to choose between
>> three overloads with two steps of tag dispatching, choosing between two
>> overloads at each step. The first step dispatches based on the iterator
>> category of the input range, ignoring the category of the output range.
>> The second step only happens when the input range uses non-forward
>> iterators, and dispatches based on the category of the output range and
>> whether the value type of the two ranges is the same. So now the cases
>> that are handled are:
>>
>> - The input range uses forward iterators.
>> - The output range uses non-forward iterators or a different value type.
>> - The output range uses forward iterators and has the same value type.
>>
>> For the second case, the old code used __gnu_cxx::__ops::__iter_comp_val
>> to wrap the predicate in another level of indirection. That seems
>> unnecessary, as we can just use a pointer to the local variable instead
>> of an iterator referring to it.
>>
>> libstdc++-v3/ChangeLog:
>>
>> PR libstdc++/120386
>> * include/bits/stl_algo.h (__unique_copy_1): New overloads for
>> the case where the input range uses non-forward iterators.
>> (__unique_copy): Replace three overloads with two, depending
>> only on the iterator category of the input range. Dispatch to
>> __unique_copy_1 for the non-forward case.
>> (unique_copy): Only pass the input range category to
>> __unique_copy.
>> ---
>>
>> Tested x86_64-linux.
>
> LGTM. Only small suggestion, regarding the change of order of arguments.

I forgot to say that I need to add tests for each of the cases,
especially the case that fails with the existing code!
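
As an illustration of the two-step dispatch described in the commit message, here
is a standalone sketch. It does not reproduce the patch: Path, step1, step2 and
select_path are hypothetical names, and the sketch only reports which strategy
would be chosen instead of copying anything. The static_asserts show why the old
overload taking (forward_iterator_tag, output_iterator_tag) could not be selected
when the output range also uses forward iterators.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <type_traits>
#include <vector>

// forward_iterator_tag derives from input_iterator_tag, not from
// output_iterator_tag, so it cannot be passed where the latter is expected.
static_assert(std::is_convertible<std::forward_iterator_tag,
                                  std::input_iterator_tag>::value, "");
static_assert(!std::is_convertible<std::forward_iterator_tag,
                                   std::output_iterator_tag>::value, "");

enum class Path { compare_input, compare_local_copy, compare_output };

// Step 2: only reached for non-forward input ranges.
template<typename In, typename Out>
Path step2(std::true_type)  { return Path::compare_output; }
template<typename In, typename Out>
Path step2(std::false_type) { return Path::compare_local_copy; }

// Step 1: dispatch on the input category only.
template<typename In, typename Out>
Path step1(std::forward_iterator_tag) { return Path::compare_input; }

template<typename In, typename Out>
Path step1(std::input_iterator_tag)
{
  using InVal  = typename std::iterator_traits<In>::value_type;
  using OutVal = typename std::iterator_traits<Out>::value_type;
  using OutCat = typename std::iterator_traits<Out>::iterator_category;
  // Re-reading the output is only safe for forward output iterators whose
  // value type matches the input's (the LWG 2439 condition).
  using Reread = std::integral_constant<bool,
    std::is_base_of<std::forward_iterator_tag, OutCat>::value
    && std::is_same<InVal, OutVal>::value>;
  return step2<In, Out>(Reread{});
}

template<typename In, typename Out>
Path select_path()
{
  using InCat = typename std::iterator_traits<In>::iterator_category;
  return step1<In, Out>(InCat{});
}

inline void select_path_example()
{
  using VecInt  = std::vector<int>::iterator;
  using VecLong = std::vector<long>::iterator;
  using StreamIn  = std::istream_iterator<int>;
  using StreamOut = std::ostream_iterator<int>;

  // Both ranges forward: the input range is compared with itself.
  assert((select_path<VecInt, VecLong>() == Path::compare_input));
  // Input-only source, output-only sink: keep a local copy.
  assert((select_path<StreamIn, StreamOut>() == Path::compare_local_copy));
  // Input-only source, forward sink with the same value type.
  assert((select_path<StreamIn, VecInt>() == Path::compare_output));
  // Same, but a different value type falls back to the local copy.
  assert((select_path<StreamIn, VecLong>() == Path::compare_local_copy));
}
```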

>>
>>
>>  libstdc++-v3/include/bits/stl_algo.h | 80 +++-
>>  1 file changed, 44 insertions(+), 36 deletions(-)
>>
>> diff --git a/libstdc++-v3/include/bits/stl_algo.h 
>> b/libstdc++-v3/include/bits/stl_algo.h
>> index f5361aeab7e2..c0bb17f9c8b2 100644
>> --- a/libstdc++-v3/include/bits/stl_algo.h
>> +++ b/libstdc++-v3/include/bits/stl_algo.h
>> @@ -918,24 +918,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>
>> __gnu_cxx::__ops::__iter_comp_iter(__binary_pred));
>>  }
>>
>> -  /**
>> -   *  This is an uglified
>> -   *  unique_copy(_InputIterator, _InputIterator, _OutputIterator,
>> -   *  _BinaryPredicate)
>> -   *  overloaded for forward iterators and output iterator as result.
>> -  */
>> +  // Implementation of std::unique_copy for forward iterators.
>> +  // This case is easy, just compare *i with *(i-1).
>>template<typename _ForwardIterator, typename _OutputIterator,
>>typename _BinaryPredicate>
>>  _GLIBCXX20_CONSTEXPR
>>  _OutputIterator
>>  __unique_copy(_ForwardIterator __first, _ForwardIterator __last,
>>   _Outp

Re: [PATCH 0/3] Redirect to specific target based on TARGET_VERSION_COMPATIBLE

2025-05-22 Thread Alfie Richards

Hi Jeff,

I sent this patch with my implementation a while ago:
https://gcc.gnu.org/pipermail/gcc-patches/2025-April/681043.html

There hasn't been any feedback on that patch yet.

These patches are still useful and I would like to go ahead with them. I 
am in favour of using my implementation as it is a bit stronger, but it 
also requires my larger FMV series to be approved first.


Thanks,
Alfie

On 22/05/2025 04:30, Jeff Law wrote:



On 4/14/25 5:34 AM, Yangyu Chen wrote:




On 14 Apr 2025, at 19:06, Alfie Richards  wrote:

Hi Yangyu,

This looks great with what we discussed previously.

I have a very similar patch that implements a slightly stronger 
optimisation that I was about to send. It makes use of information if 
the caller is versioned. I will share this with you shortly and we 
can work out what we wish to use?


Sure! Thank you!
So do we have a sense of which of the two approaches we want to try and 
move forward?   Or to put it another way, are your patches still useful 
and if so, do we have the most recent versions posted for review?


Jeff





Re: [PATCH 2/2] libstdc++: Implement LWG 2439 for std::unique_copy [PR120386]

2025-05-22 Thread Tomasz Kaminski
On Thu, May 22, 2025 at 5:04 PM Jonathan Wakely  wrote:

> On Thu, 22 May 2025 at 15:50, Tomasz Kaminski  wrote:
> >
> >
> >
> > On Thu, May 22, 2025 at 1:42 PM Jonathan Wakely 
> wrote:
> >>
> >> The current overload set for __unique_copy handles three cases:
> >>
> >> - The input range uses forward iterators, the output range does not.
> >>   This is the simplest case, and can just compare adjacent elements of
> >>   the input range.
> >>
> >> - Neither the input range nor output range use forward iterators.
> >>   This requires a local variable copied from the input range and updated
> >>   by assigning each element to the local variable.
> >>
> >> - The output range uses forward iterators.
> >>   For this case we compare the current element from the input range with
> >>   the element just written to the output range.
> >>
> >> There are two problems with this implementation. Firstly, the third case
> >> assumes that the value type of the output range can be compared to the
> >> value type of the input range, which might not be possible at all, or
> >> might be possible but give different results to comparing elements of
> >> the input range. This is the problem identified in LWG 2439.
> >>
> >> Secondly, the third case is used when both ranges use forward iterators,
> >> even though the first case could (and should) be used. This means that
> >> we compare elements from the output range instead of the input range,
> >> with the problems described above (either not well-formed, or might give
> >> the wrong results).
> >>
> >> The cause of the second problem is that the overload for the first case
> >> looks like:
> >>
> >> OutputIterator
> >> __unique_copy(ForwardIter, ForwardIter, OutputIterator, BinaryPred,
> >>   forward_iterator_tag, output_iterator_tag);
> >>
> >> When the output range uses forward iterators this overload cannot be
> >> used, because forward_iterator_tag does not inherit from
> >> output_iterator_tag, so is not convertible to it.
> >>
> >> To fix these problems we need to implement the resolution of LWG 2439 so
> >> that the third case is only used when the value types of the two ranges
> >> are the same. This ensures that the comparisons are well behaved. We
> >> also need to ensure that the first case is used when both ranges use
> >> forward iterators.
> >>
> >> This change replaces a single step of tag dispatching to choose between
> >> three overloads with two steps of tag dispatching, choosing between two
> >> overloads at each step. The first step dispatches based on the iterator
> >> category of the input range, ignoring the category of the output range.
> >> The second step only happens when the input range uses non-forward
> >> iterators, and dispatches based on the category of the output range and
> >> whether the value type of the two ranges is the same. So now the cases
> >> that are handled are:
> >>
> >> - The input range uses forward iterators.
> >> - The output range uses non-forward iterators or a different value type.
> >> - The output range uses forward iterators and has the same value type.
> >>
> >> For the second case, the old code used __gnu_cxx::__ops::__iter_comp_val
> >> to wrap the predicate in another level of indirection. That seems
> >> unnecessary, as we can just use a pointer to the local variable instead
> >> of an iterator referring to it.
> >>
> >> libstdc++-v3/ChangeLog:
> >>
> >> PR libstdc++/120386
> >> * include/bits/stl_algo.h (__unique_copy_1): New overloads for
> >> the case where the input range uses non-forward iterators.
> >> (__unique_copy): Replace three overloads with two, depending
> >> only on the iterator category of the input range. Dispatch to
> >> __unique_copy_1 for the non-forward case.
> >> (unique_copy): Only pass the input range category to
> >> __unique_copy.
> >> ---
> >>
> >> Tested x86_64-linux.
> >
> > LGTM. Only small suggestion, regarding the change of order of arguments.
>
> I forgot to say that I need to add tests for each of the cases,
> especially the case that fails with the existing code!
>
> >>
> >>
> >>  libstdc++-v3/include/bits/stl_algo.h | 80 +++-
> >>  1 file changed, 44 insertions(+), 36 deletions(-)
> >>
> >> diff --git a/libstdc++-v3/include/bits/stl_algo.h
> b/libstdc++-v3/include/bits/stl_algo.h
> >> index f5361aeab7e2..c0bb17f9c8b2 100644
> >> --- a/libstdc++-v3/include/bits/stl_algo.h
> >> +++ b/libstdc++-v3/include/bits/stl_algo.h
> >> @@ -918,24 +918,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>
> __gnu_cxx::__ops::__iter_comp_iter(__binary_pred));
> >>  }
> >>
> >> -  /**
> >> -   *  This is an uglified
> >> -   *  unique_copy(_InputIterator, _InputIterator, _OutputIterator,
> >> -   *  _BinaryPredicate)
> >> -   *  overloaded for forward iterators and output iterator as result.
> >> -  */
> >> +  // Implementation of std::unique_copy for forward iterators.
> >> +  // This case is e

[FYI] [vxworks] build partial libatomic

2025-05-22 Thread Alexandre Oliva


Since vxworks' libc contains much of libatomic, in not-very-granular
modules, building all of libatomic doesn't work very well.

However, some expected entry points are not present in libc, so
arrange for libatomic to build only those missing bits.

I'm putting this in as "build machinery"; please let me know in case of
objections.


for  libatomic/ChangeLog

* configure.tgt: Set partial_libatomic on *-*-vxworks*.
* configure.ac (PARTIAL_VXWORKS): New AM_CONDITIONAL.
* Makefile.am (libatomic_la_SOURCES): Select few sources for
PARTIAL_VXWORKS.
* configure, Makefile.in: Rebuilt.
---
 libatomic/Makefile.am   |8 +++
 libatomic/Makefile.in   |  109 ++-
 libatomic/configure |   20 -
 libatomic/configure.ac  |3 +
 libatomic/configure.tgt |4 ++
 5 files changed, 93 insertions(+), 51 deletions(-)

diff --git a/libatomic/Makefile.am b/libatomic/Makefile.am
index 0f1a71560848a..65dff6ece9ff8 100644
--- a/libatomic/Makefile.am
+++ b/libatomic/Makefile.am
@@ -70,11 +70,16 @@ endif
 
 libatomic_la_LDFLAGS = $(libatomic_version_info) $(libatomic_version_script) \
$(lt_host_flags) $(libatomic_darwin_rpath)
+
+SIZES = @SIZES@
+
+if PARTIAL_VXWORKS
+libatomic_la_SOURCES = fenv.c fence.c flag.c
+else
 libatomic_la_SOURCES = gload.c gstore.c gcas.c gexch.c glfree.c lock.c init.c \
fenv.c fence.c flag.c
 
 SIZEOBJS = load store cas exch fadd fsub fand fior fxor fnand tas
-SIZES = @SIZES@
 
 EXTRA_libatomic_la_SOURCES = $(addsuffix _n.c,$(SIZEOBJS))
 libatomic_la_DEPENDENCIES = $(libatomic_la_LIBADD) $(libatomic_version_dep)
@@ -152,6 +157,7 @@ endif
 if ARCH_AARCH64_LINUX
 libatomic_la_SOURCES += atomic_16.S
 endif
+endif
 
 libatomic_convenience_la_SOURCES = $(libatomic_la_SOURCES)
 libatomic_convenience_la_LIBADD = $(libatomic_la_LIBADD)
diff --git a/libatomic/Makefile.in b/libatomic/Makefile.in
diff --git a/libatomic/configure b/libatomic/configure
[omitted]
diff --git a/libatomic/configure.ac b/libatomic/configure.ac
index aafae71028d2d..01141f6437697 100644
--- a/libatomic/configure.ac
+++ b/libatomic/configure.ac
@@ -175,11 +175,14 @@ esac
 AC_SUBST(SYSROOT_CFLAGS_FOR_TARGET)
 
 # Get target configury.
+partial_libatomic=
 . ${srcdir}/configure.tgt
 if test -n "$UNSUPPORTED"; then
   AC_MSG_ERROR([Configuration ${target} is unsupported.])
 fi
 
+AM_CONDITIONAL(PARTIAL_VXWORKS, test "x$partial_libatomic" = "xvxworks")
+
 # Write out the ifunc resolver arg type.
 AC_DEFINE_UNQUOTED(IFUNC_RESOLVER_ARGS, $IFUNC_RESOLVER_ARGS,
[Define ifunc resolver function argument.])
diff --git a/libatomic/configure.tgt b/libatomic/configure.tgt
index 6db039d6e8bb6..606d249116af5 100644
--- a/libatomic/configure.tgt
+++ b/libatomic/configure.tgt
@@ -185,6 +185,10 @@ case "${target}" in
   nvptx*-*-*)
;;
 
+  *-*-vxworks*)
+partial_libatomic=vxworks
+   ;;
+
   *)
# Who are you?
UNSUPPORTED=1


-- 
Alexandre Oliva, happy hacker    https://blog.lx.oliva.nom.br/
Free Software Activist FSFLA co-founder GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity.
Excluding neuro-others for not behaving "normal" is *not* inclusive!


[pushed] c++: constexpr always_inline [PR120935]

2025-05-22 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

In cp_fold we do speculative constant evaluation of constexpr calls when
inlining is enabled.  Let's also do it for always_inline functions.
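
The kind of function this affects can be sketched as follows; it has the same
shape as the helper in the new test, while twice and twice_example are
hypothetical names added for illustration. The sketch only shows the language
level behaviour (the call is a constant expression wherever one is required); it
does not demonstrate what the optimizers do with -fno-inline.

```cpp
#include <cassert>

// An always_inline constexpr wrapper around the builtin.
[[gnu::always_inline]] constexpr bool
is_constant_evaluated() noexcept
{ return __builtin_is_constant_evaluated(); }

// Hypothetical user: both branches compute the same value, so the result
// does not depend on whether the call is folded at compile time.
constexpr int
twice(int x)
{ return is_constant_evaluated() ? x + x : x * 2; }

// In a manifestly constant-evaluated context the wrapper yields true.
static_assert(is_constant_evaluated(), "constant evaluation expected");
static_assert(twice(21) == 42, "usable in constant expressions");

inline void twice_example()
{
  int n = 21;
  assert(twice(n) == 42);
}
```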

PR c++/120935

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold): Check always_inline.

gcc/testsuite/ChangeLog:

* g++.dg/opt/always_inline2.C: New test.
* g++.dg/debug/dwarf2/pubnames-2.C: Suppress -fimplicit-constexpr.
* g++.dg/debug/dwarf2/pubnames-3.C: Likewise.
---
 gcc/cp/cp-gimplify.cc |  4 ++-
 .../g++.dg/debug/dwarf2/pubnames-2.C  |  2 +-
 .../g++.dg/debug/dwarf2/pubnames-3.C  |  2 +-
 gcc/testsuite/g++.dg/opt/always_inline2.C | 28 +++
 4 files changed, 33 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/opt/always_inline2.C

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index f7bd453bc5e..03d5352977b 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -3441,7 +3441,9 @@ cp_fold (tree x, fold_flags_t flags)
   Do constexpr expansion of expressions where the call itself is not
   constant, but the call followed by an INDIRECT_REF is.  */
if (callee && DECL_DECLARED_CONSTEXPR_P (callee)
-   && !flag_no_inline)
+   && (!flag_no_inline
+   || lookup_attribute ("always_inline",
+DECL_ATTRIBUTES (callee))))
  {
mce_value manifestly_const_eval = mce_unknown;
if (flags & ff_mce_false)
diff --git a/gcc/testsuite/g++.dg/debug/dwarf2/pubnames-2.C 
b/gcc/testsuite/g++.dg/debug/dwarf2/pubnames-2.C
index 1fb5004df40..96469d4d332 100644
--- a/gcc/testsuite/g++.dg/debug/dwarf2/pubnames-2.C
+++ b/gcc/testsuite/g++.dg/debug/dwarf2/pubnames-2.C
@@ -1,6 +1,6 @@
 // { dg-do compile { target c++11 } }
 // { dg-skip-if "" { powerpc-ibm-aix* } }
-// { dg-options "-gpubnames -gdwarf-4 -fno-debug-types-section -dA 
-fno-inline" }
+// { dg-options "-gpubnames -gdwarf-4 -fno-debug-types-section -dA -fno-inline 
-fno-implicit-constexpr" }
 // { dg-final { scan-assembler-times "\.section\[\t \]\[^\n\]*debug_pubnames" 
1 } }
 // { dg-final { scan-assembler "\"\\(anonymous namespace\\)0\"+\[ 
\t\]+\[#;/|@!]+\[ \t\]+external name" } }
 // { dg-final { scan-assembler "\"one0\"+\[ \t\]+\[#;/|@!]+\[ 
\t\]+external name" } }
diff --git a/gcc/testsuite/g++.dg/debug/dwarf2/pubnames-3.C 
b/gcc/testsuite/g++.dg/debug/dwarf2/pubnames-3.C
index 37e04fb6c97..f635803d45a 100644
--- a/gcc/testsuite/g++.dg/debug/dwarf2/pubnames-3.C
+++ b/gcc/testsuite/g++.dg/debug/dwarf2/pubnames-3.C
@@ -1,6 +1,6 @@
 // { dg-do compile { target c++11 } }
 // { dg-skip-if "" { powerpc-ibm-aix* } }
-// { dg-options "-gpubnames -gdwarf-4 -fdebug-types-section -dA -fno-inline" }
+// { dg-options "-gpubnames -gdwarf-4 -fdebug-types-section -dA -fno-inline 
-fno-implicit-constexpr" }
 // { dg-final { scan-assembler-times "\.section\[\t \]\[^\n\]*debug_pubnames" 
1 } }
 // { dg-final { scan-assembler "\"\\(anonymous namespace\\)0\"+\[ 
\t\]+\[#;/|@!]+\[ \t\]+external name" } }
 // { dg-final { scan-assembler "\"one0\"+\[ \t\]+\[#;/|@!]+\[ 
\t\]+external name" } }
diff --git a/gcc/testsuite/g++.dg/opt/always_inline2.C 
b/gcc/testsuite/g++.dg/opt/always_inline2.C
new file mode 100644
index 000..8cfdd67e36c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/opt/always_inline2.C
@@ -0,0 +1,28 @@
+// PR c++/120935
+// { dg-additional-options "-fdump-tree-optimized" }
+// { dg-final { scan-tree-dump-not "goto" "optimized" } }
+// { dg-do compile { target c++11 } }
+
+void x(int);
+
+[[gnu::always_inline]] constexpr bool
+is_constant_evaluated()
+{ return __builtin_is_constant_evaluated(); }
+
+struct Iter
+{
+typedef int value_type;
+
+int& operator*() const;
+Iter& operator++();
+bool operator!=(const Iter&) const;
+};
+
+void f(Iter first, Iter last)
+{
+if (__is_trivial(Iter::value_type))
+if (!is_constant_evaluated())
+return;
+for (; first != last; ++first)
+x(*first);
+}

base-commit: 10360c1b0d45ae129df616a9e9b1db5f2a2eaef8
-- 
2.49.0
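
As a rough illustration of the kind of function this patch is about (not part of the patch itself), the sketch below mirrors the shape of the new test case: a constexpr function marked always_inline whose result depends on __builtin_is_constant_evaluated(). The names `in_constant_evaluation` and `pick` are invented for the example. It only shows the language-level behaviour; it makes no claim about the code GCC generates with or without -fno-inline.

```cpp
#include <cassert>

// Hypothetical helper, similar in shape to the one in always_inline2.C.
[[gnu::always_inline]] constexpr bool
in_constant_evaluation()
{ return __builtin_is_constant_evaluated(); }

// In a manifestly constant-evaluated context the builtin yields true.
constexpr bool at_compile_time = in_constant_evaluation();
static_assert(at_compile_time, "evaluated as a constant expression");

// In an ordinary run-time call it yields false, so this returns b.
int pick(int a, int b)
{
  if (in_constant_evaluation())
    return a;
  return b;
}
```

Calling pick(1, 2) at run time returns 2, since the condition is not a manifestly constant-evaluated context.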



Re: [PATCH 0/3] Redirect to specific target based on TARGET_VERSION_COMPATIBLE

2025-05-22 Thread Jeff Law




On 5/22/25 9:05 AM, Alfie Richards wrote:

> Hi Jeff,
>
> I sent this patch with my implementation a while ago:
> https://gcc.gnu.org/pipermail/gcc-patches/2025-April/681043.html
>
> There hasn't been any feedback on that patch yet.
>
> These patches are still useful and I would like to go ahead with them. I
> am in favour of using my implementation as it is a bit stronger, but it
> also requires my larger FMV series to be approved first.

Can you ping your larger FMV series?  I strongly suspect everyone is 
digging out from everything that queued up while the trunk was in 
bugfixing stages.


Yangyu -- what are your thoughts here?  If we went with Alfie's patch, 
does it solve the problems you're interested in, and what patches of 
yours would still be relevant if we incorporated Alfie's work?


Jeff



[testsuite] Remove obsolete ada/acats/overflow.lst file

2025-05-22 Thread Eric Botcazou
It is used to specify which files are compiled with -gnato, but the switch has 
been the default for at least a decade.

Tested on x86-64/Linux, applied on the mainline.


2025-05-22  Eric Botcazou  

* ada/acats/overflow.lst: Delete.
* ada/acats/run_all.sh: Do not process overflow.lst.

-- 
Eric Botcazou
diff --git a/gcc/testsuite/ada/acats/overflow.lst b/gcc/testsuite/ada/acats/overflow.lst
deleted file mode 100644
index fb76ef17705..000
--- a/gcc/testsuite/ada/acats/overflow.lst
+++ /dev/null
@@ -1,17 +0,0 @@
-c45632a
-c45632b
-c45632c
-c45504a
-c45504b
-c45504c
-c45613a
-c45613b
-c45613c
-c45304a
-c45304b
-c45304c
-c46014a
-c460008
-c460011
-c4a012b
-cb20004
diff --git a/gcc/testsuite/ada/acats/run_all.sh b/gcc/testsuite/ada/acats/run_all.sh
index 38ec4692899..2f737854c60 100755
--- a/gcc/testsuite/ada/acats/run_all.sh
+++ b/gcc/testsuite/ada/acats/run_all.sh
@@ -303,10 +303,6 @@ for chapter in $chapters; do
   fi
 
   extraflags="-gnat95"
-  grep $i $testdir/overflow.lst > /dev/null 2>&1
-  if [ $? -eq 0 ]; then
- extraflags="$extraflags -gnato"
-  fi
   grep $i $testdir/elabd.lst > /dev/null 2>&1
   if [ $? -eq 0 ]; then
  extraflags="$extraflags -gnatE"


Re: [PATCH v3 8/9] libstdc++: Implement layout_stride from mdspan.

2025-05-22 Thread Luc Grosheintz




On 5/22/25 14:37, Tomasz Kaminski wrote:

On Wed, May 21, 2025 at 12:04 PM Luc Grosheintz 
wrote:


Implements the remaining parts of layout_left and layout_right; and all
of layout_stride.

libstdc++-v3/ChangeLog:

 * include/std/mdspan(layout_stride): New class.

Signed-off-by: Luc Grosheintz 
---
  libstdc++-v3/include/std/mdspan | 216 +++-
  1 file changed, 213 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/std/mdspan
b/libstdc++-v3/include/std/mdspan
index 43676c3463c..732fc4eb1c2 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -399,6 +399,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
class mapping;
};

+  struct layout_stride
+  {
+template
+  class mapping;
+  };
+
namespace __mdspan
{
  template
@@ -499,7 +505,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

  template
concept __standardized_mapping = __mapping_of
-  || __mapping_of;
+  || __mapping_of
+  || __mapping_of;

  template
concept __mapping_like = requires
@@ -557,6 +564,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 : mapping(__other.extents(), __mdspan::__internal_ctor{})
 { }

+  template
+   requires (is_constructible_v)
+   constexpr explicit(extents_type::rank() > 0)
+   mapping(const layout_stride::mapping<_OExtents>& __other)


I think I would make it noexcept, as implementations can add noexcept to
what is specified in the standard.
And just add an appropriate comment.


Alright, I prepared this as a separate commit. I don't remember coming
to a conclusion when it was discussed in the email chain about
potential bugs in the standard. IIUC Jonathan thought the missing noexcept
was both preferable and an intentional part of the standard.

We should probably make sure he's aware we made this choice (the separate
commit should help).




+   : mapping(__other.extents(), __mdspan::__internal_ctor{})
+   { __glibcxx_assert(*this == __other); }
+
constexpr mapping&
operator=(const mapping&) noexcept = default;

@@ -572,8 +586,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 constexpr index_type
 operator()(_Indices... __indices) const noexcept
 {
- return __mdspan::__linear_index_left(
-   this->extents(), static_cast(__indices)...);
+ return __mdspan::__linear_index_left(_M_extents,
+   static_cast(__indices)...);


Could you move this change to the layout_left commit?


 }

static constexpr bool
@@ -687,6 +701,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 : mapping(__other.extents(), __mdspan::__internal_ctor{})
 { }

+  template
+   requires (is_constructible_v)
+   constexpr explicit(extents_type::rank() > 0)
+   mapping(const layout_stride::mapping<_OExtents>& __other) noexcept
+   : mapping(__other.extents(), __mdspan::__internal_ctor{})
+   { __glibcxx_assert(*this == __other); }
+
constexpr mapping&
operator=(const mapping&) noexcept = default;

@@ -760,6 +781,195 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 [[no_unique_address]] _Extents _M_extents;
  };

+  namespace __mdspan
+  {
+template
+  constexpr typename _Mapping::index_type
+  __offset(const _Mapping& __m) noexcept
+  {
+   using _IndexType = typename _Mapping::index_type;
+
+   auto __impl = [&__m](index_sequence<_Counts...>)
+   { return __m(((void) _Counts, _IndexType(0))...); };
+   return
__impl(make_index_sequence<_Mapping::extents_type::rank()>());
+  }
+
+template
+  constexpr typename _Mapping::index_type
+  __linear_index_strides(const _Mapping& __m,
+_Indices... __indices)
+  {
+   using _IndexType = typename _Mapping::index_type;
+   _IndexType __res = 0;
+   if constexpr (sizeof...(__indices) > 0)
+ {
+   auto __update = [&, __pos = 0u](_IndexType __idx) mutable
+ {
+   __res += __idx * __m.stride(__pos++);
+ };
+   (__update(__indices), ...);
+ }
+   return __res;
+  }
+  }
+
+  template
+class layout_stride::mapping
+{
+public:
+  using extents_type = _Extents;
+  using index_type = typename extents_type::index_type;
+  using size_type = typename extents_type::size_type;
+  using rank_type = typename extents_type::rank_type;
+  using layout_type = layout_stride;
+
+  static_assert(__mdspan::__representable_size<_Extents, index_type>,
+   "The size of extents_type must be representable as index_type");
+
+  constexpr
+  mapping() noexcept


+  {

+   auto __stride = index_type(1);
+   for (size_t __i = extents_type::rank(); __i > 0; --__i)
+ {
+   _M_strides[__i - 1] = __stride;
+   __stride *= _M_extents.exten

[PATCH RFC] diagnostics: use -Wformat-diag more consistently

2025-05-22 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, any objection?

-- 8< --

r10-1211 added various -Wformat-diag warnings about quoting in GCC
diagnostic strings, but didn't change these two quoting warnings to use that
flag as well.

gcc/c-family/ChangeLog:

* c-format.cc (flag_chars_t::validate): Control quoting warnings
with -Wformat-diag.
---
 gcc/c-family/c-format.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/c-family/c-format.cc b/gcc/c-family/c-format.cc
index 211d20dd25b..a44249a0222 100644
--- a/gcc/c-family/c-format.cc
+++ b/gcc/c-family/c-format.cc
@@ -2124,7 +2124,7 @@ flag_chars_t::validate (const format_kind_info *fki,
{
  format_warning_at_char (format_string_loc, format_string_cst,
  format_chars - orig_format_chars - 1,
- OPT_Wformat_,
+ OPT_Wformat_diag,
  "%s used within a quoted sequence",
  _(s->name));
}
@@ -2137,7 +2137,7 @@ flag_chars_t::validate (const format_kind_info *fki,
 {
   format_warning_at_char (format_string_loc, format_string_cst,
  format_chars - orig_format_chars,
- OPT_Wformat_,
+ OPT_Wformat_diag,
  "%qc conversion used unquoted",
  format_char);
 }

base-commit: f5016d8492e4067faef2f9403370a4b49f7a3898
-- 
2.49.0



[committed] c: Document C23 implementation-defined behavior

2025-05-22 Thread Joseph Myers
Add references to C23 subclauses to the documentation of
implementation-defined behavior, and new entries for
implementation-defined behavior new in C23; change some references in
the text to e.g. "C99 and C11" to encompass C23 as well.

Tested with "make info html pdf".

* doc/implement-c.texi: Document C23 implementation-defined
behavior.
(Constant expressions implementation, Types implementation): New
nodes.

diff --git a/gcc/doc/implement-c.texi b/gcc/doc/implement-c.texi
index a942a127cc7f..bdfb6342f0df 100644
--- a/gcc/doc/implement-c.texi
+++ b/gcc/doc/implement-c.texi
@@ -10,8 +10,8 @@ A conforming implementation of ISO C is required to document 
its
 choice of behavior in each of the areas that are designated
 ``implementation defined''.  The following lists all such areas,
 along with the section numbers from the ISO/IEC 9899:1990, ISO/IEC
-9899:1999 and ISO/IEC 9899:2011 standards.  Some areas are only
-implementation-defined in one version of the standard.
+9899:1999, ISO/IEC 9899:2011 and ISO/IEC 9899:2024 standards.  Some
+areas are only implementation-defined in one version of the standard.
 
 Some choices depend on the externally determined ABI for the platform
 (including standard character encodings) which GCC follows; these are
@@ -47,15 +47,15 @@ a freestanding environment); refer to their documentation 
for details.
 
 @itemize @bullet
 @item
-@cite{How a diagnostic is identified (C90 3.7, C99 and C11 3.10, C90,
-C99 and C11 5.1.1.3).}
+@cite{How a diagnostic is identified (C90 3.7, C99 and C11 3.10, C23
+3.13, C90, C99 and C11 5.1.1.3, C23 5.2.1.3).}
 
 Diagnostics consist of all the output sent to stderr by GCC@.
 
 @item
 @cite{Whether each nonempty sequence of white-space characters other than
 new-line is retained or replaced by one space character in translation
-phase 3 (C90, C99 and C11 5.1.1.2).}
+phase 3 (C90, C99 and C11 5.1.1.2, C23 5.2.1.2).}
 
 @xref{Implementation-defined behavior, , Implementation-defined
 behavior, cpp, The C Preprocessor}.
@@ -72,7 +72,7 @@ of the C library, and are not defined by GCC itself.
 @item
 @cite{The mapping between physical source file multibyte characters
 and the source character set in translation phase 1 (C90, C99 and C11
-5.1.1.2).}
+5.1.1.2, C23 5.2.1.2).}
 
 @xref{Implementation-defined behavior, , Implementation-defined
 behavior, cpp, The C Preprocessor}.
@@ -85,14 +85,16 @@ behavior, cpp, The C Preprocessor}.
 @itemize @bullet
 @item
 @cite{Which additional multibyte characters may appear in identifiers
-and their correspondence to universal character names (C99 and C11 6.4.2).}
+and their correspondence to universal character names (C99 and C11
+6.4.2, C23 6.4.3).}
 
 @xref{Implementation-defined behavior, , Implementation-defined
 behavior, cpp, The C Preprocessor}.
 
 @item
 @cite{The number of significant initial characters in an identifier
-(C90 6.1.2, C90, C99 and C11 5.2.4.1, C99 and C11 6.4.2).}
+(C90 6.1.2, C90, C99 and C11 5.2.4.1, C23 5.3.5.2, C99 and C11 6.4.2,
+C23 6.4.3).}
 
 For internal names, all characters are significant.  For external names,
 the number of significant characters are defined by the linker; for
@@ -102,7 +104,7 @@ almost all targets, all characters are significant.
 @cite{Whether case distinctions are significant in an identifier with
 external linkage (C90 6.1.2).}
 
-This is a property of the linker.  C99 and C11 require that case distinctions
+This is a property of the linker.  C99 and later require that case distinctions
 are always significant in identifiers with external linkage and
 systems without this property are not supported by GCC@.
 
@@ -113,34 +115,35 @@ systems without this property are not supported by GCC@.
 
 @itemize @bullet
 @item
-@cite{The number of bits in a byte (C90 3.4, C99 and C11 3.6).}
+@cite{The number of bits in a byte (C90 3.4, C99 and C11 3.6, C23 3.7).}
 
 Determined by ABI@.
 
 @item
 @cite{The values of the members of the execution character set (C90,
-C99 and C11 5.2.1).}
+C99 and C11 5.2.1, C23 5.3.1).}
 
 Determined by ABI@.
 
 @item
 @cite{The unique value of the member of the execution character set produced
 for each of the standard alphabetic escape sequences (C90, C99 and C11
-5.2.2).}
+5.2.2, C23 5.3.3).}
 
 Determined by ABI@.
 
 @item
 @cite{The value of a @code{char} object into which has been stored any
 character other than a member of the basic execution character set
-(C90 6.1.2.5, C99 and C11 6.2.5).}
+(C90 6.1.2.5, C99, C11 and C23 6.2.5).}
 
 Determined by ABI@.
 
 @item
 @cite{Which of @code{signed char} or @code{unsigned char} has the same
 range, representation, and behavior as ``plain'' @code{char} (C90
-6.1.2.5, C90 6.2.1.1, C99 and C11 6.2.5, C99 and C11 6.3.1.1).}
+6.1.2.5, C90 6.2.1.1, C99, C11 and C23 6.2.5, C99 and C11 6.3.1.1, C23
+6.3.2.1).}
 
 @opindex fsigned-char
 @opindex funsigned-char
@@ -148,17 +151,33 @@ Determined by ABI@.  The options @option{-funsigned-char} 
and
 @option{-fsigned-char} 

[PATCH 2/2] libstdc++: Implement C++26 std::indirect [PR119152]

2025-05-22 Thread Tomasz Kamiński
From: Jonathan Wakely 

This patch implements C++26 std::indirect as specified
in P3019 with the amendment to move assignment from LWG 4251.

PR libstdc++/119152

libstdc++-v3/ChangeLog:

* include/Makefile.am: Add new header.
* include/Makefile.in: Regenerate.
* include/bits/indirect.h: New file.
* include/bits/version.def (indirect): Define.
* include/bits/version.h: Regenerate.
* include/std/memory: Include new header.
* testsuite/std/memory/indirect/copy.cc
* testsuite/std/memory/indirect/copy_alloc.cc
* testsuite/std/memory/indirect/ctor.cc
* testsuite/std/memory/indirect/incomplete.cc
* testsuite/std/memory/indirect/invalid_neg.cc
* testsuite/std/memory/indirect/move.cc
* testsuite/std/memory/indirect/move_alloc.cc
* testsuite/std/memory/indirect/relops.cc

Co-Authored-By: Tomasz Kamiński 
Signed-off-by: Tomasz Kamiński 
---
Tested on x86_64-linux. OK for trunk?

 libstdc++-v3/include/Makefile.am  |   1 +
 libstdc++-v3/include/Makefile.in  |   1 +
 libstdc++-v3/include/bits/indirect.h  | 459 ++
 libstdc++-v3/include/bits/version.def |   9 +
 libstdc++-v3/include/bits/version.h   |  10 +
 libstdc++-v3/include/std/memory   |   5 +
 .../testsuite/std/memory/indirect/copy.cc | 121 +
 .../std/memory/indirect/copy_alloc.cc | 228 +
 .../testsuite/std/memory/indirect/ctor.cc | 203 
 .../std/memory/indirect/incomplete.cc |  38 ++
 .../std/memory/indirect/invalid_neg.cc|  28 ++
 .../testsuite/std/memory/indirect/move.cc | 144 ++
 .../std/memory/indirect/move_alloc.cc | 296 +++
 .../testsuite/std/memory/indirect/relops.cc   |  82 
 14 files changed, 1625 insertions(+)
 create mode 100644 libstdc++-v3/include/bits/indirect.h
 create mode 100644 libstdc++-v3/testsuite/std/memory/indirect/copy.cc
 create mode 100644 libstdc++-v3/testsuite/std/memory/indirect/copy_alloc.cc
 create mode 100644 libstdc++-v3/testsuite/std/memory/indirect/ctor.cc
 create mode 100644 libstdc++-v3/testsuite/std/memory/indirect/incomplete.cc
 create mode 100644 libstdc++-v3/testsuite/std/memory/indirect/invalid_neg.cc
 create mode 100644 libstdc++-v3/testsuite/std/memory/indirect/move.cc
 create mode 100644 libstdc++-v3/testsuite/std/memory/indirect/move_alloc.cc
 create mode 100644 libstdc++-v3/testsuite/std/memory/indirect/relops.cc

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 3e5b6c4142e..b67d470c27e 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -210,6 +210,7 @@ bits_headers = \
${bits_srcdir}/gslice_array.h \
${bits_srcdir}/hashtable.h \
${bits_srcdir}/hashtable_policy.h \
+   ${bits_srcdir}/indirect.h \
${bits_srcdir}/indirect_array.h \
${bits_srcdir}/ios_base.h \
${bits_srcdir}/istream.tcc \
diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index 3531162b5f7..6f7f2be68fd 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -563,6 +563,7 @@ bits_freestanding = \
 @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/gslice_array.h \
 @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/hashtable.h \
 @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/hashtable_policy.h \
+@GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/indirect.h \
 @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/indirect_array.h \
 @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/ios_base.h \
 @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/istream.tcc \
diff --git a/libstdc++-v3/include/bits/indirect.h 
b/libstdc++-v3/include/bits/indirect.h
new file mode 100644
index 000..32b2af9117d
--- /dev/null
+++ b/libstdc++-v3/include/bits/indirect.h
@@ -0,0 +1,459 @@
+// Vocabulary Types for Composite Class Design -*- C++ -*-
+
+// Copyright The GNU Toolchain Authors.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// <http://www.gnu.org/licenses/>.
+
+/** @file include/bits/indirec

Re: [PATCH 00/13] arm: Remove iWMMXT code generation

2025-05-22 Thread Ramana Radhakrishnan
On Thu, May 22, 2025 at 11:09 AM Ramana Radhakrishnan
 wrote:
>
> On Wed, May 7, 2025 at 6:18 PM Richard Earnshaw  wrote:
> >
> >
> > The header file for the Arm implementation of mmintrin.h was changed in 
> > GCC-15
> > to disable access to the intrinsics.  This patch removes the internal code
> > as well.
> >
> > We still allow -mcpu/-march options for the wmmx cpus, but they are now 
> > treated
> > in exactly the same way as XScale - generating code for an Armv5te 
> > architecture.
>
> I'll review with the docs but I'd prefer to make this change of
> behaviour explicit in our documentation.
>

Scratch that - I should read the lists better.

>
> >
> > Richard Earnshaw (13):
> >   arm: clarify the logic of SECONDARY_(INPUT/OUTPUT)_RELOAD_CLASS
> >   arm: testsuite: remove iwmmxt tests
> >   arm: treat -mcpu/arch=iwmmxt{,2} like XScale
> >   arm: remove iWMMX builtins support.
> >   arm: Remove iwmmxt patterns.
> >   arm: remove IWMMXT checks from MD files.
> >   arm: remove support for the iwmmxt ABI variant.
> >   arm: Remove iwmmxt support from arm.cc
> >   arm: remove iwmmxt-related attributes from machine description
> >   arm: cleanup iterators.md after removing iwmmxt
> >   arm: remove dead predefines when using WMMX
> >   arm: remove most remaining iwmmxt code.
> >   arm: remove iwmmxt registers from allocator tables
> >
> >  gcc/config.gcc |2 +-
> >  gcc/config/arm/aout.h  |5 -
> >  gcc/config/arm/arm-builtins.cc | 1276 +
> >  gcc/config/arm/arm-c.cc|7 -
> >  gcc/config/arm/arm-cpus.in |   28 +-
> >  gcc/config/arm/arm-generic.md  |4 +-
> >  gcc/config/arm/arm-opts.h  |1 -
> >  gcc/config/arm/arm-protos.h|8 -
> >  gcc/config/arm/arm-tables.opt  |6 -
> >  gcc/config/arm/arm-tune.md |   53 +-
> >  gcc/config/arm/arm.cc  |  401 +-
> >  gcc/config/arm/arm.h   |  169 +--
> >  gcc/config/arm/arm.md  |   43 +-
> >  gcc/config/arm/arm.opt |3 -
> >  gcc/config/arm/constraints.md  |   18 +-
> >  gcc/config/arm/iterators.md|   20 +-
> >  gcc/config/arm/iwmmxt.md   | 1766 
> >  gcc/config/arm/iwmmxt2.md  |  903 
> >  gcc/config/arm/marvell-f-iwmmxt.md |  189 ---
> >  gcc/config/arm/predicates.md   |8 +-
> >  gcc/config/arm/t-arm   |3 -
> >  gcc/config/arm/thumb2.md   |2 +-
> >  gcc/config/arm/types.md|  123 --
> >  gcc/config/arm/unspecs.md  |   29 -
> >  gcc/config/arm/vec-common.md   |   31 +-
> >  gcc/doc/invoke.texi|2 +-
> >  gcc/doc/sourcebuild.texi   |4 -
> >  gcc/testsuite/gcc.target/arm/ivopts.c  |3 +-
> >  gcc/testsuite/gcc.target/arm/mmx-1.c   |   26 -
> >  gcc/testsuite/gcc.target/arm/mmx-2.c   |  166 ---
> >  gcc/testsuite/gcc.target/arm/pr64208.c |   25 -
> >  gcc/testsuite/gcc.target/arm/pr79145.c |   16 -
> >  gcc/testsuite/gcc.target/arm/pr99724.c |   31 -
> >  gcc/testsuite/gcc.target/arm/pr99786.c |   30 -
> >  gcc/testsuite/lib/target-supports.exp  |   13 -
> >  35 files changed, 141 insertions(+), 5273 deletions(-)
> >  delete mode 100644 gcc/config/arm/iwmmxt.md
> >  delete mode 100644 gcc/config/arm/iwmmxt2.md
> >  delete mode 100644 gcc/config/arm/marvell-f-iwmmxt.md
> >  delete mode 100644 gcc/testsuite/gcc.target/arm/mmx-1.c
> >  delete mode 100644 gcc/testsuite/gcc.target/arm/mmx-2.c
> >  delete mode 100644 gcc/testsuite/gcc.target/arm/pr64208.c
> >  delete mode 100644 gcc/testsuite/gcc.target/arm/pr79145.c
> >  delete mode 100644 gcc/testsuite/gcc.target/arm/pr99724.c
> >  delete mode 100644 gcc/testsuite/gcc.target/arm/pr99786.c
> >
> > --
> > 2.43.0
> >


[PATCH 1/2] libstdc++: Define _Scoped_allocation RAII helper

2025-05-22 Thread Tomasz Kamiński
From: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/allocated_ptr.h (_Scoped_allocation): New class
template.

Co-Authored-By: Tomasz Kamiński 
Signed-off-by: Tomasz Kamiński 
---
Tested on x86_64-linux. OK for trunk?

 libstdc++-v3/include/bits/allocated_ptr.h | 96 +++
 1 file changed, 96 insertions(+)

diff --git a/libstdc++-v3/include/bits/allocated_ptr.h 
b/libstdc++-v3/include/bits/allocated_ptr.h
index 0b2b6fe5820..aa5355f0e2f 100644
--- a/libstdc++-v3/include/bits/allocated_ptr.h
+++ b/libstdc++-v3/include/bits/allocated_ptr.h
@@ -36,6 +36,7 @@
 # include 
 # include 
 # include 
+# include 
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
@@ -136,6 +137,101 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return { std::__allocate_guarded(__a) };
 }
 
+  // An RAII type that acquires memory from an allocator.
+  // N.B.  'scoped' here is in the RAII sense, not the scoped allocator model,
+  // so this has nothing to do with `std::scoped_allocator_adaptor`.
+  // This class can be used to simplify the common pattern:
+  //
+  // auto ptr = alloc.allocate(1);
+  // try {
+  //   std::construct_at(std::to_address(ptr), args);
+  //   m_ptr = ptr;
+  // } catch (...) {
+  //   alloc.deallocate(ptr, 1);
+  //   throw;
+  // }
+  //
+  // Instead you can do:
+  //
+  // _Scoped_allocation sa(alloc);
+  // m_ptr = std::construct_at(sa.get(), args);
+  // (void) sa.release();
+  //
+  // Or even simpler:
+  //
+  // _Scoped_allocation sa(alloc, std::in_place, args);
+  // m_ptr = sa.release();
+  //
+  template
+struct _Scoped_allocation
+{
+  using value_type = typename allocator_traits<_Alloc>::value_type;
+  using pointer = typename allocator_traits<_Alloc>::pointer;
+
+  // Use `a` to allocate memory for `n` objects.
+  constexpr explicit
+  _Scoped_allocation(const _Alloc& __a, size_t __n = 1)
+  : _M_a(__a), _M_n(__n), _M_p(_M_a.allocate(__n))
+  { }
+
+#if __glibcxx_optional >= 201606L
+  // Allocate memory for a single object and if that succeeds,
+  // construct an object using args.
+  //
+  // Does not do uses-allocator construction; don't use if you need that.
+  //
+  // CAUTION: the destructor will *not* destroy this object, it will only
+  // free the memory. That means the following pattern is unsafe:
+  //
+  // _Scoped_allocation  sa(alloc, in_place, args);
+  // potentially_throwing_operations();
+  // return sa.release();
+  //
+  // If the middle operation throws, the object will not be destroyed.
+  template
+   constexpr explicit
+   _Scoped_allocation(const _Alloc& __a, in_place_t, _Args&&... __args)
+   : _Scoped_allocation(__a, 1)
+   {
+ // The target constructor has completed, so if the next line throws,
+ // the destructor will deallocate the memory.
+ allocator_traits<_Alloc>::construct(_M_a, get(),
+ std::forward<_Args>(__args)...);
+   }
+#endif
+
+  _GLIBCXX20_CONSTEXPR
+  ~_Scoped_allocation()
+  {
+   if (_M_p) [[__unlikely__]]
+ _M_a.deallocate(_M_p, _M_n);
+  }
+
+  _Scoped_allocation(_Scoped_allocation&&) = delete;
+
+  constexpr _Alloc
+  get_allocator() const noexcept { return _M_a; }
+
+  constexpr value_type*
+  get() const noexcept
+  { return std::__to_address(_M_p); }
+
+  [[__nodiscard__]]
+  constexpr pointer
+  release() noexcept { return std::__exchange(_M_p, nullptr); }
+
+private:
+  [[__no_unique_address__]] _Alloc _M_a;
+  size_t _M_n;
+  pointer _M_p;
+};
+
+#if __glibcxx_optional >= 201606L && __cpp_deduction_guides >= 201606L
+  template
+_Scoped_allocation(_Alloc, in_place_t, _Args...)
+  -> _Scoped_allocation<_Alloc>;
+#endif
+
 /// @endcond
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace std
-- 
2.49.0



Re: [PATCH v3 3/9] libstdc++: Cleanup formatting in mdspan.

2025-05-22 Thread Tomasz Kaminski
On Wed, May 21, 2025 at 11:48 AM Luc Grosheintz 
wrote:

> libstdc++-v3/ChangeLog:
>
> * include/std/mdspan: Fix two instances of
> whitespace errors: `for(` -> `for (`.
>
> Signed-off-by: Luc Grosheintz 
>
I would suggest merging these 3 patches into one.

> ---
>  libstdc++-v3/include/std/mdspan | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/libstdc++-v3/include/std/mdspan
> b/libstdc++-v3/include/std/mdspan
> index a8ec0b159e6..e5b1b2596d9 100644
> --- a/libstdc++-v3/include/std/mdspan
> +++ b/libstdc++-v3/include/std/mdspan
> @@ -74,7 +74,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> {
>   array __ret;
>   size_t __dyn = 0;
> - for(size_t __i = 0; __i < _S_rank; ++__i)
> + for (size_t __i = 0; __i < _S_rank; ++__i)
> {
>   __ret[__i] = __dyn;
>   __dyn += _S_is_dyn(_Extents[__i]);
> @@ -114,7 +114,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>   constexpr void
>   _M_init_dynamic_extents(_GetOtherExtent __get_extent) noexcept
>   {
> -   for(size_t __i = 0; __i < _S_rank_dynamic; ++__i)
> +   for (size_t __i = 0; __i < _S_rank_dynamic; ++__i)
>   {
> size_t __di = __i;
> if constexpr (_OtherRank != _S_rank_dynamic)
> --
> 2.49.0
>
>


Re: [PATCH 2/2] libstdc++: Implement C++26 std::indirect [PR119152]

2025-05-22 Thread Jakub Jelinek
On Thu, May 22, 2025 at 11:19:25AM +0200, Tomasz Kamiński wrote:
> From: Jonathan Wakely 
> 
> This papers implements C++27 std::indirect as specified

s/27/26/

> in P3019 with ammendment to move assgiment from LWG 4251.

s/assgiment/assignment/

Jakub



Re: [PATCH] RISC-V: Add pattern for vector-scalar multiply-add/sub [PR119100]

2025-05-22 Thread Paul-Antoine Arras

On 07/05/2025 12:29, Robin Dapp wrote:
> Yes, we need it in order to be able to test both paths, i.e. combining
> and not combining.  Also make sure to test with multiple types and
> situations as in Pan's patch.


Please find attached a revised version of the patch.

Compared to the previous iteration, I have:
* Rebased on top of Pan's work;
* Updated the cost model;
* Added a second pattern to handle the case where PLUS_MINUS operands 
are swapped;
* Added compile and run tests.

I bootstrapped and regtested against rv64gcv.

Is it OK for mainline?

Thanks,
--
PA
commit b17bd6b3fc8e0f3f1e839bc292f239b28192c607
Author: Paul-Antoine Arras 
Date:   Mon May 12 14:42:24 2025 +0200

RISC-V: Add pattern for vector-scalar multiply-add/sub [PR119100]

This pattern enables the combine pass to merge a vec_duplicate into a plus-mult
or minus-mult RTL instruction.

Before this patch, we have two instructions, e.g.:
  vfmv.v.f    v6,fa0
  vfmadd.vv   v9,v6,v7

After, we get only one:
  vfmadd.vf   v9,fa0,v7

On SPEC2017's 503.bwaves_r, depending on the workload, the reduction in dynamic
instruction count varies from -4.66% to -4.75%.
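
For illustration only, a source loop of roughly the following shape is the kind of code where a scalar is broadcast and then used in a multiply-add. The function name `scale_and_add` is made up, and whether GCC actually emits vfmadd.vf for it is an assumption that depends on the target flags, floating-point contraction settings, the vectorizer's choices and the cost parameter added by this patch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical kernel: each element of `acc` is scaled by the scalar `f`
// and then has the matching element of `addend` added to it.
void scale_and_add(float* acc, const float* addend, float f, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    acc[i] = f * acc[i] + addend[i];
}
```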

PR target/119100

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*_vf_): Add new pattern to
combine vec_duplicate + vfm{add,sub}.vv into vfm{add,sub}.vf.
* config/riscv/riscv-opts.h (FPR2VR_COST_UNPROVIDED): Define.
* config/riscv/riscv-protos.h (get_fr2vr_cost): Declare function.
* config/riscv/riscv-vector-costs.cc (costs::adjust_stmt_cost): Call it.
* config/riscv/riscv.cc (riscv_rtx_costs): Add cost model for MULT with
VEC_DUPLICATE.
(get_fr2vr_cost): New function.
* config/riscv/riscv.opt: Add new option --param=fpr2vr-cost.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_mulop.h: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_mulop_data.h: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_mulop_run.h: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmadd-run-1-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmadd-run-1-f64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsub-run-1-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsub-run-1-f64.c: New test.

diff --git gcc/config/riscv/autovec-opt.md gcc/config/riscv/autovec-opt.md
index a972eda8de4..be08b659df1 100644
--- gcc/config/riscv/autovec-opt.md
+++ gcc/config/riscv/autovec-opt.md
@@ -1713,3 +1713,55 @@ (define_insn_and_split "*_vx_"
 		mode);
   }
   [(set_attr "type" "vialu")])
+
+;; =
+;; Combine vec_duplicate + op.vv to op.vf
+;; Include
+;; - vfmadd.vf
+;; - vfmsub.vf
+;; =
+
+
+(define_insn_and_split "*_vf_"
+  [(set (match_operand:V_VLSF 0 "register_operand""=vd")
+(plus_minus:V_VLSF
+	(mult:V_VLSF
+	  (vec_duplicate:V_VLSF
+		(match_operand: 1 "register_operand" "  f"))
+	  (match_operand:V_VLSF 2 "register_operand"  "  0"))
+	(match_operand:V_VLSF 3 "register_operand"" vr")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+rtx ops[] = {operands[0], operands[1], operands[2], operands[3],
+		 operands[2]};
+riscv_vector::emit_vlmax_insn (code_for_pred_mul_scalar (, mode),
+   riscv_vector::TERNARY_OP_FRM_DYN, ops);
+DONE;
+  }
+  [(set_attr "type" "vfmuladd")]
+)
+
+(define_insn_and_split "*_vf_"
+  [(set (match_operand:V_VLSF 0 "register_operand""=vd")
+(plus_minus:V_VLSF
+	(match_operand:V_VLSF 3 "register_operand"" vr")
+	(mult:V_VLSF
+	  (vec_duplicate:V_VLSF
+		(match_operand: 1 "register_operand" "  f"))
+	  (match_operand:V_VLSF 2 "register_operand"  "  0"))))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+rtx ops[] = {operands[0], operands[1], operands[2], operands[3],
+		 operands[2]};
+riscv_vector::emit_vlmax_insn (code_for_pred_mul_scalar (, mode),
+   riscv_vector::TERNARY_OP_FRM_DYN, ops);
+DONE;
+  }
+  [(set_attr "type" "vfmuladd")]
+)
diff --g

[PATCH] rtl-ssa: Reject non-address uses of autoinc regs [PR120347]

2025-05-22 Thread Richard Sandiford
As the rtl.texi documentation of RTX_AUTOINC expressions says:

  If a register used as the operand of these expressions is used in
  another address in an insn, the original value of the register is
  used.  Uses of the register outside of an address are not permitted
  within the same insn as a use in an embedded side effect expression
  because such insns behave differently on different machines and hence
  must be treated as ambiguous and disallowed.

late-combine was failing to follow this rule.  One option would have
been to enforce it during the substitution phase, like combine does.
This could either be a dedicated condition in the substitution code
or, more generally, an extra condition in can_merge_accesses.
(The latter would include extending is_pre_post_modify to uses.)

However, since the restriction applies to patterns rather than to
actions on patterns, the more robust fix seemed to be to test for and reject
this case in (a subroutine of) rtl_ssa::recog.  We already do something
similar for hard-coded register clobbers.

Using vec_rtx_properties isn't the lightest-weight operation
out there.  I did wonder about relying on the is_pre_post_modify
flag of the definitions in the new_defs array, but that would
require callers that create new autoincs to set the flag before
calling recog.  Normally these flags are instead updated
automatically based on the final pattern.

Besides, recog itself has had to traverse the whole pattern,
and it is even less light-weight than vec_rtx_properties.
At least the pattern should be in cache.
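
For readers less familiar with rtl-ssa, the rule being enforced can be
sketched outside GCC as follows.  This is only an illustration under
stated assumptions: reg_ref and autoinc_uses_ok are hypothetical names,
and the three fields merely mirror what the patch reads from
rtx_obj_reference (regno, is_pre_post_modify, in_address).

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for one register reference in an insn pattern.
struct reg_ref
{
  unsigned regno;           // register number
  bool is_pre_post_modify;  // reference is the side effect of an autoinc
  bool in_address;          // reference occurs inside a memory address
};

// Return true if the pattern obeys the rtl.texi rule: a register that is
// auto-modified must not also be referenced outside an address.
bool
autoinc_uses_ok (const std::vector<reg_ref> &refs)
{
  for (const reg_ref &def : refs)
    if (def.is_pre_post_modify)
      for (const reg_ref &use : refs)
        if (def.regno == use.regno && !use.in_address)
          return false;
  return true;
}
```

A store of a register's own value through a post-incremented address in
that same register, which is roughly what the new test exercises, would
be modelled as {{3, true, true}, {3, false, false}} and rejected.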

Tested on arm-linux-gnueabihf, aarch64-linux-gnu and
x86_64-linux-gnu.  OK for trunk and backports?

Richard


gcc/
PR rtl-optimization/120347
* rtl-ssa/changes.cc (recog_level2): Check whether an
RTX_AUTOINCed register also appears outside of an address.

gcc/testsuite/
PR rtl-optimization/120347
* gcc.dg/torture/pr120347.c: New test.
---
 gcc/rtl-ssa/changes.cc  | 18 ++
 gcc/testsuite/gcc.dg/torture/pr120347.c | 13 +
 2 files changed, 31 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr120347.c

diff --git a/gcc/rtl-ssa/changes.cc b/gcc/rtl-ssa/changes.cc
index eb579ad3ad7..f7aa6a66cdf 100644
--- a/gcc/rtl-ssa/changes.cc
+++ b/gcc/rtl-ssa/changes.cc
@@ -1106,6 +1106,24 @@ recog_level2 (insn_change &change, add_regno_clobber_fn 
add_regno_clobber)
}
}
 
+  // Per rtl.texi, registers that are modified using RTX_AUTOINC operations
+  // cannot also appear outside an address.
+  vec_rtx_properties properties;
+  properties.add_pattern (pat);
+  for (rtx_obj_reference def : properties.refs ())
+if (def.is_pre_post_modify ())
+  for (rtx_obj_reference use : properties.refs ())
+   if (def.regno == use.regno && !use.in_address ())
+ {
+   if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+   fprintf (dump_file, "register %d is both auto-modified"
+" and used outside an address:\n", def.regno);
+   print_rtl_single (dump_file, pat);
+ }
+   return false;
+ }
+
   // check_asm_operands checks the constraints after RA, so we don't
   // need to do it again.
   if (reload_completed && !asm_p)
diff --git a/gcc/testsuite/gcc.dg/torture/pr120347.c 
b/gcc/testsuite/gcc.dg/torture/pr120347.c
new file mode 100644
index 000..a2d187bbc5c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr120347.c
@@ -0,0 +1,13 @@
+/* { dg-do assemble } */
+/* { dg-additional-options "-march=armv7-a -mthumb" { target { arm_arch_v7a_ok 
&& arm_thumb2_ok } } } */
+
+void *end;
+void **start;
+void main(void)
+{
+  for (; end; start++) {
+if (*start)
+  return;
+*start = start;
+  }
+}
-- 
2.43.0



[PATCH v2] [aarch64] [vxworks] mark x18 as fixed, adjust tests

2025-05-22 Thread Alexandre Oliva
On May 21, 2025, Richard Sandiford  wrote:

> I think this one shows a deeper issue, though.  -fsanitize=shadow-call-stack
> is currently hardcoded to use x18:

Oh, indeed!

> and I assume this usage will be incompatible with the TCB usage.

> So I think instead we should emit a sorry() if -fsanitize=shadow-call-stack
> is used on VxWorks.

Agreed.  Here's a revised version that implements sorry(), introduces
TARGET_OS_USES_R18 to guard that and the fixed-register setting, and
skips the tests that exercise -fsanitize=shadow-call-stack.

Tested with gcc-14 on aarch64-vxworks7r2.  Ok to install?


[aarch64] [vxworks] mark x18 as fixed, adjust tests

VxWorks uses x18 as the TCB pointer, so STATIC_CHAIN_REGNUM has long been set
(in gcc/config/aarch64/aarch64-vxworks.h) to use x9 instead.

This patch marks x18 as fixed if the newly-introduced
TARGET_OS_USES_R18 is defined, so that it is not chosen by the
register allocator, rejects -fsanitize=shadow-call-stack due to the
register conflict, and adjusts tests that depend on x18 or on the
static chain register.
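
The resulting option check can be summarised by the sketch below.  It is
a stand-alone model, not the GCC code: scs_diag and
check_shadow_call_stack are hypothetical names, and the three booleans
stand in for the sanitizer flag, fixed_regs[R18_REGNUM] and whether
TARGET_OS_USES_R18 is defined.

```cpp
#include <cassert>

// Hypothetical outcome of the option check; GCC itself reports these
// through error () and sorry () rather than returning a value.
enum class scs_diag { none, error_needs_fixed_x18, sorry_os_uses_x18 };

scs_diag
check_shadow_call_stack (bool scs_requested, bool x18_fixed, bool os_uses_x18)
{
  if (!scs_requested)
    return scs_diag::none;
  // The shadow call stack pointer lives in x18, so x18 must be fixed.
  if (!x18_fixed)
    return scs_diag::error_needs_fixed_x18;
  // x18 is fixed, but the OS already owns it (e.g. as a TCB pointer).
  if (os_uses_x18)
    return scs_diag::sorry_os_uses_x18;
  return scs_diag::none;
}
```

Note that the sorry case is only reachable once x18 is fixed, which
matches the "else" arm in the patch.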


for  gcc/ChangeLog

* config/aarch64/aarch64-vxworks.h (TARGET_OS_USES_R18): Define.
Update comments.
* config/aarch64/aarch64.cc (aarch64_conditional_register_usage):
Mark x18 as fixed on VxWorks.
(aarch64_override_options_internal): Issue sorry message on
-fsanitize=shadow-call-stack if TARGET_OS_USES_R18.

for  gcc/testsuite/ChangeLog

* gcc.dg/cwsc1.c (CHAIN, aarch64): x9 instead of x18 for __vxworks.
* gcc.target/aarch64/reg-alloc-4.c: Drop x18-assigned asm
operand on vxworks.
* gcc.target/aarch64/shadow_call_stack_1.c: Don't expect
-ffixed-x18 error on vxworks, but rather the sorry message.
* gcc.target/aarch64/shadow_call_stack_2.c: Skip on vxworks.
* gcc.target/aarch64/shadow_call_stack_3.c: Likewise.
* gcc.target/aarch64/shadow_call_stack_4.c: Likewise.
* gcc.target/aarch64/shadow_call_stack_5.c: Likewise.
* gcc.target/aarch64/shadow_call_stack_6.c: Likewise.
* gcc.target/aarch64/shadow_call_stack_7.c: Likewise.
* gcc.target/aarch64/shadow_call_stack_8.c: Likewise.
* gcc.target/aarch64/stack-check-prologue-19.c: Likewise.
* gcc.target/aarch64/stack-check-prologue-20.c: Likewise.
---
 gcc/config/aarch64/aarch64-vxworks.h   |7 +++
 gcc/config/aarch64/aarch64.cc  |   21 +---
 gcc/testsuite/gcc.dg/cwsc1.c   |6 +-
 gcc/testsuite/gcc.target/aarch64/reg-alloc-4.c |2 ++
 .../gcc.target/aarch64/shadow_call_stack_1.c   |4 +++-
 .../gcc.target/aarch64/shadow_call_stack_2.c   |1 +
 .../gcc.target/aarch64/shadow_call_stack_3.c   |1 +
 .../gcc.target/aarch64/shadow_call_stack_4.c   |1 +
 .../gcc.target/aarch64/shadow_call_stack_5.c   |1 +
 .../gcc.target/aarch64/shadow_call_stack_6.c   |1 +
 .../gcc.target/aarch64/shadow_call_stack_7.c   |1 +
 .../gcc.target/aarch64/shadow_call_stack_8.c   |1 +
 .../gcc.target/aarch64/stack-check-prologue-19.c   |1 +
 .../gcc.target/aarch64/stack-check-prologue-20.c   |1 +
 14 files changed, 40 insertions(+), 9 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-vxworks.h 
b/gcc/config/aarch64/aarch64-vxworks.h
index 41adada9b1de3..7b4da934b6083 100644
--- a/gcc/config/aarch64/aarch64-vxworks.h
+++ b/gcc/config/aarch64/aarch64-vxworks.h
@@ -66,9 +66,8 @@ along with GCC; see the file COPYING3.  If not see
 #define VXWORKS_PERSONALITY "llvm"
 
 /* VxWorks uses R18 as a TCB pointer.  We must pick something else as
-   the static chain and R18 needs to be claimed "fixed".  Until we
-   arrange to override the common parts of the port family to
-   acknowledge the latter, configure --with-specs="-ffixed-r18".  */
+   the static chain and R18 needs to be claimed "fixed" (TARGET_OS_USES_R18
+   does that in aarch64_conditional_register_usage).  */
 #undef  STATIC_CHAIN_REGNUM
 #define STATIC_CHAIN_REGNUM 9
-
+#define TARGET_OS_USES_R18
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 1da615c8955a4..ec9da0ed60c6f 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -18819,9 +18819,16 @@ aarch64_override_options_internal (struct gcc_options 
*opts)
   aarch64_stack_protector_guard_offset = offs;
 }
 
-  if ((flag_sanitize & SANITIZE_SHADOW_CALL_STACK)
-  && !fixed_regs[R18_REGNUM])
-error ("%<-fsanitize=shadow-call-stack%> requires %<-ffixed-x18%>");
+  if ((flag_sanitize & SANITIZE_SHADOW_CALL_STACK))
+{
+  if (!fixed_regs[R18_REGNUM])
+   error ("%<-fsanitize=shadow-call-stack%> requires %<-ffixed-x18%>");
+#ifdef TARGET_OS_USES_R18
+  else
+   sorry ("%<-fsanitize=shadow-call-stack%> conflicts with the use of"
+  " register x18 by the target operating system");
+#endif
+}
 
   aarch64_feature_flags isa_flags = aar

Re: [PATCH 1/2] libstdc++: Define _Scoped_allocation RAII helper

2025-05-22 Thread Tomasz Kaminski
On Thu, May 22, 2025 at 12:21 PM Daniel Krügler 
wrote:

>
>
> On Thu, 22 May 2025 at 11:41, Tomasz Kamiński <
> tkami...@redhat.com> wrote:
>
>> From: Jonathan Wakely 
>>
>> libstdc++-v3/ChangeLog:
>>
>> * include/bits/allocated_ptr.h (_Scoped_allocation): New class
>> template.
>>
>> Co-Authored-By: Tomasz Kamiński 
>> Signed-off-by: Tomasz Kamiński 
>> ---
>> Tested on x86_64-linux. OK for trunk?
>>
>>  libstdc++-v3/include/bits/allocated_ptr.h | 96 +++
>>  1 file changed, 96 insertions(+)
>>
>> diff --git a/libstdc++-v3/include/bits/allocated_ptr.h
>> b/libstdc++-v3/include/bits/allocated_ptr.h
>> index 0b2b6fe5820..aa5355f0e2f 100644
>> --- a/libstdc++-v3/include/bits/allocated_ptr.h
>> +++ b/libstdc++-v3/include/bits/allocated_ptr.h
>> @@ -36,6 +36,7 @@
>>  # include 
>>  # include 
>>  # include 
>> +# include 
>>
>>  namespace std _GLIBCXX_VISIBILITY(default)
>>  {
>> @@ -136,6 +137,101 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>return { std::__allocate_guarded(__a) };
>>  }
>>
>> +  // An RAII type that acquires memory from an allocator.
>> +  // N.B.  'scoped' here is in the RAII sense, not the scoped allocator
>> model,
>> +  // so this has nothing to do with `std::scoped_allocator_adaptor`.
>> +  // This class can be used to simplify the common pattern:
>> +  //
>> +  // auto ptr = alloc.allocate(1);
>> +  // try {
>> +  //   std::construct_at(std::to_address(ptr), args);
>> +  //   m_ptr = ptr;
>> +  // } catch (...) {
>> +  //   alloc.deallocate(ptr, 1);
>> +  //   throw;
>> +  // }
>> +  //
>> +  // Instead you can do:
>> +  //
>> +  // _Scoped_allocation sa(alloc);
>> +  // m_ptr = std::construct_at(sa.get(), args);
>> +  // (void) sa.release();
>> +  //
>> +  // Or even simpler:
>> +  //
>> +  // _Scoped_allocation sa(alloc, std::in_place, args);
>> +  // m_ptr = sa.release();
>> +  //
>> +  template
>> +struct _Scoped_allocation
>> +{
>> +  using value_type = typename allocator_traits<_Alloc>::value_type;
>> +  using pointer = typename allocator_traits<_Alloc>::pointer;
>> +
>> +  // Use `a` to allocate memory for `n` objects.
>> +  constexpr explicit
>> +  _Scoped_allocation(const _Alloc& __a, size_t __n = 1)
>> +  : _M_a(__a), _M_n(__n), _M_p(_M_a.allocate(__n))
>> +  { }
>> +
>> +#if __glibcxx_optional >= 201606L
>> +  // Allocate memory for a single object and if that succeeds,
>> +  // construct an object using args.
>> +  //
>> +  // Does not do uses-allocator construction; don't use if you need
>> that.
>> +  //
>> +  // CAUTION: the destructor will *not* destroy this object, it will
>> only
>> +  // free the memory. That means the following pattern is unsafe:
>> +  //
>> +  // _Scoped_allocation  sa(alloc, in_place, args);
>> +  // potentially_throwing_operations();
>> +  // return sa.release();
>> +  //
>> +  // If the middle operation throws, the object will not be
>> destroyed.
>> +  template
>> +   constexpr explicit
>> +   _Scoped_allocation(const _Alloc& __a, in_place_t, _Args&&...
>> __args)
>> +   : _Scoped_allocation(__a, 1)
>> +   {
>> + // The target constructor has completed, so if the next line
>> throws,
>> + // the destructor will deallocate the memory.
>> + allocator_traits<_Alloc>::construct(_M_a, get(),
>> +
>>  std::forward<_Args>(__args)...);
>> +   }
>> +#endif
>> +
>> +  _GLIBCXX20_CONSTEXPR
>> +  ~_Scoped_allocation()
>> +  {
>> +   if (_M_p) [[__unlikely__]]
>> + _M_a.deallocate(_M_p, _M_n);
>>
>
> Why is the situation *unlikely* that _M_p has a non-nullptr content?
> Shouldn't that actually be likely?
>
This object manages the allocation of raw memory, not the objects
constructed in it. So the non-exceptional code path looks like:
  _Scoped_allocation a(alloc, in_place, args);
  _M_ptr = a.release();
and release() sets _M_p to nullptr.

In other words, the body of this if is executed only when an exception is
thrown.
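
To make the two paths concrete, here is a simplified sketch.  It is not
the libstdc++ class: scoped_allocation, counting_allocator, thrower and
make_owned are hypothetical names used for illustration, and the sketch
assumes an allocator whose pointer type is a raw pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <stdexcept>
#include <utility>

// Hypothetical, simplified analogue of the helper under review.
template<typename Alloc>
struct scoped_allocation
{
  using traits = std::allocator_traits<Alloc>;
  using pointer = typename traits::pointer;

  explicit scoped_allocation (const Alloc &a, std::size_t n = 1)
  : m_a (a), m_n (n), m_p (traits::allocate (m_a, n))
  { }

  ~scoped_allocation ()
  {
    // True only if release () was never called, i.e. during unwinding.
    if (m_p)
      traits::deallocate (m_a, m_p, m_n);
  }

  scoped_allocation (scoped_allocation &&) = delete;

  pointer get () const noexcept { return m_p; }
  pointer release () noexcept { return std::exchange (m_p, nullptr); }

private:
  Alloc m_a;
  std::size_t m_n;
  pointer m_p;
};

// Hypothetical allocator that counts outstanding allocations.
template<typename T>
struct counting_allocator
{
  using value_type = T;
  static int live;
  counting_allocator () = default;
  template<typename U> counting_allocator (const counting_allocator<U> &) { }
  T *allocate (std::size_t n)
  { ++live; return std::allocator<T> ().allocate (n); }
  void deallocate (T *p, std::size_t n)
  { --live; std::allocator<T> ().deallocate (p, n); }
};
template<typename T> int counting_allocator<T>::live = 0;

// Hypothetical type whose constructor can be made to throw.
struct thrower
{
  explicit thrower (bool fail)
  { if (fail) throw std::runtime_error ("ctor failed"); }
};

// Allocate, construct, then hand ownership to the caller.  On the normal
// path release () nulls the pointer, so the destructor does nothing.
template<typename Alloc, typename... Args>
typename Alloc::value_type *
make_owned (const Alloc &a, Args &&...args)
{
  using T = typename Alloc::value_type;
  scoped_allocation<Alloc> sa (a);
  assert (sa.get () != nullptr);
  T *p = ::new (static_cast<void *> (sa.get ())) T (std::forward<Args> (args)...);
  (void) sa.release ();
  return p;
}
```

Calling make_owned (a, false) leaves one live allocation that the caller
must destroy and deallocate; make_owned (a, true) throws and the
destructor returns the memory, which is the only case in which the if
body runs.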

>
> - Daniel
>
>
>> +  }
>> +
>> +  _Scoped_allocation(_Scoped_allocation&&) = delete;
>> +
>> +  constexpr _Alloc
>> +  get_allocator() const noexcept { return _M_a; }
>> +
>> +  constexpr value_type*
>> +  get() const noexcept
>> +  { return std::__to_address(_M_p); }
>> +
>> +  [[__nodiscard__]]
>> +  constexpr pointer
>> +  release() noexcept { return std::__exchange(_M_p, nullptr); }
>> +
>> +private:
>> +  [[__no_unique_address__]] _Alloc _M_a;
>> +  size_t _M_n;
>> +  pointer _M_p;
>> +};
>> +
>> +#if __glibcxx_optional >= 201606L && __cpp_deduction_guides >= 201606L
>> +  template
>> +_Scoped_allocation(_Alloc, in_place_t, _Args...)
>> +  -> _Scoped_allocation<_Alloc>;
>> +#endif
>> +
>>  /// @endcond
>>  _GLIBCXX_END_NAMESPACE_VERSION
>>  } // namespace std
>> --
>> 2.49.0
>>
>>


Re: [PATCH v4 1/2] aarch64: Match unpredicated shift patterns for ADR, SRA and ADDHNB instructions

2025-05-22 Thread Richard Sandiford
 writes:
> From: Dhruv Chawla 
>
> This patch modifies the shift expander to immediately lower constant
> shifts without unspec. It also modifies the ADR, SRA and ADDHNB patterns
> to match the lowered forms of the shifts, as the predicate register is
> not required for these instructions.
>
> Bootstrapped and regtested on aarch64-linux-gnu.
>
> Signed-off-by: Dhruv Chawla 
> Co-authored-by: Richard Sandiford 
>
> gcc/ChangeLog:
>
>   * gcc/config/aarch64/aarch64-sve.md (@aarch64_adr_shift):
>   Match lowered form of ashift.
>   (*aarch64_adr_shift): Likewise.
>   (*aarch64_adr_shift_sxtw): Likewise.
>   (*aarch64_adr_shift_uxtw): Likewise.
>   (3): Check amount instead of operands[2] in
>   aarch64_sve_shift_operand.
>   (v3): Generate unpredicated shifts for constant
>   operands.
>   (@aarch64_pred_): Convert to a define_expand.
>   (*aarch64_pred_): Create define_insn_and_split pattern
>   from @aarch64_pred_.
>   (*post_ra_v_ashl3): Rename to ...
>   (aarch64_vashl3_const): ... this and remove reload requirement.
>   (*post_ra_v_3): Rename to ...
>   (aarch64_v3_const): ... this and remove reload
>   requirement.
>   * gcc/config/aarch64/aarch64-sve2.md
>   (@aarch64_sve_add_): Match lowered form of
>   SHIFTRT.
>   (*aarch64_sve2_sra): Likewise.
>   (*bitmask_shift_plus): Match lowered form of lshiftrt.
> ---
>  gcc/config/aarch64/aarch64-sve.md  | 119 +++--
>  gcc/config/aarch64/aarch64-sve2.md |  46 ---
>  2 files changed, 75 insertions(+), 90 deletions(-)

OK, thanks.

It doesn't look like you're listed in MAINTAINERS as having write access.
If that's right, and if you'd like access, please follow the instructions
in https://gcc.gnu.org/gitwrite.html (I'll sponsor).

Richard

>
> diff --git a/gcc/config/aarch64/aarch64-sve.md 
> b/gcc/config/aarch64/aarch64-sve.md
> index bf7569f932b..e1ec778b10d 100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -4234,80 +4234,57 @@
>  (define_expand "@aarch64_adr_shift"
>[(set (match_operand:SVE_FULL_SDI 0 "register_operand")
>   (plus:SVE_FULL_SDI
> -   (unspec:SVE_FULL_SDI
> - [(match_dup 4)
> -  (ashift:SVE_FULL_SDI
> -(match_operand:SVE_FULL_SDI 2 "register_operand")
> -(match_operand:SVE_FULL_SDI 3 "const_1_to_3_operand"))]
> - UNSPEC_PRED_X)
> +   (ashift:SVE_FULL_SDI
> + (match_operand:SVE_FULL_SDI 2 "register_operand")
> + (match_operand:SVE_FULL_SDI 3 "const_1_to_3_operand"))
> (match_operand:SVE_FULL_SDI 1 "register_operand")))]
>"TARGET_SVE && TARGET_NON_STREAMING"
> -  {
> -operands[4] = CONSTM1_RTX (mode);
> -  }
>  )
>  
> -(define_insn_and_rewrite "*aarch64_adr_shift"
> +(define_insn "*aarch64_adr_shift"
>[(set (match_operand:SVE_24I 0 "register_operand" "=w")
>   (plus:SVE_24I
> -   (unspec:SVE_24I
> - [(match_operand 4)
> -  (ashift:SVE_24I
> -(match_operand:SVE_24I 2 "register_operand" "w")
> -(match_operand:SVE_24I 3 "const_1_to_3_operand"))]
> - UNSPEC_PRED_X)
> +   (ashift:SVE_24I
> + (match_operand:SVE_24I 2 "register_operand" "w")
> + (match_operand:SVE_24I 3 "const_1_to_3_operand"))
> (match_operand:SVE_24I 1 "register_operand" "w")))]
>"TARGET_SVE && TARGET_NON_STREAMING"
>"adr\t%0., [%1., %2., lsl %3]"
> -  "&& !CONSTANT_P (operands[4])"
> -  {
> -operands[4] = CONSTM1_RTX (mode);
> -  }
>  )
>  
>  ;; Same, but with the index being sign-extended from the low 32 bits.
>  (define_insn_and_rewrite "*aarch64_adr_shift_sxtw"
>[(set (match_operand:VNx2DI 0 "register_operand" "=w")
>   (plus:VNx2DI
> -   (unspec:VNx2DI
> - [(match_operand 4)
> -  (ashift:VNx2DI
> -(unspec:VNx2DI
> -  [(match_operand 5)
> -   (sign_extend:VNx2DI
> - (truncate:VNx2SI
> -   (match_operand:VNx2DI 2 "register_operand" "w")))]
> -  UNSPEC_PRED_X)
> -(match_operand:VNx2DI 3 "const_1_to_3_operand"))]
> - UNSPEC_PRED_X)
> +   (ashift:VNx2DI
> + (unspec:VNx2DI
> +   [(match_operand 4)
> +(sign_extend:VNx2DI
> +  (truncate:VNx2SI
> +(match_operand:VNx2DI 2 "register_operand" "w")))]
> +  UNSPEC_PRED_X)
> + (match_operand:VNx2DI 3 "const_1_to_3_operand"))
> (match_operand:VNx2DI 1 "register_operand" "w")))]
>"TARGET_SVE && TARGET_NON_STREAMING"
>"adr\t%0.d, [%1.d, %2.d, sxtw %3]"
> -  "&& (!CONSTANT_P (operands[4]) || !CONSTANT_P (operands[5]))"
> +  "&& !CONSTANT_P (operands[4])"
>{
> -operands[5] = operands[4] = CONSTM1_RTX (VNx2BImode);
> +operands[4] = CONSTM1_RTX (VNx2BImode);
>}
>  )
>  
>  ;; Same, but with the index being zero-extended from the low 32 bits.
> -(define_insn_and_

Re: [PATCH 1/2] libstdc++: Define _Scoped_allocation RAII helper

2025-05-22 Thread Daniel Krügler
On Thu, 22 May 2025 at 12:25, Tomasz Kaminski <
tkami...@redhat.com> wrote:

>
>
> On Thu, May 22, 2025 at 12:21 PM Daniel Krügler 
> wrote:
>
>>
>>
>> On Thu, 22 May 2025 at 11:41, Tomasz Kamiński <
>> tkami...@redhat.com> wrote:
>>
>>> From: Jonathan Wakely 
>>>
>>> libstdc++-v3/ChangeLog:
>>>
>>> * include/bits/allocated_ptr.h (_Scoped_allocation): New class
>>> template.
>>>
>>> Co-Authored-By: Tomasz Kamiński 
>>> Signed-off-by: Tomasz Kamiński 
>>> ---
>>> Tested on x86_64-linux. OK for trunk?
>>>
>>>  libstdc++-v3/include/bits/allocated_ptr.h | 96 +++
>>>  1 file changed, 96 insertions(+)
>>>
>>> diff --git a/libstdc++-v3/include/bits/allocated_ptr.h
>>> b/libstdc++-v3/include/bits/allocated_ptr.h
>>> index 0b2b6fe5820..aa5355f0e2f 100644
>>> --- a/libstdc++-v3/include/bits/allocated_ptr.h
>>> +++ b/libstdc++-v3/include/bits/allocated_ptr.h
>>> @@ -36,6 +36,7 @@
>>>  # include 
>>>  # include 
>>>  # include 
>>> +# include 
>>>
>>>  namespace std _GLIBCXX_VISIBILITY(default)
>>>  {
>>> @@ -136,6 +137,101 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>return { std::__allocate_guarded(__a) };
>>>  }
>>>
>>> +  // An RAII type that acquires memory from an allocator.
>>> +  // N.B.  'scoped' here is in the RAII sense, not the scoped allocator
>>> model,
>>> +  // so this has nothing to do with `std::scoped_allocator_adaptor`.
>>> +  // This class can be used to simplify the common pattern:
>>> +  //
>>> +  // auto ptr = alloc.allocate(1);
>>> +  // try {
>>> +  //   std::construct_at(std::to_address(ptr), args);
>>> +  //   m_ptr = ptr;
>>> +  // } catch (...) {
>>> +  //   alloc.deallocate(ptr, 1);
>>> +  //   throw;
>>> +  // }
>>> +  //
>>> +  // Instead you can do:
>>> +  //
>>> +  // _Scoped_allocation sa(alloc);
>>> +  // m_ptr = std::construct_at(sa.get(), args);
>>> +  // (void) sa.release();
>>> +  //
>>> +  // Or even simpler:
>>> +  //
>>> +  // _Scoped_allocation sa(alloc, std::in_place, args);
>>> +  // m_ptr = sa.release();
>>> +  //
>>> +  template
>>> +struct _Scoped_allocation
>>> +{
>>> +  using value_type = typename allocator_traits<_Alloc>::value_type;
>>> +  using pointer = typename allocator_traits<_Alloc>::pointer;
>>> +
>>> +  // Use `a` to allocate memory for `n` objects.
>>> +  constexpr explicit
>>> +  _Scoped_allocation(const _Alloc& __a, size_t __n = 1)
>>> +  : _M_a(__a), _M_n(__n), _M_p(_M_a.allocate(__n))
>>> +  { }
>>> +
>>> +#if __glibcxx_optional >= 201606L
>>> +  // Allocate memory for a single object and if that succeeds,
>>> +  // construct an object using args.
>>> +  //
>>> +  // Does not do uses-allocator construction; don't use if you need
>>> that.
>>> +  //
>>> +  // CAUTION: the destructor will *not* destroy this object, it
>>> will only
>>> +  // free the memory. That means the following pattern is unsafe:
>>> +  //
>>> +  // _Scoped_allocation  sa(alloc, in_place, args);
>>> +  // potentially_throwing_operations();
>>> +  // return sa.release();
>>> +  //
>>> +  // If the middle operation throws, the object will not be
>>> destroyed.
>>> +  template
>>> +   constexpr explicit
>>> +   _Scoped_allocation(const _Alloc& __a, in_place_t, _Args&&...
>>> __args)
>>> +   : _Scoped_allocation(__a, 1)
>>> +   {
>>> + // The target constructor has completed, so if the next line
>>> throws,
>>> + // the destructor will deallocate the memory.
>>> + allocator_traits<_Alloc>::construct(_M_a, get(),
>>> +
>>>  std::forward<_Args>(__args)...);
>>> +   }
>>> +#endif
>>> +
>>> +  _GLIBCXX20_CONSTEXPR
>>> +  ~_Scoped_allocation()
>>> +  {
>>> +   if (_M_p) [[__unlikely__]]
>>> + _M_a.deallocate(_M_p, _M_n);
>>>
>>
>> Why is the situation *unlikely* that _M_p has a non-nullptr content?
>> Shouldn't that actually be likely?
>>
> This object manages the allocation of raw memory, not the objects
> constructed in it. So the non-exceptional code path looks like:
>   _Scoped_allocation a(alloc, in_place, args);
>   _M_ptr = a.release();
> and release() sets _M_p to nullptr.
>
> In other words, the body of this if is executed only when an exception is
> thrown.
>

Got it - thanks!

- Daniel


Re: [PATCH v4 2/2] aarch64: Fold lsl+lsr+orr to rev for half-width shifts

2025-05-22 Thread Richard Sandiford
 writes:
> [...]
> +;; The RTL combiners are able to combine "ior (ashift, ashiftrt)" to a 
> "bswap".
> +;; Match that as well.
> +(define_insn_and_split "*v_revvnx8hi"
> +  [(parallel
> +[(set (match_operand:VNx8HI 0 "register_operand")
> +   (bswap:VNx8HI (match_operand 1 "register_operand")))
> + (clobber (match_scratch:VNx8BI 2))])]

Sorry for not noticing last time, but operand 0 should have a "=w"
constraint, operand 1 should have a "w" constraint, and the match_scratch
should have a "=Upl" constraint.

> +  "TARGET_SVE"
> +  "#"
> +  ""

The last line should be "&& 1", since the TARGET_SVE test doesn't
automatically apply to the define_split.

> +  [(set (match_dup 0)
> + (unspec:VNx8HI
> +   [(match_dup 2)
> +(unspec:VNx8HI
> +  [(match_dup 1)]
> +  UNSPEC_REVB)]
> +   UNSPEC_PRED_X))]
> +  {
> +if (!can_create_pseudo_p ())
> +  operands[2] = CONSTM1_RTX (VNx8BImode);
> +else
> +  operands[2] = aarch64_ptrue_reg (VNx8BImode);

This should be:

if (!can_create_pseudo_p ())
  emit_move_insn (operands[2], CONSTM1_RTX (VNx8BImode));
else
  operands[2] = aarch64_ptrue_reg (VNx8BImode);

That is, after register allocation, the pattern gives us a scratch
predicate register, but we need to initialise it to a ptrue.

> +  }
> +)
> +
>  ;; Predicated integer unary operations.
>  (define_insn "@aarch64_pred_"
>[(set (match_operand:SVE_FULL_I 0 "register_operand")
> [...]
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/shift_rev_1.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/shift_rev_1.c
> new file mode 100644
> index 000..3a30f80d152
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/shift_rev_1.c
> @@ -0,0 +1,83 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -march=armv8.2-a+sve" } */
> +/* { dg-final { check-function-bodies "**" "" "" } } */
> +
> +#include 
> +
> +/*
> +** ror32_sve_lsl_imm:
> +**   ptrue   p3.b, all
> +**   revw    z0.d, p3/m, z0.d

There's no requirement to choose p3 for the predicate, so this would
be better as:

**  ptrue   (p[0-3]).b, all
**  revw    z0.d, \1/m, z0.d

Same for the others.
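
As an illustration of why the capture group helps, the sketch below
applies the same idea with std::regex.  It is only an analogue:
check-function-bodies uses Tcl regular expressions, and
matches_revw_body is a hypothetical helper, not part of the testsuite.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Accept any of p0-p3 for the ptrue, but require the revw to use the
// same predicate register, via the \1 backreference.
bool
matches_revw_body (const std::string &asm_text)
{
  static const std::regex body (
    R"re(\s*ptrue\s+(p[0-3])\.b, all\n\s*revw\s+z0\.d, \1/m, z0\.d\n?)re");
  return std::regex_match (asm_text, body);
}
```

With this pattern a body that uses p1 for both instructions is accepted
just like one that uses p3, while a body that mixes two different
predicate registers is rejected.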

OK with those changes, thanks.

Richard


Re: [PATCH v2] [aarch64] [vxworks] mark x18 as fixed, adjust tests

2025-05-22 Thread Richard Sandiford
Alexandre Oliva  writes:
> On May 21, 2025, Richard Sandiford  wrote:
>
>> I think this one shows a deeper issue, though.  -fsanitize=shadow-call-stack
>> is currently hardcoded to use x18:
>
> Oh, indeed!
>
>> and I assume this usage will be incompatible with the TCB usage.
>
>> So I think instead we should emit a sorry() if -fsanitize=shadow-call-stack
>> is used on VxWorks.
>
> Agreed.  Here's a revised version that implements sorry(), introduces
> TARGET_OS_USES_R18 to guard that and the fixed-register setting, and
> skips the tests that exercise -fsanitize=shadow-call-stack.
>
> Tested with gcc-14 on aarch64-vxworks7r2.  Ok to install?
>
>
> [aarch64] [vxworks] mark x18 as fixed, adjust tests
>
> VxWorks uses x18 as the TCB pointer, so STATIC_CHAIN_REGNUM has long been set
> (in gcc/config/aarch64/aarch64-vxworks.h) to use x9 instead.
>
> This patch marks x18 as fixed if the newly-introduced
> TARGET_OS_USES_R18 is defined, so that it is not chosen by the
> register allocator, rejects -fsanitize=shadow-call-stack due to the
> register conflict, and adjusts tests that depend on x18 or on the
> static chain register.
>
>
> for  gcc/ChangeLog
>
>   * config/aarch64/aarch64-vxworks.h (TARGET_OS_USES_R18): Define.
>   Update comments.
>   * config/aarch64/aarch64.cc (aarch64_conditional_register_usage):
>   Mark x18 as fixed on VxWorks.
>   (aarch64_override_options_internal): Issue sorry message on
>   -fsanitize=shadow-call-stack if TARGET_OS_USES_R18.
>
> for  gcc/testsuite/ChangeLog
>
>   * gcc.dg/cwsc1.c (CHAIN, aarch64): x9 instead of x18 for __vxworks.
>   * gcc.target/aarch64/reg-alloc-4.c: Drop x18-assigned asm
>   operand on vxworks.
>   * gcc.target/aarch64/shadow_call_stack_1.c: Don't expect
>   -ffixed-x18 error on vxworks, but rather the sorry message.
>   * gcc.target/aarch64/shadow_call_stack_2.c: Skip on vxworks.
>   * gcc.target/aarch64/shadow_call_stack_3.c: Likewise.
>   * gcc.target/aarch64/shadow_call_stack_4.c: Likewise.
>   * gcc.target/aarch64/shadow_call_stack_5.c: Likewise.
>   * gcc.target/aarch64/shadow_call_stack_6.c: Likewise.
>   * gcc.target/aarch64/shadow_call_stack_7.c: Likewise.
>   * gcc.target/aarch64/shadow_call_stack_8.c: Likewise.
>   * gcc.target/aarch64/stack-check-prologue-19.c: Likewise.
>   * gcc.target/aarch64/stack-check-prologue-20.c: Likewise.

OK, thanks.

Richard

> ---
>  gcc/config/aarch64/aarch64-vxworks.h   |7 +++
>  gcc/config/aarch64/aarch64.cc  |   21 
> +---
>  gcc/testsuite/gcc.dg/cwsc1.c   |6 +-
>  gcc/testsuite/gcc.target/aarch64/reg-alloc-4.c |2 ++
>  .../gcc.target/aarch64/shadow_call_stack_1.c   |4 +++-
>  .../gcc.target/aarch64/shadow_call_stack_2.c   |1 +
>  .../gcc.target/aarch64/shadow_call_stack_3.c   |1 +
>  .../gcc.target/aarch64/shadow_call_stack_4.c   |1 +
>  .../gcc.target/aarch64/shadow_call_stack_5.c   |1 +
>  .../gcc.target/aarch64/shadow_call_stack_6.c   |1 +
>  .../gcc.target/aarch64/shadow_call_stack_7.c   |1 +
>  .../gcc.target/aarch64/shadow_call_stack_8.c   |1 +
>  .../gcc.target/aarch64/stack-check-prologue-19.c   |1 +
>  .../gcc.target/aarch64/stack-check-prologue-20.c   |1 +
>  14 files changed, 40 insertions(+), 9 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-vxworks.h 
> b/gcc/config/aarch64/aarch64-vxworks.h
> index 41adada9b1de3..7b4da934b6083 100644
> --- a/gcc/config/aarch64/aarch64-vxworks.h
> +++ b/gcc/config/aarch64/aarch64-vxworks.h
> @@ -66,9 +66,8 @@ along with GCC; see the file COPYING3.  If not see
>  #define VXWORKS_PERSONALITY "llvm"
>  
>  /* VxWorks uses R18 as a TCB pointer.  We must pick something else as
> -   the static chain and R18 needs to be claimed "fixed".  Until we
> -   arrange to override the common parts of the port family to
> -   acknowledge the latter, configure --with-specs="-ffixed-r18".  */
> +   the static chain and R18 needs to be claimed "fixed" (TARGET_OS_USES_R18
> +   does that in aarch64_conditional_register_usage).  */
>  #undef  STATIC_CHAIN_REGNUM
>  #define STATIC_CHAIN_REGNUM 9
> -
> +#define TARGET_OS_USES_R18
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 1da615c8955a4..ec9da0ed60c6f 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -18819,9 +18819,16 @@ aarch64_override_options_internal (struct 
> gcc_options *opts)
>aarch64_stack_protector_guard_offset = offs;
>  }
>  
> -  if ((flag_sanitize & SANITIZE_SHADOW_CALL_STACK)
> -  && !fixed_regs[R18_REGNUM])
> -error ("%<-fsanitize=shadow-call-stack%> requires %<-ffixed-x18%>");
> +  if ((flag_sanitize & SANITIZE_SHADOW_CALL_STACK))
> +{
> +  if (!fixed_regs[R18_REGNUM])
> + error ("%<-fsanitize=shadow-call-stack%> requires %<-ffixed-x18%>");
> +#ifdef TARGET_OS_USES_R18
> +  

Re: [PATCH v3 4/9] libstdc++: Implement layout_left from mdspan.

2025-05-22 Thread Tomasz Kaminski
On Wed, May 21, 2025 at 11:53 AM Luc Grosheintz 
wrote:

> Implements the parts of layout_left that don't depend on any of the
> other layouts.
>
> libstdc++-v3/ChangeLog:
>
> * include/std/mdspan (layout_left): New class.
>
> Signed-off-by: Luc Grosheintz 
> ---
>  libstdc++-v3/include/std/mdspan | 307 +++-
>  1 file changed, 306 insertions(+), 1 deletion(-)
>
> diff --git a/libstdc++-v3/include/std/mdspan
> b/libstdc++-v3/include/std/mdspan
> index e5b1b2596d9..66c9d2cffac 100644
> --- a/libstdc++-v3/include/std/mdspan
> +++ b/libstdc++-v3/include/std/mdspan
> @@ -144,6 +144,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>   { return __exts[__i]; });
>   }
>
> +   static constexpr span
> +   _S_static_extents(size_t __begin, size_t __end) noexcept
> +   {
> + return {_Extents.data() + __begin, _Extents.data() + __end};
> +   }
> +
> +   constexpr span
> +   _M_dynamic_extents(size_t __begin, size_t __end) const noexcept
> +   requires (_Extents.size() > 0)
> +   {
> + return {_M_dyn_exts + _S_dynamic_index[__begin],
> + _M_dyn_exts + _S_dynamic_index[__end]};
> +   }
> +
>private:
> using _S_storage = __array_traits<_IndexType,
> _S_rank_dynamic>::_Type;
> [[no_unique_address]] _S_storage _M_dyn_exts;
> @@ -160,6 +174,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> || _Extent <= numeric_limits<_IndexType>::max();
>}
>
> +  namespace __mdspan
> +  {
> +template
> +  constexpr span
> +  __static_extents(size_t __begin = 0, size_t __end =
> _Extents::rank())
> +  { return _Extents::_S_storage::_S_static_extents(__begin, __end); }
> +
> +template
> +  constexpr span
> +  __dynamic_extents(const _Extents& __exts, size_t __begin = 0,
> +   size_t __end = _Extents::rank())
> +  {
> +   return __exts._M_exts._M_dynamic_extents(__begin, __end);
> +  }
> +  }
> +
>template
>  class extents
>  {
> @@ -251,7 +281,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> : _M_exts(span(__exts))
> { }
>
> -
>template<__mdspan::__valid_index_type _OIndexType,
> size_t _Nm>
> requires (_Nm == rank() || _Nm == rank_dynamic())
> constexpr explicit(_Nm != rank_dynamic())
> @@ -276,6 +305,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> }
>
>  private:
> +  friend span
> +  __mdspan::__static_extents(size_t, size_t);
> +
> +  friend span
> +  __mdspan::__dynamic_extents(const extents&, size_t,
> size_t);
> +
>using _S_storage = __mdspan::_ExtentsStorage<
> _IndexType, array{_Extents...}>;
>[[no_unique_address]] _S_storage _M_exts;
> @@ -286,6 +321,54 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>namespace __mdspan
>{
> +template
>
As I suggested in another e-mail, we could pass auto const& here and
instantiate this with a reference to the array that is the NTTP of the storage.

> +  constexpr size_t
> +  __static_extents_prod(size_t __begin, size_t __end)
> +  {
> +   auto __sta_exts = __static_extents<_Extents>(__begin, __end);
> +   size_t __ret = 1;
> +   for (auto __factor : __sta_exts)
> + if (__factor != dynamic_extent)
> +   __ret *= __factor;
> +   return __ret;
> +  }
> +
> +template
> +  constexpr size_t
> +  __dynamic_extents_prod(const _Extents& __exts, size_t __begin,
> +size_t __end)
> +  {
> +   auto __dyn_exts = __dynamic_extents<_Extents>(__exts, __begin,
The template parameter is unnecessary; it can be deduced.
> +__end);
> +   size_t __ret = 1;
> +   for (auto __factor : __dyn_exts)
> +   __ret *= __factor;
> +   return __ret;
> +  }
> +
> +template
> +  constexpr typename _Extents::index_type
> +  __exts_prod(const _Extents& __exts, size_t __begin, size_t __end)
> noexcept
> +  {
> +   using _IndexType = typename _Extents::index_type;
> +   _IndexType __ret = 1;
> +   if constexpr (_Extents::rank_dynamic() != _Extents::rank())
> + __ret = _IndexType(__static_extents_prod<_Extents>(__begin,
> __end));
> +   if constexpr (_Extents::rank_dynamic() > 0)
> + __ret *= __dynamic_extents_prod(__exts, __begin, __end);
>
I would inline the function here:
+   for (auto __factor : __dynamic_extents(__exts, __begin, __end))
+   __ret *= __factor;

> +   return __ret;
> +  }
> +
> +template
> +  constexpr typename _Extents::index_type
> +  __fwd_prod(const _Extents& __exts, size_t __r) noexcept
> +  { return __exts_prod(__exts, 0, __r); }
> +
> +template
> +  constexpr typename _Extents::index_type
> +  __rev_prod(const _Extents& __exts, size_t __r) noexcept
> +  { return __exts_prod(__exts, __r + 1, __exts.rank()); }
> +
>  template
>auto __build_dextents_type(integer

Re: [PATCH 1/2] libstdc++: Define _Scoped_allocation RAII helper

2025-05-22 Thread Daniel Krügler
Am Do., 22. Mai 2025 um 11:41 Uhr schrieb Tomasz Kamiński <
tkami...@redhat.com>:

> From: Jonathan Wakely 
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/allocated_ptr.h (_Scoped_allocation): New class
> template.
>
> Co-Authored-By: Tomasz Kamiński 
> Signed-off-by: Tomasz Kamiński 
> ---
> Tested on x86_64-linux. OK for trunk?
>
>  libstdc++-v3/include/bits/allocated_ptr.h | 96 +++
>  1 file changed, 96 insertions(+)
>
> diff --git a/libstdc++-v3/include/bits/allocated_ptr.h
> b/libstdc++-v3/include/bits/allocated_ptr.h
> index 0b2b6fe5820..aa5355f0e2f 100644
> --- a/libstdc++-v3/include/bits/allocated_ptr.h
> +++ b/libstdc++-v3/include/bits/allocated_ptr.h
> @@ -36,6 +36,7 @@
>  # include 
>  # include 
>  # include 
> +# include 
>
>  namespace std _GLIBCXX_VISIBILITY(default)
>  {
> @@ -136,6 +137,101 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>return { std::__allocate_guarded(__a) };
>  }
>
> +  // An RAII type that acquires memory from an allocator.
> +  // N.B.  'scoped' here is in the RAII sense, not the scoped allocator
> model,
> +  // so this has nothing to do with `std::scoped_allocator_adaptor`.
> +  // This class can be used to simplify the common pattern:
> +  //
> +  // auto ptr = alloc.allocate(1);
> +  // try {
> +  //   std::construct_at(std::to_address(ptr), args);
> +  //   m_ptr = ptr;
> +  // } catch (...) {
> +  //   alloc.deallocate(ptr, 1);
> +  //   throw;
> +  // }
> +  //
> +  // Instead you can do:
> +  //
> +  // _Scoped_allocation sa(alloc);
> +  // m_ptr = std::construct_at(sa.get(), args);
> +  // (void) sa.release();
> +  //
> +  // Or even simpler:
> +  //
> +  // _Scoped_allocation sa(alloc, std::in_place, args);
> +  // m_ptr = sa.release();
> +  //
> +  template
> +struct _Scoped_allocation
> +{
> +  using value_type = typename allocator_traits<_Alloc>::value_type;
> +  using pointer = typename allocator_traits<_Alloc>::pointer;
> +
> +  // Use `a` to allocate memory for `n` objects.
> +  constexpr explicit
> +  _Scoped_allocation(const _Alloc& __a, size_t __n = 1)
> +  : _M_a(__a), _M_n(__n), _M_p(_M_a.allocate(__n))
> +  { }
> +
> +#if __glibcxx_optional >= 201606L
> +  // Allocate memory for a single object and if that succeeds,
> +  // construct an object using args.
> +  //
> +  // Does not do uses-allocator construction; don't use if you need
> that.
> +  //
> +  // CAUTION: the destructor will *not* destroy this object, it will
> only
> +  // free the memory. That means the following pattern is unsafe:
> +  //
> +  // _Scoped_allocation  sa(alloc, in_place, args);
> +  // potentially_throwing_operations();
> +  // return sa.release();
> +  //
> +  // If the middle operation throws, the object will not be destroyed.
> +  template
> +   constexpr explicit
> +   _Scoped_allocation(const _Alloc& __a, in_place_t, _Args&&...
> __args)
> +   : _Scoped_allocation(__a, 1)
> +   {
> + // The target constructor has completed, so if the next line
> throws,
> + // the destructor will deallocate the memory.
> + allocator_traits<_Alloc>::construct(_M_a, get(),
> +
>  std::forward<_Args>(__args)...);
> +   }
> +#endif
> +
> +  _GLIBCXX20_CONSTEXPR
> +  ~_Scoped_allocation()
> +  {
> +   if (_M_p) [[__unlikely__]]
> + _M_a.deallocate(_M_p, _M_n);
>

Why is it considered *unlikely* that _M_p is non-null here?
Shouldn't that actually be likely?

- Daniel


> +  }
> +
> +  _Scoped_allocation(_Scoped_allocation&&) = delete;
> +
> +  constexpr _Alloc
> +  get_allocator() const noexcept { return _M_a; }
> +
> +  constexpr value_type*
> +  get() const noexcept
> +  { return std::__to_address(_M_p); }
> +
> +  [[__nodiscard__]]
> +  constexpr pointer
> +  release() noexcept { return std::__exchange(_M_p, nullptr); }
> +
> +private:
> +  [[__no_unique_address__]] _Alloc _M_a;
> +  size_t _M_n;
> +  pointer _M_p;
> +};
> +
> +#if __glibcxx_optional >= 201606L && __cpp_deduction_guides >= 201606L
> +  template
> +_Scoped_allocation(_Alloc, in_place_t, _Args...)
> +  -> _Scoped_allocation<_Alloc>;
> +#endif
> +
>  /// @endcond
>  _GLIBCXX_END_NAMESPACE_VERSION
>  } // namespace std
> --
> 2.49.0
>
>
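
For readers following the destructor question above, here is a minimal
stand-alone sketch of the usage pattern described in the patch comments.
ScopedAllocation, CountingAlloc and Thrower are hypothetical names invented
for this illustration; this is not the libstdc++ _Scoped_allocation. In the
pattern shown, release() is reached on the success path, so the destructor
only sees a non-null pointer when an exception is in flight. Whether that is
the reason for the [[__unlikely__]] hint is an assumption; the thread does not
say.

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <stdexcept>
#include <utility>

// Hypothetical counting allocator, used only to observe allocate/deallocate.
template<typename T>
struct CountingAlloc
{
  using value_type = T;
  static inline int live = 0;   // allocations not yet returned

  T* allocate(std::size_t n)
  { ++live; return std::allocator<T>{}.allocate(n); }

  void deallocate(T* p, std::size_t n)
  { --live; std::allocator<T>{}.deallocate(p, n); }
};

// Simplified stand-in for the RAII helper in the patch (not the real class).
template<typename Alloc>
struct ScopedAllocation
{
  using pointer = typename std::allocator_traits<Alloc>::pointer;

  explicit ScopedAllocation(const Alloc& a, std::size_t n = 1)
  : a_(a), n_(n), p_(a_.allocate(n)) { }

  ~ScopedAllocation()
  {
    if (p_)   // non-null only if release() was never reached
      a_.deallocate(p_, n_);
  }

  ScopedAllocation(ScopedAllocation&&) = delete;

  pointer get() const noexcept { return p_; }
  pointer release() noexcept { return std::exchange(p_, nullptr); }

private:
  Alloc a_;
  std::size_t n_;
  pointer p_;
};

// Hypothetical element type whose constructor can be made to throw.
struct Thrower
{
  explicit Thrower(bool fail)
  { if (fail) throw std::runtime_error("ctor"); }
};

// Returns the number of allocations still live right after one attempt,
// then cleans up whatever the caller ended up owning.
int attempt(bool fail)
{
  CountingAlloc<Thrower> alloc;
  Thrower* owned = nullptr;
  try
    {
      ScopedAllocation<CountingAlloc<Thrower>> sa(alloc);
      ::new (static_cast<void*>(sa.get())) Thrower(fail);  // may throw
      owned = sa.release();  // success path: destructor will see nullptr
    }
  catch (const std::runtime_error&)
    { }                      // failure path: destructor already freed memory
  const int live = CountingAlloc<Thrower>::live;
  if (owned)
    {
      owned->~Thrower();
      alloc.deallocate(owned, 1);
    }
  return live;
}
```

In this sketch attempt(false) reports one live allocation (ownership passed to
the caller through release()), while attempt(true) reports none, because the
guard's destructor ran with a non-null pointer during unwinding.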


Re: [PATCH v3 5/9] libstdc++: Add tests for layout_left.

2025-05-22 Thread Tomasz Kaminski
On Wed, May 21, 2025 at 12:00 PM Luc Grosheintz 
wrote:

> Implements a suite of tests for the currently implemented parts of
> layout_left. The individual tests are templated over the layout type, to
> allow reuse as more layouts are added.
>
> libstdc++-v3/ChangeLog:
>
> * testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc: New
> test.
> * testsuite/23_containers/mdspan/layouts/ctors.cc: New test.
> * testsuite/23_containers/mdspan/layouts/mapping.cc: New test.
>
> Signed-off-by: Luc Grosheintz 
> ---
>
Very comprehensive test case. This looks good to me, thanks.

>  .../mdspan/layouts/class_mandate_neg.cc   |  22 +
>  .../23_containers/mdspan/layouts/ctors.cc | 238 ++
>  .../23_containers/mdspan/layouts/mapping.cc   | 438 ++
>  3 files changed, 698 insertions(+)
>  create mode 100644
> libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
>  create mode 100644
> libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
>  create mode 100644
> libstdc++-v3/testsuite/23_containers/mdspan/layouts/mapping.cc
>
> diff --git
> a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
> b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
> new file mode 100644
> index 000..b276fbd333e
> --- /dev/null
> +++
> b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
> @@ -0,0 +1,22 @@
> +// { dg-do compile { target c++23 } }
> +#include
> +
> +#include 
> +
> +constexpr size_t dyn = std::dynamic_extent;
> +static constexpr size_t n = std::numeric_limits::max() / 2;
> +
> +template
> +  struct A
> +  {
> +typename Layout::mapping> m0;
> +typename Layout::mapping> m1;
> +typename Layout::mapping> m2;
> +
> +using extents_type = std::extents;
> +typename Layout::mapping m3; // { dg-error "required
> from" }
> +  };
> +
> +A a_left; // { dg-error "required
> from" }
> +
> +// { dg-prune-output "must be representable as index_type" }
> diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
> b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
> new file mode 100644
> index 000..c96f314818a
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
> @@ -0,0 +1,238 @@
> +// { dg-do run { target c++23 } }
> +#include 
> +
> +#include 
> +
> +constexpr size_t dyn = std::dynamic_extent;
> +
> +template
> +  constexpr void
> +  verify(std::extents oexts)
> +  {
> +auto m = Mapping(oexts);
> +VERIFY(m.extents() == oexts);
> +  }
> +
> +template
> +  requires (requires { typename OMapping::layout_type; })
> +  constexpr void
> +  verify(OMapping other)
> +  {
> +constexpr auto rank = Mapping::extents_type::rank();
> +auto m = Mapping(other);
> +VERIFY(m.extents() == other.extents());
> +if constexpr (rank > 0)
> +  for(size_t i = 0; i < rank; ++i)
> +   VERIFY(std::cmp_equal(m.stride(i), other.stride(i)));
> +  }
> +
> +
> +template
> +  constexpr void
> +  verify_convertible(From from)
> +  {
> +static_assert(std::is_convertible_v);
> +verify(from);
> +  }
> +
> +template
> +  constexpr void
> +  verify_nothrow_convertible(From from)
> +  {
> +static_assert(std::is_nothrow_constructible_v);
> +verify_convertible(from);
> +  }
> +
> +
> +template
> +  constexpr void
> +  verify_constructible(From from)
> +  {
> +static_assert(!std::is_convertible_v);
> +static_assert(std::is_constructible_v);
> +verify(from);
> +  }
> +
> +template
> +  constexpr void
> +  verify_nothrow_constructible(From from)
> +  {
> +static_assert(std::is_nothrow_constructible_v);
> +verify_constructible(from);
> +  }
> +
> +template
> +  constexpr void
> +  assert_not_constructible()
> +  {
> +static_assert(!std::is_constructible_v);
> +  }
> +
> +// ctor: mapping(const extents&)
> +namespace from_extents
> +{
> +  template
> +constexpr void
> +verify_nothrow_convertible(OExtents oexts)
> +{
> +  using Mapping = typename Layout::mapping;
> +  ::verify_nothrow_convertible(oexts);
> +}
> +
> +  template
> +constexpr void
> +verify_nothrow_constructible(OExtents oexts)
> +{
> +  using Mapping = typename Layout::mapping;
> +  ::verify_nothrow_constructible(oexts);
> +}
> +
> +  template
> +constexpr void
> +assert_not_constructible()
> +{
> +  using Mapping = typename Layout::mapping;
> +  ::assert_not_constructible();
> +}
> +
> +  template
> +constexpr bool
> +test_ctor()
> +{
> +  verify_nothrow_convertible>(
> +   std::extents{});
> +
> +  verify_nothrow_convertible>(
> +   std::extents{});
> +
> +  verify_nothrow_convertible>(
> +   std::extents{2});
> +
> +  verify_nothrow_constructible>(
> +   std::extents{});
> +
> +  verify_nothrow_constructible>(
> +   std::extents{});
> +
> +  verify_nothrow_const

Re: [PATCH 2/2] aarch64: Fold lsl+lsr+orr to rev for half-width shifts

2025-05-22 Thread Richard Sandiford
Dhruv Chawla  writes:
> On 20/05/25 16:35, Richard Sandiford wrote:
>> Dhruv Chawla  writes:
>>> [...]
>>> Would it be a good idea to add tests for the bad codegen as well? I have 
>>> added tests for lsl/usra in the next round of patches.
>> 
>> Nah, I don't think it's worth testing for something that we don't want.
>> If we can agree on what the "good" code would look like, it might be
>> worth adding that with an xfail, so that we notice when we start to get
>> the good codegen.  But IMO it's also fine to test only the dummy-argument
>> case, as in the new patch.
>
> Yeah, that sounds good to me. For the following test:
>
> svuint16_t
> lsl_usra_8_sve_lsl_operand (svuint16_t r)
> {
>svbool_t pt = svptrue_b16 ();
>return svorr_u16_z (pt, svlsl_n_u16_z (pt, r, 6), svlsr_n_u16_z (pt, r, 
> 10));
> }
>
> Currently:
>
>  lsl z31.h, z0.h, #6
>  mov z30.d, z0.d
>  movprfx z0, z31
>  usra z0.h, z30.h, #10
>  ret
>
> is generated. Would something like:
>
>  mov z31.d, z0.d
>  lsl z0.h, z0.h, #6
>  usra z0.h, z31.h, #10
>  ret
>
> be better?

Yeah, that looks better, although we wouldn't need to use specifically z31
as the temporary register.  It would also be possible to do the mov at the
end, so if we were going to match this, we'd want to allow both move
positions.

Thanks,
Richard

>
>>  
>> On the new patch:
>> 
>>> +/* { dg-final { scan-assembler-times "revb" 0 } } */
>>> +/* { dg-final { scan-assembler-times "revh" 0 } } */
>>> +/* { dg-final { scan-assembler-times "revw" 0 } } */
>> 
>> It would be better to add \ts around the mnemonics, so that we don't
>> accidentally match pieces of filenames that happen to contain "revb":
>> 
>> /* { dg-final { scan-assembler-times "\trevb\t" 0 } } */
>> /* { dg-final { scan-assembler-times "\trevh\t" 0 } } */
>> /* { dg-final { scan-assembler-times "\trevw\t" 0 } } */
>> 
>> Thanks,
>> Richard
>> 
>> 
>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
>> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> index 90dd5c97a10..b4396837c24 100644
>> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> @@ -2086,37 +2086,6 @@ public:
>> {
>>   return f.fold_const_binary (LSHIFT_EXPR);
>> }
>> -
>> -  rtx expand (function_expander &e) const override
>> -  {
>> -tree pred = TREE_OPERAND (e.call_expr, 3);
>> -tree shift = TREE_OPERAND (e.call_expr, 5);
>> -if (is_ptrue (pred, GET_MODE_UNIT_SIZE (e.result_mode ()))
>> -   && uniform_integer_cst_p (shift))
>> -  return e.use_unpred_insn (e.direct_optab_handler (ashl_optab));
>> -return rtx_code_function::expand (e);
>> -  }
>> -};
>> -
>> -class svlsr_impl : public rtx_code_function
>> -{
>> -public:
>> -  CONSTEXPR svlsr_impl () : rtx_code_function (LSHIFTRT, LSHIFTRT) {}
>> -
>> -  gimple *fold (gimple_folder &f) const override
>> -  {
>> -return f.fold_const_binary (RSHIFT_EXPR);
>> -  }
>> -
>> -  rtx expand (function_expander &e) const override
>> -  {
>> -tree pred = TREE_OPERAND (e.call_expr, 3);
>> -tree shift = TREE_OPERAND (e.call_expr, 5);
>> -if (is_ptrue (pred, GET_MODE_UNIT_SIZE (e.result_mode ()))
>> -   && uniform_integer_cst_p (shift))
>> -  return e.use_unpred_insn (e.direct_optab_handler (lshr_optab));
>> -return rtx_code_function::expand (e);
>> -  }
>>   };
>> 
>>   class svmad_impl : public function_base
>> @@ -3617,7 +3586,7 @@ FUNCTION (svldnt1, svldnt1_impl,)
>>   FUNCTION (svlen, svlen_impl,)
>>   FUNCTION (svlsl, svlsl_impl,)
>>   FUNCTION (svlsl_wide, shift_wide, (ASHIFT, UNSPEC_ASHIFT_WIDE))
>> -FUNCTION (svlsr, svlsr_impl,)
>> +FUNCTION (svlsr, rtx_code_function, (LSHIFTRT, LSHIFTRT))
>>   FUNCTION (svlsr_wide, shift_wide, (LSHIFTRT, UNSPEC_LSHIFTRT_WIDE))
>>   FUNCTION (svmad, svmad_impl,)
>>   FUNCTION (svmax, rtx_code_function, (SMAX, UMAX, UNSPEC_COND_FMAX,
>> diff --git a/gcc/config/aarch64/aarch64-sve.md 
>> b/gcc/config/aarch64/aarch64-sve.md
>> index 0156afc1e7d..fa431c9c060 100644
>> --- a/gcc/config/aarch64/aarch64-sve.md
>> +++ b/gcc/config/aarch64/aarch64-sve.md
>> @@ -4931,9 +4931,7 @@ (define_expand "3"
>>   if (CONST_INT_P (operands[2]))
>> {
>>  amount = gen_const_vec_duplicate (mode, operands[2]);
>> -   if (!aarch64_sve_shift_operand (operands[2], mode)
>> -   && !aarch64_simd_shift_imm_p (operands[2], mode,
>> - _optab == ashl_optab))
>> +   if (!aarch64_sve_shift_operand (amount, mode))
>>amount = force_reg (mode, amount);
>> }
>>   else
>> @@ -4957,8 +4955,7 @@ (define_expand "v3"
>>UNSPEC_PRED_X))]
>> "TARGET_SVE"
>> {
>> -if (aarch64_simd_shift_imm_p (operands[2], mode,
>> - _optab == ashl_optab))
>> +if (CONSTANT_P (operands[2]))
>> {
>>  emit_insn (gen_aarch64_v3_

Re: [AUTOFDO][AARCH64] Add support for profilebootstrap

2025-05-22 Thread Richard Sandiford
Kugan Vivekanandarajah  writes:
> Add support for autoprofiledbootstrap in aarch64.
> This is similar to what is done for i386. Added
> gcc/config/aarch64/gcc-auto-profile for aarch64 profile
> creation.
>
> How to run:
> configure --with-build-config=bootstrap-lto
> make autoprofiledbootstrap
>
> ChangeLog:
>
>   * Makefile.def: AUTO_PROFILE based on cpu_type.
>   * Makefile.in: Likewise.
>   * configure: Regenerate.
>   * configure.ac: Set autofdo_target.
>
> gcc/ChangeLog:
>
>   * config/aarch64/gcc-auto-profile: New file.
>
> Signed-off-by: Kugan Vivekanandarajah 

OK, thanks.

Richard


Re: Fix PR 118541 (V3), do not generate unordered fp cmoves for IEEE compares

2025-05-22 Thread Surya Kumari Jangala
Hi Mike,
The source code changes are missing.

Regards,
Surya

On 22/05/25 10:46 am, Michael Meissner wrote:
> Fix PR 118541, do not generate unordered fp cmoves for IEEE compares.
> 
> This is version 3 of patch.  I re-implemented the patch to just focus on the
> generation of the XSCMP{EQ,GT,GE}{DP,QP} instructions.
> 
> In bug PR target/118541 on power9, power10, and power11 systems, for the
> function:
> 
> extern double __ieee754_acos (double);
> 
> double
> __acospi (double x)
> {
>   double ret = __ieee754_acos (x) / 3.14;
>   return __builtin_isgreater (ret, 1.0) ? 1.0 : ret;
> }
> 
> GCC currently generates the following code:
> 
> Power9                          Power10 and Power11
> ======                          ===================
> bl __ieee754_acos               bl __ieee754_acos@notoc
> nop                             plfd 0,.LC0@pcrel
> addis 9,2,.LC2@toc@ha           xxspltidp 12,1065353216
> addi 1,1,32                     addi 1,1,32
> lfd 0,.LC2@toc@l(9)             ld 0,16(1)
> addis 9,2,.LC0@toc@ha           fdiv 0,1,0
> ld 0,16(1)                      mtlr 0
> lfd 12,.LC0@toc@l(9)            xscmpgtdp 1,0,12
> fdiv 0,1,0                      xxsel 1,0,12,1
> mtlr 0                          blr
> xscmpgtdp 1,0,12
> xxsel 1,0,12,1
> blr
> 
> This is because ifcvt.c optimizes the conditional floating point move to use
> the XSCMPGTDP instruction.
> 
> However, the XSCMPGTDP instruction will generate an interrupt if one of the
> arguments is a signalling NaN and signalling NaNs can generate an interrupt.
> The IEEE comparison functions (isgreater, etc.) require that the comparison
> not raise an interrupt.
> 
> The root cause of this is we allow floating point comparisons to be reversed
> (i.e. LT will be reversed to UNGE).  Before power9, this was ok because we
> only generated the FCMPU or XSCMPUDP instructions.
> 
> But with power9, we can generate the XSCMPEQDP, XSCMPGTDP, or XSCMPGEDP
> instructions.  This code now does not convert an unordered compare into an
> ordered compare.  Instead, it does the opposite comparison and swaps the
> arguments.  I.e. it converts:
> 
>   r = (a < b) ? c : d;
> 
> into:
> 
>   r = (b >= a) ? c : d;
> 
> For the following code:
> 
> double
> ordered_compare (double a, double b, double c, double d)
> {
>   return __builtin_isgreater (a, b) ? c : d;
> }
> 
> /* Verify normal > does generate xscmpgtdp.  */
> 
> double
> normal_compare (double a, double b, double c, double d)
> {
>   return a > b ? c : d;
> }
> 
> with the following patch, GCC generates the following for power9, power10, and
> power11:
> 
> ordered_compare:
> fcmpu 0,1,2
> fmr 1,4
> bnglr 0
> fmr 1,3
> blr
> 
> normal_compare:
> xscmpgtdp 1,1,2
> xxsel 1,4,3,1
> blr
> 
> I have built bootstrap compilers on big endian power9 systems and little
> endian power9/power10 systems and there were no regressions.  Can I check this
> patch into the GCC trunk, and after a waiting period, can I check this into
> the active older branches?
> 
> 2025-05-21  Michael Meissner  
> 
> gcc/
> 
>   PR target/118541
>   * config/rs6000/predicates.md (invert_fpmask_comparison_operator):
>   Delete.
>   (fpmask_reverse_args_comparison_operator): New predicate.
>   * config/rs6000/rs6000-proto.h (rs6000_fpmask_reverse_args): New
>   declaration.
>   * config/rs6000/rs6000.cc (rs6000_fpmask_reverse_args): New function.
>   * config/rs6000/rs6000.h (REVERSIBLE_CC_MODE): Do not allow floating
>   point comparisons to be reversed unless -ffinite-math-only is used.
>   * config/rs6000/rs6000.md (movcc_p9): Add
>   comment.
>   (movcc_invert_p9): Reverse the argument order for
>   the comparison, and use an unordered comparison, instead of ordered
>   comparison.
>   (movcc_invert_p10): Likewise.
> 
> gcc/testsuite/
> 
>   PR target/118541
>   * gcc.target/powerpc/pr118541.c: New test.
> 
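
The requirement at the heart of this report, that the isgreater family of
comparisons stays quiet when an operand is a quiet NaN, can be observed from
portable code. The sketch below is only an illustration: QuietSelect and
quiet_select are hypothetical names, and it says nothing about which
instructions a given compiler emits for it.

```cpp
#include <cfenv>
#include <cmath>

// Result of a select plus whether FE_INVALID was raised while comparing.
struct QuietSelect
{
  double value;
  bool raised_invalid;
};

// Computes isgreater(a, b) ? c : d and records the FE_INVALID flag.
QuietSelect
quiet_select(double a, double b, double c, double d)
{
  volatile double va = a, vb = b;   // keep the comparison at run time
  const double x = va, y = vb;
  std::feclearexcept(FE_ALL_EXCEPT);
  const bool gt = std::isgreater(x, y);   // quiet comparison
  const bool invalid = std::fetestexcept(FE_INVALID) != 0;
  return { gt ? c : d, invalid };
}
```

Called with a quiet NaN as the first operand, this selects d and should leave
FE_INVALID clear; an ordinary a > b on the same input is permitted to set it,
which is why the two forms cannot share one instruction sequence.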



[PING][PATCH v3] match.pd: Fold (x + y) >> 1 into IFN_AVG_FLOOR (x, y) for vectors

2025-05-22 Thread Pengfei Li
Hi,

Just a gentle ping for the v3 patch below.

I’ve made minor changes from v2 to v3, as listed below:
- Added a check that IFN_AVG_FLOOR is supported.
- Wrapped the new code in match.pd with "#if GIMPLE".

> This patch folds vector expressions of the form (x + y) >> 1 into
> IFN_AVG_FLOOR (x, y), reducing instruction count on platforms that
> support averaging operations. For example, it can help improve the
> codegen on AArch64 from:
> add v0.4s, v0.4s, v31.4s
> ushr v0.4s, v0.4s, 1
> to:
> uhadd   v0.4s, v0.4s, v31.4s

> As this folding is only valid when the most significant bit of each
> element in both x and y is known to be zero, this patch checks leading
> zero bits of elements in x and y, and extends get_nonzero_bits_1() to
> handle uniform vectors. When the input is a uniform vector, the function
> now returns the nonzero bits of its element.
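
To make the leading-zero-bit condition concrete, here is a small scalar
sketch of what happens in one 8-bit lane. The function names are made up for
this illustration; wrapped_half_sum models the lane-wise (x + y) >> 1, and
avg_floor models an averaging operation that forms the sum without overflow.

```cpp
#include <cstdint>

// Lane-wise 8-bit "(x + y) >> 1": the sum wraps modulo 256 before the shift.
constexpr std::uint8_t
wrapped_half_sum(std::uint8_t x, std::uint8_t y)
{
  return static_cast<std::uint8_t>(static_cast<std::uint8_t>(x + y) >> 1);
}

// Floor average: the sum is formed in a wider type, so no bit is lost.
constexpr std::uint8_t
avg_floor(std::uint8_t x, std::uint8_t y)
{
  return static_cast<std::uint8_t>((static_cast<unsigned>(x) + y) >> 1);
}

// Top bit clear in both inputs: the two computations agree.
static_assert(wrapped_half_sum(0x7f, 0x7f) == avg_floor(0x7f, 0x7f));

// Top bit set in one input: the lane-wise sum overflows and they differ.
static_assert(wrapped_half_sum(200, 100) != avg_floor(200, 100));
```

For 200 and 100 the wrapped form yields 22 while the floor average is 150, so
replacing one with the other is only safe when the sum cannot carry out of
the element.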

> Additionally, this patch adds more checks to reject vector types in bit
> constant propagation (tree-bit-ccp), since tree-bit-ccp was designed for
> scalar values only, and the new vector logic in get_nonzero_bits_1()
> could lead to incorrect propagation results.

> Bootstrapped and tested on aarch64-linux-gnu and x86_64-linux-gnu.

> gcc/ChangeLog:

> * match.pd: Add folding rule for vector average.
> * tree-ssa-ccp.cc (get_default_value): Reject vector types.
> (evaluate_stmt): Reject vector types.
> * tree-ssanames.cc (get_nonzero_bits_1): Extend to handle
> uniform vectors.

> gcc/testsuite/ChangeLog:

> * gcc.target/aarch64/acle/uhadd_1.c: New test.
> ---
>  gcc/match.pd  | 12 +++
>  .../gcc.target/aarch64/acle/uhadd_1.c | 34 +++
>  gcc/tree-ssa-ccp.cc   |  8 ++---
>  gcc/tree-ssanames.cc  |  8 +
>  4 files changed, 58 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/uhadd_1.c

> diff --git a/gcc/match.pd b/gcc/match.pd
> index ab496d923cc..52a5800457d 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -2177,6 +2177,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (view_convert (rshift (view_convert:ntype @0) @1))
>  (convert (rshift (convert:ntype @0) @1))

> +#if GIMPLE
> + /* Fold (x + y) >> 1 into IFN_AVG_FLOOR (x, y) if x and y are vectors in
> +which each element is known to have at least one leading zero bit.  */
> +(simplify
> + (rshift (plus:cs @0 @1) integer_onep)
> + (if (VECTOR_TYPE_P (type)
> +  && direct_internal_fn_supported_p (IFN_AVG_FLOOR, type, 
> OPTIMIZE_FOR_BOTH)
> +  && wi::clz (get_nonzero_bits (@0)) > 0
> +  && wi::clz (get_nonzero_bits (@1)) > 0)
> +  (IFN_AVG_FLOOR @0 @1)))
> +#endif
> +
>  /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST))
> when profitable.
> For bitwise binary operations apply operand conversions to the
> diff --git a/gcc/testsuite/gcc.target/aarch64/acle/uhadd_1.c 
> b/gcc/testsuite/gcc.target/aarch64/acle/uhadd_1.c
> new file mode 100644
> index 000..f1748a199ad
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/acle/uhadd_1.c
> @@ -0,0 +1,34 @@
> +/* Test if SIMD fused unsigned halving adds are generated */
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#include 
> +
> +#define FUSED_SIMD_UHADD(vectype, q, ts, mask) \
> +  vectype simd_uhadd ## q ## _ ## ts ## _1 (vectype a) \
> +  { \
> +vectype v1 = vand ## q ## _ ## ts (a, vdup ## q ## _n_ ## ts (mask)); \
> +vectype v2 = vdup ## q ## _n_ ## ts (mask); \
> +return vshr ## q ## _n_ ## ts (vadd ## q ## _ ## ts (v1, v2), 1); \
> +  } \
> +  \
> +  vectype simd_uhadd ## q ## _ ## ts ## _2 (vectype a, vectype b) \
> +  { \
> +vectype v1 = vand ## q ## _ ## ts (a, vdup ## q ## _n_ ## ts (mask)); \
> +vectype v2 = vand ## q ## _ ## ts (b, vdup ## q ## _n_ ## ts (mask)); \
> +return vshr ## q ## _n_ ## ts (vadd ## q ## _ ## ts (v1, v2), 1); \
> +  }
> +
> +FUSED_SIMD_UHADD (uint8x8_t, , u8, 0x7f)
> +FUSED_SIMD_UHADD (uint8x16_t, q, u8, 0x7f)
> +FUSED_SIMD_UHADD (uint16x4_t, , u16, 0x7fff)
> +FUSED_SIMD_UHADD (uint16x8_t, q, u16, 0x7fff)
> +FUSED_SIMD_UHADD (uint32x2_t, , u32, 0x7fff)
> +FUSED_SIMD_UHADD (uint32x4_t, q, u32, 0x7fff)
> +
> +/* { dg-final { scan-assembler-times {\tuhadd\tv[0-9]+\.8b,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tuhadd\tv[0-9]+\.16b,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tuhadd\tv[0-9]+\.4h,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tuhadd\tv[0-9]+\.8h,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tuhadd\tv[0-9]+\.2s,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tuhadd\tv[0-9]+\.4s,} 2 } } */
> diff --git a/gcc/tree-ssa-ccp.cc b/gcc/tree-ssa-ccp.cc
> index 8d2cbb384c4..3e0c75cf2be 100644
> --- a/gcc/tree-ssa-ccp.cc
> +++ b/gcc/tree-ssa-ccp.cc
> @@ -298,7 +298,7 @@ get_default_value (tree var)
> {
>   val.lattice_val = VARYING;
>

Re: [PATCH] bitintlower: Ensure extension of the most significant limb on info->extended targets

2025-05-22 Thread Richard Biener



> Am 22.05.2025 um 09:31 schrieb Jakub Jelinek :
> 
> On Wed, May 21, 2025 at 12:48:21PM +0200, Jakub Jelinek wrote:
>> 2025-05-21  Jakub Jelinek  
>> 
>>* gimple-lower-bitint.cc (bitint_extended): New variable.
>>(bitint_large_huge::lower_shift_stmt): For LSHIFT_EXPR with
>>bitint_extended if lhs has most significant partial limb extend
>>it afterwards.
>> 
>>* gcc.dg/bitintext.h: New file.
>>* gcc.dg/torture/bitint-82.c: New test.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, plus tested on
> s390x-linux, ok for trunk?

Ok

Richard 

>Jakub
> 


Re: [PATCH] libgcc: PR target/116363 Fix SFtype to UDWtype conversion

2025-05-22 Thread Jan Dubiec

On 23.02.2025 04:59, Jeff Law wrote:
[...]
Thanks!  Just a note we're in stage4 of our development cycle 
(regression bugfixes) as we prepare for gcc-15.  This doesn't look like 
something we would typically make an exception for, it'll have to wait 
for the next development window.  Meaning it probably won't get any 
attention for a couple months.


Jeff



Just BUMP.

/J.D.



Re: [PATCH] libstdc++: Fix vector(from_range_t, R&&) for exceptions [PR120367]

2025-05-22 Thread Jonathan Wakely
On Thu, 22 May 2025, 07:21 Tomasz Kaminski,  wrote:

>
>
> On Wed, May 21, 2025 at 5:41 PM Jonathan Wakely 
> wrote:
>
>> Because this constructor delegates to vector(a) the object has been
>> fully constructed and the destructor will run if an exception happens.
>> That means we need to set _M_finish == _M_start so that the destructor
>> doesn't try to destroy any elements.
>>
>> libstdc++-v3/ChangeLog:
>>
>> PR libstdc++/120367
>> * include/bits/stl_vector.h (_M_range_initialize): Initialize
>> _M_impl._M_finish.
>> * testsuite/23_containers/vector/cons/from_range.cc: Check with
>> a type that throws on construction.
>> exceptions during construction.
>> ---
>>
>> Tested x86_64-linux.
>>
>>  libstdc++-v3/include/bits/stl_vector.h|  1 +
>>  .../23_containers/vector/cons/from_range.cc   | 22 +++
>>  2 files changed, 23 insertions(+)
>>
>> diff --git a/libstdc++-v3/include/bits/stl_vector.h
>> b/libstdc++-v3/include/bits/stl_vector.h
>> index 57680b7bbcf3..43b913da778d 100644
>> --- a/libstdc++-v3/include/bits/stl_vector.h
>> +++ b/libstdc++-v3/include/bits/stl_vector.h
>> @@ -1971,6 +1971,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>> {
>>   pointer __start = this->_M_impl._M_start =
>>
> Not required change, but I was a bit confused where _M_start is set.
> Maybe assign all pointers here?
>   pointer __start = this->_M_impl._M_start =
> this->_M_impl._M_finish =
>

That's how I wrote it at
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120367#c4 but it seemed a bit
crowded so I did it on a separate line for the real patch.


>> this->_M_allocate(_S_check_init_len(__n,
>> _M_get_Tp_allocator()));
>> + this->_M_impl._M_finish = __start;
>>   this->_M_impl._M_end_of_storage = __start + __n;
>>   this->_M_impl._M_finish
>>   = std::__uninitialized_copy_a(_GLIBCXX_MOVE(__first),
>> __last,
>> diff --git
>> a/libstdc++-v3/testsuite/23_containers/vector/cons/from_range.cc
>> b/libstdc++-v3/testsuite/23_containers/vector/cons/from_range.cc
>> index 7a62645283d2..3784b9cd66ad 100644
>> --- a/libstdc++-v3/testsuite/23_containers/vector/cons/from_range.cc
>> +++ b/libstdc++-v3/testsuite/23_containers/vector/cons/from_range.cc
>> @@ -106,8 +106,30 @@ test_constexpr()
>>return true;
>>  }
>>
>> +void
>> +test_pr120367()
>> +{
>> +#ifdef __cpp_exceptions
>> +  struct X
>> +  {
>> +X(int) { throw 1; } // Cannot successfully construct an X.
>> +~X() { VERIFY(false); } // So should never need to destroy one.
>> +  };
>> +
>> +  try
>> +  {
>> +int i[1]{};
>> +std::vector v(std::from_range, i);
>> +  }
>> +  catch (int)
>> +  {
>> +  }
>> +#endif
>> +}
>> +
>>  int main()
>>  {
>>test_ranges();
>>static_assert( test_constexpr() );
>> +  test_pr120367();
>>  }
>> --
>> 2.49.0
>>
>>
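
The language rule behind this fix can be shown in isolation: once the target
constructor of a delegating constructor has finished, the object counts as
fully constructed, so an exception thrown from the delegating constructor's
body runs the destructor. The types below are hypothetical and exist only to
count destructor calls; they are not taken from the patch.

```cpp
#include <stdexcept>

// Delegating case: the destructor runs if the body throws.
struct Delegating
{
  static inline int destroyed = 0;

  Delegating() { }                    // target constructor

  explicit Delegating(bool fail)
  : Delegating()                      // object is complete after this
  {
    if (fail)
      throw std::runtime_error("thrown from delegating constructor body");
  }

  ~Delegating() { ++destroyed; }
};

// Ordinary case: construction never completed, so no destructor call.
struct Plain
{
  static inline int destroyed = 0;

  explicit Plain(bool fail)
  {
    if (fail)
      throw std::runtime_error("thrown from ordinary constructor body");
  }

  ~Plain() { ++destroyed; }
};

// Construct a T that throws and report how often ~T ran as a result.
template<typename T>
int
destructor_runs_on_throw()
{
  T::destroyed = 0;
  try
    { T t(true); }
  catch (const std::runtime_error&)
    { }
  return T::destroyed;
}
```

The delegating type reports one destructor call and the plain type reports
none. That is why a destructor reached this way must find the members in a
state it can safely tear down.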


Re: [PATCH 01/13] arm: clarify the logic of SECONDARY_(INPUT/OUTPUT)_RELOAD_CLASS

2025-05-22 Thread Ramana Radhakrishnan
On Wed, May 7, 2025 at 6:18 PM Richard Earnshaw  wrote:
>
> The flattened logic of these functions and the complexity of the
> numerous clauses makes it very difficult to understand what's written
> in these macros.  Additionally, SECONDARY_INPUT_RELOAD_CLASS was not
> laid out with the correct formatting.
>
> Add some parentheses and re-indent to make the logic clearer.
>
> No functional change.
>
> gcc:
> * config/arm/arm.h (SECONDARY_OUTPUT_RELOAD_CLASS): Add parentheses
> and re-indent.
> (SECONDARY_INPUT_RELOAD_CLASS): Likewise.
> ---
>  gcc/config/arm/arm.h | 55 +++-
>  1 file changed, 29 insertions(+), 26 deletions(-)
>
> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
> index 8472b756127..9c3a644873b 100644
> --- a/gcc/config/arm/arm.h
> +++ b/gcc/config/arm/arm.h
> @@ -1460,34 +1460,37 @@ extern const char *fp_sysreg_names[NB_FP_SYSREGS];
>  /* Return the register class of a scratch register needed to copy IN into
> or out of a register in CLASS in MODE.  If it can be done directly,
> NO_REGS is returned.  */
> -#define SECONDARY_OUTPUT_RELOAD_CLASS(CLASS, MODE, X)  \
> -  /* Restrict which direct reloads are allowed for VFP/iWMMXt regs.  */ \
> -  ((TARGET_HARD_FLOAT && IS_VFP_CLASS (CLASS)) \
> -   ? coproc_secondary_reload_class (MODE, X, FALSE)\
> -   : (TARGET_IWMMXT && (CLASS) == IWMMXT_REGS) \
> -   ? coproc_secondary_reload_class (MODE, X, TRUE) \
> -   : TARGET_32BIT  \
> -   ? (((MODE) == HImode && ! arm_arch4 && true_regnum (X) == -1) \
> -? GENERAL_REGS : NO_REGS)  \
> -   : THUMB_SECONDARY_OUTPUT_RELOAD_CLASS (CLASS, MODE, X))
> +#define SECONDARY_OUTPUT_RELOAD_CLASS(CLASS, MODE, X)  \
> +  /* Restrict which direct reloads are allowed for VFP/iWMMXt regs.  */  
>   \
> +  ((TARGET_HARD_FLOAT && IS_VFP_CLASS (CLASS)) \
> +   ? coproc_secondary_reload_class (MODE, X, FALSE)\
> +   : ((TARGET_IWMMXT && (CLASS) == IWMMXT_REGS)  
>   \
> +  ? coproc_secondary_reload_class (MODE, X, TRUE)  \
> +  : (TARGET_32BIT  \
> +? (((MODE) == HImode && ! arm_arch4 && true_regnum (X) == -1)  \
> +   ? GENERAL_REGS  \
> +   : NO_REGS)  \
> +: THUMB_SECONDARY_OUTPUT_RELOAD_CLASS (CLASS, MODE, X
>
>  /* If we need to load shorts byte-at-a-time, then we need a scratch.  */
> -#define SECONDARY_INPUT_RELOAD_CLASS(CLASS, MODE, X)   \
> -  /* Restrict which direct reloads are allowed for VFP/iWMMXt regs.  */ \
> -  ((TARGET_HARD_FLOAT && IS_VFP_CLASS (CLASS)) \
> -? coproc_secondary_reload_class (MODE, X, FALSE) : \
> -(TARGET_IWMMXT && (CLASS) == IWMMXT_REGS) ?\
> -coproc_secondary_reload_class (MODE, X, TRUE) :\
> -   (TARGET_32BIT ? \
> -(((CLASS) == IWMMXT_REGS || (CLASS) == IWMMXT_GR_REGS) \
> - && CONSTANT_P (X))\
> -? GENERAL_REGS :   \
> -(((MODE) == HImode && ! arm_arch4  \
> -  && (MEM_P (X)\
> - || ((REG_P (X) || GET_CODE (X) == SUBREG) \
> - && true_regnum (X) == -1)))   \
> - ? GENERAL_REGS : NO_REGS) \
> -: THUMB_SECONDARY_INPUT_RELOAD_CLASS (CLASS, MODE, X)))
> +#define SECONDARY_INPUT_RELOAD_CLASS(CLASS, MODE, X)   \
> +  /* Restrict which direct reloads are allowed for VFP/iWMMXt regs.  */  \
> +  ((TARGET_HARD_FLOAT && IS_VFP_CLASS (CLASS)) \
> +   ? coproc_secondary_reload_class (MODE, X, FALSE)  \
> +   : ((TARGET_IWMMXT && (CLASS) == IWMMXT_REGS)  \
> +      ? coproc_secondary_reload_class (MODE, X, TRUE)  \
> +      : (TARGET_32BIT  \
> +         ? ((((CLASS) == IWMMXT_REGS || (CLASS) == IWMMXT_GR_REGS)  \
> +             && CONSTANT_P (X))  \
> +            ? GENERAL_REGS  \
> +            : (((MODE) == HImode  \
> +                && ! arm_arch4  \
> +                && (MEM_P (X)  \
> +                    || ((REG_P (X) || GET_CODE (X) == SUBREG)  \
> +                        && true_regnum (X) == -1)))  \
> +  

Re: [PATCH 00/13] arm: Remove iWMMXT code generation

2025-05-22 Thread Ramana Radhakrishnan
On Wed, May 7, 2025 at 6:18 PM Richard Earnshaw  wrote:
>
>
> The header file for the Arm implementation of mmintrin.h was changed in GCC-15
> to disable access to the intrinsics.  This patch removes the internal code
> as well.
>
> We still allow -mcpu/-march options for the wmmx cpus, but they are now
> treated in exactly the same way as XScale - generating code for an Armv5te
> architecture.

I'll review with the docs but I'd prefer to make this change of
behaviour explicit in our documentation.


>
> Richard Earnshaw (13):
>   arm: clarify the logic of SECONDARY_(INPUT/OUTPUT)_RELOAD_CLASS
>   arm: testsuite: remove iwmmxt tests
>   arm: treat -mcpu/arch=iwmmxt{,2} like XScale
>   arm: remove iWMMX builtins support.
>   arm: Remove iwmmxt patterns.
>   arm: remove IWMMXT checks from MD files.
>   arm: remove support for the iwmmxt ABI variant.
>   arm: Remove iwmmxt support from arm.cc
>   arm: remove iwmmxt-related attributes from machine description
>   arm: cleanup iterators.md after removing iwmmxt
>   arm: remove dead predefines when using WMMX
>   arm: remove most remaining iwmmxt code.
>   arm: remove iwmmxt registers from allocator tables
>
>  gcc/config.gcc |2 +-
>  gcc/config/arm/aout.h  |5 -
>  gcc/config/arm/arm-builtins.cc | 1276 +
>  gcc/config/arm/arm-c.cc|7 -
>  gcc/config/arm/arm-cpus.in |   28 +-
>  gcc/config/arm/arm-generic.md  |4 +-
>  gcc/config/arm/arm-opts.h  |1 -
>  gcc/config/arm/arm-protos.h|8 -
>  gcc/config/arm/arm-tables.opt  |6 -
>  gcc/config/arm/arm-tune.md |   53 +-
>  gcc/config/arm/arm.cc  |  401 +-
>  gcc/config/arm/arm.h   |  169 +--
>  gcc/config/arm/arm.md  |   43 +-
>  gcc/config/arm/arm.opt |3 -
>  gcc/config/arm/constraints.md  |   18 +-
>  gcc/config/arm/iterators.md|   20 +-
>  gcc/config/arm/iwmmxt.md   | 1766 
>  gcc/config/arm/iwmmxt2.md  |  903 
>  gcc/config/arm/marvell-f-iwmmxt.md |  189 ---
>  gcc/config/arm/predicates.md   |8 +-
>  gcc/config/arm/t-arm   |3 -
>  gcc/config/arm/thumb2.md   |2 +-
>  gcc/config/arm/types.md|  123 --
>  gcc/config/arm/unspecs.md  |   29 -
>  gcc/config/arm/vec-common.md   |   31 +-
>  gcc/doc/invoke.texi|2 +-
>  gcc/doc/sourcebuild.texi   |4 -
>  gcc/testsuite/gcc.target/arm/ivopts.c  |3 +-
>  gcc/testsuite/gcc.target/arm/mmx-1.c   |   26 -
>  gcc/testsuite/gcc.target/arm/mmx-2.c   |  166 ---
>  gcc/testsuite/gcc.target/arm/pr64208.c |   25 -
>  gcc/testsuite/gcc.target/arm/pr79145.c |   16 -
>  gcc/testsuite/gcc.target/arm/pr99724.c |   31 -
>  gcc/testsuite/gcc.target/arm/pr99786.c |   30 -
>  gcc/testsuite/lib/target-supports.exp  |   13 -
>  35 files changed, 141 insertions(+), 5273 deletions(-)
>  delete mode 100644 gcc/config/arm/iwmmxt.md
>  delete mode 100644 gcc/config/arm/iwmmxt2.md
>  delete mode 100644 gcc/config/arm/marvell-f-iwmmxt.md
>  delete mode 100644 gcc/testsuite/gcc.target/arm/mmx-1.c
>  delete mode 100644 gcc/testsuite/gcc.target/arm/mmx-2.c
>  delete mode 100644 gcc/testsuite/gcc.target/arm/pr64208.c
>  delete mode 100644 gcc/testsuite/gcc.target/arm/pr79145.c
>  delete mode 100644 gcc/testsuite/gcc.target/arm/pr99724.c
>  delete mode 100644 gcc/testsuite/gcc.target/arm/pr99786.c
>
> --
> 2.43.0
>


Re: [committed v2 01/14] arm: clarify the logic of SECONDARY_(INPUT/OUTPUT)_RELOAD_CLASS

2025-05-22 Thread Ramana Radhakrishnan
On Mon, May 12, 2025 at 11:50 AM Richard Earnshaw  wrote:
>
> The flattened logic of these functions and the complexity of the
> numerous clauses makes it very difficult to understand what's written
> in these macros.  Additionally, SECONDARY_INPUT_RELOAD_CLASS was not
> laid out with the correct formatting.
>
> Add some parentheses and re-indent to make the logic clearer.
>
> No functional change.
>
> gcc:
> * config/arm/arm.h (SECONDARY_OUTPUT_RELOAD_CLASS): Add parentheses
> and re-indent.
> (SECONDARY_INPUT_RELOAD_CLASS): Likewise.
> ---
>  gcc/config/arm/arm.h | 55 +++-
>  1 file changed, 29 insertions(+), 26 deletions(-)
>
> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
> index 08d3f0dae3d..f8a2da32255 100644
> --- a/gcc/config/arm/arm.h
> +++ b/gcc/config/arm/arm.h
> @@ -1460,34 +1460,37 @@ extern const char *fp_sysreg_names[NB_FP_SYSREGS];
>  /* Return the register class of a scratch register needed to copy IN into
> or out of a register in CLASS in MODE.  If it can be done directly,
> NO_REGS is returned.  */
> -#define SECONDARY_OUTPUT_RELOAD_CLASS(CLASS, MODE, X)  \
> -  /* Restrict which direct reloads are allowed for VFP/iWMMXt regs.  */ \
> -  ((TARGET_HARD_FLOAT && IS_VFP_CLASS (CLASS)) \
> -   ? coproc_secondary_reload_class (MODE, X, FALSE)\
> -   : (TARGET_IWMMXT && (CLASS) == IWMMXT_REGS) \
> -   ? coproc_secondary_reload_class (MODE, X, TRUE) \
> -   : TARGET_32BIT  \
> -   ? (((MODE) == HImode && ! arm_arch4 && true_regnum (X) == -1) \
> -? GENERAL_REGS : NO_REGS)  \
> -   : THUMB_SECONDARY_OUTPUT_RELOAD_CLASS (CLASS, MODE, X))
> +#define SECONDARY_OUTPUT_RELOAD_CLASS(CLASS, MODE, X)  \
> +  /* Restrict which direct reloads are allowed for VFP/iWMMXt regs.  */  \
> +  ((TARGET_HARD_FLOAT && IS_VFP_CLASS (CLASS)) \
> +   ? coproc_secondary_reload_class (MODE, X, FALSE)  \
> +   : ((TARGET_IWMMXT && (CLASS) == IWMMXT_REGS)  \
> +      ? coproc_secondary_reload_class (MODE, X, TRUE)  \
> +      : (TARGET_32BIT  \
> +         ? (((MODE) == HImode && ! arm_arch4 && true_regnum (X) == -1)  \
> +            ? GENERAL_REGS  \
> +            : NO_REGS)  \
> +         : THUMB_SECONDARY_OUTPUT_RELOAD_CLASS (CLASS, MODE, X))))
>
>  /* If we need to load shorts byte-at-a-time, then we need a scratch.  */
> -#define SECONDARY_INPUT_RELOAD_CLASS(CLASS, MODE, X)   \
> -  /* Restrict which direct reloads are allowed for VFP/iWMMXt regs.  */ \
> -  ((TARGET_HARD_FLOAT && IS_VFP_CLASS (CLASS)) \
> -? coproc_secondary_reload_class (MODE, X, FALSE) : \
> -(TARGET_IWMMXT && (CLASS) == IWMMXT_REGS) ?\
> -coproc_secondary_reload_class (MODE, X, TRUE) :\
> -   (TARGET_32BIT ? \
> -(((CLASS) == IWMMXT_REGS || (CLASS) == IWMMXT_GR_REGS) \
> - && CONSTANT_P (X))\
> -? GENERAL_REGS :   \
> -(((MODE) == HImode && ! arm_arch4  \
> -  && (MEM_P (X)\
> - || ((REG_P (X) || GET_CODE (X) == SUBREG) \
> - && true_regnum (X) == -1)))   \
> - ? GENERAL_REGS : NO_REGS) \
> -: THUMB_SECONDARY_INPUT_RELOAD_CLASS (CLASS, MODE, X)))
> +#define SECONDARY_INPUT_RELOAD_CLASS(CLASS, MODE, X)   \
> +  /* Restrict which direct reloads are allowed for VFP/iWMMXt regs.  */  \
> +  ((TARGET_HARD_FLOAT && IS_VFP_CLASS (CLASS)) \
> +   ? coproc_secondary_reload_class (MODE, X, FALSE)  \
> +   : ((TARGET_IWMMXT && (CLASS) == IWMMXT_REGS)  \
> +      ? coproc_secondary_reload_class (MODE, X, TRUE)  \
> +      : (TARGET_32BIT  \
> +         ? ((((CLASS) == IWMMXT_REGS || (CLASS) == IWMMXT_GR_REGS)  \
> +             && CONSTANT_P (X))  \
> +            ? GENERAL_REGS  \
> +            : (((MODE) == HImode  \
> +                && ! arm_arch4  \
> +                && (MEM_P (X)  \
> +                    || ((REG_P (X) || GET_CODE (X) == SUBREG)  \
> +                        && true_regnum (X) == -1)))  \
> +
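
A note on the "No functional change" claim in the quoted commit message: the conditional operator groups right to left, so a flattened chain and its fully parenthesised form select the same arm for every input. A minimal sketch in C++ (the function names and the integer results are purely illustrative and are not taken from arm.h):

```cpp
// Flattened chain, in the style of the old macro layout.
constexpr int flat_chain(bool a, bool b, bool c)
{ return a ? 1 : b ? 2 : c ? 3 : 4; }

// Same chain with the grouping spelled out, in the style of the new layout.
constexpr int nested_chain(bool a, bool b, bool c)
{ return a ? 1 : (b ? 2 : (c ? 3 : 4)); }

// Compare both forms over all eight input combinations.
constexpr bool same_for_all_inputs()
{
  for (int a = 0; a < 2; ++a)
    for (int b = 0; b < 2; ++b)
      for (int c = 0; c < 2; ++c)
        if (flat_chain(a, b, c) != nested_chain(a, b, c))
          return false;
  return true;
}

static_assert(same_for_all_inputs(), "parenthesising must not change the result");
```

The extra parentheses therefore only help the reader; they do not alter which register class the macro yields.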

Re: [PATCH] libstdc++: Make debug iterator pointer sequence const [PR116369]

2025-05-22 Thread Jonathan Wakely
On Thu, 22 May 2025, 08:26 Jonathan Wakely,  wrote:

>
>
> On Thu, 15 May 2025, 06:26 François Dumont,  wrote:
>
>> Got
>>
>> On 14/05/2025 18:46, Jonathan Wakely wrote:
>> > On Wed, 14 May 2025 at 17:31, François Dumont 
>> wrote:
>> >> On 12/05/2025 23:03, Jonathan Wakely wrote:
>> >>> On 31/03/25 22:20 +0200, François Dumont wrote:
>>  Hi
>> 
>>  Following this previous patch
>>  https://gcc.gnu.org/pipermail/libstdc++/2024-August/059418.html I've
>>  completed it for the _Safe_unordered_container_base type and
>>  implemented the rest of the change to store the safe iterator
>>  sequence as a pointer-to-const.
>> 
>>   libstdc++: Make debug iterator pointer sequence const [PR116369]
>> 
>>   In revision a35dd276cbf6236e08bcf6e56e62c2be41cf6e3c the debug
>>  sequence
>>   have been made mutable to allow attach iterators to const
>>  containers.
>>   This change completes this fix by also declaring debug unordered
>>  container
>>   members mutable.
>> 
>>   Additionally the debug iterator sequence is now a
>>  pointer-to-const and so
>>   _Safe_sequence_base _M_attach and all other methods are const
>>  qualified.
>>   Symbols export are maintained thanks to __asm directives.
>> 
>> >>> I can't compile this, it seems to be missing changes to
>> >>> safe_local_iterator.tcc:
>> >>>
>> >>> In file included from
>> >>>
>> /home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_local_iterator.h:444,
>> >>>   from
>> >>> /home/jwakely/src/gcc/gcc/libstdc++-v3/src/c++11/debug.cc:33:
>> >>>
>> /home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_local_iterator.tcc:
>> >>> In member function ‘typename
>> >>> __gnu_debug::_Distance_traits<_Iterator>::__type
>> >>> __gnu_debug::_Safe_local_iterator<_Iterator,
>> >>> _Sequence>::_M_get_distance_to(const
>> >>> __gnu_debug::_Safe_local_iterator<_Iterator, _Sequence>&) const’:
>> >>>
>> /home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_local_iterator.tcc:47:17:
>> >>> error: there are no arguments to ‘_M_get_sequence’ that depend on a
>> >>> template parameter, so a declaration of ‘_M_get_sequence’ must be
>> >>> available [-Wtemplate-body]
>> >>> 47 | _M_get_sequence()->bucket_size(bucket()),
>> >>>| ^~~
>> >>>
>> /home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_local_iterator.tcc:47:17:
>> >>> note: (if you use ‘-fpermissive’, G++ will accept your code, but
>> >>> allowing the use of an undeclared name is deprecated)
>> >>>
>> /home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_local_iterator.tcc:59:18:
>> >>> error: there are no arguments to ‘_M_get_sequence’ that depend on a
>> >>> template parameter, so a declaration of ‘_M_get_sequence’ must be
>> >>> available [-Wtemplate-body]
>> >>> 59 | -_M_get_sequence()->bucket_size(bucket()),
>> >>>|  ^~~
>> >>>
>> >> Yes, sorry, I had already spotted this problem, but only updated the PR
>> >> and did not re-send the patch here.
>> >>
>> >>
>>  Also available as a PR
>> 
>>  https://forge.sourceware.org/gcc/gcc-TEST/pulls/47
>> 
>>   /** Detach all singular iterators.
>>    *  @post for all iterators i attached to this sequence,
>>    *   i->_M_version == _M_version.
>>    */
>>   void
>>  -_M_detach_singular();
>>  +_M_detach_singular() const
>>  +
>> __asm("_ZN11__gnu_debug19_Safe_sequence_base18_M_detach_singularEv");
>> >>> Does this work on all targets?
>> >> No idea ! I thought the symbol name used here just had to match the
>> >> entries in config/abi/pre/gnu.ver.
>> > That linker script is not used for all targets.
>>
>> Ok, got it, I only need to use this when symbol versioning is activated.
>>
>
> I don't think that's right. For targets that don't use gnu.ver we still
> want to preserve the same symbols. They just aren't versioned on those
> targets.
> And e.g. Solaris uses versioning, but a different format, not gnu.ver, and
> I don't remember if the same macro is defined.
>
> Isn't it possible to do this without asm somehow? At least as a fallback
> for targets that don't use gnu.ver
>

Basically this needs more research, and then testing on other targets.



>
>
>> I think this new patch should do it if so.
>>
>> François
>>
>>
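
For context, the technique under discussion is a GNU asm label on a declaration: it overrides the symbol name the compiler would otherwise emit, so a member function that gains a const qualifier can keep exporting the symbol of its old non-const signature. A minimal sketch, assuming GCC or Clang and the Itanium C++ ABI mangling; the class, the member and the label are hypothetical stand-ins, not the libstdc++ declarations:

```cpp
// Hypothetical class, for illustration only.
struct sequence_base
{
  // Formerly `int detach_all();`, mangled as _ZN13sequence_base10detach_allEv.
  // A const member would normally be mangled as _ZNK13sequence_base10detach_allEv;
  // the asm label keeps the old name instead.
  int detach_all() const __asm__("_ZN13sequence_base10detach_allEv");
};

int sequence_base::detach_all() const
{ return 42; }
```

The emitted name can be inspected with a tool such as nm. This sketch does not answer the portability question raised in the thread: the label is a literal assembler name, so whether it matches the old symbol on every target still has to be checked per target.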


Re: [PATCH] bitintlower: Ensure extension of the most significant limb on info->extended targets

2025-05-22 Thread Jakub Jelinek
On Wed, May 21, 2025 at 12:48:21PM +0200, Jakub Jelinek wrote:
> 2025-05-21  Jakub Jelinek  
> 
>   * gimple-lower-bitint.cc (bitint_extended): New variable.
>   (bitint_large_huge::lower_shift_stmt): For LSHIFT_EXPR with
>   bitint_extended if lhs has most significant partial limb extend
>   it afterwards.
> 
>   * gcc.dg/bitintext.h: New file.
>   * gcc.dg/torture/bitint-82.c: New test.

Bootstrapped/regtested on x86_64-linux and i686-linux, plus tested on
s390x-linux, ok for trunk?

Jakub
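
To illustrate what "extend it afterwards" refers to: when the padding bits of a partial most significant limb are required to hold a zero or sign extension of the value, a left shift can push value bits into that padding, so the limb has to be re-extended. A minimal sketch in C++, assuming 64-bit limbs; the helper name and signature are made up for this example and are not code from gimple-lower-bitint.cc:

```cpp
#include <cstdint>

// Re-extend the low `bits` bits of the most significant limb to the full
// limb width.  Precondition (assumed by this sketch): 1 <= bits <= 63.
constexpr std::uint64_t
extend_partial_limb(std::uint64_t limb, unsigned bits, bool is_signed)
{
  const std::uint64_t mask = (std::uint64_t{1} << bits) - 1;
  std::uint64_t low = limb & mask;          // drop whatever is in the padding
  if (is_signed && ((low >> (bits - 1)) & 1))
    low |= ~mask;                           // replicate the sign bit upwards
  return low;
}
```

For a 68-bit value stored in two 64-bit limbs, for example, the top limb carries 4 value bits and this step would run with bits == 4.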



Re: [PATCH] libstdc++: Make debug iterator pointer sequence const [PR116369]

2025-05-22 Thread Jonathan Wakely
On Thu, 15 May 2025, 06:26 François Dumont,  wrote:

> Got
>
> On 14/05/2025 18:46, Jonathan Wakely wrote:
> > On Wed, 14 May 2025 at 17:31, François Dumont 
> wrote:
> >> On 12/05/2025 23:03, Jonathan Wakely wrote:
> >>> On 31/03/25 22:20 +0200, François Dumont wrote:
>  Hi
> 
>  Following this previous patch
>  https://gcc.gnu.org/pipermail/libstdc++/2024-August/059418.html I've
>  completed it for the _Safe_unordered_container_base type and
>  implemented the rest of the change to store the safe iterator
>  sequence as a pointer-to-const.
> 
>   libstdc++: Make debug iterator pointer sequence const [PR116369]
> 
>   In revision a35dd276cbf6236e08bcf6e56e62c2be41cf6e3c the debug
>  sequence
>   have been made mutable to allow attach iterators to const
>  containers.
>   This change completes this fix by also declaring debug unordered
>  container
>   members mutable.
> 
>   Additionally the debug iterator sequence is now a
>  pointer-to-const and so
>   _Safe_sequence_base _M_attach and all other methods are const
>  qualified.
>   Symbols export are maintained thanks to __asm directives.
> 
> >>> I can't compile this, it seems to be missing changes to
> >>> safe_local_iterator.tcc:
> >>>
> >>> In file included from
> >>>
> /home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_local_iterator.h:444,
> >>>   from
> >>> /home/jwakely/src/gcc/gcc/libstdc++-v3/src/c++11/debug.cc:33:
> >>>
> /home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_local_iterator.tcc:
> >>> In member function ‘typename
> >>> __gnu_debug::_Distance_traits<_Iterator>::__type
> >>> __gnu_debug::_Safe_local_iterator<_Iterator,
> >>> _Sequence>::_M_get_distance_to(const
> >>> __gnu_debug::_Safe_local_iterator<_Iterator, _Sequence>&) const’:
> >>>
> /home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_local_iterator.tcc:47:17:
> >>> error: there are no arguments to ‘_M_get_sequence’ that depend on a
> >>> template parameter, so a declaration of ‘_M_get_sequence’ must be
> >>> available [-Wtemplate-body]
> >>> 47 | _M_get_sequence()->bucket_size(bucket()),
> >>>| ^~~
> >>>
> /home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_local_iterator.tcc:47:17:
> >>> note: (if you use ‘-fpermissive’, G++ will accept your code, but
> >>> allowing the use of an undeclared name is deprecated)
> >>>
> /home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_local_iterator.tcc:59:18:
> >>> error: there are no arguments to ‘_M_get_sequence’ that depend on a
> >>> template parameter, so a declaration of ‘_M_get_sequence’ must be
> >>> available [-Wtemplate-body]
> >>> 59 | -_M_get_sequence()->bucket_size(bucket()),
> >>>|  ^~~
> >>>
> >> Yes, sorry, I had already spotted this problem, but only updated the PR
> >> and did not re-send the patch here.
> >>
> >>
>  Also available as a PR
> 
>  https://forge.sourceware.org/gcc/gcc-TEST/pulls/47
> 
>   /** Detach all singular iterators.
>    *  @post for all iterators i attached to this sequence,
>    *   i->_M_version == _M_version.
>    */
>   void
>  -_M_detach_singular();
>  +_M_detach_singular() const
>  +
> __asm("_ZN11__gnu_debug19_Safe_sequence_base18_M_detach_singularEv");
> >>> Does this work on all targets?
> >> No idea ! I thought the symbol name used here just had to match the
> >> entries in config/abi/pre/gnu.ver.
> > That linker script is not used for all targets.
>
> Ok, got it, I only need to use this when symbol versioning is activated.
>

I don't think that's right. For targets that don't use gnu.ver we still
want to preserve the same symbols. They just aren't versioned on those
targets.
And e.g. Solaris uses versioning, but a different format, not gnu.ver, and
I don't remember if the same macro is defined.

Isn't it possible to do this without asm somehow? At least as a fallback
for targets that don't use gnu.ver



> I think this new patch should do it if so.
>
> François
>
>


Re: [PATCH 1/2] libstdc++: Define _Scoped_allocation RAII helper

2025-05-22 Thread Jonathan Wakely
On Thu, 22 May 2025 at 10:38, Tomasz Kamiński  wrote:
>
> From: Jonathan Wakely 
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/allocated_ptr.h (_Scoped_allocation): New class
> template.
>
> Co-Authored-By: Tomasz Kamiński 
> Signed-off-by: Tomasz Kamiński 
> ---
> Tested on x86_64-linux. OK for trunk?

OK, thanks.

>
>  libstdc++-v3/include/bits/allocated_ptr.h | 96 +++
>  1 file changed, 96 insertions(+)
>
> diff --git a/libstdc++-v3/include/bits/allocated_ptr.h b/libstdc++-v3/include/bits/allocated_ptr.h
> index 0b2b6fe5820..aa5355f0e2f 100644
> --- a/libstdc++-v3/include/bits/allocated_ptr.h
> +++ b/libstdc++-v3/include/bits/allocated_ptr.h
> @@ -36,6 +36,7 @@
>  # include 
>  # include 
>  # include 
> +# include 
>
>  namespace std _GLIBCXX_VISIBILITY(default)
>  {
> @@ -136,6 +137,101 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>return { std::__allocate_guarded(__a) };
>  }
>
> +  // An RAII type that acquires memory from an allocator.
> +  // N.B.  'scoped' here is in the RAII sense, not the scoped allocator model,
> +  // so this has nothing to do with `std::scoped_allocator_adaptor`.
> +  // This class can be used to simplify the common pattern:
> +  //
> +  // auto ptr = alloc.allocate(1);
> +  // try {
> +  //   std::construct_at(std::to_address(ptr), args);
> +  //   m_ptr = ptr;
> +  // } catch (...) {
> +  //   alloc.deallocate(ptr, 1);
> +  //   throw;
> +  // }
> +  //
> +  // Instead you can do:
> +  //
> +  // _Scoped_allocation sa(alloc);
> +  // m_ptr = std::construct_at(sa.get(), args);
> +  // (void) sa.release();
> +  //
> +  // Or even simpler:
> +  //
> +  // _Scoped_allocation sa(alloc, std::in_place, args);
> +  // m_ptr = sa.release();
> +  //
> +  template
> +struct _Scoped_allocation
> +{
> +  using value_type = typename allocator_traits<_Alloc>::value_type;
> +  using pointer = typename allocator_traits<_Alloc>::pointer;
> +
> +  // Use `a` to allocate memory for `n` objects.
> +  constexpr explicit
> +  _Scoped_allocation(const _Alloc& __a, size_t __n = 1)
> +  : _M_a(__a), _M_n(__n), _M_p(_M_a.allocate(__n))
> +  { }
> +
> +#if __glibcxx_optional >= 201606L
> +  // Allocate memory for a single object and if that succeeds,
> +  // construct an object using args.
> +  //
> +  // Does not do uses-allocator construction; don't use if you need that.
> +  //
> +  // CAUTION: the destructor will *not* destroy this object, it will only
> +  // free the memory. That means the following pattern is unsafe:
> +  //
> +  // _Scoped_allocation  sa(alloc, in_place, args);
> +  // potentially_throwing_operations();
> +  // return sa.release();
> +  //
> +  // If the middle operation throws, the object will not be destroyed.
> +  template
> +   constexpr explicit
> +   _Scoped_allocation(const _Alloc& __a, in_place_t, _Args&&... __args)
> +   : _Scoped_allocation(__a, 1)
> +   {
> + // The target constructor has completed, so if the next line throws,
> + // the destructor will deallocate the memory.
> + allocator_traits<_Alloc>::construct(_M_a, get(),
> + std::forward<_Args>(__args)...);
> +   }
> +#endif
> +
> +  _GLIBCXX20_CONSTEXPR
> +  ~_Scoped_allocation()
> +  {
> +   if (_M_p) [[__unlikely__]]
> + _M_a.deallocate(_M_p, _M_n);
> +  }
> +
> +  _Scoped_allocation(_Scoped_allocation&&) = delete;
> +
> +  constexpr _Alloc
> +  get_allocator() const noexcept { return _M_a; }
> +
> +  constexpr value_type*
> +  get() const noexcept
> +  { return std::__to_address(_M_p); }
> +
> +  [[__nodiscard__]]
> +  constexpr pointer
> +  release() noexcept { return std::__exchange(_M_p, nullptr); }
> +
> +private:
> +  [[__no_unique_address__]] _Alloc _M_a;
> +  size_t _M_n;
> +  pointer _M_p;
> +};
> +
> +#if __glibcxx_optional >= 201606L && __cpp_deduction_guides >= 201606L
> +  template
> +_Scoped_allocation(_Alloc, in_place_t, _Args...)
> +  -> _Scoped_allocation<_Alloc>;
> +#endif
> +
>  /// @endcond
>  _GLIBCXX_END_NAMESPACE_VERSION
>  } // namespace std
> --
> 2.49.0
>
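
The pattern described in the patch comments can be tried outside libstdc++ with a simplified stand-in. The sketch below is not the _Scoped_allocation code from the patch: scoped_allocation, counting_allocator, live_blocks and the two helper functions are hypothetical names invented for this example, and it only covers the allocate / construct / release sequence.

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <stdexcept>
#include <utility>

int live_blocks = 0;  // number of blocks currently allocated (demo only)

// Hypothetical allocator that counts outstanding allocations.
template<typename T>
struct counting_allocator
{
  using value_type = T;
  counting_allocator() = default;
  template<typename U>
    counting_allocator(const counting_allocator<U>&) noexcept { }
  T* allocate(std::size_t n)
  { ++live_blocks; return std::allocator<T>().allocate(n); }
  void deallocate(T* p, std::size_t n) noexcept
  { --live_blocks; std::allocator<T>().deallocate(p, n); }
};

// Simplified stand-in: owns raw memory until release() is called.
template<typename Alloc>
class scoped_allocation
{
  using traits = std::allocator_traits<Alloc>;
public:
  using pointer = typename traits::pointer;

  explicit scoped_allocation(const Alloc& a, std::size_t n = 1)
  : alloc_(a), n_(n), p_(traits::allocate(alloc_, n)) { }

  // Frees the memory only; it never destroys an object.
  ~scoped_allocation()
  { if (p_) traits::deallocate(alloc_, p_, n_); }

  scoped_allocation(const scoped_allocation&) = delete;
  scoped_allocation& operator=(const scoped_allocation&) = delete;

  pointer get() const noexcept { return p_; }
  pointer release() noexcept { return std::exchange(p_, nullptr); }

private:
  Alloc alloc_;
  std::size_t n_;
  pointer p_;
};

// Success path: construct in place, then take ownership of the memory.
int* make_int(counting_allocator<int>& a, int v)
{
  scoped_allocation<counting_allocator<int>> sa(a);
  ::new (static_cast<void*>(sa.get())) int(v);
  return sa.release();
}

struct thrower
{ thrower() { throw std::runtime_error("constructor failed"); } };

// Failure path: returns true if the constructor threw and nothing leaked.
bool construct_or_rollback()
{
  counting_allocator<thrower> a;
  try
    {
      scoped_allocation<counting_allocator<thrower>> sa(a);
      ::new (static_cast<void*>(sa.get())) thrower();  // throws
      (void) sa.release();                             // not reached
    }
  catch (const std::runtime_error&)
    { return live_blocks == 0; }
  return false;
}
```

As the CAUTION comment in the patch points out, the guard only returns memory. Once an object has been constructed, any later throwing step before release() needs its own cleanup for that object.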



Re: [PATCH 2/2] libstdc++: Implement C++26 std::indirect [PR119152]

2025-05-22 Thread Jonathan Wakely

On 22/05/25 11:19 +0200, Tomasz Kamiński wrote:

From: Jonathan Wakely 

This patch implements C++26 std::indirect as specified
in P3019 with the amendment to move assignment from LWG 4251.

PR libstdc++/119152

libstdc++-v3/ChangeLog:

* include/Makefile.am: Add new header.
* include/Makefile.in: Regenerate.
* include/bits/indirect.h: New file.
* include/bits/version.def (indirect): Define.
* include/bits/version.h: Regenerate.
* include/std/memory: Include new header.
* testsuite/std/memory/indirect/copy.cc
* testsuite/std/memory/indirect/copy_alloc.cc
* testsuite/std/memory/indirect/ctor.cc
* testsuite/std/memory/indirect/incomplete.cc
* testsuite/std/memory/indirect/invalid_neg.cc
* testsuite/std/memory/indirect/move.cc
* testsuite/std/memory/indirect/move_alloc.cc
* testsuite/std/memory/indirect/relops.cc

Co-Authored-By: Tomasz Kamiński 
Signed-off-by: Tomasz Kamiński 
---
Tested on x86_64-linux. OK for trunk?


Obviously I think it's OK because I wrote most of the code, but let's
wait a little while to see if others have comments.

If nobody else comments by the end of tomorrow then please push to
trunk.

And thanks for finishing this work!


libstdc++-v3/include/Makefile.am  |   1 +
libstdc++-v3/include/Makefile.in  |   1 +
libstdc++-v3/include/bits/indirect.h  | 459 ++
libstdc++-v3/include/bits/version.def |   9 +
libstdc++-v3/include/bits/version.h   |  10 +
libstdc++-v3/include/std/memory   |   5 +
.../testsuite/std/memory/indirect/copy.cc | 121 +
.../std/memory/indirect/copy_alloc.cc | 228 +
.../testsuite/std/memory/indirect/ctor.cc | 203 
.../std/memory/indirect/incomplete.cc |  38 ++
.../std/memory/indirect/invalid_neg.cc|  28 ++
.../testsuite/std/memory/indirect/move.cc | 144 ++
.../std/memory/indirect/move_alloc.cc | 296 +++
.../testsuite/std/memory/indirect/relops.cc   |  82 
14 files changed, 1625 insertions(+)
create mode 100644 libstdc++-v3/include/bits/indirect.h
create mode 100644 libstdc++-v3/testsuite/std/memory/indirect/copy.cc
create mode 100644 libstdc++-v3/testsuite/std/memory/indirect/copy_alloc.cc
create mode 100644 libstdc++-v3/testsuite/std/memory/indirect/ctor.cc
create mode 100644 libstdc++-v3/testsuite/std/memory/indirect/incomplete.cc
create mode 100644 libstdc++-v3/testsuite/std/memory/indirect/invalid_neg.cc
create mode 100644 libstdc++-v3/testsuite/std/memory/indirect/move.cc
create mode 100644 libstdc++-v3/testsuite/std/memory/indirect/move_alloc.cc
create mode 100644 libstdc++-v3/testsuite/std/memory/indirect/relops.cc

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 3e5b6c4142e..b67d470c27e 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -210,6 +210,7 @@ bits_headers = \
${bits_srcdir}/gslice_array.h \
${bits_srcdir}/hashtable.h \
${bits_srcdir}/hashtable_policy.h \
+   ${bits_srcdir}/indirect.h \
${bits_srcdir}/indirect_array.h \
${bits_srcdir}/ios_base.h \
${bits_srcdir}/istream.tcc \
diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index 3531162b5f7..6f7f2be68fd 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -563,6 +563,7 @@ bits_freestanding = \
@GLIBCXX_HOSTED_TRUE@   ${bits_srcdir}/gslice_array.h \
@GLIBCXX_HOSTED_TRUE@   ${bits_srcdir}/hashtable.h \
@GLIBCXX_HOSTED_TRUE@   ${bits_srcdir}/hashtable_policy.h \
+@GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/indirect.h \
@GLIBCXX_HOSTED_TRUE@   ${bits_srcdir}/indirect_array.h \
@GLIBCXX_HOSTED_TRUE@   ${bits_srcdir}/ios_base.h \
@GLIBCXX_HOSTED_TRUE@   ${bits_srcdir}/istream.tcc \
diff --git a/libstdc++-v3/include/bits/indirect.h b/libstdc++-v3/include/bits/indirect.h
new file mode 100644
index 000..32b2af9117d
--- /dev/null
+++ b/libstdc++-v3/include/bits/indirect.h
@@ -0,0 +1,459 @@
+// Vocabulary Types for Composite Class Design -*- C++ -*-
+
+// Copyright The GNU Toolchain Authors.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have receive

[committed] libstdc++: Fix PSTL test iterators

2025-05-22 Thread Jonathan Wakely
These were fixed upstream by:
https://github.com/uxlfoundation/oneDPL/pull/534
https://github.com/uxlfoundation/oneDPL/pull/546

libstdc++-v3/ChangeLog:

* testsuite/util/pstl/test_utils.h (ForwardIterator::operator++):
Fix return type.
(BidirectionalIterator::operator++): Likewise.
(BidirectionalIterator::operator--): Likewise.
---

Tested x86_64-linux. Pushed to trunk.

 libstdc++-v3/testsuite/util/pstl/test_utils.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/testsuite/util/pstl/test_utils.h b/libstdc++-v3/testsuite/util/pstl/test_utils.h
index 55b510098a04..9c61a7145f59 100644
--- a/libstdc++-v3/testsuite/util/pstl/test_utils.h
+++ b/libstdc++-v3/testsuite/util/pstl/test_utils.h
@@ -154,7 +154,7 @@ class ForwardIterator
 explicit ForwardIterator(Iterator i) : my_iterator(i) {}
 reference operator*() const { return *my_iterator; }
 Iterator operator->() const { return my_iterator; }
-ForwardIterator
+ForwardIterator&
 operator++()
 {
 ++my_iterator;
@@ -194,13 +194,13 @@ class BidirectionalIterator : public 
ForwardIterator
 explicit BidirectionalIterator(Iterator i) : base_type(i) {}
 BidirectionalIterator(const base_type& i) : base_type(i.iterator()) {}
 
-BidirectionalIterator
+BidirectionalIterator&
 operator++()
 {
 ++base_type::my_iterator;
 return *this;
 }
-BidirectionalIterator
+BidirectionalIterator&
 operator--()
 {
 --base_type::my_iterator;
-- 
2.49.0
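
Why the return type matters: a prefix increment that returns by value hands back a copy, so any code that keeps working with the result of ++it, for instance a chained increment, acts on a temporary rather than on the iterator itself. A minimal sketch with two hypothetical wrapper types (these are not the classes from test_utils.h):

```cpp
#include <type_traits>
#include <utility>

// Prefix increment returns a copy, as in the test iterators before the fix.
struct by_value_iter
{
  const int* p;
  by_value_iter operator++() { ++p; return *this; }
};

// Prefix increment returns a reference to the iterator, as after the fix.
struct by_reference_iter
{
  const int* p;
  by_reference_iter& operator++() { ++p; return *this; }
};

// Increment twice through the result of the first increment, then read.
template<typename It>
int value_after_two_increments(It it)
{
  ++(++it);
  return *it.p;
}

static_assert(std::is_same<decltype(++std::declval<by_reference_iter&>()),
                           by_reference_iter&>::value,
              "prefix increment should yield an lvalue reference");
```

With the by-value version the second increment is applied to the temporary copy, so the iterator only advances by one position.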



Re: [PATCH] RISC-V: Add pattern for vector-scalar multiply-add/sub [PR119100]

2025-05-22 Thread Robin Dapp

Hi Paul-Antoine,


Please find attached a revised version of the patch.

Compared to the previous iteration, I have:
* Rebased on top of Pan's work;
* Updated the cost model;
* Added a second pattern to handle the case where PLUS_MINUS operands are swapped;
* Added compile and run tests.

I bootstrapped and regtested against rv64gcv.


We need to replace all "FR to VR" uses with the new function, and IMHO that is
better done in a separate patch rather than folded into this one.

Also, please still add f16 tests.  Not that I expect fallout, but just for
completeness's sake.


Please CC patchworks...@rivosinc.com for the next version so we have CI
coverage.

+(define_insn_and_split "*_vf_" 
+  [(set (match_operand:V_VLSF 0 "register_operand""=vd")
+(plus_minus:V_VLSF  
+   (mult:V_VLSF 
+ (vec_duplicate:V_VLSF  
+   (match_operand: 1 "register_operand" "  f"))
+ (match_operand:V_VLSF 2 "register_operand"  "  0"))
+   (match_operand:V_VLSF 3 "register_operand"" vr")))]  
+  "TARGET_VECTOR && can_create_pseudo_p ()"


Nit: The constraints look unaligned here, or is that just my editor?

In the run test params you use

+/* { dg-additional-options "--param=gpr2vr-cost=0" } */

That should rather be fpr2vr-cost=.


Are you going to do the other fma variants in follow ups still?

--
Regards
Robin
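
For readers following along, the RTL shape quoted above (a vector multiplied by a duplicated scalar, then added to or subtracted from another vector, with the multiplied vector tied to the destination) corresponds to source loops of the following kind. This is only a sketch: the function name is made up, and whether GCC actually selects a vector-scalar multiply-add for it depends on the target flags and cost parameters, which is not verified here.

```cpp
#include <cstddef>

// acc[i] = scalar * acc[i] + addend[i]: the multiplied operand is also
// the destination, matching the "0" constraint in the quoted pattern.
void scale_and_add(float* acc, const float* addend, float scalar,
                   std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    acc[i] = scalar * acc[i] + addend[i];
}
```

The subtracting form of the pattern would come from the same loop with the addition replaced by a subtraction.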



Re: [PATCH v3 4/9] libstdc++: Implement layout_left from mdspan.

2025-05-22 Thread Luc Grosheintz

I think part of this didn't get incorporated because I was too hasty
sending v3. The other I just didn't deem useful (I inline the function
for v4).

There's a default initialization bug I need to fix: _M_exts and
_M_strides must be value initialized.

Then also the registration in std.cc.in & I'll squash the first three
commits.

I'll send v4 later this afternoon, please let me know if you're still
reviewing (so I don't make the same mistake again).

On 5/22/25 12:43, Tomasz Kaminski wrote:

On Wed, May 21, 2025 at 11:53 AM Luc Grosheintz 
wrote:


Implements the parts of layout_left that don't depend on any of the
other layouts.

libstdc++-v3/ChangeLog:

 * include/std/mdspan (layout_left): New class.

Signed-off-by: Luc Grosheintz 
---
  libstdc++-v3/include/std/mdspan | 307 +++-
  1 file changed, 306 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
index e5b1b2596d9..66c9d2cffac 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -144,6 +144,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { return __exts[__i]; });
   }

+   static constexpr span
+   _S_static_extents(size_t __begin, size_t __end) noexcept
+   {
+ return {_Extents.data() + __begin, _Extents.data() + __end};
+   }
+
+   constexpr span
+   _M_dynamic_extents(size_t __begin, size_t __end) const noexcept
+   requires (_Extents.size() > 0)
+   {
+ return {_M_dyn_exts + _S_dynamic_index[__begin],
+ _M_dyn_exts + _S_dynamic_index[__end]};
+   }
+
private:
 using _S_storage = __array_traits<_IndexType,
_S_rank_dynamic>::_Type;
 [[no_unique_address]] _S_storage _M_dyn_exts;
@@ -160,6 +174,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 || _Extent <= numeric_limits<_IndexType>::max();
}

+  namespace __mdspan
+  {
+template
+  constexpr span
+  __static_extents(size_t __begin = 0, size_t __end =
_Extents::rank())
+  { return _Extents::_S_storage::_S_static_extents(__begin, __end); }
+
+template
+  constexpr span
+  __dynamic_extents(const _Extents& __exts, size_t __begin = 0,
+   size_t __end = _Extents::rank())
+  {
+   return __exts._M_exts._M_dynamic_extents(__begin, __end);
+  }
+  }
+
template
  class extents
  {
@@ -251,7 +281,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 : _M_exts(span(__exts))
 { }

-
template<__mdspan::__valid_index_type _OIndexType,
size_t _Nm>
 requires (_Nm == rank() || _Nm == rank_dynamic())
 constexpr explicit(_Nm != rank_dynamic())
@@ -276,6 +305,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }

  private:
+  friend span
+  __mdspan::__static_extents(size_t, size_t);
+
+  friend span
+  __mdspan::__dynamic_extents(const extents&, size_t,
size_t);
+
using _S_storage = __mdspan::_ExtentsStorage<
 _IndexType, array{_Extents...}>;
[[no_unique_address]] _S_storage _M_exts;
@@ -286,6 +321,54 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

namespace __mdspan
{
+template


I have suggested in another e-mail that we could pass auto const&
and instantiate this with a reference to the array that is the NTTP of the storage.


+  constexpr size_t
+  __static_extents_prod(size_t __begin, size_t __end)
+  {
+   auto __sta_exts = __static_extents<_Extents>(__begin, __end);
+   size_t __ret = 1;
+   for (auto __factor : __sta_exts)
+ if (__factor != dynamic_extent)
+   __ret *= __factor;
+   return __ret;
+  }
+
+template
+  constexpr size_t
+  __dynamic_extents_prod(const _Extents& __exts, size_t __begin,
+size_t __end)
+  {
+   auto __dyn_exts = __dynamic_extents<_Extents>(__exts, __begin,
Template parameter is unnecessary, it can be deduced.
+__end);
+   size_t __ret = 1;
+   for (auto __factor : __dyn_exts)
+   __ret *= __factor;
+   return __ret;
+  }
+
+template
+  constexpr typename _Extents::index_type
+  __exts_prod(const _Extents& __exts, size_t __begin, size_t __end)
noexcept
+  {
+   using _IndexType = typename _Extents::index_type;
+   _IndexType __ret = 1;
+   if constexpr (_Extents::rank_dynamic() != _Extents::rank())
+ __ret = _IndexType(__static_extents_prod<_Extents>(__begin,
__end));
+   if constexpr (_Extents::rank_dynamic() > 0)
+ __ret *= __dynamic_extents_prod(__exts, __begin, __end);


I would inline the function here:
+   for (auto __factor : __dynamic_extents(__exts, __begin, __end))
+   __ret *= __factor;


+   return __ret;
+  }
+
+template
+  constexpr typename _Extents::index_type
+  __fwd_prod(const _Extents& __exts, size_t __r) noexcept
+  { return __exts_prod(__exts, 0, __r)

Re: [PATCH 2/2] libstdc++: Implement C++26 std::indirect [PR119152]

2025-05-22 Thread Tomasz Kaminski
On Thu, May 22, 2025 at 11:59 AM Jakub Jelinek  wrote:

> On Thu, May 22, 2025 at 11:19:25AM +0200, Tomasz Kamiński wrote:
> > From: Jonathan Wakely 
> >
> > This papers implements C++27 std::indirect as specified
>
> s/27/26/
>
> > in P3019 with ammendment to move assgiment from LWG 4251.
>
> s/assgiment/assignment/
>
Fixed both locally. Thanks.

>
> Jakub
>
>


[PATCH 1/2] libstdc++: Fix concept checks for std::unique_copy [PR120384]

2025-05-22 Thread Jonathan Wakely
This looks to have been wrong since r0-125454-gea89b2482f97aa which
introduced the predefined_ops.h. Since that change, the binary predicate
passed to std::__unique_copy is _Iter_comp_iter, which takes arguments
of the iterator type, not the iterator's value type.

This removes the checks from the __unique_copy overloads and moves them
into the second overload of std::unique_copy, where we have the original
binary predicate, not the adapted one from predefined_ops.h.

The third __unique_copy overload currently checks that the predicate is
callable with the input range value type and the output range value
type. This change alters that, so that we only ever check that the
predicate can be called with two arguments of the same type. That is
intentional, because calling the predicate with different types is a bug
that will be fixed in a later commit (see PR libstdc++/120386).

libstdc++-v3/ChangeLog:

PR libstdc++/120384
* include/bits/stl_algo.h (__unique_copy): Remove all
_BinaryPredicateConcept concept checks.
(unique_copy): Check _BinaryPredicateConcept in overload that
takes a predicate.
* testsuite/25_algorithms/unique_copy/120384.cc: New test.
---

Tested x86_64-linux.

 libstdc++-v3/include/bits/stl_algo.h| 17 +++--
 .../25_algorithms/unique_copy/120384.cc | 12 
 2 files changed, 15 insertions(+), 14 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/25_algorithms/unique_copy/120384.cc

diff --git a/libstdc++-v3/include/bits/stl_algo.h 
b/libstdc++-v3/include/bits/stl_algo.h
index 71ead103d2bf..f5361aeab7e2 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -932,11 +932,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  _OutputIterator __result, _BinaryPredicate __binary_pred,
  forward_iterator_tag, output_iterator_tag)
 {
-  // concept requirements -- iterators already checked
-  __glibcxx_function_requires(_BinaryPredicateConcept<_BinaryPredicate,
- typename iterator_traits<_ForwardIterator>::value_type,
- typename iterator_traits<_ForwardIterator>::value_type>)
-
   _ForwardIterator __next = __first;
   *__result = *__first;
   while (++__next != __last)
@@ -962,11 +957,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  _OutputIterator __result, _BinaryPredicate __binary_pred,
  input_iterator_tag, output_iterator_tag)
 {
-  // concept requirements -- iterators already checked
-  __glibcxx_function_requires(_BinaryPredicateConcept<_BinaryPredicate,
- typename iterator_traits<_InputIterator>::value_type,
- typename iterator_traits<_InputIterator>::value_type>)
-
   typename iterator_traits<_InputIterator>::value_type __value = *__first;
   __decltype(__gnu_cxx::__ops::__iter_comp_val(__binary_pred))
__rebound_pred
@@ -995,10 +985,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  _ForwardIterator __result, _BinaryPredicate __binary_pred,
  input_iterator_tag, forward_iterator_tag)
 {
-  // concept requirements -- iterators already checked
-  __glibcxx_function_requires(_BinaryPredicateConcept<_BinaryPredicate,
- typename iterator_traits<_ForwardIterator>::value_type,
- typename iterator_traits<_InputIterator>::value_type>)
   *__result = *__first;
   while (++__first != __last)
if (!__binary_pred(__result, __first))
@@ -4505,6 +4491,9 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
   __glibcxx_function_requires(_OutputIteratorConcept<_OutputIterator,
typename iterator_traits<_InputIterator>::value_type>)
   __glibcxx_requires_valid_range(__first, __last);
+  __glibcxx_function_requires(_BinaryPredicateConcept<_BinaryPredicate,
+ typename iterator_traits<_InputIterator>::value_type,
+ typename iterator_traits<_InputIterator>::value_type>)
 
   if (__first == __last)
return __result;
diff --git a/libstdc++-v3/testsuite/25_algorithms/unique_copy/120384.cc 
b/libstdc++-v3/testsuite/25_algorithms/unique_copy/120384.cc
new file mode 100644
index ..27cd3375acae
--- /dev/null
+++ b/libstdc++-v3/testsuite/25_algorithms/unique_copy/120384.cc
@@ -0,0 +1,12 @@
+// { dg-options "-D_GLIBCXX_CONCEPT_CHECKS" }
+// { dg-do compile }
+
+// PR 120384 _BinaryPredicateConcept checks in std::unique_copy are wrong
+
+#include <algorithm>
+
+void
+test_pr120384(const int* first, const int* last, int* out)
+{
+  std::unique_copy(first, last, out);
+}
-- 
2.49.0



[PATCH 2/2] libstdc++: Implement LWG 2439 for std::unique_copy [PR120386]

2025-05-22 Thread Jonathan Wakely
The current overload set for __unique_copy handles three cases:

- The input range uses forward iterators, the output range does not.
  This is the simplest case, and can just compare adjacent elements of
  the input range.

- Neither the input range nor output range use forward iterators.
  This requires a local variable copied from the input range and updated
  by assigning each element to the local variable.

- The output range uses forward iterators.
  For this case we compare the current element from the input range with
  the element just written to the output range.

There are two problems with this implementation. Firstly, the third case
assumes that the value type of the output range can be compared to the
value type of the input range, which might not be possible at all, or
might be possible but give different results to comparing elements of
the input range. This is the problem identified in LWG 2439.

Secondly, the third case is used when both ranges use forward iterators,
even though the first case could (and should) be used. This means that
we compare elements from the output range instead of the input range,
with the problems described above (either not well-formed, or might give
the wrong results).

The cause of the second problem is that the overload for the first case
looks like:

OutputIterator
__unique_copy(ForwardIter, ForwardIter, OutputIterator, BinaryPred,
  forward_iterator_tag, output_iterator_tag);

When the output range uses forward iterators this overload cannot be
used, because forward_iterator_tag does not inherit from
output_iterator_tag, so is not convertible to it.
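The tag relationship described above can be checked directly. This is a minimal sketch using only standard type traits; the variable names are illustrative and do not come from the patch:

```cpp
#include <iterator>
#include <type_traits>

// forward_iterator_tag derives from input_iterator_tag, so it converts to it.
constexpr bool fwd_converts_to_input =
  std::is_convertible_v<std::forward_iterator_tag, std::input_iterator_tag>;

// output_iterator_tag is a separate tag that forward_iterator_tag does not
// derive from, so no conversion exists and the overload is not viable.
constexpr bool fwd_converts_to_output =
  std::is_convertible_v<std::forward_iterator_tag, std::output_iterator_tag>;

static_assert(fwd_converts_to_input);
static_assert(!fwd_converts_to_output);
```

So a call that passes forward_iterator_tag as the output category can only match an overload whose last parameter is forward_iterator_tag (or input_iterator_tag), never one that expects output_iterator_tag.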

To fix these problems we need to implement the resolution of LWG 2439 so
that the third case is only used when the value types of the two ranges
are the same. This ensures that the comparisons are well behaved. We
also need to ensure that the first case is used when both ranges use
forward iterators.

This change replaces a single step of tag dispatching to choose between
three overloads with two steps of tag dispatching, choosing between two
overloads at each step. The first step dispatches based on the iterator
category of the input range, ignoring the category of the output range.
The second step only happens when the input range uses non-forward
iterators, and dispatches based on the category of the output range and
whether the value type of the two ranges is the same. So now the cases
that are handled are:

- The input range uses forward iterators.
- The output range uses non-forward iterators or a different value type.
- The output range uses forward iterators and has the same value type.

For the second case, the old code used __gnu_cxx::__ops::__iter_comp_val
to wrap the predicate in another level of indirection. That seems
unnecessary, as we can just use a pointer to the local variable instead
of an iterator referring to it.
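The two-step dispatch described above could look roughly like the following. This is a simplified sketch, not the code from the patch: the namespace, the function names and the use of a single bool_constant tag for the second step are assumptions made for illustration, and the predicate is called on values directly rather than through the predefined_ops.h adaptors.

```cpp
#include <cassert>
#include <iterator>
#include <sstream>
#include <type_traits>
#include <vector>

namespace sketch
{
  // Step 1, forward input: compare adjacent elements of the input range.
  template<typename FwdIt, typename OutIt, typename Pred>
    OutIt
    unique_copy_impl(FwdIt first, FwdIt last, OutIt out, Pred pred,
                     std::forward_iterator_tag)
    {
      FwdIt next = first;
      *out = *first;
      while (++next != last)
        if (!pred(*first, *next))
          {
            first = next;
            *++out = *first;
          }
      return ++out;
    }

  // Step 2, output cannot be re-read: keep the last value in a local copy.
  template<typename InIt, typename OutIt, typename Pred>
    OutIt
    unique_copy_1(InIt first, InIt last, OutIt out, Pred pred,
                  std::false_type)
    {
      typename std::iterator_traits<InIt>::value_type value = *first;
      *out = value;
      while (++first != last)
        if (!pred(value, *first))
          {
            value = *first;
            *++out = value;
          }
      return ++out;
    }

  // Step 2, forward output with the same value type: compare with *out.
  template<typename InIt, typename FwdOut, typename Pred>
    FwdOut
    unique_copy_1(InIt first, InIt last, FwdOut out, Pred pred,
                  std::true_type)
    {
      *out = *first;
      while (++first != last)
        if (!pred(*out, *first))
          *++out = *first;
      return ++out;
    }

  // Step 1, non-forward input: choose one of the step 2 overloads.
  template<typename InIt, typename OutIt, typename Pred>
    OutIt
    unique_copy_impl(InIt first, InIt last, OutIt out, Pred pred,
                     std::input_iterator_tag)
    {
      using out_cat = typename std::iterator_traits<OutIt>::iterator_category;
      using in_val = typename std::iterator_traits<InIt>::value_type;
      using out_val = typename std::iterator_traits<OutIt>::value_type;
      using reuse_output = std::bool_constant<
        std::is_base_of_v<std::forward_iterator_tag, out_cat>
        && std::is_same_v<in_val, out_val>>;
      return unique_copy_1(first, last, out, pred, reuse_output{});
    }

  // Entry point: only the input range category is passed on.
  template<typename InIt, typename OutIt, typename Pred>
    OutIt
    unique_copy_sketch(InIt first, InIt last, OutIt out, Pred pred)
    {
      if (first == last)
        return out;
      using in_cat = typename std::iterator_traits<InIt>::iterator_category;
      return unique_copy_impl(first, last, out, pred, in_cat{});
    }
}
```

With this shape, a forward input range always takes the first overload regardless of the output category, because forward_iterator_tag (or anything derived from it) is a better match for that parameter than input_iterator_tag.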

libstdc++-v3/ChangeLog:

PR libstdc++/120386
* include/bits/stl_algo.h (__unique_copy_1): New overloads for
the case where the input range uses non-forward iterators.
(__unique_copy): Replace three overloads with two, depending
only on the iterator category of the input range. Dispatch to
__unique_copy_1 for the non-forward case.
(unique_copy): Only pass the input range category to
__unique_copy.
---

Tested x86_64-linux.

 libstdc++-v3/include/bits/stl_algo.h | 80 +++-
 1 file changed, 44 insertions(+), 36 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_algo.h 
b/libstdc++-v3/include/bits/stl_algo.h
index f5361aeab7e2..c0bb17f9c8b2 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -918,24 +918,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __gnu_cxx::__ops::__iter_comp_iter(__binary_pred));
 }
 
-  /**
-   *  This is an uglified
-   *  unique_copy(_InputIterator, _InputIterator, _OutputIterator,
-   *  _BinaryPredicate)
-   *  overloaded for forward iterators and output iterator as result.
-  */
+  // Implementation of std::unique_copy for forward iterators.
+  // This case is easy, just compare *i with *(i-1).
   template
 _GLIBCXX20_CONSTEXPR
 _OutputIterator
 __unique_copy(_ForwardIterator __first, _ForwardIterator __last,
  _OutputIterator __result, _BinaryPredicate __binary_pred,
- forward_iterator_tag, output_iterator_tag)
+ forward_iterator_tag)
 {
   _ForwardIterator __next = __first;
   *__result = *__first;
   while (++__next != __last)
-   if (!__binary_pred(__first, __next))
+   if (!__binary_pred(__next, __first))
  {
__first = __next;
*++__result = *__first;
@@ -943,27 +939,21 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return ++__result;
 }
 
-  /**
-   *  This is an uglified
-   *  unique_copy(_InputIterator, _InputIterator, _OutputIterator,
-   *  _

Re: [PATCH 2/2] aarch64: Improve rtx_cost for constants in COMPARE [PR120372]

2025-05-22 Thread Richard Sandiford
Andrew Pinski  writes:
> The middle-end uses rtx_cost on constants with the outer of being COMPARE
> to find out the cost of a constant formation for a comparison instruction.
> So for aarch64 backend, we would just return the cost of constant formation
> in general. We can improve this by seeing if the outer is COMPARE and if
> the constant fits the constraints of the cmp instruction just set the costs
> to being one instruction.
>
> Built and tested for aarch64-linux-gnu.
>
>   PR target/120372
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.cc (aarch64_rtx_costs ): 
> Handle
>   if outer is COMPARE and the constant can be handled by the cmp 
> instruction.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/imm_choice_comparison-2.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/config/aarch64/aarch64.cc |  7 ++
>  .../aarch64/imm_choice_comparison-2.c | 90 +++
>  2 files changed, 97 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/imm_choice_comparison-2.c
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 1da615c8955..c747ad42ac4 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -14578,6 +14578,13 @@ aarch64_rtx_costs (rtx x, machine_mode mode, int 
> outer ATTRIBUTE_UNUSED,
>we don't need to consider that here.  */
>if (x == const0_rtx)
>   *cost = 0;
> +  /* If the outer is a COMPARE which is used by the middle-end
> +  and the constant fits how the cmp instruction allows, say the cost
> +  is the same as 1 insn.  */
> +  else if (outer == COMPARE
> +&& (aarch64_uimm12_shift (INTVAL (x))
> +|| aarch64_uimm12_shift (- (unsigned HOST_WIDE_INT) INTVAL 
> (x

Would be shorter as -UINTVAL.

OK with that change, thanks.

Richard

> + *cost = COSTS_N_INSNS (1);
>else
>   {
> /* To an approximation, building any other constant is
> diff --git a/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison-2.c 
> b/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison-2.c
> new file mode 100644
> index 000..379fc50563c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison-2.c
> @@ -0,0 +1,90 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +/* PR target/120372 */
> +
> +/* Go from 2 moves to none.  */
> +
> +/*
> +** GT:
> +**   ...
> +**   cmp w0, 11182080
> +**   ...
> +*/
> +
> +int
> +GT (unsigned int x)
> +{
> +  return x > 0xaa9fff;
> +}
> +
> +/*
> +** LE:
> +**   ...
> +**   cmp w0, 11182080
> +**   ...
> +*/
> +
> +int
> +LE (unsigned int x)
> +{
> +  return x <= 0xaa9fff;
> +}
> +
> +/*
> +** GE:
> +**   ...
> +**   cmp x0, 11182080
> +**   ...
> +*/
> +
> +int
> +GE (long long x)
> +{
> +  return x >= 0xaaa000;
> +}
> +
> +/*
> +** LT:
> +**   ...
> +**   cmp w0, 11182080
> +**   ...
> +*/
> +
> +int
> +LT (int x)
> +{
> +  return x < 0xaaa000;
> +}
> +
> +/* Optimize the immediate in conditionals.  */
> +
> +/*
> +** check:
> +**   ...
> +**   cmp w0, 11182080
> +**   ...
> +*/
> +
> +int
> +check (int x, int y)
> +{
> +  if (x > y && GT (x))
> +return 100;
> +
> +  return x;
> +}
> +
> +/*
> +** tern:
> +**   ...
> +**   cmp w0, 11182080
> +**   ...
> +*/
> +
> +int
> +tern (int x)
> +{
> +  return x >= 0xaaa000 ? 5 : -3;
> +}
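For readers unfamiliar with the immediate form of cmp, the check discussed above can be approximated as follows. This is only a sketch: it assumes the accepted constants are 12-bit unsigned values, optionally shifted left by 12, and the helper names are hypothetical rather than the backend's own functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if `value` is a 12-bit unsigned immediate,
// optionally shifted left by 12 bits.
constexpr bool
fits_cmp_immediate(std::uint64_t value)
{
  return value < 4096 || (value & 0xfff000u) == value;
}

// Hypothetical helper mirroring the two conditions in the patch: the
// constant itself fits, or its negation (computed in unsigned arithmetic,
// so there is no signed overflow) fits.
constexpr bool
cheap_compare_constant(std::int64_t value)
{
  const std::uint64_t u = static_cast<std::uint64_t>(value);
  return fits_cmp_immediate(u) || fits_cmp_immediate(0 - u);
}

static_assert(fits_cmp_immediate(0xaaa000));   // 11182080, as in the tests
static_assert(!fits_cmp_immediate(0xaa9fff));  // low 12 bits are not zero
```

This is why the tests expect `x > 0xaa9fff` to be compared against 11182080: the adjusted constant 0xaaa000 fits the immediate form, while 0xaa9fff does not.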


[no subject]

2025-05-22 Thread Tomasz Kamiński
This patch series implements C++26 std::indirect. The majority
of the implementation was provided by Jonathan Wakely, with tests
and minor fixes added by me. It can be found here:
https://forge.sourceware.org/tkaminsk/gcc/commit/d461c4826c61138f69bb533defbc22c6f6305e1a



Re: [PATCH v3 4/9] libstdc++: Implement layout_left from mdspan.

2025-05-22 Thread Tomasz Kaminski
I am still reviewing. Should be able to get through all of them today.

On Thu, May 22, 2025 at 1:29 PM Luc Grosheintz 
wrote:

> I think part of this didn't get incorporated because I was too hasty
> sending v3. The other I just didn't deem useful (I inline the function
> for v4).
>
> There's a default initialization bug I need to fix: _M_exts and
> _M_strides must be value initialized.
>
> Then also the registration in std.cc.in & I'll squash the first three
> commits.
>
> I'll send v4 later this afternoon, please let me know if you're still
> reviewing (so I don't make the same mistake again).
>
> On 5/22/25 12:43, Tomasz Kaminski wrote:
> > On Wed, May 21, 2025 at 11:53 AM Luc Grosheintz <
> luc.groshei...@gmail.com>
> > wrote:
> >
> >> Implements the parts of layout_left that don't depend on any of the
> >> other layouts.
> >>
> >> libstdc++-v3/ChangeLog:
> >>
> >>  * include/std/mdspan (layout_left): New class.
> >>
> >> Signed-off-by: Luc Grosheintz 
> >> ---
> >>   libstdc++-v3/include/std/mdspan | 307 +++-
> >>   1 file changed, 306 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/libstdc++-v3/include/std/mdspan
> >> b/libstdc++-v3/include/std/mdspan
> >> index e5b1b2596d9..66c9d2cffac 100644
> >> --- a/libstdc++-v3/include/std/mdspan
> >> +++ b/libstdc++-v3/include/std/mdspan
> >> @@ -144,6 +144,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>{ return __exts[__i]; });
> >>}
> >>
> >> +   static constexpr span
> >> +   _S_static_extents(size_t __begin, size_t __end) noexcept
> >> +   {
> >> + return {_Extents.data() + __begin, _Extents.data() + __end};
> >> +   }
> >> +
> >> +   constexpr span
> >> +   _M_dynamic_extents(size_t __begin, size_t __end) const noexcept
> >> +   requires (_Extents.size() > 0)
> >> +   {
> >> + return {_M_dyn_exts + _S_dynamic_index[__begin],
> >> + _M_dyn_exts + _S_dynamic_index[__end]};
> >> +   }
> >> +
> >> private:
> >>  using _S_storage = __array_traits<_IndexType,
> >> _S_rank_dynamic>::_Type;
> >>  [[no_unique_address]] _S_storage _M_dyn_exts;
> >> @@ -160,6 +174,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>  || _Extent <= numeric_limits<_IndexType>::max();
> >> }
> >>
> >> +  namespace __mdspan
> >> +  {
> >> +template
> >> +  constexpr span
> >> +  __static_extents(size_t __begin = 0, size_t __end =
> >> _Extents::rank())
> >> +  { return _Extents::_S_storage::_S_static_extents(__begin,
> __end); }
> >> +
> >> +template
> >> +  constexpr span
> >> +  __dynamic_extents(const _Extents& __exts, size_t __begin = 0,
> >> +   size_t __end = _Extents::rank())
> >> +  {
> >> +   return __exts._M_exts._M_dynamic_extents(__begin, __end);
> >> +  }
> >> +  }
> >> +
> >> template
> >>   class extents
> >>   {
> >> @@ -251,7 +281,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>  : _M_exts(span(__exts))
> >>  { }
> >>
> >> -
> >> template<__mdspan::__valid_index_type _OIndexType,
> >> size_t _Nm>
> >>  requires (_Nm == rank() || _Nm == rank_dynamic())
> >>  constexpr explicit(_Nm != rank_dynamic())
> >> @@ -276,6 +305,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>  }
> >>
> >>   private:
> >> +  friend span
> >> +  __mdspan::__static_extents(size_t, size_t);
> >> +
> >> +  friend span
> >> +  __mdspan::__dynamic_extents(const extents&, size_t,
> >> size_t);
> >> +
> >> using _S_storage = __mdspan::_ExtentsStorage<
> >>  _IndexType, array{_Extents...}>;
> >> [[no_unique_address]] _S_storage _M_exts;
> >> @@ -286,6 +321,54 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>
> >> namespace __mdspan
> >> {
> >> +template
> >>
> > I have suggested in another e-mail that we could pass auto const&,
> > and instantiate this with a reference to the array that is the NTTP of the storage.
> >
> >> +  constexpr size_t
> >> +  __static_extents_prod(size_t __begin, size_t __end)
> >> +  {
> >> +   auto __sta_exts = __static_extents<_Extents>(__begin, __end);
> >> +   size_t __ret = 1;
> >> +   for (auto __factor : __sta_exts)
> >> + if (__factor != dynamic_extent)
> >> +   __ret *= __factor;
> >> +   return __ret;
> >> +  }
> >> +
> >> +template
> >> +  constexpr size_t
> >> +  __dynamic_extents_prod(const _Extents& __exts, size_t __begin,
> >> +size_t __end)
> >> +  {
> >> +   auto __dyn_exts = __dynamic_extents<_Extents>(__exts, __begin,
> >> Template parameter is unnecessary, it can be deduced.
> >> +__end);
> >> +   size_t __ret = 1;
> >> +   for (auto __factor : __dyn_exts)
> >> +   __ret *= __factor;
> >> +   return __ret;
> >> +  }
> >> +
> >> +template
> >> +  constexpr typename _Extents::index_type
> >

Re: [PATCH v3 6/9] libstdc++: Implement layout_right from mdspan.

2025-05-22 Thread Tomasz Kaminski
On Wed, May 21, 2025 at 11:53 AM Luc Grosheintz 
wrote:

> Implement the parts of layout_left that depend on layout_right; and the
> parts of layout_right that don't depend on layout_stride.
>
> libstdc++-v3/ChangeLog:
>
> * include/std/mdspan (layout_right): New class.
>
> Signed-off-by: Luc Grosheintz 
>
This looks good to me.

> ---
>  libstdc++-v3/include/std/mdspan | 153 +++-
>  1 file changed, 152 insertions(+), 1 deletion(-)
>
> diff --git a/libstdc++-v3/include/std/mdspan
> b/libstdc++-v3/include/std/mdspan
> index 66c9d2cffac..43676c3463c 100644
> --- a/libstdc++-v3/include/std/mdspan
> +++ b/libstdc++-v3/include/std/mdspan
> @@ -393,6 +393,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>class mapping;
>};
>
> +  struct layout_right
> +  {
> +template
> +  class mapping;
> +  };
> +
>namespace __mdspan
>{
>  template
> @@ -492,7 +498,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>   _Mapping>;
>
>  template
> -  concept __standardized_mapping = __mapping_of _Mapping>;
> +  concept __standardized_mapping = __mapping_of
> +  || __mapping_of _Mapping>;
>
>  template
>concept __mapping_like = requires
> @@ -542,6 +549,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> : mapping(__other.extents(), __mdspan::__internal_ctor{})
> { }
>
> +  template
> +   requires (_Extents::rank() <= 1
> + && is_constructible_v<_Extents, _OExtents>)
> +   constexpr explicit(!is_convertible_v<_OExtents, _Extents>)
> +   mapping(const layout_right::mapping<_OExtents>& __other) noexcept
> +   : mapping(__other.extents(), __mdspan::__internal_ctor{})
> +   { }
> +
>constexpr mapping&
>operator=(const mapping&) noexcept = default;
>
> @@ -609,6 +624,142 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> [[no_unique_address]] _Extents _M_extents;
>  };
>
> +  namespace __mdspan
> +  {
> +template
> +  constexpr typename _Extents::index_type
> +  __linear_index_right(const _Extents& __exts, _Indices... __indices)
> +  {
> +   using _IndexType = typename _Extents::index_type;
> +   array<_IndexType, sizeof...(__indices)> __ind_arr{__indices...};
> +   _IndexType __res = 0;
> +   if constexpr (sizeof...(__indices) > 0)
> + {
> +   _IndexType __mult = 1;
> +   auto __update = [&, __pos = __exts.rank()](_IndexType) mutable
> + {
> +   --__pos;
> +   __res += __ind_arr[__pos] * __mult;
> +   __mult *= __exts.extent(__pos);
> + };
> +   (__update(__indices), ...);
>
Not necessarily requesting a change, just sharing this as an interesting
tidbit. If you want to experiment a bit, I think this could be implemented
as a fold. The operands of operator= are evaluated right to left, so we
could do:
   struct _Dummy {};
   auto __update = [&, __pos = __exts.rank()](_IndexType __idx) mutable
     {
       --__pos;
       __res += __idx * __mult;
       __mult *= __exts.extent(__pos);
       return _Dummy{};
     };
   // Assignments are evaluated right to left
   (... = __update(__indices));
See here: https://godbolt.org/z/jEdW31bs7
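A standalone version of the idea, for anyone who wants to try it outside the library, might look like this. It is a sketch under the assumption of C++17 or later (where the right operand of an assignment is sequenced before the left); the function and type names are made up for the example, and std::array stands in for the extents type.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

struct Dummy {};

// Row-major linear index computed with a fold over operator=.
// The fold expands to ((u(i0) = u(i1)) = u(i2)), so the calls run in the
// order u(i2), u(i1), u(i0), i.e. from the last index to the first.
template<std::size_t Rank, typename... Indices>
  std::size_t
  linear_index_right(const std::array<std::size_t, Rank>& exts,
                     Indices... indices)
  {
    static_assert(sizeof...(Indices) == Rank, "one index per extent");
    std::size_t res = 0;
    if constexpr (sizeof...(Indices) > 0)
      {
        std::size_t mult = 1;
        std::size_t pos = Rank;
        auto update = [&](std::size_t idx)
          {
            --pos;
            res += idx * mult;   // stride of dimension pos is mult
            mult *= exts[pos];
            return Dummy{};
          };
        (... = update(indices));
      }
    return res;
  }
```

For extents (3, 5, 7) and indices (1, 2, 3) the strides are 35, 7 and 1, so the expected result is 1*35 + 2*7 + 3*1. A left-to-right evaluation would pair the indices with the wrong strides, which makes the ordering easy to test.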

> + }
> +   return __res;
> +  }
> +  }
> +
> +  template
> +class layout_right::mapping
> +{
> +public:
> +  using extents_type = _Extents;
> +  using index_type = typename extents_type::index_type;
> +  using size_type = typename extents_type::size_type;
> +  using rank_type = typename extents_type::rank_type;
> +  using layout_type = layout_right;
> +
> +  static_assert(__mdspan::__representable_size<_Extents, index_type>,
> +   "The size of extents_type must be representable as index_type");
> +
> +  constexpr
> +  mapping() noexcept = default;
> +
> +  constexpr
> +  mapping(const mapping&) noexcept = default;
> +
> +  constexpr
> +  mapping(const _Extents& __extents) noexcept
> +  : _M_extents(__extents)
> +  {
> __glibcxx_assert(__mdspan::__is_representable_extents(_M_extents)); }
> +
> +  template
> +   requires (is_constructible_v)
> +   constexpr explicit(!is_convertible_v<_OExtents, extents_type>)
> +   mapping(const mapping<_OExtents>& __other) noexcept
> +   : mapping(__other.extents(), __mdspan::__internal_ctor{})
> +   { }
> +
> +  template
> +   requires (extents_type::rank() <= 1
> +   && is_constructible_v)
> +   constexpr explicit(!is_convertible_v<_OExtents, extents_type>)
> +   mapping(const layout_left::mapping<_OExtents>& __other) noexcept
> +   : mapping(__other.extents(), __mdspan::__internal_ctor{})
> +   { }
> +
> +  constexpr mapping&
> +  operator=(const mapping&) noexcept = default;
> +
> +  constexpr const _Extents&
> +  extents() const noexcept { return _M_extents; }
> +
> +  con

Re: [PATCH 3/6] RISC-V: frm/mode-switch: remove dubious frm edge insertion before call_insn

2025-05-22 Thread Robin Dapp
AFAICT the main difference to standard mode switching is that we (ab)use it 
to set the rounding mode to the value it had initially, either at function 
entry or after a call.  That's different to regular mode switching which 
assumes "static" rounding modes for different instructions.


Standard could e.g. be:
- insn1 demands frm1
- call1 demands frm4
- call2 demands frm5

Whereas we have:
- insn1 demands frm1
- call1 demands "frm at the start of the function"
- call2 demands "frm after call1 that could have called fesetround"
Weird, call2 can demand the frm as it existed after call1?!?  I'm going 
to try not to cry and return to my bubble :-)


Maybe demand was not really accurate.  It's rather that we want call2 to
operate with the global rounding mode that call1 might have changed.
So the demand is not a specific rounding mode but rather the global one.

And as we might have changed the rounding mode without having restored it since, 
we need to jump through those hoops.  Not defending, just describing the status 
quo, I don't like it either ;) and yeah it's contrary to what we usually assume 
of liveness etc.


I'd much rather see only a local backup of the rounding mode in the 
mode-changing intrinsics.  Like


backup FRM
set specific FRM
insn with specific FRM
restore FRM

so we'd always be sure the rounding mode is back to "normal"/"unknown" after an 
intrinsic and wouldn't need to do anything for calls and exits.
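As a user-level analogy only (this is not how the backend or the intrinsics are implemented), the backup/set/restore pattern above resembles a scoped guard around the C floating-point environment. The class and function names are hypothetical:

```cpp
#include <cassert>
#include <cfenv>

// Saves the current rounding mode, installs a specific one, and restores
// the saved mode when the scope ends.
class ScopedRounding
{
  int saved_;

public:
  explicit ScopedRounding(int mode)
  : saved_(std::fegetround())
  { std::fesetround(mode); }

  ~ScopedRounding()
  { std::fesetround(saved_); }

  ScopedRounding(const ScopedRounding&) = delete;
  ScopedRounding& operator=(const ScopedRounding&) = delete;
};

// Returns the rounding mode that is active while the guard is alive.
inline int
rounding_mode_inside_guard(int mode)
{
  ScopedRounding guard(mode);
  return std::fegetround();
}
```

After the guard is destroyed the mode is whatever it was before, so nothing outside the guarded region needs to know that it was changed. Code that relies on this would still need to be built with -frounding-math for the compiler to respect the dynamic rounding mode.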


Another argument would also be that we technically aren't allowed to change the 
rounding mode without -frounding-math (as seen in a PR a while ago) because 
passes might not do the right thing.  Therefore, increasing the region of 
non-default rounding mode could lead to incorrect optimizations.


Anyway, I think Vineet's patches improve on what we have right now.  I'd still 
like to understand if this "abuse" of mode switching gets globally better 
results than a very simple approach like above.  If the other (non 
mode-switching) LCMs cannot really optimize FRM reads and writes the simple
approach could indeed be worse.

--
Regards
Robin



Re: [PATCH v3 7/9] libstdc++: Add tests for layout_right.

2025-05-22 Thread Tomasz Kaminski
On Wed, May 21, 2025 at 11:52 AM Luc Grosheintz 
wrote:

> Adds tests for layout_right and for the parts of layout_left that depend
> on layout_right.
>
> libstdc++-v3/ChangeLog:
>
> * testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc: Add
> tests for layout_stride.
> * testsuite/23_containers/mdspan/layouts/ctors.cc: Add tests for
> layout_right and the interaction with layout_left.
> * testsuite/23_containers/mdspan/layouts/mapping.cc: ditto.
>
> Signed-off-by: Luc Grosheintz 
>
This also looks nice. The generalization of the tests in response to previous
comments has paid off, as you can add new layouts easily.

> ---
>  .../mdspan/layouts/class_mandate_neg.cc   |  1 +
>  .../23_containers/mdspan/layouts/ctors.cc | 64 +++
>  .../23_containers/mdspan/layouts/mapping.cc   | 78 ---
>  3 files changed, 133 insertions(+), 10 deletions(-)
>
> diff --git
> a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
> b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
> index b276fbd333e..a41bad988d2 100644
> ---
> a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
> +++
> b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
> @@ -18,5 +18,6 @@ template
>};
>
>  A a_left; // { dg-error "required
> from" }
> +A a_right;   // { dg-error "required
> from" }
>
>  // { dg-prune-output "must be representable as index_type" }
> diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
> b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
> index c96f314818a..4a7d2bffeef 100644
> --- a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
> +++ b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
> @@ -222,6 +222,66 @@ namespace from_same_layout
>  }
>  }
>
> +// ctor: mapping(layout_{right,left}::mapping)
> +namespace from_left_or_right
> +{
> +  template +  typename OExtents>
> +constexpr void
> +verify_ctor(OExtents oexts)
> +{
> +  using SMapping = typename SLayout::mapping;
> +  using OMapping = typename OLayout::mapping;
> +
> +  constexpr bool expected = std::is_convertible_v;
> +  if constexpr (expected)
> +   verify_nothrow_convertible(OMapping(oexts));
> +  else
> +   verify_nothrow_constructible(OMapping(oexts));
> +}
> +
> +  template
> +constexpr bool
> +test_ctor()
> +{
> +  assert_not_constructible<
> +   typename SLayout::mapping>,
> +   typename OLayout::mapping>>();
> +
> +  verify_ctor>(
> +   std::extents{});
> +
> +  verify_ctor>(
> +   std::extents{});
> +
> +  assert_not_constructible<
> +   typename SLayout::mapping>,
> +   typename OLayout::mapping>>();
> +
> +  verify_ctor>(
> +   std::extents{});
> +
> +  verify_ctor>(
> +   std::extents{});
> +
> +  verify_ctor>(
> +   std::extents{});
> +
> +  assert_not_constructible<
> +   typename SLayout::mapping>,
> +   typename OLayout::mapping>>();
> +  return true;
> +}
> +
> +  template
> +constexpr void
> +test_all()
> +{
> +  test_ctor();
> +  static_assert(test_ctor());
> +}
> +}
> +
>  template
>constexpr void
>test_all()
> @@ -234,5 +294,9 @@ int
>  main()
>  {
>test_all();
> +  test_all();
> +
> +  from_left_or_right::test_all();
> +  from_left_or_right::test_all();
>return 0;
>  }
> diff --git
> a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/mapping.cc
> b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/mapping.cc
> index 60630dc37ca..c6bf04a5446 100644
> --- a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/mapping.cc
> +++ b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/mapping.cc
> @@ -294,6 +294,15 @@ template<>
>  VERIFY(m.stride(1) == 3);
>}
>
> +template<>
> +  constexpr void
> +  test_stride_2d()
> +  {
> +std::layout_right::mapping> m;
> +VERIFY(m.stride(0) == 5);
> +VERIFY(m.stride(1) == 1);
> +  }
> +
>  template
>constexpr void
>test_stride_3d();
> @@ -308,6 +317,16 @@ template<>
>  VERIFY(m.stride(2) == 3*5);
>}
>
> +template<>
> +  constexpr void
> +  test_stride_3d()
> +  {
> +std::layout_right::mapping m(std::dextents(3, 5, 7));
> +VERIFY(m.stride(0) == 35);
> +VERIFY(m.stride(1) == 7);
> +VERIFY(m.stride(2) == 1);
> +  }
> +
>  template
>constexpr bool
>test_stride_all()
> @@ -382,24 +401,59 @@ template
>  { m2 != m1 } -> std::same_as;
>};
>
> -template
> -  constexpr bool
> +template
> +  constexpr void
>test_has_op_eq()
>{
> +static_assert(has_op_eq<
> +   typename SLayout::mapping>,
> +   typename OLayout::mapping>> == Expected);
> +
> +static_assert(!has_op_eq<
> +   typename SLayout::mapping>,
> +   typename OLayout::mapping>>);
> +
> +static_assert(has_op_eq<
> +   t

[PATCH][GCC16][GCC15] aarch64: Add support for FUJITSU-MONAKA (-mcpu=fujitsu-monaka) CPU

2025-05-22 Thread Yuta Mukai (Fujitsu)
Hello,

We would like to enable features for FUJITSU-MONAKA that were implemented in 
GCC after we added support for FUJITSU-MONAKA.
As the features were implemented in GCC15, we also want to backport it to GCC15.

Thanks to Andre Vieira for notifying us.

Bootstrapped/regtested on aarch64-unknown-linux-gnu.

We would be grateful if someone could push this on our behalf, as we do not 
have write access.

Thanks,
Yuta
--
Yuta Mukai
Fujitsu Limited



0001-aarch64-Enable-newly-implemented-features-for-FUJITS.patch
Description: 0001-aarch64-Enable-newly-implemented-features-for-FUJITS.patch


[PATCH v2] x86: Add preserve_none and update no_caller_saved_registers attributes

2025-05-22 Thread H.J. Lu
Add preserve_none attribute which is similar to no_callee_saved_registers
attribute, except on x86-64, r12, r13, r14, r15, rdi and rsi registers are
used for integer parameter passing.  This can be used in an interpreter
to avoid saving/restoring the registers in functions which process
byte codes.  It improved the pystones benchmark by 6-7%:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119628#c15

Remove the -mgeneral-regs-only restriction on the
no_caller_saved_registers attribute.  Only SSE is allowed, since an SSE
XMM register load preserves the upper bits of the YMM/ZMM register while
a YMM register load zeros the upper 256 bits of the ZMM register, and
preserving 32 ZMM registers can be quite expensive.

gcc/

PR target/119628
* config/i386/i386-expand.cc (ix86_expand_call): Call
ix86_type_no_callee_saved_registers_p instead of looking up
no_callee_saved_registers attribute.
* config/i386/i386-options.cc (ix86_set_func_type): Look up
preserve_none attribute.  Check preserve_none attribute for
interrupt attribute.  Don't check no_caller_saved_registers nor
no_callee_saved_registers conflicts here.
(ix86_set_func_type): Check no_callee_saved_registers before
checking no_caller_saved_registers attribute.
(ix86_set_current_function): Allow SSE with
no_caller_saved_registers attribute.
(ix86_handle_call_saved_registers_attribute): Check preserve_none,
no_callee_saved_registers and no_caller_saved_registers conflicts.
(ix86_gnu_attributes): Add preserve_none attribute.
* config/i386/i386-protos.h (ix86_type_no_callee_saved_registers_p):
New.
* config/i386/i386.cc
(x86_64_preserve_none_int_parameter_registers): New.
(ix86_using_red_zone): Don't use red-zone when there are no
caller-saved registers with SSE.
(ix86_type_no_callee_saved_registers_p): New.
(ix86_function_ok_for_sibcall): Also check TYPE_PRESERVE_NONE
and call ix86_type_no_callee_saved_registers_p instead of looking
up no_callee_saved_registers attribute.
(ix86_comp_type_attributes): Call
ix86_type_no_callee_saved_registers_p instead of looking up
no_callee_saved_registers attribute.  Return 0 if preserve_none
attribute doesn't match in 64-bit mode.
(ix86_function_arg_regno_p): For cfun with TYPE_PRESERVE_NONE,
use x86_64_preserve_none_int_parameter_registers.
(init_cumulative_args): Set preserve_none_abi.
(function_arg_64): Use x86_64_preserve_none_int_parameter_registers
with preserve_none attribute.
(setup_incoming_varargs_64): Use
x86_64_preserve_none_int_parameter_registers with preserve_none
attribute.
(ix86_save_reg): Treat TYPE_PRESERVE_NONE like
TYPE_NO_CALLEE_SAVED_REGISTERS.
(ix86_nsaved_sseregs): Allow saving XMM registers for
no_caller_saved_registers attribute.
(ix86_compute_frame_layout): Likewise.
(x86_this_parameter): Use
x86_64_preserve_none_int_parameter_registers with preserve_none
attribute.
* config/i386/i386.h (ix86_args): Add preserve_none_abi.
(call_saved_registers_type): Add TYPE_PRESERVE_NONE.
(machine_function): Change call_saved_registers to 3 bits.
* doc/extend.texi: Add preserve_none attribute.  Update
no_caller_saved_registers attribute to remove -mgeneral-regs-only
restriction.

gcc/testsuite/

PR target/119628
* gcc.target/i386/no-callee-saved-3.c: Adjust error location.
* gcc.target/i386/no-callee-saved-19a.c: New test.
* gcc.target/i386/no-callee-saved-19b.c: Likewise.
* gcc.target/i386/no-callee-saved-19c.c: Likewise.
* gcc.target/i386/no-callee-saved-19d.c: Likewise.
* gcc.target/i386/no-callee-saved-19e.c: Likewise.
* gcc.target/i386/preserve-none-1.c: Likewise.
* gcc.target/i386/preserve-none-2.c: Likewise.
* gcc.target/i386/preserve-none-3.c: Likewise.
* gcc.target/i386/preserve-none-4.c: Likewise.
* gcc.target/i386/preserve-none-5.c: Likewise.
* gcc.target/i386/preserve-none-6.c: Likewise.
* gcc.target/i386/preserve-none-7.c: Likewise.
* gcc.target/i386/preserve-none-8.c: Likewise.
* gcc.target/i386/preserve-none-9.c: Likewise.
* gcc.target/i386/preserve-none-10.c: Likewise.
* gcc.target/i386/preserve-none-11.c: Likewise.
* gcc.target/i386/preserve-none-12.c: Likewise.
* gcc.target/i386/preserve-none-13.c: Likewise.
* gcc.target/i386/preserve-none-14.c: Likewise.
* gcc.target/i386/preserve-none-15.c: Likewise.
* gcc.target/i386/preserve-none-16.c: Likewise.
* gcc.target/i386/preserve-none-17.c: Likewise.
* gcc.target/i386/preserve-none-19.c: Likewise.
* gcc.target/i386/preserve-none-19.c: Likewise.
* gcc.targe

[PATCH v1 2/3] RISC-V: Add test for vec_duplicate + vor.vv combine case 0 with GR2VR cost 0, 2 and 15

2025-05-22 Thread pan2 . li
From: Pan Li 

Add asm dump check tests for the vec_duplicate + vor.vv combine to
vor.vx, with GR2VR costs of 0, 2 and 15.

The following test suite passes with this patch:
* The full rv64gcv regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add test cases
for vor vx combine case 0 on GR2VR cost.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h: Add test
data for vor.vx run test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vor-run-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vor-run-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vor-run-1-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vor-run-1-i8.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vor-run-1-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vor-run-1-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vor-run-1-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vor-run-1-u8.c: New test.

Signed-off-by: Pan Li 
---
 .../riscv/rvv/autovec/vx_vf/vx-1-i16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-1-i32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-1-i64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-1-i8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-2-i16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-2-i32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-2-i64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-2-i8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-3-i16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-3-i32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-3-i64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-3-i8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx_binary_data.h  | 392 ++
 .../rvv/autovec/vx_vf/vx_vor-run-1-i16.c  |  15 +
 .../rvv/autovec/vx_vf/vx_vor-run-1-i32.c  |  15 +
 .../rvv/autovec/vx_vf/vx_vor-run-1-i64.c  |  15 +
 .../riscv/rvv/autovec/vx_vf/vx_vor-run-1-i8.c |  15 +
 .../rvv/autovec/vx_vf/vx_vor-run-1-u16.c  |  15 +
 .../rvv/autovec/vx_vf/vx_vor-run-1-u32.c  |  15 +
 .../rvv/autovec/vx_vf/vx_vor-run-1-u64.c  |  15 +
 .../riscv/rvv/autovec/vx_vf/vx_vor-run-1-u8.c |  15 +
 33 files changed, 560 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vor-run-1-i16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vor-run-1-i32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vor-run-1-i64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vor-run-1-i8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vor-run-1-u16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vor-run-1-u32.c
 create mode 100644 
gcc/testsuite/gcc.targ

[PATCH v1 1/3] RISC-V: Combine vec_duplicate + vor.vv to vor.vx on GR2VR cost

2025-05-22 Thread pan2 . li
From: Pan Li 

This patch combines vec_duplicate + vor.vv into vor.vx, as in the
example code below.  The related pattern depends on the cost of the
vec_duplicate from GR2VR: late-combine performs the combination if the
GR2VR cost is zero, and rejects it if the GR2VR cost is greater than
zero.

Assume we have example code like the following, with a GR2VR cost of 0.

  #define DEF_VX_BINARY(T, OP)                                        \
  void                                                                \
  test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \
  {                                                                   \
    for (unsigned i = 0; i < n; i++)                                  \
      out[i] = in[i] OP x;                                            \
  }

  DEF_VX_BINARY(int32_t, |)

Before this patch:
  test_vx_binary_or_int32_t_case_0:
    beq a3,zero,.L8
    vsetvli a5,zero,e32,m1,ta,ma
    vmv.v.x v2,a2
    slli a3,a3,32
    srli a3,a3,32
  .L3:
    vsetvli a5,a3,e32,m1,ta,ma
    vle32.v v1,0(a1)
    slli a4,a5,2
    sub a3,a3,a5
    add a1,a1,a4
    vor.vv v1,v1,v2
    vse32.v v1,0(a0)
    add a0,a0,a4
    bne a3,zero,.L3

After this patch:
  test_vx_binary_or_int32_t_case_0:
    beq a3,zero,.L8
    slli a3,a3,32
    srli a3,a3,32
  .L3:
    vsetvli a5,a3,e32,m1,ta,ma
    vle32.v v1,0(a1)
    slli a4,a5,2
    sub a3,a3,a5
    add a1,a1,a4
    vor.vx v1,v1,a2
    vse32.v v1,0(a0)
    add a0,a0,a4
    bne a3,zero,.L3
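
As a rough illustration of why the two sequences above are interchangeable, here is a scalar C model of the instructions involved.  It is only a sketch: the model_* helpers, vor_forms_agree() and the fixed vector length VL are hypothetical names for this example, not GCC or RVV intrinsics, and the model ignores vl/masking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { VL = 4 };  /* hypothetical fixed vector length for the model */

/* Models vmv.v.x: broadcast the scalar x into every lane of vd.  */
static void model_vmv_v_x(int32_t *vd, int32_t x)
{
  for (size_t i = 0; i < VL; i++)
    vd[i] = x;
}

/* Models vor.vv: lane-wise OR of two vectors.  */
static void model_vor_vv(int32_t *vd, const int32_t *vs2, const int32_t *vs1)
{
  for (size_t i = 0; i < VL; i++)
    vd[i] = vs2[i] | vs1[i];
}

/* Models vor.vx: lane-wise OR of a vector with a scalar.  */
static void model_vor_vx(int32_t *vd, const int32_t *vs2, int32_t x)
{
  for (size_t i = 0; i < VL; i++)
    vd[i] = vs2[i] | x;
}

/* Returns 1 when both instruction sequences produce the same lanes.  */
int vor_forms_agree(const int32_t *in, int32_t x)
{
  int32_t dup[VL], out_vv[VL], out_vx[VL];

  assert(in != NULL);
  model_vmv_v_x(dup, x);          /* before: vmv.v.x v2,a2    */
  model_vor_vv(out_vv, in, dup);  /* before: vor.vv  v1,v1,v2 */
  model_vor_vx(out_vx, in, x);    /* after:  vor.vx  v1,v1,a2 */

  for (size_t i = 0; i < VL; i++)
    if (out_vv[i] != out_vx[i])
      return 0;
  return 1;
}
```

The vor.vx form needs neither the broadcast instruction nor the extra vector register, which is what the "after" listing shows.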

The following test suite passes with this patch:
* The full rv64gcv regression test.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vx_binary_vec_dup_vec): Add new
case for IOR op.
(expand_vx_binary_vec_vec_dup): Ditto.
* config/riscv/riscv.cc (riscv_rtx_costs): Ditto.
* config/riscv/vector-iterators.md: Add new op or to no_shift_vx_ops.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv-v.cc  | 2 ++
 gcc/config/riscv/riscv.cc| 1 +
 gcc/config/riscv/vector-iterators.md | 2 +-
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index e406e7a7f59..69bd4bf7f3a 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -5512,6 +5512,7 @@ expand_vx_binary_vec_dup_vec (rtx op_0, rtx op_1, rtx op_2,
 {
 case PLUS:
 case AND:
+case IOR:
   icode = code_for_pred_scalar (code, mode);
   break;
 case MINUS:
@@ -5539,6 +5540,7 @@ expand_vx_binary_vec_vec_dup (rtx op_0, rtx op_1, rtx op_2,
 {
 case MINUS:
 case AND:
+case IOR:
   icode = code_for_pred_scalar (code, mode);
   break;
 default:
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 03dcc347fb8..dafc969568d 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3917,6 +3917,7 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
  case PLUS:
  case MINUS:
  case AND:
+ case IOR:
{
  rtx op_0 = XEXP (x, 0);
  rtx op_1 = XEXP (x, 1);
diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
index 026be6f65d3..a50b7fde9c6 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -4042,7 +4042,7 @@ (define_code_iterator any_int_binop [plus minus and ior xor ashift ashiftrt lshi
 ])
 
 (define_code_iterator any_int_binop_no_shift_vx [
-  plus minus and
+  plus minus and ior
 ])
 
 (define_code_iterator any_int_unop [neg not])
-- 
2.43.0



[PATCH v1 0/3] RISC-V: Combine vec_duplicate + vor.vv to vor.vx on GR2VR cost

2025-05-22 Thread pan2 . li
From: Pan Li 

This patch series introduces the combine of vec_dup + vor.vv into
vor.vx, depending on the cost value of GR2VR.  Late-combine performs the
combine if the cost of GR2VR is zero, and rejects it if the cost is
non-zero, like the values 1 and 15 used in the tests.  There are two
cases for the combine:

Case 0:
 |   ...
 |   vmv.v.x
 | L1:
 |   vor.vv
 |   J L1
 |   ...

Case 1:
 |   ...
 | L1:
 |   vmv.v.x
 |   vor.vv
 |   J L1
 |   ...

Both are combined into the following if the cost of GR2VR is zero.
 |   ...
 | L1:
 |   vor.vx
 |   J L1
 |   ...

The following test suite passes with this patch series:
* The full rv64gcv regression test.

Pan Li (3):
  RISC-V: Combine vec_duplicate + vor.vv to vor.vx on GR2VR cost
  RISC-V: Add test for vec_duplicate + vor.vv combine case 0 with GR2VR cost 0, 2 and 15
  RISC-V: Add test for vec_duplicate + vor.vv combine case 1 with GR2VR cost 0, 1 and 2

 gcc/config/riscv/riscv-v.cc   |   2 +
 gcc/config/riscv/riscv.cc |   1 +
 gcc/config/riscv/vector-iterators.md  |   2 +-
 .../riscv/rvv/autovec/vx_vf/vx-1-i16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-1-i32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-1-i64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-1-i8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-2-i16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-2-i32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-2-i64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-2-i8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-3-i16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-3-i32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-3-i64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-3-i8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-4-i16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-4-i32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-4-i64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-4-i8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-4-u16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-4-u32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-4-u64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-4-u8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-5-i16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-5-i32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-5-i64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-5-i8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-5-u16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-5-u32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-5-u64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-5-u8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-6-i16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-6-i32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-6-i64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-6-i8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-6-u16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-6-u32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-6-u64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-6-u8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx_binary_data.h  | 392 ++
 .../rvv/autovec/vx_vf/vx_vor-run-1-i16.c  |  15 +
 .../rvv/autovec/vx_vf/vx_vor-run-1-i32.c  |  15 +
 .../rvv/autovec/vx_vf/vx_vor-run-1-i64.c  |  15 +
 .../riscv/rvv/autovec/vx_vf/vx_vor-run-1-i8.c |  15 +
 .../rvv/autovec/vx_vf/vx_vor-run-1-u16.c  |  15 +
 .../rvv/autovec/vx_vf/vx_vor-run-1-u32.c  |  15 +
 .../rvv/autovec/vx_vf/vx_vor-run-1-u64.c  |  15 +
 .../riscv/rvv/autovec/vx_vf/vx_vor-run-1-u8.c |  15 +
 60 files changed, 612 insertions(+), 1 deletion(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vor-run-1-i16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vor-run-1-i32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vor-run-1-i64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vor-run-1-i8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vor-run-1-u16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vor-run-1-u32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vor-run-1-u64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vor-run-1-u8.c

-- 
2.43.0



[PATCH v1 3/3] RISC-V: Add test for vec_duplicate + vor.vv combine case 1 with GR2VR cost 0, 1 and 2

2025-05-22 Thread pan2 . li
From: Pan Li 

Add asm dump check tests for the vec_duplicate + vor.vv combine to
vor.vx, with GR2VR costs of 0, 1 and 2.

The following test suite passes with this patch:
* The full rv64gcv regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c: Add asm check
for vor.vx combine.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u8.c: Ditto.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i64.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i8.c  | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u8.c  | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i16.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i32.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i64.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i8.c  | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u16.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u32.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u64.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u8.c  | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i16.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i32.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i64.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i8.c  | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u16.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u32.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u64.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u8.c  | 2 ++
 24 files changed, 48 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c
index 62fd4e39c01..ffad2a27f92 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c
@@ -9,8 +9,10 @@ DEF_VX_BINARY_CASE_1_WRAP(T, +, add, VX_BINARY_BODY_X16)
 DEF_VX_BINARY_CASE_1_WRAP(T, -, sub, VX_BINARY_BODY_X16)
 DEF_VX_BINARY_REVERSE_CASE_1_WRAP(T, -, rsub, VX_BINARY_REVERSE_BODY_X16)
 DEF_VX_BINARY_CASE_1_WRAP(T, &, and, VX_BINARY_BODY_X16)
+DEF_VX_BINARY_CASE_1_WRAP(T, |, or, VX_BINARY_BODY_X16)
 
 /* { dg-final { scan-assembler {vadd.vx} } } */
 /* { dg-final { scan-assembler {vsub.vx} } } */
 /* { dg-final { scan-assembler {vrsub.vx} } } */
 /* { dg-final { scan-assembler {vand.vx} } } */
+/* { dg-final { scan-assembler {vor.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c
index d047458b81d..275a11e9158 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c
@@ -9,8 +9,10 @@ DEF_VX_BINARY_CASE_1_WRAP(T, +, add, VX_BINARY_BODY_X4)
 DEF_VX_BINARY_CASE_1_WRAP(T, -, sub, VX_BINARY_BODY_X4)
 DEF_VX_BINARY_REVERSE_CASE_1_WRAP(T, -, rsub, VX_BINARY_REVERSE_BODY_X4)
 DEF_VX_BINARY_CASE_1_WRAP(T, &, and, VX_BINARY_BODY_X4)
+DEF_VX_BINARY_CASE_1_WRAP(T, |, or, VX_BINARY_BODY_X4)
 
 /*

Re: [PATCH] x86: Add preserve_none and update no_caller_saved_registers attributes

2025-05-22 Thread H.J. Lu
On Wed, May 14, 2025 at 2:12 PM Hongtao Liu  wrote:
>
> On Fri, Apr 18, 2025 at 7:10 PM H.J. Lu  wrote:
> >
> > Add preserve_none attribute which is similar to no_callee_saved_registers
> > attribute, except on x86-64, r12, r13, r14, r15, rdi and rsi registers are
> Could you split preserve_none into a separate patch?
> It looks like it's different from clang's preserve_none [1]; can we
> make them align?
> [1] https://clang.llvm.org/docs/AttributeReference.html
>
> > used for integer parameter passing.  This can be used in an interpreter
> > to avoid saving/restoring the registers in functions which processing
> > byte codes.  It improved the pystones benchmark by 6-7%:
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119628#c15
> >
> > Remove -mgeneral-regs-only restriction on no_caller_saved_registers
> > attribute.  Only SSE is allowed since SSE XMM register load preserves
> > the upper bits in YMM/ZMM register while YMM register load zeros the
> > upper 256 bits of ZMM register, and preserving 32 ZMM registers can
> > be quite expensive.
> >
> > gcc/
> >
> > PR target/119628
> > * config/i386/i386-expand.cc (ix86_expand_call): Call
> > ix86_type_preserve_none_attribute_p instead of looking up
> > no_callee_saved_registers attribute.
> > * config/i386/i386-options.cc (ix86_set_func_type): Call
> > ix86_type_preserve_none_attribute_p instead of looking up
> > no_callee_saved_registers attribute.  Check preserve_none
> > attribute for interrupt attribute.  Don't check
> > no_caller_saved_registers and no_callee_saved_registers conflicts
> > here.
> > (ix86_set_current_function): Allow SSE with
> > no_caller_saved_registers attribute.
> > (ix86_handle_call_saved_registers_attribute): Check preserve_none,
> > no_callee_saved_registers and no_caller_saved_registers conflicts.
> > (ix86_gnu_attributes): Add preserve_none attribute.
> > * config/i386/i386-protos.h (ix86_type_preserve_none_attribute_p):
> > New.
> > * config/i386/i386.cc
> > (x86_64_preserve_none_int_parameter_registers): New.
> > (ix86_using_red_zone): Don't use red-zone when there are no
> > caller-saved registers with SSE.
> > (ix86_type_preserve_none_attribute_p): New.
> > (ix86_function_ok_for_sibcall): Call
> > ix86_type_preserve_none_attribute_p instead of looking up
> > no_callee_saved_registers attribute.
> > (ix86_comp_type_attributes): Call
> > ix86_type_preserve_none_attribute_p instead of looking up
> > no_callee_saved_registers attribute.  Return 0 if preserve_none
> > attribute doesn't match in 64-bit mode.
> > (ix86_function_arg_regno_p): If preserve_none calling convention
> > is used, use x86_64_preserve_none_int_parameter_registers.
> > (ix86_call_abi_override): Also set the preserve_none_abi field.
> > (init_cumulative_args): Likewise.
> > (function_arg_64): Use x86_64_preserve_none_int_parameter_registers
> > with preserve_none attribute.
> > (setup_incoming_varargs_64): Use
> > x86_64_preserve_none_int_parameter_registers with preserve_none
> > attribute.
> > (ix86_nsaved_sseregs): Allow saving XMM registers for
> > no_caller_saved_registers attribute.
> > (ix86_compute_frame_layout): Likewise.
> > (x86_this_parameter): Use
> > x86_64_preserve_none_int_parameter_registers with preserve_none
> > attribute.
> > * config/i386/i386.h (ix86_args): Add preserve_none_abi.
> > (call_saved_registers_type): Update comments for
> > TYPE_NO_CALLEE_SAVED_REGISTERS.
> > (machine_function): Add preserve_none_abi.
> > * doc/extend.texi: Add preserve_none attribute.  Update
> > no_caller_saved_registers attribute to remove -mgeneral-regs-only
> > restriction.
> >

...

> > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> > index 8507243d726..3b5c9520ddf 100644
> > --- a/gcc/config/i386/i386.h
> > +++ b/gcc/config/i386/i386.h
> > @@ -1682,6 +1682,8 @@ typedef struct ix86_args {
> >int stdarg;   /* Set to 1 if function is stdarg.  */
> >enum calling_abi call_abi;   /* Set to SYSV_ABI for sysv abi. Otherwise
> >MS_ABI for ms abi.  */
> > +  bool preserve_none_abi;  /* Set to true if the preserve_none ABI is
> > +  used.  */
> >tree decl;   /* Callee decl.  */
> >  } CUMULATIVE_ARGS;
> >
> > @@ -2782,7 +2784,7 @@ enum call_saved_registers_type
> >   or "no_caller_saved_registers" attribute.  */
> >TYPE_NO_CALLER_SAVED_REGISTERS,
> >/* The current function is a function specified with the
> > - "no_callee_saved_registers" attribute.  */
> > + "no_callee_saved_registers"/"preserve_none" attribu

Re: [PATCH 3/6] RISC-V: frm/mode-switch: remove dubious frm edge insertion before call_insn

2025-05-22 Thread Jeff Law

On 5/22/25 6:12 AM, Robin Dapp wrote:
AFAICT the main difference to standard mode switching is that we 
(ab)use it to set the rounding mode to the value it had initially, 
either at function entry or after a call.  That's different to 
regular mode switching which assumes "static" rounding modes for 
different instructions.


Standard could e.g. be:
- insn1 demands frm1
- call1 demands frm4
- call2 demands frm5

Whereas we have:
- insn1 demands frm1
- call1 demands "frm at the start of the function"
- call2 demands "frm after call1 that could have called fesetround"
Weird, call2 can demand the frm as it existed after call1?!?  I'm 
going to try not to cry and return to my bubble :-)


Maybe demand was not really accurate.  It's rather that we want call2 to
operate with the global rounding mode that call1 might have changed.
So the demand is not a specific rounding mode but rather the global one.
Right, but I don't think that changes how braindamaged I think this 
aspect of the ABI is :-)  But at least I have a better sense of what 
we're up against here.

Anyway, I think Vineet's patches improve on what we have right now.  I'd 
still like to understand if this "abuse" of mode switching gets globally 
better results than a very simple approach like above.  If the other 
(non mode-switching) LCMs cannot really optimize FRM reads and writes 
the simple approach could indeed be worse.
My worry with the standard LCM bits is they're likely going to punt when 
they see a hard register and I think global optimization of copies may 
not be great.  I know I went through a regression eons ago that I traced 
down to not having an RTL copy propagator which could handle partial 
redundancies well.


jeff


Re: [PATCH 0/3] Redirect to specific target based on TARGET_VERSION_COMPATIBLE

2025-05-22 Thread Yangyu Chen



> On 23 May 2025, at 04:02, Jeff Law  wrote:
> 
> 
> On 5/22/25 9:05 AM, Alfie Richards wrote:
>> Hi Jeff,
>> I sent this patch with my implementation a while ago:
>> https://gcc.gnu.org/pipermail/gcc-patches/2025-April/681043.html
>> There hasn't been any feedback on that patch yet.
>> These patches are still useful and I would like to go ahead with them. I am 
>> in favour of using my implementation as it is a bit stronger, but it also 
>> requires my larger FMV series to be approved first.
> Can you ping your larger FMV series?  I strongly suspect everyone is digging 
> out from everything that queued up while the trunk was in bugfixing stages.
> 
> Yangyu -- what are your thoughts here?  If we went with Alfie's patch, does it 
> solve the problems you're interested in, and what patches of yours would 
> still be relevant if we incorporated Alfie's work?
> 

I agree with Alfie's approach. We are addressing the same issue.
His patch is more structured and includes test cases.

His patch lacks a target hook for RISC-V, while mine does. However,
I think it's OK if we get his patch accepted, and I will write that
for RISC-V.

Thanks,
Yangyu Chen



Re: [PATCH RFC] diagnostics: use -Wformat-diag more consistently

2025-05-22 Thread David Malcolm
On Thu, 2025-05-22 at 17:01 -0400, Jason Merrill wrote:
> Tested x86_64-pc-linux-gnu, any objection?

LGTM

Thanks
Dave

> 
> -- 8< --
> 
> r10-1211 added various -Wformat-diag warnings about quoting in GCC
> diagnostic strings, but didn't change these two quoting warnings to
> use that
> flag as well.
> 
> gcc/c-family/ChangeLog:
> 
>   * c-format.cc (flag_chars_t::validate): Control quoting
> warnings
>   with -Wformat-diag.
> ---
>  gcc/c-family/c-format.cc | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/c-family/c-format.cc b/gcc/c-family/c-format.cc
> index 211d20dd25b..a44249a0222 100644
> --- a/gcc/c-family/c-format.cc
> +++ b/gcc/c-family/c-format.cc
> @@ -2124,7 +2124,7 @@ flag_chars_t::validate (const format_kind_info
> *fki,
>   {
>     format_warning_at_char (format_string_loc,
> format_string_cst,
>     format_chars - orig_format_chars -
> 1,
> -   OPT_Wformat_,
> +   OPT_Wformat_diag,
>     "%s used within a quoted
> sequence",
>     _(s->name));
>   }
> @@ -2137,7 +2137,7 @@ flag_chars_t::validate (const format_kind_info
> *fki,
>  {
>    format_warning_at_char (format_string_loc, format_string_cst,
>     format_chars - orig_format_chars,
> -   OPT_Wformat_,
> +   OPT_Wformat_diag,
>     "%qc conversion used unquoted",
>     format_char);
>  }
> 
> base-commit: f5016d8492e4067faef2f9403370a4b49f7a3898



[PATCH] [MAINTAINERS] Add myself to write after approval and DCO.

2025-05-22 Thread dhruvc
From: Dhruv Chawla 

Committed as 3213828f74f2f27a2dd91792cef27117ba1a522e.

ChangeLog:

* MAINTAINERS: Add myself to write after approval and DCO.
---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 8993d176c22..f40d6350462 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -402,6 +402,7 @@ Stephane Carrez ciceron 

 Gabriel Charette                gchare
 Arnaud Charlet                  charlet
 Chandra Chavva                  -
+Dhruv Chawla                    dhruvc
 Dehao Chen                      dehao
 Fabien Chêne                    fabien
 Bin Cheng                       amker
@@ -932,6 +933,7 @@ information.
 
 
 Soumya AR   
+Dhruv Chawla
 Juergen Christ  
 Giuseppe D'Angelo   
 Robin Dapp  
-- 
2.39.3 (Apple Git-146)



Re: RISC-V TLS Descriptors in GCC

2025-05-22 Thread Xi Ruoyao
On Thu, 2025-05-22 at 23:55 +0800, Dongsheng Song wrote:
> Hi Kito,
> 
> You mentioned that GCC 14 added TLSDESC support for RISC-V and that it
> requires glibc 2.40 [1].

> However, when I looked for relevant information, I found that
> LoongArch and RISC-V both published TLSDESC patches for review
> last year [2], but only the LoongArch patch was merged into glibc 2.40
> [3], and I didn't see the RISC-V patch merged even in the latest glibc
> development branch.
> 
> Is the information I found accurate? What is the current status of
> glibc support for RISC-V TLSDESC?

I don't think it's accurate.  The RISC-V TLSDESC support is just not
merged into Glibc yet.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University