Re: [PATCH 3/7] riscv: Enable overlap-by-pieces via tune param

2022-11-14 Thread Philipp Tomsich
On Mon, 14 Nov 2022 at 03:48, Vineet Gupta  wrote:
>
>
>
> On 11/13/22 15:05, Christoph Muellner wrote:
> >
> > +static bool
> > +riscv_overlap_op_by_pieces (void)
> > +{
> > +  return tune_param->overlap_op_by_pieces;
>
> Does this not need to be gated on unaligned access being enabled as well?

I assume you mean "&& !STRICT_ALIGNMENT"?

Philipp.


Re: [RFC PATCH] ipa-guarded-deref: Add new pass to dereference function pointers

2022-11-14 Thread Christoph Müllner
On Mon, Nov 14, 2022 at 8:31 AM Richard Biener 
wrote:

> On Sun, Nov 13, 2022 at 4:09 PM Christoph Muellner
>  wrote:
> >
> > From: Christoph Müllner 
> >
> > This patch adds a new pass that looks up function pointer assignments,
> > and adds guarded direct calls to the call sites of the function
> > pointers.
> >
> > E.g.: Let's assume an assignment to a function pointer as follows:
> > b->cb = &myfun;
> >   Other parts of the program can use the function pointer as follows:
> > b->cb ();
> >   With this pass the invocation will be transformed to:
> > if (b->cb == myfun)
> >   myfun();
> > else
> >   b->cb ()
> >
> > The impact of the dynamic guard is expected to be less than the speedup
> > gained by enabled optimizations (e.g. inlining or constant propagation).
>
> We have speculative devirtualization doing this very transform; shouldn't
> you improve that instead of inventing another specialized pass?
>

Yes, it can be integrated into ipa-devirt.

The reason we initially decided to move it into its own file was that C++
devirtualization and function-pointer dereferencing/devirtualization will
likely not use the same analysis.  E.g., ODR only applies to C++, and C++
vtables are not directly exposed to the user.  So we figured that different
things should not be merged together, but reuse of common code to avoid
duplication is mandatory.

The patch uses the same API as speculative devirtualization in the
propagation phase (ipa_make_edge_direct_to_target) and does not do anything
in the transformation phase, so there is no duplication of functionality.

I will move the code into ipa-devirt.

Thanks!



>
> Thanks,
> Richard.
>
> > PR ipa/107666
> > gcc/ChangeLog:
> >
> > * Makefile.in: Add new pass.
> > * common.opt: Add flag -fipa-guarded-deref.
> > * lto-section-in.cc: Add new section "ipa_guarded_deref".
> > * lto-streamer.h (enum lto_section_type): Add new section.
> > * passes.def: Add new pass.
> > * timevar.def (TV_IPA_GUARDED_DEREF): Add time var.
> > * tree-pass.h (make_pass_ipa_guarded_deref): New prototype.
> > * ipa-guarded-deref.cc: New file.
> >
> > Signed-off-by: Christoph Müllner 
> > ---
> >  gcc/Makefile.in  |1 +
> >  gcc/common.opt   |4 +
> >  gcc/ipa-guarded-deref.cc | 1115 ++
> >  gcc/lto-section-in.cc|1 +
> >  gcc/lto-streamer.h   |1 +
> >  gcc/passes.def   |1 +
> >  gcc/timevar.def  |1 +
> >  gcc/tree-pass.h  |1 +
> >  8 files changed, 1125 insertions(+)
> >  create mode 100644 gcc/ipa-guarded-deref.cc
> >
> > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > index f672e6ea549..402c4a6ea3f 100644
> > --- a/gcc/Makefile.in
> > +++ b/gcc/Makefile.in
> > @@ -1462,6 +1462,7 @@ OBJS = \
> > ipa-sra.o \
> > ipa-devirt.o \
> > ipa-fnsummary.o \
> > +   ipa-guarded-deref.o \
> > ipa-polymorphic-call.o \
> > ipa-split.o \
> > ipa-inline.o \
> > diff --git a/gcc/common.opt b/gcc/common.opt
> > index bce3e514f65..8344940ae5b 100644
> > --- a/gcc/common.opt
> > +++ b/gcc/common.opt
> > @@ -1933,6 +1933,10 @@ fipa-bit-cp
> >  Common Var(flag_ipa_bit_cp) Optimization
> >  Perform interprocedural bitwise constant propagation.
> >
> > +fipa-guarded-deref
> > +Common Var(flag_ipa_guarded_deref) Optimization
> > +Perform guarded function pointer dereferencing.
> > +
> >  fipa-modref
> >  Common Var(flag_ipa_modref) Optimization
> >  Perform interprocedural modref analysis.
> > diff --git a/gcc/ipa-guarded-deref.cc b/gcc/ipa-guarded-deref.cc
> > new file mode 100644
> > index 000..198fb9b33ad
> > --- /dev/null
> > +++ b/gcc/ipa-guarded-deref.cc
> > @@ -0,0 +1,1115 @@
> > +/* IPA pass to transform indirect calls to guarded direct calls.
> > +   Copyright (C) 2022 Free Software Foundation, Inc.
> > +   Contributed by Christoph Muellner (Vrull GmbH)
> > +   Based on work by Erick Ochoa (Vrull GmbH)
> > +
> > +This file is part of GCC.
> > +
> > +GCC is free software; you can redistribute it and/or modify it under
> > +the terms of the GNU General Public License as published by the Free
> > +Software Foundation; either version 3, or (at your option) any later
> > +version.
> > +
> > +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> > +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> > +for more details.
> > +
> > +You should have received a copy of the GNU General Public License
> > +along with GCC; see the file COPYING3.  If not see
> > +.  */
> > +
> > +/* Indirect calls are used to separate callees from their call sites.
> > +   This helps to implement proper abstraction layers, but prevents
> > +   optimizations like constant-propagation or function specialization

Re: [PATCH] doc: Ada: include Indices and Tables in manuals

2022-11-14 Thread Martin Liška
On 11/14/22 08:32, Arnaud Charlet wrote:
>> Sorry for the breakage. However, I contacted you (and your colleague) 
>> and haven't received
>> any feedback for a couple of weeks.
>
> Right, although I did give you feedback that what you sent wasn't in a
> suitable form for review wrt Ada.

 Sure, but sending a patch set to gcc-patches wouldn't have worked either, 
 we've got quite a strict
 email size limit.
> 
> Note that the Ada part should have been quite limited in size given that the
> doc was already in .rst format, which is why I was expecting a smaller patch
> to review on the Ada side.

Yes, I basically just shuffled the folders of the Ada files; the .rst files
themselves were mostly untouched.  Plus, there was an ambition to have a
baseconf.py which would set up common settings for all Sphinx manuals.

> 
 Anyway, hope the AdaCore build would be fixable with a reasonable amount 
 of effort?
>>>
>>> Unclear yet. We'll probably need to change and possibly partially revert the
>>> Ada changes, we'll see.
>>
>> Hello.
>>
>> Note the Sphinx changes will be reverted today:
>> https://gcc.gnu.org/pipermail/gcc/2022-November/239983.html
>>
>> Sorry for your extra work.
> 
> Understood, thanks for your efforts and sorry you had to revert.

Thank you.

> 
> Clearly a change which requires a bleeding edge version of sphinx cannot be
> pushed at this stage, that's premature.

Yes, depending on a bleeding-edge version was one of the problems.

Cheers,
Martin

> 
> Cheers,
> 
> Arno



Re: [PATCH] libatomic: Handle AVX+CX16 AMD like Intel for 16b atomics [PR104688]

2022-11-14 Thread Xi Ruoyao via Gcc-patches
On Mon, 2022-11-14 at 08:55 +0100, Uros Bizjak via Gcc-patches wrote:
> On Mon, Nov 14, 2022 at 8:48 AM Jakub Jelinek 
> wrote:
> > 
> > Hi!
> > 
> > Working virtually out of Baker Island.
> > 
> > We got a response from AMD in
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688#c10
> > so the following patch starts treating AMD with AVX and CMPXCHG16B
> > ISAs like Intel by using vmovdqa for atomic load/store in libatomic.
> > 
> > Ok for trunk if it passes bootstrap/regtest?
> > 
> > 2022-11-13  Jakub Jelinek  
> > 
> >     PR target/104688
> >     * config/x86/init.c (__libat_feat1_init): Revert 2022-03-17
> > change
> >     - on x86_64 no longer clear bit_AVX if CPU vendor is not
> > Intel.
> > 
> > --- libatomic/config/x86/init.c.jj  2022-03-17
> > 18:48:56.708723194 +0100
> > +++ libatomic/config/x86/init.c 2022-11-13 18:23:26.315440071 -1200
> > @@ -34,18 +34,6 @@ __libat_feat1_init (void)
> >    unsigned int eax, ebx, ecx, edx;
> >    FEAT1_REGISTER = 0;
> >    __get_cpuid (1, &eax, &ebx, &ecx, &edx);
> > -#ifdef __x86_64__
> > -  if ((FEAT1_REGISTER & (bit_AVX | bit_CMPXCHG16B))
> > -  == (bit_AVX | bit_CMPXCHG16B))
> > -    {
> > -  /* Intel SDM guarantees that 16-byte VMOVDQA on 16-byte
> > aligned address
> > -    is atomic, but so far we don't have this guarantee from
> > AMD.  */
> > -  unsigned int ecx2 = 0;
> > -  __get_cpuid (0, &eax, &ebx, &ecx2, &edx);
> > -  if (ecx2 != signature_INTEL_ecx)
> > -   FEAT1_REGISTER &= ~bit_AVX;
> 
> We still need this, but also bypass it for AMD signature. There are
> other vendors than Intel and AMD.

Mayshao: how about the status of this feature on Zhaoxin product lines?
IIRC they support AVX (but disabled by default in GCC for Lujiazui), but
we don't know if they make the guarantee about atomicity of 16B aligned
access.

> 
> OK with the above addition.
> 
> Thanks,
> Uros.
> 
> > -    }
> > -#endif
> >    /* See the load in load_feat1.  */
> >    __atomic_store_n (&__libat_feat1, FEAT1_REGISTER,
> > __ATOMIC_RELAXED);
> >    return FEAT1_REGISTER;
> > 
> >     Jakub
> > 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH 3/7] riscv: Enable overlap-by-pieces via tune param

2022-11-14 Thread Christoph Müllner
On Mon, Nov 14, 2022 at 8:59 AM Philipp Tomsich 
wrote:

> On Mon, 14 Nov 2022 at 03:48, Vineet Gupta  wrote:
> >
> >
> >
> > On 11/13/22 15:05, Christoph Muellner wrote:
> > >
> > > +static bool
> > > +riscv_overlap_op_by_pieces (void)
> > > +{
> > > +  return tune_param->overlap_op_by_pieces;
> >
> > Does this not need to be gated on unaligned access being enabled as well?
>
> I assume you mean "&& !STRICT_ALIGNMENT"?
>

I think the case where slow_unaligned_access and overlap_op_by_pieces are
both set will not occur (we can defer that discussion until it does).
Gating overlap_op_by_pieces with !TARGET_STRICT_ALIGN is a good idea.
It will be fixed in v2.

Thanks,
Christoph



>
> Philipp.
>


Re: [PATCH] libatomic: Handle AVX+CX16 AMD like Intel for 16b atomics [PR104688]

2022-11-14 Thread Jakub Jelinek via Gcc-patches
On Mon, Nov 14, 2022 at 04:19:48PM +0800, Xi Ruoyao wrote:
> > > --- libatomic/config/x86/init.c.jj  2022-03-17
> > > 18:48:56.708723194 +0100
> > > +++ libatomic/config/x86/init.c 2022-11-13 18:23:26.315440071 -1200
> > > @@ -34,18 +34,6 @@ __libat_feat1_init (void)
> > >    unsigned int eax, ebx, ecx, edx;
> > >    FEAT1_REGISTER = 0;
> > >    __get_cpuid (1, &eax, &ebx, &ecx, &edx);
> > > -#ifdef __x86_64__
> > > -  if ((FEAT1_REGISTER & (bit_AVX | bit_CMPXCHG16B))
> > > -  == (bit_AVX | bit_CMPXCHG16B))
> > > -    {
> > > -  /* Intel SDM guarantees that 16-byte VMOVDQA on 16-byte
> > > aligned address
> > > -    is atomic, but so far we don't have this guarantee from
> > > AMD.  */
> > > -  unsigned int ecx2 = 0;
> > > -  __get_cpuid (0, &eax, &ebx, &ecx2, &edx);
> > > -  if (ecx2 != signature_INTEL_ecx)
> > > -   FEAT1_REGISTER &= ~bit_AVX;
> > 
> > We still need this, but also bypass it for AMD signature. There are
> > other vendors than Intel and AMD.
> 
> Mayshao: how about the status of this feature on Zhaoxin product lines?
> IIRC they support AVX (but disabled by default in GCC for Lujiazui), but
> we don't know if they make the guarantee about atomicity of 16B aligned
> access.

I did the change on the assumption that only Intel and AMD implement AVX.
Looking around, I'm afraid Zhaoxin Zhangjiang/Wudaokou/Lujiazui
and VIA Eden C and VIA Nano C CPUs do support AVX too, the question is
if they implement CMPXCHG16B too.
From what is in i386-common.cc, none of non-Intel CPUs in there have
PTA_AVX and only Lujiazui has CX16.  But that doesn't need to match what
the HW actually does and one can just compile with -mcx16 -mavx -m64
rather than using some -march=whatever.

Sure, I can change the check so that it checks for AMD too for now and
therefore discard the sync.md patch; the question is whom we should talk to
at Zhaoxin and VIA, and whether there are any other CX16+AVX CPUs.

Jakub



Re: Revert Sphinx documentation [Was: Issues with Sphinx]

2022-11-14 Thread Martin Liška
On 11/14/22 03:49, Martin Liška wrote:
> I'm going to revert the patchset during today (Monday) and I'll send a patch 
> with a couple
> of new changes that landed in the period of time we used Sphinx.

The revert is done and I included ce51e8439a491910348a1c5aea43b55f000ba8ac 
commit
that ports all the new documentation bits to Texinfo.

Web pages content will be updated with Jakub in the afternoon.

Martin


Re: [PATCH] libatomic: Handle AVX+CX16 AMD like Intel for 16b atomics [PR104688]

2022-11-14 Thread Xi Ruoyao via Gcc-patches
On Mon, 2022-11-14 at 09:34 +0100, Jakub Jelinek wrote:

> > Mayshao: how about the status of this feature on Zhaoxin product lines?
> > IIRC they support AVX (but disabled by default in GCC for Lujiazui), but
> > we don't know if they make the guarantee about atomicity of 16B aligned
> > access.
> 
> I did the change on the assumption that only Intel and AMD implement AVX.
> Looking around, I'm afraid Zhaoxin Zhangjiang/Wudaokou/Lujiazui
> and VIA Eden C and VIA Nano C CPUs do support AVX too, the question is
> if they implement CMPXCHG16B too.

According to r13-713, at least Lujiazui has CX16.

> From what is in i386-common.cc, none of non-Intel CPUs in there have
> PTA_AVX and only Lujiazui has CX16.  But that doesn't need to match what
> the HW actually does and one can just compile with -mcx16 -mavx -m64
> rather than using some -march=whatever.
> 
> Sure, can change the check so that it checks for AMD too for now and
> therefore discard the sync.md patch, the question is whom do we talk at
> Zhaoxin and VIA and if there are any further other CX16+AVX CPUs
> 
> Jakub
> 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [RFC PATCH] ipa-guarded-deref: Add new pass to dereference function pointers

2022-11-14 Thread Richard Biener via Gcc-patches
On Mon, Nov 14, 2022 at 9:13 AM Christoph Müllner
 wrote:
>
>
>
> On Mon, Nov 14, 2022 at 8:31 AM Richard Biener  
> wrote:
>>
>> On Sun, Nov 13, 2022 at 4:09 PM Christoph Muellner
>>  wrote:
>> >
>> > From: Christoph Müllner 
>> >
>> > This patch adds a new pass that looks up function pointer assignments,
>> > and adds guarded direct calls to the call sites of the function
>> > pointers.
>> >
>> > E.g.: Let's assume an assignment to a function pointer as follows:
>> > b->cb = &myfun;
>> >   Other parts of the program can use the function pointer as follows:
>> > b->cb ();
>> >   With this pass the invocation will be transformed to:
>> > if (b->cb == myfun)
>> >   myfun();
>> > else
>> >    b->cb ()
>> >
>> > The impact of the dynamic guard is expected to be less than the speedup
>> > gained by enabled optimizations (e.g. inlining or constant propagation).
>>
>> We have speculative devirtualization doing this very transform; shouldn't
>> you improve that instead of inventing another specialized pass?
>
>
> Yes, it can be integrated into ipa-devirt.
>
> The reason we initially decided to move it into its own file was that C++
> devirtualization and function-pointer dereferencing/devirtualization will
> likely not use the same analysis.  E.g., ODR only applies to C++, and C++
> vtables are not directly exposed to the user.  So we figured that different
> things should not be merged together, but reuse of common code to avoid
> duplication is mandatory.

Btw, in another context the idea came up to build candidates based on the
available API/ABI (that can be indirectly called).  That would help, for
example, the get_ref calls in refine_subpel in the x264 benchmark.  Maybe
what you do is actually the very same thing (but looking for explicit
address-taking) - I didn't look into whether you prune the list of
candidates based on API/ABI.

> The patch uses the same API as speculative devirtualization in the
> propagation phase (ipa_make_edge_direct_to_target) and does not do anything
> in the transformation phase, so there is no duplication of functionality.
>
> I will move the code into ipa-devirt.
>
> Thanks!
>
>
>>
>>
>> Thanks,
>> Richard.
>>
>> > PR ipa/107666
>> > gcc/ChangeLog:
>> >
>> > * Makefile.in: Add new pass.
>> > * common.opt: Add flag -fipa-guarded-deref.
>> > * lto-section-in.cc: Add new section "ipa_guarded_deref".
>> > * lto-streamer.h (enum lto_section_type): Add new section.
>> > * passes.def: Add new pass.
>> > * timevar.def (TV_IPA_GUARDED_DEREF): Add time var.
>> > * tree-pass.h (make_pass_ipa_guarded_deref): New prototype.
>> > * ipa-guarded-deref.cc: New file.
>> >
>> > Signed-off-by: Christoph Müllner 
>> > ---
>> >  gcc/Makefile.in  |1 +
>> >  gcc/common.opt   |4 +
>> >  gcc/ipa-guarded-deref.cc | 1115 ++
>> >  gcc/lto-section-in.cc|1 +
>> >  gcc/lto-streamer.h   |1 +
>> >  gcc/passes.def   |1 +
>> >  gcc/timevar.def  |1 +
>> >  gcc/tree-pass.h  |1 +
>> >  8 files changed, 1125 insertions(+)
>> >  create mode 100644 gcc/ipa-guarded-deref.cc
>> >
>> > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
>> > index f672e6ea549..402c4a6ea3f 100644
>> > --- a/gcc/Makefile.in
>> > +++ b/gcc/Makefile.in
>> > @@ -1462,6 +1462,7 @@ OBJS = \
>> > ipa-sra.o \
>> > ipa-devirt.o \
>> > ipa-fnsummary.o \
>> > +   ipa-guarded-deref.o \
>> > ipa-polymorphic-call.o \
>> > ipa-split.o \
>> > ipa-inline.o \
>> > diff --git a/gcc/common.opt b/gcc/common.opt
>> > index bce3e514f65..8344940ae5b 100644
>> > --- a/gcc/common.opt
>> > +++ b/gcc/common.opt
>> > @@ -1933,6 +1933,10 @@ fipa-bit-cp
>> >  Common Var(flag_ipa_bit_cp) Optimization
>> >  Perform interprocedural bitwise constant propagation.
>> >
>> > +fipa-guarded-deref
>> > +Common Var(flag_ipa_guarded_deref) Optimization
>> > +Perform guarded function pointer dereferencing.
>> > +
>> >  fipa-modref
>> >  Common Var(flag_ipa_modref) Optimization
>> >  Perform interprocedural modref analysis.
>> > diff --git a/gcc/ipa-guarded-deref.cc b/gcc/ipa-guarded-deref.cc
>> > new file mode 100644
>> > index 000..198fb9b33ad
>> > --- /dev/null
>> > +++ b/gcc/ipa-guarded-deref.cc
>> > @@ -0,0 +1,1115 @@
>> > +/* IPA pass to transform indirect calls to guarded direct calls.
>> > +   Copyright (C) 2022 Free Software Foundation, Inc.
>> > +   Contributed by Christoph Muellner (Vrull GmbH)
>> > +   Based on work by Erick Ochoa (Vrull GmbH)
>> > +
>> > +This file is part of GCC.
>> > +
>> > +GCC is free software; you can redistribute it and/or modify it under
>> > +the terms of the GNU General Public License as published by the Free
>> > +Software Foundation; either version 3, or (at your option) any later
>> > +version.
>> > +
>> > +GCC is distributed in the hope that it will be useful, bu

Re: [PATCH] i386: Emit 16b atomics inline with -m64 -mcx16 -mavx [PR104688]

2022-11-14 Thread Hongtao Liu via Gcc-patches
On Mon, Nov 14, 2022 at 3:57 PM Uros Bizjak via Gcc-patches
 wrote:
>
> On Mon, Nov 14, 2022 at 8:52 AM Jakub Jelinek  wrote:
> >
> > Hi!
> >
> > Working virtually out of Baker Island.
> >
> > Given
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688#c10
> > the following patch implements atomic load/store (and therefore also
> > enabling compare and exchange) for -m64 -mcx16 -mavx.
> >
> > Ok for trunk if it passes bootstrap/regtest?
>
> We only have a guarantee from Intel and AMD; there can be other vendors.
Can we make it a micro-architecture tuning?
>
> Uros.
>
> >
> > 2022-11-13  Jakub Jelinek  
> >
> > PR target/104688
> > * config/i386/sync.md (atomic_loadti, atomic_storeti): New
> > define_expand patterns.
> > (atomic_loadti_1, atomic_storeti_1): New define_insn patterns.
> >
> > * gcc.target/i386/pr104688-1.c: New test.
> > * gcc.target/i386/pr104688-2.c: New test.
> > * gcc.target/i386/pr104688-3.c: New test.
> >
> > --- gcc/config/i386/sync.md.jj  2022-11-07 20:54:37.259400942 -1200
> > +++ gcc/config/i386/sync.md 2022-11-13 19:27:22.977987355 -1200
> > @@ -225,6 +225,31 @@ (define_insn_and_split "atomic_loaddi_fp
> >DONE;
> >  })
> >
> > +;; Intel SDM guarantees that 16-byte VMOVDQA on 16-byte aligned address
> > +;; is atomic.  AMD will give a similar guarantee.
> > +(define_expand "atomic_loadti"
> > +  [(set (match_operand:TI 0 "register_operand" "=x,Yv")
> > +   (unspec:TI [(match_operand:TI 1 "memory_operand" "m,m")
> > +   (match_operand:SI 2 "const_int_operand")]
> > +  UNSPEC_LDA))]
> > +  "TARGET_64BIT && TARGET_CMPXCHG16B && TARGET_AVX"
> > +{
> > +  emit_insn (gen_atomic_loadti_1 (operands[0], operands[1]));
> > +  DONE;
> > +})
> > +
> > +(define_insn "atomic_loadti_1"
> > +  [(set (match_operand:TI 0 "register_operand" "=x,Yv")
> > +   (unspec:TI [(match_operand:TI 1 "memory_operand" "m,m")]
> > +  UNSPEC_LDA))]
> > +  "TARGET_64BIT && TARGET_CMPXCHG16B && TARGET_AVX"
> > +  "@
> > +   vmovdqa\t{%1, %0|%0, %1}
> > +   vmovdqa64\t{%1, %0|%0, %1}"
> > +  [(set_attr "type" "ssemov")
> > +   (set_attr "prefix" "vex,evex")
> > +   (set_attr "mode" "TI")])
> > +
> >  (define_expand "atomic_store<mode>"
> >    [(set (match_operand:ATOMIC 0 "memory_operand")
> > (unspec:ATOMIC [(match_operand:ATOMIC 1 "nonimmediate_operand")
> > @@ -276,6 +301,36 @@ (define_insn "atomic_store<mode>_1"
> >    ""
> >    "%K2mov{<imodesuffix>}\t{%1, %0|%0, %1}")
> >
> > +(define_expand "atomic_storeti"
> > +  [(set (match_operand:TI 0 "memory_operand")
> > +   (unspec:TI [(match_operand:TI 1 "register_operand")
> > +   (match_operand:SI 2 "const_int_operand")]
> > +  UNSPEC_STA))]
> > +  "TARGET_64BIT && TARGET_CMPXCHG16B && TARGET_AVX"
> > +{
> > +  enum memmodel model = memmodel_from_int (INTVAL (operands[2]));
> > +
> > +  emit_insn (gen_atomic_storeti_1 (operands[0], operands[1], operands[2]));
> > +
> > +  /* ... followed by an MFENCE, if required.  */
> > +  if (is_mm_seq_cst (model))
> > +emit_insn (gen_mem_thread_fence (operands[2]));
> > +  DONE;
> > +})
> > +
> > +(define_insn "atomic_storeti_1"
> > +  [(set (match_operand:TI 0 "memory_operand" "=m,m")
> > +   (unspec:TI [(match_operand:TI 1 "register_operand" "x,Yv")
> > +(match_operand:SI 2 "const_int_operand")]
> > +   UNSPEC_STA))]
> > +  ""
> > +  "@
> > +   %K2vmovdqa\t{%1, %0|%0, %1}
> > +   %K2vmovdqa64\t{%1, %0|%0, %1}"
> > +  [(set_attr "type" "ssemov")
> > +   (set_attr "prefix" "vex,evex")
> > +   (set_attr "mode" "TI")])
> > +
> >  (define_insn_and_split "atomic_storedi_fpu"
> >[(set (match_operand:DI 0 "memory_operand" "=m,m,m")
> > (unspec:DI [(match_operand:DI 1 "nonimmediate_operand" "x,m,?r")]
> > --- gcc/testsuite/gcc.target/i386/pr104688-1.c.jj   2022-11-13 
> > 19:36:43.251332612 -1200
> > +++ gcc/testsuite/gcc.target/i386/pr104688-1.c  2022-11-13 
> > 19:40:22.649334650 -1200
> > @@ -0,0 +1,34 @@
> > +/* PR target/104688 */
> > +/* { dg-do compile { target int128 } } */
> > +/* { dg-options "-O2 -mno-cx16" } */
> > +/* { dg-final { scan-assembler "\t__sync_val_compare_and_swap_16" } } */
> > +/* { dg-final { scan-assembler "\t__atomic_load_16" } } */
> > +/* { dg-final { scan-assembler "\t__atomic_store_16" } } */
> > +/* { dg-final { scan-assembler "\t__atomic_compare_exchange_16" } } */
> > +
> > +__int128 v;
> > +
> > +__int128
> > +f1 (void)
> > +{
> > +  return __sync_val_compare_and_swap (&v, 42, 0);
> > +}
> > +
> > +__int128
> > +f2 (void)
> > +{
> > +  return __atomic_load_n (&v, __ATOMIC_SEQ_CST);
> > +}
> > +
> > +void
> > +f3 (__int128 x)
> > +{
> > +  __atomic_store_n (&v, 42, __ATOMIC_SEQ_CST);
> > +}
> > +
> > +__int128
> > +f4 (void)
> > +{
> > +  __int128 y = 42;
> > +  __atomic_compare_exchange_n (&v, &y, 0, 0, __ATOMIC_SEQ_CST, 
> > __ATOMIC_SEQ_CST);
> > +}
> > --- gcc/testsuite/gcc.target/i386/pr104688-2.

Re: [PATCH 1/3] libstdc++: Implement ranges::contains/contains_subrange from P2302R4

2022-11-14 Thread Jonathan Wakely via Gcc-patches
On Mon, 14 Nov 2022 at 04:51, Patrick Palka via Libstdc++
 wrote:
>
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/ranges_algo.h (__contains_fn, contains): Define.
> (__contains_subrange_fn, contains_subrange): Define.
> * testsuite/25_algorithms/contains/1.cc: New test.
> * testsuite/25_algorithms/contains_subrange/1.cc: New test.
> ---
>  libstdc++-v3/include/bits/ranges_algo.h   | 54 +++
>  .../testsuite/25_algorithms/contains/1.cc | 33 
>  .../25_algorithms/contains_subrange/1.cc  | 35 
>  3 files changed, 122 insertions(+)
>  create mode 100644 libstdc++-v3/testsuite/25_algorithms/contains/1.cc
>  create mode 100644 
> libstdc++-v3/testsuite/25_algorithms/contains_subrange/1.cc
>
> diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
> b/libstdc++-v3/include/bits/ranges_algo.h
> index de71bd07a2f..da0ca981dc3 100644
> --- a/libstdc++-v3/include/bits/ranges_algo.h
> +++ b/libstdc++-v3/include/bits/ranges_algo.h
> @@ -3464,6 +3464,60 @@ namespace ranges
>
>inline constexpr __prev_permutation_fn prev_permutation{};
>
> +#if __cplusplus > 202002L
> +  struct __contains_fn
> +  {
> +    template<input_iterator _Iter, sentinel_for<_Iter> _Sent,
> +             typename _Tp, typename _Proj = identity>
> +      requires indirect_binary_predicate<ranges::equal_to,
> +                                         projected<_Iter, _Proj>, const _Tp*>
> +      constexpr bool
> +      operator()(_Iter __first, _Sent __last, const _Tp& __value, _Proj __proj = {}) const
> +      { return ranges::find(std::move(__first), __last, __value, __proj) != __last; }

Should this use std::move(__proj)?



> +
> +    template<input_range _Range, typename _Tp, typename _Proj = identity>
> +      requires indirect_binary_predicate<ranges::equal_to,
> +                                         projected<iterator_t<_Range>, _Proj>, const _Tp*>
> +      constexpr bool
> +      operator()(_Range&& __r, const _Tp& __value, _Proj __proj = {}) const
> +      { return (*this)(ranges::begin(__r), ranges::end(__r), __value, std::move(__proj)); }
> +  };
> +
> +  inline constexpr __contains_fn contains{};
> +
> +  struct __contains_subrange_fn
> +  {
> +    template<forward_iterator _Iter1, sentinel_for<_Iter1> _Sent1,
> +             forward_iterator _Iter2, sentinel_for<_Iter2> _Sent2,
> +             typename _Pred = ranges::equal_to,
> +             typename Proj1 = identity, typename Proj2 = identity>
> +      requires indirectly_comparable<_Iter1, _Iter2, _Pred, Proj1, Proj2>
> +      constexpr bool
> +      operator()(_Iter1 __first1, _Sent1 __last1, _Iter2 __first2, _Sent2 __last2,
> +                 _Pred __pred = {}, Proj1 __proj1 = {}, Proj2 __proj2 = {}) const
> +      {
> +        return __first2 == __last2
> +          || !ranges::search(__first1, __last1, __first2, __last2,
> +                             std::move(__pred), std::move(__proj1), std::move(__proj2)).empty();
> +      }
> +
> +    template<forward_range _Range1, forward_range _Range2,
> +             typename _Pred = ranges::equal_to,
> +             typename _Proj1 = identity, typename _Proj2 = identity>
> +      requires indirectly_comparable<iterator_t<_Range1>, iterator_t<_Range2>,
> +                                     _Pred, _Proj1, _Proj2>
> +      constexpr bool
> +      operator()(_Range1&& __r1, _Range2&& __r2, _Pred __pred = {},
> +                 _Proj1 __proj1 = {}, _Proj2 __proj2 = {}) const
> +      {
> +        return (*this)(ranges::begin(__r1), ranges::end(__r1),
> +                       ranges::begin(__r2), ranges::end(__r2),
> +                       std::move(__pred), std::move(__proj1), std::move(__proj2));
> +      }
> +  };
> +
> +  inline constexpr __contains_subrange_fn contains_subrange{};
> +#endif // C++23
>  } // namespace ranges
>
>  #define __cpp_lib_shift 201806L
> diff --git a/libstdc++-v3/testsuite/25_algorithms/contains/1.cc 
> b/libstdc++-v3/testsuite/25_algorithms/contains/1.cc
> new file mode 100644
> index 000..146ab593b70
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/25_algorithms/contains/1.cc
> @@ -0,0 +1,33 @@
> +// { dg-options "-std=gnu++23" }
> +// { dg-do run { target c++23 } }
> +
> +#include 
> +#include 
> +#include 
> +
> +namespace ranges = std::ranges;
> +
> +void
> +test01()
> +{
> +  int x[] = {1,2,3};
> +  using to_input = __gnu_test::test_input_range<int>;
> +  VERIFY( ranges::contains(to_input(x), 1) );
> +  VERIFY( ranges::contains(to_input(x), 2) );
> +  VERIFY( ranges::contains(to_input(x), 3) );
> +  VERIFY( !ranges::contains(to_input(x), 4) );
> +  VERIFY( !ranges::contains(x, x+2, 3) );
> +  auto neg = [](int n) { return -n; };
> +  VERIFY( ranges::contains(to_input(x), -1, neg) );
> +  VERIFY( ranges::contains(to_input(x), -2, neg) );
> +  VERIFY( ranges::contains(to_input(x), -3, neg) );
> +  VERIFY( !ranges::contains(to_input(x), -4, neg) );
> +
> +  VERIFY( !ranges::contains(x, x+2, -3, neg) );
> +}
> +
> +int
> +main()
> +{
> +  test01();
> +}
> diff --git a/libstdc++-v3/testsuite/25_algorithms/contains_subrange/1.cc 
> b/libstdc++-v3/testsuite/25_algorithms/contains_subrange/1.cc
> new file mode 100644
>

Re: [PATCH] i386: Emit 16b atomics inline with -m64 -mcx16 -mavx [PR104688]

2022-11-14 Thread Hongtao Liu via Gcc-patches
On Mon, Nov 14, 2022 at 5:04 PM Hongtao Liu  wrote:
>
> On Mon, Nov 14, 2022 at 3:57 PM Uros Bizjak via Gcc-patches
>  wrote:
> >
> > On Mon, Nov 14, 2022 at 8:52 AM Jakub Jelinek  wrote:
> > >
> > > Hi!
> > >
> > > Working virtually out of Baker Island.
> > >
> > > Given
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688#c10
> > > the following patch implements atomic load/store (and therefore also
> > > enabling compare and exchange) for -m64 -mcx16 -mavx.
> > >
> > > Ok for trunk if it passes bootstrap/regtest?
> >
> > We only have a guarantee from Intel and AMD; there can be other vendors.
> Can we make it a micro-architecture tuning?
Or is this the kind of thing that might cause correctness problems (for
example, when other vendors' processors run code built with
-mtune=intel/amd) and is therefore not suitable as a micro-architecture
tuning?
> >
> > Uros.
> >
> > >
> > > 2022-11-13  Jakub Jelinek  
> > >
> > > PR target/104688
> > > * config/i386/sync.md (atomic_loadti, atomic_storeti): New
> > > define_expand patterns.
> > > (atomic_loadti_1, atomic_storeti_1): New define_insn patterns.
> > >
> > > * gcc.target/i386/pr104688-1.c: New test.
> > > * gcc.target/i386/pr104688-2.c: New test.
> > > * gcc.target/i386/pr104688-3.c: New test.
> > >
> > > --- gcc/config/i386/sync.md.jj  2022-11-07 20:54:37.259400942 -1200
> > > +++ gcc/config/i386/sync.md 2022-11-13 19:27:22.977987355 -1200
> > > @@ -225,6 +225,31 @@ (define_insn_and_split "atomic_loaddi_fp
> > >DONE;
> > >  })
> > >
> > > +;; Intel SDM guarantees that 16-byte VMOVDQA on 16-byte aligned address
> > > +;; is atomic.  AMD will give a similar guarantee.
> > > +(define_expand "atomic_loadti"
> > > +  [(set (match_operand:TI 0 "register_operand" "=x,Yv")
> > > +   (unspec:TI [(match_operand:TI 1 "memory_operand" "m,m")
> > > +   (match_operand:SI 2 "const_int_operand")]
> > > +  UNSPEC_LDA))]
> > > +  "TARGET_64BIT && TARGET_CMPXCHG16B && TARGET_AVX"
> > > +{
> > > +  emit_insn (gen_atomic_loadti_1 (operands[0], operands[1]));
> > > +  DONE;
> > > +})
> > > +
> > > +(define_insn "atomic_loadti_1"
> > > +  [(set (match_operand:TI 0 "register_operand" "=x,Yv")
> > > +   (unspec:TI [(match_operand:TI 1 "memory_operand" "m,m")]
> > > +  UNSPEC_LDA))]
> > > +  "TARGET_64BIT && TARGET_CMPXCHG16B && TARGET_AVX"
> > > +  "@
> > > +   vmovdqa\t{%1, %0|%0, %1}
> > > +   vmovdqa64\t{%1, %0|%0, %1}"
> > > +  [(set_attr "type" "ssemov")
> > > +   (set_attr "prefix" "vex,evex")
> > > +   (set_attr "mode" "TI")])
> > > +
> > >  (define_expand "atomic_store<mode>"
> > >    [(set (match_operand:ATOMIC 0 "memory_operand")
> > > (unspec:ATOMIC [(match_operand:ATOMIC 1 "nonimmediate_operand")
> > > @@ -276,6 +301,36 @@ (define_insn "atomic_store<mode>_1"
> > >    ""
> > >    "%K2mov{<imodesuffix>}\t{%1, %0|%0, %1}")
> > >
> > > +(define_expand "atomic_storeti"
> > > +  [(set (match_operand:TI 0 "memory_operand")
> > > +   (unspec:TI [(match_operand:TI 1 "register_operand")
> > > +   (match_operand:SI 2 "const_int_operand")]
> > > +  UNSPEC_STA))]
> > > +  "TARGET_64BIT && TARGET_CMPXCHG16B && TARGET_AVX"
> > > +{
> > > +  enum memmodel model = memmodel_from_int (INTVAL (operands[2]));
> > > +
> > > +  emit_insn (gen_atomic_storeti_1 (operands[0], operands[1], 
> > > operands[2]));
> > > +
> > > +  /* ... followed by an MFENCE, if required.  */
> > > +  if (is_mm_seq_cst (model))
> > > +emit_insn (gen_mem_thread_fence (operands[2]));
> > > +  DONE;
> > > +})
> > > +
> > > +(define_insn "atomic_storeti_1"
> > > +  [(set (match_operand:TI 0 "memory_operand" "=m,m")
> > > +   (unspec:TI [(match_operand:TI 1 "register_operand" "x,Yv")
> > > +(match_operand:SI 2 "const_int_operand")]
> > > +   UNSPEC_STA))]
> > > +  ""
> > > +  "@
> > > +   %K2vmovdqa\t{%1, %0|%0, %1}
> > > +   %K2vmovdqa64\t{%1, %0|%0, %1}"
> > > +  [(set_attr "type" "ssemov")
> > > +   (set_attr "prefix" "vex,evex")
> > > +   (set_attr "mode" "TI")])
> > > +
> > >  (define_insn_and_split "atomic_storedi_fpu"
> > >[(set (match_operand:DI 0 "memory_operand" "=m,m,m")
> > > (unspec:DI [(match_operand:DI 1 "nonimmediate_operand" "x,m,?r")]
> > > --- gcc/testsuite/gcc.target/i386/pr104688-1.c.jj   2022-11-13 
> > > 19:36:43.251332612 -1200
> > > +++ gcc/testsuite/gcc.target/i386/pr104688-1.c  2022-11-13 
> > > 19:40:22.649334650 -1200
> > > @@ -0,0 +1,34 @@
> > > +/* PR target/104688 */
> > > +/* { dg-do compile { target int128 } } */
> > > +/* { dg-options "-O2 -mno-cx16" } */
> > > +/* { dg-final { scan-assembler "\t__sync_val_compare_and_swap_16" } } */
> > > +/* { dg-final { scan-assembler "\t__atomic_load_16" } } */
> > > +/* { dg-final { scan-assembler "\t__atomic_store_16" } } */
> > > +/* { dg-final { scan-assembler "\t__atomic_compare_exchange_16" } } */
> > > +
> > > +__int128 v;
> > > +
> > > +__int128
> > > +f1 (void)
> > > +{
> > > +

PING Re: [PATCH] Fortran: Remove double spaces in open() warning [PR99884]

2022-11-14 Thread Bernhard Reutner-Fischer via Gcc-patches
yearly ping. Ok for trunk after re-regtesting?

thanks,

On Sun, 31 Oct 2021 13:57:46 +0100
Bernhard Reutner-Fischer  wrote:

> From: Bernhard Reutner-Fischer 
> 
> gcc/fortran/ChangeLog:
> 
>   PR fortran/99884
>   * io.c (check_open_constraints): Remove double spaces.
> ---
>  gcc/fortran/io.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/fortran/io.c b/gcc/fortran/io.c
> index fc97df79eca..9506f35008e 100644
> --- a/gcc/fortran/io.c
> +++ b/gcc/fortran/io.c
> @@ -2513,7 +2513,7 @@ check_open_constraints (gfc_open *open, locus *where)
> spec = "";
>   }
>  
> -  warn_or_error (G_("%s specifier at %L not allowed in OPEN statement 
> for "
> +  warn_or_error (G_("%sspecifier at %L not allowed in OPEN statement for 
> "
>"unformatted I/O"), spec, loc);
>  }
>  



Re: [PATCH] [PR68097] Try to avoid recursing for floats in tree_*_nonnegative_warnv_p.

2022-11-14 Thread Richard Biener via Gcc-patches
On Sat, Nov 12, 2022 at 7:30 PM Aldy Hernandez  wrote:
>
> It irks me that a PR named "we should track ranges for floating-point"
> hasn't been closed in this release.  This is an attempt to do just
> that.
>
> As mentioned in the PR, even though we track ranges for floats, it has
> been suggested that avoiding recursing through SSA defs in
> gimple_assign_nonnegative_warnv_p is also a goal.  We can do this with
> various ranger components without the need for a heavy handed approach
> (i.e. a full ranger).
>
> I have implemented two versions of known_float_sign_p() that answer
> the question whether we definitely know the sign for an operation or a
> tree expression.
>
> Both versions use get_global_range_query, which is a wrapper to query
> global ranges.  This means, that no caching or propagation is done.
> In the case of an SSA, we just return the global range for it (think
> SSA_NAME_RANGE_INFO).  In the case of a tree code with operands, we
> also use get_global_range_query to resolve the operands, and then call
> into range-ops, which is our lowest level component.  There is no
> ranger or gori involved.  All we're doing is resolving the operation
> with the ranges passed.
>
> This is enough to avoid recursing in the case where we definitely know
> the sign of a range.  Otherwise, we still recurse.
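As an illustration of the shortcut (outside of GCC, with an invented toy interval type standing in for frange and range-ops; the real code returns a bool and sets the sign by reference, and does handle -0.0 and NaN):

```cpp
#include <cassert>

// Toy stand-in for GCC's frange: a closed interval of doubles.
struct interval { double lo, hi; };

// Mirrors the idea of known_float_sign_p: answer only when every value
// in the range has the same sign.  -0.0 and NaN subtleties are
// deliberately ignored in this sketch.
// Returns 0 if every value is non-negative, 1 if every value is
// negative, and -1 if the sign cannot be determined.
int known_sign(const interval &r)
{
  if (r.lo >= 0.0)
    return 0;
  if (r.hi < 0.0)
    return 1;
  return -1;  // range straddles zero: caller must fall back to recursing
}
```

Only the -1 case corresponds to the situation where the existing recursion through SSA defs is still needed.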
>
> Note that instead of get_global_range_query(), we could use
> get_range_query() which uses a ranger (if active in a pass), or
> get_global_range_query if not.  This would allow passes that have an
> active ranger (with enable_ranger) to use a full ranger.  These passes
> are currently, VRP, loop unswitching, DOM, loop versioning, etc.  If
> no ranger is active, get_range_query defaults to global ranges, so
> there's no additional penalty.
>
> Would this be acceptable, at least enough to close (or rename the PR ;-))?

I think the checks would belong to the gimple_stmt_nonnegative_warnv_p function
only (that's the SSA name entry from the fold-const.cc ones)?

I also notice the use of 'bool' for the "sign".  That's not really
descriptive.  We
have SIGNED and UNSIGNED (aka enum signop), not sure if that's the
perfect match vs. NEGATIVE and NONNEGATIVE.  Maybe the functions'
names are just bad and they should be known_float_negative_p?

> PR tree-optimization/68097
>
> gcc/ChangeLog:
>
> * fold-const.cc (known_float_sign_p): New.
> (tree_unary_nonnegative_warnv_p): Call known_float_sign_p.
> (tree_binary_nonnegative_warnv_p): Same.
> (tree_single_nonnegative_warnv_p): Same.
> ---
>  gcc/fold-const.cc | 51 +++
>  1 file changed, 51 insertions(+)
>
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index b89cac91cae..bd74cfca996 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -14577,6 +14577,44 @@ tree_simple_nonnegative_warnv_p (enum tree_code 
> code, tree type)
>return false;
>  }
>
> +/* Return true if T is of type floating point and has a known sign.
> +   If so, set the sign in SIGN.  */
> +
> +static bool
> +known_float_sign_p (bool &sign, tree t)
> +{
> +  if (!frange::supports_p (TREE_TYPE (t)))
> +return false;
> +
> +  frange r;
> +  return (get_global_range_query ()->range_of_expr (r, t)
> + && r.signbit_p (sign));
> +}
> +
> +/* Return true if TYPE is a floating-point type and (CODE OP0 OP1) has
> +   a known sign.  If so, set the sign in SIGN.  */
> +
> +static bool
> +known_float_sign_p (bool &sign, enum tree_code code, tree type, tree op0,
> +   tree op1 = NULL_TREE)
> +{
> +  if (!frange::supports_p (type))
> +return false;
> +
> +  range_op_handler handler (code, type);
> +  if (handler)
> +{
> +  frange res, r0, r1;
> +  get_global_range_query ()->range_of_expr (r0, op0);
> +  if (op1)
> +   get_global_range_query ()->range_of_expr (r1, op1);
> +  else
> +   r1.set_varying (type);
> +  return handler.fold_range (res, type, r0, r1) && res.signbit_p (sign);
> +}
> +  return false;
> +}
> +
>  /* Return true if (CODE OP0) is known to be non-negative.  If the return
> value is based on the assumption that signed overflow is undefined,
> set *STRICT_OVERFLOW_P to true; otherwise, don't change
> @@ -14589,6 +14627,10 @@ tree_unary_nonnegative_warnv_p (enum tree_code code, 
> tree type, tree op0,
>if (TYPE_UNSIGNED (type))
>  return true;
>
> +  bool sign;
> +  if (known_float_sign_p (sign, code, type, op0))
> +return !sign;
> +
>switch (code)
>  {
>  case ABS_EXPR:
> @@ -14656,6 +14698,10 @@ tree_binary_nonnegative_warnv_p (enum tree_code 
> code, tree type, tree op0,
>if (TYPE_UNSIGNED (type))
>  return true;
>
> +  bool sign;
> +  if (known_float_sign_p (sign, code, type, op0, op1))
> +return !sign;
> +
>switch (code)
>  {
>  case POINTER_PLUS_EXPR:
> @@ -14778,6 +14824,8 @@ tree_binary_nonnegative_warnv_p (enum tree_code code, 
> tree type, tree op0,
>  bool
>  tree_s

Re: [PATCH] i386: Emit 16b atomics inline with -m64 -mcx16 -mavx [PR104688]

2022-11-14 Thread Jakub Jelinek via Gcc-patches
On Mon, Nov 14, 2022 at 05:04:24PM +0800, Hongtao Liu wrote:
> On Mon, Nov 14, 2022 at 3:57 PM Uros Bizjak via Gcc-patches wrote:
> >
> > On Mon, Nov 14, 2022 at 8:52 AM Jakub Jelinek  wrote:
> > >
> > > Hi!
> > >
> > > Working virtually out of Baker Island.
> > >
> > > Given
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688#c10
> > > the following patch implements atomic load/store (and therefore also
> > > enabling compare and exchange) for -m64 -mcx16 -mavx.
> > >
> > > Ok for trunk if it passes bootstrap/regtest?
> >
> > We only have guarantee from Intel and AMD, there can be other vendors.
> Can we make it a micro-architecture tuning?

No, -mtune= isn't a guarantee the code will be executed only on certain CPUs
(-march= is), -mtune= is only about optimizing code for certain CPU.
If we don't get a guarantee from the remaining makers of CPUs with AVX +
CX16 ISAs, another option would be to add a new -mvmovdqa-atomic option
and set it on for -march= of Intel and AMD CPUs with AVX + CX16
and use
TARGET_64BIT && TARGET_CMPXCHG16B && TARGET_AVX && TARGET_VMOVDQA_ATOMIC
as the conditions in the patch.
But that would use it only for -march=native or when people -march=
a particular Intel or AMD CPU, while if we get a guarantee from all AVX+CX16
CPU makers, then it can be on by default with just -mcx16 -mavx.
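As a sketch of the proposed gating, with the conditions written as plain parameters (TARGET_VMOVDQA_ATOMIC and its -mvmovdqa-atomic option are hypothetical at this point; only the first three exist today, corresponding to -m64, -mcx16 and -mavx):

```cpp
#include <cassert>

// Sketch of the condition under which 16-byte atomics would be emitted
// inline as vmovdqa loads/stores rather than libatomic calls.
bool inline_16b_atomics(bool target_64bit, bool target_cmpxchg16b,
                        bool target_avx, bool target_vmovdqa_atomic)
{
  return target_64bit && target_cmpxchg16b && target_avx
         && target_vmovdqa_atomic;
}
```

If all AVX+CX16 CPU makers give the guarantee, the last operand simply becomes true by default.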

Jakub



RE: [PATCH][i386]: Update ix86_can_change_mode_class target hook to accept QImode conversions

2022-11-14 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Hongtao Liu 
> Sent: Monday, November 14, 2022 2:14 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; hubi...@ucw.cz;
> ubiz...@gmail.com; kirill.yuk...@gmail.com; hongtao@intel.com
> Subject: Re: [PATCH][i386]: Update ix86_can_change_mode_class target
> hook to accept QImode conversions
> 
> On Fri, Nov 11, 2022 at 10:47 PM Tamar Christina via Gcc-patches <gcc-patc...@gcc.gnu.org> wrote:
> >
> > Hi All,
> >
> > The current i386 implementation of TARGET_CAN_CHANGE_MODE_CLASS is
> > currently not useful before regalloc.
> >
> > In particular before regalloc optimization passes query the hook using
> > ALL_REGS, but because of the
> >
> >   if (MAYBE_FLOAT_CLASS_P (regclass))
> >   return false;
> >
> > The hook returns false for all modes, even integer ones because
> > ALL_REGS overlaps with floating point regs.
> >
> > The vector permute fallback cases used to unconditionally convert
> > vector integer permutes to vector QImode ones as a fallback plan.
> > This is incorrect and can result in incorrect code if the target doesn't
> support this conversion.
> >
> > To fix this some more checks were added, however that ended up
> > introducing ICEs in the i386 backend because e.g. the hook would
> > reject conversions between modes like V2TImode and V32QImode.
> >
> > My understanding is that for x87 we don't want to allow floating point
> > conversions, but integers are fine.  So I have modified the check such
> > that it also checks the modes, not just the register class groups.
> >
> > The second part of the code is needed because now that integer modes
> > aren't uniformly rejected the i386 backend trigger further
> > optimizations.  However the backend lacks instructions to deal with
> > canonical RTL representations of certain instructions.  for instance
> > the back-end seems to prefer vec_select 0 instead of subregs.
> >
> > So to prevent the canonicalization I reject integer modes when the
> > sizes of to and from don't match and when we would have exited with
> false previously.
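A toy model of the two amended checks, with an invented struct in place of real machine modes (the actual hook inspects machine_mode values and register classes), might look like:

```cpp
#include <cassert>

// Invented stand-in for a machine mode: just an FP flag and a byte size.
struct toy_mode { bool is_fp; unsigned size; };

// Sketch of the amended MAYBE_FLOAT_CLASS_P handling in
// ix86_can_change_mode_class.
bool can_change_mode_class(toy_mode from, toy_mode to, bool maybe_float_class)
{
  // x87 regs reformat values to extended precision, so reject genuine
  // FP-to-FP mode changes...
  if (maybe_float_class && from.is_fp && to.is_fp)
    return false;
  // ...and size-changing cases that previously exited above, so that
  // same-size integer changes (e.g. V2TI <-> V32QI) now get through.
  if (maybe_float_class && from.size != to.size)
    return false;
  return true;
}
```

Before the patch, the first check alone rejected everything once ALL_REGS (which overlaps the FP regs) was queried.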
> >
> > This fixes all the ICEs and codegen regressions, but perhaps an x86
> > maintainer should take a deeper look at this hook implementation.
> >
> > Bootstrapped Regtested on x86_64-pc-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * config/i386/i386.cc (ix86_can_change_mode_class): Update the
> target
> > hook.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index c4d0e36e9c0a2256f5dde1f4dc021c0328aa0cba..477dd007ea80272680751b61e35cc3eec79b66c3 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -19682,7 +19682,15 @@ ix86_can_change_mode_class (machine_mode from, machine_mode to,
> >
> >/* x87 registers can't do subreg at all, as all values are reformatted
> >   to extended precision.  */
> > -  if (MAYBE_FLOAT_CLASS_P (regclass))
> > +  if (MAYBE_FLOAT_CLASS_P (regclass)
> > +  && VALID_FP_MODE_P (from)
> > +  && VALID_FP_MODE_P (to))
> > +return false;
> This change looks reasonable since only VALID_FP_MODE_P modes will be
> allocated to FLOAT_CLASS.
> > +
> > +  /* Reject integer modes if the sizes aren't the same.  It would have
> > + normally exited above.  */
> > +  if (MAYBE_FLOAT_CLASS_P (regclass)
> > +  && GET_MODE_SIZE (from) != GET_MODE_SIZE (to))
> >  return false;
> Do you have a case (or a patch) so I can reproduce the regression
> myself and have a deeper look?

Yes that's this one 
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605756.html

Cheers,
Tamar

> >
> >if (MAYBE_SSE_CLASS_P (regclass) || MAYBE_MMX_CLASS_P (regclass))
> >
> >
> >
> >
> > --
> 
> 
> 
> --
> BR,
> Hongtao


Re: Revert Sphinx documentation [Was: Issues with Sphinx]

2022-11-14 Thread Jonathan Wakely via Gcc-patches
On Mon, 14 Nov 2022 at 08:40, Martin Liška wrote:
>
> On 11/14/22 03:49, Martin Liška wrote:
> > I'm going to revert the patchset during today (Monday) and I'll send a 
> > patch with a couple
> > of new changes that landed in the period of time we used Sphinx.
>
> The revert is done and I included ce51e8439a491910348a1c5aea43b55f000ba8ac 
> commit
> that ports all the new documentation bits to Texinfo.

Sorry it didn't work out, and thanks for reapplying my two doc changes
that landed during the era of the sphinx.

I formatted my new region/endregion pragmas on one line because that
seemed to be how it should be done for reST, e.g. we had:

``#pragma GCC push_options`` ``#pragma GCC pop_options``

But I think the attached patch is more correct for how we document
pragmas in texinfo.

OK for trunk?
commit 3aa461d4ba46449544730a342cb2dcb0ce6851e9
Author: Jonathan Wakely 
Date:   Mon Nov 14 09:19:13 2022

doc: Format region pragmas as separate items

This seems consistent with how other paired pragmas are documented in
texinfo, e.g. push_options and pop_options.

gcc/ChangeLog:

* doc/cpp.texi (Pragmas): Use @item and @itemx for region
pragmas.

diff --git a/gcc/doc/cpp.texi b/gcc/doc/cpp.texi
index 1be29eb605e..5e86a957a88 100644
--- a/gcc/doc/cpp.texi
+++ b/gcc/doc/cpp.texi
@@ -3843,7 +3843,8 @@ file will never be read again, no matter what.  It is a 
less-portable
 alternative to using @samp{#ifndef} to guard the contents of header files
 against multiple inclusions.
 
-@code{#pragma region @{tokens@}...}, @code{#pragma endregion @{tokens@}...}
+@item #pragma region @{tokens@}...
+@itemx #pragma endregion @{tokens@}...
 These pragmas are accepted, but have no effect.
 
 @end ftable


Re: [RFC PATCH] ipa-guarded-deref: Add new pass to dereference function pointers

2022-11-14 Thread Christoph Müllner
On Mon, Nov 14, 2022 at 10:00 AM Richard Biener wrote:

> On Mon, Nov 14, 2022 at 9:13 AM Christoph Müllner wrote:
> >
> >
> >
> >> On Mon, Nov 14, 2022 at 8:31 AM Richard Biener <richard.guent...@gmail.com> wrote:
> >>
> >> On Sun, Nov 13, 2022 at 4:09 PM Christoph Muellner wrote:
> >> >
> >> > From: Christoph Müllner 
> >> >
> >> > This patch adds a new pass that looks up function pointer assignments,
> >> > and adds guarded direct calls to the call sites of the function
> >> > pointers.
> >> >
> >> > E.g.: Lets assume an assignment to a function pointer as follows:
> >> > b->cb = &myfun;
> >> >   Other part of the program can use the function pointer as
> follows:
> >> > b->cb ();
> >> >   With this pass the invocation will be transformed to:
> >> > if (b->cb == myfun)
> >> >   myfun();
> >> > else
> >> >b->cb ()
> >> >
> >> > The impact of the dynamic guard is expected to be less than the
> speedup
> >> > gained by enabled optimizations (e.g. inlining or constant
> propagation).
> >>
> >> We have speculative devirtualization doing this very transform,
> shouldn't you
> >> instead improve that instead of inventing another specialized pass?
> >
> >
> > Yes, it can be integrated into ipa-devirt.
> >
> > The reason we initially decided to move it into its own file was that
> C++ devirtualization
> > and function pointer dereferencing/devirtualization will likely not use
> the same analysis.
> > E.g. ODR only applies to C++, C++ tables are not directly exposed to the
> user.
> > So we figured that different things should not be merged together, but a
> reuse
> > of common code to avoid duplication is mandatory.
>
> Btw, in another context the idea came up to build candidates based on
> available
> API/ABI (that can be indirectly called).  That would help for example the
> get_ref calls in refine_subpel in the x264 benchmark.  Maybe what you
> do is actually
> the very same thing (but look for explicit address-taking) - I didn't
> look into whether
> you prune the list of candidates based on API/ABI.
>

No, I don't consider API/ABI at all (do you have a pointer so I can get a
better understanding of that idea?).
Adding guards for all possible functions with the same API/ABI seems
expensive (I might misunderstand the idea).
My patch adds a maximum of 1 test per call site.

What I do is looking which addresses are assigned to the function pointer.
If there is more than one assigned function, I drop the function pointer
from the list of candidates.

I just checked in the dump file, and the patch also dereferences the
indirect calls to get_ref in refine_subpel.
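For readers following along, a hand-written source-level sketch of what the transform does at a call site (struct and function names invented here; the pass itself works on GIMPLE):

```cpp
#include <cassert>

// A function pointer field, as in the b->cb example above.
struct box { int (*cb)(int); };

int myfun(int x) { return x + 1; }
int otherfun(int x) { return x * 2; }

// Before the pass, in effect:   return b->cb (x);
// After the pass, in effect:
int call_cb(box *b, int x)
{
  if (b->cb == myfun)
    return myfun(x);   // direct call: inlinable, const-propagatable
  else
    return b->cb(x);   // dynamic fallback keeps the program correct
}

// Small helper to exercise the sketch.
int run_with(int (*f)(int), int x)
{
  box b{f};
  return call_cb(&b, x);
}
```

The guard adds one pointer comparison per call site, which is the cost the speedup estimate has to beat.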



>
> > The patch uses the same API like speculative devirtualization in the
> propagation
> > phase (ipa_make_edge_direct_to_target) and does not do anything in the
> > transformation phase. So there is no duplication of functionality.
> >
> > I will move the code into ipa-devirt.
> >
> > Thanks!
> >
> >
> >>
> >>
> >> Thanks,
> >> Richard.
> >>
> >> > PR ipa/107666
> >> > gcc/ChangeLog:
> >> >
> >> > * Makefile.in: Add new pass.
> >> > * common.opt: Add flag -fipa-guarded-deref.
> >> > * lto-section-in.cc: Add new section "ipa_guarded_deref".
> >> > * lto-streamer.h (enum lto_section_type): Add new section.
> >> > * passes.def: Add new pass.
> >> > * timevar.def (TV_IPA_GUARDED_DEREF): Add time var.
> >> > * tree-pass.h (make_pass_ipa_guarded_deref): New prototype.
> >> > * ipa-guarded-deref.cc: New file.
> >> >
> >> > Signed-off-by: Christoph Müllner 
> >> > ---
> >> >  gcc/Makefile.in  |1 +
> >> >  gcc/common.opt   |4 +
> >> >  gcc/ipa-guarded-deref.cc | 1115
> ++
> >> >  gcc/lto-section-in.cc|1 +
> >> >  gcc/lto-streamer.h   |1 +
> >> >  gcc/passes.def   |1 +
> >> >  gcc/timevar.def  |1 +
> >> >  gcc/tree-pass.h  |1 +
> >> >  8 files changed, 1125 insertions(+)
> >> >  create mode 100644 gcc/ipa-guarded-deref.cc
> >> >
> >> > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> >> > index f672e6ea549..402c4a6ea3f 100644
> >> > --- a/gcc/Makefile.in
> >> > +++ b/gcc/Makefile.in
> >> > @@ -1462,6 +1462,7 @@ OBJS = \
> >> > ipa-sra.o \
> >> > ipa-devirt.o \
> >> > ipa-fnsummary.o \
> >> > +   ipa-guarded-deref.o \
> >> > ipa-polymorphic-call.o \
> >> > ipa-split.o \
> >> > ipa-inline.o \
> >> > diff --git a/gcc/common.opt b/gcc/common.opt
> >> > index bce3e514f65..8344940ae5b 100644
> >> > --- a/gcc/common.opt
> >> > +++ b/gcc/common.opt
> >> > @@ -1933,6 +1933,10 @@ fipa-bit-cp
> >> >  Common Var(flag_ipa_bit_cp) Optimization
> >> >  Perform interprocedural bitwise constant propagation.
> >> >
> >> > +fipa-guarded-deref
> >> > +Common Var(flag_ipa_guarded_deref) Optimization
> >> > +Perform guarded function pointer dereferencing.
> >> > +
> >> >  fipa-modref
> >> 

Re: [PATCH] builtins: Commonise default handling of nonlocal_goto

2022-11-14 Thread Richard Biener via Gcc-patches
On Sun, Nov 13, 2022 at 10:33 AM Richard Sandiford via Gcc-patches wrote:
>
> expand_builtin_longjmp and expand_builtin_nonlocal_goto both
> emit nonlocal gotos.  They first try to use a target-provided
> pattern and fall back to generic code otherwise.  These pieces
> of generic code are almost identical, and having them inline
> like this makes it difficult to define a nonlocal_goto pattern
> that only wants to add extra steps, not change the default ones.
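For context, these expanders implement nonlocal jumps; the libc analogue (standard setjmp/longjmp rather than GCC's five-word __builtin variant) behaves like this sketch:

```cpp
#include <cassert>
#include <csetjmp>

// libc analogue of what the expanders generate: save the environment,
// then restore stack/frame state and jump back out of a callee.
static std::jmp_buf buf;
static int state;

static void jumper() { std::longjmp(buf, 1); }

int run()
{
  state = 2;
  if (setjmp(buf) != 0)
    return state + 40;   // control resumes here after the longjmp
  jumper();
  return -1;             // not reached
}
```

The generic sequence being factored out is the machinery behind the "control resumes here" step: restoring the stack and frame pointers saved at setjmp time.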
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

OK.

> Richard
>
> gcc/
> * builtins.h (emit_standard_nonlocal_goto): Declare.
> * builtins.cc (emit_standard_nonlocal_goto): New function,
> commonizing code from...
> (expand_builtin_longjmp, expand_builtin_nonlocal_goto): ...here.
> * genemit.cc (main): Emit an include of builtins.h.
> ---
>  gcc/builtins.cc | 103 +---
>  gcc/builtins.h  |   1 +
>  gcc/genemit.cc  |   1 +
>  3 files changed, 47 insertions(+), 58 deletions(-)
>
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index 4dc1ca672b2..2507745c17a 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -998,6 +998,49 @@ expand_builtin_setjmp_receiver (rtx receiver_label)
>emit_insn (gen_blockage ());
>  }
>
> +/* Emit the standard sequence for a nonlocal_goto.  The arguments are
> +   the operands to the .md pattern.  */
> +
> +void
> +emit_standard_nonlocal_goto (rtx value, rtx label, rtx stack, rtx fp)
> +{
> +  emit_clobber (gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (VOIDmode)));
> +  emit_clobber (gen_rtx_MEM (BLKmode, hard_frame_pointer_rtx));
> +
> +  label = copy_to_reg (label);
> +
> +  /* Restore the frame pointer and stack pointer.  We must use a
> + temporary since the setjmp buffer may be a local.  */
> +  fp = copy_to_reg (fp);
> +  emit_stack_restore (SAVE_NONLOCAL, stack);
> +
> +  /* Ensure the frame pointer move is not optimized.  */
> +  emit_insn (gen_blockage ());
> +  emit_clobber (hard_frame_pointer_rtx);
> +  emit_clobber (frame_pointer_rtx);
> +  emit_move_insn (hard_frame_pointer_rtx, fp);
> +
> +  /* USE of hard_frame_pointer_rtx added for consistency;
> + not clear if really needed.  */
> +  emit_use (hard_frame_pointer_rtx);
> +  emit_use (stack_pointer_rtx);
> +
> +  /* If the architecture is using a GP register, we must
> + conservatively assume that the target function makes use of it.
> + The prologue of functions with nonlocal gotos must therefore
> + initialize the GP register to the appropriate value, and we
> + must then make sure that this value is live at the point
> + of the jump.  (Note that this doesn't necessarily apply
> + to targets with a nonlocal_goto pattern; they are free
> + to implement it in their own way.  Note also that this is
> + a no-op if the GP register is a global invariant.)  */
> +  unsigned regnum = PIC_OFFSET_TABLE_REGNUM;
> +  if (value == const0_rtx && regnum != INVALID_REGNUM && fixed_regs[regnum])
> +emit_use (pic_offset_table_rtx);
> +
> +  emit_indirect_jump (label);
> +}
> +
>  /* __builtin_longjmp is passed a pointer to an array of five words (not
> all will be used on all machines).  It operates similarly to the C
> library function of the same name, but is more efficient.  Much of
> @@ -1049,27 +1092,7 @@ expand_builtin_longjmp (rtx buf_addr, rtx value)
>what that value is, because builtin_setjmp does not use it.  */
> emit_insn (targetm.gen_nonlocal_goto (value, lab, stack, fp));
>else
> -   {
> - emit_clobber (gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (VOIDmode)));
> - emit_clobber (gen_rtx_MEM (BLKmode, hard_frame_pointer_rtx));
> -
> - lab = copy_to_reg (lab);
> -
> - /* Restore the frame pointer and stack pointer.  We must use a
> -temporary since the setjmp buffer may be a local.  */
> - fp = copy_to_reg (fp);
> - emit_stack_restore (SAVE_NONLOCAL, stack);
> -
> - /* Ensure the frame pointer move is not optimized.  */
> - emit_insn (gen_blockage ());
> - emit_clobber (hard_frame_pointer_rtx);
> - emit_clobber (frame_pointer_rtx);
> - emit_move_insn (hard_frame_pointer_rtx, fp);
> -
> - emit_use (hard_frame_pointer_rtx);
> - emit_use (stack_pointer_rtx);
> - emit_indirect_jump (lab);
> -   }
> +   emit_standard_nonlocal_goto (value, lab, stack, fp);
>  }
>
>/* Search backwards and mark the jump insn as a non-local goto.
> @@ -1201,43 +1224,7 @@ expand_builtin_nonlocal_goto (tree exp)
>if (targetm.have_nonlocal_goto ())
>  emit_insn (targetm.gen_nonlocal_goto (const0_rtx, r_label, r_sp, r_fp));
>else
> -{
> -  emit_clobber (gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (VOIDmode)));
> -  emit_clobber (gen_rtx_MEM (BLKmode, hard_frame_pointer_rtx));
> -
> -  r_label = copy_to_reg (r_label);
> -
> -  /* Restore the fram

Re: PING Re: [PATCH] Fortran: Remove double spaces in open() warning [PR99884]

2022-11-14 Thread Richard Biener via Gcc-patches
On Mon, Nov 14, 2022 at 10:10 AM Bernhard Reutner-Fischer via
Gcc-patches  wrote:
>
> yearly ping. Ok for trunk after re-regtesting?

OK.

> thanks,
>
> On Sun, 31 Oct 2021 13:57:46 +0100
> Bernhard Reutner-Fischer  wrote:
>
> > From: Bernhard Reutner-Fischer 
> >
> > gcc/fortran/ChangeLog:
> >
> >   PR fortran/99884
> >   * io.c (check_open_constraints): Remove double spaces.
> > ---
> >  gcc/fortran/io.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/gcc/fortran/io.c b/gcc/fortran/io.c
> > index fc97df79eca..9506f35008e 100644
> > --- a/gcc/fortran/io.c
> > +++ b/gcc/fortran/io.c
> > @@ -2513,7 +2513,7 @@ check_open_constraints (gfc_open *open, locus *where)
> > spec = "";
> >   }
> >
> > -  warn_or_error (G_("%s specifier at %L not allowed in OPEN statement 
> > for "
> > +  warn_or_error (G_("%sspecifier at %L not allowed in OPEN statement 
> > for "
> >"unformatted I/O"), spec, loc);
> >  }
> >
>


Re: [PATCH] Using sub-scalars mode to move struct block

2022-11-14 Thread Jiufu Guo via Gcc-patches
Hi!

Thanks for your helpful comments/sugguestions!

Richard Biener  writes:

> On Mon, 14 Nov 2022, Jiufu Guo wrote:
>
>> 
>> Hi!
>> Thanks so much for your review!
>> 
>> Richard Biener  writes:
>> 
>> > On Fri, 11 Nov 2022, Jiufu Guo wrote:
>> >
>> >> Hi,
>> >> 
>> >> When assigning a struct parameter to another variable, or loading a
>> >> memory block into a struct var (especially for a return value),
>> >> a "block move" is currently used when expanding the assignment.  And
>> >> the "block move" may use a type/mode different from the mode which
>> >> is accessing the var, e.g. on ppc64le, V2DI would be used to move
>> >> the block of 16 bytes.
>> >> 
>> >> And then, this "block move" would prevent optimization passes from
>> >> leaping/crossing over the assignment. PR65421 reflects this issue.
>> >> 
>> >> As the example code in PR65421.
>> >> 
>> >> typedef struct { double a[4]; } A;
>> >> A foo (const A *a) { return *a; }
>> >> 
>> >> On ppc64le, the below instructions are used for the "block move":
>> >>   7: r122:V2DI=[r121:DI]
>> >>   8: r124:V2DI=[r121:DI+r123:DI]
>> >>   9: [r112:DI]=r122:V2DI
>> >>   10: [r112:DI+0x10]=r124:V2DI
>> >> 
>> >> For this issue, a few comments/suggestions are mentioned via RFC:
>> >> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604646.html
>> >> I drafted a patch which is updating the behavior of block_move for
>> >> struct type. This patch is simple to work with, a few ideas in the
>> >> comments are not put into this patch. I would submit this
>> >> patch first.
>> >> 
>> >> The idea is trying to use sub-modes(scalar) for the "block move".
>> >> And the sub-modes would align with the access patterns of the
>> >> struct members and usages on parameter/return value.
>> >> The major benefits of this change would be raising more 
>> >> opportunities for other optimization passes (cse/dse/xprop).
>> >> 
>> >> The suitable mode would be target specified and relates to ABI,
>> >> this patch introduces a target hook. And in this patch, the hook
>> >> is implemented on rs6000.
>> >> 
>> >> In this patch, the hook would be just using heuristic modes for all
>> >> struct block moving. And the hook would not check if the "block move"
>> >> is about parameters or return value or other uses.
>> >> 
>> >> For the rs6000 implementation of this hook, it is able to use
>> >> DF/DI/TD/.. modes for the struct block movement. The sub-modes
>> >> would be the same as the mode when the struct type is on parameter or
>> >> return value.
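A source-level sketch of the intended lowering (hand-written; the hook itself operates on RTL, not C):

```cpp
#include <cassert>

typedef struct { double a[4]; } A;

// Today's V2DI block move is, conceptually, two opaque 16-byte copies
// that later passes cannot see through.  The patch aims instead for
// four DFmode-sized member copies, which CSE/DSE can reason about
// individually:
void copy_by_submodes(A *dst, const A *src)
{
  for (int i = 0; i < 4; i++)
    dst->a[i] = src->a[i];
}

// Small helper to exercise the sketch.
double copy_and_read(int i)
{
  A s{{1.0, 2.0, 3.0, 4.0}};
  A d{};
  copy_by_submodes(&d, &s);
  return d.a[i];
}
```

The sub-mode choice (DF here) is what the proposed target hook would supply, matching how the struct is passed or returned.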
>> >> 
>> >> Bootstrapped and regtested on ppc64/ppc64le. 
>> >> Is this ok for trunk?
>> >> 
>> >> 
>> >> BR,
>> >> Jeff(Jiufu)
>> >> 
>> >> 
>> >> gcc/ChangeLog:
>> >> 
>> >>   * config/rs6000/rs6000.cc (TARGET_BLOCK_MOVE_FOR_STRUCT): Define.
>> >>   (submode_for_struct_block_move): New function.  Called from
>> >>   rs600_block_move_for_struct.
>> >>   (rs600_block_move_for_struct): New function.
>> >>   * doc/tm.texi: Regenerate.
>> >>   * doc/tm.texi.in (TARGET_BLOCK_MOVE_FOR_STRUCT): New.
>> >>   * expr.cc (store_expr): Call block_move_for_struct.
>> >>   * target.def (block_move_for_struct): New hook.
>> >>   * targhooks.cc (default_block_move_for_struct): New function.
>> >>   * targhooks.h (default_block_move_for_struct): New Prototype.
>> >> 
>> >> ---
>> >>  gcc/config/rs6000/rs6000.cc | 44 +
>> >>  gcc/doc/tm.texi |  6 +
>> >>  gcc/doc/tm.texi.in  |  2 ++
>> >>  gcc/expr.cc | 14 +---
>> >>  gcc/target.def  | 10 +
>> >>  gcc/targhooks.cc|  7 ++
>> >>  gcc/targhooks.h |  1 +
>> >>  7 files changed, 81 insertions(+), 3 deletions(-)
>> >> 
>> >> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> >> index a85d7630b41..e14cecba0ef 100644
>> >> --- a/gcc/config/rs6000/rs6000.cc
>> >> +++ b/gcc/config/rs6000/rs6000.cc
>> >> @@ -1758,6 +1758,9 @@ static const struct attribute_spec 
>> >> rs6000_attribute_table[] =
>> >>  #undef TARGET_NEED_IPA_FN_TARGET_INFO
>> >>  #define TARGET_NEED_IPA_FN_TARGET_INFO rs6000_need_ipa_fn_target_info
>> >>  
>> >> +#undef TARGET_BLOCK_MOVE_FOR_STRUCT
>> >> +#define TARGET_BLOCK_MOVE_FOR_STRUCT rs600_block_move_for_struct
>> >> +
>> >>  #undef TARGET_UPDATE_IPA_FN_TARGET_INFO
>> >>  #define TARGET_UPDATE_IPA_FN_TARGET_INFO rs6000_update_ipa_fn_target_info
>> >>  
>> >> @@ -23672,6 +23675,47 @@ rs6000_function_value (const_tree valtype,
>> >>return gen_rtx_REG (mode, regno);
>> >>  }
>> >>  
>> >> +/* Subroutine of rs600_block_move_for_struct, to get the internal mode 
>> >> which
>> >> +   would be used to move the struct.  */
>> >> +static machine_mode
>> >> +submode_for_struct_block_move (tree type)
>> >> +{
>> >> +  gcc_assert (TREE_CODE (type) == RECORD_TYPE);
>> >> +
>> >> +  /* The sub mode may not be the field's type of the struct.
>> >> + It would be fine to use the mode as if the type is used as a 
>> >> function
>> >> + parameter or return v

Re: [PATCH 2/3] libstdc++: Implement ranges::iota from P2440R1

2022-11-14 Thread Jonathan Wakely via Gcc-patches
On Mon, 14 Nov 2022 at 04:52, Patrick Palka via Libstdc++ wrote:
>
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/ranges_algo.h (out_value_result): Define.
> (iota_result): Define.
> (__iota_fn, iota): Define.
> * testsuite/25_algorithms/iota/1.cc: New test.
> ---
>  libstdc++-v3/include/bits/ranges_algo.h   | 48 +++
>  .../testsuite/25_algorithms/iota/1.cc | 29 +++
>  2 files changed, 77 insertions(+)
>  create mode 100644 libstdc++-v3/testsuite/25_algorithms/iota/1.cc
>
> diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
> b/libstdc++-v3/include/bits/ranges_algo.h
> index da0ca981dc3..f003117c569 100644
> --- a/libstdc++-v3/include/bits/ranges_algo.h
> +++ b/libstdc++-v3/include/bits/ranges_algo.h
> @@ -3517,6 +3517,54 @@ namespace ranges
>};
>
>inline constexpr __contains_subrange_fn contains_subrange{};
> +
> +  template<typename _Out, typename _Tp>
> +struct out_value_result
> +{
> +  [[no_unique_address]] _Out out;
> +  [[no_unique_address]] _Tp value;
> +
> +  template<typename _Out2, typename _Tp2>
> +   requires convertible_to<const _Out&, _Out2>
> + && convertible_to<const _Tp&, _Tp2>
> +   constexpr
> +   operator out_value_result<_Out2, _Tp2>() const &
> +   { return {out, value}; }
> +
> +  template<typename _Out2, typename _Tp2>
> +   requires convertible_to<_Out, _Out2>
> + && convertible_to<_Tp, _Tp2>
> +   constexpr
> +   operator out_value_result<_Out2, _Tp2>() &&
> +   { return {std::move(out), std::move(value)}; }
> +};
> +
> +  template<typename _Out, typename _Tp>
> +using iota_result = out_value_result<_Out, _Tp>;
> +
> +  struct __iota_fn
> +  {
> +template<input_or_output_iterator _Out, sentinel_for<_Out> _Sent, weakly_incrementable _Tp>
> +  requires indirectly_writable<_Out, const _Tp&>
> +  constexpr iota_result<_Out, _Tp>
> +  operator()(_Out __first, _Sent __last, _Tp __value) const
> +  {
> +   while (__first != __last)
> + {
> +   *__first = static_cast<const _Tp&>(__value);

Is this any different to const_cast<const _Tp&>(__value) ?

We know _Tp must be the same as decay_t<_Tp> and non-void, because
it's passed by value, and therefore I think is_same_v<_Tp, decay_t<_Tp>> is always true, isn't it? We don't need to care
about people saying ranges::iota.operator()(o,s,t), those
people are animals.

We would just change the function parameter to const _Tp which would
mean that *_first = __value; always uses a const lvalue, but maybe
that's a bit too subtle. The cast makes it more explicit what's
happening, especially the const_cast version.


> +   ++__first;
> +   ++__value;
> + }
> +   return {std::move(__first), std::move(__value)};
> +  }
> +
> +    template<weakly_incrementable _Tp, output_range<const _Tp&> _Range>
> +  constexpr iota_result<borrowed_iterator_t<_Range>, _Tp>
> +  operator()(_Range&& __r, _Tp __value) const
> +  { return (*this)(ranges::begin(__r), ranges::end(__r), 
> std::move(__value)); }
> +  };
> +
> +  inline constexpr __iota_fn iota{};
>  #endif // C++23
>  } // namespace ranges
>
> diff --git a/libstdc++-v3/testsuite/25_algorithms/iota/1.cc 
> b/libstdc++-v3/testsuite/25_algorithms/iota/1.cc
> new file mode 100644
> index 000..ad2bf08adf5
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/25_algorithms/iota/1.cc
> @@ -0,0 +1,29 @@
> +// { dg-options "-std=gnu++23" }
> +// { dg-do run { target c++23 } }
> +
> +#include 
> +#include 
> +#include 
> +
> +namespace ranges = std::ranges;
> +
> +void
> +test01()
> +{
> +  int x[3] = {};
> +  __gnu_test::test_output_range<int> rx(x);
> +  auto r0 = ranges::iota(rx, 0);
> +  VERIFY( r0.out.ptr == x+3 );
> +  VERIFY( r0.value == 3 );
> +  VERIFY( ranges::equal(x, (int[]){0,1,2}) );
> +  auto r1 = ranges::iota(x, x+2, 5);
> +  VERIFY( r1.out == x+2 );
> +  VERIFY( r1.value == 7 );
> +  VERIFY( ranges::equal(x, (int[]){5,6,2}) );
> +}
> +
> +int
> +main()
> +{
> +  test01();
> +}
> --
> 2.38.1.420.g319605f8f0
>
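For readers skimming the thread, the quoted __iota_fn loop boils down to the following sketch, here written with plain pointers instead of the constrained iterator/sentinel pair; the cast to a const lvalue mirrors the `static_cast<const _Tp&>` line under discussion (names are illustrative, not the libstdc++ internals):

```cpp
#include <cassert>
#include <utility>

// Minimal sketch of the __iota_fn loop: write the running value through a
// const lvalue so the assignment cannot modify it, then increment both the
// output position and the value.
template<typename Out, typename Tp>
std::pair<Out, Tp> iota_sketch(Out first, Out last, Tp value)
{
  while (first != last)
    {
      *first = static_cast<const Tp&>(value);  // write through a const lvalue
      ++first;
      ++value;
    }
  return {std::move(first), std::move(value)};
}
```

Filling `int x[3]` starting from 0 writes {0, 1, 2} and returns the past-the-end position together with the next value, 3, matching the VERIFY lines in the quoted test.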



Re: [RFC PATCH] ipa-guarded-deref: Add new pass to dereference function pointers

2022-11-14 Thread Richard Biener via Gcc-patches
On Mon, Nov 14, 2022 at 10:32 AM Christoph Müllner
 wrote:
>
>
>
> On Mon, Nov 14, 2022 at 10:00 AM Richard Biener  
> wrote:
>>
>> On Mon, Nov 14, 2022 at 9:13 AM Christoph Müllner
>>  wrote:
>> >
>> >
>> >
>> > On Mon, Nov 14, 2022 at 8:31 AM Richard Biener 
>> >  wrote:
>> >>
>> >> On Sun, Nov 13, 2022 at 4:09 PM Christoph Muellner
>> >>  wrote:
>> >> >
>> >> > From: Christoph Müllner 
>> >> >
>> >> > This patch adds a new pass that looks up function pointer assignments,
>> >> > and adds guarded direct calls to the call sites of the function
>> >> > pointers.
>> >> >
>> >> > E.g.: Lets assume an assignment to a function pointer as follows:
>> >> > b->cb = &myfun;
>> >> >   Other part of the program can use the function pointer as follows:
>> >> > b->cb ();
>> >> >   With this pass the invocation will be transformed to:
>> >> > if (b->cb == myfun)
>> >> >   myfun();
>> >> > else
>> >> >b->cb ()
>> >> >
>> >> > The impact of the dynamic guard is expected to be less than the speedup
>> >> > gained by enabled optimizations (e.g. inlining or constant propagation).
>> >>
>> >> We have speculative devirtualization doing this very transform, shouldn't 
>> >> you
>> >> instead improve that instead of inventing another specialized pass?
>> >
>> >
>> > Yes, it can be integrated into ipa-devirt.
>> >
>> > The reason we initially decided to move it into its own file was that C++ 
>> > devirtualization
>> > and function pointer dereferencing/devirtualization will likely not use 
>> > the same analysis.
>> > E.g. ODR only applies to C++, C++ tables are not directly exposed to the 
>> > user.
>> > So we figured that different things should not be merged together, but a 
>> > reuse
>> > of common code to avoid duplication is mandatory.
>>
>> Btw, in other context the idea came up to build candidates based on available
>> API/ABI (that can be indirectly called).  That would help for example the
>> get_ref calls in refine_subpel in the x264 benchmark.  Maybe what you
>> do is actually
>> the very same thing (but look for explicit address-taking) - I didn't
>> look into whether
>> you prune the list of candidates based on API/ABI.
>
>
> No, I don't consider API/ABI at all (do you have a pointer so I can get a 
> better understanding of that idea?).

No, it was just an idea discussed internally.

> Adding guards for all possible functions with the same API/ABI seems 
> expensive (I might misunderstand the idea).
> My patch adds a maximum of 1 test per call site.
>
> What I do is looking which addresses are assigned to the function pointer.
> If there is more than one assigned function, I drop the function pointer from 
> the list of candidates.

OK.  If the program is type correct that's probably going to work well
enough.  If there is more than
one candidate then you could prune those by simple API checks, like
match up the number of arguments
or void vs. non-void return type.  More advanced pruning might lose
some valid candidates (API vs.
ABI compatibility), but it's only heuristic pruning in any case.

It would probably help depending on what exactly "assigned to the
function pointer" means.  If the
function pointer is not from directly visible static storage then
matching up assignments and uses
is going to be a difficult IPA problem itself.  So our original idea was for

 (*fnptr) (args ...);

look for all possible definitions in the (LTO) unit that match the
call signature and that have their
address taken and that possibly could be pointed to by fnptr and if
that's a single one, speculatively
devirtualize that.
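A source-level sketch of that speculative devirtualization of `(*fnptr) (args...)`; the function and variable names below are made up for illustration:

```cpp
#include <cassert>

// One function whose address is taken and stored in a function pointer; it
// is the single candidate the analysis would find.
static int twice(int x) { return 2 * x; }

using fn_t = int (*)(int);
static fn_t fnptr = &twice;  // the single address-taken candidate

// What the indirect call becomes: a cheap pointer compare plus a direct
// (inlinable, const-propagatable) call on the hot path, with the original
// indirect call kept as the fallback.
static int call_fnptr(int arg)
{
  if (fnptr == &twice)
    return twice(arg);   // direct call: further optimization now possible
  else
    return fnptr(arg);   // unchanged indirect call
}
```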

> I just checked in the dump file, and the patch also dereferences the indirect 
> calls to get_ref in refine_subpel.

IIRC the x264 case has a global variable with all the function
pointers so your implementation
will likely pick up the single assignment to the individual fields.

Richard.

>
>>
>>
>> > The patch uses the same API like speculative devirtualization in the 
>> > propagation
>> > phase (ipa_make_edge_direct_to_target) and does not do anything in the
>> > transformation phase. So there is no duplication of functionality.
>> >
>> > I will move the code into ipa-devirt.
>> >
>> > Thanks!
>> >
>> >
>> >>
>> >>
>> >> Thanks,
>> >> Richard.
>> >>
>> >> > PR ipa/107666
>> >> > gcc/ChangeLog:
>> >> >
>> >> > * Makefile.in: Add new pass.
>> >> > * common.opt: Add flag -fipa-guarded-deref.
>> >> > * lto-section-in.cc: Add new section "ipa_guarded_deref".
>> >> > * lto-streamer.h (enum lto_section_type): Add new section.
>> >> > * passes.def: Add new pass.
>> >> > * timevar.def (TV_IPA_GUARDED_DEREF): Add time var.
>> >> > * tree-pass.h (make_pass_ipa_guarded_deref): New prototype.
>> >> > * ipa-guarded-deref.cc: New file.
>> >> >
>> >> > Signed-off-by: Christoph Müllner 
>> >> > ---
>> >> >  gcc/Makefile.in  |1 +
>> >> >  gcc/common.opt   |4 +
>> >> >  gcc

Re: [PATCH] libstdc++: Fix python/ not making install directories

2022-11-14 Thread Jonathan Wakely via Gcc-patches
On Sun, 13 Nov 2022 at 20:30, Bernhard Reutner-Fischer via Libstdc++
 wrote:
>
> On Sun, 13 Nov 2022 19:42:52 +
> Jonathan Wakely via Gcc-patches  wrote:
>
> > On Sun, 13 Nov 2022, 18:06 Arsen Arsenović via Libstdc++, <
> > libstd...@gcc.gnu.org> wrote:
> >
> > > I'm unsure why this issue only started manifesting now with how old this
> > > code is, but this should fix it.
> > >
> >
> > I just pushed a change to how the debug build makefiles are generated,
> > which presumably uncovered this latent bug. I'll review the patch in the
> > morning.
>
> Ah, you removed debugdir everywhere but in the install-debug rule :)

Doh! I'll get your fix below committed in a few minutes, thanks.

> I.e.:
>
> $ git diff
> diff --git a/libstdc++-v3/src/Makefile.am b/libstdc++-v3/src/Makefile.am
> index b545ebf0dcf..bfa031ea395 100644
> --- a/libstdc++-v3/src/Makefile.am
> +++ b/libstdc++-v3/src/Makefile.am
> @@ -422,5 +422,5 @@ build-debug: stamp-debug $(debug_backtrace_supported_h)
>
>  # Install debug library.
>  install-debug: build-debug
> -   (cd ${debugdir} && $(MAKE) CXXFLAGS='$(DEBUG_FLAGS)' \
> -   toolexeclibdir=$(glibcxx_toolexeclibdir)/debug install) ;
> +   $(MAKE) -C debug CXXFLAGS='$(DEBUG_FLAGS)' \
> +   toolexeclibdir=$(glibcxx_toolexeclibdir)/debug install
> diff --git a/libstdc++-v3/src/Makefile.in b/libstdc++-v3/src/Makefile.in
> index f54ee282fb0..8479d297389 100644
> --- a/libstdc++-v3/src/Makefile.in
> +++ b/libstdc++-v3/src/Makefile.in
> @@ -1142,8 +1142,8 @@ build-debug: stamp-debug $(debug_backtrace_supported_h)
>
>  # Install debug library.
>  install-debug: build-debug
> -   (cd ${debugdir} && $(MAKE) CXXFLAGS='$(DEBUG_FLAGS)' \
> -   toolexeclibdir=$(glibcxx_toolexeclibdir)/debug install) ;
> +   $(MAKE) -C debug CXXFLAGS='$(DEBUG_FLAGS)' \
> +   toolexeclibdir=$(glibcxx_toolexeclibdir)/debug install
>
>  # Tell versions [3.59,3.63) of GNU make to not export all variables.
>  # Otherwise a system limit (for SysV at least) may be exceeded.
>
> I personally did not experience the gdb.py install bug Arsen seems to
> have encountered though.
>
> thanks!
> >
> >
> >
> > > libstdc++-v3/ChangeLog:
> > >
> > > * python/Makefile.am: Call mkinstalldirs before INSTALL_DATA
> > > when installing gdb scripts.
> > > * python/Makefile.in: Regenerate.
> > > ---
> > > Hi,
> > >
> > > Someone spotted on IRC spotted an error: if trying to install to a fresh
> > > prefix/sysroot with --enable-libstdcxx-debug, the install fails since it's
> > > intended target directories don't exist.  I could replicate this on
> > > r13-3944-g43435c7eb0ff60 using
> > >
> > > $ ../gcc/configure --disable-bootstrap \
> > > --enable-libstdcxx-debug \
> > > --enable-languages=c,c++ \
> > > --prefix=$(pwd)/pfx
>



Re: [PATCH 2/3] libstdc++: Implement ranges::iota from P2440R1

2022-11-14 Thread Daniel Krügler via Gcc-patches
Am Mo., 14. Nov. 2022 um 11:09 Uhr schrieb Jonathan Wakely via
Libstdc++ :
>
> On Mon, 14 Nov 2022 at 04:52, Patrick Palka via Libstdc++
>  wrote:
> >
> > Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
> >
> > libstdc++-v3/ChangeLog:
> >
> > * include/bits/ranges_algo.h (out_value_result): Define.
> > (iota_result): Define.
> > (__iota_fn, iota): Define.
> > * testsuite/25_algorithms/iota/1.cc: New test.
> > ---
> >  libstdc++-v3/include/bits/ranges_algo.h   | 48 +++
> >  .../testsuite/25_algorithms/iota/1.cc | 29 +++
> >  2 files changed, 77 insertions(+)
> >  create mode 100644 libstdc++-v3/testsuite/25_algorithms/iota/1.cc
> >
> > diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
> > b/libstdc++-v3/include/bits/ranges_algo.h
> > index da0ca981dc3..f003117c569 100644
> > --- a/libstdc++-v3/include/bits/ranges_algo.h
> > +++ b/libstdc++-v3/include/bits/ranges_algo.h
> > @@ -3517,6 +3517,54 @@ namespace ranges
> >};
> >
> >inline constexpr __contains_subrange_fn contains_subrange{};
> > +
> > +  template<typename _Out, typename _Tp>
> > +struct out_value_result
> > +{
> > +  [[no_unique_address]] _Out out;
> > +  [[no_unique_address]] _Tp value;
> > +
> > +  template<typename _Out2, typename _Tp2>
> > +   requires convertible_to<const _Out&, _Out2>
> > + && convertible_to<const _Tp&, _Tp2>
> > +   constexpr
> > +   operator out_value_result<_Out2, _Tp2>() const &
> > +   { return {out, value}; }
> > +
> > +  template<typename _Out2, typename _Tp2>
> > +   requires convertible_to<_Out, _Out2>
> > + && convertible_to<_Tp, _Tp2>
> > +   constexpr
> > +   operator out_value_result<_Out2, _Tp2>() &&
> > +   { return {std::move(out), std::move(value)}; }
> > +};
> > +
> > +  template<typename _Out, typename _Tp>
> > +using iota_result = out_value_result<_Out, _Tp>;
> > +
> > +  struct __iota_fn
> > +  {
> > +    template<input_or_output_iterator _Out, sentinel_for<_Out> _Sent, 
> > weakly_incrementable _Tp>
> > +  requires indirectly_writable<_Out, const _Tp&>
> > +  constexpr iota_result<_Out, _Tp>
> > +  operator()(_Out __first, _Sent __last, _Tp __value) const
> > +  {
> > +   while (__first != __last)
> > + {
> > +   *__first = static_cast<const _Tp&>(__value);
>
> Is this any different to const_cast<const _Tp&>(__value) ?

I think it is. const_cast can potentially mean the removal
of volatile, so I would always look with suspicion on const_cast<const
_Tp&>, while static_cast<const _Tp&> is clearer. Alternatively, as_const could be
used, which does add_const_t.

- Daniel


Re: [PATCH 4/8] Modify test, to prevent the next patch breaking it

2022-11-14 Thread Richard Biener via Gcc-patches
On Fri, Nov 11, 2022 at 7:48 PM Andrew Carlotti via Gcc-patches
 wrote:
>
> The upcoming c[lt]z idiom recognition patch eliminates the need for a
> brute force computation of the iteration count of these loops. The test
> is intended to verify that ivcanon can determine the loop count when the
> condition is given by a chain of constant computations.
>
> We replace the constant operations with a more complicated chain that should
> resist future idiom recognition.

OK.

> gcc/testsuite/ChangeLog:
>
> * gcc.dg/pr77975.c: Make tests more robust.
>
>
> --
>
>
> diff --git a/gcc/testsuite/gcc.dg/pr77975.c b/gcc/testsuite/gcc.dg/pr77975.c
> index 
> 148cebdded964da7fce148abdf2a430c55650513..a187ce2b50c2821841e71b5b6cb243a37a66fb57
>  100644
> --- a/gcc/testsuite/gcc.dg/pr77975.c
> +++ b/gcc/testsuite/gcc.dg/pr77975.c
> @@ -7,10 +7,11 @@
>  unsigned int
>  foo (unsigned int *b)
>  {
> -  unsigned int a = 3;
> +  unsigned int a = 8;
>while (a)
>  {
> -  a >>= 1;
> +  a += 5;
> +  a &= 44;
>*b += a;
>  }
>return a;
> @@ -21,10 +22,11 @@ foo (unsigned int *b)
>  unsigned int
>  bar (unsigned int *b)
>  {
> -  unsigned int a = 7;
> +  unsigned int a = 3;
>while (a)
>  {
> -  a >>= 1;
> +  a += 5;
> +  a &= 44;
>*b += a;
>  }
>return a;
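The replacement update chain still terminates after only a few steps, which is what ivcanon's brute-force evaluation is expected to discover; the iteration counts below are computed from the patch's loop body, not taken from the patch itself:

```cpp
#include <cassert>

// Count iterations of the modified test's loop body (a += 5; a &= 44;)
// until a reaches zero.  For a = 8: 8 -> 13 -> (&44) 12 -> 17 -> (&44) 0.
static unsigned iter_count(unsigned a)
{
  unsigned n = 0;
  while (a)
    {
      a += 5;
      a &= 44;
      ++n;
    }
  return n;
}
```

So foo's loop (starting at 8) runs twice and bar's loop (starting at 3) runs three times, yet the constant chain is obscure enough to resist idiom recognition.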


Re: [PATCH 2/3] libstdc++: Implement ranges::iota from P2440R1

2022-11-14 Thread Jonathan Wakely via Gcc-patches
On Mon, 14 Nov 2022 at 10:17, Daniel Krügler  wrote:
>
> Am Mo., 14. Nov. 2022 um 11:09 Uhr schrieb Jonathan Wakely via
> Libstdc++ :
> >
> > On Mon, 14 Nov 2022 at 04:52, Patrick Palka via Libstdc++
> >  wrote:
> > >
> > > Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
> > >
> > > libstdc++-v3/ChangeLog:
> > >
> > > * include/bits/ranges_algo.h (out_value_result): Define.
> > > (iota_result): Define.
> > > (__iota_fn, iota): Define.
> > > * testsuite/25_algorithms/iota/1.cc: New test.
> > > ---
> > >  libstdc++-v3/include/bits/ranges_algo.h   | 48 +++
> > >  .../testsuite/25_algorithms/iota/1.cc | 29 +++
> > >  2 files changed, 77 insertions(+)
> > >  create mode 100644 libstdc++-v3/testsuite/25_algorithms/iota/1.cc
> > >
> > > diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
> > > b/libstdc++-v3/include/bits/ranges_algo.h
> > > index da0ca981dc3..f003117c569 100644
> > > --- a/libstdc++-v3/include/bits/ranges_algo.h
> > > +++ b/libstdc++-v3/include/bits/ranges_algo.h
> > > @@ -3517,6 +3517,54 @@ namespace ranges
> > >};
> > >
> > >inline constexpr __contains_subrange_fn contains_subrange{};
> > > +
> > > +  template<typename _Out, typename _Tp>
> > > +struct out_value_result
> > > +{
> > > +  [[no_unique_address]] _Out out;
> > > +  [[no_unique_address]] _Tp value;
> > > +
> > > +  template<typename _Out2, typename _Tp2>
> > > +   requires convertible_to<const _Out&, _Out2>
> > > + && convertible_to<const _Tp&, _Tp2>
> > > +   constexpr
> > > +   operator out_value_result<_Out2, _Tp2>() const &
> > > +   { return {out, value}; }
> > > +
> > > +  template<typename _Out2, typename _Tp2>
> > > +   requires convertible_to<_Out, _Out2>
> > > + && convertible_to<_Tp, _Tp2>
> > > +   constexpr
> > > +   operator out_value_result<_Out2, _Tp2>() &&
> > > +   { return {std::move(out), std::move(value)}; }
> > > +};
> > > +
> > > +  template<typename _Out, typename _Tp>
> > > +using iota_result = out_value_result<_Out, _Tp>;
> > > +
> > > +  struct __iota_fn
> > > +  {
> > > +    template<input_or_output_iterator _Out, sentinel_for<_Out> _Sent, 
> > > weakly_incrementable _Tp>
> > > +  requires indirectly_writable<_Out, const _Tp&>
> > > +  constexpr iota_result<_Out, _Tp>
> > > +  operator()(_Out __first, _Sent __last, _Tp __value) const
> > > +  {
> > > +   while (__first != __last)
> > > + {
> > > +   *__first = static_cast<const _Tp&>(__value);
> >
> > Is this any different to const_cast<const _Tp&>(__value) ?
>
> I think it is. const_cast can potentially mean the removal
> of volatile,

True.

> so I would always look with suspicion on const_cast<const _Tp&>, while static_cast<const _Tp&> is clearer. Alternatively, as_const could be
> used, which does add_const_t.

Which means evaluating the add_const trait *and* overload resolution
for as_const *and* a runtime function call.

Let's go with static_cast.
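A sketch of the tradeoff settled here (illustrative, not libstdc++ code): const_cast is *able* to cast away volatile, while static_cast to `const T&` can only add qualifiers, so it documents that nothing is being stripped.

```cpp
#include <type_traits>
#include <utility>

volatile int vi = 0;
// OK: const_cast may drop volatile (and add const) in one step.
const int& r = const_cast<const int&>(vi);
// const int& s = static_cast<const int&>(vi);  // ill-formed: would drop volatile

// On a plain lvalue the two casts are equivalent; static_cast<const T&>
// simply forces the const-qualified path.
inline const int& const_view(int& x) { return static_cast<const int&>(x); }

static_assert(std::is_same_v<decltype(static_cast<const int&>(std::declval<int&>())),
                             const int&>);
```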



Re: [PATCH] libstdc++: Fix python/ not making install directories

2022-11-14 Thread Jonathan Wakely via Gcc-patches
On Sun, 13 Nov 2022 at 18:06, Arsen Arsenović via Libstdc++
 wrote:
>
> I'm unsure why this issue only started manifesting now with how old this
> code is, but this should fix it.
>
> libstdc++-v3/ChangeLog:
>
> * python/Makefile.am: Call mkinstalldirs before INSTALL_DATA
> when installing gdb scripts.
> * python/Makefile.in: Regenerate.


This looks simple, and more consistent with what we already do. Does
it solve your issue?

commit d26dc9e37602314bf6922ac5667fff34f5559449 (HEAD -> master)
Author: Jonathan Wakely 
Date:   Mon Nov 14 10:27:15 2022

   libstdc++: Add missing mkdirs for installing python files for debug lib

   libstdc++-v3/ChangeLog:

    * python/Makefile.am (install-data-local): Use mkdir_p for debug
   libdir.
   * python/Makefile.in: Regenerate.

diff --git a/libstdc++-v3/python/Makefile.am b/libstdc++-v3/python/Makefile.am
index f523d3a44dc..d5d29b398b0 100644
--- a/libstdc++-v3/python/Makefile.am
+++ b/libstdc++-v3/python/Makefile.am
@@ -62,5 +62,6 @@ install-data-local: gdb.py
   $(INSTALL_DATA) gdb.py $(DESTDIR)$(toolexeclibdir)/$$libname-gdb.py ; \
   if [ -n "$(debug_gdb_py)" ]; then \
 sed "/^libdir = /s;'$$;/debug';" gdb.py > debug-gdb.py ; \
+ $(mkdir_p) $(DESTDIR)$(toolexeclibdir)/debug
 $(INSTALL_DATA) debug-gdb.py
$(DESTDIR)$(toolexeclibdir)/debug/$$libname-gdb.py ; \
   fi
diff --git a/libstdc++-v3/python/Makefile.in b/libstdc++-v3/python/Makefile.in
index 05e79b5ac1e..cfec788b6e3 100644
--- a/libstdc++-v3/python/Makefile.in
+++ b/libstdc++-v3/python/Makefile.in
@@ -627,6 +627,7 @@ install-data-local: gdb.py
   $(INSTALL_DATA) gdb.py $(DESTDIR)$(toolexeclibdir)/$$libname-gdb.py ; \
   if [ -n "$(debug_gdb_py)" ]; then \
 sed "/^libdir = /s;'$$;/debug';" gdb.py > debug-gdb.py ; \
+ $(mkdir_p) $(DESTDIR)$(toolexeclibdir)/debug
 $(INSTALL_DATA) debug-gdb.py
$(DESTDIR)$(toolexeclibdir)/debug/$$libname-gdb.py ; \
   fi



[PATCH] c++: Add testcase for DR 2392

2022-11-14 Thread Jakub Jelinek via Gcc-patches
Hi!

Working virtually out of Baker Island.

The testcase from DR 2392 passes, so I assume we don't need to do
anything further for the DR.

Tested on x86_64-linux, ok for trunk?

2022-11-13  Jakub Jelinek  

* g++.dg/DRs/dr2392.C: Add testcase for DR 2392.

--- gcc/testsuite/g++.dg/DRs/dr2392.C.jj	2022-11-13 20:49:22.107817793 -1200
+++ gcc/testsuite/g++.dg/DRs/dr2392.C   2022-11-13 20:49:17.506880524 -1200
@@ -0,0 +1,12 @@
+// DR 2392
+// { dg-do compile { target c++11 } }
+
+template <typename T>
+constexpr int
+foo ()
+{
+  T t;
+  return 1;
+}
+
+using V = decltype (new int[foo <int> ()]);

Jakub
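A standalone sketch of what the testcase exercises: the array bound of a new-expression has to be checked for constant-ness, which instantiates and constant-evaluates foo<int> even though the new-expression is only the unevaluated operand of decltype (this reconstruction uses `T t{}` so the body stays constexpr-friendly; the posted testcase uses `T t;`):

```cpp
#include <type_traits>

template <typename T>
constexpr int
foo ()
{
  T t{};       // requires T to be default-constructible at instantiation time
  return 1;
}

// The new-expression is unevaluated, but its bound still needs foo<int>.
using V = decltype (new int[foo<int> ()]);   // V is int*

static_assert (foo<int> () == 1, "foo<int>() is a constant expression");
static_assert (std::is_same<V, int*>::value, "new int[N] yields int*");
```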



Re: [RFC PATCH] ipa-cp: Speculatively call specialized functions

2022-11-14 Thread Manolis Tsamis
On Mon, Nov 14, 2022 at 9:37 AM Richard Biener
 wrote:
>
> On Sun, Nov 13, 2022 at 4:38 PM Christoph Muellner
>  wrote:
> >
> > From: mtsamis 
> >
> > The IPA CP pass offers a wide range of optimizations, where most of them
> > lead to specialized functions that are called from a call site.
> > This can lead to multiple specialized function clones, if more than
> > one call-site allows such an optimization.
> > If not all call-sites can be optimized, the program might end
> > up with call-sites to the original function.
> >
> > This pass assumes that non-optimized call-sites (i.e. call-sites
> > that don't call specialized functions) are likely to be called
> > with arguments that would allow calling specialized clones.
> > Since we cannot guarantee this (for obvious reasons), we can't
> > replace the existing calls. However, we can introduce dynamic
> > guards that test the arguments for the collected constants
> > and calls the specialized function if there is a match.
> >
> > To demonstrate the effect, let's consider the following program part:
> >
> >   func_1()
> > myfunc(1)
> >   func_2()
> > myfunc(2)
> >   func_i(i)
> > myfunc(i)
> >
> > In this case the transformation would do the following:
> >
> >   func_1()
> > myfunc.constprop.1() // myfunc() with arg0 == 1
> >   func_2()
> > myfunc.constprop.2() // myfunc() with arg0 == 2
> >   func_i(i)
> > if (i == 1)
> >   myfunc.constprop.1() // myfunc() with arg0 == 1
> > else if (i == 2)
> >   myfunc.constprop.2() // myfunc() with arg0 == 2
> > else
> >   myfunc(i)
> >
> > The pass consists of two main parts:
> > * collecting all specialized functions and the argument/constant pair(s)
> > * insertion of the guards during materialization
> >
> > The patch integrates well into ipa-cp and related IPA functionality.
> > Given the nature of IPA, the changes are touching many IPA-related
> > files as well as call-graph data structures.
> >
> > The impact of the dynamic guard is expected to be less than the speedup
> > gained by enabled optimizations (e.g. inlining or constant propagation).
>
> I don't see any limits on the number of callee candidates or the complexity
> of the guard.  Is there any reason to not factor the guards into a wrapper
> function to avoid bloating cold call sites and to allow inlining to decide
> where the expansion is useful?
>

There is indeed no limit on the numbers of guards or guard complexity
currently. Would it be a good choice here to introduce two parameters
for the maximum number of guards and conditions per guard and assign
some sane default value?
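For concreteness, a hypothetical source-level view of the guarded dispatch being discussed, with the guard count capped at two as such a parameter would enforce (names are illustrative, not from the patch):

```cpp
// Generic function plus two constant-propagated clones.
static int myfunc(int i)        { return i * 10; }  // generic version
static int myfunc_constprop_1() { return 10; }      // clone for i == 1
static int myfunc_constprop_2() { return 20; }      // clone for i == 2

// The transformed call site: at most two guards, then the generic fallback.
static int func_i(int i)
{
  if (i == 1)
    return myfunc_constprop_1();  // guard 1
  else if (i == 2)
    return myfunc_constprop_2();  // guard 2
  else
    return myfunc(i);             // unspecialized call
}
```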

About the wrapper functions, that is an interesting question that I haven't
explored as much. One reason is that this transformation aims to work in
a similar way as the speculative edges (which already existed). Since the
expected number of guards is low (1-2 in most cases), I considered the
two optimizations quite similar and wanted to share as much of the design
and functionality as I could. I also tried to make the overhead of the
non-specialized original function call as low as possible.

But I can also see how there is a difference in the speculative and
specialized edges that makes creating a wrapper meaningful for
this case: The maximum speedup of a direct vs indirect function
call can be much smaller than that of a specialized call instead
of the generic one.

> Skimming the patch I noticed an #if 0 commented assert with a comment
> that this was to be temporary?
>

Thanks for pointing that out, this is unintentional. I will fix it.

Best,
Manolis

> Thanks,
> Richard.
>
> > PR ipa/107667
> > gcc/Changelog:
> >
> > * cgraph.cc (cgraph_add_edge_to_call_site_hash): Add support for 
> > guarded specialized edges.
> > (cgraph_edge::set_call_stmt): Likewise.
> > (symbol_table::create_edge): Likewise.
> > (cgraph_edge::remove): Likewise.
> > (cgraph_edge::make_speculative): Likewise.
> > (cgraph_edge::make_specialized): Likewise.
> > (cgraph_edge::remove_specializations): Likewise.
> > (cgraph_edge::redirect_call_stmt_to_callee): Likewise.
> > (cgraph_edge::dump_edge_flags): Likewise.
> > (verify_speculative_call): Likewise.
> > (verify_specialized_call): Likewise.
> > (cgraph_node::verify_node): Likewise.
> > * cgraph.h (class GTY): Add new class that contains info of 
> > specialized edges.
> > * cgraphclones.cc (cgraph_edge::clone): Add support for guarded 
> > specialized edges.
> > (cgraph_node::set_call_stmt_including_clones): Likewise.
> > * ipa-cp.cc (want_remove_some_param_p): Likewise.
> > (create_specialized_node): Likewise.
> > (add_specialized_edges): Likewise.
> > (ipcp_driver): Likewise.
> > * ipa-fnsummary.cc (redirect_to_unreachable): Likewise.
> > (ipa_fn_summary_t::duplicate): Likewise.
> > (analyze_function_body): Likewise.
> > (esti

Re: [PATCH] libstdc++: Fix python/ not making install directories

2022-11-14 Thread Jonathan Wakely via Gcc-patches
On Mon, 14 Nov 2022 at 10:29, Jonathan Wakely wrote:
>
> On Sun, 13 Nov 2022 at 18:06, Arsen Arsenović via Libstdc++
>  wrote:
> >
> > I'm unsure why this issue only started manifesting now with how old this
> > code is, but this should fix it.
> >
> > libstdc++-v3/ChangeLog:
> >
> > * python/Makefile.am: Call mkinstalldirs before INSTALL_DATA
> > when installing gdb scripts.
> > * python/Makefile.in: Regenerate.
>
>
> This looks simple, and more consistent with what we already do. Does
> it solve your issue?

Apparently it helps if I commit a fix after testing it, and don't send
the unfixed commit.

Try *this* one.
commit 58a8ec0ce8c9231e6d8cc99a0bf8f2afd1e702de
Author: Jonathan Wakely 
Date:   Mon Nov 14 10:37:58 2022

libstdc++: Add missing mkdirs for installing python files for debug lib

libstdc++-v3/ChangeLog:

* python/Makefile.am (install-data-local): Use mkdir_p for debug
libdir.
* python/Makefile.in: Regenerate.

diff --git a/libstdc++-v3/python/Makefile.am b/libstdc++-v3/python/Makefile.am
index f523d3a44dc..df6bf508210 100644
--- a/libstdc++-v3/python/Makefile.am
+++ b/libstdc++-v3/python/Makefile.am
@@ -62,5 +62,6 @@ install-data-local: gdb.py
$(INSTALL_DATA) gdb.py $(DESTDIR)$(toolexeclibdir)/$$libname-gdb.py ; \
if [ -n "$(debug_gdb_py)" ]; then \
  sed "/^libdir = /s;'$$;/debug';" gdb.py > debug-gdb.py ; \
+ $(mkdir_p) $(DESTDIR)$(toolexeclibdir)/debug ; \
  $(INSTALL_DATA) debug-gdb.py 
$(DESTDIR)$(toolexeclibdir)/debug/$$libname-gdb.py ; \
fi
diff --git a/libstdc++-v3/python/Makefile.in b/libstdc++-v3/python/Makefile.in
index 05e79b5ac1e..c527e6cf186 100644
--- a/libstdc++-v3/python/Makefile.in
+++ b/libstdc++-v3/python/Makefile.in
@@ -627,6 +627,7 @@ install-data-local: gdb.py
$(INSTALL_DATA) gdb.py $(DESTDIR)$(toolexeclibdir)/$$libname-gdb.py ; \
if [ -n "$(debug_gdb_py)" ]; then \
  sed "/^libdir = /s;'$$;/debug';" gdb.py > debug-gdb.py ; \
+ $(mkdir_p) $(DESTDIR)$(toolexeclibdir)/debug ; \
  $(INSTALL_DATA) debug-gdb.py 
$(DESTDIR)$(toolexeclibdir)/debug/$$libname-gdb.py ; \
fi
 


[PATCH] c++: Allow attributes on concepts - DR 2428

2022-11-14 Thread Jakub Jelinek via Gcc-patches
Hi!

Working virtually out of Baker Island.

The following patch adds parsing of attributes to concept definition,
allows deprecated attribute to be specified (some ugliness needed
because CONCEPT_DECL is a cp/*.def attribute and so can't be mentioned
in c-family/ directly; used what is used for objc method decls,
an alternative would be a langhook) and checks TREE_DEPRECATED in
build_standard_check (not sure if that is the right spot, or whether
it shouldn't be checked also for variable and function concepts and
how to write testcase coverage for that).

Lightly tested so far.

2022-11-13  Jakub Jelinek  

gcc/c-family/
* c-common.h (c_concept_decl): Declare.
* c-attribs.cc (handle_deprecated_attribute): Allow deprecated
attribute on CONCEPT_DECL if flag_concepts.
gcc/c/
* c-decl.cc (c_concept_decl): New function.
gcc/cp/
* cp-tree.h (finish_concept_definition): Add ATTRS parameter.
* parser.cc (cp_parser_concept_definition): Parse attributes in
between identifier and =.  Adjust finish_concept_definition
caller.
* pt.cc (finish_concept_definition): Add ATTRS parameter.  Call
cplus_decl_attributes.
* constraint.cc (build_standard_check): If CONCEPT_DECL is
TREE_DEPRECATED, emit -Wdeprecated-declaration warnings.
* tree.cc (c_concept_decl): New function.
gcc/testsuite/
* g++.dg/cpp2a/concepts-dr2428.C: New test.

--- gcc/c-family/c-common.h.jj  2022-10-27 21:00:53.698247586 -1200
+++ gcc/c-family/c-common.h 2022-11-13 21:49:37.934598359 -1200
@@ -831,6 +831,7 @@ extern tree (*make_fname_decl) (location
 
 /* In c-decl.cc and cp/tree.cc.  FIXME.  */
 extern void c_register_addr_space (const char *str, addr_space_t as);
+extern bool c_concept_decl (enum tree_code);
 
 /* In c-common.cc.  */
 extern bool in_late_binary_op;
--- gcc/c-family/c-attribs.cc.jj2022-10-09 19:31:57.177988375 -1200
+++ gcc/c-family/c-attribs.cc   2022-11-13 21:52:37.920152731 -1200
@@ -4211,7 +4211,8 @@ handle_deprecated_attribute (tree *node,
  || VAR_OR_FUNCTION_DECL_P (decl)
  || TREE_CODE (decl) == FIELD_DECL
  || TREE_CODE (decl) == CONST_DECL
- || objc_method_decl (TREE_CODE (decl)))
+ || objc_method_decl (TREE_CODE (decl))
+ || (flag_concepts && c_concept_decl (TREE_CODE (decl
TREE_DEPRECATED (decl) = 1;
   else if (TREE_CODE (decl) == LABEL_DECL)
{
--- gcc/c/c-decl.cc.jj  2022-11-12 23:29:08.181504470 -1200
+++ gcc/c/c-decl.cc 2022-11-13 21:50:38.178779716 -1200
@@ -12987,6 +12987,14 @@ c_register_addr_space (const char *word,
   ridpointers [rid] = id;
 }
 
+/* C doesn't have CONCEPT_DECL.  */
+
+bool
+c_concept_decl (enum tree_code)
+{
+  return false;
+}
+
 /* Return identifier to look up for omp declare reduction.  */
 
 tree
--- gcc/cp/cp-tree.h.jj 2022-11-11 20:30:10.138056914 -1200
+++ gcc/cp/cp-tree.h2022-11-13 20:58:39.443218815 -1200
@@ -8324,7 +8324,7 @@ struct diagnosing_failed_constraint
 extern cp_expr finish_constraint_or_expr   (location_t, cp_expr, cp_expr);
 extern cp_expr finish_constraint_and_expr  (location_t, cp_expr, cp_expr);
 extern cp_expr finish_constraint_primary_expr  (cp_expr);
-extern tree finish_concept_definition  (cp_expr, tree);
+extern tree finish_concept_definition  (cp_expr, tree, tree);
 extern tree combine_constraint_expressions  (tree, tree);
 extern tree append_constraint  (tree, tree);
 extern tree get_constraints (const_tree);
--- gcc/cp/parser.cc.jj 2022-11-08 22:39:13.325041007 -1200
+++ gcc/cp/parser.cc2022-11-13 20:58:15.692542640 -1200
@@ -29672,6 +29672,8 @@ cp_parser_concept_definition (cp_parser
   return NULL_TREE;
 }
 
+  tree attrs = cp_parser_attributes_opt (parser);
+
   if (!cp_parser_require (parser, CPP_EQ, RT_EQ))
 {
   cp_parser_skip_to_end_of_statement (parser);
@@ -29688,7 +29690,7 @@ cp_parser_concept_definition (cp_parser
  but continue as if it were.  */
   cp_parser_consume_semicolon_at_end_of_statement (parser);
 
-  return finish_concept_definition (id, init);
+  return finish_concept_definition (id, init, attrs);
 }
 
 // -- 
//
--- gcc/cp/pt.cc.jj 2022-11-07 20:54:37.341399829 -1200
+++ gcc/cp/pt.cc2022-11-13 21:01:18.333053377 -1200
@@ -29027,7 +29027,7 @@ placeholder_type_constraint_dependent_p
the TEMPLATE_DECL. */
 
 tree
-finish_concept_definition (cp_expr id, tree init)
+finish_concept_definition (cp_expr id, tree init, tree attrs)
 {
   gcc_assert (identifier_p (id));
   gcc_assert (processing_template_decl);
@@ -29061,6 +29061,9 @@ finish_concept_definition (cp_expr id, t
   DECL_CONTEXT (decl) = current_scope ();
   DECL_INITIAL (decl) = init;
 
+  if (attrs)
+cplus_decl_attributes (&decl, attrs, 0);
+
   set_originating_module (decl, false);
 
   /* Push the enclosing te

Re: [RFC PATCH] ipa-cp: Speculatively call specialized functions

2022-11-14 Thread Richard Biener via Gcc-patches
On Mon, Nov 14, 2022 at 11:36 AM Manolis Tsamis  wrote:
>
> On Mon, Nov 14, 2022 at 9:37 AM Richard Biener
>  wrote:
> >
> > On Sun, Nov 13, 2022 at 4:38 PM Christoph Muellner
> >  wrote:
> > >
> > > From: mtsamis 
> > >
> > > The IPA CP pass offers a wide range of optimizations, where most of them
> > > lead to specialized functions that are called from a call site.
> > > This can lead to multiple specialized function clones, if more than
> > > one call-site allows such an optimization.
> > > If not all call-sites can be optimized, the program might end
> > > up with call-sites to the original function.
> > >
> > > This pass assumes that non-optimized call-sites (i.e. call-sites
> > > that don't call specialized functions) are likely to be called
> > > with arguments that would allow calling specialized clones.
> > > Since we cannot guarantee this (for obvious reasons), we can't
> > > replace the existing calls. However, we can introduce dynamic
> > > guards that test the arguments for the collected constants
> > > and calls the specialized function if there is a match.
> > >
> > > To demonstrate the effect, let's consider the following program part:
> > >
> > >   func_1()
> > > myfunc(1)
> > >   func_2()
> > > myfunc(2)
> > >   func_i(i)
> > > myfunc(i)
> > >
> > > In this case the transformation would do the following:
> > >
> > >   func_1()
> > > myfunc.constprop.1() // myfunc() with arg0 == 1
> > >   func_2()
> > > myfunc.constprop.2() // myfunc() with arg0 == 2
> > >   func_i(i)
> > > if (i == 1)
> > >   myfunc.constprop.1() // myfunc() with arg0 == 1
> > > else if (i == 2)
> > >   myfunc.constprop.2() // myfunc() with arg0 == 2
> > > else
> > >   myfunc(i)
> > >
> > > The pass consists of two main parts:
> > > * collecting all specialized functions and the argument/constant pair(s)
> > > * insertion of the guards during materialization
> > >
> > > The patch integrates well into ipa-cp and related IPA functionality.
> > > Given the nature of IPA, the changes are touching many IPA-related
> > > files as well as call-graph data structures.
> > >
> > > The impact of the dynamic guard is expected to be less than the speedup
> > > gained by enabled optimizations (e.g. inlining or constant propagation).
> >
> > I don't see any limits on the number of callee candidates or the complexity
> > of the guard.  Is there any reason to not factor the guards into a wrapper
> > function to avoid bloating cold call sites and to allow inlining to decide
> > where the expansion is useful?
> >
>
> There is indeed no limit on the numbers of guards or guard complexity
> currently. Would it be a good choice here to introduce two parameters
> for the maximum number of guards and conditions per guard and assign
> some sane default value?

Yes, that sounds good.

> About the wrapper functions, that is an interesting question that I haven't
> explored as much. One reason is that this transformation aims to work in
> a similar way as the speculative edges (which already existed). Since the
> expected number of guards is low (1-2 in most cases), I considered the
> two optimizations quite similar and wanted to share as much of the design
> and functionality as I could. I also tried to make the overhead of the
> non-specialized original function call as low as possible.
>
> But I can also see how there is a difference in the speculative and
> specialized edges that make creating a wrapper meaningful for
> this case: The maximum speedup of a direct vs indirect function
> call can be much smaller than that of a specialized call instead
> of the generic one.

Yes - IIRC there was also the idea to generally wrap not specialized
calls or to modify the not specialized copy itself.
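For readers following along, the guard insertion described in the quoted commit message can be sketched in plain C++ as below. All names here are illustrative stand-ins for the generated clones, not actual GCC output:

```cpp
// Generic function and two specialized clones, as ipa-cp would produce
// for call sites passing the constants 1 and 2.
constexpr int myfunc(int i) { return i * 2; }
constexpr int myfunc_constprop_1() { return 2; }  // myfunc() with arg0 == 1
constexpr int myfunc_constprop_2() { return 4; }  // myfunc() with arg0 == 2

// A call site with a non-constant argument after the transformation:
// guard on the argument and dispatch to a clone when it matches.
constexpr int call_myfunc(int i)
{
  if (i == 1)
    return myfunc_constprop_1();
  else if (i == 2)
    return myfunc_constprop_2();
  else
    return myfunc(i);
}
```

The guards trade one or two integer compares for the chance to reach an inlinable, constant-folded clone, which is why a cap on the number of guards per call site keeps the cold-path cost bounded.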

> > Skimming the patch I noticed an #if 0 commented assert with a comment
> > that this was to be temporary?
> >
>
> Thanks for pointing that out, this is unintentional. I will fix it.
>
> Best,
> Manolis
>
> > Thanks,
> > Richard.
> >
> > > PR ipa/107667
> > > gcc/Changelog:
> > >
> > > * cgraph.cc (cgraph_add_edge_to_call_site_hash): Add support for 
> > > guarded specialized edges.
> > > (cgraph_edge::set_call_stmt): Likewise.
> > > (symbol_table::create_edge): Likewise.
> > > (cgraph_edge::remove): Likewise.
> > > (cgraph_edge::make_speculative): Likewise.
> > > (cgraph_edge::make_specialized): Likewise.
> > > (cgraph_edge::remove_specializations): Likewise.
> > > (cgraph_edge::redirect_call_stmt_to_callee): Likewise.
> > > (cgraph_edge::dump_edge_flags): Likewise.
> > > (verify_speculative_call): Likewise.
> > > (verify_specialized_call): Likewise.
> > > (cgraph_node::verify_node): Likewise.
> > > * cgraph.h (class GTY): Add new class that contains info of 
> > > specialized edges.
> > > * cgraphclones.cc (cgraph_edge::clone): Add support for guarded 
> > > specialized edges.
> > > (cgraph_node

Re: why does gccgit require pthread?

2022-11-14 Thread LIU Hao via Gcc-patches

On 2022/11/12 02:27, Jonathan Wakely wrote:


A clean build fixed that. This patch bootstraps and passes testing on
x86_64-pc-linux-gnu (CentOS 8 Stream).

OK for trunk?


What should we do if no one approves this patch?


--
Best regards,
LIU Hao





Re: [PATCH] Fortran: fix treatment of character, value, optional dummy arguments [PR107444]

2022-11-14 Thread Andreas Schwab
On Nov 13 2022, Harald Anlauf wrote:

> Can you please confirm that it fixes your issues?

Looks good.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [PATCH] aarch64: Add support for Ampere-1A (-mcpu=ampere1a) CPU

2022-11-14 Thread Richard Sandiford via Gcc-patches
Philipp Tomsich  writes:
> This patch adds support for Ampere-1A CPU:
>  - recognize the name of the core and provide detection for -mcpu=native,
>  - updated extra_costs,
>  - adds a new fusion pair for (A+B+1 and A-B-1).
>
> Ampere-1A and Ampere-1 have more timing difference than the extra
> costs indicate, but these don't propagate through to the headline
> items in our extra costs (e.g. the change in latency for scalar sqrt
> doesn't have a corresponding table entry).
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-cores.def (AARCH64_CORE): Add ampere1a.
>   * config/aarch64/aarch64-cost-tables.h: Add ampere1a_extra_costs.
>   * config/aarch64/aarch64-fusion-pairs.def (AARCH64_FUSION_PAIR):
>   Define a new fusion pair for A+B+1/A-B-1 (i.e., add/subtract two
>   registers and then +1/-1).
>   * config/aarch64/aarch64-tune.md: Regenerate.
>   * config/aarch64/aarch64.cc (aarch_macro_fusion_pair_p): Implement
>   idiom-matcher for the new fusion pair.
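For context, the new fusion pair targets back-to-back dependent additions; at the source level the idiom corresponds roughly to the following (a sketch of the pattern being matched, not of the matcher itself):

```cpp
// A+B+1: on aarch64 this typically becomes an add of two registers
// followed by an add-immediate of 1 -- the pair the idiom matcher
// keeps adjacent so Ampere-1A can fuse them.
constexpr int add_plus_one(int a, int b) { return (a + b) + 1; }

// A-B-1: the subtractive form of the same fusion pair.
constexpr int sub_minus_one(int a, int b) { return (a - b) - 1; }
```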

OK except for a minor formatting nit:

>
> Signed-off-by: Philipp Tomsich 
> ---
>
>  gcc/config/aarch64/aarch64-cores.def|   1 +
>  gcc/config/aarch64/aarch64-cost-tables.h| 107 
>  gcc/config/aarch64/aarch64-fusion-pairs.def |   1 +
>  gcc/config/aarch64/aarch64-tune.md  |   2 +-
>  gcc/config/aarch64/aarch64.cc   |  63 
>  5 files changed, 173 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/aarch64/aarch64-cores.def 
> b/gcc/config/aarch64/aarch64-cores.def
> index d2671778928..aead587cec1 100644
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -70,6 +70,7 @@ AARCH64_CORE("thunderxt83",   thunderxt83,   thunderx,  
> V8A,  (CRC, CRYPTO), thu
>  
>  /* Ampere Computing ('\xC0') cores. */
>  AARCH64_CORE("ampere1", ampere1, cortexa57, V8_6A, (F16, RNG, AES, SHA3), 
> ampere1, 0xC0, 0xac3, -1)
> +AARCH64_CORE("ampere1a", ampere1a, cortexa57, V8_6A, (F16, RNG, AES, SHA3, 
> MEMTAG), ampere1a, 0xC0, 0xac4, -1)
>  /* Do not swap around "emag" and "xgene1",
> this order is required to handle variant correctly. */
>  AARCH64_CORE("emag",emag,  xgene1,V8A,  (CRC, CRYPTO), emag, 
> 0x50, 0x000, 3)
> diff --git a/gcc/config/aarch64/aarch64-cost-tables.h 
> b/gcc/config/aarch64/aarch64-cost-tables.h
> index 760d7b30368..48522606fbe 100644
> --- a/gcc/config/aarch64/aarch64-cost-tables.h
> +++ b/gcc/config/aarch64/aarch64-cost-tables.h
> @@ -775,4 +775,111 @@ const struct cpu_cost_table ampere1_extra_costs =
>}
>  };
>  
> +const struct cpu_cost_table ampere1a_extra_costs =
> +{
> +  /* ALU */
> +  {
> +0, /* arith.  */
> +0, /* logical.  */
> +0, /* shift.  */
> +COSTS_N_INSNS (1), /* shift_reg.  */
> +0, /* arith_shift.  */
> +COSTS_N_INSNS (1), /* arith_shift_reg.  */
> +0, /* log_shift.  */
> +COSTS_N_INSNS (1), /* log_shift_reg.  */
> +0, /* extend.  */
> +COSTS_N_INSNS (1), /* extend_arith.  */
> +0, /* bfi.  */
> +0, /* bfx.  */
> +0, /* clz.  */
> +0, /* rev.  */
> +0, /* non_exec.  */
> +true   /* non_exec_costs_exec.  */
> +  },
> +  {
> +/* MULT SImode */
> +{
> +  COSTS_N_INSNS (3),   /* simple.  */
> +  COSTS_N_INSNS (3),   /* flag_setting.  */
> +  COSTS_N_INSNS (3),   /* extend.  */
> +  COSTS_N_INSNS (4),   /* add.  */
> +  COSTS_N_INSNS (4),   /* extend_add.  */
> +  COSTS_N_INSNS (19)   /* idiv.  */
> +},
> +/* MULT DImode */
> +{
> +  COSTS_N_INSNS (3),   /* simple.  */
> +  0,   /* flag_setting (N/A).  */
> +  COSTS_N_INSNS (3),   /* extend.  */
> +  COSTS_N_INSNS (4),   /* add.  */
> +  COSTS_N_INSNS (4),   /* extend_add.  */
> +  COSTS_N_INSNS (35)   /* idiv.  */
> +}
> +  },
> +  /* LD/ST */
> +  {
> +COSTS_N_INSNS (4), /* load.  */
> +COSTS_N_INSNS (4), /* load_sign_extend.  */
> +0, /* ldrd (n/a).  */
> +0, /* ldm_1st.  */
> +0, /* ldm_regs_per_insn_1st.  */
> +0, /* ldm_regs_per_insn_subsequent.  */
> +COSTS_N_INSNS (5), /* loadf.  */
> +COSTS_N_INSNS (5), /* loadd.  */
> +COSTS_N_INSNS (5), /* load_unaligned.  */
> +0, /* store.  */
> +0, /* strd.  */
> +0, /* stm_1st.  */
> +0, /* stm_regs_per_insn_1st.  */
> +0, /* stm_regs_per_insn_subsequent.  */
> +COSTS_N_INSNS (2), /* storef.  */
> +COSTS_N_INSNS (2), /* stored.  */
> +COSTS_N_INSNS (2),

Re: [PATCH 0/2] i386: slim down insn-automata [PR 87832]

2022-11-14 Thread Alexander Monakov via Gcc-patches


On Mon, 7 Nov 2022, Alexander Monakov wrote:

> 
> On Tue, 1 Nov 2022, Alexander Monakov wrote:
> 
> > Hi,
> > 
> > I'm sending followup fixes for combinatorial explosion of znver scheduling
> > automaton tables as described in the earlier thread:
> > 
> > https://inbox.sourceware.org/gcc-patches/23c795d6-403c-5927-e610-f0f1215f5...@ispras.ru/T/#m36e069d43d07d768d4842a779e26b4a0915cc543
> 
> AMD folks, do you have any feedback?
> 
> What is the way forward for this patchset?

Ping?

> Alexander
> 
> > 
> > I think lujiazui.md and b[dt]ver[123].md have similar issues.
> > 
> > Alexander Monakov (2):
> >   i386: correct x87&SSE division modeling in znver.md
> >   i386: correct x87&SSE multiplication modeling in znver.md
> > 
> >  gcc/config/i386/znver.md | 67 
> >  1 file changed, 34 insertions(+), 33 deletions(-)
> > 
> > 
> 


RE: [PATCH] aarch64: Add support for Ampere-1A (-mcpu=ampere1a) CPU

2022-11-14 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Gcc-patches  bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Richard
> Sandiford via Gcc-patches
> Sent: Monday, November 14, 2022 11:11 AM
> To: Philipp Tomsich 
> Cc: gcc-patches@gcc.gnu.org; JiangNing Liu
> ; Christoph Muellner
> 
> Subject: Re: [PATCH] aarch64: Add support for Ampere-1A (-
> mcpu=ampere1a) CPU
> 
> Philipp Tomsich  writes:
> > This patch adds support for Ampere-1A CPU:
> >  - recognize the name of the core and provide detection for -mcpu=native,
> >  - updated extra_costs,
> >  - adds a new fusion pair for (A+B+1 and A-B-1).
> >
> > Ampere-1A and Ampere-1 have more timing difference than the extra
> > costs indicate, but these don't propagate through to the headline
> > items in our extra costs (e.g. the change in latency for scalar sqrt
> > doesn't have a corresponding table entry).
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-cores.def (AARCH64_CORE): Add
> ampere1a.
> > * config/aarch64/aarch64-cost-tables.h: Add ampere1a_extra_costs.
> > * config/aarch64/aarch64-fusion-pairs.def (AARCH64_FUSION_PAIR):
> > Define a new fusion pair for A+B+1/A-B-1 (i.e., add/subtract two
> > registers and then +1/-1).
> > * config/aarch64/aarch64-tune.md: Regenerate.
> > * config/aarch64/aarch64.cc (aarch_macro_fusion_pair_p):
> Implement
> > idiom-matcher for the new fusion pair.
> 
> OK except for a minor formatting nit:
> 
> >
> > Signed-off-by: Philipp Tomsich 
> > ---
> >
> >  gcc/config/aarch64/aarch64-cores.def|   1 +
> >  gcc/config/aarch64/aarch64-cost-tables.h| 107 
> >  gcc/config/aarch64/aarch64-fusion-pairs.def |   1 +
> >  gcc/config/aarch64/aarch64-tune.md  |   2 +-
> >  gcc/config/aarch64/aarch64.cc   |  63 
> >  5 files changed, 173 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/config/aarch64/aarch64-cores.def
> b/gcc/config/aarch64/aarch64-cores.def
> > index d2671778928..aead587cec1 100644
> > --- a/gcc/config/aarch64/aarch64-cores.def
> > +++ b/gcc/config/aarch64/aarch64-cores.def
> > @@ -70,6 +70,7 @@ AARCH64_CORE("thunderxt83",   thunderxt83,
> thunderx,  V8A,  (CRC, CRYPTO), thu
> >
> >  /* Ampere Computing ('\xC0') cores. */
> >  AARCH64_CORE("ampere1", ampere1, cortexa57, V8_6A, (F16, RNG, AES,
> SHA3), ampere1, 0xC0, 0xac3, -1)
> > +AARCH64_CORE("ampere1a", ampere1a, cortexa57, V8_6A, (F16, RNG,
> AES, SHA3, MEMTAG), ampere1a, 0xC0, 0xac4, -1)
> >  /* Do not swap around "emag" and "xgene1",
> > this order is required to handle variant correctly. */
> >  AARCH64_CORE("emag",emag,  xgene1,V8A,  (CRC, CRYPTO),
> emag, 0x50, 0x000, 3)
> > diff --git a/gcc/config/aarch64/aarch64-cost-tables.h
> b/gcc/config/aarch64/aarch64-cost-tables.h
> > index 760d7b30368..48522606fbe 100644
> > --- a/gcc/config/aarch64/aarch64-cost-tables.h
> > +++ b/gcc/config/aarch64/aarch64-cost-tables.h
> > @@ -775,4 +775,111 @@ const struct cpu_cost_table
> ampere1_extra_costs =
> >}
> >  };
> >
> > +const struct cpu_cost_table ampere1a_extra_costs =
> > +{
> > +  /* ALU */
> > +  {
> > +0, /* arith.  */
> > +0, /* logical.  */
> > +0, /* shift.  */
> > +COSTS_N_INSNS (1), /* shift_reg.  */
> > +0, /* arith_shift.  */
> > +COSTS_N_INSNS (1), /* arith_shift_reg.  */
> > +0, /* log_shift.  */
> > +COSTS_N_INSNS (1), /* log_shift_reg.  */
> > +0, /* extend.  */
> > +COSTS_N_INSNS (1), /* extend_arith.  */
> > +0, /* bfi.  */
> > +0, /* bfx.  */
> > +0, /* clz.  */
> > +0, /* rev.  */
> > +0, /* non_exec.  */
> > +true   /* non_exec_costs_exec.  */
> > +  },
> > +  {
> > +/* MULT SImode */
> > +{
> > +  COSTS_N_INSNS (3),   /* simple.  */
> > +  COSTS_N_INSNS (3),   /* flag_setting.  */
> > +  COSTS_N_INSNS (3),   /* extend.  */
> > +  COSTS_N_INSNS (4),   /* add.  */
> > +  COSTS_N_INSNS (4),   /* extend_add.  */
> > +  COSTS_N_INSNS (19)   /* idiv.  */
> > +},
> > +/* MULT DImode */
> > +{
> > +  COSTS_N_INSNS (3),   /* simple.  */
> > +  0,   /* flag_setting (N/A).  */
> > +  COSTS_N_INSNS (3),   /* extend.  */
> > +  COSTS_N_INSNS (4),   /* add.  */
> > +  COSTS_N_INSNS (4),   /* extend_add.  */
> > +  COSTS_N_INSNS (35)   /* idiv.  */
> > +}
> > +  },
> > +  /* LD/ST */
> > +  {
> > +COSTS_N_INSNS (4), /* load.  */
> > +COSTS_N_INSNS (4), /* load_sign_extend.  */
> > +0, /* ldrd (n/a).  */
> > +0, /* ldm_1st.  */
> > +0, /* ldm_regs_per_insn_1st.  */
> > +0, /* ldm_regs_per_insn_subsequent.  */
> > +C

[PATCH] c++: Alignment changes to layout compatibility/common initial sequence - DR2583

2022-11-14 Thread Jakub Jelinek via Gcc-patches
Hi!

Working virtually out of Baker Island.

When trying to figure out what to do about alignment,
layout_compatible_type_p returns false if TYPE_ALIGN on
ENUMERAL_TYPE/CLASS_TYPE_P (but not scalar types?) differ, or if members
don't have the same positions.

The wording in DR2583 doesn't say anything like that though; on the other
hand, it says that if the corresponding entities don't have the same alignment
requirements, they aren't part of the common initial sequence.

So, my understanding of this is we shouldn't check TYPE_ALIGN in
layout_compatible_type_p, but instead DECL_ALIGN in
next_common_initial_seqence.

Lightly tested (on is-layout*/is-corresponding*/dr2583.C only) so far,
ok if it passes full bootstrap/regtest?
Or do we need different rules?
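For intuition on why checking DECL_ALIGN on the members matters: an alignas on a member shifts its position and the surrounding layout, as in this reduced version of the structs from the new test (offsets below assume a typical target with 4-byte int):

```cpp
#include <cstddef>

struct A { int i; char c; };              // c directly after i
struct B { int i; alignas(8) char c; };   // c pushed to an 8-aligned offset

// Under DR2583, A::c and B::c are not part of a common initial sequence
// because their alignment requirements differ.
static_assert(offsetof(B, c) % 8 == 0, "alignas(8) moved the member");
static_assert(alignof(B) >= 8, "member alignment raises struct alignment");
```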

2022-11-14  Jakub Jelinek  

* typeck.cc (next_common_initial_seqence): Return false if members
have different DECL_ALIGN.
(layout_compatible_type_p): Don't test TYPE_ALIGN of ENUMERAL_TYPE
or CLASS_TYPE_P.

* g++.dg/cpp2a/is-layout-compatible3.C: Expect enums with different
alignas to be layout compatible, while classes with different
alignas on members layout incompatible.
* g++.dg/DRs/dr2583.C: New test.

--- gcc/cp/typeck.cc.jj 2022-11-13 04:53:46.010682269 -1200
+++ gcc/cp/typeck.cc2022-11-13 23:14:41.355180354 -1200
@@ -1833,6 +1833,8 @@ next_common_initial_seqence (tree &memb1
   if ((!lookup_attribute ("no_unique_address", DECL_ATTRIBUTES (memb1)))
   != !lookup_attribute ("no_unique_address", DECL_ATTRIBUTES (memb2)))
 return false;
+  if (DECL_ALIGN (memb1) != DECL_ALIGN (memb2))
+return false;
   if (!tree_int_cst_equal (bit_position (memb1), bit_position (memb2)))
 return false;
   return true;
@@ -1854,15 +1856,13 @@ layout_compatible_type_p (tree type1, tr
   type2 = cp_build_qualified_type (type2, TYPE_UNQUALIFIED);
 
   if (TREE_CODE (type1) == ENUMERAL_TYPE)
-return (TYPE_ALIGN (type1) == TYPE_ALIGN (type2)
-   && tree_int_cst_equal (TYPE_SIZE (type1), TYPE_SIZE (type2))
+return (tree_int_cst_equal (TYPE_SIZE (type1), TYPE_SIZE (type2))
&& same_type_p (finish_underlying_type (type1),
finish_underlying_type (type2)));
 
   if (CLASS_TYPE_P (type1)
   && std_layout_type_p (type1)
   && std_layout_type_p (type2)
-  && TYPE_ALIGN (type1) == TYPE_ALIGN (type2)
   && tree_int_cst_equal (TYPE_SIZE (type1), TYPE_SIZE (type2)))
 {
   tree field1 = TYPE_FIELDS (type1);
--- gcc/testsuite/g++.dg/cpp2a/is-layout-compatible3.C.jj   2021-08-18 
21:42:27.414421719 -1200
+++ gcc/testsuite/g++.dg/cpp2a/is-layout-compatible3.C  2022-11-13 
23:20:05.008776825 -1200
@@ -55,10 +55,10 @@ static_assert (!std::is_layout_compatibl
 static_assert (!std::is_layout_compatible_v);
 static_assert (!std::is_layout_compatible_v);
 static_assert (!std::is_layout_compatible_v);
-static_assert (!std::is_layout_compatible_v);
+static_assert (std::is_layout_compatible_v);
 static_assert (!std::is_layout_compatible_v);
 static_assert (!std::is_layout_compatible_v);
 static_assert (!std::is_layout_compatible_v);
-static_assert (std::is_layout_compatible_v);
+static_assert (!std::is_layout_compatible_v);
 static_assert (std::is_layout_compatible_v);
 static_assert (std::is_layout_compatible_v);
--- gcc/testsuite/g++.dg/DRs/dr2583.C.jj2022-11-13 22:58:11.977640606 
-1200
+++ gcc/testsuite/g++.dg/DRs/dr2583.C   2022-11-13 23:18:04.630414835 -1200
@@ -0,0 +1,31 @@
+// DR 2583 - Common initial sequence should consider over-alignment.
+// { dg-do compile { target c++11 } }
+
+#include 
+
+struct A {
+  int i;
+  char c;
+};
+
+struct B {
+  int i;
+  alignas(8) char c;
+};
+
+struct S0 {
+  alignas(16) char x[128];
+  int i;
+};
+
+struct alignas(16) S1 {
+  char x[128];
+  int i;
+};
+
+#if __cpp_lib_is_layout_compatible >= 201907L
+static_assert (std::is_corresponding_member (&A::i, &B::i), "");
+static_assert (alignof (char) == 8 || !std::is_corresponding_member (&A::c, 
&B::c), "");
+static_assert (alignof (char) == 16 || !std::is_corresponding_member (&S0::x, 
&S1::x), "");
+static_assert (alignof (char) == 16 || !std::is_corresponding_member (&S0::i, 
&S1::i), "");
+#endif

Jakub



[PATCH] c++: Add testcase for DR 2604

2022-11-14 Thread Jakub Jelinek via Gcc-patches
Hi!

Working virtually out of Baker Island.

As the following testcase shows, I think we don't inherit the template's
attributes into specializations.

Tested on x86_64-linux, ok for trunk?

2022-11-13  Jakub Jelinek  

* g++.dg/DRs/dr2604.C: New test.

--- gcc/testsuite/g++.dg/DRs/dr2604.C.jj2022-11-13 23:39:45.725712300 
-1200
+++ gcc/testsuite/g++.dg/DRs/dr2604.C   2022-11-13 23:39:38.712807673 -1200
@@ -0,0 +1,53 @@
+// DR 2604 - Attributes for an explicit specialization.
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wunused-parameter" }
+
+template
+[[noreturn]] void
+foo ([[maybe_unused]] int i)
+{
+  for (;;);
+}
+
+template<>
+void
+foo (int i)   // { dg-warning "unused parameter 'i'" }
+{
+}
+
+template
+void
+bar (int i)// { dg-warning "unused parameter 'i'" }
+{
+}
+
+template<>
+[[noreturn]] void
+bar ([[maybe_unused]] int i)
+{
+  for (;;);
+}
+
+[[noreturn]] void
+baz ()
+{
+  foo (0);
+}
+
+[[noreturn]] void
+qux ()
+{
+  foo (0);
+}  // { dg-warning "'noreturn' function does return" }
+
+[[noreturn]] void
+garply ()
+{
+  bar (0);
+}  // { dg-warning "'noreturn' function does return" }
+
+[[noreturn]] void
+corge ()
+{
+  bar (0);
+}

Jakub



Re: [RFC PATCH] ipa-guarded-deref: Add new pass to dereference function pointers

2022-11-14 Thread Christoph Müllner
On Mon, Nov 14, 2022 at 11:10 AM Richard Biener 
wrote:

> On Mon, Nov 14, 2022 at 10:32 AM Christoph Müllner
>  wrote:
> >
> >
> >
> > On Mon, Nov 14, 2022 at 10:00 AM Richard Biener <
> richard.guent...@gmail.com> wrote:
> >>
> >> On Mon, Nov 14, 2022 at 9:13 AM Christoph Müllner
> >>  wrote:
> >> >
> >> >
> >> >
> >> > On Mon, Nov 14, 2022 at 8:31 AM Richard Biener <
> richard.guent...@gmail.com> wrote:
> >> >>
> >> >> On Sun, Nov 13, 2022 at 4:09 PM Christoph Muellner
> >> >>  wrote:
> >> >> >
> >> >> > From: Christoph Müllner 
> >> >> >
> >> >> > This patch adds a new pass that looks up function pointer
> assignments,
> >> >> > and adds guarded direct calls to the call sites of the function
> >> >> > pointers.
> >> >> >
> >> >> > E.g.: Lets assume an assignment to a function pointer as follows:
> >> >> > b->cb = &myfun;
> >> >> >   Other part of the program can use the function pointer as
> follows:
> >> >> > b->cb ();
> >> >> >   With this pass the invocation will be transformed to:
> >> >> > if (b->cb == myfun)
> >> >> >   myfun();
> >> >> > else
> >> >> >b->cb ()
> >> >> >
> >> >> > The impact of the dynamic guard is expected to be less than the
> speedup
> >> >> > gained by enabled optimizations (e.g. inlining or constant
> propagation).
> >> >>
> >> >> We have speculative devirtualization doing this very transform,
> shouldn't you
> >> >> instead improve that instead of inventing another specialized pass?
> >> >
> >> >
> >> > Yes, it can be integrated into ipa-devirt.
> >> >
> >> > The reason we initially decided to move it into its own file was that
> C++ devirtualization
> >> > and function pointer dereferencing/devirtualization will likely not
> use the same analysis.
> >> > E.g. ODR only applies to C++, C++ tables are not directly exposed to
> the user.
> >> > So we figured that different things should not be merged together,
> but a reuse
> >> > of common code to avoid duplication is mandatory.
> >>
> >> Btw, in other context the idea came up to build candidates based on
> available
> >> API/ABI (that can be indirectly called).  That would help for example
> the
> >> get_ref calls in refine_subpel in the x264 benchmark.  Maybe what you
> >> do is actually
> >> the very same thing (but look for explicit address-taking) - I didn't
> >> look into whether
> >> you prune the list of candidates based on API/ABI.
> >
> >
> > No, I don't consider API/ABI at all (do you have a pointer so I can get
> a better understanding of that idea?).
>
> No, it was just an idea discussed internally.
>
> > Adding guards for all possible functions with the same API/ABI seems
> expensive (I might misunderstand the idea).
> > My patch adds a maximum of 1 test per call site.
> >
> > What I do is looking which addresses are assigned to the function
> pointer.
> > If there is more than one assigned function, I drop the function pointer
> from the list of candidates.
>
> OK.  If the program is type correct that's probably going to work well
> enough.  If there are more than
> one candidates then you could prune those by simple API checks, like
> match up the number of arguments
> or void vs. non-void return type.  More advanced pruning might lose
> some valid candidates (API vs.
> ABI compatibility), but it's only heuristic pruning in any case.
>
> It would probably help depending on what exactly "assigned to the
> function pointer" means.  If the
> function pointer is not from directly visible static storage then
> matching up assignments and uses
> is going to be a difficult IPA problem itself.  So our original idea was
> for
>
>  (*fnptr) (args ...);
>
> look for all possible definitions in the (LTO) unit that match the
> call signature and that have their
> address taken and that possibly could be pointed to by fnptr and if
> that's a single one, speculatively
> devirtualize that.
>

Understood. That's an interesting idea.
Assuming that functions with identical signatures are rare,
both approaches should find similar candidates.

I wonder why the API/ABI compatibility checks are needed
if we only consider functions assigned to a function pointer.
I.e. if call-site and callee don't match, wouldn't the indirect call
suffer from the same incompatibility?

The patch currently looks at the following properties of the RHS of a
function pointer assignment:
* rhs = gimple_assign_rhs1 (stmt)
* rhs_t = TREE_TYPE (rhs)
* possible_decl = TREE_OPERAND (rhs, 0)
* node = cgraph_node::get (possible_decl)

And the following rules are currently enforced:
* TREE_CODE (rhs) == ADDR_EXPR
* TREE_CODE (rhs_t) == POINTER_TYPE
* TREE_CODE (TREE_TYPE (rhs_t)) == FUNCTION_TYPE
* TREE_CODE (possible_decl) == FUNCTION_DECL



> > I just checked in the dump file, and the patch also dereferences the
> indirect calls to get_ref in refine_subpel.
>
> IIRC the x264 case has a global variable with all the function
> pointers so your implementation
> will likely pick up the single assignment t

Re: Revert Sphinx documentation [Was: Issues with Sphinx]

2022-11-14 Thread Gerald Pfeifer
On Mon, 14 Nov 2022, Martin Liška wrote:
> The situation with the Sphinx migration went out of control. The TODO 
> list overwhelmed me and there are road-blocks that can't be easily fixed 
> with what Sphinx currently supports.

This migration was/is a huge and complex undertaking, and you have been 
patiently chipping away at obstacle after obstacle.

So while it probably is disappointing it did not go through this time,
you made a lot of progress and important contributions - and we all 
learned quite a bit more, also in terms of (not so obvious) requirements,
dependencies, and road blocks left which you summarized.


Timing was tricky for me being on the road last week and I am definitely
committed to keep helping with this transition. Maybe soon after we are in 
stage 1 again?

And would it make sense to convert at least our installation docs and
https://gcc.gnu.org/install/ for the GCC 13 release?

Gerald


Re: [PATCH] libstdc++: Fix python/ not making install directories

2022-11-14 Thread Arsen Arsenović via Gcc-patches
Hi,

Jonathan Wakely  writes:
>> This looks simple, and more consistent with what we already do. Does
>> it solve your issue?

It does work; though, if I was more daring I'd have said that it's fine
without checking, too, since it does the same operation on the same
directory ;)

Was the omission of the mkdir $(DESTDIR)$(toolexeclibdir) intentional?
I only see TELD/debug in your revision of the patch.  Chances are, it
gets created elsewhere (my test was just install-target-libstdc++-v3, so
not even the full install), but it might be worth being conservative
about it.

Thanks,
-- 
Arsen Arsenović




GCC 13.0.0 Status Report (2022-11-14), Stage 3 in effect now

2022-11-14 Thread Richard Biener via Gcc-patches
Status
==

The GCC development branch which will become GCC 13 is now in
bugfixing mode (Stage 3) until the end of Jan 15th.

As usual the first weeks of Stage 3 are used to review feature patches
posted late during Stage 1.  At some point unreviewed features
need to be postponed for the next Stage 1.


Quality Data


Priority  #   Change from last report
---   ---
P1  33
P2  473 
P3  113   +  29
P4  253   +   6
P5  25   
---   ---
Total P1-P3 619   +  29
Total   897   +  35


Previous Report
===

https://gcc.gnu.org/pipermail/gcc/2022-October/239690.html


[PATCH] remove duplicate match.pd patterns

2022-11-14 Thread Richard Biener via Gcc-patches
The following merges match.pd patterns that cause genmatch complaints
about duplicates when in-order isn't enforced (you have to edit
genmatch.cc to do a full duplicate check).

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* match.pd: Remove duplicates.
---
 gcc/match.pd | 63 +---
 1 file changed, 30 insertions(+), 33 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 194ba8f5188..4d0898ccdcb 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1285,8 +1285,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 
 /* ~x | x -> -1 */
 /* ~x ^ x -> -1 */
-/* ~x + x -> -1 */
-(for op (bit_ior bit_xor plus)
+(for op (bit_ior bit_xor)
  (simplify
   (op:c (convert? @0) (convert? (bit_not @0)))
   (convert { build_all_ones_cst (TREE_TYPE (@0)); })))
@@ -2939,9 +2938,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 
   /* ~A + A -> -1 */
   (simplify
-   (plus:c (bit_not @0) @0)
+   (plus:c (convert? (bit_not @0)) (convert? @0))
(if (!TYPE_OVERFLOW_TRAPS (type))
-{ build_all_ones_cst (type); }))
+(convert { build_all_ones_cst (TREE_TYPE (@0)); })))
 
   /* ~A + 1 -> -A */
   (simplify
@@ -5103,34 +5102,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(scmp @0 (bit_not @1)
 
 (for cmp (simple_comparison)
- /* Fold (double)float1 CMP (double)float2 into float1 CMP float2.  */
- (simplify
-  (cmp (convert@2 @0) (convert? @1))
-  (if (FLOAT_TYPE_P (TREE_TYPE (@0))
-   && (DECIMAL_FLOAT_TYPE_P (TREE_TYPE (@2))
-  == DECIMAL_FLOAT_TYPE_P (TREE_TYPE (@0)))
-   && (DECIMAL_FLOAT_TYPE_P (TREE_TYPE (@2))
-  == DECIMAL_FLOAT_TYPE_P (TREE_TYPE (@1
-   (with
-{
-  tree type1 = TREE_TYPE (@1);
-  if (TREE_CODE (@1) == REAL_CST && !DECIMAL_FLOAT_TYPE_P (type1))
-{
- REAL_VALUE_TYPE orig = TREE_REAL_CST (@1);
- if (TYPE_PRECISION (type1) > TYPE_PRECISION (float_type_node)
- && exact_real_truncate (TYPE_MODE (float_type_node), &orig))
-   type1 = float_type_node;
- if (TYPE_PRECISION (type1) > TYPE_PRECISION (double_type_node)
- && exact_real_truncate (TYPE_MODE (double_type_node), &orig))
-   type1 = double_type_node;
-}
-  tree newtype
-= (TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (type1)
-  ? TREE_TYPE (@0) : type1);
-}
-(if (TYPE_PRECISION (TREE_TYPE (@2)) > TYPE_PRECISION (newtype))
- (cmp (convert:newtype @0) (convert:newtype @1))
-
  (simplify
   (cmp @0 REAL_CST@1)
   /* IEEE doesn't distinguish +0 and -0 in comparisons.  */
@@ -5683,7 +5654,33 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (cmp == LT_EXPR || cmp == LE_EXPR)
   { constant_boolean_node (above ? true : false, type); }
   (if (cmp == GT_EXPR || cmp == GE_EXPR)
-   { constant_boolean_node (above ? false : true, type); }
+   { constant_boolean_node (above ? false : true, type); })
+   /* Fold (double)float1 CMP (double)float2 into float1 CMP float2.  */
+   (if (FLOAT_TYPE_P (TREE_TYPE (@00))
+   && (DECIMAL_FLOAT_TYPE_P (TREE_TYPE (@0))
+   == DECIMAL_FLOAT_TYPE_P (TREE_TYPE (@00)))
+   && (DECIMAL_FLOAT_TYPE_P (TREE_TYPE (@0))
+   == DECIMAL_FLOAT_TYPE_P (TREE_TYPE (@10
+(with
+ {
+   tree type1 = TREE_TYPE (@10);
+   if (TREE_CODE (@10) == REAL_CST && !DECIMAL_FLOAT_TYPE_P (type1))
+{
+  REAL_VALUE_TYPE orig = TREE_REAL_CST (@10);
+  if (TYPE_PRECISION (type1) > TYPE_PRECISION (float_type_node)
+  && exact_real_truncate (TYPE_MODE (float_type_node), &orig))
+type1 = float_type_node;
+  if (TYPE_PRECISION (type1) > TYPE_PRECISION (double_type_node)
+  && exact_real_truncate (TYPE_MODE (double_type_node), &orig))
+type1 = double_type_node;
+}
+  tree newtype
+= (TYPE_PRECISION (TREE_TYPE (@00)) > TYPE_PRECISION (type1)
+  ? TREE_TYPE (@00) : type1);
+ }
+ (if (TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (newtype))
+  (cmp (convert:newtype @00) (convert:newtype @10
+
 
 (for cmp (eq ne)
  (simplify
-- 
2.35.3
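As a quick sanity check on one of the merged patterns: the `~A + A -> -1` identity (guarded above by `!TYPE_OVERFLOW_TRAPS`) holds for any two's-complement int, with no intermediate overflow, since `~a == -a - 1`:

```cpp
// ~a + a == (-a - 1) + a == -1 for every int value a.
constexpr bool not_plus_self_is_minus_one(int a)
{
  return (~a + a) == -1;
}
```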


Re: [RFC PATCH] ipa-guarded-deref: Add new pass to dereference function pointers

2022-11-14 Thread Richard Biener via Gcc-patches
On Mon, Nov 14, 2022 at 12:46 PM Christoph Müllner
 wrote:
>
>
>
> On Mon, Nov 14, 2022 at 11:10 AM Richard Biener  
> wrote:
>>
>> On Mon, Nov 14, 2022 at 10:32 AM Christoph Müllner
>>  wrote:
>> >
>> >
>> >
>> > On Mon, Nov 14, 2022 at 10:00 AM Richard Biener 
>> >  wrote:
>> >>
>> >> On Mon, Nov 14, 2022 at 9:13 AM Christoph Müllner
>> >>  wrote:
>> >> >
>> >> >
>> >> >
>> >> > On Mon, Nov 14, 2022 at 8:31 AM Richard Biener 
>> >> >  wrote:
>> >> >>
>> >> >> On Sun, Nov 13, 2022 at 4:09 PM Christoph Muellner
>> >> >>  wrote:
>> >> >> >
>> >> >> > From: Christoph Müllner 
>> >> >> >
>> >> >> > This patch adds a new pass that looks up function pointer 
>> >> >> > assignments,
>> >> >> > and adds guarded direct calls to the call sites of the function
>> >> >> > pointers.
>> >> >> >
>> >> >> > E.g.: Lets assume an assignment to a function pointer as follows:
>> >> >> > b->cb = &myfun;
>> >> >> >   Other part of the program can use the function pointer as 
>> >> >> > follows:
>> >> >> > b->cb ();
>> >> >> >   With this pass the invocation will be transformed to:
>> >> >> > if (b->cb == myfun)
>> >> >> >   myfun();
>> >> >> > else
>> >> >> >b->cb ()
>> >> >> >
>> >> >> > The impact of the dynamic guard is expected to be less than the 
>> >> >> > speedup
>> >> >> > gained by enabled optimizations (e.g. inlining or constant 
>> >> >> > propagation).
>> >> >>
>> >> >> We have speculative devirtualization doing this very transform, 
>> >> >> shouldn't you
>> >> >> instead improve that instead of inventing another specialized pass?
>> >> >
>> >> >
>> >> > Yes, it can be integrated into ipa-devirt.
>> >> >
>> >> > The reason we initially decided to move it into its own file was that 
>> >> > C++ devirtualization
>> >> > and function pointer dereferencing/devirtualization will likely not use 
>> >> > the same analysis.
>> >> > E.g. ODR only applies to C++, C++ tables are not directly exposed to 
>> >> > the user.
>> >> > So we figured that different things should not be merged together, but 
>> >> > a reuse
>> >> > of common code to avoid duplication is mandatory.
>> >>
>> >> Btw, in other context the idea came up to build candidates based on 
>> >> available
>> >> API/ABI (that can be indirectly called).  That would help for example the
>> >> get_ref calls in refine_subpel in the x264 benchmark.  Maybe what you
>> >> do is actually
>> >> the very same thing (but look for explicit address-taking) - I didn't
>> >> look into whether
>> >> you prune the list of candidates based on API/ABI.
>> >
>> >
>> > No, I don't consider API/ABI at all (do you have a pointer so I can get a 
>> > better understanding of that idea?).
>>
>> No, it was just an idea discussed internally.
>>
>> > Adding guards for all possible functions with the same API/ABI seems 
>> > expensive (I might misunderstand the idea).
>> > My patch adds a maximum of 1 test per call site.
>> >
>> > What I do is looking which addresses are assigned to the function pointer.
>> > If there is more than one assigned function, I drop the function pointer 
>> > from the list of candidates.
>>
>> OK.  If the program is type correct that's probably going to work well
>> enough.  If there are more than
>> one candidates then you could prune those by simple API checks, like
>> match up the number of arguments
>> or void vs. non-void return type.  More advanced pruning might lose
>> some valid candidates (API vs.
>> ABI compatibility), but it's only heuristic pruning in any case.
>>
>> It would probably help depending on what exactly "assigned to the
>> function pointer" means.  If the
>> function pointer is not from directly visible static storage then
>> matching up assignments and uses
>> is going to be a difficult IPA problem itself.  So our original idea was for
>>
>>  (*fnptr) (args ...);
>>
>> look for all possible definitions in the (LTO) unit that match the
>> call signature and that have their
>> address taken and that possibly could be pointed to by fnptr and if
>> that's a single one, speculatively
>> devirtualize that.
>
>
> Understood. That's an interesting idea.
> Assuming that functions with identical signatures are rare,
> both approaches should find similar candidates.
>
> I wonder why the API/ABI compatibility checks are needed
> if we only consider functions assigned to a function pointer.
> I.e. if call-site and callee don't match, wouldn't the indirect call
> suffer from the same incompatibility?

At least in C land mismatches are not unheard of (working across TUs).

>
> The patch currently looks at the following properties of the RHS of a 
> function pointer assignment:
> * rhs = gimple_assign_rhs1 (stmt)
> * rhs_t = TREE_TYPE (rhs)
> * possible_decl = TREE_OPERAND (rhs, 0)
> * node = cgraph_node::get (possible_decl)
>
> And the following rules are currently enforced:
> * TREE_CODE (rhs) == ADDR_EXPR
> * TREE_CODE (rhs_t) == POINTER_TYPE
> * TREE_CODE (TREE_TYPE (rhs_t)) == FUNCTION_TYPE
>

[COMMITTED] ada: Remove gnatcheck reference

2022-11-14 Thread Marc Poulhiès via Gcc-patches
From: Arnaud Charlet 

Since gnatcheck is no longer bundled with gnat

gcc/ada/

* doc/gnat_ugn/gnat_utility_programs.rst: Remove gnatcheck
reference.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 .../doc/gnat_ugn/gnat_utility_programs.rst| 22 ---
 1 file changed, 22 deletions(-)

diff --git a/gcc/ada/doc/gnat_ugn/gnat_utility_programs.rst 
b/gcc/ada/doc/gnat_ugn/gnat_utility_programs.rst
index 92877a2d172..17d3e0d0cca 100644
--- a/gcc/ada/doc/gnat_ugn/gnat_utility_programs.rst
+++ b/gcc/ada/doc/gnat_ugn/gnat_utility_programs.rst
@@ -14,7 +14,6 @@ This chapter describes a number of utility programs:
 
   * :ref:`The_File_Cleanup_Utility_gnatclean`
   * :ref:`The_GNAT_Library_Browser_gnatls`
-  * :ref:`The_Coding_Standard_Verifier_gnatcheck`
   * :ref:`The_GNAT_Pretty_Printer_gnatpp`
   * :ref:`The_Body_Stub_Generator_gnatstub`
   * :ref:`The_Backtrace_Symbolizer_gnatsymbolize`
@@ -465,27 +464,6 @@ building specialized scripts.
   /home/comar/local/adainclude/unchconv.ads
 
 
-.. only:: PRO or GPL
-
-  .. _The_Coding_Standard_Verifier_gnatcheck:
-
-  The Coding Standard Verifier ``gnatcheck``
-  ==
-
-  .. index:: ! gnatcheck
-  .. index:: ASIS
-
-  The ``gnatcheck`` tool is an ASIS-based utility that checks coding standard
-  compliance of Ada source files according to a given set of semantic rules.
-
-  ``gnatcheck`` is a project-aware tool
-  (see :ref:`Using_Project_Files_with_GNAT_Tools` for a description of
-  the project-related switches). The project file package that can specify
-  ``gnatcheck`` switches is named ``Check``.
-
-  For full details, plese refer to :title:`GNATcheck Reference Manual`.
-
-
 .. only:: PRO or GPL
 
.. _The_GNAT_Pretty_Printer_gnatpp:
-- 
2.34.1



[COMMITTED] ada: Improve location of error messages in instantiations

2022-11-14 Thread Marc Poulhiès via Gcc-patches
From: Yannick Moy 

When flag -gnatdF is used, source code lines are displayed to point
at the location of errors. The code of the instantiation was displayed
in case of errors inside generic instances, which was not precise.
Now the code inside the generic is displayed.

gcc/ada/

* errout.adb (Error_Msg_Internal): Store span for Optr field, and
adapt to new type of Optr.
(Finalize, Output_JSON_Message, Remove_Warning_Messages): Adapt to
new type of Optr.
(Output_Messages): Use Optr instead of Sptr to display code
snippet closer to error.
* erroutc.adb (dmsg): Adapt to new type of Optr.
* erroutc.ads (Error_Msg_Object): Make Optr a span like Sptr.
* errutil.adb (Error_Msg): Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/errout.adb  | 19 ++-
 gcc/ada/erroutc.adb |  2 +-
 gcc/ada/erroutc.ads |  2 +-
 gcc/ada/errutil.adb |  2 +-
 4 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/gcc/ada/errout.adb b/gcc/ada/errout.adb
index 19ea1553260..dcd21778db3 100644
--- a/gcc/ada/errout.adb
+++ b/gcc/ada/errout.adb
@@ -1215,7 +1215,7 @@ package body Errout is
   Next=> No_Error_Msg,
   Prev=> No_Error_Msg,
   Sptr=> Span,
-  Optr=> Optr,
+  Optr=> Opan,
   Insertion_Sloc  => (if Has_Insertion_Line then Error_Msg_Sloc
   else No_Location),
   Sfile   => Get_Source_File_Index (Sptr),
@@ -1284,7 +1284,7 @@ package body Errout is
or else
   (Sptr = Errors.Table (Last_Error_Msg).Sptr.Ptr
  and then
-   Optr > Errors.Table (Last_Error_Msg).Optr))
+   Optr > Errors.Table (Last_Error_Msg).Optr.Ptr))
  then
 Prev_Msg := Last_Error_Msg;
 Next_Msg := No_Error_Msg;
@@ -1302,7 +1302,8 @@ package body Errout is
then
   exit when Sptr < Errors.Table (Next_Msg).Sptr.Ptr
 or else (Sptr = Errors.Table (Next_Msg).Sptr.Ptr
-  and then Optr < Errors.Table (Next_Msg).Optr);
+  and then
+ Optr < Errors.Table (Next_Msg).Optr.Ptr);
end if;
 
Prev_Msg := Next_Msg;
@@ -1681,8 +1682,8 @@ package body Errout is
(Warning_Specifically_Suppressed (CE.Sptr.Ptr, CE.Text, Tag)
 /= No_String
   or else
-Warning_Specifically_Suppressed (CE.Optr, CE.Text, Tag) /=
-   No_String)
+Warning_Specifically_Suppressed (CE.Optr.Ptr, CE.Text, Tag)
+/= No_String)
 then
Delete_Warning (Cur);
 
@@ -2232,9 +2233,9 @@ package body Errout is
   Write_Str (",""locations"":[");
   Write_JSON_Span (Errors.Table (E));
 
-  if Errors.Table (E).Optr /= Errors.Table (E).Sptr.Ptr then
+  if Errors.Table (E).Optr.Ptr /= Errors.Table (E).Sptr.Ptr then
  Write_Str (",{""caret"":");
- Write_JSON_Location (Errors.Table (E).Optr);
+ Write_JSON_Location (Errors.Table (E).Optr.Ptr);
  Write_Str ("}");
   end if;
 
@@ -2954,7 +2955,7 @@ package body Errout is
else SGR_Error);
  begin
 Write_Source_Code_Lines
-  (Errors.Table (E).Sptr, SGR_Span);
+  (Errors.Table (E).Optr, SGR_Span);
  end;
   end if;
end if;
@@ -3329,7 +3330,7 @@ package body Errout is
 
--  Don't remove if location does not match
 
-   and then Errors.Table (E).Optr = Loc
+   and then Errors.Table (E).Optr.Ptr = Loc
 
--  Don't remove if not warning/info message. Note that we do
--  not remove style messages here. They are warning messages
diff --git a/gcc/ada/erroutc.adb b/gcc/ada/erroutc.adb
index 9ecc97fb46d..7766c972730 100644
--- a/gcc/ada/erroutc.adb
+++ b/gcc/ada/erroutc.adb
@@ -324,7 +324,7 @@ package body Erroutc is
 
   Write_Str
 ("  Optr = ");
-  Write_Location (E.Optr);
+  Write_Location (E.Optr.Ptr);
   Write_Eol;
 
   w ("  Line = ", Int (E.Line));
diff --git a/gcc/ada/erroutc.ads b/gcc/ada/erroutc.ads
index 7957228a91b..c992bbaa183 100644
--- a/gcc/ada/erroutc.ads
+++ b/gcc/ada/erroutc.ads
@@ -209,7 +209,7 @@ package Erroutc is
   --  will be posted. Note that an error placed on an instantiation will
   --  have Sptr pointing to the instantiati

[COMMITTED] ada: Crash on applying 'Pos to expression of a type derived from a formal type

2022-11-14 Thread Marc Poulhiès via Gcc-patches
From: Gary Dismukes 

The compiler crashes when trying to do a static check for a range violation
in a type conversion of a Pos attribute applied to a prefix of a type derived
from a generic formal discrete type. This optimization was suppressed in the
case of formal types, because the upper bound may not be known, but it also
needs to be suppressed for types derived from formal types.

gcc/ada/

* checks.adb
(Apply_Type_Conversion_Checks): Apply Root_Type to the type of the
prefix of a Pos attribute when checking whether the type is a
formal discrete type.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/checks.adb | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/checks.adb b/gcc/ada/checks.adb
index 96876672871..2a45f4d49b0 100644
--- a/gcc/ada/checks.adb
+++ b/gcc/ada/checks.adb
@@ -3789,13 +3789,14 @@ package body Checks is
--  Universal_Integer. So in numeric conversions it is usually
--  within range of the target integer type. Use the static
--  bounds of the base types to check. Disable this optimization
-   --  in case of a generic formal discrete type, because we don't
-   --  necessarily know the upper bound yet.
+   --  in case of a descendant of a generic formal discrete type,
+   --  because we don't necessarily know the upper bound yet.
 
if Nkind (Expr) = N_Attribute_Reference
  and then Attribute_Name (Expr) = Name_Pos
  and then Is_Enumeration_Type (Etype (Prefix (Expr)))
- and then not Is_Generic_Type (Etype (Prefix (Expr)))
+ and then
+   not Is_Generic_Type (Root_Type (Etype (Prefix (Expr
  and then Is_Integer_Type (Target_Type)
then
   declare
-- 
2.34.1



[COMMITTED] ada: Enable Support_Atomic_Primitives on QNX and RTEMS

2022-11-14 Thread Marc Poulhiès via Gcc-patches
From: Patrick Bernardi 

QNX and RTEMS support 64-bit atomic primitives.

gcc/ada/

* libgnat/system-qnx-arm.ads: Set Support_Atomic_Primitives to
True.
* libgnat/system-rtems.ads: Add Support_Atomic_Primitives.

Tested on x86_64-pc-linux-gnu, committed on master.
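As a concrete illustration (not part of the patch), the lock-free 64-bit operations that Support_Atomic_Primitives advertises correspond to GCC's `__atomic` built-ins; the Ada run-time has its own wrappers, but the underlying hardware capability is the same kind of thing as:

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit lock-free atomic increment via the GCC __atomic built-ins --
   the sort of primitive a target must support for
   Support_Atomic_Primitives to be True.  Illustrative sketch only;
   the Ada run-time does not use this exact code.  */
uint64_t counter;

uint64_t atomic_bump(void)
{
    return __atomic_add_fetch(&counter, 1, __ATOMIC_SEQ_CST);
}
```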

---
 gcc/ada/libgnat/system-qnx-arm.ads | 2 +-
 gcc/ada/libgnat/system-rtems.ads   | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/libgnat/system-qnx-arm.ads 
b/gcc/ada/libgnat/system-qnx-arm.ads
index 038fe6c9230..749384f9fd1 100644
--- a/gcc/ada/libgnat/system-qnx-arm.ads
+++ b/gcc/ada/libgnat/system-qnx-arm.ads
@@ -142,7 +142,7 @@ private
Stack_Check_Probes: constant Boolean := True;
Stack_Check_Limits: constant Boolean := False;
Support_Aggregates: constant Boolean := True;
-   Support_Atomic_Primitives : constant Boolean := False;
+   Support_Atomic_Primitives : constant Boolean := True;
Support_Composite_Assign  : constant Boolean := True;
Support_Composite_Compare : constant Boolean := True;
Support_Long_Shifts   : constant Boolean := True;
diff --git a/gcc/ada/libgnat/system-rtems.ads b/gcc/ada/libgnat/system-rtems.ads
index 5959b72405b..52ee299c260 100644
--- a/gcc/ada/libgnat/system-rtems.ads
+++ b/gcc/ada/libgnat/system-rtems.ads
@@ -150,6 +150,7 @@ private
Stack_Check_Probes: constant Boolean := False;
Stack_Check_Limits: constant Boolean := False;
Support_Aggregates: constant Boolean := True;
+   Support_Atomic_Primitives : constant Boolean := True;
Support_Composite_Assign  : constant Boolean := True;
Support_Composite_Compare : constant Boolean := True;
Support_Long_Shifts   : constant Boolean := True;
-- 
2.34.1



[COMMITTED] ada: Expand generic formal subprograms with contracts for GNATprove

2022-11-14 Thread Marc Poulhiès via Gcc-patches
From: Piotr Trojanek 

In GNATprove mode generic formal subprograms with Pre/Post contracts are
now expanded into wrappers, just like in ordinary compilation.

gcc/ada/

* sem_ch12.adb (Analyze_Associations): Expand wrappers for
GNATprove.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch12.adb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ada/sem_ch12.adb b/gcc/ada/sem_ch12.adb
index 276656085be..7af365e49c9 100644
--- a/gcc/ada/sem_ch12.adb
+++ b/gcc/ada/sem_ch12.adb
@@ -1937,7 +1937,7 @@ package body Sem_Ch12 is
  --  take place e.g. within an enclosing generic unit.
 
  if Has_Contracts (Analyzed_Formal)
-   and then Expander_Active
+   and then (Expander_Active or GNATprove_Mode)
  then
 Build_Subprogram_Wrappers;
  end if;
-- 
2.34.1



[COMMITTED] ada: Silence CodePeer false positive

2022-11-14 Thread Marc Poulhiès via Gcc-patches
From: Boris Yakobowski 

gcc/ada/

* sem_case.adb: Silence false positive warning emitted by CodePeer
on predefined equality for type Choice_Range_Info.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_case.adb | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/ada/sem_case.adb b/gcc/ada/sem_case.adb
index 244e53f5752..f89c3ca386f 100644
--- a/gcc/ada/sem_case.adb
+++ b/gcc/ada/sem_case.adb
@@ -209,6 +209,8 @@ package body Sem_Case is
 null;
   end case;
end record;
+ pragma Annotate (CodePeer, False_Positive, "raise exception",
+  "function is abstract, hence never called");
  function "=" (X, Y : Choice_Range_Info) return Boolean is abstract;
 
  type Choices_Range_Info is array (Choice_Id) of Choice_Range_Info;
-- 
2.34.1



[COMMITTED] ada: hardcfr docs: add optional checkpoints

2022-11-14 Thread Marc Poulhiès via Gcc-patches
From: Alexandre Oliva 

Previously, control flow redundancy only checked the visited bitmap
against the control flow graph at return points and before mandatory
tail calls, missing various other possibilities of exiting a
subprogram, such as by raising or propagating exceptions, and calling
noreturn functions.  The checks inserted before returns also prevented
potential tail-call optimizations.

This incremental change introduces options to control checking at each
of these previously-missed checkpoints.  Unless disabled, a cleanup is
introduced to check when an exception escapes a subprogram.  To avoid
disrupting sibcall optimizations, when they are enabled, checks are
introduced before calls whose results are immediately returned,
whether or not they are ultimately optimized.  If enabled, checks are
introduced before noreturn calls and exception raises, or only before
nothrow noreturn calls.

Add examples of code transformations to the GNAT RM.

gcc/ada/

* doc/gnat_rm/security_hardening_features.rst: Document optional
hardcfr checkpoints.
* gnat_rm.texi: Regenerate.
* gnat_ugn.texi: Regenerate.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 .../gnat_rm/security_hardening_features.rst   | 126 +-
 gcc/ada/gnat_rm.texi  | 123 -
 gcc/ada/gnat_ugn.texi |   5 +-
 3 files changed, 240 insertions(+), 14 deletions(-)

diff --git a/gcc/ada/doc/gnat_rm/security_hardening_features.rst 
b/gcc/ada/doc/gnat_rm/security_hardening_features.rst
index d7c02b94f36..ad165cd6849 100644
--- a/gcc/ada/doc/gnat_rm/security_hardening_features.rst
+++ b/gcc/ada/doc/gnat_rm/security_hardening_features.rst
@@ -383,11 +383,127 @@ For each block that is marked as visited, the mechanism 
checks that at
 least one of its predecessors, and at least one of its successors, are
 also marked as visited.
 
-Verification is performed just before returning.  Subprogram
-executions that complete by raising or propagating an exception bypass
-verification-and-return points.  A subprogram that can only complete
-by raising or propagating an exception may have instrumentation
-disabled altogether.
+Verification is performed just before a subprogram returns.  The
+following fragment:
+
+.. code-block:: ada
+
+   if X then
+ Y := F (Z);
+ return;
+   end if;
+
+
+gets turned into:
+
+.. code-block:: ada
+
+   type Visited_Bitmap is array (1..N) of Boolean with Pack;
+   Visited : aliased Visited_Bitmap := (others => False);
+   --  Bitmap of visited blocks.  N is the basic block count.
+   [...]
+   --  Basic block #I
+   Visited(I) := True;
+   if X then
+ --  Basic block #J
+ Visited(J) := True;
+ Y := F (Z);
+ CFR.Check (N, Visited'Access, CFG'Access);
+ --  CFR is a hypothetical package whose Check procedure calls
+ --  libgcc's __hardcfr_check, that traps if the Visited bitmap
+ --  does not hold a valid path in CFG, the run-time
+ --  representation of the control flow graph in the enclosing
+ --  subprogram.
+ return;
+   end if;
+   --  Basic block #K
+   Visited(K) := True;
+
+
+Verification would also be performed before tail calls, if any
+front-ends marked them as mandatory or desirable, but none do.
+Regular calls are optimized into tail calls too late for this
+transformation to act on it.
+
+In order to avoid adding verification after potential tail calls,
+which would prevent tail-call optimization, we recognize returning
+calls, i.e., calls whose result, if any, is returned by the calling
+subprogram to its caller immediately after the call returns.
+Verification is performed before such calls, whether or not they are
+ultimately optimized to tail calls.  This behavior is enabled by
+default whenever sibcall optimization is enabled (see
+:switch:`-foptimize-sibling-calls`); it may be disabled with
+:switch:`-fno-hardcfr-check-returning-calls`, or enabled with
+:switch:`-fhardcfr-check-returning-calls`, regardless of the
+optimization, but the lack of other optimizations may prevent calls
+from being recognized as returning calls:
+
+.. code-block:: ada
+
+ --  CFR.Check here, with -fhardcfr-check-returning-calls.
+ P (X);
+ --  CFR.Check here, with -fno-hardcfr-check-returning-calls.
+ return;
+
+or:
+
+.. code-block:: ada
+
+ --  CFR.Check here, with -fhardcfr-check-returning-calls.
+ R := F (X);
+ --  CFR.Check here, with -fno-hardcfr-check-returning-calls.
+ return R;
+
+
+Any subprogram from which an exception may escape, i.e., that may
+raise or propagate an exception that isn't handled internally, is
+conceptually enclosed by a cleanup handler that performs verification,
+unless this is disabled with :switch:`-fno-hardcfr-check-exceptions`.
+With this feature enabled, a subprogram body containing:
+
+.. code-block:: ada
+
+ --  ...
+   Y := F (X);  -- May raise exceptions.
+ --  ...
+   raise E;  -- Not handle

[COMMITTED] ada: Fix error on SPARK_Mode on library-level separate body

2022-11-14 Thread Marc Poulhiès via Gcc-patches
From: Yannick Moy 

When explicitly applying SPARK_Mode on a separate library-level spec
and body for which a contract needs to be checked, compilation with
-gnata was failing on a spurious error related to SPARK_Mode
placement. Now fixed.

gcc/ada/

* sem_prag.adb (Analyze_Pragma): Add special case for the special
local subprogram created for contracts.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_prag.adb | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/sem_prag.adb b/gcc/ada/sem_prag.adb
index 615c6d2110c..77fcb1c505f 100644
--- a/gcc/ada/sem_prag.adb
+++ b/gcc/ada/sem_prag.adb
@@ -23424,10 +23424,14 @@ package body Sem_Prag is
Spec_Id : constant Entity_Id := Unique_Defining_Entity (Decl);
 
 begin
-   --  Ignore pragma when applied to the special body created for
-   --  inlining, recognized by its internal name _Parent.
+   --  Ignore pragma when applied to the special body created
+   --  for inlining, recognized by its internal name _Parent; or
+   --  when applied to the special body created for contracts,
+   --  recognized by its internal name _Wrapped_Statements.
 
-   if Chars (Body_Id) = Name_uParent then
+   if Chars (Body_Id) in Name_uParent
+   | Name_uWrapped_Statements
+   then
   return;
end if;
 
-- 
2.34.1



[COMMITTED] ada: Fix style in code for generic formal subprograms with contracts

2022-11-14 Thread Marc Poulhiès via Gcc-patches
From: Piotr Trojanek 

Code cleanup related to the expansion of generic formal subprograms with
contracts for GNATprove.

gcc/ada/

* inline.adb (Replace_Formal): Tune whitespace.
* sem_ch12.adb (Check_Overloaded_Formal_Subprogram): Refine type
of a formal parameter and local variable; this routine operates on
nodes and not entities.
* sem_ch12.ads: Tune whitespace.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/inline.adb   |  4 ++--
 gcc/ada/sem_ch12.adb | 18 +-
 gcc/ada/sem_ch12.ads | 18 +-
 3 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/gcc/ada/inline.adb b/gcc/ada/inline.adb
index a1ead98e67a..d33f5b4558e 100644
--- a/gcc/ada/inline.adb
+++ b/gcc/ada/inline.adb
@@ -4723,8 +4723,8 @@ package body Inline is
   
 
   function Replace_Formal (N : Node_Id) return Traverse_Result is
- A   : Entity_Id;
- E   : Entity_Id;
+ A : Entity_Id;
+ E : Entity_Id;
 
   begin
  if Is_Entity_Name (N) and then Present (Entity (N)) then
diff --git a/gcc/ada/sem_ch12.adb b/gcc/ada/sem_ch12.adb
index 7af365e49c9..03ce5d51a03 100644
--- a/gcc/ada/sem_ch12.adb
+++ b/gcc/ada/sem_ch12.adb
@@ -1151,7 +1151,7 @@ package body Sem_Ch12 is
   --  in which case the predefined operations will be used. This merits
   --  a warning because of the special semantics of fixed point ops.
 
-  procedure Check_Overloaded_Formal_Subprogram (Formal : Entity_Id);
+  procedure Check_Overloaded_Formal_Subprogram (Formal : Node_Id);
   --  Apply RM 12.3(9): if a formal subprogram is overloaded, the instance
   --  cannot have a named association for it. AI05-0025 extends this rule
   --  to formals of formal packages by AI05-0025, and it also applies to
@@ -1259,15 +1259,15 @@ package body Sem_Ch12 is
  --  actuals.
 
  Append_To (Assoc_List,
-Build_Subprogram_Body_Wrapper (Formal, Actual_Name));
+   Build_Subprogram_Body_Wrapper (Formal, Actual_Name));
   end Build_Subprogram_Wrappers;
 
   
   -- Check_Overloaded_Formal_Subprogram --
   
 
-  procedure Check_Overloaded_Formal_Subprogram (Formal : Entity_Id) is
- Temp_Formal : Entity_Id;
+  procedure Check_Overloaded_Formal_Subprogram (Formal : Node_Id) is
+ Temp_Formal : Node_Id;
 
   begin
  Temp_Formal := First (Formals);
@@ -1449,8 +1449,8 @@ package body Sem_Ch12 is
 (F   : Entity_Id;
  A_F : Entity_Id) return Node_Id
   is
- Prev  : Node_Id;
- Act   : Node_Id;
+ Prev : Node_Id;
+ Act  : Node_Id;
 
   begin
  Is_Named_Assoc := False;
@@ -6252,7 +6252,7 @@ package body Sem_Ch12 is
 
   while Present (Act) loop
  Append_To (Actuals,
-Make_Identifier  (Loc, Chars (Defining_Identifier (Act;
+Make_Identifier (Loc, Chars (Defining_Identifier (Act;
  Next (Act);
   end loop;
 
@@ -6273,8 +6273,8 @@ package body Sem_Ch12 is
 Specification => Spec_Node,
 Declarations  => New_List,
 Handled_Statement_Sequence =>
-   Make_Handled_Sequence_Of_Statements (Loc,
- Statements=> New_List (Stmt)));
+  Make_Handled_Sequence_Of_Statements (Loc,
+Statements => New_List (Stmt)));
 
   return Body_Node;
end Build_Subprogram_Body_Wrapper;
diff --git a/gcc/ada/sem_ch12.ads b/gcc/ada/sem_ch12.ads
index 58a94552991..69c9d6404e6 100644
--- a/gcc/ada/sem_ch12.ads
+++ b/gcc/ada/sem_ch12.ads
@@ -27,15 +27,15 @@ with Inline; use Inline;
 with Types;  use Types;
 
 package Sem_Ch12 is
-   procedure Analyze_Generic_Package_Declaration(N : Node_Id);
-   procedure Analyze_Generic_Subprogram_Declaration (N : Node_Id);
-   procedure Analyze_Package_Instantiation  (N : Node_Id);
-   procedure Analyze_Procedure_Instantiation(N : Node_Id);
-   procedure Analyze_Function_Instantiation (N : Node_Id);
-   procedure Analyze_Formal_Object_Declaration  (N : Node_Id);
-   procedure Analyze_Formal_Type_Declaration(N : Node_Id);
-   procedure Analyze_Formal_Subprogram_Declaration  (N : Node_Id);
-   procedure Analyze_Formal_Package_Declaration (N : Node_Id);
+   procedure Analyze_Generic_Package_Declaration(N : Node_Id);
+   procedure Analyze_Generic_Subprogram_Declaration (N : Node_Id);
+   procedure Analyze_Package_Instantiation  (N : Node_Id);
+   procedure Analyze_Procedure_Instantiation(N : Node_Id);
+   procedure Analyze_Function_Instantiation (N : Node_Id);
+   procedure Analyze_Formal_Object_Declaration  (N : Node_Id);
+   procedure Analyze_Formal_Type_Declaration(N : Node_Id);
+   procedure Analyze_Formal_Subprogram_Declaration  (N : Node_Id);
+   procedure A

[PATCH v2] aarch64: Add support for Ampere-1A (-mcpu=ampere1a) CPU

2022-11-14 Thread Philipp Tomsich
This patch adds support for Ampere-1A CPU:
 - recognize the name of the core and provide detection for -mcpu=native,
 - updated extra_costs,
 - adds a new fusion pair for (A+B+1 and A-B-1).

Ampere-1A and Ampere-1 have more timing difference than the extra
costs indicate, but these don't propagate through to the headline
items in our extra costs (e.g. the change in latency for scalar sqrt
doesn't have a corresponding table entry).

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def (AARCH64_CORE): Add ampere1a.
* config/aarch64/aarch64-cost-tables.h: Add ampere1a_extra_costs.
* config/aarch64/aarch64-fusion-pairs.def (AARCH64_FUSION_PAIR):
Define a new fusion pair for A+B+1/A-B-1 (i.e., add/subtract two
registers and then +1/-1).
* config/aarch64/aarch64-tune.md: Regenerate.
* config/aarch64/aarch64.cc (aarch_macro_fusion_pair_p): Implement
idiom-matcher for the new fusion pair.
* doc/invoke.texi: Add ampere1a.

Signed-off-by: Philipp Tomsich 
---

Changes in v2:
- break line in fusion matcher to stay below 80 characters
- rename fusion pair addsub_2reg_const1
- document 'ampere1a' in invoke.texi

 gcc/config/aarch64/aarch64-cores.def|   1 +
 gcc/config/aarch64/aarch64-cost-tables.h| 107 
 gcc/config/aarch64/aarch64-fusion-pairs.def |   1 +
 gcc/config/aarch64/aarch64-tune.md  |   2 +-
 gcc/config/aarch64/aarch64.cc   |  64 
 gcc/doc/invoke.texi |   2 +-
 6 files changed, 175 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index d2671778928..aead587cec1 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -70,6 +70,7 @@ AARCH64_CORE("thunderxt83",   thunderxt83,   thunderx,  V8A,  
(CRC, CRYPTO), thu
 
 /* Ampere Computing ('\xC0') cores. */
 AARCH64_CORE("ampere1", ampere1, cortexa57, V8_6A, (F16, RNG, AES, SHA3), 
ampere1, 0xC0, 0xac3, -1)
+AARCH64_CORE("ampere1a", ampere1a, cortexa57, V8_6A, (F16, RNG, AES, SHA3, 
MEMTAG), ampere1a, 0xC0, 0xac4, -1)
 /* Do not swap around "emag" and "xgene1",
this order is required to handle variant correctly. */
 AARCH64_CORE("emag",emag,  xgene1,V8A,  (CRC, CRYPTO), emag, 
0x50, 0x000, 3)
diff --git a/gcc/config/aarch64/aarch64-cost-tables.h 
b/gcc/config/aarch64/aarch64-cost-tables.h
index 760d7b30368..48522606fbe 100644
--- a/gcc/config/aarch64/aarch64-cost-tables.h
+++ b/gcc/config/aarch64/aarch64-cost-tables.h
@@ -775,4 +775,111 @@ const struct cpu_cost_table ampere1_extra_costs =
   }
 };
 
+const struct cpu_cost_table ampere1a_extra_costs =
+{
+  /* ALU */
+  {
+0, /* arith.  */
+0, /* logical.  */
+0, /* shift.  */
+COSTS_N_INSNS (1), /* shift_reg.  */
+0, /* arith_shift.  */
+COSTS_N_INSNS (1), /* arith_shift_reg.  */
+0, /* log_shift.  */
+COSTS_N_INSNS (1), /* log_shift_reg.  */
+0, /* extend.  */
+COSTS_N_INSNS (1), /* extend_arith.  */
+0, /* bfi.  */
+0, /* bfx.  */
+0, /* clz.  */
+0, /* rev.  */
+0, /* non_exec.  */
+true   /* non_exec_costs_exec.  */
+  },
+  {
+/* MULT SImode */
+{
+  COSTS_N_INSNS (3),   /* simple.  */
+  COSTS_N_INSNS (3),   /* flag_setting.  */
+  COSTS_N_INSNS (3),   /* extend.  */
+  COSTS_N_INSNS (4),   /* add.  */
+  COSTS_N_INSNS (4),   /* extend_add.  */
+  COSTS_N_INSNS (19)   /* idiv.  */
+},
+/* MULT DImode */
+{
+  COSTS_N_INSNS (3),   /* simple.  */
+  0,   /* flag_setting (N/A).  */
+  COSTS_N_INSNS (3),   /* extend.  */
+  COSTS_N_INSNS (4),   /* add.  */
+  COSTS_N_INSNS (4),   /* extend_add.  */
+  COSTS_N_INSNS (35)   /* idiv.  */
+}
+  },
+  /* LD/ST */
+  {
+COSTS_N_INSNS (4), /* load.  */
+COSTS_N_INSNS (4), /* load_sign_extend.  */
+0, /* ldrd (n/a).  */
+0, /* ldm_1st.  */
+0, /* ldm_regs_per_insn_1st.  */
+0, /* ldm_regs_per_insn_subsequent.  */
+COSTS_N_INSNS (5), /* loadf.  */
+COSTS_N_INSNS (5), /* loadd.  */
+COSTS_N_INSNS (5), /* load_unaligned.  */
+0, /* store.  */
+0, /* strd.  */
+0, /* stm_1st.  */
+0, /* stm_regs_per_insn_1st.  */
+0, /* stm_regs_per_insn_subsequent.  */
+COSTS_N_INSNS (2), /* storef.  */
+COSTS_N_INSNS (2), /* stored.  */
+COSTS_N_INSNS (2), /* store_unaligned.  */
+   

Re: [PATCH v2] aarch64: Add support for Ampere-1A (-mcpu=ampere1a) CPU

2022-11-14 Thread Philipp Tomsich
Applied to master as v2 with the requested changes (and the change to add
"ampere1a" in invoke.texi). Thanks!

Philipp.

On Mon, 14 Nov 2022 at 14:53, Philipp Tomsich  wrote:
>
> This patch adds support for Ampere-1A CPU:
>  - recognize the name of the core and provide detection for -mcpu=native,
>  - updated extra_costs,
>  - adds a new fusion pair for (A+B+1 and A-B-1).
>
> Ampere-1A and Ampere-1 have more timing difference than the extra
> costs indicate, but these don't propagate through to the headline
> items in our extra costs (e.g. the change in latency for scalar sqrt
> doesn't have a corresponding table entry).
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-cores.def (AARCH64_CORE): Add ampere1a.
> * config/aarch64/aarch64-cost-tables.h: Add ampere1a_extra_costs.
> * config/aarch64/aarch64-fusion-pairs.def (AARCH64_FUSION_PAIR):
> Define a new fusion pair for A+B+1/A-B-1 (i.e., add/subtract two
> registers and then +1/-1).
> * config/aarch64/aarch64-tune.md: Regenerate.
> * config/aarch64/aarch64.cc (aarch_macro_fusion_pair_p): Implement
> idiom-matcher for the new fusion pair.
> * doc/invoke.texi: Add ampere1a.
>
> Signed-off-by: Philipp Tomsich 
> ---
>
> Changes in v2:
> - break line in fusion matcher to stay below 80 characters
> - rename fusion pair addsub_2reg_const1
> - document 'ampere1a' in invoke.texi
>
>  gcc/config/aarch64/aarch64-cores.def|   1 +
>  gcc/config/aarch64/aarch64-cost-tables.h| 107 
>  gcc/config/aarch64/aarch64-fusion-pairs.def |   1 +
>  gcc/config/aarch64/aarch64-tune.md  |   2 +-
>  gcc/config/aarch64/aarch64.cc   |  64 
>  gcc/doc/invoke.texi |   2 +-
>  6 files changed, 175 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-cores.def 
> b/gcc/config/aarch64/aarch64-cores.def
> index d2671778928..aead587cec1 100644
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -70,6 +70,7 @@ AARCH64_CORE("thunderxt83",   thunderxt83,   thunderx,  
> V8A,  (CRC, CRYPTO), thu
>
>  /* Ampere Computing ('\xC0') cores. */
>  AARCH64_CORE("ampere1", ampere1, cortexa57, V8_6A, (F16, RNG, AES, SHA3), 
> ampere1, 0xC0, 0xac3, -1)
> +AARCH64_CORE("ampere1a", ampere1a, cortexa57, V8_6A, (F16, RNG, AES, SHA3, 
> MEMTAG), ampere1a, 0xC0, 0xac4, -1)
>  /* Do not swap around "emag" and "xgene1",
> this order is required to handle variant correctly. */
>  AARCH64_CORE("emag",emag,  xgene1,V8A,  (CRC, CRYPTO), emag, 
> 0x50, 0x000, 3)
> diff --git a/gcc/config/aarch64/aarch64-cost-tables.h 
> b/gcc/config/aarch64/aarch64-cost-tables.h
> index 760d7b30368..48522606fbe 100644
> --- a/gcc/config/aarch64/aarch64-cost-tables.h
> +++ b/gcc/config/aarch64/aarch64-cost-tables.h
> @@ -775,4 +775,111 @@ const struct cpu_cost_table ampere1_extra_costs =
>}
>  };
>
> +const struct cpu_cost_table ampere1a_extra_costs =
> +{
> +  /* ALU */
> +  {
> +0, /* arith.  */
> +0, /* logical.  */
> +0, /* shift.  */
> +COSTS_N_INSNS (1), /* shift_reg.  */
> +0, /* arith_shift.  */
> +COSTS_N_INSNS (1), /* arith_shift_reg.  */
> +0, /* log_shift.  */
> +COSTS_N_INSNS (1), /* log_shift_reg.  */
> +0, /* extend.  */
> +COSTS_N_INSNS (1), /* extend_arith.  */
> +0, /* bfi.  */
> +0, /* bfx.  */
> +0, /* clz.  */
> +0, /* rev.  */
> +0, /* non_exec.  */
> +true   /* non_exec_costs_exec.  */
> +  },
> +  {
> +/* MULT SImode */
> +{
> +  COSTS_N_INSNS (3),   /* simple.  */
> +  COSTS_N_INSNS (3),   /* flag_setting.  */
> +  COSTS_N_INSNS (3),   /* extend.  */
> +  COSTS_N_INSNS (4),   /* add.  */
> +  COSTS_N_INSNS (4),   /* extend_add.  */
> +  COSTS_N_INSNS (19)   /* idiv.  */
> +},
> +/* MULT DImode */
> +{
> +  COSTS_N_INSNS (3),   /* simple.  */
> +  0,   /* flag_setting (N/A).  */
> +  COSTS_N_INSNS (3),   /* extend.  */
> +  COSTS_N_INSNS (4),   /* add.  */
> +  COSTS_N_INSNS (4),   /* extend_add.  */
> +  COSTS_N_INSNS (35)   /* idiv.  */
> +}
> +  },
> +  /* LD/ST */
> +  {
> +COSTS_N_INSNS (4), /* load.  */
> +COSTS_N_INSNS (4), /* load_sign_extend.  */
> +0, /* ldrd (n/a).  */
> +0, /* ldm_1st.  */
> +0, /* ldm_regs_per_insn_1st.  */
> +0, /* ldm_regs_per_insn_subsequent.  */
> +COSTS_N_INSNS (5), /* loadf.  */
> +COSTS_N_INSNS (5), /* loadd.  */
> +COSTS_N_INSNS (5), /* load_unaligned.  */
> +0, 

[COMMITTED] ada: Fix non-capturing parentheses handling

2022-11-14 Thread Marc Poulhiès via Gcc-patches
From: Ronan Desplanques 

Before this patch, expressions in non-capturing parentheses with more
than one branch were processed incorrectly when they were part of a
branch followed by another branch. This patch fixes that by aligning
the handling of non-capturing parentheses with the handling of regular
parentheses.

gcc/ada/

* libgnat/s-regpat.adb
(Parse): Fix handling of non-capturing parentheses.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/s-regpat.adb | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/gcc/ada/libgnat/s-regpat.adb b/gcc/ada/libgnat/s-regpat.adb
index 3290f900544..3e9f880cd4e 100644
--- a/gcc/ada/libgnat/s-regpat.adb
+++ b/gcc/ada/libgnat/s-regpat.adb
@@ -920,18 +920,16 @@ package body System.Regpat is
 if Capturing then
Ender := Emit_Node (CLOSE);
Emit (Character'Val (Par_No));
-   Link_Tail (IP, Ender);
-
 else
-   --  Need to keep looking after the closing parenthesis
-   Ender := Emit_Ptr;
+   Ender := Emit_Node (NOTHING);
 end if;
 
  else
 Ender := Emit_Node (EOP);
-Link_Tail (IP, Ender);
  end if;
 
+ Link_Tail (IP, Ender);
+
  if Have_Branch and then Emit_Ptr <= PM.Size + 1 then
 
 --  Hook the tails of the branches to the closing node
-- 
2.34.1



[COMMITTED] ada: Adjust locations in aspects on generic formal subprograms

2022-11-14 Thread Marc Poulhiès via Gcc-patches
From: Piotr Trojanek 

When instantiating a generic that has formal subprogram parameter with
contracts, e.g.:

  generic
with procedure P with Pre => ..., Post => ...;
  ...

we create a wrapper that executes Pre/Post contracts before/after
calling the actual subprogram. Errors emitted for these contracts
will now have locations of the instance and not just of the generic.

gcc/ada/

* sem_ch12.adb (Build_Subprogram_Wrappers): Adjust slocs of the
copied aspects, just like we do in Build_Class_Wide_Expression for
inherited class-wide contracts.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch12.adb | 20 
 1 file changed, 20 insertions(+)

diff --git a/gcc/ada/sem_ch12.adb b/gcc/ada/sem_ch12.adb
index 03ce5d51a03..72c2eef7061 100644
--- a/gcc/ada/sem_ch12.adb
+++ b/gcc/ada/sem_ch12.adb
@@ -1203,12 +1203,31 @@ package body Sem_Ch12 is
   ---
 
   procedure Build_Subprogram_Wrappers is
+ function Adjust_Aspect_Sloc (N : Node_Id) return Traverse_Result;
+ --  Adjust sloc so that errors located at N will be reported with
+ --  information about the instance and not just about the generic.
+
+ ------------------------
+ -- Adjust_Aspect_Sloc --
+ ------------------------
+
+ function Adjust_Aspect_Sloc (N : Node_Id) return Traverse_Result is
+ begin
+Adjust_Instantiation_Sloc (N, S_Adjustment);
+return OK;
+ end Adjust_Aspect_Sloc;
+
+ procedure Adjust_Aspect_Slocs is new
+   Traverse_Proc (Adjust_Aspect_Sloc);
+
  Formal : constant Entity_Id :=
Defining_Unit_Name (Specification (Analyzed_Formal));
  Aspect_Spec : Node_Id;
  Decl_Node   : Node_Id;
  Actual_Name : Node_Id;
 
+  --  Start of processing for Build_Subprogram_Wrappers
+
   begin
  --  Create declaration for wrapper subprogram
  --  The actual can be overloaded, in which case it will be
@@ -1247,6 +1266,7 @@ package body Sem_Ch12 is
 
  Aspect_Spec := First (Aspect_Specifications (Decl_Node));
  while Present (Aspect_Spec) loop
+Adjust_Aspect_Slocs (Aspect_Spec);
 Set_Analyzed (Aspect_Spec, False);
 Next (Aspect_Spec);
  end loop;
-- 
2.34.1



Re: [wwwdocs] gcc-13: Mention Intel new ISA and march support.

2022-11-14 Thread Gerald Pfeifer
On Thu, 10 Nov 2022, Haochen Jiang via Gcc-patches wrote:
> +  New ISA extension support for Intel AVX-IFMA was added to GCC.

Here and in the other cases I'd skip "to GCC". This is clear from the
context (this being the GCC release notes :-) and makes it shorter.

Gerald


[COMMITTED] ada: Remove incorrect comments about initialization

2022-11-14 Thread Marc Poulhiès via Gcc-patches
From: Bob Duff 

Cleanup only; no change in behavior.

This patch removes and rewrites some comments regarding initialization.
These initializations are needed, so there's no need to apologize for
initializing these variables.

Note that -gnatVa is not relevant; reads of uninitialized variables
are wrong, whether or not we get caught.

gcc/ada/

* atree.ads: Remove some comments.
* err_vars.ads: Likewise.
* scans.ads: Likewise.
* sinput.ads: Likewise.
* checks.ads: Likewise. Also add a "???" comment indicating an
obsolete comment that is too difficult to correct at this time.
* sem_attr.adb: Minor comment rewrite.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/atree.ads|  6 ++
 gcc/ada/checks.ads   | 25 ++---
 gcc/ada/err_vars.ads |  7 ---
 gcc/ada/scans.ads| 27 ---
 gcc/ada/sem_attr.adb |  2 +-
 gcc/ada/sinput.ads   |  2 --
 6 files changed, 25 insertions(+), 44 deletions(-)

diff --git a/gcc/ada/atree.ads b/gcc/ada/atree.ads
index 0c809f56435..cc66ab3777c 100644
--- a/gcc/ada/atree.ads
+++ b/gcc/ada/atree.ads
@@ -148,7 +148,6 @@ package Atree is
--  This is a count of errors that are serious enough to stop expansion,
--  and hence to prevent generation of an object file even if the
--  switch -gnatQ is set. Initialized to zero at the start of compilation.
-   --  Initialized for -gnatVa use, see comment above.
 
--  WARNING: There is a matching C declaration of this variable in fe.h
 
@@ -156,12 +155,11 @@ package Atree is
--  Number of errors detected so far. Includes count of serious errors and
--  non-serious errors, so this value is always greater than or equal to the
--  Serious_Errors_Detected value. Initialized to zero at the start of
-   --  compilation. Initialized for -gnatVa use, see comment above.
+   --  compilation.
 
Warnings_Detected : Nat := 0;
--  Number of warnings detected. Initialized to zero at the start of
-   --  compilation. Initialized for -gnatVa use, see comment above. This
-   --  count includes the count of style and info messages.
+   --  compilation. This count includes the count of style and info messages.
 
Warning_Info_Messages : Nat := 0;
--  Number of info messages generated as warnings. Info messages are never
diff --git a/gcc/ada/checks.ads b/gcc/ada/checks.ads
index 48678cd01df..a7d05a3fa39 100644
--- a/gcc/ada/checks.ads
+++ b/gcc/ada/checks.ads
@@ -776,12 +776,14 @@ package Checks is
--   itself lead to erroneous or unpredictable execution, or to
--   other objects becoming abnormal.
 
-   --  We quote the rules in full here since they are quite delicate. Most
-   --  of the time, we can just compute away with wrong values, and get a
-   --  possibly wrong result, which is well within the range of allowed
-   --  implementation defined behavior. The two tricky cases are subscripted
-   --  array assignments, where we don't want to do wild stores, and case
-   --  statements where we don't want to do wild jumps.
+   --  We quote the rules in full here since they are quite delicate.
+   --  (???The rules quoted here are obsolete; see the GNAT User's Guide for a
+   --  description of all the -gnatV switches.) Most of the time, we can just
+   --  compute away with wrong values, and get a possibly wrong result, which
+   --  is well within the range of allowed implementation defined behavior. The
+   --  two tricky cases are subscripted array assignments, where we don't want
+   --  to do wild stores, and case statements where we don't want to do wild
+   --  jumps.
 
--  In GNAT, we control validity checking with a switch -gnatV that can take
--  three parameters, n/d/f for None/Default/Full. These modes have the
@@ -799,15 +801,8 @@ package Checks is
--alternatives will be executed. Wild jumps cannot result even
--in this mode, since we always do a range check
 
-   --For subscripted array assignments, wild stores will result in
-   --the expected manner when addresses are calculated using values
-   --of subscripts that are out of range.
-
-   --  It could perhaps be argued that this mode is still conformant with
-   --  the letter of the RM, since implementation defined is a rather
-   --  broad category, but certainly it is not in the spirit of the
-   --  RM requirement, since wild stores certainly seem to be a case of
-   --  erroneous behavior.
+   --For subscripted array assignments, wild stores can result in
+   --overwriting arbitrary memory locations.
 
--Default (default standard RM-compatible validity checking)
 
diff --git a/gcc/ada/err_vars.ads b/gcc/ada/err_vars.ads
index 79d5f319f59..66c4bb09b4c 100644
--- a/gcc/ada/err_vars.ads
+++ b/gcc/ada/err_vars.ads
@@ -32,12 +32,6 @@ with Uintp; use Uintp;
 
 package Err_Vars is
 
-   --  All of these variables 

[COMMITTED] ada: Flag unsupported dispatching constructor calls

2022-11-14 Thread Marc Poulhiès via Gcc-patches
From: Javier Miranda 

gcc/ada/

* exp_intr.adb
(Expand_Dispatching_Constructor_Call): Improve warning message.
* freeze.adb
(Check_No_Parts_Violations): Improve error message.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_intr.adb | 9 ++---
 gcc/ada/freeze.adb   | 3 +++
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/exp_intr.adb b/gcc/ada/exp_intr.adb
index cb9b5be1090..d18ed69eeae 100644
--- a/gcc/ada/exp_intr.adb
+++ b/gcc/ada/exp_intr.adb
@@ -315,9 +315,12 @@ package body Exp_Intr is
 Error_Msg_N
   ("unsupported dispatching constructor call if the type "
& "of the built object has task components??", N);
-Error_Msg_N
-  ("\work around this problem by replacing task components "
-   & "with access-to-task-type components??", N);
+
+Error_Msg_Sloc := Sloc (Root_Type (Etype (Entity (Name (N)))));
+Error_Msg_NE
+  ("\work around this by adding ''with no_task_parts'' to "
+   & "the declaration of the root type& defined#???",
+   N, Root_Type (Etype (Entity (Name (N)))));
  end if;
   end if;
 
diff --git a/gcc/ada/freeze.adb b/gcc/ada/freeze.adb
index 032c73d3dfb..7f78b4315a8 100644
--- a/gcc/ada/freeze.adb
+++ b/gcc/ada/freeze.adb
@@ -3188,6 +3188,9 @@ package body Freeze is
if Has_Task (Typ) then
   Error_Msg_N
 ("aspect % applied to task type &", Typ);
+  Error_Msg_N
+("\replace task components with access-to-task-type "
+ & "components??", Typ);
end if;
 
 else
-- 
2.34.1



Re: [PATCH] 0/19 modula-2 front end patches overview

2022-11-14 Thread Gaius Mulley via Gcc-patches
Richard Biener  writes:

> On Mon, Oct 10, 2022 at 5:32 PM Gaius Mulley via Gcc-patches
>  wrote:
>>
>>
>> Here are the latest modula-2 front end patches for review.
>> The status of the patches and their contents are also contained at:
>>
>>https://splendidisolation.ddns.net/public/modula2/patchsummary.html
>>
>> where they are also broken down into topic groups.
>>
>> In summary the high level changes from the last posting are:
>>
>>* the driver code has been completely rewritten and it is now based
>>  on the fortran driver and the c++ driver.  The gm2 driver adds
>>  paths/libraries depending upon dialect chosen.
>>* the linking mechanism has been completely redesigned
>>  (As per
>>  https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595725.html).
>>  Objects can be linked via g++.  New linking options
>>  are available to allow linking with/without a scaffold.
>>* gcc/m2/Make-lang.in (rewritten).
>>* gm2tools/ removed and any required functionality with the
>>  new linking mechanism has been moved into cc1gm2.
>>
>> The gm2 testsuite has been extended to test project linking
>> options.
>
> Thanks for these improvements!

the front end feels a lot cleaner now!

> The frontend specific parts are a lot to digest and I think it isn't
> too important to wait for the unlikely event that all of that gets a
> review.  I'm trusting you here as a maintainer and also based on the
> use of the frontend out in the wild.  I've CCed the other two RMs for
> their opinion on this.
>
> I hope to get to the driver parts that I reviewed the last time, I'd
> appreciate a look on the runtime library setup by somebody else.
>
> I think it's important to get this (and the rust frontend) into the tree 
> before
> Christmas holidays so it gets exposed to the more weird treatment of some
> of our users (build wise).  This way we can develop either a negative or
> positive list of host/targets where to disable the new frontends.

great news thanks - yes this makes sense,

regards,
Gaius


[PATCH] libstdc++: Fix up for extended floating point types [PR107649]

2022-11-14 Thread Jakub Jelinek via Gcc-patches
Hi!

As filed by Jonathan in the PR, I've screwed up the requires syntax
in the extended floating point specialization:
-requires(__complex_type<_Tp>::type)
+requires requires { typename __complex_type<_Tp>::type; }
and doing this change resulted in lots of errors because the
__complex_whatever overloads for extended floating point types were declared
after the templates which used them.

The following patch fixes that.

Bootstrapped/regtested on x86_64-linux and i686-linux, additionally
I've tested that with _GLIBCXX_HAVE_FLOAT128_MATH not being defined
while __STDCPP_FLOAT128_T__ defined one can still use
std::complex for basic arithmetic etc., just one can't
expect std::sin etc. to work in that case (because we don't have any
implementation).

Ok for trunk?

2022-11-14  Jakub Jelinek  
Jonathan Wakely  

PR libstdc++/107649
* include/std/complex (__complex_abs, __complex_arg, __complex_cos,
__complex_cosh, __complex_exp, __complex_log, __complex_sin,
__complex_sinh, __complex_sqrt, __complex_tan, __complex_tanh,
__complex_pow): Move __complex__ _Float{16,32,64,128} and
__complex__ decltype(0.0bf16) overloads earlier in the file.
(complex): Fix up requires on the partial specialization for extended
float types.
(__complex_acos, __complex_asin, __complex_atan, __complex_acosh,
__complex_asinh, __complex_atanh): Move
__complex__ _Float{16,32,64,128} and __complex__ decltype(0.0bf16)
overloads earlier in the file.

--- libstdc++-v3/include/std/complex.jj 2022-10-31 20:15:49.756552019 +0100
+++ libstdc++-v3/include/std/complex2022-11-12 01:56:42.970560123 +0100
@@ -598,6 +598,264 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { return __z.imag(); }
 #endif
 
+#if _GLIBCXX_USE_C99_COMPLEX
+#if defined(__STDCPP_FLOAT16_T__) && defined(_GLIBCXX_FLOAT_IS_IEEE_BINARY32)
+  inline _Float16
+  __complex_abs(__complex__ _Float16 __z)
+  { return _Float16(__builtin_cabsf(__z)); }
+
+  inline _Float16
+  __complex_arg(__complex__ _Float16 __z)
+  { return _Float16(__builtin_cargf(__z)); }
+
+  inline __complex__ _Float16
+  __complex_cos(__complex__ _Float16 __z)
+  { return static_cast<__complex__ _Float16>(__builtin_ccosf(__z)); }
+
+  inline __complex__ _Float16
+  __complex_cosh(__complex__ _Float16 __z)
+  { return static_cast<__complex__ _Float16>(__builtin_ccoshf(__z)); }
+
+  inline __complex__ _Float16
+  __complex_exp(__complex__ _Float16 __z)
+  { return static_cast<__complex__ _Float16>(__builtin_cexpf(__z)); }
+
+  inline __complex__ _Float16
+  __complex_log(__complex__ _Float16 __z)
+  { return static_cast<__complex__ _Float16>(__builtin_clogf(__z)); }
+
+  inline __complex__ _Float16
+  __complex_sin(__complex__ _Float16 __z)
+  { return static_cast<__complex__ _Float16>(__builtin_csinf(__z)); }
+
+  inline __complex__ _Float16
+  __complex_sinh(__complex__ _Float16 __z)
+  { return static_cast<__complex__ _Float16>(__builtin_csinhf(__z)); }
+
+  inline __complex__ _Float16
+  __complex_sqrt(__complex__ _Float16 __z)
+  { return static_cast<__complex__ _Float16>(__builtin_csqrtf(__z)); }
+
+  inline __complex__ _Float16
+  __complex_tan(__complex__ _Float16 __z)
+  { return static_cast<__complex__ _Float16>(__builtin_ctanf(__z)); }
+
+  inline __complex__ _Float16
+  __complex_tanh(__complex__ _Float16 __z)
+  { return static_cast<__complex__ _Float16>(__builtin_ctanhf(__z)); }
+
+  inline __complex__ _Float16
+  __complex_pow(__complex__ _Float16 __x, __complex__ _Float16 __y)
+  { return static_cast<__complex__ _Float16>(__builtin_cpowf(__x, __y)); }
+#endif
+
+#if defined(__STDCPP_FLOAT32_T__) && defined(_GLIBCXX_FLOAT_IS_IEEE_BINARY32)
+  inline _Float32
+  __complex_abs(__complex__ _Float32 __z) { return __builtin_cabsf(__z); }
+
+  inline _Float32
+  __complex_arg(__complex__ _Float32 __z) { return __builtin_cargf(__z); }
+
+  inline __complex__ _Float32
+  __complex_cos(__complex__ _Float32 __z) { return __builtin_ccosf(__z); }
+
+  inline __complex__ _Float32
+  __complex_cosh(__complex__ _Float32 __z) { return __builtin_ccoshf(__z); }
+
+  inline __complex__ _Float32
+  __complex_exp(__complex__ _Float32 __z) { return __builtin_cexpf(__z); }
+
+  inline __complex__ _Float32
+  __complex_log(__complex__ _Float32 __z) { return __builtin_clogf(__z); }
+
+  inline __complex__ _Float32
+  __complex_sin(__complex__ _Float32 __z) { return __builtin_csinf(__z); }
+
+  inline __complex__ _Float32
+  __complex_sinh(__complex__ _Float32 __z) { return __builtin_csinhf(__z); }
+
+  inline __complex__ _Float32
+  __complex_sqrt(__complex__ _Float32 __z) { return __builtin_csqrtf(__z); }
+
+  inline __complex__ _Float32
+  __complex_tan(__complex__ _Float32 __z) { return __builtin_ctanf(__z); }
+
+  inline __complex__ _Float32
+  __complex_tanh(__complex__ _Float32 __z) { return __builtin_ctanhf(__z); }
+
+  inline __complex__ _Float32
+  __complex_pow(__complex__ _Float32 __x, __complex__ _Float32

Re: [PATCH 1/2] aarch64: Enable the use of LDAPR for load-acquire semantics

2022-11-14 Thread Andre Vieira (lists) via Gcc-patches

Here is the latest version and an updated ChangeLog:

2022-11-14  Andre Vieira  
   Kyrylo Tkachov 

gcc/ChangeLog:

    * config/aarch64/aarch64.h (AARCH64_ISA_RCPC): New Macro.
    (TARGET_RCPC): New Macro.
    * config/aarch64/atomics.md (atomic_load): Change into an expand.
    (aarch64_atomic_load_rcpc): New define_insn for ldapr.
    (aarch64_atomic_load): Rename of old define_insn for ldar.
    * config/aarch64/iterators.md (UNSPEC_LDAP): New unspec enum value.
    * doc/invoke.texi (rcpc): Amend documentation to mention the effects
    on code generation.

gcc/testsuite/ChangeLog:

    * gcc.target/aarch64/ldapr.c: New test.

On 10/11/2022 15:55, Kyrylo Tkachov wrote:

Hi Andre,


-Original Message-
From: Andre Vieira (lists) 
Sent: Thursday, November 10, 2022 11:17 AM
To: gcc-patches@gcc.gnu.org
Cc: Kyrylo Tkachov ; Richard Earnshaw
; Richard Sandiford

Subject: [PATCH 1/2] aarch64: Enable the use of LDAPR for load-acquire
semantics

Hello,

This patch enables the use of LDAPR for load-acquire semantics. After
some internal investigation based on the work published by Podkopaev et
al. (https://dl.acm.org/doi/10.1145/3290382) we can confirm that using
LDAPR for the C++ load-acquire semantics is a correct relaxation.

Bootstrapped and regression tested on aarch64-none-linux-gnu.

OK for trunk?

Thanks for the patch


2022-11-09  Andre Vieira  
      Kyrylo Tkachov  

gcc/ChangeLog:

      * config/aarch64/aarch64.h (AARCH64_ISA_RCPC): New Macro.
      (TARGET_RCPC): New Macro.
      * config/aarch64/atomics.md (atomic_load): Change into
      an expand.
      (aarch64_atomic_load_rcpc): New define_insn for ldapr.
      (aarch64_atomic_load): Rename of old define_insn for ldar.
      * config/aarch64/iterators.md (UNSPEC_LDAP): New unspec enum
value.
      *
doc/gcc/gcc-command-options/machine-dependent-options/aarch64-
options.rst
      (rcpc): Amend documentation to mention the effects on code
generation.

gcc/testsuite/ChangeLog:

      * gcc.target/aarch64/ldapr.c: New test.
      * lib/target-supports.exp (add_options_for_aarch64_rcpc): New
options procedure.
      (check_effective_target_aarch64_rcpc_ok_nocache): New
check-effective-target.
      (check_effective_target_aarch64_rcpc_ok): Likewise.

diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
index 
bc95f6d9d15f190a3e33704b4def2860d5f339bd..801a62bf2ba432f35ae1931beb8c4405b77b36c3
 100644
--- a/gcc/config/aarch64/atomics.md
+++ b/gcc/config/aarch64/atomics.md
@@ -657,7 +657,42 @@
}
  )
  
-(define_insn "atomic_load"

+(define_expand "atomic_load"
+  [(match_operand:ALLI 0 "register_operand" "=r")
+   (match_operand:ALLI 1 "aarch64_sync_memory_operand" "Q")
+   (match_operand:SI   2 "const_int_operand")]
+  ""
+  {
+/* If TARGET_RCPC and this is an ACQUIRE load, then expand to a pattern
+   using UNSPECV_LDAP.  */
+enum memmodel model = memmodel_from_int (INTVAL (operands[2]));
+if (TARGET_RCPC
+   && (is_mm_acquire (model)
+   || is_mm_acq_rel (model)))
+{
+  emit_insn (gen_aarch64_atomic_load_rcpc (operands[0], operands[1],
+operands[2]));
+}
+else
+{
+  emit_insn (gen_aarch64_atomic_load (operands[0], operands[1],
+   operands[2]));
+}

No braces needed for single-statement bodies.

diff --git 
a/gcc/doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst 
b/gcc/doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst
index 
c2b23a6ee97ef2b7c74119f22c1d3e3d85385f4d..25d609238db7d45845dbc446ac21d12dddcf8eac
 100644
--- 
a/gcc/doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst
+++ 
b/gcc/doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst
@@ -437,9 +437,9 @@ the following and their inverses no :samp:`{feature}` :
floating-point instructions. This option is enabled by default for 
:option:`-march=armv8.4-a`. Use of this option with architectures prior to 
Armv8.2-A is not supported.
  
  :samp:`rcpc`

-  Enable the RcPc extension.  This does not change code generation from GCC,
-  but is passed on to the assembler, enabling inline asm statements to use
-  instructions from the RcPc extension.
+  Enable the RcPc extension.  This enables the use of the LDAPR instructions 
for
+  load-acquire atomic semantics, and passes it on to the assembler, enabling
+  inline asm statements to use instructions from the RcPc extension.

Let's capitalize this consistently throughout the patch as "RCpc".

diff --git a/gcc/testsuite/gcc.target/aarch64/ldapr.c 
b/gcc/testsuite/gcc.target/aarch64/ldapr.c
new file mode 100644
index 
..c36edfcd79a9ee41434ab09ac47d257a692a8606
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldapr.c

Re: [PATCH 2/2] aarch64: Add support for widening LDAPR instructions

2022-11-14 Thread Andre Vieira (lists) via Gcc-patches
Updated version of the patch to account for the testsuite changes in the 
first patch.


On 10/11/2022 11:20, Andre Vieira (lists) via Gcc-patches wrote:

Hi,

This patch adds support for the widening LDAPR instructions.

Bootstrapped and regression tested on aarch64-none-linux-gnu.

OK for trunk?

2022-11-09  Andre Vieira  
    Kyrylo Tkachov  

gcc/ChangeLog:

    * config/aarch64/atomics.md (*aarch64_atomic_load_rcpc_zext): New
    pattern.
    (*aarch64_atomic_load_rcpc_sext): Likewise.

gcc/testsuite/ChangeLog:

    * gcc.target/aarch64/ldapr-ext.c: New test.diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
index 
dc5f52ee8a4b349c0d8466a16196f83604893cbb..9670bef7d8cb2b32c5146536d806a7e8bdffb2e3
 100644
--- a/gcc/config/aarch64/atomics.md
+++ b/gcc/config/aarch64/atomics.md
@@ -704,6 +704,28 @@
   }
 )
 
+(define_insn "*aarch64_atomic_load_rcpc_zext"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+(zero_extend:GPI
+  (unspec_volatile:ALLX
+[(match_operand:ALLX 1 "aarch64_sync_memory_operand" "Q")
+ (match_operand:SI 2 "const_int_operand")] ;; model
+   UNSPECV_LDAP)))]
+  "TARGET_RCPC"
+  "ldapr\t%0, %1"
+)
+
+(define_insn "*aarch64_atomic_load_rcpc_sext"
+  [(set (match_operand:GPI  0 "register_operand" "=r")
+(sign_extend:GPI
+  (unspec_volatile:ALLX
+[(match_operand:ALLX 1 "aarch64_sync_memory_operand" "Q")
+ (match_operand:SI 2 "const_int_operand")] ;; model
+   UNSPECV_LDAP)))]
+  "TARGET_RCPC"
+  "ldaprs\t%0, %1"
+)
+
 (define_insn "atomic_store"
   [(set (match_operand:ALLI 0 "aarch64_rcpc_memory_operand" "=Q,Ust")
 (unspec_volatile:ALLI
diff --git a/gcc/testsuite/gcc.target/aarch64/ldapr-ext.c 
b/gcc/testsuite/gcc.target/aarch64/ldapr-ext.c
new file mode 100644
index 
..aed27e06235b1d266decf11745dacf94cc59e76d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldapr-ext.c
@@ -0,0 +1,94 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -std=c99" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+#include <stdatomic.h>
+
+#pragma GCC target "+rcpc"
+
+atomic_ullong u64;
+atomic_llong s64;
+atomic_uint u32;
+atomic_int s32;
+atomic_ushort u16;
+atomic_short s16;
+atomic_uchar u8;
+atomic_schar s8;
+
+#define TEST(name, ldsize, rettype)\
+rettype\
+test_##name (void) \
+{  \
+  return atomic_load_explicit (&ldsize, memory_order_acquire); \
+}
+
+/*
+**test_u8_u64:
+**...
+** ldaprb  x0, \[x[0-9]+\]
+** ret
+*/
+
+TEST(u8_u64, u8, unsigned long long)
+
+/*
+**test_s8_s64:
+**...
+** ldaprsb x0, \[x[0-9]+\]
+** ret
+*/
+
+TEST(s8_s64, s8, long long)
+
+/*
+**test_u16_u64:
+**...
+** ldaprh  x0, \[x[0-9]+\]
+** ret
+*/
+
+TEST(u16_u64, u16, unsigned long long)
+
+/*
+**test_s16_s64:
+**...
+** ldaprsh x0, \[x[0-9]+\]
+** ret
+*/
+
+TEST(s16_s64, s16, long long)
+
+/*
+**test_u8_u32:
+**...
+** ldaprb  w0, \[x[0-9]+\]
+** ret
+*/
+
+TEST(u8_u32, u8, unsigned)
+
+/*
+**test_s8_s32:
+**...
+** ldaprsb w0, \[x[0-9]+\]
+** ret
+*/
+
+TEST(s8_s32, s8, int)
+
+/*
+**test_u16_u32:
+**...
+** ldaprh  w0, \[x[0-9]+\]
+** ret
+*/
+
+TEST(u16_u32, u16, unsigned)
+
+/*
+**test_s16_s32:
+**...
+** ldaprsh w0, \[x[0-9]+\]
+** ret
+*/
+
+TEST(s16_s32, s16, int)


RE: [PATCH 1/2] aarch64: Enable the use of LDAPR for load-acquire semantics

2022-11-14 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Andre Vieira (lists) 
> Sent: Monday, November 14, 2022 2:09 PM
> To: Kyrylo Tkachov ; gcc-patches@gcc.gnu.org
> Cc: Richard Earnshaw ; Richard Sandiford
> 
> Subject: Re: [PATCH 1/2] aarch64: Enable the use of LDAPR for load-acquire
> semantics
> 
> Here is the latest version and an updated ChangeLog:
> 
> 2022-11-14  Andre Vieira  
>     Kyrylo Tkachov 
> 
> gcc/ChangeLog:
> 
>      * config/aarch64/aarch64.h (AARCH64_ISA_RCPC): New Macro.
>      (TARGET_RCPC): New Macro.
>      * config/aarch64/atomics.md (atomic_load): Change into an
> expand.
>      (aarch64_atomic_load_rcpc): New define_insn for ldapr.
>      (aarch64_atomic_load): Rename of old define_insn for ldar.
>      * config/aarch64/iterators.md (UNSPEC_LDAP): New unspec enum
> value.
>      * doc/invoke.texi (rcpc): Amend documentation to mention the
> effects
>      on code generation.
> 
> gcc/testsuite/ChangeLog:
> 
>      * gcc.target/aarch64/ldapr.c: New test.

I don't see this test in the patch?
Thanks,
Kyrill

> 
> On 10/11/2022 15:55, Kyrylo Tkachov wrote:
> > Hi Andre,
> >
> >> -Original Message-
> >> From: Andre Vieira (lists) 
> >> Sent: Thursday, November 10, 2022 11:17 AM
> >> To: gcc-patches@gcc.gnu.org
> >> Cc: Kyrylo Tkachov ; Richard Earnshaw
> >> ; Richard Sandiford
> >> 
> >> Subject: [PATCH 1/2] aarch64: Enable the use of LDAPR for load-acquire
> >> semantics
> >>
> >> Hello,
> >>
> >> This patch enables the use of LDAPR for load-acquire semantics. After
> >> some internal investigation based on the work published by Podkopaev et
> >> al. (https://dl.acm.org/doi/10.1145/3290382) we can confirm that using
> >> LDAPR for the C++ load-acquire semantics is a correct relaxation.
> >>
> >> Bootstrapped and regression tested on aarch64-none-linux-gnu.
> >>
> >> OK for trunk?
> > Thanks for the patch
> >
> >> 2022-11-09  Andre Vieira  
> >>       Kyrylo Tkachov  
> >>
> >> gcc/ChangeLog:
> >>
> >>       * config/aarch64/aarch64.h (AARCH64_ISA_RCPC): New Macro.
> >>       (TARGET_RCPC): New Macro.
> >>       * config/aarch64/atomics.md (atomic_load): Change into
> >>       an expand.
> >>       (aarch64_atomic_load_rcpc): New define_insn for ldapr.
> >>       (aarch64_atomic_load): Rename of old define_insn for ldar.
> >>       * config/aarch64/iterators.md (UNSPEC_LDAP): New unspec enum
> >> value.
> >>       *
> >> doc/gcc/gcc-command-options/machine-dependent-options/aarch64-
> >> options.rst
> >>       (rcpc): Amend documentation to mention the effects on code
> >> generation.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>       * gcc.target/aarch64/ldapr.c: New test.
> >>       * lib/target-supports.exp (add_options_for_aarch64_rcpc): New
> >> options procedure.
> >>       (check_effective_target_aarch64_rcpc_ok_nocache): New
> >> check-effective-target.
> >>       (check_effective_target_aarch64_rcpc_ok): Likewise.
> > diff --git a/gcc/config/aarch64/atomics.md
> b/gcc/config/aarch64/atomics.md
> > index
> bc95f6d9d15f190a3e33704b4def2860d5f339bd..801a62bf2ba432f35ae1931b
> eb8c4405b77b36c3 100644
> > --- a/gcc/config/aarch64/atomics.md
> > +++ b/gcc/config/aarch64/atomics.md
> > @@ -657,7 +657,42 @@
> > }
> >   )
> >
> > -(define_insn "atomic_load"
> > +(define_expand "atomic_load"
> > +  [(match_operand:ALLI 0 "register_operand" "=r")
> > +   (match_operand:ALLI 1 "aarch64_sync_memory_operand" "Q")
> > +   (match_operand:SI   2 "const_int_operand")]
> > +  ""
> > +  {
> > +/* If TARGET_RCPC and this is an ACQUIRE load, then expand to a
> pattern
> > +   using UNSPECV_LDAP.  */
> > +enum memmodel model = memmodel_from_int (INTVAL
> (operands[2]));
> > +if (TARGET_RCPC
> > +   && (is_mm_acquire (model)
> > +   || is_mm_acq_rel (model)))
> > +{
> > +  emit_insn (gen_aarch64_atomic_load_rcpc (operands[0],
> operands[1],
> > +operands[2]));
> > +}
> > +else
> > +{
> > +  emit_insn (gen_aarch64_atomic_load (operands[0],
> operands[1],
> > +   operands[2]));
> > +}
> >
> > No braces needed for single-statement bodies.
> >
> > diff --git a/gcc/doc/gcc/gcc-command-options/machine-dependent-
> options/aarch64-options.rst b/gcc/doc/gcc/gcc-command-options/machine-
> dependent-options/aarch64-options.rst
> > index
> c2b23a6ee97ef2b7c74119f22c1d3e3d85385f4d..25d609238db7d45845dbc44
> 6ac21d12dddcf8eac 100644
> > --- a/gcc/doc/gcc/gcc-command-options/machine-dependent-
> options/aarch64-options.rst
> > +++ b/gcc/doc/gcc/gcc-command-options/machine-dependent-
> options/aarch64-options.rst
> > @@ -437,9 +437,9 @@ the following and their inverses no
> :samp:`{feature}` :
> > floating-point instructions. This option is enabled by default for 
> > :option:`-
> march=armv8.4-a`. Use of this option with architectures prior to Armv8.2-A is
> n

RE: [PATCH 2/2] aarch64: Add support for widening LDAPR instructions

2022-11-14 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Andre Vieira (lists) 
> Sent: Monday, November 14, 2022 2:10 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Richard Sandiford
> 
> Subject: Re: [PATCH 2/2] aarch64: Add support for widening LDAPR
> instructions
> 
> Updated version of the patch to account for the testsuite changes in the
> first patch.
> 
> On 10/11/2022 11:20, Andre Vieira (lists) via Gcc-patches wrote:
> > Hi,
> >
> > This patch adds support for the widening LDAPR instructions.
> >
> > Bootstrapped and regression tested on aarch64-none-linux-gnu.
> >
> > OK for trunk?

Ok once the first patch is approved.
Thanks,
Kyrill

> >
> > 2022-11-09  Andre Vieira  
> >     Kyrylo Tkachov  
> >
> > gcc/ChangeLog:
> >
> >     * config/aarch64/atomics.md
> > (*aarch64_atomic_load_rcpc_zext): New pattern.
> >     (*aarch64_atomic_load_rcpc_zext): Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> >     * gcc.target/aarch64/ldapr-ext.c: New test.


Re: [PATCH] libstdc++: Fix python/ not making install directories

2022-11-14 Thread Jonathan Wakely via Gcc-patches
On Mon, 14 Nov 2022 at 13:20, Arsen Arsenović  wrote:
>
> Hi,
>
> Jonathan Wakely  writes:
> >> This looks simple, and more consistent with what we already do. Does
> >> it solve your issue?
>
> It does work; though, if I was more daring I'd have said that it's fine
> without checking, too, since it does the same operation on the same
> directory ;)
>
> Was the omission of the mkdir $(DESTDIR)$(toolexeclibdir) intentional?

It's the first thing the recipe does:

install-data-local: gdb.py
@$(mkdir_p) $(DESTDIR)$(toolexeclibdir)

That's why I'm suggesting to do the same thing for the debug dir.


> I only see TELD/debug in your revision of the patch.  Chances are, it
> gets created elsewhere (my test was just install-target-libstdc++-v3, so
> not even the full install), but it might be worth being conservative
> about it.
>
> Thanks,
> --
> Arsen Arsenović



Re: [PATCH] libstdc++: Fix python/ not making install directories

2022-11-14 Thread Jonathan Wakely via Gcc-patches
On Mon, 14 Nov 2022 at 14:13, Jonathan Wakely  wrote:
>
> On Mon, 14 Nov 2022 at 13:20, Arsen Arsenović  wrote:
> >
> > Hi,
> >
> > Jonathan Wakely  writes:
> > >> This looks simple, and more consistent with what we already do. Does
> > >> it solve your issue?
> >
> > It does work; though, if I was more daring I'd have said that it's fine
> > without checking, too, since it does the same operation on the same
> > directory ;)
> >
> > Was the omission of the mkdir $(DESTDIR)$(toolexeclibdir) intentional?
>
> It's the first thing the recipe does:
>
> install-data-local: gdb.py
> @$(mkdir_p) $(DESTDIR)$(toolexeclibdir)
>
> That's why I'm suggesting to do the same thing for the debug dir.

This presumably means it has the problems that mkinstalldirs is
supposed to solve, but is that only relevant for Solaris 8, i.e. not
relevant?



Re: [PATCH 0/8] middle-end: Ensure at_stmt is defined before an early exit

2022-11-14 Thread Jeff Law via Gcc-patches



On 11/11/22 06:39, Andrew Carlotti via Gcc-patches wrote:

This prevents a null dereference error when outputting debug information
following an early exit from number_of_iterations_exit_assumptions.

gcc/ChangeLog:

* tree-ssa-loop-niter.cc (number_of_iterations_exit_assumptions):
Move at_stmt assignment.


OK

jeff




Re: [PATCH 1/2] aarch64: Enable the use of LDAPR for load-acquire semantics

2022-11-14 Thread Andre Vieira (lists) via Gcc-patches


On 14/11/2022 14:12, Kyrylo Tkachov wrote:



-Original Message-
From: Andre Vieira (lists) 
Sent: Monday, November 14, 2022 2:09 PM
To: Kyrylo Tkachov ; gcc-patches@gcc.gnu.org
Cc: Richard Earnshaw ; Richard Sandiford

Subject: Re: [PATCH 1/2] aarch64: Enable the use of LDAPR for load-acquire
semantics

Here is the latest version and an updated ChangeLog:

2022-11-14  Andre Vieira  
     Kyrylo Tkachov 

gcc/ChangeLog:

      * config/aarch64/aarch64.h (AARCH64_ISA_RCPC): New Macro.
      (TARGET_RCPC): New Macro.
      * config/aarch64/atomics.md (atomic_load): Change into an
expand.
      (aarch64_atomic_load_rcpc): New define_insn for ldapr.
      (aarch64_atomic_load): Rename of old define_insn for ldar.
      * config/aarch64/iterators.md (UNSPEC_LDAP): New unspec enum
value.
      * doc/invoke.texi (rcpc): Amend documentation to mention the
effects
      on code generation.

gcc/testsuite/ChangeLog:

      * gcc.target/aarch64/ldapr.c: New test.

I don't see this test in the patch?
Thanks,
Kyrill


Oops... here it is.

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index e60f9bce023b2cd5e7233ee9b8c61fc93c1494c2..51a8aa02a5850d5c79255dbf7e0764ffdec73ccd 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -221,6 +221,7 @@ enum class aarch64_feature : unsigned char {
 #define AARCH64_ISA_V9_3A  (aarch64_isa_flags & AARCH64_FL_V9_3A)
 #define AARCH64_ISA_MOPS  (aarch64_isa_flags & AARCH64_FL_MOPS)
 #define AARCH64_ISA_LS64  (aarch64_isa_flags & AARCH64_FL_LS64)
+#define AARCH64_ISA_RCPC   (aarch64_isa_flags & AARCH64_FL_RCPC)
 
 /* Crypto is an optional extension to AdvSIMD.  */
 #define TARGET_CRYPTO (AARCH64_ISA_CRYPTO)
@@ -328,6 +329,9 @@ enum class aarch64_feature : unsigned char {
 /* SB instruction is enabled through +sb.  */
 #define TARGET_SB (AARCH64_ISA_SB)
 
+/* RCPC loads from Armv8.3-a.  */
+#define TARGET_RCPC (AARCH64_ISA_RCPC)
+
 /* Apply the workaround for Cortex-A53 erratum 835769.  */
 #define TARGET_FIX_ERR_A53_835769  \
   ((aarch64_fix_a53_err835769 == 2)\
diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
index bc95f6d9d15f190a3e33704b4def2860d5f339bd..dc5f52ee8a4b349c0d8466a16196f83604893cbb 100644
--- a/gcc/config/aarch64/atomics.md
+++ b/gcc/config/aarch64/atomics.md
@@ -657,7 +657,38 @@
   }
 )
 
-(define_insn "atomic_load"
+(define_expand "atomic_load"
+  [(match_operand:ALLI 0 "register_operand" "=r")
+   (match_operand:ALLI 1 "aarch64_sync_memory_operand" "Q")
+   (match_operand:SI   2 "const_int_operand")]
+  ""
+  {
+/* If TARGET_RCPC and this is an ACQUIRE load, then expand to a pattern
+   using UNSPECV_LDAP.  */
+enum memmodel model = memmodel_from_int (INTVAL (operands[2]));
+if (TARGET_RCPC
+   && (is_mm_acquire (model)
+   || is_mm_acq_rel (model)))
+  emit_insn (gen_aarch64_atomic_load_rcpc (operands[0], operands[1],
+operands[2]));
+else
+  emit_insn (gen_aarch64_atomic_load (operands[0], operands[1],
+   operands[2]));
+DONE;
+  }
+)
+
+(define_insn "aarch64_atomic_load_rcpc"
+  [(set (match_operand:ALLI 0 "register_operand" "=r")
+(unspec_volatile:ALLI
+  [(match_operand:ALLI 1 "aarch64_sync_memory_operand" "Q")
+   (match_operand:SI 2 "const_int_operand")]   ;; model
+  UNSPECV_LDAP))]
+  "TARGET_RCPC"
+  "ldapr\t%0, %1"
+)
+
+(define_insn "aarch64_atomic_load"
   [(set (match_operand:ALLI 0 "register_operand" "=r")
 (unspec_volatile:ALLI
   [(match_operand:ALLI 1 "aarch64_sync_memory_operand" "Q")
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index a8ad4e5ff215ade06c3ca13a24ef18d259afcb6c..d8c2f9d6c32d6f188d584c2e9d8fb36511624de6 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -988,6 +988,7 @@
 UNSPECV_LX ; Represent a load-exclusive.
 UNSPECV_SX ; Represent a store-exclusive.
 UNSPECV_LDA; Represent an atomic load or load-acquire.
+UNSPECV_LDAP   ; Represent an atomic acquire load with RCpc semantics.
 UNSPECV_STL; Represent an atomic store or store-release.
 UNSPECV_ATOMIC_CMPSW   ; Represent an atomic compare swap.
 UNSPECV_ATOMIC_EXCHG   ; Represent an atomic exchange.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 449df59729884aa3292559fffcfbbcc99182c13a..5a32d7b6e94502c57e6438cfd2563bc5631690e1 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -20168,9 +20168,9 @@ Enable FP16 fmla extension.  This also enables FP16 
extensions and
 floating-point instructions. This option is enabled by default for 
@option{-march=armv8.4-a}. Use of this option with architectures 

Re: [PATCH 2/8] middle-end: Remove prototype for number_of_iterations_popcount

2022-11-14 Thread Jeff Law via Gcc-patches



On 11/11/22 06:46, Andrew Carlotti via Gcc-patches wrote:

gcc/ChangeLog:

* tree-ssa-loop-niter.c (ssa_defined_by_minus_one_stmt_p): Move
(number_of_iterations_popcount): Move, and remove separate prototype.


OK.

jeff




RE: [PATCH 1/2] aarch64: Enable the use of LDAPR for load-acquire semantics

2022-11-14 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Andre Vieira (lists) 
> Sent: Monday, November 14, 2022 2:24 PM
> To: Kyrylo Tkachov ; gcc-patches@gcc.gnu.org
> Cc: Richard Earnshaw ; Richard Sandiford
> 
> Subject: Re: [PATCH 1/2] aarch64: Enable the use of LDAPR for load-acquire
> semantics
> 
> 
> On 14/11/2022 14:12, Kyrylo Tkachov wrote:
> >
> >> -Original Message-
> >> From: Andre Vieira (lists) 
> >> Sent: Monday, November 14, 2022 2:09 PM
> >> To: Kyrylo Tkachov ; gcc-patches@gcc.gnu.org
> >> Cc: Richard Earnshaw ; Richard Sandiford
> >> 
> >> Subject: Re: [PATCH 1/2] aarch64: Enable the use of LDAPR for load-
> acquire
> >> semantics
> >>
> >> Here is the latest version and an updated ChangeLog:
> >>
> >> 2022-11-14  Andre Vieira  
> >>      Kyrylo Tkachov 
> >>
> >> gcc/ChangeLog:
> >>
> >>       * config/aarch64/aarch64.h (AARCH64_ISA_RCPC): New Macro.
> >>       (TARGET_RCPC): New Macro.
> >>       * config/aarch64/atomics.md (atomic_load): Change into an
> >> expand.
> >>       (aarch64_atomic_load_rcpc): New define_insn for ldapr.
> >>       (aarch64_atomic_load): Rename of old define_insn for ldar.
> >>       * config/aarch64/iterators.md (UNSPEC_LDAP): New unspec enum
> >> value.
> >>       * doc/invoke.texi (rcpc): Amend documentation to mention the
> >> effects
> >>       on code generation.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>       * gcc.target/aarch64/ldapr.c: New test.
> > I don't see this test in the patch?
> > Thanks,
> > Kyrill
> >
> Oops... here it is.

Ok.
Thanks,
Kyrill



Re: [PATCH] [range-ops] Implement sqrt.

2022-11-14 Thread Jeff Law via Gcc-patches



On 11/14/22 00:45, Aldy Hernandez via Gcc-patches wrote:

On Sun, Nov 13, 2022 at 9:39 PM Jakub Jelinek  wrote:

On Sun, Nov 13, 2022 at 09:05:53PM +0100, Aldy Hernandez wrote:

It seems SQRT is relatively straightforward, and it's something Jakub
wanted for this release.

Jakub, what do you think?

p.s. Too tired to think about op1_range.

That would be multiplication of the same value twice, i.e.
fop_mult with trio that has op1_op2 () == VREL_EQ?
But see below, as sqrt won't be always precise, we need to account for
some errors.


gcc/ChangeLog:

   * gimple-range-op.cc (class cfn_sqrt): New.
   (gimple_range_op_handler::maybe_builtin_call): Add cases for sqrt.

Yes, I'd like to see SQRT support in.
The only thing I'm worried is that unlike {+,-,*,/}, negation etc. typically
implemented in hardware or precise soft-float, sqrt is often implemented
in library using multiple floating point arithmetic functions.  And different
implementations have different accuracy.

So, I wonder if we don't need to add a target hook where targets will be
able to provide upper bound on error for floating point functions for
different floating point modes and some way to signal unknown accuracy/can't
be trusted, in which case we would give up or return just the range for
VARYING.
Then, we could write some tests that, say, in a loop construct random
floating point values (perhaps sanitized to be non-NaN), call the libm
function and the same mpfr one, and return the maximum error in ulps.
And then record those, initially for glibc and most common targets and
gradually maintainers could supply more.

If we add an infrastructure for that within a few days, then we could start
filling the details.  One would hope that sqrt has < 10ulps accuracy if not
already the 0.5ulp one, but for various other functions I think it can be

I don't know what would possess me to think that sqrt would be easy
;-).  Sure, I can sink a few days to flesh this out if you're willing
to review it.


To Jakub's concern.  I thought sqrt was treated like +-/* WRT accuracy 
requirements by IEEE.   i.e., for any input there is a well defined answer 
for a conforming IEEE implementation.   In fact, getting to that .5ulp 
bound is a significant amount of the  cost for a NR or Goldschmidt (or 
hybrid) implementation if you've got a reasonable (say 12 or 14 bit) 
estimator and high performance fmacs.



Jeff




Re: [PATCH] [range-ops] Implement sqrt.

2022-11-14 Thread Jakub Jelinek via Gcc-patches
On Mon, Nov 14, 2022 at 07:30:18AM -0700, Jeff Law via Gcc-patches wrote:
> To Jakub's concern.  I thought sqrt was treated like +-/* WRT accuracy
> requirements by IEEE.   i.e., for any input there is a well defined answer for
> a conforming IEEE implementation.   In fact, getting to that .5ulp bound is
> a significant amount of the  cost for a NR or Goldschmidt (or hybrid)
> implementation if you've got a reasonable (say 12 or 14 bit) estimator and
> high performance fmacs.

That might be the case (except for the known libquadmath sqrtq case
PR105101 which fortunately is not a builtin).
But we'll need the ulps infrastructure for other functions anyway, and
it would be nice to write a short testcase first that will test
sqrt{,f,l,f32,f64,f128} and can be easily adjusted to test other functions.
I'll try to cook something up tomorrow.

Jakub



[PATCH] GCC13: aarch64: Document new cores

2022-11-14 Thread Philipp Tomsich
Document the new cores added recently:
 - ampere1a
 - cortex-x1c
 - cortex-a715

Signed-off-by: Philipp Tomsich 
---

 htdocs/gcc-13/changes.html | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index 0daf921b..b82e198b 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -210,7 +210,16 @@ a work-in-progress.
 
 New Targets and Target Specific Improvements
 
-
+AArch64
+
+  A number of new CPUs are supported through the -mcpu and
+  -mtune options (GCC identifiers in parentheses).
+
+  Ampere-1A (ampere1a).
+  ARM Cortex-X1C (cortex-x1c).
+  ARM Cortex-A715 (cortex-a715).
+
+
 
 AMD Radeon (GCN)
 
-- 
2.34.1



RE: [PATCH][GCC] aarch64: Add support for Cortex-X3 CPU.

2022-11-14 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Srinath Parvathaneni 
> Sent: Friday, November 11, 2022 3:08 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford ; Kyrylo Tkachov
> 
> Subject: [PATCH][GCC] aarch64: Add support for Cortex-X3 CPU.
> 
> Hi,
> 
> This patch adds support for Cortex-X3 CPU.
> 
> Bootstrapped on aarch64-none-linux-gnu and found no regressions.
> 
> Ok for GCC master?

Ok, but the documentation needs to be rebased as we've moved back to .texi.
Thanks,
Kyrill

> 
> Regards,
> Srinath.
> 
> gcc/ChangeLog:
> 
> 2022-11-09  Srinath Parvathaneni  
> 
> * config/aarch64/aarch64-cores.def (AARCH64_CORE): Add Cortex-X3
> CPU.
> * config/aarch64/aarch64-tune.md: Regenerate.
> * doc/gcc/gcc-command-options/machine-dependent-options/aarch64-
> options.rst:
> Document Cortex-X3 CPU.
> 
> 
> ### Attachment also inlined for ease of reply
> ###
> 
> 
> diff --git a/gcc/config/aarch64/aarch64-cores.def
> b/gcc/config/aarch64/aarch64-cores.def
> index
> 3055da9b268b6b71bc3bd6db721812b387e8dd44..a2062468136bf1c38b941c
> 53868d26dafedda276 100644
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -172,6 +172,8 @@ AARCH64_CORE("cortex-a715",  cortexa715,
> cortexa57, V9A,  (SVE2_BITPERM, MEMTAG,
> 
>  AARCH64_CORE("cortex-x2",  cortexx2, cortexa57, V9A,  (SVE2_BITPERM,
> MEMTAG, I8MM, BF16), neoversen2, 0x41, 0xd48, -1)
> 
> +AARCH64_CORE("cortex-x3",  cortexx3, cortexa57, V9A,  (SVE2_BITPERM,
> MEMTAG, I8MM, BF16), neoversen2, 0x41, 0xd4e, -1)
> +
>  AARCH64_CORE("neoverse-n2", neoversen2, cortexa57, V9A, (I8MM, BF16,
> SVE2_BITPERM, RNG, MEMTAG, PROFILE), neoversen2, 0x41, 0xd49, -1)
> 
>  AARCH64_CORE("demeter", demeter, cortexa57, V9A, (I8MM, BF16,
> SVE2_BITPERM, RNG, MEMTAG, PROFILE), neoversev2, 0x41, 0xd4f, -1)
> diff --git a/gcc/config/aarch64/aarch64-tune.md
> b/gcc/config/aarch64/aarch64-tune.md
> index
> 22ec1be5a4c71b930221d2c4f1e62df57df0cadf..74c4384712b202058a58f1da0
> ca28adec97a6b9b 100644
> --- a/gcc/config/aarch64/aarch64-tune.md
> +++ b/gcc/config/aarch64/aarch64-tune.md
> @@ -1,5 +1,5 @@
>  ;; -*- buffer-read-only: t -*-
>  ;; Generated automatically by gentune.sh from aarch64-cores.def
>  (define_attr "tune"
> -
>   "cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thun
> derx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunder
> xt81,thunderxt83,ampere1,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,t
> hunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa
> 76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,co
> rtexx1,cortexx1c,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeo
> ntx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,ts
> v110,thunderx3t110,zeus,neoversev1,neoverse512tvb,saphira,cortexa57cort
> exa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa7
> 5cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa710,cortexa715
> ,cortexx2,neoversen2,demeter,neoversev2"
> +
>   "cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thun
> derx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunder
> xt81,thunderxt83,ampere1,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,t
> hunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa
> 76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,co
> rtexx1,cortexx1c,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeo
> ntx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,ts
> v110,thunderx3t110,zeus,neoversev1,neoverse512tvb,saphira,cortexa57cort
> exa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa7
> 5cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa710,cortexa715
> ,cortexx2,cortexx3,neoversen2,demeter,neoversev2"
>   (const (symbol_ref "((enum attr_tune) aarch64_tune)")))
> diff --git a/gcc/doc/gcc/gcc-command-options/machine-dependent-
> options/aarch64-options.rst b/gcc/doc/gcc/gcc-command-options/machine-
> dependent-options/aarch64-options.rst
> index
> d97515d9e54feaa85a2ead4e9b73f0eb966cb39f..7cc369ef95e510e30873159b
> 8e2130c4f77a57d3 100644
> --- a/gcc/doc/gcc/gcc-command-options/machine-dependent-
> options/aarch64-options.rst
> +++ b/gcc/doc/gcc/gcc-command-options/machine-dependent-
> options/aarch64-options.rst
> @@ -258,8 +258,8 @@ These options are defined for AArch64
> implementations:
>:samp:`cortex-a73.cortex-a35`, :samp:`cortex-a73.cortex-a53`,
>:samp:`cortex-a75.cortex-a55`, :samp:`cortex-a76.cortex-a55`,
>:samp:`cortex-r82`, :samp:`cortex-x1`, :samp:`cortex-x1c`, :samp:`cortex-
> x2`,
> -  :samp:`cortex-a510`, :samp:`cortex-a710`, :samp:`cortex-a715`,
> :samp:`ampere1`,
> -  :samp:`native`.
> +  :samp:`cortex-x3`, :samp:`cortex-a510`, :samp:`cortex-a710`,
> +  :samp:`cortex-a715`, :samp:`ampere1`, :samp:`native`.
> 
>The values :samp:`cortex-a57.cortex-a53`, :samp:`cortex-a72.cortex-a53`,
> 

Re: [PATCH] [range-ops] Implement sqrt.

2022-11-14 Thread Jeff Law via Gcc-patches



On 11/14/22 07:35, Jakub Jelinek wrote:

On Mon, Nov 14, 2022 at 07:30:18AM -0700, Jeff Law via Gcc-patches wrote:

To Jakub's concern.  I thought sqrt was treated like +-/* WRT accuracy
requirements by IEEE.   i.e., for any input there is a well defined answer for
a conforming IEEE implementation.   In fact, getting to that .5ulp bound is
a significant amount of the  cost for a NR or Goldschmidt (or hybrid)
implementation if you've got a reasonable (say 12 or 14 bit) estimator and
high performance fmacs.

That might be the case (except for the known libquadmath sqrtq case
PR105101 which fortunately is not a builtin).
But we'll need the ulps infrastructure for other functions anyway, and
it would be nice to write a short testcase first that will test
sqrt{,f,l,f32,f64,f128} and can be easily adjusted to test other functions.
I'll try to cook something up tomorrow.


Agreed we'll need it elsewhere, so no objection to building it out if 
it's not going to delay things for sqrt.



Jeff



RE: [PATCH] GCC13: aarch64: Document new cores

2022-11-14 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Gcc-patches  bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Philipp
> Tomsich
> Sent: Monday, November 14, 2022 2:43 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford ; Philipp Tomsich
> 
> Subject: [PATCH] GCC13: aarch64: Document new cores
> 
> Document the new cores added recently:
>  - ampere1a
>  - cortex-x1c
>  - cortex-a715
> 
> Signed-off-by: Philipp Tomsich 
> ---
> 
>  htdocs/gcc-13/changes.html | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
> index 0daf921b..b82e198b 100644
> --- a/htdocs/gcc-13/changes.html
> +++ b/htdocs/gcc-13/changes.html
> @@ -210,7 +210,16 @@ a work-in-progress.
>  
>  New Targets and Target Specific Improvements
> 
> -
> +AArch64
> +
> +  A number of new CPUs are supported through the -
> mcpu and
> +  -mtune options (GCC identifiers in parentheses).
> +
> +  Ampere-1A (ampere1a).
> +  ARM Cortex-X1C (cortex-x1c).
> +  ARM Cortex-A715 (cortex-a715).

For the Arm cores can you please capitalize as "Arm Cortex-X1C" and "Arm 
Cortex-A715".
Also, let's add "Arm Cortex-X3" (I just approved the patch from Srinath) and 
"Arm Neoverse V2"
Thanks,
Kyrill

> +
> +
> 
>  AMD Radeon (GCN)
>  
> --
> 2.34.1



Re: [PATCH 8/8] middle-end: Expand comment for tree_niter_desc.max

2022-11-14 Thread Jeff Law via Gcc-patches



On 11/11/22 12:07, Andrew Carlotti via Gcc-patches wrote:

This requirement is enforced by a gcc_checking_assert in
record_estimate.

gcc/ChangeLog:

* tree-ssa-loop.h (tree_niter_desc): Update comment.


OK

jeff




Re: [PATCH 6/8] docs: Add popcount, clz and ctz target attributes

2022-11-14 Thread Jeff Law via Gcc-patches



On 11/11/22 11:54, Andrew Carlotti via Gcc-patches wrote:

gcc/ChangeLog:

* 
doc/gccint/testsuites/directives-used-within-dejagnu-tests/keywords-describing-target-attributes.rst:
Add missing target attributes.


OK

jeff




Re: [PATCH 3/8] middle-end: Refactor number_of_iterations_popcount

2022-11-14 Thread Richard Biener via Gcc-patches
On Fri, Nov 11, 2022 at 2:58 PM Andrew Carlotti via Gcc-patches
 wrote:
>
> This includes various changes to improve clarity, and to enable the code
> to be more similar to the clz and ctz idiom recognition added in
> subsequent patches.
>
> We create new number_of_iterations_bitcount function, which will be used
> to call the other bit-counting recognition functions added in subsequent
> patches, as well as a generic comment describing the loop structures
> that are common to each idiom. Some of the variables in
> number_of_iterations_popcount are given more descriptive names, and the
> popcount expression builder is extracted into a separate function.
>
> As part of the refactoring, we also fix a bug where the max loop count
> for modes shorter than an integer would be incorrectly computed as if
> the input mode were actually an integer.
>
> We also ensure that niter->max takes into account the final value for
> niter->niter (after any folding and simplifying), since if the latter is a
> constant, then record_estimate mandates that the two values are equivalent.

OK.

Thanks,
Richard.

> gcc/ChangeLog:
>
> * tree-ssa-loop-niter.cc
> (number_of_iterations_exit_assumptions): Modify to call...
> (number_of_iterations_bitcount): ...this new function.
> (number_of_iterations_popcount): Now called by the above.
> Refactor, and extract popcount expression builder to...
> (build_popcount_expr): this new function.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/popcount-max.c: New test.
>
>
> --
>
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount-max.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/popcount-max.c
> new file mode 100644
> index 
> ..ca7204cbc3cea636183408e24d7dd36d702ffdb2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount-max.c
> @@ -0,0 +1,33 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-tree-loop-optimize -fdump-tree-optimized" } */
> +
> +#define PREC (__CHAR_BIT__)
> +
> +int count1 (unsigned char b) {
> +int c = 0;
> +
> +while (b) {
> +   b &= b - 1;
> +   c++;
> +}
> +if (c <= PREC)
> +  return 0;
> +else
> +  return 34567;
> +}
> +
> +int count2 (unsigned char b) {
> +int c = 0;
> +
> +while (b) {
> +   b &= b - 1;
> +   c++;
> +}
> +if (c <= PREC - 1)
> +  return 0;
> +else
> +  return 76543;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "34567" 0 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "76543" 1 "optimized" } } */
> diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
> index 
> 0af34e46580bb9a6f9b40e09c9f29b8454a4aaf6..fece876099c1687569d6351e7d2416ea6acae5b5
>  100644
> --- a/gcc/tree-ssa-loop-niter.cc
> +++ b/gcc/tree-ssa-loop-niter.cc
> @@ -2026,6 +2026,48 @@ number_of_iterations_cond (class loop *loop,
>return ret;
>  }
>
> +/* Return an expression that computes the popcount of src.  */
> +
> +static tree
> +build_popcount_expr (tree src)
> +{
> +  tree fn;
> +  int prec = TYPE_PRECISION (TREE_TYPE (src));
> +  int i_prec = TYPE_PRECISION (integer_type_node);
> +  int li_prec = TYPE_PRECISION (long_integer_type_node);
> +  int lli_prec = TYPE_PRECISION (long_long_integer_type_node);
> +  if (prec <= i_prec)
> +fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);
> +  else if (prec == li_prec)
> +fn = builtin_decl_implicit (BUILT_IN_POPCOUNTL);
> +  else if (prec == lli_prec || prec == 2 * lli_prec)
> +fn = builtin_decl_implicit (BUILT_IN_POPCOUNTLL);
> +  else
> +return NULL_TREE;
> +
> +  tree utype = unsigned_type_for (TREE_TYPE (src));
> +  src = fold_convert (utype, src);
> +  if (prec < i_prec)
> +src = fold_convert (unsigned_type_node, src);
> +  tree call;
> +  if (prec == 2 * lli_prec)
> +{
> +  tree src1 = fold_convert (long_long_unsigned_type_node,
> +   fold_build2 (RSHIFT_EXPR, TREE_TYPE (src),
> +unshare_expr (src),
> +build_int_cst (integer_type_node,
> +   lli_prec)));
> +  tree src2 = fold_convert (long_long_unsigned_type_node, src);
> +  tree call1 = build_call_expr (fn, 1, src1);
> +  tree call2 = build_call_expr (fn, 1, src2);
> +  call = fold_build2 (PLUS_EXPR, integer_type_node, call1, call2);
> +}
> +  else
> +call = build_call_expr (fn, 1, src);
> +
> +  return call;
> +}
> +
>  /* Utility function to check if OP is defined by a stmt
> that is a val - 1.  */
>
> @@ -2041,45 +2083,18 @@ ssa_defined_by_minus_one_stmt_p (tree op, tree val)
>   && integer_minus_onep (gimple_assign_rhs2 (stmt)));
>  }
>
> -/* See if LOOP is a popcount implementation, determine NITER for the loop
> +/* See comment below for number_of_iterations_bitcount.
> +   For popcount, we have:
>
> -   We match:
> -   
> -   goto 
> +   mod

Re: [PATCH] Optimize VEC_PERM_EXPR with same permutation index and operation [PR98167]

2022-11-14 Thread Richard Biener via Gcc-patches
On Thu, Nov 10, 2022 at 3:27 PM Hongyu Wang  wrote:
>
> > Well, with AVX512 v64qi that's 64*64 == 4096 cases to check.  I think
> > a lambda function is fine to use.  The alternative (used by the vectorizer
> > in some places) is to use sth like
> >
> >  auto_sbitmap seen (nelts);
> >  for (i = 0; i < nelts; i++)
> >{
> >  if (!bitmap_set_bit (seen, i))
> >break;
> >  count++;
> >}
> >  full_perm_p = count == nelts;
> >
> > I'll note that you should still check .encoding ().encoded_full_vector_p ()
> > and only bother to check that case, that's a very simple check.
>
> Thanks for the good example! We also tried using wide_int as a bitmask
> but your code looks more simple and reasonable.
>
> Updated the patch accordingly.

OK.

Thanks,
Richard.

> Richard Biener  于2022年11月10日周四 16:56写道:
>
>
> >
> > On Thu, Nov 10, 2022 at 3:27 AM Hongyu Wang  wrote:
> > >
> > > Hi Prathamesh and Richard,
> > >
> > > Thanks for the review and nice suggestions!
> > >
> > > > > I guess the transform should work as long as mask is same for both
> > > > > vectors even if it's
> > > > > not constant ?
> > > >
> > > > Yes, please change accordingly (and maybe push separately).
> > > >
> > >
> > > Removed VECTOR_CST for integer ops.
> > >
> > > > > If this transform is meant only for VLS vectors, I guess you should
> > > > > bail out if TYPE_VECTOR_SUBPARTS is not constant,
> > > > > otherwise it will crash for VLA vectors.
> > > >
> > > > I suppose it's difficult to create a VLA permute that covers all 
> > > > elements
> > > > and that is not trivial though.  But indeed add ().is_constant to the
> > > > VECTOR_FLOAT_TYPE_P guard.
> > >
> > > Added.
> > >
> > > > Meh, that's quadratic!  I suggest to check .encoding 
> > > > ().encoded_full_vector_p ()
> > > > (as said I can't think of a non-full encoding that isn't trivial
> > > > but covers all elements) and then simply .qsort () the vector_builder
> > > > (it derives
> > > > from vec<>) so the scan is O(n log n).
> > >
> > > The .qsort () approach requires an extra cmp_func that IMO would not
> > > be feasible to be implemented in match.pd (I suppose lambda function
> > > would not be a good idea either).
> > > Another solution would be using hash_set but it does not work here for
> > > int64_t or poly_int64 type.
> > > So I kept current O(n^2) simple code here, and I suppose usually the
> > > permutation indices would be a small number even for O(n^2)
> > > complexity.
> >
> > Well, with AVX512 v64qi that's 64*64 == 4096 cases to check.  I think
> > a lambda function is fine to use.  The alternative (used by the vectorizer
> > in some places) is to use sth like
> >
> >  auto_sbitmap seen (nelts);
> >  for (i = 0; i < nelts; i++)
> >{
> >  if (!bitmap_set_bit (seen, i))
> >break;
> >  count++;
> >}
> >  full_perm_p = count == nelts;
> >
> > I'll note that you should still check .encoding ().encoded_full_vector_p ()
> > and only bother to check that case, that's a very simple check.
> >
> > >
> > > Attached updated patch.
> > >
> > > Richard Biener via Gcc-patches  于2022年11月8日周二 
> > > 22:38写道:
> > >
> > >
> > > >
> > > > On Fri, Nov 4, 2022 at 7:44 AM Prathamesh Kulkarni via Gcc-patches
> > > >  wrote:
> > > > >
> > > > > On Fri, 4 Nov 2022 at 05:36, Hongyu Wang via Gcc-patches
> > > > >  wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > This is a follow-up patch for PR98167
> > > > > >
> > > > > > The sequence
> > > > > >  c1 = VEC_PERM_EXPR (a, a, mask)
> > > > > >  c2 = VEC_PERM_EXPR (b, b, mask)
> > > > > >  c3 = c1 op c2
> > > > > > can be optimized to
> > > > > >  c = a op b
> > > > > >  c3 = VEC_PERM_EXPR (c, c, mask)
> > > > > > for all integer vector operation, and float operation with
> > > > > > full permutation.
> > > > > >
> > > > > > Bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > > > > >
> > > > > > Ok for trunk?
> > > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > > PR target/98167
> > > > > > * match.pd: New perm + vector op patterns for int and fp 
> > > > > > vector.
> > > > > >
> > > > > > gcc/testsuite/ChangeLog:
> > > > > >
> > > > > > PR target/98167
> > > > > > * gcc.target/i386/pr98167.c: New test.
> > > > > > ---
> > > > > >  gcc/match.pd| 49 
> > > > > > +
> > > > > >  gcc/testsuite/gcc.target/i386/pr98167.c | 44 ++
> > > > > >  2 files changed, 93 insertions(+)
> > > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr98167.c
> > > > > >
> > > > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > > > index 194ba8f5188..b85ad34f609 100644
> > > > > > --- a/gcc/match.pd
> > > > > > +++ b/gcc/match.pd
> > > > > > @@ -8189,3 +8189,52 @@ and,
> > > > > >   (bit_and (negate @0) integer_onep@1)
> > > > > >   (if (!TYPE_OVERFLOW_SANITIZED (type))
> > > > > >(bit_and @0 @1)))
> > > > > > +
> > > > > > +/* Optimize
> > > > > > +   c1 = VEC_PERM_EXPR (a, a, m

Re: [PATCH v2 1/3] doc: -falign-functions doesn't override the __attribute__((align(N)))

2022-11-14 Thread Richard Biener via Gcc-patches
On Tue, Oct 11, 2022 at 11:02 PM Palmer Dabbelt  wrote:
>
> I found this when reading the documentation for Kito's recent patch.
> From the discussion it sounds like this is the desired behavior, so
> let's document it.

OK.

> gcc/doc/ChangeLog
>
> * invoke.texi (-falign-functions): Mention __align__
> ---
>  gcc/doc/invoke.texi | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 2a9ea3455f6..8326a60dcf1 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -13136,7 +13136,9 @@ effective only in combination with 
> @option{-fstrict-aliasing}.
>  Align the start of functions to the next power-of-two greater than or
>  equal to @var{n}, skipping up to @var{m}-1 bytes.  This ensures that at
>  least the first @var{m} bytes of the function can be fetched by the CPU
> -without crossing an @var{n}-byte alignment boundary.
> +without crossing an @var{n}-byte alignment boundary.  This does not override
> +functions that otherwise specify their own alignment constraints, such as via
> +an alignment attribute.
>
>  If @var{m} is not specified, it defaults to @var{n}.
>
> --
> 2.34.1
>


Re: [PATCH v2 3/3] doc: -falign-functions is ignored for cold/size-optimized functions

2022-11-14 Thread Richard Biener via Gcc-patches
On Tue, Oct 11, 2022 at 11:02 PM Palmer Dabbelt  wrote:
>
> gcc/doc/ChangeLog

OK.

> * invoke.texi (-falign-functions): Mention cold/size-optimized
> functions.
> ---
>  gcc/doc/invoke.texi | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index a24798d5029..6af18ae9bfd 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -13138,7 +13138,8 @@ equal to @var{n}, skipping up to @var{m}-1 bytes.  
> This ensures that at
>  least the first @var{m} bytes of the function can be fetched by the CPU
>  without crossing an @var{n}-byte alignment boundary.  This does not override
>  functions that otherwise specify their own alignment constraints, such as via
> -an alignment attribute.
> +an alignment attribute.  Functions that are optimized for size, for example
> +cold functions, are not aligned.
>
>  If @var{m} is not specified, it defaults to @var{n}.
>
> --
> 2.34.1
>


[PATCH v2] gcc-13: aarch64: Document new cores

2022-11-14 Thread Philipp Tomsich
Document the new cores added recently:
 - ampere1a
 - cortex-x1c
 - cortex-a715
 - cortex-x3
 - neoverse-v2

Signed-off-by: Philipp Tomsich 
---

Changes in v2:
- Change capitalization of ARM to Arm.
- Add documentation for Cortex-X3 and Neoverse V2.

 htdocs/gcc-13/changes.html | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index 0daf921b..b82e198b 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -210,7 +210,16 @@ a work-in-progress.
 
 New Targets and Target Specific Improvements
 
-
+AArch64
+
+  A number of new CPUs are supported through the -mcpu and
+  -mtune options (GCC identifiers in parentheses).
+
+  Ampere-1A (ampere1a).
+  ARM Cortex-X1C (cortex-x1c).
+  ARM Cortex-A715 (cortex-a715).
+
+
 
 AMD Radeon (GCN)
 
-- 
2.34.1



Re: [PATCH v2] aarch64: Add support for Ampere-1A (-mcpu=ampere1a) CPU

2022-11-14 Thread Philipp Tomsich
Richard,

is this OK for backport to GCC-12 and GCC-11?

Thanks,
Philipp.

On Mon, 14 Nov 2022 at 14:53, Philipp Tomsich  wrote:
>
> This patch adds support for Ampere-1A CPU:
>  - recognize the name of the core and provide detection for -mcpu=native,
>  - updated extra_costs,
>  - adds a new fusion pair for (A+B+1 and A-B-1).
>
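For reference, the source-level idiom the new fusion pair targets looks like this (illustrative only; the actual fusion is matched on the back-to-back add/sub instruction pair the compiler emits, not in C):

```c
/* On AArch64 this typically compiles to "add w, a, b; add w, w, #1" --
   the instruction pair the new matcher fuses on Ampere-1A.  */
static int add_plus_one (int a, int b)  { return a + b + 1; }

/* Likewise "sub w, a, b; sub w, w, #1".  */
static int sub_minus_one (int a, int b) { return a - b - 1; }
```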
> Ampere-1A and Ampere-1 have more timing difference than the extra
> costs indicate, but these don't propagate through to the headline
> items in our extra costs (e.g. the change in latency for scalar sqrt
> doesn't have a corresponding table entry).
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-cores.def (AARCH64_CORE): Add ampere1a.
> * config/aarch64/aarch64-cost-tables.h: Add ampere1a_extra_costs.
> * config/aarch64/aarch64-fusion-pairs.def (AARCH64_FUSION_PAIR):
> Define a new fusion pair for A+B+1/A-B-1 (i.e., add/subtract two
> registers and then +1/-1).
> * config/aarch64/aarch64-tune.md: Regenerate.
> * config/aarch64/aarch64.cc (aarch_macro_fusion_pair_p): Implement
> idiom-matcher for the new fusion pair.
> * doc/invoke.texi: Add ampere1a.
>
> Signed-off-by: Philipp Tomsich 
> ---
>
> Changes in v2:
> - break line in fusion matcher to stay below 80 characters
> - rename fusion pair addsub_2reg_const1
> - document 'ampere1a' in invoke.texi
>
>  gcc/config/aarch64/aarch64-cores.def|   1 +
>  gcc/config/aarch64/aarch64-cost-tables.h| 107 
>  gcc/config/aarch64/aarch64-fusion-pairs.def |   1 +
>  gcc/config/aarch64/aarch64-tune.md  |   2 +-
>  gcc/config/aarch64/aarch64.cc   |  64 
>  gcc/doc/invoke.texi |   2 +-
>  6 files changed, 175 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-cores.def 
> b/gcc/config/aarch64/aarch64-cores.def
> index d2671778928..aead587cec1 100644
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -70,6 +70,7 @@ AARCH64_CORE("thunderxt83",   thunderxt83,   thunderx,  
> V8A,  (CRC, CRYPTO), thu
>
>  /* Ampere Computing ('\xC0') cores. */
>  AARCH64_CORE("ampere1", ampere1, cortexa57, V8_6A, (F16, RNG, AES, SHA3), 
> ampere1, 0xC0, 0xac3, -1)
> +AARCH64_CORE("ampere1a", ampere1a, cortexa57, V8_6A, (F16, RNG, AES, SHA3, 
> MEMTAG), ampere1a, 0xC0, 0xac4, -1)
>  /* Do not swap around "emag" and "xgene1",
> this order is required to handle variant correctly. */
>  AARCH64_CORE("emag",emag,  xgene1,V8A,  (CRC, CRYPTO), emag, 
> 0x50, 0x000, 3)
> diff --git a/gcc/config/aarch64/aarch64-cost-tables.h 
> b/gcc/config/aarch64/aarch64-cost-tables.h
> index 760d7b30368..48522606fbe 100644
> --- a/gcc/config/aarch64/aarch64-cost-tables.h
> +++ b/gcc/config/aarch64/aarch64-cost-tables.h
> @@ -775,4 +775,111 @@ const struct cpu_cost_table ampere1_extra_costs =
>}
>  };
>
> +const struct cpu_cost_table ampere1a_extra_costs =
> +{
> +  /* ALU */
> +  {
> +0, /* arith.  */
> +0, /* logical.  */
> +0, /* shift.  */
> +COSTS_N_INSNS (1), /* shift_reg.  */
> +0, /* arith_shift.  */
> +COSTS_N_INSNS (1), /* arith_shift_reg.  */
> +0, /* log_shift.  */
> +COSTS_N_INSNS (1), /* log_shift_reg.  */
> +0, /* extend.  */
> +COSTS_N_INSNS (1), /* extend_arith.  */
> +0, /* bfi.  */
> +0, /* bfx.  */
> +0, /* clz.  */
> +0, /* rev.  */
> +0, /* non_exec.  */
> +true   /* non_exec_costs_exec.  */
> +  },
> +  {
> +/* MULT SImode */
> +{
> +  COSTS_N_INSNS (3),   /* simple.  */
> +  COSTS_N_INSNS (3),   /* flag_setting.  */
> +  COSTS_N_INSNS (3),   /* extend.  */
> +  COSTS_N_INSNS (4),   /* add.  */
> +  COSTS_N_INSNS (4),   /* extend_add.  */
> +  COSTS_N_INSNS (19)   /* idiv.  */
> +},
> +/* MULT DImode */
> +{
> +  COSTS_N_INSNS (3),   /* simple.  */
> +  0,   /* flag_setting (N/A).  */
> +  COSTS_N_INSNS (3),   /* extend.  */
> +  COSTS_N_INSNS (4),   /* add.  */
> +  COSTS_N_INSNS (4),   /* extend_add.  */
> +  COSTS_N_INSNS (35)   /* idiv.  */
> +}
> +  },
> +  /* LD/ST */
> +  {
> +COSTS_N_INSNS (4), /* load.  */
> +COSTS_N_INSNS (4), /* load_sign_extend.  */
> +0, /* ldrd (n/a).  */
> +0, /* ldm_1st.  */
> +0, /* ldm_regs_per_insn_1st.  */
> +0, /* ldm_regs_per_insn_subsequent.  */
> +COSTS_N_INSNS (5), /* loadf.  */
> +COSTS_N_INSNS (5), /* loadd.  */
> +COSTS_N_INSNS (5), /* load_unaligned.  */
> +0, /* store.  */
> +0,   

RE: [PATCH v2] gcc-13: aarch64: Document new cores

2022-11-14 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Philipp Tomsich 
> Sent: Monday, November 14, 2022 2:54 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford ; Kyrylo Tkachov
> ; Philipp Tomsich 
> Subject: [PATCH v2] gcc-13: aarch64: Document new cores
> 
> Document the new cores added recently:
>  - ampere1a
>  - cortex-x1c
>  - cortex-a715
>  - cortex-x3
>  - neoverse-v2

Thanks, but I think the patch doesn't reflect those yet 😉
Ok with the entries added as discussed.
Thanks,
Kyrill

> 
> Signed-off-by: Philipp Tomsich 
> ---
> 
> Changes in v2:
> - Change capitalization of ARM to Arm.
> - Add documentation for Cortex-X3 and Neoverse V2.
> 
>  htdocs/gcc-13/changes.html | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
> index 0daf921b..b82e198b 100644
> --- a/htdocs/gcc-13/changes.html
> +++ b/htdocs/gcc-13/changes.html
> @@ -210,7 +210,16 @@ a work-in-progress.
>  
>  New Targets and Target Specific Improvements
> 
> -
> +AArch64
> +
> +  A number of new CPUs are supported through the -
> mcpu and
> +  -mtune options (GCC identifiers in parentheses).
> +
> +  Ampere-1A (ampere1a).
> +  ARM Cortex-X1C (cortex-x1c).
> +  ARM Cortex-A715 (cortex-a715).
> +
> +
> 
>  AMD Radeon (GCN)
>  
> --
> 2.34.1



Re: [PATCH v2] gcc-13: aarch64: Document new cores

2022-11-14 Thread Philipp Tomsich
I’ll hold off pushing until these are on master.

On Mon 14. Nov 2022 at 15:56, Kyrylo Tkachov  wrote:

>
>
> > -Original Message-
> > From: Philipp Tomsich 
> > Sent: Monday, November 14, 2022 2:54 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Richard Sandiford ; Kyrylo Tkachov
> > ; Philipp Tomsich 
> > Subject: [PATCH v2] gcc-13: aarch64: Document new cores
> >
> > Document the new cores added recently:
> >  - ampere1a
> >  - cortex-x1c
> >  - cortex-a715
> >  - cortex-x3
> >  - neoverse-v2
>
> Thanks, but I think the patch doesn't reflect those yet 😉
> Ok with the entries added as discussed.
> Thanks,
> Kyrill
>
> >
> > Signed-off-by: Philipp Tomsich 
> > ---
> >
> > Changes in v2:
> > - Change capitalization of ARM to Arm.
> > - Add documentation for Cortex-X3 and Neoverse V2.
> >
> >  htdocs/gcc-13/changes.html | 11 ++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
> > index 0daf921b..b82e198b 100644
> > --- a/htdocs/gcc-13/changes.html
> > +++ b/htdocs/gcc-13/changes.html
> > @@ -210,7 +210,16 @@ a work-in-progress.
> >  
> >  New Targets and Target Specific Improvements
> >
> > -
> > +AArch64
> > +
> > +  A number of new CPUs are supported through the -
> > mcpu and
> > +  -mtune options (GCC identifiers in parentheses).
> > +
> > +  Ampere-1A (ampere1a).
> > +  ARM Cortex-X1C (cortex-x1c).
> > +  ARM Cortex-A715 (cortex-a715).
> > +
> > +
> >
> >  AMD Radeon (GCN)
> >  
> > --
> > 2.34.1
>
>


Re: [PATCH] [range-ops] Implement sqrt.

2022-11-14 Thread Aldy Hernandez via Gcc-patches
Huh...no argument from me.

Thanks.
Aldy

On Mon, Nov 14, 2022, 15:35 Jakub Jelinek  wrote:

> On Mon, Nov 14, 2022 at 07:30:18AM -0700, Jeff Law via Gcc-patches wrote:
> > To Jakub's concern.  I thought sqrt was treated like +-/* WRT accuracy
> > requirements by IEEE.  I.e., for any input there is a well-defined answer
> > for a conforming IEEE implementation.  In fact, getting to that .5ulp
> > bound is a significant amount of the cost for a NR or Goldschmidt (or
> > hybrid) implementation if you've got a reasonable (say 12 or 14 bit)
> > estimator and high performance fmacs.
>
> That might be the case (except for the known libquadmath sqrtq case
> PR105101 which fortunately is not a builtin).
> But we'll need to ulps infrastructure for other functions anyway and
> it would be nice to write a short testcase first that will test
> sqrt{,f,l,f32,f64,f128} and can be easily adjusted to test other functions.
> I'll try to cook something up tomorrow.
>
> Jakub
>
>


Re: [PATCH v2] gcc-13: aarch64: Document new cores

2022-11-14 Thread Philipp Tomsich
I squashed (so these are actually in the commit) and applied to master.
Philipp.

On Mon, 14 Nov 2022 at 15:56, Kyrylo Tkachov  wrote:
>
>
>
> > -Original Message-
> > From: Philipp Tomsich 
> > Sent: Monday, November 14, 2022 2:54 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Richard Sandiford ; Kyrylo Tkachov
> > ; Philipp Tomsich 
> > Subject: [PATCH v2] gcc-13: aarch64: Document new cores
> >
> > Document the new cores added recently:
> >  - ampere1a
> >  - cortex-x1c
> >  - cortex-a715
> >  - cortex-x3
> >  - neoverse-v2
>
> Thanks, but I think the patch doesn't reflect those yet 😉
> Ok with the entries added as discussed.
> Thanks,
> Kyrill
>
> >
> > Signed-off-by: Philipp Tomsich 
> > ---
> >
> > Changes in v2:
> > - Change capitalization of ARM to Arm.
> > - Add documentation for Cortex-X3 and Neoverse V2.
> >
> >  htdocs/gcc-13/changes.html | 11 ++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
> > index 0daf921b..b82e198b 100644
> > --- a/htdocs/gcc-13/changes.html
> > +++ b/htdocs/gcc-13/changes.html
> > @@ -210,7 +210,16 @@ a work-in-progress.
> >  
> >  New Targets and Target Specific Improvements
> >
> > -
> > +AArch64
> > +
> > +  A number of new CPUs are supported through the -
> > mcpu and
> > +  -mtune options (GCC identifiers in parentheses).
> > +
> > +  Ampere-1A (ampere1a).
> > +  ARM Cortex-X1C (cortex-x1c).
> > +  ARM Cortex-A715 (cortex-a715).
> > +
> > +
> >
> >  AMD Radeon (GCN)
> >  
> > --
> > 2.34.1
>


[COMMITTED] Fix some @opindex with - in the front

2022-11-14 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

I noticed this during the conversion of the docs
to sphinx that some options in the option index had a -
in the front of it for the texinfo docs. When the sphinx
conversion was reverted, I thought I would fix the texinfo
documentation for these options.

Committed as obvious after doing "make html" to check
the resulting option index page.

gcc/ChangeLog:

* doc/invoke.texi: Remove the front - from
some @opindex.
---
 gcc/doc/invoke.texi | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 2e4433d..80365d8 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -17032,7 +17032,7 @@ routines generate output or allocate memory).
 @xref{Common Function Attributes}.
 
 @item -finstrument-functions-once
-@opindex -finstrument-functions-once
+@opindex finstrument-functions-once
 This is similar to @option{-finstrument-functions}, but the profiling
 functions are called only once per instrumented function, i.e. the first
 profiling function is called after the first entry into the instrumented
@@ -25215,7 +25215,7 @@ These command-line options are defined for LoongArch 
targets:
 
 @table @gcctabopt
 @item -march=@var{cpu-type}
-@opindex -march
+@opindex march
 Generate instructions for the machine type @var{cpu-type}.  In contrast to
 @option{-mtune=@var{cpu-type}}, which merely tunes the generated code
 for the specified @var{cpu-type}, @option{-march=@var{cpu-type}} allows GCC
@@ -25285,43 +25285,43 @@ registers for parameter passing.  This option may 
change the target
 ABI.
 
 @item -msingle-float
-@opindex -msingle-float
+@opindex msingle-float
 Force @option{-mfpu=32} and allow the use of 32-bit floating-point
 registers for parameter passing.  This option may change the target
 ABI.
 
 @item -mdouble-float
-@opindex -mdouble-float
+@opindex mdouble-float
 Force @option{-mfpu=64} and allow the use of 32/64-bit floating-point
 registers for parameter passing.  This option may change the target
 ABI.
 
 @item -mbranch-cost=@var{n}
-@opindex -mbranch-cost
+@opindex mbranch-cost
 Set the cost of branches to roughly @var{n} instructions.
 
 @item -mcheck-zero-division
 @itemx -mno-check-zero-divison
-@opindex -mcheck-zero-division
+@opindex mcheck-zero-division
 Trap (do not trap) on integer division by zero.  The default is
 @option{-mcheck-zero-division} for @option{-O0} or @option{-Og}, and
 @option{-mno-check-zero-division} for other optimization levels.
 
 @item -mcond-move-int
 @itemx -mno-cond-move-int
-@opindex -mcond-move-int
+@opindex mcond-move-int
 Conditional moves for integral data in general-purpose registers
 are enabled (disabled).  The default is @option{-mcond-move-int}.
 
 @item -mcond-move-float
 @itemx -mno-cond-move-float
-@opindex -mcond-move-float
+@opindex mcond-move-float
 Conditional moves for floating-point registers are enabled (disabled).
 The default is @option{-mcond-move-float}.
 
 @item -mmemcpy
 @itemx -mno-memcpy
-@opindex -mmemcpy
+@opindex mmemcpy
 Force (do not force) the use of @code{memcpy} for non-trivial block moves.
 The default is @option{-mno-memcpy}, which allows GCC to inline most
 constant-sized copies.  Setting optimization level to @option{-Os} also
@@ -25331,18 +25331,18 @@ the command line.
 
 @item -mstrict-align
 @itemx -mno-strict-align
-@opindex -mstrict-align
+@opindex mstrict-align
 Avoid or allow generating memory accesses that may not be aligned on a natural
 object boundary as described in the architecture specification. The default is
 @option{-mno-strict-align}.
 
 @item -msmall-data-limit=@var{number}
-@opindex -msmall-data-limit
+@opindex msmall-data-limit
 Put global and static data smaller than @var{number} bytes into a special
 section (on some targets).  The default value is 0.
 
 @item -mmax-inline-memcpy-size=@var{n}
-@opindex -mmax-inline-memcpy-size
+@opindex mmax-inline-memcpy-size
 Inline all block moves (such as calls to @code{memcpy} or structure copies)
 less than or equal to @var{n} bytes.  The default value of @var{n} is 1024.
 
-- 
1.8.3.1



Re: [PATCH 1/3] libstdc++: Implement ranges::contains/contains_subrange from P2302R4

2022-11-14 Thread Patrick Palka via Gcc-patches
On Mon, 14 Nov 2022, Jonathan Wakely wrote:

> On Mon, 14 Nov 2022 at 04:51, Patrick Palka via Libstdc++
>  wrote:
> >
> > Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
> >
> > libstdc++-v3/ChangeLog:
> >
> > * include/bits/ranges_algo.h (__contains_fn, contains): Define.
> > (__contains_subrange_fn, contains_subrange): Define.
> > * testsuite/25_algorithms/contains/1.cc: New test.
> > * testsuite/25_algorithms/contains_subrange/1.cc: New test.
> > ---
> >  libstdc++-v3/include/bits/ranges_algo.h   | 54 +++
> >  .../testsuite/25_algorithms/contains/1.cc | 33 
> >  .../25_algorithms/contains_subrange/1.cc  | 35 
> >  3 files changed, 122 insertions(+)
> >  create mode 100644 libstdc++-v3/testsuite/25_algorithms/contains/1.cc
> >  create mode 100644 
> > libstdc++-v3/testsuite/25_algorithms/contains_subrange/1.cc
> >
> > diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
> > b/libstdc++-v3/include/bits/ranges_algo.h
> > index de71bd07a2f..da0ca981dc3 100644
> > --- a/libstdc++-v3/include/bits/ranges_algo.h
> > +++ b/libstdc++-v3/include/bits/ranges_algo.h
> > @@ -3464,6 +3464,60 @@ namespace ranges
> >
> >inline constexpr __prev_permutation_fn prev_permutation{};
> >
> > +#if __cplusplus > 202002L
> > +  struct __contains_fn
> > +  {
> > +template _Sent,
> > +   typename _Tp, typename _Proj = identity>
> > +  requires indirect_binary_predicate > +projected<_Iter, _Proj>, const 
> > _Tp*>
> > +  constexpr bool
> > +  operator()(_Iter __first, _Sent __last, const _Tp& __value, _Proj 
> > __proj = {}) const
> > +  { return ranges::find(std::move(__first), __last, __value, __proj) 
> > != __last; }
> 
> Should this use std::move(__proj)?

Oops yes, IIUC std::move'ing projections isn't necessary since they're
copyable and equality preserving, but doing so is consistent with the
rest of the ranges algos which tend to std::move function objects.

-- >8 --

Subject: [PATCH 1/3] libstdc++: Implement ranges::contains/contains_subrange
 from P2302R4

libstdc++-v3/ChangeLog:

* include/bits/ranges_algo.h (__contains_fn, contains): Define.
(__contains_subrange_fn, contains_subrange): Define.
* testsuite/25_algorithms/contains/1.cc: New test.
* testsuite/25_algorithms/contains_subrange/1.cc: New test.
---
 libstdc++-v3/include/bits/ranges_algo.h   | 54 +++
 .../testsuite/25_algorithms/contains/1.cc | 33 
 .../25_algorithms/contains_subrange/1.cc  | 37 +
 3 files changed, 124 insertions(+)
 create mode 100644 libstdc++-v3/testsuite/25_algorithms/contains/1.cc
 create mode 100644 libstdc++-v3/testsuite/25_algorithms/contains_subrange/1.cc

diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
b/libstdc++-v3/include/bits/ranges_algo.h
index de71bd07a2f..11206bdbcaa 100644
--- a/libstdc++-v3/include/bits/ranges_algo.h
+++ b/libstdc++-v3/include/bits/ranges_algo.h
@@ -3464,6 +3464,60 @@ namespace ranges
 
   inline constexpr __prev_permutation_fn prev_permutation{};
 
+#if __cplusplus > 202002L
+  struct __contains_fn
+  {
+template _Sent,
+   typename _Tp, typename _Proj = identity>
+  requires indirect_binary_predicate, const _Tp*>
+  constexpr bool
+  operator()(_Iter __first, _Sent __last, const _Tp& __value, _Proj __proj 
= {}) const
+  { return ranges::find(std::move(__first), __last, __value, 
std::move(__proj)) != __last; }
+
+template
+  requires indirect_binary_predicate, _Proj>, 
const _Tp*>
+  constexpr bool
+  operator()(_Range&& __r, const _Tp& __value, _Proj __proj = {}) const
+  { return (*this)(ranges::begin(__r), ranges::end(__r), __value, 
std::move(__proj)); }
+  };
+
+  inline constexpr __contains_fn contains{};
+
+  struct __contains_subrange_fn
+  {
+template _Sent1,
+forward_iterator _Iter2, sentinel_for<_Iter2> _Sent2,
+typename _Pred = ranges::equal_to,
+typename Proj1 = identity, typename Proj2 = identity>
+  requires indirectly_comparable<_Iter1, _Iter2, _Pred, Proj1, Proj2>
+  constexpr bool
+  operator()(_Iter1 __first1, _Sent1 __last1, _Iter2 __first2, _Sent2 
__last2,
+_Pred __pred = {}, Proj1 __proj1 = {}, Proj2 __proj2 = {}) 
const
+  {
+   return __first2 == __last2
+ || !ranges::search(__first1, __last1, __first2, __last2,
+std::move(__pred), std::move(__proj1), 
std::move(__proj2)).empty();
+  }
+
+template
+  requires indirectly_comparable, iterator_t<_Range2>,
+_Pred, _Proj1, _Proj2>
+  constexpr bool
+  operator()(_Range1&& __r1, _Range2&& __r2, _Pred __pred = {},
+_Proj1 __proj1 = {}, _Proj2 __proj2 = {}) const
+  {
+   return (*this)(ranges::begin(__r1), ranges::end(__r1),
+  ra

Re: [PATCH 1/3] libstdc++: Implement ranges::contains/contains_subrange from P2302R4

2022-11-14 Thread Jonathan Wakely via Gcc-patches
On Mon, 14 Nov 2022 at 15:07, Patrick Palka  wrote:
>
> On Mon, 14 Nov 2022, Jonathan Wakely wrote:
>
> > On Mon, 14 Nov 2022 at 04:51, Patrick Palka via Libstdc++
> >  wrote:
> > >
> > > Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
> > >
> > > libstdc++-v3/ChangeLog:
> > >
> > > * include/bits/ranges_algo.h (__contains_fn, contains): Define.
> > > (__contains_subrange_fn, contains_subrange): Define.
> > > * testsuite/25_algorithms/contains/1.cc: New test.
> > > * testsuite/25_algorithms/contains_subrange/1.cc: New test.
> > > ---
> > >  libstdc++-v3/include/bits/ranges_algo.h   | 54 +++
> > >  .../testsuite/25_algorithms/contains/1.cc | 33 
> > >  .../25_algorithms/contains_subrange/1.cc  | 35 
> > >  3 files changed, 122 insertions(+)
> > >  create mode 100644 libstdc++-v3/testsuite/25_algorithms/contains/1.cc
> > >  create mode 100644 
> > > libstdc++-v3/testsuite/25_algorithms/contains_subrange/1.cc
> > >
> > > diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
> > > b/libstdc++-v3/include/bits/ranges_algo.h
> > > index de71bd07a2f..da0ca981dc3 100644
> > > --- a/libstdc++-v3/include/bits/ranges_algo.h
> > > +++ b/libstdc++-v3/include/bits/ranges_algo.h
> > > @@ -3464,6 +3464,60 @@ namespace ranges
> > >
> > >inline constexpr __prev_permutation_fn prev_permutation{};
> > >
> > > +#if __cplusplus > 202002L
> > > +  struct __contains_fn
> > > +  {
> > > +template _Sent,
> > > +   typename _Tp, typename _Proj = identity>
> > > +  requires indirect_binary_predicate > > +projected<_Iter, _Proj>, const 
> > > _Tp*>
> > > +  constexpr bool
> > > +  operator()(_Iter __first, _Sent __last, const _Tp& __value, _Proj 
> > > __proj = {}) const
> > > +  { return ranges::find(std::move(__first), __last, __value, __proj) 
> > > != __last; }
> >
> > Should this use std::move(__proj)?
>
> Oops yes, IIUC std::move'ing projections isn't necessary since they're
> copyable and equality preserving, but doing so is consistent with the
> rest of the ranges algos which tend to std::move function objects.

Revised patch is OK for trunk, thanks.

>
> -- >8 --
>
> Subject: [PATCH 1/3] libstdc++: Implement ranges::contains/contains_subrange
>  from P2302R4
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/ranges_algo.h (__contains_fn, contains): Define.
> (__contains_subrange_fn, contains_subrange): Define.
> * testsuite/25_algorithms/contains/1.cc: New test.
> * testsuite/25_algorithms/contains_subrange/1.cc: New test.
> ---
>  libstdc++-v3/include/bits/ranges_algo.h   | 54 +++
>  .../testsuite/25_algorithms/contains/1.cc | 33 
>  .../25_algorithms/contains_subrange/1.cc  | 37 +
>  3 files changed, 124 insertions(+)
>  create mode 100644 libstdc++-v3/testsuite/25_algorithms/contains/1.cc
>  create mode 100644 
> libstdc++-v3/testsuite/25_algorithms/contains_subrange/1.cc
>
> diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
> b/libstdc++-v3/include/bits/ranges_algo.h
> index de71bd07a2f..11206bdbcaa 100644
> --- a/libstdc++-v3/include/bits/ranges_algo.h
> +++ b/libstdc++-v3/include/bits/ranges_algo.h
> @@ -3464,6 +3464,60 @@ namespace ranges
>
>inline constexpr __prev_permutation_fn prev_permutation{};
>
> +#if __cplusplus > 202002L
> +  struct __contains_fn
> +  {
> +template _Sent,
> +   typename _Tp, typename _Proj = identity>
> +  requires indirect_binary_predicate +projected<_Iter, _Proj>, const _Tp*>
> +  constexpr bool
> +  operator()(_Iter __first, _Sent __last, const _Tp& __value, _Proj 
> __proj = {}) const
> +  { return ranges::find(std::move(__first), __last, __value, 
> std::move(__proj)) != __last; }
> +
> +template
> +  requires indirect_binary_predicate +projected, 
> _Proj>, const _Tp*>
> +  constexpr bool
> +  operator()(_Range&& __r, const _Tp& __value, _Proj __proj = {}) const
> +  { return (*this)(ranges::begin(__r), ranges::end(__r), __value, 
> std::move(__proj)); }
> +  };
> +
> +  inline constexpr __contains_fn contains{};
> +
> +  struct __contains_subrange_fn
> +  {
> +template _Sent1,
> +forward_iterator _Iter2, sentinel_for<_Iter2> _Sent2,
> +typename _Pred = ranges::equal_to,
> +typename Proj1 = identity, typename Proj2 = identity>
> +  requires indirectly_comparable<_Iter1, _Iter2, _Pred, Proj1, Proj2>
> +  constexpr bool
> +  operator()(_Iter1 __first1, _Sent1 __last1, _Iter2 __first2, _Sent2 
> __last2,
> +_Pred __pred = {}, Proj1 __proj1 = {}, Proj2 __proj2 = {}) 
> const
> +  {
> +   return __first2 == __last2
> + || !ranges::search(__first1, __last1, __first2, __last2,
> +std::move(__pred), std::move(

Re: [PATCH 5/8] middle-end: Add cltz_complement idiom recognition

2022-11-14 Thread Richard Biener via Gcc-patches
On Fri, Nov 11, 2022 at 7:53 PM Andrew Carlotti via Gcc-patches
 wrote:
>
> This recognises patterns of the form:
> while (n) { n >>= 1 }
>
> This patch results in improved (but still suboptimal) codegen:
>
> foo (unsigned int b) {
> int c = 0;
>
> while (b) {
> b >>= 1;
> c++;
> }
>
> return c;
> }
>
> foo:
> .LFB11:
> .cfi_startproc
> cbz w0, .L3
> clz w1, w0
> tst x0, 1
> mov w0, 32
> sub w0, w0, w1
> csel w0, w0, wzr, ne
> ret
>
> The conditional is unnecessary. phiopt could recognise a redundant csel
> (using cond_removal_in_builtin_zero_pattern) when one of the inputs is a
> clz call, but it cannot recognise the redundancy when the input is (e.g.)
> (32 - clz).
>
> I could perhaps extend this function to recognise this pattern in a later
> patch, if this is a good place to recognise more patterns.
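The loop/closed-form equivalence the pass relies on can be stated directly in C (hedged sketch; the zero guard is needed because __builtin_clz(0) is undefined, and the redundant form of that guard is exactly the csel the quoted assembly still carries):

```c
/* Loop form recognised by the pass: count shifts until b is zero.  */
static int bit_length_loop (unsigned int b)
{
  int c = 0;
  while (b)
    {
      b >>= 1;
      c++;
    }
  return c;
}

/* Closed form: PREC - clz(b), with an explicit zero guard.
   (Assumes 32-bit unsigned int for the sketch.)  */
static int bit_length_clz (unsigned int b)
{
  return b ? 32 - __builtin_clz (b) : 0;
}
```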
>
> gcc/ChangeLog:
>
> * tree-scalar-evolution.cc (expression_expensive_p): Add checks
> for c[lt]z optabs.
> * tree-ssa-loop-niter.cc (build_cltz_expr): New.
> (number_of_iterations_cltz_complement): New.
> (number_of_iterations_bitcount): Add call to the above.
>
> gcc/testsuite/ChangeLog:
>
> * lib/target-supports.exp (check_effective_target_clz)
> (check_effective_target_clzl, check_effective_target_clzll)
> (check_effective_target_ctz, check_effective_target_clzl)
> (check_effective_target_ctzll): New.
> * gcc.dg/tree-ssa/cltz-complement-max.c: New test.
> * gcc.dg/tree-ssa/clz-complement-char.c: New test.
> * gcc.dg/tree-ssa/clz-complement-int.c: New test.
> * gcc.dg/tree-ssa/clz-complement-long-long.c: New test.
> * gcc.dg/tree-ssa/clz-complement-long.c: New test.
> * gcc.dg/tree-ssa/ctz-complement-char.c: New test.
> * gcc.dg/tree-ssa/ctz-complement-int.c: New test.
> * gcc.dg/tree-ssa/ctz-complement-long-long.c: New test.
> * gcc.dg/tree-ssa/ctz-complement-long.c: New test.
>
>
> --
>
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cltz-complement-max.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/cltz-complement-max.c
> new file mode 100644
> index 
> ..1a29ca52e42e50822e4e3213b2cb008b766d0318
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/cltz-complement-max.c
> @@ -0,0 +1,60 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-tree-loop-optimize -fdump-tree-optimized" } */
> +
> +#define PREC (__CHAR_BIT__)
> +
> +int clz_complement_count1 (unsigned char b) {
> +int c = 0;
> +
> +while (b) {
> +   b >>= 1;
> +   c++;
> +}
> +if (c <= PREC)
> +  return 0;
> +else
> +  return 34567;
> +}
> +
> +int clz_complement_count2 (unsigned char b) {
> +int c = 0;
> +
> +while (b) {
> +   b >>= 1;
> +   c++;
> +}
> +if (c <= PREC - 1)
> +  return 0;
> +else
> +  return 76543;
> +}
> +
> +int ctz_complement_count1 (unsigned char b) {
> +int c = 0;
> +
> +while (b) {
> +   b <<= 1;
> +   c++;
> +}
> +if (c <= PREC)
> +  return 0;
> +else
> +  return 23456;
> +}
> +
> +int ctz_complement_count2 (unsigned char b) {
> +int c = 0;
> +
> +while (b) {
> +   b <<= 1;
> +   c++;
> +}
> +if (c <= PREC - 1)
> +  return 0;
> +else
> +  return 65432;
> +}
> +/* { dg-final { scan-tree-dump-times "34567" 0 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "76543" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "23456" 0 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "65432" 1 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/clz-complement-char.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/clz-complement-char.c
> new file mode 100644
> index 
> ..2ebe8fabcaf0ce88f3a6a46e9ba4ba79b7d3672e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/clz-complement-char.c
> @@ -0,0 +1,31 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target clz } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +#define PREC (__CHAR_BIT__)
> +
> +int
> +__attribute__ ((noinline, noclone))
> +foo (unsigned char b) {
> +int c = 0;
> +
> +while (b) {
> +   b >>= 1;
> +   c++;
> +}
> +
> +return c;
> +}
> +
> +int main()
> +{
> +  if (foo(0) != 0)
> +__builtin_abort ();
> +  if (foo(5) != 3)
> +__builtin_abort ();
> +  if (foo(255) != 8)
> +__builtin_abort ();
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "__builtin_clz|\\.CLZ" 1 "optimized" } 
> } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/clz-complement-int.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/clz-complement-int.c
> new file mode 100644
> index 
> ..f2c5c23f6a7d84ecb637c6961698b0fc30d7426b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/clz-complement-int.c
> @@ -

Re: [PATCH 2/3] libstdc++: Implement ranges::iota from P2440R1

2022-11-14 Thread Patrick Palka via Gcc-patches
On Mon, 14 Nov 2022, Jonathan Wakely wrote:

> On Mon, 14 Nov 2022 at 10:17, Daniel Krügler  
> wrote:
> >
> > Am Mo., 14. Nov. 2022 um 11:09 Uhr schrieb Jonathan Wakely via
> > Libstdc++ :
> > >
> > > On Mon, 14 Nov 2022 at 04:52, Patrick Palka via Libstdc++
> > >  wrote:
> > > >
> > > > Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
> > > >
> > > > libstdc++-v3/ChangeLog:
> > > >
> > > > * include/bits/ranges_algo.h (out_value_result): Define.
> > > > (iota_result): Define.
> > > > (__iota_fn, iota): Define.
> > > > * testsuite/25_algorithms/iota/1.cc: New test.
> > > > ---
> > > >  libstdc++-v3/include/bits/ranges_algo.h   | 48 +++
> > > >  .../testsuite/25_algorithms/iota/1.cc | 29 +++
> > > >  2 files changed, 77 insertions(+)
> > > >  create mode 100644 libstdc++-v3/testsuite/25_algorithms/iota/1.cc
> > > >
> > > > diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
> > > > b/libstdc++-v3/include/bits/ranges_algo.h
> > > > index da0ca981dc3..f003117c569 100644
> > > > --- a/libstdc++-v3/include/bits/ranges_algo.h
> > > > +++ b/libstdc++-v3/include/bits/ranges_algo.h
> > > > @@ -3517,6 +3517,54 @@ namespace ranges
> > > >};
> > > >
> > > >inline constexpr __contains_subrange_fn contains_subrange{};
> > > > +
> > > > +  template<typename _Out, typename _Tp>
> > > > +struct out_value_result
> > > > +{
> > > > +  [[no_unique_address]] _Out out;
> > > > +  [[no_unique_address]] _Tp value;
> > > > +
> > > > +  template<typename _Out2, typename _Tp2>
> > > > +   requires convertible_to<const _Out&, _Out2>
> > > > + && convertible_to<const _Tp&, _Tp2>
> > > > +   constexpr
> > > > +   operator out_value_result<_Out2, _Tp2>() const &
> > > > +   { return {out, value}; }
> > > > +
> > > > +  template<typename _Out2, typename _Tp2>
> > > > +   requires convertible_to<_Out, _Out2>
> > > > + && convertible_to<_Tp, _Tp2>
> > > > +   constexpr
> > > > +   operator out_value_result<_Out2, _Tp2>() &&
> > > > +   { return {std::move(out), std::move(value)}; }
> > > > +};
> > > > +
> > > > +  template<typename _Out, typename _Tp>
> > > > +using iota_result = out_value_result<_Out, _Tp>;
> > > > +
> > > > +  struct __iota_fn
> > > > +  {
> > > > +    template<input_or_output_iterator _Out, sentinel_for<_Out> _Sent,
> > > > weakly_incrementable _Tp>
> > > > +  requires indirectly_writable<_Out, const _Tp&>
> > > > +  constexpr iota_result<_Out, _Tp>
> > > > +  operator()(_Out __first, _Sent __last, _Tp __value) const
> > > > +  {
> > > > +   while (__first != __last)
> > > > + {
> > > > +   *__first = static_cast<add_const_t<_Tp>&>(__value);
> > >
> > > Is this any different to const_cast<const _Tp&>(__value) ?
> >
> > I think it is. const_cast can potentially mean the removal
> > of volatile,
> 
> True.
> 
> > so I would always look with suspicion on const_cast<const _Tp&>, while static_cast is clearer. Alternatively, as_const could be
> > used, which does add_const_t.
> 
> Which means evaluating the add_const trait *and* overload resolution
> for as_const* *and* a runtime function call.
> 
> Let's go with static_cast.

Sounds good, like so?

-- >8 --


Subject: [PATCH 2/3] libstdc++: Implement ranges::iota from P2440R1

libstdc++-v3/ChangeLog:

* include/bits/ranges_algo.h (out_value_result): Define.
(iota_result): Define.
(__iota_fn, iota): Define.
* testsuite/25_algorithms/iota/1.cc: New test.
---
 libstdc++-v3/include/bits/ranges_algo.h   | 48 +++
 .../testsuite/25_algorithms/iota/1.cc | 29 +++
 2 files changed, 77 insertions(+)
 create mode 100644 libstdc++-v3/testsuite/25_algorithms/iota/1.cc

diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
b/libstdc++-v3/include/bits/ranges_algo.h
index 11206bdbcaa..f75735f02cb 100644
--- a/libstdc++-v3/include/bits/ranges_algo.h
+++ b/libstdc++-v3/include/bits/ranges_algo.h
@@ -3517,6 +3517,54 @@ namespace ranges
   };
 
   inline constexpr __contains_subrange_fn contains_subrange{};
+
+  template<typename _Out, typename _Tp>
+struct out_value_result
+{
+  [[no_unique_address]] _Out out;
+  [[no_unique_address]] _Tp value;
+
+  template<typename _Out2, typename _Tp2>
+   requires convertible_to<const _Out&, _Out2>
+ && convertible_to<const _Tp&, _Tp2>
+   constexpr
+   operator out_value_result<_Out2, _Tp2>() const &
+   { return {out, value}; }
+
+  template<typename _Out2, typename _Tp2>
+   requires convertible_to<_Out, _Out2>
+ && convertible_to<_Tp, _Tp2>
+   constexpr
+   operator out_value_result<_Out2, _Tp2>() &&
+   { return {std::move(out), std::move(value)}; }
+};
+
+  template<typename _Out, typename _Tp>
+using iota_result = out_value_result<_Out, _Tp>;
+
+  struct __iota_fn
+  {
+    template<input_or_output_iterator _Out, sentinel_for<_Out> _Sent,
weakly_incrementable _Tp>
+  requires indirectly_writable<_Out, const _Tp&>
+  constexpr iota_result<_Out, _Tp>
+  operator()(_Out __first, _Sent __last, _Tp __value) const
+  {
+   while (__first != __last)
+ {
+   *__first = static_cast<const _Tp&>(__value);
+   ++__first;
+   ++__value;
+ }
+   return {std::move(__first), std::move(__value)};
+  }

Re: [PATCH 3/3] libstdc++: Implement ranges::find_last{, _if, _if_not} from P1223R5

2022-11-14 Thread Jonathan Wakely via Gcc-patches
On Mon, 14 Nov 2022 at 04:51, Patrick Palka via Libstdc++
 wrote:
>
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/ranges_algo.h (__find_last_fn, find_last):
> Define.
> (__find_last_if_fn, find_last_if): Define.
> (__find_last_if_not_fn, find_last_if_not): Define.
> * testsuite/25_algorithms/find_last/1.cc: New test.
> * testsuite/25_algorithms/find_last_if/1.cc: New test.
> * testsuite/25_algorithms/find_last_if_not/1.cc: New test.
> ---
>  libstdc++-v3/include/bits/ranges_algo.h   | 123 ++
>  .../testsuite/25_algorithms/find_last/1.cc|  90 +
>  .../testsuite/25_algorithms/find_last_if/1.cc |  92 +
>  .../25_algorithms/find_last_if_not/1.cc   |  92 +
>  4 files changed, 397 insertions(+)
>  create mode 100644 libstdc++-v3/testsuite/25_algorithms/find_last/1.cc
>  create mode 100644 libstdc++-v3/testsuite/25_algorithms/find_last_if/1.cc
>  create mode 100644 libstdc++-v3/testsuite/25_algorithms/find_last_if_not/1.cc
>
> diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
> b/libstdc++-v3/include/bits/ranges_algo.h
> index f003117c569..0e4329382eb 100644
> --- a/libstdc++-v3/include/bits/ranges_algo.h
> +++ b/libstdc++-v3/include/bits/ranges_algo.h
> @@ -3565,6 +3565,129 @@ namespace ranges
>};
>
>inline constexpr __iota_fn iota{};
> +
> +  struct __find_last_fn
> +  {
> +    template<forward_iterator _Iter, sentinel_for<_Iter> _Sent, typename T,
> typename _Proj = identity>
> +  requires indirect_binary_predicate<ranges::equal_to, projected<_Iter, _Proj>, const T*>
> +  constexpr subrange<_Iter>
> +  operator()(_Iter __first, _Sent __last, const T& __value, _Proj __proj 
> = {}) const
> +  {
> +   if constexpr (same_as<_Iter, _Sent> && bidirectional_iterator<_Iter>)
> + {
> +   _Iter __found = ranges::find(reverse_iterator<_Iter>{__last},
> +reverse_iterator<_Iter>{__first},
> +__value, __proj).base();
> +   if (__found == __first)
> + return {__last, __last};
> +   else
> + return {ranges::prev(__found), __last};
> + }
> +   else
> + {
> +   _Iter __found = ranges::find(__first, __last, __value, __proj);

std::move(__proj) here too, for consistency.

> +   if (__found == __last)
> + return {__found, __found};
> +   for (;;)
> + {
> +   __first = ranges::find(ranges::next(__first), __last, 
> __value, __proj);

And here.

> +   if (__first == __last)
> + return {__found, __first};
> +   __found = __first;
> + }
> + }
> +  }
> +
> +template<forward_range _Range, typename T, typename _Proj = identity>
> +  requires indirect_binary_predicate<ranges::equal_to, projected<iterator_t<_Range>, _Proj>, const T*>
> +  constexpr borrowed_subrange_t<_Range>
> +  operator()(_Range&& __r, const T& __value, _Proj __proj = {}) const
> +  { return (*this)(ranges::begin(__r), ranges::end(__r), __value, 
> std::move(__proj)); }
> +  };
> +
> +  inline constexpr __find_last_fn find_last{};
> +
> +  struct __find_last_if_fn
> +  {
> +    template<forward_iterator _Iter, sentinel_for<_Iter> _Sent, typename
> _Proj = identity,
> +	     indirect_unary_predicate<projected<_Iter, _Proj>> _Pred>
> +  constexpr subrange<_Iter>
> +  operator()(_Iter __first, _Sent __last, _Pred __pred, _Proj __proj = 
> {}) const
> +  {
> +   if constexpr (same_as<_Iter, _Sent> && bidirectional_iterator<_Iter>)
> + {
> +   _Iter __found = ranges::find_if(reverse_iterator<_Iter>{__last},
> +   reverse_iterator<_Iter>{__first},
> +   __pred, __proj).base();

And here, and std::move(__pred) too, I think.

OK for trunk with those changes here (and the later cases).



Re: [PATCH 2/3] libstdc++: Implement ranges::iota from P2440R1

2022-11-14 Thread Jonathan Wakely via Gcc-patches
On Mon, 14 Nov 2022 at 15:11, Patrick Palka  wrote:
>
> On Mon, 14 Nov 2022, Jonathan Wakely wrote:
>
> > On Mon, 14 Nov 2022 at 10:17, Daniel Krügler  
> > wrote:
> > >
> > > Am Mo., 14. Nov. 2022 um 11:09 Uhr schrieb Jonathan Wakely via
> > > Libstdc++ :
> > > >
> > > > On Mon, 14 Nov 2022 at 04:52, Patrick Palka via Libstdc++
> > > >  wrote:
> > > > >
> > > > > Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
> > > > >
> > > > > libstdc++-v3/ChangeLog:
> > > > >
> > > > > * include/bits/ranges_algo.h (out_value_result): Define.
> > > > > (iota_result): Define.
> > > > > (__iota_fn, iota): Define.
> > > > > * testsuite/25_algorithms/iota/1.cc: New test.
> > > > > ---
> > > > >  libstdc++-v3/include/bits/ranges_algo.h   | 48 
> > > > > +++
> > > > >  .../testsuite/25_algorithms/iota/1.cc | 29 +++
> > > > >  2 files changed, 77 insertions(+)
> > > > >  create mode 100644 libstdc++-v3/testsuite/25_algorithms/iota/1.cc
> > > > >
> > > > > diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
> > > > > b/libstdc++-v3/include/bits/ranges_algo.h
> > > > > index da0ca981dc3..f003117c569 100644
> > > > > --- a/libstdc++-v3/include/bits/ranges_algo.h
> > > > > +++ b/libstdc++-v3/include/bits/ranges_algo.h
> > > > > @@ -3517,6 +3517,54 @@ namespace ranges
> > > > >};
> > > > >
> > > > >inline constexpr __contains_subrange_fn contains_subrange{};
> > > > > +
> > > > > +  template<typename _Out, typename _Tp>
> > > > > +struct out_value_result
> > > > > +{
> > > > > +  [[no_unique_address]] _Out out;
> > > > > +  [[no_unique_address]] _Tp value;
> > > > > +
> > > > > +  template<typename _Out2, typename _Tp2>
> > > > > +   requires convertible_to<const _Out&, _Out2>
> > > > > + && convertible_to<const _Tp&, _Tp2>
> > > > > +   constexpr
> > > > > +   operator out_value_result<_Out2, _Tp2>() const &
> > > > > +   { return {out, value}; }
> > > > > +
> > > > > +  template<typename _Out2, typename _Tp2>
> > > > > +   requires convertible_to<_Out, _Out2>
> > > > > + && convertible_to<_Tp, _Tp2>
> > > > > +   constexpr
> > > > > +   operator out_value_result<_Out2, _Tp2>() &&
> > > > > +   { return {std::move(out), std::move(value)}; }
> > > > > +};
> > > > > +
> > > > > +  template<typename _Out, typename _Tp>
> > > > > +using iota_result = out_value_result<_Out, _Tp>;
> > > > > +
> > > > > +  struct __iota_fn
> > > > > +  {
> > > > > +    template<input_or_output_iterator _Out, sentinel_for<_Out>
> > > > > _Sent, weakly_incrementable _Tp>
> > > > > +  requires indirectly_writable<_Out, const _Tp&>
> > > > > +  constexpr iota_result<_Out, _Tp>
> > > > > +  operator()(_Out __first, _Sent __last, _Tp __value) const
> > > > > +  {
> > > > > +   while (__first != __last)
> > > > > + {
> > > > > +   *__first = static_cast<add_const_t<_Tp>&>(__value);
> > > >
> > > > Is this any different to const_cast<const _Tp&>(__value) ?
> > >
> > > I think it is. const_cast can potentially mean the removal
> > > of volatile,
> >
> > True.
> >
> > > so I would always look with suspicion on const_cast<const _Tp&>, while static_cast is clearer. Alternatively, as_const could be
> > > used, which does add_const_t.
> >
> > Which means evaluating the add_const trait *and* overload resolution
> > for as_const* *and* a runtime function call.
> >
> > Let's go with static_cast.
>
> Sounds good, like so?

OK for trunk, thanks.


>
> -- >8 --
>
>
> Subject: [PATCH 2/3] libstdc++: Implement ranges::iota from P2440R1
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/ranges_algo.h (out_value_result): Define.
> (iota_result): Define.
> (__iota_fn, iota): Define.
> * testsuite/25_algorithms/iota/1.cc: New test.
> ---
>  libstdc++-v3/include/bits/ranges_algo.h   | 48 +++
>  .../testsuite/25_algorithms/iota/1.cc | 29 +++
>  2 files changed, 77 insertions(+)
>  create mode 100644 libstdc++-v3/testsuite/25_algorithms/iota/1.cc
>
> diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
> b/libstdc++-v3/include/bits/ranges_algo.h
> index 11206bdbcaa..f75735f02cb 100644
> --- a/libstdc++-v3/include/bits/ranges_algo.h
> +++ b/libstdc++-v3/include/bits/ranges_algo.h
> @@ -3517,6 +3517,54 @@ namespace ranges
>};
>
>inline constexpr __contains_subrange_fn contains_subrange{};
> +
> +  template<typename _Out, typename _Tp>
> +struct out_value_result
> +{
> +  [[no_unique_address]] _Out out;
> +  [[no_unique_address]] _Tp value;
> +
> +  template<typename _Out2, typename _Tp2>
> +   requires convertible_to<const _Out&, _Out2>
> + && convertible_to<const _Tp&, _Tp2>
> +   constexpr
> +   operator out_value_result<_Out2, _Tp2>() const &
> +   { return {out, value}; }
> +
> +  template<typename _Out2, typename _Tp2>
> +   requires convertible_to<_Out, _Out2>
> + && convertible_to<_Tp, _Tp2>
> +   constexpr
> +   operator out_value_result<_Out2, _Tp2>() &&
> +   { return {std::move(out), std::move(value)}; }
> +};
> +
> +  template<typename _Out, typename _Tp>
> +using iota_result = out_value_result<_Out, _Tp>;
> +
> +  struct __iota_fn
> +  {
> +    template<input_or_output_iterator _Out, sentinel_for<_Out> _Sent,
> weakly_incrementable _Tp>
> +  requires 

Re: [PATCH 7/8] middle-end: Add c[lt]z idiom recognition

2022-11-14 Thread Richard Biener via Gcc-patches
On Fri, Nov 11, 2022 at 8:06 PM Andrew Carlotti via Gcc-patches
 wrote:
>
> This recognises the patterns of the form:
>   while (n & 1) { n >>= 1 }
>
> Unfortunately there are currently two issues relating to this patch.
>
> Firstly, simplify_using_initial_conditions does not recognise that
> (n != 0) and ((n & 1) == 0) implies that ((n >> 1) != 0).
>
> This preconditions arise following the loop copy-header pass, and the
> assumptions returned by number_of_iterations_exit_assumptions then
> prevent final value replacement from using the niter result.
>
> I'm not sure what is the best way to fix this - one approach could be to
> modify simplify_using_initial_conditions to handle this sort of case,
> but it seems that it basically wants the information that ranger could
> give anway, so would something like that be a better option?

I've noted elsewhere that simplify_using_initial_conditions should be
rewritten to use (path) ranger somehow.  But I've also worked around
that for the case in PR100756, though not in this function.

> The second issue arises in the vectoriser, which is able to determine
> that the niter->assumptions are always true.
> When building with -march=armv8.4-a+sve -S -O3, we get this codegen:
>
> foo (unsigned int b) {
> int c = 0;
>
> if (b == 0)
>   return PREC;
>
> while (!(b & (1 << (PREC - 1)))) {
> b <<= 1;
> c++;
> }
>
> return c;
> }
>
> foo:
> .LFB0:
> .cfi_startproc
> cmp w0, 0
> cbz w0, .L6
> blt .L7
> lsl w1, w0, 1
> clz w2, w1
> cmp w2, 14
> bls .L8
> mov x0, 0
> cntw x3
> add w1, w2, 1
> index   z1.s, #0, #1
> whilelo p0.s, wzr, w1
> .L4:
> add x0, x0, x3
> mov p1.b, p0.b
> mov z0.d, z1.d
> whilelo p0.s, w0, w1
> incw z1.s
> b.any   .L4
> add z0.s, z0.s, #1
> lastb   w0, p1, z0.s
> ret
> .p2align 2,,3
> .L8:
> mov w0, 0
> b   .L3
> .p2align 2,,3
> .L13:
> lsl w1, w1, 1
> .L3:
> add w0, w0, 1
> tbz w1, #31, .L13
> ret
> .p2align 2,,3
> .L6:
> mov w0, 32
> ret
> .p2align 2,,3
> .L7:
> mov w0, 0
> ret
> .cfi_endproc
>
> In essence, the vectoriser uses the niter information to determine
> exactly how many iterations of the loop it needs to run. It then uses
> SVE whilelo instructions to run this number of iterations. The original
> loop counter is also vectorised, despite only being used in the final
> iteration, and then the final value of this counter is used as the
> return value (which is the same as the number of iterations it computed
> in the first place).
>
> This vectorisation is obviously bad, and I think it exposes a latent
> bug in the vectoriser, rather than being an issue caused by this
> specific patch.

The main issue is that we use niter analysis to detect popcount and
friends but final value replacement doesn't always apply.  When other
optimizations pick up this niter result the final values are not
replaced as aggressively.  Ideally we'd replace the loops IV with
a counting one, but sometimes the intermediate values of the
popcounted variable are still used.

This patch looks OK.

Thanks,
Richard.

> gcc/ChangeLog:
>
> * tree-ssa-loop-niter.cc (number_of_iterations_cltz): New.
> (number_of_iterations_bitcount): Add call to the above.
> (number_of_iterations_exit_assumptions): Add EQ_EXPR case for
> c[lt]z idiom recognition.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/cltz-max.c: New test.
> * gcc.dg/tree-ssa/clz-char.c: New test.
> * gcc.dg/tree-ssa/clz-int.c: New test.
> * gcc.dg/tree-ssa/clz-long-long.c: New test.
> * gcc.dg/tree-ssa/clz-long.c: New test.
> * gcc.dg/tree-ssa/ctz-char.c: New test.
> * gcc.dg/tree-ssa/ctz-int.c: New test.
> * gcc.dg/tree-ssa/ctz-long-long.c: New test.
> * gcc.dg/tree-ssa/ctz-long.c: New test.
>
>
> --
>
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cltz-max.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/cltz-max.c
> new file mode 100644
> index 
> ..a6bea3d338940efee2e7e1c95a5941525945af9e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/cltz-max.c
> @@ -0,0 +1,72 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-tree-loop-optimize -fdump-tree-optimized" } */
> +
> +#define PREC (__CHAR_BIT__)
> +
> +int clz_count1 (unsigned char b) {
> +int c = 0;
> +
> +if (b == 0)
> +  return 0;
> +
> +while (!(b & (1 << (PREC - 1)))) {
> +   b <<= 1;
> +   c++;
> +}
> +if (c <= PREC - 1)
> +  return 0;
> +else
> +  return 34567;
> +}
> +
> +int clz_count2 (unsigned char b) {
> +int c = 0;
> +
> +if (b == 0)
>
