Re: Intel AVX10.1 Compiler Design and Support

2023-08-09 Thread Jan Beulich via Gcc-patches
On 09.08.2023 04:14, Hongtao Liu wrote:
> On Wed, Aug 9, 2023 at 9:21 AM Hongtao Liu  wrote:
>>
>> On Wed, Aug 9, 2023 at 3:55 AM Joseph Myers  wrote:
>>>
>>> Do you have any comments on the interaction of AVX10 with the
>>> micro-architecture levels defined in the ABI (and supported with
>>> glibc-hwcaps directories in glibc)?  Given that the levels are cumulative,
>>> should we take it that any future levels will be ones supporting 512-bit
>>> vector width for AVX10 (because x86-64-v4 requires the current AVX512F,
>>> AVX512BW, AVX512CD, AVX512DQ and AVX512VL) - and so any future processors
>>> that only support 256-bit vector width will be considered to match the
>>> x86-64-v3 micro-architecture level but not any higher level?
>> This is actually something we really want to discuss in the community.
>> Our proposal for x86-64-v5 is AVX10.2-256 (implying AVX10.1-256) + APX.
>> One big reason is that Intel E-cores will only support AVX10 256-bit; if we
>> want to use x86-64-v5 across server and client, it's better to default to
>> 256-bit.

Aiui these ABI levels were intended to be incremental, i.e. higher versions
would include everything earlier ones cover. Without such a guarantee, how
would you propose compatibility checks to be implemented in a way
applicable both forwards and backwards? If a new level is wanted here, then
I guess it could only be something like v3.5.
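
A strictly cumulative chain is also what keeps runtime dispatch simple: a
consumer can scan levels from highest to lowest and take the first supported
one.  A minimal sketch, assuming GCC 12's __builtin_cpu_supports() level names
(illustrative only, not part of any patch here):

#include <stdio.h>

static void impl_v4 (void)   { puts ("using the x86-64-v4 build"); }
static void impl_v3 (void)   { puts ("using the x86-64-v3 build"); }
static void impl_v2 (void)   { puts ("using the x86-64-v2 build"); }
static void impl_base (void) { puts ("using the baseline x86-64 build"); }

int
main (void)
{
  __builtin_cpu_init ();
  /* Each level is a superset of the previous one, so the first match
     from the top is the best usable variant.  */
  if (__builtin_cpu_supports ("x86-64-v4"))
    impl_v4 ();
  else if (__builtin_cpu_supports ("x86-64-v3"))
    impl_v3 ();
  else if (__builtin_cpu_supports ("x86-64-v2"))
    impl_v2 ();
  else
    impl_base ();
  return 0;
}

A non-linear "branch" breaks exactly this kind of ordering.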

Jan


Re: Intel AVX10.1 Compiler Design and Support

2023-08-09 Thread Hongtao Liu via Gcc-patches
On Wed, Aug 9, 2023 at 3:17 PM Jan Beulich  wrote:
>
> On 09.08.2023 04:14, Hongtao Liu wrote:
> > On Wed, Aug 9, 2023 at 9:21 AM Hongtao Liu  wrote:
> >>
> >> On Wed, Aug 9, 2023 at 3:55 AM Joseph Myers  
> >> wrote:
> >>>
> >>> Do you have any comments on the interaction of AVX10 with the
> >>> micro-architecture levels defined in the ABI (and supported with
> >>> glibc-hwcaps directories in glibc)?  Given that the levels are cumulative,
> >>> should we take it that any future levels will be ones supporting 512-bit
> >>> vector width for AVX10 (because x86-64-v4 requires the current AVX512F,
> >>> AVX512BW, AVX512CD, AVX512DQ and AVX512VL) - and so any future processors
> >>> that only support 256-bit vector width will be considered to match the
> >>> x86-64-v3 micro-architecture level but not any higher level?
> >> This is actually something we really want to discuss in the community.
> >> Our proposal for x86-64-v5 is AVX10.2-256 (implying AVX10.1-256) + APX.
> >> One big reason is that Intel E-cores will only support AVX10 256-bit; if we
> >> want to use x86-64-v5 across server and client, it's better to default to
> >> 256-bit.
>
> Aiui these ABI levels were intended to be incremental, i.e. higher versions
> would include everything earlier ones cover. Without such a guarantee, how
> would you propose compatibility checks to be implemented in a way
Are there many software implementations based on this assumption?
At least in GCC it's not a big problem; we can adjust the code for the
new micro-architecture level.
> applicable both forwards and backwards? If a new level is wanted here, then
> I guess it could only be something like v3.5.
But if we use AVX10.1 as v3.5, it's still not a subset of x86-64-v4
(AVX10.1 contains AVX512FP16, AVX512BF16, etc., which are not in
x86-64-v4), so there will still be a divergence.
Then the 256-bit subset of x86-64-v4 as v3.5? That's too weird to me.

Our main proposal is to make AVX10.x a new micro-architecture level
with a 256-bit default; either v3.5 or v5 would be acceptable if it's
just a question of the name.
>
> Jan



-- 
BR,
Hongtao


Re: Intel AVX10.1 Compiler Design and Support

2023-08-09 Thread Jan Beulich via Gcc-patches
On 09.08.2023 09:38, Hongtao Liu wrote:
> On Wed, Aug 9, 2023 at 3:17 PM Jan Beulich  wrote:
>>
>> On 09.08.2023 04:14, Hongtao Liu wrote:
>>> On Wed, Aug 9, 2023 at 9:21 AM Hongtao Liu  wrote:

 On Wed, Aug 9, 2023 at 3:55 AM Joseph Myers  
 wrote:
>
> Do you have any comments on the interaction of AVX10 with the
> micro-architecture levels defined in the ABI (and supported with
> glibc-hwcaps directories in glibc)?  Given that the levels are cumulative,
> should we take it that any future levels will be ones supporting 512-bit
> vector width for AVX10 (because x86-64-v4 requires the current AVX512F,
> AVX512BW, AVX512CD, AVX512DQ and AVX512VL) - and so any future processors
> that only support 256-bit vector width will be considered to match the
> x86-64-v3 micro-architecture level but not any higher level?
 This is actually something we really want to discuss in the community.
 Our proposal for x86-64-v5 is AVX10.2-256 (implying AVX10.1-256) + APX.
 One big reason is that Intel E-cores will only support AVX10 256-bit; if we
 want to use x86-64-v5 across server and client, it's better to default to
 256-bit.
>>
>> Aiui these ABI levels were intended to be incremental, i.e. higher versions
>> would include everything earlier ones cover. Without such a guarantee, how
>> would you propose compatibility checks to be implemented in a way
> Are there many software implementations based on this assumption?
> At least in GCC it's not a big problem; we can adjust the code for the
> new micro-architecture level.
>> applicable both forwards and backwards? If a new level is wanted here, then
>> I guess it could only be something like v3.5.
> But if we use AVX10.1 as v3.5, it's still not a subset of x86-64-v4
> (AVX10.1 contains AVX512FP16, AVX512BF16, etc., which are not in
> x86-64-v4), so there will still be a divergence.

Hmm, yes. But something will end up being odd in any event. Versions no
longer being integral values is kind of indicating a "branch", i.e. v4
not being a successor. Maybe v3.1 would be better, for it to then have
possible successors v3.2, v3.3, etc. Of course it would be possible to
"merge" branches back then, into e.g. v5 covering AVX10.2/512 (and
thus fully covering everything that's in v4).

Jan

> Then the 256-bit subset of x86-64-v4 as v3.5? That's too weird to me.
> 
> Our main proposal is to make AVX10.x a new micro-architecture level
> with a 256-bit default; either v3.5 or v5 would be acceptable if it's
> just a question of the name.



Re: Intel AVX10.1 Compiler Design and Support

2023-08-09 Thread Florian Weimer via Gcc-patches
* Richard Biener via Gcc-patches:

> I don’t think we can realistically change the ABI.  If we could,
> passing them in two 256-bit registers would be possible as well.
>
> Note I fully expect Intel to turn around and implement 512 bits on a
> 256-bit data path on the E cores in 5 years.  And it will take at
> least that time for AVX10 to take off (look at AVX512 for this and how
> they cautiously chose to include bf16 to cut off Zen4).  So IMHO we
> shouldn’t worry at all and just wait and see for AVX42 to arrive.

Yes, the direction is a bit unclear.  In retrospect, we could have
defined x86-64-v4 to use 256 bit vector width, so it could eventually be
compatible with AVX10; it's also what current Intel CPUs prefer (and
past, with the exception of the Xeon Phi line).  But in the meantime,
AMD has started to ship CPUs that seem to prefer 512 bit vectors,
despite having a double pumped implementation.  (Disclaimer: All CPU
preferences inferred from current compiler tuning defaults, not actual
experiments. 8-/)

To me, this looks like we may have defined x86-64-v4 prematurely, and
this suggests we should wait a bit to see where things are heading.

Thanks,
Florian



Re: Intel AVX10.1 Compiler Design and Support

2023-08-09 Thread Hongtao Liu via Gcc-patches
On Wed, Aug 9, 2023 at 4:14 PM Florian Weimer  wrote:
>
> * Richard Biener via Gcc-patches:
>
> > I don’t think we can realistically change the ABI.  If we could,
> > passing them in two 256-bit registers would be possible as well.
> >
> > Note I fully expect Intel to turn around and implement 512 bits on a
> > 256-bit data path on the E cores in 5 years.  And it will take at
> > least that time for AVX10 to take off (look at AVX512 for this and how
> > they cautiously chose to include bf16 to cut off Zen4).  So IMHO we
> > shouldn’t worry at all and just wait and see for AVX42 to arrive.
>
> Yes, the direction is a bit unclear.  In retrospect, we could have
> defined x86-64-v4 to use 256 bit vector width, so it could eventually be
> compatible with AVX10; it's also what current Intel CPUs prefer (and
Note that AVX10.x-256 also inhibits the use of the 64-bit kmask, which is
supposed to be used only by zmm instructions.
But in theory, those 64-bit kmask intrinsics can be used standalone,
e.g. kshift/kand/kor.
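
For illustration, a small standalone-use sketch of such 64-bit mask
intrinsics (my own example, assuming AVX512BW is enabled, e.g. -mavx512bw;
only k-registers are touched, no 512-bit vector instruction is needed):

#include <immintrin.h>

__mmask64
combine_masks (__mmask64 a, __mmask64 b)
{
  /* Pure mask-register arithmetic on 64-bit masks.  */
  __mmask64 t = _kand_mask64 (a, b);
  return _kor_mask64 (t, _kshiftli_mask64 (b, 1));
}
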
> past, with the exception of the Xeon Phi line).  But in the meantime,
> AMD has started to ship CPUs that seem to prefer 512 bit vectors,
> despite having a double pumped implementation.  (Disclaimer: All CPU
> preferences inferred from current compiler tuning defaults, not actual
> experiments. 8-/)
>
> To me, this looks like we may have defined x86-64-v4 prematurely, and
> this suggests we should wait a bit to see where things are heading.
>
> Thanks,
> Florian
>


-- 
BR,
Hongtao


Re: [PATCH ver 3] rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation

2023-08-09 Thread Kewen.Lin via Gcc-patches
Hi Carl,

on 2023/8/8 01:50, Carl Love wrote:
> 
> GCC maintainers:
> 
> Ver 3: Updated the description to make it clear the patch fixes the
> confusion about the availability of the built-ins.  Fixed the dg-require-
> effective-target on the test cases and the dg-options.  Changed the test
> case so the for loop in the test will not be unrolled.  Fixed a
> spelling error in a vec-cmpne.c comment.  Retested on Power 10 LE.
> 
> Ver 2:  Re-worked the test vec-cmpne.c to create a compile-only test to
> verify the instruction generation and a runnable test to verify the
> built-in functionality.  Retested the patch on Power 8 LE/BE, Power
> 9 LE/BE and Power 10 LE with no regressions.
> 
> The following patch cleans up the definition for the
> __builtin_altivec_vcmpne{b,h,w}.  The current implementation implies
> that the built-in is only supported on Power 9 since it is defined
> under the Power 9 stanza.  However the built-in has no ISA restrictions
> as stated in the Power Vector Intrinsic Programming Reference document.
> The current built-in works because the built-in gets replaced during
> GIMPLE folding by a simple not-equal operator so it doesn't get
> expanded and checked for Power 9 code generation.
> 
> This patch moves the definition to the Altivec stanza in the built-in
> definition file to make it clear the built-ins are valid for Power 8,
> Power 9 and beyond.  
> 
> The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
> LE with no regressions.
> 
> Please let me know if the patch is acceptable for mainline.  Thanks.
> 
>   Carl 
> 
> 
> 
> rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation
> 
> The current built-in definitions for vcmpneb, vcmpneh, vcmpnew are defined
> under the Power 9 section of rs6000-builtins.  This implies they are only
> supported on Power 9 and above, when in fact they are defined and work with
> Altivec as well, with the appropriate Altivec instruction generation.
> 
> The vec_cmpne builtin should generate the vcmpequ{b,h,w} instruction with
> Altivec enabled and generate the vcmpne{b,h,w} on Power 9 and newer
> processors.
> 
> This patch moves the definitions to the Altivec stanza to make it clear
> the built-ins are supported for all Altivec processors.  The patch
> removes the confusion as to which processors support the vcmpequ{b,h,w}
> instructions.
> 
> There is existing test coverage for the vec_cmpne built-in for
> vector bool char, vector bool short, vector bool int,
> vector bool long long in builtins-3-p9.c and p8vector-builtin-2.c.
> Coverage for vector signed int, vector unsigned int is in
> p8vector-builtin-2.c.
> 
> Test vec-cmpne.c is updated to check the generation of the vcmpequ{b,h,w}
> instructions for Altivec.  A new test vec-cmpne-runnable.c is added to
> verify the built-ins work as expected.
> 
> Patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10 LE
> with no regressions.

Okay for trunk with two nits below fixed, thanks!

> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000-builtins.def (vcmpneb, vcmpneh, vcmpnew):
>   Move definitions to Altivec stanza.
>   * config/rs6000/altivec.md (vcmpneb, vcmpneh, vcmpnew): New
>   define_expand.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/powerpc/vec-cmpne-runnable.c: New execution test.
>   * gcc.target/powerpc/vec-cmpne.c (define_test_functions,
>   execute_test_functions) moved to vec-cmpne.h.  Added
>   scan-assembler-times for vcmpequb, vcmpequh, vcmpequw.

s/ moved/: Move/ => "... execute_test_functions): Move "

s/Added/Add/

>   * gcc.target/powerpc/vec-cmpne.h: New include file for vec-cmpne.c
>   and vec-cmpne-runnable.c. Split define_test_functions definition
>   into define_test_functions and define_init_verify_functions.
> ---
>  gcc/config/rs6000/altivec.md  |  12 ++
>  gcc/config/rs6000/rs6000-builtins.def |  18 +--
>  .../gcc.target/powerpc/vec-cmpne-runnable.c   |  36 ++
>  gcc/testsuite/gcc.target/powerpc/vec-cmpne.c  | 112 ++
>  gcc/testsuite/gcc.target/powerpc/vec-cmpne.h  |  90 ++
>  5 files changed, 156 insertions(+), 112 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-cmpne-runnable.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-cmpne.h
> 
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index ad1224e0b57..31f65aa1b7a 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -2631,6 +2631,18 @@ (define_insn "altivec_vcmpequt_p"
>"vcmpequq. %0,%1,%2"
>[(set_attr "type" "veccmpfx")])
>  
> +;; Expand for builtin vcmpne{b,h,w}
> +(define_expand "altivec_vcmpne_"
> +  [(set (match_operand:VSX_EXTRACT_I 3 "altivec_register_operand" "=v")
> + (eq:VSX_EXTRACT_I (match_operand:VSX_EXTRACT_I 1 
> "altivec_register_operand" "v")
> +   (match_operan

Re: [PATCH] rs6000: Fix issue in specifying PTImode as an attribute [PR106895]

2023-08-09 Thread Kewen.Lin via Gcc-patches
Hi,

on 2023/7/20 12:35, jeevitha via Gcc-patches wrote:
> Hi All,
> 
> The following patch has been bootstrapped and regtested on powerpc64le-linux.
> 
> When the user specifies PTImode as an attribute, it breaks.  This patch creates
> a tree node to handle PTImode types.  The PTImode attribute helps in generating
> even/odd register pairs for 128-bit values.
> 
> 2023-07-20  Jeevitha Palanisamy  
> 
> gcc/
>   PR target/110411
>   * config/rs6000/rs6000.h (enum rs6000_builtin_type_index): Add fields
>   to hold PTImode type.
>   * config/rs6000/rs6000-builtin.cc (rs6000_init_builtins): Add node
>   for PTImode type.
> 
> gcc/testsuite/
>   PR target/106895
>   * gcc.target/powerpc/pr106895.c: New testcase.
> 
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
> b/gcc/config/rs6000/rs6000-builtin.cc
> index a8f291c6a72..ca00c3b0d4c 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -756,6 +756,15 @@ rs6000_init_builtins (void)
>else
>  ieee128_float_type_node = NULL_TREE;
>  
> +  /* PTImode to get even/odd register pairs.  */
> +  intPTI_type_internal_node = make_node(INTEGER_TYPE);
> +  TYPE_PRECISION (intPTI_type_internal_node) = GET_MODE_BITSIZE (PTImode);
> +  layout_type (intPTI_type_internal_node);
> +  SET_TYPE_MODE (intPTI_type_internal_node, PTImode);
> +  t = build_qualified_type (intPTI_type_internal_node, TYPE_QUAL_CONST);
> +  lang_hooks.types.register_builtin_type (intPTI_type_internal_node,
> +   "__int128pti");

IIUC, this builtin type registration exposes the type to users, so
I wonder if we actually want to expose this type for users' use.
If yes, we need to update the documentation (and I'm not sure the current
name is good enough); otherwise, I wonder if there is some existing
practice for declaring a builtin type with a name which users can't actually
use and which just shadows a mode.

BR,
Kewen

> +
>/* Vector pair and vector quad support.  */
>vector_pair_type_node = make_node (OPAQUE_TYPE);
>SET_TYPE_MODE (vector_pair_type_node, OOmode);
> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
> index 3503614efbd..0456bf56d17 100644
> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -2303,6 +2303,7 @@ enum rs6000_builtin_type_index
>RS6000_BTI_ptr_vector_quad,
>RS6000_BTI_ptr_long_long,
>RS6000_BTI_ptr_long_long_unsigned,
> +  RS6000_BTI_PTI,
>RS6000_BTI_MAX
>  };
>  
> @@ -2347,6 +2348,7 @@ enum rs6000_builtin_type_index
>  #define uintDI_type_internal_node 
> (rs6000_builtin_types[RS6000_BTI_UINTDI])
>  #define intTI_type_internal_node  
> (rs6000_builtin_types[RS6000_BTI_INTTI])
>  #define uintTI_type_internal_node 
> (rs6000_builtin_types[RS6000_BTI_UINTTI])
> +#define intPTI_type_internal_node (rs6000_builtin_types[RS6000_BTI_PTI])
>  #define float_type_internal_node  
> (rs6000_builtin_types[RS6000_BTI_float])
>  #define double_type_internal_node 
> (rs6000_builtin_types[RS6000_BTI_double])
>  #define long_double_type_internal_node
> (rs6000_builtin_types[RS6000_BTI_long_double])
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106895.c 
> b/gcc/testsuite/gcc.target/powerpc/pr106895.c
> new file mode 100644
> index 000..04630fe1df5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106895.c
> @@ -0,0 +1,15 @@
> +/* PR target/106895 */
> +/* { dg-require-effective-target int128 } */
> +/* { dg-options "-O2" } */
> +
> +/* Verify the following generates even/odd register pairs.  */
> +
> +typedef __int128 pti __attribute__((mode(PTI)));
> +
> +void
> +set128 (pti val, pti *mem)
> +{
> +asm("stq %1,%0" : "=m"(*mem) : "r"(val));
> +}
> +
> +/* { dg-final { scan-assembler "stq 10,0\\(5\\)" } } */
> 
>


Re: Intel AVX10.1 Compiler Design and Support

2023-08-09 Thread Florian Weimer via Gcc-patches
* Hongtao Liu:

> On Wed, Aug 9, 2023 at 3:17 PM Jan Beulich  wrote:
>> Aiui these ABI levels were intended to be incremental, i.e. higher versions
>> would include everything earlier ones cover. Without such a guarantee, how
>> would you propose compatibility checks to be implemented in a way

Correct, this was the intent.  But it's mostly to foster adoption and
make it easier for developers to pick the variants that they want to
target for custom builds.  If it's an ascending chain, the trade-offs are
simpler.

> Are there many software implementations based on this assumption?
> At least in GCC it's not a big problem; we can adjust the code for the
> new micro-architecture level.

The glibc framework can deal with alternate choices in principle,
although I'd prefer not to go there for the reasons indicated.

>> applicable both forwards and backwards? If a new level is wanted here, then
>> I guess it could only be something like v3.5.

> But if we use AVX10.1 as v3.5, it's still not a subset of x86-64-v4
> (AVX10.1 contains AVX512FP16, AVX512BF16, etc., which are not in
> x86-64-v4), so there will still be a divergence.
> Then the 256-bit subset of x86-64-v4 as v3.5? That's too weird to me.

The question is whether you want to mandate the 16-bit floating point
extensions.  You might get better adoption if you stay compatible with
shipping CPUs.  Furthermore, the 256-bit tuning apparently benefits
current Intel CPUs, even though they can do 512-bit vectors.

(The thread subject is a bit misleading for this sub-topic, by the way.)

Thanks,
Florian



Re: [PATCH] i386: Clear upper bits of XMM register for V4HFmode/V2HFmode operations [PR110762]

2023-08-09 Thread Uros Bizjak via Gcc-patches
On Mon, Aug 7, 2023 at 1:20 PM Richard Biener
 wrote:

> > Please also note the RFC patch [1] that relaxes clears for V2SFmode
> > with -fno-trapping-math. The patched compiler will then emit the same
> > code as clang does for -O2. Which raises another question - should gcc
> > default to -fno-trapping-math?
>
> I think we discussed this before and yes, IMHO we should default to
> -fno-trapping-math at least for C/C++ to be consistent with our other
> handling of the FP environment (default to -fno-rounding-math) and
> lack of proper FENV access barriers for inspecting the exceptions.
>
> Note Fortran has the -ffpe-trap= option which would then need to make
> sure to also enable -ftrapping-math.  Ada might have similar constraints
> (it also uses -fnon-call-exceptions, but unless it enables CPU traps for
> FP exceptions that would be a no-op).  Note this also shows we should
> possibly separate maintaining the IEEE exception state and considering
> changes in the IEEE exception states to cause CPU traps (that's also
> a source of common confusion on the user side).

FTR: PR54192, "-fno-trapping-math by default?" [1]

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54192
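
For a concrete picture of what -ftrapping-math is meant to protect, a small
illustrative example (mine, not from the PR): code that inspects the IEEE
exception flags after an FP operation.  Under -fno-trapping-math the compiler
may assume FP operations raise no exceptions, so without proper FENV access
barriers the flag test below can become unreliable.

#include <fenv.h>
#include <stdio.h>

int
main (void)
{
  volatile double num = 1.0, den = 0.0;
  feclearexcept (FE_ALL_EXCEPT);
  volatile double r = num / den;   /* should raise FE_DIVBYZERO */
  (void) r;
  if (fetestexcept (FE_DIVBYZERO))
    puts ("FE_DIVBYZERO is set");
  return 0;
}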

Uros.


Re: Intel AVX10.1 Compiler Design and Support

2023-08-09 Thread Hongtao Liu via Gcc-patches
On Wed, Aug 9, 2023 at 5:15 PM Florian Weimer  wrote:
>
> * Hongtao Liu:
>
> > On Wed, Aug 9, 2023 at 3:17 PM Jan Beulich  wrote:
> >> Aiui these ABI levels were intended to be incremental, i.e. higher versions
> >> would include everything earlier ones cover. Without such a guarantee, how
> >> would you propose compatibility checks to be implemented in a way
>
> Correct, this was the intent.  But it's mostly to foster adoption and
> make it easier for developers to pick the variants that they want to
> target for custom builds.  If it's an ascending chain, the trade-offs are
> simpler.
>
> > Are there many software implementations based on this assumption?
> > At least in GCC it's not a big problem; we can adjust the code for the
> > new micro-architecture level.
>
> The glibc framework can deal with alternate choices in principle,
> although I'd prefer not to go there for the reasons indicated.
>
> >> applicable both forwards and backwards? If a new level is wanted here, then
> >> I guess it could only be something like v3.5.
>
> > But if we use AVX10.1 as v3.5, it's still not a subset of x86-64-v4
> > (AVX10.1 contains AVX512FP16, AVX512BF16, etc., which are not in
> > x86-64-v4), so there will still be a divergence.
> > Then the 256-bit subset of x86-64-v4 as v3.5? That's too weird to me.
>
> The question is whether you want to mandate the 16-bit floating point
> extensions.  You might get better adoption if you stay compatible with
> shipping CPUs.  Furthermore, the 256-bit tuning apparently benefits
> current Intel CPUs, even though they can do 512-bit vectors.
It's not only 16-bit floating point; there's a whole picture of AVX512->AVX10 in
Figure 1-1, "Intel® AVX-512 Feature Flags Across Intel® Xeon® Processor
Generations vs. Intel® AVX10",
and Figure 1-2, "Intel® ISA Families and Features",
at https://cdrdv2.intel.com/v1/dl/getContent/784343 (this link is a
direct download of the PDF).



>
> (The thread subject is a bit misleading for this sub-topic, by the way.)
>
> Thanks,
> Florian
>


-- 
BR,
Hongtao


RE: Intel AVX10.1 Compiler Design and Support

2023-08-09 Thread Zhang, Annita via Gcc-patches


> -Original Message-
> From: Florian Weimer 
> Sent: Wednesday, August 9, 2023 5:16 PM
> To: Hongtao Liu 
> Cc: Beulich, Jan ; Jiang, Haochen
> ; gcc-patches@gcc.gnu.org; ubiz...@gmail.com;
> Liu, Hongtao ; Zhang, Annita
> ; Wang, Phoebe ; x86-
> 64-abi ; llvm-dev ;
> Craig Topper ; Joseph Myers
> 
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> * Hongtao Liu:
> 
> > On Wed, Aug 9, 2023 at 3:17 PM Jan Beulich  wrote:
> >> Aiui these ABI levels were intended to be incremental, i.e. higher
> >> versions would include everything earlier ones cover. Without such a
> >> guarantee, how would you propose compatibility checks to be
> >> implemented in a way
> 
> Correct, this was the intent.  But it's mostly to foster adoption and make it
> easier for developers to pick the variants that they want to target for
> custom builds.  If it's an ascending chain, the trade-offs are simpler.
> 
> > Are there many software implementations based on this assumption?
> > At least in GCC it's not a big problem; we can adjust the code for the
> > new micro-architecture level.
> 
> The glibc framework can deal with alternate choices in principle, although I'd
> prefer not to go there for the reasons indicated.
> 
> >> applicable both forwards and backwards? If a new level is wanted
> >> here, then I guess it could only be something like v3.5.
> 
> > But if we use AVX10.1 as v3.5, it's still not a subset of x86-64-v4
> > (AVX10.1 contains AVX512FP16, AVX512BF16, etc., which are not in
> > x86-64-v4), so there will still be a divergence.
> > Then the 256-bit subset of x86-64-v4 as v3.5? That's too weird to me.
> 
> The question is whether you want to mandate the 16-bit floating point
> extensions.  You might get better adoption if you stay compatible with shipping
> CPUs.  Furthermore, the 256-bit tuning apparently benefits current Intel CPUs,
> even though they can do 512-bit vectors.
> 
> (The thread subject is a bit misleading for this sub-topic, by the way.)
> 
> Thanks,
> Florian

Since 256-bit and 512-bit have diverged starting with AVX10.1 and will continue
to diverge in future AVX10 versions, I think it's hard to keep a single version
number that covers both and increases monotonically.  Hence I'd like to suggest
x86-64-v5 for 512-bit and x86-64-v5-256 for 256-bit, and so on.

Thx,
Annita



 


Re: [PATCH] vect: Add a popcount fallback.

2023-08-09 Thread Robin Dapp via Gcc-patches
> We seem to be looking at promotions of the call argument, lhs_type
> is the same as the type of the call LHS.  But the comment mentions .POPCOUNT
> and the following code also handles others, so maybe handling should be
> moved.  Also when we look to vectorize popcount (x) instead of popcount((T)x)
> we can simply promote the result accordingly.

IMHO lhs_type is the type of the conversion

  lhs_oprnd = gimple_assign_lhs (last_stmt);
  lhs_type = TREE_TYPE (lhs_oprnd);

and rhs/unprom_diff has the type of the call's input argument

  rhs_oprnd = gimple_call_arg (call_stmt, 0);
  vect_look_through_possible_promotion (vinfo, rhs_oprnd, &unprom_diff);

So we can potentially have
  T0 arg
  T1 in = (T1)arg
  T2 ret = __builtin_popcount (in)
  T3 lhs = (T3)ret

and we're checking if precision (T0) == precision (T3).

This will never be true for a proper __builtin_popcountll except if
the return value is cast to uint64_t (which I just happened to do
in my test...).  Therefore it still doesn't really make sense to me.
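
For instance, the check is satisfied in a (hypothetical) variant like the
following, where the argument is already 64 bits wide and the result is cast
back to a 64-bit type, so precision (T0) == precision (T3):

#include <stdint.h>

void
popcount64 (uint64_t *restrict dst, uint64_t *restrict src, int n)
{
  for (int i = 0; i < n; i++)
    /* T0 == T1 == uint64_t, T2 == int (the return type of
       __builtin_popcountll), T3 == uint64_t.  */
    dst[i] = (uint64_t) __builtin_popcountll (src[i]);
}

Whereas storing the result into a plain int (precision 32) fails the
precision check and the pattern is rejected.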

Interestingly though, it helps for an aarch64 __builtin_popcountll
testcase where we abort here and then manage to vectorize via
vectorizable_call.  When we skip this check, recognition succeeds
and replaces the call with the pattern.  Then scalar costs are lower
than in the vectorizable_call case because __builtin_popcountll is
not STMT_VINFO_RELEVANT_P anymore (not live or so?).
Then, vectorization costs are too high compared to the wrong scalar
costs and we don't vectorize... Odd, might require fixing separately.
We might need to calculate the scalar costs in advance?

> It looks like vect_recog_popcount_clz_ctz_ffs_pattern is specifcally for
> the conversions, so your fallback should possibly apply even when not
> matching them.

Mhm, yes, it appears to only match when casting the return value to
something other than an int.  So we'd need a fallback in vectorizable_call?
And it would potentially look a bit out of place there, only handling
popcount and not ctz, clz, ...  Not sure if it is worth it then?

Regards
 Robin



Re: [pushed][LRA] Check input insn pattern hard regs against early clobber hard regs for live info

2023-08-09 Thread SenthilKumar.Selvaraj--- via Gcc-patches
On Fri, 2023-08-04 at 09:16 -0400, Vladimir Makarov wrote:
> 
> The following patch fixes a problem found by LRA port for avr target.
> The problem description is in the commit message.
> 
> The patch was successfully bootstrapped and tested on x86-64 and aarch64.

I can confirm it fixes the problem on avr - thank you.

Regards
Senthil


[PATCH] RISC-V: Fix VLMAX AVL incorrect local anticipate [VSETVL PASS]

2023-08-09 Thread Juzhe-Zhong
I realized we have a bug in the VSETVL pass which is triggered by
strided_load_run-1.c on RV32 systems.

FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c 
execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c 
execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c 
execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c 
execution test

This is because the VSETVL pass incorrectly hoists a vsetvl instruction:

...
   10156:   0d9075d7    vsetvli  a1,zero,e64,m2,ta,ma   ---> pollutes 'a1', which is used by following insns
   1015a:   01d586b3    add      a3,a1,t4                ---> uses 'a1'
   1015e:   5e070257    vmv.v.v  v4,v14
   10162:   b7032257    vmacc.vv v4,v6,v16
   10166:   26440257    vand.vv  v4,v4,v8
   1016a:   22880227    vs2r.v   v4,(a6)
   1016e:   00b6b7b3    sltu     a5,a3,a1
   10172:   22888227    vs2r.v   v4,(a7)
   10176:   9e60b157    vmv2r.v  v2,v6
   1017a:   97ba        add      a5,a5,a4
   1017c:   a6a62157    vmadd.vv v2,v12,v10
   10180:   26240157    vand.vv  v2,v2,v8
   10184:   22830127    vs2r.v   v2,(t1)
   10188:   873e        mv       a4,a5
   1018a:   982a        add      a6,a6,a0
   1018c:   98aa        add      a7,a7,a0
   1018e:   932a        add      t1,t1,a0
   10190:   85b6        mv       a1,a3                   ---> sets 'a1'
...

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (anticipatable_occurrence_p): Fix 
incorrect anticipate info.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c: 
Adapt test.
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-24.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-36.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-14.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-15.c: Ditto.

---
 gcc/config/riscv/riscv-vsetvl.cc  |  4 ++-
 .../gather-scatter/strided_load_run-1.c   |  1 +
 .../riscv/rvv/vsetvl/vlmax_back_prop-24.c |  2 +-
 .../riscv/rvv/vsetvl/vlmax_back_prop-25.c | 31 +--
 .../riscv/rvv/vsetvl/vlmax_back_prop-26.c | 30 +-
 .../riscv/rvv/vsetvl/vlmax_back_prop-36.c |  2 +-
 .../riscv/rvv/vsetvl/vlmax_switch_vtype-14.c  | 10 +++---
 .../riscv/rvv/vsetvl/vlmax_switch_vtype-15.c  | 14 -
 8 files changed, 47 insertions(+), 47 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 628bf116db0..08c487d82c0 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -333,7 +333,9 @@ anticipatable_occurrence_p (const bb_info *bb, const 
vector_insn_info dem)
   if (dem.has_avl_reg ())
 {
   /* rs1 (avl) are not modified in the basic block prior to the VSETVL.  */
-  if (!vlmax_avl_p (dem.get_avl ()))
+  rtx avl
+   = has_vl_op (insn->rtl ()) ? get_vl (insn->rtl ()) : dem.get_avl ();
+  if (!vlmax_avl_p (avl))
{
  set_info *set = dem.get_avl_source ();
  /* If it's undefined, it's not anticipatable conservatively.  */
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c
 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c
index 4b03c25a907..7ffa93bf13f 100644
--- 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c
+++ 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c
@@ -1,4 +1,5 @@
 /* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "-mcmodel=medany" } */
 
 #include "strided_load-1.c"
 #include 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-24.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-24.c
index bc98e5f8269..fe41d15cb28 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-24.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-24.c
@@ -30,7 +30,7 @@ void f (int32_t * restrict in, int32_t * restrict out, int n, 
int cond)
   *(vint32mf2_t*)(out + 7000) = v;
  
   for (int i = 0; i < n; i++) {
-vbool64_t v;
+vbool64_t v = *(vbool64_t*)(in + i + 9000);
 *(vbool64_t*)(out + i + 700) = v;
   }
 }
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c
index 0a10827daf5..c566f8a4751 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/

Re: [PATCH] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-09 Thread Richard Biener via Gcc-patches
On Wed, 9 Aug 2023, juzhe.zh...@rivai.ai wrote:

> From: Ju-Zhe Zhong 
> 
> Hi, this patch is adding loop len control on extract_last autovectorization.
> 
> Consider this following case:
> 
> #include 
> 
> #define EXTRACT_LAST(TYPE)\
>   TYPE __attribute__ ((noinline, noclone))\
>   test_##TYPE (TYPE *x, int n, TYPE value)\
>   {   \
> TYPE last;\
> for (int j = 0; j < n; ++j)   \
>   {   \
>   last = x[j];\
>   x[j] = last * value;\
>   }   \
> return last;  \
>   }
> 
> #define TEST_ALL(T)   \
>   T (uint8_t) \
> 
> TEST_ALL (EXTRACT_LAST)
> 
> ARM SVE IR:
> 
> Preheader:
>   max_mask_34 = .WHILE_ULT (0, bnd.5_6, { 0, ... });
> 
> Loop:
>   ...
>   # loop_mask_22 = PHI 
>   ...
>   vect_last_12.8_23 = .MASK_LOAD (_7, 8B, loop_mask_22);
>   vect__4.9_27 = vect_last_12.8_23 * vect_cst__26;
>   .MASK_STORE (_7, 8B, loop_mask_22, vect__4.9_27);
>   ...
>   next_mask_35 = .WHILE_ULT (_1, bnd.5_6, { 0, ... });
>   ...
> 
> Epilogue:
>   _25 = .EXTRACT_LAST (loop_mask_22, vect_last_12.8_23);
> 
> For RVV since we prefer len in loop control, after this patch for RVV:
> 
> Loop:
>   ...
>   loop_len_22 = SELECT_VL;
>   vect_last_12.8_23 = .MASK_LOAD (_7, 8B, loop_len_22);
>   vect__4.9_27 = vect_last_12.8_23 * vect_cst__26;
>   .MASK_STORE (_7, 8B, loop_len_22, vect__4.9_27);
>   ...
> 
> Epilogue:
>   _25 = .EXTRACT_LAST (loop_len_22, vect_last_12.8_23);
> 
> This patch didn't add a new pattern for length loop control of extract_last.
> Instead we reuse current extract_last.
> 
> Here is the code:
> 
> Step 1 - Enable length and record length for extract_last:
> 
> +   machine_mode vec_mode = TYPE_MODE (vectype);
> +   if (get_len_load_store_mode (vec_mode, true).exists (&vec_mode))
> + vect_record_loop_len (loop_vinfo,
> +   &LOOP_VINFO_LENS (loop_vinfo), 1,
> +   vectype, 1);
> +   else
> + vect_record_loop_mask (loop_vinfo,
> +&LOOP_VINFO_MASKS (loop_vinfo), 1,
> +vectype, NULL);
> 
> We use 'get_len_load_store_mode' to check whether targets support loop len 
> control or not.
> If yes, record a loop len.
> 
> Step 2 - Build EXTRACT_LAST with len:
> 
> -   tree mask = vect_get_loop_mask (loop_vinfo, gsi,
> -   &LOOP_VINFO_MASKS (loop_vinfo),
> -   1, vectype, 0);
> +   tree control;
> +   if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
> + control = vect_get_loop_len (loop_vinfo, gsi,
> +  &LOOP_VINFO_LENS (loop_vinfo), 1,
> +  vectype, 0, 0);
> +   else
> + control = vect_get_loop_mask (loop_vinfo, gsi,
> +   &LOOP_VINFO_MASKS (loop_vinfo), 1,
> +   vectype, 0);
> tree scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
> -   mask, vec_lhs_phi);
> +   control, vec_lhs_phi);
> 
> Reuse the current codes (build EXTRACT_LAST with mask), build length instead 
> if
> 'LOOP_VINFO_FULLY_WITH_LENGTH_P' is true.
> 
> This patch has been fully tested in RISC-V port.
> 
> Bootstrap and Regression on X86 passed.
> 
> Ok for trunk ?
> 
> gcc/ChangeLog:
> 
> * tree-vect-loop.cc (vectorizable_live_operation): Add length control.
> 
> ---
>  gcc/tree-vect-loop.cc | 40 
>  1 file changed, 28 insertions(+), 12 deletions(-)
> 
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 00058c3c13e..fde098cafde 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -10311,9 +10311,15 @@ vectorizable_live_operation (vec_info *vinfo,
> else
>   {
> gcc_assert (ncopies == 1 && !slp_node);
> -   vect_record_loop_mask (loop_vinfo,
> -  &LOOP_VINFO_MASKS (loop_vinfo),
> -  1, vectype, NULL);
> +   machine_mode vec_mode = TYPE_MODE (vectype);
> +   if (get_len_load_store_mode (vec_mode, true).exists (&vec_mode))
> + vect_record_loop_len (loop_vinfo,
> +   &LOOP_VINFO_LENS (loop_vinfo), 1,
> +   vectype, 1);
> +   else
> + vect_record_loop_mask (loop_vinfo,
> +&LOOP_VINFO_MASKS (loop_vinfo), 1,
> +   

[PATCH] Remove insert location argument from vectorizable_live_operation

2023-08-09 Thread Richard Biener via Gcc-patches
The insert location argument isn't actually used; we compute
that ourselves.  There's a single spot, namely when asking
for the loop mask via vect_get_loop_mask, where the passed argument
is used, but that looks like an oversight.  The following fixes that
and adjusts vectorizable_live_operation and can_vectorize_live_stmts
to no longer take a stmt iterator argument.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-vectorizer.h (vectorizable_live_operation): Remove
gimple_stmt_iterator * argument.
* tree-vect-loop.cc (vectorizable_live_operation): Likewise.
Adjust plumbing around vect_get_loop_mask.
(vect_analyze_loop_operations): Adjust.
* tree-vect-slp.cc (vect_slp_analyze_node_operations_1): Likewise.
(vect_bb_slp_mark_live_stmts): Likewise.
(vect_schedule_slp_node): Likewise.
* tree-vect-stmts.cc (can_vectorize_live_stmts): Likewise.
Remove gimple_stmt_iterator * argument.
(vect_transform_stmt): Adjust.
---
 gcc/tree-vect-loop.cc  | 12 ++--
 gcc/tree-vect-slp.cc   |  8 +++-
 gcc/tree-vect-stmts.cc | 14 ++
 gcc/tree-vectorizer.h  |  3 +--
 4 files changed, 16 insertions(+), 21 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index c2241ed0eb4..1cd6bb43194 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -2061,8 +2061,7 @@ vect_analyze_loop_operations (loop_vec_info loop_vinfo)
  if (ok
  && STMT_VINFO_LIVE_P (stmt_info)
  && !PURE_SLP_STMT (stmt_info))
-   ok = vectorizable_live_operation (loop_vinfo,
- stmt_info, NULL, NULL, NULL,
+   ok = vectorizable_live_operation (loop_vinfo, stmt_info, NULL, NULL,
  -1, false, &cost_vec);
 
   if (!ok)
@@ -10190,9 +10189,7 @@ vectorizable_induction (loop_vec_info loop_vinfo,
it can be supported.  */
 
 bool
-vectorizable_live_operation (vec_info *vinfo,
-stmt_vec_info stmt_info,
-gimple_stmt_iterator *gsi,
+vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
 slp_tree slp_node, slp_instance slp_node_instance,
 int slp_index, bool vec_stmt_p,
 stmt_vector_for_cost *cost_vec)
@@ -10398,9 +10395,12 @@ vectorizable_live_operation (vec_info *vinfo,
 the loop mask for the final iteration.  */
  gcc_assert (ncopies == 1 && !slp_node);
  tree scalar_type = TREE_TYPE (STMT_VINFO_VECTYPE (stmt_info));
- tree mask = vect_get_loop_mask (loop_vinfo, gsi,
+ gimple_seq tem = NULL;
+ gimple_stmt_iterator gsi = gsi_last (tem);
+ tree mask = vect_get_loop_mask (loop_vinfo, &gsi,
  &LOOP_VINFO_MASKS (loop_vinfo),
  1, vectype, 0);
+ gimple_seq_add_seq (&stmts, tem);
  tree scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
  mask, vec_lhs_phi);
 
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 6cfcda202e9..18119c09965 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -6014,8 +6014,7 @@ vect_slp_analyze_node_operations_1 (vec_info *vinfo, 
slp_tree node,
   FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, slp_stmt_info)
{
  if (STMT_VINFO_LIVE_P (slp_stmt_info)
- && !vectorizable_live_operation (vinfo,
-  slp_stmt_info, NULL, node,
+ && !vectorizable_live_operation (vinfo, slp_stmt_info, node,
   node_instance, i,
   false, cost_vec))
return false;
@@ -6332,7 +6331,7 @@ vect_bb_slp_mark_live_stmts (bb_vec_info bb_vinfo, 
slp_tree node,
  {
STMT_VINFO_LIVE_P (stmt_info) = true;
if (vectorizable_live_operation (bb_vinfo, stmt_info,
-NULL, node, instance, i,
+node, instance, i,
 false, cost_vec))
  /* ???  So we know we can vectorize the live stmt
 from one SLP node.  If we cannot do so from all
@@ -9049,8 +9048,7 @@ vect_schedule_slp_node (vec_info *vinfo,
   FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, slp_stmt_info)
if (STMT_VINFO_LIVE_P (slp_stmt_info))
  {
-   done = vectorizable_live_operation (vinfo,
-   slp_stmt_info, &si, node,
+   done = vectorizable_live_operation (vinfo, slp_stmt_info, node,
instanc

[Patch, fortran] PR109684 - compiling failure: complaining about a final subroutine of a type being not PURE (while it is indeed PURE)

2023-08-09 Thread Paul Richard Thomas via Gcc-patches
Committed to trunk as 'obvious' in
r14-3098-gb8ec3c952324f866f191883473922e250be81341

13-branch to follow in a few days.

Paul


Re: Re: [PATCH] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-09 Thread juzhe.zh...@rivai.ai
Hi, Richi.

>> that should be

>>   || (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
>>   && !LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))

>> I think.  It seems to imply that SLP isn't supported with
>> masking/lengthing.

Oh, yes.  At first glance, the original code is quite suspicious and your 
comments make sense to me.

>> Hum, how does CFN_EXTRACT_LAST handle both mask and length transparently?
>> Don't you need some CFN_LEN_EXTRACT_LAST instead?

I think CFN_EXTRACT_LAST always has either loop mask or loop len.

When neither mask nor length is needed, IMHO the current BIT_FIELD_REF
flow is good enough:
https://godbolt.org/z/Yr5M9hcc6

So I think we don't need CFN_LEN_EXTRACT_LAST. 

Instead, I think we will need CFN_LEN_FOLD_EXTRACT_LAST in the next patch.

Feel free to correct me if I am wrong.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-08-09 19:00
To: Ju-Zhe Zhong
CC: gcc-patches; richard.sandiford
Subject: Re: [PATCH] VECT: Support loop len control on EXTRACT_LAST 
vectorization
On Wed, 9 Aug 2023, juzhe.zh...@rivai.ai wrote:
 
> From: Ju-Zhe Zhong 
> 
> Hi, this patch is adding loop len control on extract_last autovectorization.
> 
> Consider this following case:
> 
> #include 
> 
> #define EXTRACT_LAST(TYPE) \
>   TYPE __attribute__ ((noinline, noclone)) \
>   test_##TYPE (TYPE *x, int n, TYPE value) \
>   { \
> TYPE last; \
> for (int j = 0; j < n; ++j) \
>   { \
> last = x[j]; \
> x[j] = last * value; \
>   } \
> return last; \
>   }
> 
> #define TEST_ALL(T) \
>   T (uint8_t) \
> 
> TEST_ALL (EXTRACT_LAST)
> 
> ARM SVE IR:
> 
> Preheader:
>   max_mask_34 = .WHILE_ULT (0, bnd.5_6, { 0, ... });
> 
> Loop:
>   ...
>   # loop_mask_22 = PHI 
>   ...
>   vect_last_12.8_23 = .MASK_LOAD (_7, 8B, loop_mask_22);
>   vect__4.9_27 = vect_last_12.8_23 * vect_cst__26;
>   .MASK_STORE (_7, 8B, loop_mask_22, vect__4.9_27);
>   ...
>   next_mask_35 = .WHILE_ULT (_1, bnd.5_6, { 0, ... });
>   ...
> 
> Epilogue:
>   _25 = .EXTRACT_LAST (loop_mask_22, vect_last_12.8_23);
> 
> For RVV since we prefer len in loop control, after this patch for RVV:
> 
> Loop:
>   ...
>   loop_len_22 = SELECT_VL;
>   vect_last_12.8_23 = .MASK_LOAD (_7, 8B, loop_len_22);
>   vect__4.9_27 = vect_last_12.8_23 * vect_cst__26;
>   .MASK_STORE (_7, 8B, loop_len_22, vect__4.9_27);
>   ...
> 
> Epilogue:
>   _25 = .EXTRACT_LAST (loop_len_22, vect_last_12.8_23);
> 
> This patch didn't add a new pattern for length loop control of extract_last.
> Instead we reuse current extract_last.
> 
> Here is the code:
> 
> Step 1 - Enable length and record length for extract_last:
> 
> +   machine_mode vec_mode = TYPE_MODE (vectype);
> +   if (get_len_load_store_mode (vec_mode, true).exists (&vec_mode))
> + vect_record_loop_len (loop_vinfo,
> +   &LOOP_VINFO_LENS (loop_vinfo), 1,
> +   vectype, 1);
> +   else
> + vect_record_loop_mask (loop_vinfo,
> +&LOOP_VINFO_MASKS (loop_vinfo), 1,
> +vectype, NULL);
> 
> We use 'get_len_load_store_mode' to check whether targets support loop len 
> control or not.
> If yes, record a loop len.
> 
> Step 2 - Build EXTRACT_LAST with len:
> 
> -   tree mask = vect_get_loop_mask (loop_vinfo, gsi,
> -   &LOOP_VINFO_MASKS (loop_vinfo),
> -   1, vectype, 0);
> +   tree control;
> +   if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
> + control = vect_get_loop_len (loop_vinfo, gsi,
> + &LOOP_VINFO_LENS (loop_vinfo), 1,
> + vectype, 0, 0);
> +   else
> + control = vect_get_loop_mask (loop_vinfo, gsi,
> +   &LOOP_VINFO_MASKS (loop_vinfo), 1,
> +   vectype, 0);
>tree scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
> -   mask, vec_lhs_phi);
> +   control, vec_lhs_phi);
> 
> Reuse the current codes (build EXTRACT_LAST with mask), build length instead 
> if
> 'LOOP_VINFO_FULLY_WITH_LENGTH_P' is true.
> 
> This patch has been fully tested in RISC-V port.
> 
> Bootstrap and Regression on X86 passed.
> 
> Ok for trunk ?
> 
> gcc/ChangeLog:
> 
> * tree-vect-loop.cc (vectorizable_live_operation): Add length control.
> 
> ---
>  gcc/tree-vect-loop.cc | 40 
>  1 file changed, 28 insertions(+), 12 deletions(-)
> 
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 00058c3c13e..fde098cafde 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -10311,9 +10311,15 @@ vectorizable_live_operation (vec_info *vinfo,
>else
>  {
>gcc_assert (ncopies == 1 && !slp_node);
> -   vect_record_loop_mask (loop_vinfo,
> -  &LOOP_VINFO_MASKS (loop_vinfo),
> -  1, vectype, NULL);
> +   machine_mode vec_mode = TYPE_MODE (vectype);
> +   if (get_len_load_store_mode (vec_mode, true).exists (&vec_mode))
> + vect_record_loop_len (loop_vinfo,
> +   &LOOP_VINFO_LENS (loop_vinfo), 1,
> +   vectype, 1);
> +   else
> + vect_record_loop_mask (loop_vinfo,
> +&LOOP_VINFO_MASKS (loop_vinfo), 1,
> +vectype, NULL);
>  

Re: [PATCH] aarch64: SVE/NEON Bridging intrinsics

2023-08-09 Thread Richard Sandiford via Gcc-patches
Richard Ball  writes:
> ACLE has added intrinsics to bridge between SVE and Neon.
>
> The NEON_SVE Bridge adds intrinsics that allow conversions between NEON and
> SVE vectors.
>
> This patch adds support to GCC for the following 3 intrinsics:
> svset_neonq, svget_neonq and svdup_neonq
>
> gcc/ChangeLog:
>
>   * config.gcc: Adds new header to config.
>   * config/aarch64/aarch64-builtins.cc (GTY): Externs aarch64_simd_types.
>   * config/aarch64/aarch64-c.cc (aarch64_pragma_aarch64):
>   Defines pragma for arm_neon_sve_bridge.h.
>   * config/aarch64/aarch64-protos.h: New function.
>   * config/aarch64/aarch64-sve-builtins-base.h: New intrinsics.
>   * config/aarch64/aarch64-sve-builtins-base.cc
>   (class svget_neonq_impl): New intrinsic implementation.
>   (class svset_neonq_impl): Likewise.
>   (class svdup_neonq_impl): Likewise.
>   (NEON_SVE_BRIDGE_FUNCTION): New intrinsics.
>   * config/aarch64/aarch64-sve-builtins-functions.h
>   (NEON_SVE_BRIDGE_FUNCTION): Defines macro for NEON_SVE_BRIDGE 
> functions.
>   * config/aarch64/aarch64-sve-builtins-shapes.h: New shapes.
>   * config/aarch64/aarch64-sve-builtins-shapes.cc
>   (parse_neon_type): Parser for NEON types.
>   (parse_element_type): Add NEON element types.
>   (parse_type): Likewise.
>   (NEON_SVE_BRIDGE_SHAPE): Defines macro for NEON_SVE_BRIDGE shapes.
>   (struct get_neonq_def): Defines function shape for get_neonq.
>   (struct set_neonq_def): Defines function shape for set_neonq.
>   (struct dup_neonq_def): Defines function shape for dup_neonq.
>   * config/aarch64/aarch64-sve-builtins.cc (DEF_NEON_SVE_FUNCTION): 
> Defines
>   macro for NEON_SVE_BRIDGE functions.
>   (handle_arm_neon_sve_bridge_h): Handles #pragma arm_neon_sve_bridge.h.
>   * config/aarch64/aarch64-builtins.h: New header file to extern neon 
> types.
>   * config/aarch64/aarch64-neon-sve-bridge-builtins.def: New instrinsics
>   function def file.
>   * config/aarch64/arm_neon_sve_bridge.h: New header file.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.c-torture/execute/neon-sve-bridge.c: New test.
>
> #
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 
> d88071773c9e1280cc5f38e36e09573214323b48..ca55992200dbe58782c3dbf66906339de021ba6b
>  
> 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -334,7 +334,7 @@ m32c*-*-*)
>;;
>aarch64*-*-*)
>   cpu_type=aarch64
> - extra_headers="arm_fp16.h arm_neon.h arm_bf16.h arm_acle.h arm_sve.h"
> + extra_headers="arm_fp16.h arm_neon.h arm_bf16.h arm_acle.h arm_sve.h 
> arm_neon_sve_bridge.h"
>   c_target_objs="aarch64-c.o"
>   cxx_target_objs="aarch64-c.o"
>   d_target_objs="aarch64-d.o"
> diff --git a/gcc/config/aarch64/aarch64-builtins.h 
> b/gcc/config/aarch64/aarch64-builtins.h
> new file mode 100644
> index 
> ..eebde448f92c230c8f88b4da1ca8ebd9670b1536
> --- /dev/null
> +++ b/gcc/config/aarch64/aarch64-builtins.h
> @@ -0,0 +1,86 @@
> +/* Builtins' description for AArch64 SIMD architecture.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of GCC.
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   .  */
> +#ifndef GCC_AARCH64_BUILTINS_H
> +#define GCC_AARCH64_BUILTINS_H
> +#include "tree.h"

It looks like the include shouldn't be needed.  tree is forward-declared
in coretypes.h, which is included everywhere.

> +enum aarch64_type_qualifiers
> +{
> +  /* T foo.  */
> +  qualifier_none = 0x0,
> +  /* unsigned T foo.  */
> +  qualifier_unsigned = 0x1, /* 1 << 0  */
> +  /* const T foo.  */
> +  qualifier_const = 0x2, /* 1 << 1  */
> +  /* T *foo.  */
> +  qualifier_pointer = 0x4, /* 1 << 2  */
> +  /* Used when expanding arguments if an operand could
> + be an immediate.  */
> +  qualifier_immediate = 0x8, /* 1 << 3  */
> +  qualifier_maybe_immediate = 0x10, /* 1 << 4  */
> +  /* void foo (...).  */
> +  qualifier_void = 0x20, /* 1 << 5  */
> +  /* 1 << 6 is now unused */
> +  /* Some builtins should use the T_*mode* encoded in a simd_builtin_datum
> + rather than using the type of the operand.  */
> +  qualifier_map_mode = 0x80, /* 1 << 7  */
> +  /* qualifier_pointer | qualifier_map_mode  */
> +  quali

[PATCH v2 00/14] LoongArch: Add loongarch32 and ilp32 abi

2023-08-09 Thread Jiajie Chen via Gcc-patches
The patch series adds loongarch32 and ilp32 ABI support to GCC. One can
build libgcc, libatomic, glibc, etc., and generate a complete
loongarch32-unknown-linux-gnu toolchain with the minimal patches at:

- binutils: https://github.com/jiegec/binutils-gdb/tree/loongarch32
- glibc: https://github.com/jiegec/glibc/tree/loongarch32
- crosstool-ng: https://github.com/jiegec/crosstool-ng/tree/loongarch32

We will wait for the folks at Loongson to complete the ABI documentation
for the ilp32d ABI.

Changes since v1:

- Can build a complete toolchain with glibc
- Fix sizeof(long double) in ilp32 abi
- Fix ftintrz generation for loongarch32

Full changes:

Jiajie Chen (14):
  LoongArch: Introduce loongarch32 target
  LoongArch: Fix default ISA setting
  LoongArch: Fix SI division for loongarch32 target
  LoongArch: Fix movgr2frh.w operand order
  LoongArch: Fix 64-bit move for loongarch32 target
  LoongArch: Fix 64-bit immediate move for loongarch32 target
  LoongArch: Fix signed 32-bit overflow for loongarch32 target
  LoongArch: Disable SF/DF -> unsigned DI expand in loongarch32
  LoongArch: Add -march=loongarch64 to tests with -mabi=lp64d
  LoongArch: Forbid ADDRESS_REG_REG in loongarch32
  LoongArch: Mark am* instructions as LA64-only
  LoongArch: Set long double width to 128 in la32
  LoongArch: Fix ilp32 detection
  LoongArch: Allow ftintrz for DF->DI in loongarch32

 contrib/config-list.mk|  1 +
 gcc/config.gcc| 61 ---
 .../loongarch/genopts/loongarch-strings   |  5 ++
 gcc/config/loongarch/genopts/loongarch.opt.in | 12 
 gcc/config/loongarch/gnu-user.h   |  3 +
 gcc/config/loongarch/linux.h  |  8 ++-
 gcc/config/loongarch/loongarch-c.cc   | 12 
 gcc/config/loongarch/loongarch-def.c  | 33 ++
 gcc/config/loongarch/loongarch-def.h  | 25 +---
 gcc/config/loongarch/loongarch-driver.h   |  4 ++
 gcc/config/loongarch/loongarch-opts.cc| 27 ++--
 gcc/config/loongarch/loongarch-opts.h | 20 --
 gcc/config/loongarch/loongarch-str.h  |  5 ++
 gcc/config/loongarch/loongarch.cc |  7 ++-
 gcc/config/loongarch/loongarch.h  |  2 +-
 gcc/config/loongarch/loongarch.md | 39 ++--
 gcc/config/loongarch/loongarch.opt| 12 
 gcc/config/loongarch/sync.md  | 10 +--
 gcc/config/loongarch/t-linux  | 16 -
 gcc/testsuite/g++.target/loongarch/bytepick.C |  2 +-
 gcc/testsuite/g++.target/loongarch/pr106828.C |  2 +-
 .../gcc.target/loongarch/add-const.c  |  2 +-
 gcc/testsuite/gcc.target/loongarch/arch-1.c   |  5 ++
 gcc/testsuite/gcc.target/loongarch/arch-2.c   |  5 ++
 .../gcc.target/loongarch/array-ldx.c  |  6 ++
 .../gcc.target/loongarch/attr-model-1.c   |  2 +-
 .../gcc.target/loongarch/attr-model-2.c   |  2 +-
 .../gcc.target/loongarch/flt-abi-isa-1.c  |  2 +-
 gcc/testsuite/gcc.target/loongarch/fscaleb.c  |  2 +-
 .../gcc.target/loongarch/ftint-no-inexact.c   |  2 +-
 gcc/testsuite/gcc.target/loongarch/ftint.c|  2 +-
 .../gcc.target/loongarch/func-call-1.c|  2 +-
 .../gcc.target/loongarch/func-call-2.c|  2 +-
 .../gcc.target/loongarch/func-call-3.c|  2 +-
 .../gcc.target/loongarch/func-call-4.c|  2 +-
 .../gcc.target/loongarch/func-call-5.c|  2 +-
 .../gcc.target/loongarch/func-call-6.c|  2 +-
 .../gcc.target/loongarch/func-call-7.c|  2 +-
 .../gcc.target/loongarch/func-call-8.c|  2 +-
 .../loongarch/func-call-extreme-1.c   |  2 +-
 .../loongarch/func-call-extreme-2.c   |  2 +-
 .../gcc.target/loongarch/func-call-medium-1.c |  2 +-
 .../gcc.target/loongarch/func-call-medium-2.c |  2 +-
 .../gcc.target/loongarch/func-call-medium-3.c |  2 +-
 .../gcc.target/loongarch/func-call-medium-4.c |  2 +-
 .../gcc.target/loongarch/func-call-medium-5.c |  2 +-
 .../gcc.target/loongarch/func-call-medium-6.c |  2 +-
 .../gcc.target/loongarch/func-call-medium-7.c |  2 +-
 .../gcc.target/loongarch/func-call-medium-8.c |  2 +-
 gcc/testsuite/gcc.target/loongarch/imm-load.c |  2 +-
 .../gcc.target/loongarch/imm-load1.c  |  2 +-
 gcc/testsuite/gcc.target/loongarch/mulw_d_w.c |  2 +-
 .../gcc.target/loongarch/pr109465-1.c |  2 +-
 .../gcc.target/loongarch/pr109465-2.c |  2 +-
 .../gcc.target/loongarch/pr109465-3.c |  2 +-
 .../gcc.target/loongarch/prolog-opt.c |  2 +-
 .../loongarch/relocs-symbol-noaddend.c|  2 +-
 .../loongarch/zero-size-field-pass.c  |  2 +-
 .../loongarch/zero-size-field-ret.c   |  2 +-
 libitm/config/loongarch/asm.h |  2 +-
 60 files changed, 298 insertions(+), 96 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/arch-1.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/arch-2.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/array-ldx.c

-- 
2.41.0



[PATCH v2 02/14] LoongArch: Fix default ISA setting

2023-08-09 Thread Jiajie Chen via Gcc-patches
When loongarch_arch_target is called, la_target has not been
initialized, so the macro LARCH_ACTUAL_ARCH always equals zero.

This commit fixes that by expanding the macro and reading the latest value.
It permits -march=loongarch64 when the default target is loongarch32 and
vice versa.

gcc/ChangeLog:

* config/loongarch/loongarch-opts.cc (loongarch_config_target):
Fix -march detection.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/arch-1.c: New test.
* gcc.target/loongarch/arch-2.c: New test.
---
 gcc/config/loongarch/loongarch-opts.cc  | 5 -
 gcc/testsuite/gcc.target/loongarch/arch-1.c | 5 +
 gcc/testsuite/gcc.target/loongarch/arch-2.c | 5 +
 3 files changed, 14 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/arch-1.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/arch-2.c

diff --git a/gcc/config/loongarch/loongarch-opts.cc 
b/gcc/config/loongarch/loongarch-opts.cc
index 9fc0bbbcb6e..29c0c4468bb 100644
--- a/gcc/config/loongarch/loongarch-opts.cc
+++ b/gcc/config/loongarch/loongarch-opts.cc
@@ -246,7 +246,10 @@ loongarch_config_target (struct loongarch_target *target,
 config_target_isa:
 
   /* Get default ISA from "-march" or its default value.  */
-  t.isa = loongarch_cpu_default_isa[LARCH_ACTUAL_ARCH];
+  if (t.cpu_arch == TARGET_ARCH_NATIVE)
+t.isa = loongarch_cpu_default_isa[t.cpu_native];
+  else
+t.isa = loongarch_cpu_default_isa[t.cpu_arch];
 
   /* Apply incremental changes.  */
   /* "-march=native" overrides the default FPU type.  */
diff --git a/gcc/testsuite/gcc.target/loongarch/arch-1.c 
b/gcc/testsuite/gcc.target/loongarch/arch-1.c
new file mode 100644
index 000..379036ec76f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/arch-1.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-march=loongarch64 -mabi=lp64d" } */
+int foo()
+{
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/arch-2.c 
b/gcc/testsuite/gcc.target/loongarch/arch-2.c
new file mode 100644
index 000..55d646902a6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/arch-2.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-march=loongarch32 -mabi=ilp32d" } */
+int foo()
+{
+}
-- 
2.41.0



[PATCH v2 05/14] LoongArch: Fix 64-bit move for loongarch32 target

2023-08-09 Thread Jiajie Chen via Gcc-patches
Bring back 64-bit move splitting for loongarch32.  The code was removed
in commit 16fc26d4e7a (`LoongArch: Support split symbol.`) for an unknown
reason.

gcc/ChangeLog:

* config/loongarch/loongarch.md: Handle move splitting for
64-bit operands.
---
 gcc/config/loongarch/loongarch.md | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 93d8bf5bcca..9eb6bb75c35 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -1965,6 +1965,16 @@
   [(set_attr "move_type" "move,load,store")
(set_attr "mode" "DF")])
 
+(define_split
+  [(set (match_operand:MOVE64 0 "nonimmediate_operand")
+   (match_operand:MOVE64 1 "move_operand"))]
+  "reload_completed && loongarch_split_move_p (operands[0], operands[1])"
+  [(const_int 0)]
+{
+  loongarch_split_move (operands[0], operands[1], curr_insn);
+  DONE;
+})
+
 ;; Emit a doubleword move in which exactly one of the operands is
 ;; a floating-point register.  We can't just emit two normal moves
 ;; because of the constraints imposed by the FPU register model;
-- 
2.41.0



[PATCH v2 04/14] LoongArch: Fix movgr2frh.w operand order

2023-08-09 Thread Jiajie Chen via Gcc-patches
The operand order of movgr2frh.w was wrong. The correct order should be
`movgr2frh.w fd, rj`.

gcc/ChangeLog:

* config/loongarch/loongarch.md (movgr2frh): Correct
movgr2frh.w operand order.
---
 gcc/config/loongarch/loongarch.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 95c5b25d22a..93d8bf5bcca 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2297,7 +2297,7 @@
(match_operand:SPLITF 2 "register_operand" "0")]
UNSPEC_MOVGR2FRH))]
   "TARGET_DOUBLE_FLOAT"
-  "movgr2frh.w\t%z1,%0"
+  "movgr2frh.w\t%0,%z1"
   [(set_attr "move_type" "mgtf")
(set_attr "mode" "")])
 
-- 
2.41.0



[PATCH v2 01/14] LoongArch: Introduce loongarch32 target

2023-08-09 Thread Jiajie Chen via Gcc-patches
Introduce the loongarch32 target and the ilp32 ABI variants.  The ilp32d
ABI variant is selected as the default ABI of the loongarch32 target.

Currently, the ilp32 ABI can only be used with loongarch32, but in the
future it might become possible to use it with loongarch64 as well.

contrib/ChangeLog:

* config-list.mk: Add loongarch32-linux-gnu*.

gcc/ChangeLog:

* config.gcc: Add target triple loongarch32-*-*-* and
corresponding abi ilp32f, ilp32d and ilp32s.
* config/loongarch/genopts/loongarch-strings: Add strings for
loongarch32 and ilp32 abi variants.
* config/loongarch/genopts/loongarch.opt.in: Add
-march=loongarch32 and -mabi=ilp32d/ilp32f/ilp32s.
* config/loongarch/gnu-user.h: Add ilp32 abi variants to spec.
* config/loongarch/linux.h: Add ABI_LIBDIR for ilp32 abi
variants.
* config/loongarch/loongarch-c.cc (loongarch_cpu_cpp_builtins):
Add builtin definitions for loongarch32 target.
* config/loongarch/loongarch-def.c: Add loongarch32 and ilp32
definitions.
* config/loongarch/loongarch-def.h: Add loongarch32 and ilp32
definitions.
* config/loongarch/loongarch-driver.h: Add ilp32 abi variants to
spec.
* config/loongarch/loongarch-opts.cc: Handle ilp32 abi variants.
* config/loongarch/loongarch-opts.h: Add loongarch32 case to
macros.
* config/loongarch/loongarch-str.h: Add loongarch32 and ilp32
strings.
* config/loongarch/loongarch.cc: Disable -fpcc-struct-return for
ilp32.
* config/loongarch/loongarch.opt: Add -march=loongarch32 and
-mabi=ilp32d/ilp32f/ilp32s.
* config/loongarch/t-linux: Add ilp32 abi variants to multilib.
---
 contrib/config-list.mk|  1 +
 gcc/config.gcc| 61 ---
 .../loongarch/genopts/loongarch-strings   |  5 ++
 gcc/config/loongarch/genopts/loongarch.opt.in | 12 
 gcc/config/loongarch/gnu-user.h   |  3 +
 gcc/config/loongarch/linux.h  |  8 ++-
 gcc/config/loongarch/loongarch-c.cc   | 12 
 gcc/config/loongarch/loongarch-def.c  | 33 ++
 gcc/config/loongarch/loongarch-def.h  | 25 +---
 gcc/config/loongarch/loongarch-driver.h   |  4 ++
 gcc/config/loongarch/loongarch-opts.cc| 22 ++-
 gcc/config/loongarch/loongarch-opts.h | 20 --
 gcc/config/loongarch/loongarch-str.h  |  5 ++
 gcc/config/loongarch/loongarch.cc |  2 +-
 gcc/config/loongarch/loongarch.opt| 12 
 gcc/config/loongarch/t-linux  | 16 -
 16 files changed, 210 insertions(+), 31 deletions(-)

diff --git a/contrib/config-list.mk b/contrib/config-list.mk
index e570b13c71b..3c00ce5410a 100644
--- a/contrib/config-list.mk
+++ b/contrib/config-list.mk
@@ -57,6 +57,7 @@ LIST = aarch64-elf aarch64-freebsd13 aarch64-linux-gnu 
aarch64-rtems \
   i686-cygwinOPT-enable-threads=yes i686-mingw32crt ia64-elf \
   ia64-linux ia64-hpux ia64-hp-vms iq2000-elf lm32-elf \
   lm32-rtems lm32-uclinux \
+  loongarch32-linux-gnuf64 loongarch32-linux-gnuf32 loongarch32-linux-gnusf \
   loongarch64-linux-gnuf64 loongarch64-linux-gnuf32 loongarch64-linux-gnusf \
   m32c-elf m32r-elf m32rle-elf \
   m68k-elf m68k-netbsdelf \
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 415e0e1ebc5..45e69b24b44 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4901,10 +4901,24 @@ case "${target}" in
arch_pattern arch_default   \
fpu_pattern  fpu_default\
tune_pattern tune_default   \
-   triplet_os   triplet_abi
+   triplet_os   triplet_abi\
+   triplet_cpu
 
# Infer ABI from the triplet.
case ${target} in
+   loongarch32-*-*-*f64)
+   abi_pattern="ilp32d"
+   ;;
+   loongarch32-*-*-*f32)
+   abi_pattern="ilp32f"
+   ;;
+   loongarch32-*-*-*sf)
+   abi_pattern="ilp32s"
+   ;;
+   loongarch32-*-*-*)
+   abi_pattern="ilp32[dfs]"
+   abi_default="ilp32d"
+   ;;
loongarch64-*-*-*f64)
abi_pattern="lp64d"
;;
@@ -4939,7 +4953,7 @@ case "${target}" in
 
# Perform initial sanity checks on --with-* options.
case ${with_arch} in
-   "" | loongarch64 | la464) ;; # OK, append here.
+   "" | loongarch32 | loongarch64 | la464) ;; # OK, append here.
native)
if test x${host} != x${target}; then
echo "--with-arch=native is illegal for 
cross-compiler." 1>

[PATCH v2 12/14] LoongArch: Set long double width to 128 in la32

2023-08-09 Thread Jiajie Chen via Gcc-patches
According to the latest LoongArch procedure call standard, long double
is 128 bits wide under the ilp32 data model, regardless of target bitness.

gcc/ChangeLog:

* config/loongarch/loongarch.h: Set LONG_DOUBLE_TYPE_SIZE to 128
regardless of target bitness.
---
 gcc/config/loongarch/loongarch.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index eca723293a1..ab0c80c69c1 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -205,7 +205,7 @@ along with GCC; see the file COPYING3.  If not see
 
 #define FLOAT_TYPE_SIZE 32
 #define DOUBLE_TYPE_SIZE 64
-#define LONG_DOUBLE_TYPE_SIZE (TARGET_64BIT ? 128 : 64)
+#define LONG_DOUBLE_TYPE_SIZE 128
 
 /* Define the sizes of fixed-point types.  */
 #define SHORT_FRACT_TYPE_SIZE 8
-- 
2.41.0



[PATCH v2 06/14] LoongArch: Fix 64-bit immediate move for loongarch32 target

2023-08-09 Thread Jiajie Chen via Gcc-patches
loongarch_move_integer does not support splitting a 64-bit integer into
two 32-bit ones.  Thus, the define_split is removed from movdi_32bit, and
TARGET_64BIT is added to the split condition of movdi_64bit to avoid
using it for loongarch32.

gcc/ChangeLog:

* config/loongarch/loongarch.md (movdi_32bit): Remove not
working split, use existing loongarch_split_move instead.
(movdi_64bit): Add TARGET_64BIT to split condition.
---
 gcc/config/loongarch/loongarch.md | 13 ++---
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 9eb6bb75c35..c611a8a822a 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -1777,22 +1777,13 @@
 DONE;
 })
 
-(define_insn_and_split "*movdi_32bit"
+(define_insn "*movdi_32bit"
   [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,r,w,*f,*f,*r,*m")
(match_operand:DI 1 "move_operand" "r,i,w,r,*J*r,*m,*f,*f"))]
   "!TARGET_64BIT
&& (register_operand (operands[0], DImode)
|| reg_or_0_operand (operands[1], DImode))"
   { return loongarch_output_move (operands[0], operands[1]); }
-  "CONST_INT_P (operands[1]) && REG_P (operands[0]) && GP_REG_P (REGNO
-  (operands[0]))"
-  [(const_int 0)]
-  "
-{
-  loongarch_move_integer (operands[0], operands[0], INTVAL (operands[1]));
-  DONE;
-}
-  "
   [(set_attr "move_type" "move,const,load,store,mgtf,fpload,mftg,fpstore")
(set_attr "mode" "DI")])
 
@@ -1804,7 +1795,7 @@
|| reg_or_0_operand (operands[1], DImode))"
   { return loongarch_output_move (operands[0], operands[1]); }
   "CONST_INT_P (operands[1]) && REG_P (operands[0]) && GP_REG_P (REGNO
-  (operands[0]))"
+  (operands[0])) && TARGET_64BIT"
   [(const_int 0)]
   "
 {
-- 
2.41.0



[PATCH v2 03/14] LoongArch: Fix SI division for loongarch32 target

2023-08-09 Thread Jiajie Chen via Gcc-patches
Add a TARGET_64BIT check for the loongarch64-only handling of SI division:
SI operands must not be promoted to DI before division on the loongarch32
target.

gcc/ChangeLog:

* config/loongarch/loongarch.md: Add TARGET_64BIT check for
loongarch64-only handling of SI division.
---
 gcc/config/loongarch/loongarch.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index b37e070660f..95c5b25d22a 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -851,7 +851,7 @@
 (match_operand:GPR 2 "register_operand")))]
   ""
 {
- if (GET_MODE (operands[0]) == SImode)
+ if (GET_MODE (operands[0]) == SImode && TARGET_64BIT)
   {
 rtx reg1 = gen_reg_rtx (DImode);
 rtx reg2 = gen_reg_rtx (DImode);
-- 
2.41.0



[PATCH v2 07/14] LoongArch: Fix signed 32-bit overflow for loongarch32 target

2023-08-09 Thread Jiajie Chen via Gcc-patches
When rhs equals 0x7fff, adding 1 to rhs overflows SI, generating an
invalid const_int.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_emit_int_compare):
Call trunc_int_for_mode to ensure a valid rhs.
---
 gcc/config/loongarch/loongarch.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index c980de98758..49df9509ba9 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -4284,6 +4284,7 @@ loongarch_emit_int_compare (enum rtx_code *code, rtx 
*op0, rtx *op1)
break;
 
  new_rhs = rhs + (increment ? 1 : -1);
+ new_rhs = trunc_int_for_mode (new_rhs, GET_MODE (*op0));
  if (loongarch_integer_cost (new_rhs)
< loongarch_integer_cost (rhs))
{
-- 
2.41.0



[PATCH v2 08/14] LoongArch: Disable SF/DF -> unsigned DI expand in loongarch32

2023-08-09 Thread Jiajie Chen via Gcc-patches
The current SF/DF -> unsigned DI expand rules require the iordi3 insn,
which is not available on loongarch32.

gcc/ChangeLog:

* config/loongarch/loongarch.md (fixuns_truncdfdi2): Add
TARGET_64BIT to condition.
(fixuns_truncsfdi2): Add TARGET_64BIT to condition.
---
 gcc/config/loongarch/loongarch.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index c611a8a822a..31bdf3388f6 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -1576,7 +1576,7 @@
 (define_expand "fixuns_truncdfdi2"
   [(set (match_operand:DI 0 "register_operand")
(unsigned_fix:DI (match_operand:DF 1 "register_operand")))]
-  "TARGET_DOUBLE_FLOAT"
+  "TARGET_DOUBLE_FLOAT && TARGET_64BIT"
 {
   rtx reg1 = gen_reg_rtx (DFmode);
   rtx reg2 = gen_reg_rtx (DFmode);
@@ -1658,7 +1658,7 @@
 (define_expand "fixuns_truncsfdi2"
   [(set (match_operand:DI 0 "register_operand")
(unsigned_fix:DI (match_operand:SF 1 "register_operand")))]
-  "TARGET_DOUBLE_FLOAT"
+  "TARGET_DOUBLE_FLOAT && TARGET_64BIT"
 {
   rtx reg1 = gen_reg_rtx (SFmode);
   rtx reg2 = gen_reg_rtx (SFmode);
-- 
2.41.0



[PATCH v2 09/14] LoongArch: Add -march=loongarch64 to tests with -mabi=lp64d

2023-08-09 Thread Jiajie Chen via Gcc-patches
The compiler emits a warning if the current target (-march=loongarch32)
mismatches the ABI (-mabi=lp64d).  Adding -march=loongarch64 explicitly
fixes the tests.

gcc/testsuite/ChangeLog:

* g++.target/loongarch/bytepick.C: Add -march=loongarch64
* g++.target/loongarch/pr106828.C: Add -march=loongarch64
* gcc.target/loongarch/add-const.c: Add -march=loongarch64
* gcc.target/loongarch/arch-1.c: Add -march=loongarch64
* gcc.target/loongarch/attr-model-1.c: Add -march=loongarch64
* gcc.target/loongarch/attr-model-2.c: Add -march=loongarch64
* gcc.target/loongarch/flt-abi-isa-1.c: Add -march=loongarch64
* gcc.target/loongarch/fscaleb.c: Add -march=loongarch64
* gcc.target/loongarch/ftint-no-inexact.c: Add
-march=loongarch64
* gcc.target/loongarch/ftint.c: Add -march=loongarch64
* gcc.target/loongarch/func-call-1.c: Add -march=loongarch64
* gcc.target/loongarch/func-call-2.c: Add -march=loongarch64
* gcc.target/loongarch/func-call-3.c: Add -march=loongarch64
* gcc.target/loongarch/func-call-4.c: Add -march=loongarch64
* gcc.target/loongarch/func-call-5.c: Add -march=loongarch64
* gcc.target/loongarch/func-call-6.c: Add -march=loongarch64
* gcc.target/loongarch/func-call-7.c: Add -march=loongarch64
* gcc.target/loongarch/func-call-8.c: Add -march=loongarch64
* gcc.target/loongarch/func-call-extreme-1.c: Add
-march=loongarch64
* gcc.target/loongarch/func-call-extreme-2.c: Add
-march=loongarch64
* gcc.target/loongarch/func-call-medium-1.c: Add
-march=loongarch64
* gcc.target/loongarch/func-call-medium-2.c: Add
-march=loongarch64
* gcc.target/loongarch/func-call-medium-3.c: Add
-march=loongarch64
* gcc.target/loongarch/func-call-medium-4.c: Add
-march=loongarch64
* gcc.target/loongarch/func-call-medium-5.c: Add
-march=loongarch64
* gcc.target/loongarch/func-call-medium-6.c: Add
-march=loongarch64
* gcc.target/loongarch/func-call-medium-7.c: Add
-march=loongarch64
* gcc.target/loongarch/func-call-medium-8.c: Add
-march=loongarch64
* gcc.target/loongarch/imm-load.c: Add -march=loongarch64
* gcc.target/loongarch/imm-load1.c: Add -march=loongarch64
* gcc.target/loongarch/mulw_d_w.c: Add -march=loongarch64
* gcc.target/loongarch/pr109465-1.c: Add -march=loongarch64
* gcc.target/loongarch/pr109465-2.c: Add -march=loongarch64
* gcc.target/loongarch/pr109465-3.c: Add -march=loongarch64
* gcc.target/loongarch/prolog-opt.c: Add -march=loongarch64
* gcc.target/loongarch/relocs-symbol-noaddend.c: Add
-march=loongarch64
* gcc.target/loongarch/zero-size-field-pass.c: Add
-march=loongarch64
* gcc.target/loongarch/zero-size-field-ret.c: Add
-march=loongarch64
---
 gcc/testsuite/g++.target/loongarch/bytepick.C   | 2 +-
 gcc/testsuite/g++.target/loongarch/pr106828.C   | 2 +-
 gcc/testsuite/gcc.target/loongarch/add-const.c  | 2 +-
 gcc/testsuite/gcc.target/loongarch/attr-model-1.c   | 2 +-
 gcc/testsuite/gcc.target/loongarch/attr-model-2.c   | 2 +-
 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-1.c  | 2 +-
 gcc/testsuite/gcc.target/loongarch/fscaleb.c| 2 +-
 gcc/testsuite/gcc.target/loongarch/ftint-no-inexact.c   | 2 +-
 gcc/testsuite/gcc.target/loongarch/ftint.c  | 2 +-
 gcc/testsuite/gcc.target/loongarch/func-call-1.c| 2 +-
 gcc/testsuite/gcc.target/loongarch/func-call-2.c| 2 +-
 gcc/testsuite/gcc.target/loongarch/func-call-3.c| 2 +-
 gcc/testsuite/gcc.target/loongarch/func-call-4.c| 2 +-
 gcc/testsuite/gcc.target/loongarch/func-call-5.c| 2 +-
 gcc/testsuite/gcc.target/loongarch/func-call-6.c| 2 +-
 gcc/testsuite/gcc.target/loongarch/func-call-7.c| 2 +-
 gcc/testsuite/gcc.target/loongarch/func-call-8.c| 2 +-
 gcc/testsuite/gcc.target/loongarch/func-call-extreme-1.c| 2 +-
 gcc/testsuite/gcc.target/loongarch/func-call-extreme-2.c| 2 +-
 gcc/testsuite/gcc.target/loongarch/func-call-medium-1.c | 2 +-
 gcc/testsuite/gcc.target/loongarch/func-call-medium-2.c | 2 +-
 gcc/testsuite/gcc.target/loongarch/func-call-medium-3.c | 2 +-
 gcc/testsuite/gcc.target/loongarch/func-call-medium-4.c | 2 +-
 gcc/testsuite/gcc.target/loongarch/func-call-medium-5.c | 2 +-
 gcc/testsuite/gcc.target/loongarch/func-call-medium-6.c | 2 +-
 gcc/testsuite/gcc.target/loongarch/func-call-medium-7.c | 2 +-
 gcc/testsuite/gcc.target/loongarch/func-call-medium-8.c | 2 +-
 gcc/testsuite/gcc.target/loongarch/imm-load.c   | 2 +-
 gcc/testsuite/gcc.target/loongarch/imm-load1.c  | 2 +-
 gcc/testsuite/gcc.tar

[PATCH v2 13/14] LoongArch: Fix ilp32 detection

2023-08-09 Thread Jiajie Chen via Gcc-patches
The correct ilp32 macro name is __loongarch_ilp32.

libitm/ChangeLog:

* config/loongarch/asm.h: Fix ilp32 detection.
---
 libitm/config/loongarch/asm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libitm/config/loongarch/asm.h b/libitm/config/loongarch/asm.h
index 39e02b45f17..11d6d3c079e 100644
--- a/libitm/config/loongarch/asm.h
+++ b/libitm/config/loongarch/asm.h
@@ -30,7 +30,7 @@
 #  define GPR_S st.d
 #  define SZ_GPR 8
 #  define ADDSP(si)   addi.d  $sp, $sp, si
-#elif defined(__loongarch64_ilp32)
+#elif defined(__loongarch_ilp32)
 #  define GPR_L ld.w
 #  define GPR_S st.w
 #  define SZ_GPR 4
-- 
2.41.0



[PATCH v2 10/14] LoongArch: Forbid ADDRESS_REG_REG in loongarch32

2023-08-09 Thread Jiajie Chen via Gcc-patches
LoongArch32 does not include the LDX/STX instructions, and cannot lower
the (plus (reg) (reg)) address pattern.  Forbid ADDRESS_REG_REG and do not
emit ldx/stx.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_valid_index_p): Check
ADDRESS_REG_REG pattern and fail in loongarch32.
(loongarch_output_move_index): Assert that ldx/stx is not
generated on loongarch32.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/array-ldx.c: Add regression test for ldx
in loongarch32.
---
 gcc/config/loongarch/loongarch.cc  | 4 +++-
 gcc/testsuite/gcc.target/loongarch/array-ldx.c | 6 ++
 2 files changed, 9 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/array-ldx.c

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 49df9509ba9..1fde680ccd4 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -2016,7 +2016,8 @@ loongarch_valid_index_p (struct loongarch_address_info 
*info, rtx x,
   && contains_reg_of_mode[GENERAL_REGS][GET_MODE (SUBREG_REG (index))])
 index = SUBREG_REG (index);
 
-  if (loongarch_valid_base_register_p (index, mode, strict_p))
+  /* LA32 does not provide LDX/STX.  */
+  if (loongarch_valid_base_register_p (index, mode, strict_p) && !TARGET_32BIT)
 {
   info->type = ADDRESS_REG_REG;
   info->offset = index;
@@ -3853,6 +3854,7 @@ loongarch_output_move_index (rtx x, machine_mode mode, 
bool ldr)
   }
 };
 
+  gcc_assert (!TARGET_32BIT);
   return insn[ldr][index];
 }
 
diff --git a/gcc/testsuite/gcc.target/loongarch/array-ldx.c 
b/gcc/testsuite/gcc.target/loongarch/array-ldx.c
new file mode 100644
index 000..0797af3bbfb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/array-ldx.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-march=loongarch32 -mabi=ilp32d -O2" } */
+long long foo(long long *arr, long long index)
+{
+   return arr[index];
+}
\ No newline at end of file
-- 
2.41.0



[PATCH v2 14/14] LoongArch: Allow ftintrz for DF->DI in loongarch32

2023-08-09 Thread Jiajie Chen via Gcc-patches
On LoongArch, a signed DF->DI conversion can be done with -mfpu=64 even
when -march=loongarch32.

gcc/ChangeLog:

* config/loongarch/loongarch.md (fix_trunc*2): Use ANYFI instead
of GPR, because the conversion depends on the FPU width rather than
the target word size.
---
 gcc/config/loongarch/loongarch.md | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 31bdf3388f6..f6042af25b7 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -1482,11 +1482,11 @@
 
 ;; conversion of a floating-point value to a integer
 
-(define_insn "fix_trunc2"
-  [(set (match_operand:GPR 0 "register_operand" "=f")
-   (fix:GPR (match_operand:ANYF 1 "register_operand" "f")))]
+(define_insn "fix_trunc2"
+  [(set (match_operand:ANYFI 0 "register_operand" "=f")
+   (fix:ANYFI (match_operand:ANYF 1 "register_operand" "f")))]
   ""
-  "ftintrz.. %0,%1"
+  "ftintrz.. %0,%1"
   [(set_attr "type" "fcvt")
(set_attr "mode" "")])
 
-- 
2.41.0



[PATCH v2 11/14] LoongArch: Mark am* instructions as LA64-only

2023-08-09 Thread Jiajie Chen via Gcc-patches
LoongArch32 only provides basic ll/sc instructions for atomic
operations. Mark am* atomic instructions as 64-bit only.

gcc/ChangeLog:

* config/loongarch/sync.md: Guard am* atomic insns by
TARGET_64BIT.
---
 gcc/config/loongarch/sync.md | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md
index 9924d522bcd..151b553bcc6 100644
--- a/gcc/config/loongarch/sync.md
+++ b/gcc/config/loongarch/sync.md
@@ -77,7 +77,7 @@
   [(match_operand:GPR 1 "reg_or_0_operand" "rJ")
(match_operand:SI 2 "const_int_operand")]  ;; model
   UNSPEC_ATOMIC_STORE))]
-  ""
+  "TARGET_64BIT"
   "amswap%A2.\t$zero,%z1,%0"
   [(set (attr "length") (const_int 8))])
 
@@ -88,7 +88,7 @@
   (match_operand:GPR 1 "reg_or_0_operand" "rJ"))
   (match_operand:SI 2 "const_int_operand")] ;; model
 UNSPEC_SYNC_OLD_OP))]
-  ""
+  "TARGET_64BIT"
   "am%A2.\t$zero,%z1,%0"
   [(set (attr "length") (const_int 8))])
 
@@ -101,7 +101,7 @@
 (match_operand:GPR 2 "reg_or_0_operand" "rJ"))
   (match_operand:SI 3 "const_int_operand")] ;; model
 UNSPEC_SYNC_OLD_OP))]
-  ""
+  "TARGET_64BIT"
   "am%A3.\t%0,%z2,%1"
   [(set (attr "length") (const_int 8))])
 
@@ -113,7 +113,7 @@
  UNSPEC_SYNC_EXCHANGE))
(set (match_dup 1)
(match_operand:GPR 2 "register_operand" "r"))]
-  ""
+  "TARGET_64BIT"
   "amswap%A3.\t%0,%z2,%1"
   [(set (attr "length") (const_int 8))])
 
@@ -182,7 +182,7 @@
   [(match_operand:QI 0 "register_operand" "") ;; bool output
(match_operand:QI 1 "memory_operand" "+ZB");; memory
(match_operand:SI 2 "const_int_operand" "")]   ;; model
-  ""
+  "TARGET_64BIT"
 {
   /* We have no QImode atomics, so use the address LSBs to form a mask,
  then use an aligned SImode atomic.  */
-- 
2.41.0



Re: [PATCH] vect: Add a popcount fallback.

2023-08-09 Thread Richard Biener via Gcc-patches
On Wed, Aug 9, 2023 at 12:23 PM Robin Dapp  wrote:
>
> > We seem to be looking at promotions of the call argument, lhs_type
> > is the same as the type of the call LHS.  But the comment mentions .POPCOUNT
> > and the following code also handles others, so maybe handling should be
> > moved.  Also when we look to vectorize popcount (x) instead of 
> > popcount((T)x)
> > we can simply promote the result accordingly.
>
> IMHO lhs_type is the type of the conversion
>
>   lhs_oprnd = gimple_assign_lhs (last_stmt);
>   lhs_type = TREE_TYPE (lhs_oprnd);
>
> and rhs/unprom_diff has the type of the call's input argument
>
>   rhs_oprnd = gimple_call_arg (call_stmt, 0);
>   vect_look_through_possible_promotion (vinfo, rhs_oprnd, &unprom_diff);
>
> So we can potentially have
>   T0 arg
>   T1 in = (T1)arg
>   T2 ret = __builtin_popcount (in)
>   T3 lhs = (T3)ret
>
> and we're checking if precision (T0) == precision (T3).

Looks like so.  Note T1 == T2.  What we're really after is
changing T1/T2 and the actual popcount used closer to
T0/T3, like in case T0 was 'char' and T3 was 'long' we
could still use popcountqi and then widen to T3 (or the
other way around).  So yes, I think requiring that T0 and T3
are equal isn't necessary.
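
For concreteness, one source-level shape of that T0 = 'char' / T3 = 'long'
case could look like the following (types and the loop are purely
illustrative, not taken from the testsuite):

#include <stdint.h>

void
f (uint8_t *restrict x, uint64_t *restrict res, int n)
{
  for (int i = 0; i < n; ++i)
    /* T0 = uint8_t argument, T1/T2 = the builtin's int, T3 = uint64_t
       result, so precision (T0) != precision (T3).  */
    res[i] = (uint64_t) __builtin_popcount (x[i]);
}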

> This will never be true for a proper __builtin_popcountll except if
> the return value is cast to uint64_t (which I just happened to do
> in my test...).  Therefore it still doesn't really make sense to me.
>
> Interestingly though, it helps for an aarch64 __builtin_popcountll
> testcase where we abort here and then manage to vectorize via
> vectorizable_call.  When we skip this check, recognition succeeds
> and replaces the call with the pattern.  Then scalar costs are lower
> than in the vectorizable_call case because __builtin_popcountll is
> not STMT_VINFO_RELEVANT_P anymore (not live or so?).
> Then, vectorization costs are too high compared to the wrong scalar
> costs and we don't vectorize... Odd, might require fixing separately.
> We might need to calculate the scalar costs in advance?
>
> > It looks like vect_recog_popcount_clz_ctz_ffs_pattern is specifcally for
> > the conversions, so your fallback should possibly apply even when not
> > matching them.
>
> Mhm, yes it appears to only match when casting the return value to
> something else than an int.  So we'd need a fallback in vectorizable_call?
> And it would potentially look a bit out of place there only handling
> popcount and not ctz, clz, ...  Not sure if it is worth it then?

I'd keep the handling as a pattern, just also match on popcount directly
when it is not converted.

>
> Regards
>  Robin
>


[PATCH] RISC-V: Support NPATTERNS = 1 stepped vector[PR110950]

2023-08-09 Thread Juzhe-Zhong
This patch fixes the ICE: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110950

0x1cf8939 expand_const_vector
../../../riscv-gcc/gcc/config/riscv/riscv-v.cc:1587

PR target/110950

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector): Add NPATTERNS = 1 
stepped vector support.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr110950.c: New test.

---
 gcc/config/riscv/riscv-v.cc   | 19 +++
 .../gcc.target/riscv/rvv/autovec/pr110950.c   | 12 
 2 files changed, 31 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr110950.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index a7b2d7dd2fe..0bea04c1967 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1563,6 +1563,25 @@ expand_const_vector (rtx target, rtx src)
   add_ops);
}
}
+  else if (npatterns == 1 && nelts_per_pattern == 3)
+   {
+ /* Generate the following CONST_VECTOR:
+{ base0, base1, base1 + step, base1 + step * 2, ... }  */
+ rtx base0 = CONST_VECTOR_ELT (src, 0);
+ rtx base1 = CONST_VECTOR_ELT (src, 1);
+ rtx step = CONST_VECTOR_ELT (src, 2);
+ /* Step 1 - { base1, base1 + step, base1 + step * 2, ... }  */
+ rtx tmp = gen_reg_rtx (mode);
+ emit_insn (gen_vec_series (mode, tmp, base1, step));
+ /* Step 2 - { base0, base1, base1 + step, base1 + step * 2, ... }  */
+ scalar_mode elem_mode = GET_MODE_INNER (mode);
+ if (!rtx_equal_p (base0, const0_rtx))
+   base0 = force_reg (elem_mode, base0);
+
+ insn_code icode = optab_handler (vec_shl_insert_optab, mode);
+ gcc_assert (icode != CODE_FOR_nothing);
+ emit_insn (GEN_FCN (icode) (target, tmp, base0));
+   }
   else
/* TODO: We will enable more variable-length vector in the future.  */
gcc_unreachable ();
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr110950.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr110950.c
new file mode 100644
index 000..9f276d06338
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr110950.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d 
--param=riscv-autovec-preference=scalable -Ofast" } */
+
+int a;
+void b() {
+  long *c = 0;
+  int *d;
+  for (; a; ++a)
+c[a] = d[-a];
+}
+
+/* { dg-final { scan-assembler-times {vslide1up\.vx} 1 } } */
-- 
2.36.3



Re: Re: [PATCH] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-09 Thread Richard Biener via Gcc-patches
On Wed, 9 Aug 2023, juzhe.zh...@rivai.ai wrote:

> Hi, Richi.
> 
> >> that should be
> 
> >>   || (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
> >>   && !LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
> 
> >> I think.  It seems to imply that SLP isn't supported with
> >> masking/lengthing.
> 
> Oh, yes.  At first glance, the original code is quite suspicious and your 
> comments make sense to me.
> 
> >> Hum, how does CFN_EXTRACT_LAST handle both mask and length transparently?
> >> Don't you need some CFN_LEN_EXTRACT_LAST instead?
> 
> I think CFN_EXTRACT_LAST always has either loop mask or loop len.
> 
> When both mask and length are not needed, IMHO, I think current BIT_FIELD_REF 
> flow is good enough:
> https://godbolt.org/z/Yr5M9hcc6
> 
> So I think we don't need CFN_LEN_EXTRACT_LAST. 
> 
> Instead, I think we will need CFN_LEN_FOLD_EXTRACT_LAST in the next patch.
> 
> Feel free to correct me if I am wrong.

Richard S. should know best, but I don't think FOLD_EXTRACT_LAST is
any different from EXTRACT_LAST (the value for EXTRACT_LAST with
all zeros mask seems unspecified?).
Note the expanders are documented
as to receive 'mask' operands, not 'len' ones (and we'd miss BIAS).

As for SLP support the loop mask should have all SLP lanes
consistently masked/unmasked (same for 'len' I suppose), but we
want to extract a specific SLP lane only.  For masks I think
producing a mask that has all 'i' SLP lanes enabled and AND
that to the mask would select the proper lane for EXTRACT_LAST.
Not sure how to handle this for 'len' - I guess since 'len'
covers all SLP lanes as well we could just subtract
SLP_TREE_LANES (node) - slp_index from it?  I'll note we don't
handle ncopies > 1 which I think we could with using FOLD_EXTRACT_LAST?

Richard.

> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-08-09 19:00
> To: Ju-Zhe Zhong
> CC: gcc-patches; richard.sandiford
> Subject: Re: [PATCH] VECT: Support loop len control on EXTRACT_LAST 
> vectorization
> On Wed, 9 Aug 2023, juzhe.zh...@rivai.ai wrote:
>  
> > From: Ju-Zhe Zhong 
> > 
> > Hi, this patch is adding loop len control on extract_last autovectorization.
> > 
> > Consider this following case:
> > 
> > #include 
> > 
> > #define EXTRACT_LAST(TYPE) \
> >   TYPE __attribute__ ((noinline, noclone)) \
> >   test_##TYPE (TYPE *x, int n, TYPE value) \
> >   { \
> > TYPE last; \
> > for (int j = 0; j < n; ++j) \
> >   { \
> > last = x[j]; \
> > x[j] = last * value; \
> >   } \
> > return last; \
> >   }
> > 
> > #define TEST_ALL(T) \
> >   T (uint8_t) \
> > 
> > TEST_ALL (EXTRACT_LAST)
> > 
> > ARM SVE IR:
> > 
> > Preheader:
> >   max_mask_34 = .WHILE_ULT (0, bnd.5_6, { 0, ... });
> > 
> > Loop:
> >   ...
> >   # loop_mask_22 = PHI 
> >   ...
> >   vect_last_12.8_23 = .MASK_LOAD (_7, 8B, loop_mask_22);
> >   vect__4.9_27 = vect_last_12.8_23 * vect_cst__26;
> >   .MASK_STORE (_7, 8B, loop_mask_22, vect__4.9_27);
> >   ...
> >   next_mask_35 = .WHILE_ULT (_1, bnd.5_6, { 0, ... });
> >   ...
> > 
> > Epilogue:
> >   _25 = .EXTRACT_LAST (loop_mask_22, vect_last_12.8_23);
> > 
> > For RVV since we prefer len in loop control, after this patch for RVV:
> > 
> > Loop:
> >   ...
> >   loop_len_22 = SELECT_VL;
> >   vect_last_12.8_23 = .MASK_LOAD (_7, 8B, loop_len_22);
> >   vect__4.9_27 = vect_last_12.8_23 * vect_cst__26;
> >   .MASK_STORE (_7, 8B, loop_len_22, vect__4.9_27);
> >   ...
> > 
> > Epilogue:
> >   _25 = .EXTRACT_LAST (loop_len_22, vect_last_12.8_23);
> > 
> > This patch didn't add a new pattern for length loop control of extract_last.
> > Instead we reuse current extract_last.
> > 
> > Here is the code:
> > 
> > Step 1 - Enable length and record length for extract_last:
> > 
> > +   machine_mode vec_mode = TYPE_MODE (vectype);
> > +   if (get_len_load_store_mode (vec_mode, true).exists (&vec_mode))
> > + vect_record_loop_len (loop_vinfo,
> > +   &LOOP_VINFO_LENS (loop_vinfo), 1,
> > +   vectype, 1);
> > +   else
> > + vect_record_loop_mask (loop_vinfo,
> > +&LOOP_VINFO_MASKS (loop_vinfo), 1,
> > +vectype, NULL);
> > 
> > We use 'get_len_load_store_mode' to check whether targets support loop len 
> > control or not.
> > If yes, record a loop len.
> > 
> > Step 2 - Build EXTRACT_LAST with len:
> > 
> > -   tree mask = vect_get_loop_mask (loop_vinfo, gsi,
> > -   &LOOP_VINFO_MASKS (loop_vinfo),
> > -   1, vectype, 0);
> > +   tree control;
> > +   if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
> > + control = vect_get_loop_len (loop_vinfo, gsi,
> > + &LOOP_VINFO_LENS (loop_vinfo), 1,
> > + vectype, 0, 0);
> > +   else
> > + control = vect_get_loop_mask (loop_vinfo, gsi,
> > +   &LOOP_VINFO_MASKS (loop_vinfo), 1,
> > +   vectype, 0);
> >tree scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
> > -   mask, vec_lhs_phi);
> > +   control, vec_lhs_phi);
> > 
> > Reuse the current codes (build EXTRACT_LAST with mask), build length 
> > instead if
> > 'LOOP_VINFO_FUL

Re: [PATCH] RISC-V: Support NPATTERNS = 1 stepped vector[PR110950]

2023-08-09 Thread Robin Dapp via Gcc-patches
OK, thanks.

Regards
 Robin


Re: Re: [PATCH] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-09 Thread juzhe.zh...@rivai.ai
Hi, Richi.
>> Note the expanders are documented
>> as to receive 'mask' operands, not 'len' ones (and we'd miss BIAS).
Oh, yes.  This patch reuses the current EXTRACT_LAST and generates a loop len
instead of a loop mask.
It seems this patch missed 'BIAS'.  If we need 'BIAS', we may need a
LEN_EXTRACT_LAST pattern.

 
>> Richard S. should know best,
>> As for SLP support the loop mask should have all SLP lanes
>> consistently masked/unmasked (same for 'len' I suppose), but we
>> want to extract a specific SLP lane only.  For masks I think
>> producing a mask that has all 'i' SLP lanes enabled and AND
>> that to the mask would select the proper lane for EXTRACT_LAST.
>> Not sure how to handle this for 'len' - I guess since 'len'
>> covers all SLP lanes as well we could just subtract
>> SLP_TREE_LANES (node) - slp_index from it?  I'll note we don't
handle ncopies > 1 which I think we could with using FOLD_EXTRACT_LAST?

For SLP stuff, I am not sure.
And I agree that we need to wait for Richard S.'s review.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-08-09 20:21
To: juzhe.zh...@rivai.ai
CC: gcc-patches; richard.sandiford
Subject: Re: Re: [PATCH] VECT: Support loop len control on EXTRACT_LAST 
vectorization
On Wed, 9 Aug 2023, juzhe.zh...@rivai.ai wrote:
 
> Hi, Richi.
> 
> >> that should be
> 
> >>   || (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
> >>   && !LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
> 
> >> I think.  It seems to imply that SLP isn't supported with
> >> masking/lengthing.
> 
> Oh, yes.  At first glance, the original code is quite suspicious and your 
> comments make sense to me.
> 
> >> Hum, how does CFN_EXTRACT_LAST handle both mask and length transparently?
> >> Don't you need some CFN_LEN_EXTRACT_LAST instead?
> 
> I think CFN_EXTRACT_LAST always has either loop mask or loop len.
> 
> When both mask and length are not needed, IMHO, I think current BIT_FIELD_REF 
> flow is good enough:
> https://godbolt.org/z/Yr5M9hcc6
> 
> So I think we don't need CFN_LEN_EXTRACT_LAST. 
> 
> Instead, I think we will need CFN_LEN_FOLD_EXTRACT_LAST in the next patch.
> 
> Feel free to correct me if I am wrong.
 
Richard S. should know best, but I don't think FOLD_EXTRACT_LAST is
any different from EXTRACT_LAST (the value for EXTRACT_LAST with
all zeros mask seems unspecified?).
Note the expanders are documented
as to receive 'mask' operands, not 'len' ones (and we'd miss BIAS).
 
As for SLP support the loop mask should have all SLP lanes
consistently masked/unmasked (same for 'len' I suppose), but we
want to extract a specific SLP lane only.  For masks I think
producing a mask that has all 'i' SLP lanes enabled and AND
that to the mask would select the proper lane for EXTRACT_LAST.
Not sure how to handle this for 'len' - I guess since 'len'
covers all SLP lanes as well we could just subtract
SLP_TREE_LANES (node) - slp_index from it?  I'll note we don't
handle ncopies > 1 which I think we could with using FOLD_EXTRACT_LAST?
 
Richard.
 
> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-08-09 19:00
> To: Ju-Zhe Zhong
> CC: gcc-patches; richard.sandiford
> Subject: Re: [PATCH] VECT: Support loop len control on EXTRACT_LAST 
> vectorization
> On Wed, 9 Aug 2023, juzhe.zh...@rivai.ai wrote:
>  
> > From: Ju-Zhe Zhong 
> > 
> > Hi, this patch is adding loop len control on extract_last autovectorization.
> > 
> > Consider this following case:
> > 
> > #include 
> > 
> > #define EXTRACT_LAST(TYPE) \
> >   TYPE __attribute__ ((noinline, noclone)) \
> >   test_##TYPE (TYPE *x, int n, TYPE value) \
> >   { \
> > TYPE last; \
> > for (int j = 0; j < n; ++j) \
> >   { \
> > last = x[j]; \
> > x[j] = last * value; \
> >   } \
> > return last; \
> >   }
> > 
> > #define TEST_ALL(T) \
> >   T (uint8_t) \
> > 
> > TEST_ALL (EXTRACT_LAST)
> > 
> > ARM SVE IR:
> > 
> > Preheader:
> >   max_mask_34 = .WHILE_ULT (0, bnd.5_6, { 0, ... });
> > 
> > Loop:
> >   ...
> >   # loop_mask_22 = PHI 
> >   ...
> >   vect_last_12.8_23 = .MASK_LOAD (_7, 8B, loop_mask_22);
> >   vect__4.9_27 = vect_last_12.8_23 * vect_cst__26;
> >   .MASK_STORE (_7, 8B, loop_mask_22, vect__4.9_27);
> >   ...
> >   next_mask_35 = .WHILE_ULT (_1, bnd.5_6, { 0, ... });
> >   ...
> > 
> > Epilogue:
> >   _25 = .EXTRACT_LAST (loop_mask_22, vect_last_12.8_23);
> > 
> > For RVV since we prefer len in loop control, after this patch for RVV:
> > 
> > Loop:
> >   ...
> >   loop_len_22 = SELECT_VL;
> >   vect_last_12.8_23 = .MASK_LOAD (_7, 8B, loop_len_22);
> >   vect__4.9_27 = vect_last_12.8_23 * vect_cst__26;
> >   .MASK_STORE (_7, 8B, loop_len_22, vect__4.9_27);
> >   ...
> > 
> > Epilogue:
> >   _25 = .EXTRACT_LAST (loop_len_22, vect_last_12.8_23);
> > 
> > This patch didn't add a new pattern for length loop control of extract_last.
> > Instead we reuse current extract_last.
> > 
> > Here is the code:
> > 
> > Step 1 - Enable length and record length for extract_last:
> > 
> > +   machine_mode vec_mo

Re: RISC-V: Added support for CRC.

2023-08-09 Thread Paul Koning via Gcc-patches



> On Aug 9, 2023, at 2:32 AM, Alexander Monakov  wrote:
> 
> 
> On Tue, 8 Aug 2023, Jeff Law wrote:
> 
>> If the compiler can identify a CRC and collapse it down to a table or clmul,
>> that's a major win and such code does exist in the real world. That was the
>> whole point behind the Fedora experiment -- to determine if these things are
>> showing up in the real world or if this is just a benchmarking exercise.
> 
> Can you share the results of the experiment and give your estimate of what
> sort of real-world improvement is expected? I already listed the popular
> FOSS projects where CRC performance is important: the Linux kernel and
> a few compression libraries. Those projects do not use a bitwise CRC loop,
> except sometimes for table generation on startup (which needs less time
> than a page fault that may be necessary to bring in a hardcoded table).
> 
> For those projects that need a better CRC, why is the chosen solution is
> to optimize it in the compiler instead of offering them a library they
> could use with any compiler?
> 
> Was there any thought given to embedded projects that use bitwise CRC
> exactly because they little space for a hardcoded table to spare?

Or those that use smaller tables -- for example, the classic VAX microcode 
approach with a 16-entry table, doing CRC 4 bits at a time.
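
For reference, a minimal C sketch of that small-table scheme -- a reflected
CRC-32 (polynomial 0xEDB88320, chosen purely for illustration) computed
4 bits at a time from a 16-entry table, so the table costs 64 bytes instead
of the usual 1 KiB:

#include <stdint.h>
#include <stddef.h>

static uint32_t crc32_tab4[16];

static void
crc32_tab4_init (void)
{
  for (uint32_t i = 0; i < 16; i++)
    {
      uint32_t c = i;
      for (int k = 0; k < 4; k++)
        c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
      crc32_tab4[i] = c;
    }
}

static uint32_t
crc32_update (uint32_t crc, const uint8_t *buf, size_t len)
{
  crc = ~crc;
  for (size_t i = 0; i < len; i++)
    {
      /* Low nibble first, then high nibble of each byte.  */
      crc = crc32_tab4[(crc ^ buf[i]) & 0xf] ^ (crc >> 4);
      crc = crc32_tab4[(crc ^ (buf[i] >> 4)) & 0xf] ^ (crc >> 4);
    }
  return ~crc;
}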

I agree that this seems an odd thing to optimize.  CRC is a well-known CPU hog 
with well-established efficient solutions, and it's hard to see why anyone who 
needs good performance would fail to understand and apply that knowledge.

paul




[PATCH] Handle in-order reductions when SLP vectorizing non-loops

2023-08-09 Thread Richard Biener via Gcc-patches
The following teaches the non-loop reduction vectorization code to
handle non-associatable reductions.  Using the existing FOLD_LEFT_PLUS
internal functions might be possible but I'd have to convince myself
that +0.0 + x[0] is a safe extra operation in every rounding mode
(I also have no way to test the resulting code).

The reduction code is now extra lame in lacking any way to do
associatable reductions without direct optab support, while always
supporting in-order reductions by open-coding them.

I'll also note the pre-existing issue of every associatable
operation now triggering at least a two-lane SLP reduction discovery
attempt.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

Richard.

* tree-vectorizer.h (vect_expand_fold_left): Export.
* tree-vect-loop.cc (vect_expand_fold_left): Likewise.
Support NULL lhs.
* tree-vect-slp.cc (vect_slp_linearize_chain): Add a flag
to indicate whether we can associate, if not only follow
the left associative chain.
(vect_build_slp_tree_2): Adjust.
(vect_slp_check_for_constructors): Use needs_fold_left_reduction_p
instead of open-coding, properly linearize the chain according
to that and avoid sorting.
(vectorizable_bb_reduc_epilogue): For fold-left reductions
require a single vector definition and at most one remaining op.
Adjust costing.
(vectorize_slp_instance_root_stmt): Vectorize fold-left
reductions with vect_expand_fold_left.

* gcc.dg/vect/bb-slp-75.c: New testcase.
* gcc.dg/vect/bb-slp-46.c: Adjust.
* gcc.dg/vect/vect-reduc-in-order-1.c: Disable SLP vectorization.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-46.c |   2 +-
 gcc/testsuite/gcc.dg/vect/bb-slp-75.c |  20 
 .../gcc.dg/vect/vect-reduc-in-order-1.c   |   2 +-
 gcc/tree-vect-loop.cc |  21 ++--
 gcc/tree-vect-slp.cc  | 111 --
 gcc/tree-vectorizer.h |   2 +
 6 files changed, 111 insertions(+), 47 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-75.c

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-46.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-46.c
index 98b29062a19..4eceea44efc 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-46.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-46.c
@@ -15,7 +15,7 @@ int foo ()
   a[1] = tem1;
   a[2] = tem2;
   a[3] = tem3;
-  return temx + temy;
+  return temx / temy;
 }
 
 /* We should extract the live lane from the vectorized add rather than
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-75.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-75.c
new file mode 100644
index 000..d7f91089c87
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-75.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_double } */
+
+double
+not_vectorizable (double *x)
+{
+  double r = x[0];
+  return r + x[2] + x[1];
+}
+
+double
+vectorizable (double *x)
+{
+  double r = x[0];
+  return r + x[1] + x[2];
+}
+
+/* We can vectorize the in-order reduction in vectorizable but not the one
+   in not_vectorizable since we cannot handle the gap after x[2].  */
+/* { dg-final { scan-tree-dump-times "optimized: basic block part vectorized" 
1 "slp2" { target vect_hw_misalign } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-in-order-1.c 
b/gcc/testsuite/gcc.dg/vect/vect-reduc-in-order-1.c
index 4c17f2c1978..c1c853cb93b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-in-order-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-in-order-1.c
@@ -1,7 +1,7 @@
 /* { dg-xfail-run-if "" { { i?86-*-* x86_64-*-* } && ia32 } } */
 /* { dg-require-effective-target vect_double } */
 /* { dg-add-options ieee } */
-/* { dg-additional-options "-fno-fast-math" } */
+/* { dg-additional-options "-fno-fast-math -fno-tree-slp-vectorize" } */
 
 #include "tree-vect.h"
 
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index bf8d677b584..1cd6bb43194 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -6769,11 +6769,11 @@ merge_with_identity (gimple_stmt_iterator *gsi, tree 
mask, tree vectype,
 }
 
 /* Successively apply CODE to each element of VECTOR_RHS, in left-to-right
-   order, starting with LHS.  Insert the extraction statements before GSI and
-   associate the new scalar SSA names with variable SCALAR_DEST.
+   order, starting with LHS if not NULL.  Insert the extraction statements
+   before GSI and associate the new scalar SSA names with variable SCALAR_DEST.
Return the SSA name for the result.  */
 
-static tree
+tree
 vect_expand_fold_left (gimple_stmt_iterator *gsi, tree scalar_dest,
   tree_code code, tree lhs, tree vector_rhs)
 {
@@ -6796,11 +6796,16 @@ vect_expand_fold_left (gimple_stmt_iterator *gsi, tree 
scalar_dest,
   gimple_assign_set_lhs (stmt, rhs);
   gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
 
-  stmt = gimple_build_assign (scalar_dest, code, lh

Re: [PATCH] RISC-V: Return machine_mode rather than opt_machine_mode for get_mask_mode, NFC

2023-08-09 Thread Maciej W. Rozycki
On Mon, 31 Jul 2023, Maciej W. Rozycki wrote:

> > > That's a good suggestion! Thanks, let me try to apply myself workflow  :)
> > I'm thinking that as part of the CI POC being done by RISE that the base AMI
> > image ought to be gcc-13 based and that we should configure the toolchains 
> > we
> > build with -enable-werror-always.
> > 
> > While we can't necessarily get every developer to embrace this workflow, we
> > ought to be catching it quicker than we currently are.
> 
>  I wonder if we should enable the option by default, perhaps under certain 
> conditions such as matching the build compiler version, for builds made 
> from a Git checkout rather than a release tarball.  I suspect some people 
> are simply not aware of this option.

 Also the Linux kernel community has bots that monitor the relevant 
mailing lists for patches, apply them, build in various configurations, 
and report back any issues, so when you submit a change that doesn't 
compile in some cases, then it's often within minutes that you get a 
notification, even before anyone has a chance to review your submission.  
That also helps maintainers catch such issues before a change gets merged 
anywhere.

 Cf. , 
.

 That surely hasn't come for free, someone had to make the infrastructure,
and then with contemporary hardware the Linux kernel often builds within 
seconds, which we don't have the luxury of, but I wonder if it's an 
approach that has been previously considered for GCC.  Overall I think the 
more effort we can offload to automata the less remains for us.

  Maciej


Re: [RFC] GCC Security policy

2023-08-09 Thread Richard Earnshaw (lists) via Gcc-patches

On 08/08/2023 20:39, Carlos O'Donell via Gcc-patches wrote:

On 8/8/23 13:46, David Edelsohn wrote:

I believe that upstream projects for components that are imported
into GCC should be responsible for their security policy, including
libgo, gofrontend, libsanitizer (other than local patches), zlib,
libtool, libphobos, libcody, libffi, eventually Rust libcore, etc.


I agree completely.

We can reference the upstream and direct people to follow upstream security
policy for these bundled components.

Any other policy risks having conflicting guidance between the projects,
which is not useful for security policy.

There might be exceptions to this rule, particularly when the downstream
wants to accept particular risks while upstream does not; but none of these
components are in that case IMO.



I agree with that, but with one caveat.  Our policy should state what we 
 do once upstream has addressed the issue.


R.


Re: [PATCH] preprocessor: c++: Support `#pragma GCC target' macros [PR87299]

2023-08-09 Thread Lewis Hyatt via Gcc-patches
On Tue, Aug 1, 2023 at 11:01 AM Joseph Myers  wrote:
>
> On Mon, 31 Jul 2023, Lewis Hyatt via Gcc-patches wrote:
>
> > I added some additional testcases from the PR for x86. The other targets
> > that support `#pragma GCC target' (aarch64, arm, nios2, powerpc, s390)
> > already had tests verifying that the pragma sets macros as expected; here I
> > have added -save-temps to some of them, to test that it now works in
> > preprocess-only mode as well.
>
> It would seem better to have copies of the tests with and without
> -save-temps, to test in both modes, rather than changing what's tested by
> an existing test here.  Or a test variant that #includes the original test
> but uses different options, if the original test isn't doing anything that
> would fail to work with that approach.

Thank you, I will adjust this.
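
To make sure I understand the suggestion, the #include-based variant would
be a tiny wrapper along these lines (the file name and options here are just
placeholders, not the actual tests):

/* { dg-do compile } */
/* { dg-options "<options of the original test> -save-temps" } */
#include "original-test.c"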

-Lewis


Re: [PATCH] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-09 Thread Richard Sandiford via Gcc-patches
"juzhe.zh...@rivai.ai"  writes:
> Hi, Richi.
>
>>> that should be
>
>>>   || (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
>>>   && !LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
>
>>> I think.  It seems to imply that SLP isn't supported with
>>> masking/lengthing.
>
> Oh, yes.  At first glance, the original code is quite suspicious and your 
> comments make sense to me.
>
>>> Hum, how does CFN_EXTRACT_LAST handle both mask and length transparently?
>>> Don't you need some CFN_LEN_EXTRACT_LAST instead?
>
> I think CFN_EXTRACT_LAST always has either loop mask or loop len.
>
> When both mask and length are not needed, IMHO, I think current BIT_FIELD_REF 
> flow is good enough:
> https://godbolt.org/z/Yr5M9hcc6

I'm a bit behind on email, but why isn't BIT_FIELD_REF enough for
the case that the patch is handling?  It seems that:

  .EXTRACT_LAST (len, vec)

is equivalent to:

  vec[len - 1]

I think eventually there'll be the temptation to lower/fold it like that.

FWIW, I agree a IFN_LEN_EXTRACT_LAST/IFN_EXTRACT_LAST_LEN would be OK,
with a mask, vector, length and bias.  But even then, I think there'll
be a temptation to lower calls with all-1 masks to val[len - 1 - bias].
So I think the function only makes sense if we have a use case where
the mask might not be all-1s.

Thanks,
Richard


RE: [PATCH] RISC-V: Support NPATTERNS = 1 stepped vector[PR110950]

2023-08-09 Thread Li, Pan2 via Gcc-patches
Committed, thanks Robin.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Robin Dapp via Gcc-patches
Sent: Wednesday, August 9, 2023 8:34 PM
To: Juzhe-Zhong ; gcc-patches@gcc.gnu.org
Cc: rdapp@gmail.com; kito.ch...@gmail.com; kito.ch...@sifive.com; 
jeffreya...@gmail.com
Subject: Re: [PATCH] RISC-V: Support NPATTERNS = 1 stepped vector[PR110950]

OK, thanks.

Regards
 Robin


Re: [PATCH] Handle in-order reductions when SLP vectorizing non-loops

2023-08-09 Thread Alexander Monakov


On Wed, 9 Aug 2023, Richard Biener via Gcc-patches wrote:

> The following teaches the non-loop reduction vectorization code to
> handle non-associatable reductions.  Using the existing FOLD_LEFT_PLUS
> internal functions might be possible but I'd have to convince myself
> that +0.0 + x[0] is a safe extra operation in every rounding mode
> (I also have no way to test the resulting code).

It's not. Under our default -fno-signaling-nans -fno-rounding-math,
negative zero is the neutral element for addition, so '-0.0 + x[0]'
might be OK (but negative zero costs more to materialize).

If the reduction has at least two elements, then 

-0.0 + x[0] + x[1]

has the same behavior w.r.t SNaNs as 'x[0] + x[1]', but unfortunately
yields negative zero when x[0] = x[1] = +0.0 and rounding towards
negative infinity (unlike x[0] + x[1], which is +0.0).
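
A minimal way to observe that corner case (compile with -frounding-math so
the additions are not folded away; FE_DOWNWARD selects rounding towards
negative infinity):

#include <fenv.h>
#include <stdio.h>

int
main (void)
{
  volatile double x0 = 0.0, x1 = 0.0;
  fesetround (FE_DOWNWARD);               /* round towards -inf */
  double plain = x0 + x1;                 /* +0.0 */
  double seeded = -0.0 + x0 + x1;         /* -0.0 in this rounding mode */
  printf ("%g %g\n", 1.0 / plain, 1.0 / seeded);  /* inf -inf */
  return 0;
}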

Alexander


RE: Intel AVX10.1 Compiler Design and Support

2023-08-09 Thread Michael Matz via Gcc-patches
Hello,

On Wed, 9 Aug 2023, Zhang, Annita via Gcc-patches wrote:

> > The question is whether you want to mandate the 16-bit floating point
> > extensions.  You might get better adoption if you stay compatible with 
> > shipping
> > CPUs.  Furthermore, the 256-bit tuning apparently benefits current Intel 
> > CPUs,
> > even though they can do 512-bit vectors.
> > 
> > (The thread subject is a bit misleading for this sub-topic, by the way.)
> > 
> > Thanks,
> > Florian
> 
> Since 256bit and 512bit are diverged from AVX10.1 and will continue in 
> the future AVX10 versions, I think it's hard to keep a single version 
> number to cover both and increase monotonically. Hence I'd like to 
> suggest x86-64-v5 for 512bit and x86-64-v5-256 for 256bit, and so on.

The raison d'etre for the x86-64-vX scheme is to make life sensible as a
distributor.  That goal can only be achieved if this scheme contains only 
a few components that have a simple relationship.  That basically means: 
one dimension only.  If you now add a second dimension (with and without 
-512) we have to add another one if Intel (or whomever else) next does a 
marketing stunt for feature "foobar" and end up with x86-64-v6, 
x86-64-v6-512, x86-64-v6-1024, x86-64-v6-foobar, x86-64-v6-512-foobar, 
x86-64-v6-1024-foobar.

In short: no.

It isn't the right time anyway to assign meaning to x86-64-v5, as it 
wasn't the right time for assigning x86-64-v4 (as we now see).  These are 
supposed to reflect generally useful feature sets actually shipped in 
generally available CPUs in the market, and be vendor independent.  As 
such it's much too early to define v5 based purely on text documents.


Ciao,
Michael.


Re: [PATCH v2 01/14] LoongArch: Introduce loongarch32 target

2023-08-09 Thread Xi Ruoyao via Gcc-patches
On Wed, 2023-08-09 at 19:46 +0800, Jiajie Chen wrote:
> +  builtin_define ("_ABILP32=3");
> +  builtin_define ("_LOONGARCH_SIM=_ABILP32");

Let's remove them.  These MIPS-style definitions are deprecated:
https://github.com/loongson/LoongArch-Documentation/pull/28.

Unfortunately, for the LP64 ABI _ABILP64 is already part of the public API.
I've tried to raise a deprecation warning for these macros, but it seems
doing so needs a major change in libcpp...  However, the ILP32 ABI is brand
new, so we should take the opportunity to drop this historical burden.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v2 11/14] LoongArch: Mark am* instructions as LA64-only

2023-08-09 Thread Xi Ruoyao via Gcc-patches
On Wed, 2023-08-09 at 19:46 +0800, Jiajie Chen wrote:
> LoongArch32 only provides basic ll/sc instructions for atomic
> operations. Mark am* atomic instructions as 64-bit only.

I'd prefer using a different symbol, say TARGET_LOONGARCH_AM here.  Then
it would be easier to adjust the code if we have a LA32 core with am*
support in the future.  For now we can just
#define TARGET_LOONGARCH_AM TARGET_64BIT.

> gcc/ChangeLog:
> 
> * config/loongarch/sync.md: Guard am* atomic insns by
> TARGET_64BIT.
> ---
>  gcc/config/loongarch/sync.md | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md
> index 9924d522bcd..151b553bcc6 100644
> --- a/gcc/config/loongarch/sync.md
> +++ b/gcc/config/loongarch/sync.md
> @@ -77,7 +77,7 @@
>    [(match_operand:GPR 1 "reg_or_0_operand" "rJ")
>     (match_operand:SI 2 "const_int_operand")]  ;; model
>    UNSPEC_ATOMIC_STORE))]
> -  ""
> +  "TARGET_64BIT"
>    "amswap%A2.\t$zero,%z1,%0"
>    [(set (attr "length") (const_int 8))])
>  
> @@ -88,7 +88,7 @@
>    (match_operand:GPR 1 "reg_or_0_operand" "rJ"))
>    (match_operand:SI 2 "const_int_operand")] ;; model
>  UNSPEC_SYNC_OLD_OP))]
> -  ""
> +  "TARGET_64BIT"
>    "am%A2.\t$zero,%z1,%0"
>    [(set (attr "length") (const_int 8))])
>  
> @@ -101,7 +101,7 @@
>  (match_operand:GPR 2 "reg_or_0_operand" "rJ"))
>    (match_operand:SI 3 "const_int_operand")] ;; model
>  UNSPEC_SYNC_OLD_OP))]
> -  ""
> +  "TARGET_64BIT"
>    "am%A3.\t%0,%z2,%1"
>    [(set (attr "length") (const_int 8))])
>  
> @@ -113,7 +113,7 @@
>   UNSPEC_SYNC_EXCHANGE))
>     (set (match_dup 1)
> (match_operand:GPR 2 "register_operand" "r"))]
> -  ""
> +  "TARGET_64BIT"
>    "amswap%A3.\t%0,%z2,%1"
>    [(set (attr "length") (const_int 8))])
>  
> @@ -182,7 +182,7 @@
>    [(match_operand:QI 0 "register_operand" "") ;; bool output
>     (match_operand:QI 1 "memory_operand" "+ZB")    ;; memory
>     (match_operand:SI 2 "const_int_operand" "")]   ;; model
> -  ""
> +  "TARGET_64BIT"
>  {
>    /* We have no QImode atomics, so use the address LSBs to form a mask,
>   then use an aligned SImode atomic.  */

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: Re: [PATCH] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-09 Thread 钟居哲
Hi, Richard.

>> I'm a bit behind of email, but why isn't BIT_FIELD_REF enough for
>> the case that the patch is handling?  It seems that:

>>   .EXTRACT_LAST (len, vec)

>> is equivalent to:

>>   vec[len - 1]

>> I think eventually there'll be the temptation to lower/fold it like that.

Current BIT_FIELD_REF doesn't make use of LOOP_LEN.

Consider this following case:

#include 
#define EXTRACT_LAST(TYPE)  \
  TYPE __attribute__ ((noinline, noclone))  \
  test_##TYPE (TYPE *x, int n, TYPE value)  \
  { \
TYPE last;  \
for (int j = 0; j < n; ++j) \
  { \
last = x[j];\
x[j] = last * value;\
  } \
return last;\
  }
#define TEST_ALL(T) \
  T (uint8_t)   \
TEST_ALL (EXTRACT_LAST)

The assembly:
https://godbolt.org/z/z1PPT948b

test_uint8_t:
        mv      a3,a0
        ble     a1,zero,.L10
        addiw   a5,a1,-1
        li      a4,14
        sext.w  a0,a1
        bleu    a5,a4,.L11
        srliw   a4,a0,4
        slli    a4,a4,4
        mv      a5,a3
        add     a4,a4,a3
        vsetivli        zero,16,e8,m1,ta,ma
        vmv.v.x v3,a2
.L4:
        vl1re8.v        v1,0(a5)
        vmul.vv v2,v1,v3
        vs1r.v  v2,0(a5)
        addi    a5,a5,16
        bne     a4,a5,.L4
        andi    a4,a1,-16
        mv      a5,a4
        vsetivli        zero,8,e8,mf2,ta,ma
        beq     a0,a4,.L16
.L3:
        subw    a0,a0,a4
        addiw   a7,a0,-1
        li      a6,6
        bleu    a7,a6,.L7
        slli    a4,a4,32
        srli    a4,a4,32
        add     a4,a3,a4
        andi    a6,a0,-8
        vle8.v  v2,0(a4)
        vmv.v.x v1,a2
        andi    a0,a0,7
        vmul.vv v1,v1,v2
        vse8.v  v1,0(a4)
        addw    a5,a6,a5
        beq     a0,zero,.L8
.L7:
        add     a6,a3,a5
        lbu     a0,0(a6)
        addiw   a4,a5,1
        mulw    a7,a0,a2
        sb      a7,0(a6)
        bge     a4,a1,.L14
        add     a4,a3,a4
        lbu     a0,0(a4)
        addiw   a6,a5,2
        mulw    a7,a2,a0
        sb      a7,0(a4)
        ble     a1,a6,.L14
        add     a6,a3,a6
        lbu     a0,0(a6)
        addiw   a4,a5,3
        mulw    a7,a2,a0
        sb      a7,0(a6)
        ble     a1,a4,.L14
        add     a4,a3,a4
        lbu     a0,0(a4)
        addiw   a6,a5,4
        mulw    a7,a2,a0
        sb      a7,0(a4)
        ble     a1,a6,.L14
        add     a6,a3,a6
        lbu     a0,0(a6)
        addiw   a4,a5,5
        mulw    a7,a2,a0
        sb      a7,0(a6)
        ble     a1,a4,.L14
        add     a4,a3,a4
        lbu     a0,0(a4)
        addiw   a5,a5,6
        mulw    a6,a2,a0
        sb      a6,0(a4)
        ble     a1,a5,.L14
        add     a3,a3,a5
        lbu     a0,0(a3)
        mulw    a2,a2,a0
        sb      a2,0(a3)
        ret
.L10:
        li      a0,0
.L14:
        ret
.L8:
        vslidedown.vi   v2,v2,7
        vmv.x.s a0,v2
        andi    a0,a0,0xff
        ret
.L11:
        li      a4,0
        li      a5,0
        vsetivli        zero,8,e8,mf2,ta,ma
        j       .L3
.L16:
        vsetivli        zero,16,e8,m1,ta,ma
        vslidedown.vi   v1,v1,15
        vmv.x.s a0,v1
        andi    a0,a0,0xff
        ret


This patch is trying to optimize the RVV code generation for length control;
after this patch:

Gimple IR:

   [local count: 955630224]:
  # vectp_x.6_22 = PHI 
  # vectp_x.10_30 = PHI 
  # ivtmp_34 = PHI 
  _36 = .SELECT_VL (ivtmp_34, 16);
  vect_last_12.8_24 = .MASK_LEN_LOAD (vectp_x.6_22, 8B, { -1, -1, -1, -1, -1, 
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 }, _36, 0);
  vect__4.9_28 = vect_last_12.8_24 * vect_cst__27;
  .MASK_LEN_STORE (vectp_x.10_30, 8B, { -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 
-1, -1, -1, -1, -1, -1 }, _36, 0, vect__4.9_28);
  vectp_x.6_23 = vectp_x.6_22 + _36;
  vectp_x.10_31 = vectp_x.10_30 + _36;
  ivtmp_35 = ivtmp_34 - _36;
  if (ivtmp_35 != 0)
goto ; [89.00%]
  else
goto ; [11.00%]

   [local count: 105119324]:
  _26 = .EXTRACT_LAST (_36, vect_last_12.8_24); [tail call]

ASM:
test_uint8_t:
ble a1,zero,.L4
mv a4,a0
vsetivli zero,16,e8,m1,ta,ma
vmv.v.x v3,a2
.L3:
vsetvli a5,a1,e8,m1,ta,ma
vle8.v v1,0(a0)
vsetivli zero,16,e8,m1,ta,ma
sub a1,a1,a5
vmul.vv v2,v1,v3
vsetvli zero,a5,e8,m1,ta,ma
vse8.v v2,0(a4)
add a0,a0,a5
add a4,a4,a5
bne a1,zero,.L3
addi a5,a5,-1
vsetivli zero,16,e8,m1,ta,ma
vslidedown.vx v1,v1,a5
vmv.x.s a0,v1
andi a0,a0,0xff
ret
.L4:
li a0,0
ret

I think this codegen is much better with this patch.

>> FWIW, I agree a IFN_LEN_EXTRACT_LAST/IFN_EXTRACT_LAST_LEN would be OK,
>> with a mask, vector, length and bias.  But even then, I think there'll
>> be a temptation to lower calls with all-1 masks to val[len - 1 - bias].
>> So I think the f

Re: Re: [PATCH] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-09 Thread Richard Biener via Gcc-patches
On Wed, 9 Aug 2023, ??? wrote:

> Hi, Richard.
> 
> >> I'm a bit behind of email, but why isn't BIT_FIELD_REF enough for
> >> the case that the patch is handling?  It seems that:
> 
> >>   .EXTRACT_LAST (len, vec)
> 
> >> is equivalent to:
> 
> >>   vec[len - 1]
> 
> >> I think eventually there'll be the temptation to lower/fold it like that.
> 
> Current BIT_FIELD_REF doesn't make use of LOOP_LEN.

Yes, BIT_FIELD_REF doesn't support variable offset.

> Consider this following case:
> 
> #include 
> #define EXTRACT_LAST(TYPE)\
>   TYPE __attribute__ ((noinline, noclone))\
>   test_##TYPE (TYPE *x, int n, TYPE value)\
>   {   \
> TYPE last;\
> for (int j = 0; j < n; ++j)   \
>   {   \
>   last = x[j];\
>   x[j] = last * value;\
>   }   \
> return last;  \
>   }
> #define TEST_ALL(T)   \
>   T (uint8_t) \
> TEST_ALL (EXTRACT_LAST)
> 
> The assembly:
> https://godbolt.org/z/z1PPT948b
> 
> test_uint8_t:
> mv  a3,a0
> ble a1,zero,.L10
> addiw   a5,a1,-1
> li  a4,14
> sext.w  a0,a1
> bleua5,a4,.L11
> srliw   a4,a0,4
> sllia4,a4,4
> mv  a5,a3
> add a4,a4,a3
> vsetivlizero,16,e8,m1,ta,ma
> vmv.v.x v3,a2
> .L4:
> vl1re8.vv1,0(a5)
> vmul.vv v2,v1,v3
> vs1r.v  v2,0(a5)
> addia5,a5,16
> bne a4,a5,.L4
> andia4,a1,-16
> mv  a5,a4
> vsetivlizero,8,e8,mf2,ta,ma
> beq a0,a4,.L16
> .L3:
> subwa0,a0,a4
> addiw   a7,a0,-1
> li  a6,6
> bleua7,a6,.L7
> sllia4,a4,32
> srlia4,a4,32
> add a4,a3,a4
> andia6,a0,-8
> vle8.v  v2,0(a4)
> vmv.v.x v1,a2
> andia0,a0,7
> vmul.vv v1,v1,v2
> vse8.v  v1,0(a4)
> addwa5,a6,a5
> beq a0,zero,.L8
> .L7:
> add a6,a3,a5
> lbu a0,0(a6)
> addiw   a4,a5,1
> mulwa7,a0,a2
> sb  a7,0(a6)
> bge a4,a1,.L14
> add a4,a3,a4
> lbu a0,0(a4)
> addiw   a6,a5,2
> mulwa7,a2,a0
> sb  a7,0(a4)
> ble a1,a6,.L14
> add a6,a3,a6
> lbu a0,0(a6)
> addiw   a4,a5,3
> mulwa7,a2,a0
> sb  a7,0(a6)
> ble a1,a4,.L14
> add a4,a3,a4
> lbu a0,0(a4)
> addiw   a6,a5,4
> mulwa7,a2,a0
> sb  a7,0(a4)
> ble a1,a6,.L14
> add a6,a3,a6
> lbu a0,0(a6)
> addiw   a4,a5,5
> mulwa7,a2,a0
> sb  a7,0(a6)
> ble a1,a4,.L14
> add a4,a3,a4
> lbu a0,0(a4)
> addiw   a5,a5,6
> mulwa6,a2,a0
> sb  a6,0(a4)
> ble a1,a5,.L14
> add a3,a3,a5
> lbu a0,0(a3)
> mulwa2,a2,a0
> sb  a2,0(a3)
> ret
> .L10:
> li  a0,0
> .L14:
> ret
> .L8:
> vslidedown.vi   v2,v2,7
> vmv.x.s a0,v2
> andia0,a0,0xff
> ret
> .L11:
> li  a4,0
> li  a5,0
> vsetivlizero,8,e8,mf2,ta,ma
> j   .L3
> .L16:
> vsetivlizero,16,e8,m1,ta,ma
> vslidedown.vi   v1,v1,15
> vmv.x.s a0,v1
> andia0,a0,0xff
> ret
> 
> 
> This patch is trying to optimize the codegen for RVV for length control,
> after this patch:
> 
> Gimple IR:
> 
>[local count: 955630224]:
>   # vectp_x.6_22 = PHI 
>   # vectp_x.10_30 = PHI 
>   # ivtmp_34 = PHI 
>   _36 = .SELECT_VL (ivtmp_34, 16);
>   vect_last_12.8_24 = .MASK_LEN_LOAD (vectp_x.6_22, 8B, { -1, -1, -1, -1, -1, 
> -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 }, _36, 0);
>   vect__4.9_28 = vect_last_12.8_24 * vect_cst__27;
>   .MASK_LEN_STORE (vectp_x.10_30, 8B, { -1, -1, -1, -1, -1, -1, -1, -1, -1, 
> -1, -1, -1, -1, -1, -1, -1 }, _36, 0, vect__4.9_28);
>   vectp_x.6_23 = vectp_x.6_22 + _36;
>   vectp_x.10_31 = vectp_x.10_30 + _36;
>   ivtmp_35 = ivtmp_34 - _36;
>   if (ivtmp_35 != 0)
> goto ; [89.00%]
>   else
> goto ; [11.00%]
> 
>[local count: 105119324]:
>   _26 = .EXTRACT_LAST (_36, vect_last_12.8_24); [tail call]
> 
> ASM:
> test_uint8_t:
> ble a1,zero,.L4
> mv a4,a0
> vsetivli zero,16,e8,m1,ta,ma
> vmv.v.x v3,a2
> .L3:
> vsetvli a5,a1,e8,m1,ta,ma
> vle8.v v1,0(a0)
> vsetivli zero,16,e8,m1,ta,ma
> sub a1,a1,a5
> vmul.vv v2,v1,v3
> vsetvli zero,a5,e8,m1,ta,ma
> vse8.v v2,0(a4)
> add a

[committed] libstdc++: Minor fixes for some warnings in <format>

2023-08-09 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* include/std/format: Fix some warnings.
(__format::__write(Ctx&, basic_string_view)): Remove
unused function template.
---
 libstdc++-v3/include/std/format | 28 +---
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index f68308e7210..96eb4cd742e 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -79,7 +79,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 namespace __format
 {
   // Type-erased character sink.
-  template struct _Sink;
+  template class _Sink;
   // Output iterator that writes to a type-erase character sink.
   template
 class _Sink_iter;
@@ -280,7 +280,6 @@ namespace __format
}
   else
{
- unsigned short __val = 0;
  constexpr int __n = 32;
  char __buf[__n]{};
  for (int __i = 0; __i < __n && (__first + __i) != __last; ++__i)
@@ -1468,6 +1467,8 @@ namespace __format
  if (__use_prec)
__fmt = chars_format::general;
  break;
+   default:
+ __builtin_unreachable();
  }
 
  // Write value into buffer using std::to_chars.
@@ -2083,7 +2084,11 @@ namespace __format
 
 // _GLIBCXX_RESOLVE_LIB_DEFECTS
 // P2510R3 Formatting pointers
-#define _GLIBCXX_P2518R3 (__cplusplus > 202302L || ! defined __STRICT_ANSI__)
+#if __cplusplus > 202302L || ! defined __STRICT_ANSI__
+#define _GLIBCXX_P2518R3 1
+#else
+#define _GLIBCXX_P2518R3 0
+#endif
 
 #if _GLIBCXX_P2518R3
__first = __spec._M_parse_zero_fill(__first, __last);
@@ -2641,7 +2646,7 @@ namespace __format
 _Arg_none, _Arg_bool, _Arg_c, _Arg_i, _Arg_u, _Arg_ll, _Arg_ull,
 _Arg_flt, _Arg_dbl, _Arg_ldbl, _Arg_str, _Arg_sv, _Arg_ptr, _Arg_handle,
 _Arg_i128, _Arg_u128,
-_Arg_bf16, _Arg_f16, _Arg_f32, _Arg_f64,
+_Arg_bf16, _Arg_f16, _Arg_f32, _Arg_f64, // These are unused.
 #ifdef _GLIBCXX_LONG_DOUBLE_ALT128_COMPAT
 _Arg_next_value_,
 _Arg_f128 = _Arg_ldbl,
@@ -3106,14 +3111,16 @@ namespace __format
case _Arg_u128:
  return std::forward<_Visitor>(__vis)(_M_val._M_u128);
 #endif
- // TODO _Arg_f16 etc.
 
 #if _GLIBCXX_FORMAT_F128 == 2
case _Arg_f128:
  return std::forward<_Visitor>(__vis)(_M_val._M_f128);
 #endif
+
+   default:
+ // _Arg_f16 etc.
+ __builtin_unreachable();
  }
- __builtin_unreachable();
}
 };
 
@@ -3422,15 +3429,6 @@ namespace __format
 /// @cond undocumented
 namespace __format
 {
-  template
-[[__gnu__::__always_inline__]]
-inline void
-__write(_Ctx& __ctx, basic_string_view<_CharT> __str)
-requires requires { { __ctx.out() } -> output_iterator; }
-{
-  __ctx.advance_to(__format::__write(__ctx.out()));
-}
-
   // Abstract base class defining an interface for scanning format strings.
   // Scan the characters in a format string, dividing it up into strings of
   // ordinary characters, escape sequences, and replacement fields.
-- 
2.41.0



[committed] libstdc++: Fix some -Wunused-parameter warnings

2023-08-09 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* include/bits/alloc_traits.h (allocate): Add [[maybe_unused]]
attribute.
* include/bits/regex_executor.tcc: Remove name of unused
parameter.
* include/bits/shared_ptr_atomic.h (atomic_is_lock_free):
Likewise.
* include/bits/stl_uninitialized.h: Likewise.
* include/bits/streambuf_iterator.h (operator==): Likewise.
* include/bits/uses_allocator.h: Likewise.
* include/c_global/cmath (isfinite, isinf, isnan): Likewise.
* include/std/chrono (zoned_time): Likewise.
* include/std/future (__future_base::_S_allocate_result):
Likewise.
(packaged_task): Likewise.
* include/std/optional (_Optional_payload_base): Likewise.
* include/std/scoped_allocator (__inner_type_impl): Likewise.
* include/std/tuple (_Tuple_impl): Likewise.
---
 libstdc++-v3/include/bits/alloc_traits.h   |  3 ++-
 libstdc++-v3/include/bits/regex_executor.tcc   |  2 +-
 libstdc++-v3/include/bits/shared_ptr_atomic.h  |  2 +-
 libstdc++-v3/include/bits/stl_uninitialized.h  |  3 +--
 libstdc++-v3/include/bits/streambuf_iterator.h |  2 +-
 libstdc++-v3/include/bits/uses_allocator.h |  2 +-
 libstdc++-v3/include/c_global/cmath|  6 +++---
 libstdc++-v3/include/std/chrono|  4 ++--
 libstdc++-v3/include/std/future|  4 ++--
 libstdc++-v3/include/std/optional  |  4 ++--
 libstdc++-v3/include/std/scoped_allocator  |  4 ++--
 libstdc++-v3/include/std/tuple | 16 
 12 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/libstdc++-v3/include/bits/alloc_traits.h 
b/libstdc++-v3/include/bits/alloc_traits.h
index 182c3e23eed..bc936ec15dd 100644
--- a/libstdc++-v3/include/bits/alloc_traits.h
+++ b/libstdc++-v3/include/bits/alloc_traits.h
@@ -493,7 +493,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   */
   [[__nodiscard__,__gnu__::__always_inline__]]
   static _GLIBCXX20_CONSTEXPR pointer
-  allocate(allocator_type& __a, size_type __n, const_void_pointer __hint)
+  allocate(allocator_type& __a, size_type __n,
+  [[maybe_unused]] const_void_pointer __hint)
   {
 #if __cplusplus <= 201703L
return __a.allocate(__n, __hint);
diff --git a/libstdc++-v3/include/bits/regex_executor.tcc 
b/libstdc++-v3/include/bits/regex_executor.tcc
index 38fbe3dc854..9643a3575d7 100644
--- a/libstdc++-v3/include/bits/regex_executor.tcc
+++ b/libstdc++-v3/include/bits/regex_executor.tcc
@@ -339,7 +339,7 @@ namespace __detail
   template
 struct _Backref_matcher
 {
-  _Backref_matcher(bool __icase, const _TraitsT& __traits)
+  _Backref_matcher(bool /* __icase */, const _TraitsT& __traits)
   : _M_traits(__traits) { }
 
   bool
diff --git a/libstdc++-v3/include/bits/shared_ptr_atomic.h 
b/libstdc++-v3/include/bits/shared_ptr_atomic.h
index 2295b48e732..3f921d311d6 100644
--- a/libstdc++-v3/include/bits/shared_ptr_atomic.h
+++ b/libstdc++-v3/include/bits/shared_ptr_atomic.h
@@ -99,7 +99,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   */
   template
 inline bool
-atomic_is_lock_free(const __shared_ptr<_Tp, _Lp>* __p)
+atomic_is_lock_free(const __shared_ptr<_Tp, _Lp>*)
 {
 #ifdef __GTHREADS
   return __gthread_active_p() == 0;
diff --git a/libstdc++-v3/include/bits/stl_uninitialized.h 
b/libstdc++-v3/include/bits/stl_uninitialized.h
index be7b4afdd05..474a9a11297 100644
--- a/libstdc++-v3/include/bits/stl_uninitialized.h
+++ b/libstdc++-v3/include/bits/stl_uninitialized.h
@@ -806,8 +806,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 {
   template
 static void
-__uninit_default_novalue(_ForwardIterator __first,
-_ForwardIterator __last)
+__uninit_default_novalue(_ForwardIterator, _ForwardIterator)
{
}
 };
diff --git a/libstdc++-v3/include/bits/streambuf_iterator.h 
b/libstdc++-v3/include/bits/streambuf_iterator.h
index 32285a668fc..14a95ad42c2 100644
--- a/libstdc++-v3/include/bits/streambuf_iterator.h
+++ b/libstdc++-v3/include/bits/streambuf_iterator.h
@@ -223,7 +223,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #if __cplusplus > 201703L && __cpp_lib_concepts
   [[nodiscard]]
   friend bool
-  operator==(const istreambuf_iterator& __i, default_sentinel_t __s)
+  operator==(const istreambuf_iterator& __i, default_sentinel_t)
   { return __i._M_at_eof(); }
 #endif
 };
diff --git a/libstdc++-v3/include/bits/uses_allocator.h 
b/libstdc++-v3/include/bits/uses_allocator.h
index d3b26c7d974..ebd26d291b3 100644
--- a/libstdc++-v3/include/bits/uses_allocator.h
+++ b/libstdc++-v3/include/bits/uses_allocator.h
@@ -168,7 +168,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif // C++14
 
   template
-void __uses_allocator_construct_impl(__uses_alloc0 __a, _Tp* __ptr,
+void __uses_allocator_construct_impl(__uses_alloc0, _Tp* __pt

[committed] libstdc++: Explicitly default some copy ctors and assignments

2023-08-09 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8 --

The standard says that the implicit copy assignment operator is
deprecated for classes that have a user-provided copy constructor, and
vice versa.
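As a hypothetical illustration of the rule (not part of this patch):

  struct S
  {
    S() = default;
    S(const S&) { }   // user-provided copy constructor
    // The implicitly-declared copy assignment operator is deprecated;
    // -Wdeprecated-copy (enabled by -Wextra) warns on the assignment below.
  };

  void assign(S& a, const S& b) { a = b; }

Explicitly defaulting the copy operations, as done below, avoids relying on
the deprecated implicit definitions.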

libstdc++-v3/ChangeLog:

* include/bits/new_allocator.h (__new_allocator): Define copy
assignment operator as defaulted.
* include/std/complex (complex, complex)
(complex): Define copy constructor as defaulted.
---
 libstdc++-v3/include/bits/new_allocator.h |  4 
 libstdc++-v3/include/std/complex  | 13 +
 2 files changed, 17 insertions(+)

diff --git a/libstdc++-v3/include/bits/new_allocator.h 
b/libstdc++-v3/include/bits/new_allocator.h
index 0a0b12eb504..357700292ed 100644
--- a/libstdc++-v3/include/bits/new_allocator.h
+++ b/libstdc++-v3/include/bits/new_allocator.h
@@ -96,6 +96,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
_GLIBCXX20_CONSTEXPR
__new_allocator(const __new_allocator<_Tp1>&) _GLIBCXX_USE_NOEXCEPT { }
 
+#if __cplusplus >= 201103L
+  __new_allocator& operator=(const __new_allocator&) = default;
+#endif
+
 #if __cplusplus <= 201703L
   ~__new_allocator() _GLIBCXX_USE_NOEXCEPT { }
 
diff --git a/libstdc++-v3/include/std/complex b/libstdc++-v3/include/std/complex
index f01a3af4371..0ba2167bf02 100644
--- a/libstdc++-v3/include/std/complex
+++ b/libstdc++-v3/include/std/complex
@@ -1359,6 +1359,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 #endif
 
+#if __cplusplus >= 201103L
+  _GLIBCXX14_CONSTEXPR complex(const complex&) = default;
+#endif
+
 #if __cplusplus > 202002L
   template
explicit(!requires(_Up __u) { value_type{__u}; })
@@ -1512,6 +1516,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 #endif
 
+#if __cplusplus >= 201103L
+  _GLIBCXX14_CONSTEXPR complex(const complex&) = default;
+#endif
+
 #if __cplusplus > 202002L
   template
explicit(!requires(_Up __u) { value_type{__u}; })
@@ -1666,6 +1674,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 #endif
 
+#if __cplusplus >= 201103L
+  _GLIBCXX14_CONSTEXPR complex(const complex&) = default;
+#endif
+
 #if __cplusplus > 202002L
   template
explicit(!requires(_Up __u) { value_type{__u}; })
@@ -1901,6 +1913,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   // Let the compiler synthesize the copy and assignment
   // operator.  It always does a pretty good job.
+  constexpr complex(const complex&) = default;
   constexpr complex& operator=(const complex&) = default;
 
   template
-- 
2.41.0



[committed] libstdc++: Fix some -Wmismatched-tags warnings

2023-08-09 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* include/bits/shared_ptr_atomic.h (atomic): Change class-head
to struct.
* include/bits/stl_tree.h (_Rb_tree_merge_helper): Change
class-head to struct in friend declaration.
* include/std/chrono (tzdb_list::_Node): Likewise.
* include/std/future (_Task_state_base, _Task_state): Likewise.
* include/std/scoped_allocator (__inner_type_impl): Likewise.
* include/std/valarray (_BinClos, _SClos, _GClos, _IClos)
(_ValFunClos, _RefFunClos): Change class-head to struct.
---
 libstdc++-v3/include/bits/shared_ptr_atomic.h |  8 
 libstdc++-v3/include/bits/stl_tree.h  |  2 +-
 libstdc++-v3/include/std/chrono   |  6 +++---
 libstdc++-v3/include/std/future   |  4 ++--
 libstdc++-v3/include/std/scoped_allocator |  4 ++--
 libstdc++-v3/include/std/valarray | 12 ++--
 6 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/libstdc++-v3/include/bits/shared_ptr_atomic.h 
b/libstdc++-v3/include/bits/shared_ptr_atomic.h
index 3f921d311d6..b56b8153a89 100644
--- a/libstdc++-v3/include/bits/shared_ptr_atomic.h
+++ b/libstdc++-v3/include/bits/shared_ptr_atomic.h
@@ -358,7 +358,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #if __cplusplus >= 202002L
 # define __cpp_lib_atomic_shared_ptr 201711L
   template
-class atomic;
+struct atomic;
 
   /**
* @addtogroup pointer_abstractions
@@ -376,7 +376,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 {
   using value_type = _Tp;
 
-  friend class atomic<_Tp>;
+  friend struct atomic<_Tp>;
 
   // An atomic version of __shared_count<> and __weak_count<>.
   // Stores a _Sp_counted_base<>* but uses the LSB as a lock.
@@ -610,7 +610,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 };
 
   template
-class atomic>
+struct atomic>
 {
 public:
   using value_type = shared_ptr<_Tp>;
@@ -733,7 +733,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 };
 
   template
-class atomic>
+struct atomic>
 {
 public:
   using value_type = weak_ptr<_Tp>;
diff --git a/libstdc++-v3/include/bits/stl_tree.h 
b/libstdc++-v3/include/bits/stl_tree.h
index 3c331fbc952..f870f3dfa53 100644
--- a/libstdc++-v3/include/bits/stl_tree.h
+++ b/libstdc++-v3/include/bits/stl_tree.h
@@ -1554,7 +1554,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  = _Rb_tree<_Key, _Val, _KeyOfValue, _Compare2, _Alloc>;
 
   template
-   friend class _Rb_tree_merge_helper;
+   friend struct _Rb_tree_merge_helper;
 
   /// Merge from a compatible container into one with unique keys.
   template
diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 9b160488afa..e63d6c71b4a 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -2792,7 +2792,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 private:
   friend const tzdb& reload_tzdb();
-  friend class tzdb_list::_Node;
+  friend struct tzdb_list::_Node;
 
   explicit time_zone_link(nullptr_t) { }
 
@@ -2896,7 +2896,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 private:
   explicit leap_second(seconds::rep __s) : _M_s(__s) { }
 
-  friend class tzdb_list::_Node;
+  friend struct tzdb_list::_Node;
 
   friend const tzdb& reload_tzdb();
 
@@ -2937,7 +2937,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 private:
   friend const tzdb& reload_tzdb();
   friend class time_zone;
-  friend class tzdb_list::_Node;
+  friend struct tzdb_list::_Node;
 };
 
 tzdb_list& get_tzdb_list();
diff --git a/libstdc++-v3/include/std/future b/libstdc++-v3/include/std/future
index b94ae0b679b..c46ead742c3 100644
--- a/libstdc++-v3/include/std/future
+++ b/libstdc++-v3/include/std/future
@@ -625,10 +625,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   class _Async_state_impl;
 
 template
-  class _Task_state_base;
+  struct _Task_state_base;
 
 template
-  class _Task_state;
+  struct _Task_state;
 
 template
diff --git a/libstdc++-v3/include/std/scoped_allocator 
b/libstdc++-v3/include/std/scoped_allocator
index cb15c8cc7dd..8af432ada42 100644
--- a/libstdc++-v3/include/std/scoped_allocator
+++ b/libstdc++-v3/include/std/scoped_allocator
@@ -164,7 +164,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { return _M_inner == __other._M_inner; }
 
 private:
-  template friend class __inner_type_impl;
+  template friend struct __inner_type_impl;
   template friend class scoped_allocator_adaptor;
 
   __type _M_inner;
@@ -186,7 +186,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 friend class scoped_allocator_adaptor;
 
   template
-friend class __inner_type_impl;
+   friend struct __inner_type_impl;
 
   tuple
   _M_tie() const noexcept
diff --git a/libstdc++-v3/include/std/valarray 
b/libstdc++-v3/include/std/valarray
index 6bd23e0914b..f172db6c623 100644
--- a/libstdc++-v3/include/

[committed] libstdc++: Fix a -Wsign-compare warning in std::list

2023-08-09 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* include/bits/list.tcc (list::sort(Cmp)): Fix -Wsign-compare
warning for loop condition.
---
 libstdc++-v3/include/bits/list.tcc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/list.tcc 
b/libstdc++-v3/include/bits/list.tcc
index 3e5b1f7b972..344386aa4d0 100644
--- a/libstdc++-v3/include/bits/list.tcc
+++ b/libstdc++-v3/include/bits/list.tcc
@@ -654,7 +654,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
{
  // Move all nodes back into *this.
  __carry._M_put_all(end()._M_node);
- for (int __i = 0; __i < sizeof(__tmp)/sizeof(__tmp[0]); ++__i)
+ for (size_t __i = 0; __i < sizeof(__tmp)/sizeof(__tmp[0]); ++__i)
__tmp[__i]._M_put_all(end()._M_node);
  __throw_exception_again;
}
-- 
2.41.0



[committed] libstdc++: Fix constexpr functions to conform to older standards

2023-08-09 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8 --

Some constexpr functions were inadvertently relying on relaxed constexpr
rules from later standards.
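For instance (a hypothetical example, not taken from the patch), a
multi-statement body is only valid in a constexpr function from C++14
onwards; under C++11 rules the body must be a single return statement:

  // OK as constexpr in C++14, ill-formed as constexpr in C++11:
  constexpr int lg14(unsigned n)
  {
    int w = 8 * sizeof(int) - 1;
    w -= __builtin_clz(n);
    return w;
  }

  // C++11-compatible form: a single return statement.
  constexpr int lg11(unsigned n)
  { return int(8 * sizeof(int) - 1) - __builtin_clz(n); }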

libstdc++-v3/ChangeLog:

* include/bits/chrono.h (duration_cast): Do not use braces
around statements for C++11 constexpr rules.
* include/bits/stl_algobase.h (__lg): Rewrite as a single
statement for C++11 constexpr rules.
* include/experimental/bits/fs_path.h (path::string): Use
_GLIBCXX17_CONSTEXPR not _GLIBCXX_CONSTEXPR for 'if constexpr'.
* include/std/charconv (__to_chars_8): Initialize variable for
C++17 constexpr rules.
---
 libstdc++-v3/include/bits/chrono.h   |  6 --
 libstdc++-v3/include/bits/stl_algobase.h | 15 ++-
 libstdc++-v3/include/experimental/bits/fs_path.h |  2 +-
 libstdc++-v3/include/std/charconv|  2 +-
 4 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/libstdc++-v3/include/bits/chrono.h 
b/libstdc++-v3/include/bits/chrono.h
index e9490767c56..b2713d57ef1 100644
--- a/libstdc++-v3/include/bits/chrono.h
+++ b/libstdc++-v3/include/bits/chrono.h
@@ -276,8 +276,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
if constexpr (is_same_v<_ToDur, duration<_Rep, _Period>>)
  return __d;
else
+ {
 #endif
-   {
  using __to_period = typename _ToDur::period;
  using __to_rep = typename _ToDur::rep;
  using __cf = ratio_divide<_Period, __to_period>;
@@ -285,7 +285,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  using __dc = __duration_cast_impl<_ToDur, __cf, __cr,
__cf::num == 1, __cf::den == 1>;
  return __dc::__cast(__d);
-   }
+#if __cpp_inline_variables && __cpp_if_constexpr
+ }
+#endif
   }
 
 /** Trait indicating whether to treat a type as a floating-point type.
diff --git a/libstdc++-v3/include/bits/stl_algobase.h 
b/libstdc++-v3/include/bits/stl_algobase.h
index dd95e94f7e9..2037d6c0443 100644
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -1518,15 +1518,12 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
   return std::__bit_width(make_unsigned_t<_Tp>(__n)) - 1;
 #else
   // Use +__n so it promotes to at least int.
-  const int __sz = sizeof(+__n);
-  int __w = __sz * __CHAR_BIT__ - 1;
-  if (__sz == sizeof(long long))
-   __w -= __builtin_clzll(+__n);
-  else if (__sz == sizeof(long))
-   __w -= __builtin_clzl(+__n);
-  else if (__sz == sizeof(int))
-   __w -= __builtin_clz(+__n);
-  return __w;
+  return (sizeof(+__n) * __CHAR_BIT__ - 1)
+  - (sizeof(+__n) == sizeof(long long)
+   ? __builtin_clzll(+__n)
+   : (sizeof(+__n) == sizeof(long)
+? __builtin_clzl(+__n)
+: __builtin_clz(+__n)));
 #endif
 }
 
diff --git a/libstdc++-v3/include/experimental/bits/fs_path.h 
b/libstdc++-v3/include/experimental/bits/fs_path.h
index e0e47188bb9..bf07754fd84 100644
--- a/libstdc++-v3/include/experimental/bits/fs_path.h
+++ b/libstdc++-v3/include/experimental/bits/fs_path.h
@@ -1042,7 +1042,7 @@ namespace __detail
 inline std::basic_string<_CharT, _Traits, _Allocator>
 path::string(const _Allocator& __a) const
 {
-  if _GLIBCXX_CONSTEXPR (is_same<_CharT, value_type>::value)
+  if _GLIBCXX17_CONSTEXPR (is_same<_CharT, value_type>::value)
return { _M_pathname.begin(), _M_pathname.end(), __a };
 
   using _WString = basic_string<_CharT, _Traits, _Allocator>;
diff --git a/libstdc++-v3/include/std/charconv 
b/libstdc++-v3/include/std/charconv
index b34d672f5bd..cf2b1161014 100644
--- a/libstdc++-v3/include/std/charconv
+++ b/libstdc++-v3/include/std/charconv
@@ -242,7 +242,7 @@ namespace __detail
   static_assert(__integer_to_chars_is_unsigned<_Tp>, "implementation bug");
 
   to_chars_result __res;
-  unsigned __len;
+  unsigned __len = 0;
 
   if _GLIBCXX17_CONSTEXPR (__gnu_cxx::__int_traits<_Tp>::__digits <= 16)
{
-- 
2.41.0



[committed] libstdc++: Suppress clang -Wc99-extensions warnings in <complex>

2023-08-09 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8 --

This prevents Clang from warning about the use of the non-standard
__complex__ keyword.

libstdc++-v3/ChangeLog:

* include/std/complex: Add diagnostic pragma for clang.
---
 libstdc++-v3/include/std/complex | 9 +
 1 file changed, 9 insertions(+)

diff --git a/libstdc++-v3/include/std/complex b/libstdc++-v3/include/std/complex
index 0ba2167bf02..a4abe9aa96a 100644
--- a/libstdc++-v3/include/std/complex
+++ b/libstdc++-v3/include/std/complex
@@ -47,6 +47,11 @@
 // Get rid of a macro possibly defined in <complex.h>
 #undef complex
 
+#ifdef __clang__
+#pragma clang diagnostic push
+#pragma clang diagnostic ignored "-Wc99-extensions"
+#endif
+
 #if __cplusplus > 201703L
 # define __cpp_lib_constexpr_complex 201711L
 #endif
@@ -2642,4 +2647,8 @@ _GLIBCXX_END_NAMESPACE_VERSION
 
 #endif  // C++11
 
+#ifdef __clang__
+#pragma clang diagnostic pop
+#endif
+
 #endif  /* _GLIBCXX_COMPLEX */
-- 
2.41.0



RE: Intel AVX10.1 Compiler Design and Support

2023-08-09 Thread Zhang, Annita via Gcc-patches



> -Original Message-
> From: Michael Matz 
> Sent: Wednesday, August 9, 2023 9:54 PM
> To: Zhang, Annita 
> Cc: Florian Weimer ; Hongtao Liu
> ; Beulich, Jan ; Jiang, Haochen
> ; gcc-patches@gcc.gnu.org; ubiz...@gmail.com;
> Liu, Hongtao ; Wang, Phoebe
> ; x86-64-abi ;
> llvm-dev ; Craig Topper ;
> Joseph Myers 
> Subject: RE: Intel AVX10.1 Compiler Design and Support
> 
> Hello,
> 
> On Wed, 9 Aug 2023, Zhang, Annita via Gcc-patches wrote:
> 
> > > The question is whether you want to mandate the 16-bit floating
> > > point extensions.  You might get better adoption if you stay
> > > compatible with shipping CPUs.  Furthermore, the 256-bit tuning
> > > apparently benefits current Intel CPUs, even though they can do 512-bit
> vectors.
> > >
> > > (The thread subject is a bit misleading for this sub-topic, by the
> > > way.)
> > >
> > > Thanks,
> > > Florian
> >
> > Since 256bit and 512bit are diverged from AVX10.1 and will continue in
> > the future AVX10 versions, I think it's hard to keep a single version
> > number to cover both and increase monotonically. Hence I'd like to
> > suggest x86-64-v5 for 512bit and x86-64-v5-256 for 256bit, and so on.
> 
> The raison d'etre for the x86-64-vX scheme is to make life sensible as
> distributor.  That goal can only be achieved if this scheme contains only a 
> few
> components that have a simple relationship.  That basically means:
> one dimension only.  If you now add a second dimension (with and without
> -512) we have to add another one if Intel (or whomever else) next does a
> marketing stunt for feature "foobar" and end up with x86-64-v6, x86-64-v6-
> 512, x86-64-v6-1024, x86-64-v6-foobar, x86-64-v6-512-foobar, x86-64-v6-
> 1024-foobar.
> 
> In short: no.
> 
> It isn't the right time anyway to assign meaning to x86-64-v5, as it wasn't 
> the
> right time for assigning x86-64-v4 (as we now see).  These are supposed to
> reflect generally useful feature sets actually shipped in generally available 
> CPUs
> in the market, and be vendor independend.  As such it's much too early to
> define v5 based purely on text documents.
> 
> 
> Ciao,
> Michael.

Make sense. 


Re: Re: [PATCH] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-09 Thread 钟居哲
Hi, Richard.

>> Yes, I think VEC_EXTRACT is suitable for this.

Thanks for suggestion.

Actually, I tried to use VEC_EXTRACT yesterday, but it ICEs since GCC fails
to create an LCSSA PHI for VEC_EXTRACT.

For example, like ARM SVE:
https://godbolt.org/z/rfbb4rfKv 

vect dump IR:
;; Created LCSSA PHI: loop_mask_36 = PHI 
   [local count: 105119324]:
  # vect_last_12.8_24 = PHI 
  # loop_mask_36 = PHI 
  _25 = .EXTRACT_LAST (loop_mask_36, vect_last_12.8_24);

Then it can work.

I was trying to do a similar thing with VEC_EXTRACT for RVV, as follows:

...
loop_len_36 = SELECT_VL

# vect_last_12.8_24 = PHI 
// Missed a LCSSA PHI.
_25 = .VEC_EXTRACT (loop_len_36, vect_last_12.8_24);

This Gimple IR will cause an ICE.

When I use EXTRACT_LAST instead of VEC_EXTRACT, then:
...
loop_len_36 = SELECT_VL

# vect_last_12.8_24 = PHI 
# loop_len_22 = PHI 
_25 = .VEC_EXTRACT (loop_len_22, vect_last_12.8_24);

Then it works.

I haven't figured out where to make GCC recognize VEC_EXTRACT so that it
generates an LCSSA PHI for VEC_EXTRACT.

Could you give me some help for this?

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-08-09 22:17
To: 钟居哲
CC: richard.sandiford; gcc-patches
Subject: Re: Re: [PATCH] VECT: Support loop len control on EXTRACT_LAST 
vectorization
On Wed, 9 Aug 2023, ??? wrote:
 
> Hi, Richard.
> 
> >> I'm a bit behind of email, but why isn't BIT_FIELD_REF enough for
> >> the case that the patch is handling?  It seems that:
> 
> >>   .EXTRACT_LAST (len, vec)
> 
> >> is equivalent to:
> 
> >>   vec[len - 1]
> 
> >> I think eventually there'll be the temptation to lower/fold it like that.
> 
> Current BIT_FIELD_REF doesn't make use of LOOP_LEN.
 
Yes, BIT_FIELD_REF doesn't support variable offset.
 
> Consider this following case:
> 
> #include 
> #define EXTRACT_LAST(TYPE) \
>   TYPE __attribute__ ((noinline, noclone)) \
>   test_##TYPE (TYPE *x, int n, TYPE value) \
>   { \
> TYPE last; \
> for (int j = 0; j < n; ++j) \
>   { \
> last = x[j]; \
> x[j] = last * value; \
>   } \
> return last; \
>   }
> #define TEST_ALL(T) \
>   T (uint8_t) \
> TEST_ALL (EXTRACT_LAST)
> 
> The assembly:
> https://godbolt.org/z/z1PPT948b
> 
> test_uint8_t:
> mv  a3,a0
> ble a1,zero,.L10
> addiw   a5,a1,-1
> li  a4,14
> sext.w  a0,a1
> bleua5,a4,.L11
> srliw   a4,a0,4
> sllia4,a4,4
> mv  a5,a3
> add a4,a4,a3
> vsetivlizero,16,e8,m1,ta,ma
> vmv.v.x v3,a2
> .L4:
> vl1re8.vv1,0(a5)
> vmul.vv v2,v1,v3
> vs1r.v  v2,0(a5)
> addia5,a5,16
> bne a4,a5,.L4
> andia4,a1,-16
> mv  a5,a4
> vsetivlizero,8,e8,mf2,ta,ma
> beq a0,a4,.L16
> .L3:
> subwa0,a0,a4
> addiw   a7,a0,-1
> li  a6,6
> bleua7,a6,.L7
> sllia4,a4,32
> srlia4,a4,32
> add a4,a3,a4
> andia6,a0,-8
> vle8.v  v2,0(a4)
> vmv.v.x v1,a2
> andia0,a0,7
> vmul.vv v1,v1,v2
> vse8.v  v1,0(a4)
> addwa5,a6,a5
> beq a0,zero,.L8
> .L7:
> add a6,a3,a5
> lbu a0,0(a6)
> addiw   a4,a5,1
> mulwa7,a0,a2
> sb  a7,0(a6)
> bge a4,a1,.L14
> add a4,a3,a4
> lbu a0,0(a4)
> addiw   a6,a5,2
> mulwa7,a2,a0
> sb  a7,0(a4)
> ble a1,a6,.L14
> add a6,a3,a6
> lbu a0,0(a6)
> addiw   a4,a5,3
> mulwa7,a2,a0
> sb  a7,0(a6)
> ble a1,a4,.L14
> add a4,a3,a4
> lbu a0,0(a4)
> addiw   a6,a5,4
> mulwa7,a2,a0
> sb  a7,0(a4)
> ble a1,a6,.L14
> add a6,a3,a6
> lbu a0,0(a6)
> addiw   a4,a5,5
> mulwa7,a2,a0
> sb  a7,0(a6)
> ble a1,a4,.L14
> add a4,a3,a4
> lbu a0,0(a4)
> addiw   a5,a5,6
> mulwa6,a2,a0
> sb  a6,0(a4)
> ble a1,a5,.L14
> add a3,a3,a5
> lbu a0,0(a3)
> mulwa2,a2,a0
> sb  a2,0(a3)
> ret
> .L10:
> li  a0,0
> .L14:
> ret
> .L8:
> vslidedown.vi   v2,v2,7
> vmv.x.s a0,v2
> andia0,a0,0xff
> ret
> .L11:
> li  a4,0
> li  a5,0
> vsetivlizero,8,e8,mf2,ta,ma
> j   .L3
> .L16:
> vsetivlizero,16,e8,m1,ta,ma
> vslidedown.vi   v1,v1,15
> vmv.x.s a0,v1
> andia0,a0,0xff
> ret
> 
> 
> This patch is trying to optimize the codegen for RVV for length control,
> after this patch:
> 
> Gimple IR:
> 
>[local count: 955630224]:
>   # vectp_x.6_22 = PHI 
>   # vectp_x.10_30 = PHI 
>   # ivtmp_34 = P

Re: [Patch, fortran] PR109684 - compiling failure: complaining about a final subroutine of a type being not PURE (while it is indeed PURE)

2023-08-09 Thread Paul Richard Thomas via Gcc-patches
I took a look at my calendar and decided to backport right away.

r13-7703-ged049e5d5f36cc0f4318cd93bb6b33ed6f6f2ba7

BTW It is a regression :-)

Paul

On Wed, 9 Aug 2023 at 12:10, Paul Richard Thomas
 wrote:
>
> Committed to trunk as 'obvious' in
> r14-3098-gb8ec3c952324f866f191883473922e250be81341
>
> 13-branch to follow in a few days.
>
> Paul


Re: [PATCH] aarch64: enable mixed-types for aarch64 simdclones

2023-08-09 Thread Andre Vieira (lists) via Gcc-patches

Here is my new version; see my inline responses to your comments.

New cover letter:

This patch enables the use of mixed types for simd clones for AArch64,
adds aarch64 to the vect_simd_clones effective target, and corrects the way
the simdlen is chosen when no simdlen clause is specified, according to the
'Vector Function Application Binary Interface Specification for AArch64'.


gcc/ChangeLog:

* config/aarch64/aarch64.cc (currently_supported_simd_type): 
Remove.
(aarch64_simd_clone_compute_vecsize_and_simdlen): Determine 
simdlen according to NDS rule.

(lane_size): New function.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add aarch64 targets to vect_simd_clones.
* c-c++-common/gomp/declare-variant-14.c: Add aarch64 checks 
and remove warning check.

* g++.dg/gomp/attrs-10.C: Likewise.
* g++.dg/gomp/declare-simd-1.C: Likewise.
* g++.dg/gomp/declare-simd-3.C: Likewise.
* g++.dg/gomp/declare-simd-4.C: Likewise.
* gcc.dg/gomp/declare-simd-3.c: Likewise.
* gcc.dg/gomp/simd-clones-2.c: Likewise.
* gfortran.dg/gomp/declare-variant-14.f90: Likewise.
* c-c++-common/gomp/pr60823-1.c: Remove warning check.
* c-c++-common/gomp/pr60823-3.c: Likewise.
* g++.dg/gomp/declare-simd-7.C: Likewise.
* g++.dg/gomp/declare-simd-8.C: Likewise.
* g++.dg/gomp/pr88182.C: Likewise.
* gcc.dg/declare-simd.c: Likewise.
* gcc.dg/gomp/declare-simd-1.c: Likewise.
* gcc.dg/gomp/pr87895-1.c: Likewise.
* gfortran.dg/gomp/declare-simd-2.f90: Likewise.
* gfortran.dg/gomp/declare-simd-coarray-lib.f90: Likewise.
* gfortran.dg/gomp/pr79154-1.f90: Likewise.
* gfortran.dg/gomp/pr83977.f90: Likewise.
* gcc.dg/gomp/pr87887-1.c: Add warning test.
* gcc.dg/gomp/pr89246-1.c: Likewise.
* gcc.dg/gomp/pr99542.c: Update warning test.



On 08/08/2023 11:51, Richard Sandiford wrote:

"Andre Vieira (lists)"  writes:



warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-   "unsupported return type %qT for % functions",
+   "unsupported return type %qT for simd",
ret_type);


What's the reason for s/% functions/simd/, in particular for
dropping the quotes around simd?


It's to align with i386's error message; this helps with testing, as I can
then avoid having different tests for the same error.


I asked Jakub which one he preferred, and he gave me an explanation of why
the i386 one was preferable, ... but unfortunately I didn't write it down.





return 0;
  }
  
+  nfs_type = ret_type;


Genuine question, but what does nfs stand for in this context?

Was supposed to be nds... my bad.

I don't think this implements the NDS calculation in the spec:

  The `Narrowest Data Size of f`, or ``NDS(f)``, is defined as the minimum of
  the lane size ``LS(P)`` among all input parameters and
  the return value of ``f``.

   ...

   We then define the `Lane Size of P`, or ``LS(P)``, as follows.

   1. If ``MTV(P)`` is ``false`` and ``P`` is a pointer or reference to
  some type ``T`` for which ``PBV(T)`` is ``true``, ``LS(P) =
  sizeof(T)``.
   2. If ``PBV(T(P))`` is ``true``, ``LS(P) = sizeof(P)``.
   3. Otherwise ``LS(P) = sizeof(uintptr_t)``.

AIUI, (1) means that we need to look at the targets of uniform and
linear scalars[*] that have pointer type, so that e.g. a uniform uint8_t *
pointer should cause NDS to be 1.

[*] i.e. arguments that remain scalar in the vector prototype

(2) means that other types of uniform and linear scalars do contribute.
A uniform uint8_t should cause NDS to be 1.
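For example (a purely illustrative prototype):

  #include <stdint.h>

  #pragma omp declare simd uniform(p)
  double f (uint8_t *p, double x);

Here LS(p) is 1 by rule 1 (uniform pointer to uint8_t), while x and the
return value give 8, so NDS(f) is 1 rather than 8.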


You are right, I misread the ABI description there.



Thanks,
Richarddiff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
7cd230c4602a15980016bdc92e80579be0c07094..458a4dbf76138e329eb99077780089a9b501c046
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -27274,28 +27274,57 @@ supported_simd_type (tree t)
   return false;
 }
 
-/* Return true for types that currently are supported as SIMD return
-   or argument types.  */
+/* Determine the lane size for the clone argument/return type.  This follows
+   the LS(P) rule in the VFABIA64.  */
 
-static bool
-currently_supported_simd_type (tree t, tree b)
+static unsigned
+lane_size (cgraph_simd_clone_arg_type clone_arg_type, tree type)
 {
-  if (COMPLEX_FLOAT_TYPE_P (t))
-return false;
+  gcc_assert (clone_arg_type != SIMD_CLONE_ARG_TYPE_MASK);
 
-  if (TYPE_SIZE (t) != TYPE_SIZE (b))
-return false;
+  /* For non map-to-vector types that are pointers we use the element type it
+ points to.  */
+  if (POINTER_TYPE_P (type))
+switch (clone_arg_type)
+  {
+  default:
+   break;
+  case SIMD_CLONE_ARG_TYPE_UNIFORM:
+  case SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP:
+  case SIMD_CLONE_ARG_TYPE_LINEAR_VARIABLE_STEP:
+   type = TREE_TYPE (type

Re: [PATCH ver 3] rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation

2023-08-09 Thread Carl Love via Gcc-patches
Kewen:

On Wed, 2023-08-09 at 16:47 +0800, Kewen.Lin wrote:


> > Patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
> > LE
> > with no regressions.
> 
> Okay for trunk with two nits below fixed, thanks!

Thanks for all the help with the patch.  Fixed the nits below, compiled
and reran the test cases to make sure everything was OK.  Will go ahead
and commit the patch.
> 
> > gcc/ChangeLog:
> > 
> > * config/rs6000/rs6000-builtins.def (vcmpneb, vcmpneh,
> > vcmpnew):
> > Move definitions to Altivec stanza.
> > * config/rs6000/altivec.md (vcmpneb, vcmpneh, vcmpnew): New
> > define_expand.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/powerpc/vec-cmpne-runnable.c: New execution test.
> > * gcc.target/powerpc/vec-cmpne.c (define_test_functions,
> > execute_test_functions) moved to vec-cmpne.h.  Added
> > scan-assembler-times for vcmpequb, vcmpequh, vcmpequw.
> 
>   s/ moved/: Move/ => "... execute_test_functions): Move "
>   
> s/Added/Add/

Fixed both issues.

> 



> >  
> > +;; Expand for builtin vcmpne{b,h,w}
> > +(define_expand "altivec_vcmpne_"
> > +  [(set (match_operand:VSX_EXTRACT_I 3 "altivec_register_operand"
> > "=v")
> > +   (eq:VSX_EXTRACT_I (match_operand:VSX_EXTRACT_I 1
> > "altivec_register_operand" "v")
> > + (match_operand:VSX_EXTRACT_I 2
> > "altivec_register_operand" "v")))
> > +   (set (match_operand:VSX_EXTRACT_I 0 "altivec_register_operand"
> > "=v")
> > +(not:VSX_EXTRACT_I (match_dup 3)))]
> > +  "TARGET_ALTIVEC"
> > +  {
> > +operands[3] = gen_reg_rtx (GET_MODE (operands[0]));
> > +  });
> 
> Nit: Useless ";".

removed semicolon.

   Carl 



[PATCH] rs6000, add overloaded DFP quantize support

2023-08-09 Thread Carl Love via Gcc-patches


GCC maintainers:

The following patch adds four built-ins for the decimal floating point
(DFP) quantize instructions on rs6000.  The built-ins are for 64-bit
and 128-bit DFP operands.

The patch also adds a test case for the new builtins.

The patch has been tested on Power 10 LE and Power 9 LE/BE.

Please let me know if the patch is acceptable for mainline.  Thanks.

 Carl Love


--
rs6000, add overloaded DFP quantize support

Add decimal floating point (DFP) quantize built-ins for both 64-bit DFP
and 128-bit DFP operands.  In each case, there is an immediate version and a
variable version of the built-in.  The RM value is a 2-bit const int which
specifies the rounding mode to use.  For the immediate versions of the
built-in, the TE field is a 5-bit constant that specifies the value of the
ideal exponent for the result.  The built-in specifications are:

  _Decimal64 __builtin_dfp_quantize (_Decimal64, _Decimal64,
                                     const int RM)
  _Decimal64 __builtin_dfp_quantize (const int TE, _Decimal64,
                                     const int RM)
  _Decimal128 __builtin_dfpq_quantize (_Decimal128, _Decimal128,
                                       const int RM)
  _Decimal128 __builtin_dfpq_quantize (const int TE, _Decimal128,
                                       const int RM)

A testcase is added for the new built-in definitions.
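A rough usage sketch (the values are arbitrary; which operand supplies the
reference exponent and what the 2-bit RM encodings mean follow the
underlying dqua/dquai instructions, not anything shown here):

  _Decimal64  a = 5.17dd,     ref  = 0.01dd,  r64;
  _Decimal128 x = 1234.567dl, refq = 0.001dl, r128;

  r64  = __builtin_dfp_quantize  (ref,  a, 0);   /* variable form     */
  r64  = __builtin_dfp_quantize  (-2,   a, 0);   /* immediate TE form */
  r128 = __builtin_dfpq_quantize (refq, x, 0);   /* variable form     */
  r128 = __builtin_dfpq_quantize (-3,   x, 0);   /* immediate TE form */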

gcc/ChangeLog:
* config/rs6000/dfp.md: New UNSPECDQUAN.
(dfp_quan_, dfp_quan_i): New define_insn.
* config/rs6000/rs6000-builtins.def (__builtin_dfp_quantize_64,
__builtin_dfp_quantize_64i, __builtin_dfp_quantize_128,
__builtin_dfp_quantize_128i): New buit-in definitions.
* config/rs6000/rs6000-overload.def (__builtin_dfp_quantize,
__builtin_dfpq_quantize): New overloaded definitions.

gcc/testsuite/ChangeLog:
 * gcc.target/powerpc/builtin-dfp-quantize-runnable.c: New test
case.
---
 gcc/config/rs6000/dfp.md  |  25 ++-
 gcc/config/rs6000/rs6000-builtins.def |  15 ++
 gcc/config/rs6000/rs6000-overload.def |  12 ++
 .../powerpc/builtin-dfp-quantize-runnable.c   | 198 ++
 4 files changed, 249 insertions(+), 1 deletion(-)
 create mode 100644 
gcc/testsuite/gcc.target/powerpc/builtin-dfp-quantize-runnable.c

diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md
index 5ed8a73ac51..254c22a5c20 100644
--- a/gcc/config/rs6000/dfp.md
+++ b/gcc/config/rs6000/dfp.md
@@ -271,7 +271,8 @@
UNSPEC_DIEX
UNSPEC_DSCLI
UNSPEC_DTSTSFI
-   UNSPEC_DSCRI])
+   UNSPEC_DSCRI
+   UNSPEC_DQUAN])
 
 (define_code_iterator DFP_TEST [eq lt gt unordered])
 
@@ -395,3 +396,25 @@
   "dscri %0,%1,%2"
   [(set_attr "type" "dfp")
(set_attr "size" "")])
+
+(define_insn "dfp_quan_"
+  [(set (match_operand:DDTD 0 "gpc_reg_operand" "=d")
+(unspec:DDTD [(match_operand:DDTD 1 "gpc_reg_operand" "d")
+ (match_operand:DDTD 2 "gpc_reg_operand" "d")
+  (match_operand:QI 3 "immediate_operand" "i")]
+ UNSPEC_DQUAN))]
+  "TARGET_DFP"
+  "dqua %0,%1,%2,%3"
+  [(set_attr "type" "dfp")
+   (set_attr "size" "")])
+
+(define_insn "dfp_quan_i"
+  [(set (match_operand:DDTD 0 "gpc_reg_operand" "=d")
+(unspec:DDTD [(match_operand:SI 1 "const_int_operand" "n")
+ (match_operand:DDTD 2 "gpc_reg_operand" "d")
+  (match_operand:SI 3 "immediate_operand" "i")]
+ UNSPEC_DQUAN))]
+  "TARGET_DFP"
+  "dquai %1,%0,%2,%3"
+  [(set_attr "type" "dfp")
+   (set_attr "size" "")])
diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 35c4cdf74c5..36a56311643 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -2983,6 +2983,21 @@
   const unsigned long long __builtin_unpack_dec128 (_Decimal128, const int<1>);
 UNPACK_TD unpacktd {}
 
+  const _Decimal64 __builtin_dfp_quantize_64 (_Decimal64, _Decimal64, \
+ const int<2>);
+DFPQUAN_64 dfp_quan_dd {}
+
+  const _Decimal64 __builtin_dfp_quantize_64i (const int<5>, _Decimal64, \
+const int<2>);
+DFPQUAN_64i dfp_quan_idd {}
+
+  const _Decimal128 __builtin_dfp_quantize_128 (_Decimal128, _Decimal128, \
+ const int<2>);
+DFPQUAN_128 dfp_quan_td {}
+
+  const _Decimal128 __builtin_dfp_quantize_128i (const int<5>, _Decimal128, \
+  const int<2>);
+DFPQUAN_128i dfp_quan_itd {}
 
 [crypto]
   const vull __builtin_crypto_vcipher (vull, vull);
diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index b83946f5ad8..3bb1bedd69d 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -195,6 +195,18 @@
   unsigned l

verify

2023-08-09 Thread Saurabh Jha via Gcc-patches
verify


Re: [V2][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-08-09 Thread Qing Zhao via Gcc-patches
Hi, Martin,

Thanks for raising this issue.

Although this is an old FAM-related issue that does not relate to my current
patch (and might need to be resolved in a separate patch), I think it’s
necessary to have more discussion on this old issue and resolve it.

The first thing that I’d like to confirm is:

What is the exact memory layout for the following structure x?

struct foo { int a; short b; char t[]; } x = { .t = { 1, 2, 3 } };

And the key point that is confusing me is: where should the field “t” start?

A.  Starting at offset 8 as the following:

a   4-bytes
b   2-bytes
padding   2-bytes
t   3-bytes

B. Starting at offset 6 as the following:

a   4-bytes
b   2-bytes
t   3-bytes

From my understanding, A should be correct.  However, when I debugged GCC,
I found that the following function

tree
byte_position (const_tree field)
{
  return byte_from_pos (DECL_FIELD_OFFSET (field),
DECL_FIELD_BIT_OFFSET (field));
}

returned 6 for the field “t”:

498   tree pos = byte_position (last);
(gdb) n
499   size = fold_build2 (PLUS_EXPR, TREE_TYPE (size), pos, compsize);
(gdb) call debug_generic_expr(pos)
6

So, I suspect that there is a bug in GCC which incorrectly represents the
offset of the FAM field in the IR.

Thanks.

Qing
> On Aug 8, 2023, at 10:54 AM, Martin Uecker  wrote:
> 
> 
> 
> I am sure this has been discussed before, but seeing that you
> test for a specific formula, let me point out the following:
> 
> There at least three different size expression which could
> make sense. Consider
> 
> short foo { int a; short b; char t[]; }; 
> 
> Most people seem to use
> 
> sizeof(struct foo) + N * sizeof(foo->t);
> 
> which for N == 3 yields 11 bytes on x86-64 because the formula
> adds the padding of the original struct. There is an example
> in the  C standard that uses this formula.
> 
> 
> But he minimum size of an object which stores N elements is
> 
> max(sizeof (struct s), offsetof(struct s, t[n]))
> 
> which is 9 bytes. 
> 
> This is what clang uses for statically allocated objects with
> initialization, while GCC uses the rule above (11 bytes). But 
> bdos / bos  then returns the smaller size of 9 which is a bit
> confusing.
> 
> 
> https://godbolt.org/z/K1hvaK1ns
> 
> https://github.com/llvm/llvm-project/issues/62929
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109956
> 
> 
> Then there is also the size of a similar array where the FAM
> is replaced with an array of static size:
> 
> struct foo { int a; short b; char t[3]; }; 
> 
> This would make the most sense to me, but it has 12 bytes
> because the padding is according to the usual alignment
> rules.
> 
> 
> Martin
> 
> 
> 
> Am Montag, dem 07.08.2023 um 09:16 -0700 schrieb Kees Cook:
>> On Fri, Aug 04, 2023 at 07:44:28PM +, Qing Zhao wrote:
>>> This is the 2nd version of the patch, per our discussion based on the
>>> review comments for the 1st version, the major changes in this version
>>> are:
>> 
>> Thanks for the update!
>> 
>>> 
>>> 1. change the name "element_count" to "counted_by";
>>> 2. change the parameter for the attribute from a STRING to an
>>> Identifier;
>>> 3. Add logic and testing cases to handle anonymous structure/unions;
>>> 4. Clarify documentation to permit the situation when the allocation
>>> size is larger than what's specified by "counted_by", at the same time,
>>> it's user's error if allocation size is smaller than what's specified by
>>> "counted_by";
>>> 5. Add a complete testing case for using counted_by attribute in
>>> __builtin_dynamic_object_size when there is mismatch between the
>>> allocation size and the value of "counted_by", the expecting behavior
>>> for each case and the explanation on why in the comments. 
>> 
>> All the "normal" test cases I have are passing; this is wonderful! :)
>> 
>> I'm still seeing unexpected situations when I've intentionally set
>> counted_by to be smaller than alloc_size, but I assume it's due to not
>> yet having the patch you mention below.
>> 
>>> As discussed, I plan to add two more separate patch sets after this initial
>>> patch set is approved and committed.
>>> 
>>> set 1. A new warning option and a new sanitizer option for the user error
>>>when the allocation size is smaller than the value of "counted_by".
>>> set 2. An improvement to __builtin_dynamic_object_size  for the following
>>>case:
>>> 
>>> struct A
>>> {
>>> size_t foo;
>>> int array[] __attribute__((counted_by (foo)));
>>> };
>>> 
>>> extern struct fix * alloc_buf ();
>>> 
>>> int main ()
>>> {
>>> struct fix *p = alloc_buf ();
>>> __builtin_object_size(p->array, 0) == sizeof(struct A) + p->foo * 
>>> sizeof(int);
>>>   /* with the current algorithm, it’s UNKNOWN */ 
>>> __builtin_object_size(p->array, 2) == sizeof(struct A) + p->foo * 
>>> sizeof(int);
>>>   /* with the current algorithm, it’s UNKNOWN */
>>> }
>> 
>> Should the above be bdos instead of bos?
>> 
>>> Bootstra

[PATCH] VR-VALUES: Simplify comparison using range pairs

2023-08-09 Thread Andrew Pinski via Gcc-patches
If `A` has a range of `[0,0][100,INF]` and the comparison is `A < 50`,
this should be optimized to `A <= 0` (which will then be optimized to
just `A == 0`).
This patch implements this via a new function which checks whether
the constant of a comparison lies between two range pairs and changes
the constant to either the upper bound of the first pair or the lower
bound of the second pair, depending on the comparison.

This is the first step in fixing the following PRs:
PR 110131, PR 108360, and PR 108397.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* vr-values.cc (simplify_compare_using_range_pairs): New function.
(simplify_using_ranges::simplify_compare_using_ranges_1): Call
it.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/vrp124.c: New test.
* gcc.dg/pr21643.c: Disable VRP.
---
 gcc/testsuite/gcc.dg/pr21643.c |  6 ++-
 gcc/testsuite/gcc.dg/tree-ssa/vrp124.c | 44 +
 gcc/vr-values.cc   | 65 ++
 3 files changed, 114 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp124.c

diff --git a/gcc/testsuite/gcc.dg/pr21643.c b/gcc/testsuite/gcc.dg/pr21643.c
index 4e7f93d351a..7f121d7006f 100644
--- a/gcc/testsuite/gcc.dg/pr21643.c
+++ b/gcc/testsuite/gcc.dg/pr21643.c
@@ -1,6 +1,10 @@
 /* PR tree-optimization/21643 */
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-reassoc1-details --param 
logical-op-non-short-circuit=1" } */
+/* Note VRP is able to transform `c >= 0x20` in f7
+   to `c >= 0x21` since we want to test
+   reassociation and not VRP, turn it off. */
+
+/* { dg-options "-O2 -fdump-tree-reassoc1-details --param 
logical-op-non-short-circuit=1 -fno-tree-vrp" } */
 
 int
 f1 (unsigned char c)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp124.c 
b/gcc/testsuite/gcc.dg/tree-ssa/vrp124.c
new file mode 100644
index 000..6ccbda35d1b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp124.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+/* Should be optimized to a == -100 */
+int g(int a)
+{
+  if (a == -100 || a >= 0)
+;
+  else
+return 0;
+  return a < 0;
+}
+
+/* Should optimize to a == 0 */
+int f(int a)
+{
+  if (a == 0 || a > 100)
+;
+  else
+return 0;
+  return a < 50;
+}
+
+/* Should be optimized to a == 0. */
+int f2(int a)
+{
+  if (a == 0 || a > 100)
+;
+  else
+return 0;
+  return a < 100;
+}
+
+/* Should optimize to a == 100 */
+int f1(int a)
+{
+  if (a < 0 || a == 100)
+;
+  else
+return 0;
+  return a > 50;
+}
+
+/* { dg-final { scan-tree-dump-not "goto " "optimized" } } */
diff --git a/gcc/vr-values.cc b/gcc/vr-values.cc
index a4fddd62841..1262e7cf9f0 100644
--- a/gcc/vr-values.cc
+++ b/gcc/vr-values.cc
@@ -968,9 +968,72 @@ test_for_singularity (enum tree_code cond_code, tree op0,
   if (operand_equal_p (min, max, 0) && is_gimple_min_invariant (min))
return min;
 }
+
   return NULL;
 }
 
+/* Simplify integer comparisons such that the constant is one of the range 
pairs.
+   For an example, 
+   A has a range of [0,0][100,INF]
+   and the comparison of `A < 50`.
+   This should be optimized to `A <= 0`
+   and then test_for_singularity can optimize it to `A == 0`.   */
+
+static bool
+simplify_compare_using_range_pairs (tree_code &cond_code, tree &op0, tree &op1,
+   const value_range *vr)
+{
+  if (TREE_CODE (op1) != INTEGER_CST
+  || vr->num_pairs () < 2)
+return false;
+  auto val_op1 = wi::to_wide (op1);
+  tree type = TREE_TYPE (op0);
+  auto sign = TYPE_SIGN (type);
+  auto p = vr->num_pairs ();
+  /* Find the value range pair where op1
+ is in the middle of if one exist. */
+  for (unsigned i = 1; i < p; i++)
+{
+  auto lower = vr->upper_bound (i - 1);
+  auto upper = vr->lower_bound (i);
+  if (wi::lt_p (val_op1, lower, sign))
+   continue;
+  if (wi::gt_p (val_op1, upper, sign))
+   continue;
+  if (cond_code == LT_EXPR
+  && val_op1 != lower)
+{
+ op1 = wide_int_to_tree (type, lower);
+ cond_code = LE_EXPR;
+ return true;
+}
+  if (cond_code == LE_EXPR
+  && val_op1 != upper
+  && val_op1 != lower)
+{
+ op1 = wide_int_to_tree (type, lower);
+ cond_code = LE_EXPR;
+ return true;
+}
+  if (cond_code == GT_EXPR
+  && val_op1 != upper)
+{
+ op1 = wide_int_to_tree (type, upper);
+ cond_code = GE_EXPR;
+ return true;
+}
+  if (cond_code == GE_EXPR
+  && val_op1 != lower
+  && val_op1 != upper)
+{
+ op1 = wide_int_to_tree (type, upper);
+ cond_code = GE_EXPR;
+ return true;
+}
+}
+  return false;
+}
+
 /* Return whether the value range *VR fits in an integer type specified
by PRECISION and UNSIGNED_P.  */
 
@@

Re: [V2][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-08-09 Thread Michael Matz via Gcc-patches
Hello,

On Wed, 9 Aug 2023, Qing Zhao wrote:

> Although this is an old FAM related issue that does not relate to my current 
> patch 
> (and might need to be resolved in a separate patch).  I think that it’s 
> necessary to have
> more discussion on this old issue and resolve it. 
> 
> The first thing that I’d like to confirm is:
> 
> What the exact memory layout for the following structure x?
> 
> struct foo { int a; short b; char t[]; } x = { .t = { 1, 2, 3 } };
> 
> And the key that is confusing me is, where should the field “t” start? 
> 
> A.  Starting at offset 8 as the following:
> 
> a 4-bytes
> b 2-bytes
> padding   2-bytes
> t 3-bytes

Why should there be padding before 't'?  It's a char array (FAM or not), 
so it can be (and should be) placed directly after 'b'.  So ...

> B. Starting at offset 6 as the following:
> 
> a 4-bytes
> b 2-bytes
> t 3-bytes

... this is the correct layout, when seen in isolation.  The discussion 
revolves around what should come after 't': if it's a non-FAM struct (with 
t[3]), then it's clear that there needs to be padding after it, so to pad 
out the whole struct to be 12 bytes long (for sizeof() purpose), as 
required by its alignment (due to the int field 'a').
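For concreteness, this is what current compilers give on x86-64 for the two
variants (a quick sketch):

  struct fam    { int a; short b; char t[];  };   /* flexible array member */
  struct no_fam { int a; short b; char t[3]; };   /* fixed-size equivalent */

  /* offsetof (struct fam, t)    == 6    t starts right after b           */
  /* sizeof   (struct fam)       == 8    6 rounded up to the alignment 4  */
  /* offsetof (struct no_fam, t) == 6                                     */
  /* sizeof   (struct no_fam)    == 12   9 rounded up to the alignment 4  */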

So, should the equivalent FAM struct also have this sizeof()?  If no: 
there should be a good argument why it shouldn't be similar to the non-FAM 
one.

Then there is an argument that the compiler would be fine, when allocating 
a single object of such type (not as part of an array!), to only reserve 9 
bytes of space for the FAM-struct.  Then the question is: should it also 
do that for a non-FAM struct (based on the argument that the padding 
behind 't' isn't accessible, and hence doesn't need to be alloced).  I 
think it would be quite unexpected for the compiler to actually reserve 
less space than sizeof() implies, so I personally don't buy that argument.  
For FAM or non-FAM structs.

Note that if one choses to allocate less space than sizeof implies that 
this will have quite some consequences for code generation, in that 
sometimes the instruction sequences (e.g. for copying) need to be careful 
to never access tail padding that should be there in array context, but 
isn't there in single-object context.  I think this alone should make it 
clear that it's advisable that sizeof() and allocated size agree.

As in: I think sizeof for both structs should return 12, and 12 bytes 
should be reserved for objects of such types.
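
For concreteness, a minimal illustration of the two layouts being compared
(the sizes are the ones argued for above, not a claim about what any
particular compiler currently emits):

  struct nonfam { int a; short b; char t[3]; }; /* offsetof (t) == 6, sizeof == 12 */
  struct fam    { int a; short b; char t[];  }; /* offsetof (t) == 6; does an object
                                                   with 3 elements in t get 9 or 12
                                                   bytes reserved?  */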

And then the next question is what __builtin_object_size should do with 
these: should it return the size with or without padding at end (i.e. 
could/should it return 9 even if sizeof is 12).  I can see arguments for 
both.


Ciao,
Michael.


Re: [PATCH v4] Implement new RTL optimizations pass: fold-mem-offsets.

2023-08-09 Thread Jeff Law via Gcc-patches




On 8/7/23 08:33, Manolis Tsamis wrote:

This is a new RTL pass that tries to optimize memory offset calculations
by moving them from add immediate instructions to the memory loads/stores.
For example it can transform this:

   addi t4,sp,16
   add  t2,a6,t4
   shl  t3,t2,1
   ld   a2,0(t3)
   addi a2,1
   sd   a2,8(t2)

into the following (one instruction less):

   add  t2,a6,sp
   shl  t3,t2,1
   ld   a2,32(t3)
   addi a2,1
   sd   a2,24(t2)

Although there are places where this is done already, this pass is more
powerful and can handle the more difficult cases that are currently not
optimized. Also, it runs late enough and can optimize away unnecessary
stack pointer calculations.

gcc/ChangeLog:

* Makefile.in: Add fold-mem-offsets.o.
* passes.def: Schedule a new pass.
* tree-pass.h (make_pass_fold_mem_offsets): Declare.
* common.opt: New options.
* doc/invoke.texi: Document new option.
* fold-mem-offsets.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/fold-mem-offsets-1.c: New test.
* gcc.target/riscv/fold-mem-offsets-2.c: New test.
* gcc.target/riscv/fold-mem-offsets-3.c: New test.

We still have the m68k issue to deal with.

We've got these key insns going into fold-mem-offsets:


(insn 39 38 41 3 (set (reg:SI 13 %a5 [orig:36 _14 ] [36])
(plus:SI (reg:SI 10 %a2 [47])
(const_int 1 [0x1]))) "j.c":19:3 157 {*addsi3_internal}
 (nil))
(insn 41 39 42 3 (set (reg:SI 8 %a0 [61])
(plus:SI (reg/f:SI 12 %a4 [52])
(reg:SI 13 %a5 [orig:36 _14 ] [36]))) "j.c":19:3 discrim 1 157 
{*addsi3_internal}
 (nil))
(insn 42 41 43 3 (set (mem:SI (reg:SI 8 %a0 [61]) [0 MEM  [(void 
*)_15]+0 S4 A8])
(const_int 1633837924 [0x61626364])) "j.c":19:3 discrim 1 55 
{*movsi_m68k2}
 (nil))
(insn 43 42 45 3 (set (mem:QI (plus:SI (reg:SI 8 %a0 [61])
(const_int 4 [0x4])) [0 MEM  [(void *)_15]+4 S1 A8])
(const_int 101 [0x65])) "j.c":19:3 discrim 1 62 {*m68k.md:1130}
 (expr_list:REG_DEAD (reg:SI 8 %a0 [61])
(nil)))


[ ... ]

(insn 58 57 59 3 (set (reg:SI 8 %a0 [72])
(plus:SI (plus:SI (reg:SI 13 %a5 [orig:36 _14 ] [36])
(reg:SI 11 %a3 [49]))
(const_int 5 [0x5]))) "j.c":24:3 421 {*lea}
 (expr_list:REG_DEAD (reg:SI 13 %a5 [orig:36 _14 ] [36])
(expr_list:REG_DEAD (reg:SI 11 %a3 [49])
(nil



f-m-o will propagate the (const_int 1) from insn 39 into insns 42 & 43 
and modify insn 39 into a simple copy.  Note carefully that turning 
insn 39 into a simple copy changes the value in a5.


As a result insn 58 computes the wrong value and the test (strlenopt-2) 
fails on the m68k.


In fold_offsets we have this code:


 /* We only fold through instructions that are transitively used as
 memory addresses and do not have other uses.  Use the same logic
 from offset calculation to visit instructions that can propagate
 offsets and keep track of them in CAN_FOLD_INSNS.  */
  bool success;
  struct df_link *uses = get_uses (def, dest, &success), *ref_link;

  if (!success)
return 0;

  for (ref_link = uses; ref_link; ref_link = ref_link->next)
{
  rtx_insn* use = DF_REF_INSN (ref_link->ref);

  if (DEBUG_INSN_P (use))
continue;

  /* Punt if the use is anything more complicated than a set
 (clobber, use, etc).  */
  if (!NONJUMP_INSN_P (use) || GET_CODE (PATTERN (use)) != SET)
return 0;

  /* This use affects instructions outside of CAN_FOLD_INSNS.  */
  if (!bitmap_bit_p (&can_fold_insns, INSN_UID (use)))
return 0;

  rtx use_set = PATTERN (use);

  /* Special case: A foldable memory store is not foldable if it
 mentions DEST outside of the address calculation.  */
  if (use_set && MEM_P (SET_DEST (use_set))
  && reg_mentioned_p (dest, SET_SRC (use_set)))
return 0;
}


AFAICT that code is supposed to detect if there are uses outside of the 
set of insns that can be optimized.  In our case DEF is a0 (the base of 
the memory references) and its def is insn 41.  The assignment of a0 in 
insn 41 only reaches insns 42 and 43 which are marked in can_fold_insns.


That's fine and good, but insufficient for correctness.  ISTM we must 
look at the uses of any insn where we change the result of the 
computation, not just the def of the base register in the memory reference.


In this particular case we're going to modify insn 39, so we need to 
look at the uses of a5 (the value defined by insn 39).  If we did that 
we'd see the use of a5 at insn 58 and reject the optimization.




Testcase below.  You should be able to see the incorrect transformation 
with a m68k-linux-gnu cross compiler with -O2.




typedef unsigned int size_t;
extern void abort (void);
void *calloc (size_t, size_t);

Re: [PATCH] Handle in-order reductions when SLP vectorizing non-loops

2023-08-09 Thread Jeff Law via Gcc-patches




On 8/9/23 07:51, Alexander Monakov wrote:


On Wed, 9 Aug 2023, Richard Biener via Gcc-patches wrote:


The following teaches the non-loop reduction vectorization code to
handle non-associatable reductions.  Using the existing FOLD_LEFT_PLUS
internal functions might be possible but I'd have to convince myself
that +0.0 + x[0] is a safe extra operation in ever rounding mode
(I also have no way to test the resulting code).


It's not. Under our default -fno-signaling-nans -fno-rounding-math
negative zero is the neutral element for addition, so '-0.0 + x[0]'
might be (but negative zero costs more to materialize).

If the reduction has at least two elements, then

-0.0 + x[0] + x[1]

has the same behavior w.r.t SNaNs as 'x[0] + x[1]', but unfortunately
yields negative zero when x[0] = x[1] = +0.0 and rounding towards
negative infinity (unlike x[0] + x[1], which is +0.0).
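
A minimal sketch of what starting an in-order reduction from a zero
accumulator looks like, and why the sign of that zero matters (editorial
illustration assuming IEEE 754 semantics, not code from the patch):

  static double
  fold_left_sum (const double *x, int n)
  {
    /* -0.0 is the neutral element under the default rounding mode:
       -0.0 + v == v for every v, including v == +0.0.  Starting from
       +0.0 would turn a leading -0.0 element into +0.0.  Under rounding
       towards negative infinity, -0.0 + (+0.0) is -0.0 instead, which
       is the +0.0-input discrepancy described above.  */
    double acc = -0.0;
    for (int i = 0; i < n; i++)
      acc += x[i];
    return acc;
  }
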
Hmm, then there's a bug in a non-released port I worked on a while 
back.  It supports FOLD_LEFT_PLUS by starting the sequence with a +0.0 
in the destination register.


I guess if that port ever gets upstreamed I'll have to keep an eye out 
for that problem.  Luckily I think they can synthesize a -0.0 trivially, 
potentially even zero cost.


Thanks!
Jeff


[PATCH v3] tree-optimization/110279- Check for nested FMA in reassoc

2023-08-09 Thread Di Zhao OS via Gcc-patches
Hi,

The previous version of this patch tries to solve two problems
at the same time. For better clarity, I'll separate them and 
only deal with the "nested" FMA in this version. I plan to
propose another patch on avoiding badly shaped FMA (deferring FMA).

Other changes:

1. Added new testcases for the "nested" FMA issue. For the
   following code:

tmp1 = a + c * c + d * d + x * y;
tmp2 = x * tmp1;
result += (a + c + d + tmp2);

   , when "tmp1 = ..." is not rewritten, tmp1 will be result of
   an FMA, and there will be a list of consecutive FMAs: 

_1 = .FMA (c, c, a_39);
_2 = .FMA (d, d, _1);
tmp1 = .FMA (x, y, _2);
_3 = .FMA (tmp1, x, d);
...
   
   If "tmp1 = ..." is rewritten to parallel, tmp1 will be result
   of a PLUS_EXPR between FMAs:

_1 = .FMA (c, c, a_39);
_2 = x * y;
_3 = .FMA (d, d, _2);
 tmp1 = _3 + _1;
 _4 = .FMA (tmp1, x, d);
...

   It seems the register pressure of the latter is higher than
   the former. On the test machines we have (including Ampere1,
   Neoverse-n1 and Intel Xeon), with "tmp1 = ..." rewritten to
   parallel, the run time increased significantly on all of them. In
   contrast, when "tmp1" is not the 1st or 2nd operand of another
   FMA (pr110279-1.c), rewriting it results in better performance.
   (I'll also append the testcases in the bug tracker.)

2. Enhanced checking for nested FMA by: 1) Modified
   convert_mult_to_fma so it can return multiple LHS.  2) Check
   NEGATE_EXPRs for nested FMA.

(I think maybe this can be further refined by enabling rewriting
to parallel for very long op lists.)

Bootstrapped and regression tested on x86_64-linux-gnu.

Thanks,
Di Zhao



PR tree-optimization/110279

gcc/ChangeLog:

* tree-ssa-math-opts.cc (convert_mult_to_fma_1): Added
new parameter collect_lhs.
(struct fma_transformation_info): Moved to header.
(class fma_deferring_state): Moved to header.
(convert_mult_to_fma): Added new parameter collect_lhs.
* tree-ssa-math-opts.h (struct fma_transformation_info):
(class fma_deferring_state): Moved from .cc.
(convert_mult_to_fma): Moved from .cc.
* tree-ssa-reassoc.cc (enum fma_state): Defined enum to
describe the state of FMA candidates for a list of
operands.
(rewrite_expr_tree_parallel): Changed boolean parameter
to enum type.
(has_nested_fma_p): New function to check for nested FMA
on given multiplication statement.
(rank_ops_for_fma): Return enum fma_state.
(reassociate_bb): Avoid rewriting to parallel if nested
FMAs are found.

gcc/testsuite/ChangeLog:

* gcc.dg/pr110279-1.c: New test.
* gcc.dg/pr110279-2.c: New test.


tree-optimization-110279-Check-for-nested-FMA-in-rea.patch
Description: tree-optimization-110279-Check-for-nested-FMA-in-rea.patch


Re: [PATCH] aarch64: enable mixed-types for aarch64 simdclones

2023-08-09 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)"  writes:
> Here is my new version, see inline response to your comments.
>
> New cover letter:
>
> This patch enables the use of mixed-types for simd clones for AArch64, 
> adds aarch64 as a target_vect_simd_clones and corrects the way the 
> simdlen is chosen for non-specified simdlen clauses according to the 
> 'Vector Function Application Binary Interface Specification for AArch64'.
>
> gcc/ChangeLog:
>
>  * config/aarch64/aarch64.cc (currently_supported_simd_type): 
> Remove.
>  (aarch64_simd_clone_compute_vecsize_and_simdlen): Determine 
> simdlen according to NDS rule.
>  (lane_size): New function.
>
> gcc/testsuite/ChangeLog:
>
>  * lib/target-supports.exp: Add aarch64 targets to vect_simd_clones.
>  * c-c++-common/gomp/declare-variant-14.c: Add aarch64 checks 
> and remove warning check.
>  * g++.dg/gomp/attrs-10.C: Likewise.
>  * g++.dg/gomp/declare-simd-1.C: Likewise.
>  * g++.dg/gomp/declare-simd-3.C: Likewise.
>  * g++.dg/gomp/declare-simd-4.C: Likewise.
>  * gcc.dg/gomp/declare-simd-3.c: Likewise.
>  * gcc.dg/gomp/simd-clones-2.c: Likewise.
>  * gfortran.dg/gomp/declare-variant-14.f90: Likewise.
>  * c-c++-common/gomp/pr60823-1.c: Remove warning check.
>  * c-c++-common/gomp/pr60823-3.c: Likewise.
>  * g++.dg/gomp/declare-simd-7.C: Likewise.
>  * g++.dg/gomp/declare-simd-8.C: Likewise.
>  * g++.dg/gomp/pr88182.C: Likewise.
>  * gcc.dg/declare-simd.c: Likewise.
>  * gcc.dg/gomp/declare-simd-1.c: Likewise.
>  * gcc.dg/gomp/pr87895-1.c: Likewise.
>  * gfortran.dg/gomp/declare-simd-2.f90: Likewise.
>  * gfortran.dg/gomp/declare-simd-coarray-lib.f90: Likewise.
>  * gfortran.dg/gomp/pr79154-1.f90: Likewise.
>  * gfortran.dg/gomp/pr83977.f90: Likewise.
>  * gcc.dg/gomp/pr87887-1.c: Add warning test.
>  * gcc.dg/gomp/pr89246-1.c: Likewise.
>  * gcc.dg/gomp/pr99542.c: Update warning test.
>
>
>
> On 08/08/2023 11:51, Richard Sandiford wrote:
>> "Andre Vieira (lists)"  writes:
>
>>> warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
>>> -   "unsupported return type %qT for % functions",
>>> +   "unsupported return type %qT for simd",
>>> ret_type);
>> 
>> What's the reason for s/% functions/simd/, in particular for
>> dropping the quotes around simd?
>
> It's to align with i386's error message, this helps with testing as then 
> I can avoid having different tests for the same error.
>
> I asked Jakub which one he preferred, and he gave me an explanation why 
> the i386's one was preferable, ... but I didn't write it down unfortunately.

Jakub: do you remember what the reason was?  I don't mind dropping
"function", but it feels weird to drop the quotes around "simd".
Seems like, if we do that, there'll one day be a patch to add
them back. :)

Thanks,
Richard


Re: [PATCH] RISC-V: Fix VLMAX AVL incorrect local anticipate [VSETVL PASS]

2023-08-09 Thread Jeff Law via Gcc-patches




On 8/9/23 04:51, Juzhe-Zhong wrote:

We realized we have a bug in the VSETVL pass which is triggered by strided_load_run-1.c 
on RV32 systems.

FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c 
execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c 
execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c 
execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c 
execution test

This is because the VSETVL pass incorrectly hoists a vsetvl instruction:

...
10156:  0d9075d7    vsetvli a1,zero,e64,m2,ta,ma  ---> 
pollute 'a1' register which will be used by following insns.
1015a:  01d586b3    add     a3,a1,t4  ---> use 'a1'
1015e:  5e070257    vmv.v.v v4,v14
10162:  b7032257    vmacc.vv v4,v6,v16
10166:  26440257    vand.vv v4,v4,v8
1016a:  22880227    vs2r.v  v4,(a6)
1016e:  00b6b7b3    sltu    a5,a3,a1
10172:  22888227    vs2r.v  v4,(a7)
10176:  9e60b157    vmv2r.v v2,v6
1017a:  97ba        add     a5,a5,a4
1017c:  a6a62157    vmadd.vv v2,v12,v10
10180:  26240157    vand.vv v2,v2,v8
10184:  22830127    vs2r.v  v2,(t1)
10188:  873e        mv      a4,a5
1018a:  982a        add     a6,a6,a0
1018c:  98aa        add     a7,a7,a0
1018e:  932a        add     t1,t1,a0
10190:  85b6        mv      a1,a3   ---> set 'a1'
...

gcc/ChangeLog:

 * config/riscv/riscv-vsetvl.cc (anticipatable_occurrence_p): Fix 
incorrect anticipate info.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c: 
Adapt test.
 * gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-24.c: Ditto.
 * gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c: Ditto.
 * gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c: Ditto.
 * gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-36.c: Ditto.
 * gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-14.c: Ditto.
 * gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-15.c: Ditto.

OK.

Do we need to backport this to gcc-13?

jeff


Re: [PATCH] aarch64: enable mixed-types for aarch64 simdclones

2023-08-09 Thread Andre Vieira (lists) via Gcc-patches




On 09/08/2023 17:55, Richard Sandiford wrote:

"Andre Vieira (lists)"  writes:


On 08/08/2023 11:51, Richard Sandiford wrote:

"Andre Vieira (lists)"  writes:



warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-   "unsupported return type %qT for % functions",
+   "unsupported return type %qT for simd",
ret_type);


What's the reason for s/%<simd%> functions/simd/, in particular for
dropping the quotes around simd?


It's to align with i386's error message; this helps with testing as then
I can avoid having different tests for the same error.

I asked Jakub which one he preferred, and he gave me an explanation why
the i386's one was preferable, ... but I didn't write it down unfortunately.


Jakub: do you remember what the reason was?  I don't mind dropping
"function", but it feels weird to drop the quotes around "simd".
Seems like, if we do that, there'll one day be a patch to add
them back. :)


After some IRC scrolling, unfortunately my client doesn't have a fancy 
search :(


avieira> Andre Vieira
jakub: which one do you prefer?
1:59 PM
"unsupported argument type %qT for simd" (i386)
1:59 PM
 "unsupported argument type %qT for % functions", (aarch64)
1:59 PM
Gonna change one to be the same as the other ...
2:04 PM
2:36 PM 
I'd just go with for simd; % functions isn't an established term, 
it would be either % functions, but we have also simd 
attribute...


Re: [PATCH] aarch64: enable mixed-types for aarch64 simdclones

2023-08-09 Thread Jakub Jelinek via Gcc-patches
On Wed, Aug 09, 2023 at 05:55:28PM +0100, Richard Sandiford wrote:
> Jakub: do you remember what the reason was?  I don't mind dropping
> "function", but it feels weird to drop the quotes around "simd".
> Seems like, if we do that, there'll one day be a patch to add
> them back. :)

Because in OpenMP there are % functions, not %
%functions, but we also have the %/% attribute as
extension.

Jakub



Re: [PATCH] aarch64: enable mixed-types for aarch64 simdclones

2023-08-09 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek  writes:
> On Wed, Aug 09, 2023 at 05:55:28PM +0100, Richard Sandiford wrote:
>> Jakub: do you remember what the reason was?  I don't mind dropping
>> "function", but it feels weird to drop the quotes around "simd".
>> Seems like, if we do that, there'll one day be a patch to add
>> them back. :)
>
> Because in OpenMP their are % functions, not %
> %functions, but we also have the %/% attribute as
> extension.

Yeah, I can understand dropping the "function" bit.  But why
s/unsupported ... for %<simd%>/unsupported ... for simd/?
Even if it's only a partial syntax quote, it is still a syntax quote.

Thanks,
Richard


Re: [RFC] GCC Security policy

2023-08-09 Thread Siddhesh Poyarekar

On 2023-08-08 10:30, Siddhesh Poyarekar wrote:
Do you have a suggestion for the language to address libgcc, 
libstdc++, etc. and libiberty, libbacktrace, etc.?


I'll work on this a bit and share a draft.


Hi David,

Here's what I came up with for different parts of GCC, including the 
runtime libraries.  Over time we may find that specific parts of runtime 
libraries simply cannot be used safely in some contexts and flag that.


Sid

"""
What is a GCC security bug?
===

A security bug is one that threatens the security of a system or
network, or might compromise the security of data stored on it.
In the context of GCC there are multiple ways in which this might
happen and they're detailed below.

Compiler drivers, programs, libgccjit and support libraries
---

The compiler driver processes source code, invokes other programs
such as the assembler and linker and generates the output result,
which may be assembly code or machine code.  It is necessary that
all source code inputs to the compiler are trusted, since it is
impossible for the driver to validate input source code beyond
conformance to a programming language standard.

The GCC JIT implementation, libgccjit, is intended to be plugged
into applications to translate input source code in the application
context.  Limitations that apply to the compiler
driver, apply here too in terms of sanitizing inputs, so it is
recommended that inputs are either sanitized by an external program
to allow only trusted, safe execution in the context of the
application or the JIT execution context is appropriately sandboxed
to contain the effects of any bugs in the JIT or its generated code
to the sandboxed environment.

Support libraries such as libiberty, libcc1 libvtv and libcpp have
been developed separately to share code with other tools such as
binutils and gdb.  These libraries again have similar challenges to
compiler drivers.  While they are expected to be robust against
arbitrary input, they should only be used with trusted inputs.

Libraries such as zlib and libffi that are bundled into GCC to build it
will be treated the same as the compiler drivers and programs as far
as security coverage is concerned.

As a result, the only case for a potential security issue in all
these cases is when it ends up generating vulnerable output for
valid input source code.

Language runtime libraries
--

GCC also builds and distributes libraries that are intended to be
used widely to implement runtime support for various programming
languages.  These include the following:

* libada
* libatomic
* libbacktrace
* libcc1
* libcody
* libcpp
* libdecnumber
* libgcc
* libgfortran
* libgm2
* libgo
* libgomp
* libiberty
* libitm
* libobjc
* libphobos
* libquadmath
* libssp
* libstdc++

These libraries are intended to be used in arbitrary contexts and as
a result, bugs in these libraries may be evaluated for security
impact.  However, some of these libraries, e.g. libgo, libphobos,
etc.  are not maintained in the GCC project, due to which the GCC
project may not be the correct point of contact for them.  You are
encouraged to look at README files within those library directories
to locate the canonical security contact point for those projects.

Diagnostic libraries


The sanitizer library bundled in GCC is intended to be used in
diagnostic cases and not intended for use in sensitive environments.
As a result, bugs in the sanitizer will not be considered security
sensitive.

GCC plugins
---

It should be noted that GCC may execute arbitrary code loaded by a
user through the GCC plugin mechanism or through system preloading
mechanism.  Such custom code should be vetted by the user for safety
as bugs exposed through such code will not be considered security
issues.


Re: [PATCH] aarch64: enable mixed-types for aarch64 simdclones

2023-08-09 Thread Jakub Jelinek via Gcc-patches
On Wed, Aug 09, 2023 at 06:27:20PM +0100, Richard Sandiford wrote:
> Jakub Jelinek  writes:
> > On Wed, Aug 09, 2023 at 05:55:28PM +0100, Richard Sandiford wrote:
> >> Jakub: do you remember what the reason was?  I don't mind dropping
> >> "function", but it feels weird to drop the quotes around "simd".
> >> Seems like, if we do that, there'll one day be a patch to add
> >> them back. :)
> >
> > Because in OpenMP their are % functions, not %
> > %functions, but we also have the %/% attribute as
> > extension.
> 
> Yeah, I can understand dropping the "function" bit.  But why
> s/unsupported ... for %/unsupported ... for simd/?
> Even if it's only a partial syntax quote, it is still a syntax quote.

% in OpenMP is something very different though, so I think it is
better to use it as a generic term which covers the different syntax cases.

Jakub



Re: [PATCH v9] RISC-V: Add the 'zfa' extension, version 0.2

2023-08-09 Thread Vineet Gupta

Hi Jin Ma,

On 5/16/23 00:06, jinma via Gcc-patches wrote:

On 5/15/23 07:16, Jin Ma wrote:


Do we also need to check Z[FDH]INX too?

Otherwise it looks pretty good.  We just need to wait for everything to
freeze and finalization on the assembler interface.

jeff

Yes, you are right, we also need to check Z[FDH]INX. I will send a patch
again to fix it after others give some review comments.


Can we please revisit this and get this merged upstream?
It seems like GCC does support extensions that are frozen but not yet ratified.

Thx,
-Vineet


[PATCH 0/12] GCC _BitInt support [PR102989]

2023-08-09 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch series introduces support for C23 bit-precise integer
types.  In short, they are similar to other integral types in many ways,
just aren't subject to integral promotions if smaller than int and they can
have even much wider precisions than ordinary integer types.

This series includes and thus subsumes all so far uncommitted _BitInt related
patches.  Compared to the last posted series, there is bit-field _BitInt
support, _Atomic/stdatomic.h support, conversions between _Decimal{32,64,128}
and _BitInt and vice versa (this particular item compared to what has been
posted before has a fix for the large powers of 10 computations which
with the _BitInt(575) limitation can't be really seen so far, but I've tried
to call the underlying routines with very large arrays of limbs, and in
addition to that the generated tables header has been made more compact) and
Richard's patch review feedback has been incorporated and series has been
further split into more patches.

It is enabled only on targets which have agreed on processor specific
ABI how to lay those out or pass as function arguments/return values,
which currently is just x86-64 I believe; it would be nice if target maintainers
helped to get agreement on psABI changes so that GCC 14 could enable it on far
more architectures than just one.

C23 says that <limits.h> defines the BITINT_MAXWIDTH macro and that is the
largest supported precision of the _BitInt types, smallest is precision
of unsigned long long (but due to lack of psABI agreement we'll violate
that on architectures which don't have the support done yet).
The following series uses for the time just WIDE_INT_MAX_PRECISION as
that BITINT_MAXWIDTH, with the intent to increase it incrementally later
on.  WIDE_INT_MAX_PRECISION is 575 bits on x86_64, but will be even smaller
on lots of architectures.  This is the largest precision we can support
without changes of wide_int/widest_int representation (to make those non-POD
and allow use of some allocated buffer rather than the included fixed size
one).  Once that would be overcome, there is another internal enforced limit,
INTEGER_CST in current layout allows at most 255 64-bit limbs, which is
16320 bits as another cap.  And if that is overcome, then we have limitation
of TYPE_PRECISION being 16-bit, so 65535 as maximum precision.  Perhaps
we could make TYPE_PRECISION dependent on BITINT_TYPE vs. others and use
32-bit precision in that case later.  Latest Clang/LLVM I think supports
on paper up to 8388608 bits, but is hardly usable even with much shorter
precisions.
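
To make this concrete, a small sketch of what the series accepts on x86-64,
where BITINT_MAXWIDTH ends up being 575 for now (the example itself is
illustrative, not taken from the patches):

  #include <limits.h>

  _BitInt(128) mul128 (_BitInt(128) a, _BitInt(128) b) { return a * b; }

  unsigned _BitInt(256) c = 123456789012345678901234567890uwb; /* wb/uwb suffixes */

  #if BITINT_MAXWIDTH >= 575
  _BitInt(575) widest;   /* current cap, derived from WIDE_INT_MAX_PRECISION */
  #endif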

Besides this hopefully temporary cap on supported precision and support
only on targets which buy into it, the support has the following limitations:

- _Complex _BitInt(N) isn't supported; again mainly because none of the psABIs
  mention how those should be passed/returned; in a limited way they are
  supported internally because the internal functions into which
  __builtin_{add,sub,mul}_overflow{,_p} is lowered return COMPLEX_TYPE as a
  hack to return 2 values without using references/pointers

- vectors of _BitInt(N) aren't supported, both because psABIs don't specify
  how that works and because I'm not really sure it would be useful given
  lack of hw support for anything but bit-precise integers with the same
  bit precision as standard integer types

Because the bit-precise types have different behavior both in the C FE
(e.g. the lack of promotion) and do or can have different behavior in type
layout and function argument passing/returning values, the patch introduces
a new integral type, BITINT_TYPE, so various spots which explicitly check
for INTEGER_TYPE and not say INTEGRAL_TYPE_P macro need to be adjusted.
Also the assumption that all integral types have scalar integer type mode
is no longer true, larger BITINT_TYPEs have BLKmode type.

The patch makes 4 different categories of _BitInt depending on the target hook
decisions and their precision.  The x86-64 psABI says that _BitInt which fit
into signed/unsigned char, short, int, long and long long are laid out and
passed as those types (with padding bits undefined if they don't have mode
precision).  Such smallest precision bit-precise integer types are categorized
as small, the target hook gives for specific precision a scalar integral mode
where a single such mode contains all the bits.  Such small _BitInt types are
generally kept in the IL until expansion into RTL, with minor tweaks during
expansion to avoid relying on the padding bit values.  All larger precision
_BitInt types are supposed to be handled as structure containing an array
of limbs or so, where a limb has some integral mode (for libgcc purposes
best if it has word-size) and the limbs have either little or big endian
ordering in the array.  The padding bits in the most significant limb if any
are either undefined or should be always sign/zero extended (but support for 
this
isn't in yet, we don't know if any psABI will require it).  As mentioned in
some psABI

[PATCH 1/12] expr: Small optimization [PR102989]

2023-08-09 Thread Jakub Jelinek via Gcc-patches
Hi!

Small optimization to avoid testing modifier multiple times.

2023-08-09  Jakub Jelinek  

PR c/102989
* expr.cc (expand_expr_real_1) : Add an early return for
EXPAND_WRITE or EXPAND_MEMORY modifiers to avoid testing it multiple
times.

--- gcc/expr.cc.jj  2023-08-08 15:55:06.499164554 +0200
+++ gcc/expr.cc 2023-08-08 15:59:36.594382141 +0200
@@ -11248,17 +11248,15 @@ expand_expr_real_1 (tree exp, rtx target
set_mem_addr_space (temp, as);
if (TREE_THIS_VOLATILE (exp))
  MEM_VOLATILE_P (temp) = 1;
-   if (modifier != EXPAND_WRITE
-   && modifier != EXPAND_MEMORY
-   && !inner_reference_p
+   if (modifier == EXPAND_WRITE || modifier == EXPAND_MEMORY)
+ return temp;
+   if (!inner_reference_p
&& mode != BLKmode
&& align < GET_MODE_ALIGNMENT (mode))
  temp = expand_misaligned_mem_ref (temp, mode, unsignedp, align,
modifier == EXPAND_STACK_PARM
? NULL_RTX : target, alt_rtl);
-   if (reverse
-   && modifier != EXPAND_MEMORY
-   && modifier != EXPAND_WRITE)
+   if (reverse)
  temp = flip_storage_order (mode, temp);
return temp;
   }

Jakub



[PATCH 2/12] lto-streamer-in: Adjust assert [PR102989]

2023-08-09 Thread Jakub Jelinek via Gcc-patches
Hi!

With _BitInt(575) or any other _BitInt(513) or larger constants we can
run into this assertion.  MAX_BITSIZE_MODE_ANY_INT is just a value from
which WIDE_INT_MAX_PRECISION is derived.

2023-08-09  Jakub Jelinek  

PR c/102989
* lto-streamer-in.cc (lto_input_tree_1): Assert TYPE_PRECISION
is up to WIDE_INT_MAX_PRECISION rather than MAX_BITSIZE_MODE_ANY_INT.

--- gcc/lto-streamer-in.cc.jj   2023-07-17 09:07:42.078283882 +0200
+++ gcc/lto-streamer-in.cc  2023-07-27 15:03:24.255234159 +0200
@@ -1888,7 +1888,7 @@ lto_input_tree_1 (class lto_input_block
 
   for (i = 0; i < len; i++)
a[i] = streamer_read_hwi (ib);
-  gcc_assert (TYPE_PRECISION (type) <= MAX_BITSIZE_MODE_ANY_INT);
+  gcc_assert (TYPE_PRECISION (type) <= WIDE_INT_MAX_PRECISION);
   result = wide_int_to_tree (type, wide_int::from_array
 (a, len, TYPE_PRECISION (type)));
   streamer_tree_cache_append (data_in->reader_cache, result, hash);

Jakub



[PATCH 3/12] phiopt: Fix phiopt ICE on vops [PR102989]

2023-08-09 Thread Jakub Jelinek via Gcc-patches
Hi!

I've ran into ICE on gcc.dg/torture/bitint-42.c with -O1 or -Os
when enabling expensive tests, and unfortunately I can't reproduce without
_BitInt.  The IL before phiopt3 has:
   [local count: 203190070]:
  # .MEM_428 = VDEF <.MEM_367>
  bitint.159 = VIEW_CONVERT_EXPR(*.LC3);
  goto ; [100.00%]

   [local count: 203190070]:
  # .MEM_427 = VDEF <.MEM_367>
  bitint.159 = VIEW_CONVERT_EXPR(*.LC4);

   [local count: 406380139]:
  # .MEM_368 = PHI <.MEM_428(87), .MEM_427(88)>
  # VUSE <.MEM_368>
  _123 = VIEW_CONVERT_EXPR(r495[i_107].D.2780)[0];
and factor_out_conditional_operation is called on the vop PHI, it
sees it has exactly two operands and defining statements of both
PHI arguments are converts (VCEs in this case), so it thinks it is
a good idea to try to optimize that and while doing that it constructs
void type SSA_NAMEs and the like.

2023-08-09  

PR c/102989
* tree-ssa-phiopt.cc (factor_out_conditional_operation): Punt for
vops.

--- gcc/tree-ssa-phiopt.cc.jj   2023-08-08 15:55:09.508122417 +0200
+++ gcc/tree-ssa-phiopt.cc  2023-08-09 15:55:23.762314103 +0200
@@ -241,6 +241,7 @@ factor_out_conditional_operation (edge e
 }
 
   if (TREE_CODE (arg0) != SSA_NAME
+  || SSA_NAME_IS_VIRTUAL_OPERAND (arg0)
   || (TREE_CODE (arg1) != SSA_NAME
  && TREE_CODE (arg1) != INTEGER_CST))
 return NULL;

Jakub



[PATCH 4/12] Middle-end _BitInt support [PR102989]

2023-08-09 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch introduces the middle-end part of the _BitInt
support, a new BITINT_TYPE, handling it where needed, except the lowering
pass and sanitizer support.

2023-08-09  Jakub Jelinek  

PR c/102989
* tree.def (BITINT_TYPE): New type.
* tree.h (TREE_CHECK6, TREE_NOT_CHECK6): Define.
(NUMERICAL_TYPE_CHECK, INTEGRAL_TYPE_P): Include
BITINT_TYPE.
(BITINT_TYPE_P): Define.
(CONSTRUCTOR_BITFIELD_P): Return true even for BLKmode bit-fields if
they have BITINT_TYPE type.
(tree_check6, tree_not_check6): New inline functions.
(any_integral_type_check): Include BITINT_TYPE.
(build_bitint_type): Declare.
* tree.cc (tree_code_size, wide_int_to_tree_1, cache_integer_cst,
build_zero_cst, type_hash_canon_hash, type_cache_hasher::equal,
type_hash_canon): Handle BITINT_TYPE.
(bitint_type_cache): New variable.
(build_bitint_type): New function.
(signed_or_unsigned_type_for, verify_type_variant, verify_type):
Handle BITINT_TYPE.
(tree_cc_finalize): Free bitint_type_cache.
* builtins.cc (type_to_class): Handle BITINT_TYPE.
(fold_builtin_unordered_cmp): Handle BITINT_TYPE like INTEGER_TYPE.
* cfgexpand.cc (expand_debug_expr): Punt on BLKmode BITINT_TYPE
INTEGER_CSTs.
* convert.cc (convert_to_pointer_1, convert_to_real_1,
convert_to_complex_1): Handle BITINT_TYPE like INTEGER_TYPE.
(convert_to_integer_1): Likewise.  For BITINT_TYPE don't check
GET_MODE_PRECISION (TYPE_MODE (type)).
* doc/generic.texi (BITINT_TYPE): Document.
* doc/tm.texi.in (TARGET_C_BITINT_TYPE_INFO): New.
* doc/tm.texi: Regenerated.
* dwarf2out.cc (base_type_die, is_base_type, modified_type_die,
gen_type_die_with_usage): Handle BITINT_TYPE.
(rtl_for_decl_init): Punt on BLKmode BITINT_TYPE INTEGER_CSTs or
handle those which fit into shwi.
* expr.cc (expand_expr_real_1): Define EXTEND_BITINT macro, reduce
to bitfield precision reads from BITINT_TYPE vars, parameters or
memory locations.  Expand large/huge BITINT_TYPE INTEGER_CSTs into
memory.
* fold-const.cc (fold_convert_loc, make_range_step): Handle
BITINT_TYPE.
(extract_muldiv_1): For BITINT_TYPE use TYPE_PRECISION rather than
GET_MODE_SIZE (SCALAR_INT_TYPE_MODE).
(native_encode_int, native_interpret_int, native_interpret_expr):
Handle BITINT_TYPE.
* gimple-expr.cc (useless_type_conversion_p): Make BITINT_TYPE
to some other integral type or vice versa conversions non-useless.
* gimple-fold.cc (gimple_fold_builtin_memset): Punt for BITINT_TYPE.
(clear_padding_unit): Mention in comment that _BitInt types don't need
to fit either.
(clear_padding_bitint_needs_padding_p): New function.
(clear_padding_type_may_have_padding_p): Handle BITINT_TYPE.
(clear_padding_type): Likewise.
* internal-fn.cc (expand_mul_overflow): For unsigned non-mode
precision operands force pos_neg? to 1.
(expand_MULBITINT, expand_DIVMODBITINT, expand_FLOATTOBITINT,
expand_BITINTTOFLOAT): New functions.
* internal-fn.def (MULBITINT, DIVMODBITINT, FLOATTOBITINT,
BITINTTOFLOAT): New internal functions.
* internal-fn.h (expand_MULBITINT, expand_DIVMODBITINT,
expand_FLOATTOBITINT, expand_BITINTTOFLOAT): Declare.
* match.pd (non-equality compare simplifications from fold_binary):
Punt if TYPE_MODE (arg1_type) is BLKmode.
* pretty-print.h (pp_wide_int): Handle printing of large precision
wide_ints which would buffer overflow digit_buffer.
* stor-layout.cc (finish_bitfield_representative): For bit-fields
with BITINT_TYPE, prefer representatives with precisions in
multiple of limb precision.
(layout_type): Handle BITINT_TYPE.  Handle COMPLEX_TYPE with BLKmode
element type and assert it is BITINT_TYPE.
* target.def (bitint_type_info): New C target hook.
* target.h (struct bitint_info): New type.
* targhooks.cc (default_bitint_type_info): New function.
* targhooks.h (default_bitint_type_info): Declare.
* tree-pretty-print.cc (dump_generic_node): Handle BITINT_TYPE.
Handle printing large wide_ints which would buffer overflow
digit_buffer.
* tree-ssa-sccvn.cc: Include target.h.
(eliminate_dom_walker::eliminate_stmt): Punt for large/huge
BITINT_TYPE.
* tree-switch-conversion.cc (jump_table_cluster::emit): For more than
64-bit BITINT_TYPE subtract low bound from expression and cast to
64-bit integer type both the controlling expression and case labels.
* typeclass.h (enum type_class): Add bitint_type_class enumerator.
* varasm.cc (output_constant): Handle BITINT_TYPE INTEGER_CSTs.

Re: [RFC] GCC Security policy

2023-08-09 Thread David Edelsohn via Gcc-patches
On Wed, Aug 9, 2023 at 1:33 PM Siddhesh Poyarekar 
wrote:

> On 2023-08-08 10:30, Siddhesh Poyarekar wrote:
> >> Do you have a suggestion for the language to address libgcc,
> >> libstdc++, etc. and libiberty, libbacktrace, etc.?
> >
> > I'll work on this a bit and share a draft.
>
> Hi David,
>
> Here's what I came up with for different parts of GCC, including the
> runtime libraries.  Over time we may find that specific parts of runtime
> libraries simply cannot be used safely in some contexts and flag that.
>
> Sid
>

Hi, Sid

Thanks for iterating on this.


>
> """
> What is a GCC security bug?
> ===
>
>  A security bug is one that threatens the security of a system or
>  network, or might compromise the security of data stored on it.
>  In the context of GCC there are multiple ways in which this might
>  happen and they're detailed below.
>
> Compiler drivers, programs, libgccjit and support libraries
> ---
>
>  The compiler driver processes source code, invokes other programs
>  such as the assembler and linker and generates the output result,
>  which may be assembly code or machine code.  It is necessary that
>  all source code inputs to the compiler are trusted, since it is
>  impossible for the driver to validate input source code beyond
>  conformance to a programming language standard.
>
>  The GCC JIT implementation, libgccjit, is intended to be plugged
>  into applications to translate input source code in the application
>  context.  Limitations that apply to the compiler
>  driver, apply here too in terms of sanitizing inputs, so it is
>  recommended that inputs are either sanitized by an external program
>  to allow only trusted, safe execution in the context of the
>  application or the JIT execution context is appropriately sandboxed
>  to contain the effects of any bugs in the JIT or its generated code
>  to the sandboxed environment.
>
>  Support libraries such as libiberty, libcc1 libvtv and libcpp have
>  been developed separately to share code with other tools such as
>  binutils and gdb.  These libraries again have similar challenges to
>  compiler drivers.  While they are expected to be robust against
>  arbitrary input, they should only be used with trusted inputs.
>
>  Libraries such as zlib and libffi that bundled into GCC to build it
>  will be treated the same as the compiler drivers and programs as far
>  as security coverage is concerned.
>

Should we direct people to the upstream projects for their security
policies?


>  As a result, the only case for a potential security issue in all
>  these cases is when it ends up generating vulnerable output for
>  valid input source code.


> Language runtime libraries
> --
>
>  GCC also builds and distributes libraries that are intended to be
>  used widely to implement runtime support for various programming
>  languages.  These include the following:
>
>  * libada
>  * libatomic
>  * libbacktrace
>  * libcc1
>  * libcody
>  * libcpp
>  * libdecnumber
>  * libgcc
>  * libgfortran
>  * libgm2
>  * libgo
>  * libgomp
>  * libiberty
>  * libitm
>  * libobjc
>  * libphobos
>  * libquadmath
>  * libssp
>  * libstdc++
>
>  These libraries are intended to be used in arbitrary contexts and as
>  a result, bugs in these libraries may be evaluated for security
>  impact.  However, some of these libraries, e.g. libgo, libphobos,
>  etc.  are not maintained in the GCC project, due to which the GCC
>  project may not be the correct point of contact for them.  You are
>  encouraged to look at README files within those library directories
>  to locate the canonical security contact point for those projects.
>

As Richard mentioned, should GCC make a specific statement about the
security policy / response for issues that are discovered and fixed in the
upstream projects from which the GCC libraries are imported?


>
> Diagnostic libraries
> 
>
>  The sanitizer library bundled in GCC is intended to be used in
>  diagnostic cases and not intended for use in sensitive environments.
>  As a result, bugs in the sanitizer will not be considered security
>  sensitive.
>
> GCC plugins
> ---
>
>  It should be noted that GCC may execute arbitrary code loaded by a
>  user through the GCC plugin mechanism or through system preloading
>  mechanism.  Such custom code should be vetted by the user for safety
>  as bugs exposed through such code will not be considered security
>  issues.
>

Thanks, David


[PATCH 6/12] i386: Enable _BitInt on x86-64 [PR102989]

2023-08-09 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch enables _BitInt support on x86-64, the only
target which has _BitInt specified in psABI.
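
As an illustration of the classification this implements (the declarations
below are examples, not part of the patch):

  _BitInt(100) f (_BitInt(100) x);  /* two 64-bit words: classified as two
                                       X86_64_INTEGER_CLASS eightbytes      */
  _BitInt(200) g (_BitInt(200) x);  /* more than two words: classify_argument
                                       returns 0, i.e. passed in memory     */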

2023-08-09  Jakub Jelinek  

PR c/102989
* config/i386/i386.cc (classify_argument): Handle BITINT_TYPE.
(ix86_bitint_type_info): New function.
(TARGET_C_BITINT_TYPE_INFO): Redefine.

--- gcc/config/i386/i386.cc.jj  2023-08-08 15:55:05.627176766 +0200
+++ gcc/config/i386/i386.cc 2023-08-08 16:12:02.308940091 +0200
@@ -2121,7 +2121,8 @@ classify_argument (machine_mode mode, co
return 0;
 }
 
-  if (type && AGGREGATE_TYPE_P (type))
+  if (type && (AGGREGATE_TYPE_P (type)
+  || (TREE_CODE (type) == BITINT_TYPE && words > 1)))
 {
   int i;
   tree field;
@@ -2270,6 +2271,14 @@ classify_argument (machine_mode mode, co
}
  break;
 
+   case BITINT_TYPE:
+ /* _BitInt(N) for N > 64 is passed as structure containing
+(N + 63) / 64 64-bit elements.  */
+ if (words > 2)
+   return 0;
+ classes[0] = classes[1] = X86_64_INTEGER_CLASS;
+ return 2;
+
default:
  gcc_unreachable ();
}
@@ -24842,6 +24851,26 @@ ix86_get_excess_precision (enum excess_p
   return FLT_EVAL_METHOD_UNPREDICTABLE;
 }
 
+/* Return true if _BitInt(N) is supported and fill details about it into
+   *INFO.  */
+bool
+ix86_bitint_type_info (int n, struct bitint_info *info)
+{
+  if (!TARGET_64BIT)
+return false;
+  if (n <= 8)
+info->limb_mode = QImode;
+  else if (n <= 16)
+info->limb_mode = HImode;
+  else if (n <= 32)
+info->limb_mode = SImode;
+  else
+info->limb_mode = DImode;
+  info->big_endian = false;
+  info->extended = false;
+  return true;
+}
+
 /* Implement PUSH_ROUNDING.  On 386, we have pushw instruction that
decrements by exactly 2 no matter what the position was, there is no pushb.
 
@@ -25446,6 +25475,8 @@ ix86_run_selftests (void)
 
 #undef TARGET_C_EXCESS_PRECISION
 #define TARGET_C_EXCESS_PRECISION ix86_get_excess_precision
+#undef TARGET_C_BITINT_TYPE_INFO
+#define TARGET_C_BITINT_TYPE_INFO ix86_bitint_type_info
 #undef TARGET_PROMOTE_PROTOTYPES
 #define TARGET_PROMOTE_PROTOTYPES hook_bool_const_tree_true
 #undef TARGET_PUSH_ARGUMENT

Jakub



[PATCH 7/12] ubsan: _BitInt -fsanitize=undefined support [PR102989]

2023-08-09 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch introduces some -fsanitize=undefined support for _BitInt,
but some of the diagnostics are limited by the lack of proper support in the
library.
I've filed https://github.com/llvm/llvm-project/issues/64100 to request
proper support; for now some of the diagnostics might have more or less
confusing or inaccurate wording, but UB should still be diagnosed when it
happens.
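
For instance, cases like the following sketch are instrumented (hypothetical
example; the exact runtime wording depends on the libubsan limitations
mentioned above):

  /* compiled with -fsanitize=signed-integer-overflow,shift */
  _BitInt(135) f (_BitInt(135) a, _BitInt(135) b) { return a + b; }
  _BitInt(135) g (_BitInt(135) a, int s)          { return a << s; }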

2023-08-09  Jakub Jelinek  

PR c/102989
gcc/
* internal-fn.cc (expand_ubsan_result_store): Add LHS, MODE and
DO_ERROR arguments.  For non-mode precision BITINT_TYPE results
check if all padding bits up to mode precision are zeros or sign
bit copies and if not, jump to DO_ERROR.
(expand_addsub_overflow, expand_neg_overflow, expand_mul_overflow):
Adjust expand_ubsan_result_store callers.
* ubsan.cc: Include target.h and langhooks.h.
(ubsan_encode_value): Pass BITINT_TYPE values which fit into pointer
size converted to pointer sized integer, pass BITINT_TYPE values
which fit into TImode (if supported) or DImode as those integer types
or otherwise for now punt (pass 0).
(ubsan_type_descriptor): Handle BITINT_TYPE.  For pstyle of
UBSAN_PRINT_FORCE_INT use TK_Integer (0x) mode with a
TImode/DImode precision rather than TK_Unknown used otherwise for
large/huge BITINT_TYPEs.
(instrument_si_overflow): Instrument BITINT_TYPE operations even when
they don't have mode precision.
* ubsan.h (enum ubsan_print_style): New enumerator.
gcc/c-family/
* c-ubsan.cc (ubsan_instrument_shift): Use UBSAN_PRINT_FORCE_INT
for type0 type descriptor.

--- gcc/ubsan.cc.jj 2023-08-08 15:54:35.443599459 +0200
+++ gcc/ubsan.cc2023-08-08 16:12:02.329939798 +0200
@@ -50,6 +50,8 @@ along with GCC; see the file COPYING3.
 #include "gimple-fold.h"
 #include "varasm.h"
 #include "realmpfr.h"
+#include "target.h"
+#include "langhooks.h"
 
 /* Map from a tree to a VAR_DECL tree.  */
 
@@ -125,6 +127,25 @@ tree
 ubsan_encode_value (tree t, enum ubsan_encode_value_phase phase)
 {
   tree type = TREE_TYPE (t);
+  if (TREE_CODE (type) == BITINT_TYPE)
+{
+  if (TYPE_PRECISION (type) <= POINTER_SIZE)
+   {
+ type = pointer_sized_int_node;
+ t = fold_build1 (NOP_EXPR, type, t);
+   }
+  else
+   {
+ scalar_int_mode arith_mode
+   = (targetm.scalar_mode_supported_p (TImode) ? TImode : DImode);
+ if (TYPE_PRECISION (type) > GET_MODE_PRECISION (arith_mode))
+   return build_zero_cst (pointer_sized_int_node);
+ type
+   = build_nonstandard_integer_type (GET_MODE_PRECISION (arith_mode),
+ TYPE_UNSIGNED (type));
+ t = fold_build1 (NOP_EXPR, type, t);
+   }
+}
   scalar_mode mode = SCALAR_TYPE_MODE (type);
   const unsigned int bitsize = GET_MODE_BITSIZE (mode);
   if (bitsize <= POINTER_SIZE)
@@ -355,14 +376,32 @@ ubsan_type_descriptor (tree type, enum u
 {
   /* See through any typedefs.  */
   type = TYPE_MAIN_VARIANT (type);
+  tree type3 = type;
+  if (pstyle == UBSAN_PRINT_FORCE_INT)
+{
+  /* Temporary hack for -fsanitize=shift with _BitInt(129) and more.
+libubsan crashes if it is not TK_Integer type.  */
+  if (TREE_CODE (type) == BITINT_TYPE)
+   {
+ scalar_int_mode arith_mode
+   = (targetm.scalar_mode_supported_p (TImode)
+  ? TImode : DImode);
+ if (TYPE_PRECISION (type) > GET_MODE_PRECISION (arith_mode))
+   type3 = build_qualified_type (type, TYPE_QUAL_CONST);
+   }
+  if (type3 == type)
+   pstyle = UBSAN_PRINT_NORMAL;
+}
 
-  tree decl = decl_for_type_lookup (type);
+  tree decl = decl_for_type_lookup (type3);
   /* It is possible that some of the earlier created DECLs were found
  unused, in that case they weren't emitted and varpool_node::get
  returns NULL node on them.  But now we really need them.  Thus,
  renew them here.  */
   if (decl != NULL_TREE && varpool_node::get (decl))
-return build_fold_addr_expr (decl);
+{
+  return build_fold_addr_expr (decl);
+}
 
   tree dtype = ubsan_get_type_descriptor_type ();
   tree type2 = type;
@@ -370,6 +409,7 @@ ubsan_type_descriptor (tree type, enum u
   pretty_printer pretty_name;
   unsigned char deref_depth = 0;
   unsigned short tkind, tinfo;
+  char tname_bitint[sizeof ("unsigned _BitInt(2147483647)")];
 
   /* Get the name of the type, or the name of the pointer type.  */
   if (pstyle == UBSAN_PRINT_POINTER)
@@ -403,8 +443,18 @@ ubsan_type_descriptor (tree type, enum u
 }
 
   if (tname == NULL)
-/* We weren't able to determine the type name.  */
-tname = "";
+{
+  if (TREE_CODE (type2) == BITINT_TYPE)
+   {
+ snprintf (tname_bitint, sizeof (tname_bitint),
+   "%s_BitInt(%d)", TYPE_UNSIGNED (type2) ? "unsigned " : "",
+

[PATCH 9/12] libgcc _BitInt support [PR102989]

2023-08-09 Thread Jakub Jelinek via Gcc-patches
Hi!

This patch adds the library helpers for multiplication, division + modulo
and casts from and to floating point (both binary and decimal).
As described in the intro, the first step is to try to further reduce the
passed-in precision by skipping over most-significant limbs that are just zeros
or sign-bit copies.  For multiplication and division I've implemented
a simple algorithm; using something smarter like Karatsuba or Toom N-Way
might be faster for very large _BitInts (which we don't support right now
anyway), but could mean more code in libgcc, which maybe isn't what people
are willing to accept.
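
A rough sketch of that precision-reduction step for a signed value with
little-endian limb order (illustrative only, not the actual libgcc code):

  /* Drop most-significant limbs that are pure sign copies, so the
     O(n*m) multiply/divide loops operate on fewer limbs.  */
  static inline int
  reduce_limbs (const unsigned long long *limbs, int n)
  {
    while (n > 1
           && (limbs[n - 1] == 0 || limbs[n - 1] == ~0ULL)
           && ((limbs[n - 1] == 0) == ((long long) limbs[n - 2] >= 0)))
      n--;
    return n;
  }
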
For the to/from floating point conversions the patch uses soft-fp, because
it already has tons of handy macros which can be used for that.  In theory
it could be implemented using {,unsigned} long long or {,unsigned} __int128
to/from floating point conversions with some frexp before/after, but at that
point we already need to force it into integer registers and analyze it
anyway.  Plus, for 32-bit arches there is no __int128 that could be used
for XF/TF mode stuff.
I know that soft-fp is owned by glibc and I think the op-common.h change
should be propagated there, but the bitint stuff is really GCC specific
and IMHO doesn't belong into the glibc copy.

2023-08-09  Jakub Jelinek  

PR c/102989
libgcc/
* config/aarch64/t-softfp (softfp_extras): Use += rather than :=.
* config/i386/64/t-softfp (softfp_extras): Likewise.
* config/i386/libgcc-glibc.ver (GCC_14.0.0): Export _BitInt support
routines.
* config/i386/t-softfp (softfp_extras): Add fixxfbitint and
bf, hf and xf mode floatbitint.
(CFLAGS-floatbitintbf.c, CFLAGS-floatbitinthf.c): Add -msse2.
* config/riscv/t-softfp32 (softfp_extras): Use += rather than :=.
* config/rs6000/t-e500v1-fp (softfp_extras): Likewise.
* config/rs6000/t-e500v2-fp (softfp_extras): Likewise.
* config/t-softfp (softfp_floatbitint_funcs): New.
(softfp_bid_list): New.
(softfp_func_list): Add sf and df mode from and to _BitInt libcalls.
(softfp_bid_file_list): New.
(LIB2ADD_ST): Add $(softfp_bid_file_list).
* config/t-softfp-sfdftf (softfp_extras): Add fixtfbitint and
floatbitinttf.
* config/t-softfp-tf (softfp_extras): Likewise.
* libgcc2.c (bitint_reduce_prec): New inline function.
(BITINT_INC, BITINT_END): Define.
(bitint_mul_1, bitint_addmul_1): New helper functions.
(__mulbitint3): New function.
(bitint_negate, bitint_submul_1): New helper functions.
(__divmodbitint4): New function.
* libgcc2.h (LIBGCC2_UNITS_PER_WORD): When building _BitInt support
libcalls, redefine depending on __LIBGCC_BITINT_LIMB_WIDTH__.
(__mulbitint3, __divmodbitint4): Declare.
* libgcc-std.ver.in (GCC_14.0.0): Export _BitInt support routines.
* Makefile.in (lib2funcs): Add _mulbitint3.
(LIB2_DIVMOD_FUNCS): Add _divmodbitint4.
* soft-fp/bitint.h: New file.
* soft-fp/fixdfbitint.c: New file.
* soft-fp/fixsfbitint.c: New file.
* soft-fp/fixtfbitint.c: New file.
* soft-fp/fixxfbitint.c: New file.
* soft-fp/floatbitintbf.c: New file.
* soft-fp/floatbitintdf.c: New file.
* soft-fp/floatbitinthf.c: New file.
* soft-fp/floatbitintsf.c: New file.
* soft-fp/floatbitinttf.c: New file.
* soft-fp/floatbitintxf.c: New file.
* soft-fp/op-common.h (_FP_FROM_INT): Add support for rsize up to
4 * _FP_W_TYPE_SIZE rather than just 2 * _FP_W_TYPE_SIZE.
* soft-fp/bitintpow10.c: New file.
* soft-fp/fixsdbitint.c: New file.
* soft-fp/fixddbitint.c: New file.
* soft-fp/fixtdbitint.c: New file.
* soft-fp/floatbitintsd.c: New file.
* soft-fp/floatbitintdd.c: New file.
* soft-fp/floatbitinttd.c: New file.

--- libgcc/config/aarch64/t-softfp.jj   2023-08-08 15:54:35.737595343 +0200
+++ libgcc/config/aarch64/t-softfp  2023-08-08 16:12:02.346939560 +0200
@@ -3,7 +3,7 @@ softfp_int_modes := si di ti
 softfp_extensions := sftf dftf hftf bfsf
 softfp_truncations := tfsf tfdf tfhf tfbf dfbf sfbf hfbf
 softfp_exclude_libgcc2 := n
-softfp_extras := fixhfti fixunshfti floattihf floatuntihf \
+softfp_extras += fixhfti fixunshfti floattihf floatuntihf \
 floatdibf floatundibf floattibf floatuntibf
 
 TARGET_LIBGCC2_CFLAGS += -Wno-missing-prototypes
--- libgcc/config/i386/64/t-softfp.jj   2023-08-08 15:54:35.766594936 +0200
+++ libgcc/config/i386/64/t-softfp  2023-08-08 16:12:02.346939560 +0200
@@ -1,4 +1,4 @@
-softfp_extras := fixhfti fixunshfti floattihf floatuntihf \
+softfp_extras += fixhfti fixunshfti floattihf floatuntihf \
 floattibf floatuntibf
 
 CFLAGS-fixhfti.c += -msse2
--- libgcc/config/i386/libgcc-glibc.ver.jj  2023-08-08 15:54:35.831594026 
+0200
+++ libgcc/config/i386/libgcc-glibc.ver 2023-08-

[PATCH 10/12] C _BitInt support [PR102989]

2023-08-09 Thread Jakub Jelinek via Gcc-patches
Hi!

This patch adds the C FE support, the c-family support, a small libcpp change
so that 123wb and 42uwb suffixes are handled, plus a glimits.h change
to define the BITINT_MAXWIDTH macro.

The previous patches really do nothing without this, which enables
all the support.
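
Two small examples of the C-level behavior this enables (sketches only;
details are as described in the ChangeLog below):

  unsigned _BitInt(71) x = 42uwb;     /* wb/uwb literal suffixes (libcpp)    */

  struct S { _BitInt(195) f : 65; };  /* _BitInt bit-fields are allowed and
                                         promote to the declared _BitInt(195)
                                         type, not to int                    */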

2023-08-09  Jakub Jelinek  

PR c/102989
gcc/
* glimits.h (BITINT_MAXWIDTH): Define if __BITINT_MAXWIDTH__ is
predefined.
gcc/c-family/
* c-common.cc (c_common_reswords): Add _BitInt as keyword.
(c_common_signed_or_unsigned_type): Handle BITINT_TYPE.
(check_builtin_function_arguments): Handle BITINT_TYPE like
INTEGER_TYPE.
(sync_resolve_size): Add ORIG_FORMAT argument.  If
FETCH && !ORIG_FORMAT, type is BITINT_TYPE, return -1 if size isn't
one of 1, 2, 4, 8 or 16 or if it is 16 but TImode is not supported.
(atomic_bitint_fetch_using_cas_loop): New function.
(resolve_overloaded_builtin): Adjust sync_resolve_size caller.  If
-1 is returned, use atomic_bitint_fetch_using_cas_loop to lower it.
Formatting fix.
(keyword_begins_type_specifier): Handle RID_BITINT.
* c-common.h (enum rid): Add RID_BITINT enumerator.
* c-cppbuiltin.cc (c_cpp_builtins): For C call
targetm.c.bitint_type_info and predefine __BITINT_MAXWIDTH__
and for -fbuilding-libgcc also __LIBGCC_BITINT_LIMB_WIDTH__ and
__LIBGCC_BITINT_ORDER__ macros if _BitInt is supported.
* c-lex.cc (interpret_integer): Handle CPP_N_BITINT.
* c-pretty-print.cc (c_pretty_printer::simple_type_specifier,
c_pretty_printer::direct_abstract_declarator): Handle BITINT_TYPE.
(pp_c_integer_constant): Handle printing of large precision wide_ints
which would buffer overflow digit_buffer.
gcc/c/
* c-convert.cc (c_convert): Handle BITINT_TYPE like INTEGER_TYPE.
* c-decl.cc (check_bitfield_type_and_width): Allow BITINT_TYPE
bit-fields.
(finish_struct): Prefer to use BITINT_TYPE for BITINT_TYPE bit-fields
if possible.
(declspecs_add_type): Formatting fixes.  Handle cts_bitint.  Adjust
for added union in *specs.  Handle RID_BITINT.
(finish_declspecs): Handle cts_bitint.  Adjust for added union
in *specs.
* c-parser.cc (c_keyword_starts_typename, c_token_starts_declspecs,
c_parser_declspecs, c_parser_gnu_attribute_any_word): Handle
RID_BITINT.
* c-tree.h (enum c_typespec_keyword): Mention _BitInt in comment.
Add cts_bitint enumerator.
(struct c_declspecs): Move int_n_idx and floatn_nx_idx into a union
and add bitint_prec there as well.
* c-typeck.cc (composite_type, c_common_type, comptypes_internal):
Handle BITINT_TYPE.
(perform_integral_promotions): Promote BITINT_TYPE bit-fields to
their declared type.
(build_array_ref, build_unary_op, build_conditional_expr,
convert_for_assignment, digest_init, build_binary_op): Likewise.
libcpp/
* expr.cc (interpret_int_suffix): Handle wb and WB suffixes.
* include/cpplib.h (CPP_N_BITINT): Define.

--- gcc/glimits.h.jj2023-08-08 15:54:34.481612931 +0200
+++ gcc/glimits.h   2023-08-08 16:12:02.321939910 +0200
@@ -157,6 +157,11 @@ see the files COPYING3 and COPYING.RUNTI
 # undef BOOL_WIDTH
 # define BOOL_WIDTH 1
 
+# ifdef __BITINT_MAXWIDTH__
+#  undef BITINT_MAXWIDTH
+#  define BITINT_MAXWIDTH __BITINT_MAXWIDTH__
+# endif
+
 # define __STDC_VERSION_LIMITS_H__ 202311L
 #endif
 
--- gcc/c-family/c-common.cc.jj 2023-08-08 15:55:05.243182143 +0200
+++ gcc/c-family/c-common.cc2023-08-08 16:19:29.102683903 +0200
@@ -349,6 +349,7 @@ const struct c_common_resword c_common_r
   { "_Alignas",RID_ALIGNAS,   D_CONLY },
   { "_Alignof",RID_ALIGNOF,   D_CONLY },
   { "_Atomic", RID_ATOMIC,D_CONLY },
+  { "_BitInt", RID_BITINT,D_CONLY },
   { "_Bool",   RID_BOOL,  D_CONLY },
   { "_Complex",RID_COMPLEX,0 },
   { "_Imaginary",  RID_IMAGINARY, D_CONLY },
@@ -2728,6 +2729,9 @@ c_common_signed_or_unsigned_type (int un
   || TYPE_UNSIGNED (type) == unsignedp)
 return type;
 
+  if (TREE_CODE (type) == BITINT_TYPE)
+return build_bitint_type (TYPE_PRECISION (type), unsignedp);
+
 #define TYPE_OK(node)  \
   (TYPE_MODE (type) == TYPE_MODE (node)
\
&& TYPE_PRECISION (type) == TYPE_PRECISION (node))
@@ -6341,8 +6345,10 @@ check_builtin_function_arguments (locati
  code0 = TREE_CODE (TREE_TYPE (args[0]));
  code1 = TREE_CODE (TREE_TYPE (args[1]));
  if (!((code0 == REAL_TYPE && code1 == REAL_TYPE)
-   || (code0 == REAL_TYPE && code1 == INTEGER_TYPE)
-   || (code0 == INTEGER_TYPE && code1 == REAL_TYPE)))
+   || (code0 == REAL_TYPE
+   &

Re: [PATCH 3/12] phiopt: Fix phiopt ICE on vops [PR102989]

2023-08-09 Thread Andrew Pinski via Gcc-patches
On Wed, Aug 9, 2023 at 11:17 AM Jakub Jelinek via Gcc-patches
 wrote:
>
> Hi!
>
> I've ran into ICE on gcc.dg/torture/bitint-42.c with -O1 or -Os
> when enabling expensive tests, and unfortunately I can't reproduce without
> _BitInt.  The IL before phiopt3 has:
>[local count: 203190070]:
>   # .MEM_428 = VDEF <.MEM_367>
>   bitint.159 = VIEW_CONVERT_EXPR(*.LC3);
>   goto ; [100.00%]
>
>[local count: 203190070]:
>   # .MEM_427 = VDEF <.MEM_367>
>   bitint.159 = VIEW_CONVERT_EXPR(*.LC4);
>
>[local count: 406380139]:
>   # .MEM_368 = PHI <.MEM_428(87), .MEM_427(88)>
>   # VUSE <.MEM_368>
>   _123 = VIEW_CONVERT_EXPR(r495[i_107].D.2780)[0];
> and factor_out_conditional_operation is called on the vop PHI, it
> sees it has exactly two operands and defining statements of both
> PHI arguments are converts (VCEs in this case), so it thinks it is
> a good idea to try to optimize that and while doing that it constructs
> void type SSA_NAMEs and the like.

Maybe it is better to punt on VOPs after the call to
single_non_singleton_phi_for_edges, since none of the functions called
afterwards support VOPs.
That is something like:
diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index ff36bb0119b..d0b659042a7 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -4165,6 +4165,10 @@ pass_phiopt::execute (function *)
   arg0 = gimple_phi_arg_def (phi, e1->dest_idx);
   arg1 = gimple_phi_arg_def (phi, e2->dest_idx);

+  /* Can't do anything with a VOP here.  */
+  if (SSA_NAME_IS_VIRTUAL_OPERAND (arg0))
+   continue;
+
   /* Something is wrong if we cannot find the arguments in the PHI
  node.  */
   gcc_assert (arg0 != NULL_TREE && arg1 != NULL_TREE);

Thanks,
Andrew Pinski

>
> 2023-08-09  
>
> PR c/102989
> * tree-ssa-phiopt.cc (factor_out_conditional_operation): Punt for
> vops.
>
> --- gcc/tree-ssa-phiopt.cc.jj   2023-08-08 15:55:09.508122417 +0200
> +++ gcc/tree-ssa-phiopt.cc  2023-08-09 15:55:23.762314103 +0200
> @@ -241,6 +241,7 @@ factor_out_conditional_operation (edge e
>  }
>
>if (TREE_CODE (arg0) != SSA_NAME
> +  || SSA_NAME_IS_VIRTUAL_OPERAND (arg0)
>|| (TREE_CODE (arg1) != SSA_NAME
>   && TREE_CODE (arg1) != INTEGER_CST))
>  return NULL;
>
> Jakub
>


PING^3: [PATCH V4, rs6000] Disable generation of scalar modulo instructions

2023-08-09 Thread Pat Haugen via Gcc-patches

Ping.

On 6/30/23 2:26 PM, Pat Haugen via Gcc-patches wrote:

Updated from the prior version to address the latest review comment (simplify
umod<mode>3).

Disable generation of scalar modulo instructions.

It was recently discovered that the scalar modulo instructions can suffer
noticeable performance issues for certain input values. This patch disables
their generation since the equivalent div/mul/sub sequence does not suffer
the same problem.
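
For reference, the div/mul/sub sequence in question follows the usual
remainder identity; a minimal C sketch (illustrative only, not part of the
patch):

/* Illustrative only: expanding a % b as a - (a / b) * b, the sequence
   preferred over the hardware modulo instructions.  */
long
expanded_mod (long a, long b)
{
  long q = a / b;     /* divide */
  return a - q * b;   /* multiply and subtract to get the remainder */
}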

Bootstrapped and regression tested on powerpc64/powerpc64le.
Ok for master and backports after burn in?

-Pat


2023-06-30  Pat Haugen  

gcc/
     * config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling
     scalar modulo.
     * config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
     * config/rs6000/rs6000.md (mod<mode>3, *mod<mode>3): Disable.
     (define_expand umod<mode>3): New.
     (define_insn umod<mode>3): Rename to *umod<mode>3 and disable.
     (umodti3, modti3): Disable.

gcc/testsuite/
     * gcc.target/powerpc/clone1.c: Add xfails.
     * gcc.target/powerpc/clone3.c: Likewise.
     * gcc.target/powerpc/mod-1.c: Update scan strings and add xfails.
     * gcc.target/powerpc/mod-2.c: Likewise.
     * gcc.target/powerpc/p10-vdivq-vmodq.c: Add xfails.




Re: [V2][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-08-09 Thread Kees Cook via Gcc-patches
On Mon, Aug 07, 2023 at 04:33:13PM +, Qing Zhao wrote:
> What’s the testing case for the one that failed? 
> If it’s 
> 
> __builtin_dynamic_object_size(p->array, 0/2) without the allocation 
> information in the routine, 
> then with the current algorithm, gcc cannot deduce the size for the whole 
> object.
> 
> If not such case, let me know.

I found some more bugs in my tests (now fixed), but I'm left with a single
failure case, which I think is going to boil down to the pointer/pointee
issue we discussed earlier.

Using my existing testing tool:
https://github.com/kees/kernel-tools/blob/trunk/fortify/array-bounds.c

I see this error with the "counted_by_seen_by_bdos" case:

Expected __builtin_dynamic_object_size(p, 1) (18446744073709551615) == 
sizeof(*p) + p->count * sizeof(*p->array) (80)

A reduced view of the code is:

struct annotated *p;
int index = MAX_INDEX + unconst;

p = alloc_annotated(index);

EXPECT_EQ(__builtin_dynamic_object_size(p, 1), sizeof(*p) + p->count * 
sizeof(*p->array));

It looks like __bdos will not use the __counted_by information from the
pointee if the argument is only the pointer. i.e. this test works:

EXPECT_EQ(__builtin_dynamic_object_size(p->array, 1), p->count * 
sizeof(*p->array));

However, I thought if any part of the pointee was used (e.g. p->count,
p->array), GCC would be happy to start using the pointee details?

And, again, for the initial version of this feature, I'm fine with this
failing test being declared not a valid test. :) But I'll need some
kind of builtin that can correctly interrogate a pointer to find the
full runtime size with the assumption that pointer is valid, but that
can come later.
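
For readers without the test harness handy, a reduced sketch of the shape
being discussed might look like the following; the field and helper names
are approximations, and the real code is in the array-bounds.c linked above:

/* Sketch only: assumes the proposed counted_by attribute and a helper
   that allocates room for 'count' trailing elements.  */
#include <stdlib.h>

struct annotated {
  unsigned long count;
  int array[] __attribute__ ((counted_by (count)));
};

static struct annotated *
alloc_annotated (unsigned long count)
{
  struct annotated *p = malloc (sizeof (*p) + count * sizeof (*p->array));
  if (p)
    p->count = count;
  return p;
}

/* __builtin_dynamic_object_size (p->array, 1) can use p->count, but
   __builtin_dynamic_object_size (p, 1) currently cannot -- the failing
   case described above.  */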

And as a side note, I am excited to see the very correct warnings for
bad (too late) assignment of the __counted_by member:

p->array[0] = 0;
p->count = 1;

array-bounds.c: In function 'invalid_assignment_order':
array-bounds.c:366:17: warning: '*p.count' is used uninitialized 
[-Wuninitialized]
  366 | p->array[0] = 0;
  | ^~~

Yay! :)

-Kees

-- 
Kees Cook


[PATCH] MATCH: [PR110937/PR100798] (a ? ~b : b) should be optimized to b ^ -(a)

2023-08-09 Thread Andrew Pinski via Gcc-patches
This adds a simple match pattern for this case.
I noticed it in a couple of different places:
once while I was looking at the code generation of a parser, and
again while looking at locations where bitwise_inverted_equal_p
should be used more.
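
As a quick sanity check of the identity behind the transform (not part of
the patch): for a boolean a, -0 is 0, so b ^ 0 == b, and -1 is all-ones,
so b ^ -1 == ~b.

/* Illustrative check that (a ? ~b : b) == (b ^ -a) for a in {0, 1}.  */
#include <assert.h>

static int branchy (int a, int b) { return a ? ~b : b; }
static int branchless (int a, int b) { return b ^ -a; }

int
main (void)
{
  for (int a = 0; a <= 1; a++)
    for (int b = -4; b <= 4; b++)
      assert (branchy (a, b) == branchless (a, b));
  return 0;
}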

Committed as approved after bootstrapping and testing on x86_64-linux-gnu
with no regressions.

PR tree-optimization/110937
PR tree-optimization/100798

gcc/ChangeLog:

* match.pd (`a ? ~b : b`): Handle this
case.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/bool-14.c: New test.
* gcc.dg/tree-ssa/bool-15.c: New test.
* gcc.dg/tree-ssa/phi-opt-33.c: New test.
* gcc.dg/tree-ssa/20030709-2.c: Update testcase
so `a ? -1 : 0` is not used to hit the match
pattern.
---
 gcc/match.pd   | 14 ++
 gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c |  5 +++--
 gcc/testsuite/gcc.dg/tree-ssa/bool-14.c| 15 +++
 gcc/testsuite/gcc.dg/tree-ssa/bool-15.c| 18 ++
 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-33.c | 13 +
 5 files changed, 63 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bool-14.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bool-15.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-33.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 9b4819e5be7..fc630b63563 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -6460,6 +6460,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (cmp == NE_EXPR)
{ constant_boolean_node (true, type); })))
 
+#if GIMPLE
+/* a?~t:t -> (-(a))^t */
+(simplify
+ (cond @0 @1 @2)
+ (if (INTEGRAL_TYPE_P (type)
+  && bitwise_inverted_equal_p (@1, @2))
+  (with {
+auto prec = TYPE_PRECISION (type);
+auto unsign = TYPE_UNSIGNED (type);
+tree inttype = build_nonstandard_integer_type (prec, unsign);
+   }
+   (convert (bit_xor (negate (convert:inttype @0)) (convert:inttype @2))
+#endif
+
 /* Simplify pointer equality compares using PTA.  */
 (for neeq (ne eq)
  (simplify
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c
index 5009cd69cfe..78938f919d4 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c
@@ -29,15 +29,16 @@ union tree_node
 };
 int make_decl_rtl (tree, int);
 void *
-get_alias_set (t)
+get_alias_set (t, t1)
  tree t;
+ void *t1;
 {
   long set;
   if (t->decl.rtl)
 return (t->decl.rtl->fld[1].rtmem 
? 0
: (((t->decl.rtl ? t->decl.rtl: (make_decl_rtl (t, 0), 
t->decl.rtl)))->fld[1]).rtmem);
-  return (void*)-1;
+  return t1;
 }
 
 /* There should be precisely one load of ->decl.rtl.  If there is
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bool-14.c 
b/gcc/testsuite/gcc.dg/tree-ssa/bool-14.c
new file mode 100644
index 000..0149380a63b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/bool-14.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized-raw" } */
+/* PR tree-optimization/110937 */
+
+_Bool f2(_Bool a, _Bool b)
+{
+if (a)
+  return !b;
+return b;
+}
+
+/* We should be able to remove the conditional and convert it to an xor. */
+/* { dg-final { scan-tree-dump-not "gimple_cond " "optimized" } } */
+/* { dg-final { scan-tree-dump-not "gimple_phi " "optimized" } } */
+/* { dg-final { scan-tree-dump-times "bit_xor_expr, " 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bool-15.c 
b/gcc/testsuite/gcc.dg/tree-ssa/bool-15.c
new file mode 100644
index 000..1f496663863
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/bool-15.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized-raw" } */
+/* PR tree-optimization/110937 */
+
+_Bool f2(int x, int y, int w, int z)
+{
+  _Bool a = x == y;
+  _Bool b = w == z;
+  if (a)
+return !b;
+  return b;
+}
+
+/* We should be able to remove the conditional and convert it to an xor. */
+/* { dg-final { scan-tree-dump-not "gimple_cond " "optimized" } } */
+/* { dg-final { scan-tree-dump-not "gimple_phi " "optimized" } } */
+/* { dg-final { scan-tree-dump-not "ne_expr, " "optimized" } } */
+/* { dg-final { scan-tree-dump-times "bit_xor_expr, " 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-33.c 
b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-33.c
new file mode 100644
index 000..b79fe44187a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-33.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized-raw" } */
+/* PR tree-optimization/100798 */
+
+int f(int a, int t)
+{
+  return (a=='s' ? ~t : t);
+}
+
+/* This should be convert into t^-(a=='s').  */
+/* { dg-final { scan-tree-dump-times "bit_xor_expr, " 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "negate_expr, " 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "bit_not_expr, " "optimized" } } *

[PATCH v2] analyzer: More features for CPython analyzer plugin [PR107646]

2023-08-09 Thread Eric Feng via Gcc-patches
Thank you for your help in getting dg-require-python-h working! I can
confirm that the FAILs are related to differences between the --cflags
affecting the gimple seen by the analyzer. For this reason, I have
changed it to --includes for now. To be sure, I tested on Python 3.8 as
well and it works as expected. I have also addressed the following
comments on the WIP patch as you described.

-- Update Changelog entry to list new functions being simulated.
-- Update region_model::get_or_create_region_for_heap_alloc leading
comment.
-- Change register_alloc to update_state_machine.
-- Change move_ptr_sval_non_null to transition_ptr_sval_non_null.
-- Static helper functions for:
-- Initializing ob_refcnt field.
-- Setting ob_type field.
-- Getting ob_base field.
-- Initializing heap allocated region for PyObjects.
-- Incrementing a field by one.
-- Change arg_is_long_p to arg_is_integral_p.
-- Extract common failure scenario for reusability.

The initial WIP patch using

/* { dg-options "-fanalyzer -I/usr/include/python3.9" }. */

has been bootstrapped and regtested on aarch64-unknown-linux-gnu. Since
we did not change any core logic in this revision, and the only changes
within the analyzer core are variable renames, is it OK for trunk?
In the meantime, the revised patch is currently going through the
bootstrap and regtest process.

Best,
Eric

---
This patch adds known function subclasses for Python/C API functions
PyList_New, PyLong_FromLong, and PyList_Append. It also adds new
optional parameters for
region_model::get_or_create_region_for_heap_alloc, allowing for the
newly allocated region to immediately transition from the start state to
the assumed non-null state in the malloc state machine if desired.
Finally, it adds a new procedure, dg-require-python-h, intended as a
directive in Python-related analyzer tests, to append necessary Python
flags during the tests' build process.

The main warnings we gain in this patch with respect to the known function
subclasses mentioned are leak related. For example:

rc3.c: In function ‘create_py_object’:
rc3.c:21:10: warning: leak of ‘item’ [CWE-401] [-Wanalyzer-malloc-leak]
   21 |   return list;
      |          ^~~~
  ‘create_py_object’: events 1-4
    |
    |    4 |   PyObject* item = PyLong_FromLong(10);
    |      |                    ^~~~~~~~~~~~~~~~~~~
    |      |                    |
    |      |                    (1) allocated here
    |      |                    (2) when ‘PyLong_FromLong’ succeeds
    |    5 |   PyObject* list = PyList_New(2);
    |      |                    ~~~~~~~~~~~~~
    |      |                    |
    |      |                    (3) when ‘PyList_New’ fails
    |   ..
    |   21 |   return list;
    |      |          ~~~~
    |      |          |
    |      |          (4) ‘item’ leaks here; was allocated at (1)
    |
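
For context, the code pattern behind that diagnostic looks roughly like the
sketch below (the exact reproducer lives in the test suite; omitting the
Py_XDECREF on the failure path is what triggers the leak warning):

/* Sketch of the pattern the new checkers diagnose; requires the
   CPython development headers.  */
#include <Python.h>

PyObject *
create_py_object (void)
{
  PyObject *item = PyLong_FromLong (10);  /* (1) allocated here */
  PyObject *list = PyList_New (2);        /* (3) may fail */

  if (!list)
    {
      Py_XDECREF (item);  /* without this, 'item' leaks when PyList_New fails */
      return NULL;
    }

  PyList_Append (list, item);
  Py_XDECREF (item);
  return list;
}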

Some concessions were made to
simplify the analysis process when comparing kf_PyList_Append with the
real implementation. In particular, PyList_Append performs some
optimization internally to try and avoid calls to realloc if
possible. For simplicity, we assume that realloc is called every time.
Also, we grow the size by just 1 (to ensure enough space for adding a
new element) rather than abiding by the heuristics that the actual implementation
follows.

gcc/analyzer/ChangeLog:
PR analyzer/107646
* region-model.cc (region_model::get_or_create_region_for_heap_alloc):
New optional parameters.
* region-model.h (class region_model): New optional parameters.
* sm-malloc.cc (on_realloc_with_move): New function.
(region_model::transition_ptr_sval_non_null): New function.

gcc/testsuite/ChangeLog:
PR analyzer/107646
* gcc.dg/plugin/analyzer_cpython_plugin.c: Analyzer support for
PyList_New, PyList_Append, PyLong_FromLong
* gcc.dg/plugin/plugin.exp: New test.
* lib/target-supports.exp: New procedure.
* gcc.dg/plugin/cpython-plugin-test-2.c: New test.

Signed-off-by: Eric Feng 
---
 gcc/analyzer/region-model.cc  |  20 +-
 gcc/analyzer/region-model.h   |  10 +-
 gcc/analyzer/sm-malloc.cc |  40 +
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 711 ++
 .../gcc.dg/plugin/cpython-plugin-test-2.c |  78 ++
 gcc/testsuite/gcc.dg/plugin/plugin.exp|   3 +-
 gcc/testsuite/lib/target-supports.exp |  25 +
 7 files changed, 881 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index e92b3f7b074..c338f045d92 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -5127,11 +5127,16 @@ region_model::check_dynamic_size_for_floats (const 
svalue *size_in_bytes,
Use CTXT to complain about tainted sizes.
 
Reuse an existing heap_al

Re: RISC-V: Folding memory for FP + constant case

2023-08-09 Thread Jeff Law via Gcc-patches



On 7/12/23 14:59, Jivan Hakobyan via Gcc-patches wrote:

Accessing a local array element turns into a load from an
(fp + (index << C1)) + C2 address.
When the access is inside a loop, this yields a loop-invariant computation.
For some reason, moving that part out cannot be done in the
loop-invariant passes.
But we can handle it in a target-specific hook (legitimize_address),
which provides an opportunity to rewrite the memory access into a form
more suitable for the target architecture.

This patch handles the mentioned case by rewriting it to
((fp + C2) + (index << C1)).
I have evaluated it on SPEC2017 and got an improvement on leela (over 7
billion instructions, .39% of the dynamic count), which dwarfs the
regression for gcc (14 million instructions, .0012% of the dynamic count).


gcc/ChangeLog:
 * config/riscv/riscv.cc (riscv_legitimize_address): Handle folding.
 (mem_shadd_or_shadd_rtx_p): New predicate.

So I poked a bit more in this space today.

As you may have noted, Manolis's patch still needs another rev.  But I 
was able to test this patch in conjunction with the f-m-o patch as well 
as the additional improvements made to hard register cprop.  The net 
result was that this patch still shows a nice decrease in instruction 
counts on leela.  It's a bit of a mixed bag elsewhere.


I dove a bit deeper into the small regression in x264.  In the case I
looked at, the reason the patch regresses is that the original form of the
address calculations exposes a common subexpression, i.e.


addr1 = (reg1 << 2) + fp + C1
addr2 = (reg1 << 2) + fp + C2

(reg1 << 2) + fp is a common subexpression resulting in something like 
this as we leave CSE:


t = (reg1 << 2) + fp;
addr1 = t + C1
addr2 = t + C2
mem (addr1)
mem (addr2)

C1 and C2 are small constants, so combine generates

t = (reg1 << 2) + fp;
mem (t+C1)
mem (t+C2)

FP elimination occurs after IRA and we get:

t2 = sp + C3
t = (reg << 2) + t2
mem (t + C1)
mem (t + C2)


Not bad.  Manolis's work should allow us to improve that a bit more.


With this patch we don't capture the CSE and ultimately generate 
slightly worse code.  This kind of issue is fairly inherent in 
reassociation -- and given the regression is two orders of magnitude
smaller than the improvement, my inclination is to go forward with this
patch.
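
To make the x264-style case concrete, source along these lines reproduces
the shared (reg << 2) + fp term (a sketch, not the actual x264 code):

/* Two frame-allocated arrays indexed by the same variable: both
   addresses are (i << 2) + fp plus a small constant, so the shifted
   index plus fp is a natural common subexpression.  */
int
sum_pairs (const int *src, int n)
{
  int a[16], b[16];
  int s = 0;

  for (int i = 0; i < 16; i++)
    {
      a[i] = src[i];
      b[i] = src[i + n];
    }
  for (int i = 0; i < 16; i++)
    s += a[i] + b[i];   /* addr1 = t + C1, addr2 = t + C2, t = (i << 2) + fp */
  return s;
}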




I've fixed a few formatting issues, changed one conditional to use
CONST_INT_P rather than checking the code directly, and pushed the final
version to the trunk.


Thanks for your patience.

jeff
commit a16dc729fda9fabd6472d50cce45791cb3b6ada8
Author: Jivan Hakobyan 
Date:   Wed Aug 9 13:26:58 2023 -0600

RISC-V: Folding memory for FP + constant case

Accessing a local array element turns into a load from an
(fp + (index << C1)) + C2 address.

When the access is inside a loop, this yields a loop-invariant computation.
For some reason, moving that part out cannot be done in the loop-invariant
passes.  But we can handle it in a target-specific hook (legitimize_address),
which provides an opportunity to rewrite the memory access into a form more
suitable for the target architecture.

This patch handles the mentioned case by rewriting it to
((fp + C2) + (index << C1)).

I have evaluated it on SPEC2017 and got an improvement on leela (over 7
billion instructions, .39% of the dynamic count), which dwarfs the regression
for gcc (14 million instructions, .0012% of the dynamic count).

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_legitimize_address): Handle folding.
(mem_shadd_or_shadd_rtx_p): New function.

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 77892da2920..7f2041a54ba 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -1805,6 +1805,22 @@ riscv_shorten_lw_offset (rtx base, HOST_WIDE_INT offset)
   return addr;
 }
 
+/* Helper for riscv_legitimize_address. Given X, return true if it
+   is a left shift by 1, 2 or 3 positions or a multiply by 2, 4 or 8.
+
+   This respectively represent canonical shift-add rtxs or scaled
+   memory addresses.  */
+static bool
+mem_shadd_or_shadd_rtx_p (rtx x)
+{
+  return ((GET_CODE (x) == ASHIFT
+  || GET_CODE (x) == MULT)
+ && CONST_INT_P (XEXP (x, 1))
+ && ((GET_CODE (x) == ASHIFT && IN_RANGE (INTVAL (XEXP (x, 1)), 1, 3))
+ || (GET_CODE (x) == MULT
+ && IN_RANGE (exact_log2 (INTVAL (XEXP (x, 1))), 1, 3))));
+}
+
 /* This function is used to implement LEGITIMIZE_ADDRESS.  If X can
be legitimized in a way that the generic machinery might not expect,
return a new address, otherwise return NULL.  MODE is the mode of
@@ -1830,6 +1846,32 @@ riscv_legitimize_address (rtx x, rtx oldx 
ATTRIBUTE_UNUSED,
   rtx base = XEXP (x, 0);
   HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
 
+  /* Handle (plus (plus (mult (a) (mem_shadd_constant)) (fp)) (C)) case.  */
+  if (GET_CODE (base) == PLUS && mem_shadd_or_shadd_rtx_p (XEXP (base, 0))
+ 

Re: [PATCH] RISC-V: Handle no_insn in TARGET_SCHED_VARIABLE_ISSUE.

2023-08-09 Thread Jeff Law via Gcc-patches



On 5/29/23 06:46, Jeff Law wrote:



On 5/29/23 05:01, Jin Ma wrote:
Reference: 
https://github.com/gcc-mirror/gcc/commit/d0bc0cb66bcb0e6a5a5a31a9e900e8ccc98e34e5


RISC-V should also handle no_insn patterns for
pipelining.


gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_sched_variable_issue): New function.
(TARGET_SCHED_VARIABLE_ISSUE): New macro.
---
  gcc/config/riscv/riscv.cc | 21 +
  1 file changed, 21 insertions(+)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 3954fc07a8b..559fa9cd7e0 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -6225,6 +6225,24 @@ riscv_issue_rate (void)
    return tune_param->issue_rate;
  }
+/* Implement TARGET_SCHED_VARIABLE_ISSUE.  */
+
+static int
+riscv_sched_variable_issue (FILE *, int, rtx_insn *insn, int more)
+{
+  if (DEBUG_INSN_P (insn))
+    return more;
+
+  rtx_code code = GET_CODE (PATTERN (insn));
+  if (code == USE || code == CLOBBER)
+    return more;
+
+  if (get_attr_type (insn) == TYPE_UNKNOWN)
+    return more;
+
+  return more - 1;
+}
The problem is that INSN is *much* more likely to be a real instruction 
that takes real resources, even if it is TYPE_UNKNOWN.
TYPE_UNKNOWN here is actually an indicator of what I would consider a 
bug in the backend, specifically that we have INSNs that do not provide 
a mapping for the schedulers to suitable types.


With that in mind I'd much rather get to the point where we can do 
something like this for TYPE_UNKNOWN:


type = get_attr_type (insn);
gcc_assert (type != TYPE_UNKNOWN);

That way if we ever encounter a TYPE_UNKNOWN during development, we can 
fix it in the md files in a sensible manner.  I don't know if we are 
close to being able to do that.  We fixed a ton of stuff in bitmanip.md, 
but I don't think there's been a thorough review of the port to find 
other instances of TYPE_UNKNOWN INSNs.



The other thing is this code probably wants to handle GHOST type 
instructions.  While GHOST is used for instructions which generate no 
code, it might seem they should return "more" as those INSNs take no 
resources.  But GHOST is actually used for things like the blockage insn 
which should end a cycle from an issue standpoint.  So the right 
handling of a GHOST is something like this:


if (type == TYPE_GHOST)
   return 0;
So there wasn't ever any follow-up.  Given this was something Ventana 
was also carrying locally (with very minor differences) I went ahead and 
merged up the implementations and pushed the final result to the trunk.



Attached is the patch that was actually committed.

Jeff

commit f088b768d01ae42385697584a2bcac141685dce2
Author: Jin Ma 
Date:   Wed Aug 9 13:52:06 2023 -0600

RISC-V: Handle no_insn in TARGET_SCHED_VARIABLE_ISSUE.

Reference: 
https://github.com/gcc-mirror/gcc/commit/d0bc0cb66bcb0e6a5a5a31a9e900e8ccc98e34e5

RISC-V should also handle no_insn patterns for pipelining.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_sched_variable_issue): New function.
(TARGET_SCHED_VARIABLE_ISSUE): New macro.

Co-authored-by: Philipp Tomsich 
Co-authored-by: Jeff Law 

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 7f2041a54ba..dfb519ab9a8 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -6698,6 +6698,31 @@ riscv_issue_rate (void)
   return tune_param->issue_rate;
 }
 
+/* Implement TARGET_SCHED_VARIABLE_ISSUE.  */
+static int
+riscv_sched_variable_issue (FILE *, int, rtx_insn *insn, int more)
+{
+  if (DEBUG_INSN_P (insn))
+return more;
+
+  rtx_code code = GET_CODE (PATTERN (insn));
+  if (code == USE || code == CLOBBER)
+return more;
+
+  /* GHOST insns are used for blockage and similar cases which
+ effectively end a cycle.  */
+  if (get_attr_type (insn) == TYPE_GHOST)
+return 0;
+
+#if 0
+  /* If we ever encounter an insn with an unknown type, trip
+ an assert so we can find and fix this problem.  */
+  gcc_assert (get_attr_type (insn) != TYPE_UNKNOWN);
+#endif
+
+  return more - 1;
+}
+
 /* Auxiliary function to emit RISC-V ELF attribute. */
 static void
 riscv_emit_attribute ()
@@ -8420,6 +8445,9 @@ riscv_frame_pointer_required (void)
 #undef TARGET_SCHED_ISSUE_RATE
 #define TARGET_SCHED_ISSUE_RATE riscv_issue_rate
 
+#undef  TARGET_SCHED_VARIABLE_ISSUE
+#define TARGET_SCHED_VARIABLE_ISSUE riscv_sched_variable_issue
+
 #undef TARGET_FUNCTION_OK_FOR_SIBCALL
 #define TARGET_FUNCTION_OK_FOR_SIBCALL riscv_function_ok_for_sibcall
 


Re: [PATCH] RISC-V: Remove non-existing 'Zve32d' extension

2023-08-09 Thread Jeff Law via Gcc-patches




On 8/9/23 00:09, Tsukasa OI via Gcc-patches wrote:

Since this extension does not exist, this commit prunes this from
the defined extension version table.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_ext_version_table):
Remove 'Zve32d' from the version list.

Thanks.  Installed.
jeff


Re: [PATCH 3/12] phiopt: Fix phiopt ICE on vops [PR102989]

2023-08-09 Thread Jakub Jelinek via Gcc-patches
On Wed, Aug 09, 2023 at 11:27:48AM -0700, Andrew Pinski wrote:
> Maybe it is better to punt for VOPS after the call to
> single_non_singleton_phi_for_edges since none of functions called
> afterwards support VOPs.
> That is something like:
> diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> index ff36bb0119b..d0b659042a7 100644
> --- a/gcc/tree-ssa-phiopt.cc
> +++ b/gcc/tree-ssa-phiopt.cc
> @@ -4165,6 +4165,10 @@ pass_phiopt::execute (function *)
>arg0 = gimple_phi_arg_def (phi, e1->dest_idx);
>arg1 = gimple_phi_arg_def (phi, e2->dest_idx);
> 
> +  /* Can't do anything with a VOP here.  */
> +  if (SSA_NAME_IS_VIRTUAL_OPERAND (arg0))
> +   continue;
> +

That would ICE if arg0 isn't an SSA_NAME (e.g. is an INTEGER_CST).
I think the more canonical test for virtual PHIs is
if (virtual_operand_p (gimple_phi_result (phi)))

Should single_non_singleton_phi_for_edges already punt if there is
a virtual PHI with different arguments from the edges (or if there
is a single virtual PHI)?

Jakub



Re: [RFC PATCH 0/2] RISC-V: __builtin_riscv_pause for all environment

2023-08-09 Thread Jeff Law via Gcc-patches




On 8/9/23 00:11, Tsukasa OI via Gcc-patches wrote:

Hello,

I found that a built-in function "__builtin_riscv_pause" is not usable
unless we enable the 'Zihintpause' extension explicitly (still, this
built-in exists EVEN IF the 'Zihintpause' extension is disabled).

Contents of a.c:


void sample(void)
{
 __builtin_riscv_pause();
}



Compiling with the 'Zihintpause' extension is fine.


$ riscv64-unknown-elf-gcc -O2 -march=rv64i_zihintpause -mabi=lp64 -c a.c



However, compiling without the 'Zihintpause' causes an assembler error
(tested with GNU Binutils 2.41):


$ riscv64-unknown-elf-gcc -O2 -march=rv64i -mabi=lp64 -c a.c
/tmp/ccFjacAz.s: Assembler messages:
/tmp/ccFjacAz.s:11: Error: unrecognized opcode `pause', extension `zihintpause' 
required



This is because:

1.  GCC does not handle the 'Zihintpause' extension and
2.  "riscv_pause" (insn) unconditionally emits "pause" even if the
 assembler does not accept it (when the extension is disabled).


This patch set (PATCH 1/2) resolves this issue by:

1.  Handling the 'Zihintpause' extension and
2.  Splitting the "__builtin_riscv_pause" implementation
 depending on the existence of the 'Zihintpause' extension.

Because a released version of GCC defines "__builtin_riscv_pause"
unconditionally, I chose to also define a no-'Zihintpause' version.

There is another option to unconditionally emit ".insn 0x010f"
(the machine code of "pause") but I didn't because I wanted to improve the
diagnostics (e.g. *.s output).
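
As an aside, the typical use of this built-in is a spin-wait loop, e.g.
(illustrative only, not part of the patch):

#include <stdatomic.h>

/* Spin until *flag becomes nonzero; the pause hint reduces the cost of
   the busy loop (and is effectively a no-op where Zihintpause is absent).  */
void
spin_wait (volatile atomic_int *flag)
{
  while (atomic_load_explicit (flag, memory_order_acquire) == 0)
    __builtin_riscv_pause ();
}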

I also fixed the description of this built-in function (in PATCH 2/2).


I'm not sure whether this is a good method to split the implementation
depending on the 'Zihintpause' extension.  Other than that, I believe that
this is okay and approval is appreciated.

Note that because I assigned copyright of my GCC contributions to the FSF,
I didn't attach a "Signed-off-by" tag.  Tell me if I need it.
I'd tend to think we do not want to expose the intrinsic unless the 
right extensions are enabled -- even though the encoding is a no-op and 
we could emit it as a .insn.


If others think otherwise, I'll go with the consensus opinion.  So let's 
hold off a bit until others have chimed in.


Thanks,
jeff


  1   2   >