Re: [PATCH v2 1/2] RISC-V: Fix ICE caused by early ggc_free on DECL for RVV intrinsics in LTO.

2024-09-11 Thread Jin Ma
> > > I'm curious why you ever get a ggc_freed decl here.
> >
> > It seems that the overloaded interface of RVV has been registered
> > repeatedly, resulting in invalid registrations except for the first
> > registration, and these invalid registrations have been ggc_freed. But
> > anyway, I think it is necessary to do a check here. I think using
> > "integer_zero_node" meets the needs, although a direct return would be
> > better.
>
> But there isn't any way to check whether 'decl' has been freed ...
> just make sure it isn't - you should not even have a reference to it.
> Richard.

I'm trying to understand what you mean. Do you mean that overwriting decl with
"integer_zero_node" does not guarantee that the memory of the original decl has
been cleaned up properly?

If so, then the current approach is indeed not appropriate. Maybe I should
check whether the function has already been registered before registering it,
and skip the registration if it has. This would make registration less
efficient, though, and I am not sure whether it is appropriate. In fact, I see
a similar patch for aarch64:

https://github.com/gcc-mirror/gcc/commit/685d822e524cc8b2726ad6c44c2ccaabe55a198c

Any other comments?

BR
Jin
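
For illustration, a minimal sketch of the "check before registering" idea
described above. The registry type and the function names are hypothetical
and are not taken from the RISC-V backend; the only point is that a duplicate
name is rejected before anything is created, so nothing has to be freed
afterwards and no stale reference can be left behind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_BUILTINS 64

/* Hypothetical stand-in for a builtin registry; not GCC's data structure.  */
struct registry
{
  const char *names[MAX_BUILTINS];
  size_t count;
};

/* Return true if NAME has already been registered in R.  */
bool
registry_contains (const struct registry *r, const char *name)
{
  assert (r != NULL && name != NULL);
  for (size_t i = 0; i < r->count; i++)
    if (strcmp (r->names[i], name) == 0)
      return true;
  return false;
}

/* Register NAME at most once.  Return true if a new entry was created, and
   false if NAME was already present or the table is full.  In the false case
   nothing is created, so there is nothing to free later.  */
bool
register_once (struct registry *r, const char *name)
{
  if (registry_contains (r, name) || r->count == MAX_BUILTINS)
    return false;
  r->names[r->count++] = name;
  return true;
}
```

The linear search is what makes this slower than unconditional registration;
a hash table would be the obvious refinement if the cost mattered.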


Re: [PATCH v2 1/2] RISC-V: Fix ICE caused by early ggc_free on DECL for RVV intrinsics in LTO.

2024-09-11 Thread Richard Biener
On Wed, Sep 11, 2024 at 9:27 AM Jin Ma  wrote:
>
> > > > I'm curious why you ever get a ggc_freed decl here.
> > >
> > > It seems that the overloaded interface of RVV has been registered
> > > repeatedly, resulting in invalid registrations except for the first
> > > registration, and these invalid registrations have been ggc_freed. But
> > > anyway, I think it is necessary to do a check here. I think using
> > > "integer_zero_node" meets the needs, although a direct return would be
> > > better.
> >
> > But there isn't any way to check whether 'decl' has been freed ...
> > just make sure it isn't - you should not even have a reference to it.
> > Richard.
>
> I'm trying to understand what you mean. Do you mean that overwriting decl
> with "integer_zero_node" does not guarantee that the memory of the original
> decl has been cleaned up properly?

I'm saying that if you ever get a ggc_free()d object as an argument to this
function, that's the thing to fix - somewhere there is a stale reference to
that object that either shouldn't be there or that should have prevented the
object from being ggc_free()d.

Richard.

>
> If so, then the current approach is indeed not appropriate. Maybe I should
> check whether the function has already been registered before registering
> it, and skip the registration if it has. This would make registration less
> efficient, though, and I am not sure whether it is appropriate. In fact, I
> see a similar patch for aarch64:
>
> https://github.com/gcc-mirror/gcc/commit/685d822e524cc8b2726ad6c44c2ccaabe55a198c
>
> Any other comments?
>
> BR
> Jin


Re: [RFC PATCH] Enable vectorization for unknown tripcount in very cheap cost model but disable epilog vectorization.

2024-09-11 Thread Richard Biener
On Wed, Sep 11, 2024 at 4:17 AM liuhongt  wrote:
>
> GCC 12 enables vectorization at -O2 with the very cheap cost model, which is
> restricted to constant trip counts. The vectorization capability is very
> limited out of consideration for the code size impact.
>
> The patch extends the very cheap cost model a little to support variable
> trip counts, but still disables peeling for gaps/alignment, runtime alias
> checking and epilogue vectorization out of consideration for code size.
>
> So there are at most 2 versions of a loop for -O2 vectorization: one
> vectorized main loop and one scalar/remainder loop.
>
> For example:
>
> void
> foo1 (int* __restrict a, int* b, int* c, int n)
> {
>  for (int i = 0; i != n; i++)
>   a[i] = b[i] + c[i];
> }
>
> with -O2 -march=x86-64-v3, will be vectorized to
>
> .L10:
>         vmovdqu (%r8,%rax), %ymm0
>         vpaddd  (%rsi,%rax), %ymm0, %ymm0
>         vmovdqu %ymm0, (%rdi,%rax)
>         addq    $32, %rax
>         cmpq    %rdx, %rax
>         jne     .L10
>         movl    %ecx, %eax
>         andl    $-8, %eax
>         cmpl    %eax, %ecx
>         je      .L21
>         vzeroupper
> .L12:
>         movl    (%r8,%rax,4), %edx
>         addl    (%rsi,%rax,4), %edx
>         movl    %edx, (%rdi,%rax,4)
>         addq    $1, %rax
>         cmpl    %eax, %ecx
>         jne     .L12
>
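
The two-loop shape in the listing above can be written out in C as a rough
sketch. This is not compiler output: the name foo1_shape and the VF constant
are made up for illustration, VF = 8 assumes 256-bit vectors of 32-bit ints,
and the inner lane loop merely stands in for one vector add.

```c
#include <assert.h>
#include <stddef.h>

#define VF 8   /* assumed vectorization factor: eight 32-bit ints per vector */

void
foo1_shape (int *restrict a, const int *b, const int *c, int n)
{
  assert (n >= 0);
  int i = 0;
  int main_end = n & -VF;   /* n rounded down to a multiple of VF */

  /* Vectorized main loop: each outer iteration models one vector add.  */
  for (; i < main_end; i += VF)
    for (int lane = 0; lane < VF; lane++)
      a[i + lane] = b[i + lane] + c[i + lane];

  /* Scalar remainder loop for the last n % VF elements; there is no
     vectorized epilogue.  */
  for (; i < n; i++)
    a[i] = b[i] + c[i];
}
```

With n = 11, for instance, the main loop handles elements 0..7 and the
remainder loop handles elements 8..10.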
> As measured with SPEC2017 on EMR, the patch (N-Iter) improves performance by
> 4.11% with an extra 2.8% code size, and the cheap cost model improves
> performance by 5.74% with an extra 8.88% code size. The details are below.

I'm confused by this: are the N-Iter numbers on top of the cheap cost
model numbers?

> Performance measured with -march=x86-64-v3 -O2 on EMR
>
>                   N-Iter    cheap cost model
> 500.perlbench_r   -0.12%    -0.12%
> 502.gcc_r         0.44%     -0.11%
> 505.mcf_r         0.17%     4.46%
> 520.omnetpp_r     0.28%     -0.27%
> 523.xalancbmk_r   0.00%     5.93%
> 525.x264_r        -0.09%    23.53%
> 531.deepsjeng_r   0.19%     0.00%
> 541.leela_r       0.22%     0.00%
> 548.exchange2_r   -11.54%   -22.34%
> 557.xz_r          0.74%     0.49%
> GEOMEAN INT       -1.04%    0.60%
>
> 503.bwaves_r      3.13%     4.72%
> 507.cactuBSSN_r   1.17%     0.29%
> 508.namd_r        0.39%     6.87%
> 510.parest_r      3.14%     8.52%
> 511.povray_r      0.10%     -0.20%
> 519.lbm_r         -0.68%    10.14%
> 521.wrf_r         68.20%    76.73%

So this seems to regress as well?

> 526.blender_r     0.12%     0.12%
> 527.cam4_r        19.67%    23.21%
> 538.imagick_r     0.12%     0.24%
> 544.nab_r         0.63%     0.53%
> 549.fotonik3d_r   14.44%    9.43%
> 554.roms_r        12.39%    0.00%
> GEOMEAN FP        8.26%     9.41%
> GEOMEAN ALL       4.11%     5.74%
>
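
For reference, the GEOMEAN rows can be reproduced from the per-benchmark
percentages by taking the geometric mean of the ratios (1 + p/100). The
sketch below is only an illustration with made-up function names; it uses a
Newton iteration for the n-th root so that it does not depend on libm.
Applied to the N-Iter INT column it gives roughly -1.04%, in line with the
table.

```c
#include <assert.h>
#include <stddef.h>

/* N-th root of P by Newton iteration on x^n - p.  Intended for P close to 1,
   as is the case for products of benchmark ratios.  */
double
nth_root (double p, int n)
{
  assert (p > 0.0 && n > 0);
  double x = 1.0;
  for (int iter = 0; iter < 100; iter++)
    {
      double pow_nm1 = 1.0;   /* x^(n-1) */
      for (int k = 0; k < n - 1; k++)
        pow_nm1 *= x;
      x -= (pow_nm1 * x - p) / (n * pow_nm1);
    }
  return x;
}

/* Geometric mean of COUNT percentage changes, returned as a percentage.  */
double
geomean_percent (const double *pct, size_t count)
{
  assert (pct != NULL && count > 0);
  double product = 1.0;
  for (size_t i = 0; i < count; i++)
    product *= 1.0 + pct[i] / 100.0;
  return (nth_root (product, (int) count) - 1.0) * 100.0;
}
```

Because the mean is taken over ratios, a single large regression such as
548.exchange2_r pulls the INT geomean negative even though most other
entries are slightly positive.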
> Code size impact
>                   N-Iter    cheap cost model
> 500.perlbench_r   0.22%     1.03%
> 502.gcc_r         0.25%     0.60%
> 505.mcf_r         0.00%     32.07%
> 520.omnetpp_r     0.09%     0.31%
> 523.xalancbmk_r   0.08%     1.86%
> 525.x264_r        0.75%     7.96%
> 531.deepsjeng_r   0.72%     3.28%
> 541.leela_r       0.18%     0.75%
> 548.exchange2_r   8.29%     12.19%
> 557.xz_r          0.40%     0.60%
> GEOMEAN INT       1.07%     5.71%
>
> 503.bwaves_r      12.89%    21.59%
> 507.cactuBSSN_r   0.90%     20.19%
> 508.namd_r        0.77%     14.75%
> 510.parest_r      0.91%     3.91%
> 511.povray_r      0.45%     4.08%
> 519.lbm_r         0.00%     0.00%
> 521.wrf_r         5.97%     12.79%
> 526.blender_r     0.49%     3.84%
> 527.cam4_r        1.39%     3.28%
> 538.imagick_r     1.86%     7.78%
> 544.nab_r         0.41%     3.00%
> 549.fotonik3d_r   25.50%    47.47%
> 554.roms_r        5.17%     13.01%
> GEOMEAN FP        4.14%     11.38%
> GEOMEAN ALL       2.80%     8.88%
>
>
> The only regression is from 548.exchange2_r: vectorizing the inner loop in
> each layer of the 9-layer loop nest increases register pressure and causes
> more spills.
> - block(rnext:9, 1, i1) = block(rnext:9, 1, i1) + 10
>   - block(rnext:9, 2, i2) = block(rnext:9, 2, i2) + 10
> .
> - block(rnext:9, 9, i9) = block(rnext:9, 9, i9) + 10
> ...
> - block(rnext:9, 2, i2) = block(rnext:9, 2, i2) + 10
> - block(rnext:9, 1, i1) = block(rnext:9, 1, i1) + 10
>
> It looks like aarch64 doesn't have the issue because aarch64 has 32 GPRs,
> while x86 only has 16. I have an extra patch for the x86 backend that
> prevents loop vectorization in deeply nested loops, which brings the
> performance back.
>
> For 503.bwaves_r/505.mcf_r/507.cactuBSSN_r/508.namd_r, the cheap cost model
> increases code size a lot but doesn't improve performance. N-Iter is much
> better for code size there.
>
>
> Any comments?
>
>
> gcc/ChangeLog:
>
> * tree-vect-loop.cc (vect_analyze_loop_costing): Enable
> vectorization f

Re: [PATCH] MIPS: Add some floating point instructions support for MIPSr6

2024-09-11 Thread 梅杰
On 2024/9/10 17:30, Xi Ruoyao wrote:
> On Tue, 2024-09-10 at 16:50 +0800, 梅杰 wrote:
>> As for the function `__builtin_rint`: although it exists, after defining
>> the instruction in `mips.md`, GCC still won't generate the `RINT.fmt`
>> instruction for MIPS; it generates the following code instead:
>>
>>> lui $28,%hi(__gnu_local_gp)
>>> addiu   $28,$28,%lo(__gnu_local_gp)
>>> lw  $25,%call16(rint)($28)
>>> .reloc  1f,R_MIPS_JALR,rint
> 
> Why?
> 
> With this:
> 
> diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
> index f147667d63a..0c1ef77a816 100644
> --- a/gcc/config/mips/mips.md
> +++ b/gcc/config/mips/mips.md
> @@ -100,6 +100,7 @@ (define_c_enum "unspec" [
>;; Floating-point unspecs.
>UNSPEC_FMIN
>UNSPEC_FMAX
> +  UNSPEC_RINT
>  
>;; HI/LO moves.
>UNSPEC_MFHI
> @@ -8025,6 +8026,14 @@ (define_peephole2
>  (any_extend:SI (match_dup 3)))])]
>"")
>  
> +(define_insn "rint2"
> +  [(set (match_operand:SCALARF 0 "register_operand" "=f")
> + (unspec:SCALARF [(match_operand:SCALARF 1 "register_operand" " f")]
> + UNSPEC_RINT))]
> +  "mips_isa_rev >= 6"
> +  "rint.\t%0,%1")
> +
> +
>  
> 
>  ;; Synchronization instructions.
> 
> it works for me:

Yes, you are right!

I have applied this patch to my current code and I can confirm that
changing `frint_` to `rint2` works. GCC generates the `RINT.fmt`
instruction correctly for the built-in function `__builtin_rint`
after applying the patch.

Maybe you could write a patch for `RINT.fmt`? If you do, I will update this
patch and remove the code related to `RINT.fmt`. In the meantime, could
anyone review the rest of this patch? Thanks!


RE: [PATCH v3 1/2] [APX CFCMOV] Support APX CFCMOV in if_convert pass

2024-09-11 Thread Kong, Lingling



> -Original Message-
> From: Richard Sandiford 
> Sent: Friday, September 6, 2024 5:19 PM
> To: Kong, Lingling 
> Cc: gcc-patches@gcc.gnu.org; Jeff Law ; Richard Biener
> ; Uros Bizjak ; Hongtao Liu
> ; Jakub Jelinek 
> Subject: Re: [PATCH v3 1/2] [APX CFCMOV] Support APX CFCMOV in if_convert
> pass
> 
> "Kong, Lingling"  writes:
> > Hi,
> >
> > This version has added a new optab named 'cfmovcc'. The new optab is
> > used in the middle end to expand to cfcmov. And simplified my patch by
> > trying to generate the conditional faulting movcc in noce_try_cmove_arith
> function.
> >
> > All the changes passed bootstrap & regtest x86-64-pc-linux-gnu.
> > We also tested spec with SDE and passed the runtime test.
> >
> > Ok for trunk?
> >
> >
> > The APX CFCMOV[1] feature implements conditional faulting, which means
> > that if the comparison is false, all memory faults are suppressed when
> > loading or storing a memory operand. So a conditional move can now load
> > or store a memory operand that may trap or fault.
> >
> > In the middle end, we currently don't support a conditional move if we
> > know that a load from A or B could trap or fault. To enable CFCMOV, we
> > added a new optab named cfmovcc.
> >
> > For a conditional memory store with fault suppression, no arithmetic
> > calculations are moved. For a conditional memory load, only the case with
> > one possibly trapping memory operand and one non-trapping, non-memory
> > operand is supported for now.
> 
> Sorry if this is going over old ground (I haven't read the earlier versions 
> yet), but:
> instead of adding a new optab, could we treat CFCMOV as a scalar instance of
> maskload_optab?  Robin is working on adding an "else" value for when the
> condition/mask is false.  After that, it would seem to be a pretty close 
> match to
> CFCMOV.
> 
> One reason for preferring maskload is that it makes the load an explicit part 
> of
> the interface.  We could then potentially use it in gimple too, not just 
> expand.
> 

Yes, a conditional load is like a scalar instance of maskload_optab with an
else operand.
I could try to use maskload_optab to generate cfcmov in the RTL ifcvt pass,
but that is still after expand.
We currently don't have an if-conversion pass for scalars in GIMPLE; is there
a plan to add one?

Thanks,
Lingling
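
To make the discussion concrete, here is a small sketch of the kind of source
pattern involved. The function name and parameters are made up for
illustration. The load is guarded by the condition, so it cannot simply be
executed unconditionally and selected with an ordinary conditional move: if
the pointer is invalid when the condition is false, the speculative load
could fault. A conditionally faulting move, or a scalar masked load with an
"else" value, would express the same thing without a branch.

```c
#include <assert.h>
#include <stddef.h>

/* Return *P if COND is nonzero, otherwise ELSE_VAL.  P is only required to
   be valid when COND is nonzero.  */
int
cond_load (int cond, const int *p, int else_val)
{
  assert (!cond || p != NULL);
  int result = else_val;
  if (cond)
    result = *p;   /* may fault if P is invalid, so this load must not be
                      executed when COND is zero unless the fault is
                      suppressed */
  return result;
}
```

Calling cond_load (0, NULL, 7) is well defined and returns 7, which is
exactly the case a plain "load both, then select" transformation would
break.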

> Thanks,
> Richard
> 
> >
> >
> > [1].https://www.intel.com/content/www/us/en/developer/articles/technic
> > al/advanced-performance-extensions-apx.html
> >
> > gcc/ChangeLog:
> >
> >* doc/md.texi: Add cfmovcc insn pattern explanation.
> >* ifcvt.cc (can_use_cmove_load_mem_notrap): New func
> >for conditional faulting movcc for load.
> >(can_use_cmove_store_mem_notrap): New func for conditional
> >faulting movcc for store.
> >(can_use_cfmovcc):  New func for conditional faulting.
> >(noce_try_cmove_arith): Try to convert to conditional 
> > faulting
> >movcc.
> >(noce_process_if_block): Ditto.
> >* optabs.cc (emit_conditional_move): Handle cfmovcc.
> >(emit_conditional_move_1): Ditto.
> >* optabs.def (OPTAB_D): New optab.
> > ---
> > gcc/doc/md.texi |  10 
> > gcc/ifcvt.cc| 119 
> > gcc/optabs.cc   |  14 +-
> > gcc/optabs.def  |   1 +
> > 4 files changed, 132 insertions(+), 12 deletions(-)
> >
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index
> > a9259112251..5f563787c49 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -8591,6 +8591,16 @@ Return 1 if operand 1 is a normal floating
> > point number and 0 otherwise.  @var{m} is a scalar floating point
> > mode.  Operand 0 has mode @code{SImode}, and operand 1 has mode
> @var{m}.
> > +@cindex @code{cfmov@var{mode}cc} instruction pattern @item
> > +@samp{cfmov@var{mode}cc} Similar to @samp{mov@var{mode}cc} but for
> > +conditional faulting, If the comparison is false, all memory faults
> > +are suppressed when load or store a memory operand.
> > +
> > +Conditionally move operand 2 or operand 3 into operand 0 according to
> > +the comparison in operand 1.  If the comparison is true, operand 2 is
> > +moved into operand 0, otherwise operand 3 is moved.
> > +
> > @end table
> >  @end ifset
> > diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc index
> > 6487574c514..59845390607 100644
> > --- a/gcc/ifcvt.cc
> > +++ b/gcc/ifcvt.cc
> > @@ -778,6 +778,9 @@ static bool noce_try_store_flag_mask (struct
> > noce_if_info *); static rtx noce_emit_cmove (struct noce_if_info *, rtx, 
> > enum
> rtx_code, rtx,
> > rtx, rtx, rtx, rtx =
> > NULL, rtx = NULL); static bool noce_try_cmove (struct noce_if_info *);
> > +static bool can_use_cmove_load_mem_notrap (rtx, rtx); static bool
> > +can_use_cmove_store_mem_notrap (rtx, rtx, rtx, bool); static bool
> > +can_use_cfmovcc (struct noce_if_info *);
> > static bool noce_try_cmove_arith (struct noce_if_info *); static r

Re: [RFC PATCH] Enable vectorization for unknown tripcount in very cheap cost model but disable epilog vectorization.

2024-09-11 Thread Hongtao Liu
On Wed, Sep 11, 2024 at 4:04 PM Richard Biener
 wrote:
>
> On Wed, Sep 11, 2024 at 4:17 AM liuhongt  wrote:
> >
> > GCC 12 enables vectorization at -O2 with the very cheap cost model, which
> > is restricted to constant trip counts. The vectorization capability is
> > very limited out of consideration for the code size impact.
> >
> > The patch extends the very cheap cost model a little to support variable
> > trip counts, but still disables peeling for gaps/alignment, runtime alias
> > checking and epilogue vectorization out of consideration for code size.
> >
> > So there are at most 2 versions of a loop for -O2 vectorization: one
> > vectorized main loop and one scalar/remainder loop.
> >
> > For example:
> >
> > void
> > foo1 (int* __restrict a, int* b, int* c, int n)
> > {
> >  for (int i = 0; i != n; i++)
> >   a[i] = b[i] + c[i];
> > }
> >
> > with -O2 -march=x86-64-v3, will be vectorized to
> >
> > .L10:
> >         vmovdqu (%r8,%rax), %ymm0
> >         vpaddd  (%rsi,%rax), %ymm0, %ymm0
> >         vmovdqu %ymm0, (%rdi,%rax)
> >         addq    $32, %rax
> >         cmpq    %rdx, %rax
> >         jne     .L10
> >         movl    %ecx, %eax
> >         andl    $-8, %eax
> >         cmpl    %eax, %ecx
> >         je      .L21
> >         vzeroupper
> > .L12:
> >         movl    (%r8,%rax,4), %edx
> >         addl    (%rsi,%rax,4), %edx
> >         movl    %edx, (%rdi,%rax,4)
> >         addq    $1, %rax
> >         cmpl    %eax, %ecx
> >         jne     .L12
> >
> > As measured with SPEC2017 on EMR, the patch (N-Iter) improves performance
> > by 4.11% with an extra 2.8% code size, and the cheap cost model improves
> > performance by 5.74% with an extra 8.88% code size. The details are below.
>
> I'm confused by this: are the N-Iter numbers on top of the cheap cost
> model numbers?
No, it's N-iter vs base(very cheap cost model), and cheap vs base.
>
> > Performance measured with -march=x86-64-v3 -O2 on EMR
> >
> >                   N-Iter    cheap cost model
> > 500.perlbench_r   -0.12%    -0.12%
> > 502.gcc_r         0.44%     -0.11%
> > 505.mcf_r         0.17%     4.46%
> > 520.omnetpp_r     0.28%     -0.27%
> > 523.xalancbmk_r   0.00%     5.93%
> > 525.x264_r        -0.09%    23.53%
> > 531.deepsjeng_r   0.19%     0.00%
> > 541.leela_r       0.22%     0.00%
> > 548.exchange2_r   -11.54%   -22.34%
> > 557.xz_r          0.74%     0.49%
> > GEOMEAN INT       -1.04%    0.60%
> >
> > 503.bwaves_r      3.13%     4.72%
> > 507.cactuBSSN_r   1.17%     0.29%
> > 508.namd_r        0.39%     6.87%
> > 510.parest_r      3.14%     8.52%
> > 511.povray_r      0.10%     -0.20%
> > 519.lbm_r         -0.68%    10.14%
> > 521.wrf_r         68.20%    76.73%
>
> So this seems to regress as well?
Niter increases performance less than the cheap cost model, that's
expected, it is not a regression.
>
> > 526.blender_r     0.12%     0.12%
> > 527.cam4_r        19.67%    23.21%
> > 538.imagick_r     0.12%     0.24%
> > 544.nab_r         0.63%     0.53%
> > 549.fotonik3d_r   14.44%    9.43%
> > 554.roms_r        12.39%    0.00%
> > GEOMEAN FP        8.26%     9.41%
> > GEOMEAN ALL       4.11%     5.74%
> >
> > Code size impact
> >                   N-Iter    cheap cost model
> > 500.perlbench_r   0.22%     1.03%
> > 502.gcc_r         0.25%     0.60%
> > 505.mcf_r         0.00%     32.07%
> > 520.omnetpp_r     0.09%     0.31%
> > 523.xalancbmk_r   0.08%     1.86%
> > 525.x264_r        0.75%     7.96%
> > 531.deepsjeng_r   0.72%     3.28%
> > 541.leela_r       0.18%     0.75%
> > 548.exchange2_r   8.29%     12.19%
> > 557.xz_r          0.40%     0.60%
> > GEOMEAN INT       1.07%     5.71%
> >
> > 503.bwaves_r      12.89%    21.59%
> > 507.cactuBSSN_r   0.90%     20.19%
> > 508.namd_r        0.77%     14.75%
> > 510.parest_r      0.91%     3.91%
> > 511.povray_r      0.45%     4.08%
> > 519.lbm_r         0.00%     0.00%
> > 521.wrf_r         5.97%     12.79%
> > 526.blender_r     0.49%     3.84%
> > 527.cam4_r        1.39%     3.28%
> > 538.imagick_r     1.86%     7.78%
> > 544.nab_r         0.41%     3.00%
> > 549.fotonik3d_r   25.50%    47.47%
> > 554.roms_r        5.17%     13.01%
> > GEOMEAN FP        4.14%     11.38%
> > GEOMEAN ALL       2.80%     8.88%
> >
> >
> > The only regression is from 548.exchange2_r: vectorizing the inner loop in
> > each layer of the 9-layer loop nest increases register pressure and causes
> > more spills.
> > - block(rnext:9, 1, i1) = block(rnext:9, 1, i1) + 10
> >   - block(rnext:9, 2, i2) = block(rnext:9, 2, i2) + 10
> > .
> > - block(rnext:9, 9, i9) = block(rnext:9, 9, i9) + 10
> > ...
> > - block(rnext:9, 2, i2) = block(rnext:9, 2, i2) + 10
> > - block(rnext:9, 1, i1) = block(rnext:9, 1, i1) + 10
> >
> > Looks like aarch64 doesn't have the issue because aarch64 has 

[PATCH] Fix wrong code out of NRV + RSO + inlining

2024-09-11 Thread Eric Botcazou
Hi,

the attached Ada testcase compiled with -O -flto exhibits a wrong code issue 
when the 3 optimizations NRV + RSO + inlining are applied to the same call: if 
the LHS of the call is marked write-only before inlining, then it will keep 
the mark after inlining although it may be read in GIMPLE from that point on.

The proposed fix is to always clear the flag during inlining in the RSO case.
Tested on x86-64/Linux, OK for the mainline?


2024-09-11  Eric Botcazou  

* tree-inline.cc (declare_return_variable): Clear writeonly flag on
a global variable used directly as the return slot.

2024-09-11  Eric Botcazou  

* gnat.dg/lto27.adb: New test.
* gnat.dg/lto27_pkg1.ads, gnat.dg/lto27_pkg2.ads,
gnat.dg/lto27_pkg2.adb, gnat.dg/lto27_pkg3.ads: New helper.


-- 
Eric Botcazou
diff --git a/gcc/tree-inline.cc b/gcc/tree-inline.cc
index f31a34ac410..2bb1e1602b2 100644
--- a/gcc/tree-inline.cc
+++ b/gcc/tree-inline.cc
@@ -3782,6 +3782,11 @@ declare_return_variable (copy_body_data *id, tree return_slot, tree modify_dest,
 	  gcc_assert (TREE_CODE (var) != SSA_NAME);
 	  if (TREE_ADDRESSABLE (result))
 	mark_addressable (var);
+	  /* RESULT may also be read in the callee, typically because the NRV
+	 optimization has been applied to the function, so VAR may also be
+	 read from now on.  */
+	  if (VAR_P (var) && (TREE_STATIC (var) || DECL_EXTERNAL (var)))
+	varpool_node::get (var)->writeonly = 0;
 	}
   if (DECL_NOT_GIMPLE_REG_P (result)
 	  && DECL_P (var))
-- { dg-do run }
-- { dg-options "-O -flto" { target lto } }

with Lto27_Pkg1;

procedure Lto27 is
begin
   null;
end;
with Lto27_Pkg2;

package Lto27_Pkg1 is
   package I is new Lto27_Pkg2.G;
end Lto27_Pkg1;
package body Lto27_Pkg2 is

   function F return Lto27_Pkg3.Q_Rec is
   begin
  return Result : Lto27_Pkg3.Q_Rec := Lto27_Pkg3.Default_Q_Rec do
 Result.A := 1.0;
  end return;
   end;

end Lto27_Pkg2;
with Lto27_Pkg3;

package Lto27_Pkg2 is

   function F return Lto27_Pkg3.Q_Rec;

   generic
  Q_Conf : Lto27_Pkg3.Q_Rec := F;
   package G is end;

end Lto27_Pkg2;
package Lto27_Pkg3 is

   type Discr_Type is (P, Q);

   type Rec (Discr : Discr_Type) is record
  case Discr is
 when Q =>
A : Duration := 0.0;
B : Duration := 0.0;
 when P =>
null;
  end case;
   end record;

   subtype Q_Rec is Rec (Q);

   Default_Q_Rec : constant Q_Rec := (Discr => Q, others => <>);

end Lto27_Pkg3;


Re: [PATCH] Fix wrong code out of NRV + RSO + inlining

2024-09-11 Thread Richard Biener
On Wed, Sep 11, 2024 at 10:26 AM Eric Botcazou  wrote:
>
> Hi,
>
> the attached Ada testcase compiled with -O -flto exhibits a wrong code issue
> when the 3 optimizations NRV + RSO + inlining are applied to the same call: if
> the LHS of the call is marked write-only before inlining, then it will keep
> the mark after inlining although it may be read in GIMPLE from that point on.
>
> The proposed fix is to always clear the flag during inlining in the RSO case.
> Tested on x86-64/Linux, OK for the mainline?

Hmm, it looks to me that the IPA analysis marking the variable readonly
in the first place is wrong - or that NRV may not be applied to such a variable
later.  Is NRV ever applied to say

static const S s = ...;

s = foo ();

thus a readonly declared LHS?

I think adjusting the varpool flag just during inline materialization
doesn't resolve issues that appear when the wrong flag caused other wrong
IPA decisions, for example.

Richard.

>
> 2024-09-11  Eric Botcazou  
>
> * tree-inline.cc (declare_return_variable): Clear writeonly flag on
> a global variable used directly as the return slot.
>
> 2024-09-11  Eric Botcazou  
>
> * gnat.dg/lto27.adb: New test.
> * gnat.dg/lto27_pkg1.ads, gnat.dg/lto27_pkg2.ads,
> gnat.dg/lto27_pkg2.adb, gnat.dg/lto27_pkg3.ads: New helper.
>
>
> --
> Eric Botcazou


[PATCH 1/2] RISC-V: Fix vl_used_by_non_rvv_insn logic of vsetvl pass

2024-09-11 Thread garthlei
This patch fixes a bug in the current vsetvl pass.  The current pass uses
`m_vl` to determine whether the dest operand has been used by non-RVV
instructions.  However, `m_vl` may have been modified as a result of an
`update_avl` call, and thus would no longer be the dest operand of the
original instruction.  This can lead to incorrect vsetvl eliminations, as
shown in the testcase.  In this patch, we create a `dest_vl` variable for
this scenario.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc: Use `dest_vl` for dest VL operand

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/vsetvl_bug-3.c: New test.
---
 gcc/config/riscv/riscv-vsetvl.cc| 16 +++-
 .../gcc.target/riscv/rvv/vsetvl/vsetvl_bug-3.c  | 17 +
 2 files changed, 28 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-3.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 017efa8bc17..ce831685439 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -1002,6 +1002,9 @@ public:
 
   void parse_insn (insn_info *insn)
   {
+/* The VL dest of the insn */
+rtx dest_vl = NULL_RTX;
+
 m_insn = insn;
 m_bb = insn->bb ();
 /* Return if it is debug insn for the consistency with optimize == 0.  */
@@ -1035,7 +1038,10 @@ public:
 if (m_avl)
   {
if (vsetvl_insn_p (insn->rtl ()) || has_vlmax_avl ())
- m_vl = ::get_vl (insn->rtl ());
+ {
+   m_vl = ::get_vl (insn->rtl ());
+   dest_vl = m_vl;
+ }
 
if (has_nonvlmax_reg_avl ())
  m_avl_def = find_access (insn->uses (), REGNO (m_avl))->def ();
@@ -1132,22 +1138,22 @@ public:
   }
 
 /* Determine if dest operand(vl) has been used by non-RVV instructions.  */
-if (has_vl ())
+if (dest_vl)
   {
const hash_set vl_uses
- = get_all_real_uses (get_insn (), REGNO (get_vl ()));
+ = get_all_real_uses (get_insn (), REGNO (dest_vl));
for (use_info *use : vl_uses)
  {
gcc_assert (use->insn ()->is_real ());
rtx_insn *rinsn = use->insn ()->rtl ();
if (!has_vl_op (rinsn)
-   || count_regno_occurrences (rinsn, REGNO (get_vl ())) != 1)
+   || count_regno_occurrences (rinsn, REGNO (dest_vl)) != 1)
  {
m_vl_used_by_non_rvv_insn = true;
break;
  }
rtx avl = ::get_avl (rinsn);
-   if (!avl || !REG_P (avl) || REGNO (get_vl ()) != REGNO (avl))
+   if (!avl || !REG_P (avl) || REGNO (dest_vl) != REGNO (avl))
  {
m_vl_used_by_non_rvv_insn = true;
break;
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-3.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-3.c
new file mode 100644
index 000..c155f5613d2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-3.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32d -O2 -fdump-rtl-vsetvl-details" } 
*/
+
+#include 
+
+uint64_t a[2], b[2];
+
+void
+foo ()
+{
+  size_t vl = __riscv_vsetvl_e64m1 (2);
+  vuint64m1_t vx = __riscv_vle64_v_u64m1 (a, vl);
+  vx = __riscv_vslide1down_vx_u64m1 (vx, 0xull, vl);
+  __riscv_vse64_v_u64m1 (b, vx, vl);
+}
+
+/* { dg-final { scan-rtl-dump-not "Eliminate insn" "vsetvl" } }  */
-- 
2.17.1
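
The idea behind `dest_vl` can be modelled in a few lines of plain C. This is
a deliberately simplified sketch: the struct, the functions and the
`use_saved_copy` switch are all hypothetical and only loosely mirror
`m_vl`, `update_avl` and the use check in the patch. It shows why a value
captured at parse time has to be used when the member may be overwritten
later.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the per-instruction info; VL_REG plays the role of m_vl.  */
struct vl_info
{
  int vl_reg;
};

/* Hypothetical stand-in for an update that rewrites the recorded VL.  */
void
update_vl (struct vl_info *info, int new_reg)
{
  info->vl_reg = new_reg;
}

/* Report whether the register written by the instruction (DEST_REG) has a
   non-RVV use.  UPDATED_REG models a later rewrite of the recorded VL.
   With USE_SAVED_COPY the check uses the snapshot (the fixed behaviour);
   without it the check uses the possibly rewritten member (the old one).  */
bool
dest_used_by_non_rvv (int dest_reg, int updated_reg,
                      const bool *used_by_non_rvv, bool use_saved_copy)
{
  assert (used_by_non_rvv != 0);
  struct vl_info info = { dest_reg };
  int dest_vl = info.vl_reg;   /* snapshot taken before any update */

  update_vl (&info, updated_reg);

  int reg = use_saved_copy ? dest_vl : info.vl_reg;
  return used_by_non_rvv[reg];
}
```

If register 0 is the destination and has a non-RVV use while register 1 does
not, the old behaviour answers "no" after the update and the fixed one
answers "yes".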



[PATCH 2/2] RISC-V: Eliminate latter vsetvl when fused

2024-09-11 Thread Bohan Lei
The current vsetvl pass eliminates a vsetvl instruction when the previous
info is "available," but not when it is "compatible."  This can lead not
only to redundancy, but also to incorrect behavior when the previous info
happens to be compatible with a later vector instruction, which ends up using
the vsetvl info that should have been eliminated, as shown in the testcase.
This patch eliminates the vsetvl when the previous info is "compatible."

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pre_vsetvl::fuse_local_vsetvl_info):
Delete vsetvl insn when `prev_info` is compatible

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c: New test.
---
 gcc/config/riscv/riscv-vsetvl.cc   |  3 +++
 .../gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c | 18 ++
 2 files changed, 21 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index ce831685439..030ffbe2ebb 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2796,6 +2796,9 @@ pre_vsetvl::fuse_local_vsetvl_info ()
  curr_info.dump (dump_file, "");
}
  m_dem.merge (prev_info, curr_info);
+ if (!curr_info.vl_used_by_non_rvv_insn_p ()
+ && vsetvl_insn_p (curr_info.get_insn ()->rtl ()))
+   m_delete_list.safe_push (curr_info);
  if (curr_info.get_read_vl_insn ())
prev_info.set_read_vl_insn (curr_info.get_read_vl_insn ());
  if (dump_file && (dump_flags & TDF_DETAILS))
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c
new file mode 100644
index 000..faa0c8073d3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O2 -fno-schedule-insns 
-fdump-rtl-vsetvl-details" } */
+
+#include 
+
+vuint16m1_t
+foo (vuint16m1_t a, vuint16m1_t b, size_t avl)
+{
+  size_t vl;
+  vuint16m1_t ret;
+  uint16_t c = __riscv_vmv_x_s_u16m1_u16(a);
+  vl = __riscv_vsetvl_e8mf2 (avl);
+  ret = __riscv_vadd_vx_u16m1 (a, c, avl);
+  ret = __riscv_vadd_vv_u16m1 (ret, a, vl);
+  return ret;
+}
+
+/* { dg-final { scan-rtl-dump "Eliminate insn" "vsetvl" } }  */
-- 
2.17.1



Re: [PATCH] s390: Fix TF to FPRX2 conversion [PR115860]

2024-09-11 Thread Ilya Leoshkevich
On Fri, 2024-08-16 at 09:41 +0200, Stefan Schulze Frielinghaus wrote:
> Currently subregs originating from *tf_to_fprx2_0 and *tf_to_fprx2_1
> survive register allocation.  This in turn leads to wrong register
> renaming.  Keeping the current approach would mean we need two insns
> for
> *tf_to_fprx2_0 and *tf_to_fprx2_1, respectively.  Something along the
> lines
> 
> (define_insn "*tf_to_fprx2_0"
>   [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand"
> "=f") 0)
>     (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
>    UNSPEC_TF_TO_FPRX2_0))]
>   "TARGET_VXE"
>   "#")
> 
> (define_insn "*tf_to_fprx2_0"
>   [(set (match_operand:DF 0 "nonimmediate_operand" "=f")
>     (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
>    UNSPEC_TF_TO_FPRX2_0))]
>   "TARGET_VXE"
>   "vpdi\t%v0,%v1,%v0,1
>   [(set_attr "op_type" "VRR")])
> 
> and similar for *tf_to_fprx2_1.  Note, pre register allocation
> operand 0
> has mode FPRX2 and afterwards DF once subregs have been eliminated.
> 
> Since we always copy a whole vector register into a floating-point
> register pair, another way to fix this is to merge *tf_to_fprx2_0 and
> *tf_to_fprx2_1 into a single insn which means we don't have to use
> subregs at all.  The downside of this is that the assembler template
> contains two instructions, now.  The upside is that we don't have to
> come up with some artificial insn before RA which might be more
> readable/maintainable.  That is implemented by this patch.
> 
> In commit r11-4872-ge627cda5686592, the output operand specifier %V
> was
> introduced which is used in tf_to_fprx2 only, now.  I didn't come up
> with its counterpart like %F for floating-point registers.  Instead I
> printed the register pair in the output function directly.  This
> spares
> us a new and "rare" format specifier for a single insn.  I don't have
> a
> strong opinion which option to choose, however, we should either add
> %F
> in order to mimic the same behaviour as %V or getting rid of %V and
> inline the logic in the output function.  I lean towards the latter.
> Any preferences?
> ---
>  gcc/config/s390/s390.md    |  2 +
>  gcc/config/s390/vector.md  | 66 +++-
> --
>  gcc/testsuite/gcc.target/s390/pr115860-1.c | 26 +
>  3 files changed, 60 insertions(+), 34 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/s390/pr115860-1.c

[...]

> +  char buf[64];
> +  switch (which_alternative)
> +    {
> +    case 0:
> +  if (REGNO (operands[0]) == REGNO (operands[1]))
> + return "vpdi\t%V0,%v1,%V0,5";
> +  else
> + return "ldr\t%f0,%f1;vpdi\t%V0,%v1,%V0,5";
> +    case 1:
> +  {
> + const char *reg_pair = reg_names[REGNO (operands[0]) + 1];
> + snprintf (buf, sizeof (buf), "ld\t%%f0,%%1;ld\t%%%s,8+%%1",
> reg_pair);

I wonder if there is a corner case where 8+ does not fit into short
displacement?

[...]


[committed] OpenMP: Add interop routines to omp_runtime_api_procname

2024-09-11 Thread Tobias Burnus
I realized that the attached change (committed as r15-3582-g6291f25631500c)
was missing from what I committed in

r15-3249-g0beac1db38855e  libgomp: Add interop types and routines to
OpenMP's headers and module

I also checked the last 5 or so commits to omp.h.in, but for those
routines, we seem to have remembered to update the API routine check.

Tobias




Re: [committed] OpenMP: Add interop routines to omp_runtime_api_procname

2024-09-11 Thread Tobias Burnus

Now with attached patch …

Tobias Burnus wrote:
I realized that the attached change (committed as r15-3582-g6291f25631500c)
was missing from what I committed in

r15-3249-g0beac1db38855e  libgomp: Add interop types and routines to
OpenMP's headers and module

I also checked the last 5 or so commits to omp.h.in, but for those
routines, we seem to have remembered to update the API routine check.


Tobias
commit 6291f25631500c2d1c2328f919aa4405c3837f02
Author: Tobias Burnus 
Date:   Wed Sep 11 12:02:24 2024 +0200

OpenMP: Add interop routines to omp_runtime_api_procname

gcc/
* omp-general.cc (omp_runtime_api_procname): Add
omp_get_interop_{int,name,ptr,rc_desc,str,type_desc}
and omp_get_num_interop_properties.
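
As a rough illustration of what such a name check does, here is a much
simplified sketch. The function name is made up, the table holds only the
names added by this commit, and the real table in omp-general.cc is split
into several sections that this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative subset: the routine names added by this commit, stored
   without the "omp_" prefix.  */
static const char *const api_names[] = {
  "get_interop_int",
  "get_interop_name",
  "get_interop_ptr",
  "get_interop_rc_desc",
  "get_interop_str",
  "get_interop_type_desc",
  "get_num_interop_properties",
};

/* Return true if NAME is "omp_" followed by one of the names above.  */
bool
is_runtime_api_name (const char *name)
{
  assert (name != NULL);
  if (strncmp (name, "omp_", 4) != 0)
    return false;
  name += 4;
  for (size_t i = 0; i < sizeof api_names / sizeof api_names[0]; i++)
    if (strcmp (name, api_names[i]) == 0)
      return true;
  return false;
}
```

A routine that is declared in omp.h but missing from the table would simply
not be recognised, which is the kind of omission this commit corrects.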

diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc
index 0b61335dba4..aaa179afe13 100644
--- a/gcc/omp-general.cc
+++ b/gcc/omp-general.cc
@@ -3260,7 +3260,10 @@ omp_runtime_api_procname (const char *name)
   "alloc",
   "calloc",
   "free",
+  "get_interop_int",
+  "get_interop_ptr",
   "get_mapped_ptr",
+  "get_num_interop_properties",
   "realloc",
   "target_alloc",
   "target_associate_ptr",
@@ -3289,6 +3292,10 @@ omp_runtime_api_procname (const char *name)
   "get_device_num",
   "get_dynamic",
   "get_initial_device",
+  "get_interop_name",
+  "get_interop_rc_desc",
+  "get_interop_str",
+  "get_interop_type_desc",
   "get_level",
   "get_max_active_levels",
   "get_max_task_priority",


[committed] fortran/openmp.cc: Fix var init and locus use to avoid uninit values [PR fortran/116661]

2024-09-11 Thread Tobias Burnus

This patch fixes an issue with uninitialized variables causing random ICEs.

Committed as r15-3581-g4e9265a474def9

* * *

However, follow-up work is needed as there are multiple issues:

* The check whether something is an identifier (integer parameter) and 
not just a constant expression failed in some corner cases. This now 
reliably causes a testsuite FAIL.


* Some checks are also not quite right

* After gfc_match_expr, a gobble whitespace is missing

* I missed that 'fr(…)' and 'attr(…)' accept a list of values*

* The latter requires a different internal representation.

I have a partial fix for this, but the last two items require some more 
work, hence, I defer this to the next patch.


Tobias

(*) It looks also as if there will be post-TR13 spec changes, but it is 
not clear whether those just change the wording or more.
commit 4e9265a474def98cb6cdb59c15fbcb7630ba330e
Author: Tobias Burnus 
Date:   Wed Sep 11 09:25:47 2024 +0200

fortran/openmp.cc: Fix var init and locus use to avoid uninit values [PR fortran/116661]

gcc/fortran/ChangeLog:

PR fortran/116661
* openmp.cc (gfc_match_omp_prefer_type): NULL init a gfc_expr
variable and use right locus in gfc_error.

diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index c04d8b0f528..1145e2ff890 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -1860,6 +1860,7 @@ gfc_match_omp_prefer_type (char **pref_str, int *pref_str_len, int **pref_int_ar
 		  }
 		fr_found = true;
 		gfc_symbol *sym = NULL;
+		e = NULL;
 		locus loc = gfc_current_locus;
 		if (gfc_match_symbol (&sym, 0) != MATCH_YES
 		|| gfc_match (" _") == MATCH_YES)
@@ -1881,7 +1882,7 @@ gfc_match_omp_prefer_type (char **pref_str, int *pref_str_len, int **pref_int_ar
 		  {
 		gfc_error ("Expected constant integer identifier or "
 			   "non-empty default-kind character literal at %L",
-			   &e->where);
+			   &loc);
 		gfc_free_expr (e);
 		return MATCH_ERROR;
 		  }


Re: [PATCH] s390: Fix TF to FPRX2 conversion [PR115860]

2024-09-11 Thread Stefan Schulze Frielinghaus
On Wed, Sep 11, 2024 at 11:47:54AM +0200, Ilya Leoshkevich wrote:
> On Fri, 2024-08-16 at 09:41 +0200, Stefan Schulze Frielinghaus wrote:
> > Currently subregs originating from *tf_to_fprx2_0 and *tf_to_fprx2_1
> > survive register allocation.  This in turn leads to wrong register
> > renaming.  Keeping the current approach would mean we need two insns
> > for
> > *tf_to_fprx2_0 and *tf_to_fprx2_1, respectively.  Something along the
> > lines
> > 
> > (define_insn "*tf_to_fprx2_0"
> >   [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand"
> > "=f") 0)
> >     (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
> >    UNSPEC_TF_TO_FPRX2_0))]
> >   "TARGET_VXE"
> >   "#")
> > 
> > (define_insn "*tf_to_fprx2_0"
> >   [(set (match_operand:DF 0 "nonimmediate_operand" "=f")
> >     (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
> >    UNSPEC_TF_TO_FPRX2_0))]
> >   "TARGET_VXE"
> >   "vpdi\t%v0,%v1,%v0,1
> >   [(set_attr "op_type" "VRR")])
> > 
> > and similar for *tf_to_fprx2_1.  Note, pre register allocation
> > operand 0
> > has mode FPRX2 and afterwards DF once subregs have been eliminated.
> > 
> > Since we always copy a whole vector register into a floating-point
> > register pair, another way to fix this is to merge *tf_to_fprx2_0 and
> > *tf_to_fprx2_1 into a single insn which means we don't have to use
> > subregs at all.  The downside of this is that the assembler template
> > contains two instructions, now.  The upside is that we don't have to
> > come up with some artificial insn before RA which might be more
> > readable/maintainable.  That is implemented by this patch.
> > 
> > In commit r11-4872-ge627cda5686592, the output operand specifier %V
> > was
> > introduced which is used in tf_to_fprx2 only, now.  I didn't come up
> > with its counterpart like %F for floating-point registers.  Instead I
> > printed the register pair in the output function directly.  This
> > spares
> > us a new and "rare" format specifier for a single insn.  I don't have
> > a
> > strong opinion which option to choose, however, we should either add
> > %F
> > in order to mimic the same behaviour as %V or getting rid of %V and
> > inline the logic in the output function.  I lean towards the latter.
> > Any preferences?
> > ---
> >  gcc/config/s390/s390.md    |  2 +
> >  gcc/config/s390/vector.md  | 66 +++-
> > --
> >  gcc/testsuite/gcc.target/s390/pr115860-1.c | 26 +
> >  3 files changed, 60 insertions(+), 34 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/s390/pr115860-1.c
> 
> [...]
> 
> > +  char buf[64];
> > +  switch (which_alternative)
> > +    {
> > +    case 0:
> > +  if (REGNO (operands[0]) == REGNO (operands[1]))
> > +   return "vpdi\t%V0,%v1,%V0,5";
> > +  else
> > +   return "ldr\t%f0,%f1;vpdi\t%V0,%v1,%V0,5";
> > +    case 1:
> > +  {
> > +   const char *reg_pair = reg_names[REGNO (operands[0]) + 1];
> > +   snprintf (buf, sizeof (buf), "ld\t%%f0,%%1;ld\t%%%s,8+%%1",
> > reg_pair);
> 
> I wonder if there is a corner case where 8+ does not fit into short
> displacement?

That is covered by constraint AR, i.e., for short displacement, and AT
for long displacement.


Re: [PATCH] Fix wrong code out of NRV + RSO + inlining

2024-09-11 Thread Eric Botcazou
> Hmm, it looks to me that the IPA analysis marking the variable readonly
> in the first place is wrong - or that NRV may not be applied to such a
> variable later.  Is NRV ever applied to say
> 
> static const S s = ...;
> 
> s = foo ();
> 
> thus a readonly declared LHS?

But NRV is only an example and not necessary, as you may read a RESULT_DECL in 
the callee.  So the combination is actually just RSO + inlining if the callee 
happens to read RESULT_DECL.

-- 
Eric Botcazou




[r15-3581 Regression] FAIL: gfortran.dg/gomp/interop-1.f90 -O (test for excess errors) on Linux/x86_64

2024-09-11 Thread haochen.jiang
On Linux/x86_64,

4e9265a474def98cb6cdb59c15fbcb7630ba330e is the first bad commit
commit 4e9265a474def98cb6cdb59c15fbcb7630ba330e
Author: Tobias Burnus 
Date:   Wed Sep 11 09:25:47 2024 +0200

fortran/openmp.cc: Fix var init and locus use to avoid uninit values [PR 
fortran/116661]

caused

FAIL: gfortran.dg/gomp/interop-1.f90   -O  (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-3581/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="gomp.exp=gfortran.dg/gomp/interop-1.f90 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="gomp.exp=gfortran.dg/gomp/interop-1.f90 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="gomp.exp=gfortran.dg/gomp/interop-1.f90 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="gomp.exp=gfortran.dg/gomp/interop-1.f90 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me 
at haochen dot jiang at intel.com.)
(If you run into cascadelake-related problems, disabling AVX512F on the command 
line might help.)
(However, please make sure that there are no potential problems with AVX512.)


Re: [PATCH v3] c++: Ensure ANNOTATE_EXPRs remain outermost expressions in conditions [PR116140]

2024-09-11 Thread Alex Coplan
On 10/09/2024 10:29, Jason Merrill wrote:
> On 9/10/24 6:10 AM, Alex Coplan wrote:
> > On 27/08/2024 10:55, Alex Coplan wrote:
> > > Hi,
> > > 
> > > This is a v3 that hopefully addresses the feedback from both Jason and
> > > Jakub.  v2 was posted here:
> > > https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660191.html
> > 
> > Gentle ping on this C++ patch:
> > https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661559.html
> > 
> > Jason, are you OK with this approach, or would you prefer to not make the
> > INTEGER_CST assumption and do something along the lines of your last 
> > suggestion
> > instead:
> > 
> > > Perhaps we want a recompute_expr_flags like the existing
> > > recompute_constructor_flags, so we don't need to duplicate PROCESS_ARG
> > > logic elsewhere.
> > 
> > ?  Sorry, I'd missed that reply when I wrote the v3 patch.
> 
> I still think that function would be nice to have, but the patch is OK as
> is.

Thanks, I've pushed the patch and the rest of the series as:

3fd07d4f04f libstdc++: Restore unrolling in std::find using pragma [PR116140]
9759f6299d9 lto: Stream has_unroll flag during LTO [PR116140]
31ff173c708 testsuite: Ensure ltrans dump files get cleaned up properly 
[PR116140]
f97d86242b8 c++: Ensure ANNOTATE_EXPRs remain outermost expressions in 
conditions [PR116140]

Alex

> 
> > Thanks,
> > Alex
> > 
> > > 
> > > (Sorry for the delay in posting the re-spin, I was away last week.)
> > > 
> > > In this version we refactor to introduce a helper class (annotate_saver)
> > > which is much less invasive to the caller (maybe_convert_cond) and
> > > should (at least in theory) be reusable elsewhere.
> > > 
> > > This version also relies on the assumption that operands 1 and 2 of
> > > ANNOTATE_EXPRs are INTEGER_CSTs, which simplifies the flag updates
> > > without having to rely on assumptions about the specific changes made
> > > in maybe_convert_cond.
> > > 
> > > Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?
> > > 
> > > Thanks,
> > > Alex
> > > 
> > > -- >8 --
> > > 
> > > For the testcase added with this patch, we would end up losing the:
> > > 
> > >#pragma GCC unroll 4
> > > 
> > > and emitting "warning: ignoring loop annotation".  That warning comes
> > > from tree-cfg.cc:replace_loop_annotate, and means that we failed to
> > > process the ANNOTATE_EXPR in tree-cfg.cc:replace_loop_annotate_in_block.
> > > That function walks backwards over the GIMPLE in an exiting BB for a
> > > loop, skipping over the final gcond, and looks for any ANNOTATE_EXPRS
> > > immediately preceding the gcond.
> > > 
> > > The function documents the following pre-condition:
> > > 
> > > /* [...] We assume that the annotations come immediately before the
> > >condition in BB, if any.  */
> > > 
> > > now looking at the exiting BB of the loop, we have:
> > > 
> > > :
> > >D.4524 = .ANNOTATE (iftmp.1, 1, 4);
> > >retval.0 = D.4524;
> > >if (retval.0 != 0)
> > >  goto ; [INV]
> > >else
> > >  goto ; [INV]
> > > 
> > > and crucially there is an intervening assignment between the gcond and
> > > the preceding .ANNOTATE ifn call.  To see where this comes from, we can
> > > look to the IR given by -fdump-tree-original:
> > > 
> > >if (< > >  int*)operator() (&pred, *first), unroll 4>>>)
> > >  goto ;
> > >else
> > >  goto ;
> > > 
> > > here the problem is that we've wrapped a CLEANUP_POINT_EXPR around the
> > > ANNOTATE_EXPR, meaning the ANNOTATE_EXPR is no longer the outermost
> > > expression in the condition.
> > > 
> > > The CLEANUP_POINT_EXPR gets added by the following call chain:
> > > 
> > > finish_while_stmt_cond
> > >   -> maybe_convert_cond
> > >   -> condition_conversion
> > >   -> fold_build_cleanup_point_expr
> > > 
> > > this patch chooses to fix the issue by first introducing a new helper
> > > class (annotate_saver) to save and restore outer chains of
> > > ANNOTATE_EXPRs and then using it in maybe_convert_cond.
> > > 
> > > With this patch, we don't get any such warning and the loop gets unrolled 
> > > as
> > > expected at -O2.
> > > 
> > > gcc/cp/ChangeLog:
> > > 
> > >  PR libstdc++/116140
> > >  * semantics.cc (annotate_saver): New. Use it ...
> > >  (maybe_convert_cond): ... here, to ensure any ANNOTATE_EXPRs
> > >  remain the outermost expression(s) of the condition.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > >  PR libstdc++/116140
> > >  * g++.dg/ext/pragma-unroll-lambda.C: New test.
> > 
> > > diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
> > > index 5ab2076b673..b1a49b14238 100644
> > > --- a/gcc/cp/semantics.cc
> > > +++ b/gcc/cp/semantics.cc
> > > @@ -951,6 +951,86 @@ maybe_warn_unparenthesized_assignment (tree t, bool 
> > > nested_p,
> > >   }
> > >   }
> > > +/* Helper class for saving/restoring ANNOTATE_EXPRs.  For a tree node t, 
> > > users
> > > +   can construct one of these like so:
> > > +
> > > + annotate_saver 

Re: [PATCH] MIPS: Add some floating point instructions support for MIPSr6

2024-09-11 Thread Xi Ruoyao
On Wed, 2024-09-11 at 16:17 +0800, 梅杰 wrote:
> 在 2024/9/10 17:30, Xi Ruoyao 写道:
> > On Tue, 2024-09-10 at 16:50 +0800, 梅杰 wrote:
> > > As for the function `__builtin_rint`, although it exists, however, after 
> > > defining the instruction in `mips.md`, GCC still won't generate 
> > > `RINT.fmt` 
> > > instruction for MIPS, it generates following code instead:
> > > 
> > > > lui $28,%hi(__gnu_local_gp)
> > > > addiu   $28,$28,%lo(__gnu_local_gp)
> > > > lw  $25,%call16(rint)($28)
> > > > .reloc  1f,R_MIPS_JALR,rint
> > 
> > Why?
> > 
> > With this:
> > 
> > diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
> > index f147667d63a..0c1ef77a816 100644
> > --- a/gcc/config/mips/mips.md
> > +++ b/gcc/config/mips/mips.md
> > @@ -100,6 +100,7 @@ (define_c_enum "unspec" [
> >    ;; Floating-point unspecs.
> >    UNSPEC_FMIN
> >    UNSPEC_FMAX
> > +  UNSPEC_RINT
> >  
> >    ;; HI/LO moves.
> >    UNSPEC_MFHI
> > @@ -8025,6 +8026,14 @@ (define_peephole2
> >        (any_extend:SI (match_dup 3)))])]
> >    "")
> >  
> > +(define_insn "rint2"
> > +  [(set (match_operand:SCALARF 0 "register_operand" "=f")
> > +   (unspec:SCALARF [(match_operand:SCALARF 1 "register_operand" " f")]
> > +   UNSPEC_RINT))]
> > +  "mips_isa_rev >= 6"
> > +  "rint.\t%0,%1")
> > +
> > +
> >  
> > 
> >  ;; Synchronization instructions.
> > 
> > it works for me:
> 
> Yes, you are right!
> 
> I have applied this patch to my current code and I can confirm that
> changing `frint_` to `rint2` works. GCC will generate the 
> `RINT.fmt` instruction correctly with the built-in function `__builtin_rint`
> after applying the patch.
> 
> Maybe you can write a patch for `RINT.fmt`? I will update this patch and 
> remove code related to `RINT.fmt`, if you could do that. At the same time, 
> could anyone review the rest of this patch? Thanks!

You can use my code in your v2 patch w/o attribution (or with a
Co-authored-by if you'd like to attribute anyway).  I don't have r6
hardware and I don't like testing my change solely on an emulator,
so I won't send a patch myself.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] s390: Fix TF to FPRX2 conversion [PR115860]

2024-09-11 Thread Ilya Leoshkevich
On Wed, 2024-09-11 at 12:35 +0200, Stefan Schulze Frielinghaus wrote:
> On Wed, Sep 11, 2024 at 11:47:54AM +0200, Ilya Leoshkevich wrote:
> > On Fri, 2024-08-16 at 09:41 +0200, Stefan Schulze Frielinghaus
> > wrote:
> > > Currently subregs originating from *tf_to_fprx2_0 and
> > > *tf_to_fprx2_1
> > > survive register allocation.  This in turn leads to wrong
> > > register
> > > renaming.  Keeping the current approach would mean we need two
> > > insns
> > > for
> > > *tf_to_fprx2_0 and *tf_to_fprx2_1, respectively.  Something along
> > > the
> > > lines
> > > 
> > > (define_insn "*tf_to_fprx2_0"
> > >   [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand"
> > > "=f") 0)
> > >     (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
> > >    UNSPEC_TF_TO_FPRX2_0))]
> > >   "TARGET_VXE"
> > >   "#")
> > > 
> > > (define_insn "*tf_to_fprx2_0"
> > >   [(set (match_operand:DF 0 "nonimmediate_operand" "=f")
> > >     (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
> > >    UNSPEC_TF_TO_FPRX2_0))]
> > >   "TARGET_VXE"
> > >   "vpdi\t%v0,%v1,%v0,1
> > >   [(set_attr "op_type" "VRR")])
> > > 
> > > and similar for *tf_to_fprx2_1.  Note, pre register allocation
> > > operand 0
> > > has mode FPRX2 and afterwards DF once subregs have been
> > > eliminated.
> > > 
> > > Since we always copy a whole vector register into a floating-
> > > point
> > > register pair, another way to fix this is to merge *tf_to_fprx2_0
> > > and
> > > *tf_to_fprx2_1 into a single insn which means we don't have to
> > > use
> > > subregs at all.  The downside of this is that the assembler
> > > template
> > > contains two instructions, now.  The upside is that we don't have
> > > to
> > > come up with some artificial insn before RA which might be more
> > > readable/maintainable.  That is implemented by this patch.
> > > 
> > > In commit r11-4872-ge627cda5686592, the output operand specifier
> > > %V
> > > was
> > > introduced which is used in tf_to_fprx2 only, now.  I didn't come
> > > up
> > > with its counterpart like %F for floating-point registers. 
> > > Instead I
> > > printed the register pair in the output function directly.  This
> > > spares
> > > us a new and "rare" format specifier for a single insn.  I don't
> > > have
> > > a
> > > strong opinion which option to choose, however, we should either
> > > add
> > > %F
> > > in order to mimic the same behaviour as %V or getting rid of %V
> > > and
> > > inline the logic in the output function.  I lean towards the
> > > latter.
> > > Any preferences?
> > > ---
> > >  gcc/config/s390/s390.md    |  2 +
> > >  gcc/config/s390/vector.md  | 66 +++-
> > > 
> > > --
> > >  gcc/testsuite/gcc.target/s390/pr115860-1.c | 26 +
> > >  3 files changed, 60 insertions(+), 34 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/s390/pr115860-1.c
> > 
> > [...]
> > 
> > > +  char buf[64];
> > > +  switch (which_alternative)
> > > +    {
> > > +    case 0:
> > > +  if (REGNO (operands[0]) == REGNO (operands[1]))
> > > + return "vpdi\t%V0,%v1,%V0,5";
> > > +  else
> > > + return "ldr\t%f0,%f1;vpdi\t%V0,%v1,%V0,5";
> > > +    case 1:
> > > +  {
> > > + const char *reg_pair = reg_names[REGNO (operands[0]) +
> > > 1];
> > > + snprintf (buf, sizeof (buf),
> > > "ld\t%%f0,%%1;ld\t%%%s,8+%%1",
> > > reg_pair);
> > 
> > I wonder if there is a corner case where 8+ does not fit into short
> > displacement?
> 
> That is covered by constraint AR, i.e., for short displacement, and
> AT
> for long displacement.

Don't they cover only %1, and not 8+%1? Can't there be a situation
where %1 barely fits and 8+%1 doesn't fit? A quick glance shows that
the code doesn't leave any allowance for this:

"AR"
  s390_mem_constraint("AR")
s390_check_qrst_address('R')
  s390_short_displacement()
INTVAL (disp) >= 0 && INTVAL (disp) < 4096
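
To make the concern concrete, here is a standalone sketch of the range
test quoted above.  It is not GCC code: fits_short_displacement and
both_loads_fit are hypothetical names, and the sketch only looks at the
numeric range, not at anything else the constraint might check.

```cpp
#include <cstdint>

// Hypothetical stand-in for the quoted range test
// (INTVAL (disp) >= 0 && INTVAL (disp) < 4096); not the GCC function.
constexpr bool fits_short_displacement (std::int64_t disp)
{
  return disp >= 0 && disp < 4096;
}

// The output template emits two loads, one at disp and one at disp + 8,
// so both displacements have to be in range.
constexpr bool both_loads_fit (std::int64_t disp)
{
  return fits_short_displacement (disp)
         && fits_short_displacement (disp + 8);
}
```

Under this range test alone, a displacement of 4088 passes for the first
load while 4088 + 8 = 4096 is out of range for the second.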


[PATCH] c++: Don't ICE to build private access error message [PR116323]

2024-09-11 Thread Simon Martin
We currently ICE upon the following code while building the "[...] is
private within this context" error message

=== cut here ===
class A { enum Enum{}; };
template class Alloc>
class B : private Alloc, private A {};
template class Alloc>
int B::foo (Enum m) { return 42; }
=== cut here ===

The problem is that since r11-6880, after detecting that Enum cannot be
accessed in B, enforce_access will access the TYPE_BINFO of all the
bases of B, which ICEs for any that is a BOUND_TEMPLATE_TEMPLATE_PARM.
This patch simply skips such bases.
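
As a rough illustration of the guard (a sketch with made-up miniature
types, not the GCC tree/BINFO API; base_kind, base_sketch and
parent_with_private_access are hypothetical names), the loop only asks
about access for bases that are class types:

```cpp
#include <vector>

// Hypothetical miniature types; the real code works on GCC trees.
enum class base_kind { class_type, bound_template_template_parm };
enum class access_kind { ak_none, ak_public, ak_protected, ak_private };

struct base_sketch
{
  base_kind kind;
  access_kind access;  // Only meaningful when kind is class_type.
};

// Return the index of the first class-type base with private access,
// or -1 if there is none.  Non-class bases are skipped before their
// access is ever queried, mirroring the RECORD_OR_UNION_TYPE_P guard.
int parent_with_private_access (const std::vector<base_sketch> &bases)
{
  for (int i = 0; i < static_cast<int> (bases.size ()); i++)
    {
      if (bases[i].kind != base_kind::class_type)
        continue;
      if (bases[i].access == access_kind::ak_private)
        return i;
    }
  return -1;
}
```

For the testcase above, the first base (Alloc) would be skipped and the
second base (A) would be reported.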

Successfully tested on x86_64-pc-linux-gnu.

PR c++/116323

gcc/cp/ChangeLog:

* search.cc (get_parent_with_private_access): Only call access_in_type
for RECORD_OR_UNION_TYPE_P base BINFOs.

gcc/testsuite/ChangeLog:

* g++.dg/template/access43.C: New test.

---
 gcc/cp/search.cc |  4 +++-
 gcc/testsuite/g++.dg/template/access43.C | 11 +++
 2 files changed, 14 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/template/access43.C

diff --git a/gcc/cp/search.cc b/gcc/cp/search.cc
index 60c30ecb881..a810cf70d6a 100644
--- a/gcc/cp/search.cc
+++ b/gcc/cp/search.cc
@@ -163,9 +163,11 @@ get_parent_with_private_access (tree decl, tree binfo)
   /* Iterate through immediate parent classes.  */
   for (int i = 0; BINFO_BASE_ITERATE (binfo, i, base_binfo); i++)
 {
+  tree base_binfo_type = BINFO_TYPE (base_binfo);
   /* This parent had private access.  Therefore that's why BINFO can't
  access DECL.  */
-  if (access_in_type (BINFO_TYPE (base_binfo), decl) == ak_private)
+  if (RECORD_OR_UNION_TYPE_P (base_binfo_type)
+ && access_in_type (base_binfo_type, decl) == ak_private)
return base_binfo;
 }
 
diff --git a/gcc/testsuite/g++.dg/template/access43.C 
b/gcc/testsuite/g++.dg/template/access43.C
new file mode 100644
index 000..ce9e6c8fbb2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/access43.C
@@ -0,0 +1,11 @@
+// PR c++/116323
+// { dg-do "compile" }
+// { dg-additional-options "-Wno-template-body" }
+
+class A { enum Enum{}; };
+
+template class Alloc>
+class B : private Alloc, private A {};
+
+template class Alloc>
+int B::foo (Enum m) { return 42; } // { dg-error "is private" }
-- 
2.44.0




[PATCH] Makefile: Fix typos

2024-09-11 Thread Andrew Kreimer
Fix typos in comments.

Signed-off-by: Andrew Kreimer 
---
 Makefile.def | 2 +-
 Makefile.in  | 4 ++--
 Makefile.tpl | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/Makefile.def b/Makefile.def
index 19954e7d731..b502eb63d36 100644
--- a/Makefile.def
+++ b/Makefile.def
@@ -77,7 +77,7 @@ host_modules= { module= gprofng; };
 host_modules= { module= gettext; bootstrap=true; no_install=true;
 module_srcdir= "gettext/gettext-runtime";
// We always build gettext with pic, because some packages 
(e.g. gdbserver)
-   // need it in some configuratons, which is determined via 
nontrivial tests.
+   // need it in some configurations, which is determined via 
nontrivial tests.
// Always enabling pic seems to make sense for something tied to
// user-facing output.
extra_configure_flags='--disable-shared --disable-threads 
--disable-java --disable-csharp --with-pic --disable-libasprintf';
diff --git a/Makefile.in b/Makefile.in
index 966d6045496..0c3511d2cf1 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -666,7 +666,7 @@ AR_FOR_TARGET=@AR_FOR_TARGET@
 AS_FOR_TARGET=@AS_FOR_TARGET@
 CC_FOR_TARGET=$(STAGE_CC_WRAPPER) @CC_FOR_TARGET@
 
-# If GCC_FOR_TARGET is not overriden on the command line, then this
+# If GCC_FOR_TARGET is not overridden on the command line, then this
 # variable is passed down to the gcc Makefile, where it is used to
 # build libgcc2.a.  We define it here so that it can itself be
 # overridden on the command line.
@@ -68937,7 +68937,7 @@ install-gdb: $(INSTALL_GDB_TK)
 @serialization_dependencies@
 
 # 
-# Regenerating top level configury
+# Regenerating top level configure
 # 
 
 # Rebuilding Makefile.in, using autogen.
diff --git a/Makefile.tpl b/Makefile.tpl
index da38dca697a..b32dd1e4583 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -589,7 +589,7 @@ AR_FOR_TARGET=@AR_FOR_TARGET@
 AS_FOR_TARGET=@AS_FOR_TARGET@
 CC_FOR_TARGET=$(STAGE_CC_WRAPPER) @CC_FOR_TARGET@
 
-# If GCC_FOR_TARGET is not overriden on the command line, then this
+# If GCC_FOR_TARGET is not overridden on the command line, then this
 # variable is passed down to the gcc Makefile, where it is used to
 # build libgcc2.a.  We define it here so that it can itself be
 # overridden on the command line.
@@ -2129,7 +2129,7 @@ install-gdb: $(INSTALL_GDB_TK)
 @serialization_dependencies@
 
 # 
-# Regenerating top level configury
+# Regenerating top level configure
 # 
 
 # Rebuilding Makefile.in, using autogen.
-- 
2.46.0



Re: [PATCH] s390: Fix TF to FPRX2 conversion [PR115860]

2024-09-11 Thread Stefan Schulze Frielinghaus
On Wed, Sep 11, 2024 at 01:22:30PM +0200, Ilya Leoshkevich wrote:
> On Wed, 2024-09-11 at 12:35 +0200, Stefan Schulze Frielinghaus wrote:
> > On Wed, Sep 11, 2024 at 11:47:54AM +0200, Ilya Leoshkevich wrote:
> > > On Fri, 2024-08-16 at 09:41 +0200, Stefan Schulze Frielinghaus
> > > wrote:
> > > > Currently subregs originating from *tf_to_fprx2_0 and
> > > > *tf_to_fprx2_1
> > > > survive register allocation.  This in turn leads to wrong
> > > > register
> > > > renaming.  Keeping the current approach would mean we need two
> > > > insns
> > > > for
> > > > *tf_to_fprx2_0 and *tf_to_fprx2_1, respectively.  Something along
> > > > the
> > > > lines
> > > > 
> > > > (define_insn "*tf_to_fprx2_0"
> > > >   [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand"
> > > > "=f") 0)
> > > >     (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
> > > >    UNSPEC_TF_TO_FPRX2_0))]
> > > >   "TARGET_VXE"
> > > >   "#")
> > > > 
> > > > (define_insn "*tf_to_fprx2_0"
> > > >   [(set (match_operand:DF 0 "nonimmediate_operand" "=f")
> > > >     (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
> > > >    UNSPEC_TF_TO_FPRX2_0))]
> > > >   "TARGET_VXE"
> > > >   "vpdi\t%v0,%v1,%v0,1
> > > >   [(set_attr "op_type" "VRR")])
> > > > 
> > > > and similar for *tf_to_fprx2_1.  Note, pre register allocation
> > > > operand 0
> > > > has mode FPRX2 and afterwards DF once subregs have been
> > > > eliminated.
> > > > 
> > > > Since we always copy a whole vector register into a floating-
> > > > point
> > > > register pair, another way to fix this is to merge *tf_to_fprx2_0
> > > > and
> > > > *tf_to_fprx2_1 into a single insn which means we don't have to
> > > > use
> > > > subregs at all.  The downside of this is that the assembler
> > > > template
> > > > contains two instructions, now.  The upside is that we don't have
> > > > to
> > > > come up with some artificial insn before RA which might be more
> > > > readable/maintainable.  That is implemented by this patch.
> > > > 
> > > > In commit r11-4872-ge627cda5686592, the output operand specifier
> > > > %V
> > > > was
> > > > introduced which is used in tf_to_fprx2 only, now.  I didn't come
> > > > up
> > > > with its counterpart like %F for floating-point registers. 
> > > > Instead I
> > > > printed the register pair in the output function directly.  This
> > > > spares
> > > > us a new and "rare" format specifier for a single insn.  I don't
> > > > have
> > > > a
> > > > strong opinion which option to choose, however, we should either
> > > > add
> > > > %F
> > > > in order to mimic the same behaviour as %V or getting rid of %V
> > > > and
> > > > inline the logic in the output function.  I lean towards the
> > > > latter.
> > > > Any preferences?
> > > > ---
> > > >  gcc/config/s390/s390.md    |  2 +
> > > >  gcc/config/s390/vector.md  | 66 +++-
> > > > 
> > > > --
> > > >  gcc/testsuite/gcc.target/s390/pr115860-1.c | 26 +
> > > >  3 files changed, 60 insertions(+), 34 deletions(-)
> > > >  create mode 100644 gcc/testsuite/gcc.target/s390/pr115860-1.c
> > > 
> > > [...]
> > > 
> > > > +  char buf[64];
> > > > +  switch (which_alternative)
> > > > +    {
> > > > +    case 0:
> > > > +  if (REGNO (operands[0]) == REGNO (operands[1]))
> > > > +   return "vpdi\t%V0,%v1,%V0,5";
> > > > +  else
> > > > +   return "ldr\t%f0,%f1;vpdi\t%V0,%v1,%V0,5";
> > > > +    case 1:
> > > > +  {
> > > > +   const char *reg_pair = reg_names[REGNO (operands[0]) +
> > > > 1];
> > > > +   snprintf (buf, sizeof (buf),
> > > > "ld\t%%f0,%%1;ld\t%%%s,8+%%1",
> > > > reg_pair);
> > > 
> > > I wonder if there is a corner case where 8+ does not fit into short
> > > displacement?
> > 
> > That is covered by constraint AR, i.e., for short displacement, and
> > AT
> > for long displacement.
> 
> Don't they cover only %1, and not 8+%1? Can't there be a situation
> where %1 barely fits and 8+%1 doesn't fit? A quick glance shows that
> the code doesn't leave any allowance for this:
> 
> "AR"
>   s390_mem_constraint("AR")
> s390_check_qrst_address('R')
>   s390_short_displacement()
> INTVAL (disp) >= 0 && INTVAL (disp) < 4096

Isn't this covered by

int
s390_mem_constraint (const char *str, rtx op)
{
  char c = str[0];

  switch (c)
{
case 'A':
  /* Check for offsettable variants of memory constraints.  */
  if (!MEM_P (op) || MEM_VOLATILE_P (op))
return 0;
  if ((reload_completed || reload_in_progress)
  ? !offsettable_memref_p (op) : !offsettable_nonstrict_memref_p (op))
return 0;

where

/* Return true if OP is a memory reference whose address contains
   no side effects and remains valid after the addition of a positive
   integer less than the size of the object being referenced.

   We assume that the original address is valid and do not check it.

   This uses strict_memory_address_p as a subroutine, so
  

[PATCH 1/2] RISC-V: Fix vl_used_by_non_rvv_insn logic of vsetvl pass

2024-09-11 Thread 钟居哲
Hi, garthlei.
Thanks for fixing it.

I see, you are trying to fix this bug:

lui     a5,%hi(.LANCHOR0)
addi    a5,a5,%lo(.LANCHOR0)
vsetivli        zero,2,e8,mf8,ta,ma ---> It should be a4, 2 instead of zero, 2
vle64.v v1,0(a5)
--- missing vsetvli a4, a4 here
slli    a4,a4,1
vsetvli zero,a4,e32,m1,ta,ma
li      a2,-1
addi    a5,a5,16
vslide1down.vx  v1,v1,a2
vslide1down.vx  v1,v1,zero
vsetivli        zero,2,e64,m1,ta,ma
vse64.v v1,0(a5)
ret

When I revisit the code here:

m_vl = ::get_vl
...
update_avl -> "m_vl" variable is modified
...
using wrong m_vl in the following.

A dedicated temporary variable dest_vl looks reasonable here.
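
The pattern can be shown in miniature.  This is only a sketch with
plain ints standing in for registers; info_sketch and its member
functions are hypothetical and are not the riscv-vsetvl.cc code:

```cpp
// Hypothetical miniature of the stale-member pattern.
struct info_sketch
{
  int m_vl = 0;

  // Stand-in for update_avl: may redirect m_vl to another register.
  void update_avl (int other_vl) { m_vl = other_vl; }

  // Buggy shape: the later use check looks at whatever m_vl holds
  // after update_avl has run.
  int reg_checked_buggy (int insn_dest, int other_vl)
  {
    m_vl = insn_dest;
    update_avl (other_vl);
    return m_vl;
  }

  // Fixed shape: remember the insn's own dest in a local first and
  // use that for the later check.
  int reg_checked_fixed (int insn_dest, int other_vl)
  {
    m_vl = insn_dest;
    int dest_vl = m_vl;
    update_avl (other_vl);
    return dest_vl;
  }
};
```

With the local copy, the check keeps looking at the register the insn
actually wrote, no matter what update_avl does to m_vl afterwards.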

LGTM.

The RISC-V folks will commit this patch for you.
Thanks.


juzhe.zh...@rivai.ai
 
From: Li, Pan2
Date: 2024-09-11 19:29
To: juzhe.zh...@rivai.ai
Subject: FW: [PATCH 1/2] RISC-V: Fix vl_used_by_non_rvv_insn logic of vsetvl 
pass
FYI.
 
-Original Message-
From: garthlei  
Sent: Wednesday, September 11, 2024 5:10 PM
To: gcc-patches 
Subject: [PATCH 1/2] RISC-V: Fix vl_used_by_non_rvv_insn logic of vsetvl pass
 
This patch fixes a bug in the current vsetvl pass.  The current pass uses
`m_vl` to determine whether the dest operand has been used by non-RVV
instructions.  However, `m_vl` may have been modified as a result of an
`update_avl` call, and thus would be no longer the dest operand of the
original instruction.  This can lead to incorrect vsetvl eliminations, as is
shown in the testcase.  In this patch, we create a `dest_vl` variable for
this scenario.
 
gcc/ChangeLog:
 
* config/riscv/riscv-vsetvl.cc: Use `dest_vl` for dest VL operand
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/vsetvl/vsetvl_bug-3.c: New test.
---
gcc/config/riscv/riscv-vsetvl.cc| 16 +++-
.../gcc.target/riscv/rvv/vsetvl/vsetvl_bug-3.c  | 17 +
2 files changed, 28 insertions(+), 5 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-3.c
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 017efa8bc17..ce831685439 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -1002,6 +1002,9 @@ public:
   void parse_insn (insn_info *insn)
   {
+/* The VL dest of the insn */
+rtx dest_vl = NULL_RTX;
+
 m_insn = insn;
 m_bb = insn->bb ();
 /* Return if it is debug insn for the consistency with optimize == 0.  */
@@ -1035,7 +1038,10 @@ public:
 if (m_avl)
   {
if (vsetvl_insn_p (insn->rtl ()) || has_vlmax_avl ())
-   m_vl = ::get_vl (insn->rtl ());
+   {
+ m_vl = ::get_vl (insn->rtl ());
+ dest_vl = m_vl;
+   }
if (has_nonvlmax_reg_avl ())
  m_avl_def = find_access (insn->uses (), REGNO (m_avl))->def ();
@@ -1132,22 +1138,22 @@ public:
   }
 /* Determine if dest operand(vl) has been used by non-RVV instructions.  */
-if (has_vl ())
+if (dest_vl)
   {
const hash_set vl_uses
-   = get_all_real_uses (get_insn (), REGNO (get_vl ()));
+   = get_all_real_uses (get_insn (), REGNO (dest_vl));
for (use_info *use : vl_uses)
  {
gcc_assert (use->insn ()->is_real ());
rtx_insn *rinsn = use->insn ()->rtl ();
if (!has_vl_op (rinsn)
- || count_regno_occurrences (rinsn, REGNO (get_vl ())) != 1)
+ || count_regno_occurrences (rinsn, REGNO (dest_vl)) != 1)
  {
m_vl_used_by_non_rvv_insn = true;
break;
  }
rtx avl = ::get_avl (rinsn);
- if (!avl || !REG_P (avl) || REGNO (get_vl ()) != REGNO (avl))
+ if (!avl || !REG_P (avl) || REGNO (dest_vl) != REGNO (avl))
  {
m_vl_used_by_non_rvv_insn = true;
break;
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-3.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-3.c
new file mode 100644
index 000..c155f5613d2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-3.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32d -O2 -fdump-rtl-vsetvl-details" } 
*/
+
+#include 
+
+uint64_t a[2], b[2];
+
+void
+foo ()
+{
+  size_t vl = __riscv_vsetvl_e64m1 (2);
+  vuint64m1_t vx = __riscv_vle64_v_u64m1 (a, vl);
+  vx = __riscv_vslide1down_vx_u64m1 (vx, 0xull, vl);
+  __riscv_vse64_v_u64m1 (b, vx, vl);
+}
+
+/* { dg-final { scan-rtl-dump-not "Eliminate insn" "vsetvl" } }  */
-- 
2.17.1
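
To make the failure mode described in the commit message concrete, here is a
minimal C++ sketch. It is not the code in riscv-vsetvl.cc: `toy_info`,
`update_avl` and `used_by_non_rvv` are hypothetical stand-ins, and registers
are modelled as plain integers. It only shows why a snapshot of the
destination register (`dest_vl`) stays valid after `m_vl` has been redirected.

```cpp
#include <cassert>
#include <optional>
#include <set>

// Toy stand-in for the pass's per-instruction info.  All names here are
// hypothetical; registers are modelled as plain integers.
struct toy_info
{
  std::optional<int> m_vl;    // may be rewritten later, like m_vl in the pass
  std::optional<int> dest_vl; // snapshot of the insn's own VL destination

  // Record the destination register of the instruction being parsed.
  void parse (int insn_dest_reg)
  {
    m_vl = insn_dest_reg;
    dest_vl = m_vl;
  }

  // Models an update that redirects m_vl to a different register.
  void update_avl (int other_reg) { m_vl = other_reg; }
};

// Returns true if REG is set and is read by some non-vector instruction.
static bool
used_by_non_rvv (const std::set<int> &non_rvv_reads, std::optional<int> reg)
{
  return reg && non_rvv_reads.count (*reg) != 0;
}
```

In this model, querying `m_vl` after `update_avl` asks about the wrong
register, while querying `dest_vl` still asks about the register the
instruction actually wrote.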
 


[PATCH 1/2] ipa: Rename ipa_supports_p to ipa_vr_supported_type_p

2024-09-11 Thread Martin Jambor
Hi,

ipa_supports_p is not a name that captures well what the predicate
determines.  Therefore, this patch renames it to ipa_vr_supported_type_p.

This change has been pre-approved by Honza and has passed bootstrap and
test-suite on x86_64 and so I will push it to master later today.

Thanks,

Martin


gcc/ChangeLog:

2024-09-06  Martin Jambor  

* ipa-cp.h (ipa_supports_p): Rename to ipa_vr_supported_type_p.
* ipa-cp.cc (ipa_vr_operation_and_type_effects): Adjust called
function name.
(propagate_vr_across_jump_function): Likewise.
* ipa-prop.cc (ipa_compute_jump_functions_for_edge): Likewise.
(ipcp_get_parm_bits): Likewise.
---
 gcc/ipa-cp.cc   | 5 +++--
 gcc/ipa-cp.h| 2 +-
 gcc/ipa-prop.cc | 6 +++---
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index 56468dc40ee..a1033b81aef 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -1649,7 +1649,8 @@ ipa_vr_operation_and_type_effects (vrange &dst_vr,
   enum tree_code operation,
   tree dst_type, tree src_type)
 {
-  if (!ipa_supports_p (dst_type) || !ipa_supports_p (src_type))
+  if (!ipa_vr_supported_type_p (dst_type)
+  || !ipa_vr_supported_type_p (src_type))
 return false;
 
   range_op_handler handler (operation);
@@ -2553,7 +2554,7 @@ propagate_vr_across_jump_function (cgraph_edge *cs, 
ipa_jump_func *jfunc,
  ipa_range_set_and_normalize (op_vr, op);
 
  if (!handler
- || !ipa_supports_p (operand_type)
+ || !ipa_vr_supported_type_p (operand_type)
  /* Sometimes we try to fold comparison operators using a
 pointer type to hold the result instead of a boolean
 type.  Avoid trapping in the sanity check in
diff --git a/gcc/ipa-cp.h b/gcc/ipa-cp.h
index 4616c61625a..ba2ebfede63 100644
--- a/gcc/ipa-cp.h
+++ b/gcc/ipa-cp.h
@@ -294,7 +294,7 @@ bool values_equal_for_ipcp_p (tree x, tree y);
 /* Return TRUE if IPA supports ranges of TYPE.  */
 
 static inline bool
-ipa_supports_p (tree type)
+ipa_vr_supported_type_p (tree type)
 {
   return irange::supports_p (type) || prange::supports_p (type);
 }
diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
index 99ebd6229ec..78d1fb7086d 100644
--- a/gcc/ipa-prop.cc
+++ b/gcc/ipa-prop.cc
@@ -2392,8 +2392,8 @@ ipa_compute_jump_functions_for_edge (struct 
ipa_func_body_info *fbi,
   else
{
  if (param_type
- && ipa_supports_p (TREE_TYPE (arg))
- && ipa_supports_p (param_type)
+ && ipa_vr_supported_type_p (TREE_TYPE (arg))
+ && ipa_vr_supported_type_p (param_type)
  && get_range_query (cfun)->range_of_expr (vr, arg, cs->call_stmt)
  && !vr.undefined_p ())
{
@@ -5761,7 +5761,7 @@ ipcp_get_parm_bits (tree parm, tree *value, widest_int 
*mask)
   ipcp_transformation *ts = ipcp_get_transformation_summary (cnode);
   if (!ts
   || vec_safe_length (ts->m_vr) == 0
-  || !ipa_supports_p (TREE_TYPE (parm)))
+  || !ipa_vr_supported_type_p (TREE_TYPE (parm)))
 return false;
 
   int i = ts->get_param_index (current_function_decl, parm);
-- 
2.46.0



[PATCH 2/2] ipa-cp: One more use of ipa_vr_supported_type_p

2024-09-11 Thread Martin Jambor
Hi,

Since we have the predicate, this patch converts one more check for
essentially the same thing into its use.

It has passed a bootstrap and testsuite on x86_64.  I believe it is
obvious enough that I can commit it myself and so will do so later
today.

Thanks,

Martin


2024-09-11  Martin Jambor  

* gcc/ipa-cp.cc (propagate_vr_across_jump_function): Use
ipa_vr_supported_type_p instead of explicit check for integral and
pointer types.
---
 gcc/ipa-cp.cc | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index a1033b81aef..fa7bd6a15da 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -2519,8 +2519,7 @@ propagate_vr_across_jump_function (cgraph_edge *cs, 
ipa_jump_func *jfunc,
 return false;
 
   if (!param_type
-  || (!INTEGRAL_TYPE_P (param_type)
- && !POINTER_TYPE_P (param_type)))
+  || !ipa_vr_supported_type_p (param_type))
 return dest_lat->set_to_bottom ();
 
   if (jfunc->type == IPA_JF_PASS_THROUGH)
-- 
2.46.0



Re: FW: [PATCH 2/2] RISC-V: Eliminate latter vsetvl when fused

2024-09-11 Thread 钟居哲
I see the codegen is incorrect before this patch:

foo:
vsetvli a5,a0,e16,m1,ta,ma
vmv.x.s a4,v8
vsetvli a5,a0,e8,mf2,ta,ma ---> wrong VTYPE
vadd.vx v9,v8,a4
vsetvli zero,a5,e16,m1,ta,ma
vadd.vv v8,v9,v8
ret

Could you show me what the codegen looks like after this patch?
I would expect the codegen to become:

foo:
vsetvli a5,a0,e16,m1,ta,ma
vmv.x.s a4,v8
vadd.vx v9,v8,a4
vsetvli zero,a5,e16,m1,ta,ma
vadd.vv v8,v9,v8
ret
Or:
foo:
vsetvli zero,a0,e16,m1,ta,ma
vmv.x.s a4,v8
vadd.vx v9,v8,a4
vadd.vv v8,v9,v8
ret
Both of these are correct.

Also, I think it would be better to add an assembly check to the testcase
instead of just scanning for "Eliminate insn" in the "vsetvl" dump.
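
As a rough illustration of such an assembly check, the fragment below uses the
DejaGnu `scan-assembler` and `scan-assembler-not` directives. The operand
patterns are assumptions modelled on the listings above and would need to be
checked against the real compiler output before use.

```cpp
/* Hypothetical additions to vsetvl_bug-4.c.  The register and vtype
   patterns are assumptions, not verified against actual output.  */

/* The e8,mf2 vsetvli marked "wrong VTYPE" above must not be emitted.  */
/* { dg-final { scan-assembler-not {vsetvli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*e8,\s*mf2} } } */

/* An e16,m1 vsetvli must still be present.  */
/* { dg-final { scan-assembler {vsetvli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*e16,\s*m1,\s*t[au],\s*m[au]} } } */
```

No instruction count is asserted here because both of the codegen variants
listed above are acceptable and they contain a different number of vsetvli
instructions.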

Thanks.


juzhe.zh...@rivai.ai
 
From: Li, Pan2
Date: 2024-09-11 19:28
To: juzhe.zh...@rivai.ai
Subject: FW: [PATCH 2/2] RISC-V: Eliminate latter vsetvl when fused
FYI
 
-Original Message-
From: Bohan Lei  
Sent: Wednesday, September 11, 2024 5:13 PM
To: gcc-patches 
Subject: [PATCH 2/2] RISC-V: Eliminate latter vsetvl when fused
 
The current vsetvl pass eliminates a vsetvl instruction when the previous
info is "available," but does not when "compatible."  This can lead to not
only redundancy, but also incorrect behaviors when the previous info happens
to be compatible with a later vector instruction, which ends up using the
vsetvl info that should have been eliminated, as is shown in the testcase.
This patch eliminates the vsetvl when the previous info is "compatible."
 
gcc/ChangeLog:
 
* config/riscv/riscv-vsetvl.cc (pre_vsetvl::fuse_local_vsetvl_info):
Delete vsetvl insn when `prev_info` is compatible
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c: New test.
---
gcc/config/riscv/riscv-vsetvl.cc   |  3 +++
.../gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c | 18 ++
2 files changed, 21 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index ce831685439..030ffbe2ebb 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2796,6 +2796,9 @@ pre_vsetvl::fuse_local_vsetvl_info ()
  curr_info.dump (dump_file, "");
}
  m_dem.merge (prev_info, curr_info);
+   if (!curr_info.vl_used_by_non_rvv_insn_p ()
+   && vsetvl_insn_p (curr_info.get_insn ()->rtl ()))
+ m_delete_list.safe_push (curr_info);
  if (curr_info.get_read_vl_insn ())
prev_info.set_read_vl_insn (curr_info.get_read_vl_insn ());
  if (dump_file && (dump_flags & TDF_DETAILS))
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c
new file mode 100644
index 000..faa0c8073d3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O2 -fno-schedule-insns 
-fdump-rtl-vsetvl-details" } */
+
+#include 
+
+vuint16m1_t
+foo (vuint16m1_t a, vuint16m1_t b, size_t avl)
+{
+  size_t vl;
+  vuint16m1_t ret;
+  uint16_t c = __riscv_vmv_x_s_u16m1_u16(a);
+  vl = __riscv_vsetvl_e8mf2 (avl);
+  ret = __riscv_vadd_vx_u16m1 (a, c, avl);
+  ret = __riscv_vadd_vv_u16m1 (ret, a, vl);
+  return ret;
+}
+
+/* { dg-final { scan-rtl-dump "Eliminate insn" "vsetvl" } }  */
-- 
2.17.1
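
The rule added by this patch can be pictured with a small, self-contained C++
model. This is only a sketch under simplifying assumptions: `toy_vsetvl_info`
and `fuse_local` are hypothetical names, compatibility is reduced to a
precomputed flag, and the real merge of demands is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified view of one vsetvl demand inside a basic block.
struct toy_vsetvl_info
{
  int id;
  bool is_vsetvl_insn;          // the info comes from an explicit vsetvl
  bool vl_used_by_non_rvv_insn; // its VL result is read by scalar code
  bool compatible_with_prev;    // it can be merged into the previous info
};

// Walk the block, treating each compatible info as fused into its
// predecessor, and collect the explicit vsetvl insns that became redundant.
static std::vector<int>
fuse_local (const std::vector<toy_vsetvl_info> &infos)
{
  std::vector<int> delete_list;
  for (std::size_t i = 1; i < infos.size (); ++i)
    {
      const toy_vsetvl_info &curr = infos[i];
      if (!curr.compatible_with_prev)
        continue;
      // Once merged, the earlier info covers curr's demand, so an explicit
      // vsetvl whose VL result no scalar code reads can be deleted.
      if (curr.is_vsetvl_insn && !curr.vl_used_by_non_rvv_insn)
        delete_list.push_back (curr.id);
    }
  return delete_list;
}
```

The two conditions mirror the ones in the hunk above: the info must stem from
a real vsetvl instruction, and its VL result must not be needed by non-RVV
code.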
 


Re: [PATCH] s390: Fix TF to FPRX2 conversion [PR115860]

2024-09-11 Thread Ilya Leoshkevich
On Wed, 2024-09-11 at 13:34 +0200, Stefan Schulze Frielinghaus wrote:
> On Wed, Sep 11, 2024 at 01:22:30PM +0200, Ilya Leoshkevich wrote:
> > On Wed, 2024-09-11 at 12:35 +0200, Stefan Schulze Frielinghaus
> > wrote:
> > > On Wed, Sep 11, 2024 at 11:47:54AM +0200, Ilya Leoshkevich wrote:
> > > > On Fri, 2024-08-16 at 09:41 +0200, Stefan Schulze Frielinghaus
> > > > wrote:
> > > > > Currently subregs originating from *tf_to_fprx2_0 and
> > > > > *tf_to_fprx2_1
> > > > > survive register allocation.  This in turn leads to wrong
> > > > > register
> > > > > renaming.  Keeping the current approach would mean we need
> > > > > two
> > > > > insns
> > > > > for
> > > > > *tf_to_fprx2_0 and *tf_to_fprx2_1, respectively.  Something
> > > > > along
> > > > > the
> > > > > lines
> > > > > 
> > > > > (define_insn "*tf_to_fprx2_0"
> > > > >   [(set (subreg:DF (match_operand:FPRX2 0
> > > > > "nonimmediate_operand"
> > > > > "=f") 0)
> > > > >     (unspec:DF [(match_operand:TF 1 "general_operand"
> > > > > "v")]
> > > > >    UNSPEC_TF_TO_FPRX2_0))]
> > > > >   "TARGET_VXE"
> > > > >   "#")
> > > > > 
> > > > > (define_insn "*tf_to_fprx2_0"
> > > > >   [(set (match_operand:DF 0 "nonimmediate_operand" "=f")
> > > > >     (unspec:DF [(match_operand:TF 1 "general_operand"
> > > > > "v")]
> > > > >    UNSPEC_TF_TO_FPRX2_0))]
> > > > >   "TARGET_VXE"
> > > > >   "vpdi\t%v0,%v1,%v0,1
> > > > >   [(set_attr "op_type" "VRR")])
> > > > > 
> > > > > and similar for *tf_to_fprx2_1.  Note, pre register
> > > > > allocation
> > > > > operand 0
> > > > > has mode FPRX2 and afterwards DF once subregs have been
> > > > > eliminated.
> > > > > 
> > > > > Since we always copy a whole vector register into a floating-
> > > > > point
> > > > > register pair, another way to fix this is to merge
> > > > > *tf_to_fprx2_0
> > > > > and
> > > > > *tf_to_fprx2_1 into a single insn which means we don't have
> > > > > to
> > > > > use
> > > > > subregs at all.  The downside of this is that the assembler
> > > > > template
> > > > > contains two instructions, now.  The upside is that we don't
> > > > > have
> > > > > to
> > > > > come up with some artificial insn before RA which might be
> > > > > more
> > > > > readable/maintainable.  That is implemented by this patch.
> > > > > 
> > > > > In commit r11-4872-ge627cda5686592, the output operand
> > > > > specifier
> > > > > %V
> > > > > was
> > > > > introduced which is used in tf_to_fprx2 only, now.  I didn't
> > > > > come
> > > > > up
> > > > > with its counterpart like %F for floating-point registers. 
> > > > > Instead I
> > > > > printed the register pair in the output function directly. 
> > > > > This
> > > > > spares
> > > > > us a new and "rare" format specifier for a single insn.  I
> > > > > don't
> > > > > have
> > > > > a
> > > > > strong opinion which option to choose, however, we should
> > > > > either
> > > > > add
> > > > > %F
> > > > > in order to mimic the same behaviour as %V or getting rid of
> > > > > %V
> > > > > and
> > > > > inline the logic in the output function.  I lean towards the
> > > > > latter.
> > > > > Any preferences?
> > > > > ---
> > > > >  gcc/config/s390/s390.md    |  2 +
> > > > >  gcc/config/s390/vector.md  | 66 +++-
> > > > > 
> > > > > 
> > > > > --
> > > > >  gcc/testsuite/gcc.target/s390/pr115860-1.c | 26 +
> > > > >  3 files changed, 60 insertions(+), 34 deletions(-)
> > > > >  create mode 100644 gcc/testsuite/gcc.target/s390/pr115860-
> > > > > 1.c
> > > > 
> > > > [...]
> > > > 
> > > > > +  char buf[64];
> > > > > +  switch (which_alternative)
> > > > > +    {
> > > > > +    case 0:
> > > > > +  if (REGNO (operands[0]) == REGNO (operands[1]))
> > > > > + return "vpdi\t%V0,%v1,%V0,5";
> > > > > +  else
> > > > > + return "ldr\t%f0,%f1;vpdi\t%V0,%v1,%V0,5";
> > > > > +    case 1:
> > > > > +  {
> > > > > + const char *reg_pair = reg_names[REGNO (operands[0])
> > > > > +
> > > > > 1];
> > > > > + snprintf (buf, sizeof (buf),
> > > > > "ld\t%%f0,%%1;ld\t%%%s,8+%%1",
> > > > > reg_pair);
> > > > 
> > > > I wonder if there is a corner case where 8+ does not fit into
> > > > short
> > > > displacement?
> > > 
> > > That is covered by constraint AR, i.e., for short displacement,
> > > and
> > > AT
> > > for long displacement.
> > 
> > Don't they cover only %1, and not 8+%1? Can't there be a situation
> > where %1 barely fits and 8+%1 doesn't fit? A quick glance shows
> > that
> > the code doesn't leave any allowance for this:
> > 
> > "AR"
> >   s390_mem_constraint("AR")
> >     s390_check_qrst_address('R')
> >   s390_short_displacement()
> >     INTVAL (disp) >= 0 && INTVAL (disp) < 4096
> 
> Isn't this covered by
> 
> int
> s390_mem_constraint (const char *str, rtx op)
> {
>   char c = str[0];
> 
>   switch (c)
>     {
>     case 'A':
>   /* Check for offsettable variants of memory constraints.  */
>   if (!MEM_P (op
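
The displacement question in the quoted exchange above can be stated as a
small C++ check. This is only a sketch: `short_displacement_p` restates the
range quoted from s390_short_displacement, while
`short_displacement_offsettable_p` is a hypothetical helper and not a function
in the s390 backend.

```cpp
#include <cassert>

// Range accepted for a short displacement, as in the check quoted above:
// 0 <= disp < 4096.
static bool
short_displacement_p (long disp)
{
  return disp >= 0 && disp < 4096;
}

// Hypothetical helper: true if both DISP and DISP + EXTRA are short
// displacements, i.e. the address stays encodable when offset by EXTRA.
static bool
short_displacement_offsettable_p (long disp, long extra)
{
  return short_displacement_p (disp) && short_displacement_p (disp + extra);
}
```

A displacement of 4092 passes the plain check but fails once 8 is added,
which is the corner case being asked about for the second `ld`.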

[PATCH] testsuite: Relax line number match in gfortran.dg/pr95690.f90

2024-09-11 Thread Andreas Schwab
The actual line number is target dependent, and immaterial for the test.

* gfortran.dg/pr95690.f90: Allow matching error message anywhere.
---
 gcc/testsuite/gfortran.dg/pr95690.f90 | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/pr95690.f90 
b/gcc/testsuite/gfortran.dg/pr95690.f90
index 1432937438a..4bd19b3dcdd 100644
--- a/gcc/testsuite/gfortran.dg/pr95690.f90
+++ b/gcc/testsuite/gfortran.dg/pr95690.f90
@@ -2,8 +2,10 @@
 module m
 contains
subroutine s
-  print *, (erfc) ! { dg-error "not a floating constant" "" { target 
i?86-*-* x86_64-*-* sparc*-*-* cris-*-* hppa*-*-* } }
-   end ! { dg-error "not a floating constant" "" { target { ! "i?86-*-* 
x86_64-*-* sparc*-*-* cris-*-* hppa*-*-*" } } }
+  print *, (erfc)
+   end
function erfc()
end
 end
+! The actual line number is target dependent, allow any
+! { dg-error "not a floating constant" "" { target *-*-* } 0 }
-- 
2.46.0


-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH] c++: Implement for namespace statics CWG 2867 - Order of initialization for structured bindings [PR115769]

2024-09-11 Thread Nathaniel Shead
On Tue, Sep 10, 2024 at 08:29:58PM +0200, Jakub Jelinek wrote:
> Hi!
> 
> The following patch on top of the
> https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662507.html
> patch adds CWG 2867 support for namespace locals.
> 
> Those vars are just pushed into {static,tls}_aggregates chain, then
> pruned from those lists, separated by priority and finally emitted into
> the corresponding dynamic initialization functions.
> The patch adds two flags used on the TREE_LIST nodes in those lists,
> one marks the structured binding base variable and/or associated ref
> extended temps, another marks the vars initialized using get methods.
> The flags are preserved across the pruning, for splitting into by priority
> all associated decls of a structured binding using tuple* are forced
> into the same priority as the first one, and finally when actually emitting
> code, CLEANUP_POINT_EXPRs are disabled in the base initializer(s) and
> code from the bases and non-bases together is wrapped into a single
> CLEANUP_POINT_EXPR.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> Note, I haven't touched the module handling; from what I can see,
> prune_vars_needing_no_initialization is destructive to the
> {static,tls}_aggregates lists (keeps the list NULL at the end or if there
> are errors or it contains some DECL_EXTERNAL decls, keeps in there just
> those, not the actual vars that need dynamic initialization) and
> the module writing is done only afterwards, so I think it could work
> reasonably only if header_module_p ().  Can namespace scope structured
> bindings appear in header_module_p () or !header_module_p () modules?
> How would a testcase using them look like?  Especially when structured
> bindings can't be extern nor templates nor inline there can be just one
> definition, so the module would need to be included in a single file, no?

In the header_module_p case, it is valid to have internal linkage
definitions (e.g. in an anonymous namespace), but in that case the
{static,tls}_aggregates lists should still be in place to be streamed
and everything should work as "normal".

(Note that it is 'valid' but not actually supported yet, I have a patch
series in progress to fix up all the various linkage issues.)

In the !header_module_p case, the modules streaming code doesn't use
those lists at all; namespace-scope definitions are attached to the
module TU directly and any initialization/destruction code is emitted
there.  Definitions would only be streamed if the variables are usable
in constant expressions.

So I don't think there's anything to do for modules here.

Yours,
Nathaniel

> In any case, the patch shouldn't make the modules case any worse, it
> just adds TREE_LIST flags which will not be streamed for modules and so
> if one can use structured bindings in modules, possibly CWG 2867 would be
> not fixed for those but nothing worse than that.
> 
> 2024-09-10  Jakub Jelinek  
> 
>   PR c++/115769
> gcc/cp/
>   * cp-tree.h (STATIC_INIT_DECOMP_BASE_P): Define.
>   (STATIC_INIT_DECOMP_NONBASE_P): Define.
>   * decl.cc (cp_finish_decl): Mark nodes in {static,tls}_aggregates
>   with 
>   * decl2.cc (decomp_handle_one_var, decomp_finalize_var_list): New
>   functions.
>   (emit_partial_init_fini_fn): Use them.
>   (prune_vars_needing_no_initialization): Clear
>   STATIC_INIT_DECOMP_*BASE_P flags if needed.
>   (partition_vars_for_init_fini): Use same priority for
>   consecutive STATIC_INIT_DECOMP_*BASE_P vars and propagate
>   those flags to new TREE_LISTs when possible.  Formatting fix.
>   (handle_tls_init): Use decomp_handle_one_var and
>   decomp_finalize_var_list functions.
> gcc/testsuite/
>   * g++.dg/DRs/dr2867-5.C: New test.
>   * g++.dg/DRs/dr2867-6.C: New test.
>   * g++.dg/DRs/dr2867-7.C: New test.
>   * g++.dg/DRs/dr2867-8.C: New test.
> 
> --- gcc/cp/cp-tree.h.jj   2024-09-07 09:31:20.601484156 +0200
> +++ gcc/cp/cp-tree.h  2024-09-09 15:53:44.924112247 +0200
> @@ -470,6 +470,7 @@ extern GTY(()) tree cp_global_trees[CPTI
>BASELINK_FUNCTIONS_MAYBE_INCOMPLETE_P (in BASELINK)
>BIND_EXPR_VEC_DTOR (in BIND_EXPR)
>ATOMIC_CONSTR_EXPR_FROM_CONCEPT_P (in ATOMIC_CONSTR)
> +  STATIC_INIT_DECOMP_BASE_P (in the TREE_LIST for 
> {static,tls}_aggregates)
> 2: IDENTIFIER_KIND_BIT_2 (in IDENTIFIER_NODE)
>ICS_THIS_FLAG (in _CONV)
>DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (in VAR_DECL)
> @@ -489,6 +490,8 @@ extern GTY(()) tree cp_global_trees[CPTI
>IMPLICIT_CONV_EXPR_BRACED_INIT (in IMPLICIT_CONV_EXPR)
>PACK_EXPANSION_AUTO_P (in *_PACK_EXPANSION)
>contract_semantic (in ASSERTION_, PRECONDITION_, POSTCONDITION_STMT)
> +  STATIC_INIT_DECOMP_NONBASE_P (in the TREE_LIST
> + for {static,tls}_aggregates)
> 3: IMPLICIT_RVALUE_P (in NON_LVALUE_EXPR or STATIC_CAST_EXPR)
>ICS_BAD_FLAG (in _CON

Re: [PATCH] c++: Implement for namespace statics CWG 2867 - Order of initialization for structured bindings [PR115769]

2024-09-11 Thread Jakub Jelinek
On Wed, Sep 11, 2024 at 10:16:18PM +1000, Nathaniel Shead wrote:
> In the header_module_p case, it is valid to have internal linkage
> definitions (e.g. in an anonymous namespace), but in that case the
> {static,tls}_aggregates lists should still be in place to be streamed
> and everything should work as "normal".

As the patch doesn't touch the streaming of {static,tls}_aggregates
in that case, I guess that means CWG 2867 will not be fixed for those
cases (i.e. temporaries from the structured binding base initialization
will be destructed at the end of that initialization, rather than at the
end of subsequent get initializers); perhaps we should stream the
STATIC_INIT_DECOMP_*BASE_P flags say by streaming there integer_zero_node
or integer_one_node right before the decls and on streaming it back set
the flags again.  For the !header_module_p case, we'll need a testcase too
to make sure it works properly.
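
For readers who want to see the ordering difference, here is a small
namespace-scope example in C++17. It is an illustration only, not one of the
dr2867-*.C tests; `Temp`, `Pair` and `event_log` are made-up names. Which of
the two orders is observed depends on whether the compiler implements CWG 2867
for this case.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <type_traits>

std::string event_log;

// A temporary whose destruction point we want to observe.
struct Temp
{
  ~Temp () { event_log += "~Temp;"; }
};

// A tuple-like type, so the structured binding uses the get<I>() protocol.
struct Pair
{
  int a, b;
  Pair (const Temp &) : a (1), b (2) { event_log += "Pair;"; }
  template <std::size_t I> int get () const
  {
    event_log += (I == 0 ? "get0;" : "get1;");
    return I == 0 ? a : b;
  }
};

namespace std
{
template <> struct tuple_size<Pair> : integral_constant<size_t, 2> {};
template <size_t I> struct tuple_element<I, Pair> { using type = int; };
}

namespace
{
// Namespace-scope structured binding with internal linkage.  The Temp
// object is a temporary of the base initializer.
auto [x, y] = Pair (Temp{});
}
```

Without the CWG 2867 change the log reads "Pair;~Temp;get0;get1;"; with it,
the temporary is destroyed only after the get calls, giving
"Pair;get0;get1;~Temp;".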

Jakub



[PATCH] c++/modules: Really always track partial specialisations [PR116496]

2024-09-11 Thread Nathaniel Shead
Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

My last fix for this issue (PR c++/114947, r15-810) didn't go far
enough; I had assumed that the issue where we lost track of partial
specialisations we would need to walk again later was limited to
partitions (where we always re-walk all specialisations), but the linked
PR is the same cause but for header units, and it is possible to
construct test cases exposing the same bug just for normal modules.

As such this patch just unconditionally ensures that whenever we modify
DECL_TEMPLATE_SPECIALIZATIONS we also track any partial specialisations
that might have been added.

Also clean up a couple of comments and assertions to make expected state
more obvious when processing these specs.

PR c++/116496

gcc/cp/ChangeLog:

* module.cc (trees_in::decl_value): Don't call
set_defining_module_for_partial_spec here.
(depset::hash::add_partial_entities): Clarify assertions.
* pt.cc (add_mergeable_specialization): Always call
set_defining_module_for_partial_spec when adding a partial spec.

gcc/testsuite/ChangeLog:

* g++.dg/modules/partial-5_a.C: New test.
* g++.dg/modules/partial-5_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/module.cc   | 25 +++---
 gcc/cp/pt.cc   |  1 +
 gcc/testsuite/g++.dg/modules/partial-5_a.C |  9 
 gcc/testsuite/g++.dg/modules/partial-5_b.C |  9 
 4 files changed, 31 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/partial-5_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/partial-5_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 647208944da..eedcd0ec076 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -8434,11 +8434,6 @@ trees_in::decl_value ()
  add_mergeable_specialization (!is_type, &spec, decl, spec_flags);
}
 
-  /* When making a CMI from a partition we're going to need to walk partial
-specializations again, so make sure they're tracked.  */
-  if (state->is_partition () && (spec_flags & 2))
-   set_defining_module_for_partial_spec (inner);
-
   if (NAMESPACE_SCOPE_P (decl)
  && (mk == MK_named || mk == MK_unique
  || mk == MK_enum || mk == MK_friend_spec)
@@ -13356,16 +13351,20 @@ depset::hash::add_partial_entities (vec 
*partial_classes)
 specialization.  */
  gcc_checking_assert (dep->get_entity_kind ()
   == depset::EK_PARTIAL);
+
+ /* Only emit GM entities if reached.  */
+ if (!DECL_LANG_SPECIFIC (inner)
+ || !DECL_MODULE_PURVIEW_P (inner))
+   dep->set_flag_bit ();
}
   else
-   /* It was an explicit specialization, not a partial one.  */
-   gcc_checking_assert (dep->get_entity_kind ()
-== depset::EK_SPECIALIZATION);
-
-  /* Only emit GM entities if reached.  */
-  if (!DECL_LANG_SPECIFIC (inner)
- || !DECL_MODULE_PURVIEW_P (inner))
-   dep->set_flag_bit ();
+   {
+ /* It was an explicit specialization, not a partial one.
+We should have already added this.  */
+ gcc_checking_assert (dep->get_entity_kind ()
+  == depset::EK_SPECIALIZATION);
+ gcc_checking_assert (dep->is_special ());
+   }
 }
 }
 
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 9195a5274e1..b8dd7e3a0ee 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -31685,6 +31685,7 @@ add_mergeable_specialization (bool decl_p, spec_entry 
*elt, tree decl,
 DECL_TEMPLATE_SPECIALIZATIONS (elt->tmpl));
   TREE_TYPE (cons) = decl_p ? TREE_TYPE (elt->spec) : elt->spec;
   DECL_TEMPLATE_SPECIALIZATIONS (elt->tmpl) = cons;
+  set_defining_module_for_partial_spec (STRIP_TEMPLATE (decl));
 }
 }
 
diff --git a/gcc/testsuite/g++.dg/modules/partial-5_a.C 
b/gcc/testsuite/g++.dg/modules/partial-5_a.C
new file mode 100644
index 000..768e6995f0f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/partial-5_a.C
@@ -0,0 +1,9 @@
+// PR c++/116496
+// { dg-additional-options "-fmodules-ts -std=c++20 -Wno-global-module" }
+// { dg-module-cmi A }
+
+module;
+template  struct S {};
+export module A;
+template  struct S {};
+template  requires false struct S {};
diff --git a/gcc/testsuite/g++.dg/modules/partial-5_b.C 
b/gcc/testsuite/g++.dg/modules/partial-5_b.C
new file mode 100644
index 000..95401fe8b56
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/partial-5_b.C
@@ -0,0 +1,9 @@
+// PR c++/116496
+// { dg-additional-options "-fmodules-ts -std=c++20 -Wno-global-module" }
+// { dg-module-cmi B }
+
+module;
+template  struct S {};
+export module B;
+import A;
+template  requires true struct S {};
-- 
2.46.0



[PATCH] tree-optimization/116674 - vectorizable_simd_clone_call and re-analysis

2024-09-11 Thread Richard Biener
When SLP analysis scraps an instance because it fails to analyze we
can end up calling vectorizable_* in analysis mode on a node that
was analyzed during the analysis of that instance again.
vectorizable_simd_clone_call wasn't expecting that and instead
guarded analysis/transform code on populated data structures.
The following changes it so it survives re-analysis.
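
The hazard described above can be modelled in a few lines of C++. This is a
sketch only: `toy_stmt` and `toy_vectorizable_call` are hypothetical, and
`analysis_only` plays the role that `!vec_stmt` plays in the actual function.

```cpp
#include <cassert>
#include <vector>

// Hypothetical cache attached to a statement: filled during analysis and
// consumed during transform.
struct toy_stmt
{
  std::vector<int> clone_info;
};

// Returns the number of cached entries that the transform phase consumed.
static int
toy_vectorizable_call (toy_stmt &stmt, bool analysis_only)
{
  // Re-analysis must start from a clean slate, so drop stale entries.
  if (analysis_only)
    stmt.clone_info.clear ();

  int consumed = 0;
  // Only the transform phase may rely on what analysis saved; the mere
  // presence of data no longer implies that we are transforming.
  if (!analysis_only && !stmt.clone_info.empty ())
    consumed = static_cast<int> (stmt.clone_info.size ());

  // Analysis records what the transform phase will need later.
  if (analysis_only)
    stmt.clone_info.push_back (42);

  return consumed;
}
```

Calling the function twice in analysis mode leaves exactly one cached entry,
and neither analysis call takes the transform-only path.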

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/116674
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Support
re-analysis.

* g++.dg/vect/pr116674.cc: New testcase.
---
 gcc/testsuite/g++.dg/vect/pr116674.cc | 85 +++
 gcc/tree-vect-stmts.cc|  8 ++-
 2 files changed, 90 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/vect/pr116674.cc

diff --git a/gcc/testsuite/g++.dg/vect/pr116674.cc 
b/gcc/testsuite/g++.dg/vect/pr116674.cc
new file mode 100644
index 000..1c13f12290b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/pr116674.cc
@@ -0,0 +1,85 @@
+// { dg-do compile }
+// { dg-require-effective-target c++11 }
+// { dg-additional-options "-Ofast" }
+// { dg-additional-options "-march=x86-64-v3" { target { x86_64-*-* i?86-*-* } 
} }
+
+namespace std {
+typedef int a;
+template  struct b;
+template  class aa {};
+template  c d(c e, c) { return e; }
+template  struct b> {
+   using f = c;
+   using g = c *;
+   template  using j = aa;
+};
+} // namespace std
+namespace l {
+template  struct m : std::b {
+   typedef std::b n;
+   typedef typename n::f &q;
+   template  struct ac { typedef typename n::j ad; };
+};
+} // namespace l
+namespace std {
+template  struct o {
+   typedef typename l::m::ac::ad ae;
+   typedef typename l::m::g g;
+   struct p {
+   g af;
+   };
+   struct ag : p {
+   ag(ae) {}
+   };
+   typedef ab u;
+   o(a, u e) : ah(e) {}
+   ag ah;
+};
+template > class r : o {
+   typedef o s;
+   typedef typename s::ae ae;
+   typedef l::m w;
+
+public:
+   c f;
+   typedef typename w::q q;
+   typedef a t;
+   typedef ab u;
+   r(t x, u e = u()) : s(ai(x, e), e) {}
+   q operator[](t x) { return *(this->ah.af + x); }
+   t ai(t x, u) { return x; }
+};
+extern "C" __attribute__((__simd__)) double exp(double);
+} // namespace std
+using namespace std;
+int ak;
+double v, y;
+void am(double, int an, double, double, double, double, double, double, double,
+   double, double, double, int, double, double, double, double,
+   r ap, double, double, double, double, double, double, double,
+   double, r ar, r as, double, double, r at,
+   r au, r av, double, double) {
+double ba;
+for (int k;;)
+  for (int i; i < an; ++i) {
+ y = i;
+ v = d(y, 25.0);
+ ba = exp(v);
+ ar[i * (ak + 1)] = ba;
+ as[i * (ak + 1)] = ar[i * (ak + 1)];
+ if (k && ap[k]) {
+ at[i * (ak + 1)] = av[i * (ak + 1)] = as[i * (ak + 1)];
+ au[i * (ak + 1)] = ar[i * (ak + 1)];
+ } else {
+ au[i * (ak + 1)] = ba;
+ at[i * (ak + 1)] = av[i * (ak + 1)] = k;
+ }
+  }
+}
+void b(int bc) {
+double bd, be, bf, bg, bh, ao, ap, bn, bo, bp, bq, br, bs, bt, bu, bv, bw, 
bx,
+by, aq, ar, as, bz, ca, at, au, av, cb, aw;
+int bi;
+am(bh, bc, bi, bi, bi, bi, bv, bw, bx, by, bu, bt, bi, ao, bn, bo, bp, ap, 
bq,
+   br, bs, bd, be, bf, bg, aq, ar, as, bz, ca, at, au, av, cb, aw);
+}
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 22d50263cdd..1d919ad2516 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -3987,6 +3987,8 @@ vectorizable_simd_clone_call (vec_info *vinfo, 
stmt_vec_info stmt_info,
 
   vec& simd_clone_info = (slp_node ? SLP_TREE_SIMD_CLONE_INFO (slp_node)
: STMT_VINFO_SIMD_CLONE_INFO (stmt_info));
+  if (!vec_stmt)
+simd_clone_info.truncate (0);
   arginfo.reserve (nargs, true);
   auto_vec slp_op;
   slp_op.safe_grow_cleared (nargs);
@@ -4035,10 +4037,10 @@ vectorizable_simd_clone_call (vec_info *vinfo, 
stmt_vec_info stmt_info,
 
   /* For linear arguments, the analyze phase should have saved
 the base and step in {STMT_VINFO,SLP_TREE}_SIMD_CLONE_INFO.  */
-  if (i * 3 + 4 <= simd_clone_info.length ()
+  if (vec_stmt
+ && i * 3 + 4 <= simd_clone_info.length ()
  && simd_clone_info[i * 3 + 2])
{
- gcc_assert (vec_stmt);
  thisarginfo.linear_step = tree_to_shwi (simd_clone_info[i * 3 + 2]);
  thisarginfo.op = simd_clone_info[i * 3 + 1];
  thisarginfo.simd_lane_linear
@@ -4093,7 +4095,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, 
stmt_vec_info stmt_info,
   unsigned group_size = slp_node ? SLP_TREE_LANES (slp_node) : 1;
   unsigned int badness = 0;
   struct cgra

Re: [PATCH 6/8] gcn: Add else operand to masked loads.

2024-09-11 Thread Andrew Stubbs

On 10/09/2024 10:43, Andrew Stubbs wrote:
> On 06/09/2024 09:47, Robin Dapp wrote:
>>>> So we only found two instances of this problem and both were related to
>>>> _Bools.  In case you have more cases, it would be greatly appreciated
>>>> to verify the series with them.  If you don't mind, would it be possible
>>>> to comment out the zeroing, re-run the testsuite and check for FAILs?
>>>
>>> I looked it up, and it was an execution failure in testcase
>>> gfortran.dg/assumed_rank_1.f90 that prompted me to add the initialization.
>>
>> Ah, I saw that one as well here.  Thanks, will have a look locally.
>
> I ran the tests with the initialization removed, but I'm not too sure what
> to make of the results. There are 3 regressions and 8 progressions, but
> there's no consistency across the different devices (I tested gfx1100,
> gfx908, and gfx90a).
>
> I'm going to rerun the tests and see if it does the same again, or if
> there's some randomness.

There's definitely some random chance at play here: two identical test
runs resulted in different results.


The problem test cases:
   gfortran.dg/all_bounds_1.f90
   gfortran.dg/allocated_4.f90
   gfortran.dg/team_change_1.f90
   gfortran.dg/class_optional_1.f90
   gcc.dg/torture/pr92152.c

Hope that helps

Andrew


Re: [PATCH 6/8] gcn: Add else operand to masked loads.

2024-09-11 Thread Richard Biener
On Wed, 11 Sep 2024, Andrew Stubbs wrote:

> On 10/09/2024 10:43, Andrew Stubbs wrote:
> > On 06/09/2024 09:47, Robin Dapp wrote:
> >>>> So we only found two instances of this problem and both were related to
> >>>> _Bools.  In case you have more cases, it would be greatly appreciated
> >>>> to verify the series with them.  If you don't mind, would it be possible
> >>>> to comment out the zeroing, re-run the testsuite and check for FAILs?
> >>>
> >>> I looked it up, and it was an execution failure in testcase
> >>> gfortran.dg/assumed_rank_1.f90 that prompted me to add the initialization.
> >>
> >> Ah, I saw that one as well here.  Thanks, will have a look locally.
> >>
> > 
> > I ran the tests with the initialization removed, but I'm not too sure what
> > to make of the results. There are 3 regressions and 8 progressions, but
> > there's no consistency across the different devices (I tested gfx1100,
> > gfx908, and gfx90a).
> > 
> > I'm going to rerun the tests and see if it does the same again, or if
> > there's some randomness.
> 
> There's definitely some random chance at play here: two identical test runs
> resulted in different results.

I guess that's what you expect when you have UD elements that in
the end should have a defined value.

Richard.

> The problem test cases:
>    gfortran.dg/all_bounds_1.f90
>    gfortran.dg/allocated_4.f90
>    gfortran.dg/team_change_1.f90
>    gfortran.dg/class_optional_1.f90
>    gcc.dg/torture/pr92152.c
> 
> Hope that helps
> 
> Andrew
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
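
To illustrate the "else operand" discussed in this subthread, here is a minimal sketch in plain C++ (not GCN backend code) of a masked load whose inactive lanes take an explicit else value. The name masked_load and its signature are hypothetical and do not appear in the patch series. With a defined else value such as zero the result is deterministic; if inactive lanes were left undefined, code that later reads them could plausibly vary from run to run, which would be consistent with the inconsistent test results reported above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a masked vector load with an "else" operand:
// active lanes read from memory, inactive lanes receive ELSE_VALUE
// instead of being left undefined.
template <std::size_t N>
std::array<std::uint8_t, N>
masked_load (const std::array<std::uint8_t, N> &mem,
             const std::array<bool, N> &mask,
             std::uint8_t else_value)
{
  std::array<std::uint8_t, N> result{};
  for (std::size_t i = 0; i < N; ++i)
    result[i] = mask[i] ? mem[i] : else_value;
  return result;
}
```

Calling masked_load with else_value 0 models the zeroing that the thread proposes to comment out for the experiment.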

Re: [PATCH RFA] libstdc++: fix C header include guards

2024-09-11 Thread Jonathan Wakely
On Tue, 10 Sept 2024 at 11:12, Jonathan Wakely wrote:
>
> On Tue, 10 Sept 2024 at 05:10, Jason Merrill wrote:
> >
> > Tested x86_64-pc-linux-gnu, OK for trunk?
>
> I'm going to have to do some digging ... I _think_ there's some
> obscure reason for this. Maybe a weird bootstrap situation. Or maybe I
> made that up as rationale for a scripting error.

I couldn't find (or remember) any reason for this, so please push and
hopefully if it breaks something we'll learn about it before GCC 15.1



Re: [PATCH] Fix wrong code out of NRV + RSO + inlining

2024-09-11 Thread Richard Biener
On Wed, Sep 11, 2024 at 12:38 PM Eric Botcazou  wrote:
>
> > Hmm, it looks to me that the IPA analysis marking the variable readonly
> > in the first place is wrong - or that NRV may not be applied to such a
> > variable later.  Is NRV ever applied to say
> >
> > static const S s = ...;
> >
> > s = foo ();
> >
> > thus a readonly declared LHS?
>
> But NRV is only an example and not necessary, as you may read a RESULT_DECL in
> the callee.  So the combination is actually just RSO + inlining if the callee
> happens to read RESULT_DECL.

Sure, but then the same argument is that with RSO marking the variable as
write-only is wrong (or changing the call to use RSO for a write-only
variable) - unless the callee never reads from RESULT_DECL.

Richard.

>
> --
> Eric Botcazou
>
>


Re: [PATCH v3 2/5] Match: Add interface match_cond_with_binary_phi for true/false arg

2024-09-11 Thread Richard Biener
On Wed, Sep 11, 2024 at 8:31 AM  wrote:
>
> From: Pan Li 
>
> When matching the cond with 2 args phi node, we need to figure out
> which arg of phi node comes from the true edge of cond block, as
> well as the false edge.  This patch would like to add interface
> to perform the action and return the true and false arg in TREE type.
>
> There will be some additional handling if one of the arg is INTEGER_CST.
> Because the INTEGER_CST args may have no source block, thus its edge
> source points to the condition block.  See below example in line 31,
> the 255 INTEGER_CST has block 2 as source.  Thus, we need to find
> the non-INTEGER_CST (aka _1) to tell which one is the true/false edge.
> For example, the _1(3) takes block 3 as source, which is the dest
> of false edge of the condition block.
>
>4   │ __attribute__((noinline))
>5   │ uint8_t sat_u_add_imm_type_check_uint8_t_fmt_2 (uint8_t x)
>6   │ {
>7   │   unsigned char _1;
>8   │   unsigned char _2;
>9   │   uint8_t _3;
>   10   │   __complex__ unsigned char _5;
>   11   │
>   12   │ ;;   basic block 2, loop depth 0
>   13   │ ;;pred:   ENTRY
>   14   │   _5 = .ADD_OVERFLOW (x_4(D), 9);
>   15   │   _2 = IMAGPART_EXPR <_5>;
>   16   │   if (_2 != 0)
>   17   │ goto ; [35.00%]
>   18   │   else
>   19   │ goto ; [65.00%]
>   20   │ ;;succ:   3
>   21   │ ;;4
>   22   │
>   23   │ ;;   basic block 3, loop depth 0
>   24   │ ;;pred:   2
>   25   │   _1 = REALPART_EXPR <_5>;
>   26   │ ;;succ:   4
>   27   │
>   28   │ ;;   basic block 4, loop depth 0
>   29   │ ;;pred:   2
>   30   │ ;;3
>   31   │   # _3 = PHI <255(2), _1(3)>
>   32   │   return _3;
>   33   │ ;;succ:   EXIT
>   34   │
>   35   │ }
>
> The below test suites are passed for this patch.
> * The rv64gcv fully regression test.
> * The x86 bootstrap test.
> * The x86 fully regression test.
>
> gcc/ChangeLog:
>
> * gimple-match-head.cc (match_cond_with_binary_phi): Add new func
> impl to match binary phi for true and false arg.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/gimple-match-head.cc | 60 
>  1 file changed, 60 insertions(+)
>
> diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
> index c51728ae742..64f4f28cc72 100644
> --- a/gcc/gimple-match-head.cc
> +++ b/gcc/gimple-match-head.cc
> @@ -490,3 +490,63 @@ match_control_flow_graph_case_1 (basic_block b3, basic_block *b_out)
>*b_out = b0;
>return true;
>  }
> +
> +/*
> + * Return the relevant gcond * of the given phi, as well as the true
> + * and false TREE args of the phi.  Or return NULL.
> + *
> + * If matched the gcond *, the output argument TREE true_arg and false_arg
> + * will be updated to the relevant args of phi.
> + *
> + * If failed to match, NULL gcond * will be returned, as well as the output
> + * arguments will be set to NULL_TREE.
> + */
> +
> +static inline gcond *
> +match_cond_with_binary_phi (gphi *phi, tree *true_arg, tree *false_arg)
> +{
> +  basic_block cond_block;
> +  *true_arg = *false_arg = NULL_TREE;
> +
> +  if (gimple_phi_num_args (phi) != 2)
> +return NULL;
> +
> +  if (!match_control_flow_graph_case_0 (gimple_bb (phi), &cond_block)
> +  && !match_control_flow_graph_case_1 (gimple_bb (phi), &cond_block))
> +return NULL;
> +
> +  gcond *cond = safe_dyn_cast  (*gsi_last_bb (cond_block));
> +
> +  if (!cond || EDGE_COUNT (cond_block->succs) != 2)
> +return NULL;
> +
> +  tree t0 = gimple_phi_arg_def (phi, 0);
> +  tree t1 = gimple_phi_arg_def (phi, 1);
> +  edge e0 = gimple_phi_arg_edge (phi, 0);
> +  edge e1 = gimple_phi_arg_edge (phi, 1);
> +
> +  if (TREE_CODE (t0) == INTEGER_CST && TREE_CODE (t1) == INTEGER_CST)
> +return NULL;
> +
> +  bool arg_0_cst_p = TREE_CODE (t0) == INTEGER_CST;
> +  edge arg_edge = arg_0_cst_p ? e1 : e0;
> +  tree arg = arg_0_cst_p ? t1 : t0;
> +  tree other_arg = arg_0_cst_p ? t0 : t1;

why would arg_edge depend on whether t0 is INTEGER_CST or not?

> +  edge cond_e0 = EDGE_SUCC (cond_block, 0);
> +  edge cond_e1 = EDGE_SUCC (cond_block, 1);
> +  edge matched_edge = arg_edge->src == cond_e0->dest ? cond_e0 : cond_e1;
> +
> +  if (matched_edge->flags & EDGE_TRUE_VALUE)

I don't think this works reliably like for case 0 if cond_e0 leads to
the PHI block?

Can you instead inline match_control_flow_graph_case_0 and _1 and do the
argument assignment within the three cases of CFGs we accept?  That
would be much easier to follow.

> +{
> +  *true_arg = arg;
> +  *false_arg = other_arg;
> +}
> +  else
> +{
> +  *false_arg = arg;
> +  *true_arg = other_arg;
> +}
> +
> +  return cond;
> +}
> --
> 2.43.0
>
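
As a rough illustration of the review comment above (decide true/false per PHI argument from the condition block's outgoing edge, not from which argument happens to be a constant), here is a self-contained C++ model. It is only a sketch under stated assumptions: Block, Edge, PhiArg, cond_edge_for and classify are hypothetical stand-ins, not GCC's basic_block/edge API, and only the "direct edge" and "one intermediate block" shapes are modelled.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified CFG model (not GCC data structures).
enum EdgeFlag { EDGE_TRUE, EDGE_FALSE, EDGE_FALLTHRU };

struct Block;
struct Edge { const Block *src; const Block *dest; EdgeFlag flag; };
struct Block { const Edge *single_pred; };  // null unless exactly one pred

struct PhiArg { std::string value; const Edge *edge; };
struct Result { bool ok; std::string true_arg; std::string false_arg; };

// Return the edge leaving COND that control takes to reach the PHI
// through E, or null if the shape is not one we handle.
const Edge *
cond_edge_for (const Edge *e, const Block *cond)
{
  if (e->src == cond)
    return e;                       // COND jumps straight to the PHI block
  const Edge *p = e->src->single_pred;
  if (p && p->src == cond)
    return p;                       // COND -> middle block -> PHI block
  return nullptr;
}

// Assign each PHI argument to the true or false side of COND.
Result
classify (const PhiArg &a0, const PhiArg &a1, const Block *cond)
{
  const Edge *c0 = cond_edge_for (a0.edge, cond);
  const Edge *c1 = cond_edge_for (a1.edge, cond);
  if (!c0 || !c1 || c0 == c1)
    return {false, "", ""};
  if (c0->flag == EDGE_TRUE && c1->flag == EDGE_FALSE)
    return {true, a0.value, a1.value};
  if (c0->flag == EDGE_FALSE && c1->flag == EDGE_TRUE)
    return {true, a1.value, a0.value};
  return {false, "", ""};
}
```

For the dump quoted in the patch description (PHI <255(2), _1(3)>, with block 3 the destination of the false edge), this model yields 255 as the true argument and _1 as the false argument, without looking at whether either argument is an INTEGER_CST.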


Is it possible to get an old posted patch from the gcc-patches archive?

2024-09-11 Thread Qing Zhao
Hi,

I was trying to study an old patch (posted, but never approved or committed)
at the following link:

https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561096.html

However, I cannot get the patch from the attachment, which shows up as follows:

-- next part --
A non-text attachment was scrubbed...
Name: 0001-Refactor-frecord-gcc-switches.patch
Type: text/x-patch
Size: 24450 bytes
Desc: not available
URL: 


I tried clicking the URL on the last line above, but that failed too.
Is there another way to get the old patch?

thanks.

Qing



Re: [PATCH v4] Provide new GCC builtin __builtin_counted_by_ref [PR116016]

2024-09-11 Thread Qing Zhao


> On Sep 10, 2024, at 17:47, Jakub Jelinek  wrote:
> 
> On Tue, Sep 10, 2024 at 09:28:04PM +, Qing Zhao wrote:
>> @@ -11741,6 +11770,54 @@ c_parser_postfix_expression (c_parser *parser)
>>set_c_expr_source_range (&expr, loc, close_paren_loc);
>>break;
>>  }
>> + case RID_BUILTIN_COUNTED_BY_REF:
>> +  {
>> +vec *cexpr_list;
>> +c_expr_t *e_p;
>> +location_t close_paren_loc;
>> +
>> +in_builtin_counted_by_ref = true;
>> +
>> +c_parser_consume_token (parser);
>> +if (!c_parser_get_builtin_args (parser,
>> +"__builtin_counted_by_ref",
>> +&cexpr_list, false,
>> +&close_paren_loc))
>> +  {
>> + expr.set_error ();
>> + goto error_exit;
> 
> Up to Joseph or Marek as C maintainers/reviewers, but I think it is a bad
> idea to use such a generic name for a label inside of handling of one
> specific keyword.
> 
> Either use RAII and just break; instead of goto error_exit;, like
>struct in_builtin_counted_by_ref_sentinel {
>  ~in_builtin_counted_by_ref_sentinel ()
>  { in_builtin_counted_by_ref = false; }
>} ibcbr_sentinel;
> or add those in_builtin_counted_by_ref = false; lines before each break;

Okay, I can add in_builtin_counted_by_ref = false lines before each break.
That might be the simplest and most straightforward way to address this concern. -:)

> 
> Or set it just when parsing the args?
> 
> Anyway, I'm not even convinced a global variable like that is a good idea.
> The argument can contain arbitrary expressions in there (e.g. comma
> expression, statement expression, ...), I strongly doubt you want to
> have that special handling in all the places in the grammar rather than just
> for the last COMPONENT_REF in there.  And, there is no reason why
> you couldn't have e.g. nested call inside of the argument:
> __builtin_counted_by_ref (ptr[*__builtin_counted_by_ref (something->fam1)]->fam2)
> and that on the other side clears in_builtin_counted_by_ref after parsing
> the inner one.
Good point here. So, instead of a boolean, it might be more appropriate to use
an int that records the nesting level, the same as the other globals such as
in_typeof and in_sizeof.

What’s your suggestion here?

thanks.

Qing
> 
> Jakub
> 
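
A minimal sketch of how the two suggestions in this message could be combined: an RAII sentinel, as Jakub proposes, that maintains a nesting depth instead of a boolean, as Qing proposes. This is not the patch's code; in_builtin_counted_by_ref_depth, counted_by_ref_sentinel and the toy parse_counted_by_ref function are hypothetical names used only for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for the parser state discussed in the thread:
// a depth counter rather than a bool, so nested uses do not clobber
// each other.
static int in_builtin_counted_by_ref_depth = 0;

// RAII sentinel: increments on entry, decrements on every exit path,
// so no explicit reset is needed before each break or early return.
struct counted_by_ref_sentinel
{
  counted_by_ref_sentinel () { ++in_builtin_counted_by_ref_depth; }
  ~counted_by_ref_sentinel () { --in_builtin_counted_by_ref_depth; }
  counted_by_ref_sentinel (const counted_by_ref_sentinel &) = delete;
  counted_by_ref_sentinel &operator= (const counted_by_ref_sentinel &) = delete;
};

// Toy "parser": NESTING is how many nested builtin calls appear inside
// the argument; FAIL_AT_INNERMOST simulates an early error exit.
// Returns the depth seen at the innermost level, or -1 on error.
int
parse_counted_by_ref (int nesting, bool fail_at_innermost)
{
  counted_by_ref_sentinel sentinel;
  if (nesting == 0)
    {
      if (fail_at_innermost)
        return -1;  // early exit; the destructor still restores the depth
      return in_builtin_counted_by_ref_depth;
    }
  return parse_counted_by_ref (nesting - 1, fail_at_innermost);
}
```

With a counter, parsing the inner call in the nested example above leaves the outer call's state intact, and the error path needs no dedicated label. Whether a global of any kind is the right design is the separate question Jakub raises.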



Re: Is it possible to get an old posted patch from the gcc-patches archive?

2024-09-11 Thread Arsen Arsenović
Qing Zhao  writes:

> Hi,
>
> I was trying to study an old posted patch (but was not approved and 
> committed) in the following link:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561096.html
>
> However, I cannot get the patch from the attachment part as following:
>
> -- next part --
> A non-text attachment was scrubbed...
> Name: 0001-Refactor-frecord-gcc-switches.patch
> Type: text/x-patch
> Size: 24450 bytes
> Desc: not available
> URL: 
> 
>
> I tried to click the URL link at the last line above, also failed…
> Is there other way to get the old patch?

I can download both of the old URLs.  But also:

https://inbox.sourceware.org/9159005a-ff96-3db2-65eb-7d5be3faf...@suse.cz/

(you can get the message ID from a meta tag header, it is in an
In-Reply-To in the HTML file, and then give it to public-inbox)

Hope that helps, have a lovely day.
-- 
Arsen Arsenović




Re: [PATCH] c++: ICE with TTP [PR96097]

2024-09-11 Thread Patrick Palka
On Wed, 4 Sep 2024, Marek Polacek wrote:

> On Wed, Sep 04, 2024 at 10:58:25AM -0400, Jason Merrill wrote:
> > On 9/3/24 6:12 PM, Marek Polacek wrote:
> > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/14?
> > 
> > The change to return bool seems like unrelated cleanup; please push that
> > separately on trunk only.
> 
> Done.
>  
> > > +   /* We can also have:
> > > +
> > > +   template  typename X>
> > > +   void func() {}
> > > +   template 
> > > +   struct Y {};
> > > +   void g() { func(); }
> > > +
> > > +  where we are not in a template, but the type of PARM is T::type
> > > +  and dependent_type_p doesn't want to see a TEMPLATE_TYPE_PARM
> > > +  outside a template.  */
> > > +   ++processing_template_decl;
> > > tree t = tsubst (TREE_TYPE (parm), outer_args, complain, 
> > > in_decl);
> > > +   --processing_template_decl;
> > > if (!uses_template_parms (t)
> > > && !same_type_p (t, TREE_TYPE (arg)))
> > 
> > This looks like the pattern Patrick just removed from type_unification_real
> > for PR101463.  Do we want to make the same change here?

The pattern does seem similar, but I can't see the benefit of making the
same change here.  In type_unification_real the change made dependent vs
non-dependent deduction more uniform by removing the dependence tests.

But nothing can be made more uniform here AFAICT.  At best we can use
find_template_parameters to see if 'parm' depends on outer template
parameters and avoid the substitution if so.  But both uses_template_parms
tests need to remain, I think.  Even if 'parm' doesn't depend on outer
template parameters, it seems we still want to succeed if it's overall
dependent as in the testcase from the PR.  Likewise if 'arg' is overall
dependent and 'parm' isn't.

> 
> Interesting.  Sorry, this may be silly, but I'm not sure I can do that
> here: find_template_parameters wants "the template parameters in scope"
> but I don't think I have that here.

You could probably pass the DECL_TEMPLATE_PARMS of parm_tmpl or arg_tmpl
of its caller.



Re: [PATCH] s390: Fix TF to FPRX2 conversion [PR115860]

2024-09-11 Thread Stefan Schulze Frielinghaus
On Wed, Sep 11, 2024 at 01:59:48PM +0200, Ilya Leoshkevich wrote:
> On Wed, 2024-09-11 at 13:34 +0200, Stefan Schulze Frielinghaus wrote:
> > On Wed, Sep 11, 2024 at 01:22:30PM +0200, Ilya Leoshkevich wrote:
> > > > On Wed, 2024-09-11 at 12:35 +0200, Stefan Schulze Frielinghaus wrote:
> > > > > On Wed, Sep 11, 2024 at 11:47:54AM +0200, Ilya Leoshkevich wrote:
> > > > > > On Fri, 2024-08-16 at 09:41 +0200, Stefan Schulze Frielinghaus wrote:
> > > > > > > Currently subregs originating from *tf_to_fprx2_0 and *tf_to_fprx2_1
> > > > > > > survive register allocation.  This in turn leads to wrong register
> > > > > > > renaming.  Keeping the current approach would mean we need two insns for
> > > > > > > *tf_to_fprx2_0 and *tf_to_fprx2_1, respectively.  Something along the
> > > > > > > lines
> > > > > > > 
> > > > > > > (define_insn "*tf_to_fprx2_0"
> > > > > > >   [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "=f") 0)
> > > > > > >     (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
> > > > > > >    UNSPEC_TF_TO_FPRX2_0))]
> > > > > > >   "TARGET_VXE"
> > > > > > >   "#")
> > > > > > > 
> > > > > > > (define_insn "*tf_to_fprx2_0"
> > > > > > >   [(set (match_operand:DF 0 "nonimmediate_operand" "=f")
> > > > > > >     (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
> > > > > > >    UNSPEC_TF_TO_FPRX2_0))]
> > > > > > >   "TARGET_VXE"
> > > > > > >   "vpdi\t%v0,%v1,%v0,1"
> > > > > > >   [(set_attr "op_type" "VRR")])
> > > > > > > 
> > > > > > > and similar for *tf_to_fprx2_1.  Note, pre register allocation operand 0
> > > > > > > has mode FPRX2 and afterwards DF once subregs have been eliminated.
> > > > > > > 
> > > > > > > Since we always copy a whole vector register into a floating-point
> > > > > > > register pair, another way to fix this is to merge *tf_to_fprx2_0 and
> > > > > > > *tf_to_fprx2_1 into a single insn which means we don't have to use
> > > > > > > subregs at all.  The downside of this is that the assembler template
> > > > > > > contains two instructions, now.  The upside is that we don't have to
> > > > > > > come up with some artificial insn before RA which might be more
> > > > > > > readable/maintainable.  That is implemented by this patch.
> > > > > > > 
> > > > > > > In commit r11-4872-ge627cda5686592, the output operand specifier %V was
> > > > > > > introduced which is used in tf_to_fprx2 only, now.  I didn't come up
> > > > > > > with its counterpart like %F for floating-point registers.  Instead I
> > > > > > > printed the register pair in the output function directly.  This spares
> > > > > > > us a new and "rare" format specifier for a single insn.  I don't have a
> > > > > > > strong opinion which option to choose, however, we should either add %F
> > > > > > > in order to mimic the same behaviour as %V or get rid of %V and
> > > > > > > inline the logic in the output function.  I lean towards the latter.
> > > > > > > Any preferences?
> > > > > > > ---
> > > > > > >  gcc/config/s390/s390.md    |  2 +
> > > > > > >  gcc/config/s390/vector.md  | 66 +++-
> > > > > > >  gcc/testsuite/gcc.target/s390/pr115860-1.c | 26 +
> > > > > > >  3 files changed, 60 insertions(+), 34 deletions(-)
> > > > > > >  create mode 100644 gcc/testsuite/gcc.target/s390/pr115860-1.c
> > > > > 
> > > > > [...]
> > > > > 
> > > > > > +  char buf[64];
> > > > > > +  switch (which_alternative)
> > > > > > +    {
> > > > > > +    case 0:
> > > > > > +  if (REGNO (operands[0]) == REGNO (operands[1]))
> > > > > > +   return "vpdi\t%V0,%v1,%V0,5";
> > > > > > +  else
> > > > > > +   return "ldr\t%f0,%f1;vpdi\t%V0,%v1,%V0,5";
> > > > > > +    case 1:
> > > > > > +  {
> > > > > > +   const char *reg_pair = reg_names[REGNO (operands[0])
> > > > > > +
> > > > > > 1];
> > > > > > +   snprintf (buf, sizeof (buf),
> > > > > > "ld\t%%f0,%%1;ld\t%%%s,8+%%1",
> > > > > > reg_pair);
> > > > > 
> > > > > I wonder if there is a corner case where 8+ does not fit into
> > > > > short
> > > > > displacement?
> > > > 
> > > > That is covered by constraint AR, i.e., for short displacement,
> > > > and
> > > > AT
> > > > for long displacement.
> > > 
> > > Don't they cover only %1, and not 8+%1? Can't there be a situation
> > > where %1 barely fits and 8+%1 doesn't fit? A quick glance shows
> > > that
> > > the code doesn't leave any allowance for this:
> > > 
> > > "AR"
> > >   s390_mem_constraint("AR")
> > >     s390_check_qrst_address('R')
> > >
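
The corner case Ilya asks about can be stated as plain arithmetic. The sketch below is not s390 backend code; fits_short_disp and pair_fits_short_disp are hypothetical helpers, and the only fact assumed is that a short displacement on s390 is a 12-bit unsigned value (0 to 4095). It shows that checking only the first displacement is not sufficient when a second access at offset +8 uses the same addressing form.

```cpp
#include <cassert>

// Assumption: short displacements are 12-bit unsigned, i.e. 0..4095.
constexpr long short_disp_max = 4095;

// Hypothetical helper: does displacement D fit the short form?
constexpr bool
fits_short_disp (long d)
{
  return d >= 0 && d <= short_disp_max;
}

// Hypothetical helper: do both D and D + SECOND_OFFSET fit?  This is the
// condition needed when two loads at %1 and 8+%1 are emitted together.
constexpr bool
pair_fits_short_disp (long d, long second_offset = 8)
{
  return fits_short_disp (d) && fits_short_disp (d + second_offset);
}
```

For example, a displacement of 4088 fits on its own, but 4088 + 8 = 4096 does not, so a check covering only %1 would accept an address whose second half is out of range.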

Re: [PATCH] c++: ICE with TTP [PR96097]

2024-09-11 Thread Patrick Palka
On Wed, 11 Sep 2024, Patrick Palka wrote:

> On Wed, 4 Sep 2024, Marek Polacek wrote:
> 
> > On Wed, Sep 04, 2024 at 10:58:25AM -0400, Jason Merrill wrote:
> > > On 9/3/24 6:12 PM, Marek Polacek wrote:
> > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/14?
> > > 
> > > The change to return bool seems like unrelated cleanup; please push that
> > > separately on trunk only.
> > 
> > Done.
> >  
> > > > + /* We can also have:
> > > > +
> > > > + template  
> > > > typename X>
> > > > + void func() {}
> > > > + template 
> > > > + struct Y {};
> > > > + void g() { func(); }
> > > > +
> > > > +where we are not in a template, but the type of PARM is 
> > > > T::type
> > > > +and dependent_type_p doesn't want to see a 
> > > > TEMPLATE_TYPE_PARM
> > > > +outside a template.  */

... so the patch LGTM, except I'd prefer to not have this comment
containing an embedded specific testcase.  IMHO it's "understood" that
processing_template_decl needs to be set when substituting using an
incomplete set of arguments since in that case the result must be
templated.

> > > > + ++processing_template_decl;
> > > >   tree t = tsubst (TREE_TYPE (parm), outer_args, complain, 
> > > > in_decl);
> > > > + --processing_template_decl;
> > > >   if (!uses_template_parms (t)
> > > >   && !same_type_p (t, TREE_TYPE (arg)))
> > > 
> > > This looks like the pattern Patrick just removed from 
> > > type_unification_real
> > > for PR101463.  Do we want to make the same change here?
> 
> The pattern does seem similar, but I can't see the benefit of making the
> same change here.  In type_unification_real the change made dependent vs
> non-dependent deduction more uniform by removing the dependence tests.
> 
> But nothing can be made more uniform here AFAICT.  At best we can use
> find_template_parameters to see if 'parm' depends on outer template
> parameters and avoid the substitution if so.  But both uses_template_parms
> tests need to remain, I think.  Even if 'parm' doesn't depend on outer
> template parameters, it seems we still want to succeed if it's overall
> dependent as in the testcase from the PR.  Likewise if 'arg' is overall
> dependent and 'parm' isn't.
> 
> > 
> > Interesting.  Sorry, this may be silly, but I'm not sure I can do that
> > here: find_template_parameters wants "the template parameters in scope"
> > but I don't think I have that here.
> 
> You could probably pass the DECL_TEMPLATE_PARMS of parm_tmpl or arg_tmpl
> of its caller.
> 



Re: [PATCH] c++: ICE with TTP [PR96097]

2024-09-11 Thread Patrick Palka
On Wed, 11 Sep 2024, Patrick Palka wrote:

> On Wed, 11 Sep 2024, Patrick Palka wrote:
> 
> > On Wed, 4 Sep 2024, Marek Polacek wrote:
> > 
> > > On Wed, Sep 04, 2024 at 10:58:25AM -0400, Jason Merrill wrote:
> > > > On 9/3/24 6:12 PM, Marek Polacek wrote:
> > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/14?
> > > > 
> > > > The change to return bool seems like unrelated cleanup; please push that
> > > > separately on trunk only.
> > > 
> > > Done.
> > >  
> > > > > +   /* We can also have:
> > > > > +
> > > > > +   template  
> > > > > typename X>
> > > > > +   void func() {}
> > > > > +   template 
> > > > > +   struct Y {};
> > > > > +   void g() { func(); }
> > > > > +
> > > > > +  where we are not in a template, but the type of PARM is 
> > > > > T::type
> > > > > +  and dependent_type_p doesn't want to see a 
> > > > > TEMPLATE_TYPE_PARM
> > > > > +  outside a template.  */
> 
> ... so the patch LGTM, except I'd prefer to not have this comment
> containing an embedded specific testcase.  IMHO it's "understood" that
> processing_template_decl needs to be set when substituting using an
> incomplete set of arguments since in that case the result must be
> templated.

... and the comment might make this instance of the pattern seem more
like an exceptional case rather than a general rule, which paradoxically
could make the code seem more complex than it is at first glance.

> 
> > > > > +   ++processing_template_decl;
> > > > > tree t = tsubst (TREE_TYPE (parm), outer_args, complain, 
> > > > > in_decl);
> > > > > +   --processing_template_decl;
> > > > > if (!uses_template_parms (t)
> > > > > && !same_type_p (t, TREE_TYPE (arg)))
> > > > 
> > > > This looks like the pattern Patrick just removed from 
> > > > type_unification_real
> > > > for PR101463.  Do we want to make the same change here?
> > 
> > The pattern does seem similar, but I can't see the benefit of making the
> > same change here.  In type_unification_real the change made dependent vs
> > non-dependent deduction more uniform by removing the dependence tests.
> > 
> > But nothing can be made more uniform here AFAICT.  At best we can use
> > find_template_parameters to see if 'parm' depends on outer template
> > parameters and avoid the substitution if so.  But both uses_template_parms
> > tests need to remain, I think.  Even if 'parm' doesn't depend on outer
> > template parameters, it seems we still want to succeed if it's overall
> > dependent as in the testcase from the PR.  Likewise if 'arg' is overall
> > dependent and 'parm' isn't.
> > 
> > > 
> > > Interesting.  Sorry, this may be silly, but I'm not sure I can do that
> > > here: find_template_parameters wants "the template parameters in scope"
> > > but I don't think I have that here.
> > 
> > You could probably pass the DECL_TEMPLATE_PARMS of parm_tmpl or arg_tmpl
> > of its caller.
> > 
> 



Re: [PATCH] Makefile: Fix typos

2024-09-11 Thread Eric Gallager
On Wed, Sep 11, 2024 at 7:32 AM Andrew Kreimer  wrote:
>
> Fix typos in comments.
>
> Signed-off-by: Andrew Kreimer 
> ---
>  Makefile.def | 2 +-
>  Makefile.in  | 4 ++--
>  Makefile.tpl | 4 ++--
>  3 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/Makefile.def b/Makefile.def
> index 19954e7d731..b502eb63d36 100644
> --- a/Makefile.def
> +++ b/Makefile.def
> @@ -77,7 +77,7 @@ host_modules= { module= gprofng; };
>  host_modules= { module= gettext; bootstrap=true; no_install=true;
>  module_srcdir= "gettext/gettext-runtime";
> // We always build gettext with pic, because some packages 
> (e.g. gdbserver)
> -   // need it in some configuratons, which is determined via 
> nontrivial tests.
> +   // need it in some configurations, which is determined via 
> nontrivial tests.
> // Always enabling pic seems to make sense for something tied 
> to
> // user-facing output.
> extra_configure_flags='--disable-shared --disable-threads 
> --disable-java --disable-csharp --with-pic --disable-libasprintf';
> diff --git a/Makefile.in b/Makefile.in
> index 966d6045496..0c3511d2cf1 100644
> --- a/Makefile.in
> +++ b/Makefile.in
> @@ -666,7 +666,7 @@ AR_FOR_TARGET=@AR_FOR_TARGET@
>  AS_FOR_TARGET=@AS_FOR_TARGET@
>  CC_FOR_TARGET=$(STAGE_CC_WRAPPER) @CC_FOR_TARGET@
>
> -# If GCC_FOR_TARGET is not overriden on the command line, then this
> +# If GCC_FOR_TARGET is not overridden on the command line, then this
>  # variable is passed down to the gcc Makefile, where it is used to
>  # build libgcc2.a.  We define it here so that it can itself be
>  # overridden on the command line.
> @@ -68937,7 +68937,7 @@ install-gdb: $(INSTALL_GDB_TK)
>  @serialization_dependencies@
>
>  # 
> -# Regenerating top level configury
> +# Regenerating top level configure
>  # 

No, "configury" is correct here. "configure" refers to just the script
actually named "configure", while "configury" with a "y" includes all
of the helper scripts and stuff that autotools installs to ensure that
the configure script can run properly.

>
>  # Rebuilding Makefile.in, using autogen.
> diff --git a/Makefile.tpl b/Makefile.tpl
> index da38dca697a..b32dd1e4583 100644
> --- a/Makefile.tpl
> +++ b/Makefile.tpl
> @@ -589,7 +589,7 @@ AR_FOR_TARGET=@AR_FOR_TARGET@
>  AS_FOR_TARGET=@AS_FOR_TARGET@
>  CC_FOR_TARGET=$(STAGE_CC_WRAPPER) @CC_FOR_TARGET@
>
> -# If GCC_FOR_TARGET is not overriden on the command line, then this
> +# If GCC_FOR_TARGET is not overridden on the command line, then this
>  # variable is passed down to the gcc Makefile, where it is used to
>  # build libgcc2.a.  We define it here so that it can itself be
>  # overridden on the command line.
> @@ -2129,7 +2129,7 @@ install-gdb: $(INSTALL_GDB_TK)
>  @serialization_dependencies@
>
>  # 
> -# Regenerating top level configury
> +# Regenerating top level configure
>  # 
>
>  # Rebuilding Makefile.in, using autogen.
> --
> 2.46.0
>


FW: [PATCH 2/2] RISC-V: Eliminate latter vsetvl when fused

2024-09-11 Thread Bohan Lei
Hi Juzhe,

> Could you show me what the codegen looks like after this patch?
> I would be expecting the codegen to become:
> 
> foo:
>         vsetvli a5,a0,e16,m1,ta,ma
>         vmv.x.s a4,v8
>         vadd.vx v9,v8,a4
>         vsetvli zero,a5,e16,m1,ta,ma
>         vadd.vv v8,v9,v8
>         ret
> Or:
> foo:
>         vsetvli zero,a0,e16,m1,ta,ma
>         vmv.x.s a4,v8
>         vadd.vx v9,v8,a4
>         vadd.vv v8,v9,v8
>         ret
> are both correct.

The former one with two vsetvli instructions is generated after the patch.

> Also, I think it's better to add an assembly check in the testcase instead of just
> adding "Eliminate insn" "vsetvl"

Okay.

Thanks,
Bohan

Re: [PATCH] Makefile: Fix typos

2024-09-11 Thread Andrew Kreimer
On Wed, Sep 11, 2024 at 11:06:40AM -0400, Eric Gallager wrote:
> On Wed, Sep 11, 2024 at 7:32 AM Andrew Kreimer  wrote:
> >
> > Fix typos in comments.
> >
> > Signed-off-by: Andrew Kreimer 
> > ---
> >  Makefile.def | 2 +-
> >  Makefile.in  | 4 ++--
> >  Makefile.tpl | 4 ++--
> >  3 files changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/Makefile.def b/Makefile.def
> > index 19954e7d731..b502eb63d36 100644
> > --- a/Makefile.def
> > +++ b/Makefile.def
> > @@ -77,7 +77,7 @@ host_modules= { module= gprofng; };
> >  host_modules= { module= gettext; bootstrap=true; no_install=true;
> >  module_srcdir= "gettext/gettext-runtime";
> > // We always build gettext with pic, because some packages 
> > (e.g. gdbserver)
> > -   // need it in some configuratons, which is determined via 
> > nontrivial tests.
> > +   // need it in some configurations, which is determined via 
> > nontrivial tests.
> > // Always enabling pic seems to make sense for something 
> > tied to
> > // user-facing output.
> > extra_configure_flags='--disable-shared --disable-threads 
> > --disable-java --disable-csharp --with-pic --disable-libasprintf';
> > diff --git a/Makefile.in b/Makefile.in
> > index 966d6045496..0c3511d2cf1 100644
> > --- a/Makefile.in
> > +++ b/Makefile.in
> > @@ -666,7 +666,7 @@ AR_FOR_TARGET=@AR_FOR_TARGET@
> >  AS_FOR_TARGET=@AS_FOR_TARGET@
> >  CC_FOR_TARGET=$(STAGE_CC_WRAPPER) @CC_FOR_TARGET@
> >
> > -# If GCC_FOR_TARGET is not overriden on the command line, then this
> > +# If GCC_FOR_TARGET is not overridden on the command line, then this
> >  # variable is passed down to the gcc Makefile, where it is used to
> >  # build libgcc2.a.  We define it here so that it can itself be
> >  # overridden on the command line.
> > @@ -68937,7 +68937,7 @@ install-gdb: $(INSTALL_GDB_TK)
> >  @serialization_dependencies@
> >
> >  # 
> > -# Regenerating top level configury
> > +# Regenerating top level configure
> >  # 
> 
> No, "configury" is correct here. "configure" refers to just the script
> actually named "configure", while "configury" with a "y" includes all
> of the helper scripts and stuff that autotools installs to ensure that
> the configure script can run properly.
> 

My bad!

> >
> >  # Rebuilding Makefile.in, using autogen.
> > diff --git a/Makefile.tpl b/Makefile.tpl
> > index da38dca697a..b32dd1e4583 100644
> > --- a/Makefile.tpl
> > +++ b/Makefile.tpl
> > @@ -589,7 +589,7 @@ AR_FOR_TARGET=@AR_FOR_TARGET@
> >  AS_FOR_TARGET=@AS_FOR_TARGET@
> >  CC_FOR_TARGET=$(STAGE_CC_WRAPPER) @CC_FOR_TARGET@
> >
> > -# If GCC_FOR_TARGET is not overriden on the command line, then this
> > +# If GCC_FOR_TARGET is not overridden on the command line, then this
> >  # variable is passed down to the gcc Makefile, where it is used to
> >  # build libgcc2.a.  We define it here so that it can itself be
> >  # overridden on the command line.
> > @@ -2129,7 +2129,7 @@ install-gdb: $(INSTALL_GDB_TK)
> >  @serialization_dependencies@
> >
> >  # 
> > -# Regenerating top level configury
> > +# Regenerating top level configure
> >  # 
> >
> >  # Rebuilding Makefile.in, using autogen.
> > --
> > 2.46.0
> >


Re: [PATCH v1] RISC-V: Fix asm check for Vector SAT_* due to middle-end change

2024-09-11 Thread Jeff Law




On 9/10/24 5:03 PM, pan2...@intel.com wrote:

From: Pan Li 

The middle-end change makes the effect on the layout of the assembly
for vector SAT_*.  This patch would like to fix it and make it robust.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c: Adjust
asm check and make it robust.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-13.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-14.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-15.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-17.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-18.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-19.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-20.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-21.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-22.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-23.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-24.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-25.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-26.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-27.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-28.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-29.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-30.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-31.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-32.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-13.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-14.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-15.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-17.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-18.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-19.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-20.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-21.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-22.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-23.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-24.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-25.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-26.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-27.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-28.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-29.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-30.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-31.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-32.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-33.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-34.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-35.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-36.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-37.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-38.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-39.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-4.

Re: [PATCH] c++: ICE with TTP [PR96097]

2024-09-11 Thread Jason Merrill

On 9/11/24 10:53 AM, Patrick Palka wrote:

On Wed, 11 Sep 2024, Patrick Palka wrote:


On Wed, 11 Sep 2024, Patrick Palka wrote:


On Wed, 4 Sep 2024, Marek Polacek wrote:


On Wed, Sep 04, 2024 at 10:58:25AM -0400, Jason Merrill wrote:

On 9/3/24 6:12 PM, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/14?


The change to return bool seems like unrelated cleanup; please push that
separately on trunk only.


Done.
  

+ /* We can also have:
+
+ template  typename X>
+ void func() {}
+ template 
+ struct Y {};
+ void g() { func(); }
+
+where we are not in a template, but the type of PARM is T::type
+and dependent_type_p doesn't want to see a TEMPLATE_TYPE_PARM
+outside a template.  */


... so the patch LGTM, except I'd prefer to not have this comment
containing an embedded specific testcase.  IMHO it's "understood" that
processing_template_decl needs to be set when substituting using an
incomplete set of arguments since in that case the result must be
templated.


... and the comment might make this instance of the pattern seem more
like an exceptional case rather than a general rule, which paradoxically
could make the code seem more complex than it is at first glance.


Makes sense to me.

Jason



Re: [PATCH] c++: decltype(auto) deduction of statement-expression [PR116418]

2024-09-11 Thread Jason Merrill

On 9/10/24 8:35 PM, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/backports?


OK, though you might combine the new STMT_EXPR case with the existing 
LAMBDA_EXPR case; the principle is the same for both.



-- >8 --

r8-7538 for PR84968 made strip_typedefs_expr diagnose seeing
STATEMENT_LIST, which effectively makes us reject statement-expressions
in noexcept-specifiers (we already diagnose them in template arguments
at parse time).

Later r11-7452 made decltype(auto) deduction do strip_typedefs_expr on
the expression before deducing (as an implementation detail) and so ever
since we inadvertently reject decltype(auto) deduction of a
statement-expression.

This patch just removes the diagnostic in strip_typedefs_expr; it doesn't
seem like the right place for it.  And it lets us accept more code using
statement-expressions in various contexts.
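To make the deduction point concrete: decltype(auto) deduces from the form of the initializer expression, not only from its type, which is presumably why the deduction code has to look at the expression at all. The sketch below uses standard C++14 only. The names global_value, by_name, by_expr and by_prvalue are made up for illustration, and the statement-expression case from this patch is deliberately left out because compilers predating the fix reject it.

```cpp
#include <type_traits>

int global_value = 3;  // hypothetical variable, for illustration only

// Unparenthesized id-expression: decltype yields the declared type, int.
decltype(auto) by_name() { return global_value; }

// Parenthesized lvalue expression: decltype yields int&.
decltype(auto) by_expr() { return (global_value); }

// Prvalue expression: decltype yields int.
decltype(auto) by_prvalue() { return global_value + 0; }

static_assert(std::is_same<decltype(by_name()), int>::value,
              "id-expression deduces the declared type");
static_assert(std::is_same<decltype(by_expr()), int&>::value,
              "parenthesized lvalue deduces a reference");
static_assert(std::is_same<decltype(by_prvalue()), int>::value,
              "prvalue deduces a non-reference type");
```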

PR c++/116418
PR c++/84968

gcc/cp/ChangeLog:

* tree.cc (strip_typedefs_expr) <case STATEMENT_LIST>: Replace
with ...
<case STMT_EXPR>: ... this non-diagnosing early exit.

gcc/testsuite/ChangeLog:

* g++.dg/eh/pr84968.C: No longer expect an ahead of time diagnostic
for the statement-expression.  Instantiate the template and expect
an incomplete type error instead.
* g++.dg/ext/stmtexpr26.C: New test.
---
  gcc/cp/tree.cc|  5 ++---
  gcc/testsuite/g++.dg/eh/pr84968.C |  4 +++-
  gcc/testsuite/g++.dg/ext/stmtexpr26.C | 10 ++
  3 files changed, 15 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/ext/stmtexpr26.C

diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
index 31ecbb1ac79..a150a91f2fa 100644
--- a/gcc/cp/tree.cc
+++ b/gcc/cp/tree.cc
@@ -2011,9 +2011,8 @@ strip_typedefs_expr (tree t, bool *remove_attributes, 
unsigned int flags)
  case LAMBDA_EXPR:
return t;
  
-case STATEMENT_LIST:

-  error ("statement-expression in a constant expression");
-  return error_mark_node;
+case STMT_EXPR:
+  return t;
  
  default:

break;
diff --git a/gcc/testsuite/g++.dg/eh/pr84968.C 
b/gcc/testsuite/g++.dg/eh/pr84968.C
index 23c49f477a8..a6e21914eed 100644
--- a/gcc/testsuite/g++.dg/eh/pr84968.C
+++ b/gcc/testsuite/g++.dg/eh/pr84968.C
@@ -9,7 +9,9 @@ struct S {
void a()
  try {
  } catch (int ()
-noexcept (({ union b a; true; }))) // { dg-error "constant" }
+noexcept (({ union b a; true; }))) // { dg-error "'b a' has incomplete 
type" }
{
}
  };
+
+template void S::a(); // { dg-message "required from here" }
diff --git a/gcc/testsuite/g++.dg/ext/stmtexpr26.C 
b/gcc/testsuite/g++.dg/ext/stmtexpr26.C
new file mode 100644
index 000..498dd12ef10
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/stmtexpr26.C
@@ -0,0 +1,10 @@
+// PR c++/116418
+// { dg-do compile { target c++14 } }
+
+void foo ();
+template 
+void bar ()
+{
+  decltype(auto) v = ({ foo (); 3; });
+}
+




[PATCH v1][GCC] aarch64: Add GCS build attributes support.

2024-09-11 Thread Srinath Parvathaneni
This patch adds support for aarch64 gcs build attributes. This support
includes generating two new assembler directives .aeabi_subsection and
.aeabi_attribute. These directives are generated as per the syntax
mentioned in the spec "Build Attributes for the Arm® 64-bit
Architecture (AArch64)" available at [1].

To check whether the assembler being used to build the toolchain
supports these directives, a new gcc configure check is added in
gcc/configure.ac.

If the assembler supports these directives, .aeabi_subsection and
.aeabi_attribute directives are emitted in the generated assembly,
when -mbranch-protection=gcs is passed.

If the assembler does not support these directives,
.note.gnu.property section will emit the relevant gcs information
in the generated assembly, when -mbranch-protection=gcs is passed.

This patch needs to be applied on top of GCC gcs patch series [2].

Bootstrapped on aarch64-none-linux-gnu and regression tested on
aarch64-none-elf, no issues.

Ok for master?

Regards,
Srinath.

[1]: https://github.com/ARM-software/abi-aa/pull/230
[2]: https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/vendors/ARM/heads/gcs

gcc/ChangeLog:

2024-09-11  Srinath Parvathaneni  

* config.in: Regenerated.
* config/aarch64/aarch64.cc (aarch64_emit_aeabi_attribute): New
function declaration.
(aarch64_emit_aeabi_subsection): Likewise.
(aarch64_start_file): Emit gcs build attributes.
(aarch64_file_end_indicate_exec_stack): Update gcs bit in
note.gnu.property section.
* configure: Regenerated.
* configure.ac: Add gcc configure check.

gcc/testsuite/ChangeLog:

2024-09-11  Srinath Parvathaneni  

* gcc.target/aarch64/build-attribute-gcs.c: New test.
---
 gcc/config.in |   6 +++
 gcc/config/aarch64/a.out  | Bin 0 -> 656 bytes
 gcc/config/aarch64/aarch64.cc |  43 ++
 gcc/configure |  35 ++
 gcc/configure.ac  |   7 +++
 .../gcc.target/aarch64/build-attribute-gcs.c  |  24 ++
 6 files changed, 115 insertions(+)
 create mode 100644 gcc/config/aarch64/a.out
 create mode 100644 gcc/testsuite/gcc.target/aarch64/build-attribute-gcs.c

diff --git a/gcc/config.in b/gcc/config.in
index 7fcabbe5061..eb6024dfc90 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -379,6 +379,12 @@
 #endif
 
 
+/* Define if your assembler supports gcs build attributes. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_AS_BUILD_ATTRIBUTES_GCS
+#endif
+
+
 /* Define to the level of your assembler's compressed debug section support.
*/
 #ifndef USED_FOR_TARGET
diff --git a/gcc/config/aarch64/a.out b/gcc/config/aarch64/a.out
new file mode 100644
index ..dd7982f2db625be166d33548dc5d00c7e7601629
GIT binary patch
literal 656
zcmb<-^>JfjWMqH=MuzPS2p&w7f#Cvz$>0EHJ20>_upx>confdefs.h
 
+fi
+
+# Check if we have binutils support for gcs build attributes.
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for gcs build attributes support" >&5
+$as_echo_n "checking assembler for gcs build attributes support... " >&6; }
+if ${gcc_cv_as_aarch64_gcs_build_attributes+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  gcc_cv_as_aarch64_gcs_build_attributes=no
+  if test x$gcc_cv_as != x; then
+$as_echo '
+	.aeabi_subsection .aeabi-feature-and-bits, 1, 0
+	.aeabi_attribute 3, 1
+' > conftest.s
+if { ac_try='$gcc_cv_as $gcc_cv_as_flags  -o conftest.o conftest.s >&5'
+  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+  (eval $ac_try) 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; }
+then
+	gcc_cv_as_aarch64_gcs_build_attributes=yes
+else
+  echo "configure: failed program was" >&5
+  cat conftest.s >&5
+fi
+rm -f conftest.o conftest.s
+  fi
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_as_aarch64_gcs_build_attributes" >&5
+$as_echo "$gcc_cv_as_aarch64_gcs_build_attributes" >&6; }
+if test $gcc_cv_as_aarch64_gcs_build_attributes = yes; then
+
+$as_echo "#define HAVE_AS_BUILD_ATTRIBUTES_GCS 1" >>confdefs.h
+
 fi
 
 # Enable Branch Target Identification Mechanism and Return Address
diff --git a/gcc/configure.ac b/gcc/configure.ac
index d0b9865fc91..51b07417153 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -4388,6 +4388,13 @@ case "$target" in
 	ldr x0, [[x2, #:gotpage_lo15:globalsym]]
 ],,[AC_DEFINE(HAVE_AS_SMALL_PIC_RELOCS, 1,
 	[Define if your assembler supports relocs needed by -fpic.])])
+# Check if we have binutils support for gcs build attributes.
+gcc_GAS_CHECK_FEATURE([gcs build attributes support], gcc_cv_as_aarch64_gcs_build_attributes,,
+[
+	.aeabi_subsection .aeabi-feature-and-bits, 1, 0
+	.aeabi_attribute 3, 1
+],,[AC_DEFINE(HAVE_AS_BUILD_ATTRIBUTES_GCS, 1,
+	[Define if your assembler supp

[PATCH] c++: deleting explicitly-defaulted functions [PR116162]

2024-09-11 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This PR points out that we're not implementing [dcl.fct.def.default]
properly.  Consider e.g.

  struct C {
 C(const C&&) = default;
  };

where we wrongly emit an error, but the move ctor should be just =deleted.
According to [dcl.fct.def.default], if the type of the special member
function differs from the type of the corresponding special member function
that would have been implicitly declared in a way other than as allowed
by 2.1-4, the function is defined as deleted.  There's an exception for
assignment operators in which case the program is ill-formed.

clang++ has a warning for when we delete an explicitly-defaulted function
so this patch adds it too.  I'm also downgrading an error to a pedwarn
in C++17 since the code compiles in C++20.
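
As a rough illustration of what "explicitly defaulted but defined as deleted" means in practice, here is a sketch using two long-standing sibling cases rather than the signature-mismatch case above (which compilers predating this patch reject outright). The type names RefHolder and Owner are hypothetical and not taken from the testsuite.

```cpp
#include <memory>
#include <type_traits>

// Hypothetical type: the reference member has no initializer, so the
// explicitly-defaulted default constructor is defined as deleted.
struct RefHolder {
  int &ref;
  RefHolder() = default;
};

// Hypothetical type: std::unique_ptr is not copyable, so the
// explicitly-defaulted copy constructor is defined as deleted.
struct Owner {
  std::unique_ptr<int> ptr;
  Owner(const Owner &) = default;
};

// Both declarations are accepted; only an attempt to use the deleted
// functions would be an error, which the traits below observe.
static_assert(!std::is_default_constructible<RefHolder>::value,
              "defaulted default ctor is deleted");
static_assert(!std::is_copy_constructible<Owner>::value,
              "defaulted copy ctor is deleted");
```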

PR c++/116162

gcc/c-family/ChangeLog:

* c.opt (Wdefaulted-function-deleted): New.

gcc/cp/ChangeLog:

* class.cc (check_bases_and_members): Call delete_defaulted_fn to set
DECL_DELETED_FN.
* cp-tree.h (delete_defaulted_fn): Declare.
* method.cc (delete_defaulted_fn): New.
(defaulted_late_check): Call delete_defaulted_fn instead of giving
an error, unless the code is ill-formed.  Change error to pedwarn.

gcc/ChangeLog:

* doc/invoke.texi: Document -Wdefaulted-function-deleted.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/defaulted15.C: Add dg-warning.
* g++.dg/cpp0x/defaulted51.C: Likewise.
* g++.dg/cpp0x/defaulted52.C: Likewise.
* g++.dg/cpp0x/defaulted53.C: Likewise.
* g++.dg/cpp0x/defaulted54.C: Likewise.
* g++.dg/cpp0x/defaulted56.C: Likewise.
* g++.dg/cpp0x/defaulted57.C: Likewise.
* g++.dg/cpp0x/defaulted58.C: Likewise.
* g++.dg/cpp0x/defaulted59.C: Likewise.
* g++.dg/cpp0x/defaulted63.C: New test.
* g++.dg/cpp0x/defaulted64.C: New test.
* g++.dg/cpp0x/defaulted65.C: New test.
* g++.dg/cpp23/defaulted1.C: New test.
---
 gcc/c-family/c.opt   |  4 ++
 gcc/cp/class.cc  |  2 +-
 gcc/cp/cp-tree.h |  1 +
 gcc/cp/method.cc | 90 +---
 gcc/doc/invoke.texi  |  9 +++
 gcc/testsuite/g++.dg/cpp0x/defaulted15.C |  2 +-
 gcc/testsuite/g++.dg/cpp0x/defaulted51.C |  3 +-
 gcc/testsuite/g++.dg/cpp0x/defaulted52.C |  2 +-
 gcc/testsuite/g++.dg/cpp0x/defaulted53.C |  2 +-
 gcc/testsuite/g++.dg/cpp0x/defaulted54.C |  1 +
 gcc/testsuite/g++.dg/cpp0x/defaulted56.C |  6 +-
 gcc/testsuite/g++.dg/cpp0x/defaulted57.C |  6 +-
 gcc/testsuite/g++.dg/cpp0x/defaulted58.C |  1 +
 gcc/testsuite/g++.dg/cpp0x/defaulted59.C |  2 +-
 gcc/testsuite/g++.dg/cpp0x/defaulted63.C | 39 ++
 gcc/testsuite/g++.dg/cpp0x/defaulted64.C | 27 +++
 gcc/testsuite/g++.dg/cpp0x/defaulted65.C | 25 +++
 gcc/testsuite/g++.dg/cpp23/defaulted1.C  | 23 ++
 18 files changed, 226 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/defaulted63.C
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/defaulted64.C
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/defaulted65.C
 create mode 100644 gcc/testsuite/g++.dg/cpp23/defaulted1.C

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 491aa02e1a3..f5136fd2341 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -619,6 +619,10 @@ Wdeclaration-missing-parameter-type
 C ObjC Var(warn_declaration_missing_parameter) Warning Init(1)
 Warn for missing parameter types in function declarations.
 
+Wdefaulted-function-deleted
+C++ ObjC++ Var(warn_defaulted_fn_deleted) Init(1) Warning
+Warn when an explicitly defaulted function is deleted.
+
 Wdelete-incomplete
 C++ ObjC++ Var(warn_delete_incomplete) Init(1) Warning
 Warn when deleting a pointer to incomplete type.
diff --git a/gcc/cp/class.cc b/gcc/cp/class.cc
index 950d83b0ea4..a4fdf7f9d11 100644
--- a/gcc/cp/class.cc
+++ b/gcc/cp/class.cc
@@ -6506,7 +6506,7 @@ check_bases_and_members (tree t)
  /* If the function is defaulted outside the class, we just
 give the synthesis error.  Core Issue #1331 says this is
 no longer ill-formed, it is defined as deleted instead.  */
- DECL_DELETED_FN (fn) = true;
+ delete_defaulted_fn (fn);
  }
defaulted_late_check (fn);
   }
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 7baa2ccbe1e..65295b3326d 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6929,6 +6929,7 @@ extern bool type_build_ctor_call  (tree);
 extern bool type_build_dtor_call   (tree);
 extern void explain_non_literal_class  (tree);
 extern void inherit_targ_abi_tags  (tree);
+extern void delete_defaulted_fn(tree);
 extern void defaulted_late_check   (tree);
 extern bool defaultable_fn_check   (tree);
 extern void check_ab

Re: [PATCH] c++: deleting explicitly-defaulted functions [PR116162]

2024-09-11 Thread Jason Merrill

On 9/11/24 12:54 PM, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This PR points out that we're not implementing [dcl.fct.def.default]
properly.  Consider e.g.

   struct C {
  C(const C&&) = default;
   };

where we wrongly emit an error, but the move ctor should be just =deleted.
According to [dcl.fct.def.default], if the type of the special member
function differs from the type of the corresponding special member function
that would have been implicitly declared in a way other than as allowed
by 2.1-4, the function is defined as deleted.  There's an exception for
assignment operators in which case the program is ill-formed.

clang++ has a warning for when we delete an explicitly-defaulted function
so this patch adds it too.  I'm also downgrading an error to a pedwarn
in C++17 since the code compiles in C++20.

PR c++/116162

gcc/c-family/ChangeLog:

* c.opt (Wdefaulted-function-deleted): New.

gcc/cp/ChangeLog:

* class.cc (check_bases_and_members): Call delete_defaulted_fn to set
DECL_DELETED_FN.
* cp-tree.h (delete_defaulted_fn): Declare.
* method.cc (delete_defaulted_fn): New.
(defaulted_late_check): Call delete_defaulted_fn instead of giving
an error, unless the code is ill-formed.  Change error to pedwarn.

gcc/ChangeLog:

* doc/invoke.texi: Document -Wdefaulted-function-deleted.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/defaulted15.C: Add dg-warning.
* g++.dg/cpp0x/defaulted51.C: Likewise.
* g++.dg/cpp0x/defaulted52.C: Likewise.
* g++.dg/cpp0x/defaulted53.C: Likewise.
* g++.dg/cpp0x/defaulted54.C: Likewise.
* g++.dg/cpp0x/defaulted56.C: Likewise.
* g++.dg/cpp0x/defaulted57.C: Likewise.
* g++.dg/cpp0x/defaulted58.C: Likewise.
* g++.dg/cpp0x/defaulted59.C: Likewise.
* g++.dg/cpp0x/defaulted63.C: New test.
* g++.dg/cpp0x/defaulted64.C: New test.
* g++.dg/cpp0x/defaulted65.C: New test.
* g++.dg/cpp23/defaulted1.C: New test.
---
  gcc/c-family/c.opt   |  4 ++
  gcc/cp/class.cc  |  2 +-
  gcc/cp/cp-tree.h |  1 +
  gcc/cp/method.cc | 90 +---
  gcc/doc/invoke.texi  |  9 +++
  gcc/testsuite/g++.dg/cpp0x/defaulted15.C |  2 +-
  gcc/testsuite/g++.dg/cpp0x/defaulted51.C |  3 +-
  gcc/testsuite/g++.dg/cpp0x/defaulted52.C |  2 +-
  gcc/testsuite/g++.dg/cpp0x/defaulted53.C |  2 +-
  gcc/testsuite/g++.dg/cpp0x/defaulted54.C |  1 +
  gcc/testsuite/g++.dg/cpp0x/defaulted56.C |  6 +-
  gcc/testsuite/g++.dg/cpp0x/defaulted57.C |  6 +-
  gcc/testsuite/g++.dg/cpp0x/defaulted58.C |  1 +
  gcc/testsuite/g++.dg/cpp0x/defaulted59.C |  2 +-
  gcc/testsuite/g++.dg/cpp0x/defaulted63.C | 39 ++
  gcc/testsuite/g++.dg/cpp0x/defaulted64.C | 27 +++
  gcc/testsuite/g++.dg/cpp0x/defaulted65.C | 25 +++
  gcc/testsuite/g++.dg/cpp23/defaulted1.C  | 23 ++
  18 files changed, 226 insertions(+), 19 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/defaulted63.C
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/defaulted64.C
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/defaulted65.C
  create mode 100644 gcc/testsuite/g++.dg/cpp23/defaulted1.C

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 491aa02e1a3..f5136fd2341 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -619,6 +619,10 @@ Wdeclaration-missing-parameter-type
  C ObjC Var(warn_declaration_missing_parameter) Warning Init(1)
  Warn for missing parameter types in function declarations.
  
+Wdefaulted-function-deleted

+C++ ObjC++ Var(warn_defaulted_fn_deleted) Init(1) Warning
+Warn when an explicitly defaulted function is deleted.
+
  Wdelete-incomplete
  C++ ObjC++ Var(warn_delete_incomplete) Init(1) Warning
  Warn when deleting a pointer to incomplete type.
diff --git a/gcc/cp/class.cc b/gcc/cp/class.cc
index 950d83b0ea4..a4fdf7f9d11 100644
--- a/gcc/cp/class.cc
+++ b/gcc/cp/class.cc
@@ -6506,7 +6506,7 @@ check_bases_and_members (tree t)
  /* If the function is defaulted outside the class, we just
 give the synthesis error.  Core Issue #1331 says this is
 no longer ill-formed, it is defined as deleted instead.  */
- DECL_DELETED_FN (fn) = true;
+ delete_defaulted_fn (fn);
  }
defaulted_late_check (fn);
}
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 7baa2ccbe1e..65295b3326d 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6929,6 +6929,7 @@ extern bool type_build_ctor_call  (tree);
  extern bool type_build_dtor_call  (tree);
  extern void explain_non_literal_class (tree);
  extern void inherit_targ_abi_tags (tree);
+extern void delete_defaulted_fn(tree);
  extern void defaulted_late_check  (tree);

Re: [PATCH] c++/modules: Really always track partial specialisations [PR116496]

2024-09-11 Thread Jason Merrill

On 9/11/24 8:51 AM, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?


OK.


-- >8 --

My last fix for this issue (PR c++/114947, r15-810) didn't go far
enough; I had assumed that the issue where we lost track of partial
specialisations we would need to walk again later was limited to
partitions (where we always re-walk all specialisations), but the linked
PR has the same cause but for header units, and it is possible to
construct test cases exposing the same bug just for normal modules.

As such this patch just unconditionally ensures that whenever we modify
DECL_TEMPLATE_SPECIALIZATIONS we also track any partial specialisations
that might have been added.

Also clean up a couple of comments and assertions to make expected state
more obvious when processing these specs.

PR c++/116496

gcc/cp/ChangeLog:

* module.cc (trees_in::decl_value): Don't call
set_defining_module_for_partial_spec here.
(depset::hash::add_partial_entities): Clarify assertions.
* pt.cc (add_mergeable_specialization): Always call
set_defining_module_for_partial_spec when adding a partial spec.

gcc/testsuite/ChangeLog:

* g++.dg/modules/partial-5_a.C: New test.
* g++.dg/modules/partial-5_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/module.cc   | 25 +++---
  gcc/cp/pt.cc   |  1 +
  gcc/testsuite/g++.dg/modules/partial-5_a.C |  9 
  gcc/testsuite/g++.dg/modules/partial-5_b.C |  9 
  4 files changed, 31 insertions(+), 13 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/partial-5_a.C
  create mode 100644 gcc/testsuite/g++.dg/modules/partial-5_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 647208944da..eedcd0ec076 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -8434,11 +8434,6 @@ trees_in::decl_value ()
  add_mergeable_specialization (!is_type, &spec, decl, spec_flags);
}
  
-  /* When making a CMI from a partition we're going to need to walk partial

-specializations again, so make sure they're tracked.  */
-  if (state->is_partition () && (spec_flags & 2))
-   set_defining_module_for_partial_spec (inner);
-
if (NAMESPACE_SCOPE_P (decl)
  && (mk == MK_named || mk == MK_unique
  || mk == MK_enum || mk == MK_friend_spec)
@@ -13356,16 +13351,20 @@ depset::hash::add_partial_entities (vec 
*partial_classes)
 specialization.  */
  gcc_checking_assert (dep->get_entity_kind ()
   == depset::EK_PARTIAL);
+
+ /* Only emit GM entities if reached.  */
+ if (!DECL_LANG_SPECIFIC (inner)
+ || !DECL_MODULE_PURVIEW_P (inner))
+   dep->set_flag_bit ();
}
else
-   /* It was an explicit specialization, not a partial one.  */
-   gcc_checking_assert (dep->get_entity_kind ()
-== depset::EK_SPECIALIZATION);
-
-  /* Only emit GM entities if reached.  */
-  if (!DECL_LANG_SPECIFIC (inner)
- || !DECL_MODULE_PURVIEW_P (inner))
-   dep->set_flag_bit ();
+   {
+ /* It was an explicit specialization, not a partial one.
+We should have already added this.  */
+ gcc_checking_assert (dep->get_entity_kind ()
+  == depset::EK_SPECIALIZATION);
+ gcc_checking_assert (dep->is_special ());
+   }
  }
  }
  
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc

index 9195a5274e1..b8dd7e3a0ee 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -31685,6 +31685,7 @@ add_mergeable_specialization (bool decl_p, spec_entry 
*elt, tree decl,
 DECL_TEMPLATE_SPECIALIZATIONS (elt->tmpl));
TREE_TYPE (cons) = decl_p ? TREE_TYPE (elt->spec) : elt->spec;
DECL_TEMPLATE_SPECIALIZATIONS (elt->tmpl) = cons;
+  set_defining_module_for_partial_spec (STRIP_TEMPLATE (decl));
  }
  }
  
diff --git a/gcc/testsuite/g++.dg/modules/partial-5_a.C b/gcc/testsuite/g++.dg/modules/partial-5_a.C

new file mode 100644
index 000..768e6995f0f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/partial-5_a.C
@@ -0,0 +1,9 @@
+// PR c++/116496
+// { dg-additional-options "-fmodules-ts -std=c++20 -Wno-global-module" }
+// { dg-module-cmi A }
+
+module;
+template  struct S {};
+export module A;
+template  struct S {};
+template  requires false struct S {};
diff --git a/gcc/testsuite/g++.dg/modules/partial-5_b.C 
b/gcc/testsuite/g++.dg/modules/partial-5_b.C
new file mode 100644
index 000..95401fe8b56
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/partial-5_b.C
@@ -0,0 +1,9 @@
+// PR c++/116496
+// { dg-additional-options "-fmodules-ts -std=c++20 -Wno-global-module" }
+// { dg-module-cmi B }
+
+module;
+template  struct S {};
+export module B;
+import A;
+template  requires true struct S {};




Re: [PATCH] c++: Don't ICE to build private access error message [PR116323]

2024-09-11 Thread Jason Merrill

On 9/11/24 7:26 AM, Simon Martin wrote:

We currently ICE upon the following code while building the "[...] is
private within this context" error message

=== cut here ===
class A { enum Enum{}; };
template class Alloc>
class B : private Alloc, private A {};
template class Alloc>
int B::foo (Enum m) { return 42; }
=== cut here ===

The problem is that since r11-6880, after detecting that Enum cannot be
accessed in B, enforce_access will access the TYPE_BINFO of all the
bases of B, which ICEs for any that is a BOUND_TEMPLATE_TEMPLATE_PARM.
This patch simply skips such bases.

Successfully tested on x86_64-pc-linux-gnu.

PR c++/116323

gcc/cp/ChangeLog:

* search.cc (get_parent_with_private_access): Only call access_in_type
for RECORD_OR_UNION_TYPE_P base BINFOs.

gcc/testsuite/ChangeLog:

* g++.dg/template/access43.C: New test.

---
  gcc/cp/search.cc |  4 +++-
  gcc/testsuite/g++.dg/template/access43.C | 11 +++
  2 files changed, 14 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/template/access43.C

diff --git a/gcc/cp/search.cc b/gcc/cp/search.cc
index 60c30ecb881..a810cf70d6a 100644
--- a/gcc/cp/search.cc
+++ b/gcc/cp/search.cc
@@ -163,9 +163,11 @@ get_parent_with_private_access (tree decl, tree binfo)
/* Iterate through immediate parent classes.  */
for (int i = 0; BINFO_BASE_ITERATE (binfo, i, base_binfo); i++)
  {
+  tree base_binfo_type = BINFO_TYPE (base_binfo);
/* This parent had private access.  Therefore that's why BINFO can't
  access DECL.  */
-  if (access_in_type (BINFO_TYPE (base_binfo), decl) == ak_private)
+  if (RECORD_OR_UNION_TYPE_P (base_binfo_type)


You might add to the comment to explain that in a template the base list 
can also contain WILDCARD_TYPE_P types.  OK either way.


Jason



Re: [PATCH] s390: Fix TF to FPRX2 conversion [PR115860]

2024-09-11 Thread Ilya Leoshkevich
On Wed, 2024-09-11 at 16:44 +0200, Stefan Schulze Frielinghaus wrote:
> On Wed, Sep 11, 2024 at 01:59:48PM +0200, Ilya Leoshkevich wrote:
> > On Wed, 2024-09-11 at 13:34 +0200, Stefan Schulze Frielinghaus
> > wrote:
> > > On Wed, Sep 11, 2024 at 01:22:30PM +0200, Ilya Leoshkevich wrote:
> > > > On Wed, 2024-09-11 at 12:35 +0200, Stefan Schulze Frielinghaus
> > > > wrote:
> > > > > On Wed, Sep 11, 2024 at 11:47:54AM +0200, Ilya Leoshkevich
> > > > > wrote:
> > > > > > On Fri, 2024-08-16 at 09:41 +0200, Stefan Schulze
> > > > > > Frielinghaus
> > > > > > wrote:
> > > > > > > > Currently subregs originating from *tf_to_fprx2_0 and
> > > > > > > > *tf_to_fprx2_1 survive register allocation.  This in turn
> > > > > > > > leads to wrong register renaming.  Keeping the current
> > > > > > > > approach would mean we need two insns for *tf_to_fprx2_0
> > > > > > > > and *tf_to_fprx2_1, respectively.  Something along the
> > > > > > > > lines
> > > > > > > 
> > > > > > > > (define_insn "*tf_to_fprx2_0"
> > > > > > > >   [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "=f") 0)
> > > > > > > >         (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
> > > > > > > >                    UNSPEC_TF_TO_FPRX2_0))]
> > > > > > > >   "TARGET_VXE"
> > > > > > > >   "#")
> > > > > > > > 
> > > > > > > > (define_insn "*tf_to_fprx2_0"
> > > > > > > >   [(set (match_operand:DF 0 "nonimmediate_operand" "=f")
> > > > > > > >         (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
> > > > > > > >                    UNSPEC_TF_TO_FPRX2_0))]
> > > > > > > >   "TARGET_VXE"
> > > > > > > >   "vpdi\t%v0,%v1,%v0,1"
> > > > > > > >   [(set_attr "op_type" "VRR")])
> > > > > > > 
> > > > > > > > and similar for *tf_to_fprx2_1.  Note, pre register
> > > > > > > > allocation operand 0 has mode FPRX2 and afterwards DF once
> > > > > > > > subregs have been eliminated.
> > > > > > > > 
> > > > > > > > Since we always copy a whole vector register into a
> > > > > > > > floating-point register pair, another way to fix this is to
> > > > > > > > merge *tf_to_fprx2_0 and *tf_to_fprx2_1 into a single insn
> > > > > > > > which means we don't have to use subregs at all.  The
> > > > > > > > downside of this is that the assembler template contains
> > > > > > > > two instructions, now.  The upside is that we don't have to
> > > > > > > > come up with some artificial insn before RA which might be
> > > > > > > > more readable/maintainable.  That is implemented by this
> > > > > > > > patch.
> > > > > > > > 
> > > > > > > > In commit r11-4872-ge627cda5686592, the output operand
> > > > > > > > specifier %V was introduced which is used in tf_to_fprx2
> > > > > > > > only, now.  I didn't come up with its counterpart like %F
> > > > > > > > for floating-point registers.  Instead I printed the
> > > > > > > > register pair in the output function directly.  This spares
> > > > > > > > us a new and "rare" format specifier for a single insn.  I
> > > > > > > > don't have a strong opinion which option to choose,
> > > > > > > > however, we should either add %F in order to mimic the same
> > > > > > > > behaviour as %V or getting rid of %V and inline the logic
> > > > > > > > in the output function.  I lean towards the latter.
> > > > > > > > Any preferences?
> > > > > > > ---
> > > > > > >  gcc/config/s390/s390.md    |  2 +
> > > > > > >  gcc/config/s390/vector.md  | 66
> > > > > > > +++-
> > > > > > > 
> > > > > > > 
> > > > > > > --
> > > > > > >  gcc/testsuite/gcc.target/s390/pr115860-1.c | 26
> > > > > > > +
> > > > > > >  3 files changed, 60 insertions(+), 34 deletions(-)
> > > > > > >  create mode 100644
> > > > > > > gcc/testsuite/gcc.target/s390/pr115860-
> > > > > > > 1.c
> > > > > > 
> > > > > > [...]
> > > > > > 
> > > > > > > +  char buf[64];
> > > > > > > +  switch (which_alternative)
> > > > > > > +    {
> > > > > > > +    case 0:
> > > > > > > +  if (REGNO (operands[0]) == REGNO (operands[1]))
> > > > > > > + return "vpdi\t%V0,%v1,%V0,5";
> > > > > > > +  else
> > > > > > > + return "ldr\t%f0,%f1;vpdi\t%V0,%v1,%V0,5";
> > > > > > > +    case 1:
> > > > > > > +  {
> > > > > > > + const char *reg_pair = reg_names[REGNO
> > > > > > > (operands[0])
> > > > > > > +
> > > > > > > 1];
> > > > > > > + snprintf (buf, sizeof (buf),
> > > > > > > "ld\t%%f0,%%1;ld\t%%%s,8+%%1",
> > > > > > > reg

[PATCH v2] c++: deleting explicitly-defaulted functions [PR116162]

2024-09-11 Thread Marek Polacek
On Wed, Sep 11, 2024 at 01:19:53PM -0400, Jason Merrill wrote:
> On 9/11/24 12:54 PM, Marek Polacek wrote:
> > + auto_diagnostic_group d;
> > + /* We used to emit a hard error, so this uses 0 rather than
> > +OPT_Wpedantic.  */
> > + if (pedwarn (DECL_SOURCE_LOCATION (fn), 0,
> > +  "defaulted declaration %q+D does not match the "
> > +  "expected signature", fn))
> > +   inform (DECL_SOURCE_LOCATION (fn),
> > +   "expected signature: %qD", implicit_fn);
> 
> This should also depend on -Wdefaulted-function-deleted, and set
> DECL_DELETED_FN.  And the C++20 case should show the expected signature.
> Really, the two cases should share the same code, only the diagnostic kind
> should change.

How about this?

dg.exp passed; running the full testing.

-- >8 --
This PR points out that we're not implementing [dcl.fct.def.default]
properly.  Consider e.g.

  struct C {
 C(const C&&) = default;
  };

where we wrongly emit an error, but the move ctor should be just =deleted.
According to [dcl.fct.def.default], if the type of the special member
function differs from the type of the corresponding special member function
that would have been implicitly declared in a way other than as allowed
by 2.1-4, the function is defined as deleted.  There's an exception for
assignment operators in which case the program is ill-formed.

clang++ has a warning for when we delete an explicitly-defaulted function
so this patch adds it too.  I'm also downgrading an error to a pedwarn
in C++17 since the code compiles in C++20.

PR c++/116162

gcc/c-family/ChangeLog:

* c.opt (Wdefaulted-function-deleted): New.

gcc/cp/ChangeLog:

* class.cc (check_bases_and_members): Call delete_defaulted_fn to set
DECL_DELETED_FN.
* cp-tree.h (delete_defaulted_fn): Declare.
* method.cc (delete_defaulted_fn): New.
(defaulted_late_check): Call delete_defaulted_fn instead of giving
an error.

gcc/ChangeLog:

* doc/invoke.texi: Document -Wdefaulted-function-deleted.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/defaulted15.C: Add dg-warning/dg-error.
* g++.dg/cpp0x/defaulted51.C: Likewise.
* g++.dg/cpp0x/defaulted52.C: Likewise.
* g++.dg/cpp0x/defaulted53.C: Likewise.
* g++.dg/cpp0x/defaulted54.C: Likewise.
* g++.dg/cpp0x/defaulted56.C: Likewise.
* g++.dg/cpp0x/defaulted57.C: Likewise.
* g++.dg/cpp0x/defaulted58.C: Likewise.
* g++.dg/cpp0x/defaulted59.C: Likewise.
* g++.dg/cpp0x/defaulted63.C: New test.
* g++.dg/cpp0x/defaulted64.C: New test.
* g++.dg/cpp0x/defaulted65.C: New test.
* g++.dg/cpp23/defaulted1.C: New test.
---
 gcc/c-family/c.opt   |  4 ++
 gcc/cp/class.cc  | 18 --
 gcc/cp/cp-tree.h |  1 +
 gcc/cp/method.cc | 77 +---
 gcc/doc/invoke.texi  |  9 +++
 gcc/testsuite/g++.dg/cpp0x/defaulted15.C |  3 +-
 gcc/testsuite/g++.dg/cpp0x/defaulted51.C |  3 +-
 gcc/testsuite/g++.dg/cpp0x/defaulted52.C |  3 +-
 gcc/testsuite/g++.dg/cpp0x/defaulted53.C |  3 +-
 gcc/testsuite/g++.dg/cpp0x/defaulted54.C |  2 +
 gcc/testsuite/g++.dg/cpp0x/defaulted56.C |  6 +-
 gcc/testsuite/g++.dg/cpp0x/defaulted57.C |  6 +-
 gcc/testsuite/g++.dg/cpp0x/defaulted58.C |  2 +
 gcc/testsuite/g++.dg/cpp0x/defaulted59.C |  3 +-
 gcc/testsuite/g++.dg/cpp0x/defaulted63.C | 39 
 gcc/testsuite/g++.dg/cpp0x/defaulted64.C | 27 +
 gcc/testsuite/g++.dg/cpp0x/defaulted65.C | 25 
 gcc/testsuite/g++.dg/cpp23/defaulted1.C  | 23 +++
 18 files changed, 232 insertions(+), 22 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/defaulted63.C
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/defaulted64.C
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/defaulted65.C
 create mode 100644 gcc/testsuite/g++.dg/cpp23/defaulted1.C

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 491aa02e1a3..f5136fd2341 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -619,6 +619,10 @@ Wdeclaration-missing-parameter-type
 C ObjC Var(warn_declaration_missing_parameter) Warning Init(1)
 Warn for missing parameter types in function declarations.
 
+Wdefaulted-function-deleted
+C++ ObjC++ Var(warn_defaulted_fn_deleted) Init(1) Warning
+Warn when an explicitly defaulted function is deleted.
+
 Wdelete-incomplete
 C++ ObjC++ Var(warn_delete_incomplete) Init(1) Warning
 Warn when deleting a pointer to incomplete type.
diff --git a/gcc/cp/class.cc b/gcc/cp/class.cc
index 950d83b0ea4..2d85681dc72 100644
--- a/gcc/cp/class.cc
+++ b/gcc/cp/class.cc
@@ -6490,8 +6490,9 @@ check_bases_and_members (tree t)
&& !DECL_ARTIFICIAL (fn)
&& DECL_DEFAULTED_IN_CLASS_P (fn))
   {
+   special_function_kind kind = special_function_p (fn);
/* ...except handle comparisons later, in finish_str

[PATCH v2] c++: ICE with TTP [PR96097]

2024-09-11 Thread Marek Polacek
On Wed, Sep 11, 2024 at 11:26:56AM -0400, Jason Merrill wrote:
> On 9/11/24 10:53 AM, Patrick Palka wrote:
> > On Wed, 11 Sep 2024, Patrick Palka wrote:
> > 
> > > On Wed, 11 Sep 2024, Patrick Palka wrote:
> > > 
> > > > On Wed, 4 Sep 2024, Marek Polacek wrote:
> > > > 
> > > > > On Wed, Sep 04, 2024 at 10:58:25AM -0400, Jason Merrill wrote:
> > > > > > On 9/3/24 6:12 PM, Marek Polacek wrote:
> > > > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/14?
> > > > > > 
> > > > > > The change to return bool seems like unrelated cleanup; please push 
> > > > > > that
> > > > > > separately on trunk only.
> > > > > 
> > > > > Done.
> > > > > > > +   /* We can also have:
> > > > > > > +
> > > > > > > +   template  
> > > > > > > typename X>
> > > > > > > +   void func() {}
> > > > > > > +   template 
> > > > > > > +   struct Y {};
> > > > > > > +   void g() { func(); }
> > > > > > > +
> > > > > > > +  where we are not in a template, but the type of PARM is 
> > > > > > > T::type
> > > > > > > +  and dependent_type_p doesn't want to see a 
> > > > > > > TEMPLATE_TYPE_PARM
> > > > > > > +  outside a template.  */
> > > 
> > > ... so the patch LGTM, except I'd prefer to not have this comment
> > > containing an embedded specific testcase.  IMHO it's "understood" that
> > > processing_template_decl needs to be set when substituting using an
> > > incomplete set of arguments since in that case the result must be
> > > templated.
> > 
> > ... and the comment might make this instance of the pattern seem more
> > like an exceptional case rather than a general rule, which paradoxically
> > could make the code seem more complex than it is at first glance.
> 
> Makes sense to me.

Thank you both.  Here's a version without the cleanups and the comment.

Ran dg.exp on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --
We crash when dependent_type_p gets a TEMPLATE_TYPE_PARM outside
a template.  That happens here because in

  template  typename X>
  void func() {}
  template 
  struct Y {};
  void g() { func(); }

when performing overload resolution for func() we have to check
if U matches T and I matches TT.  So we wind up in
coerce_template_template_parm/PARM_DECL.  TREE_TYPE (arg) is int
so we try to substitute TT's type, which is T::type.  But we have
nothing to substitute T with.  And we call make_typename_type where
ctx is still T, which checks dependent_scope_p and we trip the assert.

It should work to always perform the substitution in a template context.
If the result still contains template parameters, we cannot say if they
match.

PR c++/96097

gcc/cp/ChangeLog:

* pt.cc (coerce_template_template_parm): Increment
processing_template_decl before calling tsubst.

gcc/testsuite/ChangeLog:

* g++.dg/template/ttp44.C: New test.
---
 gcc/cp/pt.cc  |  2 ++
 gcc/testsuite/g++.dg/template/ttp44.C | 13 +
 2 files changed, 15 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/template/ttp44.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 310e5dfff03..e4de5451f19 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -7951,7 +7951,9 @@ coerce_template_template_parm (tree parm, tree arg, tsubst_flags_t complain,
 i.e. the parameter list of TT depends on earlier parameters.  */
   if (!uses_template_parms (TREE_TYPE (arg)))
{
+ ++processing_template_decl;
  tree t = tsubst (TREE_TYPE (parm), outer_args, complain, in_decl);
+ --processing_template_decl;
  if (!uses_template_parms (t)
  && !same_type_p (t, TREE_TYPE (arg)))
return false;
diff --git a/gcc/testsuite/g++.dg/template/ttp44.C b/gcc/testsuite/g++.dg/template/ttp44.C
new file mode 100644
index 000..2a412975243
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/ttp44.C
@@ -0,0 +1,13 @@
+// PR c++/96097
+// { dg-do compile }
+
+template  class X>
+void func() {}
+
+template 
+struct Y {};
+
+void test()
+{
+  func();
+}

base-commit: 670cfd5fe6433ee8f2e86eedb197d2523dbb033b
-- 
2.46.0



[PATCH v5] Provide new GCC builtin __builtin_counted_by_ref [PR116016]

2024-09-11 Thread Qing Zhao
Compared to the 4th version, the changes are (addressing Jakub's concerns):

1. change the global "in_builtin_counted_by_ref" from a boolean to an int;
2. delete the label for the error handling code, and decrease the global
   "in_builtin_counted_by_ref" before each break;

the 4th version compared to the 3rd version, the only change is the 
size calculation in the testing case.

The 3rd version compared to the 2nd version, the major change is:
the update in testing cases per Martin's suggestions.

when the 2nd version is compared to the first version, the major changes are:

1. change the name of the builtin from __builtin_get_counted_by to
__builtin_counted_by_ref in order to reflect the fact that the returned
value of it is a reference to the object.

2. make typeof(__builtin_counted_by_ref) work.

3. update the testing case to use the new builtin inside _Generic.

bootstrapped and regress tested on both X86 and aarch64. no issue.

Okay for the trunk?

thanks.

Qing.

==

With the addition of the 'counted_by' attribute and its wide roll-out
within the Linux kernel, a use case has been found that would be very
nice to have for object allocators: being able to set the counted_by
counter variable without knowing its name.

For example, given:

  struct foo {
...
int counter;
...
struct bar array[] __attribute__((counted_by (counter)));
  } *p;

The existing Linux object allocators are roughly:

  #define MAX(A, B) (((A) > (B)) ? (A) : (B))
  #define alloc(P, FAM, COUNT) ({ \
    __auto_type __p = &(P); \
    size_t __size = MAX (sizeof(*P), \
                         __builtin_offsetof (__typeof(*P), FAM) \
                         + sizeof (*(P->FAM)) * COUNT); \
    *__p = kmalloc(__size); \
  })

Right now, any addition of a counted_by annotation must also
include an open-coded assignment of the counter variable after
the allocation:

  p = alloc(p, array, how_many);
  p->counter = how_many;

In order to avoid the tedious and error-prone work of manually adding
the open-coded counted-by initializations everywhere in the Linux
kernel, a new GCC builtin __builtin_counted_by_ref would be very useful
in helping the adoption of the counted-by attribute.

 -- Built-in Function: TYPE __builtin_counted_by_ref (PTR)
 The built-in function '__builtin_counted_by_ref' checks whether the
 array object pointed to by the pointer PTR has another object
 associated with it that represents the number of elements in the
 array object through the 'counted_by' attribute (i.e.  the
 counted-by object).  If so, returns a pointer to the corresponding
 counted-by object.  If such counted-by object does not exist,
 returns a NULL pointer.

 This built-in function is only available in C for now.

 The argument PTR must be a pointer to an array.  The TYPE of the
 returned value must be a pointer type pointing to the corresponding
 type of the counted-by object or VOID pointer type in case of a
 NULL pointer being returned.

With this new builtin, the central allocator could be updated to:

  #define MAX(A, B) (A > B) ? (A) : (B)
  #define alloc(P, FAM, COUNT) ({ \
__auto_type __p = &(P); \
__auto_type __c = (COUNT); \
size_t __size = MAX (sizeof (*(*__p)),\
 __builtin_offsetof (__typeof(*(*__p)),FAM) \
 + sizeof (*((*__p)->FAM)) * __c); \
if ((*__p = kmalloc(__size))) { \
  __auto_type ret = __builtin_counted_by_ref((*__p)->FAM); \
  *_Generic(ret, void *: &(size_t){0}, default: ret) = __c; \
} \
  })

And then structs can gain the counted_by attribute without needing
additional open-coded counter assignments for each struct, and
unannotated structs could still use the same allocator.

PR c/116016

gcc/c-family/ChangeLog:

* c-common.cc: Add new __builtin_counted_by_ref.
* c-common.h (enum rid): Add RID_BUILTIN_COUNTED_BY_REF.

gcc/c/ChangeLog:

* c-decl.cc (names_builtin_p): Add RID_BUILTIN_COUNTED_BY_REF.
* c-parser.cc (has_counted_by_object): New routine.
(get_counted_by_ref): New routine.
(c_parser_postfix_expression): Handle New RID_BUILTIN_COUNTED_BY_REF.
* c-tree.h: New global in_builtin_counted_by_ref.
* c-typeck.cc (build_component_ref): Enable generating
.ACCESS_WITH_SIZE inside typeof when inside builtin_counted_by_ref.

gcc/ChangeLog:

* doc/extend.texi: Add documentation for __builtin_counted_by_ref.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-counted-by-ref-1.c: New test.
* gcc.dg/builtin-counted-by-ref.c: New test.
---
 gcc/c-family/c-common.cc  |  1 +
 gcc/c-family/c-common.h   |  1 +
 gcc/c/c-decl.cc   |  1 +
 gcc/c/c-parser.cc | 78 +++
 gcc/c/c-tree.h|  1 +
 gcc/c/c-typeck.cc   

[PATCH] cselib: Discard useless locs of preserved VALUEs [PR116627]

2024-09-11 Thread Jakub Jelinek
Hi!

remove_useless_values iteratively discards useless locs (locs of
cselib_val which refer to non-preserved VALUEs with no locations),
which in turn can make further values useless until no further VALUEs
are made useless and then discards the useless VALUEs.

Preserved VALUEs (something done during var-tracking only I think)
live in a different hash table, cselib_preserved_hash_table rather
than cselib_hash_table.  cselib_find_slot first looks up slot in
cselib_preserved_hash_table and only if not found looks it up in
cselib_hash_table (and INSERTs only into the latter), whereas preservation
of a VALUE results in move of a cselib_val from the latter to the former
hash table.

The testcase in the PR (apparently too fragile, it only reproduces on 14
branch with various flags on a single arch, not on trunk) ICEs, because
we have a preserved VALUE (QImode with (const_int 0) as one of the locs).
In a different BB SImode r2 is looked up, a non-preserved VALUE is created
for it, and the r13-2916 added code attempts to lookup also SUBREGs of that
in narrower modes, among those QImode, so adds to that SImode r2
non-preserved VALUE a new loc of (subreg:QI (value:SI) 0).  That SImode
value is considered useless, so remove_useless_values discards it, but
nothing discarded it from the preserved VALUE's loc_list, so when looking
something up in the hash table we ICE trying to dereference CSELIB_VAL
of the discarded VALUE.

I think we need to discard useless locs even from the preserved VALUEs.
That IMHO shouldn't create any further useless VALUEs, the preserved
VALUEs are never useless, so we don't need to iterate with it, can do it
just once, but IMHO it needs to be done before the actual
discard_useless_values traversal.

The following patch does that.
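
As a rough illustration of the idea (a toy model, not the cselib data
structures): values are plain ids, a loc is either a constant (-1) or the
id of another value, and all names below are made up for the sketch.  The
regular table is cleaned iteratively, the preserved table gets a single
extra pass, and only then are the useless values dropped, so that no loc
is left pointing at a value that no longer exists.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Toy value: each loc is -1 (a constant) or the id of another value.
struct toy_val {
  std::vector<int> locs;
};
typedef std::map<int, toy_val> toy_table;

// A value is useless if it lives in the regular table and has no locs.
// Ids that are only in the preserved table are never useless.
bool useless_p (const toy_table &regular, int id)
{
  toy_table::const_iterator it = regular.find (id);
  return it != regular.end () && it->second.locs.empty ();
}

// Drop every loc in TABLE that refers to a useless value in REGULAR.
// Returns true if some value in TABLE lost its last loc in this pass.
bool discard_useless_locs (toy_table &table, const toy_table &regular)
{
  bool became_useless = false;
  for (auto &entry : table)
    {
      std::vector<int> &locs = entry.second.locs;
      bool had_locs = !locs.empty ();
      std::vector<int> kept;
      for (int ref : locs)
        if (ref < 0 || !useless_p (regular, ref))
          kept.push_back (ref);
      locs.swap (kept);
      if (had_locs && locs.empty ())
        became_useless = true;
    }
  return became_useless;
}

void toy_remove_useless_values (toy_table &regular, toy_table &preserved)
{
  // Discarding a loc can make further regular values useless: iterate.
  while (discard_useless_locs (regular, regular))
    ;
  // One pass over the preserved values; in this model they are never
  // treated as useless, so no further iteration is needed.
  discard_useless_locs (preserved, regular);
  // Only now actually remove the useless values.
  for (toy_table::iterator it = regular.begin (); it != regular.end ();)
    if (it->second.locs.empty ())
      it = regular.erase (it);
    else
      ++it;
}
```

Without the single pass over the preserved table, a preserved value would
keep a loc referring to an id that has already been erased, which is the
toy equivalent of the dangling CSELIB_VAL described above.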

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-09-11  Jakub Jelinek  

PR target/116627
* cselib.cc (remove_useless_values): Discard useless locs
even from preserved cselib_vals in cselib_preserved_hash_table
hash table.

--- gcc/cselib.cc.jj    2024-04-26 11:46:54.960269768 +0200
+++ gcc/cselib.cc   2024-09-11 10:54:05.018242593 +0200
@@ -751,6 +751,11 @@ remove_useless_values (void)
   }
   *p = &dummy_val;
 
+  if (cselib_preserve_constants)
+cselib_preserved_hash_table->traverse  (NULL);
+  gcc_assert (!values_became_useless);
+
   n_useless_values += n_useless_debug_values;
   n_debug_values -= n_useless_debug_values;
   n_useless_debug_values = 0;

Jakub



[PATCH] c++: Disable deprecated/unavailable diagnostics when creating thunks for methods with such attributes [PR116636]

2024-09-11 Thread Jakub Jelinek
Hi!

On the following testcase, we emit false positive warnings/errors about using
the deprecated or unavailable methods when creating thunks for them, even
when nothing (in the testcase so far) actually used those.

The following patch temporarily disables those diagnostics when creating
the thunks.
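
The "temporarily change a global, restore it on scope exit" idiom used
here can be sketched in isolation.  Everything below is a stand-alone
miniature with made-up names (scoped_override, toy_state, toy_mark_used);
it is not the GCC implementation of make_temp_override, just the same
shape of RAII guard.

```cpp
#include <cassert>

// Hypothetical miniature of a scoped override: set VAR to a new value
// and restore the previous one when the guard goes out of scope.
template <typename T>
class scoped_override {
  T &var_;
  T saved_;
public:
  scoped_override (T &var, T value) : var_ (var), saved_ (var)
  { var_ = value; }
  ~scoped_override () { var_ = saved_; }
  scoped_override (const scoped_override &) = delete;
  scoped_override &operator= (const scoped_override &) = delete;
};

// Toy global state standing in for a "diagnose deprecated uses" switch.
enum toy_deprecated_state { TOY_NORMAL, TOY_SUPPRESS };
toy_deprecated_state toy_state = TOY_NORMAL;
int toy_warnings = 0;

// Toy "use" of a deprecated function: warns unless suppressed.
void toy_mark_used ()
{
  if (toy_state == TOY_NORMAL)
    ++toy_warnings;
}

// Toy thunk emission: references the function, but with the warning
// suppressed for the duration of this scope only.
void toy_use_thunk ()
{
  scoped_override<toy_deprecated_state> guard (toy_state, TOY_SUPPRESS);
  toy_mark_used ();
}
```

After toy_use_thunk returns, the state is back to TOY_NORMAL, so a later
genuine use still gets diagnosed.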

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-09-11  Jakub Jelinek  

PR c++/116636
* method.cc: Include decl.h.
(use_thunk): Temporarily change deprecated_state to
UNAVAILABLE_DEPRECATED_SUPPRESS.

* g++.dg/warn/deprecated-19.C: New test.

--- gcc/cp/method.cc.jj 2024-09-06 13:43:37.823301244 +0200
+++ gcc/cp/method.cc    2024-09-11 12:19:57.420486173 +0200
@@ -26,6 +26,7 @@ along with GCC; see the file COPYING3.
 #include "coretypes.h"
 #include "target.h"
 #include "cp-tree.h"
+#include "decl.h"
 #include "stringpool.h"
 #include "cgraph.h"
 #include "varasm.h"
@@ -283,6 +284,11 @@ use_thunk (tree thunk_fndecl, bool emit_
   /* Thunks are always addressable; they only appear in vtables.  */
   TREE_ADDRESSABLE (thunk_fndecl) = 1;
 
+  /* Don't diagnose deprecated or unavailable functions just because they
+ have thunks emitted for them.  */
+  auto du = make_temp_override (deprecated_state,
+UNAVAILABLE_DEPRECATED_SUPPRESS);
+
   /* Figure out what function is being thunked to.  It's referenced in
  this translation unit.  */
   TREE_ADDRESSABLE (function) = 1;
--- gcc/testsuite/g++.dg/warn/deprecated-19.C.jj        2024-09-11 12:50:25.34263 +0200
+++ gcc/testsuite/g++.dg/warn/deprecated-19.C   2024-09-11 13:05:29.210222060 +0200
@@ -0,0 +1,22 @@
+// PR c++/116636
+// { dg-do compile }
+// { dg-options "-pedantic -Wdeprecated" }
+
+struct A {
+  virtual int foo () = 0;
+};
+struct B : virtual A {
+  [[deprecated]] int foo () { return 0; }  // { dg-message "declared here" }
+}; // { dg-warning "C\\\+\\\+11 attributes only available with" "" { target c++98_only } .-1 }
+struct C : virtual A {
+  [[gnu::unavailable]] int foo () { return 0; } // { dg-message "declared here" }
+}; // { dg-warning "C\\\+\\\+11 attributes only available with" "" { target c++98_only } .-1 }
+
+void
+bar ()
+{
+  B b;
+  b.foo (); // { dg-warning "'virtual int B::foo\\\(\\\)' is deprecated" }
+  C c;
+  c.foo (); // { dg-error "'virtual int C::foo\\\(\\\)' is unavailable" }
+}

Jakub



[PATCH] libcpp: Implement clang -Wheader-guard warning [PR96842]

2024-09-11 Thread Jakub Jelinek
Hi!

The following patch implements the clang -Wheader-guard warning, which warns
if a valid multiple inclusion header guard's #ifndef/#if !defined directive
is immediately (no other non-line directives nor other (non-comment)
tokens in between) followed by #define directive for some different macro,
which in get_suggestion rules is close enough to the actual header guard
macro (i.e. likely misspelling), the #define is object-like with empty
definition (I've followed what clang implements) and the macro isn't defined
later on (at least not on the final #endif at the end of a header).

In this case it emits a warning, so that
#ifndef STDIO_H
#define STDOI_H
...
#endif
or similar misspellings can be caught.

clang enables this warning by default, but I've put it into -Wall instead
as it still seems to be a style warning, nothing more severe; if a header
doesn't survive multiple inclusion because of the misspelling, users will
get different diagnostics.
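
To make the "close enough, i.e. likely misspelling" condition concrete,
here is a small stand-alone sketch.  It is not the libcpp code and does
not reproduce the get_suggestion rules: it uses a plain Levenshtein
distance, and the cutoff (a third of the longer name) as well as the
function names are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Plain Levenshtein distance: insert, delete and substitute cost 1 each.
std::size_t edit_distance (const std::string &a, const std::string &b)
{
  std::vector<std::size_t> prev (b.size () + 1), cur (b.size () + 1);
  for (std::size_t j = 0; j <= b.size (); ++j)
    prev[j] = j;
  for (std::size_t i = 1; i <= a.size (); ++i)
    {
      cur[0] = i;
      for (std::size_t j = 1; j <= b.size (); ++j)
        {
          std::size_t subst = prev[j - 1] + (a[i - 1] != b[j - 1]);
          cur[j] = std::min (subst, std::min (prev[j], cur[j - 1]) + 1);
        }
      prev.swap (cur);
    }
  return prev[b.size ()];
}

// Hypothetical rule: the #define names a different macro than the
// #ifndef guard, but the two differ by at most a third of the longer
// name's length.
bool likely_misspelled_guard (const std::string &guard,
                              const std::string &defined)
{
  if (guard == defined)
    return false;
  std::size_t cutoff = std::max (guard.size (), defined.size ()) / 3;
  return edit_distance (guard, defined) <= cutoff;
}
```

With this rule the STDIO_H / STDOI_H pair above is flagged, while a
#define of an unrelated macro right after the #ifndef is not.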

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-09-11  Jakub Jelinek  

PR preprocessor/96842
libcpp/
* include/cpplib.h (struct cpp_options): Add warn_header_guard member.
(enum cpp_warning_reason): Add CPP_W_HEADER_GUARD enumerator.
* internal.h (struct cpp_reader): Add mi_def_cmacro, mi_loc and
mi_def_loc members.
(_cpp_defined_macro_p): Constify type pointed by argument type.
Formatting fix.
* init.cc (cpp_create_reader): Clear
CPP_OPTION (pfile, warn_header_guard).
* directives.cc (struct if_stack): Add def_loc and mi_def_cmacro
members.
(DIRECTIVE_TABLE): Add IF_COND flag to define.
(do_define): Set ifs->mi_def_cmacro on a define immediately following
#ifndef directive for the guard.  Clear pfile->mi_valid.  Formatting
fix.
(do_endif): Copy over pfile->mi_def_cmacro and pfile->mi_def_loc
if ifs->mi_def_cmacro is set and pfile->mi_cmacro isn't a defined
macro.
(push_conditional): Clear mi_def_cmacro and mi_def_loc members.
* files.cc (_cpp_pop_file_buffer): Emit -Wheader-guard diagnostics.
gcc/
* doc/invoke.texi (Wheader-guard): Document.
gcc/c-family/
* c.opt (Wheader-guard): New option.
* c.opt.urls: Regenerated.
* c-ppoutput.cc (init_pp_output): Initialize also cb->get_suggestion.
gcc/testsuite/
* c-c++-common/cpp/Wheader-guard-1.c: New test.
* c-c++-common/cpp/Wheader-guard-1-1.h: New test.
* c-c++-common/cpp/Wheader-guard-1-2.h: New test.
* c-c++-common/cpp/Wheader-guard-1-3.h: New test.
* c-c++-common/cpp/Wheader-guard-1-4.h: New test.
* c-c++-common/cpp/Wheader-guard-1-5.h: New test.
* c-c++-common/cpp/Wheader-guard-1-6.h: New test.
* c-c++-common/cpp/Wheader-guard-1-7.h: New test.
* c-c++-common/cpp/Wheader-guard-1-8.h: New test.
* c-c++-common/cpp/Wheader-guard-1-9.h: New test.
* c-c++-common/cpp/Wheader-guard-1-10.h: New test.
* c-c++-common/cpp/Wheader-guard-1-11.h: New test.
* c-c++-common/cpp/Wheader-guard-1-12.h: New test.
* c-c++-common/cpp/Wheader-guard-2.c: New test.
* c-c++-common/cpp/Wheader-guard-2.h: New test.
* c-c++-common/cpp/Wheader-guard-3.c: New test.
* c-c++-common/cpp/Wheader-guard-3.h: New test.

--- libcpp/include/cpplib.h.jj  2024-09-03 16:47:47.323031836 +0200
+++ libcpp/include/cpplib.h 2024-09-11 16:39:36.373680969 +0200
@@ -435,6 +435,10 @@ struct cpp_options
   /* Different -Wimplicit-fallthrough= levels.  */
   unsigned char cpp_warn_implicit_fallthrough;
 
+  /* Nonzero means warn about a define of a different macro right after
+ #ifndef/#if !defined header guard directive.  */
+  unsigned char warn_header_guard;
+
   /* Nonzero means we should look for header.gcc files that remap file
  names.  */
   unsigned char remap;
@@ -702,7 +706,8 @@ enum cpp_warning_reason {
   CPP_W_EXPANSION_TO_DEFINED,
   CPP_W_BIDIRECTIONAL,
   CPP_W_INVALID_UTF8,
-  CPP_W_UNICODE
+  CPP_W_UNICODE,
+  CPP_W_HEADER_GUARD
 };
 
 /* Callback for header lookup for HEADER, which is the name of a
--- libcpp/internal.h.jj2024-09-03 16:47:47.324031823 +0200
+++ libcpp/internal.h   2024-09-11 17:09:26.481097532 +0200
@@ -493,9 +493,11 @@ struct cpp_reader
  been used.  */
   bool seen_once_only;
 
-  /* Multiple include optimization.  */
+  /* Multiple include optimization and -Wheader-guard warning.  */
   const cpp_hashnode *mi_cmacro;
   const cpp_hashnode *mi_ind_cmacro;
+  const cpp_hashnode *mi_def_cmacro;
+  location_t mi_loc, mi_def_loc;
   bool mi_valid;
 
   /* Lexing.  */
@@ -676,7 +678,8 @@ _cpp_in_main_source_file (cpp_reader *pf
 }
 
 /* True if NODE is a macro for the purposes of ifdef, defined etc.  */
-inline bool _cpp_defined_macro_p (cpp_hashnode *node)
+inline bool
+_cpp_defined_macro_p (const cpp_hashnode *node)
 {
   /* Do not treat conditional macros 

Re: [PATCH v5] Provide new GCC builtin __builtin_counted_by_ref [PR116016]

2024-09-11 Thread Bill Wendling
On Wed, Sep 11, 2024 at 2:13 PM Qing Zhao  wrote:
>
> Compared to the 4th version, the changes are (addressing Jakub's concerns):
>
> 1. change the global "in_builtin_counted_by_ref" from a boolean to an int;
> 2. delete the label for the error handling code, and decrease the global
>    "in_builtin_counted_by_ref" before each break;
>
> the 4th version compared to the 3rd version, the only change is the
> size calculation in the testing case.
>
> The 3rd version compared to the 2nd version, the major change is:
> the update in testing cases per Martin's suggestions.
>
> when the 2nd version is compared to the first version, the major changes are:
>
> 1. change the name of the builtin from __builtin_get_counted_by to
> __builtin_counted_by_ref in order to reflect the fact that the returned
> value of it is a reference to the object.
>
> 2. make typeof(__builtin_counted_by_ref) work.
>
> 3. update the testing case to use the new builtin inside _Generic.
>
> bootstrapped and regress tested on both X86 and aarch64. no issue.
>
> Okay for the trunk?
>
> thanks.
>
> Qing.
>
> ==
>
> With the addition of the 'counted_by' attribute and its wide roll-out
> within the Linux kernel, a use case has been found that would be very
> nice to have for object allocators: being able to set the counted_by
> counter variable without knowing its name.
>
> For example, given:
>
>   struct foo {
> ...
> int counter;
> ...
> struct bar array[] __attribute__((counted_by (counter)));
>   } *p;
>
> The existing Linux object allocators are roughly:
>
>   #define MAX(A, B) (((A) > (B)) ? (A) : (B))
>   #define alloc(P, FAM, COUNT) ({ \
>     __auto_type __p = &(P); \
>     size_t __size = MAX (sizeof(*P), \
>                          __builtin_offsetof (__typeof(*P), FAM) \
>                          + sizeof (*(P->FAM)) * COUNT); \
>     *__p = kmalloc(__size); \
>   })
>
> Right now, any addition of a counted_by annotation must also
> include an open-coded assignment of the counter variable after
> the allocation:
>
>   p = alloc(p, array, how_many);
>   p->counter = how_many;
>
> In order to avoid the tedious and error-prone work of manually adding
> the open-coded counted-by initializations everywhere in the Linux
> kernel, a new GCC builtin __builtin_counted_by_ref would be very useful
> in helping the adoption of the counted-by attribute.
>
>  -- Built-in Function: TYPE __builtin_counted_by_ref (PTR)
>  The built-in function '__builtin_counted_by_ref' checks whether the
>  array object pointed by the pointer PTR has another object
>  associated with it that represents the number of elements in the
>  array object through the 'counted_by' attribute (i.e.  the
>  counted-by object).  If so, returns a pointer to the corresponding
>  counted-by object.  If such counted-by object does not exist,
>  returns a NULL pointer.
>
>  This built-in function is only available in C for now.
>
>  The argument PTR must be a pointer to an array.  The TYPE of the
>  returned value must be a pointer type pointing to the corresponding
>  type of the counted-by object or VOID pointer type in case of a
>  NULL pointer being returned.
>
> With this new builtin, the central allocator could be updated to:
>
>   #define MAX(A, B) (A > B) ? (A) : (B)
>   #define alloc(P, FAM, COUNT) ({ \
> __auto_type __p = &(P); \
> __auto_type __c = (COUNT); \
> size_t __size = MAX (sizeof (*(*__p)),\
>  __builtin_offsetof (__typeof(*(*__p)),FAM) \
>  + sizeof (*((*__p)->FAM)) * __c); \
> if ((*__p = kmalloc(__size))) { \
>   __auto_type ret = __builtin_counted_by_ref((*__p)->FAM); \
>   *_Generic(ret, void *: &(size_t){0}, default: ret) = __c; \
> } \
>   })
>
> And then structs can gain the counted_by attribute without needing
> additional open-coded counter assignments for each struct, and
> unannotated structs could still use the same allocator.
>
> PR c/116016
>
> gcc/c-family/ChangeLog:
>
> * c-common.cc: Add new __builtin_counted_by_ref.
> * c-common.h (enum rid): Add RID_BUILTIN_COUNTED_BY_REF.
>
> gcc/c/ChangeLog:
>
> * c-decl.cc (names_builtin_p): Add RID_BUILTIN_COUNTED_BY_REF.
> * c-parser.cc (has_counted_by_object): New routine.
> (get_counted_by_ref): New routine.
> (c_parser_postfix_expression): Handle New RID_BUILTIN_COUNTED_BY_REF.
> * c-tree.h: New global in_builtin_counted_by_ref.
> * c-typeck.cc (build_component_ref): Enable generating
> .ACCESS_WITH_SIZE inside typeof when inside builtin_counted_by_ref.
>
> gcc/ChangeLog:
>
> * doc/extend.texi: Add documentation for __builtin_counted_by_ref.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/builtin-counted-by-ref-1.c: New test.
> * gcc.dg/builtin-counted-by-ref.c: New test.
> ---
>  gcc/c-family/c-common.cc  |

Re: [PATCH] libcpp, c-family, v4: Add (dumb) C23 N3017 #embed support [PR105863]

2024-09-11 Thread Joseph Myers
On Fri, 30 Aug 2024, Jakub Jelinek wrote:

> Here is an updated version of the patch which uses a new flag in lang_flags
> to control this.
> I haven't touched the macro expansion or lack thereof from the earlier
> version though.

This patch is OK.

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH] libcpp, v3: Add support for gnu::offset #embed/__has_embed parameter

2024-09-11 Thread Joseph Myers
On Fri, 30 Aug 2024, Jakub Jelinek wrote:

> On Fri, Aug 16, 2024 at 04:58:58PM +, Joseph Myers wrote:
> > On Thu, 15 Aug 2024, Jakub Jelinek wrote:
> > 
> > > +   else
> > > + {
> > > +   if (res > INTTYPE_MAXIMUM (off_t))
> > > + cpp_error_with_line (pfile, CPP_DL_ERROR, loc, 0,
> > > +  "too large 'gnu::offset' argument");
> > 
> > Having a testcase for this diagnostic would be a good idea.  Also one for 
> > a negative argument for gnu::offset (the errors for negative arguments are 
> > already tested for limit, but I think testing that for gnu::offset is a 
> > good idea as well).
> 
> Here is an updated patch which does that:

This patch is OK.

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH] libcpp, v4: Add support for gnu::base64 #embed parameter

2024-09-11 Thread Joseph Myers
On Fri, 30 Aug 2024, Jakub Jelinek wrote:

> +should be no newlines in the string literal and because this parameter
> +is meant namely for use by the preprocessor itself, there is no support
> +for any escape sequences in the string literal argument.  If 
> @code{gnu::base64}

Given the "no escape sequences" rule, I think there should be a test for 
that - testing rejection of a string that would be valid if escape 
sequences were processed (for example, valid base64 but with the 
individual characters encoded using \x), but is not valid because they are 
not processed.  As far as I can see, the existing tests with escape 
sequences are invalid for other reasons (they use \n as the escape 
sequence).
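
A sketch of the kind of input meant here, under the assumption that the
argument is checked character by character against the base64 alphabet
with no escape processing.  The function names are made up and this is
not the libcpp decoder; it only shows why the raw spelling of an
\x-encoded string is rejected even though the characters it would denote
after escape processing form valid base64.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True for the 64 characters of the standard base64 alphabet.
bool base64_char_p (char c)
{
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
         || (c >= '0' && c <= '9') || c == '+' || c == '/';
}

// Check the spelling of a string literal body as written in the source,
// i.e. without processing any escape sequences.  At most two '='
// padding characters are accepted, and only at the very end.
bool valid_base64_spelling (const std::string &s)
{
  std::size_t n = s.size ();
  if (n == 0 || n % 4 != 0)
    return false;
  std::size_t pad = 0;
  while (pad < 2 && s[n - 1 - pad] == '=')
    ++pad;
  for (std::size_t i = 0; i < n - pad; ++i)
    if (!base64_char_p (s[i]))
      return false;
  return true;
}
```

Here "UUFB" is accepted, whereas the sixteen-character source spelling
\x55\x55\x46\x42 (which would be "UUFB" if escapes were processed) is
rejected because the backslash is not in the alphabet.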

-- 
Joseph S. Myers
josmy...@redhat.com



[PATCH 1/2] aarch64: Improve vector constant generation using SVE INDEX instruction [PR113328]

2024-09-11 Thread Pengxuan Zheng
SVE's INDEX instruction can be used to populate vectors with values starting from
"base" and incremented by "step" for each subsequent value. We can take
advantage of it to generate vector constants if TARGET_SVE is available and the
base and step values are within [-16, 15].

For example, with the following function:

typedef int v4si __attribute__ ((vector_size (16)));
v4si
f_v4si (void)
{
  return (v4si){ 0, 1, 2, 3 };
}

GCC currently generates:

f_v4si:
adrp    x0, .LC4
ldr q0, [x0, #:lo12:.LC4]
ret

.LC4:
.word   0
.word   1
.word   2
.word   3

With this patch, we generate an INDEX instruction instead if TARGET_SVE is
available.

f_v4si:
index   z0.s, #0, #1
ret
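
As a rough illustration of the condition described above, here is a minimal
sketch in plain C. It is not GCC code: is_index_series and index_imm_ok are
hypothetical names, and the only fact taken from the patch description is that
base and step must both lie within [-16, 15].

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of GCC: the immediate range quoted in the
   patch description for the base and step of SVE INDEX.  */
static bool
index_imm_ok (long long v)
{
  return v >= -16 && v <= 15;
}

/* Return true if ELTS[0..N-1] equals { base, base + step, base + 2 * step,
   ... } with base and step both in range.  On success store the two values
   in *BASE and *STEP.  */
static bool
is_index_series (const int *elts, size_t n, long long *base, long long *step)
{
  assert (elts != NULL && n >= 2);

  long long b = elts[0];
  long long s = (long long) elts[1] - elts[0];
  if (!index_imm_ok (b) || !index_imm_ok (s))
    return false;

  /* Every remaining element must continue the same linear sequence.  */
  for (size_t i = 2; i < n; ++i)
    if (elts[i] != b + (long long) i * s)
      return false;

  *base = b;
  *step = s;
  return true;
}
```

With this sketch, { 0, 1, 2, 3 } yields base 0 and step 1, matching the
"index z0.s, #0, #1" output shown above, while { 16, 17, 18, 19 } is rejected
because the base is out of range.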

PR target/113328

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_simd_valid_immediate): Improve
handling of some ADVSIMD vectors by using SVE's INDEX if TARGET_SVE is
available.
(aarch64_output_simd_mov_immediate): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/acle/general/dupq_1.c: Update test to use
SVE's INDEX instruction.
* gcc.target/aarch64/sve/acle/general/dupq_2.c: Likewise.
* gcc.target/aarch64/sve/acle/general/dupq_3.c: Likewise.
* gcc.target/aarch64/sve/acle/general/dupq_4.c: Likewise.
* gcc.target/aarch64/sve/vec_init_3.c: New test.

Signed-off-by: Pengxuan Zheng 
---
 gcc/config/aarch64/aarch64.cc | 12 ++-
 .../aarch64/sve/acle/general/dupq_1.c |  3 +-
 .../aarch64/sve/acle/general/dupq_2.c |  3 +-
 .../aarch64/sve/acle/general/dupq_3.c |  3 +-
 .../aarch64/sve/acle/general/dupq_4.c |  3 +-
 .../gcc.target/aarch64/sve/vec_init_3.c   | 99 +++
 6 files changed, 114 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/vec_init_3.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 27e24ba70ab..6b3ca57d0eb 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -22991,7 +22991,7 @@ aarch64_simd_valid_immediate (rtx op, 
simd_immediate_info *info,
   if (CONST_VECTOR_P (op)
   && CONST_VECTOR_DUPLICATE_P (op))
 n_elts = CONST_VECTOR_NPATTERNS (op);
-  else if ((vec_flags & VEC_SVE_DATA)
+  else if (which == AARCH64_CHECK_MOV && TARGET_SVE
   && const_vec_series_p (op, &base, &step))
 {
   gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_INT);
@@ -25249,6 +25249,16 @@ aarch64_output_simd_mov_immediate (rtx const_vector, 
unsigned width,
 
   if (which == AARCH64_CHECK_MOV)
 {
+  if (info.insn == simd_immediate_info::INDEX)
+   {
+ gcc_assert (TARGET_SVE);
+ snprintf (templ, sizeof (templ), "index\t%%Z0.%c, #"
+   HOST_WIDE_INT_PRINT_DEC ", #" HOST_WIDE_INT_PRINT_DEC,
+   element_char, INTVAL (info.u.index.base),
+   INTVAL (info.u.index.step));
+ return templ;
+   }
+
   mnemonic = info.insn == simd_immediate_info::MVN ? "mvni" : "movi";
   shift_op = (info.u.mov.modifier == simd_immediate_info::MSL
  ? "msl" : "lsl");
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c 
b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c
index 216699b0536..0940bedd0dd 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c
@@ -10,7 +10,6 @@ dupq (int x)
   return svdupq_s32 (x, 1, 2, 3);
 }
 
-/* { dg-final { scan-assembler {\tldr\tq[0-9]+,} } } */
+/* { dg-final { scan-assembler {\tindex\tz[0-9]+\.s, #0, #1} } } */
 /* { dg-final { scan-assembler {\tins\tv[0-9]+\.s\[0\], w0\n} } } */
 /* { dg-final { scan-assembler {\tdup\tz[0-9]+\.q, z[0-9]+\.q\[0\]\n} } } */
-/* { dg-final { scan-assembler {\t\.word\t1\n\t\.word\t2\n\t\.word\t3\n} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_2.c 
b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_2.c
index d494943a275..218a6601337 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_2.c
@@ -10,7 +10,6 @@ dupq (int x)
   return svdupq_s32 (x, 1, 2, 3);
 }
 
-/* { dg-final { scan-assembler {\tldr\tq[0-9]+,} } } */
+/* { dg-final { scan-assembler {\tindex\tz[0-9]+\.s, #3, #-1} } } */
 /* { dg-final { scan-assembler {\tins\tv[0-9]+\.s\[0\], w0\n} } } */
 /* { dg-final { scan-assembler {\tdup\tz[0-9]+\.q, z[0-9]+\.q\[0\]\n} } } */
-/* { dg-final { scan-assembler {\t\.word\t3\n\t\.word\t2\n\t\.word\t1\n} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_3.c 
b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_3.c
index 4bc8259df07..245d43b75b5 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_3.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_3.c
@@ -10,7 +10,6 @@ du

[PATCH 2/2] aarch64: Improve part-variable vector initialization with SVE INDEX instruction [PR113328]

2024-09-11 Thread Pengxuan Zheng
We can still use SVE's INDEX instruction to construct vectors even if not all
elements are constants. For example, { 0, x, 2, 3 } can be constructed by first
using "INDEX #0, #1" to generate { 0, 1, 2, 3 }, and then setting the
non-constant elements separately.
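
The idea can be sketched in plain C as follows. This is a simplified model
rather than the code in the patch: struct elt, imm_ok and find_index_seq are
hypothetical names, and the sketch assumes the same constraints the patch
describes (at least half of the elements constant, base and step within
[-16, 15]).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One vector element: either a known constant or a run-time value.  */
struct elt
{
  bool is_const;
  int value;   /* Only meaningful when is_const is true.  */
};

static bool
imm_ok (long long v)
{
  return v >= -16 && v <= 15;
}

/* Look for BASE and STEP such that every constant element I of V equals
   BASE + I * STEP.  Variable elements are ignored, since they would be
   overwritten after the INDEX instruction anyway.  */
static bool
find_index_seq (const struct elt *v, int n, long long *base, long long *step)
{
  int first = -1, second = -1, n_var = 0;

  assert (v != NULL && n > 0);
  for (int i = 0; i < n; ++i)
    {
      if (!v[i].is_const)
	++n_var;
      else if (first < 0)
	first = i;
      else if (second < 0)
	second = i;
    }

  /* Need two constants to derive a step, and at most half variable.  */
  if (second < 0 || n_var > n / 2)
    return false;

  long long diff = (long long) v[second].value - v[first].value;
  int dist = second - first;
  if (diff % dist != 0)
    return false;

  long long s = diff / dist;
  long long b = v[first].value - s * first;
  if (!imm_ok (s) || !imm_ok (b))
    return false;

  /* All constant elements must lie on the same linear sequence.  */
  for (int i = 0; i < n; ++i)
    if (v[i].is_const && v[i].value != b + s * i)
      return false;

  *base = b;
  *step = s;
  return true;
}
```

For { 0, x, 2, 3 } the sketch finds base 0 and step 1, so the whole vector
could be produced by INDEX and only element 1 would need a separate insert.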

PR target/113328

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_expand_vector_init_fallback):
Improve part-variable vector generation with SVE's INDEX if TARGET_SVE
is available.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/acle/general/dupq_1.c: Update test to use
check-function-bodies.
* gcc.target/aarch64/sve/acle/general/dupq_2.c: Likewise.
* gcc.target/aarch64/sve/acle/general/dupq_3.c: Likewise.
* gcc.target/aarch64/sve/acle/general/dupq_4.c: Likewise.
* gcc.target/aarch64/sve/vec_init_4.c: New test.
* gcc.target/aarch64/sve/vec_init_5.c: New test.

Signed-off-by: Pengxuan Zheng 
---
 gcc/config/aarch64/aarch64.cc | 81 ++-
 .../aarch64/sve/acle/general/dupq_1.c | 12 ++-
 .../aarch64/sve/acle/general/dupq_2.c | 12 ++-
 .../aarch64/sve/acle/general/dupq_3.c | 12 ++-
 .../aarch64/sve/acle/general/dupq_4.c | 12 ++-
 .../gcc.target/aarch64/sve/vec_init_4.c   | 47 +++
 .../gcc.target/aarch64/sve/vec_init_5.c   | 12 +++
 7 files changed, 171 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/vec_init_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/vec_init_5.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 6b3ca57d0eb..7305a5c6375 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -23942,12 +23942,91 @@ aarch64_expand_vector_init_fallback (rtx target, rtx 
vals)
   if (n_var != n_elts)
 {
   rtx copy = copy_rtx (vals);
+  bool is_index_seq = false;
+
+  /* If at least half of the elements of the vector are constants and all
+these constant elements form a linear sequence of the form { B, B + S,
+B + 2 * S, B + 3 * S, ... }, we can generate the vector with SVE's
+INDEX instruction if SVE is available and then set the elements which
+are not constant separately.  More precisely, each constant element I
+has to be B + I * S where B and S must be valid immediate operand for
+an SVE INDEX instruction.
+
+For example, { X, 1, 2, 3} is a vector satisfying these conditions and
+we can generate a vector of all constants (i.e., { 0, 1, 2, 3 }) first
+and then set the first element of the vector to X.  */
+
+  if (TARGET_SVE && GET_MODE_CLASS (mode) == MODE_VECTOR_INT
+ && n_var <= n_elts / 2)
+   {
+ int const_idx = -1;
+ HOST_WIDE_INT const_val = 0;
+ int base = 16;
+ int step = 16;
+
+ for (int i = 0; i < n_elts; ++i)
+   {
+ rtx x = XVECEXP (vals, 0, i);
+
+ if (!CONST_INT_P (x))
+   continue;
+
+ if (const_idx == -1)
+   {
+ const_idx = i;
+ const_val = INTVAL (x);
+   }
+ else
+   {
+ if ((INTVAL (x) - const_val) % (i - const_idx) == 0)
+   {
+ HOST_WIDE_INT s
+ = (INTVAL (x) - const_val) / (i - const_idx);
+ if (s >= -16 && s <= 15)
+   {
+ int b = const_val - s * const_idx;
+ if (b >= -16 && b <= 15)
+   {
+ base = b;
+ step = s;
+   }
+   }
+   }
+ break;
+   }
+   }
+
+ if (base != 16
+ && (!CONST_INT_P (v0)
+ || (CONST_INT_P (v0) && INTVAL (v0) == base)))
+   {
+ if (!CONST_INT_P (v0))
+   XVECEXP (copy, 0, 0) = GEN_INT (base);
+
+ is_index_seq = true;
+ for (int i = 1; i < n_elts; ++i)
+   {
+ rtx x = XVECEXP (copy, 0, i);
+
+ if (CONST_INT_P (x))
+   {
+ if (INTVAL (x) != base + i * step)
+   {
+ is_index_seq = false;
+ break;
+   }
+   }
+ else
+   XVECEXP (copy, 0, i) = GEN_INT (base + i * step);
+   }
+   }
+   }
 
   /* Load constant part of vector.  We really don't care what goes into the
 parts we will overwrite, but we're more likely to be able to load the
 constant efficiently if it has fewer, larger, repeating parts
 (see aarch64_simd_valid_immediate).  */
-  for (int i = 0;

RE: [PATCH] aarch64: Improve vector constant generation using SVE INDEX instruction [PR113328]

2024-09-11 Thread Pengxuan Zheng (QUIC)
> Pengxuan Zheng  writes:
> > SVE's INDEX instruction can be used to populate vectors by values
> > starting from "base" and incremented by "step" for each subsequent
> > value. We can take advantage of it to generate vector constants if
> > TARGET_SVE is available and the base and step values are within [-16, 15].
> >
> > For example, with the following function:
> >
> > typedef int v4si __attribute__ ((vector_size (16))); v4si f_v4si
> > (void) {
> >   return (v4si){ 0, 1, 2, 3 };
> > }
> >
> > GCC currently generates:
> >
> > f_v4si:
> > adrp    x0, .LC4
> > ldr q0, [x0, #:lo12:.LC4]
> > ret
> >
> > .LC4:
> > .word   0
> > .word   1
> > .word   2
> > .word   3
> >
> > With this patch, we generate an INDEX instruction instead if
> > TARGET_SVE is available.
> >
> > f_v4si:
> > index   z0.s, #0, #1
> > ret
> >
> > [...]
> > diff --git a/gcc/config/aarch64/aarch64.cc
> > b/gcc/config/aarch64/aarch64.cc index 9e12bd9711c..01bfb8c52e4 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -22960,8 +22960,7 @@ aarch64_simd_valid_immediate (rtx op,
> simd_immediate_info *info,
> >if (CONST_VECTOR_P (op)
> >&& CONST_VECTOR_DUPLICATE_P (op))
> >  n_elts = CONST_VECTOR_NPATTERNS (op);
> > -  else if ((vec_flags & VEC_SVE_DATA)
> > -  && const_vec_series_p (op, &base, &step))
> > +  else if (TARGET_SVE && const_vec_series_p (op, &base, &step))
> 
> I think we need to check which == AARCH64_CHECK_MOV too.  (Previously
> that wasn't necessary, because native SVE only uses this routine for moves.)
> 
> FTR: I was initially a bit nervous about testing TARGET_SVE without looking at
> vec_flags at all.  But looking at the previous handling of predicates and
> structures, I agree it looks like the correct thing to do.
> 
> >  {
> >gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_INT);
> >if (!aarch64_sve_index_immediate_p (base) [...] diff --git
> > a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c
> > b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c
> > index 216699b0536..3d6a0160f95 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c
> > @@ -10,7 +10,6 @@ dupq (int x)
> >return svdupq_s32 (x, 1, 2, 3);
> >  }
> >
> > -/* { dg-final { scan-assembler {\tldr\tq[0-9]+,} } } */
> > +/* { dg-final { scan-assembler {\tindex\tz[0-9]+\.s, #1, #2} } } */
> >  /* { dg-final { scan-assembler {\tins\tv[0-9]+\.s\[0\], w0\n} } } */
> >  /* { dg-final { scan-assembler {\tdup\tz[0-9]+\.q, z[0-9]+\.q\[0\]\n}
> > } } */
> > -/* { dg-final { scan-assembler
> > {\t\.word\t1\n\t\.word\t2\n\t\.word\t3\n} } } */
> 
> This seems to be a regression of sorts.  Previously we had:
> 
> adrp    x1, .LC0
> ldr q0, [x1, #:lo12:.LC0]
> ins v0.s[0], w0
> dup z0.q, z0.q[0]
> 
> whereas now we have:
> 
> movi    v0.2s, 0x2
> index   z31.s, #1, #2
> ins v0.s[0], w0
> zip1    v0.4s, v0.4s, v31.4s
> dup z0.q, z0.q[0]
> 
> I think we should try to aim for:
> 
> index   z0.s, #0, #1
> ins v0.s[0], w0
> dup z0.q, z0.q[0]
> 
> instead.

Thanks for the feedback, Richard!

I've added support to handle vectors with non-constant elements. I've split 
that change into a separate patch. Please let me know if you have any comments.

[PATCH 1/2] aarch64: Improve vector constant generation using SVE INDEX 
instruction [PR113328]
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662842.html

[PATCH 2/2] aarch64: Improve part-variable vector initialization with SVE INDEX 
instruction [PR113328]
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662843.html

Thanks,
Pengxuan
> 
> > [...]
> > +/*
> > +** g_v4si:
> > +** index   z0\.s, #3, #\-4
> 
> The backslash looks redundant here.
> 
> Thanks,
> Richard
> 
> > +** ret
> > +*/
> > +v4si
> > +g_v4si (void)
> > +{
> > +  return (v4si){ 3, -1, -5, -9 };
> > +}


Re: [PATCH v5] Provide new GCC builtin __builtin_counted_by_ref [PR116016]

2024-09-11 Thread Qing Zhao


> On Sep 11, 2024, at 18:16, Bill Wendling  wrote:
> 
> On Wed, Sep 11, 2024 at 2:13 PM Qing Zhao  wrote:
>> 
>> compared to the 4th version, the changes are (address Jakub's concerns):
>> 
>> 1. change the global "in_builtin_counted_by_ref" from a boolean to an int;
>> 2. delete the label for the error handling code, and decrease the global
>>   "in_builtin_counted_by_ref" before each break;
>> 
>> the 4th version compared to the 3rd version, the only change is the
>> size calculation in the testing case.
>> 
>> The 3rd version compared to the 2nd version, the major change is:
>> the update in testing cases per Martin's suggestions.
>> 
>> when the 2nd version is compared to the first version, the major changes are:
>> 
>> 1. change the name of the builtin from __builtin_get_counted_by to
>> __builtin_counted_by_ref in order to reflect the fact that the returned
>> value of it is a reference to the object.
>> 
>> 2. make typeof(__builtin_counted_by_ref) working.
>> 
>> 3. update the testing case to use the new builtin inside _Generic.
>> 
>> bootstrapped and regress tested on both X86 and aarch64. no issue.
>> 
>> Okay for the trunk?
>> 
>> thanks.
>> 
>> Qing.
>> 
>> ==
>> 
>> With the addition of the 'counted_by' attribute and its wide roll-out
>> within the Linux kernel, a use case has been found that would be very
>> nice to have for object allocators: being able to set the counted_by
>> counter variable without knowing its name.
>> 
>> For example, given:
>> 
>>  struct foo {
>>...
>>int counter;
>>...
>>struct bar array[] __attribute__((counted_by (counter)));
>>  } *p;
>> 
>> The existing Linux object allocators are roughly:
>> 
>>  #define MAX(A, B) ((A) > (B) ? (A) : (B))
>>  #define alloc(P, FAM, COUNT) ({ \
>>__auto_type __p = &(P); \
>>size_t __size = MAX (sizeof(*P), \
>> __builtin_offsetof (__typeof(*P), FAM) \
>> + sizeof (*(P->FAM)) * COUNT); \
>>*__p = kmalloc(__size); \
>>  })
>> 
>> Right now, any addition of a counted_by annotation must also
>> include an open-coded assignment of the counter variable after
>> the allocation:
>> 
>>  p = alloc(p, array, how_many);
>>  p->counter = how_many;
>> 
>> In order to avoid the tedious and error-prone work of manually adding
>> the open-coded counted-by initializations everywhere in the Linux
>> kernel, a new GCC builtin __builtin_counted_by_ref would be very useful
>> in helping the adoption of the counted-by attribute.
>> 
>> -- Built-in Function: TYPE __builtin_counted_by_ref (PTR)
>> The built-in function '__builtin_counted_by_ref' checks whether the
>> array object pointed to by the pointer PTR has another object
>> associated with it that represents the number of elements in the
>> array object through the 'counted_by' attribute (i.e.  the
>> counted-by object).  If so, returns a pointer to the corresponding
>> counted-by object.  If such counted-by object does not exist,
>> returns a NULL pointer.
>> 
>> This built-in function is only available in C for now.
>> 
>> The argument PTR must be a pointer to an array.  The TYPE of the
>> returned value must be a pointer type pointing to the corresponding
>> type of the counted-by object or VOID pointer type in case of a
>> NULL pointer being returned.
>> 
>> With this new builtin, the central allocator could be updated to:
>> 
>>  #define MAX(A, B) ((A) > (B) ? (A) : (B))
>>  #define alloc(P, FAM, COUNT) ({ \
>>__auto_type __p = &(P); \
>>__auto_type __c = (COUNT); \
>>size_t __size = MAX (sizeof (*(*__p)),\
>> __builtin_offsetof (__typeof(*(*__p)),FAM) \
>> + sizeof (*((*__p)->FAM)) * __c); \
>>if ((*__p = kmalloc(__size))) { \
>>  __auto_type ret = __builtin_counted_by_ref((*__p)->FAM); \
>>  *_Generic(ret, void *: &(size_t){0}, default: ret) = __c; \
>>} \
>>  })
>> 
>> And then structs can gain the counted_by attribute without needing
>> additional open-coded counter assignments for each struct, and
>> unannotated structs could still use the same allocator.
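
The _Generic line in the allocator above is the part that lets one macro serve
both annotated and unannotated structs. Below is a minimal sketch of just that
trick in standard C11, without the new builtin: SET_COUNT, store_count and
store_nothing are hypothetical names, and the REF argument merely stands in
for what __builtin_counted_by_ref is described as returning (a typed pointer
to the counter, or a null pointer of type void *).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro.  If REF has type void * (no counted-by object), the
   store goes to a throwaway compound literal; otherwise it goes through
   REF itself.  The selection happens at compile time, on the type.  */
#define SET_COUNT(REF, VAL) \
  (*_Generic ((REF), void *: &(size_t){0}, default: (REF)) = (VAL))

/* Case 1: a typed pointer to the counter is available.  */
static void
store_count (int *counter, int n)
{
  SET_COUNT (counter, n);
}

/* Case 2: only a void * null pointer is available, so nothing visible
   is written and no null pointer is dereferenced.  */
static void
store_nothing (int n)
{
  SET_COUNT ((void *) 0, n);
}
```

The design point is that the unselected branch of _Generic is never evaluated,
so the same source line is valid for both kinds of struct.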
>> 
>>PR c/116016
>> 
>> gcc/c-family/ChangeLog:
>> 
>>* c-common.cc: Add new __builtin_counted_by_ref.
>>* c-common.h (enum rid): Add RID_BUILTIN_COUNTED_BY_REF.
>> 
>> gcc/c/ChangeLog:
>> 
>>* c-decl.cc (names_builtin_p): Add RID_BUILTIN_COUNTED_BY_REF.
>>* c-parser.cc (has_counted_by_object): New routine.
>>(get_counted_by_ref): New routine.
>>(c_parser_postfix_expression): Handle New RID_BUILTIN_COUNTED_BY_REF.
>>* c-tree.h: New global in_builtin_counted_by_ref.
>>* c-typeck.cc (build_component_ref): Enable generating
>>.ACCESS_WITH_SIZE inside typeof when inside builtin_counted_by_ref.
>> 
>> gcc/ChangeLog:
>> 
>>* doc/extend.texi: Add documentation for __builtin_counted_by_ref.
>> 
>> gcc/testsuite/ChangeLog:
>> 

RE: [PATCH 1/2] RISC-V: Fix vl_used_by_non_rvv_insn logic of vsetvl pass

2024-09-11 Thread Li, Pan2
Committed, thanks Juzhe and garthlei.

Pan

From: 钟居哲 
Sent: Wednesday, September 11, 2024 7:36 PM
To: gcc-patches 
Cc: Li, Pan2 ; Robin Dapp ; jeffreyalaw 
; kito.cheng 
Subject: [PATCH 1/2] RISC-V: Fix vl_used_by_non_rvv_insn logic of vsetvl pass

Hi, garthlei.
Thanks for fixing it.

I see, you are trying to fix this bug:

lui     a5,%hi(.LANCHOR0)
addi    a5,a5,%lo(.LANCHOR0)
vsetivli        zero,2,e8,mf8,ta,ma   ---> It should be a4, 2 instead of zero, 2
vle64.v v1,0(a5)
--- missing vsetvli a4, a4 here
slli    a4,a4,1
vsetvli zero,a4,e32,m1,ta,ma
li      a2,-1
addi    a5,a5,16
vslide1down.vx  v1,v1,a2
vslide1down.vx  v1,v1,zero
vsetivli        zero,2,e64,m1,ta,ma
vse64.v v1,0(a5)
ret

When I revisit the codes here:

m_vl = ::get_vl
...
update_avl -> "m_vl" variable is modified
...
using wrong m_vl in the following.

A dedicated temporary variable dest_vl looks reasonable here.
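
The pattern can be shown with a deliberately tiny model in C. None of this is
the pass's real code: struct vsetvl_info, the integer register numbers and the
two-argument update_avl are hypothetical simplifications, kept only to show why
a local copy taken before the update stays valid.

```c
#include <assert.h>

/* Hypothetical, heavily simplified stand-in for the per-insn info.  */
struct vsetvl_info
{
  int vl_reg;   /* Plays the role of m_vl.  */
};

/* Stand-in for update_avl: it may replace the recorded VL register.  */
static void
update_avl (struct vsetvl_info *info, int new_vl)
{
  info->vl_reg = new_vl;
}

/* Return the register whose uses must be checked for non-RVV users.
   The local DEST_VL is captured before update_avl runs, so it still
   names the destination of the original instruction afterwards.  */
static int
parse_insn (struct vsetvl_info *info, int insn_dest_vl, int propagated_vl)
{
  int dest_vl;

  info->vl_reg = insn_dest_vl;
  dest_vl = info->vl_reg;

  update_avl (info, propagated_vl);   /* info->vl_reg is now different.  */

  return dest_vl;
}
```

Checking info->vl_reg after the update would look at the propagated register
instead, which is the kind of mismatch the patch description attributes the
wrong vsetvl elimination to.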

LGTM.

The RISC-V folks will commit this patch for you.
Thanks.

juzhe.zh...@rivai.ai

From: Li, Pan2
Date: 2024-09-11 19:29
To: juzhe.zh...@rivai.ai
Subject: FW: [PATCH 1/2] RISC-V: Fix vl_used_by_non_rvv_insn logic of vsetvl 
pass
FYI.

-Original Message-
From: garthlei <garth...@linux.alibaba.com>
Sent: Wednesday, September 11, 2024 5:10 PM
To: gcc-patches <gcc-patches@gcc.gnu.org>
Subject: [PATCH 1/2] RISC-V: Fix vl_used_by_non_rvv_insn logic of vsetvl pass

This patch fixes a bug in the current vsetvl pass.  The current pass uses
`m_vl` to determine whether the dest operand has been used by non-RVV
instructions.  However, `m_vl` may have been modified as a result of an
`update_avl` call, and thus would no longer be the dest operand of the
original instruction.  This can lead to incorrect vsetvl eliminations, as is
shown in the testcase.  In this patch, we create a `dest_vl` variable for
this scenario.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc: Use `dest_vl` for dest VL operand

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/vsetvl_bug-3.c: New test.
---
gcc/config/riscv/riscv-vsetvl.cc| 16 +++-
.../gcc.target/riscv/rvv/vsetvl/vsetvl_bug-3.c  | 17 +
2 files changed, 28 insertions(+), 5 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-3.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 017efa8bc17..ce831685439 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -1002,6 +1002,9 @@ public:
   void parse_insn (insn_info *insn)
   {
+/* The VL dest of the insn */
+rtx dest_vl = NULL_RTX;
+
 m_insn = insn;
 m_bb = insn->bb ();
 /* Return if it is debug insn for the consistency with optimize == 0.  */
@@ -1035,7 +1038,10 @@ public:
 if (m_avl)
   {
if (vsetvl_insn_p (insn->rtl ()) || has_vlmax_avl ())
-   m_vl = ::get_vl (insn->rtl ());
+   {
+ m_vl = ::get_vl (insn->rtl ());
+ dest_vl = m_vl;
+   }
if (has_nonvlmax_reg_avl ())
  m_avl_def = find_access (insn->uses (), REGNO (m_avl))->def ();
@@ -1132,22 +1138,22 @@ public:
   }
 /* Determine if dest operand(vl) has been used by non-RVV instructions.  */
-if (has_vl ())
+if (dest_vl)
   {
const hash_set vl_uses
-   = get_all_real_uses (get_insn (), REGNO (get_vl ()));
+   = get_all_real_uses (get_insn (), REGNO (dest_vl));
for (use_info *use : vl_uses)
  {
gcc_assert (use->insn ()->is_real ());
rtx_insn *rinsn = use->insn ()->rtl ();
if (!has_vl_op (rinsn)
- || count_regno_occurrences (rinsn, REGNO (get_vl ())) != 1)
+ || count_regno_occurrences (rinsn, REGNO (dest_vl)) != 1)
  {
m_vl_used_by_non_rvv_insn = true;
break;
  }
rtx avl = ::get_avl (rinsn);
- if (!avl || !REG_P (avl) || REGNO (get_vl ()) != REGNO (avl))
+ if (!avl || !REG_P (avl) || REGNO (dest_vl) != REGNO (avl))
  {
m_vl_used_by_non_rvv_insn = true;
break;
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-3.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-3.c
new file mode 100644
index 000..c155f5613d2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-3.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32d -O2 -fdump-rtl-vsetvl-details" } 
*/
+
+#include 
+
+uint64_t a[2], b[2];
+
+void
+foo ()
+{
+  size_t vl = __riscv_vsetvl_e64m1 (2);
+  vuint64m1_t vx = __riscv_vle64_v_u64m1 (a, vl);
+  vx = __riscv_vslide1down_vx_u64m1 (vx, 0xull, vl);
+  __riscv_vse64_v_u64m1 (b, vx, vl);
+}
+
+/* { dg-final { scan-rtl-dump-not "Eliminate insn" "vsetvl" } }  */
--
2.17.1



RE: [PATCH v3 2/5] Match: Add interface match_cond_with_binary_phi for true/false arg

2024-09-11 Thread Li, Pan2
Thanks Richard for comments.

> why would arg_edge depend on whether t0 is INTEGER_CST or not?
Because the edge->src of an INTEGER_CST arg points to the cond block itself,
which cannot match the edge->dest of the cond block. For example as below, the
first arg of the PHI is 255(2), which matches neither goto target of the cond
block.

Thus, I need to take the second arg, aka _1(3), to match the edge->dest of
cond_block, i.e. the phi arg edge->src == cond_block edge->dest. In the example
below, the goto taken on the false condition matches _1(3), and then I can
locate the edge from b2 -> b3.

Or is there any better approach for this scenario?

   4   │ __attribute__((noinline))
   5   │ uint8_t sat_u_add_imm_type_check_uint8_t_fmt_2 (uint8_t x)
   6   │ {
   7   │   unsigned char _1;
   8   │   unsigned char _2;
   9   │   uint8_t _3;
  10   │   __complex__ unsigned char _5;
  11   │
  12   │ ;;   basic block 2, loop depth 0
  13   │ ;;pred:   ENTRY
  14   │   _5 = .ADD_OVERFLOW (x_4(D), 9);
  15   │   _2 = IMAGPART_EXPR <_5>;
  16   │   if (_2 != 0)
  17   │ goto ; [35.00%]
  18   │   else
  19   │ goto ; [65.00%]
  20   │ ;;succ:   3
  21   │ ;;4
  22   │
  23   │ ;;   basic block 3, loop depth 0
  24   │ ;;pred:   2
  25   │   _1 = REALPART_EXPR <_5>;
  26   │ ;;succ:   4
  27   │
  28   │ ;;   basic block 4, loop depth 0
  29   │ ;;pred:   2
  30   │ ;;3
  31   │   # _3 = PHI <255(2), _1(3)>
  32   │   return _3;
  33   │ ;;succ:   EXIT
  34   │
  35   │ }
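
The lookup described above can be sketched with a miniature CFG model in C.
This is not GCC's representation: struct cond_block, struct phi_arg and
pick_true_false are hypothetical, blocks are plain integers, and the sketch
only handles shapes where each phi argument arrives either directly from the
condition block or through a single intermediate block.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical miniature CFG model, not GCC's data structures.  */
struct cond_block
{
  int index;        /* Block holding the condition.  */
  int true_dest;    /* Destination block of the true edge.  */
  int false_dest;   /* Destination block of the false edge.  */
};

struct phi_arg
{
  int src_block;    /* Source block of the incoming edge.  */
  int value_id;     /* Opaque identifier for the argument.  */
};

/* Pick the true and false args of a two-argument phi in PHI_BLOCK.  An arg
   whose edge comes straight from the condition block (like 255(2)) belongs
   to whichever outgoing edge targets PHI_BLOCK.  An arg arriving through an
   intermediate block (like _1(3)) belongs to the edge targeting that
   block.  Return false if the shape is not recognised.  */
static bool
pick_true_false (const struct cond_block *c, int phi_block,
		 const struct phi_arg args[2], int *true_arg, int *false_arg)
{
  int t = -1, f = -1;

  for (int i = 0; i < 2; ++i)
    {
      int target = (args[i].src_block == c->index
		    ? phi_block : args[i].src_block);
      if (target == c->true_dest && t < 0)
	t = i;
      else if (target == c->false_dest && f < 0)
	f = i;
      else
	return false;
    }

  *true_arg = args[t].value_id;
  *false_arg = args[f].value_id;
  return true;
}
```

For the dump above (condition in block 2, false edge to block 3, phi in block
4), the constant 255 is classified as the true arg and _1 as the false arg,
without needing to special-case INTEGER_CST.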

> Can you instead inline match_control_flow_graph_case_0 and _1 and do the
> argument assignment within the three cases of CFGs we accept?  That
> would be much easier to follow.

To double confirm, are you suggesting inlining the CFG match for both case_0 and
case_1?
That may make the function body grow, and we may have more cases like case_2,
case_3, etc.
If so, I will inline this into match_cond_with_binary_phi in v4.

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, September 11, 2024 9:39 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v3 2/5] Match: Add interface match_cond_with_binary_phi for 
true/false arg

On Wed, Sep 11, 2024 at 8:31 AM  wrote:
>
> From: Pan Li 
>
> When matching the cond with 2 args phi node, we need to figure out
> which arg of phi node comes from the true edge of cond block, as
> well as the false edge.  This patch would like to add interface
> to perform the action and return the true and false arg in TREE type.
>
> There will be some additional handling if one of the args is INTEGER_CST.
> Because the INTEGER_CST args may have no source block, its edge
> source points to the condition block.  See below example in line 31,
> the 255 INTEGER_CST has block 2 as source.  Thus, we need to find
> the non-INTEGER_CST (aka _1) to tell which one is the true/false edge.
> For example, the _1(3) takes block 3 as source, which is the dest
> of false edge of the condition block.
>
>4   │ __attribute__((noinline))
>5   │ uint8_t sat_u_add_imm_type_check_uint8_t_fmt_2 (uint8_t x)
>6   │ {
>7   │   unsigned char _1;
>8   │   unsigned char _2;
>9   │   uint8_t _3;
>   10   │   __complex__ unsigned char _5;
>   11   │
>   12   │ ;;   basic block 2, loop depth 0
>   13   │ ;;pred:   ENTRY
>   14   │   _5 = .ADD_OVERFLOW (x_4(D), 9);
>   15   │   _2 = IMAGPART_EXPR <_5>;
>   16   │   if (_2 != 0)
>   17   │ goto ; [35.00%]
>   18   │   else
>   19   │ goto ; [65.00%]
>   20   │ ;;succ:   3
>   21   │ ;;4
>   22   │
>   23   │ ;;   basic block 3, loop depth 0
>   24   │ ;;pred:   2
>   25   │   _1 = REALPART_EXPR <_5>;
>   26   │ ;;succ:   4
>   27   │
>   28   │ ;;   basic block 4, loop depth 0
>   29   │ ;;pred:   2
>   30   │ ;;3
>   31   │   # _3 = PHI <255(2), _1(3)>
>   32   │   return _3;
>   33   │ ;;succ:   EXIT
>   34   │
>   35   │ }
>
> The below test suites are passed for this patch.
> * The rv64gcv fully regression test.
> * The x86 bootstrap test.
> * The x86 fully regression test.
>
> gcc/ChangeLog:
>
> * gimple-match-head.cc (match_cond_with_binary_phi): Add new func
> impl to match binary phi for true and false arg.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/gimple-match-head.cc | 60 
>  1 file changed, 60 insertions(+)
>
> diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
> index c51728ae742..64f4f28cc72 100644
> --- a/gcc/gimple-match-head.cc
> +++ b/gcc/gimple-match-head.cc
> @@ -490,3 +490,63 @@ match_control_flow_graph_case_1 (basic_block b3, 
> basic_block *b_out)
>*b_out = b0;
>return true;
>  }
> +
> +/*
> + * Return the relevant gcond * of the given phi, as well as the true
> + * and false TREE args of the phi.  Or return NULL

Ping: [PATCH v3 1/2] c++: improve location of parsed RETURN_EXPRs

2024-09-11 Thread Arsen Arsenović
Jason Merrill  writes:

>> This is the new output - the diagnostics no longer expand that macro,
>> since the location is not wholly contained within it.  The relevant part
>> of the diagnostics after the change is:
>> --8<---cut here---start->8---
>>|
>>  'const char* inner(int)': event 5 (depth 3)
>>|
>>| return NULL;
>>|
>> --8<---cut here---end--->8---
>> ... as opposed to (excuse the quote difference - the former was pulled
>> from g++.log and the latter from a manual invocation):
>> --8<---cut here---start->8---
>>|
>>  ‘const char* inner(int)’: event 5 (depth 3)
>>|
>>
>> |/home/arsen/gcc-mine/gcc/testsuite/c-c++-common/analyzer/../../gcc.dg/analyzer/analyzer-decls.h:7:14:
>>| #define NULL nullptr
>>|  ^~~
>>|  |
>>|  (5) ...to here
>> /home/arsen/gcc-mine/gcc/testsuite/c-c++-common/analyzer/inlining-4-multiline.c:14:12:
>>  note: in expansion of macro ‘NULL’
>>| return NULL;
>>|^~~~
>>|
>> --8<---cut here---end--->8---
>> ... presumably, the diagnostics chose to elide those bits of output due
>> to the new location covering the entire line (and hence not being too
>> informative) - but I haven't debugged that (as I assumed the diagnostic
>> code is DTRT now).
>
> It seems weird to lose the "...to here" marking on the return.  Any thoughts,
> David?

Gentle ping on this patch/question.

TIA, have a lovely night.
-- 
Arsen Arsenović




[PATCH] JSON dumping for GENERIC trees

2024-09-11 Thread tcpreimesberger
From: Thor C Preimesberger 

This patch allows the compiler to dump GENERIC trees as JSON objects.

The dump flag -fdump-tree-original-json dumps each fndecl node in the 
C frontend's gimplifier as a JSON object and traverses related nodes 
in a manner analogous to raw dumping.

Some JSON parsers expect there to be a single JSON value per file;
the following shell command makes the output conformant:

  tr -d '\n ' < out.json | sed -e 's/\]\[/,/g' | sed -e 's/}{/},{/g'
 
There is also a debug function that simply prints a node as formatted JSON to
stdout.

The information in the dumped JSON is meant to be an amalgamation of 
tree-pretty-print.cc's dump_generic_node and print-tree.cc's debug_tree.

Bootstrapped and tested on x86_64-pc-linux-gnu without issue.

ChangeLog:
* gcc/Makefile.in: Link tree-emit-json.o to c-gimplify.o
* gcc/c-family/c-gimplify.cc (c_genericize): Hook for
-fdump-tree-original-json
* gcc/dumpfile.cc: Include tree-emit-json.h to expose
node_emit_json and debug_tree_json. Also new headers needed for
json.h being implicitly exposed
* gcc/dumpfile.h (dump_flag): New dump flag TDF_JSON
* gcc/tree-emit-json.cc: Logic for converting a tree to JSON
and dumping.
* gcc/tree-emit-json.h: Ditto

Signed-off-by: Thor C Preimesberger 

---
 gcc/Makefile.in|2 +
 gcc/c-family/c-gimplify.cc |   30 +-
 gcc/cp/dump.cc |1 +
 gcc/dumpfile.cc|3 +
 gcc/dumpfile.h |6 +
 gcc/tree-emit-json.cc  | 3155 
 gcc/tree-emit-json.h   |   82 +
 7 files changed, 3268 insertions(+), 11 deletions(-)
 create mode 100644 gcc/tree-emit-json.cc
 create mode 100644 gcc/tree-emit-json.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 68fda1a7591..b65cc7f0ad5 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1042,6 +1042,7 @@ OPTS_H = $(INPUT_H) $(VEC_H) opts.h $(OBSTACK_H)
 SYMTAB_H = $(srcdir)/../libcpp/include/symtab.h $(OBSTACK_H)
 CPP_INTERNAL_H = $(srcdir)/../libcpp/internal.h
 TREE_DUMP_H = tree-dump.h $(SPLAY_TREE_H) $(DUMPFILE_H)
+TREE_EMIT_JSON_H = tree-emit-json.h $(SPLAY_TREE_H) $(DUMPFILE_H) json.h
 TREE_PASS_H = tree-pass.h $(TIMEVAR_H) $(DUMPFILE_H)
 TREE_SSA_H = tree-ssa.h tree-ssa-operands.h \
$(BITMAP_H) sbitmap.h $(BASIC_BLOCK_H) $(GIMPLE_H) \
@@ -1709,6 +1710,7 @@ OBJS = \
tree-diagnostic.o \
tree-diagnostic-client-data-hooks.o \
tree-dump.o \
+   tree-emit-json.o \
tree-eh.o \
tree-emutls.o \
tree-if-conv.o \
diff --git a/gcc/c-family/c-gimplify.cc b/gcc/c-family/c-gimplify.cc
index 3e29766e092..8b0c80f4f75 100644
--- a/gcc/c-family/c-gimplify.cc
+++ b/gcc/c-family/c-gimplify.cc
@@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.  If not see
 .  */
 
 #include "config.h"
+#define INCLUDE_MEMORY
 #include "system.h"
 #include "coretypes.h"
 #include "tm.h"
@@ -43,6 +44,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "context.h"
 #include "tree-pass.h"
 #include "internal-fn.h"
+#include "tree-emit-json.h"
 
 /*  The gimplification pass converts the language-dependent trees
 (ld-trees) emitted by the parser into language-independent trees
@@ -629,20 +631,26 @@ c_genericize (tree fndecl)
   local_dump_flags = dfi->pflags;
   if (dump_orig)
 {
-  fprintf (dump_orig, "\n;; Function %s",
-  lang_hooks.decl_printable_name (fndecl, 2));
-  fprintf (dump_orig, " (%s)\n",
-  (!DECL_ASSEMBLER_NAME_SET_P (fndecl) ? "null"
-   : IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (fndecl))));
-  fprintf (dump_orig, ";; enabled by -%s\n", dump_flag_name 
(TDI_original));
-  fprintf (dump_orig, "\n");
-
-  if (local_dump_flags & TDF_RAW)
-   dump_node (DECL_SAVED_TREE (fndecl),
+  if (local_dump_flags & TDF_JSON)
+   dump_node_json (DECL_SAVED_TREE (fndecl),
   TDF_SLIM | local_dump_flags, dump_orig);
   else
-   print_c_tree (dump_orig, DECL_SAVED_TREE (fndecl));
+  {
+   fprintf (dump_orig, "\n;; Function %s",
+lang_hooks.decl_printable_name (fndecl, 2));
+   fprintf (dump_orig, " (%s)\n",
+   (!DECL_ASSEMBLER_NAME_SET_P (fndecl) ? "null"
+: IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (fndecl))));
+   fprintf (dump_orig, ";; enabled by -%s\n", dump_flag_name (TDI_original));
+   fprintf (dump_orig, "\n");
+   if (local_dump_flags & TDF_RAW)
+ dump_node (DECL_SAVED_TREE (fndecl),
+TDF_SLIM | local_dump_flags, dump_orig);
+   else
+ print_c_tree (dump_orig, DECL_SAVED_TREE (fndecl));
+  
   fprintf (dump_orig, "\n");
+  }
 }
 
   /* Dump all nested functions now.  */
diff --git a/gcc/cp/dump.cc b/gcc/cp/dump.cc
index aafb62ffaa0..b1083de5f46 100644
--- a/gcc/cp/dump.cc
+++ b/gcc/cp/dump.cc
@@ -22,

[PATCH v2] Enable V2BF/V4BF vec_cmp with AVX10.2 vcmppbf16

2024-09-11 Thread Levy Hsu
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog:

* config/i386/i386.cc (ix86_get_mask_mode):
Enable BFmode for targetm.vectorize.get_mask_mode with AVX10.2.
* config/i386/mmx.md (vec_cmp<mode>qi):
Implement vec_cmpv2bfqi and vec_cmpv4bfqi.

gcc/testsuite/ChangeLog:

* gcc.target/i386/part-vect-vec_cmpbf.c: New test.
---
 gcc/config/i386/i386.cc   |  3 ++-
 gcc/config/i386/mmx.md| 17 
 .../gcc.target/i386/part-vect-vec_cmpbf.c | 26 +++
 3 files changed, 45 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/part-vect-vec_cmpbf.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 45320124b91..7dbae1d72e3 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -24682,7 +24682,8 @@ ix86_get_mask_mode (machine_mode data_mode)
   /* AVX512FP16 only supports vector comparison
 to kmask for _Float16.  */
   || (TARGET_AVX512VL && TARGET_AVX512FP16
- && GET_MODE_INNER (data_mode) == E_HFmode))
+ && GET_MODE_INNER (data_mode) == E_HFmode)
+  || (TARGET_AVX10_2_256 && GET_MODE_INNER (data_mode) == E_BFmode))
 {
   if (elem_size == 4
  || elem_size == 8
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 4bc191b874b..95d9356694a 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -2290,6 +2290,23 @@
   DONE;
 })
 
+;;This instruction does not generate floating point exceptions
+(define_expand "vec_cmp<mode>qi"
+  [(set (match_operand:QI 0 "register_operand")
+   (match_operator:QI 1 ""
+ [(match_operand:VBF_32_64 2 "register_operand")
+  (match_operand:VBF_32_64 3 "nonimmediate_operand")]))]
+  "TARGET_AVX10_2_256"
+{
+  rtx op2 = lowpart_subreg (V8BFmode,
+   force_reg (<MODE>mode, operands[2]), <MODE>mode);
+  rtx op3 = lowpart_subreg (V8BFmode,
+   force_reg (<MODE>mode, operands[3]), <MODE>mode);
+
+  emit_insn (gen_vec_cmpv8bfqi (operands[0], operands[1], op2, op3));
+  DONE;
+})
+
 ;
 ;;
 ;; Parallel half-precision floating point rounding operations.
diff --git a/gcc/testsuite/gcc.target/i386/part-vect-vec_cmpbf.c b/gcc/testsuite/gcc.target/i386/part-vect-vec_cmpbf.c
new file mode 100644
index 000..0bb720b6432
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/part-vect-vec_cmpbf.c
@@ -0,0 +1,26 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mavx10.2" } */
+/* { dg-final { scan-assembler-times "vcmppbf16" 10 } } */
+
+typedef __bf16 __attribute__((__vector_size__ (4))) v2bf;
+typedef __bf16 __attribute__((__vector_size__ (8))) v4bf;
+
+
+#define VCMPMN(type, op, name) \
+type  \
+__attribute__ ((noinline, noclone)) \
+vec_cmp_##type##type##name (type a, type b) \
+{ \
+  return a op b;  \
+}
+
+VCMPMN (v4bf, <, lt)
+VCMPMN (v2bf, <, lt)
+VCMPMN (v4bf, <=, le)
+VCMPMN (v2bf, <=, le)
+VCMPMN (v4bf, >, gt)
+VCMPMN (v2bf, >, gt)
+VCMPMN (v4bf, >=, ge)
+VCMPMN (v2bf, >=, ge)
+VCMPMN (v4bf, ==, eq)
+VCMPMN (v2bf, ==, eq)
-- 
2.31.1



Re: [PATCH] JSON dumping for GENERIC trees

2024-09-11 Thread Andrew Pinski
On Wed, Sep 11, 2024 at 6:51 PM  wrote:
>
> From: Thor C Preimesberger 
>
> This patch allows the compiler to dump GENERIC trees as JSON objects.
>
> The dump flag -fdump-tree-original-json dumps each fndecl node in the
> C frontend's gimplifier as a JSON object and traverses related nodes
> in a manner analogous to raw-dumping.
>
> Some JSON parsers expect a single JSON value per file;
> the following shell command makes the output conformant:
>
>   tr -d '\n ' < out.json | sed -e 's/\]\[/,/g' | sed -e 's/}{/},{/g'
>
> There is also a debug function that simply prints a node as formatted JSON to
> stdout.
>
> The information in the dumped JSON is meant to be an amalgamation of
> tree-pretty-print.cc's dump_generic_node and print-tree.cc's debug_tree.

I don't think this is a good idea and there is no obvious use case.
GIMPLE yes but not GENERIC.
Can you explain what the use case is for dumping GENERIC as JSON? Also,
you only hooked up the C and C++ family of front ends. Why not
hook up Fortran, Ada, Rust and Go too? Why have it done in the
gimplifier?

Thanks,
Andrew

>
> Bootstrapped and tested on x86_64-pc-linux-gnu without issue.
>
> ChangeLog:
> * gcc/Makefile.in: Link tree-emit-json.o to c-gimplify.o
> * gcc/c-family/c-gimplify.cc (c_genericize): Hook for
> -fdump-tree-original-json
> * gcc/dumpfile.cc: Include tree-emit-json.h to expose
> node_emit_json and debug_tree_json. Also new headers needed for
> json.h being implicitly exposed
> * gcc/dumpfile.h (dump_flag): New dump flag TDF_JSON
> * gcc/tree-emit-json.cc: Logic for converting a tree to JSON
> and dumping.
> * gcc/tree-emit-json.h: Ditto

A few comments about the changelog entry here.
It should be something like:
gcc/ChangeLog:
 * Makefile.in: ...

gcc/c-family/ChangeLog:
  * c-gimplify.cc ...

Also there is no testcase or indication of how you tested it.


>
> Signed-off-by: Thor C Preimesberger 
>
> ---
>  gcc/Makefile.in|2 +
>  gcc/c-family/c-gimplify.cc |   30 +-
>  gcc/cp/dump.cc |1 +
>  gcc/dumpfile.cc|3 +
>  gcc/dumpfile.h |6 +
>  gcc/tree-emit-json.cc  | 3155 
>  gcc/tree-emit-json.h   |   82 +
>  7 files changed, 3268 insertions(+), 11 deletions(-)
>  create mode 100644 gcc/tree-emit-json.cc
>  create mode 100644 gcc/tree-emit-json.h
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 68fda1a7591..b65cc7f0ad5 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1042,6 +1042,7 @@ OPTS_H = $(INPUT_H) $(VEC_H) opts.h $(OBSTACK_H)
>  SYMTAB_H = $(srcdir)/../libcpp/include/symtab.h $(OBSTACK_H)
>  CPP_INTERNAL_H = $(srcdir)/../libcpp/internal.h
>  TREE_DUMP_H = tree-dump.h $(SPLAY_TREE_H) $(DUMPFILE_H)
> +TREE_EMIT_JSON_H = tree-emit-json.h $(SPLAY_TREE_H) $(DUMPFILE_H) json.h
>  TREE_PASS_H = tree-pass.h $(TIMEVAR_H) $(DUMPFILE_H)
>  TREE_SSA_H = tree-ssa.h tree-ssa-operands.h \
> $(BITMAP_H) sbitmap.h $(BASIC_BLOCK_H) $(GIMPLE_H) \
> @@ -1709,6 +1710,7 @@ OBJS = \
> tree-diagnostic.o \
> tree-diagnostic-client-data-hooks.o \
> tree-dump.o \
> +   tree-emit-json.o \
> tree-eh.o \
> tree-emutls.o \
> tree-if-conv.o \
> diff --git a/gcc/c-family/c-gimplify.cc b/gcc/c-family/c-gimplify.cc
> index 3e29766e092..8b0c80f4f75 100644
> --- a/gcc/c-family/c-gimplify.cc
> +++ b/gcc/c-family/c-gimplify.cc
> @@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.  If not see
>  .  */
>
>  #include "config.h"
> +#define INCLUDE_MEMORY
>  #include "system.h"
>  #include "coretypes.h"
>  #include "tm.h"
> @@ -43,6 +44,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "context.h"
>  #include "tree-pass.h"
>  #include "internal-fn.h"
> +#include "tree-emit-json.h"
>
>  /*  The gimplification pass converts the language-dependent trees
>  (ld-trees) emitted by the parser into language-independent trees
> @@ -629,20 +631,26 @@ c_genericize (tree fndecl)
>local_dump_flags = dfi->pflags;
>if (dump_orig)
>  {
> -  fprintf (dump_orig, "\n;; Function %s",
> -  lang_hooks.decl_printable_name (fndecl, 2));
> -  fprintf (dump_orig, " (%s)\n",
> -  (!DECL_ASSEMBLER_NAME_SET_P (fndecl) ? "null"
> -   : IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (fndecl))));
> -  fprintf (dump_orig, ";; enabled by -%s\n", dump_flag_name (TDI_original));
> -  fprintf (dump_orig, "\n");
> -
> -  if (local_dump_flags & TDF_RAW)
> -   dump_node (DECL_SAVED_TREE (fndecl),
> +  if (local_dump_flags & TDF_JSON)
> +   dump_node_json (DECL_SAVED_TREE (fndecl),
>TDF_SLIM | local_dump_flags, dump_orig);
>else
> -   print_c_tree (dump_orig, DECL_SAVED_TREE (fndecl));
> +  {
> +   fprintf (dump_orig, "\n;; Function %s",
> 

Re: [PATCH v2] c++: ICE with TTP [PR96097]

2024-09-11 Thread Jason Merrill

On 9/11/24 4:36 PM, Marek Polacek wrote:

On Wed, Sep 11, 2024 at 11:26:56AM -0400, Jason Merrill wrote:

On 9/11/24 10:53 AM, Patrick Palka wrote:

On Wed, 11 Sep 2024, Patrick Palka wrote:


On Wed, 11 Sep 2024, Patrick Palka wrote:


On Wed, 4 Sep 2024, Marek Polacek wrote:


On Wed, Sep 04, 2024 at 10:58:25AM -0400, Jason Merrill wrote:

On 9/3/24 6:12 PM, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/14?


The change to return bool seems like unrelated cleanup; please push that
separately on trunk only.


Done.

+ /* We can also have:
+
+ template  typename X>
+ void func() {}
+ template 
+ struct Y {};
+ void g() { func(); }
+
+where we are not in a template, but the type of PARM is T::type
+and dependent_type_p doesn't want to see a TEMPLATE_TYPE_PARM
+outside a template.  */


... so the patch LGTM, except I'd prefer to not have this comment
containing an embedded specific testcase.  IMHO it's "understood" that
processing_template_decl needs to be set when substituting using an
incomplete set of arguments since in that case the result must be
templated.


... and the comment might make this instance of the pattern seem more
like an exceptional case rather than a general rule, which paradoxically
could make the code seem more complex than it is at first glance.


Makes sense to me.


Thank you both.  Here's a version without the cleanups and the comment.

Ran dg.exp on x86_64-pc-linux-gnu, OK for trunk?


OK.


-- >8 --
We crash when dependent_type_p gets a TEMPLATE_TYPE_PARM outside
a template.  That happens here because in

   template  typename X>
   void func() {}
   template 
   struct Y {};
   void g() { func(); }

when performing overload resolution for func() we have to check
if U matches T and I matches TT.  So we wind up in
coerce_template_template_parm/PARM_DECL.  TREE_TYPE (arg) is int
so we try to substitute TT's type, which is T::type.  But we have
nothing to substitute T with.  And we call make_typename_type where
ctx is still T, which checks dependent_scope_p and we trip the assert.

It should work to always perform the substitution in a template context.
If the result still contains template parameters, we cannot say if they
match.

PR c++/96097

gcc/cp/ChangeLog:

* pt.cc (coerce_template_template_parm): Increment
processing_template_decl before calling tsubst.

gcc/testsuite/ChangeLog:

* g++.dg/template/ttp44.C: New test.
---
  gcc/cp/pt.cc  |  2 ++
  gcc/testsuite/g++.dg/template/ttp44.C | 13 +
  2 files changed, 15 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/template/ttp44.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 310e5dfff03..e4de5451f19 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -7951,7 +7951,9 @@ coerce_template_template_parm (tree parm, tree arg, tsubst_flags_t complain,
 i.e. the parameter list of TT depends on earlier parameters.  */
if (!uses_template_parms (TREE_TYPE (arg)))
{
+ ++processing_template_decl;
  tree t = tsubst (TREE_TYPE (parm), outer_args, complain, in_decl);
+ --processing_template_decl;
  if (!uses_template_parms (t)
  && !same_type_p (t, TREE_TYPE (arg)))
return false;
diff --git a/gcc/testsuite/g++.dg/template/ttp44.C b/gcc/testsuite/g++.dg/template/ttp44.C
new file mode 100644
index 000..2a412975243
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/ttp44.C
@@ -0,0 +1,13 @@
+// PR c++/96097
+// { dg-do compile }
+
+template  class X>
+void func() {}
+
+template 
+struct Y {};
+
+void test()
+{
+  func();
+}

base-commit: 670cfd5fe6433ee8f2e86eedb197d2523dbb033b




Re: [PATCH v2] c++: deleting explicitly-defaulted functions [PR116162]

2024-09-11 Thread Jason Merrill

On 9/11/24 4:08 PM, Marek Polacek wrote:

On Wed, Sep 11, 2024 at 01:19:53PM -0400, Jason Merrill wrote:

On 9/11/24 12:54 PM, Marek Polacek wrote:

+ auto_diagnostic_group d;
+ /* We used to emit a hard error, so this uses 0 rather than
+OPT_Wpedantic.  */
+ if (pedwarn (DECL_SOURCE_LOCATION (fn), 0,
+  "defaulted declaration %q+D does not match the "
+  "expected signature", fn))
+   inform (DECL_SOURCE_LOCATION (fn),
+   "expected signature: %qD", implicit_fn);


This should also depend on -Wdefaulted-function-deleted, and set
DECL_DELETED_FN.  And the C++20 case should show the expected signature.
Really, the two cases should share the same code, only the diagnostic kind
should change.


How about this?

dg.exp passed; running the full testing.

-- >8 --
This PR points out that we're not implementing [dcl.fct.def.default]
properly.  Consider e.g.

   struct C {
  C(const C&&) = default;
   };

where we wrongly emit an error, but the move ctor should be just =deleted.
According to [dcl.fct.def.default], if the type of the special member
function differs from the type of the corresponding special member function
that would have been implicitly declared in a way other than as allowed
by 2.1-4, the function is defined as deleted.  There's an exception for
assignment operators in which case the program is ill-formed.

clang++ has a warning for when we delete an explicitly-defaulted function
so this patch adds it too.  I'm also downgrading an error to a pedwarn
in C++17 since the code compiles in C++20.

PR c++/116162

gcc/c-family/ChangeLog:

* c.opt (Wdefaulted-function-deleted): New.

gcc/cp/ChangeLog:

* class.cc (check_bases_and_members): Call delete_defaulted_fn to set
DECL_DELETED_FN.
* cp-tree.h (delete_defaulted_fn): Declare.
* method.cc (delete_defaulted_fn): New.
(defaulted_late_check): Call delete_defaulted_fn instead of giving
an error.

gcc/ChangeLog:

* doc/invoke.texi: Document -Wdefaulted-function-deleted.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/defaulted15.C: Add dg-warning/dg-error.
* g++.dg/cpp0x/defaulted51.C: Likewise.
* g++.dg/cpp0x/defaulted52.C: Likewise.
* g++.dg/cpp0x/defaulted53.C: Likewise.
* g++.dg/cpp0x/defaulted54.C: Likewise.
* g++.dg/cpp0x/defaulted56.C: Likewise.
* g++.dg/cpp0x/defaulted57.C: Likewise.
* g++.dg/cpp0x/defaulted58.C: Likewise.
* g++.dg/cpp0x/defaulted59.C: Likewise.
* g++.dg/cpp0x/defaulted63.C: New test.
* g++.dg/cpp0x/defaulted64.C: New test.
* g++.dg/cpp0x/defaulted65.C: New test.
* g++.dg/cpp23/defaulted1.C: New test.
---
  gcc/c-family/c.opt   |  4 ++
  gcc/cp/class.cc  | 18 --
  gcc/cp/cp-tree.h |  1 +
  gcc/cp/method.cc | 77 +---
  gcc/doc/invoke.texi  |  9 +++
  gcc/testsuite/g++.dg/cpp0x/defaulted15.C |  3 +-
  gcc/testsuite/g++.dg/cpp0x/defaulted51.C |  3 +-
  gcc/testsuite/g++.dg/cpp0x/defaulted52.C |  3 +-
  gcc/testsuite/g++.dg/cpp0x/defaulted53.C |  3 +-
  gcc/testsuite/g++.dg/cpp0x/defaulted54.C |  2 +
  gcc/testsuite/g++.dg/cpp0x/defaulted56.C |  6 +-
  gcc/testsuite/g++.dg/cpp0x/defaulted57.C |  6 +-
  gcc/testsuite/g++.dg/cpp0x/defaulted58.C |  2 +
  gcc/testsuite/g++.dg/cpp0x/defaulted59.C |  3 +-
  gcc/testsuite/g++.dg/cpp0x/defaulted63.C | 39 
  gcc/testsuite/g++.dg/cpp0x/defaulted64.C | 27 +
  gcc/testsuite/g++.dg/cpp0x/defaulted65.C | 25 
  gcc/testsuite/g++.dg/cpp23/defaulted1.C  | 23 +++
  18 files changed, 232 insertions(+), 22 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/defaulted63.C
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/defaulted64.C
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/defaulted65.C
  create mode 100644 gcc/testsuite/g++.dg/cpp23/defaulted1.C

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 491aa02e1a3..f5136fd2341 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -619,6 +619,10 @@ Wdeclaration-missing-parameter-type
  C ObjC Var(warn_declaration_missing_parameter) Warning Init(1)
  Warn for missing parameter types in function declarations.
  
+Wdefaulted-function-deleted
+C++ ObjC++ Var(warn_defaulted_fn_deleted) Init(1) Warning
+Warn when an explicitly defaulted function is deleted.
+
  Wdelete-incomplete
  C++ ObjC++ Var(warn_delete_incomplete) Init(1) Warning
  Warn when deleting a pointer to incomplete type.
diff --git a/gcc/cp/class.cc b/gcc/cp/class.cc
index 950d83b0ea4..2d85681dc72 100644
--- a/gcc/cp/class.cc
+++ b/gcc/cp/class.cc
@@ -6490,8 +6490,9 @@ check_bases_and_members (tree t)
&& !DECL_ARTIFICIAL (fn)
&& DECL_DEFAULTED_IN_CLASS_P (fn))
{
+   special_function_kind kind = special_function

[PATCH v2] RISC-V: Eliminate latter vsetvl when fused

2024-09-11 Thread Bohan Lei
Hi all,

A simple assembly check has been added in this version. Previous version:
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662783.html

Thanks,
Bohan

--

The current vsetvl pass eliminates a vsetvl instruction when the previous
info is "available," but does not when "compatible."  This can lead to not
only redundancy, but also incorrect behaviors when the previous info happens
to be compatible with a later vector instruction, which ends up using the
vsetvl info that should have been eliminated, as is shown in the testcase.
This patch eliminates the vsetvl when the previous info is "compatible."

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pre_vsetvl::fuse_local_vsetvl_info):
Delete vsetvl insn when `prev_info` is compatible

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c: New test.
---
 gcc/config/riscv/riscv-vsetvl.cc  |  3 +++
 .../riscv/rvv/vsetvl/vsetvl_bug-4.c   | 19 +++
 2 files changed, 22 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index ce831685439..030ffbe2ebb 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2796,6 +2796,9 @@ pre_vsetvl::fuse_local_vsetvl_info ()
  curr_info.dump (dump_file, "");
}
  m_dem.merge (prev_info, curr_info);
+ if (!curr_info.vl_used_by_non_rvv_insn_p ()
+ && vsetvl_insn_p (curr_info.get_insn ()->rtl ()))
+   m_delete_list.safe_push (curr_info);
  if (curr_info.get_read_vl_insn ())
prev_info.set_read_vl_insn (curr_info.get_read_vl_insn ());
  if (dump_file && (dump_flags & TDF_DETAILS))
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c
new file mode 100644
index 000..04a8ff2945a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O2 -fno-schedule-insns -fdump-rtl-vsetvl-details" } */
+
+#include <riscv_vector.h>
+
+vuint16m1_t
+foo (vuint16m1_t a, vuint16m1_t b, size_t avl)
+{
+  size_t vl;
+  vuint16m1_t ret;
+  uint16_t c = __riscv_vmv_x_s_u16m1_u16(a);
+  vl = __riscv_vsetvl_e8mf2 (avl);
+  ret = __riscv_vadd_vx_u16m1 (a, c, avl);
+  ret = __riscv_vadd_vv_u16m1 (ret, a, vl);
+  return ret;
+}
+
+/* { dg-final { scan-rtl-dump "Eliminate insn" "vsetvl" } }  */
+/* { dg-final { scan-assembler-times {vsetvli} 2 } } */
-- 
2.17.1



RE: [PATCH 2/2] RISC-V: Eliminate latter vsetvl when fused

2024-09-11 Thread Bohan Lei
An updated version has been submitted:
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662854.html

--
From:Bohan Lei 
Send Time:2024 Sep. 11 (Wed.) 17:12
To:"gcc-patches"
Subject:[PATCH 2/2] RISC-V: Eliminate latter vsetvl when fused


The current vsetvl pass eliminates a vsetvl instruction when the previous
info is "available," but does not when "compatible."  This can lead to not
only redundancy, but also incorrect behaviors when the previous info happens
to be compatible with a later vector instruction, which ends up using the
vsetvl info that should have been eliminated, as is shown in the testcase.
This patch eliminates the vsetvl when the previous info is "compatible."

gcc/ChangeLog:

 * config/riscv/riscv-vsetvl.cc (pre_vsetvl::fuse_local_vsetvl_info):
 Delete vsetvl insn when `prev_info` is compatible

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c: New test.
---
 gcc/config/riscv/riscv-vsetvl.cc               |  3 +++
 .../gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c | 18 ++
 2 files changed, 21 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index ce831685439..030ffbe2ebb 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2796,6 +2796,9 @@ pre_vsetvl::fuse_local_vsetvl_info ()
         curr_info.dump (dump_file, "        ");
       }
     m_dem.merge (prev_info, curr_info);
+    if (!curr_info.vl_used_by_non_rvv_insn_p ()
+        && vsetvl_insn_p (curr_info.get_insn ()->rtl ()))
+      m_delete_list.safe_push (curr_info);
     if (curr_info.get_read_vl_insn ())
       prev_info.set_read_vl_insn (curr_info.get_read_vl_insn ());
     if (dump_file && (dump_flags & TDF_DETAILS))
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c
new file mode 100644
index 000..faa0c8073d3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O2 -fno-schedule-insns -fdump-rtl-vsetvl-details" } */
+
+#include <riscv_vector.h>
+
+vuint16m1_t
+foo (vuint16m1_t a, vuint16m1_t b, size_t avl)
+{
+  size_t vl;
+  vuint16m1_t ret;
+  uint16_t c = __riscv_vmv_x_s_u16m1_u16(a);
+  vl = __riscv_vsetvl_e8mf2 (avl);
+  ret = __riscv_vadd_vx_u16m1 (a, c, avl);
+  ret = __riscv_vadd_vv_u16m1 (ret, a, vl);
+  return ret;
+}
+
+/* { dg-final { scan-rtl-dump "Eliminate insn" "vsetvl" } }  */
-- 
2.17.1



Re: RE: [PATCH 2/2] RISC-V: Eliminate latter vsetvl when fused

2024-09-11 Thread 钟居哲
Could you CC me? I can't reply to that patch directly.



juzhe.zh...@rivai.ai
 
From: Bohan Lei
Date: 2024-09-12 10:38
To: Bohan Lei
CC: gcc-patches; juzhe.zhong
Subject: RE: [PATCH 2/2] RISC-V: Eliminate latter vsetvl when fused
An updated version has been submitted:
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662854.html
 
--
From:Bohan Lei 
Send Time:2024 Sep. 11 (Wed.) 17:12
To:"gcc-patches"
Subject:[PATCH 2/2] RISC-V: Eliminate latter vsetvl when fused
 
 
The current vsetvl pass eliminates a vsetvl instruction when the previous
info is "available," but does not when "compatible."  This can lead to not
only redundancy, but also incorrect behaviors when the previous info happens
to be compatible with a later vector instruction, which ends up using the
vsetvl info that should have been eliminated, as is shown in the testcase.
This patch eliminates the vsetvl when the previous info is "compatible."
 
gcc/ChangeLog:
 
* config/riscv/riscv-vsetvl.cc (pre_vsetvl::fuse_local_vsetvl_info):
Delete vsetvl insn when `prev_info` is compatible
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c: New test.
---
gcc/config/riscv/riscv-vsetvl.cc   |  3 +++
.../gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c | 18 ++
2 files changed, 21 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index ce831685439..030ffbe2ebb 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2796,6 +2796,9 @@ pre_vsetvl::fuse_local_vsetvl_info ()
curr_info.dump (dump_file, "");
  }
m_dem.merge (prev_info, curr_info);
+if (!curr_info.vl_used_by_non_rvv_insn_p ()
+&& vsetvl_insn_p (curr_info.get_insn ()->rtl ()))
+  m_delete_list.safe_push (curr_info);
if (curr_info.get_read_vl_insn ())
  prev_info.set_read_vl_insn (curr_info.get_read_vl_insn ());
if (dump_file && (dump_flags & TDF_DETAILS))
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c
new file mode 100644
index 000..faa0c8073d3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O2 -fno-schedule-insns -fdump-rtl-vsetvl-details" } */
+
+#include <riscv_vector.h>
+
+vuint16m1_t
+foo (vuint16m1_t a, vuint16m1_t b, size_t avl)
+{
+  size_t vl;
+  vuint16m1_t ret;
+  uint16_t c = __riscv_vmv_x_s_u16m1_u16(a);
+  vl = __riscv_vsetvl_e8mf2 (avl);
+  ret = __riscv_vadd_vx_u16m1 (a, c, avl);
+  ret = __riscv_vadd_vv_u16m1 (ret, a, vl);
+  return ret;
+}
+
+/* { dg-final { scan-rtl-dump "Eliminate insn" "vsetvl" } }  */
-- 
2.17.1
 


[PATCH v1] RISC-V: Implement SAT_ADD for signed integer vector

2024-09-11 Thread pan2 . li
From: Pan Li 

This patch implements ssadd for integer vectors, i.e. form 1 of the
vector ssadd.

Form 1:
  #define DEF_VEC_SAT_S_ADD_FMT_1(T, UT, MIN, MAX) \
  void __attribute__((noinline))   \
  vec_sat_s_add_##T##_fmt_1 (T *out, T *op_1, T *op_2, unsigned limit) \
  {\
unsigned i;\
for (i = 0; i < limit; i++)\
  {\
T x = op_1[i]; \
T y = op_2[i]; \
T sum = (UT)x + (UT)y; \
out[i] = (x ^ y) < 0   \
  ? sum\
  : (sum ^ x) >= 0 \
? sum  \
: x < 0 ? MIN : MAX;   \
  }\
  }

DEF_VEC_SAT_S_ADD_FMT_1(int64_t, uint64_t, INT64_MIN, INT64_MAX)
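
For readers who want to check the Form 1 logic by hand, here is a minimal
scalar sketch in plain C, fixed to T = int64_t.  The function names
sat_s_add_i64 and vec_sat_s_add_i64 are made up for this illustration and
are not part of the patch; the conversion from uint64_t back to int64_t is
assumed to wrap modulo 2^64, as GCC documents it.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one iteration of the Form 1 loop for T = int64_t.
   The name sat_s_add_i64 is hypothetical, not taken from the patch.  */
int64_t
sat_s_add_i64 (int64_t x, int64_t y)
{
  /* Add in the unsigned type so that wraparound is well defined; the
     conversion back to int64_t is assumed to wrap modulo 2^64.  */
  int64_t sum = (int64_t) ((uint64_t) x + (uint64_t) y);

  /* Operands of opposite sign can never overflow.  */
  if ((x ^ y) < 0)
    return sum;

  /* Same sign: no overflow if the sum kept the sign of x.  */
  if ((sum ^ x) >= 0)
    return sum;

  /* Overflow: clamp toward the sign of x.  */
  return x < 0 ? INT64_MIN : INT64_MAX;
}

/* Loop form mirroring the shape of vec_sat_s_add_int64_t_fmt_1.  */
void
vec_sat_s_add_i64 (int64_t *out, const int64_t *op_1, const int64_t *op_2,
                   unsigned limit)
{
  assert (out && op_1 && op_2);
  for (unsigned i = 0; i < limit; i++)
    out[i] = sat_s_add_i64 (op_1[i], op_2[i]);
}
```

The three branches correspond one to one to the nested conditional
expression in the macro, which is the shape the vectorizer is expected to
recognize as a signed SAT_ADD.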

Before this patch:
vec_sat_s_add_int64_t_fmt_1:
  ...
  vsetvli  t1,zero,e64,m1,ta,mu
  vadd.vv  v3,v1,v2
  vxor.vv  v0,v1,v3
  vmslt.vi v0,v0,0
  vxor.vv  v2,v1,v2
  vmsge.vi v2,v2,0
  vmand.mm v0,v0,v2
  vsra.vx  v1,v1,t3
  vxor.vv  v3,v1,v4,v0.t
  ...

After this patch:
vec_sat_s_add_int64_t_fmt_1:
  ...
  vsetvli  a6,zero,e64,m1,ta,ma
  vsadd.vv v1,v1,v2
  ...

The following test suite passes with this patch:
* The full rv64gcv regression test.

gcc/ChangeLog:

* config/riscv/autovec.md (ssadd<mode>3): Add new pattern for
signed integer vector SAT_ADD.
* config/riscv/riscv-protos.h (expand_vec_ssadd): Add new func
decl for vector ssadd expanding.
* config/riscv/riscv-v.cc (expand_vec_ssadd): Add new func impl
to expand vector ssadd pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_data.h: Add test
data for vector ssadd.
* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper
macros.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-3.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-4.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-3.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-4.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/autovec.md   |  11 +
 gcc/config/riscv/riscv-protos.h   |   1 +
 gcc/config/riscv/riscv-v.cc   |   9 +
 .../riscv/rvv/autovec/binop/vec_sat_data.h| 264 ++
 .../riscv/rvv/autovec/binop/vec_sat_s_add-1.c |  18 ++
 .../riscv/rvv/autovec/binop/vec_sat_s_add-2.c |  18 ++
 .../riscv/rvv/autovec/binop/vec_sat_s_add-3.c |  18 ++
 .../riscv/rvv/autovec/binop/vec_sat_s_add-4.c |  18 ++
 .../rvv/autovec/binop/vec_sat_s_add-run-1.c   |  17 ++
 .../rvv/autovec/binop/vec_sat_s_add-run-2.c   |  17 ++
 .../rvv/autovec/binop/vec_sat_s_add-run-3.c   |  17 ++
 .../rvv/autovec/binop/vec_sat_s_add-run-4.c   |  17 ++
 .../riscv/rvv/autovec/vec_sat_arith.h |  25 ++
 13 files changed, 450 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-4.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index a4e108268b4..a53c44659f0 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2684,6 +2684,17 @@ (define_expand "usadd<mode>3"
   }
 )
 
+(define_expand "ssadd<mode>3"
+  [(match_operand:V_VLSI 0 "register_operand")
+   (match_operand:V_VLSI 1 "register_operand")
+   (match_operand:V_VLSI 2 "register_operand")]
+  "TARGET_VECTOR"
+  {
+riscv_vector::expand_vec_ssadd (operands[0], operands[1], op

[PATCH v2] RISC-V: Eliminate latter vsetvl when fused

2024-09-11 Thread Bohan Lei
Resent to cc Juzhe.

--

Hi all,

A simple assembly check has been added in this version. Previous version:
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662783.html

Thanks,
Bohan

--

The current vsetvl pass eliminates a vsetvl instruction when the previous
info is "available," but does not when "compatible."  This can lead to not
only redundancy, but also incorrect behaviors when the previous info happens
to be compatible with a later vector instruction, which ends up using the
vsetvl info that should have been eliminated, as is shown in the testcase.
This patch eliminates the vsetvl when the previous info is "compatible."

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pre_vsetvl::fuse_local_vsetvl_info):
Delete vsetvl insn when `prev_info` is compatible

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c: New test.
---
 gcc/config/riscv/riscv-vsetvl.cc  |  3 +++
 .../riscv/rvv/vsetvl/vsetvl_bug-4.c   | 19 +++
 2 files changed, 22 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index ce831685439..030ffbe2ebb 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2796,6 +2796,9 @@ pre_vsetvl::fuse_local_vsetvl_info ()
  curr_info.dump (dump_file, "");
}
  m_dem.merge (prev_info, curr_info);
+ if (!curr_info.vl_used_by_non_rvv_insn_p ()
+ && vsetvl_insn_p (curr_info.get_insn ()->rtl ()))
+   m_delete_list.safe_push (curr_info);
  if (curr_info.get_read_vl_insn ())
prev_info.set_read_vl_insn (curr_info.get_read_vl_insn ());
  if (dump_file && (dump_flags & TDF_DETAILS))
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c
new file mode 100644
index 000..04a8ff2945a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O2 -fno-schedule-insns -fdump-rtl-vsetvl-details" } */
+
+#include <riscv_vector.h>
+
+vuint16m1_t
+foo (vuint16m1_t a, vuint16m1_t b, size_t avl)
+{
+  size_t vl;
+  vuint16m1_t ret;
+  uint16_t c = __riscv_vmv_x_s_u16m1_u16(a);
+  vl = __riscv_vsetvl_e8mf2 (avl);
+  ret = __riscv_vadd_vx_u16m1 (a, c, avl);
+  ret = __riscv_vadd_vv_u16m1 (ret, a, vl);
+  return ret;
+}
+
+/* { dg-final { scan-rtl-dump "Eliminate insn" "vsetvl" } }  */
+/* { dg-final { scan-assembler-times {vsetvli} 2 } } */
-- 
2.17.1



Re: [PATCH v2] RISC-V: Eliminate latter vsetvl when fused

2024-09-11 Thread 钟居哲
LGTM



juzhe.zh...@rivai.ai
 
From: Bohan Lei
Date: 2024-09-12 12:38
To: gcc-patches
CC: juzhe.zhong
Subject: [PATCH v2] RISC-V: Eliminate latter vsetvl when fused
Resent to cc Juzhe.
 
--
 
Hi all,
 
A simple assembly check has been added in this version. Previous version:
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662783.html
 
Thanks,
Bohan
 
--
 
The current vsetvl pass eliminates a vsetvl instruction when the previous
info is "available," but does not when "compatible."  This can lead to not
only redundancy, but also incorrect behaviors when the previous info happens
to be compatible with a later vector instruction, which ends up using the
vsetvl info that should have been eliminated, as is shown in the testcase.
This patch eliminates the vsetvl when the previous info is "compatible."
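
For readers following the available/compatible distinction, here is a minimal, self-contained sketch of the fusing loop. It is a toy model only: ToyInfo, available_p, compatible_p and fuse_local are hypothetical names that do not mirror the real data structures in riscv-vsetvl.cc, and the two demand fields are a simplification assumed for illustration. What it tries to show is that a later explicit vsetvl is queued for deletion in the compatible case as well, provided its VL result is not used by a non-RVV instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for one vsetvl record; the fields are illustrative, not GCC's.
struct ToyInfo {
  int sew;                 // demanded element width, 0 = no demand
  int ratio;               // demanded SEW/LMUL ratio, 0 = no demand
  bool is_vsetvl;          // record comes from an explicit vsetvl instruction
  bool vl_used_by_scalar;  // its VL result feeds a non-vector instruction
};

// "Available" (assumed meaning): prev already satisfies all demands of curr.
bool available_p(const ToyInfo &prev, const ToyInfo &curr) {
  return (curr.sew == 0 || curr.sew == prev.sew)
         && (curr.ratio == 0 || curr.ratio == prev.ratio);
}

// "Compatible" (assumed meaning): the demands do not conflict, so a single
// merged record can serve both.
bool compatible_p(const ToyInfo &prev, const ToyInfo &curr) {
  return (prev.sew == 0 || curr.sew == 0 || prev.sew == curr.sew)
         && (prev.ratio == 0 || curr.ratio == 0 || prev.ratio == curr.ratio);
}

// Walks the records of one block and returns the indices queued for deletion.
std::vector<std::size_t> fuse_local(std::vector<ToyInfo> infos) {
  std::vector<std::size_t> deleted;
  if (infos.empty())
    return deleted;
  ToyInfo prev = infos[0];
  for (std::size_t i = 1; i < infos.size(); ++i) {
    const ToyInfo &curr = infos[i];
    const bool deletable = curr.is_vsetvl && !curr.vl_used_by_scalar;
    if (available_p(prev, curr)) {
      if (deletable)
        deleted.push_back(i);
    } else if (compatible_p(prev, curr)) {
      // Merge curr's demands into prev.
      if (prev.sew == 0)
        prev.sew = curr.sew;
      if (prev.ratio == 0)
        prev.ratio = curr.ratio;
      // The step corresponding to the patch: also drop the fused vsetvl.
      if (deletable)
        deleted.push_back(i);
    } else {
      prev = curr;  // conflicting demands: curr starts a new run
    }
  }
  return deleted;
}
```

In this model, a first record that only demands SEW=16 followed by an explicit vsetvl that only demands ratio=16 is compatible but not available, and fuse_local queues the second record for deletion. If that vsetvl's VL were used by scalar code, it would be kept.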
 
gcc/ChangeLog:
 
* config/riscv/riscv-vsetvl.cc (pre_vsetvl::fuse_local_vsetvl_info):
Delete vsetvl insn when `prev_info` is compatible
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c: New test.
---
gcc/config/riscv/riscv-vsetvl.cc  |  3 +++
.../riscv/rvv/vsetvl/vsetvl_bug-4.c   | 19 +++
2 files changed, 22 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index ce831685439..030ffbe2ebb 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2796,6 +2796,9 @@ pre_vsetvl::fuse_local_vsetvl_info ()
  curr_info.dump (dump_file, "");
}
  m_dem.merge (prev_info, curr_info);
+   if (!curr_info.vl_used_by_non_rvv_insn_p ()
+   && vsetvl_insn_p (curr_info.get_insn ()->rtl ()))
+ m_delete_list.safe_push (curr_info);
  if (curr_info.get_read_vl_insn ())
prev_info.set_read_vl_insn (curr_info.get_read_vl_insn ());
  if (dump_file && (dump_flags & TDF_DETAILS))
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c
new file mode 100644
index 000..04a8ff2945a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O2 -fno-schedule-insns 
-fdump-rtl-vsetvl-details" } */
+
+#include 
+
+vuint16m1_t
+foo (vuint16m1_t a, vuint16m1_t b, size_t avl)
+{
+  size_t vl;
+  vuint16m1_t ret;
+  uint16_t c = __riscv_vmv_x_s_u16m1_u16(a);
+  vl = __riscv_vsetvl_e8mf2 (avl);
+  ret = __riscv_vadd_vx_u16m1 (a, c, avl);
+  ret = __riscv_vadd_vv_u16m1 (ret, a, vl);
+  return ret;
+}
+
+/* { dg-final { scan-rtl-dump "Eliminate insn" "vsetvl" } }  */
+/* { dg-final { scan-assembler-times {vsetvli} 2 } } */
-- 
2.17.1
 


[PATCH] i386: Implement Thread Local Storage on Windows

2024-09-11 Thread Julian Waters
Hello everyone,

This patch is an initial implementation of native Thread Local Storage on
Windows, where TLS is currently emulated via emutls. It draws heavily on
Daniel Green's original work on Windows TLS from a decade ago, so credit
should go to him as well (https://github.com/venix1, with the original
implementation being
https://github.com/venix1/MinGW-GDC/blob/master/patches/mingw-tls-gcc-4.8.patch).

TLS support still requires a bug in ld to be fixed, and the work for that is
currently underway (with thanks to Jan Beulich). Note that native TLS is still
disabled by default for Windows, and has to be explicitly enabled via the
--enable-tls switch at configure time.

There are some issues with this implementation. First, the TLS section is only
emitted with the w section flag, and does not have the d flag emitted
alongside it (I am unsure whether as requires the d flag or not). Second, the
TLS init method being emitted has not yet been rewritten to work on Windows (I
do not know how to do this). Third, the last step of the TLS access contains
an inefficiency because the patch zero extends the TLS symbol, which causes an
extra instruction to be emitted. This is unfortunate, but I could not find a
way to implement this without the zero extension, as all other alternatives
would crash when trying to compile libgcc or libgomp.

If anyone has suggestions for removing the extra instruction, or for the other
issues with the implementation, I would be more than happy to apply the
changes to the patch. As always, I do not have write access to gcc, so once
the green light is given for this patch I will need help committing it. The
patch is attached at the very end of this mail.

best regards,
Julian

P.S. The extra, unnecessary instruction can be seen by comparing gcc's output
with clang's (both at -O3):

thread_local int local = 2;

int main() {
local = 7;
}

clang:
mov eax, dword ptr [rip + _tls_index]
mov rcx, qword ptr gs:[88]
mov rsi, qword ptr [rcx + 8*rax]
mov dword ptr [rsi + local@SECREL32], 7 <-- Notice how clang moves 7 into the calculated TLS address in one step

gcc:

mov eax, DWORD PTR [rip+_tls_index]
mov rdx, QWORD PTR gs:[88]
mov rax, QWORD PTR [rdx+rax*8]
lea edx, local@secrel32 <-- gcc first loads the TLS offset
mov DWORD PTR [rdx+rax], 7 <-- Then adds it to the thread pointer, before moving, which is not necessary
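
The access sequence that both compilers emit can be modelled in portable C++. This is a sketch under assumptions: ToyThread and tls_address are made-up names, and the array stands in for the per-thread TLS pointer array that the code reaches through gs:[88] on x64 Windows. It illustrates that the final address is simply the module's TLS block plus the variable's section-relative offset, which is why a single store with a folded displacement should be sufficient.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the per-thread data the sequence reads: an array
// holding one TLS block pointer per loaded module.
struct ToyThread {
  unsigned char **tls_blocks;
};

// Mirrors the steps in the listings: take the module's _tls_index, load the
// per-thread array, pick this module's block, then add the variable's
// section-relative (SECREL32) offset.
inline unsigned char *tls_address(const ToyThread &thread,
                                  std::uint32_t tls_index,
                                  std::size_t secrel_offset) {
  unsigned char *block = thread.tls_blocks[tls_index];
  return block + secrel_offset;
}
```

In this model the offset is a link-time constant, so nothing forces it to be materialised in a register before the store.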

gcc/config/i386/ChangeLog:

* i386.cc
(mingw_w64_pe_select_section): New method.
(ix86_legitimate_constant_p): Handle new relocation.
(legitimate_pic_operand_p): Handle new relocation.
(legitimate_pic_address_disp_p): Handle new relocation.
(ix86_legitimate_address_p): Handle new relocation.
(legitimize_tls_address): Handle new Thread Local Storage model.
(output_pic_addr_const): Handle new relocation.
(i386_output_dwarf_dtprel): Handle new relocation.
(i386_asm_output_addr_const_extra): Handle new relocation.

* i386.h: New TARGET_WIN32_TLS flag.

* i386.md: Define UNSPEC_SECREL32, UNSPEC_TLS_WIN32 and handle new RTL 
template.

* mingw-w64.h: Define TARGET_ASM_SELECT_SECTION and TARGET_WIN32_TLS.

* predicates.md: Handle new relocation.

gcc/config/mingw/ChangeLog:

* winnt.cc (mingw_pe_unique_section): Emit new TLS section.

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 45320124b91..c1e6760a073 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -789,6 +789,20 @@ x86_64_elf_select_section (tree decl, int reloc,
   return default_elf_select_section (decl, reloc, align);
 }
 
+ATTRIBUTE_UNUSED static section *
+mingw_w64_pe_select_section (tree decl, int reloc, unsigned HOST_WIDE_INT 
align)
+{
+  if (TREE_CODE (decl) == VAR_DECL && DECL_THREAD_LOCAL_P (decl))
+{
+  if (!DECL_P (decl))
+   decl = NULL_TREE;
+
+  return get_named_section (decl, ".tls$", reloc);
+}
+  else
+return default_select_section (decl, reloc, align);
+}
+
 /* Select a set of attributes for section NAME based on the properties
of DECL and whether or not RELOC indicates that DECL's initializer
might contain runtime relocations.  */
@@ -11170,6 +11184,9 @@ ix86_legitimate_constant_p (machine_mode mode, rtx x)
x = XVECEXP (x, 0, 0);
return (GET_CODE (x) == SYMBOL_REF
&& SYMBOL_REF_TLS_MODEL (x) == TLS_MODEL_LOCAL_DYNAMIC);
+ case UNSPEC_SECREL32:
+   x = XVECEXP (x, 0, 0);
+   return GET_CODE (x) == SYMBOL_REF;
  default:
return false;
  }
@@ -11306,6 +11323,9 @@ legitimate_pic_operand_p (rtx x)
x = XVECEXP (inner, 0, 0);
return (GET_CODE (x) == SYMBOL_REF
&& SYMBOL_REF_TLS_MODEL (x) == TLS_MODEL_LOCAL_EXEC);
+ 

Re: [PATCH v2] Enable V2BF/V4BF vec_cmp with AVX10.2 vcmppbf16

2024-09-11 Thread Hongtao Liu
On Thu, Sep 12, 2024 at 9:55 AM Levy Hsu  wrote:
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
Ok.
>
> gcc/ChangeLog:
>
> * config/i386/i386.cc (ix86_get_mask_mode):
> Enable BFmode for targetm.vectorize.get_mask_mode with AVX10.2.
> * config/i386/mmx.md (vec_cmpqi):
> Implement vec_cmpv2bfqi and vec_cmpv4bfqi.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/part-vect-vec_cmpbf.c: New test.
> ---
>  gcc/config/i386/i386.cc   |  3 ++-
>  gcc/config/i386/mmx.md| 17 
>  .../gcc.target/i386/part-vect-vec_cmpbf.c | 26 +++
>  3 files changed, 45 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/part-vect-vec_cmpbf.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 45320124b91..7dbae1d72e3 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -24682,7 +24682,8 @@ ix86_get_mask_mode (machine_mode data_mode)
>/* AVX512FP16 only supports vector comparison
>  to kmask for _Float16.  */
>|| (TARGET_AVX512VL && TARGET_AVX512FP16
> - && GET_MODE_INNER (data_mode) == E_HFmode))
> + && GET_MODE_INNER (data_mode) == E_HFmode)
> +  || (TARGET_AVX10_2_256 && GET_MODE_INNER (data_mode) == E_BFmode))
>  {
>if (elem_size == 4
>   || elem_size == 8
> diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
> index 4bc191b874b..95d9356694a 100644
> --- a/gcc/config/i386/mmx.md
> +++ b/gcc/config/i386/mmx.md
> @@ -2290,6 +2290,23 @@
>DONE;
>  })
>
> +;;This instruction does not generate floating point exceptions
> +(define_expand "vec_cmpqi"
> +  [(set (match_operand:QI 0 "register_operand")
> +   (match_operator:QI 1 ""
> + [(match_operand:VBF_32_64 2 "register_operand")
> +  (match_operand:VBF_32_64 3 "nonimmediate_operand")]))]
> +  "TARGET_AVX10_2_256"
> +{
> +  rtx op2 = lowpart_subreg (V8BFmode,
> +   force_reg (mode, operands[2]), mode);
> +  rtx op3 = lowpart_subreg (V8BFmode,
> +   force_reg (mode, operands[3]), mode);
> +
> +  emit_insn (gen_vec_cmpv8bfqi (operands[0], operands[1], op2, op3));
> +  DONE;
> +})
> +
>  ;
>  ;;
>  ;; Parallel half-precision floating point rounding operations.
> diff --git a/gcc/testsuite/gcc.target/i386/part-vect-vec_cmpbf.c 
> b/gcc/testsuite/gcc.target/i386/part-vect-vec_cmpbf.c
> new file mode 100644
> index 000..0bb720b6432
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/part-vect-vec_cmpbf.c
> @@ -0,0 +1,26 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -mavx10.2" } */
> +/* { dg-final { scan-assembler-times "vcmppbf16" 10 } } */
> +
> +typedef __bf16 __attribute__((__vector_size__ (4))) v2bf;
> +typedef __bf16 __attribute__((__vector_size__ (8))) v4bf;
> +
> +
> +#define VCMPMN(type, op, name) \
> +type  \
> +__attribute__ ((noinline, noclone)) \
> +vec_cmp_##type##type##name (type a, type b) \
> +{ \
> +  return a op b;  \
> +}
> +
> +VCMPMN (v4bf, <, lt)
> +VCMPMN (v2bf, <, lt)
> +VCMPMN (v4bf, <=, le)
> +VCMPMN (v2bf, <=, le)
> +VCMPMN (v4bf, >, gt)
> +VCMPMN (v2bf, >, gt)
> +VCMPMN (v4bf, >=, ge)
> +VCMPMN (v2bf, >=, ge)
> +VCMPMN (v4bf, ==, eq)
> +VCMPMN (v2bf, ==, eq)
> --
> 2.31.1
>


-- 
BR,
Hongtao


Re: [PATCH] s390: Fix TF to FPRX2 conversion [PR115860]

2024-09-11 Thread Stefan Schulze Frielinghaus
On Wed, Sep 11, 2024 at 08:57:23PM +0200, Ilya Leoshkevich wrote:
> On Wed, 2024-09-11 at 16:44 +0200, Stefan Schulze Frielinghaus wrote:
> > On Wed, Sep 11, 2024 at 01:59:48PM +0200, Ilya Leoshkevich wrote:
> > > On Wed, 2024-09-11 at 13:34 +0200, Stefan Schulze Frielinghaus
> > > wrote:
> > > > On Wed, Sep 11, 2024 at 01:22:30PM +0200, Ilya Leoshkevich wrote:
> > > > > On Wed, 2024-09-11 at 12:35 +0200, Stefan Schulze Frielinghaus
> > > > > wrote:
> > > > > > On Wed, Sep 11, 2024 at 11:47:54AM +0200, Ilya Leoshkevich
> > > > > > wrote:
> > > > > > > On Fri, 2024-08-16 at 09:41 +0200, Stefan Schulze
> > > > > > > Frielinghaus
> > > > > > > wrote:
> > > > > > > > Currently subregs originating from *tf_to_fprx2_0 and *tf_to_fprx2_1
> > > > > > > > survive register allocation.  This in turn leads to wrong register
> > > > > > > > renaming.  Keeping the current approach would mean we need two insns for
> > > > > > > > *tf_to_fprx2_0 and *tf_to_fprx2_1, respectively.  Something along the
> > > > > > > > lines
> > > > > > > > 
> > > > > > > > (define_insn "*tf_to_fprx2_0"
> > > > > > > >   [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "=f") 0)
> > > > > > > >     (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
> > > > > > > >    UNSPEC_TF_TO_FPRX2_0))]
> > > > > > > >   "TARGET_VXE"
> > > > > > > >   "#")
> > > > > > > > 
> > > > > > > > (define_insn "*tf_to_fprx2_0"
> > > > > > > >   [(set (match_operand:DF 0 "nonimmediate_operand" "=f")
> > > > > > > >     (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
> > > > > > > >    UNSPEC_TF_TO_FPRX2_0))]
> > > > > > > >   "TARGET_VXE"
> > > > > > > >   "vpdi\t%v0,%v1,%v0,1"
> > > > > > > >   [(set_attr "op_type" "VRR")])
> > > > > > > > 
> > > > > > > > and similar for *tf_to_fprx2_1.  Note, pre register allocation operand 0
> > > > > > > > has mode FPRX2 and afterwards DF once subregs have been eliminated.
> > > > > > > > 
> > > > > > > > Since we always copy a whole vector register into a floating-point
> > > > > > > > register pair, another way to fix this is to merge *tf_to_fprx2_0 and
> > > > > > > > *tf_to_fprx2_1 into a single insn which means we don't have to use
> > > > > > > > subregs at all.  The downside of this is that the assembler template
> > > > > > > > contains two instructions, now.  The upside is that we don't have to
> > > > > > > > come up with some artificial insn before RA which might be more
> > > > > > > > readable/maintainable.  That is implemented by this patch.
> > > > > > > > 
> > > > > > > > In commit r11-4872-ge627cda5686592, the output operand specifier %V was
> > > > > > > > introduced which is used in tf_to_fprx2 only, now.  I didn't come up
> > > > > > > > with its counterpart like %F for floating-point registers.  Instead I
> > > > > > > > printed the register pair in the output function directly.  This spares
> > > > > > > > us a new and "rare" format specifier for a single insn.  I don't have a
> > > > > > > > strong opinion on which option to choose; however, we should either add %F
> > > > > > > > in order to mimic the same behaviour as %V or get rid of %V and
> > > > > > > > inline the logic in the output function.  I lean towards the latter.
> > > > > > > > Any preferences?
> > > > > > > > ---
> > > > > > > >  gcc/config/s390/s390.md    |  2 +
> > > > > > > >  gcc/config/s390/vector.md  | 66 +++---
> > > > > > > >  gcc/testsuite/gcc.target/s390/pr115860-1.c | 26 +
> > > > > > > >  3 files changed, 60 insertions(+), 34 deletions(-)
> > > > > > > >  create mode 100644 gcc/testsuite/gcc.target/s390/pr115860-1.c
> > > > > > > 
> > > > > > > [...]
> > > > > > > 
> > > > > > > > +  char buf[64];
> > > > > > > > +  switch (which_alternative)
> > > > > > > > +    {
> > > > > > > > +    case 0:
> > > > > > > > +  if (REGNO (operands[0]) == REGNO (operands[1]))
> > > > > > > > +   return "vpdi\t%V0,%v1,%V0,5";
> > > > > > > > +  else
>

RE: [PATCH v2] RISC-V: Eliminate latter vsetvl when fused

2024-09-11 Thread Li, Pan2
Committed.

Pan

From: 钟居哲 
Sent: Thursday, September 12, 2024 12:40 PM
To: Bohan Lei ; gcc-patches 

Cc: Li, Pan2 
Subject: Re: [PATCH v2] RISC-V: Eliminate latter vsetvl when fused

LGTM


juzhe.zh...@rivai.ai

From: Bohan Lei
Date: 2024-09-12 12:38
To: gcc-patches
CC: juzhe.zhong
Subject: [PATCH v2] RISC-V: Eliminate latter vsetvl when fused
Resent to cc Juzhe.

--

Hi all,

A simple assembly check has been added in this version. Previous version:
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662783.html

Thanks,
Bohan

--

The current vsetvl pass eliminates a vsetvl instruction when the previous
info is "available," but does not when "compatible."  This can lead to not
only redundancy, but also incorrect behaviors when the previous info happens
to be compatible with a later vector instruction, which ends up using the
vsetvl info that should have been eliminated, as is shown in the testcase.
This patch eliminates the vsetvl when the previous info is "compatible."

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pre_vsetvl::fuse_local_vsetvl_info):
Delete vsetvl insn when `prev_info` is compatible

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c: New test.
---
gcc/config/riscv/riscv-vsetvl.cc  |  3 +++
.../riscv/rvv/vsetvl/vsetvl_bug-4.c   | 19 +++
2 files changed, 22 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index ce831685439..030ffbe2ebb 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2796,6 +2796,9 @@ pre_vsetvl::fuse_local_vsetvl_info ()
  curr_info.dump (dump_file, "");
}
  m_dem.merge (prev_info, curr_info);
+   if (!curr_info.vl_used_by_non_rvv_insn_p ()
+   && vsetvl_insn_p (curr_info.get_insn ()->rtl ()))
+ m_delete_list.safe_push (curr_info);
  if (curr_info.get_read_vl_insn ())
prev_info.set_read_vl_insn (curr_info.get_read_vl_insn ());
  if (dump_file && (dump_flags & TDF_DETAILS))
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c
new file mode 100644
index 000..04a8ff2945a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O2 -fno-schedule-insns 
-fdump-rtl-vsetvl-details" } */
+
+#include 
+
+vuint16m1_t
+foo (vuint16m1_t a, vuint16m1_t b, size_t avl)
+{
+  size_t vl;
+  vuint16m1_t ret;
+  uint16_t c = __riscv_vmv_x_s_u16m1_u16(a);
+  vl = __riscv_vsetvl_e8mf2 (avl);
+  ret = __riscv_vadd_vx_u16m1 (a, c, avl);
+  ret = __riscv_vadd_vv_u16m1 (ret, a, vl);
+  return ret;
+}
+
+/* { dg-final { scan-rtl-dump "Eliminate insn" "vsetvl" } }  */
+/* { dg-final { scan-assembler-times {vsetvli} 2 } } */
--
2.17.1



Re: [PATCH] s390: Fix TF to FPRX2 conversion [PR115860]

2024-09-11 Thread Andreas Krebbel

Ok, Thanks!

Andreas

On 8/16/24 09:41, Stefan Schulze Frielinghaus wrote:

Currently subregs originating from *tf_to_fprx2_0 and *tf_to_fprx2_1
survive register allocation.  This in turn leads to wrong register
renaming.  Keeping the current approach would mean we need two insns for
*tf_to_fprx2_0 and *tf_to_fprx2_1, respectively.  Something along the
lines

(define_insn "*tf_to_fprx2_0"
   [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "=f") 0)
 (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
UNSPEC_TF_TO_FPRX2_0))]
   "TARGET_VXE"
   "#")

(define_insn "*tf_to_fprx2_0"
   [(set (match_operand:DF 0 "nonimmediate_operand" "=f")
 (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
UNSPEC_TF_TO_FPRX2_0))]
   "TARGET_VXE"
   "vpdi\t%v0,%v1,%v0,1"
   [(set_attr "op_type" "VRR")])

and similar for *tf_to_fprx2_1.  Note, pre register allocation operand 0
has mode FPRX2 and afterwards DF once subregs have been eliminated.

Since we always copy a whole vector register into a floating-point
register pair, another way to fix this is to merge *tf_to_fprx2_0 and
*tf_to_fprx2_1 into a single insn which means we don't have to use
subregs at all.  The downside of this is that the assembler template
contains two instructions, now.  The upside is that we don't have to
come up with some artificial insn before RA which might be more
readable/maintainable.  That is implemented by this patch.

In commit r11-4872-ge627cda5686592, the output operand specifier %V was
introduced which is used in tf_to_fprx2 only, now.  I didn't come up
with its counterpart like %F for floating-point registers.  Instead I
printed the register pair in the output function directly.  This spares
us a new and "rare" format specifier for a single insn.  I don't have a
strong opinion on which option to choose; however, we should either add %F
in order to mimic the same behaviour as %V or get rid of %V and
inline the logic in the output function.  I lean towards the latter.
Any preferences?
---
  gcc/config/s390/s390.md|  2 +
  gcc/config/s390/vector.md  | 66 +++---
  gcc/testsuite/gcc.target/s390/pr115860-1.c | 26 +
  3 files changed, 60 insertions(+), 34 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/s390/pr115860-1.c

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 3d5759d6252..31240899934 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -241,6 +241,8 @@
 UNSPEC_VEC_VFMIN
 UNSPEC_VEC_VFMAX
  
+   UNSPEC_TF_TO_FPRX2

+
 UNSPEC_NNPA_VCLFNHS_V8HI
 UNSPEC_NNPA_VCLFNLS_V8HI
 UNSPEC_NNPA_VCRNFS_V8HI
diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index a75b7cb5825..561182e0c2c 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -907,36 +907,36 @@
"vmrlg\t%0,%1,%2";
[(set_attr "op_type" "VRR")])
  
-

-(define_insn "*tf_to_fprx2_0"
-  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 0)
-   (subreg:DF (match_operand:TF1 "general_operand"   "v") 0))]
-  "TARGET_VXE"
-  ; M4 == 1 corresponds to %v0[0] = %v1[0]; %v0[1] = %v0[1];
-  "vpdi\t%v0,%v1,%v0,1"
-  [(set_attr "op_type" "VRR")])
-
-(define_insn "*tf_to_fprx2_1"
-  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 8)
-   (subreg:DF (match_operand:TF1 "general_operand"   "v") 8))]
+(define_insn "tf_to_fprx2"
+  [(set (match_operand:FPRX2 0 "register_operand" "=f,f ,f")
+   (unspec:FPRX2 [(match_operand:TF 1 "general_operand"   "v,AR,AT")]
+ UNSPEC_TF_TO_FPRX2))]
"TARGET_VXE"
-  ; M4 == 5 corresponds to %V0[0] = %v1[1]; %V0[1] = %V0[1];
-  "vpdi\t%V0,%v1,%V0,5"
-  [(set_attr "op_type" "VRR")])
-
-(define_insn_and_split "tf_to_fprx2"
-  [(set (match_operand:FPRX20 "nonimmediate_operand" "=f,f")
-   (subreg:FPRX2 (match_operand:TF 1 "general_operand"   "v,AR") 0))]
-  "TARGET_VXE"
-  "#"
-  "!(MEM_P (operands[1]) && MEM_VOLATILE_P (operands[1]))"
-  [(set (match_dup 2) (match_dup 3))
-   (set (match_dup 4) (match_dup 5))]
  {
-  operands[2] = simplify_gen_subreg (DFmode, operands[0], FPRX2mode, 0);
-  operands[3] = simplify_gen_subreg (DFmode, operands[1], TFmode, 0);
-  operands[4] = simplify_gen_subreg (DFmode, operands[0], FPRX2mode, 8);
-  operands[5] = simplify_gen_subreg (DFmode, operands[1], TFmode, 8);
+  char buf[64];
+  switch (which_alternative)
+{
+case 0:
+  if (REGNO (operands[0]) == REGNO (operands[1]))
+   return "vpdi\t%V0,%v1,%V0,5";
+  else
+   return "ldr\t%f0,%f1;vpdi\t%V0,%v1,%V0,5";
+case 1:
+  {
+   const char *reg_pair = reg_names[REGNO (operands[0]) + 1];
+   snprintf (buf, sizeof (buf), "ld\t%%f0,%%1;ld\t%%%s,8+%%1", reg_pair);
+   output_asm_insn (buf, operands);
+   return "";
+  }
+case 2:
+  {
+   const char *reg_pair = reg_names[REG

Re: [PATCH] s390: Fix TF to FPRX2 conversion [PR115860]

2024-09-11 Thread Andreas Krebbel



On 9/12/24 08:14, Stefan Schulze Frielinghaus wrote:

..

Right, so offsettable_memref_p only ensures that any resulting address is a
valid general address.  So we have to manually check for short displacement.
Maybe something along the lines:

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 7aea776da2f..e61cda8352a 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -3714,6 +3714,18 @@ s390_mem_constraint (const char *str, rtx op)
    if ((reload_completed || reload_in_progress)
   ? !offsettable_memref_p (op) : !offsettable_nonstrict_memref_p (op))
     return 0;
+  /* offsettable_memref_p ensures only that any positive offset added to
+    the address forms a valid general address.  For Q and R constraints we
+    also have to verify that the resulting displacement after adding any
+    positive offset less than the size of the object being referenced is
+    still valid.  */
+  if (str[1] == 'Q' || str[1] == 'R')
+   {
+ int o = GET_MODE_SIZE (GET_MODE (op)) - 1;
+ rtx tmp = adjust_address (op, QImode, o);
+ if (!s390_check_qrst_address (str[1], XEXP (tmp, 0), true))
+   return 0;
+   }
    return s390_check_qrst_address (str[1], XEXP (op, 0), true);
  case 'B':
    /* Check for non-literal-pool variants of memory constraints.  */

My reading of the constraints A[RQST] is that those are only used for operands
with non-block mode.  Thus, I didn't check for block mode.  Maybe an assert
would be worthwhile.
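
To make the displacement argument concrete, here is a small standalone sketch of the extra check. It assumes the short displacement field is an unsigned 12-bit value (0..4095); the helper names are hypothetical and nothing here calls into GCC. It shows why validating only the start address is not enough: for a 16-byte operand accessed in two 8-byte halves, the displacement of the last byte has to fit as well.

```cpp
#include <cassert>

// Assumption for this sketch: short-displacement addressing encodes an
// unsigned 12-bit displacement.
constexpr long kShortDispMax = 4095;

// True if DISP alone fits the short displacement field.
constexpr bool short_disp_p(long disp) {
  return disp >= 0 && disp <= kShortDispMax;
}

// Hypothetical helper: true if every byte of an object of SIZE bytes that
// starts at displacement DISP is still reachable with a short displacement.
// Checking the first and the last byte is sufficient because the range is
// contiguous.
constexpr bool short_disp_covers_object_p(long disp, long size) {
  return size > 0 && short_disp_p(disp) && short_disp_p(disp + size - 1);
}
```

For example, a 16-byte object at displacement 4080 passes, while the same object at 4088 fails even though 4088 on its own is a valid short displacement.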

This looks reasonable to me. I guess this deserves to be a separate patch?

Yea I think so, too, since this fixes the constraints AR and AQ which is
independent of this patch.  I will prepare one shortly.


Agreed. Feel free to commit the change above right away. Thanks!

Andreas



Re: [PATCH] cselib: Discard useless locs of preserved VALUEs [PR116627]

2024-09-11 Thread Jakub Jelinek
On Wed, Sep 11, 2024 at 11:26:27PM +0200, Jakub Jelinek wrote:
> I think we need to discard useless locs even from the preserved VALUEs.
> That IMHO shouldn't create any further useless VALUEs, the preserved
> VALUEs are never useless, so we don't need to iterate with it, can do it
> just once, but IMHO it needs to be done because actually
> discard_useless_values.
> 
> The following patch does that.

Note, I've verified the patch on x86_64-linux cc1plus didn't change
anything at all on the resulting cc1plus binary (compared it to one
bootstrapped without this patch with the patch later applied and
make cc1plus done in the stage3, the only change in the binary was
16 bytes of executable_checksum).

Jakub



Re: [PATCH] s390: Fix strict_low_part generation

2024-09-11 Thread Andreas Krebbel

On 8/16/24 09:14, Stefan Schulze Frielinghaus wrote:

In s390_expand_insv(), if generating code for ICM et al. src is a MEM
and gen_lowpart might force src into a register such that we end up with
patterns which do not match anymore.  Use adjust_address() instead in
order to preserve a MEM.

Furthermore, it is not straightforward to enforce a subreg.  For
example, in case of a paradoxical subreg, gen_lowpart() may return a
register.  In order to compensate for this, s390_gen_lowpart_subreg() emits
a reference to a pseudo which does not coincide with its definition
which is wrong.  Additionally, if dest is a paradoxical subreg, then do
not try to emit a strict_low_part since it could mean that dest was not
initialized even though this might be fixed up later by init-regs.

Splitters for insns *get_tp_64, *zero_extendhisi2_31,
*zero_extendqisi2_31, *zero_extendqihi2_31 are applied after reload.
Thus, operands[0] is a hard register and gen_lowpart (m, operands[0])
just returns the hard register for mode m which is fine to use as an
argument for strict_low_part, i.e., we do not need to enforce subregs
here since after reload subregs are supposed to be eliminated anyway.

This fixes gcc.dg/torture/pr111821.c.

gcc/ChangeLog:

* config/s390/s390-protos.h (s390_gen_lowpart_subreg): Remove.
* config/s390/s390.cc (s390_gen_lowpart_subreg): Remove.
(s390_expand_insv): Use adjust_address() and emit a
strict_low_part only in case of a natural subreg.
* config/s390/s390.md: Use gen_lowpart() instead of
s390_gen_lowpart_subreg().


Ok. Thanks!


Andreas




Re: [PATCH 1/2] aarch64: Improve vector constant generation using SVE INDEX instruction [PR113328]

2024-09-11 Thread Richard Biener
On Thu, Sep 12, 2024 at 2:53 AM Pengxuan Zheng  wrote:
>
> SVE's INDEX instruction can be used to populate vectors by values starting
> from "base" and incremented by "step" for each subsequent value. We can take
> advantage of it to generate vector constants if TARGET_SVE is available and
> the base and step values are within [-16, 15].

Are there multiplication by or addition of scalar immediate instructions to
enhance this with two-instruction sequences?

> For example, with the following function:
>
> typedef int v4si __attribute__ ((vector_size (16)));
> v4si
> f_v4si (void)
> {
>   return (v4si){ 0, 1, 2, 3 };
> }
>
> GCC currently generates:
>
> f_v4si:
> adrp    x0, .LC4
> ldr q0, [x0, #:lo12:.LC4]
> ret
>
> .LC4:
> .word   0
> .word   1
> .word   2
> .word   3
>
> With this patch, we generate an INDEX instruction instead if TARGET_SVE is
> available.
>
> f_v4si:
> index   z0.s, #0, #1
> ret
>
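
A rough sketch of the property being tested may help here. This is not the GCC code (the patch relies on const_vec_series_p); index_immediates is a hypothetical helper operating on a plain std::vector, written only to illustrate the "linear series with base and step in [-16, 15]" condition described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns true and fills BASE and STEP if ELTS is the
// series base, base+step, base+2*step, ... and both values fit the immediate
// range [-16, 15] quoted for INDEX.  BASE and STEP are untouched on failure.
bool index_immediates(const std::vector<long> &elts, long &base, long &step) {
  if (elts.size() < 2)
    return false;
  const long b = elts[0];
  const long s = elts[1] - elts[0];
  // Every consecutive pair must differ by the same step.
  for (std::size_t i = 2; i < elts.size(); ++i)
    if (elts[i] - elts[i - 1] != s)
      return false;
  const auto fits = [](long v) { return v >= -16 && v <= 15; };
  if (!fits(b) || !fits(s))
    return false;
  base = b;
  step = s;
  return true;
}
```

Under this model { 0, 1, 2, 3 } yields base 0 and step 1, matching the "index z0.s, #0, #1" output shown above, while { 0, 16, 32, 48 } is rejected because the step is out of range.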
> PR target/113328
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64.cc (aarch64_simd_valid_immediate): Improve
> handling of some ADVSIMD vectors by using SVE's INDEX if TARGET_SVE is
> available.
> (aarch64_output_simd_mov_immediate): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/sve/acle/general/dupq_1.c: Update test to use
> SVE's INDEX instruction.
> * gcc.target/aarch64/sve/acle/general/dupq_2.c: Likewise.
> * gcc.target/aarch64/sve/acle/general/dupq_3.c: Likewise.
> * gcc.target/aarch64/sve/acle/general/dupq_4.c: Likewise.
> * gcc.target/aarch64/sve/vec_init_3.c: New test.
>
> Signed-off-by: Pengxuan Zheng 
> ---
>  gcc/config/aarch64/aarch64.cc | 12 ++-
>  .../aarch64/sve/acle/general/dupq_1.c |  3 +-
>  .../aarch64/sve/acle/general/dupq_2.c |  3 +-
>  .../aarch64/sve/acle/general/dupq_3.c |  3 +-
>  .../aarch64/sve/acle/general/dupq_4.c |  3 +-
>  .../gcc.target/aarch64/sve/vec_init_3.c   | 99 +++
>  6 files changed, 114 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/vec_init_3.c
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 27e24ba70ab..6b3ca57d0eb 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -22991,7 +22991,7 @@ aarch64_simd_valid_immediate (rtx op, 
> simd_immediate_info *info,
>if (CONST_VECTOR_P (op)
>&& CONST_VECTOR_DUPLICATE_P (op))
>  n_elts = CONST_VECTOR_NPATTERNS (op);
> -  else if ((vec_flags & VEC_SVE_DATA)
> +  else if (which == AARCH64_CHECK_MOV && TARGET_SVE
>&& const_vec_series_p (op, &base, &step))
>  {
>gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_INT);
> @@ -25249,6 +25249,16 @@ aarch64_output_simd_mov_immediate (rtx const_vector, 
> unsigned width,
>
>if (which == AARCH64_CHECK_MOV)
>  {
> +  if (info.insn == simd_immediate_info::INDEX)
> +   {
> + gcc_assert (TARGET_SVE);
> + snprintf (templ, sizeof (templ), "index\t%%Z0.%c, #"
> +   HOST_WIDE_INT_PRINT_DEC ", #" HOST_WIDE_INT_PRINT_DEC,
> +   element_char, INTVAL (info.u.index.base),
> +   INTVAL (info.u.index.step));
> + return templ;
> +   }
> +
>mnemonic = info.insn == simd_immediate_info::MVN ? "mvni" : "movi";
>shift_op = (info.u.mov.modifier == simd_immediate_info::MSL
>   ? "msl" : "lsl");
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c
> index 216699b0536..0940bedd0dd 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c
> @@ -10,7 +10,6 @@ dupq (int x)
>return svdupq_s32 (x, 1, 2, 3);
>  }
>
> -/* { dg-final { scan-assembler {\tldr\tq[0-9]+,} } } */
> +/* { dg-final { scan-assembler {\tindex\tz[0-9]+\.s, #0, #1} } } */
>  /* { dg-final { scan-assembler {\tins\tv[0-9]+\.s\[0\], w0\n} } } */
>  /* { dg-final { scan-assembler {\tdup\tz[0-9]+\.q, z[0-9]+\.q\[0\]\n} } } */
> -/* { dg-final { scan-assembler {\t\.word\t1\n\t\.word\t2\n\t\.word\t3\n} } } 
> */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_2.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_2.c
> index d494943a275..218a6601337 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_2.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_2.c
> @@ -10,7 +10,6 @@ dupq (int x)
>return svdupq_s32 (x, 1, 2, 3);
>  }
>
> -/* { dg-final { scan-assembler {\tldr\tq[0-9]+,} } } */
> +/* { dg-final { scan-assembler {\tindex\tz[0-9]+\.s, #3, #-1} } } */
>  /* { dg-final { scan-assembler {\tins\tv[0-9]+\.s\[0\], w0\n} } } */
>  /* { dg-final { scan-assembler {\tdup\tz[0-9]+\.q, z[0-9]+\.q\[0\]\n}

Re: [PATCH v3 2/5] Match: Add interface match_cond_with_binary_phi for true/false arg

2024-09-11 Thread Richard Biener
On Thu, Sep 12, 2024 at 3:41 AM Li, Pan2  wrote:
>
> Thanks Richard for comments.
>
> > why would arg_edge depend on whether t0 is INTEGER_CST or not?
> Because the edge->src of INTEGER_CST points to the cond block, which cannot
> match the edge->dest of the cond_block. For example, as below, the first arg
> of PHI is 255(2), which matches neither goto <bb 3> nor goto <bb 4>.
>
> Thus, I need to take the second arg, aka _1(3), to match the edge->dest of
> cond_block, i.e. the phi arg edge->src == cond_block edge->dest. In the
> example below, the goto <bb 3> matches _1(3) with the false condition, and
> then I can locate the edge from b2 -> b3.
>
> Or is there any better approach for this scenario?
>
>4   │ __attribute__((noinline))
>5   │ uint8_t sat_u_add_imm_type_check_uint8_t_fmt_2 (uint8_t x)
>6   │ {
>7   │   unsigned char _1;
>8   │   unsigned char _2;
>9   │   uint8_t _3;
>   10   │   __complex__ unsigned char _5;
>   11   │
>   12   │ ;;   basic block 2, loop depth 0
>   13   │ ;;pred:   ENTRY
>   14   │   _5 = .ADD_OVERFLOW (x_4(D), 9);
>   15   │   _2 = IMAGPART_EXPR <_5>;
>   16   │   if (_2 != 0)
>   17   │ goto <bb 4>; [35.00%]
>   18   │   else
>   19   │ goto <bb 3>; [65.00%]
>   20   │ ;;succ:   3
>   21   │ ;;4
>   22   │
>   23   │ ;;   basic block 3, loop depth 0
>   24   │ ;;pred:   2
>   25   │   _1 = REALPART_EXPR <_5>;
>   26   │ ;;succ:   4
>   27   │
>   28   │ ;;   basic block 4, loop depth 0
>   29   │ ;;pred:   2
>   30   │ ;;3
>   31   │   # _3 = PHI <255(2), _1(3)>
>   32   │   return _3;
>   33   │ ;;succ:   EXIT
>   34   │
>   35   │ }
>
> > Can you instead inline match_control_flow_graph_case_0 and _1 and do the
> > argument assignment within the three cases of CFGs we accept?  That
> > would be much easier to follow.
>
> To double confirm, are you suggesting inlining the CFG match for both case_0
> and case_1? That may make the function body grow, and we may have more cases
> like case_2, case_3, etc.
> If so, I will inline this into match_cond_with_binary_phi in v4.

Yes, inline both CFG matches and unify them - there should be exactly
three cases at the moment.  And "duplicate" computing the true/false arg
into the respective cases since it's trivial which edge(s) to look at.

This should make the code more maintainable and easier to understand.
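
To make the suggestion concrete, here is a minimal standalone sketch of
what "three cases, each assigning its own true/false arg" could look like.
It assumes the three shapes are the diamond and the two triangles (true or
false edge going straight to the PHI block). Block, BinaryPhi, arg_from and
match_cond_with_binary_phi_sketch are hypothetical toy stand-ins, not GCC's
basic_block/gphi API and not the code from the patch.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for a basic block (hypothetical, not GCC's basic_block).
struct Block {
  int index;
  Block *true_dest = nullptr;    // set only on the condition block
  Block *false_dest = nullptr;   // set only on the condition block
  Block *single_succ = nullptr;  // set only on a pass-through block
  explicit Block(int i) : index(i) {}
};

// Toy stand-in for a two-argument PHI (hypothetical, not GCC's gphi).
struct BinaryPhi {
  Block *bb = nullptr;                 // block holding the PHI
  std::string arg[2];                  // printed form of each argument
  Block *src[2] = {nullptr, nullptr};  // source block of each incoming edge
};

// Index of the single PHI argument whose incoming edge starts at SRC, or -1.
static int arg_from(const BinaryPhi &phi, const Block *src) {
  if (phi.src[0] == src && phi.src[1] != src) return 0;
  if (phi.src[1] == src && phi.src[0] != src) return 1;
  return -1;
}

// Match COND against PHI and report which argument flows in when the
// condition is true and which when it is false.  Each CFG shape picks its
// own edges, so no special-casing of constant arguments is needed.
bool match_cond_with_binary_phi_sketch(const Block &cond, const BinaryPhi &phi,
                                       std::string &true_arg,
                                       std::string &false_arg) {
  Block *t = cond.true_dest;
  Block *f = cond.false_dest;
  if (!t || !f || t == f) return false;

  int ti = -1, fi = -1;
  if (t != phi.bb && f != phi.bb) {
    // Case 1, diamond: both arms pass through a middle block.
    if (t->single_succ != phi.bb || f->single_succ != phi.bb) return false;
    ti = arg_from(phi, t);
    fi = arg_from(phi, f);
  } else if (t == phi.bb) {
    // Case 2, triangle: the true edge goes straight to the PHI block, so
    // the true argument is the one whose edge source is the cond block.
    if (f->single_succ != phi.bb) return false;
    ti = arg_from(phi, &cond);
    fi = arg_from(phi, f);
  } else {
    // Case 3, triangle: the false edge goes straight to the PHI block.
    if (t->single_succ != phi.bb) return false;
    ti = arg_from(phi, t);
    fi = arg_from(phi, &cond);
  }
  if (ti < 0 || fi < 0 || ti == fi) return false;
  true_arg = phi.arg[ti];
  false_arg = phi.arg[fi];
  return true;
}

// The CFG from the dump in this thread: bb2 is the condition block, its
// false edge goes to bb3, bb3 falls through to bb4, and bb4 holds
// PHI <255(2), _1(3)>.  The true edge is assumed to go straight to bb4.
bool run_thread_example(std::string &true_arg, std::string &false_arg) {
  Block bb2(2), bb3(3), bb4(4);
  bb2.true_dest = &bb4;
  bb2.false_dest = &bb3;
  bb3.single_succ = &bb4;

  BinaryPhi phi;
  phi.bb = &bb4;
  phi.arg[0] = "255"; phi.src[0] = &bb2;
  phi.arg[1] = "_1";  phi.src[1] = &bb3;
  return match_cond_with_binary_phi_sketch(bb2, phi, true_arg, false_arg);
}
```

For the dump above this falls into the second case: 255(2) arrives on the
edge from the condition block and is the true argument, while _1(3) arrives
from block 3 and is the false argument. A real implementation would also
have to check predecessor counts of the middle blocks, which this sketch
leaves out.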

I'm not sure what additional cases you are thinking of; more complex CFGs
should always mean more than a single controlling condition, and I'm not
sure we want to go down the path of presenting those as cond1 | cond2.

Richard.

> Pan
>
> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, September 11, 2024 9:39 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
> kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
> Subject: Re: [PATCH v3 2/5] Match: Add interface match_cond_with_binary_phi 
> for true/false arg
>
> On Wed, Sep 11, 2024 at 8:31 AM  wrote:
> >
> > From: Pan Li 
> >
> > When matching the cond with a 2-arg phi node, we need to figure out
> > which arg of the phi node comes from the true edge of the cond block,
> > and which from the false edge.  This patch adds an interface to do
> > that and return the true and false args as trees.
> >
> > Some additional handling is needed if one of the args is an INTEGER_CST.
> > An INTEGER_CST arg may have no source block of its own, so its edge
> > source points to the condition block.  See line 31 of the example below:
> > the 255 INTEGER_CST has block 2 as source.  Thus, we need to find
> > the non-INTEGER_CST arg (aka _1) to tell which one is the true/false edge.
> > For example, _1(3) takes block 3 as source, which is the dest
> > of the false edge of the condition block.
> >
> >4   │ __attribute__((noinline))
> >5   │ uint8_t sat_u_add_imm_type_check_uint8_t_fmt_2 (uint8_t x)
> >6   │ {
> >7   │   unsigned char _1;
> >8   │   unsigned char _2;
> >9   │   uint8_t _3;
> >   10   │   __complex__ unsigned char _5;
> >   11   │
> >   12   │ ;;   basic block 2, loop depth 0
> >   13   │ ;;pred:   ENTRY
> >   14   │   _5 = .ADD_OVERFLOW (x_4(D), 9);
> >   15   │   _2 = IMAGPART_EXPR <_5>;
> >   16   │   if (_2 != 0)
> >   17   │ goto <bb 4>; [35.00%]
> >   18   │   else
> >   19   │ goto <bb 3>; [65.00%]
> >   20   │ ;;succ:   3
> >   21   │ ;;4
> >   22   │
> >   23   │ ;;   basic block 3, loop depth 0
> >   24   │ ;;pred:   2
> >   25   │   _1 = REALPART_EXPR <_5>;
> >   26   │ ;;succ:   4
> >   27   │
> >   28   │ ;;   basic block 4, loop depth 0
> >   29   │ ;;pred:   2
> >   30   │ ;;3
> >   31   │   # _3 = PHI <255(2), _1(3)>
> >   32   │   return _3;
> >   33   │ ;;succ:   EXIT
> >   34   │
> >   35   │ }
> >
> > The below test suites are passed for this patch.
> > * The rv64gcv fully regression test.
> > * The x86 bootstr

RE: [PATCH v3 2/5] Match: Add interface match_cond_with_binary_phi for true/false arg

2024-09-11 Thread Li, Pan2
Thanks Richard for comments.

> Yes, inline both CFG matches and unify them - there should be exactly
> three cases at the moment.  And "duplicate" computing the true/false arg
> into the respective cases since it's trivial which edge(s) to look at.

Got it, will resend the v4 series for this change.

Pan

-Original Message-
From: Richard Biener  
Sent: Thursday, September 12, 2024 2:51 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v3 2/5] Match: Add interface match_cond_with_binary_phi for 
true/false arg

On Thu, Sep 12, 2024 at 3:41 AM Li, Pan2  wrote:
>
> Thanks Richard for comments.
>
> > why would arg_edge depend on whether t0 is INTEGER_CST or not?
> Because the edge->src of INTEGER_CST points to the cond block, which cannot
> match the edge->dest of the cond_block. For example, as below, the first arg
> of PHI is 255(2), which matches neither goto <bb 3> nor goto <bb 4>.
>
> Thus, I need to take the second arg, aka _1(3), to match the edge->dest of
> cond_block, i.e. the phi arg edge->src == cond_block edge->dest. In the
> example below, the goto <bb 3> matches _1(3) with the false condition, and
> then I can locate the edge from b2 -> b3.
>
> Or is there any better approach for this scenario?
>
>4   │ __attribute__((noinline))
>5   │ uint8_t sat_u_add_imm_type_check_uint8_t_fmt_2 (uint8_t x)
>6   │ {
>7   │   unsigned char _1;
>8   │   unsigned char _2;
>9   │   uint8_t _3;
>   10   │   __complex__ unsigned char _5;
>   11   │
>   12   │ ;;   basic block 2, loop depth 0
>   13   │ ;;pred:   ENTRY
>   14   │   _5 = .ADD_OVERFLOW (x_4(D), 9);
>   15   │   _2 = IMAGPART_EXPR <_5>;
>   16   │   if (_2 != 0)
>   17   │ goto <bb 4>; [35.00%]
>   18   │   else
>   19   │ goto <bb 3>; [65.00%]
>   20   │ ;;succ:   3
>   21   │ ;;4
>   22   │
>   23   │ ;;   basic block 3, loop depth 0
>   24   │ ;;pred:   2
>   25   │   _1 = REALPART_EXPR <_5>;
>   26   │ ;;succ:   4
>   27   │
>   28   │ ;;   basic block 4, loop depth 0
>   29   │ ;;pred:   2
>   30   │ ;;3
>   31   │   # _3 = PHI <255(2), _1(3)>
>   32   │   return _3;
>   33   │ ;;succ:   EXIT
>   34   │
>   35   │ }
>
> > Can you instead inline match_control_flow_graph_case_0 and _1 and do the
> > argument assignment within the three cases of CFGs we accept?  That
> > would be much easier to follow.
>
> To double confirm, are you suggesting inlining the CFG match for both case_0
> and case_1? That may make the function body grow, and we may have more cases
> like case_2, case_3, etc.
> If so, I will inline this into match_cond_with_binary_phi in v4.

Yes, inline both CFG matches and unify them - there should be exactly
three cases at the moment.  And "duplicate" computing the true/false arg
into the respective cases since it's trivial which edge(s) to look at.

This should make the code more maintainable and easier to understand.

I'm not sure what additional cases you are thinking of; more complex CFGs
should always mean more than a single controlling condition, and I'm not
sure we want to go down the path of presenting those as cond1 | cond2.

Richard.

> Pan
>
> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, September 11, 2024 9:39 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
> kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
> Subject: Re: [PATCH v3 2/5] Match: Add interface match_cond_with_binary_phi 
> for true/false arg
>
> On Wed, Sep 11, 2024 at 8:31 AM  wrote:
> >
> > From: Pan Li 
> >
> > When matching the cond with a 2-arg phi node, we need to figure out
> > which arg of the phi node comes from the true edge of the cond block,
> > and which from the false edge.  This patch adds an interface to do
> > that and return the true and false args as trees.
> >
> > Some additional handling is needed if one of the args is an INTEGER_CST.
> > An INTEGER_CST arg may have no source block of its own, so its edge
> > source points to the condition block.  See line 31 of the example below:
> > the 255 INTEGER_CST has block 2 as source.  Thus, we need to find
> > the non-INTEGER_CST arg (aka _1) to tell which one is the true/false edge.
> > For example, _1(3) takes block 3 as source, which is the dest
> > of the false edge of the condition block.
> >
> >4   │ __attribute__((noinline))
> >5   │ uint8_t sat_u_add_imm_type_check_uint8_t_fmt_2 (uint8_t x)
> >6   │ {
> >7   │   unsigned char _1;
> >8   │   unsigned char _2;
> >9   │   uint8_t _3;
> >   10   │   __complex__ unsigned char _5;
> >   11   │
> >   12   │ ;;   basic block 2, loop depth 0
> >   13   │ ;;pred:   ENTRY
> >   14   │   _5 = .ADD_OVERFLOW (x_4(D), 9);
> >   15   │   _2 = IMAGPART_EXPR <_5>;
> >   16   │   if (_2 != 0)
> >   17   │ goto <bb 4>; [35.00%]
>