Re: [gcn] Work-around libgomp 'error: array subscript 0 is outside array bounds of ‘__lds struct gomp_thread * __lds[0]’ [-Werror=array-bounds]'

2021-07-20 Thread Thomas Schwinge
Hi!

On 2021-07-19T10:46:35+0200, I wrote:
> | On 7/16/21 11:42 AM, Thomas Schwinge wrote:
> |> On 2021-07-09T17:11:25-0600, Martin Sebor via Gcc-patches wrote:
> |>> The attached tweak avoids the new -Warray-bounds instances when
> |>> building libatomic for arm. Christophe confirms it resolves
> |>> the problem (thank you!)
> |>
> |> As Abid has just reported in
> |> , similar
> |> problem with GCN target libgomp build:
> |>
> |>  In function ‘gcn_thrs’,
> |>  inlined from ‘gomp_thread’ at [...]/source-gcc/libgomp/libgomp.h:803:10,
> |>  inlined from ‘GOMP_barrier’ at [...]/source-gcc/libgomp/barrier.c:34:29:
> |>  [...]/source-gcc/libgomp/libgomp.h:792:10: error: array subscript 0 is outside array bounds of ‘__lds struct gomp_thread * __lds[0]’ [-Werror=array-bounds]
> |>    792 |   return *thrs;
> |>        |          ^
> |>
> |>  gcc/config/gcn/gcn.h:  c_register_addr_space ("__lds", ADDR_SPACE_LDS);   \
> |>
> |>  libgomp/libgomp.h-static inline struct gomp_thread *gcn_thrs (void)
> |>  libgomp/libgomp.h-{
> |>  libgomp/libgomp.h-  /* The value is at the bottom of LDS.  */
> |>  libgomp/libgomp.h:  struct gomp_thread * __lds *thrs = (struct gomp_thread * __lds *)4;
> |>  libgomp/libgomp.h-  return *thrs;
> |>  libgomp/libgomp.h-}
> |>
> |> ..., plus a few more.  Work-around:
> |>
> |> struct gomp_thread * __lds *thrs = (struct gomp_thread * __lds *)4;
> |>  +# pragma GCC diagnostic push
> |>  +# pragma GCC diagnostic ignored "-Warray-bounds"
> |> return *thrs;
> |>  +# pragma GCC diagnostic pop
> |>
> |> ..., but it's a bit tedious to add that in all the other places,
> |> too.
>
> Wasn't so bad after all; a lot of duplicates due to 'libgomp.h'.  I've
> thus pushed "[gcn] Work-around libgomp 'error: array subscript 0 is
> outside array bounds of ‘__lds struct gomp_thread * __lds[0]’
> [-Werror=array-bounds]' [PR101484]" to master branch in commit
> 9f2bc5077debef2b046b6c10d38591ac324ad8b5, see attached.

As I have since found, these '#pragma GCC diagnostic [...]' directives cause
some code generation changes (which seems unexpected and problematic!).
(Martin, any idea?  Might be a pre-existing problem, of course.)  This
results in a lot (tens of thousands) of 'GCN team arena exhausted' run-time
diagnostics, also leading to a few FAILs:

PASS: libgomp.c/../libgomp.c-c++-common/for-11.c (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/for-11.c execution test

PASS: libgomp.c/../libgomp.c-c++-common/for-12.c (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/for-12.c execution test

PASS: libgomp.c/../libgomp.c-c++-common/for-3.c (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/for-3.c execution test

PASS: libgomp.c/../libgomp.c-c++-common/for-5.c (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/for-5.c execution test

PASS: libgomp.c/../libgomp.c-c++-common/for-6.c (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/for-6.c execution test

PASS: libgomp.c/../libgomp.c-c++-common/for-9.c (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/for-9.c execution test

Same for 'libgomp.c++'.

It remains to be analyzed how '#pragma GCC diagnostic [...]' directives
can cause code generation changes; for now I'm working around the
"unexpected" '-Werror=array-bounds' diagnostics differently:

> |> (So I'll consider some GCN-specific '-Wno-array-bounds' if we don't
> |> get to resolve this otherwise, soon.)

'-Wno-error=array-bounds', precisely.  I've now pushed "[gcn]
Work-around libgomp 'error: array subscript 0 is outside array bounds of
‘__lds struct gomp_thread * __lds[0]’ [-Werror=array-bounds]' some more
[PR101484]" to master branch in commit
8168338684fc2bed576bb09202c63b3e9e678d92, see attached.


Grüße
 Thomas


From 8168338684fc2bed576bb09202c63b3e9e678d92 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 19 Jul 2021 23:11:38 +0200
Subject: [PATCH] [gcn] Work-around libgomp 'error: array subscript 0 is
 outside array bounds of ‘__lds struct gomp_thread * __lds[0]’
 [-Werror=array-bounds]' some more [PR101484]
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

With yesterday's commit 9f2bc5077debef2b046b6c10d38591ac324ad8b5 "[gcn]
Work-around libgomp 'error: array subscript 0 is 

Re: [PATCH 2/2][RFC] Add loop masking support for x86

2021-07-20 Thread Richard Biener
On Tue, 20 Jul 2021, Hongtao Liu wrote:

> On Fri, Jul 16, 2021 at 5:11 PM Richard Biener  wrote:
> >
> > On Thu, 15 Jul 2021, Richard Biener wrote:
> >
> > > On Thu, 15 Jul 2021, Richard Biener wrote:
> > >
> > > > OK, guess I was more looking at
> > > >
> > > > #define N 32
> > > > int foo (unsigned long *a, unsigned long * __restrict b,
> > > >  unsigned int *c, unsigned int * __restrict d,
> > > >  int n)
> > > > {
> > > >   unsigned sum = 1;
> > > >   for (int i = 0; i < n; ++i)
> > > > {
> > > >   b[i] += a[i];
> > > >   d[i] += c[i];
> > > > }
> > > >   return sum;
> > > > }
> > > >
> > > > where we on x86 AVX512 vectorize with V8DI and V16SI and we
> > > > generate two masks for the two copies of V8DI (VF is 16) and one
> > > > mask for V16SI.  With SVE I see
> > > >
> > > > punpklo p1.h, p0.b
> > > > punpkhi p2.h, p0.b
> > > >
> > > > that's something I expected to see for AVX512 as well, using the V16SI
> > > > mask and unpacking that to two V8DI ones.  But I see
> > > >
> > > > vpbroadcastd    %eax, %ymm0
> > > > vpaddd  %ymm12, %ymm0, %ymm0
> > > > vpcmpud $6, %ymm0, %ymm11, %k3
> > > > vpbroadcastd    %eax, %xmm0
> > > > vpaddd  %xmm10, %xmm0, %xmm0
> > > > vpcmpud $1, %xmm7, %xmm0, %k1
> > > > vpcmpud $6, %xmm0, %xmm8, %k2
> > > > kortestb        %k1, %k1
> > > > jne .L3
> > > >
> > > > so three %k masks generated by vpcmpud.  I'll have to look what's
> > > > the magic for SVE and why that doesn't trigger for x86 here.
> > >
> > > So to answer myself, vect_maybe_permute_loop_masks looks for
> > > vec_unpacku_hi/lo_optab, but with AVX512 the vector bools have
> > > QImode so that doesn't play well here.  Not sure if there
> > > are proper mask instructions to use (I guess there's a shift
> > > and lopart is free).  This is QI:8 to two QI:4 (bits) mask
> Yes, for 16 bits and more we have KUNPCKBW/D/Q, but for 8-bit
> unpack_lo/hi there is only shift.
> > > conversion.  Not sure how to better ask the target here - again
> > > VnBImode might have been easier here.
> >
> > So I've managed to "emulate" the unpack_lo/hi for the case of
> > !VECTOR_MODE_P masks by using sub-vector select (we're asking
> > to turn vector(8)  into two
> > vector(4) ) via BIT_FIELD_REF.  That then
> > produces the desired single mask producer and
> >
> >   loop_mask_38 = VIEW_CONVERT_EXPR > >(loop_mask_54);
> >   loop_mask_37 = BIT_FIELD_REF ;
> >
> > note for the lowpart we can just view-convert away the excess bits,
> > fully re-using the mask.  We generate surprisingly "good" code:
> >
> > kmovb   %k1, %edi
> > shrb    $4, %dil
> > kmovb   %edi, %k2
> >
> > besides the lack of using kshiftrb.  I guess we're just lacking
> > a mask register alternative for
> Yes, we can do it similarly to kor/kand/kxor.
> >
> > (insn 22 20 25 4 (parallel [
> > (set (reg:QI 94 [ loop_mask_37 ])
> > (lshiftrt:QI (reg:QI 98 [ loop_mask_54 ])
> > (const_int 4 [0x4])))
> > (clobber (reg:CC 17 flags))
> > ]) 724 {*lshrqi3_1}
> >  (expr_list:REG_UNUSED (reg:CC 17 flags)
> > (nil)))
> >
> > and so we reload.  For the above cited loop the AVX512 vectorization
> > with --param vect-partial-vector-usage=1 does look quite sensible
> > to me.  Instead of an SSE vectorized epilogue plus a scalar
> > epilogue we get a single fully masked AVX512 "iteration" for both.
> > I suppose it's still mostly a code-size optimization (384 bytes
> > with the masked epilogue vs. 474 bytes with trunk) since it will
> > likely be slower for very low iteration counts but it's good
> > for icache usage then and good for less branch predictor usage.
> >
> > That said, I have to set up SPEC on an AVX512 machine to do
> Has the patch landed in trunk already?  I can test it on CLX.

I'm still experimenting a bit right now but hope to get something
trunk-ready at the end of this week or the beginning of next week.  Since
it's disabled by default we can work on improving it during stage1 then.

I'm mostly struggling with the GIMPLE IL to be used for the
mask unpacking since we currently reject both the BIT_FIELD_REF
and the VIEW_CONVERT we generate (why do AVX512 masks not all have
SImode but sometimes QImode and sometimes HImode ...).  Unfortunately
we've dropped whole-vector shifts in favor of VEC_PERM but that
doesn't work well either for integer mode vectors.  So I'm still
playing with my options here and looking for something that doesn't
require too much surgery on the RTL side to recover good mask
register code ...

Another part missing is expanders for the various cond_* patterns

OPTAB_D (cond_add_optab, "cond_add$a")
OPTAB_D (cond_sub_optab, "cond_sub$a")
OPTAB_D (cond_smul_optab, "cond_mul$a")
OPTAB_D (cond_sdiv_optab, "cond_div$a")
OPTAB_D (cond_smod_optab, "cond_mod$a")
OPTAB_D (cond_udiv_optab, "cond_udiv$a")
OPTAB_D (cond_umod_optab, "cond_umod$a")
OPTAB_D (cond_and_opta

Re: [committed] RISC-V: Detect python and pick best one for calling multilib-generator

2021-07-20 Thread Andreas Schwab
On Jul 20 2021, Kito Cheng wrote:

> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 93e2b3219b9..3df9b52cf25 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -4730,9 +4730,10 @@ case "${target}" in
>   echo "--with-multilib-list= can't used with 
> --with-multilib-generator= at same time" 1>&2
>   exit 1
>   fi
> + PYTHON=`which python || which python3 || which python2`

'which' is a non-standard utility.  Additionally, you will get extra
output on stderr when one of the commands is not found.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [committed] add test for PR 86650

2021-07-20 Thread Bin.Cheng via Gcc-patches
On Wed, Jul 7, 2021 at 5:39 AM Martin Sebor via Gcc-patches
 wrote:
>
> The recent patch series to improve warning suppression for inlined
> functions [PR98512] also implicitly includes the inlining context
> in all warning messages for inlined code.  In r12-2091 I have
> committed the attached test to verify that -Warray-bounds too
> includes this context (its absence is the subject of PR 86650).
Hi,
It seems this patch exposes/causes an uninitialized warning in arm_neon.h
like the following one:

__extension__ extern __inline void
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vst2q_s32 (int32_t * __a, int32x4x2_t __val)
{
  __builtin_aarch64_simd_oi __o;
  __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __val.val[0], 0);
  __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __val.val[1], 1);
  __builtin_aarch64_st2v4si ((__builtin_aarch64_simd_si *) __a, __o);
}

Thanks,
bin


RE: [committed] add test for PR 86650

2021-07-20 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Bin.Cheng 
> Sent: 20 July 2021 09:26
> To: Martin Sebor ; Kyrylo Tkachov
> 
> Cc: gcc-patches 
> Subject: Re: [committed] add test for PR 86650
> 
> On Wed, Jul 7, 2021 at 5:39 AM Martin Sebor via Gcc-patches
>  wrote:
> >
> > The recent patch series to improve warning suppression for inlined
> > functions [PR98512] also implicitly includes the inlining context
> > in all warning messages for inlined code.  In r12-2091 I have
> > committed the attached test to verify that -Warray-bounds too
> > includes this context (its absence is the subject of PR 86650).
> Hi,
> It seems this patch exposes/causes an uninitialized warning in arm_neon.h
> like the following one:
> 
> __extension__ extern __inline void
> __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> vst2q_s32 (int32_t * __a, int32x4x2_t __val)
> {
>   __builtin_aarch64_simd_oi __o;
>   __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __val.val[0], 0);
>   __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __val.val[1], 1);
>   __builtin_aarch64_st2v4si ((__builtin_aarch64_simd_si *) __a, __o);
> }

I believe Jonathan is working in this area to rework these intrinsics.
Can you file a bug report to track this, please?
Thanks,
Kyrill

> 
> Thanks,
> bin


'#pragma GCC diagnostic' (mis-)use in 'statement' of 'if' (was: [gcn] Work-around libgomp 'error: array subscript 0 is outside array bounds of ‘__lds struct gomp_thread * __lds[0]’ [-Werror=array-boun

2021-07-20 Thread Thomas Schwinge
Hi!

On 2021-07-20T09:23:24+0200, I wrote:
> On 2021-07-19T10:46:35+0200, I wrote:
>> | On 7/16/21 11:42 AM, Thomas Schwinge wrote:
>> |> On 2021-07-09T17:11:25-0600, Martin Sebor via Gcc-patches wrote:
>> |>> The attached tweak avoids the new -Warray-bounds instances when
>> |>> building libatomic for arm. Christophe confirms it resolves
>> |>> the problem (thank you!)
>> |>
>> |> As Abid has just reported in
>> |> , similar
>> |> problem with GCN target libgomp build:
>> |>
>> |>  In function ‘gcn_thrs’,
>> |>  inlined from ‘gomp_thread’ at [...]/source-gcc/libgomp/libgomp.h:803:10,
>> |>  inlined from ‘GOMP_barrier’ at [...]/source-gcc/libgomp/barrier.c:34:29:
>> |>  [...]/source-gcc/libgomp/libgomp.h:792:10: error: array subscript 0 is outside array bounds of ‘__lds struct gomp_thread * __lds[0]’ [-Werror=array-bounds]
>> |>    792 |   return *thrs;
>> |>        |          ^
>> |>
>> |>  gcc/config/gcn/gcn.h:  c_register_addr_space ("__lds", ADDR_SPACE_LDS);   \
>> |>
>> |>  libgomp/libgomp.h-static inline struct gomp_thread *gcn_thrs (void)
>> |>  libgomp/libgomp.h-{
>> |>  libgomp/libgomp.h-  /* The value is at the bottom of LDS.  */
>> |>  libgomp/libgomp.h:  struct gomp_thread * __lds *thrs = (struct gomp_thread * __lds *)4;
>> |>  libgomp/libgomp.h-  return *thrs;
>> |>  libgomp/libgomp.h-}
>> |>
>> |> ..., plus a few more.  Work-around:
>> |>
>> |> struct gomp_thread * __lds *thrs = (struct gomp_thread * __lds *)4;
>> |>  +# pragma GCC diagnostic push
>> |>  +# pragma GCC diagnostic ignored "-Warray-bounds"
>> |> return *thrs;
>> |>  +# pragma GCC diagnostic pop
>> |>
>> |> ..., but it's a bit tedious to add that in all the other places,
>> |> too.
>>
>> Wasn't so bad after all; a lot of duplicates due to 'libgomp.h'.  I've
>> thus pushed "[gcn] Work-around libgomp 'error: array subscript 0 is
>> outside array bounds of ‘__lds struct gomp_thread * __lds[0]’
>> [-Werror=array-bounds]' [PR101484]" to master branch in commit
>> 9f2bc5077debef2b046b6c10d38591ac324ad8b5, see attached.
>
> As I have since found, these '#pragma GCC diagnostic [...]' directives cause
> some code generation changes (which seems unexpected and problematic!).
> (Martin, any idea?  Might be a pre-existing problem, of course.)

OK, phew.  Martin: your diagnostic changes are *not* to be blamed for
code generation changes -- it's my '#pragma GCC diagnostic pop'
placement that triggers:

> This
> results in a lot (tens of thousands) of 'GCN team arena exhausted' run-time
> diagnostics, also leading to a few FAILs:
>
> PASS: libgomp.c/../libgomp.c-c++-common/for-11.c (test for excess errors)
> [-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/for-11.c execution test
>
> PASS: libgomp.c/../libgomp.c-c++-common/for-12.c (test for excess errors)
> [-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/for-12.c execution test
>
> PASS: libgomp.c/../libgomp.c-c++-common/for-3.c (test for excess errors)
> [-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/for-3.c execution test
>
> PASS: libgomp.c/../libgomp.c-c++-common/for-5.c (test for excess errors)
> [-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/for-5.c execution test
>
> PASS: libgomp.c/../libgomp.c-c++-common/for-6.c (test for excess errors)
> [-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/for-6.c execution test
>
> PASS: libgomp.c/../libgomp.c-c++-common/for-9.c (test for excess errors)
> [-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/for-9.c execution test
>
> Same for 'libgomp.c++'.
>
> It remains to be analyzed how '#pragma GCC diagnostic [...]' directives
> can cause code generation changes; for now I'm working around the
> "unexpected" '-Werror=array-bounds' diagnostics differently:

In addition to a few in straight-line code, I also had these two:

> --- a/libgomp/libgomp.h
> +++ b/libgomp/libgomp.h
> @@ -128,7 +128,10 @@ team_malloc (size_t size)
> : "=v"(result) : "v"(TEAM_ARENA_FREE), "v"(size), "e"(1L) : "memory");
>
>/* Handle OOM.  */
> +# pragma GCC diagnostic push
> +# pragma GCC diagnostic ignored "-Warray-bounds" /*TODO PR101484 */
>if (result + size > *(void * __lds *)TEAM_ARENA_END)
> +# pragma GCC diagnostic pop
>  {
>/* While this is experimental, let's make sure we know when OOM
>happens.  */
> @@ -162,8 +159,11 @@ team_free (void *ptr)
>   However, if we fell back to using heap then we should free it.
>   It would be better if this function could be a no-op, but at least
>   LDS loads are cheap.  */
> +# pragma GCC diagnostic push
> +# pragma GCC diagnostic ignored "-Warray-bounds" /*TODO PR101484 */
>if (ptr < *(void * __lds *)TEAM_ARENA_START
>|| ptr >= *(void * __lds *)TEAM_ARENA_END)
> +# pragma GCC diagnostic pop
> 
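Going by the subject line, the trouble is that the 'pop' directive sits exactly where the 'if' expects its statement.  A minimal sketch of one way to keep the pragmas out of that position: evaluate the condition inside the push/pop region first, and branch on the stored result afterwards.  The names 'arena_end' and 'would_overflow_arena' are hypothetical stand-ins for illustration, not the libgomp code:

```cpp
#include <cstddef>

// Hypothetical stand-in for the arena limit that libgomp reads from LDS.
static std::size_t arena_end = 100;

// Evaluate the condition while the warning is suppressed, then restore the
// diagnostic state before any control-flow statement appears.  Every pragma
// sits between two complete statements, so none of them can end up in the
// statement position of an 'if'.
static bool
would_overflow_arena (std::size_t result, std::size_t size)
{
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Warray-bounds"
  bool oom = result + size > arena_end;
#pragma GCC diagnostic pop
  return oom;
}
```

A caller would then write 'if (would_overflow_arena (result, size)) { ... }' with no directive between the condition and the braces.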

RE: [RFC] ipa: Adjust references to identify read-only globals

2021-07-20 Thread JiangNing OS via Gcc-patches
> -Original Message-
> From: Gcc-patches On Behalf Of
> Martin Jambor
> Sent: Wednesday, June 30, 2021 4:19 AM
> To: GCC Patches 
> Cc: Jan Hubicka 
> Subject: [RFC] ipa: Adjust references to identify read-only globals
> 
> Hi,
> 
> this patch has been motivated by SPEC 2017's 544.nab_r in which there is a
> static variable which is never written to and so zero throughout the run-time
> of the benchmark.  However, it is passed by reference to a function in which
> it is read and (after some multiplications) passed into __builtin_exp which in
> turn unnecessarily consumes almost 10% of the total benchmark run-time.

I do see ~8.5% runtime reduction on aarch64.

> The situation is illustrated by the added testcase remref-3.c.
> 
> The patch adds a flag to ipa-prop descriptor of each parameter to mark such
> parameters.  IPA-CP and inlining then take the effort to remove IPA_REF_ADDR
> references in the caller and only add IPA_REF_LOAD reference to the
> clone/overall inlined function.  This is sufficient for subsequent symbol table
> analysis code to identify the read-only variable as such and optimize the code.
> 
> I plan to compile a number of packages with the patch to test it some more
> and get a bit better idea of its impact.  But it has passed bootstrap,
> LTObootstrap and testing on x86_64-linux and i686-linux and so unless I find
> any problem, I would like to commit it at some point next month without any
> major changes, so I'd be grateful for any feedback even now.

I see 3 SPEC2017 benchmarks fail to compile on aarch64: 521.wrf_r,
527.cam4_r and 554.roms_r.  For example:

pre_step3d.fppized.f90:1260:35: internal compiler error: Segmentation fault
 1260 |   CALL wclock_on (ng, iNLM, 22)
  |   ^
0x1645c6b internal_error(char const*, ...)
???:0
0xe1f4f4 place_block_symbol(rtx_def*)
???:0
0x84ab33 use_anchored_address(rtx_def*)
???:0
0x868203 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
???:0
0x868793 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
???:0
0x75b593 expand_call(tree_node*, rtx_def*, int)
???:0
0x86a09f expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
???:0
Please submit a full bug report

Thanks,
-Jiangning


Re: [RFC/PATCH] Use range-based for loops for traversing loops

2021-07-20 Thread Kewen.Lin via Gcc-patches
on 2021/7/19 2:26 PM, Andrew Pinski wrote:
> On Sun, Jul 18, 2021 at 11:21 PM Kewen.Lin via Gcc-patches
>  wrote:
>>
>> Hi,
>>
>> This patch follows Martin's suggestion here[1], to support
>> range-based for loops for traversing loops, analogously to
>> the patch for vec[2].
>>
>> Bootstrapped and regtested on powerpc64le-linux-gnu P9,
>> x86_64-redhat-linux and aarch64-linux-gnu, also
>> bootstrapped on ppc64le P9 with bootstrap-O3 config.
>>
>> Any comments are appreciated.
> 
> +1 from me (note I did not review the patch but I like the idea).
> 

Thanks Andrew!  It's actually Martin's idea.  :)

BR,
Kewen

> Thanks,
> Andrew
> 
>>
>> BR,
>> Kewen
>>
>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573424.html
>> [2] https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572315.html
>> -
>> gcc/ChangeLog:
>>
>> * cfgloop.h (class loop_iterator): Rename to ...
>> (class loops_list): ... this.
>> (loop_iterator::next): Rename to ...
>> (loops_list::iterator::fill_curr_loop): ... this and adjust.
>> (loop_iterator::loop_iterator): Rename to ...
>> (loops_list::loops_list): ... this and adjust.
>> (FOR_EACH_LOOP): Rename to ...
>> (ALL_LOOPS): ... this.
>> (FOR_EACH_LOOP_FN): Rename to ...
>> (ALL_LOOPS_FN): this.
>> (loops_list::iterator): New class.
>> (loops_list::begin): New function.
>> (loops_list::end): Likewise.
>> * cfgloop.c (flow_loops_dump): Adjust FOR_EACH_LOOP* with ALL_LOOPS*.
>> (sort_sibling_loops): Likewise.
>> (disambiguate_loops_with_multiple_latches): Likewise.
>> (verify_loop_structure): Likewise.
>> * cfgloopmanip.c (create_preheaders): Likewise.
>> (force_single_succ_latches): Likewise.
>> * config/aarch64/falkor-tag-collision-avoidance.c
>> (execute_tag_collision_avoidance): Likewise.
>> * config/mn10300/mn10300.c (mn10300_scan_for_setlb_lcc): Likewise.
>> * config/s390/s390.c (s390_adjust_loops): Likewise.
>> * doc/loop.texi: Likewise.
>> * gimple-loop-interchange.cc (pass_linterchange::execute): Likewise.
>> * gimple-loop-jam.c (tree_loop_unroll_and_jam): Likewise.
>> * gimple-loop-versioning.cc (loop_versioning::analyze_blocks): 
>> Likewise.
>> (loop_versioning::make_versioning_decisions): Likewise.
>> * gimple-ssa-split-paths.c (split_paths): Likewise.
>> * graphite-isl-ast-to-gimple.c (graphite_regenerate_ast_isl): 
>> Likewise.
>> * graphite.c (canonicalize_loop_form): Likewise.
>> (graphite_transform_loops): Likewise.
>> * ipa-fnsummary.c (analyze_function_body): Likewise.
>> * ipa-pure-const.c (analyze_function): Likewise.
>> * loop-doloop.c (doloop_optimize_loops): Likewise.
>> * loop-init.c (loop_optimizer_finalize): Likewise.
>> (fix_loop_structure): Likewise.
>> * loop-invariant.c (calculate_loop_reg_pressure): Likewise.
>> (move_loop_invariants): Likewise.
>> * loop-unroll.c (decide_unrolling): Likewise.
>> (unroll_loops): Likewise.
>> * modulo-sched.c (sms_schedule): Likewise.
>> * predict.c (predict_loops): Likewise.
>> (pass_profile::execute): Likewise.
>> * profile.c (branch_prob): Likewise.
>> * sel-sched-ir.c (sel_finish_pipelining): Likewise.
>> (sel_find_rgns): Likewise.
>> * tree-cfg.c (replace_loop_annotate): Likewise.
>> (replace_uses_by): Likewise.
>> (move_sese_region_to_fn): Likewise.
>> * tree-if-conv.c (pass_if_conversion::execute): Likewise.
>> * tree-loop-distribution.c (loop_distribution::execute): Likewise.
>> * tree-parloops.c (parallelize_loops): Likewise.
>> * tree-predcom.c (tree_predictive_commoning): Likewise.
>> * tree-scalar-evolution.c (scev_initialize): Likewise.
>> (scev_reset): Likewise.
>> * tree-ssa-dce.c (find_obviously_necessary_stmts): Likewise.
>> * tree-ssa-live.c (remove_unused_locals): Likewise.
>> * tree-ssa-loop-ch.c (ch_base::copy_headers): Likewise.
>> * tree-ssa-loop-im.c (analyze_memory_references): Likewise.
>> (tree_ssa_lim_initialize): Likewise.
>> * tree-ssa-loop-ivcanon.c (canonicalize_induction_variables): 
>> Likewise.
>> * tree-ssa-loop-ivopts.c (tree_ssa_iv_optimize): Likewise.
>> * tree-ssa-loop-manip.c (get_loops_exits): Likewise.
>> * tree-ssa-loop-niter.c (estimate_numbers_of_iterations): Likewise.
>> (free_numbers_of_iterations_estimates): Likewise.
>> * tree-ssa-loop-prefetch.c (tree_ssa_prefetch_arrays): Likewise.
>> * tree-ssa-loop-split.c (tree_ssa_split_loops): Likewise.
>> * tree-ssa-loop-unswitch.c (tree_ssa_unswitch_loops): Likewise.
>> * tree-ssa-loop.c (gate_oacc_kernels): Likewise.
>> (pass_scev_cprop::execute): Likewise.
>>

Re: [RFC/PATCH] Use range-based for loops for traversing loops

2021-07-20 Thread Kewen.Lin via Gcc-patches
on 2021/7/19 10:08 PM, Jonathan Wakely wrote:
> On Mon, 19 Jul 2021 at 07:20, Kewen.Lin  wrote:
>>
>> Hi,
>>
>> This patch follows Martin's suggestion here[1], to support
>> range-based for loops for traversing loops, analogously to
>> the patch for vec[2].
>>
>> Bootstrapped and regtested on powerpc64le-linux-gnu P9,
>> x86_64-redhat-linux and aarch64-linux-gnu, also
>> bootstrapped on ppc64le P9 with bootstrap-O3 config.
>>
>> Any comments are appreciated.
> 
> In the loops_list::iterator type, this looks a little strange:
> 
> +bool
> +operator!= (const iterator &rhs) const
> +{
> +  return this->curr_idx < rhs.curr_idx;
> +}
> +
> 
> This works fine when the iterator type is used implicitly in a
> range-based for loop, but it wouldn't work for explicit uses of the
> iterator type where somebody does the != comparison with the
> past-the-end iterator on the LHS:
> 
> auto&& list ALL_LOOPS(foo);
> auto end = list.end();
> auto begin = list.begin();
> while (--end != begin)
> 

Thanks for the comments, Jonathan.  Yes, using "!=" is better, both
for clarity and for later extension.  I was assuming that the index
can only increase (only operator++() is supported), so I simply
used "<".  Will fix it in V2.
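
To make the difference concrete, here is a minimal sketch of an index-based iterator whose operator!= is a real inequality test.  The 'index_list' class and the two helper functions are hypothetical stand-ins for loops_list, and operator--() is added only so that the explicit backward walk has something to call; with a "<" comparison that backward loop would never execute, because end() is never "less than" begin():

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for loops_list: a list of ints walked by index.
class index_list
{
  std::vector<int> items;

public:
  explicit index_list (std::vector<int> v) : items (std::move (v)) {}

  class iterator
  {
  public:
    iterator (const index_list &l, std::size_t i) : list (l), curr_idx (i) {}

    int operator* () const { return list.items[curr_idx]; }

    iterator &operator++ () { ++curr_idx; return *this; }
    iterator &operator-- () { --curr_idx; return *this; }

    // A true inequality test: symmetric, so it gives the same answer
    // whichever iterator is on the left-hand side.
    bool operator!= (const iterator &rhs) const
    {
      return curr_idx != rhs.curr_idx;
    }

  private:
    const index_list &list;
    std::size_t curr_idx;
  };

  iterator begin () const { return iterator (*this, 0); }
  iterator end () const { return iterator (*this, items.size ()); }
};

// Implicit use: the range-based for only ever evaluates 'it != end'.
int
sum_forward (const index_list &l)
{
  int sum = 0;
  for (int v : l)
    sum += v;
  return sum;
}

// Explicit use with the past-the-end iterator on the left-hand side.
int
sum_backward (const index_list &l)
{
  int sum = 0;
  index_list::iterator it = l.end ();
  index_list::iterator first = l.begin ();
  while (it != first)
    {
      --it;
      sum += *it;
    }
  return sum;
}
```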

BR,
Kewen


Re: [RFC/PATCH] Use range-based for loops for traversing loops

2021-07-20 Thread Kewen.Lin via Gcc-patches
on 2021/7/19 10:34 PM, Richard Biener wrote:
> On Mon, Jul 19, 2021 at 8:20 AM Kewen.Lin  wrote:
>>
>> Hi,
>>
>> This patch follows Martin's suggestion here[1], to support
>> range-based for loops for traversing loops, analogously to
>> the patch for vec[2].
>>
>> Bootstrapped and regtested on powerpc64le-linux-gnu P9,
>> x86_64-redhat-linux and aarch64-linux-gnu, also
>> bootstrapped on ppc64le P9 with bootstrap-O3 config.
>>
>> Any comments are appreciated.
> 
> Since you are touching all FOR_EACH_LOOP please
> make implicit 'cfun' uses explicit.  I'm not sure ALL_LOOPS
> should scream, I think all_loops (function *, flags) would be
> nicer.
> 
> Note I'm anticipating iteration over a subset of the loop tree
> which would ask for specifying the 'root' of the loop tree to
> iterate over so it could be
> 
>   loops_list (class loop *root, unsigned flags)
> 
> and the "all" cases use loops_list (loops_for_fn (cfun), flags) then.
> Providing an overload with struct function is of course OK.
> 

Thanks for the comments, Richi.  Will address them in V2.
I noticed the current loop_iterator requires a struct loops*
for LI_ONLY_INNERMOST.  If you don't mind, I will use

  loops_list (class loops *loops, unsigned flags)

instead to make LI_ONLY_INNERMOST happy.  The root you mentioned can
be just the tree_root of the input loops.

BR,
Kewen




Re: [RFC/PATCH] Use range-based for loops for traversing loops

2021-07-20 Thread Kewen.Lin via Gcc-patches
on 2021/7/19 11:59 PM, Martin Sebor wrote:
> On 7/19/21 12:20 AM, Kewen.Lin wrote:
>> Hi,
>>
>> This patch follows Martin's suggestion here[1], to support
>> range-based for loops for traversing loops, analogously to
>> the patch for vec[2].
>>
>> Bootstrapped and regtested on powerpc64le-linux-gnu P9,
>> x86_64-redhat-linux and aarch64-linux-gnu, also
>> bootstrapped on ppc64le P9 with bootstrap-O3 config.
>>
>> Any comments are appreciated.
> 
> Thanks for this nice cleanup!  Just a few suggestions:
> 
> I would recommend against introducing new macros unless they
> offer a significant advantage over alternatives (for the two
> macros the patch adds I don't think they do).
> 
> If improving const-correctness is one of our goals,
> the loops_list iterator type would need a corresponding
> const_iterator type, and const overloads of the begin()
> and end() member functions.
> 
> Rather than introducing more instances of the loop_p typedef
> I'd suggest to use loop *.  It has at least two advantages:
> it's clearer (it's obvious it refers to a pointer), and lends
> itself more readily to making code const-correct by declaring
> the control variable const: for (const class loop *loop: ...)
> while avoiding the mistake of using const loop_p loop to
> declare a pointer to a const loop.
> 

Thanks for the suggestions, Martin!  Will update them in V2.

With some experiments, I noticed that even with a const_iterator overload
provided, like:

   iterator
   begin ()
   {
 return iterator (*this, 0);
   }

+  const_iterator
+  begin () const
+  {
+return const_iterator (*this, 0);
+  }

for (const class loop *loop: ...) will still use iterator instead
of const_iterator pair.  We have to make the code look like:

  const auto& const_loops = loops_list (...);
  for (const class loop *loop: const_loops)

or
  template <typename T> constexpr const T &as_const(T &t) noexcept { return t; }
  for (const class loop *loop: as_const(loops_list...)) 

Does it look good to add below as_const along with loops_list in cfgloop.h?

+/* Provide the functionality of std::as_const to support range-based for
+   to use const iterator.  (We can't use std::as_const itself because it's
+   a C++17 feature.)  */
+template <typename T>
+constexpr const T &
+as_const (T &t) noexcept
+{
+  return t;
+}
+
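
The overload selection described above can be checked with a small sketch.  'probe_list' and the two walk functions are hypothetical, and the counters exist only to record which begin() overload the range-based for picked.  One caveat worth noting: because this as_const takes 'T &', it binds to a named object as below, but not to a temporary such as a freshly constructed list:

```cpp
#include <utility>
#include <vector>

namespace demo
{

// Same shape as the helper proposed above.
template <typename T>
constexpr const T &
as_const (T &t) noexcept
{
  return t;
}

// Hypothetical container that counts which begin() overload gets called.
struct probe_list
{
  std::vector<int> items;
  int mutable_begin_calls;
  mutable int const_begin_calls;

  explicit probe_list (std::vector<int> v)
    : items (std::move (v)), mutable_begin_calls (0), const_begin_calls (0)
  {}

  std::vector<int>::iterator begin ()
  {
    ++mutable_begin_calls;
    return items.begin ();
  }
  std::vector<int>::iterator end () { return items.end (); }

  std::vector<int>::const_iterator begin () const
  {
    ++const_begin_calls;
    return items.begin ();
  }
  std::vector<int>::const_iterator end () const { return items.end (); }
};

// The range expression is a non-const lvalue: the non-const overloads win,
// regardless of how the loop variable is declared.
int
walk_plain (probe_list &l)
{
  int sum = 0;
  for (const int v : l)
    sum += v;
  return sum;
}

// Wrapping the range expression in as_const selects the const overloads.
int
walk_const (probe_list &l)
{
  int sum = 0;
  for (const int v : as_const (l))
    sum += v;
  return sum;
}

} // namespace demo
```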

BR,
Kewen


[PATCH] debug/101473 - apply debug prefix maps before checksumming DIEs

2021-07-20 Thread Richard Biener
The following makes sure to apply the debug prefix maps to filenames
before checksumming DIEs to create the global symbol for the CU DIE
used by LTO to link the late debug to the early debug.  This avoids
binary differences (in said symbol) when compiling with toolchains
installed under different paths, where the difference is compensated
for with appropriate -fdebug-prefix-map options.

The easiest and most scalable way is to record both the unmapped
and the remapped filename in the dwarf_file_data so the remapping
process takes place at a single point and only once (otherwise it
creates GC garbage at each point doing that).
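
As a rough illustration of that design (not the dwarf2out.c code), the sketch below keeps the unmapped name as the lookup key and stores the remapped name next to it, so the remapping runs once per distinct file.  'file_table', 'remap_prefix' and 'remap_count' are hypothetical names, and the prefix replacement is a simplified stand-in for remap_debug_filename that handles a single old/new pair:

```cpp
#include <string>
#include <unordered_map>
#include <utility>

// Mirrors the idea of dwarf_file_data with the added key member.
struct file_data
{
  std::string key;       // name as seen by the compiler, used for lookup
  std::string filename;  // name after prefix remapping, used for output
  int emitted_number;
};

// Simplified stand-in for the real remapping: swap one leading prefix.
std::string
remap_prefix (const std::string &name, const std::string &old_prefix,
	      const std::string &new_prefix)
{
  if (name.compare (0, old_prefix.size (), old_prefix) == 0)
    return new_prefix + name.substr (old_prefix.size ());
  return name;
}

class file_table
{
public:
  file_table (std::string old_prefix, std::string new_prefix)
    : old_prefix (std::move (old_prefix)),
      new_prefix (std::move (new_prefix)),
      remaps (0)
  {}

  // Look up by the unmapped name; remap only when creating a new entry.
  const file_data &
  lookup (const std::string &name)
  {
    auto it = table.find (name);
    if (it != table.end ())
      return it->second;

    ++remaps;
    file_data created{name, remap_prefix (name, old_prefix, new_prefix), 0};
    return table.emplace (name, std::move (created)).first->second;
  }

  // How many times the remapping actually ran.
  int remap_count () const { return remaps; }

private:
  std::unordered_map<std::string, file_data> table;
  std::string old_prefix;
  std::string new_prefix;
  int remaps;
};
```

Every later consumer then reads 'filename' directly instead of remapping again.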

Bootstrapped and tested on x86_64-unknown-linux-gnu.  OK?

Thanks,
Richard.

2021-07-20  Richard Biener  

PR debug/101473
* dwarf2out.h (dwarf_file_data): Add key member.
* dwarf2out.c (dwarf_file_hasher::equal): Compare key.
(dwarf_file_hasher::hash): Hash key.
(lookup_filename): Remap the filename and store it in the
filename member of dwarf_file_data when creating a new
dwarf_file_data.
(file_name_acquire): Do not remap the filename again.
(maybe_emit_file): Likewise.
---
 gcc/dwarf2out.c | 12 ++--
 gcc/dwarf2out.h |  1 +
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 82783c4968b..884f1e191c6 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -12424,7 +12424,7 @@ file_name_acquire (dwarf_file_data **slot, file_name_acquire_data *fnad)
 
   fi = fnad->files + fnad->used_files++;
 
-  f = remap_debug_filename (d->filename);
+  f = d->filename;
 
   /* Skip all leading "./".  */
   while (f[0] == '.' && IS_DIR_SEPARATOR (f[1]))
@@ -27460,13 +27460,13 @@ dwarf2out_ignore_block (const_tree block)
 bool
 dwarf_file_hasher::equal (dwarf_file_data *p1, const char *p2)
 {
-  return filename_cmp (p1->filename, p2) == 0;
+  return filename_cmp (p1->key, p2) == 0;
 }
 
 hashval_t
 dwarf_file_hasher::hash (dwarf_file_data *p)
 {
-  return htab_hash_string (p->filename);
+  return htab_hash_string (p->key);
 }
 
 /* Lookup FILE_NAME (in the list of filenames that we know about here in
@@ -27496,7 +27496,8 @@ lookup_filename (const char *file_name)
 return *slot;
 
   created = ggc_alloc<dwarf_file_data> ();
-  created->filename = file_name;
+  created->key = file_name;
+  created->filename = remap_debug_filename (file_name);
   created->emitted_number = 0;
   *slot = created;
   return created;
@@ -27522,8 +27523,7 @@ maybe_emit_file (struct dwarf_file_data * fd)
   if (output_asm_line_debug_info ())
{
  fprintf (asm_out_file, "\t.file %u ", fd->emitted_number);
- output_quoted_string (asm_out_file,
-   remap_debug_filename (fd->filename));
+ output_quoted_string (asm_out_file, fd->filename);
  fputc ('\n', asm_out_file);
}
 }
diff --git a/gcc/dwarf2out.h b/gcc/dwarf2out.h
index 057afdb53a0..b2152a53bf9 100644
--- a/gcc/dwarf2out.h
+++ b/gcc/dwarf2out.h
@@ -424,6 +424,7 @@ extern enum dwarf_tag dw_get_die_tag (dw_die_ref);
 
 /* Data about a single source file.  */
 struct GTY((for_user)) dwarf_file_data {
+  const char * key;
   const char * filename;
   int emitted_number;
 };
-- 
2.26.2


Re: [RFC/PATCH] Use range-based for loops for traversing loops

2021-07-20 Thread Jonathan Wakely via Gcc-patches
On Tue, 20 Jul 2021 at 09:58, Kewen.Lin  wrote:
>
> on 2021/7/19 下午11:59, Martin Sebor wrote:
> > On 7/19/21 12:20 AM, Kewen.Lin wrote:
> >> Hi,
> >>
> >> This patch follows Martin's suggestion here[1], to support
> >> range-based for loops for traversing loops, analogously to
> >> the patch for vec[2].
> >>
> >> Bootstrapped and regtested on powerpc64le-linux-gnu P9,
> >> x86_64-redhat-linux and aarch64-linux-gnu, also
> >> bootstrapped on ppc64le P9 with bootstrap-O3 config.
> >>
> >> Any comments are appreciated.
> >
> > Thanks for this nice cleanup!  Just a few suggestions:
> >
> > I would recommend against introducing new macros unless they
> > offer a significant advantage over alternatives (for the two
> > macros the patch adds I don't think they do).
> >
> > If improving const-correctness is one of our goals
> > the loops_list iterator type would need a corresponding
> > const_iterator type, and const overloads of the begin()
> > and end() member functions.
> >
> > Rather than introducing more instances of the loop_p typedef
> > I'd suggest to use loop *.  It has at least two advantages:
> > it's clearer (it's obvious it refers to a pointer), and lends
> > itself more readily to making code const-correct by declaring
> > the control variable const: for (const class loop *loop: ...)
> > while avoiding the mistake of using const loop_p loop to
> > declare a pointer to a const loop.
> >
>
> Thanks for the suggestions, Martin!  Will update them in V2.
>
> With some experiments, I noticed that even provided const_iterator
> like:
>
>iterator
>begin ()
>{
>  return iterator (*this, 0);
>}
>
> +  const_iterator
> +  begin () const
> +  {
> +return const_iterator (*this, 0);
> +  }
>
> for (const class loop *loop: ...) will still use iterator instead
> of const_iterator pair.  We have to make the code look like:
>
>   const auto& const_loops = loops_list (...);
>   for (const class loop *loop: const_loops)
>
> or
>   template <typename T> constexpr const T &as_const(T &t) noexcept { return t; }
>   for (const class loop *loop: as_const(loops_list...))
>
> Does it look good to add below as_const along with loops_list in cfgloop.h?
>
> +/* Provide the functionality of std::as_const to support range-based for
> +   to use const iterator.  (We can't use std::as_const itself because it's
> +   a C++17 feature.)  */
> +template <typename T>
> +constexpr const T &
> +as_const (T &t) noexcept

The noexcept is not needed because GCC is built -fno-exceptions. For
consistency with all the other code that doesn't use noexcept, it
should probably not be there.

> +{
> +  return t;
> +}
> +

That's one option. Another option (which could coexist with as_const)
is to add cbegin() and cend() members, which are not overloaded for
const and non-const, and so always return a const_iterator:

const_iterator cbegin () const { return const_iterator (*this, 0); }
const_iterator begin () const { return cbegin(); }

And similarly for `end () const` and `cend () const`.


[PATCH] libcpp: __VA_OPT__ p1042r1 placemarker changes [PR101488]

2021-07-20 Thread Jakub Jelinek via Gcc-patches
Hi!

So, besides missing #__VA_OPT__ patch for which I've posted patch last week,
P1042R1 introduced some placemarker changes for __VA_OPT__, most notably
the addition of before "removal of placemarker tokens," rescanning ...
and the
#define H4(X, ...) __VA_OPT__(a X ## X) ## b
H4(, 1)  // replaced by a b
example mentioned there where we replace it currently with ab

The following patch contains the minimum changes (except for the
__builtin_expect) that achieve the same preprocessing between current
clang++ and patched gcc on all the testcases I've tried (i.e. gcc __VA_OPT__
testsuite in c-c++-common/cpp/va-opt* including the new test and the clang
clang/test/Preprocessor/macro_va_opt* testcases).

At one point I was trying to implement the __VA_OPT__(args) case as if
for non-empty __VA_ARGS__ it expanded as if __VA_OPT__( and ) were missing,
but from the tests it seems that is not how it should work, in particular
if after (or before) we have some macro argument and it is not followed
(or preceded) by ##, then it should be macro expanded even when __VA_OPT__
is after ## or ) is followed by ##.  And it seems that not removing any
padding tokens isn't possible either, because the expansion of the arguments
typically has a padding token at the start and end and those at least
according to the testsuite need to go.  It is unclear if it would be enough
to remove just one or if all padding tokens should be removed.
Anyway, e.g. the previous removal of all padding tokens at the end of
__VA_OPT__ is undesirable, as it e.g. eats also the padding tokens needed
for the H4 example from the paper.
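
For readers unfamiliar with placemarkers, a small sketch of the long-standing ## behaviour that the H4 case builds on. CAT and paste_demo are hypothetical names used only for illustration. The H4 macro itself appears only in a comment, because its expansion depends on whether the compiler implements the P1042R1 wording discussed above.

```cpp
// Token pasting with an empty argument: the empty side becomes a
// placemarker, and pasting a placemarker with a token yields that token.
#define CAT(x, y) x ## y

int
paste_demo ()
{
  int b = 7;
  int left_empty = CAT (, b);    // placemarker ## b  ->  b
  int right_empty = CAT (b, );   // b ## placemarker  ->  b
  return left_empty + right_empty;
}

// The example from the paper, kept as a comment on purpose:
//   #define H4(X, ...) __VA_OPT__(a X ## X) ## b
//   H4(, 1)   // P1042R1: replaced by "a b"; unpatched GCC gives "ab"
// Here X ## X pastes two placemarkers, and the result of __VA_OPT__ (...)
// is then pasted with b, so whether a and b stay separate tokens depends
// on how the placemarker at the end of the __VA_OPT__ group is handled.
```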

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-07-20  Jakub Jelinek  

PR preprocessor/101488
* macro.c (replace_args): Fix up handling of CPP_PADDING tokens at the
start or end of __VA_OPT__ arguments when preceded or followed by ##.

* c-c++-common/cpp/va-opt-3.c: Adjust expected output.
* c-c++-common/cpp/va-opt-7.c: New test.

--- libcpp/macro.c.jj   2021-07-16 11:10:08.512925510 +0200
+++ libcpp/macro.c  2021-07-19 15:58:59.819101659 +0200
@@ -2025,6 +2026,7 @@ replace_args (cpp_reader *pfile, cpp_has
   i = 0;
   vaopt_state vaopt_tracker (pfile, macro->variadic, &args[macro->paramc - 1]);
   const cpp_token **vaopt_start = NULL;
+  unsigned vaopt_padding_tokens = 0;
   for (src = macro->exp.tokens; src < limit; src++)
 {
   unsigned int arg_tokens_count;
@@ -2034,7 +2036,7 @@ replace_args (cpp_reader *pfile, cpp_has
 
   /* __VA_OPT__ handling.  */
   vaopt_state::update_type vostate = vaopt_tracker.update (src);
-  if (vostate != vaopt_state::INCLUDE)
+  if (__builtin_expect (vostate != vaopt_state::INCLUDE, false))
{
  if (vostate == vaopt_state::BEGIN)
{
@@ -2059,7 +2061,9 @@ replace_args (cpp_reader *pfile, cpp_has
 
  /* Remove any tail padding from inside the __VA_OPT__.  */
  paste_flag = tokens_buff_last_token_ptr (buff);
- while (paste_flag && paste_flag != start
+ while (vaopt_padding_tokens--
+&& paste_flag
+&& paste_flag != start
 && (*paste_flag)->type == CPP_PADDING)
{
  tokens_buff_remove_last_token (buff);
@@ -2103,6 +2107,7 @@ replace_args (cpp_reader *pfile, cpp_has
  continue;
}
 
+  vaopt_padding_tokens = 0;
   if (src->type != CPP_MACRO_ARG)
{
  /* Allocate a virtual location for token SRC, and add that
@@ -2180,11 +2185,8 @@ replace_args (cpp_reader *pfile, cpp_has
  else
paste_flag = tmp_token_ptr;
}
- /* Remove the paste flag if the RHS is a placemarker, unless the
-previous emitted token is at the beginning of __VA_OPT__;
-placemarkers within __VA_OPT__ are ignored in that case.  */
- else if (arg_tokens_count == 0
-  && tmp_token_ptr != vaopt_start)
+ /* Remove the paste flag if the RHS is a placemarker.  */
+ else if (arg_tokens_count == 0)
paste_flag = tmp_token_ptr;
}
}
@@ -2259,8 +2262,12 @@ replace_args (cpp_reader *pfile, cpp_has
token_index += j;
 
  index = expanded_token_index (pfile, macro, src, token_index);
- tokens_buff_add_token (buff, virt_locs,
-macro_arg_token_iter_get_token (&from),
+ const cpp_token *tok = macro_arg_token_iter_get_token (&from);
+ if (tok->type == CPP_PADDING)
+   vaopt_padding_tokens++;
+ else
+   vaopt_padding_tokens = 0;
+ tokens_buff_add_token (buff, virt_locs, tok,
 macro_arg_token_iter_get_location (&from),
 src->src_loc, map, index);
  macro_a

Re: [RFC/PATCH] Use range-based for loops for traversing loops

2021-07-20 Thread Jonathan Wakely via Gcc-patches
On Tue, 20 Jul 2021 at 10:49, Jonathan Wakely  wrote:
>
> On Tue, 20 Jul 2021 at 09:58, Kewen.Lin  wrote:
> >
> > on 2021/7/19 下午11:59, Martin Sebor wrote:
> > > On 7/19/21 12:20 AM, Kewen.Lin wrote:
> > >> Hi,
> > >>
> > >> This patch follows Martin's suggestion here[1], to support
> > >> range-based for loops for traversing loops, analogously to
> > >> the patch for vec[2].
> > >>
> > >> Bootstrapped and regtested on powerpc64le-linux-gnu P9,
> > >> x86_64-redhat-linux and aarch64-linux-gnu, also
> > >> bootstrapped on ppc64le P9 with bootstrap-O3 config.
> > >>
> > >> Any comments are appreciated.
> > >
> > > Thanks for this nice cleanup!  Just a few suggestions:
> > >
> > > I would recommend against introducing new macros unless they
> > > offer a significant advantage over alternatives (for the two
> > > macros the patch adds I don't think they do).
> > >
> > > If improving const-correctness is one of our goals
> > > the loops_list iterator type would need a corresponding
> > > const_iterator type, and const overloads of the begin()
> > > and end() member functions.
> > >
> > > Rather than introducing more instances of the loop_p typedef
> > > I'd suggest to use loop *.  It has at least two advantages:
> > > it's clearer (it's obvious it refers to a pointer), and lends
> > > itself more readily to making code const-correct by declaring
> > > the control variable const: for (const class loop *loop: ...)
> > > while avoiding the mistake of using const loop_p loop to
> > > declare a pointer to a const loop.
> > >
> >
> > Thanks for the suggestions, Martin!  Will update them in V2.
> >
> > With some experiments, I noticed that even provided const_iterator
> > like:
> >
> >iterator
> >begin ()
> >{
> >  return iterator (*this, 0);
> >}
> >
> > +  const_iterator
> > +  begin () const
> > +  {
> > +return const_iterator (*this, 0);
> > +  }
> >
> > for (const class loop *loop: ...) will still use iterator instead
> > of const_iterator pair.  We have to make the code look like:
> >
> >   const auto& const_loops = loops_list (...);
> >   for (const class loop *loop: const_loops)
> >
> > or
> > >   template <typename T> constexpr const T &as_const(T &t) noexcept { return t; }
> >   for (const class loop *loop: as_const(loops_list...))
> >
> > Does it look good to add below as_const along with loops_list in cfgloop.h?
> >
> > +/* Provide the functionality of std::as_const to support range-based for
> > +   to use const iterator.  (We can't use std::as_const itself because it's
> > +   a C++17 feature.)  */
> > > +template <typename T>
> > +constexpr const T &
> > +as_const (T &t) noexcept
>
> The noexcept is not needed because GCC is built -fno-exceptions. For
> consistency with all the other code that doesn't use noexcept, it
> should probably not be there.
>
> > +{
> > +  return t;
> > +}
> > +
>
> That's one option. Another option (which could coexist with as_const)
> is to add cbegin() and cend() members, which are not overloaded for
> const and non-const, and so always return a const_iterator:
>
> const_iterator cbegin () const { return const_iterator (*this, 0); }
> const_iterator begin () const { return cbegin(); }
>
> And similarly for `end () const` and `cend () const`.

The range-based for loop would not use cbegin and cend, so you'd still
want to use as_const for that purpose.
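
A self-contained sketch of the overload selection discussed in this thread. toy_list and its counters are hypothetical and unrelated to loops_list; only the begin/end overload structure matters. Range-based for calls begin () and end () on the range expression as written, so a non-const object selects the non-const overloads unless it is first viewed through a const reference.

```cpp
namespace sketch {

// Hypothetical toy container with const and non-const begin/end.
class toy_list
{
public:
  struct iterator
  {
    int *p;
    int &operator* () const { return *p; }
    iterator &operator++ () { ++p; return *this; }
    bool operator!= (const iterator &o) const { return p != o.p; }
  };
  struct const_iterator
  {
    const int *p;
    const int &operator* () const { return *p; }
    const_iterator &operator++ () { ++p; return *this; }
    bool operator!= (const const_iterator &o) const { return p != o.p; }
  };

  iterator begin () { ++mutable_begins; return iterator { m_data }; }
  iterator end () { return iterator { m_data + 3 }; }
  const_iterator begin () const
  { ++const_begins; return const_iterator { m_data }; }
  const_iterator end () const { return const_iterator { m_data + 3 }; }

  // Counters recording which begin () overload was selected.
  mutable int mutable_begins = 0;
  mutable int const_begins = 0;

private:
  int m_data[3] = { 1, 2, 3 };
};

// Same shape as the helper proposed in the thread, without noexcept.
template <typename T>
constexpr const T &
as_const (T &t)
{
  return t;
}

// Iterates a non-const object directly: the non-const begin () is used,
// even though the loop variable is a reference to const.
int
sum_plain (toy_list &l)
{
  int s = 0;
  for (const int &v : l)
    s += v;
  return s;
}

// Iterates through as_const: the const begin () is used.
int
sum_const (toy_list &l)
{
  int s = 0;
  for (const int &v : as_const (l))
    s += v;
  return s;
}

} // namespace sketch
```

The constness of the loop variable plays no part in overload selection; only the constness of the range expression does, which is why cbegin ()/cend () alone do not help here.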


Re: [PATCH] debug/101473 - apply debug prefix maps before checksumming DIEs

2021-07-20 Thread Jakub Jelinek via Gcc-patches
On Tue, Jul 20, 2021 at 11:07:26AM +0200, Richard Biener wrote:
> The following makes sure to apply the debug prefix maps to filenames
> before checksumming DIEs to create the global symbol for the CU DIE
> used by LTO to link the late debug to the early debug.  This avoids
> binary differences (in said symbol) when compiling with toolchains
> installed under a different path and that compensated with appropriate
> -fdebug-prefix-map options.
> 
> The easiest and most scalable way is to record both the unmapped
> and the remapped filename in the dwarf_file_data so the remapping
> process takes place at a single point and only once (otherwise it
> creates GC garbage at each point doing that).
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.  OK?
> 
> Thanks,
> Richard.
> 
> 2021-07-20  Richard Biener  
> 
>   PR debug/101473
>   * dwarf2out.h (dwarf_file_data): Add key member.
>   * dwarf2out.c (dwarf_file_hasher::equal): Compare key.
>   (dwarf_file_hasher::hash): Hash key.
>   (lookup_filename): Remap the filename and store it in the
>   filename member of dwarf_file_data when creating a new
>   dwarf_file_data.
>   (file_name_acquire): Do not remap the filename again.
>   (maybe_emit_file): Likewise.

Ok.

Jakub



[committed] dir-locals: Use https for bug references

2021-07-20 Thread Richard Earnshaw via Gcc-patches

We've been using https for web references for some time now.

ChangeLog:

* .dir-locals.el (bug-reference-url-format): Use https.
---
 .dir-locals.el | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.dir-locals.el b/.dir-locals.el
index b07a0dc50d8..fa031cbded9 100644
--- a/.dir-locals.el
+++ b/.dir-locals.el
@@ -17,7 +17,7 @@
 ((tcl-mode . ((tcl-indent-level . 4)
 	  (tcl-continued-indent-level . 4)
 	  (indent-tabs-mode . t)))
- (nil . ((bug-reference-url-format . "http://gcc.gnu.org/PR%s")))
+ (nil . ((bug-reference-url-format . "https://gcc.gnu.org/PR%s")))
  (c-mode . ((c-file-style . "GNU")
 	(indent-tabs-mode . t)
 	(fill-column . 79


[PATCH] aarch64: Don't include vec_select in SIMD multiply cost

2021-07-20 Thread Jonathan Wright via Gcc-patches
Hi,

The Neon multiply/multiply-accumulate/multiply-subtract instructions
can take various forms - multiplying full vector registers of values
or multiplying one vector by a single element of another. Regardless
of the form used, these instructions have the same cost, and this
should be reflected by the RTL cost function.

This patch adds RTL tree traversal in the Neon multiply cost function
to match the vec_select used by the lane-referencing forms of the
instructions already mentioned. This traversal prevents the cost of
the vec_select from being added into the cost of the multiply -
meaning that these instructions can now be emitted in the combine
pass as they are no longer deemed prohibitively expensive.
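
The effect can be sketched with a toy cost model. Everything here is hypothetical: expr, op_code and the cost constants are illustrative only and do not reflect aarch64_rtx_mult_cost or real RTL. The point is that a multiply whose operand is wrapped in a lane select should cost the same as a plain multiply once the wrapper is skipped.

```cpp
#include <memory>
#include <utility>

enum class op_code { reg, vec_select, mult };

// Hypothetical expression node: lhs is operand 0, rhs is operand 1.
struct expr
{
  op_code code;
  std::unique_ptr<expr> lhs;
  std::unique_ptr<expr> rhs;
};

std::unique_ptr<expr>
make_reg ()
{
  return std::unique_ptr<expr> (new expr { op_code::reg, nullptr, nullptr });
}

std::unique_ptr<expr>
make_select (std::unique_ptr<expr> v)
{
  return std::unique_ptr<expr> (
    new expr { op_code::vec_select, std::move (v), nullptr });
}

std::unique_ptr<expr>
make_mult (std::unique_ptr<expr> a, std::unique_ptr<expr> b)
{
  return std::unique_ptr<expr> (
    new expr { op_code::mult, std::move (a), std::move (b) });
}

// Illustrative cost constants, not taken from any real cost table.
const int select_cost = 1;
const int mult_cost = 4;

// Look through a vec_select wrapper so it is not charged to the multiply.
const expr *
strip_select (const expr *e)
{
  return e->code == op_code::vec_select ? e->lhs.get () : e;
}

int
cost (const expr *e, bool strip_lane_select)
{
  switch (e->code)
    {
    case op_code::reg:
      return 0;
    case op_code::vec_select:
      return select_cost + cost (e->lhs.get (), strip_lane_select);
    case op_code::mult:
      {
        const expr *a = e->lhs.get ();
        const expr *b = e->rhs.get ();
        if (strip_lane_select)
          {
            a = strip_select (a);
            b = strip_select (b);
          }
        return (mult_cost + cost (a, strip_lane_select)
                + cost (b, strip_lane_select));
      }
    }
  return 0;
}
```

With strip_lane_select set, the by-lane form and the full-vector form get the same cost in this model, which is the property the patch description asks of the real cost function.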

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-07-19  Jonathan Wright  

* config/aarch64/aarch64.c (aarch64_rtx_mult_cost): Traverse
RTL tree to prevent vec_select from being added into Neon
multiply cost.


rb14675.patch
Description: rb14675.patch


Re: PING 2 [PATCH] handle sanitizer built-ins in -Wuninitialized (PR 101300)

2021-07-20 Thread Jeff Law via Gcc-patches




On 7/19/2021 6:01 PM, Martin Sebor via Gcc-patches wrote:

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574385.html

On 7/12/21 12:06 PM, Martin Sebor wrote:

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574385.html

On 7/2/21 1:21 PM, Martin Sebor wrote:

To avoid a class of false negatives for sanitized code
-Wuninitialized recognizes the ASAN_MARK internal function
doesn't modify its argument.  But the warning code doesn't do
the same for any sanitizer built-ins even though they don't
modify user-supplied arguments either.  This leaves another
class of false negatives unresolved.

The attached fix enhances the warning logic to recognize all
sanitizer built-ins as well and treat them as non-modifying.

Tested on x86_64-linux.

OK after fixing the "pointets" -> "pointers" typo.

Jeff



Re: [RFC] ipa: Adjust references to identify read-only globals

2021-07-20 Thread Richard Biener via Gcc-patches
On Tue, Jul 20, 2021 at 10:54 AM JiangNing OS via Gcc-patches
 wrote:
>
> > -Original Message-
> > From: Gcc-patches  > bounces+jiangning=os.amperecomputing@gcc.gnu.org> On Behalf Of
> > Martin Jambor
> > Sent: Wednesday, June 30, 2021 4:19 AM
> > To: GCC Patches 
> > Cc: Jan Hubicka 
> > Subject: [RFC] ipa: Adjust references to identify read-only globals
> >
> > Hi,
> >
> > this patch has been motivated by SPEC 2017's 544.nab_r in which there is a
> > static variable which is never written to and so zero throughout the run-time
> > of the benchmark.  However, it is passed by reference to a function in which
> > it is read and (after some multiplications) passed into __builtin_exp which in
> > turn unnecessarily consumes almost 10% of the total benchmark run-time.
>
> I do see ~8.5% runtime reduction on aarch64.
>
> > The situation is illustrated by the added testcase remref-3.c.
> >
> > The patch adds a flag to ipa-prop descriptor of each parameter to mark such
> > parameters.  IPA-CP and inlining then take the effort to remove IPA_REF_ADDR
> > references in the caller and only add IPA_REF_LOAD reference to the
> > clone/overall inlined function.  This is sufficient for subsequent symbol table
> > analysis code to identify the read-only variable as such and optimize the code.
> >
> > I plan to compile a number of packages with the patch to test it some more
> > and get a bit better idea of its impact.  But it has passed bootstrap,
> > LTObootstrap and testing on x86_64-linux and i686-linux and so unless I find
> > any problem, I would like to commit it at some point next month without any
> > major changes, so I'd be grateful for any feedback even now.
>
> I see 3 cases in SPEC2017 fail to compile on aarch64, i.e. 521.wrf_r,
> 527.cam4_r and 554.roms_r.  For example,
>
> pre_step3d.fppized.f90:1260:35: internal compiler error: Segmentation fault
>  1260 |   CALL wclock_on (ng, iNLM, 22)
>   |   ^
> 0x1645c6b internal_error(char const*, ...)
> ???:0
> 0xe1f4f4 place_block_symbol(rtx_def*)
> ???:0
> 0x84ab33 use_anchored_address(rtx_def*)
> ???:0
> 0x868203 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
> ???:0
> 0x868793 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
> ???:0
> 0x75b593 expand_call(tree_node*, rtx_def*, int)
> ???:0
> 0x86a09f expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
> ???:0
> Please submit a full bug report

Please file a bugreport and provide a (possibly reduced) testcase.

Thanks,
Richard.

> Thanks,
> -Jiangning
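
For context, the pattern described in the quoted RFC can be sketched as follows. This is a hypothetical reduction written for illustration. It is neither the remref-3.c testcase nor code from 544.nab_r, and the names damping, scaled_exp and caller are made up.

```cpp
#include <cmath>

// Never written anywhere in the program, so it stays zero for the
// whole run.
static double damping;

// Reads the value through a pointer.  Seen in isolation, the function
// cannot know that *p is always zero.
double
scaled_exp (const double *p, double x)
{
  return std::exp (*p * x * 2.0);
}

// The address of the variable is taken here, but the callee only ever
// loads through it.  If the optimizer can see that, the variable can be
// treated as read-only and the exp call folds to a constant.
double
caller (double x)
{
  return scaled_exp (&damping, x);
}
```

Whatever x is, the argument to exp is zero, so the result is always 1.0; the cost described in the RFC comes from computing that at run time on every call.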


Re: [PATCH 2/2][RFC] Add loop masking support for x86

2021-07-20 Thread Hongtao Liu via Gcc-patches
On Tue, Jul 20, 2021 at 3:38 PM Richard Biener  wrote:
>
> On Tue, 20 Jul 2021, Hongtao Liu wrote:
>
> > On Fri, Jul 16, 2021 at 5:11 PM Richard Biener  wrote:
> > >
> > > On Thu, 15 Jul 2021, Richard Biener wrote:
> > >
> > > > On Thu, 15 Jul 2021, Richard Biener wrote:
> > > >
> > > > > OK, guess I was more looking at
> > > > >
> > > > > #define N 32
> > > > > int foo (unsigned long *a, unsigned long * __restrict b,
> > > > >  unsigned int *c, unsigned int * __restrict d,
> > > > >  int n)
> > > > > {
> > > > >   unsigned sum = 1;
> > > > >   for (int i = 0; i < n; ++i)
> > > > > {
> > > > >   b[i] += a[i];
> > > > >   d[i] += c[i];
> > > > > }
> > > > >   return sum;
> > > > > }
> > > > >
> > > > > where we on x86 AVX512 vectorize with V8DI and V16SI and we
> > > > > generate two masks for the two copies of V8DI (VF is 16) and one
> > > > > mask for V16SI.  With SVE I see
> > > > >
> > > > > punpklo p1.h, p0.b
> > > > > punpkhi p2.h, p0.b
> > > > >
> > > > > that's sth I expected to see for AVX512 as well, using the V16SI
> > > > > mask and unpacking that to two V8DI ones.  But I see
> > > > >
> > > > > > vpbroadcastd %eax, %ymm0
> > > > > > vpaddd       %ymm12, %ymm0, %ymm0
> > > > > > vpcmpud      $6, %ymm0, %ymm11, %k3
> > > > > > vpbroadcastd %eax, %xmm0
> > > > > > vpaddd       %xmm10, %xmm0, %xmm0
> > > > > > vpcmpud      $1, %xmm7, %xmm0, %k1
> > > > > > vpcmpud      $6, %xmm0, %xmm8, %k2
> > > > > > kortestb     %k1, %k1
> > > > > > jne          .L3
> > > > >
> > > > > so three %k masks generated by vpcmpud.  I'll have to look what's
> > > > > the magic for SVE and why that doesn't trigger for x86 here.
> > > >
> > > > So answer myself, vect_maybe_permute_loop_masks looks for
> > > > vec_unpacku_hi/lo_optab, but with AVX512 the vector bools have
> > > > QImode so that doesn't play well here.  Not sure if there
> > > > are proper mask instructions to use (I guess there's a shift
> > > > and lopart is free).  This is QI:8 to two QI:4 (bits) mask
> > Yes, for 16bit and more, we have KUNPCKBW/D/Q. but for 8bit
> > unpack_lo/hi, only shift.
> > > > conversion.  Not sure how to better ask the target here - again
> > > > VnBImode might have been easier here.
> > >
> > > So I've managed to "emulate" the unpack_lo/hi for the case of
> > > !VECTOR_MODE_P masks by using sub-vector select (we're asking
> > > to turn vector(8)  into two
> > > vector(4) ) via BIT_FIELD_REF.  That then
> > > produces the desired single mask producer and
> > >
> > >   loop_mask_38 = VIEW_CONVERT_EXPR > > >(loop_mask_54);
> > >   loop_mask_37 = BIT_FIELD_REF ;
> > >
> > > note for the lowpart we can just view-convert away the excess bits,
> > > fully re-using the mask.  We generate surprisingly "good" code:
> > >
> > > kmovb   %k1, %edi
> > > > shrb    $4, %dil
> > > kmovb   %edi, %k2
> > >
> > > besides the lack of using kshiftrb.  I guess we're just lacking
> > > a mask register alternative for
> > Yes, we can do it similar as kor/kand/kxor.
> > >
> > > (insn 22 20 25 4 (parallel [
> > > (set (reg:QI 94 [ loop_mask_37 ])
> > > (lshiftrt:QI (reg:QI 98 [ loop_mask_54 ])
> > > (const_int 4 [0x4])))
> > > (clobber (reg:CC 17 flags))
> > > ]) 724 {*lshrqi3_1}
> > >  (expr_list:REG_UNUSED (reg:CC 17 flags)
> > > (nil)))
> > >
> > > and so we reload.  For the above cited loop the AVX512 vectorization
> > > with --param vect-partial-vector-usage=1 does look quite sensible
> > > to me.  Instead of a SSE vectorized epilogue plus a scalar
> > > epilogue we get a single fully masked AVX512 "iteration" for both.
> > > I suppose it's still mostly a code-size optimization (384 bytes
> > > > with the masked epilogue vs. 474 bytes with trunk) since it will
> > > be likely slower for very low iteration counts but it's good
> > > for icache usage then and good for less branch predictor usage.
> > >
> > > That said, I have to set up SPEC on a AVX512 machine to do
> > > Has the patch landed in trunk already?  I can test it on CLX.
>
> I'm still experimenting a bit right now but hope to get something
> trunk ready at the end of this or beginning next week.  Since it's
> disabled by default we can work on improving it during stage1 then.
>
> I'm mostly struggling with the GIMPLE IL to be used for the
> mask unpacking since we currently reject both the BIT_FIELD_REF
> and the VIEW_CONVERT we generate (why do AVX512 masks not all have
> SImode but sometimes QImode and sometimes HImode ...).  Unfortunately
We have instructions like ktestb which only care about the low 8
bits; if we use SImode for all masks, the code implementation can become
complex.

> we've dropped whole-vector shifts in favor of VEC_PERM but that
> doesn't work well either for integer mode vectors.  So I'm still
> playing with my options here and looking for something that doesn't
> require too much surgery on t

Re: [PATCH 2/2][RFC] Add loop masking support for x86

2021-07-20 Thread Richard Biener
On Tue, 20 Jul 2021, Hongtao Liu wrote:

> On Tue, Jul 20, 2021 at 3:38 PM Richard Biener  wrote:
> >
> > On Tue, 20 Jul 2021, Hongtao Liu wrote:
> >
> > > On Fri, Jul 16, 2021 at 5:11 PM Richard Biener  wrote:
> > > >
> > > > On Thu, 15 Jul 2021, Richard Biener wrote:
> > > >
> > > > > On Thu, 15 Jul 2021, Richard Biener wrote:
> > > > >
> > > > > > OK, guess I was more looking at
> > > > > >
> > > > > > #define N 32
> > > > > > int foo (unsigned long *a, unsigned long * __restrict b,
> > > > > >  unsigned int *c, unsigned int * __restrict d,
> > > > > >  int n)
> > > > > > {
> > > > > >   unsigned sum = 1;
> > > > > >   for (int i = 0; i < n; ++i)
> > > > > > {
> > > > > >   b[i] += a[i];
> > > > > >   d[i] += c[i];
> > > > > > }
> > > > > >   return sum;
> > > > > > }
> > > > > >
> > > > > > where we on x86 AVX512 vectorize with V8DI and V16SI and we
> > > > > > generate two masks for the two copies of V8DI (VF is 16) and one
> > > > > > mask for V16SI.  With SVE I see
> > > > > >
> > > > > > punpklo p1.h, p0.b
> > > > > > punpkhi p2.h, p0.b
> > > > > >
> > > > > > that's sth I expected to see for AVX512 as well, using the V16SI
> > > > > > mask and unpacking that to two V8DI ones.  But I see
> > > > > >
> > > > > > > vpbroadcastd %eax, %ymm0
> > > > > > > vpaddd       %ymm12, %ymm0, %ymm0
> > > > > > > vpcmpud      $6, %ymm0, %ymm11, %k3
> > > > > > > vpbroadcastd %eax, %xmm0
> > > > > > > vpaddd       %xmm10, %xmm0, %xmm0
> > > > > > > vpcmpud      $1, %xmm7, %xmm0, %k1
> > > > > > > vpcmpud      $6, %xmm0, %xmm8, %k2
> > > > > > > kortestb     %k1, %k1
> > > > > > > jne          .L3
> > > > > >
> > > > > > so three %k masks generated by vpcmpud.  I'll have to look what's
> > > > > > the magic for SVE and why that doesn't trigger for x86 here.
> > > > >
> > > > > So answer myself, vect_maybe_permute_loop_masks looks for
> > > > > vec_unpacku_hi/lo_optab, but with AVX512 the vector bools have
> > > > > QImode so that doesn't play well here.  Not sure if there
> > > > > are proper mask instructions to use (I guess there's a shift
> > > > > and lopart is free).  This is QI:8 to two QI:4 (bits) mask
> > > Yes, for 16bit and more, we have KUNPCKBW/D/Q. but for 8bit
> > > unpack_lo/hi, only shift.
> > > > > conversion.  Not sure how to better ask the target here - again
> > > > > VnBImode might have been easier here.
> > > >
> > > > So I've managed to "emulate" the unpack_lo/hi for the case of
> > > > !VECTOR_MODE_P masks by using sub-vector select (we're asking
> > > > to turn vector(8)  into two
> > > > vector(4) ) via BIT_FIELD_REF.  That then
> > > > produces the desired single mask producer and
> > > >
> > > >   loop_mask_38 = VIEW_CONVERT_EXPR > > > >(loop_mask_54);
> > > >   loop_mask_37 = BIT_FIELD_REF ;
> > > >
> > > > note for the lowpart we can just view-convert away the excess bits,
> > > > fully re-using the mask.  We generate surprisingly "good" code:
> > > >
> > > > kmovb   %k1, %edi
> > > > > shrb    $4, %dil
> > > > kmovb   %edi, %k2
> > > >
> > > > besides the lack of using kshiftrb.  I guess we're just lacking
> > > > a mask register alternative for
> > > Yes, we can do it similar as kor/kand/kxor.
> > > >
> > > > (insn 22 20 25 4 (parallel [
> > > > (set (reg:QI 94 [ loop_mask_37 ])
> > > > (lshiftrt:QI (reg:QI 98 [ loop_mask_54 ])
> > > > (const_int 4 [0x4])))
> > > > (clobber (reg:CC 17 flags))
> > > > ]) 724 {*lshrqi3_1}
> > > >  (expr_list:REG_UNUSED (reg:CC 17 flags)
> > > > (nil)))
> > > >
> > > > and so we reload.  For the above cited loop the AVX512 vectorization
> > > > with --param vect-partial-vector-usage=1 does look quite sensible
> > > > to me.  Instead of a SSE vectorized epilogue plus a scalar
> > > > epilogue we get a single fully masked AVX512 "iteration" for both.
> > > > I suppose it's still mostly a code-size optimization (384 bytes
> > > > > with the masked epilogue vs. 474 bytes with trunk) since it will
> > > > be likely slower for very low iteration counts but it's good
> > > > for icache usage then and good for less branch predictor usage.
> > > >
> > > > That said, I have to set up SPEC on a AVX512 machine to do
> > > > Has the patch landed in trunk already?  I can test it on CLX.
> >
> > I'm still experimenting a bit right now but hope to get something
> > trunk ready at the end of this or beginning next week.  Since it's
> > disabled by default we can work on improving it during stage1 then.
> >
> > I'm mostly struggling with the GIMPLE IL to be used for the
> > mask unpacking since we currently reject both the BIT_FIELD_REF
> > and the VIEW_CONVERT we generate (why do AVX512 masks not all have
> > SImode but sometimes QImode and sometimes HImode ...).  Unfortunately
> We have instructions like ktestb which only care about the low 8
> bits, if we use SImode for all masks, code impleme

Re: [PATCH] predcom: Refactor more using auto_vec

2021-07-20 Thread Richard Biener via Gcc-patches
On Mon, Jul 19, 2021 at 8:29 AM Kewen.Lin  wrote:
>
> Hi Martin & Richard,
>
> >> A further improvement worth considering (if you're so inclined :)
> >> is replacing the pcom_worker vec members with auto_vec (obviating
> >> having to explicitly release them) and for the same reason also
> >> replacing the comp_ptrs bare pointer members with auto_vecs.
> >> There may be other opportunities to do the same in individual
> >> functions (I'd look to get rid of as many calls to functions
> >> like XNEW()/XNEWVEC() and free() use auto_vec instead).
> >>
> >> An unrelated but worthwhile change is to replace the FOR_EACH_
> >> loops with C++ 11 range loops, analogously to:
> >> https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572315.html
> >>
> >> Finally, the only loosely followed naming convention for member
> >> variables is to start them with the m_ prefix.
> >>
> >> These are just suggestions that could be done in a followup, not
> >> something I would consider prerequisite for accepting the patch
> >> as is if I were in a position to make such a decision.
> >>
>
> Sorry for the late update, this patch follows your previous
> advice to refactor it more by:
>   - Adding m_ prefix for class pcom_worker member variables.
>   - Using auto_vec instead of vec among class pcom_worker,
> chain, component and comp_ptrs.
>
> btw, the changes in tree-data-ref.[ch] are required; without
> them the destruction of an auto_vec instance could try to double
> free the memory pointed to by m_vec.
>
> The suggestion on range loops is addressed by one separated
> patch: https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575536.html
>
> Bootstrapped and regtested on powerpc64le-linux-gnu P9,
> x86_64-redhat-linux and aarch64-linux-gnu, also
> bootstrapped on ppc64le P9 with bootstrap-O3 config.
>
> Is it ok for trunk?

OK.

Thanks,
Richard.

> BR,
> Kewen
> -
> gcc/ChangeLog:
>
> * tree-data-ref.c (free_dependence_relations): Adjust to pass vec by
> reference.
> (free_data_refs): Likewise.
> * tree-data-ref.h (free_dependence_relations): Likewise.
> (free_data_refs): Likewise.
> * tree-predcom.c (struct chain): Use auto_vec instead of vec for
> members.
> (struct component): Likewise.
> (pcom_worker::pcom_worker): Adjust for auto_vec and renaming changes.
> (pcom_worker::~pcom_worker): Likewise.
> (pcom_worker::release_chain): Adjust as auto_vec changes.
> (pcom_worker::loop): Rename to ...
> (pcom_worker::m_loop): ... this.
> (pcom_worker::datarefs): Rename to ...
> (pcom_worker::m_datarefs): ... this.  Use auto_vec instead of vec.
> (pcom_worker::dependences): Rename to ...
> (pcom_worker::m_dependences): ... this.  Use auto_vec instead of vec.
> (pcom_worker::chains): Rename to ...
> (pcom_worker::m_chains): ... this.  Use auto_vec instead of vec.
> (pcom_worker::looparound_phis): Rename to ...
> (pcom_worker::m_looparound_phis): ... this.  Use auto_vec instead of
> vec.
> (pcom_worker::cache): Rename to ...
> (pcom_worker::m_cache): ... this.  Use auto_vec instead of vec.
> (pcom_worker::release_chain): Adjust for auto_vec changes.
> (pcom_worker::release_chains): Adjust for auto_vec and renaming
> changes.
> (release_component): Remove.
> (release_components): Adjust for release_component removal.
> (component_of): Adjust to use vec.
> (merge_comps): Likewise.
> (pcom_worker::aff_combination_dr_offset): Adjust for renaming changes.
> (pcom_worker::determine_offset): Likewise.
> (class comp_ptrs): Remove.
> (pcom_worker::split_data_refs_to_components): Adjust for renaming
> changes, for comp_ptrs removal with auto_vec.
> (pcom_worker::suitable_component_p): Adjust for renaming changes.
> (pcom_worker::filter_suitable_components): Adjust for 
> release_component
> removal.
> (pcom_worker::valid_initializer_p): Adjust for renaming changes.
> (pcom_worker::find_looparound_phi): Likewise.
> (pcom_worker::add_looparound_copies): Likewise.
> (pcom_worker::determine_roots_comp): Likewise.
> (pcom_worker::single_nonlooparound_use): Likewise.
> (pcom_worker::execute_pred_commoning_chain): Likewise.
> (pcom_worker::execute_pred_commoning): Likewise.
> (pcom_worker::try_combine_chains): Likewise.
> (pcom_worker::prepare_initializers_chain): Likewise.
> (pcom_worker::prepare_initializers): Likewise.
> (pcom_worker::prepare_finalizers_chain): Likewise.
> (pcom_worker::prepare_finalizers): Likewise.
> (pcom_worker::tree_predictive_commoning_loop): Likewise.


[committed] libstdc++: Add more tests for filesystem::create_directory [PR101510]

2021-07-20 Thread Jonathan Wakely via Gcc-patches
Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/101510
* src/c++17/fs_ops.cc (create_dir): Adjust whitespace.
* testsuite/27_io/filesystem/operations/create_directory.cc:
Test creating directory with name of existing symlink to
directory.
* testsuite/experimental/filesystem/operations/create_directory.cc:
Likewise.

Tested x86_64-linux. Committed to trunk.
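
For readers who want to try the symlink case outside the testsuite, here is a minimal C++17 sketch.  It assumes a platform where directory symlinks can be created and a library where PR101510 is fixed; the scratch directory name is made up for illustration.

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Returns true when create_directory on a symlink to an existing directory
// reports "nothing created" and clears the error code.
bool symlink_to_dir_is_not_an_error ()
{
  std::error_code ec;
  // Hypothetical scratch location; any writable directory would do.
  fs::path p = fs::temp_directory_path () / "create_directory_symlink_demo";
  fs::remove_all (p, ec);
  fs::create_directory (p);
  fs::create_directory (p / "dir");
  fs::path link = p / "link";
  fs::create_directory_symlink ("dir", link);

  // Start from a non-zero error code so a cleared value is observable.
  ec = std::make_error_code (std::errc::invalid_argument);
  bool created = fs::create_directory (link, ec);
  bool ok = !created && !ec;

  fs::remove_all (p, ec);
  return ok;
}
```

The error code is preset to a failure value for the same reason the new tests do it: otherwise an implementation that forgets to clear it would go unnoticed.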

commit 0c4ae4ff46b1d7633f1e06f57d348b5817b8f640
Author: Jonathan Wakely 
Date:   Tue Jul 20 12:35:37 2021

libstdc++: Add more tests for filesystem::create_directory [PR101510]

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/101510
* src/c++17/fs_ops.cc (create_dir): Adjust whitespace.
* testsuite/27_io/filesystem/operations/create_directory.cc:
Test creating directory with name of existing symlink to
directory.
* testsuite/experimental/filesystem/operations/create_directory.cc:
Likewise.

diff --git a/libstdc++-v3/src/c++17/fs_ops.cc b/libstdc++-v3/src/c++17/fs_ops.cc
index 66207ae5e44..cec76446f06 100644
--- a/libstdc++-v3/src/c++17/fs_ops.cc
+++ b/libstdc++-v3/src/c++17/fs_ops.cc
@@ -577,8 +577,7 @@ namespace
   {
 bool created = false;
 #ifdef _GLIBCXX_HAVE_SYS_STAT_H
-posix::mode_t mode
-  = static_cast<std::underlying_type_t<fs::perms>>(perm);
+posix::mode_t mode = static_cast<std::underlying_type_t<fs::perms>>(perm);
 if (posix::mkdir(p.c_str(), mode))
   {
const int err = errno;
diff --git a/libstdc++-v3/testsuite/27_io/filesystem/operations/create_directory.cc b/libstdc++-v3/testsuite/27_io/filesystem/operations/create_directory.cc
index a0e50471275..256621481d7 100644
--- a/libstdc++-v3/testsuite/27_io/filesystem/operations/create_directory.cc
+++ b/libstdc++-v3/testsuite/27_io/filesystem/operations/create_directory.cc
@@ -54,6 +54,33 @@ test01()
   b = create_directory(p);
   VERIFY( !b );
 
+  auto f = p/"file";
+  std::ofstream{f} << "create file";
+  b = create_directory(f, ec);
+  VERIFY( ec == std::errc::file_exists );
+  VERIFY( !b );
+  try
+  {
+create_directory(f);
+VERIFY( false );
+  }
+  catch (const fs::filesystem_error& e)
+  {
+VERIFY( e.code() == std::errc::file_exists );
+VERIFY( e.path1() == f );
+  }
+
+  // PR libstdc++/101510 create_directory on an existing symlink to a directory
+  fs::create_directory(p/"dir");
+  auto link = p/"link";
+  fs::create_directory_symlink("dir", link);
+  ec = bad_ec;
+  b = fs::create_directory(link, ec);
+  VERIFY( !b );
+  VERIFY( !ec );
+  b = fs::create_directory(link);
+  VERIFY( !b );
+
   remove_all(p, ec);
 }
 
diff --git a/libstdc++-v3/testsuite/experimental/filesystem/operations/create_directory.cc b/libstdc++-v3/testsuite/experimental/filesystem/operations/create_directory.cc
index ee2a74b8803..39f95b61a45 100644
--- a/libstdc++-v3/testsuite/experimental/filesystem/operations/create_directory.cc
+++ b/libstdc++-v3/testsuite/experimental/filesystem/operations/create_directory.cc
@@ -46,12 +46,40 @@ test01()
   VERIFY( exists(p) );
 
   // Test existing path (libstdc++/71036).
+  ec = make_error_code(std::errc::invalid_argument);
   b = create_directory(p, ec);
   VERIFY( !ec );
   VERIFY( !b );
   b = create_directory(p);
   VERIFY( !b );
 
+  auto f = p/"file";
+  std::ofstream{f} << "create file";
+  b = create_directory(f, ec);
+  VERIFY( ec == std::errc::file_exists );
+  VERIFY( !b );
+  try
+  {
+create_directory(f);
+VERIFY( false );
+  }
+  catch (const fs::filesystem_error& e)
+  {
+VERIFY( e.code() == std::errc::file_exists );
+VERIFY( e.path1() == f );
+  }
+
+  // PR libstdc++/101510 create_directory on an existing symlink to a directory
+  fs::create_directory(p/"dir");
+  auto link = p/"link";
+  fs::create_directory_symlink("dir", link);
+  ec = make_error_code(std::errc::invalid_argument);
+  b = fs::create_directory(link, ec);
+  VERIFY( !b );
+  VERIFY( !ec );
+  b = fs::create_directory(link);
+  VERIFY( !b );
+
   remove_all(p, ec);
 }
 


[PATCH] Support logic shift left/right for avx512 mask type.

2021-07-20 Thread liuhongt via Gcc-patches
Hi:
  As mentioned in https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575420.html

cut start-
> note for the lowpart we can just view-convert away the excess bits,
> fully re-using the mask.  We generate surprisingly "good" code:
>
> kmovb   %k1, %edi
> shrb$4, %dil
> kmovb   %edi, %k2
>
> besides the lack of using kshiftrb.  I guess we're just lacking
> a mask register alternative for
Yes, we can do it similarly to kor/kand/kxor.
---cut end

  Bootstrap and regtested on x86_64-linux-gnu{-m32,}.
  Ok for trunk?
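
As a rough illustration of what the patch targets, the sketch below models the quoted "take the upper half of an 8-bit mask" shift and the immediate ranges accepted by the new Wb and Ww constraints.  It is plain scalar C++, not AVX-512 code; upper_half, fits_wb and fits_ww are hypothetical names used only here.

```cpp
#include <cstdint>

// Logical right shift of an 8-bit mask by 4, i.e. the operation the quoted
// kmovb/shrb/kmovb sequence performs through a general register.
std::uint8_t upper_half (std::uint8_t mask)
{
  return static_cast<std::uint8_t> (mask >> 4);
}

// Mirrors IN_RANGE (ival, 0, 7): shift counts valid for an 8-bit mask.
constexpr bool fits_wb (long ival) { return ival >= 0 && ival <= 7; }

// Mirrors IN_RANGE (ival, 0, 15): shift counts valid for a 16-bit mask.
constexpr bool fits_ww (long ival) { return ival >= 0 && ival <= 15; }
```

Whether a given shift is actually emitted as a mask-register instruction depends on register allocation and the enabled ISA, so this only shows the arithmetic and the constant ranges.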

gcc/ChangeLog:

* config/i386/constraints.md (Wb): New constraint.
(Ww): Ditto.
* config/i386/i386.md (*ashlhi3_1): Extend to avx512 mask
shift.
(*ashlqi3_1): Ditto.
(*3_1): Ditto.
(*3_1): Ditto.
* config/i386/sse.md (k): New define_split after
it to convert generic shift pattern to mask shift ones.

gcc/testsuite/ChangeLog:

* gcc.target/i386/mask-shift.c: New test.
---
 gcc/config/i386/constraints.md | 10 +++
 gcc/config/i386/i386.md| 94 +++---
 gcc/config/i386/sse.md | 14 
 gcc/testsuite/gcc.target/i386/mask-shift.c | 83 +++
 4 files changed, 173 insertions(+), 28 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/mask-shift.c

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 485e3f5b2cf..4aa28a5621c 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -222,6 +222,16 @@ (define_constraint "BC"
(match_operand 0 "vector_all_ones_operand"
 
 ;; Integer constant constraints.
+(define_constraint "Wb"
+  "Integer constant in the range 0 @dots{} 7, for 8-bit shifts."
+  (and (match_code "const_int")
+   (match_test "IN_RANGE (ival, 0, 7)")))
+
+(define_constraint "Ww"
+  "Integer constant in the range 0 @dots{} 15, for 16-bit shifts."
+  (and (match_code "const_int")
+   (match_test "IN_RANGE (ival, 0, 15)")))
+
 (define_constraint "I"
   "Integer constant in the range 0 @dots{} 31, for 32-bit shifts."
   (and (match_code "const_int")
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 8b809c49fe0..c5f9bd4d4d8 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1136,6 +1136,7 @@ (define_mode_attr di [(SI "nF") (DI "Wd")])
 
 ;; Immediate operand constraint for shifts.
 (define_mode_attr S [(QI "I") (HI "I") (SI "I") (DI "J") (TI "O")])
+(define_mode_attr KS [(QI "Wb") (HI "Ww") (SI "I") (DI "J")])
 
 ;; Print register name in the specified mode.
 (define_mode_attr k [(QI "b") (HI "w") (SI "k") (DI "q")])
@@ -11088,9 +11089,9 @@ (define_insn "*bmi2_ashl3_1"
(set_attr "mode" "")])
 
 (define_insn "*ashl3_1"
-  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r")
-   (ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0,l,rm")
- (match_operand:QI 2 "nonmemory_operand" "c,M,r")))
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,?k")
+   (ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0,l,rm,k")
+ (match_operand:QI 2 "nonmemory_operand" "c,M,r,")))
(clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (ASHIFT, mode, operands)"
 {
@@ -11098,6 +11099,7 @@ (define_insn "*ashl3_1"
 {
 case TYPE_LEA:
 case TYPE_ISHIFTX:
+case TYPE_MSKLOG:
   return "#";
 
 case TYPE_ALU:
@@ -3,7 +5,11 @@ (define_insn "*ashl3_1"
return "sal{}\t{%2, %0|%0, %2}";
 }
 }
-  [(set_attr "isa" "*,*,bmi2")
+  [(set_attr "isa" "*,*,bmi2,avx512bw")
(set (attr "type")
  (cond [(eq_attr "alternative" "1")
  (const_string "lea")
@@ -11123,6 +11129,8 @@ (define_insn "*ashl3_1"
  (match_operand 0 "register_operand"))
 (match_operand 2 "const1_operand"))
  (const_string "alu")
+   (eq_attr "alternative" "3")
+ (const_string "msklog")
   ]
   (const_string "ishift")))
(set (attr "length_immediate")
@@ -11218,15 +11226,16 @@ (define_split
   "operands[2] = gen_lowpart (SImode, operands[2]);")
 
 (define_insn "*ashlhi3_1"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,Yp")
-   (ashift:HI (match_operand:HI 1 "nonimmediate_operand" "0,l")
-  (match_operand:QI 2 "nonmemory_operand" "cI,M")))
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,Yp,?k")
+   (ashift:HI (match_operand:HI 1 "nonimmediate_operand" "0,l,k")
+  (match_operand:QI 2 "nonmemory_operand" "cI,M,Ww")))
(clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (ASHIFT, HImode, operands)"
 {
   switch (get_attr_type (insn))
 {
 case TYPE_LEA:
+case TYPE_MSKLOG:
   return "#";
 
 case TYPE_ALU:
@@ -11241,9 +11246,12 @@ (define_insn "*ashlhi3_1"
return "sal{w}\t{%2, %0|%0, %2}";
 }
 }
-  [(set (attr "

RE: [PATCH 2/4]AArch64: correct usdot vectorizer and intrinsics optabs

2021-07-20 Thread Tamar Christina via Gcc-patches


> -Original Message-
> From: Richard Sandiford 
> Sent: Thursday, July 15, 2021 8:35 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov 
> Subject: Re: [PATCH 2/4]AArch64: correct usdot vectorizer and intrinsics
> optabs
> 
> Tamar Christina  writes:
> > Hi All,
> >
> > There's a slight mismatch between the vectorizer optabs and the
> > intrinsics patterns for NEON.  The vectorizer expects operands[3] and
> > operands[0] to be the same but the aarch64 intrinsics expanders expect
> > operands[0] and operands[1] to be the same.
> >
> > This means we need different patterns here.  This adds a separate
> > usdot vectorizer pattern which just shuffles around the RTL params.
> >
> > There's also an inconsistency between the usdot and (u|s)dot
> > intrinsics RTL patterns which is not corrected here.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> 
> Couldn't we just change:
> 
> > diff --git a/gcc/config/aarch64/arm_neon.h
> > b/gcc/config/aarch64/arm_neon.h index
> >
> 00d76ea937ace5763746478cbdfadf6479e0b15a..17e059efb80fa86a8a32127ac
> e4f
> > c7f43e2040a8 100644
> > --- a/gcc/config/aarch64/arm_neon.h
> > +++ b/gcc/config/aarch64/arm_neon.h
> > @@ -34039,14 +34039,14 @@ __extension__ extern __inline int32x2_t
> > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> >  vusdot_s32 (int32x2_t __r, uint8x8_t __a, int8x8_t __b)  {
> > -  return __builtin_aarch64_usdot_prodv8qi_ssus (__r, __a, __b);
> > +  return __builtin_aarch64_usdotv8qi_ssus (__r, __a, __b);
> 
> …this to __builtin_aarch64_usdot_prodv8qi_ssus (__a, __b, __r) etc.?

Not easily.  As I mentioned before, Neon intrinsics assume that
operands[0] and operands[1] are the same, and this goes much further than just
the header call.

The actual type is determined by the optabs and the C stubs that are generated.

aarch64_init_simd_builtins, which creates the C function stubs, starts processing
arguments from the end and, for non-void functions, assumes that the value at
operands[0] is the return type.  So simply moving __r would make it think that
the result type should be uint8x8_t.

I can bypass this, but then I have to write a custom expander in the expand code
to handle it, and at that point, is it really worth it?

Tamar
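
To make the operand-order mismatch concrete, here is a scalar sketch of the unsigned-by-signed dot product in both argument orders.  It is a reference model only, not the GCC patterns; the type aliases and function names are hypothetical.

```cpp
#include <array>
#include <cstdint>

using acc2 = std::array<std::int32_t, 2>;
using u8x8 = std::array<std::uint8_t, 8>;
using s8x8 = std::array<std::int8_t, 8>;

// Intrinsic-style order: accumulator first, as in vusdot_s32 (__r, __a, __b).
acc2 usdot_intrinsic_order (acc2 r, const u8x8 &a, const s8x8 &b)
{
  for (int lane = 0; lane < 2; lane++)
    for (int j = 0; j < 4; j++)
      // Each 32-bit lane accumulates four unsigned * signed byte products.
      r[lane] += std::int32_t (a[4 * lane + j]) * std::int32_t (b[4 * lane + j]);
  return r;
}

// Vectorizer-style order: multiplicands first, accumulator last.
acc2 usdot_optab_order (const u8x8 &a, const s8x8 &b, acc2 r)
{
  return usdot_intrinsic_order (r, a, b);
}
```

Both functions compute the same values; only the position of the accumulator differs, which is why the result type is misidentified when the arguments are simply reordered.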

> I think that's an OK thing to do when the function is named after
> an optab rather than an arm_neon.h intrinsic.
> 
> Thanks,
> Richard
> 
> >  }
> >
> >  __extension__ extern __inline int32x4_t
> >  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> >  vusdotq_s32 (int32x4_t __r, uint8x16_t __a, int8x16_t __b)
> >  {
> > -  return __builtin_aarch64_usdot_prodv16qi_ssus (__r, __a, __b);
> > +  return __builtin_aarch64_usdotv16qi_ssus (__r, __a, __b);
> >  }
> >
> >  __extension__ extern __inline int32x2_t


Patch ping (was Re: [PATCH] rs6000: Fix up easy_vector_constant_msb handling [PR101384])

2021-07-20 Thread Jakub Jelinek via Gcc-patches
On Tue, Jul 13, 2021 at 09:30:43PM +0200, Jakub Jelinek via Gcc-patches wrote:
> The following gcc.dg/pr101384.c testcase is miscompiled on
> powerpc64le-linux.
> easy_altivec_constant has code to try construct vector constants with
> different element sizes, perhaps different from CONST_VECTOR's mode.  But as
> written, that works fine for vspltis[bhw] cases, but not for the vspltisw
> x,-1; vsl[bhw] x,x,x case, because that creates always a V16QImode, V8HImode
> or V4SImode constant containing broadcasted constant with just the MSB set. 
> The vspltis_constant function etc. expects the vspltis[bhw] instructions
> where the small [-16..15] or even [-32..30] constant is sign-extended to the
> remaining step bytes, but that is not the case for the 0x80...00 constants,
> with step > 1 we can't handle e.g.
> { 0x80, 0xff, 0xff, 0xff, 0x80, 0xff, 0xff, 0xff, 0x80, 0xff, 0xff, 0xff, 
> 0x80, 0xff, 0xff, 0xff }
> vectors but do want to handle e.g.
> { 0, 0, 0, 0x80, 0, 0, 0, 0x80, 0, 0, 0, 0x80, 0, 0, 0, 0x80 }
> and similarly with copies > 1 we do want to handle e.g.
> { 0x80808080, 0x80808080, 0x80808080, 0x80808080 }.
> 
> Bootstrapped/regtested on powerpc64le-linux and powerpc64-linux (the latter
> regtested with -m32/-m64), ok for trunk?
> 
> Perhaps for backports it would be best to limit the EASY_VECTOR_MSB case
> matching to step == 1 && copies == 1, because that is the only case the
> splitter handled correctly, but as can be seen in the gcc.target tests, the
> patch tries to handle it for all the cases.  Do you want that other patch
> or prefer this patch for the backports too?
> 
> 2021-07-13  Jakub Jelinek  
> 
>   PR target/101384
>   * config/rs6000/rs6000-protos.h (easy_altivec_constant): Change return
>   type from bool to int.
>   * config/rs6000/rs6000.c (vspltis_constant): Fix up handling the
>   EASY_VECTOR_MSB case if either step or copies is not 1.
>   (vspltis_shifted): Fix comment typo.
>   (easy_altivec_constant): Change return type from bool to int, instead
>   of returning true return byte size of the element mode that should be
>   used to synthetize the constant.
>   * config/rs6000/predicates.md (easy_vector_constant_msb): Require
>   that vspltis_shifted is 0, handle the case where easy_altivec_constant
>   assumes using different vector mode from CONST_VECTOR's mode.
>   * config/rs6000/altivec.md (easy_vector_constant_msb splitter): Use
>   easy_altivec_constant to determine mode in which -1 >> -1 should be
>   performed, use rs6000_expand_vector_init instead of gen_vec_initv4sisi.
> 
>   * gcc.dg/pr101384.c: New test.
>   * gcc.target/powerpc/pr101384-1.c: New test.
>   * gcc.target/powerpc/pr101384-2.c: New test.

I'd like to ping this patch.

For gcc 11, I've bootstrapped/regtested on powerpc64le-linux and
powerpc64-linux (the latter regtested -m32/-m64) also a simpler version
below, which restricts it to the case that the code handles properly.

2021-07-20  Jakub Jelinek  

PR target/101384
* config/rs6000/rs6000.c (vspltis_constant): Accept EASY_VECTOR_MSB
only if step and copies are equal to 1.

* gcc.dg/pr101384.c: New test.

--- gcc/config/rs6000/rs6000.c.jj   2021-07-18 12:50:43.816219546 +0200
+++ gcc/config/rs6000/rs6000.c  2021-07-20 10:46:23.880632997 +0200
@@ -6144,7 +6144,7 @@ vspltis_constant (rtx op, unsigned step,
 
   /* Also check if are loading up the most significant bit which can be done by
  loading up -1 and shifting the value left by -1.  */
-  else if (EASY_VECTOR_MSB (splat_val, inner))
+  else if (EASY_VECTOR_MSB (splat_val, inner) && step == 1 && copies == 1)
 ;
 
   else
--- gcc/testsuite/gcc.dg/pr101384.c.jj  2021-07-20 10:45:22.828486154 +0200
+++ gcc/testsuite/gcc.dg/pr101384.c 2021-07-20 10:45:22.828486154 +0200
@@ -0,0 +1,39 @@
+/* PR target/101384 */
+/* { dg-do run } */
+/* { dg-options "-O2 -Wno-psabi -w" } */
+
+typedef unsigned char __attribute__((__vector_size__ (16))) U;
+typedef unsigned short __attribute__((__vector_size__ (8 * sizeof (short)))) V;
+
+U u;
+V v;
+
+__attribute__((noipa)) U
+foo (void)
+{
+  U y = (U) { 0x80, 0xff, 0xff, 0xff, 0x80, 0xff, 0xff, 0xff,
+  0x80, 0xff, 0xff, 0xff, 0x80, 0xff, 0xff, 0xff } + u;
+  return y;
+}
+
+__attribute__((noipa)) V
+bar (void)
+{
+  V y = (V) { 0x8000, 0xffff, 0x8000, 0xffff,
+  0x8000, 0xffff, 0x8000, 0xffff } + v;
+  return y;
+}
+
+int
+main ()
+{
+  U x = foo ();
+  for (unsigned i = 0; i < 16; i++)
+if (x[i] != ((i & 3) ? 0xff : 0x80))
+  __builtin_abort ();
+  V y = bar ();
+  for (unsigned i = 0; i < 8; i++)
+if (y[i] != ((i & 1) ? 0xffff : 0x8000))
+  __builtin_abort ();
+  return 0;
+}


Jakub
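
The three example vectors quoted above can be told apart with a small check.  The sketch below is not the rs6000 logic: it assumes little-endian byte order within each element, and msb_splat_element_size is a hypothetical helper.

```cpp
#include <array>
#include <cstdint>
#include <initializer_list>

// Returns the element size in bytes (1, 2 or 4) if BYTES is a splat of
// elements that have only their most significant bit set, otherwise 0.
int msb_splat_element_size (const std::array<std::uint8_t, 16> &bytes)
{
  for (int size : {1, 2, 4})
    {
      bool ok = true;
      for (int i = 0; i < 16 && ok; i++)
        {
          // In little-endian order the MSB sits in the last byte of an element.
          std::uint8_t expected = (i % size == size - 1) ? 0x80 : 0x00;
          ok = bytes[i] == expected;
        }
      if (ok)
        return size;
    }
  return 0;
}
```

Under these assumptions { 0x80, 0xff, 0xff, 0xff, ... } is rejected, { 0, 0, 0, 0x80, ... } is a splat of 4-byte elements, and sixteen 0x80 bytes are a splat of 1-byte elements.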



Re: [PATCH] Add QI vector mode support to by-pieces for memset

2021-07-20 Thread H.J. Lu via Gcc-patches
On Mon, Jul 19, 2021 at 11:38 PM Richard Sandiford
 wrote:
>
> "H.J. Lu via Gcc-patches"  writes:
> >> > + {
> >> > +   /* First generate subreg of word mode if the previous mode is
> >> > +  wider than word mode and word mode is wider than MODE.  */
> >> > +   prev_rtx = simplify_gen_subreg (word_mode, prev_rtx,
> >> > +   prev_mode, 0);
> >> > +   prev_mode = word_mode;
> >> > + }
> >> > +  if (prev_rtx != nullptr)
> >> > + target = simplify_gen_subreg (mode, prev_rtx, prev_mode, 0);
> >>
> >> This should be lowpart_subreg, since 0 isn't the right offset for
> >> big-endian targets.  Using lowpart_subreg should also avoid the need
> >> for the word_size “if” above: lowpart_subreg can handle lowpart subword
> >> subregs of multiword values.
> >
> > I tried it.  It didn't work since it caused the LRA failure.   I replaced
> > simplify_gen_subreg with lowpart_subreg instead.
>
> What specifically went wrong?

With vector broadcast, for
---
extern void *ops;

void
foo (int c)
{
  __builtin_memset (ops, c, 18);
}
---
we generate HI from V16QI.   With a single lowpart_subreg, I get

(insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84)
(const_int 16 [0x10])) [0 MEM  [(void
*)ops.0_1]+16 S2 A8])
(subreg:HI (reg:V16QI 51 xmm15) 0)) "s2a.i":6:3 76 {*movhi_internal}
 (nil))

instead of

(insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84)
(const_int 16 [0x10])) [0 MEM  [(void
*)ops.0_1]+16 S2 A8])
(subreg:HI (reg:DI 51 xmm15) 0)) "s2a.i":6:3 76 {*movhi_internal}
 (nil))

IRA and LRA fail to reload:

(insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84)
(const_int 16 [0x10])) [0 MEM  [(void
*)ops.0_1]+16 S2 A8])
(subreg:HI (reg:V16QI 51 xmm15) 0)) "s2a.i":6:3 76 {*movhi_internal}
 (nil))

since ix86_can_change_mode_class has

  if (MAYBE_SSE_CLASS_P (regclass) || MAYBE_MMX_CLASS_P (regclass))
{
  /* Vector registers do not support QI or HImode loads.  If we don't
 disallow a change to these modes, reload will assume it's ok to
 drop the subreg from (subreg:SI (reg:HI 100) 0).  This affects
 the vec_dupv4hi pattern.  */
  if (GET_MODE_SIZE (from) < 4)
return false;
}

If we don't use a hard scratch register, (subreg:HI (reg:V16QI)) compiles.  But
codegen is worse:

vmovd %edi, %xmm0
vpbroadcastb %xmm0, %xmm0
vmovdqa %xmm0, -24(%rsp)
movq ops(%rip), %rax
movzwl -24(%rsp), %edx
vmovdqu %xmm0, (%rax)
movw %dx, 16(%rax)

vs

vmovd %edi, %xmm15
movq ops(%rip), %rax
vpbroadcastb %xmm15, %xmm15
vmovq %xmm15, %rdx
movw %dx, 16(%rax)
vmovdqu %xmm15, (%rax)

> >> > +/* Callback routine for store_by_pieces.  Read GET_MODE_BITSIZE (MODE)
> >> > +   bytes from constant string DATA + OFFSET and return it as target
> >> > +   constant.  If PREV isn't nullptr, it has the RTL info from the
> >> > +   previous iteration.  */
> >> > +
> >> > +rtx
> >> > +builtin_memset_read_str (void *data, void *prev,
> >> > +  HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
> >> > +  machine_mode mode)
> >> > +{
> >> > +  rtx target;
> >> >const char *c = (const char *) data;
> >> > -  char *p = XALLOCAVEC (char, GET_MODE_SIZE (mode));
> >> > +  char *p;
> >> > +  unsigned int size = GET_MODE_SIZE (mode).to_constant ();
> >> > +
> >> > +  /* Don't use the previous value if size is 1.  */
> >>
> >> Why not though?  The existing code does, and that seems like the right
> >> thing to do when operating on integers.
> >>
> >> I can see the check would be a good thing to do if MODE isn't a vector
> >> mode and the previous mode was.  Is that the case you're worried about?
> >> If so, that check could go in gen_memset_value_from_prev instead.
> >
> > We are storing one byte.  Doing it directly is faster.
>
> But the first thing being protected here is…
>
> >> > +  if (size != 1)
> >> > +{
> >> > +  target = gen_memset_value_from_prev (prev, mode);
> >> > +  if (target != nullptr)
> >> > + return target;
>
> …this attempt to use the previous value.  If the target uses, say,
> SImode for the first piece and QImode for a final byte, using the QImode
> lowpart of the SImode register would avoid having to move the byte value
> into a separate QImode register.  Why's that a bad thing to do?  AFAICT
> it's what the current code would do, so if we want to change it even for
> integer modes, I think it should be a separate patch with a separate
> justification.

I removed the size == 1 check.   I didn't notice any issues.

> Like I say, I can understand that using the QImode lowpart of a vector
> wouldn't be a good idea.  But if that's specifically what you're trying
> to prevent, I think we should test for it.
>
> Thanks,
> Richard

I will submit the v3 patch.

Thanks.

-- 
H.J.
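
For reference, the 18-byte case discussed above amounts to one 16-byte piece plus a 2-byte tail taken from the low part of the same broadcast value.  A plain C++ sketch of that shape, with a byte array standing in for the vector register and memset18_by_pieces as a hypothetical name:

```cpp
#include <cstddef>
#include <cstring>

// Models memset (dst, c, 18) as a 16-byte store followed by a 2-byte store
// that reuses the low part of the already broadcast value.
void memset18_by_pieces (unsigned char *dst, int c)
{
  unsigned char wide[16];
  for (std::size_t i = 0; i < sizeof wide; i++)
    wide[i] = static_cast<unsigned char> (c);  // stands in for the broadcast

  std::memcpy (dst, wide, 16);      // V16QI-sized piece
  std::memcpy (dst + 16, wide, 2);  // HI-sized tail from the low part
}
```

This only shows the data flow; how the 2-byte low part is extracted from the vector register is exactly the subreg question in this thread.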


Re: [PATCH] Add QI vector mode support to by-pieces for memset

2021-07-20 Thread H.J. Lu via Gcc-patches
On Tue, Jul 20, 2021 at 5:48 AM H.J. Lu  wrote:
>
> On Mon, Jul 19, 2021 at 11:38 PM Richard Sandiford
>  wrote:
> >
> > "H.J. Lu via Gcc-patches"  writes:
> > >> > + {
> > >> > +   /* First generate subreg of word mode if the previous mode is
> > >> > +  wider than word mode and word mode is wider than MODE.  */
> > >> > +   prev_rtx = simplify_gen_subreg (word_mode, prev_rtx,
> > >> > +   prev_mode, 0);
> > >> > +   prev_mode = word_mode;
> > >> > + }
> > >> > +  if (prev_rtx != nullptr)
> > >> > + target = simplify_gen_subreg (mode, prev_rtx, prev_mode, 0);
> > >>
> > >> This should be lowpart_subreg, since 0 isn't the right offset for
> > >> big-endian targets.  Using lowpart_subreg should also avoid the need
> > >> for the word_size “if” above: lowpart_subreg can handle lowpart subword
> > >> subregs of multiword values.
> > >
> > > I tried it.  It didn't work since it caused the LRA failure.   I replaced
> > > simplify_gen_subreg with lowpart_subreg instead.
> >
> > What specifically went wrong?
>
> With vector broadcast, for
> ---
> extern void *ops;
>
> void
> foo (int c)
> {
>   __builtin_memset (ops, c, 18);
> }
> ---
> we generate HI from V16QI.   With a single lowpart_subreg, I get
>
> (insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84)
> (const_int 16 [0x10])) [0 MEM  [(void
> *)ops.0_1]+16 S2 A8])
> (subreg:HI (reg:V16QI 51 xmm15) 0)) "s2a.i":6:3 76 {*movhi_internal}
>  (nil))
>
> instead of
>
> (insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84)
> (const_int 16 [0x10])) [0 MEM  [(void
> *)ops.0_1]+16 S2 A8])
> (subreg:HI (reg:DI 51 xmm15) 0)) "s2a.i":6:3 76 {*movhi_internal}
>  (nil))
>
> IRA and LRA fail to reload:
>
> (insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84)
> (const_int 16 [0x10])) [0 MEM  [(void
> *)ops.0_1]+16 S2 A8])
> (subreg:HI (reg:V16QI 51 xmm15) 0)) "s2a.i":6:3 76 {*movhi_internal}
>  (nil))
>
> since ix86_can_change_mode_class has
>
>   if (MAYBE_SSE_CLASS_P (regclass) || MAYBE_MMX_CLASS_P (regclass))
> {
>   /* Vector registers do not support QI or HImode loads.  If we don't
>  disallow a change to these modes, reload will assume it's ok to
>  drop the subreg from (subreg:SI (reg:HI 100) 0).  This affects
>  the vec_dupv4hi pattern.  */
>   if (GET_MODE_SIZE (from) < 4)
> return false;
> }

Correction.  It is ix86_hard_regno_mode_ok which doesn't allow HImode
in XMM registers.

> If we don't use a hard scratch register, (subreg:HI (reg:V16QI)) compiles.  But
> codegen is worse:
>
> vmovd %edi, %xmm0
> vpbroadcastb %xmm0, %xmm0
> vmovdqa %xmm0, -24(%rsp)
> movq ops(%rip), %rax
> movzwl -24(%rsp), %edx
> vmovdqu %xmm0, (%rax)
> movw %dx, 16(%rax)
>
> vs
>
> vmovd %edi, %xmm15
> movq ops(%rip), %rax
> vpbroadcastb %xmm15, %xmm15
> vmovq %xmm15, %rdx
> movw %dx, 16(%rax)
> vmovdqu %xmm15, (%rax)
>
> > >> > +/* Callback routine for store_by_pieces.  Read GET_MODE_BITSIZE (MODE)
> > >> > +   bytes from constant string DATA + OFFSET and return it as target
> > >> > +   constant.  If PREV isn't nullptr, it has the RTL info from the
> > >> > +   previous iteration.  */
> > >> > +
> > >> > +rtx
> > >> > +builtin_memset_read_str (void *data, void *prev,
> > >> > +  HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
> > >> > +  machine_mode mode)
> > >> > +{
> > >> > +  rtx target;
> > >> >const char *c = (const char *) data;
> > >> > -  char *p = XALLOCAVEC (char, GET_MODE_SIZE (mode));
> > >> > +  char *p;
> > >> > +  unsigned int size = GET_MODE_SIZE (mode).to_constant ();
> > >> > +
> > >> > +  /* Don't use the previous value if size is 1.  */
> > >>
> > >> Why not though?  The existing code does, and that seems like the right
> > >> thing to do when operating on integers.
> > >>
> > >> I can see the check would be a good thing to do if MODE isn't a vector
> > >> mode and the previous mode was.  Is that the case you're worried about?
> > >> If so, that check could go in gen_memset_value_from_prev instead.
> > >
> > > We are storing one byte.  Doing it directly is faster.
> >
> > But the first thing being protected here is…
> >
> > >> > +  if (size != 1)
> > >> > +{
> > >> > +  target = gen_memset_value_from_prev (prev, mode);
> > >> > +  if (target != nullptr)
> > >> > + return target;
> >
> > …this attempt to use the previous value.  If the target uses, say,
> > SImode for the first piece and QImode for a final byte, using the QImode
> > lowpart of the SImode register would avoid having to move the byte value
> > into a separate QImode register.  Why's that a bad thing to do?  AFAICT
> > it's what the current code would do, so if we want to change it even for
> > integer modes, I think it should be a separate patch with a separate
> > justification.
>
> I removed the size == 1 check.   I didn't notice any issues.
>
> > Like I s

[PATCH] simplify-rtx: Push sign/zero-extension inside vec_duplicate

2021-07-20 Thread Jonathan Wright via Gcc-patches
Hi,

As a general principle, vec_duplicate should be as close to the root
of an expression as possible. Where unary operations have
vec_duplicate as an argument, these operations should be pushed
inside the vec_duplicate.

This patch modifies unary operation simplification to push
sign/zero-extension of a scalar inside vec_duplicate.
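
A scalar model of why the two forms are interchangeable, using four 16-bit lanes widened to 32 bits purely as an illustrative choice of modes:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using v4hi = std::array<std::int16_t, 4>;
using v4si = std::array<std::int32_t, 4>;

// Old form: (sign_extend:V4SI (vec_duplicate:V4HI x)).
v4si extend_of_duplicate (std::int16_t x)
{
  v4hi dup{x, x, x, x};
  v4si out{};
  for (std::size_t i = 0; i < out.size (); i++)
    out[i] = dup[i];  // sign-extend every lane separately
  return out;
}

// New canonical form: (vec_duplicate:V4SI (sign_extend:SI x)).
v4si duplicate_of_extend (std::int16_t x)
{
  std::int32_t wide = x;  // sign-extend the scalar once
  return v4si{wide, wide, wide, wide};
}
```

Both functions return the same vector for every input, and the second form extends a single scalar instead of every lane.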

This patch also updates all RTL patterns in aarch64-simd.md to use
the new canonical form.

Regression tested and bootstrapped on aarch64-none-linux-gnu and
x86_64-none-linux-gnu - no issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-07-19  Jonathan Wright  

* config/aarch64/aarch64-simd.md: Push sign/zero-extension
inside vec_duplicate for all patterns.
* simplify-rtx.c (simplify_context::simplify_unary_operation_1):
Push sign/zero-extension inside vec_duplicate.

rb14677.patch
Description: rb14677.patch


Re: [Patch] C, C++, Fortran, OpenMP: Add support for device-modifiers for 'omp target device'

2021-07-20 Thread Jakub Jelinek via Gcc-patches
On Wed, Jul 07, 2021 at 07:59:58PM +0200, Marcel Vollweiler wrote:
> OpenMP: Add support for device-modifiers for 'omp target device'
> 
> gcc/c/ChangeLog:
> 
>   * c-parser.c (c_parser_omp_clause_device): Add support for 
>   device-modifiers for 'omp target device'.
> 
> gcc/cp/ChangeLog:
> 
>   * parser.c (cp_parser_omp_clause_device): Add support for 
>   device-modifiers for 'omp target device'.
> 
> gcc/fortran/ChangeLog:
> 
>   * openmp.c (gfc_match_omp_clauses): Add support for 
>   device-modifiers for 'omp target device'.
> 
> gcc/testsuite/ChangeLog:
> 
>   * c-c++-common/gomp/target-device-1.c: New test.
>   * c-c++-common/gomp/target-device-2.c: New test.
>   * gfortran.dg/gomp/target-device-1.f90: New test.
>   * gfortran.dg/gomp/target-device-2.f90: New test.

>  static tree
>  c_parser_omp_clause_device (c_parser *parser, tree list)
>  {
>location_t clause_loc = c_parser_peek_token (parser)->location;
> +  location_t expr_loc;
> +  c_expr expr;
> +  tree c, t;
> +
>matching_parens parens;
> -  if (parens.require_open (parser))
> +  if (!parens.require_open (parser))
> +return list;
> +
> +  int pos = 1;
> +  int pos_colon = 0;
> +  while (c_parser_peek_nth_token_raw (parser, pos)->type == CPP_NAME
> +  || c_parser_peek_nth_token_raw (parser, pos)->type == CPP_COLON
> +  || c_parser_peek_nth_token_raw (parser, pos)->type == CPP_COMMA)

Why CPP_COMMA?  The OpenMP 5.0/5.1/5.2 grammar only supports a single device
modifier.
So please simplify it to just an
  if (c_parser_next_token_is (parser, CPP_NAME)
      && c_parser_peek_2nd_token (parser)->type == CPP_COLON)
    {
and check there just for the two modifiers.
      const char *p
        = IDENTIFIER_POINTER (c_parser_peek_token (parser)->value);
      if (strcmp ("ancestor", p) == 0)
        ...
      else if (strcmp ("device_num", p) == 0)
        ;
      else
        error_at (..., "expected %<ancestor%> or %<device_num%>");
    }
Similarly for C++.

Also, even if we sorry on device(ancestor: ...), it would be nice if you
in tree.h define OMP_CLAUSE_DEVICE_ANCESTOR macro (with
  (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_DEVICE)->base.public_flag)
definition), set it, sorry later on it (e.g. omp-expand.c) only if it
survived till then (wasn't removed because of other errors) and diagnose
the various restrictions/requirements on device(ancestor:).
In particular:
1) that OMP_CLAUSE_DEVICE clauses with OMP_CLAUSE_DEVICE_ANCESTOR
   only appear on OMP_TARGET and not on other constructs
   (this can be easily tested e.g. during gimplification, when
   gimplify_scan_omp_clauses sees OMP_CLAUSE_DEVICE with
   OMP_CLAUSE_DEVICE_ANCESTOR and code != OMP_TARGET, diagnose)
2) that if after the usual fully folding the argument is INTEGER_CST,
   it is equal to 1 (the spec says must evaluate to 1, but doesn't say
   it has to be a constant, so it can evaluate to 1 at runtime but if it is
   a constant other than 1, we know it will not evaluate to 1); this can be
   done in *finish_omp_clauses
3) that omp_requires_mask has OMP_REQUIRES_REVERSE_OFFLOAD set; this should
   be checked during the parsing
4) only the device, firstprivate, private, defaultmap, and map clauses may
   appear on the construct; can be also done during gimplification, there is
   at most one device clause, so walking all clauses when we see
   OMP_CLAUSE_DEVICE_ANCESTOR is still linear complexity
5) no OpenMP constructs or calls to OpenMP API runtime routines are allowed
   inside the corresponding target region (this is something that should be
   checked in omp-low.c region nesting code, we already have similar
   restrictions for e.g. the loop construct)
Everything should be covered by testcases.
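
Restrictions 1) to 3) above can be summarised as a small decision table.  The sketch below is a toy model in C++17, not GCC code: the device_clause struct, its fields and the message strings are all hypothetical.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical summary of what the compiler knows about one device clause.
struct device_clause
{
  bool ancestor;                  // device(ancestor: ...) was written
  bool on_target_construct;       // the clause sits on 'omp target'
  bool reverse_offload_required;  // 'omp requires reverse_offload' was seen
  std::optional<long> constant;   // folded argument, if it is a constant
};

// Returns one message per violated restriction; empty means acceptable.
std::vector<std::string> diagnose_ancestor (const device_clause &c)
{
  std::vector<std::string> errors;
  if (!c.ancestor)
    return errors;  // the restrictions only apply to the ancestor modifier

  if (!c.on_target_construct)
    errors.push_back ("ancestor is only allowed on the target construct");
  if (c.constant && *c.constant != 1)
    errors.push_back ("a constant ancestor argument must be 1");
  if (!c.reverse_offload_required)
    errors.push_back ("ancestor needs requires reverse_offload");
  return errors;
}
```

A non-constant argument is left alone here because it may still evaluate to 1 at run time, matching point 2).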

Jakub



[og11][committed] amdgcn: Add -mxnack and -msram-ecc [PR 100208]

2021-07-20 Thread Andrew Stubbs

This is now backported to devel/omp/gcc-11.

Andrew

On 19/07/2021 17:49, Andrew Stubbs wrote:
This patch adds two new GCN-specific options: -mxnack and 
-msram-ecc={on,off,any}.


The primary purpose is to ensure that we have an explicit default 
setting for these features and that this is passed to the assembler. 
This will ensure that if LLVM defaults change, again, GCC won't get 
caught out and stop working with attribute mismatches.


The new options will provide a means to adjust these features in future, 
but this patch does not actually add any new support for either XNACK or 
SRAM-ECC.


The XNACK feature has two settings, "on" (-mxnack) and "off" 
(-mno-xnack). The default is "off", and trying to turn it on will give a 
"sorry, unimplemented" message. To implement this will require changes 
to the load/store instruction early-clobber rules (actually, clobbering 
across multiple contiguous load/store instructions is a problem too), 
and a new xnack-enabled multilib for each supported ISA.


The SRAM-ECC feature has three settings, "on", "off" and "any" (in which
the generated code must work with the device configured to either mode).
The current implementation is actually "any" already, but as that 
attribute setting is not available in the HSACOv3 binary standard we 
target right now we just set it to "on" or "off" according to which 
makes sense for the configured ISA. We'll have to revisit this when we 
implement HSACOv4 compatibility.
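
A minimal sketch of the "any" handling described in the last paragraph,
under stated assumptions: the enum, the function name and the
isa_default_on parameter are all hypothetical and do not correspond to
the actual GCN back-end code.

```cpp
#include <cassert>

// Hypothetical names for illustration; not the GCC option machinery.
enum class sram_ecc { off, on, any };

// HSACOv3 metadata can only express "on" or "off", so "any" has to be
// resolved to whichever value makes sense for the configured ISA.
// ISA_DEFAULT_ON is an assumed per-ISA property supplied by the caller.
sram_ecc
sram_ecc_for_hsacov3 (sram_ecc requested, bool isa_default_on)
{
  if (requested != sram_ecc::any)
    return requested;
  return isa_default_on ? sram_ecc::on : sram_ecc::off;
}
```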


Andrew




Re: [RFC] ipa: Adjust references to identify read-only globals

2021-07-20 Thread Martin Jambor
Hi,

On Tue, Jul 20 2021, Richard Biener wrote:
> On Tue, Jul 20, 2021 at 10:54 AM JiangNing OS via Gcc-patches
>  wrote:
>>
>> > -Original Message-
>> > From: Gcc-patches > > bounces+jiangning=os.amperecomputing@gcc.gnu.org> On Behalf Of
>> > Martin Jambor
>> > Sent: Wednesday, June 30, 2021 4:19 AM
>> > To: GCC Patches 
>> > Cc: Jan Hubicka 
>> > Subject: [RFC] ipa: Adjust references to identify read-only globals
>> >
>> > Hi,
>> >
>> > this patch has been motivated by SPEC 2017's 544.nab_r in which there is a
>> > static variable which is never written to and so zero throughout the
>> > run-time of the benchmark.  However, it is passed by reference to a
>> > function in which it is read and (after some multiplications) passed into
>> > __builtin_exp which in turn unnecessarily consumes almost 10% of the total
>> > benchmark run-time.
>>
>> I do see ~8.5% runtime reduction on aarch64.
>>
>> > The situation is illustrated by the added testcase remref-3.c.
>> >
>> > The patch adds a flag to ipa-prop descriptor of each parameter to mark such
>> > parameters.  IPA-CP and inlining then take the effort to remove IPA_REF_ADDR
>> > references in the caller and only add IPA_REF_LOAD reference to the
>> > clone/overall inlined function.  This is sufficient for subsequent symbol
>> > table analysis code to identify the read-only variable as such and optimize
>> > the code.
>> >
>> > I plan to compile a number of packages with the patch to test it some more
>> > and get a bit better idea of its impact.  But it has passed bootstrap,
>> > LTObootstrap and testing on x86_64-linux and i686-linux and so unless I
>> > find any problem, I would like to commit it at some point next month
>> > without any major changes, so I'd be grateful for any feedback even now.
>>
>> I see 3 cases in SPEC2017 failed to compile on aarch64, i.e. 521.wrf_r, 
>> 527.cam4_r, 554.roms_r. For example,
>>
>> pre_step3d.fppized.f90:1260:35: internal compiler error: Segmentation fault
>>  1260 |   CALL wclock_on (ng, iNLM, 22)
>>   |   ^
>> 0x1645c6b internal_error(char const*, ...)
>> ???:0
>> 0xe1f4f4 place_block_symbol(rtx_def*)
>> ???:0
>> 0x84ab33 use_anchored_address(rtx_def*)
>> ???:0
>> 0x868203 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
>> ???:0
>> 0x868793 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
>> ???:0
>> 0x75b593 expand_call(tree_node*, rtx_def*, int)
>> ???:0
>> 0x86a09f expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
>> ???:0
>> Please submit a full bug report
>
> Please file a bugreport and provide a (possibly reduced) testcase.
>

The patch is not yet committed, so I don't think a bug-report (in
bugzilla) is in order.


At least after I fixed a bug pointed out in Honza's review, I cannot
replicate any ICE building any of 521.wrf_r, 527.cam4_r, 554.roms_r on
x86_64, at least without LTO.  But with LTO, I get an undefined symbol
link error building 527.cam4_r, which is certainly a bug in the
patch.  I will investigate and hopefully fix it and re-post the patch,
but then I would appreciate it if you checked it on aarch64 for me.
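
For readers following the thread, the situation the RFC targets can be
sketched roughly as below.  This is only loosely modelled on the
description quoted above; it is not the remref-3.c testcase, and the names
scale, weight and total are invented:

```cpp
#include <cassert>
#include <cmath>

// Never written anywhere in the program, so it stays zero.
static double scale;

// Reads the global through a reference parameter.  The call to std::exp
// is the expensive part that could be folded once the variable is known
// to be read-only.
static double
weight (const double &s, double x)
{
  return std::exp (-s * x * x);
}

double
total (const double *xs, int n)
{
  double sum = 0.0;
  for (int i = 0; i < n; ++i)
    sum += weight (scale, xs[i]);  // address of 'scale' taken only to read it
  return sum;
}
```

Because the address of the variable is taken at the call site, symbol
table analysis cannot treat it as read-only on its own; replacing the
address reference with a load reference once the callee has been cloned
or inlined is what lets that analysis succeed.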

Thanks,

Martin


Re: [PATCH] Support logic shift left/right for avx512 mask type.

2021-07-20 Thread Uros Bizjak via Gcc-patches
On Tue, Jul 20, 2021 at 2:33 PM liuhongt  wrote:
>
> Hi:
>   As mentioned in
> https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575420.html
>
> cut start-
> > note for the lowpart we can just view-convert away the excess bits,
> > fully re-using the mask.  We generate surprisingly "good" code:
> >
> > kmovb   %k1, %edi
> > shrb    $4, %dil
> > kmovb   %edi, %k2
> >
> > besides the lack of using kshiftrb.  I guess we're just lacking
> > a mask register alternative for
> Yes, we can do it similar as kor/kand/kxor.
> ---cut end
>
>   Bootstrap and regtested on x86_64-linux-gnu{-m32,}.
>   Ok for trunk?
>
> gcc/ChangeLog:
>
> * config/i386/constraints.md (Wb): New constraint.
> (Ww): Ditto.
> * config/i386/i386.md (*ashlhi3_1): Extend to avx512 mask
> shift.
> (*ashlqi3_1): Ditto.
> (*<insn><mode>3_1): Ditto.
> (*<insn><mode>3_1): Ditto.
> * config/i386/sse.md (k<code><mode>): New define_split after
> it to convert generic shift pattern to mask shift ones.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/mask-shift.c: New test.
> ---
>  gcc/config/i386/constraints.md | 10 +++
>  gcc/config/i386/i386.md| 94 +++---
>  gcc/config/i386/sse.md | 14 
>  gcc/testsuite/gcc.target/i386/mask-shift.c | 83 +++
>  4 files changed, 173 insertions(+), 28 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/mask-shift.c
>
> diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
> index 485e3f5b2cf..4aa28a5621c 100644
> --- a/gcc/config/i386/constraints.md
> +++ b/gcc/config/i386/constraints.md
> @@ -222,6 +222,16 @@ (define_constraint "BC"
> (match_operand 0 "vector_all_ones_operand"
>
>  ;; Integer constant constraints.
> +(define_constraint "Wb"
> +  "Integer constant in the range 0 @dots{} 7, for 8-bit shifts."
> +  (and (match_code "const_int")
> +   (match_test "IN_RANGE (ival, 0, 7)")))
> +
> +(define_constraint "Ww"
> +  "Integer constant in the range 0 @dots{} 15, for 16-bit shifts."
> +  (and (match_code "const_int")
> +   (match_test "IN_RANGE (ival, 0, 15)")))
> +
>  (define_constraint "I"
>"Integer constant in the range 0 @dots{} 31, for 32-bit shifts."
>(and (match_code "const_int")
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 8b809c49fe0..c5f9bd4d4d8 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -1136,6 +1136,7 @@ (define_mode_attr di [(SI "nF") (DI "Wd")])
>
>  ;; Immediate operand constraint for shifts.
>  (define_mode_attr S [(QI "I") (HI "I") (SI "I") (DI "J") (TI "O")])
> +(define_mode_attr KS [(QI "Wb") (HI "Ww") (SI "I") (DI "J")])
>
>  ;; Print register name in the specified mode.
>  (define_mode_attr k [(QI "b") (HI "w") (SI "k") (DI "q")])
> @@ -11088,9 +11089,9 @@ (define_insn "*bmi2_ashl<mode>3_1"
> (set_attr "mode" "<MODE>")])
>
>  (define_insn "*ashl<mode>3_1"
> -  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r")
> -   (ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0,l,rm")
> - (match_operand:QI 2 "nonmemory_operand" "c,M,r")))
> +  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,?k")
> +   (ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0,l,rm,k")
> + (match_operand:QI 2 "nonmemory_operand" "c,M,r,<KS>")))
> (clobber (reg:CC FLAGS_REG))]
>"ix86_binary_operator_ok (ASHIFT, <MODE>mode, operands)"
>  {
> @@ -11098,6 +11099,7 @@ (define_insn "*ashl<mode>3_1"
>  {
>  case TYPE_LEA:
>  case TYPE_ISHIFTX:
> +case TYPE_MSKLOG:
>return "#";
>
>  case TYPE_ALU:
> @@ -3,7 +5,11 @@ (define_insn "*ashl<mode>3_1"
> return "sal{<imodesuffix>}\t{%2, %0|%0, %2}";
>  }
>  }
> -  [(set_attr "isa" "*,*,bmi2")
> +  [(set_attr "isa" "*,*,bmi2,avx512bw")
> (set (attr "type")
>   (cond [(eq_attr "alternative" "1")
>   (const_string "lea")
> @@ -11123,6 +11129,8 @@ (define_insn "*ashl<mode>3_1"
>   (match_operand 0 "register_operand"))
>  (match_operand 2 "const1_operand"))
>   (const_string "alu")
> +   (eq_attr "alternative" "3")
> + (const_string "msklog")
>]
>(const_string "ishift")))
> (set (attr "length_immediate")
> @@ -11218,15 +11226,16 @@ (define_split
>"operands[2] = gen_lowpart (SImode, operands[2]);")
>
>  (define_insn "*ashlhi3_1"
> -  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,Yp")
> -   (ashift:HI (match_operand:HI 1 "nonimmediate_operand" "0,l")
> -  (match_operand:QI 2 "nonmemory_operand" "cI,M")))
> +  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,Yp,?k")
> +   (ashift:HI (match_operand:HI 1 "nonimmediate_operand" "0,l,k")
> +  (match_operand:QI 2 "nonmemory_operand" "cI,M,Ww")))
> (clobber (reg:CC FLAGS_REG))]
>"ix86_

Re: [PATCH 2/2][RFC] Add loop masking support for x86

2021-07-20 Thread Richard Biener
On Thu, 15 Jul 2021, Richard Sandiford wrote:

> Richard Biener  writes:
> > The following extends the existing loop masking support using
> > SVE WHILE_ULT to x86 by providing an alternate way to produce the
> > mask using VEC_COND_EXPRs.  So with --param vect-partial-vector-usage
> > you can now enable masked vectorized epilogues (=1) or fully
> > masked vector loops (=2).
> 
> As mentioned on IRC, WHILE_ULT is supposed to ensure that every
> element after the first zero is also zero.  That happens naturally
> for power-of-2 vectors if the start index is a multiple of the VF.
> (And at the moment, variable-length vectors are the only way of
> supporting non-power-of-2 vectors.)
> 
> This probably works fine for =2 and =1 as things stand, since the
> vector IVs always start at zero.  But if in future we have a single
> IV counting scalar iterations, and use it even for peeled prologue
> iterations, we could end up with a situation where the approximation
> is no longer safe.
> 
> E.g. suppose we had a uint32_t scalar IV with a limit of (uint32_t)-3.
> If we peeled 2 iterations for alignment and then had a VF of 8,
> the final vector would have a start index of (uint32_t)-6 and the
> vector would be { -1, -1, -1, 0, 0, 0, -1, -1 }.
> 
> So I think it would be safer to handle this as an alternative to
> using while, rather than as a direct emulation, so that we can take
> the extra restrictions into account.  Alternatively, we could probably
> do { 0, 1, 2, ... } < { end - start, end - start, ... }.

That doesn't end up working since in the last iteration with a
non-zero mask we'll compare with all underflowed values (start
will be > end).  So while we compute a correct mask we cannot use
that for loop control anymore.
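
The two mask computations discussed above can be checked with a small
scalar model.  This is only an illustration: the function names are made
up, VF is fixed at 8, and each lane holds -1 (active) or 0 (inactive) as in
the quoted example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int VF = 8;
using mask_t = std::array<int, VF>;  // -1 = lane active, 0 = inactive

// Direct emulation: lane i is active when START + i < LIMIT, where
// START + i is allowed to wrap around in 32 bits.
mask_t
mask_by_index_compare (uint32_t start, uint32_t limit)
{
  mask_t m{};
  for (int i = 0; i < VF; ++i)
    m[i] = (uint32_t) (start + i) < limit ? -1 : 0;
  return m;
}

// Alternative: { 0, 1, 2, ... } < { end - start, end - start, ... }.
mask_t
mask_by_remaining_count (uint32_t start, uint32_t end)
{
  mask_t m{};
  uint32_t remaining = end - start;  // wraps to a huge value if start > end
  for (int i = 0; i < VF; ++i)
    m[i] = (uint32_t) i < remaining ? -1 : 0;
  return m;
}
```

With start = (uint32_t)-6 and limit = (uint32_t)-3 the first function
yields { -1, -1, -1, 0, 0, 0, -1, -1 }, the unsafe mask from the example,
while the second yields { -1, -1, -1, 0, 0, 0, 0, 0 }.  Once start has
moved past end (say start = 16, end = 11), the second function returns an
all-ones mask rather than all-zeros, which is why it cannot double as the
loop exit test.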

Richard.

> Thanks,
> Richard
> 
> 
> 
> >
> > What's missing is using a scalar IV for the loop control
> > (but in principle AVX512 can use the mask here - just the patch
> > doesn't seem to work for AVX512 yet for some reason - likely
> > expand_vec_cond_expr_p doesn't work there).  What's also missing
> > is providing more support for predicated operations in the case
> > of reductions either via VEC_COND_EXPRs or via implementing
> > some of the .COND_{ADD,SUB,MUL...} internal functions as mapping
> > to masked AVX512 operations.
> >
> > For AVX2 and
> >
> > int foo (unsigned *a, unsigned * __restrict b, int n)
> > {
> >   unsigned sum = 1;
> >   for (int i = 0; i < n; ++i)
> > b[i] += a[i];
> >   return sum;
> > }
> >
> > we get
> >
> > .L3:
> >     vpmaskmovd    (%rsi,%rax), %ymm0, %ymm3
> >     vpmaskmovd    (%rdi,%rax), %ymm0, %ymm1
> >     addl          $8, %edx
> >     vpaddd        %ymm3, %ymm1, %ymm1
> >     vpmaskmovd    %ymm1, %ymm0, (%rsi,%rax)
> >     vmovd         %edx, %xmm1
> >     vpsubd        %ymm15, %ymm2, %ymm0
> >     addq          $32, %rax
> >     vpbroadcastd  %xmm1, %ymm1
> >     vpaddd        %ymm4, %ymm1, %ymm1
> >     vpsubd        %ymm15, %ymm1, %ymm1
> >     vpcmpgtd      %ymm1, %ymm0, %ymm0
> >     vptest        %ymm0, %ymm0
> >     jne           .L3
> >
> > for the fully masked loop body and for the masked epilogue
> > we see
> >
> > .L4:
> >     vmovdqu       (%rsi,%rax), %ymm3
> >     vpaddd        (%rdi,%rax), %ymm3, %ymm0
> >     vmovdqu       %ymm0, (%rsi,%rax)
> >     addq          $32, %rax
> >     cmpq          %rax, %rcx
> >     jne           .L4
> >     movl          %edx, %eax
> >     andl          $-8, %eax
> >     testb         $7, %dl
> >     je            .L11
> > .L3:
> >     subl          %eax, %edx
> >     vmovdqa       .LC0(%rip), %ymm1
> >     salq          $2, %rax
> >     vmovd         %edx, %xmm0
> >     movl          $-2147483648, %edx
> >     addq          %rax, %rsi
> >     vmovd         %edx, %xmm15
> >     vpbroadcastd  %xmm0, %ymm0
> >     vpbroadcastd  %xmm15, %ymm15
> >     vpsubd        %ymm15, %ymm1, %ymm1
> >     vpsubd        %ymm15, %ymm0, %ymm0
> >     vpcmpgtd      %ymm1, %ymm0, %ymm0
> >     vpmaskmovd    (%rsi), %ymm0, %ymm1
> >     vpmaskmovd    (%rdi,%rax), %ymm0, %ymm2
> >     vpaddd        %ymm2, %ymm1, %ymm1
> >     vpmaskmovd    %ymm1, %ymm0, (%rsi)
> > .L11:
> >     vzeroupper
> >
> > compared to
> >
> > .L3:
> >     movl          %edx, %r8d
> >     subl          %eax, %r8d
> >     leal          -1(%r8), %r9d
> >     cmpl          $2, %r9d
> >     jbe           .L6
> >     leaq          (%rcx,%rax,4), %r9
> >     vmovdqu       (%rdi,%rax,4), %xmm2
> >     movl          %r8d, %eax
> >     andl          $-4, %eax
> >     vpaddd        (%r9), %xmm2, %xmm0
> >     addl          %eax, %esi
> >     andl          $3, %r8d
> >     vmovdqu       %xmm0, (%r9)
> >     je            .L2
> > .L6:
> >     movslq        %esi, %r8
> >     leaq          0(,%r8,4), %rax
> >     movl          (%rdi,%r8,4), %r8d
> >     addl          %r8d, (%rcx,%rax)
> >     leal          1(%rsi), %r8d
> >     cmpl          %r8d, %edx
> >     jle           .L2
> >     addl          $2, %esi
> >     movl          4(%rdi,%rax), %r8d
> >     addl          %r8d, 4(%rcx,%rax)
> >

libstdc++: Fix testsuite for skipping gdb tests on remote/non-native target

2021-07-20 Thread Marc Poulhies via Gcc-patches
This fixes an incorrect invocation of gdb on remote targets, where DejaGNU
would try to run the host's gdb in the remote target simulator.
gdb-test skips the testing when the target is remote or non-native, but the
gdb version check function does not.

libstdc++-v3/ChangeLog:
* testsuite/lib/gdb-test.exp (gdb_batch_check): Exit if non native or 
remote target.

commit 0c4ae4ff46b1d7633f1e06f57d348b5817b8f640
Author: Jonathan Wakely 
Date:   Tue Jul 20 12:35:37 2021

libstdc++: Add more tests for filesystem::create_directory [PR101510]

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/101510
* src/c++17/fs_ops.cc (create_dir): Adjust whitespace.
* testsuite/27_io/filesystem/operations/create_directory.cc:
Test creating directory with name of existing symlink to
directory.
* testsuite/experimental/filesystem/operations/create_directory.cc:
Likewise.

diff --git a/libstdc++-v3/src/c++17/fs_ops.cc b/libstdc++-v3/src/c++17/fs_ops.cc
index 66207ae5e44..cec76446f06 100644
--- a/libstdc++-v3/src/c++17/fs_ops.cc
+++ b/libstdc++-v3/src/c++17/fs_ops.cc
@@ -577,8 +577,7 @@ namespace
   {
 bool created = false;
 #ifdef _GLIBCXX_HAVE_SYS_STAT_H
-posix::mode_t mode
-  = static_cast<std::underlying_type_t<fs::perms>>(perm);
+posix::mode_t mode = static_cast<std::underlying_type_t<fs::perms>>(perm);
 if (posix::mkdir(p.c_str(), mode))
   {
const int err = errno;
diff --git a/libstdc++-v3/testsuite/27_io/filesystem/operations/create_directory.cc b/libstdc++-v3/testsuite/27_io/filesystem/operations/create_directory.cc
index a0e50471275..256621481d7 100644
--- a/libstdc++-v3/testsuite/27_io/filesystem/operations/create_directory.cc
+++ b/libstdc++-v3/testsuite/27_io/filesystem/operations/create_directory.cc
@@ -54,6 +54,33 @@ test01()
   b = create_directory(p);
   VERIFY( !b );
 
+  auto f = p/"file";
+  std::ofstream{f} << "create file";
+  b = create_directory(f, ec);
+  VERIFY( ec == std::errc::file_exists );
+  VERIFY( !b );
+  try
+  {
+create_directory(f);
+VERIFY( false );
+  }
+  catch (const fs::filesystem_error& e)
+  {
+VERIFY( e.code() == std::errc::file_exists );
+VERIFY( e.path1() == f );
+  }
+
+  // PR libstdc++/101510 create_directory on an existing symlink to a directory
+  fs::create_directory(p/"dir");
+  auto link = p/"link";
+  fs::create_directory_symlink("dir", link);
+  ec = bad_ec;
+  b = fs::create_directory(link, ec);
+  VERIFY( !b );
+  VERIFY( !ec );
+  b = fs::create_directory(link);
+  VERIFY( !b );
+
   remove_all(p, ec);
 }
 
diff --git a/libstdc++-v3/testsuite/experimental/filesystem/operations/create_directory.cc b/libstdc++-v3/testsuite/experimental/filesystem/operations/create_directory.cc
index ee2a74b8803..39f95b61a45 100644
--- a/libstdc++-v3/testsuite/experimental/filesystem/operations/create_directory.cc
+++ b/libstdc++-v3/testsuite/experimental/filesystem/operations/create_directory.cc
@@ -46,12 +46,40 @@ test01()
   VERIFY( exists(p) );
 
   // Test existing path (libstdc++/71036).
+  ec = make_error_code(std::errc::invalid_argument);
   b = create_directory(p, ec);
   VERIFY( !ec );
   VERIFY( !b );
   b = create_directory(p);
   VERIFY( !b );
 
+  auto f = p/"file";
+  std::ofstream{f} << "create file";
+  b = create_directory(f, ec);
+  VERIFY( ec == std::errc::file_exists );
+  VERIFY( !b );
+  try
+  {
+create_directory(f);
+VERIFY( false );
+  }
+  catch (const fs::filesystem_error& e)
+  {
+VERIFY( e.code() == std::errc::file_exists );
+VERIFY( e.path1() == f );
+  }
+
+  // PR libstdc++/101510 create_directory on an existing symlink to a directory
+  fs::create_directory(p/"dir");
+  auto link = p/"link";
+  fs::create_directory_symlink("dir", link);
+  ec = make_error_code(std::errc::invalid_argument);
+  b = fs::create_directory(link, ec);
+  VERIFY( !b );
+  VERIFY( !ec );
+  b = fs::create_directory(link);
+  VERIFY( !b );
+
   remove_all(p, ec);
 }
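
Outside the testsuite harness, the behaviour the new tests check can be
tried with a small standalone program along these lines.  This is a sketch
that assumes a C++17 toolchain, a writable temporary directory and
permission to create symlinks; the function name and the directory name
are arbitrary choices for the example.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Returns true if create_directory on a symlink to an existing directory
// reports "nothing created, no error", as the PR 101510 tests expect.
bool
symlink_to_dir_is_not_an_error ()
{
  std::error_code ec;
  fs::path p = fs::temp_directory_path () / "create_directory_symlink_demo";
  fs::remove_all (p, ec);  // start from a clean slate
  fs::create_directory (p);
  fs::create_directory (p / "dir");
  fs::create_directory_symlink ("dir", p / "link");

  // Preload EC with an error so we can tell that it gets cleared.
  ec = std::make_error_code (std::errc::invalid_argument);
  bool created = fs::create_directory (p / "link", ec);
  bool ok = !created && !ec;

  fs::remove_all (p, ec);  // clean up
  return ok;
}
```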
 


Re: [PATCH] Add QI vector mode support to by-pieces for memset

2021-07-20 Thread Richard Sandiford via Gcc-patches
"H.J. Lu via Gcc-patches"  writes:
> On Mon, Jul 19, 2021 at 11:38 PM Richard Sandiford
>  wrote:
>>
>> "H.J. Lu via Gcc-patches"  writes:
>> >> > + {
>> >> > +   /* First generate subreg of word mode if the previous mode is
>> >> > +  wider than word mode and word mode is wider than MODE.  */
>> >> > +   prev_rtx = simplify_gen_subreg (word_mode, prev_rtx,
>> >> > +   prev_mode, 0);
>> >> > +   prev_mode = word_mode;
>> >> > + }
>> >> > +  if (prev_rtx != nullptr)
>> >> > + target = simplify_gen_subreg (mode, prev_rtx, prev_mode, 0);
>> >>
>> >> This should be lowpart_subreg, since 0 isn't the right offset for
>> >> big-endian targets.  Using lowpart_subreg should also avoid the need
>> >> for the word_size “if” above: lowpart_subreg can handle lowpart subword
>> >> subregs of multiword values.
>> >
>> > I tried it.  It didn't work since it caused the LRA failure.   I replaced
>> > simplify_gen_subreg with lowpart_subreg instead.
>>
>> What specifically went wrong?
>
> With vector broadcast, for
> ---
> extern void *ops;
>
> void
> foo (int c)
> {
>   __builtin_memset (ops, c, 18);
> }
> ---
> we generate HI from V16QI.   With a single lowpart_subreg, I get
>
> (insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84)
> (const_int 16 [0x10])) [0 MEM  [(void
> *)ops.0_1]+16 S2 A8])
> (subreg:HI (reg:V16QI 51 xmm15) 0)) "s2a.i":6:3 76 {*movhi_internal}
>  (nil))
>
> instead of
>
> (insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84)
> (const_int 16 [0x10])) [0 MEM  [(void
> *)ops.0_1]+16 S2 A8])
> (subreg:HI (reg:DI 51 xmm15) 0)) "s2a.i":6:3 76 {*movhi_internal}
>  (nil))
>
> IRA and LRA fail to reload:
>
> (insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84)
> (const_int 16 [0x10])) [0 MEM  [(void
> *)ops.0_1]+16 S2 A8])
> (subreg:HI (reg:V16QI 51 xmm15) 0)) "s2a.i":6:3 76 {*movhi_internal}
>  (nil))
>
> since ix86_can_change_mode_class has
>
>   if (MAYBE_SSE_CLASS_P (regclass) || MAYBE_MMX_CLASS_P (regclass))
> {
>   /* Vector registers do not support QI or HImode loads.  If we don't
>  disallow a change to these modes, reload will assume it's ok to
>  drop the subreg from (subreg:SI (reg:HI 100) 0).  This affects
>  the vec_dupv4hi pattern.  */
>   if (GET_MODE_SIZE (from) < 4)
> return false;
> }

Ah!  OK.  In that case, maybe we should have something like:

   if (REG_P (prev_rtx)
       && HARD_REGISTER_P (prev_rtx)
       && !REG_CAN_CHANGE_MODE_P (REGNO (prev_rtx), prev->mode, mode))
     prev_rtx = copy_to_reg (prev_rtx);

and then just have the single lowpart_subreg after that.
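
As background for the earlier point that 0 is not the right offset on
big-endian targets, the lowpart byte offset can be modelled as below.
This is a simplified sketch, not GCC's subreg_lowpart_offset: the function
name is invented and it assumes byte and word endianness agree, which does
not hold for every target.

```cpp
#include <cassert>

// Byte offset of the low-order OUTER_BYTES of an INNER_BYTES-wide value.
// Simplified model: byte and word endianness are assumed to be the same.
unsigned
lowpart_byte_offset (unsigned outer_bytes, unsigned inner_bytes,
		     bool big_endian)
{
  assert (outer_bytes <= inner_bytes);
  return big_endian ? inner_bytes - outer_bytes : 0;
}
```

For the HImode-from-V16QI case in this thread the model gives offset 0 on
a little-endian target and 14 on a big-endian one, so a hard-coded 0
would select the high part there.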

Thanks,
Richard


Re: Patch ping (was Re: [PATCH] rs6000: Fix up easy_vector_constant_msb handling [PR101384])

2021-07-20 Thread Segher Boessenkool
Hi!

On Tue, Jul 20, 2021 at 02:43:03PM +0200, Jakub Jelinek wrote:
> For gcc 11, I've bootstrapped/regtested on powerpc64le-linux and
> powerpc64-linux (the latter regtested -m32/-m64) also a simpler version
> below, which restricts it to the case that the code handles properly.
> 
> 2021-07-20  Jakub Jelinek  
> 
>   PR target/101384
>   * config/rs6000/rs6000.c (vspltis_constant): Accept EASY_VECTOR_MSB
>   only if step and copies are equal to 1.
> 
>   * gcc.dg/pr101384.c: New test.

Okay for all backports.  Thanks!


Segher


[PATCH v2] Use range-based for loops for traversing loops

2021-07-20 Thread Kewen.Lin via Gcc-patches
Hi,

This v2 has addressed some review comments/suggestions:

  - Use "!=" instead of "<" in function operator!= (const Iter &rhs)
  - Add new CTOR loops_list (struct loops *loops, unsigned flags)
to support loop hierarchy tree rather than just a function,
and adjust to use loops* accordingly.
  - Make implicit 'cfun' become explicit.
  - Get rid of macros ALL_LOOPS*, use loops_list instance.
  - Add const_iterator type begin()/end().
  - Use class loop* instead of loop_p in range-based for.

Bootstrapped and regtested again on powerpc64le-linux-gnu P9,
x86_64-redhat-linux and aarch64-linux-gnu, also
bootstrapped again on ppc64le P9 with bootstrap-O3 config.

Does it look better?  Is it ok for trunk?

BR,
Kewen
-
gcc/ChangeLog:

* cfgloop.h (as_const): New function.
(class loop_iterator): Rename to ...
(class loops_list): ... this.
(loop_iterator::next): Rename to ...
(loops_list::Iter::fill_curr_loop): ... this and adjust.
(loop_iterator::loop_iterator): Rename to ...
(loops_list::loops_list): ... this and adjust.
(loops_list::Iter): New class.
(loops_list::iterator): New type.
(loops_list::const_iterator): New type.
(loops_list::begin): New function.
(loops_list::end): Likewise.
(loops_list::begin const): Likewise.
(loops_list::end const): Likewise.
(FOR_EACH_LOOP): Remove.
(FOR_EACH_LOOP_FN): Remove.
* cfgloop.c (flow_loops_dump): Adjust FOR_EACH_LOOP* with range-based
for loop with loops_list instance.
(sort_sibling_loops): Likewise.
(disambiguate_loops_with_multiple_latches): Likewise.
(verify_loop_structure): Likewise.
* cfgloopmanip.c (create_preheaders): Likewise.
(force_single_succ_latches): Likewise.
* config/aarch64/falkor-tag-collision-avoidance.c
(execute_tag_collision_avoidance): Likewise.
* config/mn10300/mn10300.c (mn10300_scan_for_setlb_lcc): Likewise.
* config/s390/s390.c (s390_adjust_loops): Likewise.
* doc/loop.texi: Likewise.
* gimple-loop-interchange.cc (pass_linterchange::execute): Likewise.
* gimple-loop-jam.c (tree_loop_unroll_and_jam): Likewise.
* gimple-loop-versioning.cc (loop_versioning::analyze_blocks): Likewise.
(loop_versioning::make_versioning_decisions): Likewise.
* gimple-ssa-split-paths.c (split_paths): Likewise.
* graphite-isl-ast-to-gimple.c (graphite_regenerate_ast_isl): Likewise.
* graphite.c (canonicalize_loop_form): Likewise.
(graphite_transform_loops): Likewise.
* ipa-fnsummary.c (analyze_function_body): Likewise.
* ipa-pure-const.c (analyze_function): Likewise.
* loop-doloop.c (doloop_optimize_loops): Likewise.
* loop-init.c (loop_optimizer_finalize): Likewise.
(fix_loop_structure): Likewise.
* loop-invariant.c (calculate_loop_reg_pressure): Likewise.
(move_loop_invariants): Likewise.
* loop-unroll.c (decide_unrolling): Likewise.
(unroll_loops): Likewise.
* modulo-sched.c (sms_schedule): Likewise.
* predict.c (predict_loops): Likewise.
(pass_profile::execute): Likewise.
* profile.c (branch_prob): Likewise.
* sel-sched-ir.c (sel_finish_pipelining): Likewise.
(sel_find_rgns): Likewise.
* tree-cfg.c (replace_loop_annotate): Likewise.
(replace_uses_by): Likewise.
(move_sese_region_to_fn): Likewise.
* tree-if-conv.c (pass_if_conversion::execute): Likewise.
* tree-loop-distribution.c (loop_distribution::execute): Likewise.
* tree-parloops.c (parallelize_loops): Likewise.
* tree-predcom.c (tree_predictive_commoning): Likewise.
* tree-scalar-evolution.c (scev_initialize): Likewise.
(scev_reset): Likewise.
* tree-ssa-dce.c (find_obviously_necessary_stmts): Likewise.
* tree-ssa-live.c (remove_unused_locals): Likewise.
* tree-ssa-loop-ch.c (ch_base::copy_headers): Likewise.
* tree-ssa-loop-im.c (analyze_memory_references): Likewise.
(tree_ssa_lim_initialize): Likewise.
* tree-ssa-loop-ivcanon.c (canonicalize_induction_variables): Likewise.
* tree-ssa-loop-ivopts.c (tree_ssa_iv_optimize): Likewise.
* tree-ssa-loop-manip.c (get_loops_exits): Likewise.
* tree-ssa-loop-niter.c (estimate_numbers_of_iterations): Likewise.
(free_numbers_of_iterations_estimates): Likewise.
* tree-ssa-loop-prefetch.c (tree_ssa_prefetch_arrays): Likewise.
* tree-ssa-loop-split.c (tree_ssa_split_loops): Likewise.
* tree-ssa-loop-unswitch.c (tree_ssa_unswitch_loops): Likewise.
* tree-ssa-loop.c (gate_oacc_kernels): Likewise.
(pass_scev_cprop::execute): Likewise.
* tree-ssa-propagate.c (clean_up_loop_closed_phi): Likewise.
* tree-ssa-sccvn.c (do_rpo_vn): Likewise.
* tree-ssa-thre

Re: [RFC/PATCH] Use range-based for loops for traversing loops

2021-07-20 Thread Kewen.Lin via Gcc-patches
on 2021/7/20 下午5:49, Jonathan Wakely wrote:
> On Tue, 20 Jul 2021 at 09:58, Kewen.Lin  wrote:
>>
>> on 2021/7/19 下午11:59, Martin Sebor wrote:
>>> On 7/19/21 12:20 AM, Kewen.Lin wrote:
>>>> Hi,
>>>>
>>>> This patch follows Martin's suggestion here[1], to support
>>>> range-based for loops for traversing loops, analogously to
>>>> the patch for vec[2].
>>>>
>>>> Bootstrapped and regtested on powerpc64le-linux-gnu P9,
>>>> x86_64-redhat-linux and aarch64-linux-gnu, also
>>>> bootstrapped on ppc64le P9 with bootstrap-O3 config.
>>>>
>>>> Any comments are appreciated.
>>>
>>> Thanks for this nice cleanup!  Just a few suggestions:
>>>
>>> I would recommend against introducing new macros unless they
>>> offer a significant advantage over alternatives (for the two
>>> macros the patch adds I don't think they do).
>>>
>>> If improving const-correctness is one of our goals
>>> the loops_list iterator type would need a corresponding
>>> const_iterator type, and const overloads of the begin()
>>> and end() member functions.
>>>
>>> Rather than introducing more instances of the loop_p typedef
>>> I'd suggest to use loop *.  It has at least two advantages:
>>> it's clearer (it's obvious it refers to a pointer), and lends
>>> itself more readily to making code const-correct by declaring
>>> the control variable const: for (const class loop *loop: ...)
>>> while avoiding the mistake of using const loop_p loop to
>>> declare a pointer to a const loop.
>>>
>>
>> Thanks for the suggestions, Martin!  Will update them in V2.
>>
>> With some experiments, I noticed that even when a const_iterator is
>> provided, like:
>>
>>iterator
>>begin ()
>>{
>>  return iterator (*this, 0);
>>}
>>
>> +  const_iterator
>> +  begin () const
>> +  {
>> +return const_iterator (*this, 0);
>> +  }
>>
>> for (const class loop *loop: ...) will still use iterator instead
>> of const_iterator pair.  We have to make the code look like:
>>
>>   const auto& const_loops = loops_list (...);
>>   for (const class loop *loop: const_loops)
>>
>> or
>>   template<typename T> constexpr const T &as_const(T &t) noexcept { return t; }
>>   for (const class loop *loop: as_const(loops_list...))
>>
>> Does it look good to add below as_const along with loops_list in cfgloop.h?
>>
>> +/* Provide the functionality of std::as_const to support range-based for
>> +   to use const iterator.  (We can't use std::as_const itself because it's
>> +   a C++17 feature.)  */
>> +template <typename T>
>> +constexpr const T &
>> +as_const (T &t) noexcept
> 
> The noexcept is not needed because GCC is built -fno-exceptions. For
> consistency with all the other code that doesn't use noexcept, it
> should probably not be there.
> 

Thanks for pointing out!   Fixed it in v2.

>> +{
>> +  return t;
>> +}
>> +
> 
> That's one option. Another option (which could coexist with as_const)
> is to add cbegin() and cend() members, which are not overloaded for
> const and non-const, and so always return a const_iterator:
> 
> const_iterator cbegin () const { return const_iterator (*this, 0); }
> const_iterator begin () const { return cbegin(); }
> 
> And similarly for `end () const` and `cend () const`.
> 

Thanks for the suggestion.  As you pointed out in the later reply, the
range-based for loop doesn't use cbegin and cend, so I didn't add them
in v2.
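
The overload selection discussed in this exchange can be reproduced with a
toy container.  int_list below is only a stand-in for loops_list (the real
class is in cfgloop.h); the counters exist purely to show which begin()
overload the range-based for loop picks.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Toy stand-in for loops_list, for illustration only.
class int_list
{
public:
  explicit int_list (std::vector<int> v) : m_items (std::move (v)) {}

  using iterator = std::vector<int>::iterator;
  using const_iterator = std::vector<int>::const_iterator;

  iterator begin () { ++nonconst_begin_calls; return m_items.begin (); }
  iterator end () { return m_items.end (); }
  const_iterator begin () const { ++const_begin_calls; return m_items.begin (); }
  const_iterator end () const { return m_items.end (); }

  static int nonconst_begin_calls;
  static int const_begin_calls;

private:
  std::vector<int> m_items;
};

int int_list::nonconst_begin_calls = 0;
int int_list::const_begin_calls = 0;

// Same shape as the helper proposed in the thread (without noexcept).
template <typename T>
constexpr const T &
as_const (T &t)
{
  return t;
}

// Declaring the loop variable const does not change which begin() is
// used: the range expression is non-const, so iterator is chosen.
int
sum_via_plain_range_for (int_list &l)
{
  int s = 0;
  for (const int &x : l)
    s += x;
  return s;
}

// Going through as_const makes the range expression const, so the
// const overloads and const_iterator are chosen.
int
sum_via_as_const (int_list &l)
{
  int s = 0;
  for (const int &x : as_const (l))
    s += x;
  return s;
}
```

Note that as_const takes an lvalue reference, so in this sketch it is
applied to a named object rather than to a temporary.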

BR,
Kewen


[committed] aarch64: Tweak old vect-* tests to avoid new FAILs

2021-07-20 Thread Richard Sandiford via Gcc-patches
I'm not sure what these tests were originally designed to test.
vaddv and vmaxv seem to be testing for vectorisation, with associated
scan-assembler tests.  But they use arm_neon.h functions to test
the results, which would presumably also trip many of the scans.
That was probably what the split into vect-fmax-fmin.c and
vect-fmaxv-fminv-compile.c was supposed to avoid.

Anyway, the tests started failing after the recent change to allow
staged reductions for epilogue loops.  And epilogues came into play
because the reduction loops iterate LANES-1 rather than LANES times.
(vmaxv was trying to iterate LANES times, but the gimple optimisers
outsmarted it.  The other two explicitly had a count of LANES-1.)

Just suppressing epilogues causes other issues for vaddv and vmaxv.
The easiest fix therefore seemed to be to use an asm to hide the
initial value of the vmaxv loop (so that it really does iterate
LANES times) and then make the others match that style.
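
As a standalone illustration of the "use an asm to hide the initial value"
trick (not the testsuite code itself), the idiom looks roughly like this.
It relies on GNU extended asm, so it assumes GCC or a compatible compiler;
hide_value and max_of_8 are names invented for the sketch, and "=r" is
used instead of the "=w" FP/SIMD register constraint seen in the patch so
that the example is not AArch64-specific.

```cpp
#include <cassert>

// Return V, but route it through an empty asm so that the optimizers
// cannot see that the result equals the input.  The matching constraint
// "0" ties the input to the same register as output operand 0.
static inline int
hide_value (int v)
{
  int out;
  __asm__ ("" : "=r" (out) : "0" (v));
  return out;
}

// Because the initial value is opaque, the loop cannot be shortened by
// folding the first element into the initial value, so the reduction
// really covers all 8 elements.
int
max_of_8 (const int *a)
{
  int s = hide_value (a[0]);
  for (int i = 0; i < 8; ++i)
    s = s > a[i] ? s : a[i];
  return s;
}
```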

Tested on aarch64-linux-gnu & pushed.

Richard


gcc/testsuite/
PR testsuite/101506
* gcc.target/aarch64/vect-vmaxv.c: Use an asm to hide the
true initial value of the reduction from the vectorizer.
* gcc.target/aarch64/vect-vaddv.c: Likewise.  Make the vector
loop operate on exactly LANES (rather than LANES-1) iterations.
* gcc.target/aarch64/vect-fmaxv-fminv.x: Likewise.
---
 .../gcc.target/aarch64/vect-fmaxv-fminv.x | 20 +++
 gcc/testsuite/gcc.target/aarch64/vect-vaddv.c |  4 ++--
 gcc/testsuite/gcc.target/aarch64/vect-vmaxv.c |  2 +-
 3 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/vect-fmaxv-fminv.x b/gcc/testsuite/gcc.target/aarch64/vect-fmaxv-fminv.x
index 0bc6ba494cf..d3ba31c425a 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect-fmaxv-fminv.x
+++ b/gcc/testsuite/gcc.target/aarch64/vect-fmaxv-fminv.x
@@ -5,8 +5,9 @@ typedef double *__restrict__ pRF64;
 float maxv_f32 (pRF32 a)
 {
   int i;
-  float s = a[0];
-  for (i=1;i<8;i++)
+  float s;
+  asm ("" : "=w" (s) : "0" (a[0]));
+  for (i=0;i<8;i++)
 s = (s > a[i] ? s :  a[i]);
 
   return s;
@@ -15,8 +16,9 @@ float maxv_f32 (pRF32 a)
 float minv_f32 (pRF32 a)
 {
   int i;
-  float s = a[0];
-  for (i=1;i<16;i++)
+  float s;
+  asm ("" : "=w" (s) : "0" (a[0]));
+  for (i=0;i<16;i++)
 s = (s < a[i] ? s :  a[i]);
 
   return s;
@@ -25,8 +27,9 @@ float minv_f32 (pRF32 a)
 double maxv_f64 (pRF64 a)
 {
   int i;
-  double s = a[0];
-  for (i=1;i<8;i++)
+  double s;
+  asm ("" : "=w" (s) : "0" (a[0]));
+  for (i=0;i<8;i++)
 s = (s > a[i] ? s :  a[i]);
 
   return s;
@@ -35,8 +38,9 @@ double maxv_f64 (pRF64 a)
 double minv_f64 (pRF64 a)
 {
   int i;
-  double s = a[0];
-  for (i=1;i<16;i++)
+  double s;
+  asm ("" : "=w" (s) : "0" (a[0]));
+  for (i=0;i<16;i++)
 s = (s < a[i] ? s :  a[i]);
 
   return s;
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-vaddv.c b/gcc/testsuite/gcc.target/aarch64/vect-vaddv.c
index 41e9157dbec..3a12ae9706a 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect-vaddv.c
+++ b/gcc/testsuite/gcc.target/aarch64/vect-vaddv.c
@@ -57,8 +57,8 @@ test_vaddv##SUFFIX##_##TYPE##x##LANES##_t (void)  \
   /* Calculate linearly.  */   \
   for (i = 0; i < moves; i++)  \
 {  \
-  out_l[i] = input_##TYPE[i];  \
-  for (j = 1; j < LANES; j++)  \
+  asm ("" : "=r" (out_l[i]) : "0" (0));\
+  for (j = 0; j < LANES; j++)  \
out_l[i] += input_##TYPE[i + j];\
 }  \
\
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-vmaxv.c b/gcc/testsuite/gcc.target/aarch64/vect-vmaxv.c
index 4280834ec4a..1bdea890d3e 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect-vmaxv.c
+++ b/gcc/testsuite/gcc.target/aarch64/vect-vmaxv.c
@@ -36,7 +36,7 @@ test_v##MAXMIN##v##SUFFIX##_##TYPE##x##LANES##_t (void)   
\
   /* Calculate linearly.  */   \
   for (i = 0; i < moves; i++)  \
 {  \
-  out_l[i] = input_##TYPE[i];  \
+  asm ("" : "=r" (out_l[i]) : "0" (input_##TYPE[i]));  \
   for (j = 0; j < LANES; j++)  \
out_l[i] = input_##TYPE[i + j] CMP_OP out_l[i]  ?   \
  input_##TYPE[i + j] : out_l[i];   \


Re: [PATCH] rs6000: Fix up easy_vector_constant_msb handling [PR101384]

2021-07-20 Thread Segher Boessenkool
Hi!

On Tue, Jul 13, 2021 at 09:30:43PM +0200, Jakub Jelinek wrote:
>   PR target/101384
>   * config/rs6000/rs6000-protos.h (easy_altivec_constant): Change return
>   type from bool to int.
>   * config/rs6000/rs6000.c (vspltis_constant): Fix up handling the
>   EASY_VECTOR_MSB case if either step or copies is not 1.
>   (vspltis_shifted): Fix comment typo.
>   (easy_altivec_constant): Change return type from bool to int, instead
>   of returning true return byte size of the element mode that should be
>   used to synthetize the constant.
>   * config/rs6000/predicates.md (easy_vector_constant_msb): Require
>   that vspltis_shifted is 0, handle the case where easy_altivec_constant
>   assumes using different vector mode from CONST_VECTOR's mode.
>   * config/rs6000/altivec.md (easy_vector_constant_msb splitter): Use
>   easy_altivec_constant to determine mode in which -1 >> -1 should be
>   performed, use rs6000_expand_vector_init instead of gen_vec_initv4sisi.
> 
>   * gcc.dg/pr101384.c: New test.
>   * gcc.target/powerpc/pr101384-1.c: New test.
>   * gcc.target/powerpc/pr101384-2.c: New test.

> -  if (mode == V4SFmode)
> +  switch (easy_altivec_constant (operands[1], mode))
>  {
> -  mode = V4SImode;
> -  dest = gen_lowpart (V4SImode, dest);
> +case 1: mode = V16QImode; break;
> +case 2: mode = V8HImode; break;
> +case 4: mode = V4SImode; break;
> +default: gcc_unreachable ();
>  }

case 1:
  mode = V16QImode;
  break;
etc. (and/or make a convenience function for it).
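
One possible shape for such a convenience function, as a sketch only.  The enum below is a hypothetical stand-in for GCC's machine modes, the function name is made up, and abort () stands in for gcc_unreachable (), so that the example compiles outside the GCC tree.

```cpp
#include <cstdlib>

// Hypothetical stand-ins for V16QImode, V8HImode and V4SImode.
enum vec_mode
{
  MODE_V16QI,
  MODE_V8HI,
  MODE_V4SI
};

// Map the element byte size returned by easy_altivec_constant to the
// integer vector mode with elements of that size.
static vec_mode
mode_for_element_size (int bytes)
{
  switch (bytes)
    {
    case 1:
      return MODE_V16QI;
    case 2:
      return MODE_V8HI;
    case 4:
      return MODE_V4SI;
    default:
      abort ();  // gcc_unreachable () in GCC proper
    }
}
```

With something like this the splitter would need only one call in place of the switch.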

> --- gcc/testsuite/gcc.dg/pr101384.c.jj2021-07-13 13:45:42.971992584 
> +0200
> +++ gcc/testsuite/gcc.dg/pr101384.c   2021-07-13 13:45:32.427135184 +0200
> @@ -0,0 +1,39 @@
> +/* PR target/101384 */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -Wno-psabi -w" } */

If you have -w anyway, do you / why do you still need -Wno-psabi?

> --- gcc/testsuite/gcc.target/powerpc/pr101384-1.c.jj  2021-07-13 
> 14:02:57.003030314 +0200
> +++ gcc/testsuite/gcc.target/powerpc/pr101384-1.c 2021-07-13 
> 14:02:40.868247714 +0200
> @@ -0,0 +1,79 @@
> +/* PR target/101384 */
> +/* { dg-do compile { target le } } */
> +/* { dg-options "-O2 -maltivec" } */
> +/* { dg-require-effective-target powerpc_altivec_ok } */
> +/* { dg-final { scan-assembler-times {\mvspltis[whb] [^\n\r]*,-1\M} 9 } } */

If you put (?p) at the front of your regex it gets "partial newline-
sensitive matching", which means that . will not match newlines.  See
https://www.tcl.tk/man/tcl/TclCmd/re_syntax.html#M95 for details.  Here,
you could also use
/* { dg-final { scan-assembler-times {\mvspltis[whb] [^,]*,-1\M} 9 } } */
(but that (more traditional) approach quickly becomes clumsy).
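
To see the effect outside of Tcl, here is a rough sketch using POSIX regcomp instead.  This is a swapped-in technique, not what the testsuite uses: REG_NEWLINE is only the closest POSIX analogue (it also changes how ^ and $ behave, which Tcl's (?p) does not), and POSIX has no \m or \M, so the pattern in the note below is simplified.  line_match is a hypothetical helper.

```cpp
#include <regex.h>

// Returns 1 if PATTERN matches somewhere in TEXT, 0 if it does not,
// and -1 if PATTERN fails to compile.
static int
line_match (const char *pattern, const char *text, bool newline_sensitive)
{
  regex_t re;
  int flags = REG_EXTENDED | REG_NOSUB;
  if (newline_sensitive)
    flags |= REG_NEWLINE;  // '.' and [^...] stop matching '\n'
  if (regcomp (&re, pattern, flags) != 0)
    return -1;
  int found = regexec (&re, text, 0, nullptr, 0) == 0;
  regfree (&re);
  return found;
}
```

With the pattern "vspltis[whb] .*,-1" and the text "vspltisw 0,1\n\tli 9,-1\n", the plain match succeeds by running across the newline, while the newline-sensitive one does not match.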

Okay for trunk with or without those improvements (with the switch
formatting fix though).  Thanks!


Segher


Re: [PATCH] rs6000: Fix up easy_vector_constant_msb handling [PR101384]

2021-07-20 Thread Jakub Jelinek via Gcc-patches
On Tue, Jul 20, 2021 at 09:48:26AM -0500, Segher Boessenkool wrote:
> > -  if (mode == V4SFmode)
> > +  switch (easy_altivec_constant (operands[1], mode))
> >  {
> > -  mode = V4SImode;
> > -  dest = gen_lowpart (V4SImode, dest);
> > +case 1: mode = V16QImode; break;
> > +case 2: mode = V8HImode; break;
> > +case 4: mode = V4SImode; break;
> > +default: gcc_unreachable ();
> >  }
> 
> case 1:
>   mode = V16QImode;
>   break;
> etc. (and/or make a convenience function for it).

Ok, will fix.

> > --- gcc/testsuite/gcc.dg/pr101384.c.jj  2021-07-13 13:45:42.971992584 
> > +0200
> > +++ gcc/testsuite/gcc.dg/pr101384.c 2021-07-13 13:45:32.427135184 +0200
> > @@ -0,0 +1,39 @@
> > +/* PR target/101384 */
> > +/* { dg-do run } */
> > +/* { dg-options "-O2 -Wno-psabi -w" } */
> 
> If you have -w anyway, do you / why do you still need -Wno-psabi?

I think not all of the -Wpsabi diagnostics are emitted with warning{,_at}
etc. that -w disables; others are emitted with inform.  The -Wno-psabi
also makes it clear that it is the psabi stuff that is what the testcase
cares about.  Whether -w is also needed or not is something I don't know,
in the past it certainly was needed on various architectures, but maybe it
got fixed and only -Wno-psabi would do the trick?
If so, perhaps we could replace all -Wno-psabi -w occurrences in testsuite
dg-options with just -Wno-psabi and see how far we get.
find testsuite/ -type f | xargs grep -- '-w -Wno-psabi' 2>/dev/null | grep -v ChangeLog | wc -l
49
find testsuite/ -type f | xargs grep -- '-Wno-psabi -w' 2>/dev/null | grep -v ChangeLog | wc -l
24

Jakub



Re: [PATCH] Add QI vector mode support to by-pieces for memset

2021-07-20 Thread Richard Sandiford via Gcc-patches
Richard Sandiford via Gcc-patches  writes:
> "H.J. Lu via Gcc-patches"  writes:
>> On Mon, Jul 19, 2021 at 11:38 PM Richard Sandiford
>>  wrote:
>>>
>>> "H.J. Lu via Gcc-patches"  writes:
>>> >> > + {
>>> >> > +   /* First generate subreg of word mode if the previous mode is
>>> >> > +  wider than word mode and word mode is wider than MODE.  */
>>> >> > +   prev_rtx = simplify_gen_subreg (word_mode, prev_rtx,
>>> >> > +   prev_mode, 0);
>>> >> > +   prev_mode = word_mode;
>>> >> > + }
>>> >> > +  if (prev_rtx != nullptr)
>>> >> > + target = simplify_gen_subreg (mode, prev_rtx, prev_mode, 0);
>>> >>
>>> >> This should be lowpart_subreg, since 0 isn't the right offset for
>>> >> big-endian targets.  Using lowpart_subreg should also avoid the need
>>> >> for the word_size “if” above: lowpart_subreg can handle lowpart subword
>>> >> subregs of multiword values.
>>> >
>>> > I tried it.  It didn't work since it caused the LRA failure.   I replaced
>>> > simplify_gen_subreg with lowpart_subreg instead.
>>>
>>> What specifically went wrong?
>>
>> With vector broadcast, for
>> ---
>> extern void *ops;
>>
>> void
>> foo (int c)
>> {
>>   __builtin_memset (ops, c, 18);
>> }
>> ---
>> we generate HI from V16QI.   With a single lowpart_subreg, I get
>>
>> (insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84)
>> (const_int 16 [0x10])) [0 MEM  [(void
>> *)ops.0_1]+16 S2 A8])
>> (subreg:HI (reg:V16QI 51 xmm15) 0)) "s2a.i":6:3 76 {*movhi_internal}
>>  (nil))
>>
>> instead of
>>
>> (insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84)
>> (const_int 16 [0x10])) [0 MEM  [(void
>> *)ops.0_1]+16 S2 A8])
>> (subreg:HI (reg:DI 51 xmm15) 0)) "s2a.i":6:3 76 {*movhi_internal}
>>  (nil))
>>
>> IRA and LRA fail to reload:
>>
>> (insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84)
>> (const_int 16 [0x10])) [0 MEM  [(void
>> *)ops.0_1]+16 S2 A8])
>> (subreg:HI (reg:V16QI 51 xmm15) 0)) "s2a.i":6:3 76 {*movhi_internal}
>>  (nil))
>>
>> since ix86_can_change_mode_class has
>>
>>   if (MAYBE_SSE_CLASS_P (regclass) || MAYBE_MMX_CLASS_P (regclass))
>> {
>>   /* Vector registers do not support QI or HImode loads.  If we don't
>>  disallow a change to these modes, reload will assume it's ok to
>>  drop the subreg from (subreg:SI (reg:HI 100) 0).  This affects
>>  the vec_dupv4hi pattern.  */
>>   if (GET_MODE_SIZE (from) < 4)
>> return false;
>> }
>
> Ah!  OK.  In that case, maybe we should have something like:
>
>if (REG_P (prev_rtx)
>&& HARD_REGISTER_P (prev_rtx)
>&& REG_CAN_CHANGE_MODE_P (REGNO (prev_rtx), prev->mode, mode))

Sorry, make that last line:

  && lowpart_subreg_regno (REGNO (prev_rtx), prev->mode, mode) < 0

where lowpart_subreg_regno is a new wrapper around simplify_subreg_regno
that uses subreg_lowpart_offset (mode, prev->mode) as the offset.
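
As a reminder of why a hard-coded 0 is wrong on big-endian targets, here is a simplified model of the lowpart byte offset.  It is a sketch, not GCC's implementation: it assumes byte and word endianness agree, and lowpart_byte_offset is a hypothetical name.

```cpp
// Byte offset of the least significant OUTER_SIZE bytes within a value
// of INNER_SIZE bytes (simplified; see the assumptions above).
static unsigned
lowpart_byte_offset (unsigned outer_size, unsigned inner_size,
		     bool big_endian)
{
  // Nothing to skip when the outer mode is at least as wide.
  if (outer_size >= inner_size)
    return 0;
  // On big-endian the low bytes sit at the end of the value.
  return big_endian ? inner_size - outer_size : 0;
}
```

For the HI-from-V16QI case discussed earlier in the thread (2 bytes out of 16), this gives 0 on little-endian and 14 on big-endian.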

Thanks,
Richard

>  prev_rtx = copy_to_reg (prev_rtx);
>
> and then just have the single lowpart_subreg after that.
>
> Thanks,
> Richard


Re: [PATCH] rs6000: Fix up easy_vector_constant_msb handling [PR101384]

2021-07-20 Thread Segher Boessenkool
On Tue, Jul 20, 2021 at 05:01:00PM +0200, Jakub Jelinek wrote:
> On Tue, Jul 20, 2021 at 09:48:26AM -0500, Segher Boessenkool wrote:
> > > --- gcc/testsuite/gcc.dg/pr101384.c.jj2021-07-13 13:45:42.971992584 
> > > +0200
> > > +++ gcc/testsuite/gcc.dg/pr101384.c   2021-07-13 13:45:32.427135184 
> > > +0200
> > > @@ -0,0 +1,39 @@
> > > +/* PR target/101384 */
> > > +/* { dg-do run } */
> > > +/* { dg-options "-O2 -Wno-psabi -w" } */
> > 
> > If you have -w anyway, do you / why do you still need -Wno-psabi?
> 
> I think not all of the -Wpsabi diagnostics is emitted with warning{,_at}
> etc. that -w disables, others are emitted with inform.

/* An informative note at LOCATION.  Use this for additional details on an error
   message.  */
void
inform (location_t location, const char *gmsgid, ...)

So inform is misused in -Wpsabi?

If using it like this is deemed correct, then inhibit_warnings should
turn it off just like it turns off all *stronger* warnings.  The current
situation doesn't make much sense.

> The -Wno-psabi
> also makes it clear that it is the psabi stuff that is what the testcase
> cares about.  Whether -w is also needed or not is something I don't know,
> in the past it certainly was needed on various architectures, but maybe it
> got fixed and only -Wno-psabi would do the trick?
> If so, perhaps we could replace all -Wno-psabi -w occurrences in testsuite
> dg-options with just -Wno-psabi and see how far we get.
> find testsuite/ -type f | xargs grep -- '-w -Wno-psabi' 2>/dev/null | grep -v 
> ChangeLog | wc -l
> 49
> find testsuite/ -type f | xargs grep -- '-Wno-psabi -w' 2>/dev/null | grep -v 
> ChangeLog | wc -l
> 24

Note we will disable the -Wpsabi vector warnings for rs6000 from GCC 12
on.  It should have been done earlier, but we need a time machine to
install a time machine in the past, etc. :-)


Segher


[PATCH] c++tools, configury: Configure with C++; test checking status [PR98821].

2021-07-20 Thread Iain Sandoe
Hi Folks,

Following Jakub’s suggestions (on irc) here is a patch that works around
misconfiguration of the c++tools directory present for at least Linux and
Darwin (probably on any platform that does not have typedefs for the inet
structs in its system headers).

This also pulls in tests for the checking configure flags (copied from libcpp)
and the implementations of gcc_assert (copied from gcc).  Actually, there’s not
much original code here - but the combination is new, of course.

Tested lightly on Linux and Darwin for master with and without
--disable-checking and on gcc-11 with default (release).  At least the
configures now seem to DTRT for those.

OK for master and GCC-11.2?
(if a complete regtest passes for both)

thanks
Iain




The c++tools configure fragments need to be built with a C++ compiler.

In addition, the stand-alone server uses diagnostic mechanisms in common
with GCC, but needs to define implementations of the asserts and
supporting output functions.

Signed-off-by: Iain Sandoe 

PR c++/98821 - modules : c++tools configures with CC but code fragments assume CXX.

PR c++/98821

c++tools/ChangeLog:

* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac: Configure using C++.  Pull logic to
detect enabled checking modes.
* server.cc (AI_NUMERICSERV): Define a fallback value.
(gcc_assert): New.
(gcc_checking_assert): New.
(gcc_unreachable): New.
(fancy_abort): Only build when checking is enabled.

Co-authored-by: Jakub Jelinek 
---
 c++tools/config.h.in  |  10 +
 c++tools/configure| 766 +++---
 c++tools/configure.ac |  58 
 c++tools/server.cc|  35 ++
 4 files changed, 228 insertions(+), 641 deletions(-)

diff --git a/c++tools/configure.ac b/c++tools/configure.ac
index 70fcb641db9..cb67dabf191 100644
--- a/c++tools/configure.ac
+++ b/c++tools/configure.ac
@@ -41,6 +41,8 @@ MISSING=`cd $ac_aux_dir && ${PWDCMD-pwd}`/missing
 AC_CHECK_PROGS([AUTOCONF], [autoconf], [$MISSING autoconf])
 AC_CHECK_PROGS([AUTOHEADER], [autoheader], [$MISSING autoheader])
 
+AC_LANG(C++)
+
 dnl Enabled by default
 AC_MSG_CHECKING([whether to build C++ tools])
   AC_ARG_ENABLE(c++-tools, 
@@ -67,6 +69,62 @@ AC_MSG_RESULT([$maintainer_mode])
 test "$maintainer_mode" = yes && MAINTAINER=yes
 AC_SUBST(MAINTAINER)
 
+# Enable expensive internal checks
+is_release=
+if test -f $srcdir/../gcc/DEV-PHASE \
+   && test x"`cat $srcdir/../gcc/DEV-PHASE`" != xexperimental; then
+  is_release=yes
+fi
+
+AC_ARG_ENABLE(checking,
+[AS_HELP_STRING([[--enable-checking[=LIST]]],
+   [enable expensive run-time checks.  With LIST,
+enable only specific categories of checks.
+Categories are: yes,no,all,none,release.
+Flags are: misc,valgrind or other strings])],
+[ac_checking_flags="${enableval}"],[
+# Determine the default checks.
+if test x$is_release = x ; then
+  ac_checking_flags=yes
+else
+  ac_checking_flags=release
+fi])
+IFS="${IFS=}"; ac_save_IFS="$IFS"; IFS="$IFS,"
+for check in release $ac_checking_flags
+do
+   case $check in
+   # these set all the flags to specific states
+   yes|all) ac_checking=1 ; ac_assert_checking=1 ; ac_valgrind_checking= ;;
+   no|none) ac_checking= ; ac_assert_checking= ; ac_valgrind_checking= ;;
+   release) ac_checking= ; ac_assert_checking=1 ; ac_valgrind_checking= ;;
+   # these enable particular checks
+   assert) ac_assert_checking=1 ;;
+   misc) ac_checking=1 ;;
+   valgrind) ac_valgrind_checking=1 ;;
+   # accept
+   *) ;;
+   esac
+done
+IFS="$ac_save_IFS"
+
+if test x$ac_checking != x ; then
+  AC_DEFINE(CHECKING_P, 1,
+[Define to 1 if you want more run-time sanity checks.])
+else
+  AC_DEFINE(CHECKING_P, 0)
+fi
+
+if test x$ac_assert_checking != x ; then
+  AC_DEFINE(ENABLE_ASSERT_CHECKING, 1,
+[Define if you want assertions enabled.  This is a cheap check.])
+fi
+
+if test x$ac_valgrind_checking != x ; then
+  AC_DEFINE(ENABLE_VALGRIND_CHECKING, 1,
+[Define if you want to workaround valgrind (a memory checker) warnings about
+ possible memory leaks because of libcpp use of interior pointers.])
+fi
+
 # Check whether --enable-default-pie was given.
 AC_ARG_ENABLE(default-pie,
 [AS_HELP_STRING([--enable-default-pie],
diff --git a/c++tools/server.cc b/c++tools/server.cc
index fae3e78dc5d..3056352e24b 100644
--- a/c++tools/server.cc
+++ b/c++tools/server.cc
@@ -61,6 +61,10 @@ along with GCC; see the file COPYING3.  If not see
 # define gai_strerror(X) ""
 #endif
 
+#ifndef AI_NUMERICSERV
+#define AI_NUMERICSERV 0
+#endif
+
 #include 
 
 // Select or epoll
@@ -92,6 +96,35 @@ along with GCC; see the file COPYING3.  If not see
 #define DIR_SEPARATOR '/'
 #endif
 
+/* Imported from libcpp/system.h
+   Use gcc_assert(EXPR) to test invariants.  */
+#if ENABLE_ASSERT_CHECKING || CHECKING_P
+#define gcc_assert(EXPR)  

Re: [PATCH] rs6000: Fix up easy_vector_constant_msb handling [PR101384]

2021-07-20 Thread Jakub Jelinek via Gcc-patches
On Tue, Jul 20, 2021 at 10:17:08AM -0500, Segher Boessenkool wrote:
> On Tue, Jul 20, 2021 at 05:01:00PM +0200, Jakub Jelinek wrote:
> > On Tue, Jul 20, 2021 at 09:48:26AM -0500, Segher Boessenkool wrote:
> > > > --- gcc/testsuite/gcc.dg/pr101384.c.jj  2021-07-13 13:45:42.971992584 
> > > > +0200
> > > > +++ gcc/testsuite/gcc.dg/pr101384.c 2021-07-13 13:45:32.427135184 
> > > > +0200
> > > > @@ -0,0 +1,39 @@
> > > > +/* PR target/101384 */
> > > > +/* { dg-do run } */
> > > > +/* { dg-options "-O2 -Wno-psabi -w" } */
> > > 
> > > If you have -w anyway, do you / why do you still need -Wno-psabi?
> > 
> > I think not all of the -Wpsabi diagnostics is emitted with warning{,_at}
> > etc. that -w disables, others are emitted with inform.
> 
> /* An informative note at LOCATION.  Use this for additional details on an 
> error
>message.  */
> void
> inform (location_t location, const char *gmsgid, ...)
> 
> So inform is misused in -Wpsabi?

I bet it is done intentionally not to trigger -Werror, e.g. i386.c uses it
for the notes that some old GCC version had different ABI for certain
passing than the current one.

> If using it like this is deemed correct, then inhibit_warnings should
> turn it off just like it turns off all *stronger* warnings.  The current
> situation doesn't make much sense.
> 
> > The -Wno-psabi
> > also makes it clear that it is the psabi stuff that is what the testcase
> > cares about.  Whether -w is also needed or not is something I don't know,
> > in the past it certainly was needed on various architectures, but maybe it
> > got fixed and only -Wno-psabi would do the trick?
> > If so, perhaps we could replace all -Wno-psabi -w occurrences in testsuite
> > dg-options with just -Wno-psabi and see how far we get.
> > find testsuite/ -type f | xargs grep -- '-w -Wno-psabi' 2>/dev/null | grep 
> > -v ChangeLog | wc -l
> > 49
> > find testsuite/ -type f | xargs grep -- '-Wno-psabi -w' 2>/dev/null | grep 
> > -v ChangeLog | wc -l
> > 24
> 
> Note we will disable the -Wpsabi vector warnings for rs6000 from GCC 12
> on.  It should have been done earlier, but we need a time machine to
> install a time machine in the past, etc. :-)

I could understand dropping -Wpsabi warnings of the kind that some very old
GCC version had different ABI if sufficient number of releases passed since
then, but at least x86 also has -Wpsabi warnings that returning a certain
vector or taking certain vector as parameter has different ABI without
some particular ISA option.  And those options are valid all the time and
something people should be aware of, e.g. returning 16-byte vector without
-msse, or 32-byte vector without -mavx, or 64-byte vector without -mavx512f
- without those ISA switches they are passed/returned as generic vectors,
while with that option in vector registers.
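
A small sketch of the kind of code I mean, using the GNU vector extension.  The names are made up for illustration.  On x86 without -mavx, GCC should emit a -Wpsabi diagnostic for a function like this, because the 32-byte vector is passed and returned differently depending on the ISA switch.

```cpp
// 32-byte generic vector of eight ints.
typedef int v8si __attribute__ ((vector_size (32)));

// Takes and returns v8si by value; this is where the calling convention
// depends on whether AVX is enabled.
v8si
add_v8si (v8si a, v8si b)
{
  return a + b;
}
```

The function computes the same values either way; the hazard is only in linking together objects built with different ISA options.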

Jakub



[committed] libstdc++: fix is_default_constructible for hash containers [PR 100863]

2021-07-20 Thread Jonathan Wakely via Gcc-patches

On 02/06/21 13:35 +0100, Jonathan Wakely wrote:

The allocator, hash function and equality function should all be
value-initialized by the default constructor of an unordered container.
Do it in the EBO helper, so we don't have to get it right in multiple
places.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/100863
PR libstdc++/65816
* include/bits/hashtable_policy.h (_Hashtable_ebo_helper):
Value-initialize subobject.
* testsuite/23_containers/unordered_map/allocator/default_init.cc:
Remove XFAIL.
* testsuite/23_containers/unordered_set/allocator/default_init.cc:
Remove XFAIL.



The recent change to _Hashtable_ebo_helper for this PR broke the
is_default_constructible trait for a hash container with a non-default
constructible allocator. That happens because the constructor needs to
be user-provided in order to initialize the member, and so is not
defined as deleted when the type is not default constructible.

By making _Hashtable derive from _Enable_special_members we can ensure
that the default constructor for the std::unordered_xxx containers is
deleted when it would be ill-formed. This makes the trait give the
correct answer.
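
The mechanism can be reproduced in a few lines.  This is a sketch with hypothetical stand-in names (enable_default_ctor, ebo_helper, table_before, table_after), not the libstdc++ code: the helper's user-provided constructor is never defined as deleted, so the trait is fooled until a conditionally-deleted base is added.

```cpp
#include <type_traits>

// Stand-in for _Enable_special_members: only the false specialization
// deletes the default constructor.
template<bool Enable>
  struct enable_default_ctor { };

template<>
  struct enable_default_ctor<false>
  { enable_default_ctor() = delete; };

// Stand-in for the EBO helper: its constructor is user-provided, so it
// is declared (and not deleted) even when T cannot be value-initialized.
template<typename T>
  struct ebo_helper
  {
    ebo_helper() : value() { }
    T value;
  };

struct NoDefault { NoDefault(int) { } };

// Before the fix: nothing makes the defaulted constructor deleted.
template<typename T>
  struct table_before : private ebo_helper<T>
  { table_before() = default; };

// After the fix: the extra base deletes it when T is not
// default constructible.
template<typename T>
  struct table_after
  : private ebo_helper<T>,
    private enable_default_ctor<std::is_default_constructible<T>::value>
  { table_after() = default; };

static_assert( std::is_default_constructible<table_before<NoDefault>>::value,
	       "trait is fooled" );
static_assert( !std::is_default_constructible<table_after<NoDefault>>::value,
	       "trait is correct" );
static_assert( std::is_default_constructible<table_after<int>>::value, "" );
```

Actually default-constructing table_before<NoDefault> would still fail to compile, only later, when the helper's constructor body is instantiated.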

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/100863
* include/bits/hashtable.h (_Hashtable): Conditionally delete
default constructor by deriving from _Enable_special_members.
* testsuite/23_containers/unordered_map/cons/default.cc: New test.
* testsuite/23_containers/unordered_set/cons/default.cc: New test.


Tested powerpc64le-linux. Committed to trunk.


commit 89ec3b67dbe856a447d068b053bc19559f136f43
Author: Jonathan Wakely 
Date:   Tue Jul 20 15:20:41 2021

libstdc++: fix is_default_constructible for hash containers [PR 100863]

The recent change to _Hashtable_ebo_helper for this PR broke the
is_default_constructible trait for a hash container with a non-default
constructible allocator. That happens because the constructor needs to
be user-provided in order to initialize the member, and so is not
defined as deleted when the type is not default constructible.

By making _Hashtable derive from _Enable_special_members we can ensure
that the default constructor for the std::unordered_xxx containers is
deleted when it would be ill-formed. This makes the trait give the
correct answer.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/100863
* include/bits/hashtable.h (_Hashtable): Conditionally delete
default constructor by deriving from _Enable_special_members.
* testsuite/23_containers/unordered_map/cons/default.cc: New test.
* testsuite/23_containers/unordered_set/cons/default.cc: New test.

diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index dfc2a2a7800..adb59213f2d 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -33,6 +33,7 @@
 #pragma GCC system_header
 
 #include 
+#include 
 #if __cplusplus > 201402L
 # include 
 #endif
@@ -48,6 +49,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		   // Mandatory to have erase not throwing.
 		   __is_nothrow_invocable>>;
 
+  // Helper to conditionally delete the default constructor.
+  // The _Hash_node_base type is used to distinguish this specialization
+  // from any other potentially-overlapping subobjects of the hashtable.
+  template
+using _Hashtable_enable_default_ctor
+  = _Enable_special_members<__and_,
+   is_default_constructible<_Hash>,
+   is_default_constructible<_Allocator>>{},
+true, true, true, true, true,
+__detail::_Hash_node_base>;
+
   /**
*  Primary class template _Hashtable.
*
@@ -183,7 +195,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   private __detail::_Hashtable_alloc<
 	__alloc_rebind<_Alloc,
 		   __detail::_Hash_node<_Value,
-	_Traits::__hash_cached::value>>>
+	_Traits::__hash_cached::value>>>,
+  private _Hashtable_enable_default_ctor<_Equal, _Hash, _Alloc>
 {
   static_assert(is_same::type, _Value>::value,
 	  "unordered container must have a non-const, non-volatile value_type");
diff --git a/libstdc++-v3/testsuite/23_containers/unordered_map/cons/default.cc b/libstdc++-v3/testsuite/23_containers/unordered_map/cons/default.cc
new file mode 100644
index 000..e4f836fde3e
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/unordered_map/cons/default.cc
@@ -0,0 +1,33 @@
+// { dg-do compile { target c++11 } }
+#include 
+
+static_assert( std::is_default_constructible>{}, "" );
+
+template
+  struct NoDefaultConsAlloc
+  {
+using value_type = T;
+
+NoDefaultConsAlloc(int) noexcept { }
+
+template
+  NoDefaultConsAlloc(const NoDefaultConsAlloc&) { }
+
+T *allocate(std::size_t n)
+{ return std::allocator().alloc

[PATCH] unroll: Avoid unnecessary tail loops for constant niters

2021-07-20 Thread Richard Sandiford via Gcc-patches
unroll and jam can decide to unroll the outer loop of a nest like:

  for (int j = 0; j < n; ++j)
    for (int i = 0; i < n; ++i)
      x[i] += __builtin_expf (y[j][i]);

It then uses a tail loop to handle any left-over iterations.

However, the code is structured so that this tail loop is always used.
If n is a multiple of the unroll factor UF, the final UF iterations will
use the tail loop rather than the unrolled loop.

“Fixing” that for variable loop counts would mean introducing another
runtime test: a branch around the tail loop if there are no more
iterations.  There's at least an argument that the overhead of doing
that test might not pay for itself.

But we use this structure even if the iteration count is provably
a multiple of UF at compile time.  E.g. with s/n/100/ and an
unroll factor of 2, the first 98 iterations use the unrolled loop
and the final 2 iterations use the original loop.
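
The iteration split can be modelled with a few lines of code.  This is a simplified model for illustration, not the unroller itself: split_iterations and its flag are hypothetical, and it assumes the unrolled loop only runs while more than UF iterations remain.

```cpp
// How many iterations the unrolled loop and the tail loop each execute.
struct iteration_split
{
  int unrolled;
  int tail;
};

static iteration_split
split_iterations (int n, int uf, bool skip_tail_for_exact_multiples)
{
  iteration_split s = { 0, 0 };
  int j = 0;
  if (skip_tail_for_exact_multiples && n % uf == 0)
    {
      // Count is a known multiple of UF: no tail loop is needed.
      for (; j < n; j += uf)
	s.unrolled += uf;
      return s;
    }
  // Existing structure: the last iterations always go to the tail loop.
  for (; j + uf < n; j += uf)
    s.unrolled += uf;
  for (; j < n; ++j)
    s.tail += 1;
  return s;
}
```

In this model, n = 100 and uf = 2 give 98 unrolled iterations plus 2 tail iterations with the flag off, and 100 plus 0 with it on.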

This patch makes the unroller avoid a tail loop in that case.
The end result seemed easier to follow if variables were declared
at the point of initialisation, so that it's more obvious which
ones are meaningful even when there's no tail loop.

Tested on aarch64-linux-gnu so far, will test on x86_64-linux-gnu too.
OK to install if testing passes?

Richard


gcc/
* tree-ssa-loop-manip.c (determine_exit_conditions): Return a null
exit condition if no tail loop is needed, and if the original exit
condition should therefore be kept as-is.
(tree_transform_and_unroll_loop): Handle that case here too.

gcc/testsuite/
* gcc.dg/unroll-9.c: New test.
---
 gcc/testsuite/gcc.dg/unroll-9.c |  12 ++
 gcc/tree-ssa-loop-manip.c   | 306 +---
 2 files changed, 176 insertions(+), 142 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/unroll-9.c

diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
index 28ae1316fa0..41f9872ca10 100644
--- a/gcc/tree-ssa-loop-manip.c
+++ b/gcc/tree-ssa-loop-manip.c
@@ -997,8 +997,10 @@ can_unroll_loop_p (class loop *loop, unsigned factor,
 /* Determines the conditions that control execution of LOOP unrolled FACTOR
times.  DESC is number of iterations of LOOP.  ENTER_COND is set to
condition that must be true if the main loop can be entered.
+   If the loop does not always iterate an exact multiple of FACTOR times,
EXIT_BASE, EXIT_STEP, EXIT_CMP and EXIT_BOUND are set to values describing
-   how the exit from the unrolled loop should be controlled.  */
+   how the exit from the unrolled loop should be controlled.  Otherwise,
+   the trees are set to null and EXIT_CMP is set to ERROR_MARK.  */
 
 static void
 determine_exit_conditions (class loop *loop, class tree_niter_desc *desc,
@@ -1079,6 +1081,16 @@ determine_exit_conditions (class loop *loop, class 
tree_niter_desc *desc,
   assum = fold_build2 (cmp, boolean_type_node, base, bound);
   cond = fold_build2 (TRUTH_AND_EXPR, boolean_type_node, assum, cond);
 
+  if (integer_nonzerop (cond)
+  && integer_zerop (desc->may_be_zero))
+{
+  /* Convert the latch count to an iteration count.  */
+  tree niter = fold_build2 (PLUS_EXPR, type, desc->niter,
+   build_one_cst (type));
+  if (multiple_of_p (type, niter, bigstep))
+   return;
+}
+
   cond = force_gimple_operand (unshare_expr (cond), &stmts, false, NULL_TREE);
   if (stmts)
 gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop), stmts);
@@ -1234,137 +1246,138 @@ tree_transform_and_unroll_loop (class loop *loop, 
unsigned factor,
transform_callback transform,
void *data)
 {
-  gcond *exit_if;
-  tree ctr_before, ctr_after;
-  tree enter_main_cond, exit_base, exit_step, exit_bound;
-  enum tree_code exit_cmp;
-  gphi *phi_old_loop, *phi_new_loop, *phi_rest;
-  gphi_iterator psi_old_loop, psi_new_loop;
-  tree init, next, new_init;
-  class loop *new_loop;
-  basic_block rest, exit_bb;
-  edge old_entry, new_entry, old_latch, precond_edge, new_exit;
-  edge new_nonexit, e;
-  gimple_stmt_iterator bsi;
-  use_operand_p op;
-  bool ok;
-  unsigned i;
-  profile_probability prob, prob_entry, scale_unrolled;
-  profile_count freq_e, freq_h;
   gcov_type new_est_niter = niter_for_unrolled_loop (loop, factor);
   unsigned irr = loop_preheader_edge (loop)->flags & EDGE_IRREDUCIBLE_LOOP;
-  auto_vec to_remove;
 
+  enum tree_code exit_cmp;
+  tree enter_main_cond, exit_base, exit_step, exit_bound;
   determine_exit_conditions (loop, desc, factor,
 &enter_main_cond, &exit_base, &exit_step,
 &exit_cmp, &exit_bound);
+  bool single_loop_p = !exit_base;
 
   /* Let us assume that the unrolled loop is quite likely to be entered.  */
+  profile_probability prob_entry;
   if (integer_nonzerop (enter_main_cond))
 prob_entry = profile_probability::always ();
   else
 prob_entry = profile_probability::guessed_always ()
  

Re: [PATCH] c++tools, configury: Configure with C++; test checking status [PR98821].

2021-07-20 Thread Jakub Jelinek via Gcc-patches
On Tue, Jul 20, 2021 at 04:21:34PM +0100, Iain Sandoe wrote:

> --- a/c++tools/configure.ac
> +++ b/c++tools/configure.ac
> @@ -41,6 +41,8 @@ MISSING=`cd $ac_aux_dir && ${PWDCMD-pwd}`/missing
>  AC_CHECK_PROGS([AUTOCONF], [autoconf], [$MISSING autoconf])
>  AC_CHECK_PROGS([AUTOHEADER], [autoheader], [$MISSING autoheader])
>  
> +AC_LANG(C++)
> +
>  dnl Enabled by default
>  AC_MSG_CHECKING([whether to build C++ tools])
>AC_ARG_ENABLE(c++-tools, 
> @@ -67,6 +69,62 @@ AC_MSG_RESULT([$maintainer_mode])
>  test "$maintainer_mode" = yes && MAINTAINER=yes
>  AC_SUBST(MAINTAINER)
>  
> +# Enable expensive internal checks
> +is_release=
> +if test -f $srcdir/../gcc/DEV-PHASE \
> +   && test x"`cat $srcdir/../gcc/DEV-PHASE`" != xexperimental; then
> +  is_release=yes
> +fi
> +
> +AC_ARG_ENABLE(checking,
> +[AS_HELP_STRING([[--enable-checking[=LIST]]],
> + [enable expensive run-time checks.  With LIST,
> +  enable only specific categories of checks.
> +  Categories are: yes,no,all,none,release.
> +  Flags are: misc,valgrind or other strings])],
> +[ac_checking_flags="${enableval}"],[
> +# Determine the default checks.
> +if test x$is_release = x ; then
> +  ac_checking_flags=yes
> +else
> +  ac_checking_flags=release
> +fi])
> +IFS="${IFS=  }"; ac_save_IFS="$IFS"; IFS="$IFS,"
> +for check in release $ac_checking_flags
> +do
> + case $check in
> + # these set all the flags to specific states
> + yes|all) ac_checking=1 ; ac_assert_checking=1 ; ac_valgrind_checking= ;;
> + no|none) ac_checking= ; ac_assert_checking= ; ac_valgrind_checking= ;;
> + release) ac_checking= ; ac_assert_checking=1 ; ac_valgrind_checking= ;;
> + # these enable particular checks
> + assert) ac_assert_checking=1 ;;
> + misc) ac_checking=1 ;;
> + valgrind) ac_valgrind_checking=1 ;;
> + # accept
> + *) ;;
> + esac
> +done
> +IFS="$ac_save_IFS"
> +
> +if test x$ac_checking != x ; then
> +  AC_DEFINE(CHECKING_P, 1,
> +[Define to 1 if you want more run-time sanity checks.])
> +else
> +  AC_DEFINE(CHECKING_P, 0)
> +fi
> +
> +if test x$ac_assert_checking != x ; then
> +  AC_DEFINE(ENABLE_ASSERT_CHECKING, 1,
> +[Define if you want assertions enabled.  This is a cheap check.])
> +fi
> +
> +if test x$ac_valgrind_checking != x ; then
> +  AC_DEFINE(ENABLE_VALGRIND_CHECKING, 1,
> +[Define if you want to workaround valgrind (a memory checker) warnings about
> + possible memory leaks because of libcpp use of interior pointers.])
> +fi

I guess we could simplify it, I think at least right now we only care
about ENABLE_ASSERT_CHECKING and nothing else, so the is_release computation
could go, make ac_checking_flags just default to yes, drop the ac_checking,
ac_valgrind_checking variables and simplify the case so that it perhaps
handles all we care together.  That would be
case $check in
yes|all|release|assert) ac_assert_checking=1 ; ;;
no|none) ac_assert_checking= ; ;;
*) ;;
esac
or so, and then only AC_DEFINE ENABLE_ASSERT_CHECKING and from server.cc
drop the CHECKING_P stuff.
> @@ -92,6 +96,35 @@ along with GCC; see the file COPYING3.  If not see
>  #define DIR_SEPARATOR '/'
>  #endif
>  
> +/* Imported from libcpp/system.h
> +   Use gcc_assert(EXPR) to test invariants.  */
> +#if ENABLE_ASSERT_CHECKING || CHECKING_P
> +#define gcc_assert(EXPR)\
> +   ((void)(!(EXPR) ? fancy_abort (__FILE__, __LINE__, __FUNCTION__), 0 : 0))
> +#elif (GCC_VERSION >= 4005)
> +#define gcc_assert(EXPR)\
> +  ((void)(__builtin_expect (!(EXPR), 0) ? __builtin_unreachable (), 0 : 0))
> +#else
> +/* Include EXPR, so that unused variable warnings do not occur.  */
> +#define gcc_assert(EXPR) ((void)(0 && (EXPR)))
> +#endif
> +
> +#if CHECKING_P
> +#define gcc_checking_assert(EXPR) gcc_assert (EXPR)
> +#else
> +/* N.B.: in release build EXPR is not evaluated.  */
> +#define gcc_checking_assert(EXPR) ((void)(0 && (EXPR)))
> +#endif

I'd drop the gcc_checking_assert macro, we don't use it...

Otherwise LGTM.

Jakub
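
As a rough illustration of the gcc_assert variants quoted above, here is a
minimal standalone C sketch.  The names sketch_assert, sketch_fancy_abort and
checked_div are hypothetical and not part of c++tools/server.cc, and
ENABLE_ASSERT_CHECKING is assumed to come from configure; it defaults to 1
here only so that the sketch compiles on its own.

```c
#include <stdio.h>
#include <stdlib.h>

/* Assumption: configure would normally define this; default to checking. */
#ifndef ENABLE_ASSERT_CHECKING
#define ENABLE_ASSERT_CHECKING 1
#endif

/* Hypothetical stand-in for fancy_abort: report the location and abort. */
static void sketch_fancy_abort(const char *file, int line, const char *func)
{
    fprintf(stderr, "internal error in %s, at %s:%d\n", func, file, line);
    abort();
}

#if ENABLE_ASSERT_CHECKING
/* Checked build: a failed assertion reports its location and aborts. */
#define sketch_assert(EXPR) \
    ((void)(!(EXPR) ? sketch_fancy_abort(__FILE__, __LINE__, __func__), 0 : 0))
#elif defined(__GNUC__)
/* Unchecked build with GCC: tell the optimizer the failure cannot happen. */
#define sketch_assert(EXPR) \
    ((void)(__builtin_expect(!(EXPR), 0) ? __builtin_unreachable(), 0 : 0))
#else
/* Keep EXPR visible so unused-variable warnings do not occur. */
#define sketch_assert(EXPR) ((void)(0 && (EXPR)))
#endif

/* Example invariant: the divisor must be non-zero. */
int checked_div(int num, int den)
{
    sketch_assert(den != 0);
    return num / den;
}
```

With checking enabled, checked_div (1, 0) would abort with a location message;
without it, the compiler may simply assume the condition holds.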



[PATCH] unroll: Run VN on unrolled-and-jammed loops

2021-07-20 Thread Richard Sandiford via Gcc-patches
Unroll and jam can sometimes leave redundancies.  E.g. for:

  for (int j = 0; j < 100; ++j)
    for (int i = 0; i < 100; ++i)
      x[i] += y[i] * z[j][i];

the new loop will do the equivalent of:

  for (int j = 0; j < 100; j += 2)
    for (int i = 0; i < 100; ++i)
      {
        x[i] += y[i] * z[j][i];
        x[i] += y[i] * z[j + 1][i];
      }

with two reads of y[i] and with a round trip through memory for x[i].
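
To make the redundancy concrete, here is a hand-written C sketch of the three
forms.  The function names and the UJ_N macro are hypothetical; loop_jammed_vn
only approximates by hand what value numbering might leave behind (one load of
y[i], no reload of x[i], the intermediate store still present) and is not
compiler output.

```c
#define UJ_N 100

/* Original loop nest. */
void loop_orig(int *restrict x, const int *restrict y,
               int z[restrict UJ_N][UJ_N])
{
    for (int j = 0; j < UJ_N; ++j)
        for (int i = 0; i < UJ_N; ++i)
            x[i] += y[i] * z[j][i];
}

/* After unroll-and-jam by 2: y[i] is read twice and x[i] makes a round
   trip through memory between the two statements. */
void loop_jammed(int *restrict x, const int *restrict y,
                 int z[restrict UJ_N][UJ_N])
{
    for (int j = 0; j < UJ_N; j += 2)
        for (int i = 0; i < UJ_N; ++i) {
            x[i] += y[i] * z[j][i];
            x[i] += y[i] * z[j + 1][i];
        }
}

/* Hand-written approximation of the result after value numbering:
   y[i] and x[i] are each loaded once, but the first store remains. */
void loop_jammed_vn(int *restrict x, const int *restrict y,
                    int z[restrict UJ_N][UJ_N])
{
    for (int j = 0; j < UJ_N; j += 2)
        for (int i = 0; i < UJ_N; ++i) {
            int yi = y[i];
            int xi = x[i] + yi * z[j][i];
            x[i] = xi;                      /* redundant store survives */
            x[i] = xi + yi * z[j + 1][i];
        }
}
```

All three compute the same values; they differ only in how many loads and
stores the inner loop body performs.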

At the moment these redundancies survive till vectorisation, so if
vectorisation succeeds, we're reliant on being able to remove the
redundancies from the vector form.  This can be hard to do if
a vector loop uses predication.  E.g. on SVE we end up with:

.L3:
        ld1w    z3.s, p0/z, [x3, x0, lsl 2]
        ld1w    z0.s, p0/z, [x5, x0, lsl 2]
        ld1w    z1.s, p0/z, [x2, x0, lsl 2]
        mad     z1.s, p1/m, z0.s, z3.s
        ld1w    z2.s, p0/z, [x4, x0, lsl 2]
        st1w    z1.s, p0, [x3, x0, lsl 2]       // store to x[i]
        ld1w    z1.s, p0/z, [x3, x0, lsl 2]     // load back from x[i]
        mad     z0.s, p1/m, z2.s, z1.s
        st1w    z0.s, p0, [x3, x0, lsl 2]
        add     x0, x0, x6
        whilelo p0.s, w0, w1
        b.any   .L3

This patch runs a value-numbering pass on loops after a successful
unroll-and-jam, which gets rid of the unnecessary load and gives
a more accurate idea of vector costs.  Unfortunately the redundant
store still persists without a pre-vect DSE, but that feels like
a separate issue.

Note that the pass requires the loop to have a single exit,
hence the simple calculation of exit_bbs.

Tested on aarch64-linux-gnu so far, will test on x86_64-linux-gnu too.
OK to install if testing passes?

Richard


gcc/
* gimple-loop-jam.c: Include tree-ssa-sccvn.h.
(tree_loop_unroll_and_jam): Run value-numbering on a loop that
has been successfully unrolled.

gcc/testsuite/
* gcc.dg/unroll-10.c: New test.
---
 gcc/gimple-loop-jam.c| 14 +-
 gcc/testsuite/gcc.dg/unroll-10.c | 13 +
 2 files changed, 22 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/unroll-10.c

diff --git a/gcc/gimple-loop-jam.c b/gcc/gimple-loop-jam.c
index 4842f0dff80..544ad779dd6 100644
--- a/gcc/gimple-loop-jam.c
+++ b/gcc/gimple-loop-jam.c
@@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-data-ref.h"
 #include "tree-ssa-loop-ivopts.h"
 #include "tree-vectorizer.h"
+#include "tree-ssa-sccvn.h"
 
 /* Unroll and Jam transformation

@@ -487,7 +488,7 @@ static unsigned int
 tree_loop_unroll_and_jam (void)
 {
   class loop *loop;
-  bool changed = false;
+  unsigned int todo = 0;
 
   gcc_assert (scev_initialized_p ());
 
@@ -591,7 +592,11 @@ tree_loop_unroll_and_jam (void)
&desc);
  free_original_copy_tables ();
  fuse_loops (outer->inner);
- changed = true;
+ todo |= TODO_cleanup_cfg;
+
+ auto_bitmap exit_bbs;
+ bitmap_set_bit (exit_bbs, single_dom_exit (outer)->dest->index);
+ todo |= do_rpo_vn (cfun, loop_preheader_edge (outer), exit_bbs);
}
 
   loop_nest.release ();
@@ -599,13 +604,12 @@ tree_loop_unroll_and_jam (void)
   free_data_refs (datarefs);
 }
 
-  if (changed)
+  if (todo)
 {
   scev_reset ();
   free_dominance_info (CDI_DOMINATORS);
-  return TODO_cleanup_cfg;
 }
-  return 0;
+  return todo;
 }
 
 /* Pass boilerplate */
diff --git a/gcc/testsuite/gcc.dg/unroll-10.c b/gcc/testsuite/gcc.dg/unroll-10.c
new file mode 100644
index 000..0559915f2fc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/unroll-10.c
@@ -0,0 +1,13 @@
+/* { dg-options "-O3 -fdump-tree-unrolljam" } */
+
+void
+f (int *restrict x, int *restrict y, int z[restrict 100][100])
+{
+  for (int j = 0; j < 100; ++j)
+    for (int i = 0; i < 100; ++i)
+      x[i] += y[i] * z[j][i];
+}
+
+/* The loop should be unrolled 2 times, leaving one load from x,
+   one load from y and 2 loads from z.  */
+/* { dg-final { scan-tree-dump-times { = \(*\*} 4 "unrolljam" } } */


Re: [PATCH] correct range of stpcpy result (PR 101397)

2021-07-20 Thread Jeff Law via Gcc-patches




On 7/14/2021 7:49 PM, Martin Sebor via Gcc-patches wrote:

Access warnings look through calls to the subset of built-ins
that return one of their pointer arguments to find the object
the pointer points to and its offset.  The computation is
wrong for functions like stpcpy, stpncpy and mempcpy that
return a pointer plus some offset, and leads to a false positive
-Warray-bounds in Glibc with the recent refactoring of the warning
to take advantage of this logic.

The attached patch corrects this mistake by accounting for this
property of these functions while at the same time constraining
the offset to the size of the source argument for better
accuracy.
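
A minimal sketch of the property in question, assuming a hypothetical
sketch_stpcpy that follows the POSIX stpcpy contract (it is a stand-in so the
example does not depend on libc feature macros).  Unlike strcpy, the returned
pointer is the destination plus the length of the source, so the pattern from
the PR title, writing to the result minus 1, stays inside the destination for
any non-empty source.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in with the stpcpy contract: copy SRC including its
   terminating NUL and return a pointer to that NUL in DST. */
char *sketch_stpcpy(char *dst, const char *src)
{
    size_t len = strlen(src);
    memcpy(dst, src, len + 1);
    return dst + len;
}

/* Offset of the returned pointer from DST; equals strlen (SRC), not 0. */
ptrdiff_t stpcpy_result_offset(char *dst, const char *src)
{
    return sketch_stpcpy(dst, src) - dst;
}

/* Overwrite the last copied character: a write to "result minus 1". */
void replace_last_char(char *dst, const char *src, char c)
{
    char *end = sketch_stpcpy(dst, src);
    if (end != dst)
        end[-1] = c;
}
```

A checker that treats the return value as equal to the first argument would
see end[-1] as an access before the start of the object, which is the kind of
false positive described above.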

Tested on x86_64-linux and by also building Glibc there.

Martin

gcc-101397.diff

PR middle-end/101397 - spurious warning writing to the result of stpcpy minus 1


gcc/ChangeLog:

PR middle-end/101397
* builtins.c (gimple_call_return_array): Add argument.  Correct
offsets for memchr, mempcpy, stpcpy, and stpncpy.
(compute_objsize_r): Adjust offset computation for argument returning
built-ins.

gcc/testsuite/ChangeLog:

PR middle-end/101397
* gcc.dg/Warray-bounds-80.c: New test.
* gcc.dg/Warray-bounds-81.c: New test.
* gcc.dg/Warray-bounds-82.c: New test.
* gcc.dg/Warray-bounds-83.c: New test.
* gcc.dg/Warray-bounds-84.c: New test.
* gcc.dg/Wstringop-overflow-46.c: Adjust expected output.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 39ab139b7e1..170d776c410 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -5200,12 +5200,19 @@ get_offset_range (tree x, gimple *stmt, offset_int 
r[2], range_query *rvals)
  /* Return the argument that the call STMT to a built-in function returns
 or null if it doesn't.  On success, set OFFRNG[] to the range of offsets
 from the argument reflected in the value returned by the built-in if it
-   can be determined, otherwise to 0 and HWI_M1U respectively.  */
+   can be determined, otherwise to 0 and HWI_M1U respectively.  Set
+   *PAST_END for functions like mempcpy that might return a past the end
+   pointer (most functions return a dereferenceable pointer to an existing
+   element of an array).  */
  
  static tree

-gimple_call_return_array (gimple *stmt, offset_int offrng[2],
+gimple_call_return_array (gimple *stmt, offset_int offrng[2], bool *past_end,
  range_query *rvals)
  {
+  /* Clear and set below for the rare function(s) that might return
+ a past-the-end pointer.  */
+  *past_end = false;
+
{
  /* Check for attribute fn spec to see if the function returns one
 of its arguments.  */
@@ -5213,6 +5220,7 @@ gimple_call_return_array (gimple *stmt, offset_int 
offrng[2],
  unsigned int argno;
  if (fnspec.returns_arg (&argno))
{
+   /* Functions return the first argument (not a range).  */
offrng[0] = offrng[1] = 0;
return gimple_call_arg (stmt, argno);
}
@@ -5242,6 +5250,7 @@ gimple_call_return_array (gimple *stmt, offset_int 
offrng[2],
if (gimple_call_num_args (stmt) != 2)
return NULL_TREE;
  
+  /* Allocation functions return a pointer to the beginning.  */

offrng[0] = offrng[1] = 0;
return gimple_call_arg (stmt, 1);
  }
@@ -5253,10 +5262,6 @@ gimple_call_return_array (gimple *stmt, offset_int 
offrng[2],
  case BUILT_IN_MEMMOVE:
  case BUILT_IN_MEMMOVE_CHK:
  case BUILT_IN_MEMSET:
-case BUILT_IN_STPCPY:
-case BUILT_IN_STPCPY_CHK:
-case BUILT_IN_STPNCPY:
-case BUILT_IN_STPNCPY_CHK:
  case BUILT_IN_STRCAT:
  case BUILT_IN_STRCAT_CHK:
  case BUILT_IN_STRCPY:
@@ -5265,18 +5270,34 @@ gimple_call_return_array (gimple *stmt, offset_int 
offrng[2],
  case BUILT_IN_STRNCAT_CHK:
  case BUILT_IN_STRNCPY:
  case BUILT_IN_STRNCPY_CHK:
+  /* Functions return the first argument (not a range).  */
offrng[0] = offrng[1] = 0;
return gimple_call_arg (stmt, 0);
  
  case BUILT_IN_MEMPCPY:

  case BUILT_IN_MEMPCPY_CHK:
{
+   /* The returned pointer is in a range constrained by the smaller
+  of the upper bound of the size argument and the source object
+  size.  */
ISTM that for the MEMPCPY case the range is constrained by the size 
argument only from an implementation standpoint, but the size of the 
source or dest object can also constrain since if we overflow either 
we've gone into the realm of undefined behavior.  It's a nit for the 
comment, I don't think we need to adjust the implementation further.


OK for the trunk.
jeff



Re: [PATCH 2/4]AArch64: correct usdot vectorizer and intrinsics optabs

2021-07-20 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Thursday, July 15, 2021 8:35 PM
>> To: Tamar Christina 
>> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
>> ; Marcus Shawcroft
>> ; Kyrylo Tkachov 
>> Subject: Re: [PATCH 2/4]AArch64: correct usdot vectorizer and intrinsics
>> optabs
>> 
>> Tamar Christina  writes:
>> > Hi All,
>> >
>> > There's a slight mismatch between the vectorizer optabs and the
>> > intrinsics patterns for NEON.  The vectorizer expects operands[3] and
>> > operands[0] to be the same but the aarch64 intrinsics expanders expect
>> > operands[0] and operands[1] to be the same.
>> >
>> > This means we need different patterns here.  This adds a separate
>> > usdot vectorizer pattern which just shuffles around the RTL params.
>> >
>> > There's also an inconsistency between the usdot and (u|s)dot
>> > intrinsics RTL patterns which is not corrected here.
>> >
>> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>> >
>> > Ok for master?
>> 
>> Couldn't we just change:
>> 
>> > diff --git a/gcc/config/aarch64/arm_neon.h
>> > b/gcc/config/aarch64/arm_neon.h index
>> >
>> 00d76ea937ace5763746478cbdfadf6479e0b15a..17e059efb80fa86a8a32127ac
>> e4f
>> > c7f43e2040a8 100644
>> > --- a/gcc/config/aarch64/arm_neon.h
>> > +++ b/gcc/config/aarch64/arm_neon.h
>> > @@ -34039,14 +34039,14 @@ __extension__ extern __inline int32x2_t
>> > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>> >  vusdot_s32 (int32x2_t __r, uint8x8_t __a, int8x8_t __b)  {
>> > -  return __builtin_aarch64_usdot_prodv8qi_ssus (__r, __a, __b);
>> > +  return __builtin_aarch64_usdotv8qi_ssus (__r, __a, __b);
>> 
>> …this to __builtin_aarch64_usdot_prodv8qi_ssus (__a, __b, __r) etc.?
>
> Not easily, as I was mentioning before, Neon intrinsics have the assumption 
> that
> operands[0] and operands[1] are the same. And this goes much further than just
> the header call.
>
> The actual type is determined by the optabs and the C stubs that are 
> generated.
>
> aarch64_init_simd_builtins which creates the C function stubs starts 
> processing
> arguments from the end and on non-void functions assumes that the value at
> operands[0] be the return type. So simply moving __r will get it to think that
> the result type should be uint8x8_t.

Yeah, the mode of operand 0 (i.e. the output) determines the return type.
But that mode isn't changing, so the return type will be correct for both
input operand orders.  It works for me locally with:

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 88fa5ba5a44..5987d9af7c6 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -610,12 +610,12 @@ (define_expand "cmul3"
 ;; and so the vectorizer provides r, in which the result has to be accumulated.
 (define_insn "dot_prod"
   [(set (match_operand:VS 0 "register_operand" "=w")
-   (plus:VS (match_operand:VS 1 "register_operand" "0")
-   (unspec:VS [(match_operand: 2 "register_operand" "w")
-   (match_operand: 3 "register_operand" "w")]
-   DOTPROD)))]
+   (plus:VS (unspec:VS [(match_operand: 1 "register_operand" "w")
+(match_operand: 2 "register_operand" "w")]
+   DOTPROD)
+(match_operand:VS 3 "register_operand" "0")))]
   "TARGET_DOTPROD"
-  "dot\\t%0., %2., %3."
+  "dot\\t%0., %1., %2."
   [(set_attr "type" "neon_dot")]
 )
 
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 597f44ce106..64b6d43a1a0 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -31767,28 +31767,28 @@ __extension__ extern __inline uint32x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vdot_u32 (uint32x2_t __r, uint8x8_t __a, uint8x8_t __b)
 {
-  return __builtin_aarch64_udot_prodv8qi_ (__r, __a, __b);
+  return __builtin_aarch64_udot_prodv8qi_ (__a, __b, __r);
 }
 
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vdotq_u32 (uint32x4_t __r, uint8x16_t __a, uint8x16_t __b)
 {
-  return __builtin_aarch64_udot_prodv16qi_ (__r, __a, __b);
+  return __builtin_aarch64_udot_prodv16qi_ (__a, __b, __r);
 }
 
 __extension__ extern __inline int32x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vdot_s32 (int32x2_t __r, int8x8_t __a, int8x8_t __b)
 {
-  return __builtin_aarch64_sdot_prodv8qi (__r, __a, __b);
+  return __builtin_aarch64_sdot_prodv8qi (__a, __b, __r);
 }
 
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vdotq_s32 (int32x4_t __r, int8x16_t __a, int8x16_t __b)
 {
-  return __builtin_aarch64_sdot_prodv16qi (__r, __a, __b);
+  return __builtin_aarch64_sdot_prodv16qi (__a, __b, __r);
 }
 
 __extension__ extern __inline uint32x2_t

Thanks,
Richard
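
For readers following the operand-order discussion, here is a scalar C model
of the 64-bit unsigned dot-product accumulate (two 32-bit lanes, each summing
four byte products).  The function names are hypothetical and this is not how
the builtins are implemented; it only illustrates that the accumulator's
position in the argument list does not change the result or its type.

```c
#include <stdint.h>

/* Accumulator-first order, as in the vdot_u32 (r, a, b) intrinsic. */
void model_udot_acc_first(uint32_t r[2], const uint8_t a[8],
                          const uint8_t b[8])
{
    for (int lane = 0; lane < 2; ++lane)
        for (int k = 0; k < 4; ++k)
            r[lane] += (uint32_t)a[4 * lane + k] * b[4 * lane + k];
}

/* Accumulator-last order, as the vectorizer optab expects; same result. */
void model_udot_acc_last(const uint8_t a[8], const uint8_t b[8],
                         uint32_t r[2])
{
    model_udot_acc_first(r, a, b);
}
```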


[PATCH v2] c++tools, configury: Configure with C++; test checking status [PR98821].

2021-07-20 Thread Iain Sandoe
Hi Jakub,

thanks for the quick review,
testing is on-going.

> On 20 Jul 2021, at 16:37, Jakub Jelinek  wrote:
> 
> On Tue, Jul 20, 2021 at 04:21:34PM +0100, Iain Sandoe wrote:
> 
>> --- a/c++tools/configure.ac
>> +++ b/c++tools/configure.ac
>> @@ -41,6 +41,8 @@ MISSING=`cd $ac_aux_dir && ${PWDCMD-pwd}`/missing
>> AC_CHECK_PROGS([AUTOCONF], [autoconf], [$MISSING autoconf])
>> AC_CHECK_PROGS([AUTOHEADER], [autoheader], [$MISSING autoheader])
>> 
>> +AC_LANG(C++)
>> +
>> dnl Enabled by default
>> AC_MSG_CHECKING([whether to build C++ tools])
>>   AC_ARG_ENABLE(c++-tools, 
>> @@ -67,6 +69,62 @@ AC_MSG_RESULT([$maintainer_mode])
>> test "$maintainer_mode" = yes && MAINTAINER=yes
>> AC_SUBST(MAINTAINER)
>> 
>> +# Enable expensive internal checks
>> +is_release=
>> +if test -f $srcdir/../gcc/DEV-PHASE \
>> +   && test x"`cat $srcdir/../gcc/DEV-PHASE`" != xexperimental; then
>> +  is_release=yes
>> +fi
>> +
>> +AC_ARG_ENABLE(checking,
>> +[AS_HELP_STRING([[--enable-checking[=LIST]]],
>> +[enable expensive run-time checks.  With LIST,
>> + enable only specific categories of checks.
>> + Categories are: yes,no,all,none,release.
>> + Flags are: misc,valgrind or other strings])],
>> +[ac_checking_flags="${enableval}"],[
>> +# Determine the default checks.
>> +if test x$is_release = x ; then
>> +  ac_checking_flags=yes
>> +else
>> +  ac_checking_flags=release
>> +fi])
>> +IFS="${IFS= }"; ac_save_IFS="$IFS"; IFS="$IFS,"
>> +for check in release $ac_checking_flags
>> +do
>> +case $check in
>> +# these set all the flags to specific states
>> +yes|all) ac_checking=1 ; ac_assert_checking=1 ; ac_valgrind_checking= ;;
>> +no|none) ac_checking= ; ac_assert_checking= ; ac_valgrind_checking= ;;
>> +release) ac_checking= ; ac_assert_checking=1 ; ac_valgrind_checking= ;;
>> +# these enable particular checks
>> +assert) ac_assert_checking=1 ;;
>> +misc) ac_checking=1 ;;
>> +valgrind) ac_valgrind_checking=1 ;;
>> +# accept
>> +*) ;;
>> +esac
>> +done
>> +IFS="$ac_save_IFS"
>> +
>> +if test x$ac_checking != x ; then
>> +  AC_DEFINE(CHECKING_P, 1,
>> +[Define to 1 if you want more run-time sanity checks.])
>> +else
>> +  AC_DEFINE(CHECKING_P, 0)
>> +fi
>> +
>> +if test x$ac_assert_checking != x ; then
>> +  AC_DEFINE(ENABLE_ASSERT_CHECKING, 1,
>> +[Define if you want assertions enabled.  This is a cheap check.])
>> +fi
>> +
>> +if test x$ac_valgrind_checking != x ; then
>> +  AC_DEFINE(ENABLE_VALGRIND_CHECKING, 1,
>> +[Define if you want to workaround valgrind (a memory checker) warnings about
>> + possible memory leaks because of libcpp use of interior pointers.])
>> +fi
> 
> I guess we could simplify it, I think at least right now we only care
> about ENABLE_ASSERT_CHECKING and nothing else, so the is_release computation
> could go, make ac_checking_flags just default to yes, drop the ac_checking,
> ac_valgrind_checking variables and simplify the case so that it perhaps
> handles all we care together.  That would be
>   case $check in
>   yes|all|release|assert) ac_assert_checking=1 ; ;;
>   no|none) ac_assert_checking= ; ;;
>   *) ;;
>   esac
> or so. and then only AC_DEFINE ENABLE_ASSERT_CHECKING and from server.cc
> drop the CHECKING_P stuff.

works for me.

>> @@ -92,6 +96,35 @@ along with GCC; see the file COPYING3.  If not see
>> #define DIR_SEPARATOR '/'
>> #endif
>> 
>> +/* Imported from libcpp/system.h
>> +   Use gcc_assert(EXPR) to test invariants.  */
>> +#if ENABLE_ASSERT_CHECKING || CHECKING_P
>> +#define gcc_assert(EXPR)\
>> +   ((void)(!(EXPR) ? fancy_abort (__FILE__, __LINE__, __FUNCTION__), 0 : 0))
>> +#elif (GCC_VERSION >= 4005)
>> +#define gcc_assert(EXPR)\
>> +  ((void)(__builtin_expect (!(EXPR), 0) ? __builtin_unreachable (), 0 : 0))
>> +#else
>> +/* Include EXPR, so that unused variable warnings do not occur.  */
>> +#define gcc_assert(EXPR) ((void)(0 && (EXPR)))
>> +#endif
>> +
>> +#if CHECKING_P
>> +#define gcc_checking_assert(EXPR) gcc_assert (EXPR)
>> +#else
>> +/* N.B.: in release build EXPR is not evaluated.  */
>> +#define gcc_checking_assert(EXPR) ((void)(0 && (EXPR)))
>> +#endif
> 
> I'd drop the gcc_checking_assert macro, we don't use it…

done.

revised below,
OK now (assuming testing is successful)?

thanks
Iain



The c++tools configure fragments need to be built with a C++ compiler.

In addition, the stand-alone server uses diagnostic mechanisms in common
with GCC, but needs to define implementations for gcc_assert and
supporting output functions.

Signed-off-by: Iain Sandoe 

PR c++/98821 - modules : c++tools configures with CC but code fragments assume 
CXX.

PR c++/98821

c++tools/ChangeLog:

* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac: Configure using C++.  Pull logic to
detect enabled checking modes; defau

Re: [PATCH v2] c++tools, configury: Configure with C++; test checking status [PR98821].

2021-07-20 Thread Jakub Jelinek via Gcc-patches
On Tue, Jul 20, 2021 at 05:20:27PM +0100, Iain Sandoe wrote:
> Signed-off-by: Iain Sandoe 
> 
> PR c++/98821 - modules : c++tools configures with CC but code fragments 
> assume CXX.
> 
>   PR c++/98821
> 
> c++tools/ChangeLog:
> 
>   * config.h.in: Regenerate.
>   * configure: Regenerate.
>   * configure.ac: Configure using C++.  Pull logic to
>   detect enabled checking modes; default to release
>   checking.
>   * server.cc (AI_NUMERICSERV): Define a fallback value.
>   (gcc_assert): New.
>   (gcc_unreachable): New.
>   (fancy_abort): Only build when checking is enabled.
> 
> Co-authored-by: Jakub Jelinek 

LGTM, thanks.

Jakub



Re: [PATCH] Fix for powerpc64 long double complex divide failure

2021-07-20 Thread Patrick McGehearty via Gcc-patches

Ping...

The fix is minimal (four lines changed).
I recognize that those familiar with IBM 128-bit floating
point precision are a select set of people.
On the plus side, tests fail without the patch and pass with the patch.

- patrick


On 7/8/2021 4:24 PM, Patrick McGehearty via Gcc-patches wrote:

This patch resolves the failure of powerpc64 long double complex divide
in native ibm long double format after the patch "Practical improvement
to libgcc complex divide".
See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101104

The new code uses the following macros which are intended to be mapped
to appropriate values according to the underlying hardware representation.

RBIG     a value near the maximum representation
RMIN     a value near the minimum representation
         (but not in the subnormal range)
RMIN2    a value moderately less than 1
RMINSCAL the inverse of RMIN2
RMAX2    RBIG * RMIN2  - a value to limit scaling to not overflow

When "long double" values were not using the IEEE 128-bit format but
the traditional IBM 128-bit, the previous code used the LDBL values
which caused overflow for RMINSCAL. The new code uses the DBL values.

RBIG  LDBL_MAX = 0x1.f800p+1022
   DBL_MAX  = 0x1.f000p+1022

RMIN  LDBL_MIN = 0x1.p-969
RMIN  DBL_MIN  = 0x1.p-1022

RMIN2 LDBL_EPSILON = 0x0.1000p-1022 = 0x1.0p-1074
RMIN2 DBL_EPSILON  = 0x1.p-52

RMINSCAL 1/LDBL_EPSILON = inf (1.0p+1074 does not fit in IBM 128-bit).
  1/DBL_EPSILON  = 0x1.p+52

RMAX2 = RBIG * RMIN2 = 0x1.f800p-52
 RBIG * RMIN2 = 0x1.f000p+970

The MAX and MIN values have only modest changes since the exponent
field for IBM 128-bit floating point values is the same size as
the exponent field for IBM 64-bit floating point values. However
the EPSILON field is considerably different. Due to how small
values can be represented in the lower 64 bits of the IBM 128-bit
floating point, EPSILON is extremely small, so far beyond the
desired value that inversion of the value overflows and even
without the overflow, the RMAX2 is so small as to eliminate
most usage of the test.
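
The arithmetic can be checked with a small C sketch using plain double.  This
is only an analogy under stated assumptions: the sketch_* names are
hypothetical, <float.h> constants stand in for the __LIBGCC_DF_* macros, and
2^-1074 (the smallest subnormal double) stands in for the IBM 128-bit epsilon
quoted above.

```c
#include <float.h>

/* Values modelled on the DF-based definitions in the patch. */
double sketch_rbig(void)     { return DBL_MAX / 2; }
double sketch_rmin2(void)    { return DBL_EPSILON; }
double sketch_rminscal(void) { return 1 / DBL_EPSILON; }
double sketch_rmax2(void)    { return sketch_rbig() * sketch_rmin2(); }

/* Inverting an epsilon as small as 2^-1074 overflows: 2^1074 is larger
   than DBL_MAX, so the quotient is infinite.  Returns 1 if so. */
int tiny_epsilon_inverse_overflows(void)
{
    volatile double tiny = 0x1p-1074;
    volatile double inv = 1.0 / tiny;
    return inv > DBL_MAX;
}
```

With the double-based values, RMINSCAL is exactly 2^52 and RMAX2 lands just
below 2^971, both comfortably finite.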

In addition, the gcc support for the KF fields (IBM native long double
format) does not exist on older gcc compilers such as the default
compilers on the gcc compiler farm. That adds build complexity
for users whose environment is only a few years out of date.
Instead of just replacing the use of KF_EPSILON with DF_EPSILON,
we replace all uses of KF_* with DF_*.

The change has been tested on gcc135.fsffrance.org and gains the
expected improvements in accuracy for long double complex divide.

libgcc/
* config/rs6000/_divkc3.c (RBIG, RMIN, RMIN2, RMINSCAL, RMAX2):
Fix long double complex divide for native IBM 128-bit
---
  libgcc/config/rs6000/_divkc3.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/libgcc/config/rs6000/_divkc3.c b/libgcc/config/rs6000/_divkc3.c
index a1d29d2..2b229c8 100644
--- a/libgcc/config/rs6000/_divkc3.c
+++ b/libgcc/config/rs6000/_divkc3.c
@@ -38,10 +38,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
If not, see
  #endif
  
  #ifndef __LONG_DOUBLE_IEEE128__

-#define RBIG   (__LIBGCC_KF_MAX__ / 2)
-#define RMIN   (__LIBGCC_KF_MIN__)
-#define RMIN2  (__LIBGCC_KF_EPSILON__)
-#define RMINSCAL (1 / __LIBGCC_KF_EPSILON__)
+#define RBIG   (__LIBGCC_DF_MAX__ / 2)
+#define RMIN   (__LIBGCC_DF_MIN__)
+#define RMIN2  (__LIBGCC_DF_EPSILON__)
+#define RMINSCAL (1 / __LIBGCC_DF_EPSILON__)
  #define RMAX2  (RBIG * RMIN2)
  #else
  #define RBIG   (__LIBGCC_TF_MAX__ / 2)




Re: [PATCH] c++: implement C++17 hardware interference size

2021-07-20 Thread Jason Merrill via Gcc-patches

On 7/19/21 5:41 AM, Richard Earnshaw wrote:



On 17/07/2021 22:37, Jason Merrill via Gcc-patches wrote:

On Sat, Jul 17, 2021 at 6:55 AM Matthias Kretz  wrote:


On Saturday, 17 July 2021 15:32:42 CEST Jonathan Wakely wrote:

On Sat, 17 Jul 2021, 09:15 Matthias Kretz,  wrote:

If somebody writes a library with `keep_apart` in the public API/ABI then
you're right.

Yes, it's fine if those constants don't affect anything across module
boundaries.

I believe a significant fraction of hardware interference size usage will be
internal.

I would hope for this to be the vast majority of usage.  I want the warning
to discourage people from using the interference size variables in the
public API of a library.

The developer who wants his code to be included in a distro should care about
binary distribution. If his code has an ABI issue, that's a bug he needs to
fix. It's not the fault of the packager.

Yes but in practice it's the packagers who have to deal with the bug
reports, analyze the problem, and often fix the bug too. It might not be
the packager's fault but it's often their problem

I can imagine. But I don't think requiring users to specify the value
according to what -mtune suggests will improve things. Users will write a
configure/cmake/... macro to parse the value -mtune prints and pass that on
the command line (we'll soon find this solution on SO 😜). I.e. things are
likely to be even more broken.


Simpler would be a flag to say "set them based on -mtune", e.g.
-finterference-tuning or --param destructive-interference-size=tuning.
That would be just as easy to write as -Wno-interference-size.


Please be very careful about an option name like that.  The x86 meaning 
and interpretation of -mtune is subtly different to that of Arm and 
AArch64 and possibly other targets as well.


Also, should the behaviour of a compiler configured with --with-cpu=foo 
be handled differently to a command-line option that sets foo 
explicitly?  In the back-end I'm not sure we can really tell the 
difference.


I don't see any reason to treat them differently.  The meaning of this 
option would be "set the interference sizes to be optimal for the 
current target CPU, without regard for ABI stability".  For x86 this 
wouldn't have any effect; for Arm/AArch64 it would set them to the 
tuning L1 cache line size, if set.


Here's what I have currently:

Jason
From b10bfd228f23ef2f7499802c8fd1c84798646039 Mon Sep 17 00:00:00 2001
From: Jason Merrill 
Date: Thu, 15 Jul 2021 15:30:17 -0400
Subject: [PATCH] c++: implement C++17 hardware interference size
To: gcc-patches@gcc.gnu.org

The last missing piece of the C++17 standard library is the hardware
interference size constants.  Much of the delay in implementing these has
been due to uncertainty about what the right values are, and even whether
there is a single constant value that is suitable; the destructive
interference size is intended to be used in structure layout, so program
ABIs will depend on it.

In principle, both of these values should be the same as the target's L1
cache line size.  When compiling for a generic target that is intended to
support a range of target CPUs with different cache line sizes, the
constructive size should probably be the minimum size, and the destructive
size the maximum, unless you are constrained by ABI compatibility with
previous code.
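
As an illustration of why the destructive size leaks into ABIs, here is a
minimal C sketch of a structure laid out to keep two fields on separate cache
lines.  The struct and SKETCH_DESTRUCTIVE_SIZE are hypothetical;
__GCC_DESTRUCTIVE_SIZE is the macro this patch proposes, and the fallback of
64 is only an assumption for compilers that do not define it.

```c
#include <stddef.h>

/* Use the compiler-provided value when available; 64 is an assumed
   fallback, not a recommendation. */
#ifdef __GCC_DESTRUCTIVE_SIZE
#define SKETCH_DESTRUCTIVE_SIZE __GCC_DESTRUCTIVE_SIZE
#else
#define SKETCH_DESTRUCTIVE_SIZE 64
#endif

/* Two counters updated by different threads, kept apart to avoid
   false sharing. */
struct sketch_counters {
    _Alignas(SKETCH_DESTRUCTIVE_SIZE) long produced;
    _Alignas(SKETCH_DESTRUCTIVE_SIZE) long consumed;
};

/* Distance in bytes between the two fields. */
size_t sketch_counters_gap(void)
{
    return offsetof(struct sketch_counters, consumed)
           - offsetof(struct sketch_counters, produced);
}
```

Because the field offsets and the structure size depend on the constant, two
translation units built with different values would disagree on the layout,
which is the ABI concern raised above.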

JF Bastien's implementation proposal is summarized at
https://github.com/itanium-cxx-abi/cxx-abi/issues/74

I implement this by adding new --params for the two sizes.  Targets need to
override these values in targetm.target_option.override() to support the
feature.

64 bytes still seems correct for the x86 family.

I'm not sure why he said 64/64 for 32-bit ARM, since the Cortex A9 has a
32-byte cache line, and that seems to be the only ARM_PREFETCH_BENEFICIAL
target, so I'd think 32/64 would make more sense.

He proposed 64/128 for AArch64, but since the A64FX now has a 256B cache
line, I've changed that to 64/256.  Does that seem right?

Currently the patch does not adjust the values based on -march, as in JF's
proposal.  I'll need more guidance from the ARM/AArch64 maintainers about
how to go about that.  --param l1-cache-line-size is set based on -mtune,
but I don't think we want -mtune to change these ABI-affecting values.  Are
there -march values for which a smaller range than 64-256 makes sense?

gcc/ChangeLog:

	* params.opt: Add destructive-interference-size and
	constructive-interference-size.
	* doc/invoke.texi: Document them.
	* config/aarch64/aarch64.c (aarch64_override_options_internal):
	Set them.
	* config/arm/arm.c (arm_option_override): Set them.
	* config/i386/i386-options.c (ix86_option_override_internal):
	Set them.

gcc/c-family/ChangeLog:

	* c.opt: Add -Winterference-size.
	* c-cppbuiltin.c (cpp_atomic_builtins): Add __GCC_DESTRUCTIVE_SIZE
	and __GCC_CONSTRUCTIVE_SIZE.

gcc/cp/ChangeLog:

	* decl.c (cxx_init_decl_processing): Check
	--param *-interference-

Re: [PATCH] rs6000: Fix up easy_vector_constant_msb handling [PR101384]

2021-07-20 Thread Segher Boessenkool
On Tue, Jul 20, 2021 at 05:24:57PM +0200, Jakub Jelinek wrote:
> On Tue, Jul 20, 2021 at 10:17:08AM -0500, Segher Boessenkool wrote:
> > > I think not all of the -Wpsabi diagnostics is emitted with warning{,_at}
> > > etc. that -w disables, others are emitted with inform.
> > 
> > /* An informative note at LOCATION.  Use this for additional details on an 
> > error
> >message.  */
> > void
> > inform (location_t location, const char *gmsgid, ...)
> > 
> > So inform is misused in -Wpsabi?
> 
> I bet it is done intentionally not to trigger -Werror, e.g. i386.c uses it
> for the notes that some old GCC version had different ABI for certain
> passing than the current one.

The huge workaround that is -Werror needs more workarounds, how great :-(

> > If using it like this is deemed correct, then inhibit_warnings should
> > turn it off just like it turns off all *stronger* warnings.  The current
> > situation doesn't make much sense.
> > 
> > > The -Wno-psabi
> > > also makes it clear that it is the psabi stuff that is what the testcase
> > > cares about.  Whether -w is also needed or not is something I don't know,
> > > in the past it certainly was needed on various architectures, but maybe it
> > > got fixed and only -Wno-psabi would do the trick?
> > > If so, perhaps we could replace all -Wno-psabi -w occurrences in testsuite
> > > dg-options with just -Wno-psabi and see how far we get.
> > > find testsuite/ -type f | xargs grep -- '-w -Wno-psabi' 2>/dev/null | 
> > > grep -v ChangeLog | wc -l
> > > 49
> > > find testsuite/ -type f | xargs grep -- '-Wno-psabi -w' 2>/dev/null | 
> > > grep -v ChangeLog | wc -l
> > > 24
> > 
> > Note we will disable the -Wpsabi vector warnings for rs6000 from GCC 12
> > on.  It should have been done earlier, but we need a time machine to
> > install a time machine in the past, etc. :-)
> 
> I could understand dropping -Wpsabi warnings of the kind that some very old
> GCC version had different ABI if sufficient number of releases passed since
> then,

That is exactly what we will do, yes.

> but at least x86 also has -Wpsabi warnings that returning a certain
> vector or taking certain vector as parameter has different ABI without
> some particular ISA option.  And those options are valid all the time and
> something people should be aware, e.g. returning 16-byte vector without
> -msse, or 32-byte vector without -mavx, or 64-byte vector without -mavx512f
> - without those ISA switches they are passed/returned as generic vectors,
> while with that option in vector registers.

Yup, understood.  I'm just removing the warnings from rs6000 that are
past their usefulness (and are super annoying in practice).  I brought
it up here because that will remove the need for -Wpsabi in the
testsuite in many cases.  Maybe what I call "many" is heavily influenced
by what I run most though :-)


Segher


Re: [PATCH] unroll: Run VN on unrolled-and-jammed loops

2021-07-20 Thread Richard Biener via Gcc-patches
On July 20, 2021 5:56:35 PM GMT+02:00, Richard Sandiford via Gcc-patches 
 wrote:
>Unroll and jam can sometimes leave redundancies.  E.g. for:
>
>  for (int j = 0; j < 100; ++j)
>for (int i = 0; i < 100; ++i)
>  x[i] += y[i] * z[j][i];
>
>the new loop will do the equivalent of:
>
>  for (int j = 0; j < 100; j += 2)
>for (int i = 0; i < 100; ++i)
>  {
>x[i] += y[i] * z[j][i];
>x[i] += y[i] * z[j + 1][i];
>  }
>
>with two reads of y[i] and with a round trip through memory for x[i].
>
>At the moment these redundancies survive till vectorisation, so if
>vectorisation succeeds, we're reliant on being able to remove the
>redundancies from the vector form.  This can be hard to do if
>a vector loop uses predication.  E.g. on SVE we end up with:
>
>.L3:
>ld1wz3.s, p0/z, [x3, x0, lsl 2]
>ld1wz0.s, p0/z, [x5, x0, lsl 2]
>ld1wz1.s, p0/z, [x2, x0, lsl 2]
>mad z1.s, p1/m, z0.s, z3.s
>ld1wz2.s, p0/z, [x4, x0, lsl 2]
>st1wz1.s, p0, [x3, x0, lsl 2]// store to x[i]
>ld1wz1.s, p0/z, [x3, x0, lsl 2]  // load back from x[i]
>mad z0.s, p1/m, z2.s, z1.s
>st1wz0.s, p0, [x3, x0, lsl 2]
>add x0, x0, x6
>whilelo p0.s, w0, w1
>b.any   .L3
>
>This patch runs a value-numbering pass on loops after a successful
>unroll-and-jam, which gets rid of the unnecessary load and gives
>a more accurate idea of vector costs.  Unfortunately the redundant
>store still persists without a pre-vect DSE, but that feels like
>a separate issue.
>
>Note that the pass requires the loop to have a single exit,
>hence the simple calculation of exit_bbs.
>
>Tested on aarch64-linux-gnu so far, will test on x86_64-linux-gnu too.
>OK to install if testing passes?

Ok. 

Thanks, 
Richard. 

>Richard
>
>
>gcc/
>   * gimple-loop-jam.c: Include tree-ssa-sccvn.h.
>   (tree_loop_unroll_and_jam): Run value-numbering on a loop that
>   has been successfully unrolled.
>
>gcc/testsuite/
>   * gcc.dg/unroll-10.c: New test.
>---
> gcc/gimple-loop-jam.c| 14 +-
> gcc/testsuite/gcc.dg/unroll-10.c | 13 +
> 2 files changed, 22 insertions(+), 5 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/unroll-10.c
>
>diff --git a/gcc/gimple-loop-jam.c b/gcc/gimple-loop-jam.c
>index 4842f0dff80..544ad779dd6 100644
>--- a/gcc/gimple-loop-jam.c
>+++ b/gcc/gimple-loop-jam.c
>@@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
> #include "tree-data-ref.h"
> #include "tree-ssa-loop-ivopts.h"
> #include "tree-vectorizer.h"
>+#include "tree-ssa-sccvn.h"
> 
> /* Unroll and Jam transformation
>
>@@ -487,7 +488,7 @@ static unsigned int
> tree_loop_unroll_and_jam (void)
> {
>   class loop *loop;
>-  bool changed = false;
>+  unsigned int todo = 0;
> 
>   gcc_assert (scev_initialized_p ());
> 
>@@ -591,7 +592,11 @@ tree_loop_unroll_and_jam (void)
>   &desc);
> free_original_copy_tables ();
> fuse_loops (outer->inner);
>-changed = true;
>+todo |= TODO_cleanup_cfg;
>+
>+auto_bitmap exit_bbs;
>+bitmap_set_bit (exit_bbs, single_dom_exit (outer)->dest->index);
>+todo |= do_rpo_vn (cfun, loop_preheader_edge (outer), exit_bbs);
>   }
> 
>   loop_nest.release ();
>@@ -599,13 +604,12 @@ tree_loop_unroll_and_jam (void)
>   free_data_refs (datarefs);
> }
> 
>-  if (changed)
>+  if (todo)
> {
>   scev_reset ();
>   free_dominance_info (CDI_DOMINATORS);
>-  return TODO_cleanup_cfg;
> }
>-  return 0;
>+  return todo;
> }
> 
> /* Pass boilerplate */
>diff --git a/gcc/testsuite/gcc.dg/unroll-10.c
>b/gcc/testsuite/gcc.dg/unroll-10.c
>new file mode 100644
>index 000..0559915f2fc
>--- /dev/null
>+++ b/gcc/testsuite/gcc.dg/unroll-10.c
>@@ -0,0 +1,13 @@
>+/* { dg-options "-O3 -fdump-tree-unrolljam" } */
>+
>+void
>+f (int *restrict x, int *restrict y, int z[restrict 100][100])
>+{
>+  for (int j = 0; j < 100; ++j)
>+for (int i = 0; i < 100; ++i)
>+  x[i] += y[i] * z[j][i];
>+}
>+
>+/* The loop should be unrolled 2 times, leaving one load from x,
>+   one load from y and 2 loads from z.  */
>+/* { dg-final { scan-tree-dump-times { = \(*\*} 4 "unrolljam" } } */
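
The redundancy described in the message can be sketched in plain C++.
This is only an illustration under stated assumptions: the function
names, the use of std::array, and the hand-written "after value
numbering" form are hypothetical and are not taken from the patch.  The
third form reuses the loaded values instead of reloading y[i] and x[i],
while keeping the intermediate store that the message says survives
without a pre-vect DSE.

```cpp
#include <array>
#include <cassert>

constexpr int N = 100;
using row = std::array<int, N>;
using matrix = std::array<row, N>;

// The original loop nest from the message.
void original_nest(row &x, const row &y, const matrix &z)
{
  for (int j = 0; j < N; ++j)
    for (int i = 0; i < N; ++i)
      x[i] += y[i] * z[j][i];
}

// Outer loop unrolled by 2 and jammed: y[i] is read twice and x[i]
// makes a round trip through memory between the two statements.
void unrolled_and_jammed(row &x, const row &y, const matrix &z)
{
  for (int j = 0; j < N; j += 2)
    for (int i = 0; i < N; ++i)
      {
        x[i] += y[i] * z[j][i];
        x[i] += y[i] * z[j + 1][i];
      }
}

// Hand-written approximation of the result after value numbering:
// one load of y[i], one load of x[i], two loads of z.  The first
// store to x[i] is still there (removing it would be DSE's job).
void after_value_numbering(row &x, const row &y, const matrix &z)
{
  for (int j = 0; j < N; j += 2)
    for (int i = 0; i < N; ++i)
      {
        const int yi = y[i];
        const int first = x[i] + yi * z[j][i];
        x[i] = first;                     // redundant store persists
        x[i] = first + yi * z[j + 1][i];  // no reload of x[i] or y[i]
      }
}

// Run all three forms on the same small inputs and compare results.
bool sketch_forms_agree()
{
  static matrix z;
  row y{}, x1{}, x2{}, x3{};
  for (int j = 0; j < N; ++j)
    for (int i = 0; i < N; ++i)
      z[j][i] = (i + j) % 7;
  for (int i = 0; i < N; ++i)
    {
      y[i] = i % 5;
      x1[i] = x2[i] = x3[i] = i;
    }
  original_nest(x1, y, z);
  unrolled_and_jammed(x2, y, z);
  after_value_numbering(x3, y, z);
  return x1 == x2 && x2 == x3;
}
```

The inputs are kept small so that the signed arithmetic cannot
overflow; the three forms are expected to compute identical arrays.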



Re: [PATCH] unroll: Avoid unnecessary tail loops for constant niters

2021-07-20 Thread Richard Biener via Gcc-patches
On July 20, 2021 5:31:17 PM GMT+02:00, Richard Sandiford via Gcc-patches 
 wrote:
>unroll and jam can decide to unroll the outer loop of a nest like:
>
>  for (int j = 0; j < n; ++j)
>    for (int i = 0; i < n; ++i)
>      x[i] += __builtin_expf (y[j][i]);
>
>It then uses a tail loop to handle any left-over iterations.
>
>However, the code is structured so that this tail loop is always used.
>If n is a multiple of the unroll factor UF, the final UF iterations
>will use the tail loop rather than the unrolled loop.
>
>“Fixing” that for variable loop counts would mean introducing another
>runtime test: a branch around the tail loop if there are no more
>iterations.  There's at least an argument that the overhead of doing
>that test might not pay for itself.
>
>But we use this structure even if the iteration count is provably
>a multiple of UF at compile time.  E.g. with s/n/100/ and an
>unroll factor of 2, the first 98 iterations use the unrolled loop
>and the final 2 iterations use the original loop.
>
>This patch makes the unroller avoid a tail loop in that case.
>The end result seemed easier to follow if variables were declared
>at the point of initialisation, so that it's more obvious which
>ones are meaningful even when there's no tail loop.
>
>Tested on aarch64-linux-gnu so far, will test on x86_64-linux-gnu too.
>OK to install if testing passes?

Ok. 

Richard. 
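
The iteration split described in the quoted message can be modelled with
a small counting sketch.  This is a simplified model, not the code GCC
generates: the struct and function names are hypothetical, and the loop
bodies only count how many original iterations each loop would execute.

```cpp
#include <cassert>
#include <cstddef>

// How many original iterations run in the unrolled loop and in the tail.
struct trip_split {
  std::size_t in_unrolled;
  std::size_t in_tail;
};

// Structure with an always-used tail loop: the unrolled loop stops
// while at least one iteration is still left for the tail.
trip_split split_with_tail(std::size_t n, std::size_t uf)
{
  trip_split s{0, 0};
  std::size_t i = 0;
  for (; i + uf < n; i += uf)
    s.in_unrolled += uf;
  for (; i < n; ++i)
    ++s.in_tail;
  return s;
}

// Structure without a tail loop.  Only valid when the caller knows
// that n is a nonzero multiple of uf, as in the constant-niters case.
trip_split split_without_tail(std::size_t n, std::size_t uf)
{
  trip_split s{0, 0};
  for (std::size_t i = 0; i < n; i += uf)
    s.in_unrolled += uf;
  return s;
}
```

With n = 100 and an unroll factor of 2, the first form gives the 98/2
split mentioned in the message, and the second form runs all 100
iterations in the unrolled loop.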

>Richard
>
>
>gcc/
>   * tree-ssa-loop-manip.c (determine_exit_conditions): Return a null
>   exit condition if no tail loop is needed, and if the original exit
>   condition should therefore be kept as-is.
>   (tree_transform_and_unroll_loop): Handle that case here too.
>
>gcc/testsuite/
>   * gcc.dg/unroll-9.c: New test.
>---
> gcc/testsuite/gcc.dg/unroll-9.c |  12 ++
> gcc/tree-ssa-loop-manip.c   | 306 +---
> 2 files changed, 176 insertions(+), 142 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/unroll-9.c
>
>diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
>index 28ae1316fa0..41f9872ca10 100644
>--- a/gcc/tree-ssa-loop-manip.c
>+++ b/gcc/tree-ssa-loop-manip.c
>@@ -997,8 +997,10 @@ can_unroll_loop_p (class loop *loop, unsigned
>factor,
>/* Determines the conditions that control execution of LOOP unrolled
>FACTOR
>times.  DESC is number of iterations of LOOP.  ENTER_COND is set to
>condition that must be true if the main loop can be entered.
>+   If the loop does not always iterate an exact multiple of FACTOR
>times,
>EXIT_BASE, EXIT_STEP, EXIT_CMP and EXIT_BOUND are set to values
>describing
>-   how the exit from the unrolled loop should be controlled.  */
>+   how the exit from the unrolled loop should be controlled. 
>Otherwise,
>+   the trees are set to null and EXIT_CMP is set to ERROR_MARK.  */
> 
> static void
>determine_exit_conditions (class loop *loop, class tree_niter_desc
>*desc,
>@@ -1079,6 +1081,16 @@ determine_exit_conditions (class loop *loop,
>class tree_niter_desc *desc,
>   assum = fold_build2 (cmp, boolean_type_node, base, bound);
>   cond = fold_build2 (TRUTH_AND_EXPR, boolean_type_node, assum, cond);
> 
>+  if (integer_nonzerop (cond)
>+  && integer_zerop (desc->may_be_zero))
>+{
>+  /* Convert the latch count to an iteration count.  */
>+  tree niter = fold_build2 (PLUS_EXPR, type, desc->niter,
>+  build_one_cst (type));
>+  if (multiple_of_p (type, niter, bigstep))
>+  return;
>+}
>+
>cond = force_gimple_operand (unshare_expr (cond), &stmts, false,
>NULL_TREE);
>   if (stmts)
>  gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop), stmts);
>@@ -1234,137 +1246,138 @@ tree_transform_and_unroll_loop (class loop
>*loop, unsigned factor,
>   transform_callback transform,
>   void *data)
> {
>-  gcond *exit_if;
>-  tree ctr_before, ctr_after;
>-  tree enter_main_cond, exit_base, exit_step, exit_bound;
>-  enum tree_code exit_cmp;
>-  gphi *phi_old_loop, *phi_new_loop, *phi_rest;
>-  gphi_iterator psi_old_loop, psi_new_loop;
>-  tree init, next, new_init;
>-  class loop *new_loop;
>-  basic_block rest, exit_bb;
>-  edge old_entry, new_entry, old_latch, precond_edge, new_exit;
>-  edge new_nonexit, e;
>-  gimple_stmt_iterator bsi;
>-  use_operand_p op;
>-  bool ok;
>-  unsigned i;
>-  profile_probability prob, prob_entry, scale_unrolled;
>-  profile_count freq_e, freq_h;
>   gcov_type new_est_niter = niter_for_unrolled_loop (loop, factor);
>unsigned irr = loop_preheader_edge (loop)->flags &
>EDGE_IRREDUCIBLE_LOOP;
>-  auto_vec to_remove;
> 
>+  enum tree_code exit_cmp;
>+  tree enter_main_cond, exit_base, exit_step, exit_bound;
>   determine_exit_conditions (loop, desc, factor,
>&enter_main_cond, &exit_base, &exit_step,
>&exit_cmp, &exit_bound);
>+  bool single_loop_p = !exit_base;
> 
>/* Let us assume that the unrolled loop is quite likely to be entered. 
>*/
>+  pro

Re: [PATCH] c++: implement C++17 hardware interference size

2021-07-20 Thread Thomas Rodgers

On 2021-07-17 06:32, Jonathan Wakely via Gcc-patches wrote:


On Sat, 17 Jul 2021, 09:15 Matthias Kretz,  wrote:

On Friday, 16 July 2021 21:58:36 CEST Jonathan Wakely wrote:
On Fri, 16 Jul 2021 at 20:26, Matthias Kretz wrote:
On Friday, 16 July 2021 18:54:30 CEST Jonathan Wakely wrote:
On Fri, 16 Jul 2021 at 16:33, Jason Merrill wrote:

Adjusting them based on tuning would certainly simplify a significant
use case, perhaps the only reasonable use.  Cases more concerned with
ABI stability probably shouldn't use them at all. And that would mean
not needing to worry about the impossible task of finding the right
values for an entire architecture.

But it would be quite a significant change in behaviour if -mtune
started affecting ABI, wouldn't it?


For existing code -mtune still doesn't affect ABI.
True, because existing code isn't using the constants.


The users who write

struct keep_apart {
  alignas(std::hardware_destructive_interference_size) std::atomic cat;
  alignas(std::hardware_destructive_interference_size) std::atomic dog;
};

*want* to have different sizeof(keep_apart) depending on the CPU the code
is compiled for. I.e. they *ask* for getting their ABI broken.
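
A self-contained sketch of why the layout follows the constant.  The
element type int, the fallback value 64, and the helper names are
assumptions made for illustration; the thread does not specify them.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <new>

// Use the library constant when it is provided; otherwise fall back
// to 64, which is only an assumed placeholder value.
#ifdef __cpp_lib_hardware_interference_size
constexpr std::size_t line_size = std::hardware_destructive_interference_size;
#else
constexpr std::size_t line_size = 64;
#endif

struct keep_apart {
  alignas(line_size) std::atomic<int> cat;
  alignas(line_size) std::atomic<int> dog;
};

// Both the alignment and the size of the struct follow the constant,
// so a different value for it means a different ABI for keep_apart.
static_assert(alignof(keep_apart) == line_size, "alignment follows constant");
static_assert(sizeof(keep_apart) == 2 * line_size, "size follows constant");

// Distance in bytes between the two members of one object.
std::size_t member_distance()
{
  keep_apart k;
  const char *first = reinterpret_cast<const char *>(&k.cat);
  const char *second = reinterpret_cast<const char *>(&k.dog);
  return static_cast<std::size_t>(second - first);
}
```

If two translation units see different values of the constant, they
disagree about sizeof(keep_apart) and about the offset of dog, which is
the ABI concern raised in the thread.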


Right, but the person who wants that and the person who chooses the
-mtune option might be different people.


Yes. But it was the intent of the person who wrote the code that the person
compiling the code can change the data layout of keep_apart via -mtune. Of
course, if the one compiling doesn't want to choose because the binary needs
to work on the widest range of systems, then there's a problem we might want
to solve (direction of target_clones?). (Or the developer of the library
solves it by providing the ABI for all possible interference_size values.)



A distro might add -mtune=core2 to all package builds by default, not
expecting it to cause ABI changes. Some header in a package in the
distro might start using the constants. Now everybody who includes
that header needs to use the same -mtune option as the distro default.


If somebody writes a library with `keep_apart` in the public API/ABI then
you're right.

Yes, it's fine if those constants don't affect anything across module
boundaries.


That change in the behaviour and expected use of an existing option
seems scary to me. Even with a warning about using the constants
(because somebody's just going to use #pragma around their use of the
constants to disable the warning, and now the ABI impact of -mtune is
much less obvious).


There are people who say that linking TUs compiled with different compiler
flags is UB. In general I think that's correct, but we can make explicit
exceptions. Up to now -mtune wouldn't lead to UB, AFAIK, though -march easily
does. So maybe, to keep the status quo, the constants should be tied to -march
not -mtune?


It's much less scary in a world where the code is written and used by
the same group of people, but for something like a linux distro it
worries me.


The developer who wants his code to be included in a distro should care about
binary distribution. If his code has an ABI issue, that's a bug he needs to
fix. It's not the fault of the packager.


Yes but in practice it's the packagers who have to deal with the bug
reports, analyze the problem, and often fix the bug too. It might not be
the packager's fault but it's often their problem :-(

Apropos of nothing, I can absolutely see the use of this creeping into 
Boost at some point.


Re: [PING][PATCH] define auto_vec copy ctor and assignment (PR 90904)

2021-07-20 Thread Martin Sebor via Gcc-patches

On 7/14/21 10:23 AM, Jason Merrill wrote:

On 7/14/21 10:46 AM, Martin Sebor wrote:

On 7/13/21 9:39 PM, Jason Merrill wrote:

On 7/13/21 4:02 PM, Martin Sebor wrote:

On 7/13/21 12:37 PM, Jason Merrill wrote:

On 7/13/21 10:08 AM, Jonathan Wakely wrote:

On Mon, 12 Jul 2021 at 12:02, Richard Biener wrote:

Somebody with more C++ knowledge than me needs to approve the
vec.h changes - I don't feel competent to assess all effects of 
the change.


They look OK to me except for:

-extern vnull vNULL;
+static constexpr vnull vNULL{ };

Making vNULL have static linkage can make it an ODR violation to use
vNULL in templates and inline functions, because different
instantiations will refer to a different "vNULL" in each translation
unit.


The ODR says this is OK because it's a literal constant with the 
same value (6.2/12.2.1).


But it would be better without the explicit 'static'; then in C++17 
it's implicitly inline instead of static.


I'll remove the static.



But then, do we really want to keep vNULL at all?  It's a weird 
blurring of the object/pointer boundary that is also dependent on 
vec being a thin wrapper around a pointer.  In almost all cases it 
can be replaced with {}; one exception is == comparison, where it 
seems to be testing that the embedded pointer is null, which is a 
weird thing to want to test.


The one use case I know of for vNULL where I can't think of
an equally good substitute is in passing a vec as an argument by
value.  The only way to do that that I can think of is to name
the full vec type (i.e., the specialization) which is more typing
and less generic than vNULL.  I don't use vNULL myself so I wouldn't
miss this trick if it were to be removed but others might feel
differently.


In C++11, it can be replaced by {} in that context as well.


Cool.  I thought I'd tried { } here but I guess not.
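
A minimal sketch of passing an empty vector by value with {} instead of
a named null object.  The thin_vec type and the function names are
hypothetical stand-ins; this is not GCC's vec and does not reproduce
its API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical thin wrapper around a pointer and a length.
template<typename T>
struct thin_vec {
  T *data = nullptr;
  std::size_t len = 0;
};

// Takes the wrapper by value, like the use case discussed above.
std::size_t count_elements(thin_vec<int> v)
{
  return v.len;
}

// The argument is written as {}: the parameter is list-initialized
// from an empty brace list, so the full specialization is not named.
std::size_t call_with_empty_braces()
{
  return count_elements({});
}
```

The call site stays generic in the sense that it never spells
thin_vec<int>; the parameter type alone decides what {} creates.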




If not, I'm all for getting rid of vNULL but with over 350 uses
of it left, unless there's some clever trick to make the removal
(mostly) effortless and seamless, I'd much rather do it independently
of this initial change. I also don't know if I can commit to making
all this cleanup.


I already have a patch to replace all but one use of vNULL, but I'll 
hold off with it until after your patch.


So what's the next step?  The patch only removes a few uses of vNULL
but doesn't add any.  Is it good to go as is (without the static and
with the additional const changes Richard suggested)?  This patch is
attached to my reply to Richard:
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575199.html


As Richard wrote:


The pieces where you change vec<> passing to const vec<>& and the few
where you change vec<> * to const vec<> * are OK - this should make the
rest a smaller piece to review.


Please go ahead and apply those changes and send a new patch with the 
remainder of the changes.


I have just pushed r12-2418:
https://gcc.gnu.org/pipermail/gcc-cvs/2021-July/350886.html



A few other comments:


-   omp_declare_simd_clauses);
+   *omp_declare_simd_clauses);


Instead of doing this indirection in all of the callers, let's change 
c_finish_omp_declare_simd to take a pointer as well, and do the 
indirection in initializing a reference variable at the top of the 
function.


Okay.




+    sched_init_luids (bbs.to_vec ());
+    haifa_init_h_i_d (bbs.to_vec ());


Why are these to_vec changes needed when you are also changing the 
functions to take const&?


Calling to_vec() here isn't necessary so I've removed it.




-  vec checks = LOOP_VINFO_CHECK_NONZERO (loop_vinfo);
+  vec checks = LOOP_VINFO_CHECK_NONZERO (loop_vinfo).to_vec ();


Why not use a reference here and in other similar spots?


Sure, that works too.

Attached is what's left of the original changes now that r12-2418
has been applied.

Martin
Disable implicit conversion from auto_vec to vec.

gcc/c/ChangeLog:

	* c-parser.c (c_finish_omp_declare_simd): Adjust to vec change.
	(c_parser_omp_declare_simd): Same.
	* c-tree.h (c_build_function_call_vec): Same.
	* c-typeck.c (c_build_function_call_vec): Same.

gcc/ChangeLog:

	* dominance.c (prune_bbs_to_update_dominators): Adjust to vec change.
	(iterate_fix_dominators): Same.
	* dominance.h (iterate_fix_dominators): Same.
	* ipa-prop.h:
	* tree-ssa-pre.c (insert_into_preds_of_block): Same.
	* tree-vect-data-refs.c (vect_check_nonzero_value): Same.
	(vect_enhance_data_refs_alignment): Same.
	(vect_check_lower_bound): Same.
	(vect_prune_runtime_alias_test_list): Same.
	(vect_permute_store_chain): Same.
	* tree-vect-slp-patterns.c (vect_normalize_conj_loc): Same.
	* tree-vect-stmts.c (vect_create_vectorized_demotion_stmts): Same.
	* tree-vectorizer.h (vect_permute_store_chain): Same.
	* vec.c (test_init): New.
	(vec_c_tests): Call test_init.
	* vec.h (struct vnull): Simplify.
	(auto_vec::to_vec): New member function.
	(vl_ptr>::copy): Use value initialization.

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 9a56e0c04c6..fa3c

[PATCH v3] Add QI vector mode support to by-pieces for memset

2021-07-20 Thread H.J. Lu via Gcc-patches
1. Replace scalar_int_mode with fixed_size_mode in the by-pieces
infrastructure to allow non-integer mode.
2. Rename widest_int_mode_for_size to widest_fixed_size_mode_for_size
to return QI vector mode for memset.
3. Add op_by_pieces_d::smallest_fixed_size_mode_for_size to return the
smallest integer or QI vector mode.
4. Remove clear_by_pieces_1 and use builtin_memset_read_str in
clear_by_pieces to support vector mode broadcast.
5. Add lowpart_subreg_regno, a wrapper around simplify_subreg_regno that
uses subreg_lowpart_offset (mode, prev_mode) as the offset.
6. Add TARGET_GEN_MEMSET_SCRATCH_RTX to allow the backend to use a hard
scratch register to avoid stack realignment when expanding memset.

gcc/

PR middle-end/90773
* builtins.c (builtin_memcpy_read_str): Change the mode argument
from scalar_int_mode to fixed_size_mode.
(builtin_strncpy_read_str): Likewise.
(gen_memset_value_from_prev): New function.
(gen_memset_broadcast): Likewise.
(builtin_memset_read_str): Change the mode argument from
scalar_int_mode to fixed_size_mode.  Use gen_memset_value_from_prev
and gen_memset_broadcast.
(builtin_memset_gen_str): Likewise.
(try_store_by_multiple_pieces): Use by_pieces_constfn to declare
constfun.
* builtins.h (builtin_strncpy_read_str): Replace scalar_int_mode
with fixed_size_mode.
(builtin_memset_read_str): Likewise.
* expr.c (widest_int_mode_for_size): Renamed to ...
(widest_fixed_size_mode_for_size): Add a bool argument to
indicate if QI vector mode can be used.
(by_pieces_ninsns): Call widest_fixed_size_mode_for_size
instead of widest_int_mode_for_size.
(pieces_addr::adjust): Change the mode argument from
scalar_int_mode to fixed_size_mode.
(op_by_pieces_d): Make m_len read-only.  Add a bool member,
m_qi_vector_mode, to indicate that QI vector mode can be used.
(op_by_pieces_d::op_by_pieces_d): Add a bool argument to
initialize m_qi_vector_mode.  Call widest_fixed_size_mode_for_size
instead of widest_int_mode_for_size.
(op_by_pieces_d::get_usable_mode): Change the mode argument from
scalar_int_mode to fixed_size_mode.  Call
widest_fixed_size_mode_for_size instead of
widest_int_mode_for_size.
(op_by_pieces_d::smallest_fixed_size_mode_for_size): New member
function to return the smallest integer or QI vector mode.
(op_by_pieces_d::run): Call widest_fixed_size_mode_for_size
instead of widest_int_mode_for_size.  Call
smallest_fixed_size_mode_for_size instead of
smallest_int_mode_for_size.
(store_by_pieces_d::store_by_pieces_d): Add a bool argument to
indicate that QI vector mode can be used and pass it to
op_by_pieces_d::op_by_pieces_d.
(can_store_by_pieces): Call widest_fixed_size_mode_for_size
instead of widest_int_mode_for_size.
(store_by_pieces): Pass memsetp to
store_by_pieces_d::store_by_pieces_d.
(clear_by_pieces_1): Removed.
(clear_by_pieces): Replace clear_by_pieces_1 with
builtin_memset_read_str and pass true to store_by_pieces_d to
support vector mode broadcast.
(string_cst_read_str): Change the mode argument from
scalar_int_mode to fixed_size_mode.
* expr.h (by_pieces_constfn): Change scalar_int_mode to
fixed_size_mode.
(by_pieces_prev): Likewise.
* rtl.h (lowpart_subreg_regno): New.
* rtlanal.c (lowpart_subreg_regno): New.  A wrapper around
simplify_subreg_regno.
* target.def (gen_memset_scratch_rtx): New hook.
* doc/tm.texi.in: Add TARGET_GEN_MEMSET_SCRATCH_RTX.
* doc/tm.texi: Regenerated.

gcc/testsuite/

* gcc.target/i386/pr100865-3.c: Expect vmovdqu8 instead of
vmovdqu.
* gcc.target/i386/pr100865-4b.c: Likewise.
---
 gcc/builtins.c  | 150 +
 gcc/builtins.h  |   4 +-
 gcc/doc/tm.texi |   7 +
 gcc/doc/tm.texi.in  |   2 +
 gcc/expr.c  | 170 ++--
 gcc/expr.h  |   4 +-
 gcc/rtl.h   |   2 +
 gcc/rtlanal.c   |  11 ++
 gcc/target.def  |   9 ++
 gcc/testsuite/gcc.target/i386/pr100865-3.c  |   2 +-
 gcc/testsuite/gcc.target/i386/pr100865-4b.c |   2 +-
 11 files changed, 274 insertions(+), 89 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 39ab139b7e1..1972301ce3c 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3890,13 +3890,16 @@ expand_builtin_strnlen (tree exp, rtx target, 
machine_mode target_mode)
 
 static rtx
 builtin_memcpy_read_str (void *data, void *, HOST_WIDE_INT offset,
- 

Re: [PATCH] Add QI vector mode support to by-pieces for memset

2021-07-20 Thread H.J. Lu via Gcc-patches
On Tue, Jul 20, 2021 at 8:12 AM Richard Sandiford
 wrote:
>
> Richard Sandiford via Gcc-patches  writes:
> > "H.J. Lu via Gcc-patches"  writes:
> >> On Mon, Jul 19, 2021 at 11:38 PM Richard Sandiford
> >>  wrote:
> >>>
> >>> "H.J. Lu via Gcc-patches"  writes:
> >>> >> > + {
> >>> >> > +   /* First generate subreg of word mode if the previous mode is
> >>> >> > +  wider than word mode and word mode is wider than MODE.  */
> >>> >> > +   prev_rtx = simplify_gen_subreg (word_mode, prev_rtx,
> >>> >> > +   prev_mode, 0);
> >>> >> > +   prev_mode = word_mode;
> >>> >> > + }
> >>> >> > +  if (prev_rtx != nullptr)
> >>> >> > + target = simplify_gen_subreg (mode, prev_rtx, prev_mode, 0);
> >>> >>
> >>> >> This should be lowpart_subreg, since 0 isn't the right offset for
> >>> >> big-endian targets.  Using lowpart_subreg should also avoid the need
> >>> >> for the word_size “if” above: lowpart_subreg can handle lowpart subword
> >>> >> subregs of multiword values.
> >>> >
> >>> > I tried it.  It didn't work since it caused the LRA failure.   I 
> >>> > replaced
> >>> > simplify_gen_subreg with lowpart_subreg instead.
> >>>
> >>> What specifically went wrong?
> >>
> >> With vector broadcast, for
> >> ---
> >> extern void *ops;
> >>
> >> void
> >> foo (int c)
> >> {
> >>   __builtin_memset (ops, c, 18);
> >> }
> >> ---
> >> we generate HI from V16QI.   With a single lowpart_subreg, I get
> >>
> >> (insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84)
> >> (const_int 16 [0x10])) [0 MEM  [(void
> >> *)ops.0_1]+16 S2 A8])
> >> (subreg:HI (reg:V16QI 51 xmm15) 0)) "s2a.i":6:3 76 
> >> {*movhi_internal}
> >>  (nil))
> >>
> >> instead of
> >>
> >> (insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84)
> >> (const_int 16 [0x10])) [0 MEM  [(void
> >> *)ops.0_1]+16 S2 A8])
> >> (subreg:HI (reg:DI 51 xmm15) 0)) "s2a.i":6:3 76 {*movhi_internal}
> >>  (nil))
> >>
> >> IRA and LRA fail to reload:
> >>
> >> (insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84)
> >> (const_int 16 [0x10])) [0 MEM  [(void
> >> *)ops.0_1]+16 S2 A8])
> >> (subreg:HI (reg:V16QI 51 xmm15) 0)) "s2a.i":6:3 76 
> >> {*movhi_internal}
> >>  (nil))
> >>
> >> since ix86_can_change_mode_class has
> >>
> >>   if (MAYBE_SSE_CLASS_P (regclass) || MAYBE_MMX_CLASS_P (regclass))
> >> {
> >>   /* Vector registers do not support QI or HImode loads.  If we don't
> >>  disallow a change to these modes, reload will assume it's ok to
> >>  drop the subreg from (subreg:SI (reg:HI 100) 0).  This affects
> >>  the vec_dupv4hi pattern.  */
> >>   if (GET_MODE_SIZE (from) < 4)
> >> return false;
> >> }
> >
> > Ah!  OK.  In that case, maybe we should have something like:
> >
> >if (REG_P (prev_rtx)
> >&& HARD_REGISTER_P (prev_rtx)
> >&& REG_CAN_CHANGE_MODE_P (REGNO (prev_rtx), prev->mode, mode))
>
> Sorry, make that last line:
>
>   && lowpart_subreg_regno (REGNO (prev_rtx), prev->mode, mode) < 0
>
> where lowpart_subreg_regno is a new wrapper around simplify_subreg_regno
> that uses subreg_lowpart_offset (mode, prev->mode) as the offset.

Fixed.  I submitted the v3 patch:

https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575670.html

Thanks.

> Thanks,
> Richard
>
> >  prev_rtx = copy_to_reg (prev_rtx);
> >
> > and then just have the single lowpart_subreg after that.
> >
> > Thanks,
> > Richard



-- 
H.J.
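
The point that 0 is not the right lowpart offset on big-endian targets
can be illustrated on the host with plain byte copies.  This is a
simplified model with hypothetical helper names: it assumes a uniformly
little-endian or big-endian layout and is not the GCC
subreg_lowpart_offset implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Byte offset of the low-order outer_size bytes inside an
// inner_size-byte value, for a uniformly little- or big-endian layout.
std::size_t lowpart_byte_offset(std::size_t inner_size,
                                std::size_t outer_size,
                                bool big_endian)
{
  return big_endian ? inner_size - outer_size : 0;
}

// Detect the host byte order by inspecting the first byte of 1.
bool host_is_big_endian()
{
  const std::uint16_t probe = 1;
  unsigned char first_byte;
  std::memcpy(&first_byte, &probe, 1);
  return first_byte == 0;
}

// Extract the low 16 bits of a 64-bit value by copying bytes from the
// computed offset, the way a lowpart subreg selects its bytes.
std::uint16_t low_half_word(std::uint64_t value)
{
  unsigned char bytes[sizeof value];
  std::memcpy(bytes, &value, sizeof value);
  const std::size_t offset
    = lowpart_byte_offset(sizeof value, sizeof(std::uint16_t),
                          host_is_big_endian());
  std::uint16_t low;
  std::memcpy(&low, bytes + offset, sizeof low);
  return low;
}
```

Using offset 0 unconditionally would pick the high-order bytes on a
big-endian host, which is why a lowpart helper is preferred over a
hard-coded 0.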


Re: PING 2 [PATCH] handle sanitizer built-ins in -Wuninitialized (PR 101300)

2021-07-20 Thread Martin Sebor via Gcc-patches

On 7/20/21 4:48 AM, Jeff Law wrote:



On 7/19/2021 6:01 PM, Martin Sebor via Gcc-patches wrote:

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574385.html

On 7/12/21 12:06 PM, Martin Sebor wrote:

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574385.html

On 7/2/21 1:21 PM, Martin Sebor wrote:

To avoid a class of false negatives for sanitized code,
-Wuninitialized recognizes that the ASAN_MARK internal function
doesn't modify its argument.  But the warning code doesn't do
the same for any sanitizer built-ins even though they don't
modify user-supplied arguments either.  This leaves another
class of false negatives unresolved.

The attached fix enhances the warning logic to recognize all
sanitizer built-ins as well and treat them as non-modifying.

Tested on x86_64-linux.

OK after fixing the "pointets" -> "pointers" typo.


Done and pushed in r12-2420.

Martin


Re: [PATCH] Fold bswap32(x) != 0 to x != 0 (and related transforms)

2021-07-20 Thread Jeff Law via Gcc-patches




On 7/18/2021 4:03 PM, Marc Glisse wrote:

On Sun, 18 Jul 2021, Roger Sayle wrote:


+    (if (GIMPLE || !TREE_SIDE_EFFECTS (@0))


I don't think you need to worry about that, the general genmatch 
machinery is already supposed to take care of it. All the existing 
cases in match.pd are about cond_expr, where counting the occurrences 
of each @i is not reliable.

OK with those tests removed.

Jeff
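
The equivalence named in the subject line can be checked with a short
sketch.  The function names are hypothetical; __builtin_bswap32 is the
GCC/Clang built-in, and the sample values are arbitrary.

```cpp
#include <cassert>
#include <cstdint>

// Zero test on the byte-swapped value, as written before the fold.
bool nonzero_before_fold(std::uint32_t x)
{
  return __builtin_bswap32(x) != 0;
}

// Zero test on the original value, as written after the fold.
bool nonzero_after_fold(std::uint32_t x)
{
  return x != 0;
}

// Byte swapping only permutes bytes, so a value is zero exactly when
// its swapped form is zero.  Check that on a few sample inputs.
bool fold_agrees_on_samples()
{
  const std::uint32_t samples[] = {
    0u, 1u, 0x80u, 0xff00u, 0x00010000u, 0x80000000u, 0xffffffffu
  };
  for (std::uint32_t x : samples)
    if (nonzero_before_fold(x) != nonzero_after_fold(x))
      return false;
  return true;
}
```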




[committed] libstdc++: Fix create_directories to resolve symlinks [PR101510]

2021-07-20 Thread Jonathan Wakely via Gcc-patches

On 20/07/21 12:59 +0100, Jonathan Wakely wrote:

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/101510
* src/c++17/fs_ops.cc (create_dir): Adjust whitespace.
* testsuite/27_io/filesystem/operations/create_directory.cc:
Test creating directory with name of existing symlink to
directory.
* testsuite/experimental/filesystem/operations/create_directory.cc:
Likewise.



It turned out this bug report wasn't actually about create_directory,
but create_directories, which does have a bug.

When filesystem::create_directories checks to see if the path already
exists and resolves to a directory, it uses filesystem::symlink_status,
which means it reports an error if the path is a symlink. It should use
filesystem::status, so that the target directory is detected, and no
error is reported.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/101510

* src/c++17/fs_ops.cc (fs::create_directories): Use status
instead of symlink_status.
* src/filesystem/ops.cc (fs::create_directories): Likewise.
* testsuite/27_io/filesystem/operations/create_directories.cc:
* testsuite/27_io/filesystem/operations/create_directory.cc: Do
not test with symlinks on Windows.
* testsuite/experimental/filesystem/operations/create_directories.cc:
* testsuite/experimental/filesystem/operations/create_directory.cc:
Do not test with symlinks on Windows.


Tested powerpc64le-linux. Committed to trunk.
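
The difference between the two status calls can be seen in a small
sketch.  It assumes a POSIX-like system where symlinks can be created;
the directory name under the temporary directory and the struct and
function names are hypothetical.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

struct link_probe {
  bool target_is_dir;       // is_directory(status(link))
  bool link_itself_is_dir;  // is_directory(symlink_status(link))
  bool link_is_symlink;     // is_symlink(symlink_status(link))
};

// Create root/dir and a symlink root/link -> dir, then query the link
// both ways.  status follows the symlink; symlink_status does not.
link_probe probe_symlink_to_dir()
{
  const fs::path root = fs::temp_directory_path() / "pr101510_sketch_dir";
  fs::remove_all(root);
  fs::create_directories(root / "dir");
  const fs::path link = root / "link";
  fs::create_directory_symlink("dir", link);

  link_probe result{};
  result.target_is_dir = fs::is_directory(fs::status(link));
  result.link_itself_is_dir = fs::is_directory(fs::symlink_status(link));
  result.link_is_symlink = fs::is_symlink(fs::symlink_status(link));

  fs::remove_all(root);
  return result;
}
```

Because symlink_status reports the link itself, an "is this already a
directory?" check based on it misses a symlink that points to a
directory; status reports the target and so detects it.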


commit 124eaa50e0a34f5f89572c1aa812c50979da58fc
Author: Jonathan Wakely 
Date:   Tue Jul 20 18:15:48 2021

libstdc++: Fix create_directories to resolve symlinks [PR101510]

When filesystem::create_directories checks to see if the path already
exists and resolves to a directory, it uses filesystem::symlink_status,
which means it reports an error if the path is a symlink. It should use
filesystem::status, so that the target directory is detected, and no
error is reported.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/101510
* src/c++17/fs_ops.cc (fs::create_directories): Use status
instead of symlink_status.
* src/filesystem/ops.cc (fs::create_directories): Likewise.
* testsuite/27_io/filesystem/operations/create_directories.cc:
* testsuite/27_io/filesystem/operations/create_directory.cc: Do
not test with symlinks on Windows.
* testsuite/experimental/filesystem/operations/create_directories.cc:
* testsuite/experimental/filesystem/operations/create_directory.cc:
Do not test with symlinks on Windows.

diff --git a/libstdc++-v3/src/c++17/fs_ops.cc b/libstdc++-v3/src/c++17/fs_ops.cc
index cec76446f06..ceaf0291d64 100644
--- a/libstdc++-v3/src/c++17/fs_ops.cc
+++ b/libstdc++-v3/src/c++17/fs_ops.cc
@@ -496,7 +496,7 @@ fs::create_directories(const path& p, error_code& ec)
   return false;
 }
 
-  file_status st = symlink_status(p, ec);
+  file_status st = status(p, ec);
   if (is_directory(st))
 return false;
   else if (ec && !status_known(st))
diff --git a/libstdc++-v3/src/filesystem/ops.cc b/libstdc++-v3/src/filesystem/ops.cc
index c400376d224..7c5b164fb7f 100644
--- a/libstdc++-v3/src/filesystem/ops.cc
+++ b/libstdc++-v3/src/filesystem/ops.cc
@@ -426,7 +426,7 @@ fs::create_directories(const path& p, error_code& ec) noexcept
   return false;
 }
 
-  file_status st = symlink_status(p, ec);
+  file_status st = status(p, ec);
   if (is_directory(st))
 return false;
   else if (ec && !status_known(st))
diff --git a/libstdc++-v3/testsuite/27_io/filesystem/operations/create_directories.cc b/libstdc++-v3/testsuite/27_io/filesystem/operations/create_directories.cc
index 393d6a55309..304c1453afe 100644
--- a/libstdc++-v3/testsuite/27_io/filesystem/operations/create_directories.cc
+++ b/libstdc++-v3/testsuite/27_io/filesystem/operations/create_directories.cc
@@ -145,10 +145,33 @@ test03()
   remove_all(p);
 }
 
+void
+test04()
+{
+#if defined(__MINGW32__) || defined(__MINGW64__)
+  // no symlinks
+#else
+  // PR libstdc++/101510
+  // create_directories reports an error if the path is a symlink to a dir
+  std::error_code ec = make_error_code(std::errc::invalid_argument);
+  const auto p = __gnu_test::nonexistent_path() / "";
+  fs::create_directories(p/"dir");
+  auto link = p/"link";
+  fs::create_directory_symlink("dir", link);
+  bool created = fs::create_directories(link, ec);
+  VERIFY( !created );
+  VERIFY( !ec );
+  created = fs::create_directories(link);
+  VERIFY( !created );
+  remove_all(p);
+#endif
+}
+
 int
 main()
 {
   test01();
   test02();
   test03();
+  test04();
 }
diff --git a/libstdc++-v3/testsuite/27_io/filesystem/operations/create_directory.cc b/libstdc++-v3/testsuite/27

[PATCH] libstdc++: Use __builtin_operator_new when available [PR94295]

2021-07-20 Thread Jonathan Wakely via Gcc-patches
Clang provides __builtin_operator_new and __builtin_operator_delete,
which have the same semantics as ::operator new and ::operator delete
except that the compiler is allowed to elide calls to them. This changes
std::allocator to use those built-in functions so that memory allocated
by std::allocator can be optimized away when using Clang. This avoids an
abstraction penalty for using std::allocator to allocate storage rather
than a new-expression.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/94295
* include/ext/new_allocator.h (_GLIBCXX_OPERATOR_NEW)
(_GLIBCXX_OPERATOR_DELETE, _GLIBCXX_SIZED_DEALLOC): Define.
(allocator::allocate, allocator::deallocate): Use new macros.

Tested powerpc64le-linux.

The macros are ugly, but using #ifdef is unavoidable here. I think
this is less ugly than the alternative ways to do this.

Any objections to pushing this, or any better ideas for doing this?
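
A standalone sketch of the macro selection described above, outside of
libstdc++.  The SKETCH_ macro names and the two helper templates are
hypothetical; only the __has_builtin test mirrors the patch.  The
fallback definition of __has_builtin is an assumption for compilers
that do not provide it.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Compilers without __has_builtin are treated as having no built-ins.
#ifndef __has_builtin
# define __has_builtin(x) 0
#endif

// Prefer the elidable built-ins when the compiler offers them,
// otherwise fall back to the ordinary allocation functions.
#if __has_builtin(__builtin_operator_new) >= 201802L
# define SKETCH_OPERATOR_NEW __builtin_operator_new
# define SKETCH_OPERATOR_DELETE __builtin_operator_delete
#else
# define SKETCH_OPERATOR_NEW ::operator new
# define SKETCH_OPERATOR_DELETE ::operator delete
#endif

// Allocate raw storage for n objects of type T (no construction).
template<typename T>
T *sketch_allocate(std::size_t n)
{
  return static_cast<T *>(SKETCH_OPERATOR_NEW(n * sizeof(T)));
}

// Release storage obtained from sketch_allocate.
template<typename T>
void sketch_deallocate(T *p)
{
  SKETCH_OPERATOR_DELETE(p);
}
```

Either branch gives the same observable behaviour; the difference is
only whether the compiler is permitted to optimize the allocation away.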


commit f8c86589f2eab42c0888390bea7bceb8dc31d92b
Author: Jonathan Wakely 
Date:   Mon Jul 19 16:43:11 2021

libstdc++: Use __builtin_operator_new when available [PR94295]

Clang provides __builtin_operator_new and __builtin_operator_delete,
which have the same semantics as ::operator new and ::operator delete
except that the compiler is allowed to elide calls to them. This changes
std::allocator to use those built-in functions so that memory allocated
by std::allocator can be optimized away when using Clang. This avoids an
abstraction penalty for using std::allocator to allocate storage rather
than a new-expression.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/94295
* include/ext/new_allocator.h (_GLIBCXX_OPERATOR_NEW)
(_GLIBCXX_OPERATOR_DELETE, _GLIBCXX_SIZED_DEALLOC): Define.
(allocator::allocate, allocator::deallocate): Use new macros.

diff --git a/libstdc++-v3/include/ext/new_allocator.h b/libstdc++-v3/include/ext/new_allocator.h
index 3fb893be152..7c48c820c62 100644
--- a/libstdc++-v3/include/ext/new_allocator.h
+++ b/libstdc++-v3/include/ext/new_allocator.h
@@ -97,6 +97,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { return std::__addressof(__x); }
 #endif
 
+#if __has_builtin(__builtin_operator_new) >= 201802L
+# define _GLIBCXX_OPERATOR_NEW __builtin_operator_new
+# define _GLIBCXX_OPERATOR_DELETE __builtin_operator_delete
+#else
+# define _GLIBCXX_OPERATOR_NEW ::operator new
+# define _GLIBCXX_OPERATOR_DELETE ::operator delete
+#endif
+
   // NB: __n is permitted to be 0.  The C++ standard says nothing
   // about what the return value is when __n == 0.
   _GLIBCXX_NODISCARD _Tp*
@@ -121,34 +129,38 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
if (alignof(_Tp) > __STDCPP_DEFAULT_NEW_ALIGNMENT__)
  {
std::align_val_t __al = std::align_val_t(alignof(_Tp));
-   return static_cast<_Tp*>(::operator new(__n * sizeof(_Tp), __al));
+   return static_cast<_Tp*>(_GLIBCXX_OPERATOR_NEW(__n * sizeof(_Tp),
+  __al));
  }
 #endif
-   return static_cast<_Tp*>(::operator new(__n * sizeof(_Tp)));
+   return static_cast<_Tp*>(_GLIBCXX_OPERATOR_NEW(__n * sizeof(_Tp)));
   }
 
   // __p is not permitted to be a null pointer.
   void
-  deallocate(_Tp* __p, size_type __t __attribute__ ((__unused__)))
+  deallocate(_Tp* __p, size_type __n __attribute__ ((__unused__)))
   {
+#if __cpp_sized_deallocation
+# define _GLIBCXX_SIZED_DEALLOC(p, n) (p), (n) * sizeof(_Tp)
+#else
+# define _GLIBCXX_SIZED_DEALLOC(p, n) (p)
+#endif
+
 #if __cpp_aligned_new
if (alignof(_Tp) > __STDCPP_DEFAULT_NEW_ALIGNMENT__)
  {
-   ::operator delete(__p,
-# if __cpp_sized_deallocation
- __t * sizeof(_Tp),
-# endif
- std::align_val_t(alignof(_Tp)));
+   _GLIBCXX_OPERATOR_DELETE(_GLIBCXX_SIZED_DEALLOC(__p, __n),
+std::align_val_t(alignof(_Tp)));
return;
  }
 #endif
-   ::operator delete(__p
-#if __cpp_sized_deallocation
- , __t * sizeof(_Tp)
-#endif
-);
+   _GLIBCXX_OPERATOR_DELETE(_GLIBCXX_SIZED_DEALLOC(__p, __n));
   }
 
+#undef _GLIBCXX_SIZED_DEALLOC
+#undef _GLIBCXX_OPERATOR_DELETE
+#undef _GLIBCXX_OPERATOR_NEW
+
 #if __cplusplus <= 201703L
   size_type
   max_size() const _GLIBCXX_USE_NOEXCEPT


Re: '#pragma GCC diagnostic' (mis-)use in 'statement' of 'if'

2021-07-20 Thread Martin Sebor via Gcc-patches

On 7/20/21 2:40 AM, Thomas Schwinge wrote:

Hi!

On 2021-07-20T09:23:24+0200, I wrote:

On 2021-07-19T10:46:35+0200, I wrote:

| On 7/16/21 11:42 AM, Thomas Schwinge wrote:
|> On 2021-07-09T17:11:25-0600, Martin Sebor via Gcc-patches wrote:
|>> The attached tweak avoids the new -Warray-bounds instances when
|>> building libatomic for arm. Christophe confirms it resolves
|>> the problem (thank you!)
|>
|> As Abid has just reported in
|> , similar
|> problem with GCN target libgomp build:
|>
|>  In function ‘gcn_thrs’,
|>  inlined from ‘gomp_thread’ at [...]/source-gcc/libgomp/libgomp.h:803:10,
|>  inlined from ‘GOMP_barrier’ at [...]/source-gcc/libgomp/barrier.c:34:29:
|>  [...]/source-gcc/libgomp/libgomp.h:792:10: error: array subscript 0 is outside array bounds of ‘__lds struct gomp_thread * __lds[0]’ [-Werror=array-bounds]
|>    792 |   return *thrs;
|>        |          ^
|>
|>  gcc/config/gcn/gcn.h:  c_register_addr_space ("__lds", ADDR_SPACE_LDS);   \
|>
|>  libgomp/libgomp.h-static inline struct gomp_thread *gcn_thrs (void)
|>  libgomp/libgomp.h-{
|>  libgomp/libgomp.h-  /* The value is at the bottom of LDS.  */
|>  libgomp/libgomp.h:  struct gomp_thread * __lds *thrs = (struct gomp_thread * __lds *)4;
|>  libgomp/libgomp.h-  return *thrs;
|>  libgomp/libgomp.h-}
|>
|> ..., plus a few more.  Work-around:
|>
|> struct gomp_thread * __lds *thrs = (struct gomp_thread * __lds *)4;
|>  +# pragma GCC diagnostic push
|>  +# pragma GCC diagnostic ignored "-Warray-bounds"
|> return *thrs;
|>  +# pragma GCC diagnostic pop
|>
|> ..., but it's a bit tedious to add that in all the other places,
|> too.

Wasn't so bad after all; a lot of duplicates due to 'libgomp.h'.  I've
thus pushed "[gcn] Work-around libgomp 'error: array subscript 0 is
outside array bounds of ‘__lds struct gomp_thread * __lds[0]’
[-Werror=array-bounds]' [PR101484]" to master branch in commit
9f2bc5077debef2b046b6c10d38591ac324ad8b5, see attached.


As I have since found, these '#pragma GCC diagnostic [...]' directives cause
some code generation changes (that seems unexpected, problematic!).
(Martin, any idea?  Might be a pre-existing problem, of course.)


OK, phew.  Martin: your diagnostic changes are *not* to be blamed for
code generation changes -- it's my '#pragma GCC diagnostic pop'
placement that triggers:


This
results in a lot (tens of thousands) of 'GCN team arena exhausted' run-time
diagnostics, also leading to a few FAILs:

 PASS: libgomp.c/../libgomp.c-c++-common/for-11.c (test for excess errors)
 [-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/for-11.c execution test

 PASS: libgomp.c/../libgomp.c-c++-common/for-12.c (test for excess errors)
 [-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/for-12.c execution test

 PASS: libgomp.c/../libgomp.c-c++-common/for-3.c (test for excess errors)
 [-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/for-3.c execution test

 PASS: libgomp.c/../libgomp.c-c++-common/for-5.c (test for excess errors)
 [-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/for-5.c execution test

 PASS: libgomp.c/../libgomp.c-c++-common/for-6.c (test for excess errors)
 [-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/for-6.c execution test

 PASS: libgomp.c/../libgomp.c-c++-common/for-9.c (test for excess errors)
 [-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/for-9.c execution test

Same for 'libgomp.c++'.

It remains to be analyzed how '#pragma GCC diagnostic [...]' directives
can cause code generation changes; for now I'm working around the
"unexpected" '-Werror=array-bounds' diagnostics differently:


In addition to a few in straight-line code, I also had these two:


--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -128,7 +128,10 @@ team_malloc (size_t size)
 : "=v"(result) : "v"(TEAM_ARENA_FREE), "v"(size), "e"(1L) : "memory");

/* Handle OOM.  */
+# pragma GCC diagnostic push
+# pragma GCC diagnostic ignored "-Warray-bounds" /*TODO PR101484 */
if (result + size > *(void * __lds *)TEAM_ARENA_END)
+# pragma GCC diagnostic pop
  {
/* While this is experimental, let's make sure we know when OOM
happens.  */
@@ -162,8 +159,11 @@ team_free (void *ptr)
   However, if we fell back to using heap then we should free it.
   It would be better if this function could be a no-op, but at least
   LDS loads are cheap.  */
+# pragma GCC diagnostic push
+# pragma GCC diagnostic ignored "-Warray-bounds" /*TODO PR101484 */
if (ptr < *(void * __lds *)TEAM_ARENA_START
|| ptr >= *(void * __lds *)TEAM_ARENA_END)
+# pragma GCC diagnostic pop
  free (ptr);
  }
  #else


..., and it appears that the '#pragma GCC diagnostic pop' are considered
here to be the 'statement' of the 'if'!  That's (a) unexpected (to me, at
least) for this 

[PATCH] PR fortran/101514 - ICE: out of memory allocating 18446744073709551600 bytes

2021-07-20 Thread Harald Anlauf via Gcc-patches
While investigating one of Gerhard's latest bug reports, which was almost
obvious to fix after a hint by Richard Biener, I found further variants of
valid and invalid code that lead to either NULL pointer dereferences or
similar OOM situations.

Regtested on x86_64-pc-linux-gnu.  OK for mainline / 11-branch?

Thanks,
Harald


Fortran: ICE, OOM while calculating sizes of derived type array components

gcc/fortran/ChangeLog:

PR fortran/101514
* target-memory.c (gfc_interpret_derived): Size of array component
of derived type can only be computed here for explicit size.
* trans-types.c (gfc_get_nodesc_array_type): Do not dereference
NULL pointers.

gcc/testsuite/ChangeLog:

PR fortran/101514
* gfortran.dg/pr101514.f90: New test.

diff --git a/gcc/fortran/target-memory.c b/gcc/fortran/target-memory.c
index cfa8402dd3f..7b21a9e04e8 100644
--- a/gcc/fortran/target-memory.c
+++ b/gcc/fortran/target-memory.c
@@ -534,6 +534,9 @@ gfc_interpret_derived (unsigned char *buffer, size_t buffer_size, gfc_expr *resu
 	{
 	  int n;

+	  if (cmp->as->type != AS_EXPLICIT)
+	return 0;
+
 	  e->expr_type = EXPR_ARRAY;
 	  e->rank = cmp->as->rank;

diff --git a/gcc/fortran/trans-types.c b/gcc/fortran/trans-types.c
index d715838a046..50fda4328f7 100644
--- a/gcc/fortran/trans-types.c
+++ b/gcc/fortran/trans-types.c
@@ -1644,7 +1644,7 @@ gfc_get_nodesc_array_type (tree etype, gfc_array_spec * as, gfc_packed packed,
   GFC_TYPE_ARRAY_STRIDE (type, n) = tmp;

   expr = as->lower[n];
-  if (expr->expr_type == EXPR_CONSTANT)
+  if (expr && expr->expr_type == EXPR_CONSTANT)
 {
   tmp = gfc_conv_mpz_to_tree (expr->value.integer,
   gfc_index_integer_kind);
@@ -1694,7 +1694,7 @@ gfc_get_nodesc_array_type (tree etype, gfc_array_spec * as, gfc_packed packed,
   for (n = as->rank; n < as->rank + as->corank; n++)
 {
   expr = as->lower[n];
-  if (expr->expr_type == EXPR_CONSTANT)
+  if (expr && expr->expr_type == EXPR_CONSTANT)
 	tmp = gfc_conv_mpz_to_tree (expr->value.integer,
 gfc_index_integer_kind);
   else
diff --git a/gcc/testsuite/gfortran.dg/pr101514.f90 b/gcc/testsuite/gfortran.dg/pr101514.f90
new file mode 100644
index 00000000000..51fbf8a7e85
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr101514.f90
@@ -0,0 +1,35 @@
+! { dg-do compile }
+! PR fortran/101514 - ICE: out of memory allocating ... bytes
+
+subroutine s
+  type t1
+     integer :: a(..) ! { dg-error "must have an explicit shape" }
+  end type
+  type t2
+     integer :: a(*)  ! { dg-error "must have an explicit shape" }
+  end type
+  type t3
+     integer :: a(:)  ! { dg-error "must have an explicit shape" }
+  end type
+  type t4
+     integer :: a(0:) ! { dg-error "must have an explicit shape" }
+  end type
+  type t5
+     integer, allocatable :: a(:)
+  end type
+  type t6
+     integer, pointer :: a(:)
+  end type
+  type(t1) :: a1
+  type(t2) :: a2
+  type(t3) :: a3
+  type(t4) :: a4
+  type(t5) :: a5
+  type(t6) :: a6
+  a1 = transfer(1, a1)
+  a2 = transfer(1, a2)
+  a3 = transfer(1, a3)
+  a4 = transfer(1, a4)
+  a5 = transfer(1, a5)
+  a6 = transfer(1, a6)
+end


Re: [PATCH] correct range of stpcpy result (PR 101397)

2021-07-20 Thread Martin Sebor via Gcc-patches

On 7/20/21 10:08 AM, Jeff Law wrote:



On 7/14/2021 7:49 PM, Martin Sebor via Gcc-patches wrote:

Access warnings look through calls to the subset of built-ins
that return one of their pointer arguments to find the object
the pointer points to and its offset.  The computation is
wrong for functions like stpcpy, stpncpy and mempcpy that
return a pointer plus some offset, and leads to a false positive
-Warray-bounds in Glibc with the recent refactoring of the warning
to take advantage of this logic.

The attached patch corrects this mistake by accounting for this
property of these functions while at the same time constraining
the offset to the size of the source argument for better
accuracy.

Tested on x86_64-linux and by also building Glibc there.

Martin
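
To make the distinction concrete: strcpy returns its first argument unchanged, whereas stpcpy returns a pointer offset from it by the length of the copied string. The sketch below illustrates just that library behavior; it says nothing about how the warning code models it. The demo_* helpers are hypothetical names, and stpcpy is assumed to be available as on POSIX systems.

```cpp
#include <cassert>
#include <cstddef>
#include <string.h>  // stpcpy is POSIX, not ISO C++

// Hypothetical helper: copy SRC into DST and report how far past DST
// the pointer returned by stpcpy lies.
static std::ptrdiff_t demo_stpcpy_offset(char *dst, const char *src)
{
  char *end = stpcpy(dst, src);  // points at the terminating NUL in DST
  assert(*end == '\0');
  return end - dst;              // equals strlen(src), not 0
}

// Hypothetical helper mirroring the PR title: a write through
// "result of stpcpy minus 1" stays in bounds when SRC is non-empty.
static char demo_overwrite_last(char *dst, const char *src)
{
  assert(src[0] != '\0');        // otherwise end[-1] would precede DST
  char *end = stpcpy(dst, src);
  end[-1] = 'X';                 // overwrites the last copied character
  return dst[end - dst - 1];
}
```

Because the returned pointer already sits strlen(src) bytes into the destination, an access at offset -1 from it is a valid access to the destination, which is why treating the result as offset 0 yields a false positive.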

gcc-101397.diff

PR middle-end/101397 - spurious warning writing to the result of stpcpy minus 1


gcc/ChangeLog:

PR middle-end/101397
* builtins.c (gimple_call_return_array): Add argument.  Correct
offsets for memchr, mempcpy, stpcpy, and stpncpy.
(compute_objsize_r): Adjust offset computation for argument returning
built-ins.

gcc/testsuite/ChangeLog:

PR middle-end/101397
* gcc.dg/Warray-bounds-80.c: New test.
* gcc.dg/Warray-bounds-81.c: New test.
* gcc.dg/Warray-bounds-82.c: New test.
* gcc.dg/Warray-bounds-83.c: New test.
* gcc.dg/Warray-bounds-84.c: New test.
* gcc.dg/Wstringop-overflow-46.c: Adjust expected output.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 39ab139b7e1..170d776c410 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -5200,12 +5200,19 @@ get_offset_range (tree x, gimple *stmt, offset_int r[2], range_query *rvals)
  /* Return the argument that the call STMT to a built-in function returns
 or null if it doesn't.  On success, set OFFRNG[] to the range of offsets
 from the argument reflected in the value returned by the built-in if it
-   can be determined, otherwise to 0 and HWI_M1U respectively.  */
+   can be determined, otherwise to 0 and HWI_M1U respectively.  Set
+   *PAST_END for functions like mempcpy that might return a past the end
+   pointer (most functions return a dereferenceable pointer to an existing
+   element of an array).  */
  
  static tree

-gimple_call_return_array (gimple *stmt, offset_int offrng[2],
+gimple_call_return_array (gimple *stmt, offset_int offrng[2], bool *past_end,
  range_query *rvals)
  {
+  /* Clear and set below for the rare function(s) that might return
+ a past-the-end pointer.  */
+  *past_end = false;
+
{
  /* Check for attribute fn spec to see if the function returns one
 of its arguments.  */
@@ -5213,6 +5220,7 @@ gimple_call_return_array (gimple *stmt, offset_int offrng[2],
  unsigned int argno;
  if (fnspec.returns_arg (&argno))
{
+   /* Functions return the first argument (not a range).  */
offrng[0] = offrng[1] = 0;
return gimple_call_arg (stmt, argno);
}
@@ -5242,6 +5250,7 @@ gimple_call_return_array (gimple *stmt, offset_int offrng[2],
if (gimple_call_num_args (stmt) != 2)
return NULL_TREE;
  
+  /* Allocation functions return a pointer to the beginning.  */

offrng[0] = offrng[1] = 0;
return gimple_call_arg (stmt, 1);
  }
@@ -5253,10 +5262,6 @@ gimple_call_return_array (gimple *stmt, offset_int offrng[2],
  case BUILT_IN_MEMMOVE:
  case BUILT_IN_MEMMOVE_CHK:
  case BUILT_IN_MEMSET:
-case BUILT_IN_STPCPY:
-case BUILT_IN_STPCPY_CHK:
-case BUILT_IN_STPNCPY:
-case BUILT_IN_STPNCPY_CHK:
  case BUILT_IN_STRCAT:
  case BUILT_IN_STRCAT_CHK:
  case BUILT_IN_STRCPY:
@@ -5265,18 +5270,34 @@ gimple_call_return_array (gimple *stmt, offset_int offrng[2],
  case BUILT_IN_STRNCAT_CHK:
  case BUILT_IN_STRNCPY:
  case BUILT_IN_STRNCPY_CHK:
+  /* Functions return the first argument (not a range).  */
offrng[0] = offrng[1] = 0;
return gimple_call_arg (stmt, 0);
  
  case BUILT_IN_MEMPCPY:

  case BUILT_IN_MEMPCPY_CHK:
{
+   /* The returned pointer is in a range constrained by the smaller
+  of the upper bound of the size argument and the source object
+  size.  */
ISTM that for the MEMPCPY case the range is constrained by the size 
argument only from an implementation standpoint, but the size of the 
source or dest object can also constrain since if we overflow either 
we've gone into the realm of undefined behavior.  It's a nit for the 
comment, I don't think we need to adjust the implementation further.


I thought about your observation a bit to see if I may have overlooked
something.  Deriving the constraint from the size of the source does
assume the source is in fact big enough.  If it's not big enough for
the copy another warning detects it so I think both cases are handled
correctly.  I have already pushed the change in r12-2422 but let me

Re: [PING][PATCH] define auto_vec copy ctor and assignment (PR 90904)

2021-07-20 Thread Jason Merrill via Gcc-patches

On 7/20/21 2:34 PM, Martin Sebor wrote:

On 7/14/21 10:23 AM, Jason Merrill wrote:

On 7/14/21 10:46 AM, Martin Sebor wrote:

On 7/13/21 9:39 PM, Jason Merrill wrote:

On 7/13/21 4:02 PM, Martin Sebor wrote:

On 7/13/21 12:37 PM, Jason Merrill wrote:

On 7/13/21 10:08 AM, Jonathan Wakely wrote:

On Mon, 12 Jul 2021 at 12:02, Richard Biener wrote:

Somebody with more C++ knowledge than me needs to approve the
vec.h changes - I don't feel competent to assess all effects of 
the change.


They look OK to me except for:

-extern vnull vNULL;
+static constexpr vnull vNULL{ };

Making vNULL have static linkage can make it an ODR violation to use
vNULL in templates and inline functions, because different
instantiations will refer to a different "vNULL" in each translation
unit.


The ODR says this is OK because it's a literal constant with the 
same value (6.2/12.2.1).


But it would be better without the explicit 'static'; then in 
C++17 it's implicitly inline instead of static.


I'll remove the static.



But then, do we really want to keep vNULL at all?  It's a weird 
blurring of the object/pointer boundary that is also dependent on 
vec being a thin wrapper around a pointer.  In almost all cases it 
can be replaced with {}; one exception is == comparison, where it 
seems to be testing that the embedded pointer is null, which is a 
weird thing to want to test.


The one use case I know of for vNULL where I can't think of
an equally good substitute is in passing a vec as an argument by
value.  The only way to do that that I can think of is to name
the full vec type (i.e., the specialization) which is more typing
and less generic than vNULL.  I don't use vNULL myself so I wouldn't
miss this trick if it were to be removed but others might feel
differently.


In C++11, it can be replaced by {} in that context as well.


Cool.  I thought I'd tried { } here but I guess not.
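
The `{}` replacement mentioned above can be sketched as follows. This is only an illustration under stated assumptions: demo_vec is a hypothetical stand-in for a thin pointer wrapper and is not GCC's vec, and the two demo_length_* functions are made-up names.

```cpp
#include <cassert>

// Hypothetical stand-in for a thin wrapper around a pointer.
struct demo_vec
{
  int *data = nullptr;
  unsigned len = 0;
};

// By-value parameter: a caller can pass an empty vector as {} without
// naming the specialization or relying on a vNULL-like sentinel object.
static unsigned demo_length_by_value(demo_vec v)
{ return v.len; }

// Const-reference parameter: no copy is made, and {} still binds to it
// through a temporary.
static unsigned demo_length_by_ref(const demo_vec &v)
{ return v.len; }
```

For example, `demo_length_by_value({})` and `demo_length_by_ref({})` both see an empty vector, which is the situation where vNULL was previously the convenient spelling.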




If not, I'm all for getting rid of vNULL but with over 350 uses
of it left, unless there's some clever trick to make the removal
(mostly) effortless and seamless, I'd much rather do it independently
of this initial change. I also don't know if I can commit to making
all this cleanup.


I already have a patch to replace all but one use of vNULL, but I'll 
hold off with it until after your patch.


So what's the next step?  The patch only removes a few uses of vNULL
but doesn't add any.  Is it good to go as is (without the static and
with the additional const changes Richard suggested)?  This patch is
attached to my reply to Richard:
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575199.html


As Richard wrote:


The pieces where you change vec<> passing to const vec<>& and the few
where you change vec<> * to const vec<> * are OK - this should make the
rest a smaller piece to review.


Please go ahead and apply those changes and send a new patch with the 
remainder of the changes.


I have just pushed r12-2418:
https://gcc.gnu.org/pipermail/gcc-cvs/2021-July/350886.html



A few other comments:


-   omp_declare_simd_clauses);
+   *omp_declare_simd_clauses);


Instead of doing this indirection in all of the callers, let's change 
c_finish_omp_declare_simd to take a pointer as well, and do the 
indirection in initializing a reference variable at the top of the 
function.


Okay.




+    sched_init_luids (bbs.to_vec ());
+    haifa_init_h_i_d (bbs.to_vec ());


Why are these to_vec changes needed when you are also changing the 
functions to take const&?


Calling to_vec() here isn't necessary so I've removed it.




-  vec checks = LOOP_VINFO_CHECK_NONZERO (loop_vinfo);
+  vec checks = LOOP_VINFO_CHECK_NONZERO (loop_vinfo).to_vec ();


Why not use a reference here and in other similar spots?


Sure, that works too.

Attached is what's left of the original changes now that r12-2418
has been applied.



@@ -3364,7 +3364,8 @@ static void
 vect_check_lower_bound (loop_vec_info loop_vinfo, tree expr, bool unsigned_p,
poly_uint64 min_value)
 {
-  vec lower_bounds = LOOP_VINFO_LOWER_BOUNDS (loop_vinfo);
+  vec lower_bounds
+= LOOP_VINFO_LOWER_BOUNDS (loop_vinfo).to_vec ();
   for (unsigned int i = 0; i < lower_bounds.length (); ++i)
 if (operand_equal_p (lower_bounds[i].expr, expr, 0))
   {
@@ -3466,7 +3467,7 @@ vect_prune_runtime_alias_test_list (loop_vec_info loop_vinfo)
   typedef pair_hash  tree_pair_hash;
   hash_set  compared_objects;
 
-  vec may_alias_ddrs = LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo);

+  vec may_alias_ddrs = LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo).to_vec ();


These could also be references.

That leaves this as the only remaining use of to_vec:


   ipa_call_arg_values (ipa_auto_call_arg_values *aavals)
-: m_known_vals (aavals->m_known_vals),
-  m_known_contexts (aavals->m_known_contexts),
-  m_known_aggs (aavals->m_known_aggs),
-  m_known_value_ranges (aavals->m_known_value_ranges)
+: m_known_vals (aavals->m_known_vals.to_vec (

Re: '#pragma GCC diagnostic' (mis-)use in 'statement' of 'if'

2021-07-20 Thread Jakub Jelinek via Gcc-patches
On Tue, Jul 20, 2021 at 01:47:01PM -0600, Martin Sebor wrote:
> > Addressing that is for another day.
> 
> David Malcolm (CC'd) has a patch attached to pr63326 to issue
> a warning to point out that #pragmas are treated as statements
> that would help prevent this type of a bug.  David, do you still
> plan to submit it?

That patch doesn't look correct.
c_parser_pragma and cp_parser_pragma are already told whether the pragma
appears in a context where treating it as a standalone statement changes
the behavior or in one where it doesn't: pragma_stmt stands for the
problematic contexts, pragma_compound for the correct ones (there are
other values for namespace scope, class scope etc.).
OpenMP/OpenACC pragmas shouldn't be touched; those already do the right
thing the standard asks for.  For the remaining ones there should be a
warning in the pragma_stmt cases.

Jakub
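
As a small illustration of the distinction, the sketch below keeps every pragma directly inside a compound statement and wraps the whole 'if' statement, so no pragma sits where the statement of the 'if' is expected. It is a minimal sketch, not the libgomp fix: demo_clamp is a hypothetical function, and "-Warray-bounds" is used only because it is the option from this thread.

```cpp
#include <cassert>

// Hypothetical helper: clamp VALUE to LIMIT while suppressing a
// diagnostic around the whole 'if' statement.
static int demo_clamp(int value, int limit)
{
  int result = value;
  // All three pragmas appear between complete statements of the
  // enclosing block, never between the condition and its body.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Warray-bounds"
  if (value > limit)
    {
      result = limit;
    }
#pragma GCC diagnostic pop
  return result;
}
```

Placing the 'pop' after the closing brace of the 'if' body rather than right after the condition avoids the situation reported earlier in the thread, where the pragma was taken to be the statement of the 'if'.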



Re: [PATCH] c-family: Add __builtin_assoc_barrier

2021-07-20 Thread Jason Merrill via Gcc-patches

On 7/19/21 8:34 AM, Richard Biener wrote:

On Mon, 19 Jul 2021, Matthias Kretz wrote:


tested on x86_64-pc-linux-gnu with no new failures. OK for master?


I think now that PAREN_EXPR can appear in C++ code you need to
adjust some machinery to expect it (constexpr folding?  template stuff?).
I suggest adding some testcases covering templates and constexpr
functions.


Yes.

The C++ front end already uses PAREN_EXPR in templates to indicate 
parenthesized initializers in cases where that matters for 
decltype(auto).  It should be fine to use it for both that and 
__builtin_assoc_barrier, but you probably want to distinguish them with 
a TREE_LANG_FLAG, and change tsubst_copy_and_build to keep the 
PAREN_EXPR in this case.


For constexpr you probably just need to add handling to 
cxx_eval_constant_expression to evaluate its operand instead.



+@deftypefn {Built-in Function} @var{type} __builtin_assoc_barrier (@var{type} @var{expr})
+This built-in represents a re-association barrier for the floating-point
+expression @var{expr} with operations following the built-in. The expression
+@var{expr} itself can be reordered, and the whole expression @var{expr} can be
+reordered with operations after the barrier.

What operations follow the built-in also applies to operations leading
the builtin?  Maybe "This built-in represents a re-association barrier
for the floating-point expression @var{expr} with the expression
consuming its value."  But I'm not an English speaker - I guess
I'm mostly confused about "follow" here.

I'm not sure if there are better C/C++ language terms describing what
the builtin does, but basically it appears as opaque operand to the
surrounding expression and the surrounding expression is opaque
to the expression inside the parens.

  The barrier is only relevant when
+@code{-fassociative-math} is active, since otherwise floating-point is not
+treated as associative.
+
+@smallexample
+float x0 = a + b - b;
+float x1 = __builtin_assoc_barrier(a + b) - b;
+@end smallexample
+
+@noindent
+means that, with @code{-fassociative-math}, @code{x0} can be optimized to
+@code{x0 = a} but @code{x1} cannot.
+@end deftypefn
+

Otherwise the patch looks OK, but of course C/C++ frontend maintainers
would want to chime in here (I've CCed two).

Richard.
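
The documentation example quoted above can be turned into a small compilable sketch. It assumes a compiler that provides the builtin under the proposed name and reports it through __has_builtin; where that is not the case the hypothetical DEMO_ASSOC_BARRIER macro falls back to plain parentheses, which give no barrier under -fassociative-math. The function name is likewise made up for illustration.

```cpp
#include <cassert>

// Compilers without __has_builtin: treat the builtin as unavailable.
#ifndef __has_builtin
# define __has_builtin(x) 0
#endif

// Use the proposed builtin when present; otherwise only group the
// operand, which does not prevent re-association.
#if __has_builtin(__builtin_assoc_barrier)
# define DEMO_ASSOC_BARRIER(e) __builtin_assoc_barrier(e)
#else
# define DEMO_ASSOC_BARRIER(e) (e)
#endif

// Computes (a + b) - b with the sum kept as one opaque operand.
static float demo_sum_then_subtract(float a, float b)
{
  return DEMO_ASSOC_BARRIER(a + b) - b;
}
```

With a = 1.0f and b = 1e20f the sum rounds to b, so evaluating in the written order gives 0.0f, whereas re-associating to a + (b - b) would give 1.0f; that difference is what the barrier is meant to preserve when -fassociative-math is in effect.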



New builtin to enable explicit use of PAREN_EXPR in C & C++ code.

Signed-off-by: Matthias Kretz 

gcc/testsuite/ChangeLog:

* c-c++-common/builtin-assoc-barrier-1.c: New test.

gcc/cp/ChangeLog:

* cp-objcp-common.c (names_builtin_p): Handle
RID_BUILTIN_ASSOC_BARRIER.
* parser.c (cp_parser_postfix_expression): Handle
RID_BUILTIN_ASSOC_BARRIER.

gcc/c-family/ChangeLog:

* c-common.c (c_common_reswords): Add __builtin_assoc_barrier.
* c-common.h (enum rid): Add RID_BUILTIN_ASSOC_BARRIER.

gcc/c/ChangeLog:

* c-decl.c (names_builtin_p): Handle RID_BUILTIN_ASSOC_BARRIER.
* c-parser.c (c_parser_postfix_expression): Likewise.

gcc/ChangeLog:

* doc/extend.texi: Document __builtin_assoc_barrier.
---
  gcc/c-family/c-common.c                       |  1 +
  gcc/c-family/c-common.h                       |  2 +-
  gcc/c/c-decl.c                                |  1 +
  gcc/c/c-parser.c                              | 20 
  gcc/cp/cp-objcp-common.c                      |  1 +
  gcc/cp/parser.c                               | 14 +++
  gcc/doc/extend.texi                           | 18 ++
  .../c-c++-common/builtin-assoc-barrier-1.c    | 24 +++
  8 files changed, 80 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/c-c++-common/builtin-assoc-barrier-1.c


--
──
  Dr. Matthias Kretz   https://mattkretz.github.io
  GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
  std::experimental::simd  https://github.com/VcDevel/std-simd
──






[PATCH 3/4] libsanitizer: Update LOCAL_PATCHES

2021-07-20 Thread H.J. Lu via Gcc-patches
* LOCAL_PATCHES: Update to the corresponding revision.
---
 libsanitizer/LOCAL_PATCHES | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/libsanitizer/LOCAL_PATCHES b/libsanitizer/LOCAL_PATCHES
index b1969fc7882..d45655392b0 100644
--- a/libsanitizer/LOCAL_PATCHES
+++ b/libsanitizer/LOCAL_PATCHES
@@ -1,2 +1 @@
-fb73b1ce36c6ede097ecb220fcd0a1ed2df8fd01
-adab7b2bf42b469e51154a09a1b4fa0726a7073c
+763479487980eefd1450c0c1cdeea651d70f2fdb
-- 
2.31.1



[PATCH 4/4] libsanitizer: Bump asan/tsan versions

2021-07-20 Thread H.J. Lu via Gcc-patches
Bump asan/tsan versions for the upstream commit:

commit acf0a6428681dccac803984bfbb1e3e54248f090
Author: Ilya Leoshkevich 
Date:   Fri Jul 2 02:42:38 2021 +0200

[sanitizer] Fix __sanitizer_kernel_sigset_t endianness issue

setuid(0) hangs on SystemZ under TSan because TSan's BackgroundThread
ignores SIGSETXID. This in turn happens because internal_sigdelset()
messes up the mask bits on big-endian system due to how
__sanitizer_kernel_sigset_t is defined.

Commit d9a1a53b8d80 ("[ESan] [MIPS] Fix workingset-signal-posix.cpp on
MIPS") fixed this for MIPS by adjusting the __sanitizer_kernel_sigset_t
definition. Generalize this by defining __SANITIZER_KERNEL_NSIG based
on kernel's _NSIG and using uptr[] for __sanitizer_kernel_sigset_t.sig
on all platforms.

Reviewed By: dvyukov

Differential Revision: https://reviews.llvm.org/D105629

which changed __sanitizer_kernel_sigset_t and changed the ABI for function

void __sanitizer_syscall_post_impl_rt_sigaction
  (long int, long int,
   const __sanitizer::__sanitizer_kernel_sigaction_t*,
   __sanitizer::__sanitizer_kernel_sigaction_t*,
   SIZE_T);

* asan/libtool-version: Bump version.
* tsan/libtool-version: Likewise.
---
 libsanitizer/asan/libtool-version | 2 +-
 libsanitizer/tsan/libtool-version | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/libsanitizer/asan/libtool-version b/libsanitizer/asan/libtool-version
index c509757b572..2cd4546d1b9 100644
--- a/libsanitizer/asan/libtool-version
+++ b/libsanitizer/asan/libtool-version
@@ -3,4 +3,4 @@
 # a separate file so that version updates don't involve re-running
 # automake.
 # CURRENT:REVISION:AGE
-6:0:0
+7:0:0
diff --git a/libsanitizer/tsan/libtool-version b/libsanitizer/tsan/libtool-version
index 11974598ac5..79dfeeea15f 100644
--- a/libsanitizer/tsan/libtool-version
+++ b/libsanitizer/tsan/libtool-version
@@ -3,4 +3,4 @@
 # a separate file so that version updates don't involve re-running
 # automake.
 # CURRENT:REVISION:AGE
-0:0:0
+1:0:0
-- 
2.31.1



[PATCH 0/4] libsanitizer: Sync with upstream

2021-07-20 Thread H.J. Lu via Gcc-patches
1. Sync with upstream commit 7704fedfff6ef5676adb6415f3be0ac927d1a746
2. Apply local patches
3. Update LOCAL_PATCHES
4. Bump asan/tsan versions for the upstream commit:

   commit acf0a6428681dccac803984bfbb1e3e54248f090
Author: Ilya Leoshkevich 
Date:   Fri Jul 2 02:42:38 2021 +0200

[sanitizer] Fix __sanitizer_kernel_sigset_t endianness issue

which changed the ABI for function

void __sanitizer_syscall_post_impl_rt_sigaction
  (long int, long int,
   const __sanitizer::__sanitizer_kernel_sigaction_t*,
   __sanitizer::__sanitizer_kernel_sigaction_t*,
   SIZE_T);

Tested on Linux/i686 and Linux/x86-64.  OK for master?

H.J. Lu (4):
  libsanitizer: Merge with upstream
  libsanitizer: Apply local patches
  libsanitizer: Update LOCAL_PATCHES
  libsanitizer: Bump asan/tsan versions

 libsanitizer/LOCAL_PATCHES|   3 +-
 libsanitizer/MERGE|   2 +-
 libsanitizer/asan/Makefile.am |   1 -
 libsanitizer/asan/Makefile.in |   8 +-
 libsanitizer/asan/asan_allocator.cpp  |  10 +-
 libsanitizer/asan/asan_errors.cpp |   1 -
 libsanitizer/asan/asan_fake_stack.cpp |  50 +++-
 libsanitizer/asan/asan_flags.cpp  |   4 -
 libsanitizer/asan/asan_flags.inc  |   3 +-
 libsanitizer/asan/asan_interceptors.cpp   |  28 +-
 libsanitizer/asan/asan_interceptors.h |   4 +-
 .../asan/asan_interceptors_memintrinsics.cpp  |   6 +-
 libsanitizer/asan/asan_interface.inc  |  11 +
 libsanitizer/asan/asan_internal.h |  15 +-
 libsanitizer/asan/asan_malloc_linux.cpp   |  36 +--
 libsanitizer/asan/asan_malloc_local.h |  52 
 libsanitizer/asan/asan_mapping.h  |  29 +-
 libsanitizer/asan/asan_mapping_myriad.h   |  85 --
 libsanitizer/asan/asan_new_delete.cpp |  20 +-
 libsanitizer/asan/asan_poisoning.cpp  |  19 +-
 libsanitizer/asan/asan_poisoning.h|   3 -
 libsanitizer/asan/asan_rtems.cpp  | 266 -
 libsanitizer/asan/asan_rtl.cpp|  47 ++-
 libsanitizer/asan/asan_shadow_setup.cpp   |  11 +-
 libsanitizer/asan/asan_stack.cpp  |   3 +-
 libsanitizer/asan/asan_thread.cpp |  45 +--
 libsanitizer/asan/asan_thread.h   |  17 +-
 libsanitizer/asan/libtool-version |   2 +-
 libsanitizer/hwasan/Makefile.am   |   2 +
 libsanitizer/hwasan/Makefile.in   |   9 +-
 libsanitizer/hwasan/hwasan.cpp|  77 -
 libsanitizer/hwasan/hwasan.h  |  42 ++-
 .../hwasan/hwasan_allocation_functions.cpp| 172 +++
 libsanitizer/hwasan/hwasan_allocator.cpp  |  35 ++-
 libsanitizer/hwasan/hwasan_allocator.h|   7 +-
 libsanitizer/hwasan/hwasan_dynamic_shadow.cpp |   4 +-
 libsanitizer/hwasan/hwasan_fuchsia.cpp| 192 
 libsanitizer/hwasan/hwasan_interceptors.cpp   | 182 +---
 libsanitizer/hwasan/hwasan_linux.cpp  | 166 ---
 libsanitizer/hwasan/hwasan_mapping.h  |  17 +-
 libsanitizer/hwasan/hwasan_new_delete.cpp |  39 ++-
 libsanitizer/hwasan/hwasan_poisoning.cpp  |  24 --
 libsanitizer/hwasan/hwasan_report.cpp | 206 -
 libsanitizer/hwasan/hwasan_thread.cpp |  21 +-
 libsanitizer/hwasan/hwasan_thread.h   |  11 +-
 libsanitizer/hwasan/hwasan_thread_list.cpp|   2 +-
 libsanitizer/hwasan/hwasan_thread_list.h  |   8 +-
 .../include/sanitizer/dfsan_interface.h   |  95 +++---
 libsanitizer/interception/interception.h  |  33 +-
 libsanitizer/lsan/lsan.cpp|  14 +-
 libsanitizer/lsan/lsan_common.h   |   4 +-
 libsanitizer/lsan/lsan_thread.cpp |   7 +-
 libsanitizer/sanitizer_common/Makefile.am |   2 +-
 libsanitizer/sanitizer_common/Makefile.in |  19 +-
 .../sanitizer_common/sanitizer_addrhashmap.h  | 106 +++
 .../sanitizer_common/sanitizer_allocator.cpp  |  38 +--
 .../sanitizer_allocator_combined.h|   4 +-
 .../sanitizer_allocator_local_cache.h |  19 +-
 .../sanitizer_allocator_primary32.h   |   4 +-
 .../sanitizer_allocator_primary64.h   | 170 +--
 .../sanitizer_allocator_secondary.h   |   8 +-
 .../sanitizer_common/sanitizer_common.cpp |  15 +-
 .../sanitizer_common/sanitizer_common.h   |  23 +-
 .../sanitizer_common_interceptors.inc |  84 --
 .../sanitizer_common_libcdep.cpp  |   4 +-
 .../sanitizer_common_nolibc.cpp   |   1 -
 .../sanitizer_deadlock_detector1.cpp  |   2 +-
 .../sanitizer_deadlock_detector2.cpp  |  32 +-
 .../sanitizer_common/sanitizer_errno.h|   3 +-
 .../sanitizer_common/sanitizer_fuchsia.cpp|  33 +-
 .../sanitizer_common/sanitizer_fuchsia.h  |   2 +
 .../sanitizer_common/sanitizer_libc.h |   3 +-
 .../sanitizer_common/sanitizer_libignore

[PATCH 2/4] libsanitizer: Apply local patches

2021-07-20 Thread H.J. Lu via Gcc-patches
---
 libsanitizer/asan/asan_globals.cpp| 19 --
 libsanitizer/asan/asan_interceptors.h |  7 ++-
 libsanitizer/asan/asan_mapping.h  |  2 +-
 .../sanitizer_linux_libcdep.cpp   |  4 
 .../sanitizer_common/sanitizer_mac.cpp| 12 +--
 libsanitizer/sanitizer_common/sanitizer_mac.h | 20 +++
 .../sanitizer_platform_limits_linux.cpp   |  7 +--
 .../sanitizer_platform_limits_posix.h |  2 +-
 .../sanitizer_common/sanitizer_stacktrace.cpp | 17 +++-
 libsanitizer/tsan/tsan_rtl_ppc64.S|  1 +
 libsanitizer/ubsan/ubsan_flags.cpp|  1 +
 libsanitizer/ubsan/ubsan_handlers.cpp | 15 ++
 libsanitizer/ubsan/ubsan_handlers.h   |  8 
 libsanitizer/ubsan/ubsan_platform.h   |  2 ++
 14 files changed, 86 insertions(+), 31 deletions(-)

diff --git a/libsanitizer/asan/asan_globals.cpp 
b/libsanitizer/asan/asan_globals.cpp
index 9d7dbc6f264..e045c31cd1c 100644
--- a/libsanitizer/asan/asan_globals.cpp
+++ b/libsanitizer/asan/asan_globals.cpp
@@ -154,23 +154,6 @@ static void CheckODRViolationViaIndicator(const Global *g) 
{
   }
 }
 
-// Check ODR violation for given global G by checking if it's already poisoned.
-// We use this method in case compiler doesn't use private aliases for global
-// variables.
-static void CheckODRViolationViaPoisoning(const Global *g) {
-  if (__asan_region_is_poisoned(g->beg, g->size_with_redzone)) {
-// This check may not be enough: if the first global is much larger
-// the entire redzone of the second global may be within the first global.
-for (ListOfGlobals *l = list_of_all_globals; l; l = l->next) {
-  if (g->beg == l->g->beg &&
-  (flags()->detect_odr_violation >= 2 || g->size != l->g->size) &&
-  !IsODRViolationSuppressed(g->name))
-ReportODRViolation(g, FindRegistrationSite(g),
-   l->g, FindRegistrationSite(l->g));
-}
-  }
-}
-
 // Clang provides two different ways for global variables protection:
 // it can poison the global itself or its private alias. In former
 // case we may poison same symbol multiple times, that can help us to
@@ -216,8 +199,6 @@ static void RegisterGlobal(const Global *g) {
 // where two globals with the same name are defined in different modules.
 if (UseODRIndicator(g))
   CheckODRViolationViaIndicator(g);
-else
-  CheckODRViolationViaPoisoning(g);
   }
   if (CanPoisonMemory())
 PoisonRedZones(*g);
diff --git a/libsanitizer/asan/asan_interceptors.h 
b/libsanitizer/asan/asan_interceptors.h
index a9249dea45b..25e05e458be 100644
--- a/libsanitizer/asan/asan_interceptors.h
+++ b/libsanitizer/asan/asan_interceptors.h
@@ -81,7 +81,12 @@ void InitializePlatformInterceptors();
 #if ASAN_HAS_EXCEPTIONS && !SANITIZER_WINDOWS && !SANITIZER_SOLARIS && \
 !SANITIZER_NETBSD
 # define ASAN_INTERCEPT___CXA_THROW 1
-# define ASAN_INTERCEPT___CXA_RETHROW_PRIMARY_EXCEPTION 1
+# if ! defined(ASAN_HAS_CXA_RETHROW_PRIMARY_EXCEPTION) \
+ || ASAN_HAS_CXA_RETHROW_PRIMARY_EXCEPTION
+#   define ASAN_INTERCEPT___CXA_RETHROW_PRIMARY_EXCEPTION 1
+# else
+#   define ASAN_INTERCEPT___CXA_RETHROW_PRIMARY_EXCEPTION 0
+# endif
 # if defined(_GLIBCXX_SJLJ_EXCEPTIONS) || (SANITIZER_IOS && defined(__arm__))
 #  define ASAN_INTERCEPT__UNWIND_SJLJ_RAISEEXCEPTION 1
 # else
diff --git a/libsanitizer/asan/asan_mapping.h b/libsanitizer/asan/asan_mapping.h
index e5a7f2007ae..4b0037fced3 100644
--- a/libsanitizer/asan/asan_mapping.h
+++ b/libsanitizer/asan/asan_mapping.h
@@ -165,7 +165,7 @@ static const u64 kAArch64_ShadowOffset64 = 1ULL << 36;
 static const u64 kRiscv64_ShadowOffset64 = 0xd;
 static const u64 kMIPS32_ShadowOffset32 = 0x0aaa;
 static const u64 kMIPS64_ShadowOffset64 = 1ULL << 37;
-static const u64 kPPC64_ShadowOffset64 = 1ULL << 44;
+static const u64 kPPC64_ShadowOffset64 = 1ULL << 41;
 static const u64 kSystemZ_ShadowOffset64 = 1ULL << 52;
 static const u64 kSPARC64_ShadowOffset64 = 1ULL << 43;  // 0x800
 static const u64 kFreeBSD_ShadowOffset32 = 1ULL << 30;  // 0x4000
diff --git a/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp 
b/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp
index 7ce9e25da34..fc5619e4b37 100644
--- a/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp
+++ b/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp
@@ -759,9 +759,13 @@ u32 GetNumberOfCPUs() {
 #elif SANITIZER_SOLARIS
   return sysconf(_SC_NPROCESSORS_ONLN);
 #else
+#if defined(CPU_COUNT)
   cpu_set_t CPUs;
   CHECK_EQ(sched_getaffinity(0, sizeof(cpu_set_t), &CPUs), 0);
   return CPU_COUNT(&CPUs);
+#else
+  return 1;
+#endif
 #endif
 }
 
diff --git a/libsanitizer/sanitizer_common/sanitizer_mac.cpp 
b/libsanitizer/sanitizer_common/sanitizer_mac.cpp
index 125ecac8b12..0aafbdbc50c 100644
--- a/libsanitizer/sanitizer_common/sanitizer_mac.cpp
+++ b/libsanitizer/sanitizer_common/

Re: [PATCH 0/4] libsanitizer: Sync with upstream

2021-07-20 Thread Jeff Law via Gcc-patches




On 7/20/2021 2:55 PM, H.J. Lu via Gcc-patches wrote:
> 1. Sync with upstream commit 7704fedfff6ef5676adb6415f3be0ac927d1a746
> 2. Apply local patches
> 3. Update LOCAL_PATCHES
> 4. Bump asan/tsan versions for the upstream commit:
>
> commit acf0a6428681dccac803984bfbb1e3e54248f090
>  Author: Ilya Leoshkevich 
>  Date:   Fri Jul 2 02:42:38 2021 +0200
>
>  [sanitizer] Fix __sanitizer_kernel_sigset_t endianness issue
>
> which changed the ABI for function
>
>  void __sanitizer_syscall_post_impl_rt_sigaction
>    (long int, long int,
> const __sanitizer::__sanitizer_kernel_sigaction_t*,
> __sanitizer::__sanitizer_kernel_sigaction_t*,
> SIZE_T);
>
> Tested on Linux/i686 and Linux/x86-64.  OK for master?

If it's resync and reapplying local patches, then it's OK and doesn't 
need approval during stage1.


jeff



Re: [PATCH 0/4] libsanitizer: Sync with upstream

2021-07-20 Thread H.J. Lu via Gcc-patches
On Tue, Jul 20, 2021 at 1:58 PM Jeff Law  wrote:
>
>
>
> On 7/20/2021 2:55 PM, H.J. Lu via Gcc-patches wrote:
> > 1. Sync with upstream commit 7704fedfff6ef5676adb6415f3be0ac927d1a746
> > 2. Apply local patches
> > 3. Update LOCAL_PATCHES
> > 4. Bump asan/tsan versions for the upstream commit:
> >
> > commit acf0a6428681dccac803984bfbb1e3e54248f090
> >  Author: Ilya Leoshkevich 
> >  Date:   Fri Jul 2 02:42:38 2021 +0200
> >
> >  [sanitizer] Fix __sanitizer_kernel_sigset_t endianness issue
> >
> > which changed the ABI for function
> >
> >  void __sanitizer_syscall_post_impl_rt_sigaction
> >(long int, long int,
> > const __sanitizer::__sanitizer_kernel_sigaction_t*,
> > __sanitizer::__sanitizer_kernel_sigaction_t*,
> > SIZE_T);
> >
> > Tested on Linux/i686 and Linux/x86-64.  OK for master?
> If it's resync and reapplying local patches, then it's OK and doesn't
> need approval during stage1.
>

I am checking them in.

Thanks.

-- 
H.J.


Re: [PATCH] Fix for powerpc64 long double complex divide failure

2021-07-20 Thread Segher Boessenkool
On Tue, Jul 20, 2021 at 11:25:29AM -0500, Patrick McGehearty via Gcc-patches 
wrote:
> Ping...
> 
> The fix is minimal (four lines changed).
> I recognize that those familiar with IBM 128-bit floating
> point precision are a select set of people.
> On the plus side, tests fail without the patch and pass with the patch.

Hi!

In the future, please Cc: the relevant maintainers.  Your patch will be
seen much quicker and much more reliably if you do.  I'll handle it now.


Segher


Re: [PING][PATCH] define auto_vec copy ctor and assignment (PR 90904)

2021-07-20 Thread Martin Sebor via Gcc-patches

On 7/20/21 2:08 PM, Jason Merrill wrote:

On 7/20/21 2:34 PM, Martin Sebor wrote:

On 7/14/21 10:23 AM, Jason Merrill wrote:

On 7/14/21 10:46 AM, Martin Sebor wrote:

On 7/13/21 9:39 PM, Jason Merrill wrote:

On 7/13/21 4:02 PM, Martin Sebor wrote:

On 7/13/21 12:37 PM, Jason Merrill wrote:

On 7/13/21 10:08 AM, Jonathan Wakely wrote:

On Mon, 12 Jul 2021 at 12:02, Richard Biener wrote:

Somebody with more C++ knowledge than me needs to approve the
vec.h changes - I don't feel competent to assess all effects of 
the change.


They look OK to me except for:

-extern vnull vNULL;
+static constexpr vnull vNULL{ };

Making vNULL have static linkage can make it an ODR violation to use
vNULL in templates and inline functions, because different
instantiations will refer to a different "vNULL" in each translation
unit.


The ODR says this is OK because it's a literal constant with the 
same value (6.2/12.2.1).


But it would be better without the explicit 'static'; then in 
C++17 it's implicitly inline instead of static.


I'll remove the static.



But then, do we really want to keep vNULL at all?  It's a weird 
blurring of the object/pointer boundary that is also dependent on 
vec being a thin wrapper around a pointer.  In almost all cases 
it can be replaced with {}; one exception is == comparison, where 
it seems to be testing that the embedded pointer is null, which 
is a weird thing to want to test.


The one use case I know of for vNULL where I can't think of
an equally good substitute is in passing a vec as an argument by
value.  The only way to do that that I can think of is to name
the full vec type (i.e., the specialization) which is more typing
and less generic than vNULL.  I don't use vNULL myself so I wouldn't
miss this trick if it were to be removed but others might feel
differently.


In C++11, it can be replaced by {} in that context as well.


Cool.  I thought I'd tried { } here but I guess not.




If not, I'm all for getting rid of vNULL but with over 350 uses
of it left, unless there's some clever trick to make the removal
(mostly) effortless and seamless, I'd much rather do it independently
of this initial change. I also don't know if I can commit to making
all this cleanup.


I already have a patch to replace all but one use of vNULL, but 
I'll hold off with it until after your patch.


So what's the next step?  The patch only removes a few uses of vNULL
but doesn't add any.  Is it good to go as is (without the static and
with the additional const changes Richard suggested)?  This patch is
attached to my reply to Richard:
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575199.html


As Richard wrote:


The pieces where you change vec<> passing to const vec<>& and the few
where you change vec<> * to const vec<> * are OK - this should make the
rest a smaller piece to review.


Please go ahead and apply those changes and send a new patch with the 
remainder of the changes.


I have just pushed r12-2418:
https://gcc.gnu.org/pipermail/gcc-cvs/2021-July/350886.html



A few other comments:


-   omp_declare_simd_clauses);
+   *omp_declare_simd_clauses);


Instead of doing this indirection in all of the callers, let's change 
c_finish_omp_declare_simd to take a pointer as well, and do the 
indirection in initializing a reference variable at the top of the 
function.


Okay.




+    sched_init_luids (bbs.to_vec ());
+    haifa_init_h_i_d (bbs.to_vec ());


Why are these to_vec changes needed when you are also changing the 
functions to take const&?


Calling to_vec() here isn't necessary so I've removed it.




-  vec checks = LOOP_VINFO_CHECK_NONZERO (loop_vinfo);
+  vec checks = LOOP_VINFO_CHECK_NONZERO (loop_vinfo).to_vec ();


Why not use a reference here and in other similar spots?


Sure, that works too.

Attached is what's left of the original changes now that r12-2418
has been applied.



@@ -3364,7 +3364,8 @@ static void
 vect_check_lower_bound (loop_vec_info loop_vinfo, tree expr, bool unsigned_p,
 poly_uint64 min_value)
 {
-  vec lower_bounds = LOOP_VINFO_LOWER_BOUNDS (loop_vinfo);
+  vec lower_bounds
+    = LOOP_VINFO_LOWER_BOUNDS (loop_vinfo).to_vec ();
   for (unsigned int i = 0; i < lower_bounds.length (); ++i)
 if (operand_equal_p (lower_bounds[i].expr, expr, 0))
   {
@@ -3466,7 +3467,7 @@ vect_prune_runtime_alias_test_list (loop_vec_info loop_vinfo)
   typedef pair_hash  tree_pair_hash;
   hash_set  compared_objects;

-  vec may_alias_ddrs = LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo);
+  vec may_alias_ddrs = LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo).to_vec ();


These could also be references.


Even const references it turns out for some of them.


That leaves this as the only remaining use of to_vec:


   ipa_call_arg_values (ipa_auto_call_arg_values *aavals)
-    : m_known_vals (aavals->m_known_vals),
-  m_known_contexts (aavals->m_known_contexts),
-  m_known_aggs (aavals->m_known_aggs),
-  m_know

[PATCH] include: Fix -Wundef warnings in ansidecl.h

2021-07-20 Thread Marek Polacek via Gcc-patches
This quashes -Wundef warnings in ansidecl.h when compiled in C or C++.
In C, __cpp_constexpr and __cplusplus aren't defined so we evaluate
them to 0; conversely, __STDC_VERSION__ is not defined in C++.
This has caused grief when -Wundef is used with -Werror.
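
The guard pattern can be sketched in isolation as follows.  This is only
an illustration, not the ansidecl.h code: the MY_* macros and
my_language_flags are hypothetical names, and the extra
defined(__STDC_VERSION__) test is an assumption added here so that the
sketch also stays quiet for pre-C99 C compilers.

```c
#include <assert.h>

/* Hypothetical feature macros; ansidecl.h does not define these names.  */

/* Test defined(...) first: && short-circuits inside #if, so the macro
   on the right is never evaluated when the file is compiled as C, and
   -Wundef has nothing to report.  */
#if defined(__cplusplus) && __cplusplus >= 201103L
# define MY_HAVE_CXX11 1
#else
# define MY_HAVE_CXX11 0
#endif

/* __STDC_VERSION__ is not defined in C++.  The second defined() test
   is an addition made for this sketch only.  */
#if !defined(__cplusplus) && defined(__STDC_VERSION__) \
    && __STDC_VERSION__ >= 199901L
# define MY_HAVE_C99 1
#else
# define MY_HAVE_C99 0
#endif

/* Pack both answers into one value: bit 1 = C++11, bit 0 = C99.  */
static int
my_language_flags (void)
{
  /* The two conditions are mutually exclusive by construction.  */
  assert (!(MY_HAVE_CXX11 && MY_HAVE_C99));
  return (MY_HAVE_CXX11 << 1) | MY_HAVE_C99;
}
```

Building this with -Wundef -Werror as C and again as C++ should produce
no diagnostics in either mode, which is the property the patch is after.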

I've also tested -traditional-cpp.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

include/ChangeLog:

* ansidecl.h: Check if __cplusplus is defined before checking
the value of __cpp_constexpr and __cplusplus.  Don't check
__STDC_VERSION__ in C++.
---
 include/ansidecl.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/ansidecl.h b/include/ansidecl.h
index 0515228f325..2efe3e85e59 100644
--- a/include/ansidecl.h
+++ b/include/ansidecl.h
@@ -79,7 +79,7 @@ So instead we use the macro below and test it against 
specific values.  */
 /* inline requires special treatment; it's in C99, and GCC >=2.7 supports
it too, but it's not in C89.  */
 #undef inline
-#if __STDC_VERSION__ >= 199901L || defined(__cplusplus) || 
(defined(__SUNPRO_C) && defined(__C99FEATURES__))
+#if (!defined(__cplusplus) && __STDC_VERSION__ >= 199901L) || 
defined(__cplusplus) || (defined(__SUNPRO_C) && defined(__C99FEATURES__))
 /* it's a keyword */
 #else
 # if GCC_VERSION >= 2007
@@ -356,7 +356,7 @@ So instead we use the macro below and test it against 
specific values.  */
 #define ENUM_BITFIELD(TYPE) unsigned int
 #endif
 
-#if __cpp_constexpr >= 200704
+#if defined(__cplusplus) && __cpp_constexpr >= 200704
 #define CONSTEXPR constexpr
 #else
 #define CONSTEXPR
@@ -419,7 +419,7 @@ So instead we use the macro below and test it against 
specific values.  */
 
so that most attempts at copy are caught at compile-time.  */
 
-#if __cplusplus >= 201103
+#if defined(__cplusplus) && __cplusplus >= 201103
 #define DISABLE_COPY_AND_ASSIGN(TYPE)  \
   TYPE (const TYPE&) = delete; \
   void operator= (const TYPE &) = delete

base-commit: 4eea703e7d87b1e0b116c93782cab82c9b1e842a
-- 
2.31.1



[PATCH] Fix PR 10153: tail recursion for vector types.

2021-07-20 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

The problem here is that we try to create an initialized value
from a scalar constant.  For vectors we need to do
a vect_dup instead.  This fixes that issue, so we
get the correct code and it does not crash.
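
As a rough source-level picture of why the accumulator needs a vector
initial value, the sketch below writes the recursion and the equivalent
loop by hand.  It is an assumption-laden illustration, not what
tree-tailcall.c emits: splat, sum_rec and sum_loop are hypothetical
names, and the code relies on the GCC/Clang vector extension used by
the new tests.

```c
#include <assert.h>

/* Two-lane integer vector, as in the new tests (GCC/Clang extension).  */
typedef int V __attribute__ ((vector_size (2 * sizeof (int))));

/* Hypothetical helper: copy one scalar into every lane.  */
static V
splat (int x)
{
  V v = { x, x };
  return v;
}

/* Recursive form: the addition happens after the call returns, so the
   call is not in tail position as written.  */
static V
sum_rec (int n)
{
  V step = { 1, 2 };
  assert (n >= 0);
  if (n == 0)
    return splat (0);
  return step + sum_rec (n - 1);
}

/* Loop form with an explicit additive accumulator.  Its initial value
   has to be a vector of zeros, one per lane, not the scalar 0.  */
static V
sum_loop (int n)
{
  V step = { 1, 2 };
  V acc = splat (0);
  assert (n >= 0);
  for (; n != 0; n--)
    acc = acc + step;
  return acc;
}
```

Both functions return the same lanes for any non-negative n; the only
vector-specific step is the splat of the scalar starting value.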

OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.

gcc/ChangeLog:

PR tree-optimize/10153
* tree-tailcall.c (create_tailcall_accumulator):
For vector types create a duplicated VECTOR_CST
before calling fold_convert.

gcc/testsuite/ChangeLog:

PR tree-optimize/10153
* gcc.c-torture/compile/pr10153-1.c: New test.
* gcc.c-torture/compile/pr10153-2.c: New test.
---
 gcc/testsuite/gcc.c-torture/compile/pr10153-1.c | 7 +++
 gcc/testsuite/gcc.c-torture/compile/pr10153-2.c | 9 +
 gcc/tree-tailcall.c | 3 +++
 3 files changed, 19 insertions(+)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr10153-1.c
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr10153-2.c

diff --git a/gcc/testsuite/gcc.c-torture/compile/pr10153-1.c 
b/gcc/testsuite/gcc.c-torture/compile/pr10153-1.c
new file mode 100644
index 00000000000..3f2040f32a1
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr10153-1.c
@@ -0,0 +1,7 @@
+typedef int V __attribute__ ((vector_size (2 * sizeof (int))));
+V
+foo (void)
+{
+  V v = { };
+  return v - foo();
+}
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr10153-2.c 
b/gcc/testsuite/gcc.c-torture/compile/pr10153-2.c
new file mode 100644
index 00000000000..1af4c8e2a36
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr10153-2.c
@@ -0,0 +1,9 @@
+typedef int V __attribute__ ((vector_size (2 * sizeof (int))));
+V
+foo (int t)
+{
+  if (t < 10)
+return (V){1, 1};
+  V v = { };
+  return v - foo(t - 1);
+}
diff --git a/gcc/tree-tailcall.c b/gcc/tree-tailcall.c
index a4d31c90c49..9d1a98b1cfd 100644
--- a/gcc/tree-tailcall.c
+++ b/gcc/tree-tailcall.c
@@ -1080,6 +1080,9 @@ create_tailcall_accumulator (const char *label, 
basic_block bb, tree init)
 
   phi = create_phi_node (tmp, bb);
   /* RET_TYPE can be a float when -ffast-maths is enabled.  */
+  /* For vectors create a dup. */
+  if (VECTOR_TYPE_P (ret_type))
+init = build_vector_from_val (ret_type, fold_convert (TREE_TYPE 
(ret_type), init));
   add_phi_arg (phi, fold_convert (ret_type, init), single_pred_edge (bb),
   UNKNOWN_LOCATION);
   return PHI_RESULT (phi);
-- 
2.27.0



Re: [PATCH] gcov: Fix use of profile info section

2021-07-20 Thread Jeff Law via Gcc-patches




On 7/14/2021 1:46 AM, Sebastian Huber wrote:
> If the -fprofile-info-section is used, then the gcov information is registered
> in a linker set.  This is done by build_gcov_info_var_registration().  The
> compiler generated object placed in the section was not marked as referenced,
> so once optimization was enabled, this object was optimized away.  Mark it as
> referenced.
>
> gcc/
> coverage.c (build_gcov_info_var_registration): Mark the object placed
> in the linker set as referenced so that it does not get optimized away.

OK
jeff



Re: [PATCH 10/55] rs6000: Main function with stubs for parsing and output

2021-07-20 Thread Bill Schmidt via Gcc-patches

Hi Segher,

On 7/19/21 2:15 PM, Segher Boessenkool wrote:

Hi!

On Thu, Jun 17, 2021 at 10:18:54AM -0500, Bill Schmidt wrote:

* config/rs6000/rs6000-gen-builtins.c (rbtree.h): New #include.
(num_bifs): New variable.
(num_ovld_stanzas): Likewise.
(num_ovlds): Likewise.
(parse_codes): New enum.
(bif_rbt): New variable.
(ovld_rbt): Likewise.
(fntype_rbt): Likewise.
(bifo_rbt): Likewise.
(parse_bif): New stub function.
(create_bif_order): Likewise.
(parse_ovld): Likewise.
(write_header_file): Likewise.
(write_init_file): Likewise.
(write_defines_file): Likewise.
(delete_output_files): New function.
(main): Likewise.
+/* Parse the built-in file.  */
+static parse_codes
+parse_bif (void)
+{
+  return PC_OK;
+}

Baby steps :-)


+/* Write everything to the header file (rs6000-builtins.h).  */
+static int
+write_header_file (void)
+{
+  return 1;
+}

What does the return value mean?  Please document it in a comment.  Same
for other functions (where the function name does not make it obvious
what the return value is).


+static void
+delete_output_files (void)
+{
+  /* Depending on whence we're called, some of these may already be
+ closed.  Don't check for errors.  */
+  fclose (header_file);
+  fclose (init_file);
+  fclose (defines_file);
+
+  unlink (header_path);
+  unlink (init_path);
+  unlink (defines_path);
+}

What are header_path etc.?  It is a very good idea to make sure this is
never something terrible to call unlink on (including making sure the
delete_output_files function is *obviously* never called if creating the
files didn't succeed).


See the main function.  All three files are guaranteed to have been 
opened for writing when this is called, but some of them may have 
already been closed.  So the fclose calls may fail to do anything, but 
the unlinks will always delete the output files. This is done to avoid 
leaving garbage lying around after a parsing failure.





+/* Main program to convert flat files into built-in initialization code.  */
+int
+main (int argc, const char **argv)
+{
+  if (argc != 6)
+{
+  fprintf (stderr,
+  "Five arguments required: two input file and three output "
+  "files.\n");

Two input file_s_ :-)  (Or s/file //).


+  pgm_path = argv[0];

This isn't portable (depending on what you use it for -- argv[0] is not
necessarily a path at all).


The only thing it's used for is as a documentation string in the output 
files, indicating the path to the program that built them. So long as 
argv[0] is a NULL-terminated string, which it had better be, this is 
harmless.


ISO C11:  "If the value of argc is greater than zero, the string pointed 
to by argv[0] represents the program name; argv[0][0] shall be the null 
character if the program name is not available from the host environment."


So I think we're good here.




+  bif_file = fopen (bif_path, "r");
+  if (!bif_file)
+{
+  fprintf (stderr, "Cannot find input built-in file '%s'.\n", bif_path);
+  exit (1);
+}

Say s/find/open/ in the error?


+  fprintf (stderr, "Cannot find input overload file '%s'.\n", ovld_path);

(more)

Okay with those trivialities, and the unlink stuff looked at.  Thanks!


I'll get this cleaned up and post what I commit.  Thanks!

Bill





Segher


Re: [PATCH] Fix for powerpc64 long double complex divide failure

2021-07-20 Thread Segher Boessenkool
Hi!

On Thu, Jul 08, 2021 at 09:24:31PM +, Patrick McGehearty via Gcc-patches 
wrote:
> This patch resolves the failure of powerpc64 long double complex divide
> in native ibm long double format after the patch "Practical improvement
> to libgcc complex divide".
> See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101104

> The MAX and MIN values have only modest changes since the exponent
> field for IBM 128-bit floating point values is the same size as
> the exponent field for IBM 64-bit floating point values.

Finite double-double values have two exponent fields, one for each DP
half that together make up the number.  You are referring to the first
one here.

> However
> the EPSILON field is considerably different. Due to how small
> values can be represented in the lower 64 bits of the IBM 128-bit
> floating point, EPSILON is extremely small, so far beyond the
> desired value that inversion of the value overflows and even
> without the overflow, the RMAX2 is so small as to eliminate
> most usage of the test.

The representable values of double-double and those of IEEE QP float are
not a subset of each other in either direction.  Since a number in
double-double is essentially the sum of two DP float numbers, two such
numbers can be very close together, compared to the magnitude of
them (the epsilon is equal to the minimum non-zero value).
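
A toy model may make the epsilon remark concrete.  The sketch below
assumes a double-double value is the exact sum of a pair of doubles;
struct dd and its helpers are hypothetical and are not libgcc's
representation.  It shows that 1 plus the smallest subnormal double is
a distinct double-double value, although it is not a double.

```c
#include <assert.h>
#include <float.h>

/* Smallest positive subnormal double; DBL_TRUE_MIN is C11.  */
#ifdef DBL_TRUE_MIN
# define TINY DBL_TRUE_MIN
#else
# define TINY 4.9406564584124654e-324
#endif

/* Hypothetical model: the number represented is hi + lo, exactly.  */
struct dd
{
  double hi;
  double lo;
};

/* Build a pair, checking the assumed canonical form: lo is too small
   to change hi when the two are added in double precision.  */
static struct dd
dd_make (double hi, double lo)
{
  volatile double sum = hi + lo;   /* volatile forces rounding to double */
  struct dd r;
  assert (sum == hi);
  r.hi = hi;
  r.lo = lo;
  return r;
}

/* Nonzero if A and B stand for the same number (canonical pairs).  */
static int
dd_equal (struct dd a, struct dd b)
{
  return a.hi == b.hi && a.lo == b.lo;
}

/* Nonzero if A is also exactly representable as a single double.  */
static int
dd_fits_in_double (struct dd a)
{
  return a.lo == 0.0;
}
```

With this model, dd_make (1.0, TINY) differs from dd_make (1.0, 0.0),
so the gap above 1 is the minimum non-zero value rather than something
on the order of DBL_EPSILON.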

> In addition, the gcc support for the KF fields (IBM native long double
> format)

KFmode is IEEE QP float, always.  IFmode is IBM extended double, always.
TFmode can be either.

> does not exist on older gcc compilers such as the default
> compilers on the gcc compiler farm. That adds build complexity
> for users whose environment is only a few years out of date.

CentOS 7 is from 2014.  It's about the oldest we support at all.

> libgcc/

You should say
PR target/101104
>   * config/rs6000/_divkc3.c (RBIG, RMIN, RMIN2, RMINSCAL, RMAX2):
>   Fix long double complex divide for native IBM 128-bit

End sentences (like lines in a changelog) with a full stop, too.

> --- a/libgcc/config/rs6000/_divkc3.c
> +++ b/libgcc/config/rs6000/_divkc3.c
> @@ -38,10 +38,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
> If not, see
>  #endif
>  
>  #ifndef __LONG_DOUBLE_IEEE128__
> -#define RBIG   (__LIBGCC_KF_MAX__ / 2)
> -#define RMIN   (__LIBGCC_KF_MIN__)
> -#define RMIN2  (__LIBGCC_KF_EPSILON__)
> -#define RMINSCAL (1 / __LIBGCC_KF_EPSILON__)
> +#define RBIG   (__LIBGCC_DF_MAX__ / 2)
> +#define RMIN   (__LIBGCC_DF_MIN__)
> +#define RMIN2  (__LIBGCC_DF_EPSILON__)
> +#define RMINSCAL (1 / __LIBGCC_DF_EPSILON__)
>  #define RMAX2  (RBIG * RMIN2)
>  #else
>  #define RBIG   (__LIBGCC_TF_MAX__ / 2)

What *is* your long double?  It should always be IEEE QP float in this
file!  And it is for me.  So what did you do differently?


Segher


Re: [PATCH 10/55] rs6000: Main function with stubs for parsing and output

2021-07-20 Thread Segher Boessenkool
Hi!

On Tue, Jul 20, 2021 at 05:19:54PM -0500, Bill Schmidt wrote:
> See the main function.  All three files are guaranteed to have been 
> opened for writing when this is called, but some of them may have 
> already been closed.  So the fclose calls may fail to do anything, but 
> the unlinks will always delete the output files. This is done to avoid 
> leaving garbage lying around after a parsing failure.

That is much worse actually!  From the C spec:
  The value of a pointer to a FILE object is indeterminate after the
  associated file is closed
so this is undefined behaviour.

Please fix that?  Just assign 0 after closing, and guard the fclose on
error with that?
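
One way to read that suggestion is sketched below.  The helper name
close_once is hypothetical and is not part of rs6000-gen-builtins.c;
the sketch only assumes that each stream pointer is reset to a null
pointer once its file has been closed.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper.  Closes *FP if it is still open, then forgets
   the pointer, so that a later call is a no-op instead of passing an
   indeterminate FILE * to fclose.  Returns 0 on success or if nothing
   was open, EOF if fclose failed.  */
static int
close_once (FILE **fp)
{
  int rc = 0;
  assert (fp != NULL);
  if (*fp != NULL)
    {
      rc = fclose (*fp);
      *fp = NULL;
    }
  return rc;
}
```

A cleanup routine could then call close_once on each output stream
before unlinking the paths, whether or not the normal path had already
closed them.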

> >>+  pgm_path = argv[0];
> >This isn't portable (depending on what you use it for -- argv[0] is not
> >necessarily a path at all).
> 
> The only thing it's used for is as a documentation string in the output 
> files, indicating the path to the program that built them. So long as 
> argv[0] is a NULL-terminated string, which it had better be, this is 
> harmless.

It is allowed to be a null pointer as well.  (gfortran does not work on
systems that do that, so I wouldn't worry too much about it, but still).

> ISO C11:  "If the value of argc is greater than zero, the string pointed 
> to by argv[0] represents the program name; argv[0][0] shall be the null 
> character if the program name is not available from the host environment."
> 
> So I think we're good here.

Yup, we'll survive, argc > 0 pretty much everywhere (but technically it
isn't even required by POSIX afaics).
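
For a generator that only wants a label for its output files, a
defensive lookup could look like the sketch below.  The function name
program_name and the "(unknown)" placeholder are hypothetical choices
made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return argv[0] when it is usable, otherwise a
   fixed placeholder.  Covers argc == 0, a null argv[0], and the empty
   string the C standard permits when no program name is available.  */
static const char *
program_name (int argc, const char **argv)
{
  assert (argc >= 0);
  if (argc > 0 && argv != NULL && argv[0] != NULL && argv[0][0] != '\0')
    return argv[0];
  return "(unknown)";
}
```

The result is always a valid string, so it can be printed into a
generated file header without further checks.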

Thanks,


Segher


Re: [PATCH 16/55] rs6000: Write output to the builtin definition include file

2021-07-20 Thread Segher Boessenkool
On Thu, Jun 17, 2021 at 10:19:00AM -0500, Bill Schmidt wrote:
> 2021-06-07  Bill Schmidt  
> 
> gcc/
>   * config/rs6000/rs6000-gen-builtins.c (write_defines_file):
>   Implement.

Okay for trunk.  Thanks!


Segher


Re: [PATCH 17/55] rs6000: Write output to the builtins header file

2021-07-20 Thread Segher Boessenkool
On Thu, Jun 17, 2021 at 10:19:01AM -0500, Bill Schmidt wrote:
> 2021-06-07  Bill Schmidt  
> 
> gcc/
>   * config/rs6000/rs6000-gen-builtins.c
>   (write_autogenerated_header): New function.
>   (write_decls): Likewise.
>   (write_extern_fntype): New callback function.
>   (write_header_file): Implement.

> +  /*  Cannot mark this as a GC root because only pointer types can
> + be marked as GTY((user)) and be GC roots.  All trees in here are
> + kept alive by other globals, so not a big deal.  Alternatively,
> + we could change the enum fields to ints and cast them in and out
> + to avoid requiring a GTY((user)) designation, but that seems
> + unnecessarily gross.  */

Quite :-)

Maybe you want to print that as a comment to the generated file as well?
It makes more sense there.

Okay for trunk.  Thanks!


Segher


Re: [PATCH 18/55] rs6000: Write output to the builtins init file, part 1 of 3

2021-07-20 Thread Segher Boessenkool
On Thu, Jun 17, 2021 at 10:19:02AM -0500, Bill Schmidt wrote:
> 2021-06-07  Bill Schmidt  
> 
> gcc/
>   * config/rs6000/rs6000-gen-builtins.c (write_fntype): New
>   callback function.
>   (write_fntype_init): New stub function.
>   (write_init_bif_table): Likewise.
>   (write_init_ovld_table): New function.
>   (write_init_file): Implement.

> +   /* Check whether we have a "tf" token in this string, representing
> +  a float128_type_node.  It's possible that float128_type_node is
> +  undefined (occurs for -maltivec -mno-vsx, for example), so we
> +  must guard against that.  */

Yeah, this is still a problem :-(

> +   /* Similarly, look for decimal float tokens.  */
> +   int dfp_found = (strstr (ovlds[i].fndecl, "dd") != NULL
> +|| strstr (ovlds[i].fndecl, "td") != NULL
> +|| strstr (ovlds[i].fndecl, "sd") != NULL);

Strange ordering?  It's not alphabetic, it's not by size -- is it
random?

> +   /* The fndecl for an overload is arbitrarily the first one
> +  for the overload.  We sort out the real types when
> +  processing the overload in the gcc front end.  */

Same as before -- please put such comments in the generated file!

Okay for trunk.  Thanks!


Segher


Re: [PATCH 19/55] rs6000: Write output to the builtins init file, part 2 of 3

2021-07-20 Thread Segher Boessenkool
On Thu, Jun 17, 2021 at 10:19:03AM -0500, Bill Schmidt wrote:
> 2021-06-07  Bill Schmidt  
> 
> gcc/
>   * config/rs6000/rs6000-gen-builtins.c (write_init_bif_table):
>   Implement.

Okido.  Thanks!


Segher


Re: [PATCH 10/55] rs6000: Main function with stubs for parsing and output

2021-07-20 Thread Bill Schmidt via Gcc-patches



On 7/20/21 6:22 PM, Segher Boessenkool wrote:
> Hi!
>
> On Tue, Jul 20, 2021 at 05:19:54PM -0500, Bill Schmidt wrote:
> > See the main function.  All three files are guaranteed to have been
> > opened for writing when this is called, but some of them may have
> > already been closed.  So the fclose calls may fail to do anything, but
> > the unlinks will always delete the output files. This is done to avoid
> > leaving garbage lying around after a parsing failure.
>
> That is much worse actually!  From the C spec:
>    The value of a pointer to a FILE object is indeterminate after the
>    associated file is closed
> so this is undefined behaviour.
>
> Please fix that?  Just assign 0 after closing, and guard the fclose on
> error with that?

No, you're misunderstanding.

unlink doesn't use a pointer to a FILE object.  It takes a string 
representing the path and deletes that name from the filesystem. If 
nobody has the file open, the file is then deleted.

In this case the files are all always closed before unlink is called.  
The names are removed from the filesystem, and the files are deleted.  
If somehow the file managed to remain open (really impossible), the file 
would not be deleted, but the name would be.  No undefined behavior.

Thanks,
Bill

> > > > +  pgm_path = argv[0];
> > > This isn't portable (depending on what you use it for -- argv[0] is not
> > > necessarily a path at all).
> >
> > The only thing it's used for is as a documentation string in the output
> > files, indicating the path to the program that built them. So long as
> > argv[0] is a NULL-terminated string, which it had better be, this is
> > harmless.
>
> It is allowed to be a null pointer as well.  (gfortran does not work on
> systems that do that, so I wouldn't worry too much about it, but still).
>
> > ISO C11:  "If the value of argc is greater than zero, the string pointed
> > to by argv[0] represents the program name; argv[0][0] shall be the null
> > character if the program name is not available from the host environment."
> >
> > So I think we're good here.
>
> Yup, we'll survive, argc > 0 pretty much everywhere (but technically it
> isn't even required by POSIX afaics).
>
> Thanks,
>
>
> Segher


Re: [PATCH] Support logic shift left/right for avx512 mask type.

2021-07-20 Thread Hongtao Liu via Gcc-patches
On Tue, Jul 20, 2021 at 9:41 PM Uros Bizjak  wrote:
>
> On Tue, Jul 20, 2021 at 2:33 PM liuhongt  wrote:
> >
> > Hi:
> >   As mentioned in 
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575420.html
> >
> > cut start-
> > > note for the lowpart we can just view-convert away the excess bits,
> > > fully re-using the mask.  We generate surprisingly "good" code:
> > >
> > > kmovb   %k1, %edi
> > > shrb$4, %dil
> > > kmovb   %edi, %k2
> > >
> > > besides the lack of using kshiftrb.  I guess we're just lacking
> > > a mask register alternative for
> > Yes, we can do it similar as kor/kand/kxor.
> > ---cut end
> >
> >   Bootstrap and regtested on x86_64-linux-gnu{-m32,}.
> >   Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> > * config/i386/constraints.md (Wb): New constraint.
> > (Ww): Ditto.
> > * config/i386/i386.md (*ashlhi3_1): Extend to avx512 mask
> > shift.
> > (*ashlqi3_1): Ditto.
> > (*3_1): Ditto.
> > (*3_1): Ditto.
> > * config/i386/sse.md (k): New define_split after
> > it to convert generic shift pattern to mask shift ones.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/mask-shift.c: New test.
> > ---
> >  gcc/config/i386/constraints.md | 10 +++
> >  gcc/config/i386/i386.md| 94 +++---
> >  gcc/config/i386/sse.md | 14 
> >  gcc/testsuite/gcc.target/i386/mask-shift.c | 83 +++
> >  4 files changed, 173 insertions(+), 28 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/mask-shift.c
> >
> > diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
> > index 485e3f5b2cf..4aa28a5621c 100644
> > --- a/gcc/config/i386/constraints.md
> > +++ b/gcc/config/i386/constraints.md
> > @@ -222,6 +222,16 @@ (define_constraint "BC"
> > (match_operand 0 "vector_all_ones_operand"
> >
> >  ;; Integer constant constraints.
> > +(define_constraint "Wb"
> > +  "Integer constant in the range 0 @dots{} 7, for 8-bit shifts."
> > +  (and (match_code "const_int")
> > +   (match_test "IN_RANGE (ival, 0, 7)")))
> > +
> > +(define_constraint "Ww"
> > +  "Integer constant in the range 0 @dots{} 15, for 16-bit shifts."
> > +  (and (match_code "const_int")
> > +   (match_test "IN_RANGE (ival, 0, 15)")))
> > +
> >  (define_constraint "I"
> >"Integer constant in the range 0 @dots{} 31, for 32-bit shifts."
> >(and (match_code "const_int")
> > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > index 8b809c49fe0..c5f9bd4d4d8 100644
> > --- a/gcc/config/i386/i386.md
> > +++ b/gcc/config/i386/i386.md
> > @@ -1136,6 +1136,7 @@ (define_mode_attr di [(SI "nF") (DI "Wd")])
> >
> >  ;; Immediate operand constraint for shifts.
> >  (define_mode_attr S [(QI "I") (HI "I") (SI "I") (DI "J") (TI "O")])
> > +(define_mode_attr KS [(QI "Wb") (HI "Ww") (SI "I") (DI "J")])
> >
> >  ;; Print register name in the specified mode.
> >  (define_mode_attr k [(QI "b") (HI "w") (SI "k") (DI "q")])
> > @@ -11088,9 +11089,9 @@ (define_insn "*bmi2_ashl3_1"
> > (set_attr "mode" "")])
> >
> >  (define_insn "*ashl3_1"
> > -  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r")
> > -   (ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" 
> > "0,l,rm")
> > - (match_operand:QI 2 "nonmemory_operand" "c,M,r")))
> > +  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,?k")
> > +   (ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" 
> > "0,l,rm,k")
> > + (match_operand:QI 2 "nonmemory_operand" 
> > "c,M,r,")))
> > (clobber (reg:CC FLAGS_REG))]
> >"ix86_binary_operator_ok (ASHIFT, mode, operands)"
> >  {
> > @@ -11098,6 +11099,7 @@ (define_insn "*ashl3_1"
> >  {
> >  case TYPE_LEA:
> >  case TYPE_ISHIFTX:
> > +case TYPE_MSKLOG:
> >return "#";
> >
> >  case TYPE_ALU:
> > @@ -3,7 +5,11 @@ (define_insn "*ashl3_1"
> > return "sal{}\t{%2, %0|%0, %2}";
> >  }
> >  }
> > -  [(set_attr "isa" "*,*,bmi2")
> > +  [(set_attr "isa" "*,*,bmi2,avx512bw")
> > (set (attr "type")
> >   (cond [(eq_attr "alternative" "1")
> >   (const_string "lea")
> > @@ -11123,6 +11129,8 @@ (define_insn "*ashl3_1"
> >   (match_operand 0 "register_operand"))
> >  (match_operand 2 "const1_operand"))
> >   (const_string "alu")
> > +   (eq_attr "alternative" "3")
> > + (const_string "msklog")
> >]
> >(const_string "ishift")))
> > (set (attr "length_immediate")
> > @@ -11218,15 +11226,16 @@ (define_split
> >"operands[2] = gen_lowpart (SImode, operands[2]);")
> >
> >  (define_insn "*ashlhi3_1"
> > -  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,Yp")
> > -   (ashift:HI (match_operand:HI 1 "nonimmediate_operand" "0,l")
> > -  

Re: [PATCH 2/2][RFC] Add loop masking support for x86

2021-07-20 Thread Richard Biener
On Tue, 20 Jul 2021, Richard Biener wrote:

> On Thu, 15 Jul 2021, Richard Sandiford wrote:
> 
> > Richard Biener  writes:
> > > The following extends the existing loop masking support using
> > > > SVE WHILE_ULT to x86 by providing an alternate way to produce the
> > > mask using VEC_COND_EXPRs.  So with --param vect-partial-vector-usage
> > > you can now enable masked vectorized epilogues (=1) or fully
> > > masked vector loops (=2).
> > 
> > As mentioned on IRC, WHILE_ULT is supposed to ensure that every
> > element after the first zero is also zero.  That happens naturally
> > for power-of-2 vectors if the start index is a multiple of the VF.
> > (And at the moment, variable-length vectors are the only way of
> > supporting non-power-of-2 vectors.)
> > 
> > This probably works fine for =2 and =1 as things stand, since the
> > vector IVs always start at zero.  But if in future we have a single
> > IV counting scalar iterations, and use it even for peeled prologue
> > iterations, we could end up with a situation where the approximation
> > is no longer safe.
> > 
> > E.g. suppose we had a uint32_t scalar IV with a limit of (uint32_t)-3.
> > If we peeled 2 iterations for alignment and then had a VF of 8,
> > the final vector would have a start index of (uint32_t)-6 and the
> > vector would be { -1, -1, -1, 0, 0, 0, -1, -1 }.
> > 
> > So I think it would be safer to handle this as an alternative to
> > using while, rather than as a direct emulation, so that we can take
> > the extra restrictions into account.  Alternatively, we could probably
> > do { 0, 1, 2, ... } < { end - start, end - start, ... }.
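
The wrap-around case described above can be reproduced with a small
scalar model.  This is only a sketch: it assumes the emulation compares
start + i against the limit lane by lane in unsigned 32-bit arithmetic,
and the function names are made up for the example.  Bit i of the result
stands for lane i.

```c
#include <assert.h>
#include <stdint.h>

/* Lane i is active iff (start + i) < end, compared as unsigned.
   start + i may wrap around to small values, which re-enables lanes
   after the first inactive one.  */
static unsigned
mask_by_index_compare (uint32_t start, uint32_t end, unsigned vf)
{
  assert (vf <= 32);
  unsigned mask = 0;
  for (uint32_t i = 0; i < vf; i++)
    if ((uint32_t) (start + i) < end)
      mask |= 1u << i;
  return mask;
}

/* The suggested alternative: { 0, 1, 2, ... } < { end - start, ... }.
   Only the lane index is compared, so nothing wraps while start <= end.  */
static unsigned
mask_by_remaining_count (uint32_t start, uint32_t end, unsigned vf)
{
  assert (vf <= 32);
  uint32_t remaining = end - start;
  unsigned mask = 0;
  for (uint32_t i = 0; i < vf; i++)
    if (i < remaining)
      mask |= 1u << i;
  return mask;
}
```

With start = (uint32_t)-6, end = (uint32_t)-3 and vf = 8, the first
function yields 0xC7 (lanes 0-2 and 6-7 active, matching the vector
quoted above), while the second yields 0x07.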
> 
> That doesn't end up working since in the last iteration with a
> non-zero mask we'll compare with all underflowed values (start
> will be > end).  So while we compute a correct mask we cannot use
> that for loop control anymore.

Of course I can just use a signed comparison here (until we get
V128QI and a QImode iterator).
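
A scalar sketch of the difference, under the assumption that the mask is
formed as { 0, 1, 2, ... } < { end - start, ... } and that end - start
fits in the signed range; the function names are hypothetical and bit i
of the result stands for lane i:

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned comparison: once start has stepped past end, end - start
   underflows to a huge value and every lane becomes active again.  */
static unsigned
remaining_mask_unsigned (uint32_t start, uint32_t end, unsigned vf)
{
  assert (vf <= 32);
  uint32_t remaining = end - start;
  unsigned mask = 0;
  for (uint32_t i = 0; i < vf; i++)
    if (i < remaining)
      mask |= 1u << i;
  return mask;
}

/* Signed comparison: the underflowed difference is negative, so no lane
   is active and the mask can still drive loop control.  The conversion
   to int32_t is implementation-defined in ISO C; GCC wraps modulo 2^32.  */
static unsigned
remaining_mask_signed (uint32_t start, uint32_t end, unsigned vf)
{
  assert (vf <= 32);
  int32_t remaining = (int32_t) (end - start);
  unsigned mask = 0;
  for (uint32_t i = 0; i < vf; i++)
    if ((int32_t) i < remaining)
      mask |= 1u << i;
  return mask;
}
```

For end = 10 and vf = 8, both variants give 0x03 at start = 8.  One step
later, at start = 16, the unsigned variant gives 0xFF while the signed
one gives 0.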

Richard.