Add 'gcc/tree.cc:user_omp_clause_code_name' [PR65095]

2022-03-12 Thread Thomas Schwinge
Hi!

On 2022-02-17T13:33:45+0100, I wrote:
> On 2019-10-18T14:28:18+0200, I wrote:
>> On 2019-10-06T15:32:34-0700, Julian Brown  wrote:
>>> This patch adds a function to pretty-print OpenACC clause names from
>>> OMP_CLAUSE_MAP_KINDs, for error output.
>>
>> Indeed talking about (OpenMP) 'map' clauses in an OpenACC context is not
>> quite ideal -- that's what PR65095 is about
>
>>> Previously approved as part of:
>>>
>>>   https://gcc.gnu.org/ml/gcc-patches/2018-12/msg01292.html
>
>
>> A few more comments, for later:
>>
>>>  gcc/c-family/c-common.h |  1 +
>>>  gcc/c-family/c-omp.c| 33 +
>>
>> As I'd mentioned before: 'Eventually (that is, later), this should move
>> into generic code, next to the other "clause printing"'.
>
> As part of an ICE bug fix that I'm working on, I now need to use
> this in GCC middle end code.

Pushed to master branch commit 828335beb77676acffb5911e575672cb55beb2e9
"Add 'gcc/tree.cc:user_omp_clause_code_name' [PR65095]", see attached.

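For readers without the tree at hand, here is a minimal standalone sketch
of what the moved function does: for OpenACC, it names a 'map' clause after
the data clause the user wrote.  The enum 'map_kind', the struct 'clause',
and the function 'clause_name' are hypothetical stand-ins made up for this
illustration, not GCC's real 'tree' and 'GOMP_MAP_*' definitions; only the
kind-to-name pairs are taken from the patch, and only a subset of them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a few of GCC's 'GOMP_MAP_*' kinds.  */
enum map_kind
{
  MAP_ALLOC, MAP_TO, MAP_FROM, MAP_TOFROM, MAP_RELEASE,
  MAP_FORCE_PRESENT, MAP_ATTACH, MAP_DETACH, MAP_OTHER
};

/* Hypothetical stand-in for a clause node: is it a 'map' clause, and if
   so, of which kind?  */
struct clause
{
  bool is_map;
  enum map_kind kind;
};

/* Return the name to use in diagnostics.  For OpenACC, a 'map' clause is
   named after the data clause the user wrote; otherwise fall back to the
   generic name.  */
const char *
clause_name (const struct clause *c, bool oacc)
{
  assert (c != NULL);
  if (oacc && c->is_map)
    switch (c->kind)
      {
      case MAP_ALLOC: return "create";
      case MAP_TO: return "copyin";
      case MAP_FROM: return "copyout";
      case MAP_TOFROM: return "copy";
      case MAP_RELEASE: return "delete";
      case MAP_FORCE_PRESENT: return "present";
      case MAP_ATTACH: return "attach";
      case MAP_DETACH: return "detach";
      default: break;
      }
  /* The real function indexes 'omp_clause_code_name' here.  */
  return c->is_map ? "map" : "other";
}
```

With 'oacc' set, a clause of kind 'MAP_TO' is reported as "copyin"; without
it, the same clause is reported under the generic name "map".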

Regards
 Thomas


-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 
München; limited liability company; Managing Directors: Thomas 
Heurung, Frank Thürauf; Registered office: München; Registry court 
München, HRB 106955
From 828335beb77676acffb5911e575672cb55beb2e9 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 17 Feb 2022 12:46:57 +0100
Subject: [PATCH] Add 'gcc/tree.cc:user_omp_clause_code_name' [PR65095]

Re PR65095 "Adapt OpenMP diagnostic messages for OpenACC", move C/C++
front end 'gcc/c-family/c-omp.cc:c_omp_map_clause_name' to generic
'gcc/tree.cc:user_omp_clause_code_name'.  No functional change.

	PR other/65095
	gcc/
	* tree-core.h (user_omp_clause_code_name): Declare function.
	* tree.cc (user_omp_clause_code_name): New function.
	gcc/c/
	* c-typeck.cc (handle_omp_array_sections_1)
	(c_oacc_check_attachments): Call 'user_omp_clause_code_name'
	instead of 'c_omp_map_clause_name'.
	gcc/cp/
	* semantics.cc (handle_omp_array_sections_1)
	(cp_oacc_check_attachments): Call 'user_omp_clause_code_name'
	instead of 'c_omp_map_clause_name'.
	gcc/c-family/
	* c-common.h (c_omp_map_clause_name): Remove.
	* c-omp.cc (c_omp_map_clause_name): Remove.
---
 gcc/c-family/c-common.h |  1 -
 gcc/c-family/c-omp.cc   | 33 ---------------------------------
 gcc/c/c-typeck.cc       |  4 ++--
 gcc/cp/semantics.cc     |  4 ++--
 gcc/tree-core.h         |  1 +
 gcc/tree.cc             | 36 ++++++++++++++++++++++++++++++++++++
 6 files changed, 41 insertions(+), 38 deletions(-)

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index a8d6f82bb2c..5f0b5d99d07 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1250,7 +1250,6 @@ extern enum omp_clause_default_kind c_omp_predetermined_sharing (tree);
 extern enum omp_clause_defaultmap_kind c_omp_predetermined_mapping (tree);
 extern tree c_omp_check_context_selector (location_t, tree);
 extern void c_omp_mark_declare_variant (location_t, tree, tree);
-extern const char *c_omp_map_clause_name (tree, bool);
 extern void c_omp_adjust_map_clauses (tree, bool);
 
 enum c_omp_directive_kind {
diff --git a/gcc/c-family/c-omp.cc b/gcc/c-family/c-omp.cc
index cd9d86641e1..777cdc65572 100644
--- a/gcc/c-family/c-omp.cc
+++ b/gcc/c-family/c-omp.cc
@@ -2996,39 +2996,6 @@ c_omp_predetermined_mapping (tree decl)
 }
 
 
-/* For OpenACC, the OMP_CLAUSE_MAP_KIND of an OMP_CLAUSE_MAP is used internally
-   to distinguish clauses as seen by the user.  Return the "friendly" clause
-   name for error messages etc., where possible.  See also
-   c/c-parser.cc:c_parser_oacc_data_clause and
-   cp/parser.cc:cp_parser_oacc_data_clause.  */
-
-const char *
-c_omp_map_clause_name (tree clause, bool oacc)
-{
-  if (oacc && OMP_CLAUSE_CODE (clause) == OMP_CLAUSE_MAP)
-    switch (OMP_CLAUSE_MAP_KIND (clause))
-      {
-      case GOMP_MAP_FORCE_ALLOC:
-      case GOMP_MAP_ALLOC: return "create";
-      case GOMP_MAP_FORCE_TO:
-      case GOMP_MAP_TO: return "copyin";
-      case GOMP_MAP_FORCE_FROM:
-      case GOMP_MAP_FROM: return "copyout";
-      case GOMP_MAP_FORCE_TOFROM:
-      case GOMP_MAP_TOFROM: return "copy";
-      case GOMP_MAP_RELEASE: return "delete";
-      case GOMP_MAP_FORCE_PRESENT: return "present";
-      case GOMP_MAP_ATTACH: return "attach";
-      case GOMP_MAP_FORCE_DETACH:
-      case GOMP_MAP_DETACH: return "detach";
-      case GOMP_MAP_DEVICE_RESIDENT: return "device_resident";
-      case GOMP_MAP_LINK: return "link";
-      case GOMP_MAP_FORCE_DEVICEPTR: return "deviceptr";
-      default: break;
-      }
-  return omp_clause_code_name[OMP_CLAUSE_CODE (clause)];
-}
-
 /* Used to merge map clause information in c_omp_adjust_map_clauses.  */
 struct map_clause
 {
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 54b0b0d369b..c0812de84b4 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -13373,7 +13373,7 @@ handle_omp_array_sections_1 (tree c, tree t, vec<tree> &types,
 	{
 	  error_at (OMP_CLAUSE_LOCATION (

Re: [PATCH RFC] mips: add TARGET_ZERO_CALL_USED_REGS hook [PR104817, PR104820]

2022-03-12 Thread Xi Ruoyao via Gcc-patches
On Fri, 2022-03-11 at 21:26 +, Qing Zhao wrote:
> Hi, Ruoyao,
> 
> (I might not be able to reply to this thread till next Wed due to a
> short vacation).
> 
> First, some comments on opening bugs against GCC:
> 
> I took a look at the bug reports PR104817 and PR104820:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104820
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104817
> 
> I didn’t see a test case or a script to reproduce the error, so I
> cannot reproduce it on my side.

I did provide a test case, but maybe you didn't see it because it is so
simple:

echo 'int t() {}' | /home/xry111/git-repos/gcc-test-mips/gcc/cc1 -nostdinc -fzero-call-used-regs=all

An empty function is enough to break -fzero-call-used-regs=all.  And if
you append -mips64r2 to the cc1 command line, you'll get PR104820.  The
patch enables four existing tests for MIPS (previously reported as not
working on MIPS), so I think it's unnecessary to add new test cases.
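
As an aside, the same zeroing can be requested for a single function with
the 'zero_call_used_regs' function attribute that GCC documents alongside
the option, which may be handy for narrowing down which function triggers
a problem.  A minimal sketch, assuming a compiler that knows the attribute;
the '__has_attribute' guard and the macro name 'ZERO_ALL' are only for this
illustration and are not part of the patch:

```c
/* Only use the attribute if the compiler knows it (GCC 11 and later do).  */
#if defined(__has_attribute)
# if __has_attribute(zero_call_used_regs)
#  define ZERO_ALL __attribute__((zero_call_used_regs("all")))
# endif
#endif
#ifndef ZERO_ALL
# define ZERO_ALL
#endif

/* Same shape as the reproducer above, but returning a value so that the
   result of calling it is well defined.  */
ZERO_ALL int
t (void)
{
  return 0;
}
```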

Richard: can we use MIPS_EPILOGUE_TEMP as a scratch register in the
sequence for zeroing the call-used registers, and then zero it as well
(even though it is not in need_zeroed_hardregs)?
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Add 'c-c++-common/goacc/kernels-decompose-pr104086-1.c' [PR104086]

2022-03-12 Thread Thomas Schwinge
Hi!

On 2020-11-13T23:22:30+0100, I wrote:
> On 2019-02-01T00:59:30+0100, I wrote:
>> I've just pushed the attached nine patches to openacc-gcc-8-branch:
>> OpenACC 'kernels' construct changes: splitting of the construct into
>> several regions.
>
> Now, slightly more polished, I've pushed to master branch a variant of
> most of these patches combined in commit
> e898ce7997733c29dcab9c3c62ca102c7f9fa6eb "Decompose OpenACC 'kernels'
> constructs into parts, a sequence of compute constructs", see attached.
>
>> There's more work to be done there, and we're aware of a number of TODO
>> items, but nevertheless: it's a good first step.
>
> That's still the case...  :-)

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c
> @@ -0,0 +1,8 @@
> +/* { dg-additional-options "-fopenacc-kernels=decompose" } */
> +/* Hopefully, this is the same issue as '../../../gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c'.
> +   { dg-ice "TODO" }
> +   TODO { dg-prune-output "during GIMPLE pass: omplower" }
> +   TODO { dg-do link } */
> +
> +#undef KERNELS_DECOMPOSE_ICE_HACK
> +#include "declare-vla.c"

Arseny had later reduced that, and filed .
To document the status quo, pushed to master branch
commit 9781ae3a254a8c17ef4ffa70f21ed1728ff3c707
"Add 'c-c++-common/goacc/kernels-decompose-pr104086-1.c' [PR104086]",
see attached.


Regards
 Thomas


> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c
> @@ -0,0 +1,6 @@
> +/* { dg-additional-options "-fopenacc-kernels=decompose" } */
> +
> +/* See also 'declare-vla-kernels-decompose-ice-1.c'.  */
> +
> +#define KERNELS_DECOMPOSE_ICE_HACK
> +#include "declare-vla.c"

> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
> @@ -38,6 +38,12 @@ f_data (void)
>  for (i = 0; i < N; i++)
>A[i] = -i;
>
> +/* See 'declare-vla-kernels-decompose.c'.  */
> +#ifdef KERNELS_DECOMPOSE_ICE_HACK
> +(volatile int *) &i;
> +(volatile int *) &N;
> +#endif
> +
>  # pragma acc kernels
>  for (i = 0; i < N; i++)
>A[i] = i;


From 9781ae3a254a8c17ef4ffa70f21ed1728ff3c707 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 18 Jan 2022 17:22:14 +0100
Subject: [PATCH] Add 'c-c++-common/goacc/kernels-decompose-pr104086-1.c'
 [PR104086]

..., currently XFAILed with 'dg-ice', as it runs into
'gcc/omp-low.cc:lower_omp_target':

13125               else if (is_gimple_reg (var))
13126                 {
13127                   gcc_assert (offloaded);

This means, the recent PR100280 etc. changes are still not sufficient.

	gcc/testsuite/
	PR middle-end/104086
	* c-c++-common/goacc/kernels-decompose-pr104086-1.c: New file.
---
 .../goacc/kernels-decompose-pr104086-1.c  | 25 +++
 1 file changed, 25 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104086-1.c

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104086-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104086-1.c
new file mode 100644
index 00000000000..eab10cf6c72
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104086-1.c
@@ -0,0 +1,25 @@
+/* Reduced from 'libgomp.oacc-c-c++-common/declare-vla.c'.  */
+
+/* { dg-additional-options "-fchecking" }
+   { dg-ice TODO }
+   { dg-prune-output {during GIMPLE pass: omplower} } */
+
+/* { dg-additional-options "--param openacc-kernels=decompose" } */
+
+/* { dg-additional-options "-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" } */
+
+void
+foo (void)
+{
+#pragma acc data /* { dg-line l_data1 } */
+  /* { dg-bogus {note: variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {TODO 'data'} { xfail *-*-* } l_data1 } */
+  {
+    int i;
+
+#pragma acc kernels
+    /* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+    i = 0;
+  }
+}
-- 
2.34.1



OpenACC 'kernels' decomposition: Mark variables used in 'present' clauses as addressable [PR100280, PR104086]

2022-03-12 Thread Thomas Schwinge
Hi!

On 2022-03-12T13:38:38+0100, I wrote:
> On 2020-11-13T23:22:30+0100, I wrote:
>> On 2019-02-01T00:59:30+0100, I wrote:
>>> I've just pushed the attached nine patches to openacc-gcc-8-branch:
>>> OpenACC 'kernels' construct changes: splitting of the construct into
>>> several regions.
>>
>> Now, slightly more polished, I've pushed to master branch a variant of
>> most of these patches combined in commit
>> e898ce7997733c29dcab9c3c62ca102c7f9fa6eb "Decompose OpenACC 'kernels'
>> constructs into parts, a sequence of compute constructs", see attached.
>>
>>> There's more work to be done there, and we're aware of a number of TODO
>>> items, but nevertheless: it's a good first step.
>>
>> That's still the case...  :-)
>
>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c
>> @@ -0,0 +1,8 @@
>> +/* { dg-additional-options "-fopenacc-kernels=decompose" } */
>> +/* Hopefully, this is the same issue as '../../../gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c'.

(Related, but not the same.)

>> +   { dg-ice "TODO" }
>> +   TODO { dg-prune-output "during GIMPLE pass: omplower" }
>> +   TODO { dg-do link } */
>> +
>> +#undef KERNELS_DECOMPOSE_ICE_HACK
>> +#include "declare-vla.c"
>
> Arseny had later reduced that, and filed .
> To document the status quo, pushed to master branch
> commit 9781ae3a254a8c17ef4ffa70f21ed1728ff3c707
> "Add 'c-c++-common/goacc/kernels-decompose-pr104086-1.c' [PR104086]"

>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c
>> @@ -0,0 +1,6 @@
>> +/* { dg-additional-options "-fopenacc-kernels=decompose" } */
>> +
>> +/* See also 'declare-vla-kernels-decompose-ice-1.c'.  */
>> +
>> +#define KERNELS_DECOMPOSE_ICE_HACK
>> +#include "declare-vla.c"

>> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
>> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
>> @@ -38,6 +38,12 @@ f_data (void)
>>  for (i = 0; i < N; i++)
>>A[i] = -i;
>>
>> +/* See 'declare-vla-kernels-decompose.c'.  */
>> +#ifdef KERNELS_DECOMPOSE_ICE_HACK
>> +(volatile int *) &i;
>> +(volatile int *) &N;
>> +#endif
>> +
>>  # pragma acc kernels
>>  for (i = 0; i < N; i++)
>>A[i] = i;

Pushed to master branch commit 337ed336d7dd83526891bdb436f0bfe9e351f69d
"OpenACC 'kernels' decomposition: Mark variables used in 'present'
clauses as addressable [PR100280, PR104086]", see attached.


Regards
 Thomas


From 337ed336d7dd83526891bdb436f0bfe9e351f69d Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 17 Feb 2022 14:18:57 +0100
Subject: [PATCH] OpenACC 'kernels' decomposition: Mark variables used in
 'present' clauses as addressable [PR100280, PR104086]

... like in recent commit 9b32c1669aad5459dd053424f9967011348add83
"OpenACC 'kernels' decomposition: Mark variables used in synthesized
data clauses as addressable [PR100280]".  Otherwise, we may run into
'gcc/omp-low.cc:lower_omp_target':

13125               else if (is_gimple_reg (var))
13126                 {
13127                   gcc_assert (offloaded);

	PR middle-end/100280
	PR middle-end/104086
	gcc/
	* omp-oacc-kernels-decompose.cc (omp_oacc_kernels_decompose_1):
	Mark variables used in 'present' clauses as addressable.
	* omp-low.cc (scan_sharing_clauses) : Gracefully
	handle duplicate 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE'.
	gcc/testsuite/
	* c-c++-common/goacc/kernels-decompose-pr104086-1.c: Adjust,
	extend.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c:
	Merge this...
	* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c:
	..., and this...
	* testsuite/libgomp.oacc-c-c++-common/declare-vla.c: ... into
	this, and adjust.
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Extend.
---
 gcc/omp-low.cc| 27 +---
 gcc/omp-oacc-kernels-decompose.cc | 32 +
 .../goacc/kernels-decompose-pr104086-1.c  | 37 +--
 .../declare-vla-kernels-decompose-ice-1.c | 22 ---
 .../declare-vla-kernels-decompose.c   | 29 
 .../libgomp.oacc-c-c++-common/declare-vla.c   | 38 ++-
 .../kernels-decompose-1.c | 66 ++-
 7 files changed, 168 insertions(+), 83 deletions(-)
 delete mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c
 delete mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c

diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index d932d74cb03..cfc63d6a104 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.c

Enhance further testcases to verify handling of OpenACC privatization level [PR90115]

2022-03-12 Thread Thomas Schwinge
Hi!

On 2021-05-21T21:29:19+0200, I wrote:
> I've pushed "[OpenACC privatization] Largely extend diagnostics and
> corresponding testsuite coverage [PR90115]" to master branch in commit
> 11b8286a83289f5b54e813f14ff56d730c3f3185

To document where later changes do and do not change behavior,
pushed to master branch commit 2e53fa7bb2ae9fe1152c27e423be9e261da82ddc
"Enhance further testcases to verify handling of OpenACC privatization
level [PR90115]", see attached.


Regards
 Thomas


From 2e53fa7bb2ae9fe1152c27e423be9e261da82ddc Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 11 Mar 2022 15:10:59 +0100
Subject: [PATCH] Enhance further testcases to verify handling of OpenACC
 privatization level [PR90115]

As originally introduced in commit 11b8286a83289f5b54e813f14ff56d730c3f3185
"[OpenACC privatization] Largely extend diagnostics and corresponding testsuite
coverage [PR90115]".

	PR middle-end/90115
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/default-1.c: Enhance.
	* testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90: Likewise.
---
 .../libgomp.oacc-c-c++-common/default-1.c |  32 ++-
 .../kernels-reduction-1.c |  14 +-
 .../libgomp.oacc-c-c++-common/parallel-dims.c | 261 +++---
 .../kernels-reduction-1.f90   |  14 +-
 4 files changed, 266 insertions(+), 55 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c
index 1ac0b9587b9..0ac8d7132d4 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c
@@ -1,4 +1,18 @@
-/* { dg-do run } */
+/* { dg-additional-options "-fopt-info-all-omp" }
+   { dg-additional-options "-foffload=-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" }
+   { dg-additional-options "-foffload=--param=openacc-privatization=noisy" }
+   Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
+   { dg-prune-output {note: variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} } */
+
+/* It's only with Tcl 8.5 (released in 2007) that "the variable 'varName'
+   passed to 'incr' may be unset, and in that case, it will be set to [...]",
+   so to maintain compatibility with earlier Tcl releases, we manually
+   initialize counter variables:
+   { dg-line l_dummy[variable c_compute 0 c_loop_i 0] }
+   { dg-message dummy {} { target iN-VAl-Id } l_dummy } to avoid
+   "WARNING: dg-line var l_dummy defined, but not used".  */
 
 #include <openacc.h>
 
@@ -13,10 +27,15 @@ int test_parallel ()
 ary[i] = ~0;
 
   /* val defaults to firstprivate, ary defaults to copy.  */
-#pragma acc parallel num_gangs (32) copy (ok) copy(ondev)
+#pragma acc parallel num_gangs (32) copy (ok) copy(ondev) /* { dg-line l_compute[incr c_compute] } */
+  /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute$c_compute } */
   {
 ondev = acc_on_device (acc_device_not_host);
-#pragma acc loop gang(static:1)
+/* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target { c++ && { ! __OPTIMIZE__ } } } .-1 }
+   ..., as without optimizations, we're not inlining the C++ 'acc_on_device' wrapper.  */
+#pragma acc loop gang(static:1) /* { dg-line l_loop_i[incr c_loop_i] } */
+/* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
+/* { dg-optimized {assigned OpenACC gang loop parallelism} {} { target *-*-* } l_loop_i$c_loop_i } */
 for (unsigned i = 0; i < 32; i++)
   {
 	if (val != 2)
@@ -51,10 +70,13 @@ int test_kernels ()
 ary[i] = ~0;
 
   /* val defaults to copy, ary defaults to copy.  */
-#pragma acc kernels copy(ondev)
+#pragma acc kernels copy(ondev) /* { dg-line l_compute[incr c_compute] } */
+  /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute$c_compute } */
+  /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } l_compute$c_compute } */
   {
 ondev = acc_on_device (acc_device_not_host);
-#pragma acc loop 
+#pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+/* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressabl

OpenACC 'kernels' decomposition: wrong-code cases unless manually making certain variables addressable [PR104892]

2022-03-12 Thread Thomas Schwinge
Hi!

On 2022-03-01T17:46:20+0100, I wrote:
> On 2022-01-13T10:54:16+0100, I wrote:
>> On 2019-05-08T14:51:57+0100, Julian Brown  wrote:
>>>  - The "addressable" bit is set during the kernels conversion pass for
>>>    variables that have "create" (alloc) clauses created for them in the
>>>    synthesised outer data region (instead of in the front-end, etc.,
>>>    where it can't be done accurately). Such variables actually have
>>>    their address taken during transformations made in a later pass
>>>    (omp-low, I think), but there's a phase-ordering problem that means
>>>    the flag should be set earlier.
>>
>> The actual issue is a bit different, but yes, there is a problem.
>> The related ICE has also been reported as 
>> "ICE in lower_omp_target, at omp-low.c:12287".  [...]

We've resolved all such known ICEs -- but still have open
 "OpenACC 'kernels' decomposition:
wrong-code cases unless manually making certain variables addressable".
This is avoided by:

> workaround patches like
> we have on the og11 development branch:
>   - "Avoid introducing 'create' mapping clauses for loop index variables in 
> kernels regions",
>   - "Run all kernels regions with GOMP_MAP_FORCE_TOFROM mappings 
> synchronously",
>   - "Fix for is_gimple_reg vars to 'data kernels'"

..., but the misbehavior is visible without the workaround patches, for
example on the master branch.

Pushed to master branch commit 535afbd959bc72de85fca36ba6417f075cca1018
"OpenACC 'kernels' decomposition: wrong-code cases unless manually making
certain variables addressable [PR104892]", see attached, to "Document a
few examples of the status quo".
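
The "manually making certain variables addressable" idiom used in those
testcases is just an otherwise unused address-of expression.  A minimal
sketch, loosely modeled on the 'test_kernels' change in 'default-1.c'; the
function name 'kernels_result' and its parameter are made up here, and
without '-fopenacc' the pragma is ignored and the code simply runs on the
host:

```c
/* Hypothetical helper; not part of the testsuite.  */
int
kernels_result (int n)
{
  int val = 2;
  /* Taking the address makes 'val' addressable; the value of the
     expression itself is discarded.  */
  (volatile int *) &val;
  int out = 0;

#pragma acc kernels copy(out)
  {
    out = val + n;
  }
  return out;
}
```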


Regards
 Thomas


From 535afbd959bc72de85fca36ba6417f075cca1018 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 11 Mar 2022 15:11:25 +0100
Subject: [PATCH] OpenACC 'kernels' decomposition: wrong-code cases unless
 manually making certain variables addressable [PR104892]

Document a few examples of the status quo.

	PR middle-end/104892
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Point
	to PR104892.
	* testsuite/libgomp.oacc-c-c++-common/default-1.c: Likewise,
	enable '--param=openacc-kernels=decompose' and adjust.
	* testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90:
	Likewise.
---
 .../libgomp.oacc-c-c++-common/default-1.c | 14 ++--
 .../kernels-decompose-1.c |  4 +--
 .../kernels-reduction-1.c |  8 -
 .../libgomp.oacc-c-c++-common/parallel-dims.c | 34 +--
 .../kernels-reduction-1.f90   | 15 +++-
 5 files changed, 59 insertions(+), 16 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c
index 0ac8d7132d4..fed65c8dccc 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "--param=openacc-kernels=decompose" } */
+
 /* { dg-additional-options "-fopt-info-all-omp" }
{ dg-additional-options "-foffload=-fopt-info-all-omp" } */
 
@@ -63,6 +65,8 @@ int test_parallel ()
 int test_kernels ()
 {
   int val = 2;
+  /*TODO  */
+  (volatile int *) &val;
   int ary[32];
   int ondev = 0;
 
@@ -71,12 +75,18 @@ int test_kernels ()
 
   /* val defaults to copy, ary defaults to copy.  */
 #pragma acc kernels copy(ondev) /* { dg-line l_compute[incr c_compute] } */
-  /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute$c_compute } */
-  /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } l_compute$c_compute } */
+  /* { dg-note {variable 'ondev\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute$c_compute } */
+  /* { dg-note {variable 'val\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute$c_compute } */
   {
+/* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
 ondev = acc_on_device (acc_device_not_host);
+/* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target { c++ && { ! __OPTIMIZE__ } } } .-1 }
+   ..., as without optimizations, we're not inlining the C++ 'acc_on_device' wrapper.  */
 #pragma acc loop /* { dg-line l_loop_i[incr c_

OpenACC 'kernels' decomposition: resolve wrong-code cases unless manually making certain variables addressable [PR100280, PR104892]

2022-03-12 Thread Thomas Schwinge
Hi!

On 2022-03-12T15:54:31+0100, I wrote:
> On 2022-03-01T17:46:20+0100, I wrote:
>> On 2022-01-13T10:54:16+0100, I wrote:
>>> On 2019-05-08T14:51:57+0100, Julian Brown  wrote:
>>>>  - The "addressable" bit is set during the kernels conversion pass for
>>>>    variables that have "create" (alloc) clauses created for them in the
>>>>    synthesised outer data region (instead of in the front-end, etc.,
>>>>    where it can't be done accurately). Such variables actually have
>>>>    their address taken during transformations made in a later pass
>>>>    (omp-low, I think), but there's a phase-ordering problem that means
>>>>    the flag should be set earlier.
>>>
>>> The actual issue is a bit different, but yes, there is a problem.
>>> The related ICE has also been reported as 
>>> "ICE in lower_omp_target, at omp-low.c:12287".  [...]
>
> We've resolved all such known ICEs -- but still have open
>  "OpenACC 'kernels' decomposition:
> wrong-code cases unless manually making certain variables addressable".
> This is avoided by:
>
>> workaround patches like
>> we have on the og11 development branch:
>>   - "Avoid introducing 'create' mapping clauses for loop index variables in 
>> kernels regions",
>>   - "Run all kernels regions with GOMP_MAP_FORCE_TOFROM mappings 
>> synchronously",
>>   - "Fix for is_gimple_reg vars to 'data kernels'"
>
> ..., but the misbehavior is visible without the workaround patches, for
> example on the master branch.

..., and to avoid the need for those (thus, please remove for
the upcoming og12 branch, Kwok), pushed to master branch
commit a07b8f4fb756484893b5612cbe9410970dc76db9
"OpenACC 'kernels' decomposition: resolve wrong-code cases unless
manually making certain variables addressable [PR100280, PR104892]",
see attached.
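
In user-level terms, the performance point in the commit message is roughly
the difference between the two functions below: copying an array to and
from the device around every compute region, versus copying it once in an
enclosing data region and referring to it with 'present' inside.  This is
only an analogy written with ordinary OpenACC directives, not the code that
the decomposition pass generates; the function names and 'N' are made up,
and without '-fopenacc' the pragmas are ignored and both functions run on
the host:

```c
enum { N = 8 };

/* Every compute region copies 'a' to the device and back.  */
void
copy_every_region (int *a)
{
#pragma acc parallel loop copy(a[0:N])
  for (int i = 0; i < N; i++)
    a[i] = i;
#pragma acc parallel loop copy(a[0:N])
  for (int i = 0; i < N; i++)
    a[i] *= 2;
}

/* One enclosing data region does the copying; the compute regions only
   state that 'a' is already present on the device.  */
void
copy_once (int *a)
{
#pragma acc data copy(a[0:N])
  {
#pragma acc parallel loop present(a[0:N])
    for (int i = 0; i < N; i++)
      a[i] = i;
#pragma acc parallel loop present(a[0:N])
    for (int i = 0; i < N; i++)
      a[i] *= 2;
  }
}
```

Both functions compute the same values; they differ only in how often the
data would be transferred.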


Regards
 Thomas


From a07b8f4fb756484893b5612cbe9410970dc76db9 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 11 Mar 2022 22:31:51 +0100
Subject: [PATCH] OpenACC 'kernels' decomposition: resolve wrong-code cases
 unless manually making certain variables addressable [PR100280, PR104892]

Currently in OpenACC 'kernels' decomposition, there is special handling of
'GOMP_MAP_FORCE_TOFROM', documented to be done to avoid "internal compiler
errors in later passes".  For performance reasons, the current repetitive
to/from device copying for every region is not ideal, compared to using
'present' clauses, as done for almost all other 'GOMP_MAP_*'.  Also, the
current special handling (incomplete, evidently) is the reason for the PR104892
misbehavior.  For PR100280 etc. we've resolved all such known ICEs -- removing
the special handling for 'GOMP_MAP_FORCE_TOFROM' now resolves PR104892.

	PR middle-end/100280
	PR middle-end/104892
	gcc/
	* omp-oacc-kernels-decompose.cc (omp_oacc_kernels_decompose_1):
	Remove special handling of 'GOMP_MAP_FORCE_TOFROM'.
	gcc/testsuite/
	* c-c++-common/goacc/kernels-decompose-2.c: Adjust.
	* c-c++-common/goacc/kernels-decompose-pr100400-1-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr100400-1-2.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr100400-1-3.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr100400-1-4.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-2.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104774-1.c: Likewise.
	* gfortran.dg/goacc/classify-kernels.f95: Likewise.
	* gfortran.dg/goacc/kernels-decompose-2.f95: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/declare-vla.c: Adjust.
	* testsuite/libgomp.oacc-c-c++-common/default-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
	* testsuite/libgomp.oacc-fortran/asyncwait-1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90:
	Likewise.
---
 gcc/omp-oacc-kernels-decompose.cc |  1 -
 .../c-c++-common/goacc/kernels-decompose-2.c  | 32 +
 .../goacc/kernels-decompose-pr100400-1-1.c|  2 ++
 .../goacc/kernels-decompose-pr100400-1-2.c|  2 ++
 .../goacc/kernels-decompose-pr100400-1-3.c|  2 ++
 .../goacc/kernels-decompose-pr100400-1-4.c|  2 ++
 .../goacc/kernels-decompose-pr104061-1-1.c|  2 ++
 .../goacc/kernels-decompose-pr104061-1-2.c

Re: [PATCH, V3] PR target/99708- Define __SIZEOF_FLOAT128__ and __SIZEOF_IBM128__

2022-03-12 Thread Segher Boessenkool
On Fri, Mar 11, 2022 at 05:51:05PM -0500, Michael Meissner wrote:
> On Fri, Mar 11, 2022 at 02:51:23PM -0600, Segher Boessenkool wrote:
> > On Fri, Mar 11, 2022 at 08:42:27PM +, Joseph Myers wrote:
> > > The version of this patch applied to GCC 10 branch (commit 
> > > 641b407763ecfee5d4ac86d8ffe9eb1eeea5fd10) has broken the glibc build for 
> > > powerpc64le-linux-gnu (it's fine with GCC 11 branch and master, just GCC 
> > > 10 branch is broken) 
> > 
> > Mike, please revert it then?
> 
> Ok, I will revert both the GCC 11 and GCC 10 backport once I make sure the fix
> builds.  Sorry about that.  Obviously, we will want to backport whatever we do
> shortly to the older branches.

Yes, except the "shortly".  Experience shows that we need more thorough
testing for this, which means we need to wait longer, so that other
projects have time to do their testing.  Rushing things only causes more
problems :-/


Segher


[x86 PATCH] Fix libitm.c/memset-1.c test fails with new peephole2s.

2022-03-12 Thread Roger Sayle

My sincere apologies for the breakage, but alas handling SImode in the
recently added "xorl;movb -> movzbl" peephole2 turns out to be slightly
more complicated that just using SWI48 as a mode iterator.  I'd failed
to check the machine description carefully, but the *zero_extendsi2
define_insn is conditionally defined, based on x86 target tuning using
TARGET_ZERO_EXTEND_WITH_AND, and therefore unavailable on 486 and pentium
unless optimizing the code for size.  It turns out that the libitm testsuite
specifies -m486 with make check RUNTESTFLAGS="--target_board='unix{-m32}'"
and therefore encounters/catches my oversight.

Fixed by adding the appropriate conditions to the new peephole2 patterns.
I don't think it's worth the effort to provide an equivalent optimization
for these (very) old architectures.

Tested on x86_64-pc-linux-gnu with make bootstrap and make -k check
with no new failures.  Confirmed using RUNTESTFLAGS that this
fixes the above failure, and the recently added testcase with -march=i486.
Ok for mainline?


2022-03-12  Roger Sayle  

gcc/ChangeLog
* config/i386/i386.md (peephole2 xorl;movb -> movzbl): Disable
transformation when *zero_extendsi2 is not available.

gcc/testsuite/ChangeLog
* gcc.target/i386/pr98335.c: Skip this test if tuning for i486
or pentium, and not optimizing for size.


Sorry for the noise.
Roger
--

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index c8fbf60..80b5974 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -4316,7 +4316,10 @@
  (clobber (reg:CC FLAGS_REG))])
(set (strict_low_part (match_operand:SWI12 1 "general_reg_operand"))
(match_operand:SWI12 2 "nonimmediate_operand"))]
-  "REGNO (operands[0]) == REGNO (operands[1])"
+  "REGNO (operands[0]) == REGNO (operands[1])
+   && ( != 4
+   || !TARGET_ZERO_EXTEND_WITH_AND
+   || !optimize_function_for_speed_p (cfun))"
   [(set (match_dup 0) (zero_extend:SWI48 (match_dup 2)))])
 
 ;; Likewise, but preserving FLAGS_REG.
@@ -4324,7 +4327,10 @@
   [(set (match_operand:SWI48 0 "general_reg_operand") (const_int 0))
(set (strict_low_part (match_operand:SWI12 1 "general_reg_operand"))
(match_operand:SWI12 2 "nonimmediate_operand"))]
-  "REGNO (operands[0]) == REGNO (operands[1])"
+  "REGNO (operands[0]) == REGNO (operands[1])
+   && ( != 4
+   || !TARGET_ZERO_EXTEND_WITH_AND
+   || !optimize_function_for_speed_p (cfun))"
   [(set (match_dup 0) (zero_extend:SWI48 (match_dup 2)))])
 
 ;; Sign extension instructions
diff --git a/gcc/testsuite/gcc.target/i386/pr98335.c 
b/gcc/testsuite/gcc.target/i386/pr98335.c
index 7fa7ad7..bf731b4 100644
--- a/gcc/testsuite/gcc.target/i386/pr98335.c
+++ b/gcc/testsuite/gcc.target/i386/pr98335.c
@@ -1,5 +1,10 @@
 /* { dg-do compile } */
 /* { dg-options "-O2" } */
+/* { dg-skip-if "" { *-*-* } { "-march=i[45]86" } { "-O[sz]" } } */
+/* { dg-skip-if "" { *-*-* } { "-march=pentium" } { "-O[sz]" } } */
+/* { dg-skip-if "" { *-*-* } { "-mtune=i[45]86" } { "-O[sz]" } } */
+/* { dg-skip-if "" { *-*-* } { "-mtune=pentium" } { "-O[sz]" } } */
+
 union Data { char a; short b; };
 
 char c;


[PATCH] c++: Fix up cp_parser_skip_to_pragma_eol [PR104623]

2022-03-12 Thread Jakub Jelinek via Gcc-patches
Hi!

We ICE on the following testcase, because we tentatively parse it multiple
times and the erroneous attribute syntax results in a call to
cp_parser_skip_to_end_of_statement, which, on seeing CPP_PRAGMA (can be
any deferred one, OpenMP/OpenACC/ivdep etc.), calls
cp_parser_skip_to_pragma_eol, which in turn calls cp_lexer_purge_tokens_after.
That call purges all the tokens from CPP_PRAGMA until CPP_PRAGMA_EOL,
excluding the initial CPP_PRAGMA though (but including the final
CPP_PRAGMA_EOL).  This means the second time we parse this, we see
CPP_PRAGMA with no tokens after it from the pragma, most importantly
not the CPP_PRAGMA_EOL, so we either ICE, if it is the last pragma in
the TU, or, if there are other pragmas, treat everything in between
as a pragma.
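
The failure mode described above can be sketched with a toy token buffer.
This is only a model under stated assumptions, not the C++ front end's code:
the enum, the function name skip_to_pragma_eol_model and the idea of
overwriting purged tokens in place are all hypothetical, and the array is
assumed to end in TOK_EOF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds; the real lexer has many more.  */
enum tok { TOK_OTHER, TOK_PRAGMA, TOK_PRAGMA_EOL, TOK_PURGED, TOK_EOF };

/* Starting at the TOK_PRAGMA at index START, return the index just past
   the matching TOK_PRAGMA_EOL, or -1 if TOK_EOF is reached first.
   When PURGE is nonzero, every token after the pragma token up to and
   including the TOK_PRAGMA_EOL is overwritten, modelling the purge.
   TOKS must be terminated by TOK_EOF.  */
static long
skip_to_pragma_eol_model (enum tok *toks, size_t start, int purge)
{
  assert (toks[start] == TOK_PRAGMA);
  size_t i = start + 1;
  while (toks[i] != TOK_EOF && toks[i] != TOK_PRAGMA_EOL)
    {
      if (purge)
        toks[i] = TOK_PURGED;
      i++;
    }
  if (toks[i] == TOK_EOF)
    return -1;            /* no end-of-pragma marker left to find */
  if (purge)
    toks[i] = TOK_PURGED; /* the final CPP_PRAGMA_EOL is purged too */
  return (long) (i + 1);
}
```

With purging enabled, a second call on the same pragma token no longer
finds an end marker and runs on to the end of the buffer; without purging,
repeated calls return the same position each time.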

I've tried various things, including making the CPP_PRAGMA token
itself also purged, or changing the cp_parser_skip_to_end_of_statement
(and cp_parser_skip_to_end_of_block_or_statement) to call it with
NULL instead of token, so that this purging isn't done there,
but each patch resulted in lots of regressions.
But removing the purging altogether surprisingly doesn't regress anything,
and I think it is the right thing: if we e.g. parse tentatively, why can't
we parse the pragma multiple times or at least skip over it?

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-03-12  Jakub Jelinek  

PR c++/104623
* parser.cc (cp_parser_skip_to_pragma_eol): Don't purge any tokens.

* g++.dg/gomp/pr104623.C: New test.

--- gcc/cp/parser.cc.jj 2022-03-11 13:11:53.622094878 +0100
+++ gcc/cp/parser.cc  2022-03-11 14:45:36.877647173 +0100
@@ -4111,8 +4111,6 @@ cp_parser_skip_to_pragma_eol (cp_parser*
 
   if (pragma_tok)
 {
-  /* Ensure that the pragma is not parsed again.  */
-  cp_lexer_purge_tokens_after (parser->lexer, pragma_tok);
   parser->lexer->in_pragma = false;
   if (parser->lexer->in_omp_attribute_pragma
  && cp_lexer_next_token_is (parser->lexer, CPP_EOF))
--- gcc/testsuite/g++.dg/gomp/pr104623.C.jj 2022-03-11 14:22:15.724288282 +0100
+++ gcc/testsuite/g++.dg/gomp/pr104623.C  2022-03-11 14:22:06.746413835 +0100
@@ -0,0 +1,9 @@
+// PR c++/104623
+// { dg-do compile }
+
+void
+foo ()
+{
+  struct __attribute__() a // { dg-error "expected primary-expression before" }
+  #pragma omp task
+}

Jakub



[PATCH] i386: Fix up _mm_loadu_si{16,32} [PR99754]

2022-03-12 Thread Jakub Jelinek via Gcc-patches
Hi!

These intrinsics are supposed to do an unaligned may_alias load
of a 16-bit or 32-bit value and store it as the first element of
a 128-bit integer vector, with all other elements cleared.

The current _mm_storeu_* implementation implements that correctly, uses
__*_u types to do the store and extracts the first element of a vector into
it.
But _mm_loadu_si{16,32} gets it all wrong.  It performs an aligned
non-may_alias load and because _mm_set_epi{16,32} has the args reversed,
it also inserts it into the last vector element instead of first.
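
To make the argument-order problem concrete, here is a minimal portable
sketch in plain C.  It deliberately does not use the real intrinsics:
vec128_model, model_set_epi32, load_unaligned_32 and the two
model_loadu_si32_* functions are hypothetical names, and the model only
mirrors the documented argument order of _mm_set_epi32 (highest element
first).  memcpy stands in for an unaligned, alias-safe load in both
variants, so the sketch isolates the element placement and does not
reproduce the old aligned non-may_alias access.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for __m128i viewed as four 32-bit lanes;
   e[0] is the first (lowest) element.  */
typedef struct { int32_t e[4]; } vec128_model;

/* Mirrors the argument order of _mm_set_epi32: the FIRST argument
   becomes the LAST element, the last argument becomes element 0.  */
static vec128_model
model_set_epi32 (int32_t e3, int32_t e2, int32_t e1, int32_t e0)
{
  vec128_model v = { { e0, e1, e2, e3 } };
  return v;
}

/* Unaligned, alias-safe 32-bit load: memcpy makes no alignment or
   effective-type assumptions about P.  */
static int32_t
load_unaligned_32 (const void *p)
{
  int32_t x;
  assert (p != NULL);
  memcpy (&x, p, sizeof x);
  return x;
}

/* Shape of the old code: the loaded value is passed as the first
   argument, so it lands in the last element.  */
static vec128_model
model_loadu_si32_old (const void *p)
{
  return model_set_epi32 (load_unaligned_32 (p), 0, 0, 0);
}

/* Shape of the fixed code: the loaded value is passed last, so it
   lands in element 0 and all other elements are zero.  */
static vec128_model
model_loadu_si32_fixed (const void *p)
{
  return model_set_epi32 (0, 0, 0, load_unaligned_32 (p));
}
```

A store of element 0 after model_loadu_si32_old would write zero rather
than the loaded value, which is the kind of round-trip mismatch the new
tests check for.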

The following patch fixes that, bootstrapped/regtested on x86_64-linux
and i686-linux, ok for trunk and affected release branches?

Note, while the Intrinsics guide for _mm_loadu_si32 says SSE2,
for _mm_loadu_si16 it strangely says SSE.  But the intrinsic
returns __m128i, which is only defined in emmintrin.h, and
_mm_set_epi16 is also only SSE2 and later in emmintrin.h.
Even clang defines it in emmintrin.h and ends up with an inlining
failure when calling _mm_loadu_si16 from an sse,no-sse2 function.
So, isn't that a bug in the Intrinsics guide instead?

2022-03-12  Jakub Jelinek  

PR target/99754
* config/i386/emmintrin.h (_mm_loadu_si32): Put loaded value into
first rather than last element of the vector, use __m32_u to do
a really unaligned load, use just 0 instead of (int)0.
(_mm_loadu_si16): Put loaded value into first rather than last
element of the vector, use __m16_u to do a really unaligned load,
use just 0 instead of (short)0.

* gcc.target/i386/pr99754-1.c: New test.
* gcc.target/i386/pr99754-2.c: New test.

--- gcc/config/i386/emmintrin.h.jj  2022-01-11 23:11:21.766298923 +0100
+++ gcc/config/i386/emmintrin.h 2022-03-11 16:47:24.789884825 +0100
@@ -718,14 +718,13 @@ _mm_loadu_si64 (void const *__P)
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_loadu_si32 (void const *__P)
 {
-  return _mm_set_epi32 (*(int *)__P, (int)0, (int)0, (int)0);
+  return _mm_set_epi32 (0, 0, 0, (*(__m32_u *)__P)[0]);
 }
 
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_loadu_si16 (void const *__P)
 {
-  return _mm_set_epi16 (*(short *)__P, (short)0, (short)0, (short)0,
-   (short)0, (short)0, (short)0, (short)0);
+  return _mm_set_epi16 (0, 0, 0, 0, 0, 0, 0, (*(__m16_u *)__P)[0]);
 }
 
 extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
--- gcc/testsuite/gcc.target/i386/pr99754-1.c.jj  2022-03-11 16:43:30.621120896 +0100
+++ gcc/testsuite/gcc.target/i386/pr99754-1.c   2022-03-11 16:43:18.250291856 +0100
@@ -0,0 +1,20 @@
+/* PR target/99754 */
+/* { dg-do run } */
+/* { dg-options "-O2 -msse2" } */
+/* { dg-require-effective-target sse2 } */
+
+#include "sse2-check.h"
+#include 
+
+static void
+sse2_test (void)
+{
+  union { unsigned char buf[32]; long long ll; } u;
+  u.buf[1] = 0xfe;
+  u.buf[2] = 0xca;
+  u.buf[17] = 0xaa;
+  u.buf[18] = 0x55;
+  _mm_storeu_si16 (&u.buf[17], _mm_loadu_si16 (&u.buf[1]));
+  if (u.buf[17] != 0xfe || u.buf[18] != 0xca)
+abort ();
+}
--- gcc/testsuite/gcc.target/i386/pr99754-2.c.jj  2022-03-11 16:43:41.701967763 +0100
+++ gcc/testsuite/gcc.target/i386/pr99754-2.c   2022-03-11 16:45:16.391659211 +0100
@@ -0,0 +1,24 @@
+/* PR target/99754 */
+/* { dg-do run } */
+/* { dg-options "-O2 -msse2" } */
+/* { dg-require-effective-target sse2 } */
+
+#include "sse2-check.h"
+#include 
+
+static void
+sse2_test (void)
+{
+  union { unsigned char buf[32]; long long ll; } u;
+  u.buf[1] = 0xbe;
+  u.buf[2] = 0xba;
+  u.buf[3] = 0xfe;
+  u.buf[4] = 0xca;
+  u.buf[17] = 0xaa;
+  u.buf[18] = 0x55;
+  u.buf[19] = 0xaa;
+  u.buf[20] = 0x55;
+  _mm_storeu_si32 (&u.buf[17], _mm_loadu_si32 (&u.buf[1]));
+  if (u.buf[17] != 0xbe || u.buf[18] != 0xba || u.buf[19] != 0xfe || u.buf[20] != 0xca)
+abort ();
+}

Jakub



[PATCH] lra: Fix up debug_p handling in lra_substitute_pseudo [PR104778]

2022-03-12 Thread Jakub Jelinek via Gcc-patches
Hi!

The following testcase ICEs on powerpc-linux, because lra_substitute_pseudo
substitutes (const_int 1) into a subreg operand.  First a subreg of subreg
of a reg appears in a debug insn (which surely is invalid outside of
debug insns, but in debug insns we allow even what is normally invalid in
RTL like subregs which the target doesn't like, because either dwarf2out
is able to handle it, or we just throw away the location expression,
making some var .

lra_substitute_pseudo already has some code to deal with specifically
SUBREG of REG with the REG being substituted for VOIDmode constant,
but that doesn't cover this case, so the following patch extends
lra_substitute_pseudo for debug_p mode to treat stuff like e.g.
combiner's subst function to ensure we don't lose mode which is essential
for the IL.
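
As a rough illustration of why the mode must not be lost: a VOIDmode
constant does not record how wide the operand it replaced was, yet the
result of folding an extension depends on exactly that width.  The sketch
below is plain C, not RTL simplification code; fold_zero_extend_model and
fold_sign_extend_model are hypothetical names, and inner_bits plays the
role of the operand's mode.

```c
#include <assert.h>
#include <stdint.h>

/* Fold a zero extension of constant C, where the operand was
   INNER_BITS wide before a modeless constant replaced it.  */
static int64_t
fold_zero_extend_model (int64_t c, unsigned inner_bits)
{
  assert (inner_bits >= 1 && inner_bits < 64);
  uint64_t mask = (UINT64_C (1) << inner_bits) - 1;
  return (int64_t) ((uint64_t) c & mask);
}

/* Fold a sign extension of constant C from INNER_BITS bits.  */
static int64_t
fold_sign_extend_model (int64_t c, unsigned inner_bits)
{
  assert (inner_bits >= 1 && inner_bits < 64);
  uint64_t mask = (UINT64_C (1) << inner_bits) - 1;
  uint64_t sign = UINT64_C (1) << (inner_bits - 1);
  uint64_t low = (uint64_t) c & mask;
  /* Standard xor/subtract trick to propagate the sign bit.  */
  return (int64_t) ((low ^ sign) - sign);
}
```

The same constant -1 zero-extends to 255 from 8 bits but to 65535 from
16 bits, so substituting the constant without folding the enclosing
operation in the operand's mode would leave an expression whose value
can no longer be determined.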

Bootstrapped/regtested on {powerpc64{,le},x86_64,i686}-linux, ok for trunk?

2022-03-12  Jakub Jelinek  

PR debug/104778
* lra.cc (lra_substitute_pseudo): For debug_p mode, simplify
SUBREG, ZERO_EXTEND, SIGN_EXTEND, FLOAT or UNSIGNED_FLOAT if recursive
call simplified the first operand into VOIDmode constant.

* gcc.target/powerpc/pr104778.c: New test.

--- gcc/lra.cc.jj   2022-02-04 14:36:55.375600131 +0100
+++ gcc/lra.cc  2022-03-11 18:47:15.555025540 +0100
@@ -2015,8 +2015,39 @@ lra_substitute_pseudo (rtx *loc, int old
 {
   if (fmt[i] == 'e')
{
- if (lra_substitute_pseudo (&XEXP (x, i), old_regno,
-new_reg, subreg_p, debug_p))
+ if (debug_p
+ && i == 0
+ && (code == SUBREG
+ || code == ZERO_EXTEND
+ || code == SIGN_EXTEND
+ || code == FLOAT
+ || code == UNSIGNED_FLOAT))
+   {
+ rtx y = XEXP (x, 0);
+ if (lra_substitute_pseudo (&y, old_regno,
+new_reg, subreg_p, debug_p))
+   {
+ result = true;
+ if (CONST_SCALAR_INT_P (y))
+   {
+ if (code == SUBREG)
+   y = simplify_subreg (GET_MODE (x), y,
+GET_MODE (SUBREG_REG (x)),
+SUBREG_BYTE (x));
+ else
+   y = simplify_unary_operation (code, GET_MODE (x), y,
+ GET_MODE (XEXP (x, 0)));
+ if (y)
+   *loc = y;
+ else
+   *loc = gen_rtx_CLOBBER (GET_MODE (x), const0_rtx);
+   }
+ else
+   XEXP (x, 0) = y;
+   }
+   }
+ else if (lra_substitute_pseudo (&XEXP (x, i), old_regno,
+ new_reg, subreg_p, debug_p))
result = true;
}
   else if (fmt[i] == 'E')
--- gcc/testsuite/gcc.target/powerpc/pr104778.c.jj  2022-03-11 18:54:09.455264514 +0100
+++ gcc/testsuite/gcc.target/powerpc/pr104778.c 2022-03-11 18:52:57.216269904 +0100
@@ -0,0 +1,51 @@
+/* PR debug/104778 */
+/* { dg-do compile } */
+/* { dg-options "-mcmpb -Og -g" } */
+/* { dg-additional-options "-fpie" { target pie } } */
+
+unsigned long long int p;
+short int m, n;
+
+void
+foo (double u, int v, int x, int y, int z)
+{
+  long long int a = v;
+  short int b = v;
+  int c = 0, d = m, e = u;
+
+  if (n)
+{
+  int q = b;
+
+  while (p / 1.0)
+c = 0;
+
+  if (n * n == (d + 1) / (1LL << x))
+a = 1;
+
+  b = u;
+  while (d)
+{
+  u = m + 1ULL;
+  b = a - (unsigned long long int) u + a + (char) (u + 1.0);
+  d = (v - 1LL) * n / d + q + x;
+  q = m;
+}
+}
+
+  while (c < 1)
+{
+  int r;
+
+  if (m == y)
+m = e * z;
+
+  e = !a;
+
+  while (!r)
+;
+
+  if (b)
+m = d;
+}
+}

Jakub



[PATCH] PR middle-end/104885: Fix ICE with large stack frame on powerpc64.

2022-03-12 Thread Roger Sayle

My recent testcase for PR c++/84964.C stress tests the middle-end by
attempting to pass a UINT_MAX sized structure on the stack.  Although
my fix to PR84964 avoids the ICE after sorry() on x86_64 and similar
targets, a related issue still exists on powerpc64 (and similar
ACCUMULATE_OUTGOING_ARGS/ARGS_GROW_DOWNWARD targets) which don't
issue a "sorry, unimplemented" message, but instead ICE elsewhere.

After attempting several alternate fixes, the simplest solution is
to just defensively check in mark_stack_region_used that the upper
bound of the region lies within the allocated stack_usage_map
array, which is of size highest_outgoing_arg_in_use.  When this isn't
the case, the code now follows the same path as for variable sized
regions, and uses stack_usage_watermark rather than a map.
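
The defensive check can be sketched in isolation as follows.  This is a
simplified model, not the code in calls.cc: MAP_SIZE, usage_map,
watermark and mark_region_used_model are hypothetical names, plain size_t
replaces poly_uint64, and lowering the watermark to the region's lower
bound is an assumption about what the fallback path does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MAP_SIZE = 16 };               /* hypothetical allocated size */
static unsigned char usage_map[MAP_SIZE];
static size_t watermark = SIZE_MAX;   /* lowest offset known to be used */

/* Mark [LOWER, UPPER) as used.  Returns 1 if the byte map was updated,
   0 if the region did not fit and the watermark was lowered instead.  */
static int
mark_region_used_model (size_t lower, size_t upper)
{
  assert (lower <= upper);
  if (upper <= MAP_SIZE)
    {
      /* Region lies inside the map: record it byte by byte.  */
      for (size_t i = lower; i < upper; ++i)
        usage_map[i] = 1;
      return 1;
    }
  /* Region would run past the end of the map: never index it, and
     fall back to the coarser watermark.  */
  if (lower < watermark)
    watermark = lower;
  return 0;
}
```

The point of the guard is that an oversized region never indexes the
array at all, so a huge constant upper bound is handled like a
variable-sized one.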

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check to confirm there are no surprises, and with the cc1plus
of a cross-compiler to powerpc64-linux-gnu to confirm the new test
case no longer ICEs.  Ok for mainline?


2022-03-12  Roger Sayle  

gcc/ChangeLog
PR middle-end/104885
* calls.cc (mark_stack_region_used): Check that the region
is within the allocated size of stack_usage_map.


Thanks in advance,
Roger
--

diff --git a/gcc/calls.cc b/gcc/calls.cc
index 50fa7b8..1ca96e7 100644
--- a/gcc/calls.cc
+++ b/gcc/calls.cc
@@ -201,7 +201,8 @@ mark_stack_region_used (poly_uint64 lower_bound, poly_uint64 upper_bound)
 {
   unsigned HOST_WIDE_INT const_lower, const_upper;
   const_lower = constant_lower_bound (lower_bound);
-  if (upper_bound.is_constant (&const_upper))
+  if (upper_bound.is_constant (&const_upper)
+  && const_upper <= highest_outgoing_arg_in_use)
 for (unsigned HOST_WIDE_INT i = const_lower; i < const_upper; ++i)
   stack_usage_map[i] = 1;
   else


New French PO file for 'gcc' (version 12.1-b20220213)

2022-03-12 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the French team of translators.  The file is available at:

https://translationproject.org/latest/gcc/fr.po

(This file, 'gcc-12.1-b20220213.fr.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




[PATCH] wwwdocs: fedora-devel-list archives changes

2022-03-12 Thread Gerald Pfeifer
I have *NOT* pushed this yet, looking for feedback:

It appears redhat.com has lost Fedora mailing list archives, which are
now at lists.fedoraproject.org using completely different tooling.

Jakub, is there a better way than the patch below?

Gerald

diff --git a/htdocs/gcc-4.3/porting_to.html b/htdocs/gcc-4.3/porting_to.html
index 630290ce..5301729f 100644
--- a/htdocs/gcc-4.3/porting_to.html
+++ b/htdocs/gcc-4.3/porting_to.html
@@ -527,7 +527,7 @@ svn diff -r529854:529855 
http://svn.apache.org/repos/asf/ant/core/trunk/src/main
 
 
 Jakub Jelinek,
-https://listman.redhat.com/archives/fedora-devel-list/2008-January/msg00128.html";>
+https://lists.fedoraproject.org/archives/list/de...@lists.fedoraproject.org/thread/WV3KUDEP2JNOWGWES42RQZFYFNLFLAMJ/";>
 Mass rebuild status with gcc-4.3.0-0.4 of rawhide-20071220
 
 


Re: [PATCH] PR tree-optimization/101895: Fold VEC_PERM to help recognize FMA.

2022-03-12 Thread Marc Glisse via Gcc-patches

On Fri, 11 Mar 2022, Roger Sayle wrote:

+(match vec_same_elem_p
+  CONSTRUCTOR@0
+  (if (uniform_vector_p (TREE_CODE (@0) == SSA_NAME
+? gimple_assign_rhs1 (SSA_NAME_DEF_STMT (@0)) : @0

Ah, I didn't remember we needed that, we don't seem to be very consistent 
about it. Probably for this reason, the transformation "Prefer vector1 << 
scalar to vector1 << vector2" does not match


typedef int vec __attribute__((vector_size(16)));
vec f(vec a, int b){
  vec bb = { b, b, b, b };
  return a << bb;
}

which is only optimized at vector lowering time.

+/* Push VEC_PERM earlier if that may help FMA perception (PR101895).  */
+(for plusminus (plus minus)
+  (simplify
+(plusminus (vec_perm (mult@0 @1 vec_same_elem_p@2) @0 @3) @4)
+(plusminus (mult (vec_perm @1 @1 @3) @2) @4)))

Don't you want :s on mult and vec_perm?

--
Marc Glisse


Re: [PATCH RFC] mips: add TARGET_ZERO_CALL_USED_REGS hook [PR104817, PR104820]

2022-03-12 Thread Xi Ruoyao via Gcc-patches
On Sat, 2022-03-12 at 18:48 +0800, Xi Ruoyao via Gcc-patches wrote:
> On Fri, 2022-03-11 at 21:26 +, Qing Zhao wrote:
> > Hi, Ruoyao,
> > 
> > (I might not be able to reply to this thread till next Wed due to a
> > short vacation).
> > 
> > First, some comments on opening bugs against Gcc:
> > 
> > I took a look at the bug reports PR104817 and PR104820:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104820
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104817
> > 
> > I didn’t see a testing case and a script to repeat the error, so I
> > cannot repeat the error at my side.
> 
> I've put the test case, but maybe you didn't see it because it is too
> simple:
> 
> echo 'int t() {}' | /home/xry111/git-repos/gcc-test-mips/gcc/cc1 -nostdinc -fzero-call-used-regs=all
> 
> An empty function is enough to break -fzero-call-used-regs=all.  And
> if you append -mips64r2 to the cc1 command line you'll get 104820.  I
> enabled 4 existing tests for MIPS (reported "not work" on MIPS) in the
> patch so I think it's unnecessary to add new test cases.
> 
> Richard: can we use MIPS_EPILOGUE_TEMP as a scratch register in the
> sequence for zeroing the call-used registers, and then zero itself
> (despite it's not in need_zeroed_hardregs)?

No, it leads to an ICE at stage 3 bootstrapping :(.

Now I think the only rational ways are:

(1) allow zeroing more registers than need_zeroed_hardregs.

Or

(2) allow zeroing fewer registers than need_zeroed_hardregs (then I'll
skip ST_REGS, after all they are just 8 bits in total).

If all these are unacceptable, then

(3) I'll just call sorry in the MIPS target hook to tell users not to use
this feature on MIPS.
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


New Swedish PO file for 'gcc' (version 12.1-b20220213)

2022-03-12 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Swedish team of translators.  The file is available at:

https://translationproject.org/latest/gcc/sv.po

(This file, 'gcc-12.1-b20220213.sv.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.