Hi!

On 2020-04-02T11:12:48+0200, Richard Biener <rguent...@suse.de> wrote:
> On Wed, 1 Apr 2020, Jason Merrill wrote:
>
>> On 4/1/20 9:36 AM, Richard Biener wrote:
>> > This does away with enabling -ffinite-loops at -O2+ for all languages
>> > and instead enables it selectively for C++ only.

> I'm retesting the following [...]

..., which got pushed in commit 75efe9cb1f8938a713ce540dc3b27bc2afcd3fae
"c/94392 - only enable -ffinite-loops for C++".

I pushed the attached in commit 4f6a0888de52a2e523a6fd4235fe7f8193819c3b
'Revert "[nvptx, libgomp] Update pr85381-{2,4}.c test-cases" [PR89713,
PR94392]'.  Two nvptx offloading test cases regressed, which shows that
'apparently now again "empty oacc loops are" no longer "removed before
expand"' (quoting myself from the commit log, which adapts Tom's commit
log snippet from the reverted commit).

It's not obvious to me how the "finite loop" property discussed/changed
in Richard's commit 75efe9cb1f8938a713ce540dc3b27bc2afcd3fae "c/94392 -
only enable -ffinite-loops for C++" relates to the optimization of
removing "empty oacc loops [...] before expand", which was previously
observed after PR89713 commit c29c92c789d93848cc1c929838771bfc68cb272c
"PR tree-optimization/89713 - Assume loop with an exit is finite".
Examining that in detail is for another day.
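
For reference, here is a minimal sketch of the kind of loop that
'-ffinite-loops' is about.  This is hypothetical C, not taken from the
pr85381 test cases, and the function name is made up for illustration.
The loop has an exit and no side effects, and nothing uses its result.
If 'step' were 0 it would never terminate, so the compiler can only
delete it if it is allowed to assume that a loop with an exit eventually
takes that exit.  Whether this is what happens for the OpenACC loops in
question is an assumption that I have not verified.

```c
/* Hypothetical example: an empty loop with an exit.  The loop has no
   body and its induction variable is not used afterwards.  If STEP is
   0 and START < LIMIT, the loop never terminates, so removing it is
   only valid under a "loops with an exit are finite" assumption such
   as the one '-ffinite-loops' provides.  */
int
skip_empty_loop (int start, int step, int limit)
{
  for (int i = start; i < limit; i += step)
    ;	/* Intentionally empty.  */

  /* Return something unrelated to the loop, so the loop stays dead.  */
  return start + limit;
}
```

Comparing the assembly from 'gcc -O2 -S' with and without
'-ffinite-loops' for such a function should show whether the loop
survives; per Richard's commit, the option is now enabled at -O2+ for
C++ only.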


Regards
 Thomas


-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
From 4f6a0888de52a2e523a6fd4235fe7f8193819c3b Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <tho...@codesourcery.com>
Date: Fri, 3 Apr 2020 10:07:16 +0200
Subject: [PATCH] Revert "[nvptx, libgomp] Update pr85381-{2,4}.c test-cases"
 [PR89713, PR94392]

In response to PR94392 commit 75efe9cb1f8938a713ce540dc3b27bc2afcd3fae
"c/94392 - only enable -ffinite-loops for C++", this reverts PR89713
commit 00908992f2a78f213d227aea8dbab014a1361df0, as apparently now again
"empty oacc loops are" no longer "removed before expand".

	libgomp/
	PR tree-optimization/89713
	PR c/94392
	* testsuite/libgomp.oacc-c-c++-common/pr85381-2.c: Again expect
	'bar.sync'.
	* testsuite/libgomp.oacc-c-c++-common/pr85381-4.c: Likewise.
---
 libgomp/ChangeLog                             |  8 ++++++++
 .../libgomp.oacc-c-c++-common/pr85381-2.c     | 20 ++++++++++++++++++-
 .../libgomp.oacc-c-c++-common/pr85381-4.c     |  5 ++++-
 3 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/libgomp/ChangeLog b/libgomp/ChangeLog
index 6c437930b02f..3716f559aa1c 100644
--- a/libgomp/ChangeLog
+++ b/libgomp/ChangeLog
@@ -1,3 +1,11 @@
+2020-04-03  Thomas Schwinge  <tho...@codesourcery.com>
+
+	PR tree-optimization/89713
+	PR c/94392
+	* testsuite/libgomp.oacc-c-c++-common/pr85381-2.c: Again expect
+	'bar.sync'.
+	* testsuite/libgomp.oacc-c-c++-common/pr85381-4.c: Likewise.
+
 2020-03-31  Tobias Burnus  <tob...@codesourcery.com>
 
 	* target.c (GOMP_target_enter_exit_data): Handle PSET/MAP_POINTER.
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-2.c
index 2cb5b95949de..6570c64afff5 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-2.c
@@ -15,4 +15,22 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-assembler-times "bar.sync" 0 } } */
+/* Todo: Boths bar.syncs can be removed.
+   Atm we generate this dead code inbetween forked and joining:
+
+                     mov.u32 %r28, %ntid.y;
+                     mov.u32 %r29, %tid.y;
+                     add.u32 %r30, %r29, %r29;
+                     setp.gt.s32     %r31, %r30, 19;
+             @%r31   bra     $L2;
+                     add.u32 %r25, %r28, %r28;
+                     mov.u32 %r24, %r30;
+     $L3:
+                     add.u32 %r24, %r24, %r25;
+                     setp.le.s32     %r33, %r24, 19;
+             @%r33   bra     $L3;
+     $L2:
+
+   so the loop is not recognized as empty loop (which we detect by seeing if
+   joining immediately follows forked).  */
+/* { dg-final { scan-assembler-times "bar.sync" 2 } } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-4.c
index e8a433ffc0a5..d955d79718df 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-4.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-4.c
@@ -21,4 +21,7 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-assembler-times "bar.sync" 0 } } */
+/* Atm, %ntid.y is broadcast from one loop to the next, so there are 2 bar.syncs
+   for that (the other two are there for the same reason as in pr85381-2.c).
+   Todo: Recompute %ntid.y instead of broadcasting it. */
+/* { dg-final { scan-assembler-times "bar.sync" 4 } } */
-- 
2.25.1
