On 9/9/20 4:14 PM, Tom de Vries wrote:
> On 9/9/20 3:15 PM, Tom de Vries wrote:
>> On 9/9/20 2:36 PM, Tobias Burnus wrote:
>>> Hi Tom,
>>>
>>> On 9/8/20 5:05 PM, Tobias Burnus wrote:
>>>
>>>> On 9/8/20 8:51 AM, Tom de Vries wrote:
>>>>>     PR target/96964
>>>>>     * config/nvptx/nvptx.md (define_expand "atomic_test_and_set"): New
>>>>>     expansion.
>>>>>     * sync-builtins.def (BUILT_IN_ATOMIC_TEST_AND_SET_1): New builtin.
>>>
>>> I have your patch applied on a current mainline powerpc64le-none-linux-gnu
>>> + nvptx offloading build.
>>
>> Thanks for trying this out.
>>
>>> And I observe the following fails – which seems
>>> to be new and related to your patch (but I have not confirmed it by
>>> reverting your libatomic patch).
>>>
>>
>> Could you confirm that?
>>
>> Meanwhile, I'll try to reproduce on x86_64.
>>
>>> Required option for the fail: "-O2 -ftracer",
>>> hence, only the "-O3 ..." testsuite builds fail.
>>> (-ftracer = "Perform tail duplication to enlarge superblock size.")
>>>
>>>
>>> during RTL pass: mach
>>> asyncwait-1.f90:19: internal compiler error: in nvptx_find_par, at
>>> config/nvptx/nvptx.c:3293
>>> 0x10bf9f13 nvptx_find_par
>>>         gcc/config/nvptx/nvptx.c:3293
>>> 0x10bf9b97 nvptx_find_par
>>>         gcc/config/nvptx/nvptx.c:3320
>>> 0x10bf9b97 nvptx_find_par
>>>         gcc/config/nvptx/nvptx.c:3320
>>> ...
>>>
>>>
>>> The ICE occurs for the second assert of:
>>>         case CODE_FOR_nvptx_join:
>>>           /* A loop tail.  Finish the current loop and return to
>>>              parent.  */
>>>           {
>>>             unsigned mask = UINTVAL (XVECEXP (PATTERN (end), 0, 0));
>>>
>>>             gcc_assert (par->mask == mask);
>>>             gcc_assert (par->join_block == NULL);
>>>
>>> gdb shows:
>>> (gdb) p debug_bb(par->join_block )
>>> (note 213 30 31 24 [bb 24] NOTE_INSN_BASIC_BLOCK)
>>> (insn 31 213 204 24 (unspec_volatile:SI [
>>>             (const_int 4 [0x4])
>>>         ] UNSPECV_JOIN)
>>> "libgomp/testsuite/libgomp.oacc-fortran/deep-copy-8.f90":24:0 237
>>> {nvptx_join}
>>>      (nil))
>>> (jump_insn 204 31 205 24 (set (pc)
>>>         (label_ref 198)) 121 {jump}
>>>      (nil)
>>>  -> 198)
>>>
>>
>> Yep, code duplication works against the matching of fork/join, it's not
>> the first time we see this.
>>
>> Usually the fix is to make an optimization pass conservative with
>> respect to these fork/join regions, but AFAICT, ftracer already has such
>> code in ignore_bb_p that tests gimple_call_internal_unique_p.
>>
>> So, perhaps the ftracer pass is the trigger, but not the pass that does
>> the problematic transformation?  Just a guess at this point.
>>
>
> I can reproduce it, and it's indeed the ftracer pass that does the
> duplication.  So, the question is why doesn't ignore_bb_p work.

Filed PR https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97000 for this.

Thanks,
- Tom