On 9/9/20 3:15 PM, Tom de Vries wrote:
> On 9/9/20 2:36 PM, Tobias Burnus wrote:
>> Hi Tom,
>>
>> On 9/8/20 5:05 PM, Tobias Burnus wrote:
>>
>>> On 9/8/20 8:51 AM, Tom de Vries wrote:
>>>>      PR target/96964
>>>>      * config/nvptx/nvptx.md (define_expand "atomic_test_and_set"): New
>>>>      expansion.
>>>>      * sync-builtins.def (BUILT_IN_ATOMIC_TEST_AND_SET_1): New builtin.
>>
>> I have your patch applied on a current mainline powerpc64le-none-linux-gnu
>> + nvptx offloading build.
>
> Thanks for trying this out.
>
>> And I observe the following fails – which seems
>> to be new and related to your patch (but I have not confirmed it by
>> reverting your libatomic patch).
>>
>
> Could you confirm that?
>
> Meanwhile, I'll try to reproduce on x86_64.
>
>> Required option for the fail: "-O2 -ftracer",
>> hence, only the "-O3 ..." testsuite builds fail.
>> (-ftracer = "Perform tail duplication to enlarge superblock size.")
>>
>>
>> during RTL pass: mach
>> asyncwait-1.f90:19: internal compiler error: in nvptx_find_par, at
>> config/nvptx/nvptx.c:3293
>> 0x10bf9f13 nvptx_find_par
>>         gcc/config/nvptx/nvptx.c:3293
>> 0x10bf9b97 nvptx_find_par
>>         gcc/config/nvptx/nvptx.c:3320
>> 0x10bf9b97 nvptx_find_par
>>         gcc/config/nvptx/nvptx.c:3320
>> ...
>>
>>
>> The ICE occurs for the second assert of:
>>         case CODE_FOR_nvptx_join:
>>           /* A loop tail.  Finish the current loop and return to
>>              parent.  */
>>           {
>>             unsigned mask = UINTVAL (XVECEXP (PATTERN (end), 0, 0));
>>
>>             gcc_assert (par->mask == mask);
>>             gcc_assert (par->join_block == NULL);
>>
>> gdb shows:
>> (gdb) p debug_bb(par->join_block )
>> (note 213 30 31 24 [bb 24] NOTE_INSN_BASIC_BLOCK)
>> (insn 31 213 204 24 (unspec_volatile:SI [
>>             (const_int 4 [0x4])
>>         ] UNSPECV_JOIN)
>> "libgomp/testsuite/libgomp.oacc-fortran/deep-copy-8.f90":24:0 237
>> {nvptx_join}
>>      (nil))
>> (jump_insn 204 31 205 24 (set (pc)
>>         (label_ref 198)) 121 {jump}
>>      (nil)
>>  -> 198)
>>
>
> Yep, code duplication works against the matching of fork/join, it's not
> the first time we see this.
>
> Usually the fix is to make an optimization pass conservative with
> respect to these fork/join regions, but AFAICT, ftracer already has such
> code in ignore_bb_p that tests gimple_call_internal_unique_p.
>
> So, perhaps the ftracer pass is the trigger, but not the pass that does
> the problematic transformation?  Just a guess at this point.
>
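
To illustrate how a duplicated join block can trip that second assert, here
is a small standalone C model. It is only a sketch, not the code in nvptx.c:
the struct layouts, the violation counter (used in place of gcc_assert), the
helper names and the two example CFGs are all made up for illustration. Only
the two checks on mask and join_block mirror the snippet quoted above.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PARS 8

enum kind { PLAIN, FORK, JOIN };

/* Hypothetical stand-in for a basic block: at most two successors.  */
struct block {
  enum kind kind;
  unsigned mask;    /* partitioning mask, only meaningful for FORK/JOIN */
  int succ[2];      /* successor indices, -1 for none */
  int visited;
};

/* Hypothetical stand-in for a parallel region.  */
struct par {
  unsigned mask;
  int join_block;   /* index of the join block, -1 until one is seen */
  struct par *parent;
};

static struct par pool[MAX_PARS];
static int n_pars;

static struct par *new_par(unsigned mask, struct par *parent)
{
  assert(n_pars < MAX_PARS);
  struct par *p = &pool[n_pars++];
  p->mask = mask;
  p->join_block = -1;
  p->parent = parent;
  return p;
}

/* Depth-first walk over the CFG.  Returns how many join blocks fail the
   two checks (mask matches, region has no join block yet).  */
static int walk(struct block *blocks, int idx, struct par *par)
{
  if (idx < 0 || blocks[idx].visited)
    return 0;
  blocks[idx].visited = 1;

  int bad = 0;
  if (blocks[idx].kind == FORK)
    par = new_par(blocks[idx].mask, par);   /* enter a new region */
  else if (blocks[idx].kind == JOIN) {
    if (par->mask != blocks[idx].mask || par->join_block != -1)
      bad++;                                /* where the real code asserts */
    else
      par->join_block = idx;
    if (par->parent != NULL)
      par = par->parent;                    /* return to the parent region */
  }

  for (int i = 0; i < 2; i++)
    bad += walk(blocks, blocks[idx].succ[i], par);
  return bad;
}

static int count_violations(struct block *blocks, int n)
{
  for (int i = 0; i < n; i++)
    blocks[i].visited = 0;
  n_pars = 0;
  return walk(blocks, 0, new_par(0, NULL));  /* root region, mask 0 */
}

/* fork -> body -> join -> exit: one join per region.  */
int single_join_violations(void)
{
  struct block cfg[] = {
    { FORK,  4, {  1, -1 }, 0 },
    { PLAIN, 0, {  2, -1 }, 0 },
    { JOIN,  4, {  3, -1 }, 0 },
    { PLAIN, 0, { -1, -1 }, 0 },
  };
  return count_violations(cfg, 4);
}

/* Same region, but the join block has been duplicated, so the body
   has two successors that each contain a copy of the join.  */
int duplicated_join_violations(void)
{
  struct block cfg[] = {
    { FORK,  4, {  1, -1 }, 0 },
    { PLAIN, 0, {  2,  3 }, 0 },
    { JOIN,  4, {  4, -1 }, 0 },
    { JOIN,  4, {  4, -1 }, 0 },
    { PLAIN, 0, { -1, -1 }, 0 },
  };
  return count_violations(cfg, 5);
}
```

In this model single_join_violations() returns 0 and
duplicated_join_violations() returns 1: the second copy of the join is
reached with the same region, whose join_block has already been set by the
first copy.
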
I can reproduce it, and it's indeed the ftracer pass that does the
duplication.

So, the question is why ignore_bb_p doesn't work.

Thanks,
- Tom

>
>>
>> That affects the testcases:
>> libgomp.oacc-fortran/asyncwait-1.f90
>> libgomp.oacc-fortran/asyncwait-2.f90
>> libgomp.oacc-fortran/asyncwait-3.f90
>> libgomp.oacc-fortran/atomic_capture-1.f90
>> libgomp.oacc-fortran/atomic_update-1.f90
>> libgomp.oacc-fortran/classtypes-1.f95
>> libgomp.oacc-fortran/collapse-1.f90
>> libgomp.oacc-fortran/collapse-2.f90
>> libgomp.oacc-fortran/collapse-3.f90
>> libgomp.oacc-fortran/collapse-4.f90
>> libgomp.oacc-fortran/collapse-5.f90
>> libgomp.oacc-fortran/collapse-6.f90
>> libgomp.oacc-fortran/collapse-7.f90
>> libgomp.oacc-fortran/collapse-8.f90
>> libgomp.oacc-fortran/combined-directives-1.f90
>> libgomp.oacc-fortran/combined-reduction.f90
>> libgomp.oacc-fortran/common-block-1.f90
>> libgomp.oacc-fortran/common-block-2.f90
>> libgomp.oacc-fortran/common-block-3.f90
>> libgomp.oacc-fortran/deep-copy-1.f90
>> libgomp.oacc-fortran/deep-copy-3.f90
>> libgomp.oacc-fortran/deep-copy-4.f90
>> libgomp.oacc-fortran/deep-copy-5.f90
>> libgomp.oacc-fortran/deep-copy-6-no_finalize.F90
>> libgomp.oacc-fortran/deep-copy-6.f90
>> libgomp.oacc-fortran/deep-copy-7.f90
>> libgomp.oacc-fortran/deep-copy-8.f90
>> libgomp.oacc-fortran/derived-type-1.f90
>> libgomp.oacc-fortran/host_data-2.f90
>> libgomp.oacc-fortran/host_data-3.f
>> libgomp.oacc-fortran/host_data-4.f90
>> libgomp.oacc-fortran/implicit-firstprivate-ref.f90
>> libgomp.oacc-fortran/lib-14.f90
>> libgomp.oacc-fortran/map-1.f90
>> libgomp.oacc-fortran/nested-function-1.f90
>> libgomp.oacc-fortran/nested-function-2.f90
>> libgomp.oacc-fortran/nested-function-3.f90
>> libgomp.oacc-fortran/no_create-3.F90
>> libgomp.oacc-fortran/optional-data-copyin.f90
>> libgomp.oacc-fortran/optional-data-copyout.f90
>> libgomp.oacc-fortran/optional-data-enter-exit.f90
>> libgomp.oacc-fortran/optional-declare.f90
>> libgomp.oacc-fortran/optional-firstprivate.f90
>> libgomp.oacc-fortran/optional-reduction.f90
>> libgomp.oacc-fortran/optional-update-device.f90
>> libgomp.oacc-fortran/optional-update-host.f90
>> libgomp.oacc-fortran/parallel-dims.f90
>> libgomp.oacc-fortran/parallel-loop-1.f90
>> libgomp.oacc-fortran/pr81352.f90
>> libgomp.oacc-fortran/pr84028.f90
>> libgomp.oacc-fortran/reduction-1.f90
>> libgomp.oacc-fortran/reduction-2.f90
>> libgomp.oacc-fortran/reduction-3.f90
>> libgomp.oacc-fortran/reduction-4.f90
>> libgomp.oacc-fortran/reduction-5.f90
>> libgomp.oacc-fortran/reduction-6.f90
>> libgomp.oacc-fortran/reduction-7.f90
>> libgomp.oacc-fortran/reduction-8.f90
>> libgomp.oacc-fortran/routine-1.f90
>> libgomp.oacc-fortran/routine-2.f90
>> libgomp.oacc-fortran/routine-3.f90
>> libgomp.oacc-fortran/routine-4.f90
>> libgomp.oacc-fortran/routine-7.f90
>> libgomp.oacc-fortran/routine-9.f90
>> libgomp.oacc-fortran/subarrays-1.f90
>> libgomp.oacc-fortran/subarrays-2.f90
>> libgomp.oacc-fortran/update-2.f90
>>
>> Tobias
>>
>> -----------------
>> Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München /
>> Germany
>> Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung,
>> Alexander Walter