http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57297
Bug ID: 57297
Summary: FAIL: gfortran.dg/select_type_4.f90 -O2 execution test
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: fortran
Assignee: unassigned at gcc dot gnu.org
Reporter: gretay at gcc dot gnu.org
CC: rguenth at gcc dot gnu.org
Target: arm-none-eabi

Created attachment 30126
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30126&action=edit
reduced test case

I am not sure whether this is a target, Fortran, or alias analysis problem (or a combination).

I am attaching a reduced testcase. It allocates a list with three nodes, traverses it counting the number of nodes, and then deallocates it. The problem arises from the middle node, which has no data (similar to the original test). The failure manifests itself as finding only one node in the list instead of three.

$ /work/builds/fortran/install/bin/arm-none-eabi-gfortran reduced_select_type_4.f90 -O2 -o bad.exe
$ qemu-arm bad.exe
1
Done counting

The original test fails on qemu arm-none-eabi using a recent trunk compiler, and it has been failing on trunk since at least September 2012 (and probably for longer).

The reduced testcase fails in arm mode, but not in thumb mode.
It does not fail when compiled with -fno-strict-aliasing.
It also does not fail when compiled with -fno-schedule-insns -fno-schedule-insns2.
It does not fail when inlining is disabled.

$ /work/builds/fortran/install/bin/arm-none-eabi-gfortran reduced_select_type_4.f90 -O2 -o good.exe -fno-strict-aliasing
greyor01@e103227-lin:/work/tmp/sel/sept$ qemu-arm good.exe
1
2
3
Done counting

The problem seems to be incorrect aliasing information that gets used by instruction reordering.
It results in the following code slice (bad):

    add     r3, sp, #8      @ 94    *arm_addsi3/2     [length = 4]
    ldmia   r3, {r0, r1}    @ 104   *ldm2_ia          [length = 4]
    str     r6, [sp, #8]    @ 99    *arm_movsi_vfp/6  [length = 4]
    str     r5, [sp, #12]   @ 101   *arm_movsi_vfp/6  [length = 4]
    stmia   r4, {r0, r1}    @ 105   *stm2_ia          [length = 4]

instead of (good):

    add     r3, sp, #8      @ 94    *arm_addsi3/2     [length = 4]
    str     r6, [sp, #8]    @ 99    *arm_movsi_vfp/6  [length = 4]
    str     r5, [sp, #12]   @ 101   *arm_movsi_vfp/6  [length = 4]
    ldmia   r3, {r0, r1}    @ 104   *ldm2_ia          [length = 4]
    stmia   r4, {r0, r1}    @ 105   *stm2_ia          [length = 4]

The problem is that the load is moved before the stores to the same address.

This happens in MAIN, in the code of append that is inlined into MAIN to append the second node to the list. The fact that the second node has a different type (base node, not integer node) may be playing a role here.

Gimple for this code is (block 11):

    MEM[(struct __class_poly_list_Node_type_p *)&node] = node_17;
    MEM[(struct __class_poly_list_Node_type_p *)&node + 4B] = &__vtab_poly_list_Node_type;
    MEM[(struct node_type *)integer_node_4].next = VIEW_CONVERT_EXPR<struct __class_poly_list_Node_type_p>(MEM[(struct __class_poly_list_Node_type &)&node]);

The stores (rtl insns 99, 101) come from the first two GIMPLE statements, and the load (insn 104) comes from accessing the rhs of the third statement. Note that the third statement is an object assignment, expanded via the movmemqi expand pattern, which calls the arm_gen_movmemqi function in arm.c. (This could be a target problem if the alias sets are not handled correctly here, but they seem to be copied as is from Gimple.)

In the RTL right after expand, the relevant memory accesses are annotated as follows:

    (insn 99)  [3 MEM[(struct __class_poly_list_Node_type_p *)&node]+0 S4 A64]
    (insn 104) [8 MEM[(struct __class_poly_list_Node_type &)&node]+0 S4 A64]

The two accesses carry different alias sets (3 and 8), so this is not recognized as aliasing and no dependence edge is created by the scheduler.
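The missing dependence edge can be illustrated with a small model. This is a simplified sketch, not GCC's actual code: it assumes that alias set 0 conflicts with everything and that otherwise only equal alias sets conflict (GCC also consults subset relations between alias sets, which are left out here). The names MemRef, alias_sets_conflict and needs_dependence are hypothetical. The alias sets and stack offsets are taken from the RTL annotations in this report, and treating -fno-strict-aliasing as "every access is in alias set 0" is an assumption of the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemRef:
    """Hypothetical stand-in for one annotated RTL memory access."""
    insn: int        # RTL insn number from the dump
    alias_set: int   # first number inside [...] in the MEM annotation
    offset: int      # byte offset from virtual-stack-vars
    size: int        # bytes accessed
    is_store: bool


def alias_sets_conflict(a: int, b: int) -> bool:
    # Simplified rule: set 0 conflicts with everything, otherwise
    # only identical sets conflict. Subset relations are ignored.
    return a == 0 or b == 0 or a == b


def ranges_overlap(x: MemRef, y: MemRef) -> bool:
    return x.offset < y.offset + y.size and y.offset < x.offset + x.size


def needs_dependence(first: MemRef, second: MemRef,
                     strict_aliasing: bool = True) -> bool:
    """Would this model add a scheduler dependence between the accesses?"""
    if not (first.is_store or second.is_store):
        return False  # two loads never need ordering
    if not ranges_overlap(first, second):
        return False
    if not strict_aliasing:
        # Assumption: without strict aliasing every access is in set 0.
        return True
    return alias_sets_conflict(first.alias_set, second.alias_set)


# Accesses from block 11 (offsets -376 / -372, alias sets 3 and 8).
store_99 = MemRef(insn=99, alias_set=3, offset=-376, size=4, is_store=True)
store_101 = MemRef(insn=101, alias_set=3, offset=-372, size=4, is_store=True)
load_104 = MemRef(insn=104, alias_set=8, offset=-376, size=8, is_store=False)

if __name__ == "__main__":
    for store in (store_99, store_101):
        print(store.insn, "-> 104, -O2:",
              needs_dependence(store, load_104))
        print(store.insn, "-> 104, -fno-strict-aliasing:",
              needs_dependence(store, load_104, strict_aliasing=False))
```

In this model the stores and the load overlap in memory, yet no dependence is reported under strict aliasing because 3 and 8 differ. Dropping strict aliasing restores the dependence, which matches the observation that the test passes with -fno-strict-aliasing.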
In comparison, gimple for block 7 (appending the first node in the list, of type integer) is handled correctly.

    MEM[(struct __class_poly_list_Node_type *)&class.1] = integer_node_4;
    MEM[(struct __class_poly_list_Node_type *)&class.1 + 4B] = &__vtab_main_Integer_node_type;
    _63->next = VIEW_CONVERT_EXPR<struct __class_poly_list_Node_type_p>(class.1);

And the alias sets are:

    (insn 48) [8 MEM[(struct __class_poly_list_Node_type *)&class.1]
    (insn 54) [8 class.1+0 S4 A64]

and the scheduler knows about the dependency.

It doesn't seem to be target-related, because the mem annotation is exactly the same as in gimple, but I don't see this test failing on other targets. The difference may be that other targets use generic move_mem_by_pieces while arm has expand for movmemqi.

The complete RTL after expand for block 11 is:

;; MEM[(struct __class_poly_list_Node_type_p *)&node] = node_17;

(insn 99 98 0 (set (mem/f/c:SI (plus:SI (reg/f:SI 105 virtual-stack-vars)
                (const_int -376 [0xfffffffffffffe88])) [3 MEM[(struct __class_poly_list_Node_type_p *)&node]+0 S4 A64])
        (reg/f:SI 112 [ node ])) select_type_4.f90:37 -1
     (nil))

;; MEM[(struct __class_poly_list_Node_type_p *)&node + 4B] = &__vtab_poly_list_Node_type;

(insn 100 99 101 (set (reg/f:SI 155)
        (symbol_ref:SI ("*.LANCHOR0") [flags 0x182])) select_type_4.f90:37 -1
     (nil))

(insn 101 100 0 (set (mem/f/c:SI (plus:SI (reg/f:SI 105 virtual-stack-vars)
                (const_int -372 [0xfffffffffffffe8c])) [3 MEM[(struct __class_poly_list_Node_type_p *)&node + 4B]+0 S4 A32])
        (reg/f:SI 155)) select_type_4.f90:37 -1
     (nil))

;; MEM[(struct node_type *)integer_node_4].next = VIEW_CONVERT_EXPR<struct __class_poly_list_Node_type_p>(MEM[(struct __class_poly_list_Node_type &)&node]);

(insn 102 101 103 (set (reg:SI 156)
        (reg/v/f:SI 110 [ integer_node ])) select_type_4.f90:37 -1
     (nil))

(insn 103 102 104 (set (reg:SI 157)
        (plus:SI (reg/f:SI 105 virtual-stack-vars)
            (const_int -376 [0xfffffffffffffe88]))) select_type_4.f90:37 -1
     (nil))

(insn 104 103 105 (parallel [
            (set (reg:SI 0 r0)
                (mem/c:SI (reg:SI 157) [8 MEM[(struct __class_poly_list_Node_type &)&node]+0 S4 A64]))
            (set (reg:SI 1 r1)
                (mem/c:SI (plus:SI (reg:SI 157)
                        (const_int 4 [0x4])) [8 MEM[(struct __class_poly_list_Node_type &)&node]+4 S4 A32]))
        ]) select_type_4.f90:37 -1
     (nil))

(insn 105 104 0 (parallel [
            (set (mem:SI (reg:SI 156) [3 MEM[(struct node_type *)integer_node_4].next+0 S4 A32])
                (reg:SI 0 r0))
            (set (mem:SI (plus:SI (reg:SI 156)
                        (const_int 4 [0x4])) [3 MEM[(struct node_type *)integer_node_4].next+4 S4 A32])
                (reg:SI 1 r1))
        ]) select_type_4.f90:37 -1
     (nil))

$ /work/builds/fortran/install/bin/arm-none-eabi-gfortran -v
Using built-in specs.
COLLECT_GCC=/work/builds/fortran/install/bin/arm-none-eabi-gfortran
COLLECT_LTO_WRAPPER=/work/builds/fortran/install/libexec/gcc/arm-none-eabi/4.8.0/lto-wrapper
Target: arm-none-eabi
Configured with: /work/local-checkouts/gcc-git//configure --target=arm-none-eabi --prefix=/work/may-builds/fortran/install --with-sysroot=/work/may-builds/fortran/install/arm-none-eabi --with-newlib --with-gnu-as --with-gnu-ld --enable-languages=c,c++,fortran --disable-shared --disable-nls --disable-threads --disable-lto --disable-plugin --disable-tls --enable-checking=yes --disable-libssp --disable-libgomp --disable-libmudflap --with-cpu=cortex-a15 --with-fpu=neon-vfpv4 --with-float=softfp
Thread model: single
gcc version 4.8.0 20120912 (experimental) (GCC)
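As a supplement to the two assembly slices in this report, the effect of the instruction order can be replayed in a tiny model. This is only a sketch under assumptions: the execute function and the string values are hypothetical, the stale stack contents ("old_ptr", "old_vtab") are invented placeholders, and the mapping of r6 to node_17 and r5 to the vtab address is inferred from matching insns 99 and 101 to the first two Gimple statements of block 11.

```python
def execute(order):
    """Replay insns 94, 99, 101, 104, 105 in the given order.

    Returns the two words written through r4 by insn 105 (stmia),
    i.e. what ends up in the destination .next field.
    """
    # Stack slots at sp+8 and sp+12; the initial values are placeholders
    # for whatever was there before the stores.
    stack = {8: "old_ptr", 12: "old_vtab"}
    dest = {}
    regs = {
        "r6": "node_17",                      # assumed from Gimple stmt 1
        "r5": "&__vtab_poly_list_Node_type",  # assumed from Gimple stmt 2
    }
    for insn in order:
        if insn == 94:      # add r3, sp, #8
            regs["r3"] = 8
        elif insn == 99:    # str r6, [sp, #8]
            stack[8] = regs["r6"]
        elif insn == 101:   # str r5, [sp, #12]
            stack[12] = regs["r5"]
        elif insn == 104:   # ldmia r3, {r0, r1}
            base = regs["r3"]
            regs["r0"], regs["r1"] = stack[base], stack[base + 4]
        elif insn == 105:   # stmia r4, {r0, r1}
            dest[0], dest[4] = regs["r0"], regs["r1"]
        else:
            raise ValueError(f"unknown insn {insn}")
    return dest


GOOD_ORDER = [94, 99, 101, 104, 105]
BAD_ORDER = [94, 104, 99, 101, 105]

if __name__ == "__main__":
    print("good:", execute(GOOD_ORDER))
    print("bad: ", execute(BAD_ORDER))
```

With the good order the destination receives the freshly stored pointer and vtab address. With the bad order the load runs before the stores, so the destination receives the stale stack contents, which is consistent with the list traversal stopping after the first node.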