http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57297

            Bug ID: 57297
           Summary: FAIL: gfortran.dg/select_type_4.f90 -O2  execution
                    test
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gretay at gcc dot gnu.org
                CC: rguenth at gcc dot gnu.org
            Target: arm-none-eabi

Created attachment 30126
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30126&action=edit
reduced test case

I am not sure whether this is a target, Fortran, or alias-analysis problem (or
a combination).

I am attaching a reduced testcase. It allocates a list with three nodes,
traverses it counting the number of nodes, and then deallocates it. The problem
arises from the middle node that has no data (similar to the original test).
The failure manifests itself by finding only one node in the list, instead of
three. 
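
As a rough illustration of the shape of the test (this is a C analogue, not the attached Fortran code; the struct layout, the function names and the stored values are all made up for the sketch), the list is built as integer node, base node with no data, integer node, then counted and freed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical C analogue of the Fortran list; all names are illustrative. */
enum node_kind { BASE_NODE, INTEGER_NODE };

struct node {
    enum node_kind kind;   /* stands in for the dynamic type of the node */
    int value;             /* meaningful only for INTEGER_NODE */
    struct node *next;
};

/* Append a node of the given kind after `tail` and return the new tail. */
static struct node *append(struct node *tail, enum node_kind kind, int value)
{
    struct node *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->kind = kind;
    n->value = value;
    if (tail != NULL)
        tail->next = n;    /* plays the role of the `.next` assignment */
    return n;
}

/* Walk the list and count its nodes. */
static int count_nodes(const struct node *head)
{
    int n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}

/* Free every node in the list. */
static void free_list(struct node *head)
{
    while (head != NULL) {
        struct node *next = head->next;
        free(head);
        head = next;
    }
}

/* Build integer -> base (no data) -> integer, count it, free it. */
static int build_and_count(void)
{
    struct node *head = append(NULL, INTEGER_NODE, 1);
    struct node *tail = append(head, BASE_NODE, 0);
    append(tail, INTEGER_NODE, 3);
    int n = count_nodes(head);
    free_list(head);
    return n;
}
```

In this sketch build_and_count() returns 3, which corresponds to the expected count of three nodes in the test.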

$ /work/builds/fortran/install/bin/arm-none-eabi-gfortran reduced_select_type_4.f90 -O2 -o bad.exe

$ qemu-arm bad.exe
           1
 Done counting

The original test fails on qemu arm-none-eabi using a recent trunk compiler,
and it has been failing on trunk since at least September 2012 (and probably
for much longer).
The reduced testcase fails in arm mode, but not in thumb mode. 
It does not fail when compiled with -fno-strict-aliasing. 
It also does not fail when compiled with -fno-schedule-insns
-fno-schedule-insns2. 
It does not fail when inlining is disabled.

$ /work/builds/fortran/install/bin/arm-none-eabi-gfortran reduced_select_type_4.f90 -O2 -o good.exe -fno-strict-aliasing
$ qemu-arm good.exe
           1
           2
           3
 Done counting

The problem seems to be incorrect aliasing information that gets used by
instruction reordering.
It results in the following code slice (bad):
    add    r3, sp, #8          @ 94    *arm_addsi3/2    [length = 4]
    ldmia    r3, {r0, r1}    @ 104    *ldm2_ia    [length = 4]
    str    r6, [sp, #8]    @ 99    *arm_movsi_vfp/6    [length = 4]
    str    r5, [sp, #12]    @ 101    *arm_movsi_vfp/6    [length = 4]
    stmia    r4, {r0, r1}    @ 105    *stm2_ia    [length = 4]
instead of (good):
    add    r3, sp, #8          @ 94    *arm_addsi3/2    [length = 4] 
    str    r6, [sp, #8]    @ 99    *arm_movsi_vfp/6    [length = 4]
    str    r5, [sp, #12]    @ 101    *arm_movsi_vfp/6    [length = 4]
    ldmia    r3, {r0, r1}    @ 104    *ldm2_ia    [length = 4]
    stmia    r4, {r0, r1}    @ 105    *stm2_ia    [length = 4]

The problem is that the load (insn 104) is moved ahead of the stores (insns 99
and 101) to the same address.
This happens in MAIN, in the code of append that is inlined into MAIN to append
the second node to the list.
The fact that the second node has a different type (base node, not integer
node) may be playing a role here.
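
The effect of the two instruction orders can be mimicked in plain C. This is only a simulation written for illustration, not compiler output: the struct, the function names and the values are assumptions. The two-word temporary stands in for the stack slot at [sp, #8] and [sp, #12].

```c
#include <assert.h>
#include <string.h>

/* Two-word stack temporary; stands in for the words at [sp, #8], [sp, #12]. */
struct container { unsigned data; unsigned vptr; };

/* Good order: both stores (insns 99, 101) precede the block copy
   (ldmia/stmia, insns 104/105). */
static struct container copy_after_stores(struct container stale,
                                          unsigned data, unsigned vptr)
{
    struct container tmp = stale;        /* whatever was in the slot before */
    struct container next;
    tmp.data = data;                     /* str r6, [sp, #8]  */
    tmp.vptr = vptr;                     /* str r5, [sp, #12] */
    memcpy(&next, &tmp, sizeof next);    /* ldmia r3, then stmia r4 */
    return next;
}

/* Bad order: the load half of the copy is hoisted above the stores,
   so the copy sees the old contents of the slot. */
static struct container copy_before_stores(struct container stale,
                                           unsigned data, unsigned vptr)
{
    struct container tmp = stale;
    struct container loaded, next;
    memcpy(&loaded, &tmp, sizeof loaded);  /* ldmia r3 (too early) */
    tmp.data = data;                       /* str r6, [sp, #8]  */
    tmp.vptr = vptr;                       /* str r5, [sp, #12] */
    memcpy(&next, &loaded, sizeof next);   /* stmia r4 */
    return next;
}
```

With a zero-filled stale slot, copy_after_stores returns the new pair while copy_before_stores returns zeros. If the stale contents happened to look like a null link, a traversal would stop after the first node, which would be consistent with the count of 1 above; the report does not show what the stale contents actually were.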

GIMPLE for this code is (block 11):
  MEM[(struct __class_poly_list_Node_type_p *)&node] = node_17;
  MEM[(struct __class_poly_list_Node_type_p *)&node + 4B] = &__vtab_poly_list_Node_type;
  MEM[(struct node_type *)integer_node_4].next = VIEW_CONVERT_EXPR<struct __class_poly_list_Node_type_p>(MEM[(struct __class_poly_list_Node_type &)&node]);

The stores (RTL insns 99 and 101) come from the first two GIMPLE statements,
and the load (insn 104) comes from reading the rhs of the third statement.
Note that the third statement is an object assignment, expanded via the
movmemqi pattern, which calls the arm_gen_movmemqi function in arm.c. (This
could be a target problem if the alias sets are not handled correctly there,
but they seem to be copied as is from GIMPLE.)

In the RTL right after expand, the relevant memory accesses are annotated as
follows:

(insn 99)   [3 MEM[(struct __class_poly_list_Node_type_p *)&node]+0 S4 A64]
(insn 104)  [8 MEM[(struct __class_poly_list_Node_type &)&node]+0 S4 A64]

The store carries alias set 3 and the load carries alias set 8, so the two
references are not recognized as aliasing and the scheduler creates no
dependence edge between them.

In comparison, GIMPLE for block 7 (appending the first node in the list, of
type integer) is handled correctly.

  MEM[(struct __class_poly_list_Node_type *)&class.1] = integer_node_4;
  MEM[(struct __class_poly_list_Node_type *)&class.1 + 4B] = &__vtab_main_Integer_node_type;
  _63->next = VIEW_CONVERT_EXPR<struct __class_poly_list_Node_type_p>(class.1);

And the alias sets are: 
(insn 48)   [8 MEM[(struct __class_poly_list_Node_type *)&class.1]
(insn 54)   [8 class.1+0 S4 A64]

and the scheduler knows about the dependency.
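
The difference between the two blocks can be illustrated with a toy model of an alias-set conflict test. This is a deliberately simplified sketch and not GCC's implementation: the function names and the subset table are hypothetical, and address-based disambiguation is ignored entirely. The model assumes that set 0 conflicts with everything, that equal sets conflict, and that a set conflicts with any set recorded as its subset or superset.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_SETS 16

/* subset_of[a][b] is true when set a has been recorded as a subset of b. */
static bool subset_of[TOY_MAX_SETS][TOY_MAX_SETS];

/* Record that `child` is a subset of `parent` (hypothetical helper). */
static void toy_record_subset(int child, int parent)
{
    assert(child > 0 && child < TOY_MAX_SETS);
    assert(parent > 0 && parent < TOY_MAX_SETS);
    subset_of[child][parent] = true;
}

/* Simplified conflict rule: set 0 conflicts with everything, equal sets
   conflict, and a set conflicts with any recorded subset or superset. */
static bool toy_sets_conflict(int a, int b)
{
    assert(a >= 0 && a < TOY_MAX_SETS && b >= 0 && b < TOY_MAX_SETS);
    if (a == 0 || b == 0)
        return true;
    if (a == b)
        return true;
    return subset_of[a][b] || subset_of[b][a];
}

/* In this model a store and a later load may be swapped only when their
   alias sets cannot conflict. */
static bool toy_may_reorder(int store_set, int load_set)
{
    return !toy_sets_conflict(store_set, load_set);
}
```

In this model toy_may_reorder(3, 8) is true as long as no subset relation between 3 and 8 has been recorded, which matches block 11, while toy_may_reorder(8, 8) is false, which matches block 7. Treating every reference as set 0, which is roughly the effect one would expect from -fno-strict-aliasing, also removes the freedom to reorder.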

It doesn't seem to be target-related, because the MEM annotation is exactly the
same as in GIMPLE, but I don't see this test failing on other targets. The
difference may be that other targets use generic move_mem_by_pieces while arm
has its own expander for movmemqi.

The complete RTL after expand for block 11 is:

;; MEM[(struct __class_poly_list_Node_type_p *)&node] = node_17;

(insn 99 98 0 (set (mem/f/c:SI (plus:SI (reg/f:SI 105 virtual-stack-vars)
                (const_int -376 [0xfffffffffffffe88])) [3 MEM[(struct
__class_poly_list_Node_type_p *)&node]+0 S4 A64])
        (reg/f:SI 112 [ node ])) select_type_4.f90:37 -1
     (nil))

;; MEM[(struct __class_poly_list_Node_type_p *)&node + 4B] =
&__vtab_poly_list_Node_type;

(insn 100 99 101 (set (reg/f:SI 155)
        (symbol_ref:SI ("*.LANCHOR0") [flags 0x182])) select_type_4.f90:37 -1
     (nil))

(insn 101 100 0 (set (mem/f/c:SI (plus:SI (reg/f:SI 105 virtual-stack-vars)
                (const_int -372 [0xfffffffffffffe8c])) [3 MEM[(struct
__class_poly_list_Node_type_p *)&node + 4B]+0 S4 A32])
        (reg/f:SI 155)) select_type_4.f90:37 -1
     (nil))

;; MEM[(struct node_type *)integer_node_4].next = VIEW_CONVERT_EXPR<struct
__class_poly_list_Node_type_p>(MEM[(struct __class_poly_list_Node_type
&)&node]);

(insn 102 101 103 (set (reg:SI 156)
        (reg/v/f:SI 110 [ integer_node ])) select_type_4.f90:37 -1
     (nil))

(insn 103 102 104 (set (reg:SI 157)
        (plus:SI (reg/f:SI 105 virtual-stack-vars)
            (const_int -376 [0xfffffffffffffe88]))) select_type_4.f90:37 -1
     (nil))

(insn 104 103 105 (parallel [
            (set (reg:SI 0 r0)
                (mem/c:SI (reg:SI 157) [8 MEM[(struct
__class_poly_list_Node_type &)&node]+0 S4 A64]))
            (set (reg:SI 1 r1)
                (mem/c:SI (plus:SI (reg:SI 157)
                        (const_int 4 [0x4])) [8 MEM[(struct
__class_poly_list_Node_type &)&node]+4 S4 A32]))
        ]) select_type_4.f90:37 -1
     (nil))

(insn 105 104 0 (parallel [
            (set (mem:SI (reg:SI 156) [3 MEM[(struct node_type
*)integer_node_4].next+0 S4 A32])
                (reg:SI 0 r0))
            (set (mem:SI (plus:SI (reg:SI 156)
                        (const_int 4 [0x4])) [3 MEM[(struct node_type
*)integer_node_4].next+4 S4 A32])
                (reg:SI 1 r1))
        ]) select_type_4.f90:37 -1
     (nil))


$ /work/builds/fortran/install/bin/arm-none-eabi-gfortran -v
Using built-in specs.
COLLECT_GCC=/work/builds/fortran/install/bin/arm-none-eabi-gfortran
COLLECT_LTO_WRAPPER=/work/builds/fortran/install/libexec/gcc/arm-none-eabi/4.8.0/lto-wrapper
Target: arm-none-eabi
Configured with: /work/local-checkouts/gcc-git//configure
--target=arm-none-eabi --prefix=/work/may-builds/fortran/install
--with-sysroot=/work/may-builds/fortran/install/arm-none-eabi --with-newlib
--with-gnu-as --with-gnu-ld --enable-languages=c,c++,fortran --disable-shared
--disable-nls --disable-threads --disable-lto --disable-plugin --disable-tls
--enable-checking=yes --disable-libssp --disable-libgomp --disable-libmudflap
--with-cpu=cortex-a15 --with-fpu=neon-vfpv4 --with-float=softfp 
Thread model: single
gcc version 4.8.0 20120912 (experimental) (GCC)
