I forgot to mention that I have a variant of fatigue in which I have
done the inlining manually, along with a few other optimizations.
The timings for it are:
[macbook] lin/test% gfc -Ofast fatigue_v8.f90
[macbook] lin/test% time a.out > /dev/null
2.793u 0.002s 0:02.79 100.0%    0+0k 0+1io 0pf+0w
[macbook] lin/test% gfc -Ofast -fwhole-program fatigue_v8.f90
[macbook] lin/test% time a.out > /dev/null
2.680u 0.003s 0:02.68 100.0%    0+0k 0+2io 0pf+0w
[macbook] lin/test% gfc -Ofast -fwhole-program -flto fatigue_v8.f90
[macbook] lin/test% time a.out > /dev/null
2.671u 0.002s 0:02.67 100.0%    0+0k 0+2io 0pf+0w
[macbook] lin/test% gfc -Ofast -fwhole-program -fstack-arrays fatigue_v8.f90
[macbook] lin/test% time a.out > /dev/null
2.680u 0.003s 0:02.68 100.0%    0+0k 0+2io 0pf+0w
[macbook] lin/test% gfc -Ofast -fwhole-program -flto -fstack-arrays fatigue_v8.f90
[macbook] lin/test% time a.out > /dev/null
2.677u 0.003s 0:02.68 99.6%     0+0k 0+0io 0pf+0w
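
For reference, the relative differences between the runs above can be
worked out with a short script. This is only a sketch of the arithmetic:
Python is used for convenience, the dictionary keys are shorthand labels
for the command lines, and the values are the user times copied from the
output above.

```python
# User times in seconds, copied from the `time a.out` output above.
# The keys are shorthand labels for the compiler flags used in each run.
timings = {
    "-Ofast": 2.793,
    "-Ofast -fwhole-program": 2.680,
    "-Ofast -fwhole-program -flto": 2.671,
    "-Ofast -fwhole-program -fstack-arrays": 2.680,
    "-Ofast -fwhole-program -flto -fstack-arrays": 2.677,
}

baseline = timings["-Ofast"]


def gain_percent(seconds, reference=baseline):
    """Relative reduction in user time versus the reference, in percent."""
    return 100.0 * (reference - seconds) / reference


for flags, seconds in timings.items():
    print(f"{flags:45s} {seconds:.3f}s  {gain_percent(seconds):5.2f}%")
```

All the -fwhole-program variants come out roughly 4% faster than plain
-Ofast and within 0.01s of each other. These are single runs, so the
differences among them may well be within run-to-run noise.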

So the timing of the original code with 
-Ofast -finline-limit=600 -fwhole-program -flto -fstack-arrays
is quite close to this lower bound.

I have also looked at the failure for gfortran.dg/elemental_dependency_1.f90.
It seems to be due to a spurious extra declaration integer(kind=4) A.37[4];
(and friends) in

    integer(kind=8) D.1674;
    integer(kind=4) A.37[4];
    struct array1_integer(kind=4) atmp.36;
    void * D.1669;
    integer(kind=8) D.1668;
    struct array1_integer(kind=4) parm.35;

    parm.35.dtype = 265;
    parm.35.dim[0].lbound = 1;
    parm.35.dim[0].ubound = 4;
    parm.35.dim[0].stride = 1;
    parm.35.data = (void *) &a[1];
    parm.35.offset = -2;
    atmp.36.dtype = 265;
    atmp.36.dim[0].stride = 1;
    atmp.36.dim[0].lbound = 0;
    atmp.36.dim[0].ubound = 3;
        integer(kind=4) A.37[4];
    atmp.36.data = (void * restrict) &A.37;

compared to

    integer(kind=8) D.1658;
    integer(kind=4) A.37[4];
    struct array1_integer(kind=4) atmp.36;
    void * D.1653;
    integer(kind=8) D.1652;
    struct array1_integer(kind=4) parm.35;

    parm.35.dtype = 265;
    parm.35.dim[0].lbound = 1;
    parm.35.dim[0].ubound = 4;
    parm.35.dim[0].stride = 1;
    parm.35.data = (void *) &a[1];
    parm.35.offset = -2;
    atmp.36.dtype = 265;
    atmp.36.dim[0].stride = 1;
    atmp.36.dim[0].lbound = 0;
    atmp.36.dim[0].ubound = 3;
    atmp.36.data = (void * restrict) &A.37;

Note that this is without the -fstack-arrays option.
The same thing happens for gfortran.dg/vector_subscript_4.f90.

Dominique
