Hi,

On Thu, 14 Apr 2011, Dominique Dhumieres wrote:

> I forgot to mention that I have a variant of fatigue
> in which I have done the inlining manually along with a few other
> optimizations and the timing for it is
>
> [macbook] lin/test% gfc -Ofast fatigue_v8.f90
> [macbook] lin/test% time a.out > /dev/null
> 2.793u 0.002s 0:02.79 100.0% 0+0k 0+1io 0pf+0w
> [macbook] lin/test% gfc -Ofast -fwhole-program fatigue_v8.f90
> [macbook] lin/test% time a.out > /dev/null
> 2.680u 0.003s 0:02.68 100.0% 0+0k 0+2io 0pf+0w
> [macbook] lin/test% gfc -Ofast -fwhole-program -flto fatigue_v8.f90
> [macbook] lin/test% time a.out > /dev/null
> 2.671u 0.002s 0:02.67 100.0% 0+0k 0+2io 0pf+0w
> [macbook] lin/test% gfc -Ofast -fwhole-program -fstack-arrays fatigue_v8.f90
> [macbook] lin/test% time a.out > /dev/null
> 2.680u 0.003s 0:02.68 100.0% 0+0k 0+2io 0pf+0w
> [macbook] lin/test% gfc -Ofast -fwhole-program -flto -fstack-arrays fatigue_v8.f90
> [macbook] lin/test% time a.out > /dev/null
> 2.677u 0.003s 0:02.68 99.6% 0+0k 0+0io 0pf+0w
>
> So the timing of the original code with
> -Ofast -finline-limit=600 -fwhole-program -flto -fstack-arrays
> is quite close to this lower bound.
>
> I have also looked at the failure for gfortran.dg/elemental_dependency_1.f90
> and it seems due to a spurious integer(kind=4) A.37[4]; (and friends) in

Yes, this is due to the DECL_EXPR statement, which the dumper renders
exactly like a normal decl.  The testcase looks for exactly one such decl,
but with -fstack-arrays there are exactly two for each such array.

>       integer(kind=4) A.37[4];

So, this is the normal decl.

>       atmp.36.dim[0].ubound = 3;
>       integer(kind=4) A.37[4];

And this is the DECL_EXPR statement, which is then actually transformed
into the stack_save/alloca/stack_restore sequence.

Ciao,
Michael.