Hi,

On Tue, 12 Apr 2011, Dominique Dhumieres wrote:

> > The resulting speed up for nf.f90 is rather remarkable. What specific
> > feature of the fortran leads to a 30=>15s ?
>
> I think it is the automatic array in the subroutine trisolve. Note that the
> speedup is rather 27->19s and may be darwin specific (slow malloc).
>
> Note also that -fstack-arrays prevents some optimizations on
> fatigue: 4.7->7s. This may be related to pr45810.

That seems to be the effect of the -finline-limit=600 that you use.  For me
(opteron 2356) fatigue behaves like this:
(base options: -march=native -ffast-math -funroll-loops -O3)

                                              no stack-arrays    with stack-arrays
no additional options:                        10.2s              8.8s
+ -fwhole-program:                            7.1s               8.8s
+ -fwhole-program -flto:                      10.1s              8.9s
+ -fwhole-program -flto -finline-limit=600:   4.8s               8.7s

The perdida subroutine isn't inlined with -fstack-arrays, although it's
called only once.  The dump report doesn't reveal any reason; perdida
doesn't seem to be in the candidate list for the called-once functions.


Ciao,
Michael.