The run time of air.f90 from the Polyhedron test suite is ~15% longer when compiled with -fwhole-file than without the option. I have checked that the subroutines DERIV(X|Y) are inlined with -finline-limit=100, but not with -finline-limit=50 (for the latter I recover the timing obtained without -fwhole-file). What I find very odd is that if I manually inline only a single call (see below), I get the same timing as with all of them (2*14) inlined. This is the case for trunk and gfortran 4.4.0, but not for 4.3.3, which gives a slower executable.
I have inlined

      SUBROUTINE DERIVX(D,U,Ux,Al,Np,Nd,M)
      IMPLICIT REAL*8(A-H,O-Z)
      PARAMETER (NX=150,NY=150)
      DIMENSION D(NX,33) , U(NX,NY) , Ux(NX,NY) , Al(30) , Np(30)
      DO jm = 1 , M
         jmax = 0
         jmin = 1
         DO i = 1 , Nd
            jmax = jmax + Np(i) + 1
            DO j = jmin , jmax
               uxt = 0.
               DO k = 0 , Np(i)
                  uxt = uxt + D(j,k+1)*U(jmin+k,jm)
               ENDDO
               Ux(j,jm) = uxt*Al(i)
            ENDDO
!
            jmin = jmin + Np(i) + 1
         ENDDO
      ENDDO
      CONTINUE
      END

at line 793 as

!      CALL DERIVX(DX,f4,f4x,ALX,NPX,NDX,MXPy)
      DO jm = 1 , MXPy
         jmax = 0
         jmin = 1
         DO i = 1 , NDX
            jmax = jmax + NPX(i) + 1
            DO j = jmin , jmax
               uxt = 0.
               DO k = 0 , NPX(i)
                  uxt = uxt + DX(j,k+1)*f4(jmin+k,jm)
               ENDDO
               f4x(j,jm) = uxt*ALX(i)
            ENDDO
            jmin = jmin + NPX(i) + 1
         ENDDO
      ENDDO

--
           Summary: Time increase with inlining for the Polyhedron test air.f90
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: dominiq at lps dot ens dot fr
 GCC build triplet: i686-apple-darwin9
  GCC host triplet: i686-apple-darwin9
GCC target triplet: i686-apple-darwin9

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106