https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65701
--- Comment #5 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
The profile difference is:

  52.31%  facerec  facerec           [.] MAIN__.lto_priv.3
  16.68%  facerec  facerec           [.] topcostfct.3487.lto_priv.4
   8.28%  facerec  facerec           [.] __gaborroutines_MOD_gabortrafo
   7.91%  facerec  facerec           [.] cfftb_
   7.20%  facerec  libgfortran.so.3  [.] _gfortrani_cshift0_r4
   2.76%  facerec  facerec           [.] __fft2d_MOD_fft2db
   1.54%  facerec  facerec           [.] __graphroutines_MOD_graphsimfct.constprop.0
   0.53%  facerec  libc-2.13.so      [.] __memcpy_ssse3

(mainline) WRT

  59.16%  facerec  facerec           [.] MAIN__.lto_priv.3
  10.95%  facerec  facerec           [.] __gaborroutines_MOD_gabortrafo
  10.51%  facerec  facerec           [.] cfftb1_
   9.33%  facerec  libgfortran.so.3  [.] _gfortrani_cshift0_r4
   3.64%  facerec  facerec           [.] __fft2d_MOD_fft2db
   2.07%  facerec  facerec           [.] __graphroutines_MOD_graphsimfct.constprop.0
   0.67%  facerec  libc-2.13.so      [.] __memcpy_ssse3
   0.57%  facerec  libgfortran.so.3  [.] _gfortrani_read_radix
   0.43%  facerec  libgcc_s.so.1     [.] __udivti3
   0.36%  facerec  libgfortran.so.3  [.] formatted_transfer

patch reverted. I wonder if we don't want to inline udivti...

I suppose the problem is that we no longer inline topcostfct, which we do not
inline because...

  not inlinable: localmove.constprop/304 -> topcostfct/208, --param large-function-growth limit reached

while the patched tree succeeds:

  Inlining topcostfct size 1393.
   Called once from localmove.constprop 740 insns.
    Accounting size:1132.00, time:12187.80 on predicate:(true)

Bumping the large-function-insns limit up to 4000 gets the function inlined
but, curiously enough, causes further degradation. The profile is now:

  66.35%  facerec  facerec           [.] MAIN__.lto_priv.3
   8.93%  facerec  facerec           [.] __gaborroutines_MOD_gabortrafo
   8.72%  facerec  facerec           [.] cfftb_
   7.77%  facerec  libgfortran.so.3  [.] _gfortrani_cshift0_r4
   2.96%  facerec  facerec           [.] __fft2d_MOD_fft2db
   1.68%  facerec  facerec           [.] __graphroutines_MOD_graphsimfct.constprop.0
   0.55%  facerec  libc-2.13.so      [.] __memcpy_ssse3
   0.47%  facerec  libgfortran.so.3  [.] _gfortrani_read_radix
   0.34%  facerec  libgcc_s.so.1     [.] __udivti3
   0.30%  facerec  libgfortran.so.3  [.] formatted_transfer
   0.22%  facerec  libgfortran.so.3  [.] next_format0
   0.22%  facerec  facerec           [.] cfftf_
   0.20%  facerec  libgfortran.so.3  [.] _gfortrani_read_block_form

so basically identical except that mainline inlines cfftb1_ and the patched
tree inlines cfftb_, which is a wrapper. Perhaps the wrapper heuristics may be
generalized for this.
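
For readers unfamiliar with the two --param knobs mentioned above, here is a rough sketch of how they might interact. It is a simplified model based on the documented meaning of large-function-insns and large-function-growth, not the actual inliner code. The function name may_inline, the use of a plain size sum, the choice of the larger body as the growth base, and the default values 2700 and 100 are all assumptions made for illustration.

```python
def may_inline(caller_size, callee_size,
               large_function_insns=2700, large_function_growth=100):
    """Simplified, hypothetical model of the large-function growth check.

    Sizes are in abstract "insns". The defaults are assumed values for the
    two --param knobs and may not match a given GCC release.
    """
    # Assumption: the size after inlining is the plain sum; the real
    # estimate also accounts for the removed call and other savings.
    size_after = caller_size + callee_size

    # Functions that stay below the "large" threshold are not constrained.
    if size_after <= large_function_insns:
        return True

    # Otherwise growth is capped at a percentage of a base size. Assumption:
    # the base is the larger of the two bodies involved.
    base = max(caller_size, callee_size)
    limit = base + base * large_function_growth // 100
    return size_after <= limit


# Hypothetical sizes, not taken from the dumps above.
print(may_inline(2000, 1000, large_function_growth=0))
print(may_inline(2000, 1000, large_function_insns=4000,
                 large_function_growth=0))
```

In this model, raising large-function-insns moves the point at which the growth cap starts to apply, which is the kind of effect the experiment with the value 4000 relies on.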