------- Comment #4 from tkoenig at gcc dot gnu dot org 2008-08-23 13:18 -------
Created an attachment (id=16134)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16134&action=view)
test case
Actually, the earlier test cases were a bit unfair: some of the computed values of c were never used, so the middle-end optimized those calculations away. The attached test case is a better benchmark. Timings on x86_64-unknown-linux-gnu:

matmul:                                 12.840802 s
subroutine without explicit interface:   0.88805580 s
subroutine with explicit interface:      0.87605572 s
inline with sum:                         2.0721283 s

Inlining with sum is still much faster than matmul (roughly 6x), but a hand-rolled 3*3 subroutine is faster still (more than 2x faster than the inlined sum version, and roughly 14x faster than matmul), which I find a bit surprising.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37131
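The attached Fortran test case is not reproduced in this comment. As a rough illustration only, here is a C sketch (not the attached code) contrasting a generic product over an arbitrary size n with a hand-rolled 3*3 routine in which every term is written out. The function names are hypothetical, and the arrays are row-major, whereas Fortran arrays are column-major.

```c
#include <stddef.h>

/* Generic n x n product c = a * b on row-major arrays.
   The size is only known at run time, so loop control and
   index arithmetic remain in the generated code. */
void matmul_generic(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double s = 0.0;
            for (size_t k = 0; k < n; k++)
                s += a[i * n + k] * b[k * n + j];
            c[i * n + j] = s;
        }
    }
}

/* Hand-rolled 3x3 product: c[3*i + j] = sum_k a[3*i + k] * b[3*k + j],
   with all nine dot products spelled out, so no loops remain.
   c must not alias a or b. */
void matmul3_unrolled(const double a[9], const double b[9], double c[9])
{
    c[0] = a[0]*b[0] + a[1]*b[3] + a[2]*b[6];
    c[1] = a[0]*b[1] + a[1]*b[4] + a[2]*b[7];
    c[2] = a[0]*b[2] + a[1]*b[5] + a[2]*b[8];
    c[3] = a[3]*b[0] + a[4]*b[3] + a[5]*b[6];
    c[4] = a[3]*b[1] + a[4]*b[4] + a[5]*b[7];
    c[5] = a[3]*b[2] + a[4]*b[5] + a[5]*b[8];
    c[6] = a[6]*b[0] + a[7]*b[3] + a[8]*b[6];
    c[7] = a[6]*b[1] + a[7]*b[4] + a[8]*b[7];
    c[8] = a[6]*b[2] + a[7]*b[5] + a[8]*b[8];
}
```

With the size fixed at 3, the compiler sees exactly 27 multiplications and 18 additions with no loop overhead. That is one plausible reason the hand-rolled version wins, but it is a guess and not something the timings above establish.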