The attached file contains loops over the same function, implemented once in C and once in inline asm.
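For readers without the attachment, a test case of this general shape might look like the sketch below. It is a hypothetical illustration, not the attached cabac_unroll.i: the names step_c, step_asm, run_c and run_asm and the arithmetic itself are invented here, and only the trip count of 13 is taken from the report. The asm variant is guarded so that it falls back to the C version on non-x86 targets.

```c
/* Hypothetical sketch only; NOT the attached cabac_unroll.i. */
#include <stdint.h>

#define N_ITER 13  /* trip count mentioned in the report */

/* Plain C version: several simple operations per call. */
static inline uint32_t step_c(uint32_t state, uint32_t bit)
{
    uint32_t mask = -(bit & 1u);              /* 0 or 0xFFFFFFFF */
    state = (state << 1) ^ (mask & 0x1021u);
    return state + (bit >> 1);
}

/* Same computation written as a single asm statement that
 * contains several machine instructions (AT&T syntax). */
static inline uint32_t step_asm(uint32_t state, uint32_t bit)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t mask = bit;
    __asm__ ("andl $1, %1\n\t"        /* mask = bit & 1          */
             "negl %1\n\t"            /* mask = -mask            */
             "andl $0x1021, %1\n\t"   /* mask &= 0x1021          */
             "shll $1, %0\n\t"        /* state <<= 1             */
             "xorl %1, %0\n\t"        /* state ^= mask           */
             "shrl $1, %2\n\t"        /* bit >>= 1               */
             "addl %2, %0"            /* state += bit            */
             : "+r"(state), "+r"(mask), "+r"(bit)
             :
             : "cc");
    return state;
#else
    return step_c(state, bit);        /* portable fallback */
#endif
}

/* Two loops with the same constant trip count, one per variant. */
uint32_t run_c(const uint32_t *bits)
{
    uint32_t s = 0;
    for (int i = 0; i < N_ITER; i++)
        s = step_c(s, bits[i]);
    return s;
}

uint32_t run_asm(const uint32_t *bits)
{
    uint32_t s = 0;
    for (int i = 0; i < N_ITER; i++)
        s = step_asm(s, bits[i]);
    return s;
}
```

Both loops do the same work per iteration, but in the second one that work is a single asm statement from the compiler's point of view. The figures quoted in this report come from the attached file, not from this sketch.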
When compiled with:

  gcc -O3 -fno-pic -fomit-frame-pointer -fdump-tree-cunroll-details -S cabac_unroll.i

cunroll thinks they're different sizes:

  size: 55-4, last_iteration: 55-4
    Loop size: 55
    Estimated size after unrolling: 442

  size: 8-4, last_iteration: 8-4
    Loop size: 8
    Estimated size after unrolling: 34

and expands the asm loop all 13 times.

This is reduced from ffmpeg decode_cabac_residual, where it apparently causes significant decoding slowdown.

Besides that, cunroll seems to be hurting ffmpeg in general on x86-32 (http://multimedia.cx/eggs/last-performance-smackdown-for-awhile/), maybe we'll turn it down some.

--
           Summary: [4.2/4.3/4.4/4.5 Regression] cunroll ignoring asm size
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: astrange at ithinksw dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40992