The attached file loops over the same function implemented twice: once in C
and once in inline asm.
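
For readers without the attachment, here is a rough sketch of the shape of such
a test case. It is not the attached file: step_c, step_asm, run_c and run_asm
are hypothetical names, and the arithmetic is an arbitrary stand-in for the real
CABAC code. The asm path assumes GCC-style extended asm in AT&T syntax on x86
and falls back to the C version elsewhere.

```c
/* Hypothetical stand-in for the real function: a few dependent integer ops. */
static inline unsigned step_c(unsigned x)
{
    x += x << 3;
    x ^= x >> 5;
    x += 0x1234u;
    return x;
}

/* The same computation as one multi-instruction inline asm statement. */
static inline unsigned step_asm(unsigned x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned t; /* scratch register, early-clobbered */
    __asm__ ("mov %0, %1\n\t"
             "shl $3, %1\n\t"
             "add %1, %0\n\t"
             "mov %0, %1\n\t"
             "shr $5, %1\n\t"
             "xor %1, %0\n\t"
             "add $0x1234, %0"
             : "+r"(x), "=&r"(t)
             :
             : "cc");
    return x;
#else
    return step_c(x); /* non-x86 fallback so the sketch still builds */
#endif
}

/* Two loops with the same constant trip count (13, as in the report). */
unsigned run_c(unsigned x)
{
    for (int i = 0; i < 13; i++)
        x = step_c(x);
    return x;
}

unsigned run_asm(unsigned x)
{
    for (int i = 0; i < 13; i++)
        x = step_asm(x);
    return x;
}
```

Both loops do the same work per iteration, so a size estimate that treats the
asm body as much smaller than the C body would favour unrolling the asm loop.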

When compiled with:
  gcc -O3 -fno-pic -fomit-frame-pointer -fdump-tree-cunroll-details -S cabac_unroll.i
cunroll thinks they're different sizes:

size: 55-4, last_iteration: 55-4
  Loop size: 55
  Estimated size after unrolling: 442

size: 8-4, last_iteration: 8-4
  Loop size: 8
  Estimated size after unrolling: 34

and fully unrolls the asm loop, all 13 iterations.
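
The figures in the dump can be reproduced with a little arithmetic. The sketch
below is a reconstruction from the numbers above, not code quoted from GCC: it
assumes 12 extra copies of the loop body plus the last iteration (13 in total),
with the statements marked after the dash subtracted from each copy and the
result scaled by 2/3. The function and parameter names are hypothetical.

```c
/* Hypothetical reconstruction of the size estimate seen in the dump. */
static long estimated_unrolled_size_sketch(long size, long eliminated,
                                           long last_size, long last_eliminated,
                                           long n_unroll)
{
    /* n_unroll full copies, each minus the statements expected to fold away */
    long insns = n_unroll * (size - eliminated);

    /* plus the final iteration, counted separately */
    insns += last_size - last_eliminated;

    /* assume about a third of the result is optimized out afterwards */
    insns = insns * 2 / 3;

    return insns > 0 ? insns : 1;
}
```

With n_unroll = 12, the inputs "55-4" give 442 and "8-4" give 34, matching both
dump entries. Under this reading, the asm loop is unrolled because its body is
counted as only 8 statements.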

This is reduced from ffmpeg decode_cabac_residual, where it apparently causes
significant decoding slowdown.

Besides that, cunroll seems to be hurting ffmpeg in general on x86-32
(http://multimedia.cx/eggs/last-performance-smackdown-for-awhile/), so maybe
we'll turn it down some.


-- 
           Summary: [4.2/4.3/4.4/4.5 Regression] cunroll ignoring asm size
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: astrange at ithinksw dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40992
