First of all, I'm using Debian's gcc-snapshot package:

  gcc version 4.4.0 20081117 (experimental) [trunk revision 141948] (Debian 20081117-1)

Let me know if I should try to rebuild with another GCC version.

I tested my image scaler (http://bzr.sesse.net/qscale/) and libjpeg with 4.4
vs. 4.3, and got the following oprofile output for the same load in both cases.

4.3:

  samples  %        app name                 symbol name
  5182     21.8484  libjpeg.so.62.0.0        jpeg_idct_islow
  5150     21.7135  libjpeg.so.62.0.0        decode_mcu
  3582     15.1025  qscale                   vscale
  1237      5.2154  libjpeg.so.62.0.0        jpeg_fill_bit_buffer
  592       2.4960  qscale                   hscale

4.4:

  samples  %        app name                 symbol name
  7054     31.9056  qscale                   jpeg_idct_islow
  4401     19.9059  qscale                   decode_mcu
  3584     16.2106  qscale                   vscale
  1352      6.1152  qscale                   jpeg_fill_bit_buffer
  606       2.7410  qscale                   hscale

Note that decode_mcu is 17% faster (probably due to better register
allocation), but jpeg_idct_islow is 36% slower! jpeg_fill_bit_buffer is also
about 9% slower by sample count, but that's not as critical. (The overall
effect is that JPEG decoding as a whole runs slower.) I have not looked at the
generated code, but the result is definitely not good.
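
For reference, here is a small sketch of how the percentages above can be
derived from the raw sample counts. It assumes that both profiles cover the
same workload at the same sampling rate, so that sample counts are roughly
proportional to time spent. The helper names (slowdown_percent,
speedup_percent) are made up for this illustration and are not part of
oprofile.

```python
# Sample counts copied from the two oprofile tables (4.3 and 4.4).
samples_43 = {
    "jpeg_idct_islow": 5182,
    "decode_mcu": 5150,
    "vscale": 3582,
    "jpeg_fill_bit_buffer": 1237,
    "hscale": 592,
}
samples_44 = {
    "jpeg_idct_islow": 7054,
    "decode_mcu": 4401,
    "vscale": 3584,
    "jpeg_fill_bit_buffer": 1352,
    "hscale": 606,
}


def slowdown_percent(old, new):
    """Percent more samples in the new build than in the old one."""
    return (new / old - 1.0) * 100.0


def speedup_percent(old, new):
    """Percent speed gain, treating speed as the inverse of the sample count."""
    return (old / new - 1.0) * 100.0


for name, old in samples_43.items():
    new = samples_44[name]
    if new > old:
        print(f"{name}: {slowdown_percent(old, new):.1f}% slower")
    else:
        print(f"{name}: {speedup_percent(old, new):.1f}% faster")
```

With these definitions, jpeg_idct_islow comes out at about 36% slower
(7054 / 5182) and decode_mcu at about 17% faster (5150 / 4401), matching the
figures quoted above.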

FWIW, it's repeatable between runs -- the sample counts change very little
(1-2%, perhaps).


-- 
           Summary: Massive performance regression for jpeg_idct_islow
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: sgunderson at bigfoot dot com
 GCC build triplet: i486-linux-gnu
  GCC host triplet: i486-linux-gnu
GCC target triplet: i486-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38328
