First of all, I'm using Debian's gcc-snapshot package: gcc version 4.4.0 20081117 (experimental) [trunk revision 141948] (Debian 20081117-1)
Let me know if I should try to rebuild with another GCC version.

I tested my image scaler (http://bzr.sesse.net/qscale/) and libjpeg with 4.4 vs. 4.3, and got the following oprofile graph for the same load in both cases.

4.3:

samples  %        app name             symbol name
5182     21.8484  libjpeg.so.62.0.0    jpeg_idct_islow
5150     21.7135  libjpeg.so.62.0.0    decode_mcu
3582     15.1025  qscale               vscale
1237      5.2154  libjpeg.so.62.0.0    jpeg_fill_bit_buffer
592       2.4960  qscale               hscale

4.4:

samples  %        app name             symbol name
7054     31.9056  qscale               jpeg_idct_islow
4401     19.9059  qscale               decode_mcu
3584     16.2106  qscale               vscale
1352      6.1152  qscale               jpeg_fill_bit_buffer
606       2.7410  qscale               hscale

Note that decode_mcu is 17% faster (probably due to better register allocation), but jpeg_idct_islow is 36% slower! jpeg_fill_bit_buffer is also a tiny bit slower, but that's not as critical. (The overall effect is that the JPEG decoding as a whole runs slower.)

I have not looked at the generated code, but it's definitely not good. FWIW, it's repeatable between runs -- the sample counts change very little (1-2%, perhaps).

--
           Summary: Massive performance regression for jpeg_idct_islow
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: sgunderson at bigfoot dot com
 GCC build triplet: i486-linux-gnu
  GCC host triplet: i486-linux-gnu
GCC target triplet: i486-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38328
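For anyone wanting to re-derive the percentages quoted in the report, here is a rough sketch that compares the raw sample counts from the two oprofile tables. It assumes the counts are directly comparable between the two runs (the report says the load was the same), and the names `samples_43`, `samples_44`, `slowdown_pct` and `speedup_pct` are made up for this illustration. "Slower" is taken as the increase in samples relative to 4.3, and "faster" as the 4.3/4.4 sample ratio minus one.

```python
# Sample counts copied from the oprofile tables in the report.
samples_43 = {
    "jpeg_idct_islow": 5182,
    "decode_mcu": 5150,
    "vscale": 3582,
    "jpeg_fill_bit_buffer": 1237,
    "hscale": 592,
}
samples_44 = {
    "jpeg_idct_islow": 7054,
    "decode_mcu": 4401,
    "vscale": 3584,
    "jpeg_fill_bit_buffer": 1352,
    "hscale": 606,
}


def slowdown_pct(before, after):
    """Percent increase in samples (time spent) going from before to after."""
    return (after - before) / before * 100.0


def speedup_pct(before, after):
    """Percent increase in speed, i.e. old/new sample ratio minus one."""
    return (before / after - 1.0) * 100.0


if __name__ == "__main__":
    for symbol, old in samples_43.items():
        new = samples_44[symbol]
        if new >= old:
            print(f"{symbol:22s} {slowdown_pct(old, new):5.1f}% slower with 4.4")
        else:
            print(f"{symbol:22s} {speedup_pct(old, new):5.1f}% faster with 4.4")
```

Under these assumptions jpeg_idct_islow comes out at roughly 36% slower and decode_mcu at roughly 17% faster, matching the figures in the report.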