http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57315
Bug ID: 57315
Summary: LTO and/or vectorizer performance regression on salsa20 core, 4.7->4.8
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: zackw at panix dot com

I'm seeing a significant performance regression from 4.7 to 4.8 (targeting
x86-64) on the "salsa20" core function (this is a stream cipher).  Repro
instructions:

$ git clone git://github.com/zackw/rngstats.git
# ...
$ make -s cipher-test CC=gcc-4.7 && ./cipher-test >&/dev/null && ./cipher-test
KAT: aes128... ok
KAT: aes256... ok
KAT: arc4... ok
KAT: isaac64... ok
KAT: salsa20_128... ok
KAT: salsa20_256... ok
TIME: aes128... 2000 keys, 3.47834s -> 574.987 keys/s
TIME: aes256... 2000 keys, 3.62452s -> 551.797 keys/s
TIME: arc4... 2000 keys, 2.21746s -> 901.933 keys/s
TIME: isaac64... 2000 keys, 2.03467s -> 982.962 keys/s
TIME: salsa20_128... 2000 keys, 2.31960s -> 862.217 keys/s
TIME: salsa20_256... 2000 keys, 2.31932s -> 862.320 keys/s
$ make -s clean cipher-test CC=gcc-4.8 && ./cipher-test >&/dev/null && ./cipher-test
KAT: aes128... ok
KAT: aes256... ok
KAT: arc4... ok
KAT: isaac64... ok
KAT: salsa20_128... ok
KAT: salsa20_256... ok
TIME: aes128... 2000 keys, 2.49224s -> 802.491 keys/s
TIME: aes256... 2000 keys, 3.62372s -> 551.919 keys/s
TIME: arc4... 2000 keys, 2.22794s -> 897.689 keys/s
TIME: isaac64... 2000 keys, 2.05087s -> 975.194 keys/s
TIME: salsa20_128... 2000 keys, 3.53085s -> 566.436 keys/s
TIME: salsa20_256... 2000 keys, 2.53003s -> 790.505 keys/s

The regression shows in the last two TIME: lines for each build.  The relevant
code is probably in ciphers/salsa20.c, or else in worker.c.

Note that there are other programs in this repository, and they require
unusual libraries to build.  I recommend you do not attempt a "make all", and
if you get errors, try commenting out the CFLAGS.mpi and LIBS.mpi lines in the
Makefile.
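
For triage, a reduced test case may be easier to work with than the full repository.  The sketch below is NOT the code from ciphers/salsa20.c; it is a standalone Salsa20/20 core written from the public Salsa20 specification, and it has not been shown to reproduce the slowdown.  The names rotl32, QR, salsa20_core and salsa20_core_bench are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static inline uint32_t rotl32(uint32_t v, int n)
{
    return (v << n) | (v >> (32 - n));
}

/* Salsa20 quarter-round applied to words a, b, c, d of the local array x. */
#define QR(a, b, c, d)                        \
    do {                                      \
        x[b] ^= rotl32(x[a] + x[d], 7);       \
        x[c] ^= rotl32(x[b] + x[a], 9);       \
        x[d] ^= rotl32(x[c] + x[b], 13);      \
        x[a] ^= rotl32(x[d] + x[c], 18);      \
    } while (0)

/* Salsa20/20 core: 10 double rounds over a 16-word block, followed by a
 * word-wise addition of the original input (per the public specification). */
void salsa20_core(uint32_t out[16], const uint32_t in[16])
{
    uint32_t x[16];
    int i;

    for (i = 0; i < 16; i++)
        x[i] = in[i];

    for (i = 0; i < 10; i++) {
        /* Column round. */
        QR(0, 4, 8, 12);
        QR(5, 9, 13, 1);
        QR(10, 14, 2, 6);
        QR(15, 3, 7, 11);
        /* Row round. */
        QR(0, 1, 2, 3);
        QR(5, 6, 7, 4);
        QR(10, 11, 8, 9);
        QR(15, 12, 13, 14);
    }

    for (i = 0; i < 16; i++)
        out[i] = x[i] + in[i];
}

/* Hypothetical benchmark helper: run the core repeatedly, feeding each
 * output back in as the next input, and return a checksum so the compiler
 * cannot discard the work. */
uint32_t salsa20_core_bench(unsigned long iterations)
{
    uint32_t block[16], out[16], sum = 0;
    unsigned long n;
    int i;

    for (i = 0; i < 16; i++)
        block[i] = (uint32_t)i * 0x9e3779b9u;   /* arbitrary non-zero seed */

    for (n = 0; n < iterations; n++) {
        salsa20_core(out, block);
        for (i = 0; i < 16; i++)
            block[i] = out[i];
    }

    for (i = 0; i < 16; i++)
        sum ^= block[i];
    return sum;
}
```

To use it, call salsa20_core_bench from a small main() with a large iteration count, build the same file with gcc-4.7 and gcc-4.8 using the optimization flags from the repository's Makefile, and compare wall-clock times.  If the two builds perform the same, the regression more likely depends on how the real code is structured or on its interaction with worker.c under LTO.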