https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115069
Bug ID: 115069
Summary: 8 bit integer vector performance regression, x86, between gcc-14 and gcc-13 using avx2 target clones on skylake platform
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: colin.king at intel dot com
Target Milestone: ---

Created attachment 58188
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58188&action=edit
reproducer source code

I'm seeing a ~12-14% performance regression in gcc-14 compared to gcc-13, using gcc on Ubuntu 24.04.

Versions:
gcc version 13.2.0 (Ubuntu 13.2.0-23ubuntu4)
gcc version 14.0.1 20240412 (experimental) [master r14-9935-g67e1433a94f] (Ubuntu 14-20240412-0ubuntu1)

cking@skylake:~$ gcc-13 reproducer-vecmath.c -O2
cking@skylake:~$ ./a.out
13540.16 vec8 ops per sec, duration = 14.77 secs

cking@skylake:~$ gcc-14 reproducer-vecmath.c -O2
cking@skylake:~$ ./a.out
11720.25 vec8 ops per sec, duration = 17.06 secs

The original issue appeared when regression testing the stress-ng vecmath stressor [1]. I've managed to extract the attached reproducer from the original code.

Salient point to focus on:

1. The issue also depends on the TARGET_CLONES macro being defined as __attribute__((target_clones("mmx,avx,avx2,default"))). The avx2 target clone appears to be what triggers the problem: remove it when building with gcc-14 and the performance regression is reduced.

Attached are the reproducer C source and disassembled object code.

References:
[1] https://github.com/ColinIanKing/stress-ng/blob/master/stress-vecmath.c
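For readers without the attachment to hand, here is a minimal sketch of the kind of construct point 1 refers to. It is not the attached reproducer: the names vec8_t and vec8_ops, the 16-lane vector width and the particular mix of arithmetic are hypothetical and chosen only for illustration. Only the target_clones list is taken from the report. The preprocessor guard that restricts the attribute to GCC on x86 is also an addition of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Clone list as quoted in the report. Assumption of this sketch: only
 * enable it for GCC on x86, and compile a single plain version elsewhere.
 */
#if defined(__GNUC__) && !defined(__clang__) && \
    (defined(__x86_64__) || defined(__i386__))
#define TARGET_CLONES __attribute__((target_clones("mmx,avx,avx2,default")))
#else
#define TARGET_CLONES
#endif

/* Hypothetical type: 16 lanes of unsigned 8-bit integers (GCC vector extension). */
typedef uint8_t vec8_t __attribute__((vector_size(16)));

/*
 * Hypothetical kernel: a short chain of lane-wise 8-bit operations repeated
 * iters times. With TARGET_CLONES, GCC emits one copy of this function per
 * listed target and selects one at load time.
 */
TARGET_CLONES
void vec8_ops(vec8_t *a, const vec8_t *b, int iters)
{
    assert(a != 0 && b != 0);

    vec8_t x = *a;
    const vec8_t y = *b;

    for (int i = 0; i < iters; i++) {
        x += y;   /* lane-wise add, wraps modulo 256 */
        x ^= y;   /* lane-wise xor */
        x *= y;   /* lane-wise multiply, wraps modulo 256 */
    }
    *a = x;
}
```

Building a file like this at -O2 with gcc-13 and with gcc-14, then comparing the disassembly of the avx2 clone from each compiler, is one way to look for the kind of code generation difference described above. Whether this particular kernel shows the regression is untested.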