https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114324

--- Comment #7 from mjr19 at cam dot ac.uk ---
The patch to GCC 15 in commit
r15-1508-g59221dc587f369695d9b0c2f73aedf8458931f0f from PR 68855 has
significantly improved the optimisation of these examples at -O3, so that
the -Ofast version is now slower than the -O3 version for both of the
attachments. For the two examples given, rough timings in ns/iteration on a
3GHz Kaby Lake are:

m3spf

gf-12  -Ofast   26.5
gf-15  -O3      27.6
gf-14  -Ofast   34.8
gf-15  -Ofast   35.1
gf-14  -O3      43.8
gf-12  -O3      44.8

m4spf

gf-12  -Ofast   23.3
gf-15  -O3      23.8
gf-14  -Ofast   29.6
gf-15  -Ofast   29.7
gf-14  -O3      37.3
gf-12  -O3      37.6

All were compiled with the flag -mavx2, and in both cases the fastest time is
very similar to that of ifort -O3. gf-15 is gfortran 15.0-20240623.

(I believe there is interest in the optimisation of these expressions. I am an
electronic structure physicist, and the major simulation codes in my area,
Abinit, CASTEP, QE, Siesta, VASP, are all written in Fortran, all use the
complex datatype, are likely to make use of conjugation and also multiplication
by +/-i, and use large amounts of time on academic supercomputers. The ability
to alternate neg and nop efficiently along a vector would be very useful if it
dealt with conjg and *(+/-i), and the obvious xor seems quite safe.)
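
To make the "alternate neg and nop" idea concrete, here is a minimal sketch of
the bit-level operation, not of what GCC actually emits. It assumes
single-precision complex values stored as interleaved (re, im) pairs with an
even number of lanes; the helper names xor_lanes, conjg, times_i and
times_minus_i are made up for illustration. XORing the sign bit is an exact
IEEE 754 negation, so it also gives the expected result for signed zeros.

```python
import struct

SIGN_BIT = 0x80000000  # sign bit of an IEEE 754 binary32 value


def xor_lanes(lanes, masks):
    """XOR each float32 lane with the matching 32-bit mask."""
    n = len(lanes)
    bits = struct.unpack(f"<{n}I", struct.pack(f"<{n}f", *lanes))
    flipped = [b ^ m for b, m in zip(bits, masks)]
    return list(struct.unpack(f"<{n}f", struct.pack(f"<{n}I", *flipped)))


def conjg(lanes):
    """conjg(a + bi) = a - bi: nop on real lanes, neg on imaginary lanes."""
    return xor_lanes(lanes, [0, SIGN_BIT] * (len(lanes) // 2))


def times_i(lanes):
    """(a + bi) * i = -b + ai: swap each pair, then neg on real lanes."""
    swapped = [lanes[k ^ 1] for k in range(len(lanes))]
    return xor_lanes(swapped, [SIGN_BIT, 0] * (len(lanes) // 2))


def times_minus_i(lanes):
    """(a + bi) * (-i) = b - ai: swap each pair, then neg on imaginary lanes."""
    swapped = [lanes[k ^ 1] for k in range(len(lanes))]
    return xor_lanes(swapped, [0, SIGN_BIT] * (len(lanes) // 2))
```

For example, conjg([1.0, 2.0, -3.5, 0.0]) leaves the real lanes untouched and
flips only the sign of each imaginary lane, including the zero. In a vector
register the mask would be a constant and the swap a single in-lane permute,
which is why the approach looks cheap.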
