https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123260
--- Comment #4 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Yichao Yu from comment #0)
>
> but refuses to do so for the scalar version, even though they are doing
> exactly the same operations AFAICT,
>
> ```
> ldp d30, d28, [x2]
> ldp d31, d29, [x1]
> ldp d27, d26, [x0]
> fmadd d27, d31, d30, d27
> fmadd d26, d31, d28, d26
> fmsub d27, d29, d28, d27
> fmadd d26, d30, d29, d26
> stp d27, d26, [x0]
> ```
>
> Maybe related https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121925

This is because of missing handling for the commutativity of + in the FMA.
match.pd changes the nodes to the following:

```
>>> p debug (vals[1])
fcmla_scal.c:6:9: note: node 0x66ad4e0 (max_nunits=2, refcnt=2) vector(2) double
fcmla_scal.c:6:9: note: op template: a$real_10 = a.real;
fcmla_scal.c:6:9: note: stmt 0 a$real_10 = a.real;
fcmla_scal.c:6:9: note: stmt 1 a$imag_11 = a.imag;
$5 = void
>>> p debug (l0node[0])
fcmla_scal.c:6:9: note: node 0x66ad220 (max_nunits=2, refcnt=3) vector(2) double
fcmla_scal.c:6:9: note: op template: _2 = _1 + a$real_10;
fcmla_scal.c:6:9: note: stmt 0 _2 = _1 + a$real_10;
fcmla_scal.c:6:9: note: stmt 1 _6 = _5 + a$imag_11;
fcmla_scal.c:6:9: note: children 0x66ad2d0 0x66ad4e0
$6 = void
>>> p debug (vals[0])
fcmla_scal.c:6:9: note: node 0x66ad2d0 (max_nunits=2, refcnt=2) vector(2) double
fcmla_scal.c:6:9: note: op template: _1 = b$real_12 * c$real_14;
fcmla_scal.c:6:9: note: stmt 0 _1 = b$real_12 * c$real_14;
fcmla_scal.c:6:9: note: stmt 1 _5 = b$real_12 * c$imag_15;
fcmla_scal.c:6:9: note: children 0x66ad380 0x66ad430
$7 = void
```

This puts the multiplication in the first operand of the +, and we only check
the first one. Fixing that gives the right sequence.
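The general idea behind the fix, checking a commutative `+` for the multiplication in either operand position rather than assuming one fixed order, can be illustrated with a minimal sketch. It assumes a hypothetical toy expression node (`node`, `op_kind` and `find_fma_mult` are illustrative names, not GCC internals). The comment does not say exactly which check inside GCC was changed, so this shows only the concept, not the actual patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified expression node.  This is NOT GCC's
   slp_tree or gimple representation.  */
typedef enum { OP_LEAF, OP_MULT, OP_PLUS } op_kind;

typedef struct node {
  op_kind kind;
  struct node *ops[2];   /* operands; unused (NULL) for OP_LEAF */
} node;

/* If PLUS is an addition with a multiplication as either operand,
   return that multiplication and store the other operand (the addend)
   in *ADDEND.  Both operand positions are tried because + is
   commutative: "m + a" and "a + m" describe the same FMA shape.
   Returns NULL if PLUS is not an addition or has no MULT operand.  */
static const node *
find_fma_mult (const node *plus, const node **addend)
{
  assert (plus != NULL && addend != NULL);
  if (plus->kind != OP_PLUS)
    return NULL;
  for (int i = 0; i < 2; i++)
    if (plus->ops[i]->kind == OP_MULT)
      {
        *addend = plus->ops[1 - i];
        return plus->ops[i];
      }
  return NULL;
}
```

For example, `_2 = _1 + a$real_10` from the dump above corresponds to a PLUS node whose operand 0 is the MULT `_1`. The helper finds it there and would equally find it if the operands were swapped. The sketch models a single node and ignores SLP lanes and the rest of the vectorizer's pattern-matching machinery.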
