https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114809
palmer at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |missed-optimization
     Ever confirmed|0                           |1
                 CC|                            |palmer at gcc dot gnu.org
   Last reconfirmed|                            |2024-04-22

--- Comment #1 from palmer at gcc dot gnu.org ---
Thanks.  Sounds like there are really two issues here: a missed peephole and a
more complex set of micro-architectural tradeoffs.

The peephole seems like a pretty straightforward missed optimization; if
you've got a smaller reproducer, it's probably worth filing a separate bug for
it.  We're right at the end of the GCC-14 release process and ended up with
some last-minute breakages, so things are pretty chaotic right now; having a
bug on file will make it easier to avoid forgetting about this.

The reduction looks way more complicated to me.  Just thinking a bit as I'm
watching the regressions run, I think there are a few options for generating
the code here (a rough sketch of the alternatives is at the end of this
comment):

* Do we accumulate into a vector and then reduce, or reduce and then
  accumulate?
* Do we reduce via a sum-reduction or a popcount?
* Do we reconfigure to a wider type or handle the overflow?

I think this will depend on the cost model for the hardware: we're essentially
trading ops of one flavor for ops of another, and that's going to depend on
how those ops perform.  Your suggestion is essentially a reconfiguration vs
reduction trade-off, which is probably going to be implementation-specific.

Do you have a system that this code performs poorly on?  If there's something
concrete to target and we're not generating good code, that's pretty
actionable; otherwise I think this one is going to be hard to reason about for
a bit.
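
To make the trade-off concrete, here is a rough, hypothetical sketch of the two
strategies using the RVV C intrinsics.  This is not the bug's actual
reproducer; the testcase shape (counting bytes equal to a key) and the exact
intrinsic spellings are assumptions for illustration only.

/* Hypothetical sketch, not from the bug report: a loop that counts bytes
   equal to a key, written two ways with the RVV C intrinsics.  Build with
   something like -march=rv64gcv.  */
#include <riscv_vector.h>
#include <stdint.h>
#include <stddef.h>

/* Strategy A: reduce inside the loop (vcpop.m on the compare mask) and
   accumulate the scalar count -- "reduce, then accumulate".  */
size_t count_vcpop(const uint8_t *s, size_t n, uint8_t key)
{
    size_t count = 0;
    for (size_t vl; n > 0; n -= vl, s += vl) {
        vl = __riscv_vsetvl_e8m1(n);
        vuint8m1_t v = __riscv_vle8_v_u8m1(s, vl);
        vbool8_t m = __riscv_vmseq_vx_u8m1_b8(v, key, vl);
        count += __riscv_vcpop_m_b8(m, vl);   /* per-iteration mask popcount */
    }
    return count;
}

/* Strategy B: accumulate matches into a widened vector accumulator and do a
   single sum-reduction after the loop -- "accumulate, then reduce".  Widening
   to e16 sidesteps 8-bit lane overflow at the cost of running the loop in a
   wider configuration; each lane (and the final total, in this toy version)
   still has to fit in 16 bits, which is the "reconfigure vs. handle the
   overflow" question above.  */
size_t count_accumulate(const uint8_t *s, size_t n, uint8_t key)
{
    size_t vlmax = __riscv_vsetvlmax_e16m2();
    vuint16m2_t acc = __riscv_vmv_v_x_u16m2(0, vlmax);
    for (size_t vl; n > 0; n -= vl, s += vl) {
        vl = __riscv_vsetvl_e8m1(n);
        vuint8m1_t v = __riscv_vle8_v_u8m1(s, vl);
        vbool8_t m = __riscv_vmseq_vx_u8m1_b8(v, key, vl);
        /* Add 1 to the lanes where the compare mask is set, leaving the
           other lanes and the tail undisturbed.  */
        acc = __riscv_vadd_vx_u16m2_tumu(m, acc, acc, 1, vl);
    }
    vuint16m1_t zero = __riscv_vmv_v_x_u16m1(0, __riscv_vsetvlmax_e16m1());
    vuint16m1_t sum = __riscv_vredsum_vs_u16m2_u16m1(acc, zero, vlmax);
    return __riscv_vmv_x_s_u16m1_u16(sum);
}

Strategy A pays for a vcpop.m plus a scalar add every iteration; strategy B
keeps the loop body to a masked add but pays for the wider e16 configuration
and a final vredsum.  That is essentially the reconfiguration-vs-reduction
trade-off described above, and which side wins will depend on how those ops
perform on a given implementation.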