https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114809

palmer at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |missed-optimization
     Ever confirmed|0                           |1
                 CC|                            |palmer at gcc dot gnu.org
   Last reconfirmed|                            |2024-04-22

--- Comment #1 from palmer at gcc dot gnu.org ---
Thanks.

Sounds like there are really two issues here: a missed peephole and a more
complex set of micro-architectural tradeoffs.

The peephole seems like a pretty straightforward missed optimization.  If
you've got a smaller reproducer, it's probably worth filing another bug for it.
We're right at the end of the GCC-14 release process and ended up with some
last-minute breakages, so stuff is pretty chaotic right now.  Having the bug
will make it easier to avoid forgetting about this.

The reduction looks way more complicated to me.  Just thinking a bit as I'm
watching the regressions run, I think there are a few options for generating
the code here:

* Do we accumulate into a vector and then reduce, or reduce and then
accumulate?
* Do we reduce via a sum-reduction or a popcnt?
* Do we reconfigure to a wider type or handle the overflow?

I think this will depend on the cost model for the hardware: we're essentially
trading one flavor of op for another, and which side wins is going to depend
on how these ops perform.  Your suggestion is essentially a reconfiguration vs
reduction trade-off, which is probably going to be implementation-specific.
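
To make the options above concrete, here is a rough scalar sketch in C.  It
assumes the loop in question is some form of count-the-matching-elements
reduction, which this comment does not actually show, so treat that as an
assumption.  LANES is a hypothetical stand-in for the vector length, the
function names are made up for illustration, and the inner loops only model
what a vector op would do per chunk; none of this is RVV intrinsics or what
GCC emits.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 8 /* hypothetical vector length, purely illustrative */

/* Option A: accumulate into wide per-lane counters, reduce once at the end.
   No overflow handling, but the lanes are wider than the input elements. */
size_t count_accumulate_then_reduce(const uint8_t *p, size_t n, uint8_t key)
{
    size_t lanes[LANES] = {0};
    size_t i = 0, total = 0;
    for (; i + LANES <= n; i += LANES)
        for (size_t l = 0; l < LANES; l++)
            lanes[l] += (p[i + l] == key);      /* models a masked vector add */
    for (size_t l = 0; l < LANES; l++)
        total += lanes[l];                      /* single sum-reduction */
    for (; i < n; i++)
        total += (p[i] == key);                 /* scalar tail */
    return total;
}

/* Option B: reduce every chunk via a popcount of the compare mask, then
   accumulate in a scalar.  One reduction per chunk, no vector accumulator. */
size_t count_reduce_then_accumulate(const uint8_t *p, size_t n, uint8_t key)
{
    size_t i = 0, total = 0;
    for (; i + LANES <= n; i += LANES) {
        unsigned mask = 0;
        for (size_t l = 0; l < LANES; l++)
            mask |= (unsigned)(p[i + l] == key) << l;   /* models a compare */
        total += (size_t)__builtin_popcount(mask);      /* GCC/Clang builtin */
    }
    for (; i < n; i++)
        total += (p[i] == key);
    return total;
}

/* Option C: keep narrow 8-bit lane counters and handle the overflow by
   flushing them into a wide scalar before any lane can exceed 255. */
size_t count_narrow_with_flush(const uint8_t *p, size_t n, uint8_t key)
{
    uint8_t lanes[LANES] = {0};
    size_t i = 0, total = 0;
    unsigned chunks = 0;
    for (; i + LANES <= n; i += LANES) {
        for (size_t l = 0; l < LANES; l++)
            lanes[l] = (uint8_t)(lanes[l] + (p[i + l] == key));
        if (++chunks == 255) {                  /* a lane may now hold 255 */
            for (size_t l = 0; l < LANES; l++) {
                total += lanes[l];
                lanes[l] = 0;
            }
            chunks = 0;
        }
    }
    for (size_t l = 0; l < LANES; l++)
        total += lanes[l];                      /* final flush */
    for (; i < n; i++)
        total += (p[i] == key);
    return total;
}
```

All three compute the same count; they differ only in how many reductions,
wide ops, and flushes they need, which is the kind of thing the cost model
would have to weigh.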

Do you have a system that this code performs poorly on?  If there's something
concrete to target and we're not generating good code, that's pretty
actionable; otherwise I think this one is going to be hard to reason about for
a bit.
