https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91154

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
Note that on Haswell the conditional moves are two uops each, while on
Broadwell and up they are only one uop (overall loop 16 uops vs. 18 uops).
IACA doesn't show any particular issue (the iterations should neatly
interleave w/o inter-iteration dependences), but it says the throughput
bottleneck is dependency chains (not sure if it models conditional moves
very well).
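
For illustration only, here is a minimal sketch of the kind of loop shape
being discussed. It is not the code from this bug: the names max_int and
max_rows and the arithmetic are hypothetical. Each iteration performs two
selects that a compiler may lower to cmp + cmov on x86-64 (it may also choose
branches or vectorize instead), and no iteration reads a result produced by a
previous one, so successive iterations are free to overlap in the pipeline.

```c
#include <stddef.h>

/* Hypothetical helper, not from the bug report: a select that a compiler
 * may turn into cmp + cmov rather than a branch. */
static int max_int(int a, int b)
{
    return a > b ? a : b;
}

/* Hypothetical kernel: out[i] depends only on a[i] and b[i], so there is
 * no inter-iteration dependence; the only dependency chain is the short
 * add -> select -> select sequence inside a single iteration. */
void max_rows(const int *a, const int *b, int x, int y, int *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int m = max_int(a[i] + x, b[i] + y);  /* first select */
        out[i] = max_int(m, 0);               /* second select, clamps at 0 */
    }
}
```

Whether the selects actually become conditional moves depends on the compiler
version, flags, and target, so the generated assembly (e.g. from gcc -O2 -S)
has to be inspected before reasoning about uop counts.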
