https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94863

--- Comment #3 from Gabriel Ravier <gabravier at gmail dot com> ---
For binary size, the `movsd` takes 4 bytes and the `blendps` takes 6 bytes.

The port allocations for the instructions are as follows (same formatting as for
the throughputs):

Wolfdale: p5, p015
Nehalem: p5, p5
Westmere: p5, p5
Sandy Bridge: p05, p5
Ivy Bridge: p05, p5
Haswell: p015, p5
Broadwell: p015, p5
Skylake: p015, p5
Skylake-X: p015, p5
Kaby Lake: p015, p5
Coffee Lake: p015, p5
Cannon Lake: p015, p015
Ice Lake: p015, p015
Zen+: fp01, fp0123
Zen 2: fp013, fp0123

Something like "p015" means that the instruction can be executed on port 0, 1
or 5. Also, both instructions take a single uop on all of these architectures.
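
To make the notation concrete, here is a minimal sketch in C that turns a port
string into a bitmask, so the two instructions can be compared per
architecture. The names `port_mask` and `port_count` are hypothetical helpers
made up for illustration, and I am assuming the "fp" prefix on the Zen rows
just names a floating-point pipe and can be skipped like the "p" prefix:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Parse a port string such as "p015" or "fp0123" into a bitmask:
   bit N is set when the instruction can issue on port N.
   Returns 0 if an unexpected character follows the prefix.
   (Hypothetical helper, not part of GCC or any library.) */
unsigned port_mask(const char *spec)
{
    assert(spec != NULL);

    /* Skip the alphabetic prefix: "p" on the Intel rows, "fp" on the
       Zen rows (assumed here to mean floating-point pipes). */
    while (isalpha((unsigned char)*spec))
        spec++;

    unsigned mask = 0;
    for (; *spec != '\0'; spec++) {
        if (!isdigit((unsigned char)*spec))
            return 0; /* not a single-digit port number */
        mask |= 1u << (*spec - '0');
    }
    return mask;
}

/* Count how many ports are set in a mask (clears the lowest set bit
   on each iteration). */
unsigned port_count(unsigned mask)
{
    unsigned n = 0;
    for (; mask != 0; mask &= mask - 1)
        n++;
    return n;
}
```

With this, `port_count(port_mask("p015"))` is 3 and
`port_count(port_mask("p5"))` is 1, which is one way to see at a glance which
entry in a row has more scheduling freedom.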

The latency of both `blendps` and `movsd` is 1 cycle on every architecture I
could test.

Final note: The numbers are specifically for the `blendps xmm, xmm, imm8` and
the `movsd xmm, xmm` forms of those instructions.
