https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125636
Bug ID: 125636
Summary: [missed optimization] HImode byte-extend from memory causes partial-register-write stall on modern x86-64
Product: gcc
Version: 17.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: etc_26 at yahoo dot com
Target Milestone: ---
On modern out-of-order x86-64 microarchitectures (Intel Skylake+, AMD
Zen+, "generic" tuning), a write to a 16-bit partial register (%ax,
%bx, …) leaves the upper bits of the enclosing register unchanged. The
CPU must therefore merge the old upper bits with the new lower half,
so the write depends on the previous contents of the full register
even when the program never uses them. This is a false dependency, and
subsequent reads of the full register cannot proceed until the merge
has completed.
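To make the merge concrete, here is a small C model of the architectural effect of the two write widths. It is only a sketch of the register semantics, not a benchmark, and the helper names write16 and write32 are made up for this illustration.

```c
#include <stdint.h>

/* Model of a 16-bit destination write (e.g. movsbw ..., %ax).
   Only bits 0..15 are replaced; bits 16..63 keep their old contents,
   so the result is a function of old_reg. */
uint64_t write16(uint64_t old_reg, uint16_t value)
{
    return (old_reg & ~(uint64_t)0xFFFF) | value;
}

/* Model of a 32-bit destination write (e.g. movsbl ..., %eax).
   On x86-64 the upper 32 bits are zeroed, so the result does not
   depend on old_reg at all. */
uint64_t write32(uint64_t old_reg, uint32_t value)
{
    (void)old_reg;            /* deliberately unused: no input dependency */
    return (uint64_t)value;
}
```

The old_reg argument is a real input in write16 and a dead one in write32, which is the dependency the 32-bit form avoids.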
GCC's extendqihi2 / zero_extendqihi2 patterns emit movsbw / movzbw
(16-bit destination writes), triggering this hazard. LLVM avoids it
by always selecting the 32-bit MOVSX32rm8 / MOVZX32rm8 forms.
Minimal reproducer (compile with -O2):
    short f(const signed char *p) { return *p; }
GCC output (movsbw = partial-register write ← BAD):
    movsbw (%rdi), %ax
    ret
Expected output (movsbl = full-register write ← GOOD):
    movsbl (%rdi), %eax
    ret
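The reproducer above only covers the sign-extend pattern. A possible companion test file covering both extends is sketched below; the function names are hypothetical and the zero-extend output has not been checked here, so no expected assembly is given for it.

```c
/* Sign-extending byte load into a 16-bit result (the case reported above). */
short f_sext(const signed char *p)
{
    return *p;
}

/* Zero-extending byte load into a 16-bit result. Whether this also
   produces a 16-bit destination write is an assumption to verify. */
unsigned short f_zext(const unsigned char *p)
{
    return *p;
}
```

Compiling this with -O2 -S and inspecting the destination register width of each load should show which of the two patterns is actually affected.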
The fix is a new pre-RA RTL pass (pass_fixup_bw) that rewrites
    (set (reg:HI R) (sign/zero_extend:HI (mem:QI addr)))
into an SImode extend of the same memory operand, whose result is then
accessed through an HImode lowpart subreg. Register allocation
coalesces the subreg away, yielding movsbl/movzbl. A patch is attached /
posted to gcc-patches.
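The rewrite relies on the low 16 bits of the SImode extend being identical to the HImode extend. The C sketch below checks that over all 256 byte values; it models the RTL operations with plain bit arithmetic and is not taken from the patch. All function names are invented for this illustration.

```c
#include <stdint.h>

/* QI -> HI and QI -> SI sign extension, written as bit operations so the
   result does not rely on implementation-defined integer conversions. */
uint16_t sext_qi_hi(uint8_t b)
{
    return (uint16_t)(b | ((b & 0x80) ? 0xFF00u : 0u));
}

uint32_t sext_qi_si(uint8_t b)
{
    return b | ((b & 0x80) ? 0xFFFFFF00u : 0u);
}

/* QI -> HI and QI -> SI zero extension. */
uint16_t zext_qi_hi(uint8_t b) { return b; }
uint32_t zext_qi_si(uint8_t b) { return b; }

/* Model of an HImode lowpart subreg of an SImode value: keep bits 0..15. */
uint16_t lowpart_hi(uint32_t x)
{
    return (uint16_t)(x & 0xFFFFu);
}

/* Count byte values for which the SImode-extend-plus-lowpart form
   disagrees with the direct HImode extend. */
int count_mismatches(void)
{
    int bad = 0;
    for (unsigned v = 0; v <= 0xFF; ++v) {
        uint8_t b = (uint8_t)v;
        if (lowpart_hi(sext_qi_si(b)) != sext_qi_hi(b)) ++bad;
        if (lowpart_hi(zext_qi_si(b)) != zext_qi_hi(b)) ++bad;
    }
    return bad;
}
```

If count_mismatches() returns 0, the two forms agree for every input, so consumers of the HImode value see the same bits after the rewrite.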