https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125636

            Bug ID: 125636
           Summary: [missed optimization] HImode byte-extend from memory
                    causes partial-register-write stall on modern x86-64
           Product: gcc
           Version: 17.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: etc_26 at yahoo dot com
  Target Milestone: ---

On modern out-of-order x86-64 microarchitectures (Intel Skylake+, AMD
Zen+; this also applies under "generic" tuning), a write to a 16-bit
partial register (%ax, %bx, …) leaves the upper bits of the enclosing
register unchanged.  The write therefore carries a false dependency on
the register's previous value: the CPU must merge the old upper bits
with the new low 16 bits before subsequent reads of the full register
can proceed.

GCC's extendqihi2 / zero_extendqihi2 patterns emit movsbw / movzbw
(16-bit destination writes), triggering this hazard.  LLVM avoids it
by always selecting the 32-bit MOVSX32rm8 / MOVZX32rm8 forms.

Minimal reproducer (compile with -O2):

  short f(const signed char *p) { return *p; }

GCC output (movsbw = partial-register write ← BAD):
  movsbw  (%rdi), %ax
  ret

Expected output (movsbl = full-register write ← GOOD):
  movsbl  (%rdi), %eax
  ret
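
For reference, a slightly wider test file that exercises both the
sign-extending and the zero-extending byte loads named above.  The
function names are made up for this sketch.  Which instruction each
function compiles to depends on the GCC version and on -mtune, so it
is worth checking the -O2 -S output rather than assuming it.

```c
/* Illustrative test file; load_s8 and load_u8 are hypothetical names. */

/* Sign-extending byte load (the extendqihi2 case). */
short load_s8(const signed char *p)
{
    return *p;
}

/* Zero-extending byte load (the zero_extendqihi2 case). */
unsigned short load_u8(const unsigned char *p)
{
    return *p;
}
```

Compile with -O2 -S and look at the width of the destination register
in each function (%ax versus %eax).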

The proposed fix is a new pre-RA RTL pass (pass_fixup_bw) that rewrites
  (set (reg:HI R) (sign/zero_extend:HI (mem:QI addr)))
to a SImode extend followed by an HImode lowpart subreg.  Register
allocation coalesces the subreg away, so the final code uses
movsbl/movzbl.  A patch is attached / posted to gcc-patches.
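
The rewrite relies on the low 16 bits of a byte-to-32-bit extension
being identical to the byte-to-16-bit extension.  A minimal sketch of
that check in plain C follows.  It models the arithmetic only, not the
RTL pass itself, and the helper names are assumptions made for this
example.  It says nothing about bits 16 and above of the hardware
register, which an HImode value does not define.

```c
#include <stdint.h>

/* Bit pattern of a direct 8->16 sign extension. */
static uint16_t sext16_bits(int8_t b)
{
    return (uint16_t)(int16_t)b;
}

/* Low 16 bits of an 8->32 sign extension (models the HImode lowpart). */
static uint16_t sext32_low16(int8_t b)
{
    return (uint16_t)((uint32_t)(int32_t)b & 0xffffu);
}

/* Bit pattern of a direct 8->16 zero extension. */
static uint16_t zext16_bits(uint8_t b)
{
    return (uint16_t)b;
}

/* Low 16 bits of an 8->32 zero extension. */
static uint16_t zext32_low16(uint8_t b)
{
    return (uint16_t)((uint32_t)b & 0xffffu);
}

/* Returns how many byte values make the two routes disagree. */
int count_mismatches(void)
{
    int bad = 0;
    for (int v = 0; v < 256; v++) {
        /* Map 0..255 onto -128..127 without an out-of-range conversion. */
        int8_t s = (int8_t)(v < 128 ? v : v - 256);
        uint8_t u = (uint8_t)v;
        if (sext16_bits(s) != sext32_low16(s))
            bad++;
        if (zext16_bits(u) != zext32_low16(u))
            bad++;
    }
    return bad;
}
```

Calling count_mismatches() from a test driver covers all 256 byte
values for both the signed and the unsigned case.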
