https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104914

--- Comment #25 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sa...@gcc.gnu.org>:

https://gcc.gnu.org/g:3ac58063114cf491891072be6205d32a42c6707d

commit r14-6915-g3ac58063114cf491891072be6205d32a42c6707d
Author: Roger Sayle <ro...@nextmovesoftware.com>
Date:   Thu Jan 4 10:49:33 2024 +0000

    Improved RTL expansion of field assignments into promoted registers.

    This patch fixes PR rtl-optimization/104914 by improving the way
    fields are written into a pseudo register that needs to be kept sign
    extended.

    The motivating example from the bugzilla PR is:

    extern void ext(int);
    void foo(const unsigned char *buf) {
      int val;
      ((unsigned char*)&val)[0] = *buf++;
      ((unsigned char*)&val)[1] = *buf++;
      ((unsigned char*)&val)[2] = *buf++;
      ((unsigned char*)&val)[3] = *buf++;
      if(val > 0)
        ext(1);
      else
        ext(0);
    }

    which at the end of the tree optimization passes looks like:

    void foo (const unsigned char * buf)
    {
      int val;
      unsigned char _1;
      unsigned char _2;
      unsigned char _3;
      unsigned char _4;
      int val.5_5;

      <bb 2> [local count: 1073741824]:
      _1 = *buf_7(D);
      MEM[(unsigned char *)&val] = _1;
      _2 = MEM[(const unsigned char *)buf_7(D) + 1B];
      MEM[(unsigned char *)&val + 1B] = _2;
      _3 = MEM[(const unsigned char *)buf_7(D) + 2B];
      MEM[(unsigned char *)&val + 2B] = _3;
      _4 = MEM[(const unsigned char *)buf_7(D) + 3B];
      MEM[(unsigned char *)&val + 3B] = _4;
      val.5_5 = val;
      if (val.5_5 > 0)
        goto <bb 3>; [59.00%]
      else
        goto <bb 4>; [41.00%]

      <bb 3> [local count: 633507681]:
      ext (1);
      goto <bb 5>; [100.00%]

      <bb 4> [local count: 440234144]:
      ext (0);

      <bb 5> [local count: 1073741824]:
      val ={v} {CLOBBER(eol)};
      return;

    }

    Here four bytes are being sequentially written into the SImode value
    val.  On some platforms, such as MIPS64, this SImode value is kept in
    a 64-bit register, suitably sign-extended.  The function expand_assignment
    contains logic to handle this via SUBREG_PROMOTED_VAR_P (around line 6264
    in expr.cc) which outputs an explicit extension operation after each
    store_field (typically insv) to such promoted/extended pseudos.

    The first observation is that there's no need to perform sign extension
    after each byte in the example above; the extension is only required
    after changes to the most significant byte (i.e. to a field that overlaps
    the most significant bit).
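
The overlap test described above can be sketched in a few lines of C. This is an illustration only, not the code added to expr.cc: the helper name `field_overlaps_msb` and its parameters are hypothetical, and bit positions are assumed to be counted from the least significant bit of the narrow (here 32-bit) value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: returns true when a field of BITSIZE bits starting
   at bit BITPOS (counted from the least significant bit) reaches the most
   significant bit of a value that is PRECISION bits wide.  Only stores to
   such a field can change the sign bit, so only they need re-extension.  */
static bool field_overlaps_msb(unsigned bitpos, unsigned bitsize,
                               unsigned precision)
{
    assert(bitsize > 0 && precision > 0);
    return bitpos + bitsize >= precision;
}
```

Under this numbering, only one of the four byte stores in the motivating example (the one at bit position 24 of the 32-bit value) would trigger an extension. Which source-level store that is depends on the target's byte order.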

    The bug fix itself is a bit more subtle: at this point during
    code expansion it's not safe to use a SUBREG when sign-extending this
    field.  Currently, GCC generates (sign_extend:DI (subreg:SI (reg:DI) 0))
    but combine (and other RTL optimizers) later realize that, because SImode
    values are always sign-extended in their 64-bit hard registers,
    this is a no-op and eliminate it.  The trouble is that it's unsafe to
    refer to the SImode lowpart of a 64-bit register using SUBREG at those
    critical points when temporarily the value isn't correctly sign-extended,
    and the usual backend invariants don't hold.  At these critical points,
    the middle-end needs to use an explicit TRUNCATE rtx (as this isn't a
    TRULY_NOOP_TRUNCATION), so that the explicit sign-extension looks like
    (sign_extend:DI (truncate:SI (reg:DI))), which avoids the problem.
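
To see why the re-extension cannot be treated as a no-op, the following C sketch models the 64-bit register as a `uint64_t`. It is not GCC code: `insert_byte`, `truncate_then_sign_extend` and `is_sign_extended` are hypothetical names, and byte indices are assumed to count from the least significant byte.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models a byte-sized field store (e.g. insv) into a 64-bit register image.
   IDX counts bytes from the least significant end; only that byte changes.  */
static uint64_t insert_byte(uint64_t reg, unsigned idx, uint8_t byte)
{
    uint64_t mask = (uint64_t)0xff << (8 * idx);
    return (reg & ~mask) | ((uint64_t)byte << (8 * idx));
}

/* Models (sign_extend:DI (truncate:SI (reg:DI))): keep the low 32 bits,
   then replicate bit 31 into bits 32..63 using unsigned arithmetic.  */
static uint64_t truncate_then_sign_extend(uint64_t reg)
{
    uint64_t low = reg & 0xffffffffu;
    return (low ^ 0x80000000u) - 0x80000000u;
}

/* True when the register already holds a correctly sign-extended
   32-bit value, i.e. when the usual invariant holds.  */
static bool is_sign_extended(uint64_t reg)
{
    return truncate_then_sign_extend(reg) == reg;
}
```

Starting from a zero register, `insert_byte(0, 3, 0x80)` leaves bit 31 set while bits 32..63 remain clear, so the invariant is temporarily broken and a 64-bit signed comparison would see a positive value. Applying `truncate_then_sign_extend` restores the invariant. An optimizer that assumes the invariant already holds would wrongly drop that step.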

    2024-01-04  Roger Sayle  <ro...@nextmovesoftware.com>
                Jeff Law  <j...@ventanamicro.com>

    gcc/ChangeLog
            PR rtl-optimization/104914
            * expr.cc (expand_assignment): When target is SUBREG_PROMOTED_VAR_P
            a sign or zero extension is only required if the modified field
            overlaps the SUBREG's most significant bit.  On MODE_REP_EXTENDED
            targets, don't refer to the temporarily incorrectly extended value
            using a SUBREG, but instead generate an explicit TRUNCATE rtx.
