https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104914
--- Comment #25 from GCC Commits <cvs-commit at gcc dot gnu.org> --- The master branch has been updated by Roger Sayle <sa...@gcc.gnu.org>: https://gcc.gnu.org/g:3ac58063114cf491891072be6205d32a42c6707d commit r14-6915-g3ac58063114cf491891072be6205d32a42c6707d Author: Roger Sayle <ro...@nextmovesoftware.com> Date: Thu Jan 4 10:49:33 2024 +0000 Improved RTL expansion of field assignments into promoted registers. This patch fixes PR rtl-optmization/104914 by tweaking/improving the way the fields are written into a pseudo register that needs to be kept sign extended. The motivating example from the bugzilla PR is: extern void ext(int); void foo(const unsigned char *buf) { int val; ((unsigned char*)&val)[0] = *buf++; ((unsigned char*)&val)[1] = *buf++; ((unsigned char*)&val)[2] = *buf++; ((unsigned char*)&val)[3] = *buf++; if(val > 0) ext(1); else ext(0); } which at the end of the tree optimization passes looks like: void foo (const unsigned char * buf) { int val; unsigned char _1; unsigned char _2; unsigned char _3; unsigned char _4; int val.5_5; <bb 2> [local count: 1073741824]: _1 = *buf_7(D); MEM[(unsigned char *)&val] = _1; _2 = MEM[(const unsigned char *)buf_7(D) + 1B]; MEM[(unsigned char *)&val + 1B] = _2; _3 = MEM[(const unsigned char *)buf_7(D) + 2B]; MEM[(unsigned char *)&val + 2B] = _3; _4 = MEM[(const unsigned char *)buf_7(D) + 3B]; MEM[(unsigned char *)&val + 3B] = _4; val.5_5 = val; if (val.5_5 > 0) goto <bb 3>; [59.00%] else goto <bb 4>; [41.00%] <bb 3> [local count: 633507681]: ext (1); goto <bb 5>; [100.00%] <bb 4> [local count: 440234144]: ext (0); <bb 5> [local count: 1073741824]: val ={v} {CLOBBER(eol)}; return; } Here four bytes are being sequentially written into the SImode value val. On some platforms, such as MIPS64, this SImode value is kept in a 64-bit register, suitably sign-extended. The function expand_assignment contains logic to handle this via SUBREG_PROMOTED_VAR_P (around line 6264 in expr.cc) which outputs an explicit extension operation after each store_field (typically insv) to such promoted/extended pseudos. The first observation is that there's no need to perform sign extension after each byte in the example above; the extension is only required after changes to the most significant byte (i.e. to a field that overlaps the most significant bit). The bug fix is actually a bit more subtle, but at this point during code expansion it's not safe to use a SUBREG when sign-extending this field. Currently, GCC generates (sign_extend:DI (subreg:SI (reg:DI) 0)) but combine (and other RTL optimizers) later realize that because SImode values are always sign-extended in their 64-bit hard registers that this is a no-op and eliminates it. The trouble is that it's unsafe to refer to the SImode lowpart of a 64-bit register using SUBREG at those critical points when temporarily the value isn't correctly sign-extended, and the usual backend invariants don't hold. At these critical points, the middle-end needs to use an explicit TRUNCATE rtx (as this isn't a TRULY_NOOP_TRUNCATION), so that the explicit sign-extension looks like (sign_extend:DI (truncate:SI (reg:DI)), which avoids the problem. 2024-01-04 Roger Sayle <ro...@nextmovesoftware.com> Jeff Law <j...@ventanamicro.com> gcc/ChangeLog PR rtl-optimization/104914 * expr.cc (expand_assignment): When target is SUBREG_PROMOTED_VAR_P a sign or zero extension is only required if the modified field overlaps the SUBREG's most significant bit. On MODE_REP_EXTENDED targets, don't refer to the temporarily incorrectly extended value using a SUBREG, but instead generate an explicit TRUNCATE rtx.