Hi, https://gcc.gnu.org/PR81504 reports a problem with the use of vec_st to generate the stvx instruction. With swap optimization enabled, the stvx instruction uses the wrong base address, causing data corruption.
The problem arises in the recombine_lvx_stvx pre-pass that runs prior to swap optimization proper. This pre-pass looks for RTL that can be combined into an lvx or stvx pattern and makes that transformation before the swap optimizer has a chance to disturb things beyond recognition. Unfortunately it contains a rather silly bug. The base register for the memory operand of an lvx or stvx looks something like ((ptr + offset) & -16). The optimization copies this expression into the new lvx or stvx pattern. However, Dorothy, we aren't in SSA form anymore, and we can't assume this expression is fully available at the point of replacement. The fix is to insert a copy immediately after the AND expression that computes this result. The AND contains a register with the value (ptr + offset) at that location. We copy that into a new pseudo and use that pseudo in the AND subexpression of the lvx/stvx, rather than the original register. This solves the problem reported in the PR. The bug was introduced in GCC 7, so I would like to backport this after sufficient burn-in time. The backport will be the same, except that this code was in rs6000.c rather than rs6000-p8swap.c in GCC 7. Bootstrapped and tested on powerpc64le-linux-gnu with no regressions. Is this okay for trunk and 7? Thanks, Bill 2017-08-24 Bill Schmidt <wschm...@linux.vnet.ibm.com> PR target/81504 * config/rs6000/rs6000-p8swap.c (find_alignment_op): Add reference parameter and_insn and return it. (recombine_lvx_pattern): Insert a copy to ensure availability of the base register of the copied masking operation at the point of the instruction replacement. (recombine_stvx_pattern): Likewise. Index: gcc/config/rs6000/rs6000-p8swap.c =================================================================== --- gcc/config/rs6000/rs6000-p8swap.c (revision 251339) +++ gcc/config/rs6000/rs6000-p8swap.c (working copy) @@ -1431,9 +1431,10 @@ alignment_mask (rtx_insn *insn) } /* Given INSN that's a load or store based at BASE_REG, look for a - feeding computation that aligns its address on a 16-byte boundary. */ + feeding computation that aligns its address on a 16-byte boundary. + Return the rtx and its containing AND_INSN. */ static rtx -find_alignment_op (rtx_insn *insn, rtx base_reg) +find_alignment_op (rtx_insn *insn, rtx base_reg, rtx_insn **and_insn) { df_ref base_use; struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn); @@ -1454,8 +1455,8 @@ static rtx if (DF_REF_IS_ARTIFICIAL (base_def_link->ref)) break; - rtx_insn *and_insn = DF_REF_INSN (base_def_link->ref); - and_operation = alignment_mask (and_insn); + *and_insn = DF_REF_INSN (base_def_link->ref); + and_operation = alignment_mask (*and_insn); if (and_operation != 0) break; } @@ -1477,7 +1478,8 @@ recombine_lvx_pattern (rtx_insn *insn, del_info *t rtx mem = XEXP (SET_SRC (body), 0); rtx base_reg = XEXP (mem, 0); - rtx and_operation = find_alignment_op (insn, base_reg); + rtx_insn *and_insn; + rtx and_operation = find_alignment_op (insn, base_reg, &and_insn); if (and_operation != 0) { @@ -1501,7 +1503,21 @@ recombine_lvx_pattern (rtx_insn *insn, del_info *t to_delete[INSN_UID (swap_insn)].replace = true; to_delete[INSN_UID (swap_insn)].replace_insn = swap_insn; - XEXP (mem, 0) = and_operation; + /* However, first we must be sure that we make the + base register from the AND operation available + in case the register has been overwritten. Copy + the base register to a new pseudo and use that + as the base register of the AND operation in + the new LVX instruction. */ + rtx and_base = XEXP (and_operation, 0); + rtx new_reg = gen_reg_rtx (GET_MODE (and_base)); + rtx copy = gen_rtx_SET (new_reg, and_base); + rtx_insn *new_insn = emit_insn_after (copy, and_insn); + set_block_for_insn (new_insn, BLOCK_FOR_INSN (and_insn)); + df_insn_rescan (new_insn); + + XEXP (mem, 0) = gen_rtx_AND (GET_MODE (and_base), new_reg, + XEXP (and_operation, 1)); SET_SRC (body) = mem; INSN_CODE (insn) = -1; /* Force re-recognition. */ df_insn_rescan (insn); @@ -1524,7 +1540,8 @@ recombine_stvx_pattern (rtx_insn *insn, del_info * rtx mem = SET_DEST (body); rtx base_reg = XEXP (mem, 0); - rtx and_operation = find_alignment_op (insn, base_reg); + rtx_insn *and_insn; + rtx and_operation = find_alignment_op (insn, base_reg, &and_insn); if (and_operation != 0) { @@ -1552,7 +1569,21 @@ recombine_stvx_pattern (rtx_insn *insn, del_info * to_delete[INSN_UID (swap_insn)].replace = true; to_delete[INSN_UID (swap_insn)].replace_insn = swap_insn; - XEXP (mem, 0) = and_operation; + /* However, first we must be sure that we make the + base register from the AND operation available + in case the register has been overwritten. Copy + the base register to a new pseudo and use that + as the base register of the AND operation in + the new STVX instruction. */ + rtx and_base = XEXP (and_operation, 0); + rtx new_reg = gen_reg_rtx (GET_MODE (and_base)); + rtx copy = gen_rtx_SET (new_reg, and_base); + rtx_insn *new_insn = emit_insn_after (copy, and_insn); + set_block_for_insn (new_insn, BLOCK_FOR_INSN (and_insn)); + df_insn_rescan (new_insn); + + XEXP (mem, 0) = gen_rtx_AND (GET_MODE (and_base), new_reg, + XEXP (and_operation, 1)); SET_SRC (body) = src_reg; INSN_CODE (insn) = -1; /* Force re-recognition. */ df_insn_rescan (insn);