On Sat, May 11, 2024 at 12:32 AM Jeff Law <jeffreya...@gmail.com> wrote:
>
> On 5/7/24 11:17 PM, Christoph Müllner wrote:
> > The RISC-V cpymemsi expansion is called whenever the by-pieces
> > infrastructure will not take care of the builtin expansion.
> > The code emitted by the by-pieces infrastructure may include
> > unaligned accesses if riscv_slow_unaligned_access_p is false.
> >
> > The RISC-V cpymemsi expansion is handled via riscv_expand_block_move().
> > The current implementation of this function does not check
> > riscv_slow_unaligned_access_p and never emits unaligned accesses.
> >
> > Since by-pieces emits unaligned accesses, it is reasonable to implement
> > the same behaviour in the cpymemsi expansion.  And that's what this
> > patch does.
> >
> > The patch checks riscv_slow_unaligned_access_p at the entry and sets
> > the allowed alignment accordingly.  This alignment is then propagated
> > down to the routines that emit the actual instructions.
> >
> > The changes introduced by this patch can be seen in the adjustments
> > of the cpymem tests.
> >
> > gcc/ChangeLog:
> >
> >         * config/riscv/riscv-string.cc (riscv_block_move_straight): Add
> >         parameter align.
> >         (riscv_adjust_block_mem): Replace parameter length by align.
> >         (riscv_block_move_loop): Add parameter align.
> >         (riscv_expand_block_move_scalar): Set alignment properly if the
> >         target has fast unaligned access.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/riscv/cpymem-32-ooo.c: Adjust for unaligned access.
> >         * gcc.target/riscv/cpymem-64-ooo.c: Likewise.
> Mostly OK.  One concern noted below.
> >
> > Signed-off-by: Christoph Müllner <christoph.muell...@vrull.eu>
> > ---
> >  gcc/config/riscv/riscv-string.cc              | 53 +++++++++++--------
> >  .../gcc.target/riscv/cpymem-32-ooo.c          | 20 +++++--
> >  .../gcc.target/riscv/cpymem-64-ooo.c          | 14 ++++-
> >  3 files changed, 59 insertions(+), 28 deletions(-)
> >
> > @@ -730,8 +732,16 @@ riscv_expand_block_move_scalar (rtx dest, rtx src, rtx length)
> >    unsigned HOST_WIDE_INT hwi_length = UINTVAL (length);
> >    unsigned HOST_WIDE_INT factor, align;
> >
> > -  align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD);
> > -  factor = BITS_PER_WORD / align;
> > +  if (riscv_slow_unaligned_access_p)
> > +    {
> > +      align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD);
> > +      factor = BITS_PER_WORD / align;
> > +    }
> > +  else
> > +    {
> > +      align = hwi_length * BITS_PER_UNIT;
> > +      factor = 1;
> > +    }
> Not sure why you're using hwi_length here.  That's a property of the
> host, not the target.  ISTM you wanted BITS_PER_WORD here to encourage
> word sized moves irrespective of alignment.
We set 'align' here to pretend proper alignment in order to force
unaligned accesses (if needed).

'hwi_length' is defined above as:

  unsigned HOST_WIDE_INT hwi_length = UINTVAL (length);

So it is not a host property, but the number of bytes to copy.
Setting 'align' to BITS_PER_WORD achieves the same effect and is indeed
the better choice.  I'll also add the comment "Pretend alignment" to
make readers aware of the fact that we ignore the actual alignment.

> OK with that change after a fresh round of testing.

Pushed after adjusting as stated above and retesting:
* rv32 & rv64: GCC regression tests
* rv64: SPEC CPU 2017 intrate
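
For readers following the thread: below is a sketch of how the adjusted
hunk in riscv_expand_block_move_scalar plausibly reads after the change
agreed above.  The exact comment wording and formatting in the pushed
commit may differ; the point is that the else-branch now uses
BITS_PER_WORD instead of hwi_length * BITS_PER_UNIT.

  unsigned HOST_WIDE_INT hwi_length = UINTVAL (length);
  unsigned HOST_WIDE_INT factor, align;

  if (riscv_slow_unaligned_access_p)
    {
      /* Unaligned accesses are expensive: limit the access width to
         the known common alignment of src and dest, capped at one
         word.  */
      align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD);
      factor = BITS_PER_WORD / align;
    }
  else
    {
      /* Pretend alignment: unaligned accesses are fast on this
         target, so emit word-sized moves regardless of the actual
         alignment of the operands.  */
      align = BITS_PER_WORD;
      factor = 1;
    }

The 'align' value computed here is then handed down to the routines
that emit the actual moves, so the "pretend" word alignment is what
ultimately allows them to issue unaligned loads and stores on targets
where those are fast.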