LGTM, but I would like to wait for Palmer's ack. I am fine with the current scheme; it just checks riscv_slow_unaligned_access_p. In case we run into performance issues, we can consider changing mtune's slow_unaligned_access field to a number rather than a boolean value, so that we can have a better cost model for that.
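To illustrate the numeric-cost idea (just a sketch, not part of this patch;
the struct and function names below are made up for illustration and only
loosely follow the tune struct in riscv.c):

/* Hypothetical tune field: cost of an unaligned access relative to an
   aligned one (1 = same cost, larger = slower), instead of a bool.  */
struct tune_param_sketch
{
  /* ... existing cost fields ... */
  unsigned int unaligned_access_cost;
};

/* Overlapping stores trade fewer instructions for one unaligned access,
   so enable them only while that access is (nearly) free.  */
static bool
overlap_op_by_pieces_sketch (const struct tune_param_sketch *tune)
{
  return tune->unaligned_access_cost <= 1;
}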
On Fri, Jul 23, 2021 at 6:55 AM Christoph Müllner via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> I have added tests for memset and memcpy.
> Additionally, I have added a test to ensure that -mstrict-align is
> still working.
> I've run the complete GCC test suite with and without the resulting
> patch with no regressions
> (rv32imac/ilp32/medlow, rv32imafdc/ilp32d/medlow,
> rv64imac/lp64/medlow, rv64imafdc/lp64d/medlow).
>
> I have not changed the patch itself, because I think it is reasonable
> to assume that overlapping accesses
> (i.e. reducing store instructions) will always be cheaper on targets
> that don't have penalties for unaligned accesses.
> If maintainers disagree, I can change it accordingly.
>
> The new patch can be found here:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575864.html
>
>
> On Thu, Jul 22, 2021 at 8:23 PM Christoph Müllner <cmuell...@gcc.gnu.org>
> wrote:
> >
> > On Thu, Jul 22, 2021 at 7:27 PM Palmer Dabbelt <pal...@dabbelt.com> wrote:
> > >
> > > On Thu, 22 Jul 2021 06:29:46 PDT (-0700), gcc-patches@gcc.gnu.org wrote:
> > > > Could you add a testcase? Otherwise LGTM.
> > > >
> > > > Option: -O2 -mtune=thead-c906 -march=rv64gc -mabi=lp64
> > > > void foo(char *dst){
> > > >   __builtin_memset(dst, 0, 15);
> > > > }
> > >
> > > I'd like to see:
> > >
> > > * Test results.  This is only on for one target right now, so relying on
> > >   it to just work on others isn't a good idea.
> >
> > So you prefer the previous version of the patch, where each
> > tune struct can decide whether that feature should be enabled or not?
> >
> > > * Something to demonstrate this doesn't break -mstrict-align.
> >
> > -mstrict-align sets riscv_slow_unaligned_access_p to true.
> > This will be returned by TARGET_SLOW_UNALIGNED_ACCESS.
> > So, this cannot happen.  I will add an additional test for that.
> >
> > > * Some sort of performance analysis.  Most machines that support
> > >   unaligned access do so with some performance degradation; it's unclear
> > >   that this conversion will actually manifest a performance improvement.
> > >   I don't have a C906 and don't know what workloads people care about
> > >   running on one, but I'm certainly not convinced this is a win --
> > >   what's listed here looks to be the best case, and that's only saving
> > >   two instructions to generate a pretty pathological sequence
> > >   (a misaligned access that conflicts with a prior store).
> > >
> > > Jojo: do you have any description of the C906 pipeline?  Specifically, in
> > > this case it'd be good to know how it handles unaligned accesses.
> >
> > The tune struct already includes the field slow_unaligned_access.
> > For c906 this is set to false.  So the answer is: c906 handles unaligned
> > accesses reasonably well (assuming the contents of its tune struct are
> > correct).
> >
> > Note that the overlapping access only happens if unaligned accesses
> > are allowed.
> > If slow_unaligned_access is set, then you will get a sequence of
> > store-byte instructions.
> >
> > >
> > > > On Thu, Jul 22, 2021 at 8:53 PM Christoph Muellner via Gcc-patches
> > > > <gcc-patches@gcc.gnu.org> wrote:
> > > >>
> > > >> This patch enables the overlap-by-pieces feature of the by-pieces
> > > >> infrastructure for inlining builtins in case the target has set
> > > >> riscv_slow_unaligned_access_p to false.
> > > >>
> > > >> To demonstrate the effect for targets with fast unaligned access,
> > > >> the following code sequences are generated for a 15-byte memset-zero.
> > > >>
> > > >> Without overlap_op_by_pieces we get:
> > > >>   8e:  00053023  sd  zero,0(a0)
> > > >>   92:  00052423  sw  zero,8(a0)
> > > >>   96:  00051623  sh  zero,12(a0)
> > > >>   9a:  00050723  sb  zero,14(a0)
> > > >>
> > > >> With overlap_op_by_pieces we get:
> > > >>   7e:  00053023  sd  zero,0(a0)
> > > >>   82:  000533a3  sd  zero,7(a0)
> > > >>
> > > >> gcc/ChangeLog:
> > > >>
> > > >>         * config/riscv/riscv.c (riscv_overlap_op_by_pieces): New
> > > >>         function.
> > > >>         (TARGET_OVERLAP_OP_BY_PIECES_P): Connect to
> > > >>         riscv_overlap_op_by_pieces.
> > > >>
> > > >> Signed-off-by: Christoph Muellner <cmuell...@gcc.gnu.org>
> > > >> ---
> > > >>  gcc/config/riscv/riscv.c | 11 +++++++++++
> > > >>  1 file changed, 11 insertions(+)
> > > >>
> > > >> diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
> > > >> index 576960bb37c..98c76ba657a 100644
> > > >> --- a/gcc/config/riscv/riscv.c
> > > >> +++ b/gcc/config/riscv/riscv.c
> > > >> @@ -5201,6 +5201,14 @@ riscv_slow_unaligned_access (machine_mode, unsigned int)
> > > >>    return riscv_slow_unaligned_access_p;
> > > >>  }
> > > >>
> > > >> +/* Implement TARGET_OVERLAP_OP_BY_PIECES_P.  */
> > > >> +
> > > >> +static bool
> > > >> +riscv_overlap_op_by_pieces (void)
> > > >> +{
> > > >> +  return !riscv_slow_unaligned_access_p;
> > > >> +}
> > > >> +
> > > >>  /* Implement TARGET_CAN_CHANGE_MODE_CLASS.  */
> > > >>
> > > >>  static bool
> > > >> @@ -5525,6 +5533,9 @@ riscv_asan_shadow_offset (void)
> > > >>  #undef TARGET_SLOW_UNALIGNED_ACCESS
> > > >>  #define TARGET_SLOW_UNALIGNED_ACCESS riscv_slow_unaligned_access
> > > >>
> > > >> +#undef TARGET_OVERLAP_OP_BY_PIECES_P
> > > >> +#define TARGET_OVERLAP_OP_BY_PIECES_P riscv_overlap_op_by_pieces
> > > >>
> > > >> +
> > > >>  #undef TARGET_SECONDARY_MEMORY_NEEDED
> > > >>  #define TARGET_SECONDARY_MEMORY_NEEDED riscv_secondary_memory_needed
> > > >>
> > > >> --
> > > >> 2.31.1
> > > >>
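For reference, the -mtune=thead-c906 memset test could look roughly like
the following (a sketch only: the dg directives and scan patterns are my
assumptions and may differ from the actual tests in the v2 patch linked
above; the function and options come from the testcase suggested earlier
in this thread):

/* { dg-do compile } */
/* { dg-options "-O2 -mtune=thead-c906 -march=rv64gc -mabi=lp64" } */

void foo (char *dst)
{
  __builtin_memset (dst, 0, 15);
}

/* With fast unaligned access, the 15-byte memset-zero should become two
   overlapping sd instructions and no byte stores.  */
/* { dg-final { scan-assembler-times "sd\t" 2 } } */
/* { dg-final { scan-assembler-not "sb\t" } } */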