Hi James, I have split off the aarch64 bit off from the generic parts and processed your feedback.
Attached is the reworked patch. Ok for Tunk? Thanks, Tamar Thanks, Tamar gcc/ 2017-11-14 Tamar Christina <tamar.christ...@arm.com> * config/aarch64/aarch64.c (aarch64_expand_movmem): Add MEM to REG optimized case. gcc/testsuite/ 2017-11-14 Tamar Christina <tamar.christ...@arm.com> * gcc.target/aarch64/structs.c > -----Original Message----- > From: James Greenhalgh [mailto:james.greenha...@arm.com] > Sent: Wednesday, August 30, 2017 16:56 > To: Tamar Christina <tamar.christ...@arm.com> > Cc: Richard Sandiford <richard.sandif...@linaro.org>; GCC Patches <gcc- > patc...@gcc.gnu.org>; nd <n...@arm.com>; Richard Earnshaw > <richard.earns...@arm.com>; Marcus Shawcroft > <marcus.shawcr...@arm.com> > Subject: Re: [PATCH][GCC][AARCH64]Bad code-gen for > structure/block/unaligned memory access > > Hi Tamar, > > I think the AArch64 parts of this patch can be substantially simplified. > > On Mon, Jul 03, 2017 at 07:17:51AM +0100, Tamar Christina wrote: > > diff --git a/gcc/config/aarch64/aarch64.c > > b/gcc/config/aarch64/aarch64.c index > > > ab1bdc0233afe7a3c41501cb724a5c4c719123b8..1cf6c4b891571745f740d7dbd0 > 37 > > 54bbba9264a6 100644 > > --- a/gcc/config/aarch64/aarch64.c > > +++ b/gcc/config/aarch64/aarch64.c > > @@ -13466,7 +13466,6 @@ > aarch64_copy_one_block_and_progress_pointers > > (rtx *src, rtx *dst, > > > > /* Expand movmem, as if from a __builtin_memcpy. Return true if > > we succeed, otherwise return false. */ > > - > > bool > > aarch64_expand_movmem (rtx *operands) { @@ -13498,16 +13497,55 > @@ > > aarch64_expand_movmem (rtx *operands) > > base = copy_to_mode_reg (Pmode, XEXP (src, 0)); > > src = adjust_automodify_address (src, VOIDmode, base, 0); > > > > + /* Optimize routines for MEM to REG copies. > > + This particular function only handles copying to two > > + destination types: 1) a regular register and 2) the stack. > > + When writing to a regular register we may end up writting too much in > cases > > + where the register already contains a live value or when no data > padding is > > + happening so we disallow it. */ if (n < 8 && !REG_P (XEXP > > + (operands[0], 0))) > > I don't understand how this comment lines up with the condition on this > code path. On the one hand you say you permit regular registers, but on the > other hand you say you disallow regular registers because you may overwrite. > > > > + { > > + machine_mode mov_mode, dest_mode > > + = smallest_mode_for_size (n * BITS_PER_UNIT, MODE_INT); > > + rtx result = gen_reg_rtx (dest_mode); > > + emit_insn (gen_move_insn (result, GEN_INT (0))); > > + > > + unsigned int shift_cnt = 0; > > Any reason not to just initialise the shift_cnt in place here? > > > + for (; n > shift_cnt; shift_cnt += GET_MODE_SIZE (mov_mode)) > > > for (unsigned int shift_cnt = 0; > shift_cnt <= n; > shift_cnt += GET_MODE_SIZE (mov_mode)) > > Having the loop counter first in the comparison is personal preference. > > mov_mode looks uninitialised, but I guess by the time you check the loop > condition you have actually initialized it. > > > + { > > + int nearest = 0; > > This isn't required, we could do the loop below with one > > > + /* Find the mode to use. */ > > + for (unsigned max = 1; max <= (n - shift_cnt); max *= 2) > > + nearest = max; > > Wrong formatting. > > So, when this loop terminates for the first time (shift_cnt == 0) nearest is > the > first power of two greater than n. > > > + > > + mov_mode = smallest_mode_for_size (nearest * BITS_PER_UNIT, > > +MODE_INT); > > That means this is always a mode of at least the right size, which in turn > means that we can't execute round the loop again (GET_MODE_SIZE > (mov_mode) will always be greater than n). So, we can be sure this loop only > executes once, and we can fold mov_mode and dest_mode to be equal. > > > + rtx reg = gen_reg_rtx (mov_mode); > > + > > + src = adjust_address (src, mov_mode, 0); > > + emit_insn (gen_move_insn (reg, src)); > > + src = aarch64_progress_pointer (src); > > + > > + reg = gen_rtx_ZERO_EXTEND (dest_mode, reg); > > + reg = gen_rtx_ASHIFT (dest_mode, reg, > > + GEN_INT (shift_cnt * BITS_PER_UNIT)); > > + result = gen_rtx_IOR (dest_mode, reg, result); > > + } > > + > > + dst = adjust_address (dst, dest_mode, 0); > > + emit_insn (gen_move_insn (dst, result)); > > + return true; > > + } > > + > > /* Simple cases. Copy 0-3 bytes, as (if applicable) a 2-byte, then a > > 1-byte chunk. */ > > if (n < 4) > > { > > if (n >= 2) > > - { > > - aarch64_copy_one_block_and_progress_pointers (&src, &dst, > HImode); > > - n -= 2; > > - } > > > > + { > > + aarch64_copy_one_block_and_progress_pointers (&src, &dst, > HImode); > > + n -= 2; > > + } > > These formatting changes leave a blank newline between if (n >= 2) and the > remainder of this block. Why? > > > if (n == 1) > > aarch64_copy_one_block_and_progress_pointers (&src, &dst, > QImode); > > > > Thanks, > James
memcpy-aarch64.patch
Description: memcpy-aarch64.patch