The first version of this had a big bug and cleared past the requested bytes. This version passes regstrap on ppc64le(power7/8/9), ppc64be(power6/7/8), and ppc32(power8).
OK for trunk (and 8 backport after a week)?

Thanks!
   Aaron

Index: gcc/config/rs6000/rs6000-string.c
===================================================================
--- gcc/config/rs6000/rs6000-string.c   (revision 266524)
+++ gcc/config/rs6000/rs6000-string.c   (working copy)
@@ -85,6 +85,8 @@
   if (! optimize_size && bytes > 8 * clear_step)
     return 0;
 
+  bool unaligned_vsx_ok = (bytes >= 32 && TARGET_EFFICIENT_UNALIGNED_VSX);
+
   for (offset = 0; bytes > 0; offset += clear_bytes, bytes -= clear_bytes)
     {
       machine_mode mode = BLKmode;
@@ -91,8 +93,7 @@
       rtx dest;
 
       if (TARGET_ALTIVEC
-          && ((bytes >= 16 && align >= 128)
-              || (bytes >= 32 && TARGET_EFFICIENT_UNALIGNED_VSX)))
+          && (bytes >= 16 && ( align >= 128 || unaligned_vsx_ok)))
         {
           clear_bytes = 16;
           mode = V4SImode;

On 11/26/18 4:29 PM, Segher Boessenkool wrote:
> On Mon, Nov 26, 2018 at 03:08:32PM -0600, Aaron Sawdey wrote:
>> When I previously added the use of unaligned vsx stores to inline expansion
>> of memset, I didn't do a good job of managing boundary conditions. The
>> intention was to only use unaligned vsx if the block being cleared was more
>> than 32 bytes.
>> What it actually did was to prevent the use of unaligned vsx for the last 32
>> bytes of any block being cleared. So this change puts the test up front so it
>> is not affected by the decrement of bytes.
>
> Oh wow. Yes, that isn't so great. Okay for trunk (and whatever backports).
> Thanks,
>
>
> Segher
>
>
>> 2018-11-26 Aaron Sawdey <acsaw...@linux.ibm.com>
>>
>>      * config/rs6000/rs6000-string.c (expand_block_clear): Change how
>>      we determine if unaligned vsx is ok.
>

-- 
Aaron Sawdey, Ph.D. acsaw...@linux.vnet.ibm.com
050-2/C113 (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain
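
For anyone reading the thread later, the boundary condition can be modeled outside the compiler. The sketch below is only an illustration and is not GCC code: plan_clear, scalar_step, struct clear_plan and the hoist_test flag are hypothetical names, and the scalar fallback (8, 4, 2, 1 byte stores) is a simplifying assumption about what the rest of the loop does. It counts how many 16-byte vector stores a clear loop would pick when the "block is at least 32 bytes" test is evaluated inside the loop against the shrinking byte count, versus once up front.

```c
#include <stdbool.h>

/* Hypothetical model of the store selection; not the GCC implementation.  */
struct clear_plan
{
  int vector_stores;  /* 16-byte stores chosen */
  int scalar_stores;  /* fallback stores of 8, 4, 2 or 1 bytes */
};

/* Assumed fallback: largest power-of-two store of at most 8 bytes.  */
int
scalar_step (int bytes)
{
  if (bytes >= 8)
    return 8;
  if (bytes >= 4)
    return 4;
  if (bytes >= 2)
    return 2;
  return 1;
}

struct clear_plan
plan_clear (int bytes, bool aligned_128, bool efficient_unaligned_vsx,
            bool hoist_test)
{
  struct clear_plan plan = { 0, 0 };

  /* Patched form: decide once, from the original block size.  */
  bool unaligned_vsx_ok = bytes >= 32 && efficient_unaligned_vsx;

  while (bytes > 0)
    {
      bool use_vector;
      int step;

      if (hoist_test)
        use_vector = bytes >= 16 && (aligned_128 || unaligned_vsx_ok);
      else
        /* Old form: the size test sees the remaining bytes, so it
           starts failing as the loop approaches the end of the block.  */
        use_vector = (bytes >= 16 && aligned_128)
                     || (bytes >= 32 && efficient_unaligned_vsx);

      if (use_vector)
        {
          step = 16;
          plan.vector_stores++;
        }
      else
        {
          step = scalar_step (bytes);
          plan.scalar_stores++;
        }
      bytes -= step;
    }
  return plan;
}
```

Under these assumptions, for a 64-byte block that is not 128-bit aligned on a target with efficient unaligned VSX, plan_clear (64, false, true, false) picks three vector stores and then two 8-byte stores for the tail, while plan_clear (64, false, true, true) picks four vector stores and no scalar ones. A 24-byte block never uses vector stores in either form, since the original size is below 32.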