On Fri, Jan 12, 2018 at 01:55:26PM -0600, Kelvin Nilsen wrote: > On Power 7 and Power 8 little endian, the code generator has been > emitting two instructions for each vector load and each vector store. > One instruction does a swapping load or store, and the second > instruction does an in-register swap. > > This patch replaces the two-instruction sequences with a single lvx (for > loads) or stvx (for stores) instruction in the very common case that the > vector is known to reside at a quad-word aligned address in memory. > This patch is most relevant to Power 7 and Power 8 targets because > Power 9 code generation uses new single-instruction encodings for both > aligned and unaligned vector loads and stores. > > This patch has been boostrapped and tested without regressions on > powerpc64le-unknown-linux (P8). It has also been boostrapped and tested > on powerpc-linux (P7 and P8, big-endian, with both -m32 and -m64 target > options). > > One regression was identified during big-endian regression testing: > > > FAIL: gcc.target/powerpc/pr83399.c (internal compiler error) > > FAIL: gcc.target/powerpc/pr83399.c (test for excess errors) > > The pr83399.c test and the ICE are related to a recently committed patch > that addresses a problem originally found and reported as part of the work > on this lvx/stvx optimization patch. It appears that the PR83399 patch > may not have fully addressed the big-endian aspects of the original > problem report.
So your patch is just exposing a pre-existing problem, in a single testcase. I am fine with that. (But let's follow up, etc.) > 2018-01-10 Kelvin Nilsen <kel...@gcc.gnu.org> > > * config/rs6000/rs6000-p8swap.c (rs6000_sum_of_two_registers_p): > New function. > (rs6000_quadword_masked_address_p): Likewise. > (quad_aligned_load_p): Likewise. > (quad_aligned_store_p): Likewise. > (const_load_sequence_p): Add comment to describe the outer-most loop. > (mimic_memory_attributes_and_flags): New function. > (rs6000_gen_stvx): Likewise. > (replace_swapped_aligned_store): Likewise. > (rs6000_gen_lvx): Likewise. > (replace_swapped_aligned_load): Likewise. > (replace_swapped_load_constant): Capitalize argument name in > comment describing this function. > (rs6000_analyze_swaps): Add a third pass to search for vector loads > and stores that access quad-word aligned addresses and replace > with stvx or lvx instructions when appropriate. > * config/rs6000/rs6000-protos.h (rs6000_sum_of_two_registers_p): > New function prototype. > (rs6000_quadword_masked_address_p): Likewise. > (rs6000_gen_lvx): Likewise. > (rs6000_gen_stvx): Likewise. > * config/rs6000/vsx.md (*vsx_le_perm_load_<mode>): For modes > VSX_D (V2DF, V2DI), modify this split to select lvx instruction > when memory address is aligned. > (*vsx_le_perm_load_<mode>): For modes VSX_W (V4SF, V4SI), modify > this split to select lvx instruction when memory address is aligned. > (*vsx_le_perm_load_v8hi): Modify this split to select lvx > instruction when memory address is aligned. > (*vsx_le_perm_load_v16qi): Likewise. > (four unnamed splitters): Modify to select the stvx instruction > when memory is aligned. > > gcc/testsuite/ChangeLog: > > 2018-01-10 Kelvin Nilsen <kel...@gcc.gnu.org> > > * gcc.target/powerpc/pr48857.c: Modify dejagnu directives to look > for lvx and stvx instead of lxvd2x and stxvd2x and require > little-endian target. Add comments. > * gcc.target/powerpc/swaps-p8-28.c: Add functions for more > comprehensive testing. > * gcc.target/powerpc/swaps-p8-29.c: Likewise. > * gcc.target/powerpc/swaps-p8-30.c: Likewise. > * gcc.target/powerpc/swaps-p8-31.c: Likewise. > * gcc.target/powerpc/swaps-p8-32.c: Likewise. > * gcc.target/powerpc/swaps-p8-33.c: Likewise. > * gcc.target/powerpc/swaps-p8-34.c: Likewise. > * gcc.target/powerpc/swaps-p8-35.c: Likewise. > * gcc.target/powerpc/swaps-p8-36.c: Likewise. > * gcc.target/powerpc/swaps-p8-37.c: Likewise. > * gcc.target/powerpc/swaps-p8-38.c: Likewise. > * gcc.target/powerpc/swaps-p8-39.c: Likewise. > * gcc.target/powerpc/swaps-p8-40.c: Likewise. > * gcc.target/powerpc/swaps-p8-41.c: Likewise. > * gcc.target/powerpc/swaps-p8-42.c: Likewise. > * gcc.target/powerpc/swaps-p8-43.c: Likewise. > * gcc.target/powerpc/swaps-p8-44.c: Likewise. > * gcc.target/powerpc/swaps-p8-45.c: Likewise. > * gcc.target/powerpc/vec-extract-2.c: Add comment and remove > scan-assembler-not directives that forbid lvx and xxpermdi. > * gcc.target/powerpc/vec-extract-3.c: Likewise. > * gcc.target/powerpc/vec-extract-5.c: Likewise. > * gcc.target/powerpc/vec-extract-6.c: Likewise. > * gcc.target/powerpc/vec-extract-7.c: Likewise. > * gcc.target/powerpc/vec-extract-8.c: Likewise. > * gcc.target/powerpc/vec-extract-9.c: Likewise. > * gcc.target/powerpc/vsx-vector-6-le.c: Change > scan-assembler-times directives to reflect different numbers of > expected xxlnor, xxlor, xvcmpgtdp, and xxland instructions. > > libcpp/ChangeLog: > > 2018-01-10 Kelvin Nilsen <kel...@gcc.gnu.org> > > * lex.c (search_line_fast): Remove illegal coercion of an > unaligned pointer value to vector pointer type and replace with > use of __builtin_vec_vsx_ld () built-in function, which operates > on unaligned pointer values. > +/* Return true iff EXPR represents an address expression that masks off > + the low-order 4 bits in the style of an lvx or stvx rtl pattern. */ > +bool > +rs6000_quadword_masked_address_p (const_rtx expr) > +{ > + if (GET_CODE (expr) == AND) > + { > + const_rtx operand1 = XEXP (expr, 0); > + const_rtx operand2 = XEXP (expr, 1); > + if ((REG_P (operand2) || rs6000_sum_of_two_registers_p (operand2)) > + && CONST_SCALAR_INT_P (operand1) && INTVAL (operand1) == -16) > + return true; > + } > + return false; > +} Is that correct? The -16 will be the second arg of the AND, the regs the first, in canonical RTL. > + && MEM_ALIGN (mem) >= 128)? true: false; We write spaces before ? and :, too. > + /* if this is not the definition of the candidate swap register, > + then skip it. I am interested in a different definition. */ Capital on "If". > +rtx > +rs6000_gen_stvx (enum machine_mode mode, rtx dest_exp, rtx src_exp) { { goes on a new line. > + /* KFmode, TFmode, other modes not expected in this context. */ > + gcc_unreachable (); (Whitespace at end of line). Looks fine, except the nits and the rs6000_quadword_masked_address_p thing, which looks pretty serious. Could you look at that, retest with it fixed (if you agree it is broken as-is)? Segher