On Fri, Oct 24, 2014 at 01:06:41PM +0100, Alan Lawrence wrote:
> This migrates the reduction patterns in altivec.md and vector.md to
> the new names.  I've not touched paired.md as I wasn't really sure
> how to fix that (how do I vec_extractv2sf ?), moreover the testing I
> did didn't seem to exercise any of those patterns (iow: I'm not sure
> what would be an appropriate target machine?).
>
> I note the reduc_uplus_v16qi (which I've removed, as unsigned and
> signed addition should be equivalent) differed from
> reduc_splus_v16qi in using gen_altivec_vsum4ubs rather than
> gen_altivec_vsum4sbs.  Testcases
> gcc.dg/vect/{slp-24-big-array.c,slp-24.c,vect-reduc-1char-big-array.c,vect-reduc-1char.c}
> thus produce assembly which differs from previously (only) in that
> "vsum4ubs" becomes "vsum4sbs".  These tests are still passing so I
> assume this is OK.
>
> The combining of signed and unsigned addition also improves
> gcc.dg/vect/{vect-outer-4i.c,vect-reduc-1short.c,vect-reduc-dot-u8b.c,vect-reduc-pattern-1c-big-array.c,vect-reduc-pattern-1c.c}:
> these are now reduced using direct vector reduction, rather than
> with shifts as previously (because there was only a reduc_splus
> rather than the reduc_uplus these tests looked for).
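(As an aside, the shift-based fallback that the vectorizer emits when the
reduc_* pattern it wants is missing amounts to halving the vector log2(N)
times and then extracting element 0.  Roughly, as a sketch in plain C using
GCC's generic vector extensions -- not the vectorizer's actual output, and
the type/function names here are made up:

typedef int v4si __attribute__ ((vector_size (16)));

static int
reduce_by_halving (v4si v)
{
  /* Fold the high half onto the low half, then the remaining pair.  */
  v += __builtin_shuffle (v, (v4si) { 2, 3, 2, 3 });
  v += __builtin_shuffle (v, (v4si) { 1, 1, 1, 1 });
  return v[0];	/* element 0 now holds the scalar sum */
}

A direct reduc_plus expansion lets the target do that whole sequence with
its own reduction instructions instead.)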
I checked the integer vector add reductions, and they seem to generate the
same values with the old and new code, and I like eliminating the vector
shift.

> ((Side note: the RTL changes to vector.md are to match the combine
> patterns in vsx.md; now that we no longer depend upon combine to
> generate those patterns (as the optab outputs them directly), one
> might wish to remove the smaller pattern from vsx.md, and/or
> simplify the RTL.  I theorize that a reduction of a two-element
> vector is just adding the first element to the second, so maybe to
> something like
>
>   [(parallel [(set (match_operand:DF 0 "vfloat_operand" "")
>                    (VEC_reduc:V2DF
>                     (vec_select:DF
>                      (match_operand:V2DF 1 "vfloat_operand" "")
>                      (parallel [(const_int 1)]))
>                     (vec_select:DF
>                      (match_dup 1)
>                      (parallel [(const_int 0)]))))
>               (clobber (match_scratch:V2DF 2 ""))])]
>
> but I think it's best for me to leave that to the port maintainers.))
>
> Bootstrapped and check-gcc on powerpc64-none-linux-gnu
> (gcc110.fsffrance.org, with thanks to the GCC Compile Farm).

However, the double pattern is completely broken.  This cannot go in.
Consider this source:

#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#ifndef TYPE
#define TYPE double
#endif

#ifndef OTYPE
#define OTYPE TYPE
#endif

#ifndef SIZE
#define SIZE 1024
#endif

#ifndef ALIGN
#define ALIGN 32
#endif

TYPE a[SIZE] __attribute__((__aligned__(ALIGN)));

OTYPE sum (void) __attribute__((__noinline__));

OTYPE
sum (void)
{
  size_t i;
  OTYPE s = (OTYPE) 0;

  for (i = 0; i < SIZE; i++)
    s += a[i];

  return s;
}

If I compile with today's trunk, and -mcpu=power8 -ffast-math -O3, I get
code that I expect (though it could use xxpermdi instead of xxsldwi):

sum:
        .quad   .L.sum,.TOC.@tocbase,0
        .previous
        .type   sum, @function
.L.sum:
        li 10,512
        addis 9,2,.LC1@toc@ha           # gpr load fusion, type long
        ld 9,.LC1@toc@l(9)
        xxlxor 0,0,0
        mtctr 10
        .p2align 4,,15
.L2:
        lxvd2x 12,0,9
        addi 9,9,16
        xvadddp 0,0,12
        bdnz .L2
        xxsldwi 12,0,0,2
        xvadddp 1,12,0
        xxpermdi 1,1,1,2
        blr
        .long 0

However, the code produced by the patches gives:

sum:
        .quad   .L.sum,.TOC.@tocbase,0
        .previous
        .type   sum, @function
.L.sum:
        xxlxor 0,0,0
        addi 10,1,-16
        li 8,512
        addis 9,2,.LC1@toc@ha           # gpr load fusion, type long
        ld 9,.LC1@toc@l(9)
        mtctr 8
        stxvd2x 0,0,10
        .p2align 5,,31
.L2:
        addi 10,1,-16
        lxvd2x 0,0,9
        addi 9,9,16
        lxvd2x 12,0,10
        xvadddp 12,12,0
        stxvd2x 12,0,10
        bdnz .L2
        lfd 0,-16(1)
        xxpermdi 1,12,12,2
        fadd 1,0,1
        blr
        .long 0

It is unacceptable for the inner loop to do a load, a vector add, and a
store of the accumulator on every iteration.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797