On 2/10/19 8:42 PM, Bill Schmidt wrote: > On 2/10/19 4:05 PM, Segher Boessenkool wrote: >> Hi Bill, >> >> On Sun, Feb 10, 2019 at 10:10:02AM -0600, Bill Schmidt wrote: >>> I've added executable tests for both shift-right algebraic and shift-right >>> logical. >>> Both fail prior to applying the patch, and work correctly afterwards. >> Please add a test for left shifts, as well? > Can do. I verified that left shifts were not broken and figured a test case > had been added then, but have not checked. Good to test this particular > scenario, though. > >>> 2019-02-08 Bill Schmidt <wschm...@linux.ibm.com> >>> >>> * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Shift-right >>> and shift-left vector built-ins need to include a TRUNC_MOD_EXPR >>> for correct semantics. Also, don't expand a vector-splat if there >>> is a type mismatch; let the back end handle it. >> Does it always result in just the shift instruction now? Does the modulo >> ever remain? (Maybe at -O0?) Modulo is hugely expensive; it is always >> modulo by a power of two, so a simple bitmask, so maybe write that directly >> instead? > We always get the shift. For -mcpu=power8, we always load the mask from > memory rather than generating the vspltisb, which is not ideal code > generation, > but is at least correct. > > For -mcpu=power9, we get close, but have some bad register allocation and > an unnecessary extend: > > xxspltib 0,4 <- why not just xxspltib 32,4? > xxlor 32,0,0 <- wasted copy > vextsb2d 0,0 <- unnecessary due to vsrad semantics > vsrad 2,2,0 > > Again, this is at least correct. We have more work to do to produce the > most efficient code, but we have PR89213 open for that. > >>> 2019-02-08 Bill Schmidt <wschm...@linux.ibm.com> >>> >>> * gcc.target/powerpc/srad-modulo.c: New. >>> * gcc.target/powerpc/srd-modulo.c: New. >> Please put "vec-" in the testcase name. You may need to rename vec-shift.c >> and/or vec-shr.c, which are not as generic as their names suggest. > Ok. > >>> @@ -16072,6 +16116,13 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator * >>> arg0 = gimple_call_arg (stmt, 0); >>> lhs = gimple_call_lhs (stmt); >>> >>> + /* It's sometimes useful to use one of these to build a >>> + splat for V2DImode, since the upper bits will be ignored. >>> + Don't fold if we detect that situation, as we end up >>> + losing the splat instruction in this case. */ >>> + if (size != TREE_INT_CST_LOW (TYPE_SIZE (TREE_TYPE (TREE_TYPE (lhs))))) >>> + return false; >> This isn't really detecting that situation... Instead, it doesn't fold >> whenever the size of the splatted elements isn't the same as the size of >> the elts of the target vector. That's probably perfectly safe, but please >> spell it out. It's fine to mention the motivating case, of course. > Yep, will correct. Actually, as I look back at my notes, I believe that this > change is not necessary after all (same code generated with and without it). > I'll verify. > >>> Index: gcc/testsuite/gcc.target/powerpc/srad-modulo.c >>> =================================================================== >>> --- gcc/testsuite/gcc.target/powerpc/srad-modulo.c (nonexistent) >>> +++ gcc/testsuite/gcc.target/powerpc/srad-modulo.c (working copy) >>> @@ -0,0 +1,43 @@ >>> +/* Test that using a character splat to set up a shift-right algebraic >>> + for a doubleword vector works correctly after gimple folding. */ >>> + >>> +/* { dg-do run { target { powerpc64*-*-* && vsx_hw } } } */ >>> +/* { dg-options { "-O3" } } */ >> powerpc64*-*-* is almost always wrong. I don't think you need to limit >> to 64-bit at all here, but if you do, test for lp64 instead. > Ok. > >> Testing for vsx_hw but not enabling vsx is probably wrong, too. > Weird. I just tried adding -mvsx and I get this peculiar error we've seen > before about AMD graphics card offloading: > > spawn -ignore SIGHUP /home/wschmidt/gcc/build/gcc-mainline-test2/gcc/xgcc > -B/home/wschmidt/gcc/build/gcc-mainline-test2/gcc/ > /home/wschmidt/gcc/gcc-mainline-t\ > est2/gcc/testsuite/gcc.target/powerpc/srad-modulo.c > -fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers > -fdiagnostics-color=never -O2 -mvsx -lm -o \ > ./srad-modulo.exe^M > ^[[01m^[[Kcc1:^[[m^[[K ^[[01;31m^[[Kerror: ^[[m^[[Kargument to > '^[[01m^[[K-O^[[m^[[K' should be a non-negative integer, > '^[[01m^[[Kg^[[m^[[K', '^[[01m^[[Ks^[[\ > m^[[K' or '^[[01m^[[Kfast^[[m^[[K'^M > compiler exited with status 1 > Executing on host: /home/wschmidt/gcc/build/gcc-mainline-test2/gcc/xgcc > -B/home/wschmidt/gcc/build/gcc-mainline-test2/gcc/ offload_gcn7262.c > -fno-diagnosti\ > cs-show-caret -fno-diagnostics-show-line-numbers -fdiagnostics-color=never > -foffload=amdgcn-unknown-amdhsa -S -o offload_gcn7262.s (timeout = 300) > spawn -ignore SIGHUP /home/wschmidt/gcc/build/gcc-mainline-test2/gcc/xgcc > -B/home/wschmidt/gcc/build/gcc-mainline-test2/gcc/ offload_gcn7262.c > -fno-diagnostic\ > s-show-caret -fno-diagnostics-show-line-numbers -fdiagnostics-color=never > -foffload=amdgcn-unknown-amdhsa -S -o offload_gcn7262.s^M > xgcc: fatal error: GCC is not configured to support amdgcn-unknown-amdhsa as > offload target^M > compilation terminated.^M > compiler exited with status 1 > FAIL: gcc.target/powerpc/srad-modulo.c (test for excess errors) > Excess errors: > ^[[01m^[[Kcc1:^[[m^[[K ^[[01;31m^[[Kerror: ^[[m^[[Kargument to > '^[[01m^[[K-O^[[m^[[K' should be a non-negative integer, > '^[[01m^[[Kg^[[m^[[K', '^[[01m^[[Ks^[[\ > m^[[K' or '^[[01m^[[Kfast^[[m^[[K' > > Any ideas what's causing this? I can't test with -mvsx until it's fixed. > Do we still have a bug open on this?
Looks like this was PR88920, and apparently this silly offload thing pops up once per testsuite run if there's a failure of any kind. I don't consider just caching it so it only happens once to be an adequate fix... this shouldn't be happening at all and is highly misleading. The actual problem is that I've got an extra set of braces in the dg-options directive... oops... Thanks, Bill > >> Does it need -O3, does -O2 not work? > -O2 is fine, can change to that. > >> Should this testcase check expected machine code as well? > I think we should defer that to the fix for PR89213. We aren't at > ideal code quite yet. > >>> --- gcc/testsuite/gcc.target/powerpc/srd-modulo.c (nonexistent) >>> +++ gcc/testsuite/gcc.target/powerpc/srd-modulo.c (working copy) >>> +vui64_t >>> +test_sradi_4 (vui64_t a) >>> +{ >>> + return vec_sradi (a, 4); >>> +} >> Pasto? (srad vs. srd). > Ecch. Will fix. > >> >> Segher >>