On Tue, Oct 23, 2018 at 11:47 PM Will Schmidt <will_schm...@vnet.ibm.com> wrote: > > Hi all, > I've been attempting to get early gimple-folding to work with the > vec_sel intrinsic for powerpc, and I've run into a snag or two such that > I'm not sure how to best proceed. Code snippet is below, followed by a > description of the issues as I interpret them below. > > Apologies for the ramble, Thanks in advance, ... :-) > > -Will > > ---8<--- > /* vector selects */ > /* d = vec_sel (a, b, c) > Each bit of the result vector (d) has the value of > the corresponding bit of (a) if the corresponding bit > of (c) is 0. Otherwise, each bit of the result vector > has the value of the corresponding bit of (b). */ > case ALTIVEC_BUILTIN_VSEL_16QI: > case ALTIVEC_BUILTIN_VSEL_8HI: > case ALTIVEC_BUILTIN_VSEL_4SI: > case ALTIVEC_BUILTIN_VSEL_2DI: > case ALTIVEC_BUILTIN_VSEL_4SF: > case ALTIVEC_BUILTIN_VSEL_2DF: > { > tree cond_tree = gimple_call_arg (stmt, 2); > tree then_tree = gimple_call_arg (stmt, 0); > tree else_tree = gimple_call_arg (stmt, 1); > lhs = gimple_call_lhs (stmt); > location_t loc = gimple_location (stmt); > gimple_seq stmts = NULL; > tree truth_cond_tree_type > = build_same_sized_truth_vector_type (TREE_TYPE(cond_tree)); > tree truth_cond_tree > = gimple_convert (&stmts, loc, truth_cond_tree_type, cond_tree); > gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT); > g = gimple_build_assign (lhs, VEC_COND_EXPR, truth_cond_tree, > then_tree, else_tree); > gimple_set_location (g, gimple_location (stmt)); > gsi_replace (gsi, g, true); > return true; > } > ---8<--- > > First issue (easier?) - The above code snippet works for the > existing powerpc/fold-vec-sel-* testcases, except that when comparing > before and after codegen, we end up with an extra pair of instructions > (vspltisw and vcmpgtsh ~~ splat zero, compare). This appears to be due > to taking the "Fake op0 < 0" path during optabs.c: > expand_vec_cond_expr(), I've not fully exhausted my debug on that > front, but mention it in case this is something obvious. And, this is > probably minor with respect to issue 2. > > Second issue - This works for the simple tests, so seems like the > implementation should be close to correct, but triggers an ICE > when trying to build some of the pre-existing tests. After some > investigation, this appears to be specific to those tests that have > non-variable values for the condition vector. > > For instance, our powerpc/altivec-32.c testcase contains: > unsigned k = 1; > a = (vector unsigned) { 0, 0, 0, 1 }; > b = c = (vector unsigned) { 0, 0, 0, 0 }; > a = vec_add (a, vec_splats (k)); > b = vec_add (b, a); > c = vec_sel (c, a, b); > > which ends up as... > c = vec_sel ({0,0,0,0}, {1,1,1,2}, {1,1,1,2}); > Our condition vector is the last argument, so this gets rearranged a bit > when we build the vec_cond_expr, and we eventually get (at gimple time) > > _1 = VEC_COND_EXPR <{ 1(OVF), 1(OVF), 1(OVF), 2(OVF) }, { 0, 0, 0, 0 }, { > 1, 1, 1, 2 }>; > > And we subsequently ICE (at expand time) when we hit a gcc_unreachable() > in expr.c const_vector_mask_from_tree() when we try to map that '2(OVF)' > value to a (boolean) zero or minus_one, and fail. > > during RTL pass: expand > dump file: altivec-34.c.230r.expand > /home/willschm/gcc/gcc-mainline-regtest_patches/gcc/testsuite/gcc.target/powerpc/altivec-34.c: > In function ‘foo’: > /home/willschm/gcc/gcc-mainline-regtest_patches/gcc/testsuite/gcc.target/powerpc/altivec-34.c:21:6: > internal compiler error: in const_vector_mask_from_tree, at expr.c:12247 > 0x1059fba7 const_vector_mask_from_tree > /home/willschm/gcc/gcc-mainline-regtest_patches/gcc/expr.c:12247 > > Ultimately, this seems like an impasse between the vec_sel intrinsic > being a bit-wise operation, and the truth_vector being of a boolean > type. But.. i'm not certain. > I had at an earlier time tried to implement vec_sel() up using a mix of > BIT_NOT_EXPR, BIT_AND_EXPR, BIT_IOR_EXPR, but ended up with some > incredibly horrible codegen. (which would be a no-go). I may need to > revisit that..
If vec_sel is really a bitwise merge then there's no choice but using BIT_{AND,IOR,NOT}_EXPR for open-coding it. I suppose the original builtin expanded to an UNSPEC because there's nothing in RTL besides bitwise operations describing the semantics. Now - for constants that do select on element boundary (which seems to be what you implemented) I'd have chosen not a VEC_COND_EXPR but instead a VEC_PERM_EXPR (you have to translate the constant bit-mask to a constant vector of indexes). I don't think VEC_COND_EXPR is a good choice for the middle-end or RTL expansion. Richard. > Thanks, > -Will > > >