On Tue, Oct 23, 2018 at 11:47 PM Will Schmidt <will_schm...@vnet.ibm.com> wrote:
>
> Hi all,
> I've been attempting to get early gimple-folding to work with the
> vec_sel intrinsic for powerpc, and I've run into a snag or two such that
> I'm not sure how to best proceed.  Code snippet is below, followed by a
> description of the issues as I interpret them below.
>
> Apologies for the ramble, Thanks in advance, ...  :-)
>
> -Will
>
> ---8<---
> /* vector selects */
> /* d = vec_sel (a, b, c)
> Each bit of the result vector (d) has the value of
> the corresponding bit of (a) if the corresponding bit
> of (c) is 0. Otherwise, each bit of the result vector
> has the value of the corresponding bit of (b).  */
>     case ALTIVEC_BUILTIN_VSEL_16QI:
>     case ALTIVEC_BUILTIN_VSEL_8HI:
>     case ALTIVEC_BUILTIN_VSEL_4SI:
>     case ALTIVEC_BUILTIN_VSEL_2DI:
>     case ALTIVEC_BUILTIN_VSEL_4SF:
>     case ALTIVEC_BUILTIN_VSEL_2DF:
> {
>         tree cond_tree = gimple_call_arg (stmt, 2);
>         tree then_tree = gimple_call_arg (stmt, 0);
>         tree else_tree = gimple_call_arg (stmt, 1);
>         lhs = gimple_call_lhs (stmt);
>         location_t loc = gimple_location (stmt);
>         gimple_seq stmts = NULL;
>        tree truth_cond_tree_type
>          = build_same_sized_truth_vector_type (TREE_TYPE(cond_tree));
>        tree truth_cond_tree
>          = gimple_convert (&stmts, loc, truth_cond_tree_type, cond_tree);
>        gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>        g = gimple_build_assign (lhs, VEC_COND_EXPR, truth_cond_tree,
>                                 then_tree, else_tree);
>        gimple_set_location (g, gimple_location (stmt));
>        gsi_replace (gsi, g, true);
>        return true;
> }
> ---8<---
>
> First issue (easier?) - The above code snippet works for the
> existing powerpc/fold-vec-sel-* testcases, except that when comparing
> before and after codegen, we end up with an extra pair of instructions
> (vspltisw and vcmpgtsh ~~ splat zero, compare).  This appears to be due
> to taking the "Fake op0 < 0" path during optabs.c:
> expand_vec_cond_expr(),  I've not fully exhausted my debug on that
> front, but mention it in case this is something obvious.  And, this is
> probably minor with respect to issue 2.
>
> Second issue - This works for the simple tests, so seems like the
> implementation should be close to correct, but triggers an ICE
> when trying to build some of the pre-existing tests.  After some
> investigation, this appears to be specific to those tests that have
> non-variable values for the condition vector.
>
> For instance, our powerpc/altivec-32.c testcase contains:
>   unsigned k = 1;
>   a = (vector unsigned) { 0, 0, 0, 1 };
>   b = c = (vector unsigned) { 0, 0, 0, 0 };
>   a = vec_add (a, vec_splats (k));
>   b = vec_add (b, a);
>   c = vec_sel (c, a, b);
>
> which ends up as...
> c = vec_sel ({0,0,0,0}, {1,1,1,2}, {1,1,1,2});
> Our condition vector is the last argument, so this gets rearranged a bit
> when we build the vec_cond_expr, and we eventually get (at gimple time)
>
>   _1 = VEC_COND_EXPR <{ 1(OVF), 1(OVF), 1(OVF), 2(OVF) }, { 0, 0, 0, 0 }, { 
> 1, 1, 1, 2 }>;
>
> And we subsequently ICE (at expand time) when we hit a gcc_unreachable()
> in expr.c const_vector_mask_from_tree() when we try to map that '2(OVF)'
> value to a (boolean) zero or minus_one, and fail.
>
> during RTL pass: expand
> dump file: altivec-34.c.230r.expand
> /home/willschm/gcc/gcc-mainline-regtest_patches/gcc/testsuite/gcc.target/powerpc/altivec-34.c:
>  In function ‘foo’:
> /home/willschm/gcc/gcc-mainline-regtest_patches/gcc/testsuite/gcc.target/powerpc/altivec-34.c:21:6:
>  internal compiler error: in const_vector_mask_from_tree, at expr.c:12247
> 0x1059fba7 const_vector_mask_from_tree
>         /home/willschm/gcc/gcc-mainline-regtest_patches/gcc/expr.c:12247
>
> Ultimately, this seems like an impasse between the vec_sel intrinsic
> being a bit-wise operation, and the truth_vector being of a boolean
> type.   But.. i'm not certain.
> I had at an earlier time tried to implement vec_sel() up using a mix of
> BIT_NOT_EXPR, BIT_AND_EXPR, BIT_IOR_EXPR, but ended up with some
> incredibly horrible codegen.  (which would be a no-go).  I may need to
> revisit that..

If vec_sel is really a bitwise merge then there's no choice but using
BIT_{AND,IOR,NOT}_EXPR for open-coding it.  I suppose the
original builtin expanded to an UNSPEC because there's nothing
in RTL besides bitwise operations describing the semantics.

Now - for constants that do select on element boundary (which seems
to be what you implemented) I'd have chosen not a VEC_COND_EXPR
but instead a VEC_PERM_EXPR (you have to translate the constant
bit-mask to a constant vector of indexes).  I don't think VEC_COND_EXPR
is a good choice for the middle-end or RTL expansion.

Richard.

> Thanks,
> -Will
>
>
>

Reply via email to