On Tue, Aug 16, 2011 at 4:28 PM, Richard Guenther <richard.guent...@gmail.com> wrote: > On Mon, Aug 15, 2011 at 6:58 PM, Artem Shinkarov > <artyom.shinkar...@gmail.com> wrote: >> On Mon, Aug 15, 2011 at 3:24 PM, Richard Guenther >> <richard.guent...@gmail.com> wrote: >>> On Fri, Aug 12, 2011 at 4:03 AM, Artem Shinkarov >>> <artyom.shinkar...@gmail.com> wrote: >>>> Hi >>>> >>>> Here is a completed version of the vector comparison patch we >>>> discussed a long time ago here: >>>> http://gcc.gnu.org/ml/gcc-patches/2010-08/msg01184.html >>>> >>>> The patch implements vector comparison according to the OpenCL >>>> standard, when the result of the comparison of two vectors is vector >>>> of signed integers, where -1 represents true and 0 false. >>>> >>>> The patch implements vector conditional res = VCOND<V1 ? V2 : V3> >>>> which is expanded into: >>>> foreach (i in length (V1)) res[i] = V1 == 0 ? V3[i] : V2[i]. >>> >>> Some comments on the patch below. First, in general I don't see >>> why you need a new target hook to specify whether to "vectorize" >>> a comparison. Why are the existing hooks used by the vectorizer >>> not enough? >>> >>> Index: gcc/fold-const.c >>> =================================================================== >>> --- gcc/fold-const.c (revision 177665) >>> +++ gcc/fold-const.c (working copy) >>> @@ -9073,34 +9073,61 @@ fold_comparison (location_t loc, enum tr >>> floating-point, we can only do some of these simplifications.) */ >>> if (operand_equal_p (arg0, arg1, 0)) >>> { >>> - switch (code) >>> + if (TREE_CODE (TREE_TYPE (arg0)) == VECTOR_TYPE) >>> { >>> - case EQ_EXPR: >>> - if (! FLOAT_TYPE_P (TREE_TYPE (arg0)) >>> - || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0)))) >>> - return constant_boolean_node (1, type); >>> >>> I think this change should go in a separate patch for improved >>> constant folding. It shouldn't be necessary for enabling vector compares, >>> no? >> >> Unfortunately no, this case must be covered here, otherwise x != x >> condition fails. > > How does it fail?
When I have x > x, x == x, and so on, fold-const.c trigger operand_equal_p (arg0, arg1, 0), which returns true, and then it calls constant_boolean_node (<val>, type). But the problem is that the result of the comparison is a vector, not a boolean. So we have an assertion failure: test.c: In function ‘foo’: test.c:9:3: internal compiler error: in build_int_cst_wide, at tree.c:1222 Please submit a full bug report, with preprocessed source if appropriate. > >>> >>> + if (TYPE_UNSIGNED (TREE_TYPE (TREE_TYPE (ifexp)))) >>> + { >>> + error_at (colon_loc, "vector comparison must be of signed " >>> + "integer vector type"); >>> + return error_mark_node; >>> + } >>> >>> why that? >> >> Well, later on I rely on this fact. I mean OpenCL says that it should >> return -1 in the sense that all bits set. I don't really know, I can >> support unsigned masks as well, but wouldn't it just introduce a >> source for possible errors. I mean that natural choice for true and >> flase is 0 and 1, not 0 and -1. Anyway I don't have a strong opinion >> there, and I could easily adjust it if we decide that we want it. > > I think we want to allow both signed and unsigned masks. Ok, I'll adjust. > >>> >>> + if (TYPE_VECTOR_SUBPARTS (type1) != TYPE_VECTOR_SUBPARTS (type2) >>> + || TYPE_VECTOR_SUBPARTS (TREE_TYPE (ifexp)) >>> + != TYPE_VECTOR_SUBPARTS (type1)) >>> + { >>> + error_at (colon_loc, "vectors of different length found in " >>> + "vector comparison"); >>> + return error_mark_node; >>> + } >>> >>> I miss verification that type1 and type2 are vector types, or is that done >>> elsewhere? I think type1 and type2 are already verified to be >>> compatible (but you might double-check). At least the above would be >>> redundant with >> >> Thanks, type1 and type2 both vectors comparison is missing, going to >> be added in the new version of the patch. >>> >>> + if (type1 != type2) >>> + { >>> + error_at (colon_loc, "vectors of different types involved in " >>> + "vector comparison"); >>> + return error_mark_node; >>> + } >> >> You are right, what I meant here is TREE_TYPE (type1) != TREE_TYPE >> (type2), because vector (4, int) have the same number of elements as >> vector (4, float). This would be fixed in the new version. >> >>> >>> Joseph may have comments about the fully-fold stuff that follows. >>> >>> + /* Currently the expansion of VEC_COND_EXPR does not allow >>> + expessions where the type of vectors you compare differs >>> + form the type of vectors you select from. For the time >>> + being we insert implicit conversions. */ >>> + if ((COMPARISON_CLASS_P (ifexp) >>> >>> Why only for comparison-class? >> Not only, there is || involved: >> (COMPARISON_CLASS_P (ifexp) && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != type1) >> || TREE_TYPE (ifexp) != type1 >> >> So if this is a comparison class, we check the first operand, because >> the result of the comparison fits, however the operands could not. In >> case we have an expression of signed vector, we know that we would >> transform it into exp != {0,0,...} in tree-vect-generic.c, but if the >> types of operands do not match we convert them. > > Hm, ok ... let's hope we can sort-out the backend issues before this > patch goes in so we can remove this converting stuff. Hm, I would hope that we could commit this patch even with this issue, because my feeling is that this case would produce errors on all the other architectures as well, as VEC_COND_EXPR is the feature heavily used in auto-vectorizer. So it means that all the backends must be fixed. And another argument, that this conversion is harmless. So I really hope that someone could shed some light or help me with this issue, but even if not I think that the current conversion is ok. However, I don't have any architectures different from x86. > >>> >>> + && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != type1) >>> + || TREE_TYPE (ifexp) != type1) >>> + { >>> + tree comp_type = COMPARISON_CLASS_P (ifexp) >>> + ? TREE_TYPE (TREE_OPERAND (ifexp, 0)) >>> + : TREE_TYPE (ifexp); >>> + tree vcond; >>> + >>> + op1 = convert (comp_type, op1); >>> + op2 = convert (comp_type, op2); >>> + vcond = build3 (VEC_COND_EXPR, comp_type, ifexp, op1, op2); >>> + vcond = convert (type1, vcond); >>> + return vcond; >>> + } >>> + else >>> + return build3 (VEC_COND_EXPR, type1, ifexp, op1, op2); >>> >>> In the end we of course will try to fix the middle-end/backends to >>> allow mixed types here as the current restriction doesn't really make sense. >> >> Yes, that would be nice, but these conversions do not really affect >> the code generation, so for the time being I think it is fine to have >> them. >> >>> >>> case EQ_EXPR: >>> case NE_EXPR: >>> + if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE) >>> + { >>> + tree intt; >>> + if (TREE_TYPE (type0) != TREE_TYPE (type1)) >>> + { >>> + error_at (location, "comparing vectors with different " >>> + "element types"); >>> + return error_mark_node; >>> + } >>> + >>> + if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1)) >>> + { >>> + error_at (location, "comparing vectors with different " >>> + "number of elements"); >>> + return error_mark_node; >>> + } >>> >>> as above - compatibility should already be ensured, thus type0 == type1 >>> here? >> >> Yeah, we know that they are both vector types, but that is about all >> we know. Anyhow, all these errors are reachable. As an example see >> vector-compare-1.c: >> r4 = x > y; /* { dg-error "comparing vectors with different element types" >> } */ >> r8 == r4; /* { dg-error "comparing vectors with different number of >> elements"} */ > > Ok, I see. > >>> >>> +/* Try a hardware hook for vector comparison or >>> + extract comparison piecewise. */ >>> +static tree >>> +expand_vector_comparison (gimple_stmt_iterator *gsi, tree type, tree op0, >>> + tree op1, enum tree_code code) >>> >>> comments should mention and describe all function arguments. >> >> Ok, coming in the new version of the patch. >> >>> +/* Expand vector condition EXP which should have the form >>> + VEC_COND_EXPR<cond, vec0, vec1> into the following >>> + vector: >>> + {cond[i] != 0 ? vec0[i] : vec1[i], ... } >>> + i changes from 0 to TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec0)). */ >>> +static tree >>> +earlyexpand_vec_cond_expr (gimple_stmt_iterator *gsi, tree exp) >>> >>> that would be expand_vec_cond_expr_piecewise, no? >> >> Adjusted. >> >>> >>> + /* Ensure that we will be able to expand vector comparison >>> + in case it is not supported by the architecture. */ >>> + gcc_assert (COMPARISON_CLASS_P (cond)); >>> >>> that looks dangerous to me - did you try >>> >>> vec = v1 <= v2; >>> vec2 = vec ? v1 : v2; >>> >>> without optimization? >> >> Sure, tests should cover this case. >> I have this assertion there because only two cases are possible: >> 1) it is a comparison >> 2) function callee converted expr to expr != {0,0,...} >> So we should be perfectly fine. >> >>> >>> + /* Check if we need to expand vector condition inside of >>> + VEC_COND_EXPR. */ >>> + var = create_tmp_reg (TREE_TYPE (cond), "cond"); >>> + new_rhs = expand_vector_comparison (gsi, TREE_TYPE (cond), >>> + TREE_OPERAND (cond, 0), >>> + TREE_OPERAND (cond, 1), >>> + TREE_CODE (cond)); >>> >>> That unconditionally expands, so no need for "Check". >> >> Ok. >> >>> >>> + /* Expand VEC_COND_EXPR into a vector of scalar COND_EXPRs. */ >>> + v = VEC_alloc(constructor_elt, gc, nunits); >>> + for (i = 0; i < nunits; >>> + i += 1, index = int_const_binop (PLUS_EXPR, index, part_width)) >>> + { >>> + tree tcond = tree_vec_extract (gsi, inner_type, var, part_width, >>> index); >>> + tree a = tree_vec_extract (gsi, inner_type, vec0, part_width, index); >>> + tree b = tree_vec_extract (gsi, inner_type, vec1, part_width, index); >>> + tree rcond = gimplify_build2 (gsi, NE_EXPR, boolean_type_node, tcond, >>> + build_int_cst (inner_type ,0)); >>> >>> I seriously doubt that when expanding this part piecewise expanding >>> the mask first in either way is going to be beneficial. Instead I would >>> suggest to "inline" the comparison here. Thus instead of >> >> Well, the ting is that, if expand_vector_comparison, would insert >> builtin there rather than expanding the code piecewise, I'll have to >> do the comparison with 0 anyway, because true is expressed as -1 >> there. >> >> Well, I would hope that in case we have: >> c_0 = a_0 > b_0; >> d_0 = c_0 != 0; >> >> {d_0, d_1,...} >> >> all the d_n should be constant-folded, or should I pull fold explicitly here? >> >> 1) I construct the mask >>> >>> mask = >>> = { mask[0] != 0 ? ... } >>> >>> do >>> >>> = { c0[0] < c1[0] ? ..., } >>> >>> or even expand the ? : using mask operations if we efficiently can >>> create that mask. >>> >> >> I assume that if we cannot expand VEC_COND_EXPR, then masking the >> elements is a problem for us. Otherwise VEC_COND_EXPE expansion has a >> bug somewhere. Or I am wrong somewhere? > > I think we can always do bitwise operations, so if we can get at the > mask vector we are fine. > > I was thinking about how the case of explicitly computing the value of > v1 < v2 into a vector vs. a condition inside a VEC_COND_EXPR should > be handled. If the target produces a mask of condition codes for > a comparison then it might be able to efficiently expand a VEC_COND_EXPR. > It could as well generate a mask via expanding v1 < v2 ? -1 : 0 then. > A similar case is for AMD XOP which can expand mask ? v1 : v2 > with a single instruction (so even without seeing a comparison). > > Basically, if we can get at the mask we should use that to do the > vector selection in parallel via (v1 & mask) | (v2 & ~mask). > > If we cannot even get at the mask then we can build the result > vector piecewise as { v1[0] < v2[0] ? v1[0] : v2[0], .... } etc. Ok, I am perfectly fine to construct (v1 & mask) | (v2 & ~mask), the question is do I need to check (v1 & mask) and (v2 & mask) or I can just blindly insert it? The problem is that we have a single veclower pass, so if I insert something that needs expansion, we would not have the second chance to expand it again. I'll adjust the patch. >>> >>> + /* Check if VEC_COND_EXPR is supported in hardware within the >>> + given types. */ >>> + if (code == VEC_COND_EXPR) >>> + { >>> + tree exp = gimple_assign_rhs1 (stmt); >>> + tree cond = TREE_OPERAND (exp, 0); >>> + >>> + /* If VEC_COND_EXPR is presented as A ? V0 : V1, we >>> + change it to A != {0,0,...} ? V0 : V1 */ >>> + if (!COMPARISON_CLASS_P (cond)) >>> + TREE_OPERAND (exp, 0) = >>> + build2 (NE_EXPR, TREE_TYPE (cond), cond, >>> + build_vector_from_val (TREE_TYPE (cond), >>> + build_int_cst (TREE_TYPE (TREE_TYPE (cond)), 0))); >>> >>> That looks inefficient as well. Iff we know that the mask is always >>> either {-1, -1 ..} or {0, 0 ...} then we can expand the ? : using >>> bitwise operations (see what the i?86 expander does, for example). >> >> This is a requirement of VEC_COND_EXPR, I need to pass 4 parameters, >> not 3, that is why I introduce this fake {0,0,..} here. > > Sure, but if you look at expand_vec_cond_expr_p then you don't need > that, and this fake comparison should instead be produced by the > expander (or really avoided by maybe splitting up the named pattern > into two). > > It's for sure not necessary for earlyexpand_vec_cond_expr (but instead > makes it less efficient - with just the mask it can do the bitwise > fallback easily). Richard, let me give you an example: #define vector(elcount, type) \ __attribute__((vector_size((elcount)*sizeof(type)))) type int foo (vector (4, int) i0, vector (4, int) i1, int x) { i0 = i0 ? i1 : i0; return i0[x]; } when we optimize i0 ? i1 : i0, expand_vec_cond_expr_p happily accepts that and says that it can expand this expression. Now after the veclowering is done, expand_vec_cond_expr calls vector_compare_rtx (op0, unsignedp, icode), which has an assertion: gcc_assert (COMPARISON_CLASS_P (cond)); and of course it fails. So someone needs to insert != {0,0...} expression. I do it in the tree-vect-geneic, but it could be done in expand_vec_cond_expr. The question is where? I can agree with you that I don't need to put this mask in case I expand vcond piecewise, I will adjust that, but actually it does not make much of a difference, in case expansion will use (v0 & mask) | (v1 & ~mask). Am I wrong somewhere? > >>> >>> @@ -471,6 +603,7 @@ expand_vector_operations_1 (gimple_stmt_ >>> >>> gcc_assert (code != CONVERT_EXPR); >>> >>> + >>> /* The signedness is determined from input argument. */ >>> if (code == VEC_UNPACK_FLOAT_HI_EXPR >>> || code == VEC_UNPACK_FLOAT_LO_EXPR) >>> >>> spurious whitespace change. >> >> Fixed. >>> >>> Index: gcc/tree-cfg.c >>> =================================================================== >>> --- gcc/tree-cfg.c (revision 177665) >>> +++ gcc/tree-cfg.c (working copy) >>> @@ -3191,6 +3191,38 @@ verify_gimple_comparison (tree type, tre >>> return true; >>> } >>> >>> + if (TREE_CODE (op0_type) == VECTOR_TYPE >>> + && TREE_CODE (op1_type) == VECTOR_TYPE >>> + && TREE_CODE (type) == VECTOR_TYPE) >>> + { >>> >>> this should check TREE_CODE (type) == VECTOR_TYPE only >>> and then verify the comparison operands are actually vectors. >> >> Yes, you are right, adjusted. >> >>> >>> + if (TREE_TYPE (op0_type) != TREE_TYPE (op1_type)) >>> + { >>> + error ("invalid vector comparison, vector element type >>> mismatch"); >>> + debug_generic_expr (op0_type); >>> + debug_generic_expr (op1_type); >>> + return true; >>> + } >>> >>> this needs to use code similar to the scalar variant, >>> >>> !useless_type_conversion_p (op0_type, op1_type) >>> && !useless_type_conversion_p (op1_type, op0_type) >>> >>> which also makes the first TYPE_VECTOR_SUBPARTS redundant. >>> >> >> Fixed. >> >>> + if (TYPE_VECTOR_SUBPARTS (type) != TYPE_VECTOR_SUBPARTS (op0_type) >>> + && TYPE_PRECISION (TREE_TYPE (op0_type)) >>> + != TYPE_PRECISION (TREE_TYPE (type))) >>> + { >>> + error ("invalid vector comparison resulting type"); >>> + debug_generic_expr (type); >>> + return true; >>> + } >>> >>> I think you can drop the TYPE_PRECISION check. We might want to >>> assert that a vector element types precision always matches its >>> mode precision (in make_vector_type). >> >> I would leave it for a while. During the optimisation you can >> construct some strange things, so I would better make verifier >> resistant to the all kind of stuff. > > Ok. > >>> >>> Index: gcc/c-parser.c >>> =================================================================== >>> --- gcc/c-parser.c (revision 177665) >>> +++ gcc/c-parser.c (working copy) >>> @@ -5337,8 +5337,17 @@ c_parser_conditional_expression (c_parse >>> if (c_parser_next_token_is (parser, CPP_COLON)) >>> { >>> tree eptype = NULL_TREE; >>> - >>> + >>> middle_loc = c_parser_peek_token (parser)->location; >>> + >>> + if (TREE_CODE (TREE_TYPE (cond.value)) == VECTOR_TYPE) >>> >>> watch out for whitespace changes - you add a trailing tab here. >> >> Fixed. >> >>> >>> +/* Find target specific sequence for vector comparison of >>> + real-type vectors V0 and V1. Returns variable containing >>> + result of the comparison or NULL_TREE in other case. */ >>> +static tree >>> +vector_fp_compare (gimple_stmt_iterator *gsi, tree rettype, >>> + enum machine_mode mode, tree v0, tree v1, >>> + enum tree_code code) >>> +{ >>> + enum ix86_builtins fcode; >>> >>> is there a reason we need this and cannot simply provide expanders >>> for the named patterns? We'd need to give them semantics of >>> producing all-ones / all-zero masks of course. Richard, do you think >>> that's sensible? That way we'd avoid the new target hook and could >>> simply do optab queries. >> >> I think I don't really understand the idea. How we are going to >> represent the fact that we need to convert a given node to the given >> machine instruction? May be you could point where the similar >> technique is already used. > > In all places we check optab_handler (op, mode) != CODE_FOR_nothing. > We have eq_optab for example, so optab_handler (eq_optab, V4SImode) > would get you the instruction sequence for a comparison of V4SImode > vectors. That isn't yet properly defined what it should return. > > Otherwise I'd say we should ask the target to expand > v1 < v2 as VEC_COND_EXPR (v1 < v2, -1, 0) instead. That one could > as well special-case the -1 and 0 result vectors (and maybe it already > does). Ok, I can adjust the optab checking for the mode, but I recall that we introduced the hook exactly because optabs did not return anything sensible. It was your idea :) Also, I don't like the idea to expand any comparison to VEC_COND_EXPR (v1 < v2, -1, 0). Look at expand_vec_cond_expr, it would do the job only if there is an instruction vcond in the architecture, it checks for direct_optab_handler (vcond_optab, mode). But it is not necessarily the case that using vcond is as efficient as using comparison instructions. Also, we could run into the situation when vcond is not supported, but comparison is, or can't we? Anyhow, I would think that we want to keep vcond and comparison separately. Artem. > > Richard. >> >> The new patch will be tested and submitted here soon. >> >> >> Thanks, >> Artem. >> >>> >>> Thanks, >>> Richard. >>> >>>> ChangeLog >>>> >>>> 2011-08-12 Artjoms Sinkarovs <artyom.shinkar...@gmail.com> >>>> >>>> gcc/ >>>> * targhooks.c (default_builtin_vec_compare): New hook. >>>> * targhooks.h (default_builtin_vec_compare): New definition. >>>> * target.def (builtin_vec_compare): New hook. >>>> * target.h: New include (gimple.h). >>>> * fold-const.c >>>> (fold_comparison): Adjust x <cmp> x vector operations. >>>> * c-typeck.c (build_binary_op): Allow vector comparison. >>>> (c_obj_common_truthvalue_conversion): Deny vector comparison >>>> inside of if statement. >>>> (build_conditional_expr): Adjust to build VEC_COND_EXPR. >>>> * tree-vect-generic.c (do_compare): Helper function. >>>> (expand_vector_comparison): Check if hardware comparison >>>> is available, if not expand comparison piecewise. >>>> (expand_vector_operation): Handle vector comparison >>>> expressions separately. >>>> (earlyexpand_vec_cond_expr): Expand vector comparison >>>> piecewise. >>>> * Makefile.in: New dependencies. >>>> * tree-cfg.c (verify_gimple_comparison): Allow vector >>>> comparison operations in gimple. >>>> * c-parser.c (c_parser_conditional_expression): Adjust >>>> to handle VEC_COND_EXPR. >>>> * gimplify.c (gimplify_expr): Adjust to handle VEC_COND_EXPR. >>>> * config/i386/i386.c (vector_fp_compare): Build hardware >>>> specific code for floating point vector comparison. >>>> (vector_int_compare): Build hardware specific code for >>>> integer vector comparison. >>>> (ix86_vectorize_builtin_vec_compare): Implementation of >>>> builtin_vec_compare hook. >>>> >>>> gcc/testsuite/ >>>> * gcc.c-torture/execute/vector-vcond-1.c: New test. >>>> * gcc.c-torture/execute/vector-vcond-2.c: New test. >>>> * gcc.c-torture/execute/vector-compare-1.c: New test. >>>> * gcc.c-torture/execute/vector-compare-2.c: New test. >>>> * gcc.dg/vector-compare-1.c: New test. >>>> * gcc.dg/vector-compare-2.c: New test. >>>> >>>> gcc/doc >>>> * extend.texi: Adjust. >>>> * tm.texi: Adjust. >>>> * tm.texi.in: Adjust. >>>> >>>> >>>> bootstrapped and tested on x86_64_unknown-linux. >>>> >>>> >>>> Thanks, >>>> Artem Shinkarov. >>>> >>> >> >