> On 30.08.2019 at 09:16, Richard Biener <richard.guent...@gmail.com> wrote:
>
> On Fri, Aug 30, 2019 at 9:12 AM Richard Biener
> <richard.guent...@gmail.com> wrote:
>>
>> On Thu, Aug 29, 2019 at 5:39 PM Ilya Leoshkevich <i...@linux.ibm.com> wrote:
>>>
>>>> On 22.08.2019 at 15:45, Ilya Leoshkevich <i...@linux.ibm.com> wrote:
>>>>
>>>> Bootstrap and regtest running on x86_64-redhat-linux and
>>>> s390x-redhat-linux.
>>>>
>>>> This patch series adds signaling FP comparison support (both scalar and
>>>> vector) to the s390 backend.
>>>
>>> I'm running into a problem on ppc64 with this patch, and it would be
>>> great if someone could help me figure out the best way to resolve it.
>>>
>>> The vector36.C test is failing because the gimplifier produces the
>>> following
>>>
>>>   _5 = _4 > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 };
>>>   _6 = VEC_COND_EXPR <_5, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;
>>>
>>> from
>>>
>>>   VEC_COND_EXPR < (*b > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 }) ,
>>>                   { -1, -1, -1, -1 } ,
>>>                   { 0, 0, 0, 0 } >
>>>
>>> Since the comparison tree code is now hidden behind a temporary, my code
>>> does not have anything to pass to the backend. The reason for creating
>>> a temporary is that the comparison can trap, and so the following check
>>> in gimplify_expr fails:
>>>
>>>   if (gimple_seq_empty_p (internal_post) && (*gimple_test_f) (*expr_p))
>>>     goto out;
>>>
>>> gimple_test_f is is_gimple_condexpr, and it eventually calls
>>> operation_could_trap_p (GT).
>>>
>>> My current solution is to simply state that the backend does not support
>>> SSA_NAME in vector comparisons. However, I don't like it, since it may
>>> cause performance regressions due to having to fall back to scalar
>>> comparisons.
>>>
>>> I was thinking about two other possible solutions:
>>>
>>> 1. Change the gimplifier to allow trapping vector comparisons. That's
>>>    a bit complicated, because tree_could_throw_p checks not only for
>>>    floating point traps, but also e.g. for array index out of bounds
>>>    traps. So I would have to create a tree_could_throw_p version which
>>>    disregards specific kinds of traps.
>>>
>>> 2. Change expand_vector_condition to follow SSA_NAME_DEF_STMT and use
>>>    its tree_code instead of SSA_NAME. The potential problem I see with
>>>    this is that there appears to be no guarantee that _5 will be inlined
>>>    into _6 at a later point. So if we say that we don't need to fall
>>>    back to scalar comparisons based on availability of vector >
>>>    instruction and inlining does not happen, then what will actually
>>>    be required is vector selection (vsel on S/390), which might not be
>>>    available in the general case.
>>>
>>> What would be a better way to proceed here?
>>
>> On GIMPLE there isn't a good reason to split out trapping comparisons
>> from [VEC_]COND_EXPR - the gimplifier does this for GIMPLE_CONDs
>> where it is important because we'd have no way to represent EH info
>> when not done. It might be a bit awkward to preserve EH across RTL
>> expansion though in case the [VEC_]COND_EXPR are not expanded
>> as a single pattern, but I'm not sure.
>>
>> To go this route you'd have to split the is_gimple_condexpr check
>> I guess and eventually users turning [VEC_]COND_EXPR into conditional
>> code (do we have any?) have to be extra careful then.
>
> Oh, btw - the fact that we have an expression embedded in [VEC_]COND_EXPR
> is something that has bothered me for quite some time already and it makes
> things like VN awkward and GIMPLE finicky. We've discussed alternatives
> to deal with it, the simplest being moving the comparison out to a separate
> stmt and others like having four operand [VEC_]COND_{EQ,NE,...}_EXPR
> codes or simply treating {EQ,NE,...}_EXPR as quaternary on GIMPLE
> with either optional 3rd and 4th operand (defaulting to
> boolean_true/false_node) or always explicit ones (and thus dropping
> [VEC_]COND_EXPR).
>
> What does LLVM do here?
For

void
f(long long * restrict w, double * restrict x, double * restrict y, int n)
{
  for (int i = 0; i < n; i++)
    w[i] = x[i] == y[i] ? x[i] : y[i];
}

LLVM does

  %26 = fcmp oeq <2 x double> %21, %25
  %27 = extractelement <2 x i1> %26, i32 0
  %28 = select <2 x i1> %26, <2 x double> %21, <2 x double> %25

So they have separate operations for comparisons and ternary operator
(fcmp + select).
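
To make the split concrete, here is a rough C sketch of the same loop
written as two separate steps, one producing a per-lane mask and one
consuming it. It only models the shape of "fcmp + select" in portable
scalar C; it is not what either compiler emits, and the helper names
compare_eq and select_lanes are made up for this illustration.

```c
#include <stddef.h>

/* Hypothetical helper, step 1: the comparison on its own.
   Writes one flag per lane (1 = equal, 0 = not equal).  As with an
   ordered-equal compare, a lane holding a NaN yields 0.  */
static void
compare_eq (const double *x, const double *y, int *mask, size_t n)
{
  for (size_t i = 0; i < n; i++)
    mask[i] = (x[i] == y[i]);
}

/* Hypothetical helper, step 2: the selection on its own.
   It only looks at the mask, never at the comparison that produced it.
   The result is converted to long long, as in f() above.  */
static void
select_lanes (const int *mask, const double *x, const double *y,
              long long *w, size_t n)
{
  for (size_t i = 0; i < n; i++)
    w[i] = (long long) (mask[i] ? x[i] : y[i]);
}
```

Calling compare_eq and then select_lanes on the same inputs computes the
same values as f(). The point of the sketch is that the mask is an
ordinary value between the two steps, which is the situation the
gimplifier creates with _5 and _6 above.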