Re: [PATCH v2 0/9] S/390: Use signaling FP comparison instructions

Ilya Leoshkevich Fri, 30 Aug 2019 07:20:23 -0700

> On 30.08.2019 at 09:16, Richard Biener <richard.guent...@gmail.com> wrote:
> 
> On Fri, Aug 30, 2019 at 9:12 AM Richard Biener
> <richard.guent...@gmail.com> wrote:
>> 
>> On Thu, Aug 29, 2019 at 5:39 PM Ilya Leoshkevich <i...@linux.ibm.com> wrote:
>>> 
>>>> On 22.08.2019 at 15:45, Ilya Leoshkevich <i...@linux.ibm.com> wrote:
>>>> 
>>>> Bootstrap and regtest running on x86_64-redhat-linux and
>>>> s390x-redhat-linux.
>>>> 
>>>> This patch series adds signaling FP comparison support (both scalar and
>>>> vector) to s390 backend.
>>> 
>>> I'm running into a problem on ppc64 with this patch, and it would be
>>> great if someone could help me figure out the best way to resolve it.
>>> 
>>> vector36.C test is failing because gimplifier produces the following
>>> 
>>>  _5 = _4 > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 };
>>>  _6 = VEC_COND_EXPR <_5, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;
>>> 
>>> from
>>> 
>>>  VEC_COND_EXPR < (*b > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 }) ,
>>>                  { -1, -1, -1, -1 } ,
>>>                  { 0, 0, 0, 0 } >
>>> 
>>> Since the comparison tree code is now hidden behind a temporary, my code
>>> does not have anything to pass to the backend.  The reason for creating
>>> a temporary is that the comparison can trap, and so the following check
>>> in gimplify_expr fails:
>>> 
>>>  if (gimple_seq_empty_p (internal_post) && (*gimple_test_f) (*expr_p))
>>>    goto out;
>>> 
>>> gimple_test_f is is_gimple_condexpr, and it eventually calls
>>> operation_could_trap_p (GT).
>>> 
>>> My current solution is to simply state that the backend does not
>>> support SSA_NAME in vector comparisons.  However, I don't like it,
>>> since it may cause performance regressions due to having to fall back
>>> to scalar comparisons.
>>> 
>>> I was thinking about two other possible solutions:
>>> 
>>> 1. Change the gimplifier to allow trapping vector comparisons.  That's
>>>   a bit complicated, because tree_could_throw_p checks not only for
>>>   floating point traps, but also e.g. for array index out of bounds
>>>   traps.  So I would have to create a tree_could_throw_p version which
>>>   disregards specific kinds of traps.
>>> 
>>> 2. Change expand_vector_condition to follow SSA_NAME_DEF_STMT and use
>>>   its tree_code instead of SSA_NAME.  The potential problem I see with
>>>   this is that there appears to be no guarantee that _5 will be inlined
>>>   into _6 at a later point.  So if we say that we don't need to fall
>>>   back to scalar comparisons based on availability of vector >
>>>   instruction and inlining does not happen, then what will actually
>>>   be required is vector selection (vsel on S/390), which might not be
>>>   available in the general case.
>>> 
>>> What would be a better way to proceed here?
>> 
>> On GIMPLE there isn't a good reason to split out trapping comparisons
>> from [VEC_]COND_EXPR - the gimplifier does this for GIMPLE_CONDs
>> where it is important because we'd have no way to represent EH info
>> when not done.  It might be a bit awkward to preserve EH across RTL
>> expansion though in case the [VEC_]COND_EXPR are not expanded
>> as a single pattern, but I'm not sure.
>> 
>> To go this route you'd have to split the is_gimple_condexpr check
>> I guess and eventually users turning [VEC_]COND_EXPR into conditional
>> code (do we have any?) have to be extra careful then.
> 
> Oh, btw - the fact that we have an expression embedded in [VEC_]COND_EXPR
> is something that has bothered me for quite some time already, and it makes
> things like VN awkward and GIMPLE finicky.  We've discussed alternatives
> to deal with this, the simplest being moving the comparison out to a separate
> stmt and others like having four operand [VEC_]COND_{EQ,NE,...}_EXPR
> codes or simply treating {EQ,NE,...}_EXPR as quaternary on GIMPLE
> with either optional 3rd and 4th operand (defaulting to
> boolean_true/false_node) or always explicit ones (and thus dropping
> [VEC_]COND_EXPR).
> 
> What does LLVM do here?


For

void f(long long * restrict w, double * restrict x, double * restrict y, int n)
{
        for (int i = 0; i < n; i++)
                w[i] = x[i] == y[i] ? x[i] : y[i];
}

LLVM does

  %26 = fcmp oeq <2 x double> %21, %25
  %27 = extractelement <2 x i1> %26, i32 0
  %28 = select <2 x i1> %26, <2 x double> %21, <2 x double> %25

So LLVM has separate operations for the comparison and for the ternary
operator (fcmp + select).
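
To make the split concrete, here is a minimal C sketch of the same
two-step shape: first a comparison that only produces a mask, then a
select that only consumes it.  The helper names compare_eq and
select_by_mask are hypothetical and exist only for this illustration;
they are not GCC or LLVM APIs, and the scalar loops merely stand in for
the vector lanes.

```c
#include <stddef.h>

/* Hypothetical helper, step 1: the comparison on its own.
   mask[i] becomes 1 when a[i] == b[i] and 0 otherwise.  In C, == is
   false whenever an operand is a NaN, so this behaves like an
   ordered-equal compare.  */
void compare_eq(const double *a, const double *b, int *mask, size_t n)
{
    for (size_t i = 0; i < n; i++)
        mask[i] = (a[i] == b[i]);
}

/* Hypothetical helper, step 2: the select.
   It reads only the mask, so no floating-point comparison is
   performed in this step.  */
void select_by_mask(const int *mask, const double *a, const double *b,
                    double *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = mask[i] ? a[i] : b[i];
}
```

Calling compare_eq and then select_by_mask on the same inputs gives the
same values as x[i] == y[i] ? x[i] : y[i] in the loop above, but the only
operation that touches the floating-point comparison is the first one.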
