On Thu, Oct 13, 2022 at 11:11:53PM +0200, Uros Bizjak wrote:
> > > +  do_compare_rtx_and_jump (op1, op2, GET_CODE (operands[0]), 0,
> > > +                        SFmode, NULL_RTX, NULL,
> > > +                        as_a <rtx_code_label *> (operands[3]),
> > > +                        /* Unfortunately this isn't propagated.  */
> > > +                        profile_probability::even ());
> 
> You could use ix86_expand_branch instead of do_compare_rtx_and_jump
> here. This would expand in SFmode, so insn condition from cbranchsf4
> should be copied here:
> 
>   "TARGET_80387 || (SSE_FLOAT_MODE_P (SFmode) && TARGET_SSE_MATH)"
> 
> Additionally, ix86_fp_comparison_operator predicate should be used for
> operator0. Basically, just copy predicates from cbranchsf4 as we are
> effectively expanding the SFmode compare & branch.

The reason I used the generic routine there was exactly to handle
not just ix86_fp_comparison_operator, but also comparisons that are more
complex than that (ones that need 2 comparisons).

For the ix86_fp_comparison_operator cases the optabs wouldn't actually be
strictly needed: the generic code would see that e.g. cbranchbf4 isn't
supported, try cbranchsf4 and succeed with that.  The only disadvantage
would be that the BFmode -> SFmode extensions would be performed using
library functions unless -ffast-math, while they can be handled by left
shifting the 16 BFmode bits into the most significant 16 bits of SFmode
even when honoring NaNs.

For the non-ix86_fp_comparison_operator cases, however, the generic
behavior is that none of cbranchbf4, cbranchsf4, cbranchdf4, cbranchxf4
or cbranchtf4 works out, and the generic code emits a libcall
(__{eq,ne}bf2).  I bet that is the reason why libgcc contains __{eq,ne}hf2
entrypoints.
I wanted to avoid adding __{eq,ne}bf2 and the addition of
cbranchbf4/cstorebf4 was how I managed to do that; by telling the
generic code that it can handle those by the faster BFmode to SFmode
conversions of the operands and then perform one or two bit checks.
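
For illustration only, the shift-based BFmode to SFmode widening mentioned
above could be sketched in plain C roughly as follows.  This is not code
from the patch: the helper name bf16_bits_to_float is hypothetical, and the
sketch assumes float is IEEE binary32 and that the BFmode value is available
as a raw 16-bit pattern.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of GCC or libgcc: widen a raw bfloat16
   bit pattern to float.  Assumes float is IEEE binary32.  */
static float
bf16_bits_to_float (uint16_t bits)
{
  /* Place the 16 bfloat16 bits in the most significant half of a 32-bit
     word; the low 16 mantissa bits become zero.  */
  uint32_t wide = (uint32_t) bits << 16;
  float f;

  /* Reinterpret the bits without violating aliasing rules.  */
  memcpy (&f, &wide, sizeof f);
  return f;
}
```

Since the sign, exponent and upper mantissa bits are copied unchanged,
infinities and NaNs stay infinities and NaNs, so an ordinary float
comparison can follow the widening.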

I guess another possibility would be to call ix86_expand_branch there
once or twice and repeat what the generic code does, or add the
libgcc entrypoints which would perhaps bypass soft-fp and just do the
shifts + SFmode comparison.

> > > +  else
> > > +    {
> > > +      rtx t2 = gen_reg_rtx (SImode);
> > > +      emit_insn (gen_zero_extendhisi2 (t2, op2));
> > > +      emit_insn (gen_ashlsi3 (t2, t2, GEN_INT (16)));
> > > +      op2 = gen_lowpart (SFmode, t2);
> > > +    }
> 
> Similar to cbranch above, use ix86_expand_setcc and copy predicates
> from cstoresf4.

Ditto here; the generic code actually requires cstore once cbranch is
implemented.

        Jakub
