On Thu, Oct 13, 2022 at 11:11:53PM +0200, Uros Bizjak wrote:
> > > +  do_compare_rtx_and_jump (op1, op2, GET_CODE (operands[0]), 0,
> > > +                           SFmode, NULL_RTX, NULL,
> > > +                           as_a <rtx_code_label *> (operands[3]),
> > > +                           /* Unfortunately this isn't propagated.  */
> > > +                           profile_probability::even ());
> >
> You could use ix86_expand_branch instead of do_compare_rtx_and_jump
> here. This would expand in SFmode, so insn condition from cbranchsf4
> should be copied here:
>
> "TARGET_80387 || (SSE_FLOAT_MODE_P (SFmode) && TARGET_SSE_MATH)"
>
> Additionally, ix86_fp_comparison_operator predicate should be used for
> operator0. Basically, just copy predicates from cbranchsf4 as we are
> effectively expanding the SFmode compare & branch.

The reason I used the generic routine there was exactly to handle not just
ix86_fp_comparison_operator, but also comparisons that are more complex
than that (ones that need 2 comparisons).

For the ix86_fp_comparison_operator cases the optabs wouldn't be strictly
needed: the generic code would see that e.g. cbranchbf4 isn't supported,
try cbranchsf4 and succeed on that. The only disadvantage would be that
the BFmode -> SFmode extensions would be performed using library functions
unless -ffast-math, while they can be handled by left shifting the 16
BFmode bits into the most significant 16 bits of SFmode even when honoring
NaNs.

For the non-ix86_fp_comparison_operator cases, though, the generic behavior
is that none of cbranchbf4, cbranchsf4, cbranchdf4, cbranchxf4 or
cbranchtf4 works out, and the generic code emits a libcall (__{eq,ne}bf2).
I bet that is the reason why libgcc contains the __{eq,ne}hf2 entrypoints.
I wanted to avoid adding __{eq,ne}bf2, and adding cbranchbf4/cstorebf4 was
how I managed to do that: they tell the generic code that it can handle
those comparisons by the faster BFmode to SFmode conversions of the
operands, followed by one or two bit checks.

I guess another possibility would be to call ix86_expand_branch there once
or twice and repeat what the generic code does, or to add the libgcc
entrypoints, which would perhaps bypass soft-fp and just do the shifts +
SFmode comparison.

> > > +  else
> > > +    {
> > > +      rtx t2 = gen_reg_rtx (SImode);
> > > +      emit_insn (gen_zero_extendhisi2 (t2, op2));
> > > +      emit_insn (gen_ashlsi3 (t2, t2, GEN_INT (16)));
> > > +      op2 = gen_lowpart (SFmode, t2);
> > > +    }
> >
> Similar to cbranch above, use ix86_expand_setcc and copy predicates
> from cstoresf4.

Ditto here, cstore is actually required by the generic code once cbranch
is implemented.

	Jakub
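
For illustration, the shift-based BFmode -> SFmode extension mentioned
above can be sketched in plain C. This is only a standalone sketch of the
idea, not code from the patch: the helper names (bf16_bits_to_float,
bf16_eq, bf16_ne) are hypothetical, and it assumes float is IEEE binary32
and that the BFmode value is available as its raw 16 bits.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen raw bfloat16 bits to float by zero-extending
   them and shifting them into the most significant 16 bits, mirroring the
   zero_extendhisi2 + ashlsi3 (16) + lowpart sequence in the quoted hunk.
   Assumes float is IEEE binary32.  */
static float
bf16_bits_to_float (uint16_t bits)
{
  uint32_t wide = (uint32_t) bits << 16;
  float f;
  memcpy (&f, &wide, sizeof f);	/* Reinterpret the bits, no conversion.  */
  return f;
}

/* Hypothetical helpers: compare two bfloat16 values by widening both and
   doing an ordinary float comparison, so NaNs and signed zeros follow the
   usual float rules.  */
static int
bf16_eq (uint16_t a, uint16_t b)
{
  return bf16_bits_to_float (a) == bf16_bits_to_float (b);
}

static int
bf16_ne (uint16_t a, uint16_t b)
{
  return bf16_bits_to_float (a) != bf16_bits_to_float (b);
}
```

The widening is exact because bfloat16 has the same sign and exponent
layout as float and only a shorter mantissa, which is why no library call
is needed even when honoring NaNs.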