On Thu, Oct 13, 2022 at 11:35 PM Jakub Jelinek <ja...@redhat.com> wrote:
>
> On Thu, Oct 13, 2022 at 11:11:53PM +0200, Uros Bizjak wrote:
> > > > +      do_compare_rtx_and_jump (op1, op2, GET_CODE (operands[0]), 0,
> > > > +                               SFmode, NULL_RTX, NULL,
> > > > +                               as_a <rtx_code_label *> (operands[3]),
> > > > +                               /* Unfortunately this isn't propagated.  */
> > > > +                               profile_probability::even ());
> >
> > You could use ix86_expand_branch instead of do_compare_rtx_and_jump
> > here. This would expand in SFmode, so insn condition from cbranchsf4
> > should be copied here:
> >
> > "TARGET_80387 || (SSE_FLOAT_MODE_P (SFmode) && TARGET_SSE_MATH)"
> >
> > Additionally, ix86_fp_comparison_operator predicate should be used for
> > operator0. Basically, just copy predicates from cbranchsf4 as we are
> > effectively expanding the SFmode compare & branch.
>
> The reason why I've used there the generic routine was exactly to handle
> not just ix86_fp_comparison_operator, but also comparisons that are more
> complex than that (need 2 comparisons).
>
> While for ix86_fp_comparison_operator cases the optabs wouldn't be actually
> strictly needed, the generic code would see e.g. cbranchbf4 isn't supported
> and try cbranchsf4, succeed on that and the only disadvantage would be
> that the BFmode -> SFmode extensions would be performed using library
> functions unless -ffast-math while they can be handled by left shifting
> the 16 BFmode bits to most significant 16 bits of SFmode even when honoring
> NaNs, for the non-ix86_fp_comparison_operator cases the generic behavior
> is actually that neither cbranchbf4, nor cbranchsf4, nor cbranchdf4, nor
> cbranchxf4, nor cbranchtf4 works out and generic code emits a libcall
> (__{eq,ne}bf2). I bet that is the reason why libgcc contains __{eq,ne}hf2
> entrypoints.
> I wanted to avoid adding __{eq,ne}bf2 and the addition of
> cbranchbf4/cstorebf4 was how I managed to do that; by telling the
> generic code that it can handle those by the faster BFmode to SFmode
> conversions of the operands and then perform one or two bit checks.

Thanks for the explanation, I see the intention now. The patch is OK as is.

Thanks,
Uros.

> I guess another possibility would be to call ix86_expand_branch there
> once or twice and repeat what the generic code does, or add the
> libgcc entrypoints which would perhaps bypass soft-fp and just do the
> shifts + SFmode comparison.
>
> > > > +  else
> > > > +    {
> > > > +      rtx t2 = gen_reg_rtx (SImode);
> > > > +      emit_insn (gen_zero_extendhisi2 (t2, op2));
> > > > +      emit_insn (gen_ashlsi3 (t2, t2, GEN_INT (16)));
> > > > +      op2 = gen_lowpart (SFmode, t2);
> > > > +    }
> >
> > Similar to cbranch above, use ix86_expand_setcc and copy predicates
> > from cstoresf4.
>
> Ditto here, cstore was actually quite required by the generic code when
> cbranch is implemented.
>
> Jakub
>
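
For readers following the thread, the BFmode to SFmode widening discussed above (zero-extend the 16 bits, shift left by 16, reinterpret as SFmode) can be sketched in plain C. This is only an illustration, not code from the patch: the helper names bf16_bits_to_float and bf16_eq are hypothetical, and the sketch assumes float is IEEE 754 binary32. Because bfloat16 shares the sign bit, the 8 exponent bits and the top 7 mantissa bits with binary32, the shift is exact and keeps NaNs as NaNs, so the comparison can then be done on the widened values.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of GCC): widen a bfloat16 bit pattern
   to float by placing its 16 bits in the most significant half of a
   32-bit word.  Assumes float is IEEE 754 binary32.  */
static float
bf16_bits_to_float (uint16_t bits)
{
  uint32_t wide = (uint32_t) bits << 16;  /* zero-extend, then shift by 16 */
  float f;
  memcpy (&f, &wide, sizeof f);           /* reinterpret the bits as float */
  return f;
}

/* Hypothetical helper: equality of two bfloat16 values, done as a float
   comparison on the widened operands.  NaN operands compare unequal and
   +0.0 equals -0.0, as with any IEEE comparison.  */
static int
bf16_eq (uint16_t a, uint16_t b)
{
  return bf16_bits_to_float (a) == bf16_bits_to_float (b);
}
```

For example, the bit pattern 0x3F80 widens to 0x3F800000, which is 1.0f, and 0x7FC0 widens to a quiet NaN.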