On 5/30/19 2:53 AM, Hongtao Liu wrote: > On Thu, May 30, 2019 at 3:23 AM Jeff Law <l...@redhat.com> wrote: >> On 5/9/19 10:54 PM, Hongtao Liu wrote: >>> On Fri, May 10, 2019 at 3:55 AM Jeff Law <l...@redhat.com> wrote: >>>> On 5/6/19 11:38 PM, Hongtao Liu wrote: >>>>> Hi Uros and GCC: >>>>> This patch is to fix ix86_expand_sse_comi_round whose implementation >>>>> was not correct. >>>>> New implentation aligns with _mm_cmp_round_s[sd]_mask. >>>>> >>>>> Bootstrap and regression tests for x86 is fine. >>>>> Ok for trunk? >>>>> >>>>> >>>>> ChangeLog: >>>>> gcc/ >>>>> * config/i386/i386-expand.c (ix86_expand_sse_comi_round): >>>>> Modified, original implementation isn't correct. >>>>> >>>>> gcc/testsuite >>>>> * gcc.target/i386/avx512f-vcomisd-2.c: New runtime tests. >>>>> * gcc.target/i386/avx512f-vcomisd-2.c: Likewise. >>>> So you'll have to bear with me, I'm not really familiar with this code, >>>> but in the absence of a maintainer I'll try to work through it. >>>> >>>> >>>>> -- BR, Hongtao >>>>> >>>>> >>>>> 0001-Fix-ix86_expand_sse_comi_round.patch >>>>> >>>>> Index: gcc/ChangeLog >>>>> =================================================================== >>>>> --- gcc/ChangeLog (revision 270933) >>>>> +++ gcc/ChangeLog (working copy) >>>>> @@ -1,3 +1,11 @@ >>>>> +2019-05-06 H.J. Lu <hongjiu...@intel.com> >>>>> + Hongtao Liu <hongtao....@intel.com> >>>>> + >>>>> + PR Target/89750 >>>>> + PR Target/86444 >>>>> + * config/i386/i386-expand.c (ix86_expand_sse_comi_round): >>>>> + Modified, original implementation isn't correct. >>>>> + >>>>> 2019-05-06 Segher Boessenkool <seg...@kernel.crashing.org> >>>>> >>>>> * config/rs6000/rs6000.md (FIRST_ALTIVEC_REGNO, LAST_ALTIVEC_REGNO) >>>>> Index: gcc/config/i386/i386-expand.c >>>>> =================================================================== >>>>> --- gcc/config/i386/i386-expand.c (revision 270933) >>>>> +++ gcc/config/i386/i386-expand.c (working copy) >>>>> @@ -9853,18 +9853,24 @@ >>>>> const struct insn_data_d *insn_p = &insn_data[icode]; >>>>> machine_mode mode0 = insn_p->operand[0].mode; >>>>> machine_mode mode1 = insn_p->operand[1].mode; >>>>> - enum rtx_code comparison = UNEQ; >>>>> - bool need_ucomi = false; >>>>> >>>>> /* See avxintrin.h for values. */ >>>>> - enum rtx_code comi_comparisons[32] = >>>>> + static const enum rtx_code comparisons[32] = >>>> So I assume the comment refers to the _CMP_* #defines in avxintrin.h? >>>> >>> Yes. >>>>> { >>>>> - UNEQ, GT, GE, UNORDERED, LTGT, UNLE, UNLT, ORDERED, UNEQ, UNLT, >>>>> - UNLE, LT, LTGT, GE, GT, LT, UNEQ, GT, GE, UNORDERED, LTGT, UNLE, >>>>> - UNLT, ORDERED, UNEQ, UNLT, UNLE, LT, LTGT, GE, GT, LT >>>>> + EQ, LT, LE, UNORDERED, NE, UNGE, UNGT, ORDERED, >>>>> + EQ, UNLT, UNLE, UNORDERED, LTGT, GE, GT, ORDERED, >>>>> + EQ, LT, LE, UNORDERED, NE, UNGE, UNGT, ORDERED, >>>>> + EQ, UNLT, UNLE, UNORDERED, LTGT, GE, GT, ORDERED >>>>> }; >>>> For CMP_EQ_UQ aren't we looking for an unordered comparison, so UNEQ >>>> seems right, but you're using EQ. Can you double-check this? If it's >>>> wrong, then please make sure we cover this case with a test. >>>> >>> Avx512f-vcomis[sd]-2.c covers all 32 compare predicates. >>> UNEQ and EQ behave differently when either operand is NAN, besides >>> they're the same. >>> Since NAN operands are handled separtely, so EQ/UNEQ makes no >>> difference, That why this passes cover tests. >>> I'll correct it. >> Ah. Thanks. FWIW my approach was to walk through the _CMP_* defines in >> avxintrin.h and map that to what I thought the comparison ought to be. >> Then I reviewed my result against your patch. I got a couple wrong, but >> could easily see my mistake. The only one I couldn't reconcile was the >> CMP_EQ_UQ. Knowing the NaNs are handled separately makes it clear. >> Thanks gain. >> >> >> >>>> >>>> >>>>> @@ -9932,11 +10021,37 @@ >>>>> } >>>>> >>>>> emit_insn (pat); >>>>> + >>>>> + /* XXX: Set CCFPmode and check a different CCmode. Does it work >>>>> + correctly? */ >>>>> + if (GET_MODE (set_dst) != mode) >>>>> + set_dst = gen_rtx_REG (mode, REGNO (set_dst)); >>>> This looks worrisome, even without the cryptic comment. I don't think >>>> you can just blindly change the mode like that. Unless you happen to >>>> know that the only things you test in the new mode were set in precisely >>>> the same way as the old mode. >>>> >>> Modified as: >>> + /* NB: Set CCFPmode and check a different CCmode. */ >>> + if (GET_MODE (set_dst) != mode) >>> + set_dst = gen_rtx_REG (mode, FLAGS_REG); >> That might actually be worse. The mode carries semantic information >> about where to find the various condition codes within the flags >> register and which condition codes are valid. The register number >> determines which (of possibly many) flags registers we are querying. >> >> Thus if the mode of SET_DEST is not the same as MODE, then there is a >> mismatch between the point where the condition codes were set and where >> we want to use them. >> >> That can only be safe is the condition codes set in the new mode are a >> strict subset of the condition codes set in the old mode and they're >> found in the same place within the same flags register. >> > vcomiss/vcomisd will set ZF,PF,CF,OF,SF,AF flags in FLAGS_REG which is > treated > as CCFPmode. As you mentioned, It's safe to use subset of the > condition codes in a > new mode, in the same FLAGS_REG, so I've modified as > > 1. Use CCZmode instead of CCmode. > > + case EQ: > + /* NB: COMI/UCOMI will set ZF with NAN operands. Use CCZmode for > + _CMP_EQ_OQ/_CMP_EQ_OS. */ > + check_unordered = true; > + mode = CCZmode; > + break; > + case NE: > + /* NB: COMI/UCOMI will set ZF with NAN operands. Use CCZmode for > + _CMP_NEQ_UQ/_CMP_NEQ_US. */ > + gcc_assert (!ordered); > + check_unordered = true; > + mode = CCZmode; > + const_val = const1_rtx; > > 2. I've double checked that condition codes in new mode must be subset > of that in old mode. > Also gcc_assert is added to guard. > > + /* NB: Set CCFPmode and check a different CCmode which is in subset > + of CCFPmode. */ > + if (GET_MODE (set_dst) != mode) > + { > + gcc_assert (mode == CCAmode || mode == CCCmode > + || mode == CCOmode || mode == CCPmode > + || mode == CCSmode || mode == CCZmode); > + set_dst = gen_rtx_REG (mode, FLAGS_REG); > + } > + > > >> Maybe a simple example would help. Consider a fairly standard target >> with arithmetic insns that set ZNV and logicals that set ZN and clobber V. >> >> Arithmetic insns will have a set of the flags register with one mode >> (CC_ZNV for example) and logicals would use a different mode for the >> flags register (CC_ZN for example). >> >> Now consider if the architecture has memory bit operations. Ie, you can >> query the state of a bit in memory. Unfortunately the result is stored >> in the C bit of the condition code register (rather than in the Z bit >> like you might hope). Yes, this comes from a real world example. >> >> During compare elimination, the compiler will try to eliminate a compare >> insn and instead use the condition codes set by a prior insn such as an >> arithmetic, logical, bit test, etc. The mode of the flags register will >> indicate which bits are valid and which bits can be tested. So for our >> hypothetical architecture we might have modes CC_ZNV, CC_ZN, CC_Z_IN_C >> for arithmetic, logical and bit testing. >> >> If we are using the result of a prior bit test to eliminate a compare, >> the mode of the flags register on the bit test would be CC_Z_N_C as >> would the mode of the conditional branch. >> >> We can not blindly change the mode to CC_ZNV because the bit we want is >> not in the ZNV bits of the flag register, it's actually in the C bit. >> >> Things might be even more complicated if there's a separate register for >> floating point condition codes or multiple integer condition code registers. >> >> Hopefully that makes things clear in terms of how the mode of the flags >> register is used as well as the register number. You'd have to dig into >> the x86 port's use of the flags register to know what changes are safe >> and which aren't -- and that's the analysis I'd need to help move this >> along. >> >> Jeff >> > Thanks a lot for your detailed explanation, It helps me a lot. > > Here is updated patch. > -- BR, Hongtao > > > 0001-Fix-ix86_expand_sse_comi_round_v3.patch > > Index: gcc/ChangeLog > =================================================================== > --- gcc/ChangeLog (revision 271761) > +++ gcc/ChangeLog (working copy) > @@ -2372,6 +2372,14 @@ > * tree-ssa-phiopt.c (two_value_replacement): Fix a typo in parameter > detection. > > +2019-05-06 H.J. Lu <hongjiu...@intel.com> > + Hongtao Liu <hongtao....@intel.com> > + > + PR target/89750 > + PR target/86444 > + * config/i386/i386-expand.c (ix86_expand_sse_comi_round): > + Modified, original implementation isn't correct. OK for the trunk.
Thanks for your patience, jeff