On 5/30/19 2:53 AM, Hongtao Liu wrote:
> On Thu, May 30, 2019 at 3:23 AM Jeff Law <l...@redhat.com> wrote:
>> On 5/9/19 10:54 PM, Hongtao Liu wrote:
>>> On Fri, May 10, 2019 at 3:55 AM Jeff Law <l...@redhat.com> wrote:
>>>> On 5/6/19 11:38 PM, Hongtao Liu wrote:
>>>>> Hi Uros and GCC:
>>>>>   This patch is to fix ix86_expand_sse_comi_round whose implementation
>>>>> was not correct.
>>>>>   New implentation aligns with _mm_cmp_round_s[sd]_mask.
>>>>>
>>>>> Bootstrap and regression tests for x86 is fine.
>>>>> Ok for trunk?
>>>>>
>>>>>
>>>>> ChangeLog:
>>>>> gcc/
>>>>>    * config/i386/i386-expand.c (ix86_expand_sse_comi_round):
>>>>>    Modified, original implementation isn't correct.
>>>>>
>>>>> gcc/testsuite
>>>>>    * gcc.target/i386/avx512f-vcomisd-2.c: New runtime tests.
>>>>>    * gcc.target/i386/avx512f-vcomisd-2.c: Likewise.
>>>> So you'll have to bear with me, I'm not really familiar with this code,
>>>> but in the absence of a maintainer I'll try to work through it.
>>>>
>>>>
>>>>> -- BR, Hongtao
>>>>>
>>>>>
>>>>> 0001-Fix-ix86_expand_sse_comi_round.patch
>>>>>
>>>>> Index: gcc/ChangeLog
>>>>> ===================================================================
>>>>> --- gcc/ChangeLog     (revision 270933)
>>>>> +++ gcc/ChangeLog     (working copy)
>>>>> @@ -1,3 +1,11 @@
>>>>> +2019-05-06  H.J. Lu  <hongjiu...@intel.com>
>>>>> +         Hongtao Liu  <hongtao....@intel.com>
>>>>> +
>>>>> +     PR Target/89750
>>>>> +     PR Target/86444
>>>>> +     * config/i386/i386-expand.c (ix86_expand_sse_comi_round):
>>>>> +     Modified, original implementation isn't correct.
>>>>> +
>>>>>  2019-05-06  Segher Boessenkool  <seg...@kernel.crashing.org>
>>>>>
>>>>>       * config/rs6000/rs6000.md (FIRST_ALTIVEC_REGNO, LAST_ALTIVEC_REGNO)
>>>>> Index: gcc/config/i386/i386-expand.c
>>>>> ===================================================================
>>>>> --- gcc/config/i386/i386-expand.c     (revision 270933)
>>>>> +++ gcc/config/i386/i386-expand.c     (working copy)
>>>>> @@ -9853,18 +9853,24 @@
>>>>>    const struct insn_data_d *insn_p = &insn_data[icode];
>>>>>    machine_mode mode0 = insn_p->operand[0].mode;
>>>>>    machine_mode mode1 = insn_p->operand[1].mode;
>>>>> -  enum rtx_code comparison = UNEQ;
>>>>> -  bool need_ucomi = false;
>>>>>
>>>>>    /* See avxintrin.h for values.  */
>>>>> -  enum rtx_code comi_comparisons[32] =
>>>>> +  static const enum rtx_code comparisons[32] =
>>>> So I assume the comment refers to the _CMP_* #defines in avxintrin.h?
>>>>
>>>   Yes.
>>>>>      {
>>>>> -      UNEQ, GT, GE, UNORDERED, LTGT, UNLE, UNLT, ORDERED, UNEQ, UNLT,
>>>>> -      UNLE, LT, LTGT, GE, GT, LT, UNEQ, GT, GE, UNORDERED, LTGT, UNLE,
>>>>> -      UNLT, ORDERED, UNEQ, UNLT, UNLE, LT, LTGT, GE, GT, LT
>>>>> +      EQ, LT, LE, UNORDERED, NE, UNGE, UNGT, ORDERED,
>>>>> +      EQ, UNLT, UNLE, UNORDERED, LTGT, GE, GT, ORDERED,
>>>>> +      EQ, LT, LE, UNORDERED, NE, UNGE, UNGT, ORDERED,
>>>>> +      EQ, UNLT, UNLE, UNORDERED, LTGT, GE, GT, ORDERED
>>>>>      };
>>>> For CMP_EQ_UQ aren't we looking for an unordered comparison, so UNEQ
>>>> seems right, but you're using EQ.  Can you double-check this?  If it's
>>>> wrong, then please make sure we cover this case with a test.
>>>>
>>> Avx512f-vcomis[sd]-2.c covers all 32 compare predicates.
>>> UNEQ and EQ behave differently when either operand is NAN, besides
>>> they're the same.
>>> Since NAN operands are handled separtely, so EQ/UNEQ makes no
>>> difference, That why this passes cover tests.
>>> I'll correct it.
>> Ah.  Thanks.  FWIW my approach was to walk through the _CMP_* defines in
>> avxintrin.h and map that to what I thought the comparison ought to be.
>> Then I reviewed my result against your patch.  I got a couple wrong, but
>> could easily see my mistake.  The only one I couldn't reconcile was the
>> CMP_EQ_UQ.  Knowing the NaNs are handled separately makes it clear.
>> Thanks gain.
>>
>>
>>
>>>>
>>>>
>>>>> @@ -9932,11 +10021,37 @@
>>>>>      }
>>>>>
>>>>>    emit_insn (pat);
>>>>> +
>>>>> +  /* XXX: Set CCFPmode and check a different CCmode.  Does it work
>>>>> +     correctly?  */
>>>>> +  if (GET_MODE (set_dst) != mode)
>>>>> +    set_dst = gen_rtx_REG (mode, REGNO (set_dst));
>>>> This looks worrisome, even without the cryptic comment.  I don't think
>>>> you can just blindly change the mode like that.  Unless you happen to
>>>> know that the only things you test in the new mode were set in precisely
>>>> the same way as the old mode.
>>>>
>>> Modified as:
>>> +  /* NB: Set CCFPmode and check a different CCmode.  */
>>> +  if (GET_MODE (set_dst) != mode)
>>> +    set_dst = gen_rtx_REG (mode, FLAGS_REG);
>> That might actually be worse.  The mode carries semantic information
>> about where to find the various condition codes within the flags
>> register and which condition codes are valid.  The register number
>> determines which (of possibly many) flags registers we are querying.
>>
>> Thus if the mode of SET_DEST is not the same as MODE, then there is a
>> mismatch between the point where the condition codes were set and where
>> we want to use them.
>>
>> That can only be safe is the condition codes set in the new mode are a
>> strict subset of the condition codes set in the old mode and they're
>> found in the same place within the same flags register.
>>
> vcomiss/vcomisd will set ZF,PF,CF,OF,SF,AF flags in  FLAGS_REG which is 
> treated
> as CCFPmode. As you mentioned, It's safe to use subset of the
> condition codes in a
> new mode,  in the same FLAGS_REG,  so I've modified as
> 
> 1. Use CCZmode instead of CCmode.
> 
> +    case EQ:
> +      /* NB: COMI/UCOMI will set ZF with NAN operands.  Use CCZmode for
> + _CMP_EQ_OQ/_CMP_EQ_OS.  */
> +      check_unordered = true;
> +      mode = CCZmode;
> +      break;
> +    case NE:
> +      /* NB: COMI/UCOMI will set ZF with NAN operands.  Use CCZmode for
> + _CMP_NEQ_UQ/_CMP_NEQ_US.  */
> +      gcc_assert (!ordered);
> +      check_unordered = true;
> +      mode = CCZmode;
> +      const_val = const1_rtx;
> 
> 2. I've double checked that condition codes in new mode must be subset
> of that in old mode.
> Also gcc_assert is added to guard.
> 
> +  /* NB: Set CCFPmode and check a different CCmode which is in subset
> +     of CCFPmode.  */
> +  if (GET_MODE (set_dst) != mode)
> +    {
> +      gcc_assert (mode == CCAmode || mode == CCCmode
> +   || mode == CCOmode || mode == CCPmode
> +   || mode == CCSmode || mode == CCZmode);
> +      set_dst = gen_rtx_REG (mode, FLAGS_REG);
> +    }
> +
> 
> 
>> Maybe a simple example would help.  Consider a fairly standard target
>> with arithmetic insns that set ZNV and logicals that set ZN and clobber V.
>>
>> Arithmetic insns will have a set of the flags register with one mode
>> (CC_ZNV for example) and logicals would use a different mode for the
>> flags register (CC_ZN for example).
>>
>> Now consider if the architecture has memory bit operations.  Ie, you can
>> query the state of a bit in memory.  Unfortunately the result is stored
>> in the C bit of the condition code register (rather than in the Z bit
>> like you might hope).  Yes, this comes from a real world example.
>>
>> During compare elimination, the compiler will try to eliminate a compare
>> insn and instead use the condition codes set by a prior insn such as an
>> arithmetic, logical, bit test, etc.  The mode of the flags register will
>> indicate which bits are valid and which bits can be tested.  So for our
>> hypothetical architecture we might have modes CC_ZNV, CC_ZN, CC_Z_IN_C
>> for arithmetic, logical and bit testing.
>>
>> If we are using the result of a prior bit test to eliminate a compare,
>> the mode of the flags register on the bit test would be CC_Z_N_C as
>> would the mode of the conditional branch.
>>
>> We can not blindly change the mode to CC_ZNV because the bit we want is
>> not in the ZNV  bits of the flag register, it's actually in the C bit.
>>
>> Things might be even more complicated if there's a separate register for
>> floating point condition codes or multiple integer condition code registers.
>>
>> Hopefully that makes things clear in terms of how the mode of the flags
>> register is used as well as the register number.  You'd have to dig into
>> the x86 port's use of the flags register to know what changes are safe
>> and which aren't -- and that's the analysis I'd need to help move this
>> along.
>>
>> Jeff
>>
> Thanks a lot for your detailed explanation, It helps me a lot.
> 
> Here is updated patch.
> -- BR, Hongtao
> 
> 
> 0001-Fix-ix86_expand_sse_comi_round_v3.patch
> 
> Index: gcc/ChangeLog
> ===================================================================
> --- gcc/ChangeLog     (revision 271761)
> +++ gcc/ChangeLog     (working copy)
> @@ -2372,6 +2372,14 @@
>       * tree-ssa-phiopt.c (two_value_replacement): Fix a typo in parameter
>       detection.
>  
> +2019-05-06  H.J. Lu  <hongjiu...@intel.com>
> +         Hongtao Liu  <hongtao....@intel.com>
> +
> +     PR target/89750
> +     PR target/86444
> +     * config/i386/i386-expand.c (ix86_expand_sse_comi_round):
> +     Modified, original implementation isn't correct.
OK for the trunk.

Thanks for your patience,
jeff

Reply via email to