Re: PING^1: [PATCH] i386: Generate standard floating point scalar operation patterns

Jeff Law Wed, 19 Jun 2019 12:22:11 -0700

On 6/3/19 4:50 PM, H.J. Lu wrote:
> On Tue, May 21, 2019 at 8:54 AM H.J. Lu <hjl.to...@gmail.com> wrote:
>>
>> On Wed, May 15, 2019 at 2:29 PM Richard Sandiford
>> <richard.sandif...@arm.com> wrote:
>>>
>>> "H.J. Lu" <hjl.to...@gmail.com> writes:
>>>> On Thu, Feb 7, 2019 at 9:49 AM H.J. Lu <hjl.to...@gmail.com> wrote:
>>>>>
>>>>> Standard scalar operation patterns which preserve the rest of the vector
>>>>> look like
>>>>>
>>>>>      (vec_merge:V2DF
>>>>>        (vec_duplicate:V2DF
>>>>>          (op:DF (vec_select:DF (reg/v:V2DF 85 [ x ])
>>>>>                 (parallel [ (const_int 0 [0])]))
>>>>>          (reg:DF 87)))
>>>>>        (reg/v:V2DF 85 [ x ])
>>>>>        (const_int 1 [0x1]))
>>>>>
>>>>> Add such patterns to the i386 backend and convert VEC_CONCAT patterns
>>>>> to standard scalar operation patterns.
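To make the semantics of that pattern concrete, here is a small C model of it. It is only a sketch under stated assumptions: the v2df struct and the vec_merge/vec_duplicate/scalar_op_pattern helpers are hypothetical stand-ins for the RTL codes, not GCC internals. Following the RTL documentation, a set bit in the vec_merge mask selects that lane from the first operand, so mask 1 takes lane 0 from the duplicated result and lane 1 from x.

```c
#include <assert.h>

/* Hypothetical two-lane stand-in for a V2DF value (not GCC code).  */
typedef struct { double e[2]; } v2df;

typedef double (*binop) (double, double);

/* Example operator standing in for "op" in the pattern.  */
static double op_plus (double a, double b) { return a + b; }

/* (vec_merge a b mask): lane i comes from A when bit i of MASK is set,
   otherwise from B.  */
static v2df vec_merge (v2df a, v2df b, unsigned mask)
{
  v2df r;
  assert (mask <= 3);   /* only two lanes */
  for (int i = 0; i < 2; i++)
    r.e[i] = ((mask >> i) & 1) ? a.e[i] : b.e[i];
  return r;
}

/* (vec_duplicate s): broadcast the scalar S into both lanes.  */
static v2df vec_duplicate (double s)
{
  v2df r = { { s, s } };
  return r;
}

/* (vec_merge (vec_duplicate (op (vec_select x [0]) y)) x (const_int 1)):
   apply OP to element 0 of X only and keep the rest of X.  */
static v2df scalar_op_pattern (binop op, v2df x, double y)
{
  double lane0 = op (x.e[0], y);   /* vec_select x [0], then op */
  return vec_merge (vec_duplicate (lane0), x, 1);
}
```

For example, scalar_op_pattern (op_plus, x, 2.0) with x = {1.5, 4.0} gives {3.5, 4.0}: only element 0 is recomputed.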
>>>
>>> It looks like there's some variety in the patterns used, e.g.:
>>>
>>> (define_insn "<sse>_vm<code><mode>3<mask_scalar_name><round_saeonly_scalar_name>"
>>>   [(set (match_operand:VF_128 0 "register_operand" "=x,v")
>>>         (vec_merge:VF_128
>>>           (smaxmin:VF_128
>>>             (match_operand:VF_128 1 "register_operand" "0,v")
>>>             (match_operand:VF_128 2 "vector_operand" "xBm,<round_saeonly_scalar_constraint>"))
>>>          (match_dup 1)
>>>          (const_int 1)))]
>>>   "TARGET_SSE"
>>>   "@
>>>    <maxmin_float><ssescalarmodesuffix>\t{%2, %0|%0, %<iptr>2}
>>>    v<maxmin_float><ssescalarmodesuffix>\t{<round_saeonly_scalar_mask_op3>%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %<iptr>2<round_saeonly_scalar_mask_op3>}"
>>>   [(set_attr "isa" "noavx,avx")
>>>    (set_attr "type" "sse")
>>>    (set_attr "btver2_sse_attr" "maxmin")
>>>    (set_attr "prefix" "<round_saeonly_scalar_prefix>")
>>>    (set_attr "mode" "<ssescalarmode>")])
>>>
>>> makes the operand a full vector operation, which seems simpler.
>>
>> This pattern is used to implement scalar smaxmin intrinsics.
>>
>>> The above would then be:
>>>
>>>       (vec_merge:V2DF
>>>         (op:V2DF
>>>           (reg:V2DF 85)
>>>           (vec_duplicate:V2DF (reg:DF 87)))
>>>         (reg/v:V2DF 85 [ x ])
>>>         (const_int 1 [0x1])]))
>>>
>>> I guess technically the two have different faulting behaviour though,
>>> since the smaxmin gets applied to all elements, not just element 0.
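The difference in faulting behaviour can be sketched with a plain C model of the two RTL forms, using MULT as the operator. This is an illustration under assumptions, not GCC code: v2df and the mul_*_form and mul_is_invalid helpers are hypothetical, and instead of reading the FP status flags the model counts the lanes in which the multiplication would raise IEEE "invalid" (zero times infinity). With x = {2.0, inf} and y = 0.0 both forms return {0.0, inf}, but only the full-vector form also multiplies inf by 0 in lane 1, whose product is then discarded.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical two-lane stand-in for a V2DF value (not GCC code).  */
typedef struct { double e[2]; } v2df;

/* Nonzero if A * B would raise IEEE "invalid" for non-NaN inputs,
   i.e. zero times infinity.  */
static int mul_is_invalid (double a, double b)
{
  return (a == 0.0 && isinf (b)) || (isinf (a) && b == 0.0);
}

/* Scalar form:
   (vec_merge (vec_duplicate (mult (vec_select x [0]) y)) x (const_int 1)).
   Only element 0 is multiplied; *FAULTS counts invalid lanes.  */
static v2df mul_scalar_form (v2df x, double y, int *faults)
{
  v2df r = x;                      /* lane 1 is kept from X */
  *faults = mul_is_invalid (x.e[0], y);
  r.e[0] = x.e[0] * y;
  return r;
}

/* Full-vector form:
   (vec_merge (mult x (vec_duplicate y)) x (const_int 1)).
   Both lanes are multiplied, then lane 1 of the product is dropped.  */
static v2df mul_vector_form (v2df x, double y, int *faults)
{
  v2df prod, r;
  *faults = 0;
  for (int i = 0; i < 2; i++)
    {
      *faults += mul_is_invalid (x.e[i], y);
      prod.e[i] = x.e[i] * y;
    }
  r.e[0] = prod.e[0];              /* mask bit 0 set: take the product */
  r.e[1] = x.e[1];                 /* mask bit 1 clear: keep X */
  return r;
}
```

The results agree lane for lane; only the count of would-be invalid operations differs, which is the distinction between the two forms.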
>>
>> This is the issue.   We don't use the correct mode for scalar instructions:
>>
>> ---
>> #include <immintrin.h>
>>
>> __m128d
>> foo1 (__m128d x, double *p)
>> {
>>   __m128d y = _mm_load_sd (p);
>>   return _mm_max_pd (x, y);
>> }
>> ---
>>
>> movq (%rdi), %xmm1
>> maxpd %xmm1, %xmm0
>> ret
>>
>>
>> Here is the updated patch to add standard floating point scalar
>> operation patterns to the i386 backend. Then we can do:
>>
>> ---
>> #include <immintrin.h>
>>
>> extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>> _new_mm_max_pd (__m128d __A, __m128d __B)
>> {
>>   __A[0] = __A[0] > __B[0] ? __A[0] : __B[0];
>>   return __A;
>> }
>>
>> __m128d
>> foo2 (__m128d x, double *p)
>> {
>>   __m128d y = _mm_load_sd (p);
>>   return _new_mm_max_pd (x, y);
>> }
>>
>> maxsd (%rdi), %xmm0
>> ret
>>
>> We should use generic vector operations to implement i386 intrinsics
>> as much as we can.
>>
>>> The patch seems very specific.  E.g. why just PLUS, MINUS, MULT and DIV?
>>
>> This patch only adds +, -, *, /, > and <. We can add more if there
>> are testcases for them.
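As a rough check of the generic-vector approach, the sketch below writes those six operators as lane-0 updates in the style of _new_mm_max_pd and compares them with the corresponding SSE2 scalar intrinsics (_mm_add_sd, _mm_sub_sd, _mm_mul_sd, _mm_div_sd, _mm_max_sd, _mm_min_sd), with > and < expressed as max and min. The gen_*_sd and same_v2df names are hypothetical. It assumes GCC or Clang vector subscripting on __m128d and an x86-64 target, where SSE2 is baseline; it compares values only and says nothing about which instructions the compiler emits.

```c
#include <assert.h>
#include <emmintrin.h>

/* Hypothetical generic-vector versions of the scalar SSE2 intrinsics.
   Each computes lane 0 and passes lane 1 of A through unchanged.  */
static inline __m128d gen_add_sd (__m128d a, __m128d b) { a[0] += b[0]; return a; }
static inline __m128d gen_sub_sd (__m128d a, __m128d b) { a[0] -= b[0]; return a; }
static inline __m128d gen_mul_sd (__m128d a, __m128d b) { a[0] *= b[0]; return a; }
static inline __m128d gen_div_sd (__m128d a, __m128d b) { a[0] /= b[0]; return a; }

/* Operand order follows MAXSD/MINSD: when the comparison is false,
   the second operand is returned.  */
static inline __m128d gen_max_sd (__m128d a, __m128d b)
{
  a[0] = a[0] > b[0] ? a[0] : b[0];
  return a;
}

static inline __m128d gen_min_sd (__m128d a, __m128d b)
{
  a[0] = a[0] < b[0] ? a[0] : b[0];
  return a;
}

/* Nonzero if both lanes of A and B compare equal.  */
static int same_v2df (__m128d a, __m128d b)
{
  return a[0] == b[0] && a[1] == b[1];
}
```

Note that _mm_set_pd takes the high lane first, so _mm_set_pd (7.0, 1.0) builds {1.0, 7.0}.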
>>
>>> Thanks,
>>> Richard
>>>
>>>
>>>>>
>>>>> gcc/
>>>>>
>>>>>         PR target/54855
>>>>>         * simplify-rtx.c (simplify_binary_operation_1): Convert
>>>>>         VEC_CONCAT patterns to standard scalar operation patterns.
>>>>>         * config/i386/sse.md (*<sse>_vm<plusminus_insn><mode>3): New.
>>>>>         (*<sse>_vm<multdiv_mnemonic><mode>3): Likewise.
>>>>>
>>>>> gcc/testsuite/
>>>>>
>>>>>         PR target/54855
>>>>>         * gcc.target/i386/pr54855-1.c: New test.
>>>>>         * gcc.target/i386/pr54855-2.c: Likewise.
>>>>>         * gcc.target/i386/pr54855-3.c: Likewise.
>>>>>         * gcc.target/i386/pr54855-4.c: Likewise.
>>>>>         * gcc.target/i386/pr54855-5.c: Likewise.
>>>>>         * gcc.target/i386/pr54855-6.c: Likewise.
>>>>>         * gcc.target/i386/pr54855-7.c: Likewise.
>>>>
>>>> PING:
>>>>
>>>> https://gcc.gnu.org/ml/gcc-patches/2019-02/msg00398.html
>>
>> Thanks.
>>
> 
> PING:
> 
> https://gcc.gnu.org/ml/gcc-patches/2019-05/msg01416.html
The simplify-rtx changes are OK, as are the x86 backend changes (either
the original version that just handled basic arithmetic operators or the
subsequent one that added support for minmax and setv2df_0).


Jeff
