2016-04-28 13:43 GMT+03:00 Uros Bizjak <ubiz...@gmail.com>:
> On Thu, Apr 28, 2016 at 12:36 PM, Ilya Enkovich <enkovich....@gmail.com> wrote:
>> 2016-04-27 22:58 GMT+03:00 Uros Bizjak <ubiz...@gmail.com>:
>>> Hello!
>>>
>>> This RFC patch illustrates the idea of using STV pass to load/store
>>> any TImode constant using SSE insns. The testcase:
>>>
>>> --cut here--
>>> __int128 x;
>>>
>>> __int128 test_1 (void)
>>> {
>>>   x = (__int128) 0x00112233;
>>> }
>>>
>>> __int128 test_2 (void)
>>> {
>>>   x = ((__int128) 0x0011223344556677 << 64);
>>> }
>>>
>>> __int128 test_3 (void)
>>> {
>>>   x = ((__int128) 0x0011223344556677 << 64) + (__int128) 0x0011223344556677;
>>> }
>>> --cut here--
>>>
>>> currently compiles (-O2) on x86_64 to:
>>>
>>> test_1:
>>>         movq    $1122867, x(%rip)
>>>         movq    $0, x+8(%rip)
>>>         ret
>>>
>>> test_2:
>>>         xorl    %eax, %eax
>>>         movabsq $4822678189205111, %rdx
>>>         movq    %rax, x(%rip)
>>>         movq    %rdx, x+8(%rip)
>>>         ret
>>>
>>> test_3:
>>>         movabsq $4822678189205111, %rax
>>>         movabsq $4822678189205111, %rdx
>>>         movq    %rax, x(%rip)
>>>         movq    %rdx, x+8(%rip)
>>>         ret
>>>
>>> However, using the attached patch, we compile all tests to:
>>>
>>> test:
>>>         movdqa  .LC0(%rip), %xmm0
>>>         movaps  %xmm0, x(%rip)
>>>         ret
>>>
>>> Ilya, HJ - do you think the new sequences are better, or - as suggested by
>>> Jakub - are they beneficial with the STV pass, as we are now able to load
>>> any immediate value? A variant of this patch can also be used to load
>>> DImode values in the 32-bit STV pass.
>>>
>>> Uros.
>>
>> Hi,
>>
>> Why don't we have two movq instructions in all three cases now?  Is it
>> because of late split?
>
> movq can handle only 32bit sign-extended immediates. There is actually
> room for improvement in test_2, where we could directly store 0 to
> x(%rip).

Right.  In this case timode_scalar_chain::compute_convert_gain should
analyze immediate values used in a chain.
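
To make the immediate check concrete, here is a minimal standalone sketch
(not GCC code).  The names fits_simm32 and scalar_store_insns are
hypothetical, and the instruction counts assume one movq per half that fits
a sign-extended 32-bit immediate, and a movabsq plus a movq for any other
half.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if V can be encoded as a 32-bit immediate that the CPU
   sign-extends to 64 bits, as in `movq $imm32, mem`.  */
bool
fits_simm32 (int64_t v)
{
  return v >= INT32_MIN && v <= INT32_MAX;
}

/* Hypothetical helper: number of scalar instructions needed to store a
   128-bit constant given as two 64-bit halves.  A half that fits costs
   one movq; any other half needs a movabsq into a register plus a movq.  */
int
scalar_store_insns (int64_t lo, int64_t hi)
{
  return (fits_simm32 (lo) ? 1 : 2) + (fits_simm32 (hi) ? 1 : 2);
}
```

Under these assumptions test_1 needs 2 instructions, test_3 needs 4, and
test_2 needs 3, which is the improved sequence with a direct store of 0
rather than the 4 instructions emitted today.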

Thanks,
Ilya

>
> Uros.
>
>> I wouldn't say SSE load+store is always better than two movq instructions.
>> But it obviously can enable bigger chains for STV, which is good.  I think
>> you should adjust the cost model to handle immediates so that it makes the
>> proper decision.
>>
>> That's what I have in my draft for DImode immediates:
>>
>> @@ -3114,6 +3123,20 @@ scalar_chain::build (bitmap candidates, unsigned insn_uid)
>>    BITMAP_FREE (queue);
>>  }
>>
>> +/* Return a cost of building a vector constant
>> +   instead of using a scalar one.  */
>> +
>> +int
>> +scalar_chain::vector_const_cost (rtx exp)
>> +{
>> +  gcc_assert (CONST_INT_P (exp));
>> +
>> +  if (const0_operand (exp, GET_MODE (exp))
>> +      || constm1_operand (exp, GET_MODE (exp)))
>> +    return COSTS_N_INSNS (1);
>> +  return ix86_cost->sse_load[1];
>> +}
>> +
>>  /* Compute a gain for chain conversion.  */
>>
>>  int
>> @@ -3145,11 +3168,25 @@ scalar_chain::compute_convert_gain ()
>>                || GET_CODE (src) == IOR
>>                || GET_CODE (src) == XOR
>>                || GET_CODE (src) == AND)
>> -       gain += ix86_cost->add;
>> +       {
>> +         gain += ix86_cost->add;
>> +         if (CONST_INT_P (XEXP (src, 0)))
>> +           gain -= scalar_chain::vector_const_cost (XEXP (src, 0));
>> +         if (CONST_INT_P (XEXP (src, 1)))
>> +           gain -= scalar_chain::vector_const_cost (XEXP (src, 1));
>> +       }
>>        else if (GET_CODE (src) == COMPARE)
>>         {
>>           /* Assume comparison cost is the same.  */
>>         }
>> +      else if (GET_CODE (src) == CONST_INT)
>> +       {
>> +         if (REG_P (dst))
>> +           gain += COSTS_N_INSNS (2);
>> +         else if (MEM_P (dst))
>> +           gain += 2 * ix86_cost->int_store[2] - ix86_cost->sse_store[1];
>> +         gain -= scalar_chain::vector_const_cost (src);
>> +       }
>>        else
>>         gcc_unreachable ();
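
The arithmetic of the CONST_INT case in the draft can be tried outside the
compiler with a small sketch.  Everything here is a stand-in: struct
toy_costs and the toy_* functions are hypothetical, the cost values are
supplied by the caller rather than taken from any real ix86_cost table, and
only COSTS_N_INSNS follows GCC's own scaling.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors GCC's COSTS_N_INSNS scaling (N * 4 in rtl.h).  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical stand-in for the ix86_cost fields used by the draft.  */
struct toy_costs
{
  int int_store;  /* one scalar store of half the constant */
  int sse_store;  /* one SSE store of the whole constant */
  int sse_load;   /* one SSE load of the whole constant */
};

/* Cost of materializing VALUE in a vector register.  All-zeros and
   all-ones need a single register-only insn (e.g. pxor or pcmpeqd);
   anything else is loaded from the constant pool.  */
int
toy_vector_const_cost (const struct toy_costs *c, int64_t value)
{
  if (value == 0 || value == -1)
    return COSTS_N_INSNS (1);
  return c->sse_load;
}

/* Gain from converting a constant move to vector form.  Positive means
   the vector sequence is estimated to be cheaper.  */
int
toy_const_move_gain (const struct toy_costs *c, int64_t value,
                     bool dst_is_mem)
{
  int gain = dst_is_mem ? 2 * c->int_store - c->sse_store
                        : COSTS_N_INSNS (2);
  return gain - toy_vector_const_cost (c, value);
}
```

With illustrative costs of int_store = 6, sse_store = 8 and sse_load = 12,
storing 0 to memory comes out neutral (gain 0), while storing an arbitrary
constant comes out as a loss (gain -8), so such a move only pays off as part
of a larger chain.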
