On Fri, May 31, 2013 at 2:56 PM, Igor Zamyatin <izamya...@gmail.com> wrote:
> Like this?

Yes, but put the comment above the peephole2 pattern.

The patch is OK for mainline with the above change.

Thanks,
Uros.

>
> On Fri, May 31, 2013 at 3:45 PM, Uros Bizjak <ubiz...@gmail.com> wrote:
>> On Fri, May 31, 2013 at 1:38 PM, Igor Zamyatin <izamya...@gmail.com> wrote:
>>> We do want to use the same register for float_extend.
>>
>> OK then. Please add a comment for this fact and also, please put
>> single-line preparation statements inside double quotes instead of
>> curly braces.
>>
>> Uros.
>>
>>> On Thu, May 30, 2013 at 9:22 PM, Uros Bizjak <ubiz...@gmail.com> wrote:
>>>> On Thu, May 30, 2013 at 4:25 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>>>>> Hi All
>>>>>
>>>>> Second patch enables several Silvermont uarch features which improve
>>>>> performance of the new processor (based on experiments on real SLM
>>>>> hardware):
>>>>> 1. When a 2-source or 3-source LEA is used to get a non-destructive
>>>>> destination, or to make use of its SCALE factor, LEA is preferable.
>>>>> 2. Transformation of FP conversions with a memory operand into a load
>>>>> followed by a conversion from a register.
>>>>> 3. A couple of improvements for post-reload scheduling:
>>>>>     - increase the latency of integer loads and of load/store pairs with
>>>>> an exact dependence;
>>>>>     - simple re-ordering of the top of the ready list: if the 2
>>>>> instructions at the top of the list have the same priority, the
>>>>> instruction whose producer(s) were scheduled earlier is considered
>>>>> the best candidate.
>>>>>
>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk?
>>>>>
>>>>> 2013-05-30  Yuri Rumyantsev  <yuri.s.rumyant...@intel.com>
>>>>>              Igor Zamyatin  <igor.zamya...@intel.com>
>>>>>
>>>>>         Silvermont (SLM) architecture performance tuning.
>>>>>         * config/i386/i386.h (enum ix86_tune_indices): Add
>>>>>         X86_TUNE_SPLIT_MEM_OPND_FOR_FP_CONVERTS.
>>>>>         (TARGET_SPLIT_MEM_OPND_FOR_FP_CONVERTS): New define.
>>>>>
>>>>>         * config/i386/i386.c (initial_ix86_tune_features)
>>>>>         <X86_TUNE_SPLIT_MEM_OPND_FOR_FP_CONVERTS>: Initialize.
>>>>>         (ix86_lea_outperforms): Handle Silvermont tuning.
>>>>>         (ix86_avoid_lea_for_add): Add new argument to ix86_lea_outperforms
>>>>>         call.
>>>>>         (ix86_use_lea_for_mov): Likewise.
>>>>>         (ix86_avoid_lea_for_addr): Likewise.
>>>>>         (ix86_lea_for_add_ok): Likewise.
>>>>>         (exact_dependency_1): New function.
>>>>>         (exact_store_load_dependency): Likewise.
>>>>>         (ix86_adjust_cost): Handle Silvermont tuning.
>>>>>         (do_reorder_for_imul): Likewise.
>>>>>         (swap_top_of_ready_list): New function.
>>>>>         (ix86_sched_reorder): Changed to handle Silvermont tuning.
>>>>>
>>>>>         * config/i386/i386.md (peepholes that split the memory operand in
>>>>>         FP converts): New.
>>>>
>>>> @@ -24625,9 +24730,9 @@ ix86_sched_reorder(FILE *dump, int sched_verbose, rtx *ready, int *pn_ready,
>>>>
>>>> -      con = DEP_CON (dep);
>>>> -      if (!NONDEBUG_INSN_P (con))
>>>> -        continue;
>>>> +          con = DEP_CON (dep);
>>>> +          if (!NONDEBUG_INSN_P (con))
>>>> +            continue;
>>>>
>>>> There are some unnecessary whitespace changes (tabs->spaces) in a
>>>> couple of places throughout the patch, such as in the above lines.
>>>>
>>>> +(define_peephole2
>>>> +  [(set (match_operand:DF 0 "register_operand")
>>>> +        (float_extend:DF
>>>> +          (match_operand:SF 1 "memory_operand")))]
>>>> +  "TARGET_SPLIT_MEM_OPND_FOR_FP_CONVERTS
>>>> +   && optimize_insn_for_speed_p ()
>>>> +   && SSE_REG_P (operands[0])"
>>>> +  [(set (match_dup 2) (match_dup 1))
>>>> +   (set (match_dup 0) (float_extend:DF (match_dup 2)))]
>>>> +{
>>>> +  operands[2] = gen_rtx_REG (SFmode, REGNO (operands[0]));
>>>> +})
>>>>
>>>> You should use
>>>>
>>>> (match_scratch:SF 2 "x")
>>>>
>>>> at the top of the peephole2 pattern, and you will get a free scratch
>>>> register (assuming that it is not necessary to use the same register
>>>> for input and output operand of the float_extend insn).
>>>>
>>>> Otherwise, the patch looks OK to me.
>>>>
>>>> Uros.
