On Fri, May 31, 2013 at 2:56 PM, Igor Zamyatin <izamya...@gmail.com> wrote:
> Like this?

Yes, but put the comment above the peephole2 pattern.

The patch is OK for mainline with the above change.

Thanks,
Uros.

>
> On Fri, May 31, 2013 at 3:45 PM, Uros Bizjak <ubiz...@gmail.com> wrote:
>> On Fri, May 31, 2013 at 1:38 PM, Igor Zamyatin <izamya...@gmail.com> wrote:
>>> We do want to use the same register for float_extend.
>>
>> OK then. Please add a comment for this fact and also, please put
>> single-line preparation statements inside double-quotes instead of
>> curly braces.
>>
>> Uros.
>>
>>> On Thu, May 30, 2013 at 9:22 PM, Uros Bizjak <ubiz...@gmail.com> wrote:
>>>> On Thu, May 30, 2013 at 4:25 PM, Yuri Rumyantsev <ysrum...@gmail.com>
>>>> wrote:
>>>>> Hi All,
>>>>>
>>>>> The second patch enables several Silvermont uarch features that improve
>>>>> performance of the new processor (based on experiments on real SLM
>>>>> hardware):
>>>>> 1. When a 2-source or 3-source LEA is used to get a non-destructive
>>>>> destination, or to be able to use SCALE, LEA is preferred.
>>>>> 2. Transformation of FP conversions with memory operands into
>>>>> conversions from a register.
>>>>> 3. A couple of improvements for post-reload scheduling:
>>>>> - increase the latency of integer loads and of load/store pairs with
>>>>> an exact dependence;
>>>>> - simple re-ordering of the top of the ready list: if the 2 instructions
>>>>> at the top of the list have the same priority, we consider the
>>>>> instruction whose producer(s) were scheduled earlier to be the best
>>>>> candidate.
>>>>>
>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk?
>>>>>
>>>>> 2013-05-30  Yuri Rumyantsev  <yuri.s.rumyant...@intel.com>
>>>>>             Igor Zamyatin  <igor.zamya...@intel.com>
>>>>>
>>>>> 	Silvermont (SLM) architecture performance tuning.
>>>>> 	* config/i386/i386.h (enum ix86_tune_indices): Add
>>>>> 	X86_TUNE_SPLIT_MEM_OPND_FOR_FP_CONVERTS.
>>>>> 	(TARGET_SPLIT_MEM_OPND_FOR_FP_CONVERTS): New define.
>>>>>
>>>>> 	* config/i386/i386.c (initial_ix86_tune_features)
>>>>> 	<X86_TUNE_SPLIT_MEM_OPND_FOR_FP_CONVERTS>: Initialize.
>>>>> 	(ix86_lea_outperforms): Handle Silvermont tuning.
>>>>> 	(ix86_avoid_lea_for_add): Add new argument to ix86_lea_outperforms
>>>>> 	call.
>>>>> 	(ix86_use_lea_for_mov): Likewise.
>>>>> 	(ix86_avoid_lea_for_addr): Likewise.
>>>>> 	(ix86_lea_for_add_ok): Likewise.
>>>>> 	(exact_dependency_1): New function.
>>>>> 	(exact_store_load_dependency): Likewise.
>>>>> 	(ix86_adjust_cost): Handle Silvermont tuning.
>>>>> 	(do_reoder_for_imul): Likewise.
>>>>> 	(swap_top_of_ready_list): New function.
>>>>> 	(ix86_sched_reorder): Changed to handle Silvermont tuning.
>>>>>
>>>>> 	* config/i386/i386.md (peepholes that split memory operand in fp
>>>>> 	converts): New.
>>>>
>>>> @@ -24625,9 +24730,9 @@ ix86_sched_reorder(FILE *dump, int
>>>> sched_verbose, rtx *ready, int *pn_ready,
>>>>
>>>> -             con = DEP_CON (dep);
>>>> -             if (!NONDEBUG_INSN_P (con))
>>>> -               continue;
>>>> +              con = DEP_CON (dep);
>>>> +              if (!NONDEBUG_INSN_P (con))
>>>> +                continue;
>>>>
>>>> There are some unnecessary whitespace changes (tabs->spaces) in a
>>>> couple of places throughout the patch, such as in the above lines.
>>>>
>>>> +(define_peephole2
>>>> +  [(set (match_operand:DF 0 "register_operand")
>>>> +        (float_extend:DF
>>>> +          (match_operand:SF 1 "memory_operand")))]
>>>> +  "TARGET_SPLIT_MEM_OPND_FOR_FP_CONVERTS
>>>> +   && optimize_insn_for_speed_p ()
>>>> +   && SSE_REG_P (operands[0])"
>>>> +  [(set (match_dup 2) (match_dup 1))
>>>> +   (set (match_dup 0) (float_extend:DF (match_dup 2)))]
>>>> +{
>>>> +  operands[2] = gen_rtx_REG (SFmode, REGNO (operands[0]));
>>>> +})
>>>>
>>>> You should use
>>>>
>>>>   (match_scratch:SF 2 "x")
>>>>
>>>> at the top of the peephole2 pattern, and you will get a free scratch
>>>> register (assuming that it is not necessary to use the same register
>>>> for the input and output operands of the float_extend insn).
>>>>
>>>> Otherwise, the patch looks OK to me.
>>>>
>>>> Uros.