On 14/09/2020 03:53, Qian, Jianhua wrote:
>> -----Original Message-----
>> From: Richard Earnshaw <richard.earns...@foss.arm.com>
>> Sent: Friday, September 11, 2020 9:30 PM
>> To: Qian, Jianhua/钱 建华 <qia...@cn.fujitsu.com>; gcc@gcc.gnu.org
>> Subject: Re: A problem with one instruction multiple latencies and pipelines
>>
>> On 07/09/2020 07:08, Qian, Jianhua wrote:
>>> Hi
>>>
>>> I'm adding a new machine model. I have a problem when writing the
>>> "define_insn_reservation" for instruction scheduling.
>>> How to write the "define_insn_reservation" for one instruction that there
>>> are different latencies and pipelines according to parameter.
>>>
>>> For example, the ADD (shifted register) instruction in a64fx
>>>
>>> Instruction             Option                           Latency  Pipeline
>>> ADD (shifted register)  <amount> = 0                     1        EX* | EAG*
>>>                         <amount> = [1-4] && <shift>=LSL  1+1      (EXA + EXA) | (EXB + EXB)
>>>                                                          2+1      (EXA + EXA) | (EXB + EXB)
>>>
>>
>> A shift by immediate zero isn't a shift, so should never use this RTL
>> pattern.
>> We can ignore that case.
>>
>>> In aarch64.md ADD (shifted register) instruction is defined as following.
>>> (define_insn "*add_<shift>_<mode>"
>>>   [(set (match_operand:GPI 0 "register_operand" "=r")
>>>         (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
>>>                               (match_operand:QI 2 "aarch64_shift_imm_<mode>" "n"))
>>>                   (match_operand:GPI 3 "register_operand" "r")))]
>>>   ""
>>>   "add\\t%<w>0, %<w>3, %<w>1, <shift> %2"
>>>   [(set_attr "type" "alu_shift_imm")]
>>> )
>>
>> You might consider using a define_bypass to adjust the cost - the matcher
>> rule takes a producer and consumer RTL - you don't care about the consumer,
>> but you can use the bypass to reduce the cost if the producer uses an
>> immediate in the 'low latency' range. This would avoid having to make a
>> load of whole-sale changes to the main parts of the machine description.
>
> Thanks for your comment.
> But I think the define_bypass can only change the latency for special
> instruction.
> Pipeline also could be changed by define_bypass?
>

Possibly, but if this is part of the out-of-order units of the pipe, I
really don't think it will matter. In fact, I'm not even convinced that
trying to model the out-of-order stages is worthwhile - let the CPU
handle that: any long-latency instruction, such as a memory access that
misses the L1 cache, will completely mess up the compiler's understanding
of the pipeline state anyway.

What I think is more important is to model the in-order bits at the
front of the pipe accurately, so that you can maximize the throughput of
those stages. Try to get a mix of instructions so that a single issue
unit in the core doesn't get clogged up and block further decode.

R.

> Regards
> Qian
>
>>>
>>> It could not be distinguished by the type "alu_shift_imm" when writing
>>> "define_insn_reservation" for ADD (shifted register).
>>> What should I do?
>>>
>>> Regards
>>> Qian
>>>
>>
>> R.
>>
>
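
For reference, a minimal sketch of how the define_bypass suggestion
quoted above might look in a machine description. This is an
illustration under stated assumptions, not a tested a64fx model: every
"a64fx_*" name (automaton, units, reservations, guard) is hypothetical,
the "a64fx" value of the "tune" attribute is assumed to exist, "1+1" and
"2+1" are read as total latencies of 2 and 3, and "(EXA + EXA)" is read
as two consecutive cycles on the same unit.

```
;; Sketch only: all "a64fx_*" names are hypothetical, not taken from GCC.
(define_automaton "a64fx")
(define_cpu_unit "a64fx_exa,a64fx_exb" "a64fx")

;; Default (slow) case for ADD (shifted register): assumed latency 2+1 = 3,
;; occupying EXA for two cycles or EXB for two cycles.
(define_insn_reservation "a64fx_alu_shift" 3
  (and (eq_attr "tune" "a64fx")
       (eq_attr "type" "alu_shift_imm"))
  "(a64fx_exa*2)|(a64fx_exb*2)")

;; A plain ALU reservation, so that the bypass has a consumer to name.
(define_insn_reservation "a64fx_alu" 1
  (and (eq_attr "tune" "a64fx")
       (eq_attr "type" "alu_sreg,alu_imm"))
  "a64fx_exa|a64fx_exb")

;; Fast case: assumed latency 1+1 = 2 when the producer is an LSL by 1-4.
;; The guard is a hypothetical C function that would have to be written
;; in the back end to inspect the producer's shift code and amount.
(define_bypass 2 "a64fx_alu_shift"
               "a64fx_alu,a64fx_alu_shift"
               "a64fx_short_lsl_producer_p")
```

GCC calls a bypass guard with the producer and consumer insns and
ignores the bypass when the guard returns zero, so the slow latency of 3
stays the default. The define_bypass form takes only a latency, the insn
reservation names and the guard, so in this sketch the unit reservation
is the same for both cases.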