Re: A problem with one instruction multiple latencies and pipelines

Richard Earnshaw Mon, 14 Sep 2020 02:09:18 -0700

On 14/09/2020 03:53, Qian, Jianhua wrote:
>> -----Original Message-----
>> From: Richard Earnshaw <richard.earns...@foss.arm.com>
>> Sent: Friday, September 11, 2020 9:30 PM
>> To: Qian, Jianhua/钱 建华 <qia...@cn.fujitsu.com>; gcc@gcc.gnu.org
>> Subject: Re: A problem with one instruction multiple latencies and pipelines
>>
>> On 07/09/2020 07:08, Qian, Jianhua wrote:
>>> Hi
>>>
>>> I'm adding a new machine model and have a problem writing the
>>> "define_insn_reservation" for instruction scheduling.
>>> How should I write the "define_insn_reservation" for an instruction
>>> whose latency and pipeline differ depending on its operands?
>>>
>>> For example, the ADD (shifted register) instruction on A64FX:
>>>
>>> Instruction             Option                             Latency  Pipeline
>>> ADD (shifted register)  <amount> = 0                       1        EX* | EAG*
>>>                         <amount> = [1-4] && <shift> = LSL  1+1      (EXA + EXA) | (EXB + EXB)
>>>                                                            2+1      (EXA + EXA) | (EXB + EXB)
>>>
>>
>> A shift by immediate zero isn't a shift, so it should never use this RTL
>> pattern. We can ignore that case.
>>
>>> In aarch64.md, the ADD (shifted register) instruction is defined as follows:
>>> (define_insn "*add_<shift>_<mode>"
>>>   [(set (match_operand:GPI 0 "register_operand" "=r")
>>>         (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
>>>                               (match_operand:QI 2 "aarch64_shift_imm_<mode>" "n"))
>>>                   (match_operand:GPI 3 "register_operand" "r")))]
>>>   ""
>>>   "add\\t%<w>0, %<w>3, %<w>1, <shift> %2"
>>>   [(set_attr "type" "alu_shift_imm")]
>>> )
>>
>> You might consider using a define_bypass to adjust the cost. The matcher rule
>> takes a producer and a consumer RTL insn; you don't care about the consumer,
>> but you can use the bypass to reduce the cost if the producer uses an
>> immediate in the "low latency" range. This would avoid having to make a load
>> of wholesale changes to the main parts of the machine description.
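A rough standalone model of how the define_bypass suggestion would behave, written as plain C rather than real backend code. Assumptions: `struct insn_info` stands in for the two `rtx_insn` pointers a real bypass guard receives, every name here is hypothetical, and "1+1"/"2+1" are read as totals of 2 and 3 cycles. The reservation keeps the worst-case latency, and the guard, which looks only at the producer, lets the bypass lower it for LSL #1 to #4.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of an insn.  A real guard would have
   to extract the shift code and amount from the producer's RTL.  */
enum shift_kind { SHIFT_NONE, SHIFT_LSL, SHIFT_LSR, SHIFT_ASR };

struct insn_info {
  enum shift_kind shift;   /* shift applied to the shifted operand */
  int shift_amount;        /* immediate shift amount */
};

/* Assumed totals: "2+1" -> 3 cycles (reservation default, worst case),
   "1+1" -> 2 cycles (cheap LSL #1..#4 case handled by the bypass).  */
enum { RESERVATION_LATENCY = 3, BYPASS_LATENCY = 2 };

/* Roughly what a machine description might pair this guard with
   (reservation and guard names are hypothetical):
     (define_bypass 2 "a64fx_alu_shift_imm" "a64fx_alu" "a64fx_cheap_lsl_p")
   The guard receives producer and consumer, but only the producer
   matters here.  */
static bool
a64fx_cheap_lsl_p (const struct insn_info *producer,
                   const struct insn_info *consumer)
{
  (void) consumer;  /* deliberately ignored */
  return producer->shift == SHIFT_LSL
         && producer->shift_amount >= 1
         && producer->shift_amount <= 4;
}

/* Latency seen by the consumer: the bypass latency when the guard
   holds, otherwise the latency of the producer's reservation.  */
static int
effective_latency (const struct insn_info *producer,
                   const struct insn_info *consumer)
{
  if (a64fx_cheap_lsl_p (producer, consumer))
    return BYPASS_LATENCY;
  return RESERVATION_LATENCY;
}
```

The design point is that the single "alu_shift_imm" reservation stays untouched; only the producer-to-consumer latency is adjusted, which matches the concern in the follow-up that a bypass changes latency but not the unit reservation.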
> 
> Thanks for your comment.
> But I think define_bypass can only change the latency for particular
> instructions. Can the pipeline also be changed by define_bypass?
>


Possibly, but if this is part of the out-of-order units of the pipe, I
really don't think it will matter.  In fact, I'm not even convinced that
trying to model the out-of-order stages is worthwhile; let the CPU
handle that.  Any long-latency instruction, such as a memory access that
misses the L1 cache, will completely mess up the compiler's understanding
of the pipeline state anyway.

What I think is more important is to model the in-order bits at the
front of the pipe accurately, so that you can maximize the throughput
of those stages.  Try to get a mix of instructions so that a single
issue unit in the core doesn't get clogged up and block further decode.

R.

> Regards
> Qian
> 
>>>
>>> These cases cannot be distinguished by the type "alu_shift_imm" when
>>> writing the "define_insn_reservation" for ADD (shifted register).
>>> What should I do?
>>>
>>> Regards
>>> Qian
>>>
>>
>> R.
>>
