[gem5-users] Implicit Register Dependencies in x86

Mohit Gambhir via gem5-users Tue, 20 Jul 2021 17:27:37 -0700

Hi all,



I am running a DerivO3CPU-based SE-mode simulation with the x86 ISA. The
microbenchmark that I am running contains a loop of independent multiply
instructions. An excerpt from the disassembly of the benchmark loop looks
like this:



  400c07:   48 0f af d2   imul   %rdx,%rdx
  400c0b:   48 0f af db   imul   %rbx,%rbx
  …



When I look at the O3PipeView output, I see that all the independent multiply
instructions are issued sequentially, even though there are two multiply
functional units and each of them is pipelined:



[................f....dn.pi..c.r.................................................]-(16664000.0) 0x00400c07.0 IMUL_R_R [     34983]
[................f....dn.p...ic.r................................................]-(16664000.0) 0x00400c07.1 IMUL_R_R [     34984]
[................f....dn.p...ic.r................................................]-(16664000.0) 0x00400c07.2 IMUL_R_R [     34985]
[................f....dn.p...i..c.r..............................................]-(16664000.0) 0x00400c0b.0 IMUL_R_R [     34986]
[................f....dn.p......ic.r.............................................]-(16664000.0) 0x00400c0b.1 IMUL_R_R [     34987]
[................f....dn.p......ic.r.............................................]-(16664000.0) 0x00400c0b.2 IMUL_R_R [     34988]
…



Digging into it further, I found that each of the IMUL_R_R instructions has
implicit registers 0 and 1 (ProdHi and ProdLow) added as both a source and a
destination in the generated code. The following is an excerpt from
decoder-ns-cc.inc:



Mul1sFlags::Mul1sFlags(…)
{
    …
    setSrcRegIdx(_numSrcRegs++, RegId(IntRegClass, INTREG_FOLDED(src1, foldOBit)));
    setSrcRegIdx(_numSrcRegs++, RegId(IntRegClass, INTREG_FOLDED(src2, foldOBit)));
    setSrcRegIdx(_numSrcRegs++, RegId(IntRegClass, INTREG_IMPLICIT(0)));
    setDestRegIdx(_numDestRegs++, RegId(IntRegClass, INTREG_IMPLICIT(0)));
    _numIntDestRegs++;
    setSrcRegIdx(_numSrcRegs++, RegId(IntRegClass, INTREG_IMPLICIT(1)));
    setDestRegIdx(_numDestRegs++, RegId(IntRegClass, INTREG_IMPLICIT(1)));
    …
}



As a result, all the independent multiply instructions execute
sequentially, and the multiply throughput is 1/3.
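
To make the effect concrete, here is a minimal sketch of why a register that
every multiply both reads and writes forms a dependency chain. It is a toy
model, not gem5 code: MicroOp, finishCycle, kImplicit and the 3-cycle latency
are all hypothetical names and values chosen for illustration, and it assumes
ideal renaming with no limit on functional units or issue width.

```cpp
#include <array>
#include <cstddef>

// Toy model, not gem5 code. All names and values here are hypothetical.
constexpr int kNumRegs = 8;
constexpr int kImplicit = 7;  // stands in for an implicit register such as ProdLow
constexpr int kLatency = 3;   // assumed multiply latency in cycles

struct MicroOp
{
    int reg;            // explicit register, read and written (imul %r,%r)
    bool usesImplicit;  // also reads and writes register kImplicit
};

// Cycle at which the last result is available, assuming ideal renaming
// (only read-after-write dependencies delay an op) and no limit on
// functional units or issue width.
template <std::size_t N>
constexpr int
finishCycle(const std::array<MicroOp, N> &ops)
{
    int ready[kNumRegs] = {};  // cycle at which each register's newest value is ready
    int last = 0;
    for (std::size_t i = 0; i < N; ++i) {
        // An op can issue once every register it reads has been produced.
        int issue = ready[ops[i].reg];
        if (ops[i].usesImplicit && ready[kImplicit] > issue)
            issue = ready[kImplicit];
        const int done = issue + kLatency;
        ready[ops[i].reg] = done;
        if (ops[i].usesImplicit)
            ready[kImplicit] = done;  // the next reader of kImplicit must wait for this
        if (done > last)
            last = done;
    }
    return last;
}

// Four multiplies on four different explicit registers.
constexpr std::array<MicroOp, 4> withImplicit{{{0, true}, {1, true}, {2, true}, {3, true}}};
constexpr std::array<MicroOp, 4> withoutImplicit{{{0, false}, {1, false}, {2, false}, {3, false}}};
```

Under these assumptions, finishCycle(withImplicit) evaluates to 12 (each op
waits for the previous one), while finishCycle(withoutImplicit) evaluates to 3
(all four overlap). Renaming removes the write-after-write hazard on the
shared register, but the read of it remains a true dependency.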

If we have multiple functional units, should these implicit registers
(ProdHi and ProdLow) be replicated for each of them? And if so, why add them
as source and destination at all?

Is there an explanation or a workaround for this?



Thanks,

Mohit
