Hi!

On Fri, Sep 04, 2020 at 04:47:37PM +0800, Kewen.Lin wrote:
> >> Apart from that, one P9 specific point is that the update form load isn't
> >> preferred, the reason is that the instruction can not retire until both
> >> parts complete, it can hold up subsequent instructions from retiring.
> >> If the addi stalls (starvation), the instruction can not retire and can
> >> cause things stuck.  It seems also something we can model here?
> >
> > This is (almost) no problem on p9, since we no longer have issue groups.
> > It can hold up older insns from retiring, sure, but they *will* have
> > finished, and p9 can retire 64 insns per cycle.  The "completion wall"
> > is gone.  The only problem is if things stick around so long that
> > resources run out... but you're talking 100s of insns there.
>
> Theoretically it's fine, but the addi starvation was observed in the FP/SIMD
> instructions intensive loop code, which did cause some worse performance. :(
"addi starvation" has nothing to do with addi (it also happens for other insns), and nothing with update form memory insns either. What happens is simply that no shorter latency insns are issued by the core so long as longer latency insns (like most float insns) are available. So in really nice floating point loops we execute the few integer add insns much too late, much later than they were in the machine code, which then makes the memory insns late as well, etc. Segher