On 11/22/13 10:03, BELBACHIR Selim wrote:
OK, so I should avoid the auto_inc alternatives in PARALLEL. It's certainly quite a rare RTL pattern, and I doubt it's worth the effort.
That'd be my inclination as well.
I'm not sure what chip you're working on, but those kinds of multiple-output instructions tend to cause all kinds of performance problems once the chip goes to out-of-order execution. Basically, most folks designing such chips allow the operations to run independently, but they have to retire as a group. Thus an insn like that would hold 3 slots in the retirement buffer (two outputs plus the embedded side effect) until all three operations are ready to retire. That can be a real drag if the memory reference doesn't hit the cache.
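To put rough numbers on that, here's a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the function names are made up, the latencies (1 cycle for each register result, 200 cycles for a load that misses the cache) are hypothetical, all operations are taken to issue at cycle 0, and it only counts the slots held by this one insn, ignoring everything queued behind it.

```python
def slot_cycles_grouped(ready_cycles):
    """Slot-cycles used when all operations of one insn must retire together.

    Every operation keeps its retirement-buffer slot until the slowest
    operation in the group is ready.
    """
    retire_at = max(ready_cycles)
    return len(ready_cycles) * retire_at


def slot_cycles_independent(ready_cycles):
    """Slot-cycles used if each operation could free its slot when ready."""
    return sum(ready_cycles)


# Hypothetical insn: two register outputs plus a load that misses the cache.
ready = [1, 1, 200]
print("grouped:    ", slot_cycles_grouped(ready))
print("independent:", slot_cycles_independent(ready))
```

With those made-up latencies the grouped insn ties up 600 slot-cycles versus 202 if the pieces could retire on their own, which is the sort of gap I mean. When the load hits the cache the difference mostly disappears.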
jeff