Hello!

> So what I'm confused about is in the original output template operand
> 0 is duplicated.  In the new template operand 1 is duplicated.
>
> Presumably what you're trying to accomplish is avoiding a false read
> on operand 0 (the destination)?  Can you please confirm?
>
> Knowing that should also help me evaluate the changes to recp and
> rsqrt since they're being changed to the same style encoding when
> operating strictly on registers.

Yes, it's the same for all instructions in the patch. We're not just
avoiding the read: we're also giving the CPU more opportunities to
execute speculatively. After the patch the destination depends only on
the source, so (thanks to register renaming) the CPU can execute the
instruction even if an earlier instruction that writes to the same
destination has not finished yet.

--
Alexander Nesterovskiy
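
To illustrate the point about the duplicated operand, here is a minimal sketch in AT&T syntax. It assumes the AVX scalar reciprocal instruction vrcpss and the registers %xmm0 (destination, operand 0) and %xmm1 (source, operand 1); both are chosen only for illustration, and the actual templates in the patch may differ.

```asm
# Operand 0 duplicated: the destination is also used as the first source,
# so the instruction reads %xmm0 (for the upper elements) and has to wait
# for any earlier, still unfinished write to %xmm0.
vrcpss  %xmm1, %xmm0, %xmm0

# Operand 1 duplicated: only %xmm1 is read and %xmm0 is write-only, so the
# CPU can rename %xmm0 and start without waiting for that earlier write.
vrcpss  %xmm1, %xmm1, %xmm0
```

Note that in the second form the upper elements of the result come from %xmm1 rather than from the old contents of %xmm0.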