On 11/16/18 6:10 AM, Emilio G. Cota wrote:
> It's possible that newer machines with larger reorder buffers
> will be able to take better advantage of the higher instruction
> locality, hiding the latency of having to execute more instructions.
> I'll test on Skylake tomorrow.

I've noticed that the code we generate for calls has twice as many
instructions as really needed for setting up the arguments. I have a
plan to fix that, which hopefully will solve this problem.

r~