A proper duffcopy/duffzero/memmove is also an option.

Best regards,
Kenny Levinsen

> On 23. feb. 2016, at 18.02, erik quanstrom <quans...@quanstro.net> wrote:
>
>> On Tue Feb 23 07:55:26 PST 2016, kennylevin...@gmail.com wrote:
>> A benchmark was supposedly made of the new duffcopy/duffzero which claimed
>> significant speedup for larger copies:
>> https://github.com/golang/go/commit/5cf281a9b791f0f10efd1574934cbb19ea1b33da
>>
>> I have no clue whether this holds true or not. My intention to reenable
>> duffcopy and continue to use duffzero is mostly to avoid differences and
>> ensure that the note handlers are floating point free in the future. Whether
>> the duffcopy/duffzero’s current form is an actual optimization or just a
>> complexity, I cannot say. A test was made in #cat-v out of annoyance where
>> the result seemed to be that it was indeed faster to use MOVUPS, but I don’t
>> remember the details.
>
> that post is a speedup relative to the original asm, which might not be as
> good as the best non-sse versions, and it is also for amd64.
>
> - erik