On 5/7/18, Mateusz Guzik <m...@freebsd.org> wrote:
> Author: mjg
> Date: Mon May 7 15:07:28 2018
> New Revision: 333324
> URL: https://svnweb.freebsd.org/changeset/base/333324
>
> Log:
>   amd64: replace libkern's memset and memmove with assembly variants
>
>   memmove is repurposed bcopy (arguments swapped, return value added).
>   The libkern variant is a wrapper around bcopy, so this is a big
>   improvement.
>
>   memset is repurposed memcpy. The libkern variant is doing fishy stuff,
>   including branching on 0 and calling bzero.
>
>   Both functions are rather crude and subject to partial depessimization.
>
>   This is a soft prerequisite to adding variants utilizing the
>   'Enhanced REP MOVSB/STOSB' bit and letting the kernel patch itself
>   at runtime.
>
> Modified:
>   head/sys/amd64/amd64/support.S
>   head/sys/conf/files.amd64
>
> Modified: head/sys/amd64/amd64/support.S
> ==============================================================================
> --- head/sys/amd64/amd64/support.S	Mon May 7 15:07:26 2018	(r333323)
> +++ head/sys/amd64/amd64/support.S	Mon May 7 15:07:28 2018	(r333324)
> @@ -162,6 +162,58 @@ ENTRY(bcopy)
>  END(bcopy)
>
>  /*
> + * memmove(dst, src, cnt)
> + *         rdi, rsi, rdx
> + * Original by:
> + *  w...@tools.de (Wolfgang Solfrank, TooLs GmbH) +49-228-985800
> + */
> +ENTRY(memmove)
> +	PUSH_FRAME_POINTER
> +	movq	%rdi,%r9
> +	movq	%rdx,%rcx
> +
> +	movq	%rdi,%rax
> +	subq	%rsi,%rax
> +	cmpq	%rcx,%rax	/* overlapping && src < dst? */
> +	jb	1f
> +
> +	shrq	$3,%rcx		/* copy by 64-bit words */
> +	rep
> +	movsq
> +	movq	%rdx,%rcx
> +	andq	$7,%rcx		/* any bytes left? */
> +	rep
> +	movsb
> +	movq	%r9,%rax
> +	POP_FRAME_POINTER
> +	ret
> +
> +	/* ALIGN_TEXT */
> +1:
> +	addq	%rcx,%rdi	/* copy backwards */
> +	addq	%rcx,%rsi
> +	decq	%rdi
> +	decq	%rsi
> +	andq	$7,%rcx		/* any fractional bytes? */
> +	std
> +	rep
> +	movsb
> +	movq	%rdx,%rcx	/* copy remainder by 64-bit words */
> +	shrq	$3,%rcx
> +	subq	$7,%rsi
> +	subq	$7,%rdi
> +	rep
> +	movsq
> +	cld
> +	movq	%r9,%rax
> +	POP_FRAME_POINTER
> +	ret
> +END(memmove)
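A side note on the overlap test above: "cmpq %rcx,%rax; jb 1f" compares
dst - src against cnt as unsigned values, so the slower backward (std)
path is taken only when the buffers overlap with src < dst; when dst is
below src the subtraction wraps to a huge value and the forward path is
used. A minimal C sketch of that dispatch (my names, not the committed
code):

#include <stddef.h>
#include <stdint.h>

/*
 * Sketch only: mirrors the dispatch in the committed memmove.
 * (dst - src) < len as an unsigned compare holds exactly when the
 * buffers overlap and src < dst, the one case that must be copied
 * backwards.
 */
void *
memmove_sketch(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	size_t i;

	if ((uintptr_t)d - (uintptr_t)s >= len) {
		/* forward copy: the rep movsq + rep movsb path */
		for (i = 0; i < len; i++)
			d[i] = s[i];
	} else {
		/* backward copy: the std; rep movsb/movsq path */
		for (i = len; i > 0; i--)
			d[i - 1] = s[i - 1];
	}
	return (dst);
}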
> +
> +/*
> + * memcpy(dst, src, len)
> + *        rdi, rsi, rdx
> + *
>   * Note: memcpy does not support overlapping copies
>   */
>  ENTRY(memcpy)
> @@ -178,6 +230,27 @@ ENTRY(memcpy)
>  	POP_FRAME_POINTER
>  	ret
>  END(memcpy)
> +
> +/*
> + * memset(dst, c, len)
> + *        rdi, rsi, rdx
> + */
> +ENTRY(memset)
> +	PUSH_FRAME_POINTER
> +	movq	%rdi,%r9
> +	movq	%rdx,%rcx
> +	movq	%rsi,%rax
> +	shrq	$3,%rcx
> +	rep
> +	stosq

According to the Intel SDM, stosq stores the whole of RAX at the
destination and then advances RDI by 8. This implementation is wrong:
c is a single fill byte, so after the plain "movq %rsi,%rax" above, RAX
holds (taking c = 0xCC) 0x00000000000000CC rather than
0xCCCCCCCCCCCCCCCC. The buffer therefore ends up with len / 8 quadwords
of 0x00000000000000CC followed by len % 8 bytes of 0xCC, instead of len
bytes of 0xCC. The fill byte needs to be replicated into all eight
bytes of RAX before the rep stosq; see the sketch at the end of this
mail.

> +	movq	%rdx,%rcx
> +	andq	$7,%rcx
> +	rep
> +	stosb
> +	movq	%r9,%rax
> +	POP_FRAME_POINTER
> +	ret
> +END(memset)
>
>  /*
>   * pagecopy(%rdi=from, %rsi=to)
>
> Modified: head/sys/conf/files.amd64
> ==============================================================================
> --- head/sys/conf/files.amd64	Mon May 7 15:07:26 2018	(r333323)
> +++ head/sys/conf/files.amd64	Mon May 7 15:07:28 2018	(r333324)
> @@ -620,8 +620,6 @@ isa/vga_isa.c	optional vga
>  kern/kern_clocksource.c		standard
>  kern/link_elf_obj.c		standard
>  libkern/x86/crc32_sse42.c	standard
> -libkern/memmove.c		standard
> -libkern/memset.c		standard
>  #
>  # IA32 binary support
>  #
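A minimal sketch of the fix (my suggestion, not anything from the
commit): broadcast the fill byte into all eight byte lanes before the
quadword stores, e.g. by multiplying the zero-extended byte with
0x0101010101010101 (in the asm, roughly a movzbq of %sil plus an imulq
with that constant in a scratch register, ahead of the rep stosq). In
C, with made-up names:

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Sketch only: same len / 8 quadword + len % 8 byte split as the
 * committed asm, but with the pattern broadcast to every byte lane.
 */
void *
memset_sketch(void *dst, int c, size_t len)
{
	unsigned char *d = dst;
	uint64_t pattern = (unsigned char)c * 0x0101010101010101ULL;
	size_t i = 0;

	for (; i + 8 <= len; i += 8)	/* stands in for rep stosq */
		memcpy(d + i, &pattern, 8);
	for (; i < len; i++)		/* stands in for rep stosb */
		d[i] = (unsigned char)c;
	return (dst);
}

With the broadcast in place, memset(buf, 0xCC, n) fills all n bytes
with 0xCC instead of one 0xCC byte per stored quadword.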