The futex(2) syscall needs to be able to atomically copy the futex in and out of userland. The current implementation uses copyin(9) and copyout(9) for that. The futex is a 32-bit integer, and currently our copyin(9) and copyout(9) don't guarantee an atomic 32-bit access. Previously mpi@ and I discussed implementing new interfaces that do guarantee the required atomicity. However, it occurred to me that we could simply change our copyin(9) and copyout(9) implementations such that they guarantee atomicity for a properly aligned 32-bit copyin and copyout.
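For illustration only, here is a rough sketch of the kind of access the futex code needs; futex_load() is a made-up helper for this mail, not code from sys_futex(). The point is that once an aligned 4-byte copyin(9) is guaranteed to be a single 32-bit access, the plain interface is enough:

/*
 * Illustrative sketch only -- futex_load() is a hypothetical helper,
 * not actual futex code.  It shows the required access pattern: a
 * 4-byte, 4-byte-aligned user word copied into the kernel in a single
 * 32-bit memory access.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/errno.h>

int
futex_load(const uint32_t *uaddr, uint32_t *val)
{
	/* The atomicity guarantee only holds for aligned words. */
	if ((vaddr_t)uaddr & 0x3)
		return (EINVAL);
	return (copyin(uaddr, val, sizeof(*val)));
}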
The i386 version of these calls uses "rep movsl", which means those copies are already atomic. At least that is how I interpret 8.2.4 in Volume 3A of the Intel SDM. The diff below makes the amd64 version safe as well. This does introduce a few additional instructions in the copy loop. Apparently modern Intel CPUs optimize the string loops. If we can rely on the hardware to turn 32-bit moves into 64-bit moves, we could simplify the code by using "rep movsl" instead of "rep movsq". Thoughts?

Index: arch/amd64/amd64/copy.S
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/copy.S,v
retrieving revision 1.7
diff -u -p -r1.7 copy.S
--- arch/amd64/amd64/copy.S	25 Apr 2015 21:31:24 -0000	1.7
+++ arch/amd64/amd64/copy.S	1 May 2017 15:32:17 -0000
@@ -138,7 +138,12 @@ ENTRY(copyout)
 	rep
 	movsq
 	movb	%al,%cl
-	andb	$7,%cl
+	shrb	$2,%cl
+	andb	$1,%cl
+	rep
+	movsl
+	movb	%al,%cl
+	andb	$3,%cl
 	rep
 	movsb
 	SMAP_CLAC
@@ -168,7 +173,12 @@ ENTRY(copyin)
 	rep
 	movsq
 	movb	%al,%cl
-	andb	$7,%cl
+	shrb	$2,%cl
+	andb	$1,%cl
+	rep
+	movsl
+	movb	%al,%cl
+	andb	$3,%cl
 	rep
 	movsb
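To make the length decomposition easier to follow, here is a rough C model of what the patched loop does; copy_model() is just an illustration of the same arithmetic, not shared kernel code, and it assumes src/dst are suitably aligned. The remainder left by the 8-byte "rep movsq" is at most 7 bytes; the "rep movsl" step moves zero or one 4-byte word of it, and "rep movsb" handles the final 0-3 bytes. For the 4-byte futex case (n == 4) only the 32-bit step runs, so the word crosses the user/kernel boundary in a single access.

/*
 * Rough C model of the patched amd64 copy sequence (illustration
 * only, not kernel code).  Mirrors "rep movsq", then "rep movsl" for
 * the 4-byte remainder, then "rep movsb" for the trailing bytes.
 * Assumes dst and src are suitably aligned, as in the futex case.
 */
#include <stddef.h>
#include <stdint.h>

static void
copy_model(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	size_t i;

	for (i = 0; i < (n >> 3); i++) {	/* rep movsq: 8-byte words */
		*(uint64_t *)d = *(const uint64_t *)s;
		d += 8; s += 8;
	}
	for (i = 0; i < ((n & 7) >> 2); i++) {	/* rep movsl: 0 or 1 word */
		*(uint32_t *)d = *(const uint32_t *)s;
		d += 4; s += 4;
	}
	for (i = 0; i < (n & 3); i++)		/* rep movsb: 0-3 bytes */
		*d++ = *s++;
}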