The futex(2) syscall needs to be able to atomically copy the futex in
and out of userland.  The current implementation uses copyin(9) and
copyout(9) for that.  The futex is a 32-bit integer, and currently our
copyin(9) and copyout(9) don't guarantee an atomic 32-bit access.
Previously mpi@ and I discussed implementing new interfaces that do
guarantee the required atomicity.  However, it occurred to me that we
could simply change our copyin(9) and copyout(9) implementations such
that they guarantee that a properly aligned 32-bit copyin or copyout
is done atomically.
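For context, the access in question is roughly the following (a
minimal sketch, not the actual sys_futex code; "uaddr" and the error
handling are simplified):

	uint32_t val;
	int error;

	/* Fetch the 32-bit futex word from userland.  If copyin(9)
	 * does not perform this as a single 32-bit load, a store from
	 * another thread running concurrently may be observed torn. */
	error = copyin(uaddr, &val, sizeof(val));
	if (error)
		return (error);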

The i386 version of these calls uses "rep movsl", which means an
aligned 32-bit copy is already atomic there.  At least that is how I
interpret 8.2.4 in Volume 3A
of the Intel SDM.  The diff below makes the amd64 version safe as
well.  This does introduce a few additional instructions in the copy path.
Apparently modern Intel CPUs optimize the string loops.  If we can
rely on the hardware to turn 32-bit moves into 64-bit moves, we could
simplify the code by using "rep movsl" instead of "rep movsq".
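In C terms, the diff splits the byte count like this (just a model of
the assembly below, not actual kernel code; "len" is the total byte
count held in %rax):

	size_t nquads  = len >> 3;        /* copied by "rep movsq" */
	size_t ndwords = (len >> 2) & 1;  /* copied by "rep movsl", 0 or 1 */
	size_t nbytes  = len & 3;         /* copied by "rep movsb" */

So for the futex case (len == 4, aligned address) the whole copy is
done by a single "movsl".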

Thoughts?


Index: arch/amd64/amd64/copy.S
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/copy.S,v
retrieving revision 1.7
diff -u -p -r1.7 copy.S
--- arch/amd64/amd64/copy.S     25 Apr 2015 21:31:24 -0000      1.7
+++ arch/amd64/amd64/copy.S     1 May 2017 15:32:17 -0000
@@ -138,7 +138,12 @@ ENTRY(copyout)
        rep
        movsq
        movb    %al,%cl
-       andb    $7,%cl
+       shrb    $2,%cl
+       andb    $1,%cl
+       rep
+       movsl
+       movb    %al,%cl
+       andb    $3,%cl
        rep
        movsb
        SMAP_CLAC
@@ -168,7 +173,12 @@ ENTRY(copyin)
        rep
        movsq
        movb    %al,%cl
-       andb    $7,%cl
+       shrb    $2,%cl
+       andb    $1,%cl
+       rep
+       movsl
+       movb    %al,%cl
+       andb    $3,%cl
        rep
        movsb
 