On Thu, 21 Nov 2024 at 13:40, Josh Poimboeuf <jpoim...@kernel.org> wrote:
>
> The profile is showing futex_get_value_locked():

Ahh.

> That has several callers, so we can probably just use get_user() there?

Yeah, that's the simplest thing. That thing isn't even some inline
function, so the real cost is the call.

That said, exactly because it's not inlined, and calls are expensive,
and this is apparently really critical, we can just do it with the
full "unsafe_get_user()" model.

It's not so complicated. The attached patch is untested, but I did
check that it generates almost perfect code:

    mov    %gs:0x0,%rax                 # current
    incl   0x1a9c(%rax)                 # current->pagefault_disable++
    movabs $0x123456789abcdef,%rcx      # magic virtual address size
    cmp    %rsi,%rcx                    # address masking
    sbb    %rcx,%rcx
    or     %rsi,%rcx
    stac                                # enable user space acccess
    mov    (%rcx),%ecx                  # get the value
    clac                                # disable user space access
    decl   0x1a9c(%rax)                 # current->pagefault_disable--
    mov    %ecx,(%rdi)                  # save the value
    xor    %eax,%eax                    # return 0
    ret

(with the error case for the page fault all out-of-line).

So this should be _faster_ than the old __get_user(), because while
the address masking is not needed, it's cheaper than the function call
used to be and the error handling is better.

If you can test this and verify that it actually help, I'll take it as
a patch. Consider it signed-off after testing.

> Also, is there any harm in speeding up __get_user()?  It still has ~80
> callers and it's likely to be slowing down things we don't know about.

How would you speed it up?  We definitely can't replace the fence with
addressing tricks. So we can't just replace it with "get_user()",
because of those horrid architecture-specific kernel uses.

Now, we could possibly say "just remove the fence in __get_user()
entirely", but that would involve moving it to access_ok().

And then it wouldn't actually speed anything up (except the horrid
architecture-specific kernel uses that then don't call access_ok() at
all - and we don't care about *those*).

               Linus
 kernel/futex/core.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index 326bfe6549d7..9e1bd76652d8 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -464,13 +464,32 @@ int futex_cmpxchg_value_locked(u32 *curval, u32 __user *uaddr, u32 uval, u32 new
 
 int futex_get_value_locked(u32 *dest, u32 __user *from)
 {
-	int ret;
+	u32 val;
 
 	pagefault_disable();
-	ret = __get_user(*dest, from);
+	/*
+	 * The user address has been verified by get_futex_key(),
+	 * and futex_cmpxchg_value_locked() trusts that, but we do
+	 * not have any other ways to do it well, so we do the
+	 * full user access song and dance here.
+	 */
+	if (can_do_masked_user_access())
+		from = masked_user_access_begin(from);
+	else if (!user_read_access_begin(from, sizeof(*from)))
+		goto enable_and_error;
+
+	unsafe_get_user(val, from, Efault);
+	user_access_end();
 	pagefault_enable();
 
-	return ret ? -EFAULT : 0;
+	*dest = val;
+	return 0;
+
+Efault:
+	user_access_end();
+enable_and_error:
+	pagefault_enable();
+	return -EFAULT;
 }
 
 /**

Reply via email to