----- On Apr 13, 2021, at 1:07 PM, Eric Dumazet eduma...@google.com wrote:

> On Tue, Apr 13, 2021 at 7:01 PM Eric Dumazet <eduma...@google.com> wrote:
>>
>> On Tue, Apr 13, 2021 at 6:57 PM Eric Dumazet <eduma...@google.com> wrote:
>> >
>> > On Tue, Apr 13, 2021 at 6:54 PM Mathieu Desnoyers
>> > <mathieu.desnoy...@efficios.com> wrote:
>> > >
>> > > ----- On Apr 13, 2021, at 12:22 PM, Eric Dumazet eric.duma...@gmail.com wrote:
>> > >
>> > > > From: Eric Dumazet <eduma...@google.com>
>> > > >
>> > > > Commit ec9c82e03a74 ("rseq: uapi: Declare rseq_cs field as union,
>> > > > update includes") added regressions for our servers.
>> > > >
>> > > > Using copy_from_user() and clear_user() for 64bit values
>> > > > is suboptimal.
>> > > >
>> > > > We can use faster put_user() and get_user().
>> > > >
>> > > > 32bit arches can be changed to use the ptr32 field,
>> > > > since the padding field must always be zero.
>> > > >
>> > > > v2: added ideas from Peter and Mathieu about making this
>> > > >    generic, since my initial patch was only dealing with
>> > > >    64bit arches.
>> > >
>> > > Ah, now I remember the reason why reading and clearing the entire 64-bit
>> > > field is important: it's because we don't want to allow user-space processes
>> > > to use this change in behavior to figure out whether they are running on a
>> > > native 32-bit kernel or in 32-bit compat mode on a 64-bit kernel.
>> > >
>> > > So although I'm fine with making 64-bit kernels faster, we'll want to keep
>> > > updating the entire 64-bit ptr field on 32-bit kernels as well.
>> > >
>> > > Thanks,
>> > >
>> >
>> > So... back to V1 then ?
>>
>> Or add more stuff as in :
> 
> diff against v2, WDYT ?

I like this approach slightly better, because it moves the preprocessor ifdefs
into rseq_get_rseq_cs and clear_rseq_cs, while keeping the same behavior for a
32-bit process running on a native 32-bit kernel and as a compat task on a
64-bit kernel.

That being said, I don't expect anyone to care much about performance of 32-bit
kernels, so we could use copy_from_user() on 32-bit kernels to remove
special-cases in 32-bit specific code. This would eliminate the 32-bit specific
"padding" read, and let the TASK_SIZE comparison handle the check for both
32-bit and 64-bit kernels.
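
To illustrate the point about the TASK_SIZE comparison, here is a minimal
userspace sketch, not kernel code. It assumes the whole 8-byte field is read in
one go (memcpy stands in for copy_from_user()), so any non-zero upper half
makes the value exceed the limit and no separate "padding" read is needed. The
names model_get_cs_ptr and MODEL_TASK_SIZE are hypothetical, and the 3 GiB
limit is only an assumed example value.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Assumed example limit standing in for a 32-bit kernel's TASK_SIZE. */
#define MODEL_TASK_SIZE UINT64_C(0xC0000000)

/*
 * Hypothetical model: read the full 64-bit rseq_cs field and validate it
 * with a single range check. memcpy() stands in for copy_from_user().
 */
static int model_get_cs_ptr(const void *field, uint64_t *out)
{
	uint64_t ptr;

	memcpy(&ptr, field, sizeof(ptr));
	/* A non-zero upper 32 bits always lands above the limit. */
	if (ptr >= MODEL_TASK_SIZE)
		return -EINVAL;
	*out = ptr;
	return 0;
}
```

With this shape, a value such as 0x100001000 is rejected by the range check
alone, exactly like any other out-of-range pointer.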

As for clear_user(), I wonder whether we could simply keep using it, but change
the clear_user() macro to figure out that it can use a faster 8-byte put_user()?
I find it odd that performance optimizations which would be relevant elsewhere
creep into the rseq code.
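
The size-dispatch idea could look roughly like the following userspace sketch.
It is not the kernel's clear_user(); every model_* name is hypothetical, and it
relies on __builtin_constant_p, a GCC/Clang builtin. Like clear_user(), the
model returns the number of bytes left uncleared, which is always 0 here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fast path: one 8-byte zero store (memcpy keeps it alignment-safe). */
static inline unsigned long model_clear_8(void *dst)
{
	const uint64_t zero = 0;

	memcpy(dst, &zero, sizeof(zero));
	return 0;
}

/* Generic path for any other size. */
static inline unsigned long model_clear_generic(void *dst, size_t n)
{
	memset(dst, 0, n);
	return 0;
}

/* Pick the 8-byte path only when the size is a compile-time constant 8. */
#define model_clear(dst, n)					\
	((__builtin_constant_p(n) && (n) == 8) ?		\
		model_clear_8(dst) : model_clear_generic((dst), (n)))
```

Callers would keep a single clearing call, and the choice of the 8-byte store
would live in the generic helper rather than in rseq-specific code.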

Thanks,

Mathieu

> 
> diff --git a/kernel/rseq.c b/kernel/rseq.c
> index f2eee3f7f5d330688c81cb2e57d47ca6b843873e..537b1f684efa11069990018ffa3642c209993011 100644
> --- a/kernel/rseq.c
> +++ b/kernel/rseq.c
> @@ -136,6 +136,10 @@ static int rseq_get_cs_ptr(struct rseq_cs __user **uptrp,
> {
>        u32 ptr;
> 
> +       if (get_user(ptr, &rseq->rseq_cs.ptr.padding))
> +               return -EFAULT;
> +       if (ptr)
> +               return -EINVAL;
>        if (get_user(ptr, &rseq->rseq_cs.ptr.ptr32))
>                return -EFAULT;
>        *uptrp = (struct rseq_cs __user *)ptr;
> @@ -150,8 +154,9 @@ static int rseq_get_rseq_cs(struct task_struct *t, struct rseq_cs *rseq_cs)
>        u32 sig;
>        int ret;
> 
> -       if (rseq_get_cs_ptr(&urseq_cs, t->rseq))
> -               return -EFAULT;
> +       ret = rseq_get_cs_ptr(&urseq_cs, t->rseq);
> +       if (ret)
> +               return ret;
>        if (!urseq_cs) {
>                memset(rseq_cs, 0, sizeof(*rseq_cs));
>                return 0;
> @@ -237,7 +242,8 @@ static int clear_rseq_cs(struct task_struct *t)
> #ifdef CONFIG_64BIT
>        return put_user(0UL, &t->rseq->rseq_cs.ptr64);
> #else
> -       return put_user(0UL, &t->rseq->rseq_cs.ptr.ptr32);
> +       return put_user(0UL, &t->rseq->rseq_cs.ptr.ptr32) |
> +              put_user(0UL, &t->rseq->rseq_cs.ptr.padding);
> #endif
>  }

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
