On Mon, Jul 18, 2016 at 03:57:17AM -0700, Sargun Dhillon wrote: > > > On Sun, 17 Jul 2016, Alexei Starovoitov wrote: > > >On Sun, Jul 17, 2016 at 03:19:13AM -0700, Sargun Dhillon wrote: > >> > >>+static u64 bpf_copy_to_user(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5) > >>+{ > >>+ void *to = (void *) (long) r1; > >>+ void *from = (void *) (long) r2; > >>+ int size = (int) r3; > >>+ > >>+ /* check if we're in a user context */ > >>+ if (unlikely(in_interrupt())) > >>+ return -EINVAL; > >>+ if (unlikely(!current->pid)) > >>+ return -EINVAL; > >>+ > >>+ return copy_to_user(to, from, size); > >>+} > > > >thanks for the patch, unfortunately it's not that straightforward. > >copy_to_user might fault. Try enabling CONFIG_DEBUG_ATOMIC_SLEEP and > >you'll see the splat since bpf programs are protected by rcu. > >Also 'current' can be null and I'm not sure what current->pid does. > >So the writing to user memory either has to be verified to avoid > >sleeping and faults or we need to use something like task_work_add > >mechanism. Ideas are certainly welcome. > > > > > From casual inspection, I can't find where current can be null when > in_interrupt() is false. Although, we can check before dereferencing it. > When not in a user context, the pid of the task struct returns 0. > > As far as preventing sleep, would the following alteration do? Or do we > actually need something more sophisticated? > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c > index be89c148..45878f3 100644 > --- a/kernel/trace/bpf_trace.c > +++ b/kernel/trace/bpf_trace.c > @@ -86,14 +86,19 @@ static u64 bpf_copy_to_user(u64 r1, u64 r2, u64 r3, u64 > r4, u64 r5) > void *to = (void *) (long) r1; > void *from = (void *) (long) r2; > int size = (int) r3; > + struct task_struct *task = current; > > /* check if we're in a user context */ > if (unlikely(in_interrupt())) > return -EINVAL; > - if (unlikely(!current->pid)) > + if (unlikely(!task || !task->pid)) > return -EINVAL; > > - return copy_to_user(to, from, size); > + /* Is this a user address, or a kernel address? */ > + if (!access_ok(VERIFY_WRITE, to, size)) > + return -EINVAL; > + > + return probe_kernel_write(to, from, size); > }
I think it can actually work. The only concern is that comment in access_ok() says that it may sleep whereas I couldn't find any arch where that would be the case. Could you please send an official patch with detailed commit log?