Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault

Jann Horn Fri, 22 Feb 2019 15:03:16 -0800

On Fri, Feb 22, 2019 at 11:39 PM Nadav Amit <[email protected]> wrote:
> > On Feb 22, 2019, at 2:21 PM, Nadav Amit <[email protected]> wrote:
> >
> >> On Feb 22, 2019, at 2:17 PM, Jann Horn <[email protected]> wrote:
> >>
> >> On Fri, Feb 22, 2019 at 11:08 PM Nadav Amit <[email protected]> wrote:
> >>>> On Feb 22, 2019, at 1:43 PM, Jann Horn <[email protected]> wrote:
> >>>>
> >>>> (adding some people from the text_poke series to the thread, removing stable@)
> >>>>
> >>>> On Fri, Feb 22, 2019 at 8:55 PM Andy Lutomirski <[email protected]> wrote:
> >>>>>> On Feb 22, 2019, at 11:34 AM, Alexei Starovoitov <[email protected]> wrote:
> >>>>>>> On Fri, Feb 22, 2019 at 02:30:26PM -0500, Steven Rostedt wrote:
> >>>>>>> On Fri, 22 Feb 2019 11:27:05 -0800
> >>>>>>> Alexei Starovoitov <[email protected]> wrote:
> >>>>>>>
> >>>>>>>>> On Fri, Feb 22, 2019 at 09:43:14AM -0800, Linus Torvalds wrote:
> >>>>>>>>>
> >>>>>>>>> Then we should still probably fix up "__probe_kernel_read()" to not
> >>>>>>>>> allow user accesses. The easiest way to do that is actually likely to
> >>>>>>>>> use the "unsafe_get_user()" functions *without* doing a
> >>>>>>>>> uaccess_begin(), which will mean that modern CPUs will simply fault
> >>>>>>>>> on a kernel access to user space.
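A minimal userspace model of the behavior described above, assuming x86-style SMAP: a supervisor-mode load from a user page faults unless the AC flag is set, and setting AC is what the uaccess_begin() step does. Everything here is hypothetical and for illustration only: the fixed split at `USER_ADDR_LIMIT`, the flat `fake_mem` array, and the `model_*` functions are not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical split: addresses below this are "user", the rest "kernel". */
#define USER_ADDR_LIMIT 0x1000u

/* Stand-in for EFLAGS.AC, which stac()/uaccess_begin() sets and clac() clears. */
static bool ac_flag;
/* Without SMAP (older CPUs) a user access would silently succeed. */
static bool smap_enabled = true;

/* One flat byte array models memory so the sketch can run in userspace. */
static uint8_t fake_mem[0x2000];

/* Model of a single-byte load performed in kernel mode. */
static int model_load_byte(uint32_t addr, uint8_t *out)
{
    if (addr >= sizeof(fake_mem))
        return -EFAULT;                 /* unmapped */
    if (addr < USER_ADDR_LIMIT && smap_enabled && !ac_flag)
        return -EFAULT;                 /* SMAP: supervisor touched a user page */
    *out = fake_mem[addr];
    return 0;
}

/* The suggestion: never open the user-access window, so any user
 * address faults and only kernel memory can be read. */
static int model_probe_kernel_read(uint8_t *dst, uint32_t src, uint32_t len)
{
    for (uint32_t i = 0; i < len; i++) {
        int err = model_load_byte(src + i, &dst[i]);
        if (err)
            return err;
    }
    return 0;
}

/* For contrast: an explicit user read opens and closes the window. */
static int model_probe_user_read(uint8_t *dst, uint32_t src, uint32_t len)
{
    int err = 0;

    if (src >= USER_ADDR_LIMIT || len > USER_ADDR_LIMIT - src)
        return -EFAULT;                 /* access_ok()-style range check */
    ac_flag = true;                     /* stand-in for uaccess_begin() */
    for (uint32_t i = 0; i < len && !err; i++)
        err = model_load_byte(src + i, &dst[i]);
    ac_flag = false;                    /* close the window again */
    return err;
}
```

The kernel-only reader never touches `ac_flag`, so with SMAP a user address fails instead of being read; with `smap_enabled` false the same call would quietly succeed, which is why the approach leans on modern CPUs.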
> >>>>>>>>
> >>>>>>>> On bpf side the bpf_probe_read() helper just calls probe_kernel_read()
> >>>>>>>> and users pass both user and kernel addresses into it and expect
> >>>>>>>> that the helper will actually try to read from that address.
> >>>>>>>>
> >>>>>>>> If __probe_kernel_read will suddenly start failing on all user addresses
> >>>>>>>> it will break the expectations.
> >>>>>>>> How do we solve it in bpf_probe_read?
> >>>>>>>> Call probe_kernel_read and if that fails call unsafe_get_user byte-by-byte
> >>>>>>>> in the loop?
> >>>>>>>> That's doable, but people already complain that bpf_probe_read() is slow
> >>>>>>>> and shows up in their perf report.
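The fallback described above (try the kernel read first, then retry the range one byte at a time as user memory) could look roughly like the sketch below. It is a toy model under stated assumptions: `strict_kernel_read()` stands in for a `__probe_kernel_read()` that rejects user addresses, `user_read_byte()` stands in for `unsafe_get_user()`, and the address split and backing array are made up.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define USER_ADDR_LIMIT 0x1000u   /* hypothetical user/kernel split */
static uint8_t fake_mem[0x2000];  /* stand-in for all of memory */

/* Hypothetical strict kernel read: refuses user addresses outright. */
static int strict_kernel_read(void *dst, uint32_t src, uint32_t len)
{
    if (src < USER_ADDR_LIMIT || src > sizeof(fake_mem) ||
        len > sizeof(fake_mem) - src)
        return -EFAULT;
    memcpy(dst, &fake_mem[src], len);
    return 0;
}

/* Hypothetical single-byte user read (the unsafe_get_user() role).
 * A 64-bit address avoids wrap-around when src + i overflows 32 bits. */
static int user_read_byte(uint64_t addr, uint8_t *out)
{
    if (addr >= USER_ADDR_LIMIT)
        return -EFAULT;
    *out = fake_mem[addr];
    return 0;
}

/* Kernel read first; on failure, fall back to a byte-by-byte user read. */
static int probe_read_with_fallback(void *dst, uint32_t src, uint32_t len)
{
    uint8_t *out = dst;

    if (strict_kernel_read(dst, src, len) == 0)
        return 0;
    for (uint32_t i = 0; i < len; i++) {
        if (user_read_byte((uint64_t)src + i, &out[i]) != 0) {
            memset(dst, 0, len);  /* return no partial data on failure */
            return -EFAULT;
        }
    }
    return 0;
}
```

Every fallback byte is its own checked access, so a user-space read pays for a failed kernel read plus one check per byte, which is the slowdown concern raised above.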
> >>>>>>>
> >>>>>>> We're changing kprobes to add a specific flag to say that we want to
> >>>>>>> differentiate between kernel or user reads. Can this be done with
> >>>>>>> bpf_probe_read()? If it's showing up in perf report, I doubt a single
> >>>>>>
> >>>>>> so you're saying you will break existing kprobe scripts?
> >>>>>> I don't think it's a good idea.
> >>>>>> It's not acceptable to break bpf_probe_read uapi.
> >>>>>
> >>>>> If so, the uapi is wrong: a long-sized number does not reliably 
> >>>>> identify an address if you don’t separately know whether it’s a user or 
> >>>>> kernel address. s390x and 4G:4G x86_32 are the notable exceptions. I 
> >>>>> have lobbied for RISC-V and future x86_64 to join the crowd.  I don’t 
> >>>>> know whether I’ll win this fight, but the uapi will probably have to 
> >>>>> change for at least s390x.
> >>>>>
> >>>>> What to do about existing scripts is a different question.
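A toy illustration of why a bare number can be ambiguous when user and kernel address spaces overlap (the s390x and 4G:4G x86_32 cases mentioned above), and how an explicit flag like the one described for kprobes removes the ambiguity. `enum addr_space`, `read_addr()`, and the two backing arrays are hypothetical; this is not the kprobes or BPF interface.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag saying which address space a pointer refers to. */
enum addr_space { ADDR_SPACE_KERNEL, ADDR_SPACE_USER };

/* Model of overlapping address spaces: both use the same numeric range,
 * so address 0x40 means different bytes depending on the space. */
#define SPACE_SIZE 0x100u
static uint8_t kernel_space[SPACE_SIZE];
static uint8_t user_space[SPACE_SIZE];

/* The caller must say which space the address belongs to; the number
 * alone cannot decide. */
static int read_addr(enum addr_space space, void *dst, uint32_t addr, uint32_t len)
{
    const uint8_t *base;

    switch (space) {
    case ADDR_SPACE_KERNEL: base = kernel_space; break;
    case ADDR_SPACE_USER:   base = user_space;   break;
    default:                return -EINVAL;
    }
    if (addr > SPACE_SIZE || len > SPACE_SIZE - addr)
        return -EFAULT;
    memcpy(dst, base + addr, len);
    return 0;
}
```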
> >>>>
> >>>> This lack of logical separation between user and kernel addresses
> >>>> might interact interestingly with the text_poke series, specifically
> >>>> "[PATCH v3 05/20] x86/alternative: Initialize temporary mm for
> >>>> patching" (https://lore.kernel.org/lkml/20190221234451.17632-6-rick.p.edgecombe@intel.com/)
> >>>> and "[PATCH v3 06/20] x86/alternative: Use temporary mm for text
> >>>> poking" (https://lore.kernel.org/lkml/20190221234451.17632-7-rick.p.edgecombe@intel.com/),
> >>>> right? If someone manages to get a tracing BPF program to trigger in a
> >>>> task that has switched to the patching mm, could they use
> >>>> bpf_probe_write_user() - which uses probe_kernel_write() after
> >>>> checking that KERNEL_DS isn't active and that access_ok() passes - to
> >>>> overwrite kernel text that is mapped writable in the patching mm?
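The scenario in the question can be modeled in a few lines. The sketch assumes, as described above, that the write helper validates only the numeric address (KERNEL_DS not active, an `access_ok()`-style range check) and then writes through whichever mm is active. `struct toy_mm`, `poking_mm`, and `toy_probe_write_user()` are hypothetical stand-ins, and returning -EPERM on a failed check is an assumption of the model.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define USER_ADDR_LIMIT 0x1000u   /* hypothetical TASK_SIZE stand-in */

/* Toy "mm": what the user-range addresses currently map to. */
struct toy_mm {
    uint8_t *backing;             /* USER_ADDR_LIMIT bytes */
};

static uint8_t normal_user_pages[USER_ADDR_LIMIT];
static uint8_t kernel_text[USER_ADDR_LIMIT];       /* pretend kernel code */

static struct toy_mm task_mm   = { normal_user_pages };
static struct toy_mm poking_mm = { kernel_text };  /* text aliased writable */

static struct toy_mm *active_mm = &task_mm;        /* mm the CPU is using */
static bool kernel_ds_active;                      /* set_fs(KERNEL_DS) stand-in */

static bool toy_access_ok(uint32_t addr, uint32_t len)
{
    return addr < USER_ADDR_LIMIT && len <= USER_ADDR_LIMIT - addr;
}

/* Only the numeric address is validated; the write then goes through
 * whatever mm happens to be active. */
static int toy_probe_write_user(uint32_t addr, const void *src, uint32_t len)
{
    if (kernel_ds_active || !toy_access_ok(addr, len))
        return -EPERM;
    memcpy(active_mm->backing + addr, src, len);
    return 0;
}
```

The same call that writes to ordinary user pages writes to the aliased text once the patching mm is loaded, because nothing in the check depends on which mm is live.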
> >>>
> >>> Yes, this is a good point. I guess text_poke() should be defined with
> >>> “__kprobes” and open-code memcpy.
> >>>
> >>> Does it sound reasonable?
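One way to open-code the copy, sketched under the assumption that the goal is only to avoid an out-of-line call to `memcpy()` (which could itself be probed): a plain byte loop with volatile accesses, so the compiler does not recognize the loop as a copy pattern and emit a `memcpy()` call anyway. `poke_copy()` is a hypothetical name; `__kprobes` is kernel-only and is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical open-coded copy. The volatile accesses keep the compiler
 * from turning this loop back into a library memcpy() call. */
static void poke_copy(void *dst, const void *src, size_t len)
{
    volatile uint8_t *d = dst;
    const volatile uint8_t *s = src;

    for (size_t i = 0; i < len; i++)
        d[i] = s[i];
}
```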
> >>
> >> Doesn't __text_poke() as implemented in the proposed patch use a
> >> couple other kernel functions, too? Like switch_mm_irqs_off() and
> >> pte_clear() (which can be a call into a separate function on paravirt
> >> kernels)?
> >
> > I will move the pte_clear() so that it runs after the poking mm has been unloaded.
> > Give me a few minutes to send a sketch of what I think should be done.
>
> Err.. You are right, I don’t see an easy way of preventing a kprobe from
> being set on switch_mm_irqs_off(), and open-coding this monster is too ugly.
>
> The reasonable solution seems to me to be taking all the relevant pieces of
> code (and data) that might be used during text-poking and encapsulating them
> so that they are placed in a memory area which cannot be kprobe'd. This can
> also be useful to write-protect data structures of code that calls
> text_poke(), e.g., static-keys. It can also protect data on the stack that is
> used during text_poke() from being overwritten from another core.
>
> This solution is somewhat similar to Igor Stoppa’s idea of using “enclaves”
> when doing write-rarely operations.
>
> Right now, I think that text_poke() will keep being susceptible to such
> an attack, unless you have a better suggestion.
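The registration-time half of the idea above might be sketched as follows, assuming the poking code and data are gathered into known address ranges and that probe registration consults them. The ranges, `struct region`, and `toy_register_probe()` are hypothetical; in the kernel, `__kprobes` and `NOKPROBE_SYMBOL()` already exclude individual functions. The write-protection and cross-core aspects are not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical protected region, half-open: [start, end). */
struct region { uintptr_t start, end; };

static const struct region no_probe_regions[] = {
    { 0x2000, 0x3000 },   /* e.g. code used while poking */
    { 0x8000, 0x8400 },   /* e.g. data used while poking */
};

static bool in_no_probe_region(uintptr_t addr)
{
    size_t n = sizeof(no_probe_regions) / sizeof(no_probe_regions[0]);

    for (size_t i = 0; i < n; i++)
        if (addr >= no_probe_regions[i].start && addr < no_probe_regions[i].end)
            return true;
    return false;
}

/* Toy probe registration: refuse any probe point inside a protected region. */
static int toy_register_probe(uintptr_t addr)
{
    return in_no_probe_region(addr) ? -EINVAL : 0;
}
```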


A relatively simple approach might be to teach BPF not to run kprobe
programs and such in contexts where current->mm isn't the active mm?
Maybe using nmi_uaccess_okay(), or something like that? It looks like
perf_callchain_user() also already uses that. Except that a lot of
this code is x86-specific...
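A minimal model of that check, in the spirit of x86's nmi_uaccess_okay(), which compares the mm loaded on this CPU with current->mm. The types, the two globals standing in for current and the per-CPU loaded mm, and the `toy_run_tracing_prog()` hook are all hypothetical; the sketch only shows the shape of the check, not where it would sit in the BPF dispatch path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_mm { int id; };
struct toy_task { struct toy_mm *mm; };

/* Stand-ins for current and the per-CPU loaded mm. */
static struct toy_task *current_task;
static struct toy_mm *cpu_loaded_mm;

static int programs_run;

/* User accesses are only meaningful if the mm loaded on this CPU is the
 * current task's mm. */
static bool toy_uaccess_okay(void)
{
    return current_task != NULL && cpu_loaded_mm == current_task->mm;
}

/* Hypothetical dispatch hook: skip tracing programs when the check fails,
 * e.g. while a temporary patching mm is loaded. */
static bool toy_run_tracing_prog(void)
{
    if (!toy_uaccess_okay())
        return false;
    programs_run++;
    return true;
}
```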
