On Thu, Feb 21, 2019 at 9:53 AM Daniel Borkmann <dan...@iogearbox.net> wrote:
> On 02/21/2019 06:31 AM, Kees Cook wrote:
> > On Wed, Feb 20, 2019 at 8:03 PM Alexei Starovoitov
> > <alexei.starovoi...@gmail.com> wrote:
> >>
> >> On Wed, Feb 20, 2019 at 3:59 PM Alexei Starovoitov
> >> <alexei.starovoi...@gmail.com> wrote:
> >>>
> >>> On Thu, Feb 21, 2019 at 12:01:35AM +0100, Daniel Borkmann wrote:
> >>>> In 568f196756ad ("bpf: check that BPF programs run with preemption 
> >>>> disabled")
> >>>> a check was added for BPF_PROG_RUN() that for every invocation 
> >>>> preemption is
> >>>> disabled to not break eBPF assumptions (e.g. per-cpu map). Of course 
> >>>> this does
> >>>> not count for seccomp because only cBPF -> eBPF is loaded here and it 
> >>>> does
> >>>> not make use of any functionality that would require this assertion. Fix 
> >>>> this
> >>>> false positive by adding and using SECCOMP_RUN() variant that does not 
> >>>> have
> >>>> the cant_sleep(); check.
> >>>>
> >>>> Fixes: 568f196756ad ("bpf: check that BPF programs run with preemption 
> >>>> disabled")
> >>>> Reported-by: syzbot+8bf19ee2aa580de7a...@syzkaller.appspotmail.com
> >>>> Signed-off-by: Daniel Borkmann <dan...@iogearbox.net>
> >>>> Acked-by: Kees Cook <keesc...@chromium.org>
> >>>
> >>> Applied, Thanks
> >>
> >> Actually I think it's a wrong approach to go long term.
> >> I'm thinking to revert it.
> >> I think it's better to disable preemption for duration of
> >> seccomp cbpf prog.
> >> It's short and there is really no reason for it to be preemptible.
> >> When seccomp switches to ebpf we'll have this weird inconsistency.
> >> Let's just disable preemption for seccomp as well.
> >
> > A lot of changes will be needed for seccomp ebpf -- not the least of
> > which is convincing me there is a use-case. ;)
> >
> > But the main issue is that I'm not a huge fan of dropping two
> > barriers() across syscall entry. That seems pretty heavy-duty for
> > something that is literally not needed right now.
>
> Yeah, I think it's okay to add once actually technically needed. Last
> time I looked, if I recall correctly, at least Chrome installs some
> heavy duty seccomp programs that go close to prog limit.

Half of that is probably because that seccomp BPF code is so
inefficient, though.

This snippet shows that those programs constantly recheck the high
halves of arguments:

014e if args[1].high == 0x00000000: [true +3, false +0]
0153   if args[1].low == 0x00000003: [true +135, false +0] -> ret
ALLOW (syscalls: fcntl)
0155   if args[1].high == 0x00000000: [true +3, false +0]
015a     if args[1].low == 0x00000001: [true +128, false +0] -> ret
ALLOW (syscalls: fcntl)
015c     if args[1].high == 0x00000000: [true +3, false +0]
0161       if args[1].low == 0x00000002: [true +121, false +0] -> ret
ALLOW (syscalls: fcntl)
0163       if args[1].high == 0x00000000: [true +3, false +0]
0168         if args[1].low == 0x00000006: [true +114, false +0] ->
ret ALLOW (syscalls: fcntl)
016a         if args[1].high == 0x00000000: [true +3, false +0]
016f           if args[1].low == 0x00000007: [true +107, false +0] ->
ret ALLOW (syscalls: fcntl)
0171           if args[1].high == 0x00000000: [true +3, false +0]
0176             if args[1].low == 0x00000005: [true +100, false +0]
-> ret ALLOW (syscalls: fcntl)
0178             if args[1].high == 0x00000000: [true +3, false +0]
017d               if args[1].low == 0x00000000: [true +93, false +0]
-> ret ALLOW (syscalls: fcntl)
017f               if args[1].high == 0x00000000: [true +3, false +0]
0184                 if args[1].low == 0x00000406: [true +86, false
+0] -> ret ALLOW (syscalls: fcntl)
0186                 if args[1].high == 0x00000000: [true +3, false +0]
018b                   if args[1].low != 0x00000004: [true +80, false
+0] -> ret TRAP
018d                   if args[2].high != 0x00000000: [true +78, false
+0] -> ret TRAP
018f                   if args[2].low COMMON-BITS 0xffe363fc: [true
+76, false +75] -> ret TRAP
01db                   ret ALLOW (syscalls: fcntl)

Some of the generated code is pointless because all reachable code
from that point on has the same outcome (the last "ret ALLOW" in the
following sample is unreachable because they've already checked that
the high bit of the low half is set, so the low half can't be 3):

00c9           if args[0].low == 0x00000005: [true +16, false +0] ->
ret ALLOW (syscalls: clock_gettime, clock_getres)
00cb           if args[0].high == 0x00000000: [true +3, false +0]
00d0             if args[0].low == 0x00000003: [true +9, false +0] ->
ret ALLOW (syscalls: clock_gettime, clock_getres)
01dc             ret TRAP
00cc           if args[0].high != 0xffffffff: [true +8, false +0] -> ret TRAP
00ce           if args[0].low COMMON-BITS 0x80000000: [true +0, false +6]
00d0             if args[0].low == 0x00000003: [true +9, false +0] ->
ret ALLOW (syscalls: clock_gettime, clock_getres)
01dc             ret TRAP
01d7           ret TRAP

Reply via email to