On Thu, Feb 21, 2019 at 9:53 AM Daniel Borkmann <dan...@iogearbox.net> wrote: > On 02/21/2019 06:31 AM, Kees Cook wrote: > > On Wed, Feb 20, 2019 at 8:03 PM Alexei Starovoitov > > <alexei.starovoi...@gmail.com> wrote: > >> > >> On Wed, Feb 20, 2019 at 3:59 PM Alexei Starovoitov > >> <alexei.starovoi...@gmail.com> wrote: > >>> > >>> On Thu, Feb 21, 2019 at 12:01:35AM +0100, Daniel Borkmann wrote: > >>>> In 568f196756ad ("bpf: check that BPF programs run with preemption > >>>> disabled") > >>>> a check was added for BPF_PROG_RUN() that for every invocation > >>>> preemption is > >>>> disabled to not break eBPF assumptions (e.g. per-cpu map). Of course > >>>> this does > >>>> not count for seccomp because only cBPF -> eBPF is loaded here and it > >>>> does > >>>> not make use of any functionality that would require this assertion. Fix > >>>> this > >>>> false positive by adding and using SECCOMP_RUN() variant that does not > >>>> have > >>>> the cant_sleep(); check. > >>>> > >>>> Fixes: 568f196756ad ("bpf: check that BPF programs run with preemption > >>>> disabled") > >>>> Reported-by: syzbot+8bf19ee2aa580de7a...@syzkaller.appspotmail.com > >>>> Signed-off-by: Daniel Borkmann <dan...@iogearbox.net> > >>>> Acked-by: Kees Cook <keesc...@chromium.org> > >>> > >>> Applied, Thanks > >> > >> Actually I think it's a wrong approach to go long term. > >> I'm thinking to revert it. > >> I think it's better to disable preemption for duration of > >> seccomp cbpf prog. > >> It's short and there is really no reason for it to be preemptible. > >> When seccomp switches to ebpf we'll have this weird inconsistency. > >> Let's just disable preemption for seccomp as well. > > > > A lot of changes will be needed for seccomp ebpf -- not the least of > > which is convincing me there is a use-case. ;) > > > > But the main issue is that I'm not a huge fan of dropping two > > barriers() across syscall entry. That seems pretty heavy-duty for > > something that is literally not needed right now. > > Yeah, I think it's okay to add once actually technically needed. Last > time I looked, if I recall correctly, at least Chrome installs some > heavy duty seccomp programs that go close to prog limit.
Half of that is probably because that seccomp BPF code is so inefficient, though. This snippet shows that those programs constantly recheck the high halves of arguments: 014e if args[1].high == 0x00000000: [true +3, false +0] 0153 if args[1].low == 0x00000003: [true +135, false +0] -> ret ALLOW (syscalls: fcntl) 0155 if args[1].high == 0x00000000: [true +3, false +0] 015a if args[1].low == 0x00000001: [true +128, false +0] -> ret ALLOW (syscalls: fcntl) 015c if args[1].high == 0x00000000: [true +3, false +0] 0161 if args[1].low == 0x00000002: [true +121, false +0] -> ret ALLOW (syscalls: fcntl) 0163 if args[1].high == 0x00000000: [true +3, false +0] 0168 if args[1].low == 0x00000006: [true +114, false +0] -> ret ALLOW (syscalls: fcntl) 016a if args[1].high == 0x00000000: [true +3, false +0] 016f if args[1].low == 0x00000007: [true +107, false +0] -> ret ALLOW (syscalls: fcntl) 0171 if args[1].high == 0x00000000: [true +3, false +0] 0176 if args[1].low == 0x00000005: [true +100, false +0] -> ret ALLOW (syscalls: fcntl) 0178 if args[1].high == 0x00000000: [true +3, false +0] 017d if args[1].low == 0x00000000: [true +93, false +0] -> ret ALLOW (syscalls: fcntl) 017f if args[1].high == 0x00000000: [true +3, false +0] 0184 if args[1].low == 0x00000406: [true +86, false +0] -> ret ALLOW (syscalls: fcntl) 0186 if args[1].high == 0x00000000: [true +3, false +0] 018b if args[1].low != 0x00000004: [true +80, false +0] -> ret TRAP 018d if args[2].high != 0x00000000: [true +78, false +0] -> ret TRAP 018f if args[2].low COMMON-BITS 0xffe363fc: [true +76, false +75] -> ret TRAP 01db ret ALLOW (syscalls: fcntl) Some of the generated code is pointless because all reachable code from that point on has the same outcome (the last "ret ALLOW" in the following sample is unreachable because they've already checked that the high bit of the low half is set, so the low half can't be 3): 00c9 if args[0].low == 0x00000005: [true +16, false +0] -> ret ALLOW (syscalls: clock_gettime, clock_getres) 00cb if args[0].high == 0x00000000: [true +3, false +0] 00d0 if args[0].low == 0x00000003: [true +9, false +0] -> ret ALLOW (syscalls: clock_gettime, clock_getres) 01dc ret TRAP 00cc if args[0].high != 0xffffffff: [true +8, false +0] -> ret TRAP 00ce if args[0].low COMMON-BITS 0x80000000: [true +0, false +6] 00d0 if args[0].low == 0x00000003: [true +9, false +0] -> ret ALLOW (syscalls: clock_gettime, clock_getres) 01dc ret TRAP 01d7 ret TRAP