On Thu, Jan 18, 2018 at 3:05 PM, Greg Kroah-Hartman
<gre...@linuxfoundation.org> wrote:
> On Thu, Jan 18, 2018 at 02:01:28PM +0100, Dmitry Vyukov wrote:
>> On Thu, Jan 18, 2018 at 2:09 AM, Theodore Ts'o <ty...@mit.edu> wrote:
>> > On Wed, Jan 17, 2018 at 04:21:13PM -0800, Alexei Starovoitov wrote:
>> >>
>> >> If syzkaller can only test one tree, then linux-next should be the one.
>> >
>> > Well, there's been some controversy about that. The problem is that
>> > it's often not clear if this is a long-standing bug, or a bug which is
>> > in a particular subsystem tree --- and if so, *which* subsystem tree,
>> > etc. So it gets blasted to linux-kernel, and to get_maintainer.pl,
>> > which is often not accurate --- since the location of the crash
>> > doesn't necessarily point out where the problem originated, and hence
>> > who should look at the syzbot report. And so this has caused
>> > some.... irritation.
>>
>>
>> Re set of tested trees.
>>
>> We now have an interesting spectrum of opinions.
>>
>> Some assorted thoughts on this:
>>
>> 1. First, "upstream is clean" won't happen any time soon. There are
>> several reasons for this:
>> - Currently syzkaller only tests a subset of subsystems that it knows
>> how to test, and even the ones that it tests, it tests poorly. Over
>> time it will be improved to test most subsystems, and to test existing
>> subsystems better. Just a few weeks ago I added some descriptions for
>> the crypto subsystem, and they uncovered 20+ old bugs.
>> - syzkaller is a guided, genetic fuzzer: over time it learns how to do
>> more complex things in small steps. That takes time.
>> - We have more bug detection tools coming: LEAKCHECK, KMSAN (uninit
>> memory), KTSAN (data races).
>> - Generic syzkaller smartness will be improved over time.
>> - It will get more CPU resources.
>> The effect of all of these things is multiplicative: we test more code,
>> smarter, with more bug-detection tools, with more resources. So I
>> think we need to plan for a mix of old and new bugs for the foreseeable
>> future.
>
> That's fine, but when you test Linus's tree, we "know" you are hitting
> something that really is an issue, and it's not due to linux-next
> oddities.
>
> When I see a linux-next report, and it looks "odd", my default reaction
> is "ugh, must be a crazy patch in some other subsystem, I _know_ my code
> in linux-next is just fine." :)
>
>> 2. get_maintainer.pl and the mix of old and new bugs were mentioned as
>> harming attribution. I don't see what will change when/if we test only
>> upstream. Then the same mix of old/new bugs will be detected just on
>> upstream, with all of the same problems for old/new, maintainers,
>> which subsystem, etc. I think the number of bugs in the kernel is a
>> significant part of the problem, but the exact boundary where we
>> decide to start killing them won't affect the number of bugs.
>
> I don't worry about that, the traceback should tell you a lot, and even
> when that is wrong (e.g. warnings thrown up by sysfs core calls that are
> obviously not a sysfs issue, but rather a subsystem issue), it's easy to
> see.
>
>> 3. If we test only upstream, we increase the chances of new security
>> bugs sinking into releases. We sure could raise the perceived security
>> value of the bugs by keeping them private, letting them sink into a
>> release, letting them sink into distros, and then reporting a
>> high-profile vulnerability. I think that's wrong. There is something
>> broken with how value is measured in the security community. A bug
>> that is killed before sinking into any release is the highest-impact
>> outcome. As Alexei noted, fixing bugs as early as possible also reduces
>> fix costs, backporting burden, etc. This can also eliminate the need
>> for bisection in some cases: say you accepted a large change to some
>> files and a bunch of crashes appear for these files on your tree soon
>> after, it's obvious what happened.
>
> I agree, this is an issue, but I think you have a lot of "low-hanging
> fruit" in Linus's tree left to find. Testing linux-next is great, but
> the odds of something "new" being added there for your type of testing
> right now are usually pretty low, right?

So I've dropped linux-next and mmots for now (you can still see them for
a few days for bugs already in the pipeline) and added bpf-next instead.
The bpf-next instance runs as root, has net.core.bpf_jit_enable=1, and
has the following syscalls enabled:

    "enable_syscalls": [
        "bpf",
        "mkdir",
        "mount",
        "close",
        "perf_event_open",
        "ioctl$PERF*",
        "getpid",
        "gettid",
        "socketpair",
        "sendmsg",
        "recvmsg",
        "setsockopt$sock_attach_bpf",
        "socket$kcm",
        "ioctl$sock_kcm*"
    ]

Let's see how this goes.
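Some entries in the enable_syscalls list end in a wildcard (for example "ioctl$PERF*"). The Python sketch below shows one plausible reading of how such a list selects syscall descriptions. The matching rules are an assumption, not taken from syzkaller's source: a trailing "*" is treated as a prefix match, and a bare call name such as "bpf" is assumed to also select its "$"-suffixed variants. The names in AVAILABLE are a hypothetical sample chosen for illustration only.

```python
# Sketch only: these matching rules are an ASSUMPTION about how an
# "enable_syscalls" list is interpreted, not syzkaller's implementation.

ENABLE_SYSCALLS = [
    "bpf", "mkdir", "mount", "close", "perf_event_open", "ioctl$PERF*",
    "getpid", "gettid", "socketpair", "sendmsg", "recvmsg",
    "setsockopt$sock_attach_bpf", "socket$kcm", "ioctl$sock_kcm*",
]


def matches(pattern: str, name: str) -> bool:
    """Return True if one enable_syscalls pattern selects a syscall name."""
    # Exact match, or a bare call name that also covers its "$variant" forms
    # (assumed behaviour).
    if name == pattern or name.startswith(pattern + "$"):
        return True
    # Trailing "*": prefix match on everything before the star (assumed).
    if len(pattern) > 1 and pattern.endswith("*"):
        return name.startswith(pattern[:-1])
    return False


def expand(patterns: list[str], available: list[str]) -> list[str]:
    """Return the available names selected by any pattern, in input order."""
    return [n for n in available if any(matches(p, n) for p in patterns)]


# Hypothetical sample of syscall description names, for illustration only.
AVAILABLE = [
    "bpf$MAP_CREATE",
    "bpf$PROG_LOAD",
    "ioctl$PERF_EVENT_IOC_ENABLE",
    "ioctl$TIOCGPTN",
    "ioctl$sock_kcm_SIOCKCMATTACH",
    "open",
    "setsockopt$sock_attach_bpf",
    "socket$kcm",
    "socket$inet",
]

if __name__ == "__main__":
    for selected in expand(ENABLE_SYSCALLS, AVAILABLE):
        print(selected)
```

Under these assumed rules, the unrelated "ioctl$TIOCGPTN", "open" and "socket$inet" entries are left out, which is the point of such a narrow list: fuzzing effort stays focused on BPF, perf and KCM paths.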