On Wed, Dec 2, 2015 at 5:10 PM, Dmitry Vyukov <dvyu...@google.com> wrote:
> ping
>
> The number of bugs found with this coverage in the kernel has already crossed 40:
> https://github.com/google/syzkaller/wiki/Found-Bugs
>
>
> On Fri, Nov 27, 2015 at 3:30 PM, Dmitry Vyukov <dvyu...@google.com> wrote:
>> +syzkaller group
>>
>> On Fri, Nov 27, 2015 at 3:28 PM, Dmitry Vyukov <dvyu...@google.com> wrote:
>>> Hello,
>>>
>>> This patch adds support for coverage-guided fuzzing:
>>> https://codereview.appspot.com/280140043
>>>
>>> Coverage-guided fuzzing is a powerful randomized testing technique
>>> that uses coverage feedback to determine new interesting inputs to a
>>> program. Some examples of systems that use it are AFL
>>> (http://lcamtuf.coredump.cx/afl/) and LibFuzzer
>>> (http://llvm.org/docs/LibFuzzer.html). Compiler coverage
>>> instrumentation makes fuzzing more efficient than binary
>>> instrumentation.
>>>
>>> The flag that enables coverage is named -fsanitize-coverage=trace-pc
>>> for consistency with similar functionality in clang, which already
>>> supports a set of fuzzing coverage modes:
>>> http://clang.llvm.org/docs/SanitizerCoverage.html
>>> -fsanitize-coverage=trace-pc is not yet supported in clang, but we
>>> plan to add it.
>>>
>>> This particular coverage mode simply inserts function calls into
>>> every basic block.
>>> I've built syzkaller, a Linux system call fuzzer
>>> (https://github.com/google/syzkaller), using this functionality. The
>>> fuzzer has found 30+ previously unknown bugs in the kernel
>>> (https://github.com/google/syzkaller/wiki/Found-Bugs) in slightly
>>> more than a month (even though the kernel had already been
>>> extensively fuzzed with trinity, a non-guided fuzzer). Quentin has
>>> also built a kernel fuzzer on top of it.
>>>
>>> Why not gcov? A typical fuzzing loop looks as follows: (1) reset
>>> coverage, (2) execute a bit of code, (3) collect coverage, repeat.
>>> The collected coverage can be just a dozen basic blocks (e.g. for an
>>> invalid input). In such a context gcov becomes prohibitively
>>> expensive, because the reset/collect steps depend on the total
>>> number of basic blocks/edges in the program (about 2M in the case of
>>> the kernel), whereas the cost of this "tracing" mode depends only on
>>> the number of executed basic blocks/edges.
>>> On top of that, the kernel requires per-thread coverage, because
>>> there are always background threads and unrelated processes that
>>> also produce coverage. With inlined gcov instrumentation, per-thread
>>> coverage is not possible.
>>> Inlined fast paths do not make much sense in a fuzzing scenario,
>>> because lots of code is executed just once between resets. Also, it
>>> is not really possible to inline accesses to per-CPU and per-task
>>> data structures in the kernel.
>>>
>>> OK for trunk?