On Fri, Jul 14, 2017 at 11:17 PM, Kostya Serebryany <k...@google.com> wrote: >>>> > Hi >>>> > >>>> > I wrote a test for "-fsanitize-coverage=trace-cmp" . >>>> > >>>> > Is there anybody tells me if these codes could be merged into gcc ? >>>> >>>> >>>> Nice! >>>> >>>> We are currently working on Linux kernel fuzzing that use the >>>> comparison tracing. We use clang at the moment, but having this >>>> support in gcc would be great for kernel land. >>>> >>>> One concern I have: do we want to do some final refinements to the API >>>> before we implement this in both compilers? >>>> >>>> 2 things we considered from our perspective: >>>> - communicating to the runtime which operands are constants >>>> - communicating to the runtime which comparisons are counting loop checks >>>> >>>> First is useful if you do "find one operand in input and replace with >>>> the other one" thing. Second is useful because counting loop checks >>>> are usually not useful (at least all but one). >>>> In the original Go implementation I also conveyed signedness of >>>> operands, exact comparison operation (<, >, etc): >>>> https://github.com/dvyukov/go-fuzz/blob/master/go-fuzz-defs/defs.go#L13 >>>> But I did not find any use for that. >>>> I also gave all comparisons unique IDs: >>>> https://github.com/dvyukov/go-fuzz/blob/master/go-fuzz-dep/sonar.go#L24 >>>> That turned out to be useful. And there are chances we will want this >>>> for C/C++ as well. >>>> >>>> Kostya, did anything like this pop up in your work on libfuzzer? >>>> Can we still change the clang API? At least add an additional argument >>>> to the callbacks? >>> >>> >>> I'd prefer not to change the API, but extend it (new compiler flag, new >>> callbacks), if absolutely needed. >>> Probably make it trace-cmp-guard (similar to trace-pc-guard, with an extra >>> parameter that has the ID). >>> I don't like the approach with compiler-generated constant IDs. >> >> Yes, if we do it for C/C++, we need to create globals and pass pointer >> to a global to the callbacks. IDs do not work for C/C++. >> >>> Yes, it's a bit more efficient, but much less flexible in presence of >>> multiple modules, DSOs, dlopen, etc. >>> >>> I was also looking at completely inlining this instrumentation because it's >>> pretty expensive even in it's current form >>> (adding more parameters will make things worse). >>> This is going to be much less flexible, of course, so I'll attack it only >>> once I settle on the algorithm to handle CMPs in libFuzzer. >> >> This will require a new, completely different API for >> compiler<->runtime anyway, so we can put this aside for now. >> >> >>>> At the very least I would suggest that we add an additional arg that >>>> contains some flags (1/2 arg is a const, this is counting loop check, >>>> etc). If we do that we can also have just 1 callback that accepts >>>> uint64's for args because we can pass operand size in the flags: >>> >>> >>> How many flag combinations do we need and do we *really* need them? >>> >>> If the number of flag combinations is small, I'd prefer to have separate >>> callbacks (__sanitizer_cov_trace_cmp_loop_bound ?) >>> >>> Do we really need to know that one arg is a const? >>> It could well be a constant in fact, but compiler won't see it. >>> >>> int foo(int n) { ... if (i < n) ... } >>> ... >>> foo(42); // not inlined. >>> >>> We need to handle both cases the same way. >> >> >> Well, following this line of reasoning we would need only >> __asan_load/storeN callbacks for asan and remove >> __asan_load/store1/2/4/8, because compiler might not know the size at >> compile time. Constant-ness is only an optimization. If compiler does >> not know that something is a const, fine. Based on my experience with >> go-fuzz and our early experience with kernel, we badly need const >> hint. >> Otherwise fuzzer generates gazillions of candidates based on >> comparison arguments. Note that kernel is several order of magnitude >> larger than what people usually fuzz in user-space, inputs are more >> complex and at the same time execution speed is several order of >> magnitude lower. We can't rely on raw speed. >> >> Thinking of this more, I don't thing that globals will be useful in >> the kernel context (the main problem is that we have multiple >> transient isolated kernels). If we track per-comparison site >> information, we will probably use PCs. So I am ready to give up on >> this. >> >> Both of you expressed concerns about performance. Kostya says we >> should not break existing clang API. >> If we limit this to only constant-ness, then I think we can make this >> both forward and backward compatible, which means we don't need to >> handle it now. E.g. we can: >> - if both operands are const (if it's possible at all), don't emit any >> callback >> - if only one is const, emit __sanitizer_cov_trace_cmp_const1 and >> pass the const in a known position (i.e. always first/second arg) >> - if none are const, emit callback __sanitizer_cov_trace_cmp_dyn1 >> Then compiler emits weak aliases form >> __sanitizer_cov_trace_cmp_const/dyn1 to the old >> __sanitizer_cov_trace_cmp1, which makes it backwards compatible. >> New runtimes that implement __sanitizer_cov_trace_cmp_const/dyn1, also >> need to provide the old __sanitizer_cov_trace_cmp1 for old compilers. > > SGTM++ > > Although, maybe we can actually break the API this way to keep things > simpler (no weak aliases) > * leave __sanitizer_cov_trace_cmp[1248] for general case > * add __sanitizer_cov_trace_cmp[1248]_const for constant in the 2-nd parameter > * add __sanitizer_cov_trace_cmp[1248]_loop for loop bound in the 2nd parameter
This would work for us. > * do we need __sanitizer_cov_trace_cmp[1248]_const_loop ? We don't know yet. Potentially yes, because const is reading/writing to a static buffer, while non-const is reading/writing to a buffer with user-provided size.