tuktuk created this revision.
tuktuk added reviewers: kcc, morehouse.
tuktuk added projects: clang, Sanitizers.
Herald added subscribers: llvm-commits, cfe-commits, hiraditya.
Herald added a project: LLVM.

This commit adds two command-line options to clang.
These options let the user decide which functions will receive 
SanitizerCoverage instrumentation.
This is most useful in the libFuzzer use case, where it enables targeted 
coverage-guided fuzzing.

Patch by Yannis Juglaret of DGA-MI, Rennes, France

libFuzzer tests its target against an evolving corpus and relies on 
SanitizerCoverage instrumentation to collect the code coverage information that 
drives corpus evolution. Currently, libFuzzer collects such information for all 
functions of the target under test, and adds to the corpus every mutated sample 
that finds a new code coverage path in any function of the target. We propose 
instead to let the user specify which functions' code coverage information is 
relevant for building the upcoming fuzzing campaign's corpus. To this end, we 
add two new command-line options to clang, enabling targeted coverage-guided 
fuzzing with libFuzzer. We see targeted coverage-guided fuzzing as a simple way 
to leverage libFuzzer for big targets with thousands of functions or multiple 
dependencies. We publish this patch as work from DGA-MI of Rennes, France, with 
proper authorization from our hierarchy.

Targeted coverage-guided fuzzing can accelerate bug finding for two reasons. 
First, the compiler skips costly instrumentation for irrelevant functions, so 
every call to one of these functions runs faster during fuzzing. Second, the 
built fuzzer will produce and use a more accurate corpus, because it will not 
keep the samples that find new coverage paths in irrelevant functions.
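To illustrate the second point, here is a toy C++ model of coverage-guided corpus growth. It is a sketch under simplifying assumptions, not libFuzzer's actual logic: each input is reduced to the coverage features it reaches, written `function:block`, and only features of instrumented functions count as feedback. The function `keptInputs` and the function names used with it are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy model of coverage-guided corpus growth; this is NOT libFuzzer's code.
// A coverage feature is written "function:block".
static std::string functionOf(const std::string& feature) {
  return feature.substr(0, feature.find(':'));
}

// Replays `inputs` in order and returns the indices of the inputs that would
// be kept in the corpus because they reached new *visible* coverage, i.e. new
// features inside instrumented functions.
std::vector<std::size_t> keptInputs(
    const std::set<std::string>& instrumented,
    const std::vector<std::vector<std::string>>& inputs) {
  std::set<std::string> seen;
  std::vector<std::size_t> corpus;
  for (std::size_t i = 0; i < inputs.size(); ++i) {
    bool newCoverage = false;
    for (const auto& feature : inputs[i]) {
      if (instrumented.count(functionOf(feature)) == 0)
        continue;  // uninstrumented function: the fuzzer gets no feedback
      if (seen.insert(feature).second)
        newCoverage = true;
    }
    if (newCoverage)
      corpus.push_back(i);
  }
  return corpus;
}
```

For example, with four inputs that all reach a hypothetical `Convert:entry` feature and that otherwise differ mostly in hypothetical `Brotli:*` features, instrumenting every function keeps all four inputs, whereas instrumenting only `Convert` and `Reconstruct` keeps just the inputs that reach new code in those two functions.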

The two new command-line options are `-fsanitize-coverage-whitelist` and 
`-fsanitize-coverage-blacklist`. They accept files in the same format as the 
existing `-fsanitize-blacklist` option 
<https://clang.llvm.org/docs/SanitizerSpecialCaseList.html#format>. The new 
options influence SanitizerCoverage so that it will only instrument a subset of 
the functions in the target. We explain these options in detail in 
`clang/docs/SanitizerCoverage.rst`.
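The selection logic implemented by this patch can be summarized as a single predicate. The C++ sketch below is a simplification: it substitutes POSIX `fnmatch` for `llvm::SpecialCaseList` glob matching, represents a list as hypothetical `(prefix, glob)` pairs, and ignores sections and regular-expression details. A null pointer stands for an option that was not given; as in the patch, a whitelist that is given but matches nothing disables instrumentation entirely.

```cpp
#include <cassert>
#include <fnmatch.h>  // POSIX fnmatch(), used here as a stand-in glob matcher
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified special case list: (prefix, glob) pairs such as
// {"src", "SRC/*"} or {"fun", "*"}. Sections and the exact matching rules
// of llvm::SpecialCaseList are not modeled.
using Rules = std::vector<std::pair<std::string, std::string>>;

// True if some rule with the given prefix ("src" or "fun") matches the query.
// Without FNM_PATHNAME, fnmatch lets '*' match '/' as well.
static bool listMatches(const Rules& rules, const std::string& prefix,
                        const std::string& query) {
  for (const auto& rule : rules)
    if (rule.first == prefix &&
        fnmatch(rule.second.c_str(), query.c_str(), 0) == 0)
      return true;
  return false;
}

// Mirrors the checks this patch adds to SanitizerCoverage.cpp: the source file
// and the mangled function name must both be whitelisted (when a whitelist is
// given), and neither may be blacklisted. nullptr means "option not given".
bool shouldInstrument(const Rules* whitelist, const Rules* blacklist,
                      const std::string& sourceFile,
                      const std::string& mangledName) {
  if (whitelist && !(listMatches(*whitelist, "src", sourceFile) &&
                     listMatches(*whitelist, "fun", mangledName)))
    return false;
  if (blacklist && (listMatches(*blacklist, "src", sourceFile) ||
                    listMatches(*blacklist, "fun", mangledName)))
    return false;
  return true;
}
```

For instance, with the hypothetical whitelist `{{"src", "SRC/*"}, {"fun", "*"}}`, `shouldInstrument` accepts the file `SRC/src/woff2_dec.cc` but rejects `./SRC/src/woff2_dec.cc`, since paths are matched literally.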

Consider now the woff2 fuzzing example from the libFuzzer tutorial 
<https://github.com/google/fuzzer-test-suite/blob/master/tutorial/libFuzzerTutorial.md>.
We are aware that we cannot conclude much from this example, because mutating 
compressed data is generally a bad idea, but let us use it anyway as an 
illustration because of its simplicity. Let us use an empty blacklist together 
with one of the three following whitelists:

  # (a)
  src:*
  fun:*
  
  # (b)
  src:SRC/*
  fun:*
  
  # (c)
  src:SRC/src/woff2_dec.cc
  fun:*

Running each built fuzzer shows how many instrumentation points the compiler 
added: the fuzzer outputs `XXX PCs`. Whitelist (a) is the 
instrument-everything whitelist; it produces 11912 instrumentation points. 
Whitelist (b) restricts instrumentation to woff2 source code only, ignoring 
the dependency code for brotli (de)compression; it produces 3984 
instrumentation points. Whitelist (c) restricts instrumentation to the 
functions in the main file that deals with WOFF2 to TTF conversion, resulting 
in 1056 instrumentation points.
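The counts above translate into the following reductions. This is only arithmetic on the reported numbers; the constant and function names are chosen for illustration.

```cpp
#include <cassert>

// Instrumentation points reported by the fuzzers built with each whitelist.
constexpr int kPointsA = 11912;  // (a) instrument everything
constexpr int kPointsB = 3984;   // (b) woff2 sources only
constexpr int kPointsC = 1056;   // (c) woff2_dec.cc only

// Share of whitelist (a)'s instrumentation points that another whitelist keeps.
constexpr double keptShare(int points) {
  return static_cast<double>(points) / kPointsA;
}

// Points that whitelist (b) leaves uninstrumented compared to (a).
static_assert(kPointsA - kPointsB == 7928, "points left out by (b)");
```

Whitelist (b) keeps about a third of the instrumentation points of whitelist (a), and whitelist (c) keeps under 9% of them.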

For experimentation purposes, we ran each fuzzer approximately 100 times, 
single process, with the initial corpus provided in the tutorial. We let the 
fuzzer run until it either found the heap buffer overflow or ran out of 
memory. On this simple example, whitelists (b) and (c) found the heap buffer 
overflow more reliably and about 5x faster than whitelist (a). The average 
times to find the heap buffer overflow were as follows: (a) 904 s, (b) 156 s, 
and (c) 176 s.
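As a quick check of the "5x" figure, the speedups can be computed as the ratio of the reported average times. The names below are illustrative only.

```cpp
#include <cassert>

// Average time (seconds) to find the heap buffer overflow, as reported above.
constexpr double kSecondsA = 904.0;  // (a) instrument everything
constexpr double kSecondsB = 156.0;  // (b) woff2 sources only
constexpr double kSecondsC = 176.0;  // (c) woff2_dec.cc only

// How many times faster a whitelist was than whitelist (a), comparing averages.
constexpr double speedupOverA(double seconds) { return kSecondsA / seconds; }
```

This gives roughly 5.8x for whitelist (b) and 5.1x for whitelist (c). Note that a ratio of averages is not the same as an average of per-run ratios.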

We explain these results by the fact that WOFF2 to TTF conversion calls the 
brotli decompression algorithm's functions, which are mostly irrelevant for 
finding bugs in WOFF2 font reconstruction but are nevertheless instrumented and 
used to guide fuzzing with whitelist (a). This results in longer execution times 
for these functions and a partially irrelevant corpus. Unlike whitelist (a), 
whitelists (b) and (c) execute brotli-related functions without 
instrumentation overhead and ignore new code paths found in them. This results 
in faster bug finding for WOFF2 font reconstruction.

The results for whitelist (b) are similar to those for whitelist (c). 
Indeed, WOFF2 to TTF conversion calls functions that are mostly located in 
SRC/src/woff2_dec.cc. The 2928 (3984 - 1056) extra instrumentation points 
allowed by whitelist (b) do not hinder bug finding, even though they are mostly 
irrelevant, simply because most of these functions do not get called. We get a 
slightly faster average time for bug finding with whitelist (b), which might 
indicate that some of the extra instrumentation points are actually relevant, 
or might just be random noise.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D63616

Files:
  clang/docs/SanitizerCoverage.rst
  clang/include/clang/Basic/CodeGenOptions.h
  clang/include/clang/Basic/DiagnosticDriverKinds.td
  clang/include/clang/Driver/Options.td
  clang/include/clang/Driver/SanitizerArgs.h
  clang/lib/CodeGen/BackendUtil.cpp
  clang/lib/Driver/SanitizerArgs.cpp
  clang/lib/Frontend/CompilerInvocation.cpp
  llvm/include/llvm/Transforms/Instrumentation.h
  llvm/lib/Transforms/Instrumentation/SanitizerCoverage.cpp

Index: llvm/lib/Transforms/Instrumentation/SanitizerCoverage.cpp
===================================================================
--- llvm/lib/Transforms/Instrumentation/SanitizerCoverage.cpp
+++ llvm/lib/Transforms/Instrumentation/SanitizerCoverage.cpp
@@ -34,6 +34,7 @@
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/raw_ostream.h"
+#include "llvm/Support/SpecialCaseList.h"
 #include "llvm/Transforms/Instrumentation.h"
 #include "llvm/Transforms/Utils/BasicBlockUtils.h"
 #include "llvm/Transforms/Utils/ModuleUtils.h"
@@ -179,8 +180,16 @@
 class SanitizerCoverageModule : public ModulePass {
 public:
   SanitizerCoverageModule(
-      const SanitizerCoverageOptions &Options = SanitizerCoverageOptions())
+      const SanitizerCoverageOptions &Options = SanitizerCoverageOptions(),
+      const std::vector<std::string>& WhitelistFiles = std::vector<std::string>(),
+      const std::vector<std::string>& BlacklistFiles = std::vector<std::string>())
       : ModulePass(ID), Options(OverrideFromCL(Options)) {
+    if (WhitelistFiles.size() > 0) {
+      Whitelist = SpecialCaseList::createOrDie(WhitelistFiles);
+    }
+    if (BlacklistFiles.size() > 0) {
+      Blacklist = SpecialCaseList::createOrDie(BlacklistFiles);
+    }
     initializeSanitizerCoverageModulePass(*PassRegistry::getPassRegistry());
   }
   bool runOnModule(Module &M) override;
@@ -250,6 +259,9 @@
   SmallVector<GlobalValue *, 20> GlobalsToAppendToCompilerUsed;
 
   SanitizerCoverageOptions Options;
+
+  std::unique_ptr<SpecialCaseList> Whitelist;
+  std::unique_ptr<SpecialCaseList> Blacklist;
 };
 
 } // namespace
@@ -313,6 +325,12 @@
 bool SanitizerCoverageModule::runOnModule(Module &M) {
   if (Options.CoverageType == SanitizerCoverageOptions::SCK_None)
     return false;
+  if (Whitelist &&
+      !Whitelist->inSection("coverage", "src", M.getSourceFileName()))
+    return false;
+  if (Blacklist &&
+      Blacklist->inSection("coverage", "src", M.getSourceFileName()))
+    return false;
   C = &(M.getContext());
   DL = &M.getDataLayout();
   CurModule = &M;
@@ -541,6 +559,10 @@
   if (F.hasPersonalityFn() &&
       isAsynchronousEHPersonality(classifyEHPersonality(F.getPersonalityFn())))
     return false;
+  if (Whitelist && !Whitelist->inSection("coverage", "fun", F.getName()))
+    return false;
+  if (Blacklist && Blacklist->inSection("coverage", "fun", F.getName()))
+    return false;
   if (Options.CoverageType >= SanitizerCoverageOptions::SCK_Edge)
     SplitAllCriticalEdges(F, CriticalEdgeSplittingOptions().setIgnoreUnreachableDests());
   SmallVector<Instruction *, 8> IndirCalls;
@@ -898,6 +920,8 @@
                     "ModulePass",
                     false, false)
 ModulePass *llvm::createSanitizerCoverageModulePass(
-    const SanitizerCoverageOptions &Options) {
-  return new SanitizerCoverageModule(Options);
+    const SanitizerCoverageOptions &Options,
+    const std::vector<std::string>& WhitelistFiles,
+    const std::vector<std::string>& BlacklistFiles) {
+  return new SanitizerCoverageModule(Options, WhitelistFiles, BlacklistFiles);
 }
Index: llvm/include/llvm/Transforms/Instrumentation.h
===================================================================
--- llvm/include/llvm/Transforms/Instrumentation.h
+++ llvm/include/llvm/Transforms/Instrumentation.h
@@ -183,7 +183,9 @@
 
 // Insert SanitizerCoverage instrumentation.
 ModulePass *createSanitizerCoverageModulePass(
-    const SanitizerCoverageOptions &Options = SanitizerCoverageOptions());
+    const SanitizerCoverageOptions &Options = SanitizerCoverageOptions(),
+    const std::vector<std::string>& WhitelistFiles = std::vector<std::string>(),
+    const std::vector<std::string>& BlacklistFiles = std::vector<std::string>());
 
 /// Calculate what to divide by to scale counts.
 ///
Index: clang/lib/Frontend/CompilerInvocation.cpp
===================================================================
--- clang/lib/Frontend/CompilerInvocation.cpp
+++ clang/lib/Frontend/CompilerInvocation.cpp
@@ -1111,6 +1111,8 @@
   Opts.SanitizeCoveragePCTable = Args.hasArg(OPT_fsanitize_coverage_pc_table);
   Opts.SanitizeCoverageStackDepth =
       Args.hasArg(OPT_fsanitize_coverage_stack_depth);
+  Opts.SanitizeCoverageWhitelistFiles = Args.getAllArgValues(OPT_fsanitize_coverage_whitelist_EQ);
+  Opts.SanitizeCoverageBlacklistFiles = Args.getAllArgValues(OPT_fsanitize_coverage_blacklist_EQ);
   Opts.SanitizeMemoryTrackOrigins =
       getLastArgIntValue(Args, OPT_fsanitize_memory_track_origins_EQ, 0, Diags);
   Opts.SanitizeMemoryUseAfterDtor =
Index: clang/lib/Driver/SanitizerArgs.cpp
===================================================================
--- clang/lib/Driver/SanitizerArgs.cpp
+++ clang/lib/Driver/SanitizerArgs.cpp
@@ -720,6 +720,44 @@
       CoverageFeatures |= CoverageFunc;
   }
 
+  // Parse -fsanitize-coverage-{white,black}list options.
+  for (const auto *Arg : Args) {
+    if (Arg->getOption().matches(options::OPT_fsanitize_coverage_whitelist_EQ)) {
+      Arg->claim();
+      std::string CWLPath = Arg->getValue();
+      if (llvm::sys::fs::exists(CWLPath)) {
+        CoverageWhitelistFiles.push_back(CWLPath);
+        ExtraDeps.push_back(CWLPath);
+      } else {
+        D.Diag(clang::diag::err_drv_no_such_file) << CWLPath;
+      }
+    } else if (Arg->getOption().matches(options::OPT_fsanitize_coverage_blacklist_EQ)) {
+      Arg->claim();
+      std::string CBLPath = Arg->getValue();
+      if (llvm::sys::fs::exists(CBLPath)) {
+        CoverageBlacklistFiles.push_back(CBLPath);
+        ExtraDeps.push_back(CBLPath);
+      } else {
+        D.Diag(clang::diag::err_drv_no_such_file) << CBLPath;
+      }
+    }
+  }
+  // Validate whitelist and blacklist format.
+  if (CoverageWhitelistFiles.size() > 0) {
+    std::string WLError;
+    std::unique_ptr<llvm::SpecialCaseList> SCL(
+        llvm::SpecialCaseList::create(CoverageWhitelistFiles, WLError));
+    if (!SCL.get())
+      D.Diag(clang::diag::err_drv_malformed_sanitizer_coverage_whitelist) << WLError;
+  }
+  if (CoverageBlacklistFiles.size() > 0) {
+    std::string BLError;
+    std::unique_ptr<llvm::SpecialCaseList> SCL(
+        llvm::SpecialCaseList::create(CoverageBlacklistFiles, BLError));
+    if (!SCL.get())
+      D.Diag(clang::diag::err_drv_malformed_sanitizer_coverage_blacklist) << BLError;
+  }
+
   SharedRuntime =
       Args.hasFlag(options::OPT_shared_libsan, options::OPT_static_libsan,
                    TC.getTriple().isAndroid() || TC.getTriple().isOSFuchsia() ||
@@ -886,6 +924,18 @@
       CmdArgs.push_back(F.second);
   }
 
+  // Forward coverage whitelist and blacklist
+  for (const auto &CWLPath : CoverageWhitelistFiles) {
+    SmallString<64> CoverageWhitelistOpt("-fsanitize-coverage-whitelist=");
+    CoverageWhitelistOpt += CWLPath;
+    CmdArgs.push_back(Args.MakeArgString(CoverageWhitelistOpt));
+  }
+  for (const auto &CBLPath : CoverageBlacklistFiles) {
+    SmallString<64> CoverageBlacklistOpt("-fsanitize-coverage-blacklist=");
+    CoverageBlacklistOpt += CBLPath;
+    CmdArgs.push_back(Args.MakeArgString(CoverageBlacklistOpt));
+  }
+
   if (TC.getTriple().isOSWindows() && needsUbsanRt()) {
     // Instruct the code generator to embed linker directives in the object file
     // that cause the required runtime libraries to be linked.
Index: clang/lib/CodeGen/BackendUtil.cpp
===================================================================
--- clang/lib/CodeGen/BackendUtil.cpp
+++ clang/lib/CodeGen/BackendUtil.cpp
@@ -215,7 +215,7 @@
   Opts.Inline8bitCounters = CGOpts.SanitizeCoverageInline8bitCounters;
   Opts.PCTable = CGOpts.SanitizeCoveragePCTable;
   Opts.StackDepth = CGOpts.SanitizeCoverageStackDepth;
-  PM.add(createSanitizerCoverageModulePass(Opts));
+  PM.add(createSanitizerCoverageModulePass(Opts, CGOpts.SanitizeCoverageWhitelistFiles, CGOpts.SanitizeCoverageBlacklistFiles));
 }
 
 // Check if ASan should use GC-friendly instrumentation for globals.
Index: clang/include/clang/Driver/SanitizerArgs.h
===================================================================
--- clang/include/clang/Driver/SanitizerArgs.h
+++ clang/include/clang/Driver/SanitizerArgs.h
@@ -28,6 +28,8 @@
   std::vector<std::string> BlacklistFiles;
   std::vector<std::string> ExtraDeps;
   int CoverageFeatures = 0;
+  std::vector<std::string> CoverageWhitelistFiles;
+  std::vector<std::string> CoverageBlacklistFiles;
   int MsanTrackOrigins = 0;
   bool MsanUseAfterDtor = true;
   bool CfiCrossDso = false;
Index: clang/include/clang/Driver/Options.td
===================================================================
--- clang/include/clang/Driver/Options.td
+++ clang/include/clang/Driver/Options.td
@@ -966,6 +966,16 @@
       Group<f_clang_Group>, Flags<[CoreOption, DriverOption]>,
       HelpText<"Disable specified features of coverage instrumentation for "
                "Sanitizers">, Values<"func,bb,edge,indirect-calls,trace-bb,trace-cmp,trace-div,trace-gep,8bit-counters,trace-pc,trace-pc-guard,no-prune,inline-8bit-counters">;
+def fsanitize_coverage_whitelist_EQ : Joined<["-"], "fsanitize-coverage-whitelist=">,
+    Group<f_clang_Group>, Flags<[CoreOption, DriverOption]>,
+    HelpText<"Restrict sanitizer coverage instrumentation exclusively to modules and functions that match the provided special case list, except the blacklisted ones">;
+def fsanitize_coverage_whitelist : Separate<["-"], "fsanitize-coverage-whitelist">,
+    Group<f_Group>, Flags<[CoreOption, DriverOption]>, Alias<fsanitize_coverage_whitelist_EQ>;
+def fsanitize_coverage_blacklist_EQ : Joined<["-"], "fsanitize-coverage-blacklist=">,
+    Group<f_clang_Group>, Flags<[CoreOption, DriverOption]>,
+    HelpText<"Disable sanitizer coverage instrumentation for modules and functions that match the provided special case list, even the whitelisted ones">;
+def fsanitize_coverage_blacklist : Separate<["-"], "fsanitize-coverage-blacklist">,
+    Group<f_Group>, Flags<[CoreOption, DriverOption]>, Alias<fsanitize_coverage_blacklist_EQ>;
 def fsanitize_memory_track_origins_EQ : Joined<["-"], "fsanitize-memory-track-origins=">,
                                         Group<f_clang_Group>,
                                         HelpText<"Enable origins tracking in MemorySanitizer">;
Index: clang/include/clang/Basic/DiagnosticDriverKinds.td
===================================================================
--- clang/include/clang/Basic/DiagnosticDriverKinds.td
+++ clang/include/clang/Basic/DiagnosticDriverKinds.td
@@ -147,6 +147,10 @@
   "invalid argument '%0' to -fdebug-prefix-map">;
 def err_drv_malformed_sanitizer_blacklist : Error<
   "malformed sanitizer blacklist: '%0'">;
+def err_drv_malformed_sanitizer_coverage_whitelist : Error<
+  "malformed sanitizer coverage whitelist: '%0'">;
+def err_drv_malformed_sanitizer_coverage_blacklist : Error<
+  "malformed sanitizer coverage blacklist: '%0'">;
 def err_drv_duplicate_config : Error<
   "no more than one option '--config' is allowed">;
 def err_drv_config_file_not_exist : Error<
Index: clang/include/clang/Basic/CodeGenOptions.h
===================================================================
--- clang/include/clang/Basic/CodeGenOptions.h
+++ clang/include/clang/Basic/CodeGenOptions.h
@@ -306,6 +306,16 @@
   /// List of dynamic shared object files to be loaded as pass plugins.
   std::vector<std::string> PassPlugins;
 
+  /// Path to whitelist file specifying which objects
+  /// (files, functions) should exclusively be instrumented
+  /// by sanitizer coverage pass.
+  std::vector<std::string> SanitizeCoverageWhitelistFiles;
+
+  /// Path to blacklist file specifying which objects
+  /// (files, functions) listed for instrumentation by sanitizer
+  /// coverage pass should actually not be instrumented.
+  std::vector<std::string> SanitizeCoverageBlacklistFiles;
+
 public:
   // Define accessors/mutators for code generation options of enumeration type.
 #define CODEGENOPT(Name, Bits, Default)
Index: clang/docs/SanitizerCoverage.rst
===================================================================
--- clang/docs/SanitizerCoverage.rst
+++ clang/docs/SanitizerCoverage.rst
@@ -289,6 +289,58 @@
   // for every non-constant array index.
   void __sanitizer_cov_trace_gep(uintptr_t Idx);
 
+Partially disabling instrumentation
+===================================
+
+It is sometimes useful to tell SanitizerCoverage to instrument only a subset of the
+functions in your target.
+With ``-fsanitize-coverage-whitelist=whitelist.txt``
+and ``-fsanitize-coverage-blacklist=blacklist.txt``,
+you can specify such a subset through the combination of a whitelist and a blacklist.
+
+SanitizerCoverage will only instrument functions that satisfy two conditions.
+First, the function should belong to a source file with a path that is both whitelisted
+and not blacklisted.
+Second, the function should have a mangled name that is both whitelisted and not blacklisted.
+
+The whitelist and blacklist format is similar to that of the sanitizer blacklist format.
+The default whitelist will match every source file and every function.
+The default blacklist will match no source file and no function.
+
+In most cases, the whitelist will list the folders or source files for which you want
+instrumentation and allow all function names, while the blacklist will opt out some specific
+files or functions that the whitelist loosely allowed.
+
+Here is an example whitelist:
+
+.. code-block:: none
+
+  # Enable instrumentation for a whole folder
+  src:bar/*
+  # Enable instrumentation for a specific source file
+  src:foo/a.cpp
+  # Enable instrumentation for all functions in those files
+  fun:*
+
+And an example blacklist:
+
+.. code-block:: none
+
+  # Disable instrumentation for a specific source file that the whitelist allowed
+  src:bar/b.cpp
+  # Disable instrumentation for a specific function that the whitelist allowed
+  fun:*myFunc*
+
+The use of ``*`` wildcards above is required because function names are matched after mangling.
+Without the wildcards, one would have to write the whole mangled name.
+
+Note that source file paths are matched exactly as they are provided on the clang
+command line.
+For example, the whitelist above would include the file ``bar/b.cpp`` if the path were provided
+exactly like this, but it would fail to include it if the same file were referred to in another
+way, such as ``./bar/b.cpp``, or ``bar\b.cpp`` on Windows.
+So, please always double-check that your lists are applied as intended.
+
 Default implementation
 ======================
 
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
