hfinkel created this revision.
hfinkel added reviewers: rsmith, chandlerc, rcox2, jmolloy, anemet, 
silviu.baranga, mzolotukhin, spatel, rengolin, delena, Carrot, congh, echristo.
hfinkel added a subscriber: cfe-commits.
Herald added subscribers: joker.eph, mcrosier.

This patch implements support for annotated-source optimization reports (a.k.a. 
"listing" files). Aside from optimizer improvements, this is a top feature 
request from my performance-engineering team. Most HPC-relevant compilers have 
some kind of capability along these lines. The DiagnosticInfo infrastructure at 
the IR level was designed specifically to support the development of this kind 
of feature, by allowing diagnostic messages to be subclass carrying arbitrary 
additional payload, although in terms of optimizer feedback, we currently only 
use this with -Rpass and friends. -Rpass and related options are very useful, 
but they can generate a lot of output, and that output lacks significant 
context, making it hard to see if the compiler is really doing what the user 
expects.

For this optimizer report, I focused on making the output as succinct as 
possible while providing information on inlining and loop transformations. The 
goal here is that the source code should still be easily readable in the 
report. My primary inspiration here is the reports generated by Cray's tools 
(http://docs.cray.com/books/S-2496-4101/html-S-2496-4101/z1112823641oswald.html).
 These reports are highly regarded within the HPC community. Intel's compiler, 
for example, also has an optimization-report capability 
(https://software.intel.com/sites/default/files/managed/55/b1/new-compiler-optimization-reports.pdf).

  $ cat /tmp/v.c
  void bar();
  void foo() { bar(); }
  
  void Test(int *res, int *c, int *d, int *p, int n) {
    int i;
  
  #pragma clang loop vectorize(assume_safety)
    for (i = 0; i < 1600; i++) {
      res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
    }
  
    for (i = 0; i < 16; i++) {
      res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
    }
  
    foo();
  
    foo(); bar(); foo();
  }

The patch -flisting and -flisting=filename. For the first form, where the file 
name is not explicitly specified, the file name is computed automatically just 
as we do for split-debug output files.

$ clang -O3 -o /tmp/v.o -c /tmp/v.c -flisting
$ cat /tmp/v.lst
  < /tmp/v.c
   1     | void bar();
   2     | void foo() { bar(); }
   3     | 
   4     | void Test(int *res, int *c, int *d, int *p, int n) {
   5     |   int i;
   6     | 
   7     | #pragma clang loop vectorize(assume_safety)
   8   V |   for (i = 0; i < 1600; i++) {
   9     |     res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
  10     |   }
  11     | 
  12     |   for (i = 0; i < 16; i++) {
  13  U  |     res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
  14     |   }
  15     | 
  16 I   |   foo();
  17     | 
  18     |   foo(); bar(); foo();
     I   |   ^
     I   |                 ^
  19     | }
  20     | 

Each source line gets a prefix giving the line number, and a few columns for 
important optimizations: inlining, loop unrolling and loop vectorization. An 
'I' is printed next to a line where a function was inlined, a 'U' next to an 
unrolled loop, and 'V' next to a vectorized loop. These are printing on the 
relevant code line when that seems unambiguous, or on subsequent lines when 
multiple potential options exist (messages, both positive and negative, from 
the same optimization with different column numbers are taken to indicate 
potential ambiguity). When on subsequent lines, a '^' is output in the relevant 
column. The fact that the 'U' is on the wrong line is also a problem with 
-Rpass=loop-unroll and may be something we can fix in the backend.

Annotated source for all relevant input files are put into the listing file 
(each starting with '<' and then the file name).

To see what this looks like for C++ code, here's a small excerpt from 
CodeGenAction.cpp:

  340     |   // If the SMDiagnostic has an inline asm source location, 
translate it.
  341 I   |   FullSourceLoc Loc;
  342     |   if (D.getLoc() != SMLoc())
      I   |       ^
      I   |                  ^
      I   |                     ^
  343     |     Loc = ConvertBackendLocation(D, Context->getSourceManager());
      I   |           ^
      I   |                                     ^
  344     | 
  345     |   unsigned DiagID;
  346 I   |   switch (D.getKind()) {

There's obvious bikeshedding to do here, and I'm quite open to suggestions. My 
engineering team often calls these things "listing files", and other tools 
often name this files with lst as an extension, thus the naming in the patch. 
Intel's option is -opt-report-file=filename.

After some backend enhancements (to turn the relevant remark types into proper 
subclasses), I'd like to extend this to also print the vectorization factor, 
interleaving factor and unrolling factor when relevant. After these 
enhancements, I'd l imagine the loop annotations might look like V4,2U4 for a 
loop vectorized with VF == 4 and interleaving by 2, and then partially unrolled 
by a factor of 4.

Please review.


http://reviews.llvm.org/D19678

Files:
  include/clang/Driver/CC1Options.td
  include/clang/Driver/Options.td
  include/clang/Frontend/CodeGenOptions.h
  lib/CodeGen/CodeGenAction.cpp
  lib/Driver/Tools.cpp
  lib/Frontend/CompilerInvocation.cpp
  test/CodeGen/listing.c
  test/Driver/listing.c

Index: test/Driver/listing.c
===================================================================
--- /dev/null
+++ test/Driver/listing.c
@@ -0,0 +1,9 @@
+// RUN: %clang -### -S -o FOO -flisting %s 2>&1 | FileCheck %s
+// RUN: %clang -### -S -o FOO -flisting=BAR.txt %s 2>&1 | FileCheck %s -check-prefix=CHECK-EQ
+
+// CHECK: "-cc1"
+// CHECK: "-listing-file" "listing.lst"
+
+// CHECK-EQ: "-cc1"
+// CHECK-EQ: "-listing-file" "BAR.txt"
+
Index: test/CodeGen/listing.c
===================================================================
--- /dev/null
+++ test/CodeGen/listing.c
@@ -0,0 +1,31 @@
+// RUN: %clang_cc1 -O3 -triple x86_64-unknown-linux-gnu -target-cpu x86-64 %s -o %t -dwarf-column-info -listing-file %t.lst -emit-obj
+// RUN: cat %t.lst | FileCheck %s
+// REQUIRES: x86-registered-target
+
+void bar();
+void foo() { bar(); }
+
+void Test(int *res, int *c, int *d, int *p, int n) {
+  int i;
+
+#pragma clang loop vectorize(assume_safety)
+  for (i = 0; i < 1600; i++) {
+    res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
+  }
+
+// CHECK: {{[0-9]+}}     | #pragma clang loop vectorize(assume_safety)
+// CHECK: {{[0-9]+}}   V |   for (i = 0; i < 1600; i++) {
+
+  for (i = 0; i < 16; i++) {
+    res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
+  }
+
+  foo();
+// CHECK: {{[0-9]+}} I   |   foo();
+
+  foo(); bar(); foo();
+// CHECK: {{[0-9]+}}     |   foo(); bar(); foo();
+// CHECK-NEXT:       I   |   ^
+// CHECK-NEXT:       I   |                 ^
+}
+
Index: lib/Frontend/CompilerInvocation.cpp
===================================================================
--- lib/Frontend/CompilerInvocation.cpp
+++ lib/Frontend/CompilerInvocation.cpp
@@ -488,6 +488,8 @@
   Opts.DebugTypeExtRefs = Args.hasArg(OPT_dwarf_ext_refs);
   Opts.DebugExplicitImport = Triple.isPS4CPU(); 
 
+  Opts.ListingFile = Args.getLastArgValue(OPT_listing_file);
+
   for (const auto &Arg : Args.getAllArgValues(OPT_fdebug_prefix_map_EQ))
     Opts.DebugPrefixMap.insert(StringRef(Arg).split('='));
 
@@ -764,6 +766,10 @@
   if (!Opts.SampleProfileFile.empty())
     NeedLocTracking = true;
 
+  // To generate an optimization report, source location information is needed.
+  if (!Opts.ListingFile.empty())
+    NeedLocTracking = true;
+
   // If the user requested a flag that requires source locations available in
   // the backend, make sure that the backend tracks source location information.
   if (NeedLocTracking && Opts.getDebugInfo() == codegenoptions::NoDebugInfo)
Index: lib/Driver/Tools.cpp
===================================================================
--- lib/Driver/Tools.cpp
+++ lib/Driver/Tools.cpp
@@ -3141,23 +3141,29 @@
   }
 }
 
-static const char *SplitDebugName(const ArgList &Args, const InputInfo &Input) {
+static const char *getAltExtOutputName(const ArgList &Args,
+                                       const InputInfo &Input,
+                                       const char *Ext) {
   Arg *FinalOutput = Args.getLastArg(options::OPT_o);
   if (FinalOutput && Args.hasArg(options::OPT_c)) {
     SmallString<128> T(FinalOutput->getValue());
-    llvm::sys::path::replace_extension(T, "dwo");
+    llvm::sys::path::replace_extension(T, Ext);
     return Args.MakeArgString(T);
   } else {
     // Use the compilation dir.
     SmallString<128> T(
         Args.getLastArgValue(options::OPT_fdebug_compilation_dir));
     SmallString<128> F(llvm::sys::path::stem(Input.getBaseInput()));
-    llvm::sys::path::replace_extension(F, "dwo");
+    llvm::sys::path::replace_extension(F, Ext);
     T += F;
     return Args.MakeArgString(F);
   }
 }
 
+static const char *SplitDebugName(const ArgList &Args, const InputInfo &Input) {
+  return getAltExtOutputName(Args, Input, "dwo");
+}
+
 static void SplitDebugInfo(const ToolChain &TC, Compilation &C, const Tool &T,
                            const JobAction &JA, const ArgList &Args,
                            const InputInfo &Output, const char *OutFile) {
@@ -3182,6 +3188,10 @@
   C.addCommand(llvm::make_unique<Command>(JA, T, Exec, StripArgs, II));
 }
 
+static const char *getListingName(const ArgList &Args, const InputInfo &Input) {
+  return getAltExtOutputName(Args, Input, "lst");
+}
+
 /// \brief Vectorize at all optimization levels greater than 1 except for -Oz.
 /// For -Oz the loop vectorizer is disable, while the slp vectorizer is enabled.
 static bool shouldEnableVectorizerAtOLevel(const ArgList &Args, bool isSlpVec) {
@@ -5650,6 +5660,17 @@
     CmdArgs.push_back("-fno-math-builtin");
   }
 
+  if (Args.hasFlag(options::OPT_flisting, options::OPT_flisting_EQ,
+                   options::OPT_fno_listing, false)) {
+    CmdArgs.push_back("-listing-file");
+
+    const Arg *A = Args.getLastArg(options::OPT_flisting_EQ);
+    if (A)
+      CmdArgs.push_back(A->getValue());
+    else
+      CmdArgs.push_back(getListingName(Args, Input));
+  }
+
 // Default to -fno-builtin-str{cat,cpy} on Darwin for ARM.
 //
 // FIXME: Now that PR4941 has been fixed this can be enabled.
Index: lib/CodeGen/CodeGenAction.cpp
===================================================================
--- lib/CodeGen/CodeGenAction.cpp
+++ lib/CodeGen/CodeGenAction.cpp
@@ -31,6 +31,7 @@
 #include "llvm/IRReader/IRReader.h"
 #include "llvm/Linker/Linker.h"
 #include "llvm/Pass.h"
+#include "llvm/Support/Format.h"
 #include "llvm/Support/MemoryBuffer.h"
 #include "llvm/Support/SourceMgr.h"
 #include "llvm/Support/Timer.h"
@@ -60,6 +61,34 @@
     // refers to.
     llvm::Module *CurLinkModule = nullptr;
 
+    struct ListingLineItemInfo {
+      bool Analyzed = false;
+      bool Transformed = false;
+
+      ListingLineItemInfo &operator |= (const ListingLineItemInfo &RHS) {
+        Analyzed |= RHS.Analyzed;
+        Transformed |= RHS.Transformed;
+
+        return *this;
+      }
+    };
+
+    struct ListingLineInfo {
+      ListingLineItemInfo Inlined;
+      ListingLineItemInfo Unrolled;
+      ListingLineItemInfo Vectorized;
+
+      ListingLineInfo &operator |= (const ListingLineInfo &RHS) {
+        Inlined |= RHS.Inlined;
+        Unrolled |= RHS.Unrolled;
+        Vectorized |= RHS.Vectorized;
+
+        return *this;
+      }
+    };
+
+    std::map<SourceLocation, ListingLineInfo> ListingInfo;
+
   public:
     BackendConsumer(
         BackendAction Action, DiagnosticsEngine &Diags,
@@ -180,6 +209,8 @@
       Ctx.setInlineAsmDiagnosticHandler(OldHandler, OldContext);
 
       Ctx.setDiagnosticHandler(OldDiagnosticHandler, OldDiagnosticContext);
+
+      GenerateListingFile();
     }
 
     void HandleTagDeclDefinition(TagDecl *D) override {
@@ -254,6 +285,10 @@
         const llvm::DiagnosticInfoOptimizationRemarkAnalysisAliasing &D);
     void OptimizationFailureHandler(
         const llvm::DiagnosticInfoOptimizationFailure &D);
+
+    void OptimizationRemarkListingHandler(
+      const llvm::DiagnosticInfoOptimizationBase &D, bool Transformed = false);
+    void GenerateListingFile();
   };
   
   void BackendConsumer::anchor() {}
@@ -513,6 +548,9 @@
   if (CodeGenOpts.OptimizationRemarkPattern &&
       CodeGenOpts.OptimizationRemarkPattern->match(D.getPassName()))
     EmitOptimizationMessage(D, diag::remark_fe_backend_optimization_remark);
+
+  // Record optimization decisions for the listing file.
+  OptimizationRemarkListingHandler(D, true);
 }
 
 void BackendConsumer::OptimizationRemarkHandler(
@@ -524,6 +562,9 @@
       CodeGenOpts.OptimizationRemarkMissedPattern->match(D.getPassName()))
     EmitOptimizationMessage(D,
                             diag::remark_fe_backend_optimization_remark_missed);
+
+  // Record optimization decisions for the listing file.
+  OptimizationRemarkListingHandler(D);
 }
 
 void BackendConsumer::OptimizationRemarkHandler(
@@ -537,6 +578,9 @@
        CodeGenOpts.OptimizationRemarkAnalysisPattern->match(D.getPassName())))
     EmitOptimizationMessage(
         D, diag::remark_fe_backend_optimization_remark_analysis);
+
+  // Record optimization decisions for the listing file.
+  OptimizationRemarkListingHandler(D);
 }
 
 void BackendConsumer::OptimizationRemarkHandler(
@@ -550,6 +594,9 @@
        CodeGenOpts.OptimizationRemarkAnalysisPattern->match(D.getPassName())))
     EmitOptimizationMessage(
         D, diag::remark_fe_backend_optimization_remark_analysis_fpcommute);
+
+  // Record optimization decisions for the listing file.
+  OptimizationRemarkListingHandler(D);
 }
 
 void BackendConsumer::OptimizationRemarkHandler(
@@ -563,13 +610,156 @@
        CodeGenOpts.OptimizationRemarkAnalysisPattern->match(D.getPassName())))
     EmitOptimizationMessage(
         D, diag::remark_fe_backend_optimization_remark_analysis_aliasing);
+
+  // Record optimization decisions for the listing file.
+  OptimizationRemarkListingHandler(D);
 }
 
 void BackendConsumer::OptimizationFailureHandler(
     const llvm::DiagnosticInfoOptimizationFailure &D) {
   EmitOptimizationMessage(D, diag::warn_fe_backend_optimization_failure);
 }
 
+void BackendConsumer::OptimizationRemarkListingHandler(
+    const llvm::DiagnosticInfoOptimizationBase &D, bool Transformed) {
+  if (CodeGenOpts.ListingFile.empty() || !D.isLocationAvailable())
+    return;
+
+  SourceManager &SourceMgr = Context->getSourceManager();
+  FileManager &FileMgr = SourceMgr.getFileManager();
+
+  StringRef Filename;
+  unsigned Line, Column;
+  D.getLocation(&Filename, &Line, &Column);
+  const FileEntry *FE = FileMgr.getFile(Filename);
+  if (!FE || !Line)
+    return;
+
+  // If -gcolumn-info was not used, Column will be 0. This upsets the
+  // source manager, so pass 1 if Column is not set.
+  SourceLocation DILoc =
+    SourceMgr.translateFileLineCol(FE, Line, Column ? Column : 1);
+  if (DILoc.isInvalid())
+    return;
+
+  // We track information on both actual and potential transformations. This
+  // way, if there are multiple possible things on a line that are, or could
+  // have been transformed, we can indicate that explicitly in the output.
+  auto UpdateLLII = [Transformed](ListingLineItemInfo &LLII) {
+    LLII.Analyzed = true;
+    if (Transformed)
+      LLII.Transformed = true;
+  };
+
+  // FIXME: The backend should use proper diagnostic subclasses here,
+  // and we should match those instead of looking at the pass name.
+  StringRef PassName = D.getPassName();
+  if (PassName == "inline")
+    UpdateLLII(ListingInfo[DILoc].Inlined);
+  else if (PassName == "loop-unroll")
+    UpdateLLII(ListingInfo[DILoc].Unrolled);
+  else if (PassName == "loop-vectorize")
+    UpdateLLII(ListingInfo[DILoc].Vectorized);
+}
+
+void BackendConsumer::GenerateListingFile() {
+  if (CodeGenOpts.ListingFile.empty())
+    return;
+
+  std::error_code EC;
+  llvm::raw_fd_ostream OS(CodeGenOpts.ListingFile, EC,
+              llvm::sys::fs::F_Text);
+  if (EC) {
+    Diags.Report(diag::err_fe_error_opening) << CodeGenOpts.ListingFile <<
+                                                EC.message();
+    return;
+  }
+
+  SourceManager &SourceMgr = Context->getSourceManager();
+  std::set<FileID> FileIDs;
+  for (auto &I : ListingInfo)
+    FileIDs.insert(SourceMgr.getFileID(I.first));
+
+  for (auto &FID : FileIDs) {
+    SourceLocation FirstLoc = SourceMgr.getLocForStartOfFile(FID);
+    OS << "< " << SourceMgr.getFilename(FirstLoc) << "\n";
+
+    auto I = ListingInfo.lower_bound(FirstLoc);
+    StringRef MB = SourceMgr.getBufferData(FID);
+    const SrcMgr::ContentCache *
+      Content = SourceMgr.getSLocEntry(FID).getFile().getContentCache();
+    unsigned LNDigits = llvm::utostr(Content->NumLines).size();
+    for (unsigned L = 0; L < Content->NumLines - 1; ++L) {
+      unsigned LStartOff = Content->SourceLineCache[L];
+      unsigned LEndOff = (L == Content->NumLines) ?
+                         Content->getSize() :
+                         Content->SourceLineCache[L + 1];
+
+      std::map<unsigned, ListingLineInfo> ColsInfo;
+      unsigned InlinedCols = 0, UnrolledCols = 0, VectorizedCols = 0;
+
+      ListingLineInfo LLI;
+      if (I != ListingInfo.end()) {
+        auto DI = SourceMgr.getDecomposedLoc(I->first);
+        while (I != ListingInfo.end() && DI.first == FID &&
+               DI.second < LStartOff) {
+          ++I;
+          if (I != ListingInfo.end())
+            DI = SourceMgr.getDecomposedLoc(I->first);
+        }
+
+        while (I != ListingInfo.end() && DI.first == FID &&
+            DI.second >= LStartOff && DI.second < LEndOff) {
+          unsigned Col = SourceMgr.getColumnNumber(FID, DI.second);
+          ColsInfo[Col] = I->second;
+          InlinedCols += I->second.Inlined.Analyzed;
+          UnrolledCols += I->second.Unrolled.Analyzed;
+          VectorizedCols += I->second.Vectorized.Analyzed;
+          LLI |= I->second;
+
+          ++I;
+          if (I != ListingInfo.end())
+            DI = SourceMgr.getDecomposedLoc(I->first);
+        }
+      }
+
+      // We try to keep the output as concise as possible. If only one thing on
+      // a given line could have been inlined, vectorized, etc. then we can put
+      // the marker on the source line itself. If there are multiple options
+      // then we want to distinguish them by placing the marker for each
+      // transformation on a separate line following the source line. When we
+      // do this, we use a '^' character to point to the appropriate column in
+      // the source line.
+
+      OS << llvm::format_decimal(L + 1, LNDigits) << " ";
+      OS << (LLI.Inlined.Transformed && InlinedCols < 2 ? "I" : " ");
+      OS << (LLI.Unrolled.Transformed && UnrolledCols < 2 ? "U" : " ");
+      OS << (LLI.Vectorized.Transformed && VectorizedCols < 2 ? "V" : " ");
+
+      OS << " | " << MB.slice(LStartOff, LEndOff);
+
+      for (auto &J : ColsInfo) {
+        if ((J.second.Inlined.Transformed && InlinedCols > 1) ||
+            (J.second.Unrolled.Transformed && UnrolledCols > 1) ||
+            (J.second.Vectorized.Transformed && VectorizedCols > 1)) {
+          OS << std::string(LNDigits + 1, ' ');
+          OS << (J.second.Inlined.Transformed &&
+                 InlinedCols > 1 ? "I" : " ");
+          OS << (J.second.Unrolled.Transformed &&
+                 UnrolledCols > 1 ? "U" : " ");
+          OS << (J.second.Vectorized.Transformed &&
+                 VectorizedCols > 1 ? "V" : " ");
+
+          OS << " | " << std::string(J.first - 1, ' ') << "^\n";
+        }
+      }
+
+      if (LEndOff == Content->getSize())
+        OS << "\n";
+    }
+  }
+}
+
 /// \brief This function is invoked when the backend needs
 /// to report something to the user.
 void BackendConsumer::DiagnosticHandlerImpl(const DiagnosticInfo &DI) {
Index: include/clang/Frontend/CodeGenOptions.h
===================================================================
--- include/clang/Frontend/CodeGenOptions.h
+++ include/clang/Frontend/CodeGenOptions.h
@@ -126,6 +126,9 @@
   /// in the backend for setting the name in the skeleton cu.
   std::string SplitDwarfFile;
 
+  /// The name of the optimization report (listing) file.
+  std::string ListingFile;
+
   /// The name of the relocation model to use.
   std::string RelocationModel;
 
Index: include/clang/Driver/Options.td
===================================================================
--- include/clang/Driver/Options.td
+++ include/clang/Driver/Options.td
@@ -739,6 +739,11 @@
 def flat__namespace : Flag<["-"], "flat_namespace">;
 def flax_vector_conversions : Flag<["-"], "flax-vector-conversions">, Group<f_Group>;
 def flimited_precision_EQ : Joined<["-"], "flimited-precision=">, Group<f_Group>;
+def flisting : Flag<["-"], "flisting">, Group<f_Group>,
+  HelpText<"Generate an optimization report file">;
+def fno_listing : Flag<["-"], "fno-listing">, Group<f_Group>, Flags<[NoArgumentUnused]>;
+def flisting_EQ : Joined<["-"], "flisting=">, Group<f_Group>,
+  HelpText<"Generate an optimization report file with the specified name">;
 def flto_EQ : Joined<["-"], "flto=">, Flags<[CC1Option]>, Group<f_Group>,
   HelpText<"Set LTO mode to either 'full' or 'thin'">;
 def flto : Flag<["-"], "flto">, Flags<[CC1Option]>, Group<f_Group>,
Index: include/clang/Driver/CC1Options.td
===================================================================
--- include/clang/Driver/CC1Options.td
+++ include/clang/Driver/CC1Options.td
@@ -483,6 +483,9 @@
 def arcmt_migrate : Flag<["-"], "arcmt-migrate">,
   HelpText<"Apply modifications and produces temporary files that conform to ARC">;
 
+def listing_file : Separate<["-"], "listing-file">,
+  HelpText<"File name to use for optimization listing output">;
+
 def print_stats : Flag<["-"], "print-stats">,
   HelpText<"Print performance metrics and statistics">;
 def fdump_record_layouts : Flag<["-"], "fdump-record-layouts">,
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to