https://github.com/necto updated https://github.com/llvm/llvm-project/pull/131175
>From 6b6d80d42d40d5917622cbc2bc0f2a454c34eca3 Mon Sep 17 00:00:00 2001
From: Arseniy Zaostrovnykh <necto...@gmail.com>
Date: Thu, 13 Mar 2025 18:42:39 +0100
Subject: [PATCH 01/10] [analyzer] Introduce per-entry-point statistics
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

So far CSA was relying on the LLVM Statistic package that allowed us to
gather some data about analysis of an entire translation unit. However,
the translation unit consists of a collection of loosely related entry
points. Aggregating data across multiple such entry points is often
counterproductive.

This change introduces a new lightweight always-on facility to collect
Boolean or numerical statistics for each entry point and dump them in a
CSV format. Such a format makes it easy to aggregate data across multiple
translation units and analyze it with common data-processing tools.

We break down the existing statistics that were collected on a per-TU
basis into values per entry point.

Additionally, we enable the statistics unconditionally
(STATISTIC -> ALWAYS_ENABLED_STATISTIC) to facilitate their use (you can
gather the data with a simple run-time flag rather than having to
recompile the analyzer). These statistics are very light and add
virtually no overhead.

@steakhal (Balázs Benics) started this design and I picked up the baton.
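To illustrate the aggregation workflow the message describes, here is a minimal sketch that sums counter columns across several per-entry-point CSV dumps. It assumes the layout produced by `dumpStatsAsCSV` in this patch (an `EntryPoint` column first, fields separated by `, `). The function name `sum_numeric_columns` and the sample rows are made up for illustration and are not part of the patch.

```python
import csv
import io
from collections import Counter


def sum_numeric_columns(csv_streams):
    """Sum every numeric column across several per-entry-point CSV dumps.

    Hypothetical helper: columns whose values are not plain non-negative
    integers (e.g. "true"/"false") are skipped.
    """
    totals = Counter()
    for stream in csv_streams:
        # The dump separates fields with ", ", so skip the space after commas.
        reader = csv.DictReader(stream, skipinitialspace=True)
        for row in reader:
            for column, value in row.items():
                if column == "EntryPoint":
                    continue
                if isinstance(value, str) and value.isdigit():
                    totals[column] += int(value)
    return dict(totals)


# Two made-up dumps standing in for files written for two translation units.
tu1 = io.StringIO(
    'EntryPoint, NumSteps, NumInlinedCalls\n'
    '"f()", 10, 3\n'
    '"g(int)", 5, 2\n'
)
tu2 = io.StringIO(
    'EntryPoint, NumSteps, NumInlinedCalls\n'
    '"h()", 7, 4\n'
)
print(sum_numeric_columns([tu1, tu2]))
```

Summation only makes sense for counter-style statistics; a column declared with `STAT_MAX` would more naturally be combined with `max`.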
---
CPP-6160
---
 clang/docs/analyzer/developer-docs.rst        |   1 +
 .../analyzer/developer-docs/Statistics.rst    |  21 ++
 .../StaticAnalyzer/Core/AnalyzerOptions.def   |   6 +
 .../Core/PathSensitive/EntryPointStats.h      | 162 ++++++++++++++
 .../Checkers/AnalyzerStatsChecker.cpp         |   9 +-
 clang/lib/StaticAnalyzer/Core/BugReporter.cpp |  28 +--
 clang/lib/StaticAnalyzer/Core/CMakeLists.txt  |   1 +
 clang/lib/StaticAnalyzer/Core/CoreEngine.cpp  |  16 +-
 .../StaticAnalyzer/Core/EntryPointStats.cpp   | 201 ++++++++++++++++++
 clang/lib/StaticAnalyzer/Core/ExprEngine.cpp  |  24 ++-
 .../Core/ExprEngineCallAndReturn.cpp          |  14 +-
 clang/lib/StaticAnalyzer/Core/WorkList.cpp    |  10 +-
 .../Core/Z3CrosscheckVisitor.cpp              |  31 +--
 .../Frontend/AnalysisConsumer.cpp             |  62 ++++--
 clang/test/Analysis/analyzer-config.c         |   1 +
 clang/test/Analysis/csv2json.py               |  98 +++++++++
 clang/test/lit.cfg.py                         |  10 +
 17 files changed, 617 insertions(+), 78 deletions(-)
 create mode 100644 clang/docs/analyzer/developer-docs/Statistics.rst
 create mode 100644 clang/include/clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h
 create mode 100644 clang/lib/StaticAnalyzer/Core/EntryPointStats.cpp
 create mode 100644 clang/test/Analysis/csv2json.py

diff --git a/clang/docs/analyzer/developer-docs.rst b/clang/docs/analyzer/developer-docs.rst
index 60c0e71ad847c..a925cf7ca02e1 100644
--- a/clang/docs/analyzer/developer-docs.rst
+++ b/clang/docs/analyzer/developer-docs.rst
@@ -12,3 +12,4 @@ Contents:
    developer-docs/nullability
    developer-docs/RegionStore
    developer-docs/PerformanceInvestigation
+   developer-docs/Statistics
diff --git a/clang/docs/analyzer/developer-docs/Statistics.rst b/clang/docs/analyzer/developer-docs/Statistics.rst
new file mode 100644
index 0000000000000..d352bb6f01ebc
--- /dev/null
+++ b/clang/docs/analyzer/developer-docs/Statistics.rst
@@ -0,0 +1,21 @@
+======================
+Metrics and Statistics
+======================
+
+TODO: write this once the design is settled (@reviewer, don't look here yet)
+
+CSA enjoys two facilities to collect statistics per translation unit and per entry point.
+
+Mention the following tools:
+- STATISTIC macro
+- ALLWAYS_ENABLED_STATISTIC macro
+
+- STAT_COUNTER macro
+- STAT_MAX macro
+
+- BoolEPStat
+- UnsignedEPStat
+- CounterEPStat
+- UnsignedMaxEPStat
+
+- dump-se-metrics-to-csv="%t.csv"
diff --git a/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def b/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
index 2aa00db411844..b88bce5e262a7 100644
--- a/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
+++ b/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
@@ -353,6 +353,12 @@ ANALYZER_OPTION(bool, DisplayCTUProgress, "display-ctu-progress",
                 "the analyzer's progress related to ctu.",
                 false)
 
+ANALYZER_OPTION(
+    StringRef, DumpSEStatsToCSV, "dump-se-stats-to-csv",
+    "If provided, the analyzer will dump statistics per entry point "
+    "into the specified CSV file.",
+    "")
+
 ANALYZER_OPTION(bool, ShouldTrackConditions, "track-conditions",
                 "Whether to track conditions that are a control dependency of "
                 "an already tracked variable.",
diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h
new file mode 100644
index 0000000000000..16c9fdf97fc30
--- /dev/null
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h
@@ -0,0 +1,162 @@
+// EntryPointStats.h - Tracking statistics per entry point -*- C++ -*-//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===---------------------------------------------------------------===// + +#ifndef CLANG_INCLUDE_CLANG_STATICANALYZER_CORE_PATHSENSITIVE_ENTRYPOINTSTATS_H +#define CLANG_INCLUDE_CLANG_STATICANALYZER_CORE_PATHSENSITIVE_ENTRYPOINTSTATS_H + +#include "llvm/ADT/Statistic.h" +#include "llvm/ADT/StringRef.h" + +namespace llvm { +class raw_ostream; +} // namespace llvm + +namespace clang { +class Decl; + +namespace ento { + +class EntryPointStat { +public: + llvm::StringLiteral name() const { return Name; } + + static void lockRegistry(); + + static void takeSnapshot(const Decl *EntryPoint); + static void dumpStatsAsCSV(llvm::raw_ostream &OS); + static void dumpStatsAsCSV(llvm::StringRef FileName); + +protected: + explicit EntryPointStat(llvm::StringLiteral Name) : Name{Name} {} + EntryPointStat(const EntryPointStat &) = delete; + EntryPointStat(EntryPointStat &&) = delete; + EntryPointStat &operator=(EntryPointStat &) = delete; + EntryPointStat &operator=(EntryPointStat &&) = delete; + +private: + llvm::StringLiteral Name; +}; + +class BoolEPStat : public EntryPointStat { + std::optional<bool> Value = {}; + +public: + explicit BoolEPStat(llvm::StringLiteral Name); + unsigned value() const { return Value && *Value; } + void set(bool V) { + assert(!Value.has_value()); + Value = V; + } + void reset() { Value = {}; } +}; + +// used by CounterEntryPointTranslationUnitStat +class CounterEPStat : public EntryPointStat { + using EntryPointStat::EntryPointStat; + unsigned Value = {}; + +public: + explicit CounterEPStat(llvm::StringLiteral Name); + unsigned value() const { return Value; } + void reset() { Value = {}; } + CounterEPStat &operator++() { + ++Value; + return *this; + } + + CounterEPStat &operator++(int) { + // No difference as you can't extract the value + return ++(*this); + } + + CounterEPStat &operator+=(unsigned Inc) { + Value += Inc; + return *this; + } +}; + +// used by 
UnsignedMaxEtryPointTranslationUnitStatistic +class UnsignedMaxEPStat : public EntryPointStat { + using EntryPointStat::EntryPointStat; + unsigned Value = {}; + +public: + explicit UnsignedMaxEPStat(llvm::StringLiteral Name); + unsigned value() const { return Value; } + void reset() { Value = {}; } + void updateMax(unsigned X) { Value = std::max(Value, X); } +}; + +class UnsignedEPStat : public EntryPointStat { + using EntryPointStat::EntryPointStat; + std::optional<unsigned> Value = {}; + +public: + explicit UnsignedEPStat(llvm::StringLiteral Name); + unsigned value() const { return Value.value_or(0); } + void reset() { Value.reset(); } + void set(unsigned V) { + assert(!Value.has_value()); + Value = V; + } +}; + +class CounterEntryPointTranslationUnitStat { + CounterEPStat M; + llvm::TrackingStatistic S; + +public: + CounterEntryPointTranslationUnitStat(const char *DebugType, + llvm::StringLiteral Name, + llvm::StringLiteral Desc) + : M(Name), S(DebugType, Name.data(), Desc.data()) {} + CounterEntryPointTranslationUnitStat &operator++() { + ++M; + ++S; + return *this; + } + + CounterEntryPointTranslationUnitStat &operator++(int) { + // No difference with prefix as the value is not observable. 
+ return ++(*this); + } + + CounterEntryPointTranslationUnitStat &operator+=(unsigned Inc) { + M += Inc; + S += Inc; + return *this; + } +}; + +class UnsignedMaxEtryPointTranslationUnitStatistic { + UnsignedMaxEPStat M; + llvm::TrackingStatistic S; + +public: + UnsignedMaxEtryPointTranslationUnitStatistic(const char *DebugType, + llvm::StringLiteral Name, + llvm::StringLiteral Desc) + : M(Name), S(DebugType, Name.data(), Desc.data()) {} + void updateMax(uint64_t Value) { + M.updateMax(static_cast<unsigned>(Value)); + S.updateMax(Value); + } +}; + +#define STAT_COUNTER(VARNAME, DESC) \ + static clang::ento::CounterEntryPointTranslationUnitStat VARNAME = { \ + DEBUG_TYPE, #VARNAME, DESC} + +#define STAT_MAX(VARNAME, DESC) \ + static clang::ento::UnsignedMaxEtryPointTranslationUnitStatistic VARNAME = { \ + DEBUG_TYPE, #VARNAME, DESC} + +} // namespace ento +} // namespace clang + +#endif // CLANG_INCLUDE_CLANG_STATICANALYZER_CORE_PATHSENSITIVE_ENTRYPOINTSTATS_H diff --git a/clang/lib/StaticAnalyzer/Checkers/AnalyzerStatsChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/AnalyzerStatsChecker.cpp index a54f1b1e71d47..d030e69a2a6e0 100644 --- a/clang/lib/StaticAnalyzer/Checkers/AnalyzerStatsChecker.cpp +++ b/clang/lib/StaticAnalyzer/Checkers/AnalyzerStatsChecker.cpp @@ -13,12 +13,12 @@ #include "clang/StaticAnalyzer/Core/BugReporter/BugReporter.h" #include "clang/StaticAnalyzer/Core/Checker.h" #include "clang/StaticAnalyzer/Core/CheckerManager.h" +#include "clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h" #include "clang/StaticAnalyzer/Core/PathSensitive/ExplodedGraph.h" #include "clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h" #include "llvm/ADT/STLExtras.h" #include "llvm/ADT/SmallPtrSet.h" #include "llvm/ADT/SmallString.h" -#include "llvm/ADT/Statistic.h" #include "llvm/Support/raw_ostream.h" #include <optional> @@ -27,10 +27,9 @@ using namespace ento; #define DEBUG_TYPE "StatsChecker" -STATISTIC(NumBlocks, - "The # of blocks in top level functions"); 
-STATISTIC(NumBlocksUnreachable, - "The # of unreachable blocks in analyzing top level functions"); +STAT_COUNTER(NumBlocks, "The # of blocks in top level functions"); +STAT_COUNTER(NumBlocksUnreachable, + "The # of unreachable blocks in analyzing top level functions"); namespace { class AnalyzerStatsChecker : public Checker<check::EndAnalysis> { diff --git a/clang/lib/StaticAnalyzer/Core/BugReporter.cpp b/clang/lib/StaticAnalyzer/Core/BugReporter.cpp index a4f9e092e8205..5f78fc433275d 100644 --- a/clang/lib/StaticAnalyzer/Core/BugReporter.cpp +++ b/clang/lib/StaticAnalyzer/Core/BugReporter.cpp @@ -39,6 +39,7 @@ #include "clang/StaticAnalyzer/Core/Checker.h" #include "clang/StaticAnalyzer/Core/CheckerManager.h" #include "clang/StaticAnalyzer/Core/CheckerRegistryData.h" +#include "clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h" #include "clang/StaticAnalyzer/Core/PathSensitive/ExplodedGraph.h" #include "clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h" #include "clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h" @@ -54,7 +55,6 @@ #include "llvm/ADT/SmallPtrSet.h" #include "llvm/ADT/SmallString.h" #include "llvm/ADT/SmallVector.h" -#include "llvm/ADT/Statistic.h" #include "llvm/ADT/StringExtras.h" #include "llvm/ADT/StringRef.h" #include "llvm/ADT/iterator_range.h" @@ -82,19 +82,19 @@ using namespace llvm; #define DEBUG_TYPE "BugReporter" -STATISTIC(MaxBugClassSize, - "The maximum number of bug reports in the same equivalence class"); -STATISTIC(MaxValidBugClassSize, - "The maximum number of bug reports in the same equivalence class " - "where at least one report is valid (not suppressed)"); - -STATISTIC(NumTimesReportPassesZ3, "Number of reports passed Z3"); -STATISTIC(NumTimesReportRefuted, "Number of reports refuted by Z3"); -STATISTIC(NumTimesReportEQClassAborted, - "Number of times a report equivalence class was aborted by the Z3 " - "oracle heuristic"); -STATISTIC(NumTimesReportEQClassWasExhausted, - "Number of times all reports of an 
equivalence class was refuted"); +STAT_MAX(MaxBugClassSize, + "The maximum number of bug reports in the same equivalence class"); +STAT_MAX(MaxValidBugClassSize, + "The maximum number of bug reports in the same equivalence class " + "where at least one report is valid (not suppressed)"); + +STAT_COUNTER(NumTimesReportPassesZ3, "Number of reports passed Z3"); +STAT_COUNTER(NumTimesReportRefuted, "Number of reports refuted by Z3"); +STAT_COUNTER(NumTimesReportEQClassAborted, + "Number of times a report equivalence class was aborted by the Z3 " + "oracle heuristic"); +STAT_COUNTER(NumTimesReportEQClassWasExhausted, + "Number of times all reports of an equivalence class was refuted"); BugReporterVisitor::~BugReporterVisitor() = default; diff --git a/clang/lib/StaticAnalyzer/Core/CMakeLists.txt b/clang/lib/StaticAnalyzer/Core/CMakeLists.txt index fb9394a519eb7..d0a9b202f9c52 100644 --- a/clang/lib/StaticAnalyzer/Core/CMakeLists.txt +++ b/clang/lib/StaticAnalyzer/Core/CMakeLists.txt @@ -24,6 +24,7 @@ add_clang_library(clangStaticAnalyzerCore CoreEngine.cpp DynamicExtent.cpp DynamicType.cpp + EntryPointStats.cpp Environment.cpp ExplodedGraph.cpp ExprEngine.cpp diff --git a/clang/lib/StaticAnalyzer/Core/CoreEngine.cpp b/clang/lib/StaticAnalyzer/Core/CoreEngine.cpp index d96211c3a6635..5c05c9c87f124 100644 --- a/clang/lib/StaticAnalyzer/Core/CoreEngine.cpp +++ b/clang/lib/StaticAnalyzer/Core/CoreEngine.cpp @@ -22,12 +22,12 @@ #include "clang/Basic/LLVM.h" #include "clang/StaticAnalyzer/Core/AnalyzerOptions.h" #include "clang/StaticAnalyzer/Core/PathSensitive/BlockCounter.h" +#include "clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h" #include "clang/StaticAnalyzer/Core/PathSensitive/ExplodedGraph.h" #include "clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h" #include "clang/StaticAnalyzer/Core/PathSensitive/FunctionSummary.h" #include "clang/StaticAnalyzer/Core/PathSensitive/WorkList.h" #include "llvm/ADT/STLExtras.h" -#include "llvm/ADT/Statistic.h" #include 
"llvm/Support/Casting.h" #include "llvm/Support/ErrorHandling.h" #include "llvm/Support/FormatVariadic.h" @@ -43,14 +43,12 @@ using namespace ento; #define DEBUG_TYPE "CoreEngine" -STATISTIC(NumSteps, - "The # of steps executed."); -STATISTIC(NumSTUSteps, "The # of STU steps executed."); -STATISTIC(NumCTUSteps, "The # of CTU steps executed."); -STATISTIC(NumReachedMaxSteps, - "The # of times we reached the max number of steps."); -STATISTIC(NumPathsExplored, - "The # of paths explored by the analyzer."); +STAT_COUNTER(NumSteps, "The # of steps executed."); +STAT_COUNTER(NumSTUSteps, "The # of STU steps executed."); +STAT_COUNTER(NumCTUSteps, "The # of CTU steps executed."); +ALWAYS_ENABLED_STATISTIC(NumReachedMaxSteps, + "The # of times we reached the max number of steps."); +STAT_COUNTER(NumPathsExplored, "The # of paths explored by the analyzer."); //===----------------------------------------------------------------------===// // Core analysis engine. diff --git a/clang/lib/StaticAnalyzer/Core/EntryPointStats.cpp b/clang/lib/StaticAnalyzer/Core/EntryPointStats.cpp new file mode 100644 index 0000000000000..f17d0522f983a --- /dev/null +++ b/clang/lib/StaticAnalyzer/Core/EntryPointStats.cpp @@ -0,0 +1,201 @@ +//===- EntryPointStats.cpp ----------------------------------------------===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. 
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--------------------------------------------------------------------===// + +#include "clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h" +#include "clang/AST/DeclBase.h" +#include "clang/Analysis/AnalysisDeclContext.h" +#include "llvm/ADT/STLExtras.h" +#include "llvm/ADT/StringExtras.h" +#include "llvm/ADT/StringRef.h" +#include "llvm/Support/FileSystem.h" +#include "llvm/Support/ManagedStatic.h" +#include "llvm/Support/raw_ostream.h" +#include <iterator> + +using namespace clang; +using namespace ento; + +namespace { +struct Registry { + std::vector<BoolEPStat *> BoolStats; + std::vector<CounterEPStat *> CounterStats; + std::vector<UnsignedMaxEPStat *> UnsignedMaxStats; + std::vector<UnsignedEPStat *> UnsignedStats; + + bool IsLocked = false; + + struct Snapshot { + const Decl *EntryPoint; + std::vector<bool> BoolStatValues; + std::vector<unsigned> UnsignedStatValues; + + void dumpDynamicStatsAsCSV(llvm::raw_ostream &OS) const; + }; + + std::vector<Snapshot> Snapshots; +}; +} // namespace + +static llvm::ManagedStatic<Registry> StatsRegistry; + +namespace { +template <typename Callback> void enumerateStatVectors(const Callback &Fn) { + Fn(StatsRegistry->BoolStats); + Fn(StatsRegistry->CounterStats); + Fn(StatsRegistry->UnsignedMaxStats); + Fn(StatsRegistry->UnsignedStats); +} +} // namespace + +static void checkStatName(const EntryPointStat *M) { +#ifdef NDEBUG + return; +#endif // NDEBUG + constexpr std::array AllowedSpecialChars = { + '+', '-', '_', '=', ':', '(', ')', '@', '!', '~', + '$', '%', '^', '&', '*', '\'', ';', '<', '>', '/'}; + for (unsigned char C : M->name()) { + if (!std::isalnum(C) && !llvm::is_contained(AllowedSpecialChars, C)) { + llvm::errs() << "Stat name \"" << M->name() << "\" contains character '" + << C << "' (" << static_cast<int>(C) + << ") that is not allowed."; + assert(false && "The Stat name contains unallowed character"); + } + } +} + +void 
EntryPointStat::lockRegistry() { + auto CmpByNames = [](const EntryPointStat *L, const EntryPointStat *R) { + return L->name() < R->name(); + }; + enumerateStatVectors( + [CmpByNames](auto &Stats) { llvm::sort(Stats, CmpByNames); }); + enumerateStatVectors( + [](const auto &Stats) { llvm::for_each(Stats, checkStatName); }); + StatsRegistry->IsLocked = true; +} + +static bool isRegistered(llvm::StringLiteral Name) { + auto ByName = [Name](const EntryPointStat *M) { return M->name() == Name; }; + bool Result = false; + enumerateStatVectors([ByName, &Result](const auto &Stats) { + Result = Result || llvm::any_of(Stats, ByName); + }); + return Result; +} + +BoolEPStat::BoolEPStat(llvm::StringLiteral Name) : EntryPointStat(Name) { + assert(!StatsRegistry->IsLocked); + assert(!isRegistered(Name)); + StatsRegistry->BoolStats.push_back(this); +} + +CounterEPStat::CounterEPStat(llvm::StringLiteral Name) : EntryPointStat(Name) { + assert(!StatsRegistry->IsLocked); + assert(!isRegistered(Name)); + StatsRegistry->CounterStats.push_back(this); +} + +UnsignedMaxEPStat::UnsignedMaxEPStat(llvm::StringLiteral Name) + : EntryPointStat(Name) { + assert(!StatsRegistry->IsLocked); + assert(!isRegistered(Name)); + StatsRegistry->UnsignedMaxStats.push_back(this); +} + +UnsignedEPStat::UnsignedEPStat(llvm::StringLiteral Name) + : EntryPointStat(Name) { + assert(!StatsRegistry->IsLocked); + assert(!isRegistered(Name)); + StatsRegistry->UnsignedStats.push_back(this); +} + +static std::vector<unsigned> consumeUnsignedStats() { + std::vector<unsigned> Result; + Result.reserve(StatsRegistry->CounterStats.size() + + StatsRegistry->UnsignedMaxStats.size() + + StatsRegistry->UnsignedStats.size()); + for (auto *M : StatsRegistry->CounterStats) { + Result.push_back(M->value()); + M->reset(); + } + for (auto *M : StatsRegistry->UnsignedMaxStats) { + Result.push_back(M->value()); + M->reset(); + } + for (auto *M : StatsRegistry->UnsignedStats) { + Result.push_back(M->value()); + M->reset(); + } + 
return Result; +} + +static std::vector<llvm::StringLiteral> getStatNames() { + std::vector<llvm::StringLiteral> Ret; + auto GetName = [](const EntryPointStat *M) { return M->name(); }; + enumerateStatVectors([GetName, &Ret](const auto &Stats) { + transform(Stats, std::back_inserter(Ret), GetName); + }); + return Ret; +} + +void Registry::Snapshot::dumpDynamicStatsAsCSV(llvm::raw_ostream &OS) const { + OS << '"'; + llvm::printEscapedString( + clang::AnalysisDeclContext::getFunctionName(EntryPoint), OS); + OS << "\", "; + auto PrintAsBool = [&OS](bool B) { OS << (B ? "true" : "false"); }; + llvm::interleaveComma(BoolStatValues, OS, PrintAsBool); + OS << ((BoolStatValues.empty() || UnsignedStatValues.empty()) ? "" : ", "); + llvm::interleaveComma(UnsignedStatValues, OS); +} + +static std::vector<bool> consumeBoolStats() { + std::vector<bool> Result; + Result.reserve(StatsRegistry->BoolStats.size()); + for (auto *M : StatsRegistry->BoolStats) { + Result.push_back(M->value()); + M->reset(); + } + return Result; +} + +void EntryPointStat::takeSnapshot(const Decl *EntryPoint) { + auto BoolValues = consumeBoolStats(); + auto UnsignedValues = consumeUnsignedStats(); + StatsRegistry->Snapshots.push_back( + {EntryPoint, std::move(BoolValues), std::move(UnsignedValues)}); +} + +void EntryPointStat::dumpStatsAsCSV(llvm::StringRef FileName) { + std::error_code EC; + llvm::raw_fd_ostream File(FileName, EC, llvm::sys::fs::OF_Text); + if (EC) + return; + dumpStatsAsCSV(File); +} + +void EntryPointStat::dumpStatsAsCSV(llvm::raw_ostream &OS) { + OS << "EntryPoint, "; + llvm::interleaveComma(getStatNames(), OS); + OS << "\n"; + + std::vector<std::string> Rows; + Rows.reserve(StatsRegistry->Snapshots.size()); + for (const auto &Snapshot : StatsRegistry->Snapshots) { + std::string Row; + llvm::raw_string_ostream RowOs(Row); + Snapshot.dumpDynamicStatsAsCSV(RowOs); + RowOs << "\n"; + Rows.push_back(RowOs.str()); + } + llvm::sort(Rows); + for (const auto &Row : Rows) { + OS << Row; + } 
+} diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp index 914eb0f4ef6bd..12a5b248c843f 100644 --- a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp +++ b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp @@ -49,6 +49,7 @@ #include "clang/StaticAnalyzer/Core/PathSensitive/ConstraintManager.h" #include "clang/StaticAnalyzer/Core/PathSensitive/CoreEngine.h" #include "clang/StaticAnalyzer/Core/PathSensitive/DynamicExtent.h" +#include "clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h" #include "clang/StaticAnalyzer/Core/PathSensitive/ExplodedGraph.h" #include "clang/StaticAnalyzer/Core/PathSensitive/LoopUnrolling.h" #include "clang/StaticAnalyzer/Core/PathSensitive/LoopWidening.h" @@ -67,7 +68,6 @@ #include "llvm/ADT/ImmutableSet.h" #include "llvm/ADT/STLExtras.h" #include "llvm/ADT/SmallVector.h" -#include "llvm/ADT/Statistic.h" #include "llvm/Support/Casting.h" #include "llvm/Support/Compiler.h" #include "llvm/Support/DOTGraphTraits.h" @@ -90,16 +90,18 @@ using namespace ento; #define DEBUG_TYPE "ExprEngine" -STATISTIC(NumRemoveDeadBindings, - "The # of times RemoveDeadBindings is called"); -STATISTIC(NumMaxBlockCountReached, - "The # of aborted paths due to reaching the maximum block count in " - "a top level function"); -STATISTIC(NumMaxBlockCountReachedInInlined, - "The # of aborted paths due to reaching the maximum block count in " - "an inlined function"); -STATISTIC(NumTimesRetriedWithoutInlining, - "The # of times we re-evaluated a call without inlining"); +STAT_COUNTER(NumRemoveDeadBindings, + "The # of times RemoveDeadBindings is called"); +STAT_COUNTER( + NumMaxBlockCountReached, + "The # of aborted paths due to reaching the maximum block count in " + "a top level function"); +STAT_COUNTER( + NumMaxBlockCountReachedInInlined, + "The # of aborted paths due to reaching the maximum block count in " + "an inlined function"); +STAT_COUNTER(NumTimesRetriedWithoutInlining, + "The # of times we re-evaluated a 
call without inlining"); //===----------------------------------------------------------------------===// // Internal program state traits. diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp b/clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp index 02facf786830d..1a44ba4f49133 100644 --- a/clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp +++ b/clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp @@ -19,9 +19,9 @@ #include "clang/StaticAnalyzer/Core/CheckerManager.h" #include "clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h" #include "clang/StaticAnalyzer/Core/PathSensitive/DynamicExtent.h" +#include "clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h" #include "clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h" #include "llvm/ADT/SmallSet.h" -#include "llvm/ADT/Statistic.h" #include "llvm/Support/Casting.h" #include "llvm/Support/Compiler.h" #include "llvm/Support/SaveAndRestore.h" @@ -32,14 +32,14 @@ using namespace ento; #define DEBUG_TYPE "ExprEngine" -STATISTIC(NumOfDynamicDispatchPathSplits, - "The # of times we split the path due to imprecise dynamic dispatch info"); +STAT_COUNTER( + NumOfDynamicDispatchPathSplits, + "The # of times we split the path due to imprecise dynamic dispatch info"); -STATISTIC(NumInlinedCalls, - "The # of times we inlined a call"); +STAT_COUNTER(NumInlinedCalls, "The # of times we inlined a call"); -STATISTIC(NumReachedInlineCountMax, - "The # of times we reached inline count maximum"); +STAT_COUNTER(NumReachedInlineCountMax, + "The # of times we reached inline count maximum"); void ExprEngine::processCallEnter(NodeBuilderContext& BC, CallEnter CE, ExplodedNode *Pred) { diff --git a/clang/lib/StaticAnalyzer/Core/WorkList.cpp b/clang/lib/StaticAnalyzer/Core/WorkList.cpp index 7042a9020837a..9f40926e9a026 100644 --- a/clang/lib/StaticAnalyzer/Core/WorkList.cpp +++ b/clang/lib/StaticAnalyzer/Core/WorkList.cpp @@ -11,11 +11,11 @@ 
//===----------------------------------------------------------------------===// #include "clang/StaticAnalyzer/Core/PathSensitive/WorkList.h" -#include "llvm/ADT/PriorityQueue.h" -#include "llvm/ADT/DenseSet.h" +#include "clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h" #include "llvm/ADT/DenseMap.h" +#include "llvm/ADT/DenseSet.h" +#include "llvm/ADT/PriorityQueue.h" #include "llvm/ADT/STLExtras.h" -#include "llvm/ADT/Statistic.h" #include <deque> #include <vector> @@ -24,8 +24,8 @@ using namespace ento; #define DEBUG_TYPE "WorkList" -STATISTIC(MaxQueueSize, "Maximum size of the worklist"); -STATISTIC(MaxReachableSize, "Maximum size of auxiliary worklist set"); +STAT_MAX(MaxQueueSize, "Maximum size of the worklist"); +STAT_MAX(MaxReachableSize, "Maximum size of auxiliary worklist set"); //===----------------------------------------------------------------------===// // Worklist classes for exploration of reachable states. diff --git a/clang/lib/StaticAnalyzer/Core/Z3CrosscheckVisitor.cpp b/clang/lib/StaticAnalyzer/Core/Z3CrosscheckVisitor.cpp index c4dd016f70d86..fca792cdf86f7 100644 --- a/clang/lib/StaticAnalyzer/Core/Z3CrosscheckVisitor.cpp +++ b/clang/lib/StaticAnalyzer/Core/Z3CrosscheckVisitor.cpp @@ -14,8 +14,8 @@ #include "clang/StaticAnalyzer/Core/BugReporter/Z3CrosscheckVisitor.h" #include "clang/StaticAnalyzer/Core/AnalyzerOptions.h" #include "clang/StaticAnalyzer/Core/BugReporter/BugReporter.h" +#include "clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h" #include "clang/StaticAnalyzer/Core/PathSensitive/SMTConv.h" -#include "llvm/ADT/Statistic.h" #include "llvm/Support/SMTAPI.h" #include "llvm/Support/Timer.h" @@ -25,20 +25,21 @@ // Multiple `check()` calls might be called on the same query if previous // attempts of the same query resulted in UNSAT for any reason. Each query is // only counted once for these statistics, the retries are not accounted for. 
-STATISTIC(NumZ3QueriesDone, "Number of Z3 queries done"); -STATISTIC(NumTimesZ3TimedOut, "Number of times Z3 query timed out"); -STATISTIC(NumTimesZ3ExhaustedRLimit, - "Number of times Z3 query exhausted the rlimit"); -STATISTIC(NumTimesZ3SpendsTooMuchTimeOnASingleEQClass, - "Number of times report equivalenece class was cut because it spent " - "too much time in Z3"); - -STATISTIC(NumTimesZ3QueryAcceptsReport, - "Number of Z3 queries accepting a report"); -STATISTIC(NumTimesZ3QueryRejectReport, - "Number of Z3 queries rejecting a report"); -STATISTIC(NumTimesZ3QueryRejectEQClass, - "Number of times rejecting an report equivalenece class"); +STAT_COUNTER(NumZ3QueriesDone, "Number of Z3 queries done"); +STAT_COUNTER(NumTimesZ3TimedOut, "Number of times Z3 query timed out"); +STAT_COUNTER(NumTimesZ3ExhaustedRLimit, + "Number of times Z3 query exhausted the rlimit"); +STAT_COUNTER( + NumTimesZ3SpendsTooMuchTimeOnASingleEQClass, + "Number of times report equivalenece class was cut because it spent " + "too much time in Z3"); + +STAT_COUNTER(NumTimesZ3QueryAcceptsReport, + "Number of Z3 queries accepting a report"); +STAT_COUNTER(NumTimesZ3QueryRejectReport, + "Number of Z3 queries rejecting a report"); +STAT_COUNTER(NumTimesZ3QueryRejectEQClass, + "Number of times rejecting an report equivalenece class"); using namespace clang; using namespace ento; diff --git a/clang/lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp b/clang/lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp index 8a4bb35925e2c..1b9d965bc2999 100644 --- a/clang/lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp +++ b/clang/lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp @@ -32,10 +32,10 @@ #include "clang/StaticAnalyzer/Core/CheckerManager.h" #include "clang/StaticAnalyzer/Core/PathDiagnosticConsumers.h" #include "clang/StaticAnalyzer/Core/PathSensitive/AnalysisManager.h" +#include "clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h" #include "clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h" 
#include "llvm/ADT/PostOrderIterator.h" #include "llvm/ADT/ScopeExit.h" -#include "llvm/ADT/Statistic.h" #include "llvm/Support/FileSystem.h" #include "llvm/Support/Path.h" #include "llvm/Support/Program.h" @@ -51,17 +51,18 @@ using namespace ento; #define DEBUG_TYPE "AnalysisConsumer" -STATISTIC(NumFunctionTopLevel, "The # of functions at top level."); -STATISTIC(NumFunctionsAnalyzed, - "The # of functions and blocks analyzed (as top level " - "with inlining turned on)."); -STATISTIC(NumBlocksInAnalyzedFunctions, - "The # of basic blocks in the analyzed functions."); -STATISTIC(NumVisitedBlocksInAnalyzedFunctions, - "The # of visited basic blocks in the analyzed functions."); -STATISTIC(PercentReachableBlocks, "The % of reachable basic blocks."); -STATISTIC(MaxCFGSize, "The maximum number of basic blocks in a function."); - +STAT_COUNTER(NumFunctionTopLevel, "The # of functions at top level."); +ALWAYS_ENABLED_STATISTIC(NumFunctionsAnalyzed, + "The # of functions and blocks analyzed (as top level " + "with inlining turned on)."); +ALWAYS_ENABLED_STATISTIC(NumBlocksInAnalyzedFunctions, + "The # of basic blocks in the analyzed functions."); +ALWAYS_ENABLED_STATISTIC( + NumVisitedBlocksInAnalyzedFunctions, + "The # of visited basic blocks in the analyzed functions."); +ALWAYS_ENABLED_STATISTIC(PercentReachableBlocks, + "The % of reachable basic blocks."); +STAT_MAX(MaxCFGSize, "The maximum number of basic blocks in a function."); //===----------------------------------------------------------------------===// // AnalysisConsumer declaration. 
//===----------------------------------------------------------------------===// @@ -128,7 +129,9 @@ class AnalysisConsumer : public AnalysisASTConsumer, PP(CI.getPreprocessor()), OutDir(outdir), Opts(opts), Plugins(plugins), Injector(std::move(injector)), CTU(CI), MacroExpansions(CI.getLangOpts()) { + EntryPointStat::lockRegistry(); DigestAnalyzerOptions(); + if (Opts.AnalyzerDisplayProgress || Opts.PrintStats || Opts.ShouldSerializeStats) { AnalyzerTimers = std::make_unique<llvm::TimerGroup>( @@ -653,6 +656,10 @@ void AnalysisConsumer::HandleTranslationUnit(ASTContext &C) { PercentReachableBlocks = (FunctionSummaries.getTotalNumVisitedBasicBlocks() * 100) / NumBlocksInAnalyzedFunctions; + + if (!Opts.DumpSEStatsToCSV.empty()) { + EntryPointStat::dumpStatsAsCSV(Opts.DumpSEStatsToCSV); + } } AnalysisConsumer::AnalysisMode @@ -688,6 +695,36 @@ AnalysisConsumer::getModeForDecl(Decl *D, AnalysisMode Mode) { return Mode; } +template <typename DeclT> +static clang::Decl *preferDefinitionImpl(clang::Decl *D) { + if (auto *X = dyn_cast<DeclT>(D)) + if (auto *Def = X->getDefinition()) + return Def; + return D; +} + +template <> clang::Decl *preferDefinitionImpl<ObjCMethodDecl>(clang::Decl *D) { + if (const auto *X = dyn_cast<ObjCMethodDecl>(D)) { + for (auto *I : X->redecls()) + if (I->hasBody()) + return I; + } + return D; +} + +static Decl *getDefinitionOrCanonicalDecl(Decl *D) { + assert(D); + D = D->getCanonicalDecl(); + D = preferDefinitionImpl<VarDecl>(D); + D = preferDefinitionImpl<FunctionDecl>(D); + D = preferDefinitionImpl<TagDecl>(D); + D = preferDefinitionImpl<ObjCMethodDecl>(D); + assert(D); + return D; +} + +static UnsignedEPStat PathRunningTime("PathRunningTime"); + void AnalysisConsumer::HandleCode(Decl *D, AnalysisMode Mode, ExprEngine::InliningModes IMode, SetOfConstDecls *VisitedCallees) { @@ -732,6 +769,7 @@ void AnalysisConsumer::HandleCode(Decl *D, AnalysisMode Mode, if ((Mode & AM_Path) && checkerMgr->hasPathSensitiveCheckers()) { 
RunPathSensitiveChecks(D, IMode, VisitedCallees); + EntryPointStat::takeSnapshot(getDefinitionOrCanonicalDecl(D)); if (IMode != ExprEngine::Inline_Minimal) NumFunctionsAnalyzed++; } diff --git a/clang/test/Analysis/analyzer-config.c b/clang/test/Analysis/analyzer-config.c index 978a7509ee5e3..c20fea03ede8d 100644 --- a/clang/test/Analysis/analyzer-config.c +++ b/clang/test/Analysis/analyzer-config.c @@ -79,6 +79,7 @@ // CHECK-NEXT: debug.AnalysisOrder:RegionChanges = false // CHECK-NEXT: display-checker-name = true // CHECK-NEXT: display-ctu-progress = false +// CHECK-NEXT: dump-se-stats-to-csv = "" // CHECK-NEXT: eagerly-assume = true // CHECK-NEXT: elide-constructors = true // CHECK-NEXT: expand-macros = false diff --git a/clang/test/Analysis/csv2json.py b/clang/test/Analysis/csv2json.py new file mode 100644 index 0000000000000..c0c2f4c706228 --- /dev/null +++ b/clang/test/Analysis/csv2json.py @@ -0,0 +1,98 @@ +#!/usr/bin/env python +# +# ===- csv2json.py - Static Analyzer test helper ---*- python -*-===# +# +# Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +# See https://llvm.org/LICENSE.txt for license information. +# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +# +# ===------------------------------------------------------------------------===# + +r""" +Clang Static Analyzer test helper +================================= + +This script converts a CSV file to a JSON file with a specific structure. + +The JSON file contains a single dictionary. The keys of this dictionary +are taken from the first column of the CSV. The values are dictionaries +themselves, mapping the CSV header names (except the first column) to +the corresponding row values. + + +Usage: + csv2json.py <source-file> + +Example: + // RUN: %csv2json.py %t | FileCheck %s +""" + +import csv +import sys +import json + +def csv_to_json_dict(csv_filepath): + """ + Args: + csv_filepath: The path to the input CSV file. 
+ + Raises: + FileNotFoundError: If the CSV file does not exist. + csv.Error: If there is an error parsing the CSV file. + Exception: For any other unexpected errors. + """ + try: + with open(csv_filepath, 'r', encoding='utf-8') as csvfile: + reader = csv.reader(csvfile) + + # Read the header row (column names) + try: + header = next(reader) + except StopIteration: # Handle empty CSV file + json.dumps({}, indent=2) # write an empty dict + return + + if not header: #handle a csv file that contains no rows, not even a header row. + json.dumps({}, indent=2) + return + + other_column_names = [name.strip() for name in header[1:]] + + data_dict = {} + + for row in reader: + if len(row) != len(header): + raise csv.Error("Inconsistent CSV file") + exit(1) + + key = row[0] + value_map = {} + + for i, col_name in enumerate(other_column_names): + value_map[col_name] = row[i + 1].strip() # +1 to skip the first column + + data_dict[key] = value_map + + return json.dumps(data_dict, indent=2) + + except FileNotFoundError: + raise FileNotFoundError(f"Error: CSV file not found at {csv_filepath}") + except csv.Error as e: + raise csv.Error(f"Error parsing CSV file: {e}") + except Exception as e: + raise Exception(f"An unexpected error occurred: {e}") + + +def main(): + """Example usage with error handling.""" + csv_file = sys.argv[1] + + try: + print(csv_to_json_dict(csv_file)) + except (FileNotFoundError, csv.Error, Exception) as e: + print(str(e)) + except: + print("An error occured") + +if __name__ == '__main__': + main() diff --git a/clang/test/lit.cfg.py b/clang/test/lit.cfg.py index 9820ddd1f14af..58c3ec03761a5 100644 --- a/clang/test/lit.cfg.py +++ b/clang/test/lit.cfg.py @@ -186,6 +186,16 @@ def have_host_clang_repl_cuda(): ) ) + csv2json_path = os.path.join( + config.test_source_root, "Analysis", "csv2json.py" + ) + config.substitutions.append( + ( + "%csv2json", + '"%s" %s' % (config.python_executable, csv2json_path), + ) + ) + llvm_config.add_tool_substitutions(tools, 
tool_dirs) config.substitutions.append( >From c8e024fe96070b0bec2d6e207f12343e7311fd7c Mon Sep 17 00:00:00 2001 From: Arseniy Zaostrovnykh <necto...@gmail.com> Date: Fri, 14 Mar 2025 08:23:59 +0100 Subject: [PATCH 02/10] [NFC] Fix darker python formatting --- clang/test/Analysis/csv2json.py | 12 ++++++++---- clang/test/lit.cfg.py | 4 +--- 2 files changed, 9 insertions(+), 7 deletions(-) diff --git a/clang/test/Analysis/csv2json.py b/clang/test/Analysis/csv2json.py index c0c2f4c706228..3c20d689243e7 100644 --- a/clang/test/Analysis/csv2json.py +++ b/clang/test/Analysis/csv2json.py @@ -31,6 +31,7 @@ import sys import json + def csv_to_json_dict(csv_filepath): """ Args: @@ -42,7 +43,7 @@ def csv_to_json_dict(csv_filepath): Exception: For any other unexpected errors. """ try: - with open(csv_filepath, 'r', encoding='utf-8') as csvfile: + with open(csv_filepath, "r", encoding="utf-8") as csvfile: reader = csv.reader(csvfile) # Read the header row (column names) @@ -52,7 +53,8 @@ def csv_to_json_dict(csv_filepath): json.dumps({}, indent=2) # write an empty dict return - if not header: #handle a csv file that contains no rows, not even a header row. + # handle a csv file that contains no rows, not even a header row. 
+ if not header: json.dumps({}, indent=2) return @@ -69,7 +71,8 @@ def csv_to_json_dict(csv_filepath): value_map = {} for i, col_name in enumerate(other_column_names): - value_map[col_name] = row[i + 1].strip() # +1 to skip the first column + # +1 to skip the first column + value_map[col_name] = row[i + 1].strip() data_dict[key] = value_map @@ -94,5 +97,6 @@ def main(): except: print("An error occured") -if __name__ == '__main__': + +if __name__ == "__main__": main() diff --git a/clang/test/lit.cfg.py b/clang/test/lit.cfg.py index 58c3ec03761a5..025ef7a9133ea 100644 --- a/clang/test/lit.cfg.py +++ b/clang/test/lit.cfg.py @@ -186,9 +186,7 @@ def have_host_clang_repl_cuda(): ) ) - csv2json_path = os.path.join( - config.test_source_root, "Analysis", "csv2json.py" - ) + csv2json_path = os.path.join(config.test_source_root, "Analysis", "csv2json.py") config.substitutions.append( ( "%csv2json", >From 0485df7e9b90d3867a7a28d7aa00060aaaf067a1 Mon Sep 17 00:00:00 2001 From: Arseniy Zaostrovnykh <necto...@gmail.com> Date: Fri, 14 Mar 2025 08:45:53 +0100 Subject: [PATCH 03/10] Write documentation --- .../analyzer/developer-docs/Statistics.rst | 36 +++++++++++-------- 1 file changed, 21 insertions(+), 15 deletions(-) diff --git a/clang/docs/analyzer/developer-docs/Statistics.rst b/clang/docs/analyzer/developer-docs/Statistics.rst index d352bb6f01ebc..f71b51863e8a2 100644 --- a/clang/docs/analyzer/developer-docs/Statistics.rst +++ b/clang/docs/analyzer/developer-docs/Statistics.rst @@ -1,21 +1,27 @@ -====================== -Metrics and Statistics -====================== +=================== +Analysis Statistics +=================== -TODO: write this once the design is settled (@reviewer, don't look here yet) +CSA enjoys two facilities to collect statistics: per translation unit and per entry point. +We use llvm/ADT/Statistic.h for numbers describing the entire translation unit (TU). 
+We use clang/StatisCnalyzer/Core/PathSensitive/EntryPointStats.h to collect data for each symbolic-execution entry point. -CSA enjoys two facilities to collect statistics per translation unit and per entry point. +In many cases, it makes sense to collect statistics on both translation-unit level and entry-point level. You can use the two macros defined in EntryPointStats.h for that: -Mention the following tools: -- STATISTIC macro -- ALLWAYS_ENABLED_STATISTIC macro +- ``STAT_COUNTER`` for additive statistics, for example, "the number of steps executed", "the number of functions inlined". +- ``STAT_MAX`` for maximizing statistics, for example, "the maximum environment size", or "the longest execution path". -- STAT_COUNTER macro -- STAT_MAX macro +If you want to define a statistic that makes sense only for the entire translation unit, for example, "the number of entry points", Statistic.h defines two macros: ``STATISTIC`` and ``ALLWAYS_ENABLED_STATISTIC``. +You should prefer ``ALLWAYS_ENABLED_STATISTIC`` unless you have a good reason not to. +``STATISTIC`` is controlled by ``LLVM_ENABLE_STATS`` / ``LLVM_FORCE_ENABLE_STATS``. +However, note that with ``LLVM_ENABLE_STATS`` disabled, only storage of the values is disabled, the computations producing those values still carry on unless you took an explicit precaution to make them conditional too. -- BoolEPStat -- UnsignedEPStat -- CounterEPStat -- UnsignedMaxEPStat +If you want to define a statistic only for entry point, EntryPointStats.h has four classes at your disposal: -- dump-se-metrics-to-csv="%t.csv" + +- ``BoolEPStat`` - a boolean value assigned at most once per entry point. For example: "has the inline limit been reached". +- ``UnsignedEPStat`` - an unsigned value assigned at most once per entry point. For example: "the number of source characters in an entry-point body". +- ``CounterEPStat`` - an additive statistic. It starts with 0 and you can add to it as many times as needed. 
For example: "the number of bugs discovered". +- ``UnsignedMaxEPStat`` - a maximizing statistic. It starts with 0 and when you join it with a value, it picks the maximum of the previous value and the new one. For example, "the longest execution path of a bug". + +To produce a CSV file with all the statistics collected per entry point, use the `dump-se-metrics-to-csv=<file>.csv` parameter. >From a9caa3b300d1a8379701fedaa1b1453985c438c1 Mon Sep 17 00:00:00 2001 From: Arseniy Zaostrovnykh <arseniy.zaostrovn...@sonarsource.com> Date: Fri, 14 Mar 2025 15:10:04 +0100 Subject: [PATCH 04/10] Update clang/docs/analyzer/developer-docs/Statistics.rst MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Donát Nagy <donat.n...@ericsson.com> --- clang/docs/analyzer/developer-docs/Statistics.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/clang/docs/analyzer/developer-docs/Statistics.rst b/clang/docs/analyzer/developer-docs/Statistics.rst index f71b51863e8a2..9e7705422c0ae 100644 --- a/clang/docs/analyzer/developer-docs/Statistics.rst +++ b/clang/docs/analyzer/developer-docs/Statistics.rst @@ -4,7 +4,7 @@ Analysis Statistics CSA enjoys two facilities to collect statistics: per translation unit and per entry point. We use llvm/ADT/Statistic.h for numbers describing the entire translation unit (TU). -We use clang/StatisCnalyzer/Core/PathSensitive/EntryPointStats.h to collect data for each symbolic-execution entry point. +We use clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h to collect data for each symbolic-execution entry point. In many cases, it makes sense to collect statistics on both translation-unit level and entry-point level. 
You can use the two macros defined in EntryPointStats.h for that: >From 84e35714a6572a54f50f95b6f38ca12c72b25208 Mon Sep 17 00:00:00 2001 From: Arseniy Zaostrovnykh <arseniy.zaostrovn...@sonarsource.com> Date: Fri, 14 Mar 2025 15:10:35 +0100 Subject: [PATCH 05/10] Update clang/include/clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Donát Nagy <donat.n...@ericsson.com> --- .../clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h index 16c9fdf97fc30..6fa68a0752231 100644 --- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h +++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h @@ -1,4 +1,4 @@ -// EntryPointStats.h - Tracking statistics per entry point -*- C++ -*-// +// EntryPointStats.h - Tracking statistics per entry point --*- C++ -*-// // // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. // See https://llvm.org/LICENSE.txt for license information. 
>From 3a9d14c3446cb1f7134a0656cca9edb1c9b1bb38 Mon Sep 17 00:00:00 2001 From: Arseniy Zaostrovnykh <necto...@gmail.com> Date: Fri, 14 Mar 2025 15:14:53 +0100 Subject: [PATCH 06/10] Rename parameter to dump-entry-point-stats-to-csv --- clang/docs/analyzer/developer-docs/Statistics.rst | 2 +- clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def | 2 +- clang/lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp | 4 ++-- clang/test/Analysis/analyzer-config.c | 2 +- 4 files changed, 5 insertions(+), 5 deletions(-) diff --git a/clang/docs/analyzer/developer-docs/Statistics.rst b/clang/docs/analyzer/developer-docs/Statistics.rst index 9e7705422c0ae..f64c9e9a6be2d 100644 --- a/clang/docs/analyzer/developer-docs/Statistics.rst +++ b/clang/docs/analyzer/developer-docs/Statistics.rst @@ -24,4 +24,4 @@ If you want to define a statistic only for entry point, EntryPointStats.h has fo - ``CounterEPStat`` - an additive statistic. It starts with 0 and you can add to it as many times as needed. For example: "the number of bugs discovered". - ``UnsignedMaxEPStat`` - a maximizing statistic. It starts with 0 and when you join it with a value, it picks the maximum of the previous value and the new one. For example, "the longest execution path of a bug". -To produce a CSV file with all the statistics collected per entry point, use the `dump-se-metrics-to-csv=<file>.csv` parameter. +To produce a CSV file with all the statistics collected per entry point, use the `dump-entry-point-stats-to-csv=<file>.csv` parameter. 
diff --git a/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def b/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def index b88bce5e262a7..f9f22a9ced650 100644 --- a/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def +++ b/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def @@ -354,7 +354,7 @@ ANALYZER_OPTION(bool, DisplayCTUProgress, "display-ctu-progress", false) ANALYZER_OPTION( - StringRef, DumpSEStatsToCSV, "dump-se-stats-to-csv", + StringRef, DumpEntryPointStatsToCSV, "dump-entry-point-stats-to-csv", "If provided, the analyzer will dump statistics per entry point " "into the specified CSV file.", "") diff --git a/clang/lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp b/clang/lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp index 1b9d965bc2999..372503bebc174 100644 --- a/clang/lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp +++ b/clang/lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp @@ -657,8 +657,8 @@ void AnalysisConsumer::HandleTranslationUnit(ASTContext &C) { (FunctionSummaries.getTotalNumVisitedBasicBlocks() * 100) / NumBlocksInAnalyzedFunctions; - if (!Opts.DumpSEStatsToCSV.empty()) { - EntryPointStat::dumpStatsAsCSV(Opts.DumpSEStatsToCSV); + if (!Opts.DumpEntryPointStatsToCSV.empty()) { + EntryPointStat::dumpStatsAsCSV(Opts.DumpEntryPointStatsToCSV); } } diff --git a/clang/test/Analysis/analyzer-config.c b/clang/test/Analysis/analyzer-config.c index c20fea03ede8d..00177769f3243 100644 --- a/clang/test/Analysis/analyzer-config.c +++ b/clang/test/Analysis/analyzer-config.c @@ -79,7 +79,7 @@ // CHECK-NEXT: debug.AnalysisOrder:RegionChanges = false // CHECK-NEXT: display-checker-name = true // CHECK-NEXT: display-ctu-progress = false -// CHECK-NEXT: dump-se-stats-to-csv = "" +// CHECK-NEXT: dump-entry-point-stats-to-csv = "" // CHECK-NEXT: eagerly-assume = true // CHECK-NEXT: elide-constructors = true // CHECK-NEXT: expand-macros = false >From 7b11c652da915dd64854faca301f636c27567ed1 Mon Sep 17 00:00:00 2001 From: Arseniy 
Zaostrovnykh <necto...@gmail.com> Date: Fri, 14 Mar 2025 15:19:33 +0100 Subject: [PATCH 07/10] Add the missing test case --- .../analyzer-stats/entry-point-stats.cpp | 96 +++++++++++++++++++ 1 file changed, 96 insertions(+) create mode 100644 clang/test/Analysis/analyzer-stats/entry-point-stats.cpp diff --git a/clang/test/Analysis/analyzer-stats/entry-point-stats.cpp b/clang/test/Analysis/analyzer-stats/entry-point-stats.cpp new file mode 100644 index 0000000000000..bddba084ee4bf --- /dev/null +++ b/clang/test/Analysis/analyzer-stats/entry-point-stats.cpp @@ -0,0 +1,96 @@ +// REQUIRES: asserts +// RUN: %clang_analyze_cc1 -analyzer-checker=core \ +// RUN: -analyzer-config dump-entry-point-stats-to-csv="%t.csv" \ +// RUN: -verify %s +// RUN: %csv2json "%t.csv" | FileCheck --check-prefix=CHECK %s +// +// CHECK: { +// CHECK-NEXT: "fib(unsigned int)": { +// CHECK-NEXT: "NumBlocks": "{{[0-9]+}}", +// CHECK-NEXT: "NumBlocksUnreachable": "{{[0-9]+}}", +// CHECK-NEXT: "NumCTUSteps": "{{[0-9]+}}", +// CHECK-NEXT: "NumFunctionTopLevel": "{{[0-9]+}}", +// CHECK-NEXT: "NumInlinedCalls": "{{[0-9]+}}", +// CHECK-NEXT: "NumMaxBlockCountReached": "{{[0-9]+}}", +// CHECK-NEXT: "NumMaxBlockCountReachedInInlined": "{{[0-9]+}}", +// CHECK-NEXT: "NumOfDynamicDispatchPathSplits": "{{[0-9]+}}", +// CHECK-NEXT: "NumPathsExplored": "{{[0-9]+}}", +// CHECK-NEXT: "NumReachedInlineCountMax": "{{[0-9]+}}", +// CHECK-NEXT: "NumRemoveDeadBindings": "{{[0-9]+}}", +// CHECK-NEXT: "NumSTUSteps": "{{[0-9]+}}", +// CHECK-NEXT: "NumSteps": "{{[0-9]+}}", +// CHECK-NEXT: "NumTimesReportEQClassAborted": "{{[0-9]+}}", +// CHECK-NEXT: "NumTimesReportEQClassWasExhausted": "{{[0-9]+}}", +// CHECK-NEXT: "NumTimesReportPassesZ3": "{{[0-9]+}}", +// CHECK-NEXT: "NumTimesReportRefuted": "{{[0-9]+}}", +// CHECK-NEXT: "NumTimesRetriedWithoutInlining": "{{[0-9]+}}", +// CHECK-NEXT: "NumTimesZ3ExhaustedRLimit": "{{[0-9]+}}", +// CHECK-NEXT: "NumTimesZ3QueryAcceptsReport": "{{[0-9]+}}", +// CHECK-NEXT: 
"NumTimesZ3QueryRejectEQClass": "{{[0-9]+}}", +// CHECK-NEXT: "NumTimesZ3QueryRejectReport": "{{[0-9]+}}", +// CHECK-NEXT: "NumTimesZ3SpendsTooMuchTimeOnASingleEQClass": "{{[0-9]+}}", +// CHECK-NEXT: "NumTimesZ3TimedOut": "{{[0-9]+}}", +// CHECK-NEXT: "NumZ3QueriesDone": "{{[0-9]+}}", +// CHECK-NEXT: "MaxBugClassSize": "{{[0-9]+}}", +// CHECK-NEXT: "MaxCFGSize": "{{[0-9]+}}", +// CHECK-NEXT: "MaxQueueSize": "{{[0-9]+}}", +// CHECK-NEXT: "MaxReachableSize": "{{[0-9]+}}", +// CHECK-NEXT: "MaxValidBugClassSize": "{{[0-9]+}}", +// CHECK-NEXT: "PathRunningTime": "{{[0-9]+}}" +// CHECK-NEXT: }, +// CHECK-NEXT: "main(int, char **)": { +// CHECK-NEXT: "NumBlocks": "{{[0-9]+}}", +// CHECK-NEXT: "NumBlocksUnreachable": "{{[0-9]+}}", +// CHECK-NEXT: "NumCTUSteps": "{{[0-9]+}}", +// CHECK-NEXT: "NumFunctionTopLevel": "{{[0-9]+}}", +// CHECK-NEXT: "NumInlinedCalls": "{{[0-9]+}}", +// CHECK-NEXT: "NumMaxBlockCountReached": "{{[0-9]+}}", +// CHECK-NEXT: "NumMaxBlockCountReachedInInlined": "{{[0-9]+}}", +// CHECK-NEXT: "NumOfDynamicDispatchPathSplits": "{{[0-9]+}}", +// CHECK-NEXT: "NumPathsExplored": "{{[0-9]+}}", +// CHECK-NEXT: "NumReachedInlineCountMax": "{{[0-9]+}}", +// CHECK-NEXT: "NumRemoveDeadBindings": "{{[0-9]+}}", +// CHECK-NEXT: "NumSTUSteps": "{{[0-9]+}}", +// CHECK-NEXT: "NumSteps": "{{[0-9]+}}", +// CHECK-NEXT: "NumTimesReportEQClassAborted": "{{[0-9]+}}", +// CHECK-NEXT: "NumTimesReportEQClassWasExhausted": "{{[0-9]+}}", +// CHECK-NEXT: "NumTimesReportPassesZ3": "{{[0-9]+}}", +// CHECK-NEXT: "NumTimesReportRefuted": "{{[0-9]+}}", +// CHECK-NEXT: "NumTimesRetriedWithoutInlining": "{{[0-9]+}}", +// CHECK-NEXT: "NumTimesZ3ExhaustedRLimit": "{{[0-9]+}}", +// CHECK-NEXT: "NumTimesZ3QueryAcceptsReport": "{{[0-9]+}}", +// CHECK-NEXT: "NumTimesZ3QueryRejectEQClass": "{{[0-9]+}}", +// CHECK-NEXT: "NumTimesZ3QueryRejectReport": "{{[0-9]+}}", +// CHECK-NEXT: "NumTimesZ3SpendsTooMuchTimeOnASingleEQClass": "{{[0-9]+}}", +// CHECK-NEXT: "NumTimesZ3TimedOut": "{{[0-9]+}}", +// 
CHECK-NEXT: "NumZ3QueriesDone": "{{[0-9]+}}", +// CHECK-NEXT: "MaxBugClassSize": "{{[0-9]+}}", +// CHECK-NEXT: "MaxCFGSize": "{{[0-9]+}}", +// CHECK-NEXT: "MaxQueueSize": "{{[0-9]+}}", +// CHECK-NEXT: "MaxReachableSize": "{{[0-9]+}}", +// CHECK-NEXT: "MaxValidBugClassSize": "{{[0-9]+}}", +// CHECK-NEXT: "PathRunningTime": "{{[0-9]+}}" +// CHECK-NEXT: } +// CHECK-NEXT: } +// CHECK-NOT: non_entry_point + +// expected-no-diagnostics +int non_entry_point(int end) { + int sum = 0; + for (int i = 0; i <= end; ++i) { + sum += i; + } + return sum; +} + +int fib(unsigned n) { + if (n <= 1) { + return 1; + } + return fib(n - 1) + fib(n - 2); +} + +int main(int argc, char **argv) { + int i = non_entry_point(argc); + return i; +} >From 886f5ae102df1da0d95717e1238c202b9f4c7fba Mon Sep 17 00:00:00 2001 From: Arseniy Zaostrovnykh <necto...@gmail.com> Date: Fri, 14 Mar 2025 15:21:13 +0100 Subject: [PATCH 08/10] [NFC] Fix typo in a class name --- .../Core/PathSensitive/EntryPointStats.h | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h index 6fa68a0752231..68f3d16c2944a 100644 --- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h +++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h @@ -133,14 +133,14 @@ class CounterEntryPointTranslationUnitStat { } }; -class UnsignedMaxEtryPointTranslationUnitStatistic { +class UnsignedMaxEntryPointTranslationUnitStatistic { UnsignedMaxEPStat M; llvm::TrackingStatistic S; public: - UnsignedMaxEtryPointTranslationUnitStatistic(const char *DebugType, - llvm::StringLiteral Name, - llvm::StringLiteral Desc) + UnsignedMaxEntryPointTranslationUnitStatistic(const char *DebugType, + llvm::StringLiteral Name, + llvm::StringLiteral Desc) : M(Name), S(DebugType, Name.data(), Desc.data()) {} void updateMax(uint64_t Value) { 
M.updateMax(static_cast<unsigned>(Value)); @@ -153,8 +153,8 @@ class UnsignedMaxEtryPointTranslationUnitStatistic { DEBUG_TYPE, #VARNAME, DESC} #define STAT_MAX(VARNAME, DESC) \ - static clang::ento::UnsignedMaxEtryPointTranslationUnitStatistic VARNAME = { \ - DEBUG_TYPE, #VARNAME, DESC} + static clang::ento::UnsignedMaxEntryPointTranslationUnitStatistic VARNAME = \ + {DEBUG_TYPE, #VARNAME, DESC} } // namespace ento } // namespace clang >From 8e58d6c649ee6831e22de7aacabd84eb7dc01063 Mon Sep 17 00:00:00 2001 From: Arseniy Zaostrovnykh <necto...@gmail.com> Date: Fri, 14 Mar 2025 17:16:42 +0100 Subject: [PATCH 09/10] Remove unnecesary canonicalization --- .../Frontend/AnalysisConsumer.cpp | 30 +------------------ 1 file changed, 1 insertion(+), 29 deletions(-) diff --git a/clang/lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp b/clang/lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp index 372503bebc174..b4222eddc09f9 100644 --- a/clang/lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp +++ b/clang/lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp @@ -695,34 +695,6 @@ AnalysisConsumer::getModeForDecl(Decl *D, AnalysisMode Mode) { return Mode; } -template <typename DeclT> -static clang::Decl *preferDefinitionImpl(clang::Decl *D) { - if (auto *X = dyn_cast<DeclT>(D)) - if (auto *Def = X->getDefinition()) - return Def; - return D; -} - -template <> clang::Decl *preferDefinitionImpl<ObjCMethodDecl>(clang::Decl *D) { - if (const auto *X = dyn_cast<ObjCMethodDecl>(D)) { - for (auto *I : X->redecls()) - if (I->hasBody()) - return I; - } - return D; -} - -static Decl *getDefinitionOrCanonicalDecl(Decl *D) { - assert(D); - D = D->getCanonicalDecl(); - D = preferDefinitionImpl<VarDecl>(D); - D = preferDefinitionImpl<FunctionDecl>(D); - D = preferDefinitionImpl<TagDecl>(D); - D = preferDefinitionImpl<ObjCMethodDecl>(D); - assert(D); - return D; -} - static UnsignedEPStat PathRunningTime("PathRunningTime"); void AnalysisConsumer::HandleCode(Decl *D, AnalysisMode Mode, @@ -769,7 
+741,7 @@ void AnalysisConsumer::HandleCode(Decl *D, AnalysisMode Mode, if ((Mode & AM_Path) && checkerMgr->hasPathSensitiveCheckers()) { RunPathSensitiveChecks(D, IMode, VisitedCallees); - EntryPointStat::takeSnapshot(getDefinitionOrCanonicalDecl(D)); + EntryPointStat::takeSnapshot(D); if (IMode != ExprEngine::Inline_Minimal) NumFunctionsAnalyzed++; } >From 36c834647416fec74d3f78b3aa8d0a90cbf9a85a Mon Sep 17 00:00:00 2001 From: Arseniy Zaostrovnykh <arseniy.zaostrovn...@sonarsource.com> Date: Fri, 14 Mar 2025 17:18:21 +0100 Subject: [PATCH 10/10] Update clang/docs/analyzer/developer-docs/Statistics.rst Co-authored-by: Balazs Benics <benicsbal...@gmail.com> --- clang/docs/analyzer/developer-docs/Statistics.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/clang/docs/analyzer/developer-docs/Statistics.rst b/clang/docs/analyzer/developer-docs/Statistics.rst index f64c9e9a6be2d..c165eebbaa16e 100644 --- a/clang/docs/analyzer/developer-docs/Statistics.rst +++ b/clang/docs/analyzer/developer-docs/Statistics.rst @@ -11,8 +11,8 @@ In many cases, it makes sense to collect statistics on both translation-unit lev - ``STAT_COUNTER`` for additive statistics, for example, "the number of steps executed", "the number of functions inlined". - ``STAT_MAX`` for maximizing statistics, for example, "the maximum environment size", or "the longest execution path". -If you want to define a statistic that makes sense only for the entire translation unit, for example, "the number of entry points", Statistic.h defines two macros: ``STATISTIC`` and ``ALLWAYS_ENABLED_STATISTIC``. -You should prefer ``ALLWAYS_ENABLED_STATISTIC`` unless you have a good reason not to. +If you want to define a statistic that makes sense only for the entire translation unit, for example, "the number of entry points", Statistic.h defines two macros: ``STATISTIC`` and ``ALWAYS_ENABLED_STATISTIC``. +You should prefer ``ALWAYS_ENABLED_STATISTIC`` unless you have a good reason not to. 
``STATISTIC`` is controlled by ``LLVM_ENABLE_STATS`` / ``LLVM_FORCE_ENABLE_STATS``. However, note that with ``LLVM_ENABLE_STATS`` disabled, only storage of the values is disabled, the computations producing those values still carry on unless you took an explicit precaution to make them conditional too.
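The commit message says the CSV format makes it easy to aggregate data across multiple translation units. A minimal sketch of such an aggregation follows; it is not part of the patch. It assumes, as csv2json.py does, that the first column identifies the entry point and every other column holds one statistic. The function name, the ``EntryPoint`` header name, the sample values, and the rule "columns starting with ``Max`` are maximized, everything else is summed" are illustrative assumptions, not something the analyzer guarantees.

```python
import csv
import io
from collections import defaultdict


def aggregate_entry_point_stats(csv_texts):
    """Combine per-entry-point CSV dumps (given as strings) into one summary.

    Assumptions (hypothetical, for illustration only):
      * the first column identifies the entry point;
      * columns whose name starts with "Max" are combined with max();
      * all other numeric columns are summed;
      * non-numeric cells (for example boolean statistics) are skipped.

    Returns a tuple (number of entry points seen, {stat name: value}).
    """
    totals = defaultdict(int)
    entry_points = 0
    for text in csv_texts:
        reader = csv.reader(io.StringIO(text))
        header = next(reader, None)
        if not header:
            continue  # empty dump: nothing to aggregate
        stat_names = [name.strip() for name in header[1:]]
        for row in reader:
            if len(row) != len(header):
                raise ValueError(f"inconsistent row: {row!r}")
            entry_points += 1
            for name, cell in zip(stat_names, row[1:]):
                cell = cell.strip()
                if not cell.isdigit():
                    continue  # skip anything that is not an unsigned integer
                value = int(cell)
                if name.startswith("Max"):
                    totals[name] = max(totals[name], value)
                else:
                    totals[name] += value
    return entry_points, dict(totals)


if __name__ == "__main__":
    # Two made-up dumps standing in for two translation units.
    tu1 = 'EntryPoint,NumSteps,MaxCFGSize\nfib(unsigned int),120,7\n'
    tu2 = 'EntryPoint,NumSteps,MaxCFGSize\n"main(int, char **)",30,9\n'
    print(aggregate_entry_point_stats([tu1, tu2]))
```

In real use the strings would come from the files written via ``dump-entry-point-stats-to-csv``. Note that an entry-point name containing a comma, such as ``main(int, char **)``, has to be quoted in the CSV for ``csv.reader`` to keep it in one field.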