callee analysis (PR #99656)

Doug Wyatt via cfe-commits Tue, 30 Jul 2024 14:19:44 -0700

================
@@ -2397,6 +2397,1262 @@ class UnsafeBufferUsageReporter : public 
UnsafeBufferUsageHandler {
 };
 } // namespace
 
+// 
=============================================================================
+
+namespace FXAnalysis {
+
+enum class DiagnosticID : uint8_t {
+  None = 0, // sentinel for an empty Diagnostic
+  Throws,
+  Catches,
+  CallsObjC,
+  AllocatesMemory,
+  HasStaticLocal,
+  AccessesThreadLocal,
+
+  // These only apply to callees, where the analysis stops at the Decl
+  DeclDisallowsInference,
+
+  CallsDeclWithoutEffect,
+  CallsExprWithoutEffect,
+};
+
+// Holds an effect diagnosis, potentially for the entire duration of the
+// analysis phase, in order to refer to it when explaining why a caller has 
been
+// made unsafe by a callee.
+struct Diagnostic {
+  FunctionEffect Effect;
+  DiagnosticID ID = DiagnosticID::None;
+  SourceLocation Loc;
+  const Decl *Callee = nullptr; // only valid for Calls*
+
+  Diagnostic() = default;
+
+  Diagnostic(const FunctionEffect &Effect, DiagnosticID ID, SourceLocation Loc,
+             const Decl *Callee = nullptr)
+      : Effect(Effect), ID(ID), Loc(Loc), Callee(Callee) {}
+};
+
+enum class SpecialFuncType : uint8_t { None, OperatorNew, OperatorDelete };
+enum class CallType {
+  // unknown: probably function pointer
+  Unknown,
+  Function,
+  Virtual,
+  Block
+};
+
+// Return whether a function's effects CAN be verified.
+// The question of whether it SHOULD be verified is independent.
+static bool functionIsVerifiable(const FunctionDecl *FD) {
+  if (!(FD->hasBody() || FD->isInlined())) {
+    // externally defined; we couldn't verify if we wanted to.
+    return false;
+  }
+  if (FD->isTrivial()) {
+    // Otherwise `struct x { int a; };` would have an unverifiable default
+    // constructor.
+    return true;
+  }
+  return true;
+}
+
+/// A mutable set of FunctionEffect, for use in places where any conditions
+/// have been resolved or can be ignored.
+class EffectSet {
+  // This implementation optimizes footprint, since we hold one of these for
+  // every function visited, which, due to inference, can be many more 
functions
+  // than have declared effects.
+
+  template <typename T, typename SizeT, SizeT Capacity> struct FixedVector {
+    SizeT Count = 0;
+    T Items[Capacity] = {};
+
+    using value_type = T;
+
+    using iterator = T *;
+    using const_iterator = const T *;
+    iterator begin() { return &Items[0]; }
+    iterator end() { return &Items[Count]; }
+    const_iterator begin() const { return &Items[0]; }
+    const_iterator end() const { return &Items[Count]; }
+    const_iterator cbegin() const { return &Items[0]; }
+    const_iterator cend() const { return &Items[Count]; }
+
+    void insert(iterator I, const T &Value) {
+      assert(Count < Capacity);
+      iterator E = end();
+      if (I != E)
+        std::copy_backward(I, E, E + 1);
+      *I = Value;
+      ++Count;
+    }
+
+    void push_back(const T &Value) {
+      assert(Count < Capacity);
+      Items[Count++] = Value;
+    }
+  };
+
+  // As long as FunctionEffect is only 1 byte, and there are only 2 verifiable
+  // effects, this fixed-size vector with a capacity of 7 is more than
+  // sufficient and is only 8 bytes.
+  FixedVector<FunctionEffect, uint8_t, 7> Impl;
----------------
dougsonos wrote:


To summarize the multiple data structures and explore alternatives:

`FunctionEffectsRef` is a pair of `ArrayRef`s, one of `FunctionEffect`, another 
of `EffectConditionExpr`. These are convenient and normalized forms of what is 
held in `FunctionProtoType`'s trailing objects.

`FunctionEffectSet` is a pair of small vectors, also `FunctionEffect` and 
`EffectConditionExpr`. To interoperate with `FunctionEffectsRef`, the vectors 
are not "interleaved".

Once we get to this secondary analysis, none of the effects have conditions any 
more, and we need to hold potentially thousands of little sets to indicate 
which effects are either declared on, or inferred on, every visited `Decl`. 
Here are some possible representations for these sets of effects without 
conditions:

1. Use `FunctionEffectSet` and don't worry that it's 32 bytes where 8 (or less) 
would have sufficed.

2. Use low-level layout tricks to make `FunctionEffectSet` 8 bytes in the 
overwhelmingly common case where no effects have conditions. (e.g. it's either 
a pointer to two SmallVectors, or a uint8_t count and an array of 7 
FunctionEffects, determined by one bit.)

3. Make a `TinyFunctionEffectSet` which holds only effects, no conditions. 
Using a bit vector (of effect kinds) would be possible but make it difficult to 
build the `ArrayRef<FunctionEffect>` needed by some methods of 
`FunctionEffect`. Thus the current array-like implementation here in the PR.

Options 1 and 2 are appealing in that FunctionEffectSet already implements the 
iteration, insertion and set difference options -- albeit with some extra 
business about conflicts that shouldn't happen in the 2nd phase analysis. The 
more I tinker with this third class, the more I appreciate your sense that it's 
redundant. Then it becomes a tradeoff between 24 bytes of overhead times 
possibly thousands of functions in a TU (1), vs. the low-level ugliness / 
space-saving elegance of 2.

https://github.com/llvm/llvm-project/pull/99656
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] nonblocking/nonallocating attributes: 2nd pass caller/callee analysis (PR #99656)

Reply via email to