NoQ added a comment.

Noticed a few more things.

It sounds as if once this first patch lands, the rest should be easy :)

Regarding the comments in the code. I materialized my wishes to something like:

- At the top of the file:

  // In the code, iterator can be represented as a:
  // * type-I: typedef-ed pointer. Operations over such iterator, such as 
comparisons or increments, are modeled straightforwardly by the analyzer.
  // * type-II: structure with its method bodies available.  Operations over 
such iterator are inlined by the analyzer, and results of modeling these 
operations are exposing implementation details of the iterators, which is not 
necessarily helping.
  // * type-III: completely opaque structure. Operations over such iterator are 
modeled conservatively, producing conjured symbols everywhere.
  //
  // Additionally, depending on the circumstances, operators of types II and 
III can be represented as:
  // * type-IIa, type-IIIa: conjured structure symbols - when returned by value 
from conservatively evaluated methods such as `.begin()`.
  // * type-IIb, type-IIIb: memory regions of iterator-typed objects, such as 
variables or temporaries, when the iterator object is currently treated as an 
lvalue.
  // * type-IIc, type-IIIc: compound values of iterator-typed objects, when the 
iterator object is treated as an rvalue taken of a particular lvalue, eg. a 
copy of "type-a" iterator object, or an iterator that existed before the 
analysis has started.

Not sure if type-IIa iterators actually make sense. It's ok if you come up with 
your own classification :)

Then, in methods that deal with iterator `SVal`s directly, i wish we had hints 
explaining what's going on in these ~7 cases. In my opinion, that'd greatly 
help people understand the code later, and it'd help us understand how to avoid 
this variety and provide checker authors with a better API as soon as we get to 
this, so it's the biggest concern for me about this checker.



================
Comment at: lib/StaticAnalyzer/Checkers/IteratorChecker.cpp:16
+#include "ClangSACheckers.h"
+#include "clang/AST/DeclTemplate.h"
+#include "clang/StaticAnalyzer/Core/BugReporter/BugType.h"
----------------
This header seems unused for now.


================
Comment at: lib/StaticAnalyzer/Checkers/IteratorChecker.cpp:22
+
+#include <utility>
+
----------------
This header seems unused for now.


================
Comment at: lib/StaticAnalyzer/Checkers/IteratorChecker.cpp:153-155
+REGISTER_MAP_WITH_PROGRAMSTATE(IteratorSymbolMap, SymbolRef, IteratorPosition)
+REGISTER_MAP_WITH_PROGRAMSTATE(IteratorRegionMap, const MemRegion *,
+                               IteratorPosition)
----------------
Carryover from the other review: did you try using `RegionOrSymbol` as a key 
and keep only one map?


================
Comment at: lib/StaticAnalyzer/Checkers/IteratorChecker.cpp:166-167
+
+static llvm::APSInt Zero = llvm::APSInt::get(0);
+static llvm::APSInt One = llvm::APSInt::get(1);
+
----------------
I've a bit of doubt about those. Would they call their constructors every time 
clang starts, regardless of whether the analyzer or the checker is enabled? 
Maybe having them as private variables inside the checker class would be better?

As in http://llvm.org/docs/CodingStandards.html#do-not-use-static-constructors


================
Comment at: lib/StaticAnalyzer/Checkers/IteratorChecker.cpp:294-297
+    // Assumption: if return value is an iterator which is not yet bound to a
+    //             container, then look for the first iterator argument, and
+    //             bind the return value to the same container. This approach
+    //             works for STL algorithms.
----------------
baloghadamsoftware wrote:
> NoQ wrote:
> > I guess this deserves a test case (we could split this out as a separate 
> > feature as well).
> > 
> > I'm also afraid that we can encounter false positives on functions that are 
> > not STL algorithms. I suggest doing this by default only for STL functions 
> > (or maybe for other specific classes of functions for which we know it 
> > works this way) and do this for other functions under a checker option 
> > (i.e. something like `-analyzer-config 
> > IteratorChecker:AggressiveAssumptions=true`, similar to `MallocChecker`'s 
> > "Optimistic" option).
> I will check whether this piece of code could be moved in a later part of the 
> checker. However, I suggest to first wait for the first false positives 
> before we introduce such an option. This far the false positives in my 
> initial tests had different reasons, not this one.
Unfortunately, we've had a poor experience with this approach in other 
checkers. You never know, and it seems that it's always better to have a safe 
fallback mode available under a flag, because if a few classes of false 
positives are found, and we are forced to reduce the checker to a safer 
behavior, it'd be hard to remember all the places where unsafe heuristics were 
used.


================
Comment at: lib/StaticAnalyzer/Checkers/IteratorChecker.cpp:478-479
+    auto &SymMgr = C.getSymbolManager();
+    EndSym = SymMgr.conjureSymbol(CE, C.getLocationContext(),
+                                  C.getASTContext().LongTy, C.blockCount());
+    State = createContainerEnd(State, ContReg, EndSym);
----------------
NoQ wrote:
> baloghadamsoftware wrote:
> > NoQ wrote:
> > > I see what you did here! And i suggest considering `SymbolMetadata` here 
> > > instead of `SymbolConjured`, because it was designed for this purpose: 
> > > the checker for some reason knows there's a special property of an object 
> > > he wants to track, the checker doesn't have a ready-made symbolic value 
> > > to represent this property (because the engine didn't know this property 
> > > even exists, so it didn't denote its value), so the checker comes up with 
> > > its own notation. `SymbolMetadata` is used, for example, in 
> > > `CStringChecker` to denote an unknown string length for a given 
> > > null-terminated string region.
> > > 
> > > This symbol carries the connection to the region (you may use it to 
> > > reverse-lookup the region if necessary), and in particular it is 
> > > considered to be live as long as the base region is live (so you don't 
> > > have to deal with liveness manually; see also comments around 
> > > `SymbolReaper::MarkInUse`). It's also easier to debug, because we can 
> > > track the symbol back to the checker. Generally, symbol class hierarchy 
> > > is cool because we know a lot about the symbol by just looking at it, and 
> > > `SymbolConjured` is a sad fallback we use when we didn't care or manage 
> > > to come up with a better symbol.
> > > 
> > > I'm not sure if `LongTy` is superior to `IntTy` here, since we don't know 
> > > what to expect from the container anyway.
> > > 
> > > Also, please de-duplicate the symbol creation code. Birth of a symbol is 
> > > something so rare it deserves a drum roll :)
> > > 
> > > I'd take a pause to figure out if the same logic should be applied to the 
> > > map from containers to end-iterators.
> > SymbolMetaData is bound to a MemRegion. Iterators are sometimes symbols and 
> > sometimes memory regions, this was one of the first lessons I learned from 
> > my first iterator checker.
> Oh. Hmm. Ok. Right.
> 
> To be sure: in what cases do you need to create a new symbol when the 
> iterator is already a symbol? How broken do we become if we try to say that 
> the symbolic iterator and its own offset are the same thing?
All right, i guess it wasn't a great idea. Even if iterators are plain 
pointers, the offset is a pointer difference. I still have this feeling that 
the analyzer is not good enough for this checker yet, so at least it's great 
that we have it moving.

I once wished we had "metadata regions" for symbols and regions, and then 
automatically gaining symbols for these region values 
(http://lists.llvm.org/pipermail/cfe-dev/2016-May/049000.html), that would have 
made things a lot cooler maybe. Never mind.


================
Comment at: test/Analysis/iterator-range.cpp:1-2
+// RUN: %clang_analyze_cc1 -std=c++11 
-analyzer-checker=core,cplusplus,alpha.cplusplus.IteratorRange 
-analyzer-eagerly-assume -analyzer-config c++-container-inlining=false %s 
-verify
+// RUN: %clang_analyze_cc1 -std=c++11 
-analyzer-checker=core,cplusplus,alpha.cplusplus.IteratorRange 
-analyzer-eagerly-assume -analyzer-config c++-container-inlining=true 
-DINLINE=1 %s -verify
+
----------------
Could we add run-lines without `-analyzer-eagerly-assume`? Currently all 
variants are passing, but if new tests will fail, we could `#ifndef` them out.


================
Comment at: test/Analysis/iterator-range.cpp:9
+  if (i != v.end())
+    *i; // no-warning
+}
----------------
I'd suggest sticking `clang_analyzer_warnIfReached()` here to demonstrate that 
there is no warning because the branch is dead. That'd make the test stronger.


https://reviews.llvm.org/D32592



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to