[PATCH] D69813: [analyzer][WIP] CERTStrChecker: Model gets()

Csaba Dabis via Phabricator via cfe-commits Tue, 05 Nov 2019 07:07:18 -0800

Charusso added inline comments.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp:124
+  if (const SymbolicRegion *SR = DestMR->getSymbolicBase())
+    if (const Expr *SizeExpr = getDynamicSizeExpr(C.getState(), SR))
+      return exprToStr(SizeExpr, C);
----------------
NoQ wrote:
> Charusso wrote:
> > NoQ wrote:
> > > Charusso wrote:
> > > > Charusso wrote:
> > > > > NoQ wrote:
> > > > > > Again, you will have to highlight the allocation site with a note. 
> > > > > > Therefore you will have to write a bug visitor that traverses the 
> > > > > > size expression at some point (or, equivalently, a note tag when 
> > > > > > the size expression is evaluated). Therefore you don't need to 
> > > > > > store the expression in the program state.
> > > > > Yes, you have pointed out the necessary visitor, but it needs more 
> > > > > thinking.
> > > > > 
> > > > > I have a memory region that could be any kind of "memory block
> > > > > region", so I have no idea where the size expression is. We
> > > > > support ~20 different allocation functions, which is nothing compared
> > > > > to what is found in the wild, where allocators with 5+ parameters are
> > > > > not uncommon. Therefore I still do not want to reverse-engineer a
> > > > > small MallocChecker + ExprEngine + BuiltinFunctionChecker inside my
> > > > > checker. Those already provide the necessary `DynamicSizeInfo`, which
> > > > > could be used by at least 4 checkers at the moment (as I pointed out
> > > > > earlier in D68725).
> > > > > 
> > > > > If I have the size expression in the dynamic size map and can
> > > > > clearly identify the destination buffer, it is much simpler to
> > > > > traverse the graph to find where the buffer and its size come from.
> > > > Well, you really do not want to store the `SizeExpr` of
> > > > `malloc(SizeExpr)`, and you are right that I will have to traverse from
> > > > it to see whether the `SizeExpr` is ambiguous and where it comes from.
> > > > 
> > > > I want to rely on the `trackExpressionValue()` as the `SizeExpr` is 
> > > > available by `getDynamicSizeExpr()`, so it is one or two lines of code.
> > > > 
> > > > Would you create your own switch-case to find where the size
> > > > expression goes in the allocation and use `trackExpressionValue()` on
> > > > it, so that you do not store information in the global state, which
> > > > results in better run time and less memory use?
> > > > 
> > > > At first I really wanted to model `malloc()`, `realloc()` and similar
> > > > functions, then I realized that `MallocChecker` provides all the
> > > > information I need. Would it be a better idea to create my own tiny
> > > > `MallocChecker` inside my checker that does nothing but mark the size
> > > > expression as interesting with `NoteTags`?
> > > > 
> > > > I am also thinking of a switch-case on the `DefinedOrUnknownSVal Size`,
> > > > which somewhere contains an expression that I could call
> > > > `trackExpressionValue()` on.
> > > > 
> > > > Basically, we are missing rules on what to use, so I picked the
> > > > easiest solution. Could you please share which direction would be
> > > > right for such a simple task?
> > > 
> > > This won't work. `trackExpressionValue()` can only track an active 
> > > expression (that has, or at least should have, a value in the bug-node's 
> > > environment). You'll have to make a visitor or a note tag.
> > > 
> > > You can either make your own visitor (which will detect the node in which 
> > > the extent information becomes present), or convert `MallocChecker` to 
> > > use note tags and then inter-operate with those tags (through the
> > > interestingness map - "i mark the symbol as interesting so i'm interested
> > > in highlighting the allocation site" - or a similar mechanism). The
> > > second approach is more work because no such interoperation has been
> > > implemented yet, but it should be pretty rewarding for the future.
> > So, because the `size` symbol most likely dies after the `malloc()`,
> > `trackExpressionValue()` cannot track it, as it cannot track dead symbols?
> > We could make the liveness of `size` depend on the `buffer`; we have some
> > dependency logic for that. It would also reflect reality: the size is part
> > of that memory block's region. Could we then track the expression of the
> > `size`?
> 
> After the `malloc()` is consumed, the size //expression// dies and gets 
> cleaned up from the //Environment//. The symbol will only die if the value 
> wasn't put into the //Store// in the process of modeling the statement that 
> consumes the `malloc()` expression (such as an assignment). But 
> `trackExpressionValue()` can only track live (active) expressions.
I see. I have now tried out what we have. `trackExpressionValue()` performs a
lookup to find the node where the expression is available:

```lang=c++
/// Find the ExplodedNode where the lvalue (the value of 'Ex')
/// was computed.
static const ExplodedNode* findNodeForExpression(const ExplodedNode *N,
                                                 const Expr *Inner) {
  while (N) {
    if (N->getStmtForDiagnostics() == Inner)
      return N;
    N = N->getFirstPred();
  }
  return N;
}
```

Starting from that node, the expression is alive, so tracking works fine.
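To make the backward lookup concrete, here is a self-contained toy model of it. `MockNode` is a hypothetical stand-in for `ExplodedNode`, not Clang's API: each node records the statement it was created for and its first predecessor, and the walk compares statements by pointer identity, as the function above does with `getStmtForDiagnostics()`. It only illustrates the search; it does not model the Environment or expression liveness.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an ExplodedNode: the statement evaluated at
// this node (may be null) and a link to the first predecessor.
struct MockNode {
  const std::string *Stmt;   // statement identity, compared by address
  const MockNode *FirstPred; // null for the root of the path
};

// Walk backwards along first predecessors until reaching the node whose
// statement is the expression of interest. Returns null if the expression
// was never evaluated on this path.
const MockNode *findNodeForExpression(const MockNode *N,
                                      const std::string *Inner) {
  while (N) {
    if (N->Stmt == Inner)
      return N;
    N = N->FirstPred;
  }
  return N;
}
```

For a path `20` → `malloc(20)` → `gets(x)`, walking back from the `gets(x)` node returns the node where the size expression `20` was evaluated, and a null pointer for an expression that never appeared on the path.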

-------------

The `InnerPointerChecker` introduced a dedicated header, `AllocationState.h`,
to communicate with the `MallocBugVisitor`. I believe this is the simplest
means of communication.
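For comparison, the note-tag alternative NoQ described earlier in this thread (making `MallocChecker` emit note tags and gating them on interestingness) can be sketched as a toy model. Every name here (`SymbolId`, `MockReport`, `MockNoteTag`, `makeAllocationTag()`, `collectNotes()`) is hypothetical; this is not Clang's `NoteTag` or bug-report API, only an illustration of the idea that the allocation-site note stays silent unless the reporting checker marked the allocated symbol as interesting.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical symbol identifier, standing in for SymbolRef.
using SymbolId = int;

// Hypothetical bug report that only tracks which symbols are interesting.
struct MockReport {
  std::set<SymbolId> Interesting;
  bool isInteresting(SymbolId S) const { return Interesting.count(S) != 0; }
};

// A note tag is a callback that may or may not produce a message
// for a given report.
using MockNoteTag =
    std::function<std::optional<std::string>(const MockReport &)>;

// Allocation-site tag created by a (hypothetical) allocation model: it
// stays silent unless another checker marked the symbol as interesting.
MockNoteTag makeAllocationTag(SymbolId Sym, std::string SizeText) {
  return [Sym, SizeText](const MockReport &R) -> std::optional<std::string> {
    if (!R.isInteresting(Sym))
      return std::nullopt;
    return "Memory is allocated with size " + SizeText;
  };
}

// Collect the notes produced by all tags along the path for one report.
std::vector<std::string> collectNotes(const std::vector<MockNoteTag> &Tags,
                                      const MockReport &R) {
  std::vector<std::string> Notes;
  for (const MockNoteTag &Tag : Tags)
    if (std::optional<std::string> Msg = Tag(R))
      Notes.push_back(*Msg);
  return Notes;
}
```

If the `gets()` check marks symbol 1 as interesting, only the note for that allocation is emitted; with no interesting symbols, no allocation notes appear. The design keeps the allocation model unaware of which checker is reporting.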


================
Comment at: clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp:184
+  if (IsFix) {
+    if (Optional<std::string> SizeStr = getSizeExprAsString(Call, CallC, C)) {
+      renameFunctionFix(UseSafeFunctions ? "gets_s" : "fgets", Call, *Report);
----------------
NoQ wrote:
> Charusso wrote:
> > NoQ wrote:
> > > Also, and probably more importantly, you will never be able to provide
> > > a fixit for the malloced memory case, because there may be multiple 
> > > execution paths that reach the current point with different size 
> > > expressions (in fact, not necessarily all of them are malloced).
> > > 
> > > Eg.:
> > > ```lang=c
> > > char *x = 0;
> > > char y[10];
> > > 
> > > if (coin()) {
> > >   x = malloc(20);
> > > } else {
> > >   x = y;
> > > }
> > > 
> > > gets(x);
> > > ```
> > > 
> > > If you suggest replacing `gets(x)` with `gets_s(x, 20)`, you'll still 
> > > have a buffer overflow on the else-branch on which `x` points to an array 
> > > of 10 bytes.
> > This checker is going to evolve a lot, and all of the checked function
> > calls have issues like that; I cannot even guess what other issues they
> > have. I would like to handle false alarm suppression at the point where we
> > are about to emit an alarm. Would that be okay? I would really like to see
> > alarms first.
> > 
> > For example, I have seen 8-parameter allocators in the wild, so we need
> > to rely on the checkers that provide information about allocations.
> *summons @Szelethus*
> 
> Apart from the obviously syntactic cases, you might actually be able to 
> implement fixits for the situation when the reaching-definitions analysis 
> displays exactly one definition for `x`, which additionally coincides with 
> the allocation site. If that definition is a simple assignment, you'll be 
> able to re-run the reaching definitions analysis for the RHS of that 
> assignment. If that definition comes from a function call, you might be able 
> to re-run the reaching definitions analysis on the return statement(s) of 
> that function (note that this function must have been inlined during 
> path-sensitive analysis, otherwise no definition in it would coincide with 
> the allocation site). And so on.
> 
> This problem sheds some light on how much we want to make the reaching
> definitions analysis inter-procedural. My current guess is that we probably 
> don't need to; we'd rather have this guided by re-running the 
> reaching-definitions analysis based on the path-sensitive report data, than 
> have the reaching-definitions analysis be inter-procedural on our own.
That is a cool idea! I hope @Szelethus has time for his project.
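The fixit condition NoQ outlines can be sketched as a small decision function, assuming the reaching-definitions results for the destination pointer are already available as a list. `ReachingDef` and `suggestFixit()` are hypothetical names, the size is kept as source text for simplicity, and the recursion into assignment right-hand sides and inlined callees' return statements is not modeled.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical summary of one definition of the destination pointer that
// reaches the gets() call: the buffer size as source text, if known.
struct ReachingDef {
  std::optional<std::string> SizeText; // e.g. "20" for malloc(20)
};

// Suggest a gets() replacement only when exactly one definition reaches
// the call and its size is known. With several reaching definitions the
// size may differ between paths, so no single fixit is safe.
std::optional<std::string> suggestFixit(const std::vector<ReachingDef> &Defs,
                                        const std::string &Dest,
                                        bool UseSafeFunctions) {
  if (Defs.size() != 1 || !Defs.front().SizeText)
    return std::nullopt;
  const std::string &Size = *Defs.front().SizeText;
  if (UseSafeFunctions)
    return "gets_s(" + Dest + ", " + Size + ")";
  return "fgets(" + Dest + ", " + Size + ", stdin)";
}
```

For the `coin()` example, two definitions of `x` reach `gets(x)` (`malloc(20)` and the array `y`), so no fixit is produced; with a single definition `x = malloc(20)`, the suggestion is `gets_s(x, 20)`, or `fgets(x, 20, stdin)` when safe functions are not requested.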


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69813/new/

https://reviews.llvm.org/D69813



_______________________________________________
cfe-commits mailing list
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
