Charusso added inline comments.
================
Comment at: clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp:124
+ if (const SymbolicRegion *SR = DestMR->getSymbolicBase())
+ if (const Expr *SizeExpr = getDynamicSizeExpr(C.getState(), SR))
+ return exprToStr(SizeExpr, C);
----------------
NoQ wrote:
> Charusso wrote:
> > NoQ wrote:
> > > Charusso wrote:
> > > > Charusso wrote:
> > > > > NoQ wrote:
> > > > > > Again, you will have to highlight the allocation site with a note.
> > > > > > Therefore you will have to write a bug visitor that traverses the
> > > > > > size expression at some point (or, equivalently, a note tag when
> > > > > > the size expression is evaluated). Therefore you don't need to
> > > > > > store the expression in the program state.
> > > > > Yes, you have pointed out the necessary visitor, but it needs more
> > > > > thinking.
> > > > >
> > > > > I have a memory region which could be any kind of "memory block
> > > > > region" therefore I have no idea where is the size expression. We are
> > > > > supporting ~20 different allocations, which is nothing compared to
> > > > > the wild with the not so uncommon 5+ parameter allocators. Therefore
> > > > > I still do not want to reverse engineer a small MallocChecker +
> > > > > ExprEngine + BuiltinFunctionChecker inside my checker. They provide
> > > > > the necessary `DynamicSizeInfo` easily, which could be used in at
> > > > > least 4 checkers at the moment (which I have pointed out earlier in
> > > > > D68725).
> > > > >
> > > > > If I have the size expression in the dynamic size map, and I can
> > > > > clearly point out the destination buffer, it is a lot more simplified
> > > > > to traverse the graph where the buffer and its size comes from.
> > > > Well, you really do not want to store `SizeExpr` of `malloc(SizeExpr)`
> > > > and you are right I will have to traverse from it to see whether the
> > > > `SizeExpr` is ambiguous or not, where it comes from.
> > > >
> > > > I want to rely on the `trackExpressionValue()` as the `SizeExpr` is
> > > > available by `getDynamicSizeExpr()`, so it is one or two lines of code.
> > > >
> > > > Would you create your own switch-case to see where is the size
> > > > expression goes in the allocation and use `trackExpressionValue()` on
> > > > it? So that you do not store information in the global state which
> > > > results in better run-time / less memory.
> > > >
> > > > At first I really wanted to model `malloc()` and `realloc()` and stuff,
> > > > then I realized the `MallocChecker` provides every information I need.
> > > > Would it be a better idea to create my own tiny `MallocChecker` inside
> > > > my checker which does nothing but marks the size expression being
> > > > interesting with `NoteTags`?
> > > >
> > > > Also I am thinking of a switch-case on the `DefinedOrUnknownSVal Size`
> > > > which somewhere has an expression inside it which I could
> > > > `trackExpressionValue()` on.
> > > >
> > > > Basically we are missing the rules what to use and I have picked the
> > > > easiest solution. Could you share please which would be the right
> > > > direction for such a simple task?
> > > > I want to rely on the `trackExpressionValue()` as the `SizeExpr` is
> > > > available by `getDynamicSizeExpr()`, so it is one or two lines of code.
> > >
> > > This won't work. `trackExpressionValue()` can only track an active
> > > expression (that has, or at least should have, a value in the bug-node's
> > > environment). You'll have to make a visitor or a note tag.
> > >
> > > You can either make your own visitor (which will detect the node in which
> > > the extent information becomes present), or convert `MallocChecker` to
> > > use note tags and then inter-operate with those tags (though the
> > > interestingness map - "i mark the symbol as interesting so i'm interested
> > > in highlighting the allocation site" - or a similar mechanism). The
> > > second approach is more work because no such interoperation has ever been
> > > implemented yet, but it should be pretty rewarding for the future.
> > > This won't work. trackExpressionValue() can only track an active
> > > expression (that has, or at least should have, a value in the bug-node's
> > > environment). You'll have to make a visitor or a note tag.
> > So because most likely after the `malloc()` the `size` symbol dies, the
> > `trackExpressionValue()` cannot track dead symbols? Because we could make
> > the `size` dying base on the `buffer`, we have some dependency logic for
> > that. It also represents the truth, the size is part of that memory block's
> > region. After that we could track the expression of the `size`?
> > So because most likely after the malloc() the size symbol dies...?
>
> After the `malloc()` is consumed, the size //expression// dies and gets
> cleaned up from the //Environment//. The symbol will only die if the value
> wasn't put into the //Store// in the process of modeling the statement that
> consumes the `malloc()` expression (such as an assignment). But
> `trackExpressionValue()` can only track live (active) expressions.
I see. Now I have tried out what we have. The `trackExpressionValue()` has a
lookup to see where is the expression available:
```lang=c++
/// Find the ExplodedNode where the lvalue (the value of 'Ex')
/// was computed.
static const ExplodedNode* findNodeForExpression(const ExplodedNode *N,
const Expr *Inner) {
while (N) {
if (N->getStmtForDiagnostics() == Inner)
return N;
N = N->getFirstPred();
}
return N;
}
```
from that point the expression was alive, and tracking is fine.
-------------
The `InnerPointerChecker` has introduced a place: `AllocationState.h` to
communicate with the `MallocBugVisitor`. I believe this is the simplest way to
communicate.
================
Comment at: clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp:184
+ if (IsFix) {
+ if (Optional<std::string> SizeStr = getSizeExprAsString(Call, CallC, C)) {
+ renameFunctionFix(UseSafeFunctions ? "gets_s" : "fgets", Call, *Report);
----------------
NoQ wrote:
> Charusso wrote:
> > NoQ wrote:
> > > Also, which is probably more important, you will never be able to provide
> > > a fixit for the malloced memory case, because there may be multiple
> > > execution paths that reach the current point with different size
> > > expressions (in fact, not necessarily all of them are malloced).
> > >
> > > Eg.:
> > > ```lang=c
> > > char *x = 0;
> > > char y[10];
> > >
> > > if (coin()) {
> > > x = malloc(20);
> > > } else {
> > > x = y;
> > > }
> > >
> > > gets(x);
> > > ```
> > >
> > > If you suggest replacing `gets(x)` with `gets_s(x, 20)`, you'll still
> > > have a buffer overflow on the else-branch on which `x` points to an array
> > > of 10 bytes.
> > This checker going to evolve a lot, and all of the checked function calls
> > have issues like that. I do not even think what else issues they have. I
> > would like to cover the false alarm suppression when we are about to alarm.
> > Is it would be okay? I really would like to see alarms first.
> >
> > For example, I have seen stuff in the wild so that I can state out 8-param
> > allocators and we need to rely on the checkers provide information about
> > allocation.
> *summons @Szelethus*
>
> Apart from the obviously syntactic cases, you might actually be able to
> implement fixits for the situation when the reaching-definitions analysis
> displays exactly one definition for `x`, which additionally coincides with
> the allocation site. If that definition is a simple assignment, you'll be
> able to re-run the reaching definitions analysis for the RHS of that
> assignment. If that definition comes from a function call, you might be able
> to re-run the reaching definitions analysis on the return statement(s) of
> that function (note that this function must have been inlined during
> path-sensitive analysis, otherwise no definition in it would coincide with
> the allocation site). And so on.
>
> This problem sheds some light on how much do we want to make the reaching
> definitions analysis inter-procedural. My current guess is that we probably
> don't need to; we'd rather have this guided by re-running the
> reaching-definitions analysis based on the path-sensitive report data, than
> have the reaching-definitions analysis be inter-procedural on our own.
That is a cool idea! I hope @Szelethus has time for his project.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D69813/new/
https://reviews.llvm.org/D69813
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits