[PATCH] D71041: [analyzer][discussion] Talk about escapes

Gábor Horváth via Phabricator via cfe-commits Thu, 05 Dec 2019 14:30:49 -0800

xazax.hun marked an inline comment as done.
xazax.hun added inline comments.



================
Comment at: clang/test/Analysis/fuchsia_handle.cpp:210
+  // Because of arrays, structs, the suggestion is to escape when whe no longer
+  // have any pointer to that symbolic region.
+  if (zx_channel_create(0, get_handle_address(), &sb))
----------------
xazax.hun wrote:
> NoQ wrote:
> > xazax.hun wrote:
> > > xazax.hun wrote:
> > > > xazax.hun wrote:
> > > > > NoQ wrote:
> > > > > > xazax.hun wrote:
> > > > > > > NoQ wrote:
> > > > > > > > NoQ wrote:
> > > > > > > > > This has nothing to do with symbolic regions. We can run into 
> > > > > > > > > this problem even if it's a local variable in the current 
> > > > > > > > > stack frame:
> > > > > > > > > ```lang=c++
> > > > > > > > > void foo() {
> > > > > > > > >   zx_handle_t sa, sb;
> > > > > > > > >   escape(&sb); // Escape *before* create!!
> > > > > > > > > 
> > > > > > > > >   zx_channel_create(0, &sa, &sb);
> > > > > > > > >   zx_handle_close(sa);
> > > > > > > > >   close_escaped();
> > > > > > > > > }
> > > > > > > > > ```
> > > > > > > > > 
> > > > > > > > > The solution that'll obviously work would be to keep track of 
> > > > > > > > > all regions that escaped at least once, and then not even 
> > > > > > > > > start tracking the handle if it's getting placed into a 
> > > > > > > > > region that causes an escape when written into or has itself 
> > > > > > > > > escaped before, but that sounds like a huge overkill.
> > > > > > > > > 
> > > > > > > > > Lemme think. This sounds vaguely familiar but i can't 
> > > > > > > > > immediately recall what my thoughts were last time i thought 
> > > > > > > > > about it.
> > > > > > > > `$ cat test.c`
> > > > > > > > ```lang=c++
> > > > > > > > void manage(void **x);
> > > > > > > > void free_managed();
> > > > > > > > 
> > > > > > > > void foo() {
> > > > > > > >   void *x;
> > > > > > > >   manage(&x);
> > > > > > > >   x = malloc(1);
> > > > > > > >   free_managed();
> > > > > > > > }
> > > > > > > > ```
> > > > > > > > `$ clang --analyze test.c`
> > > > > > > > ```lang=c++
> > > > > > > > test.c:8:3: warning: Potential leak of memory pointed to by 'x'
> > > > > > > >   free_managed();
> > > > > > > >   ^~~~~~~~~~~~~~
> > > > > > > > 1 warning generated.
> > > > > > > > ```
> > > > > > > > Sigh.
> > > > > > > Oh, I see. Yeah this one will be fun to deal with 
> > > > > > The rules are pretty easy though, right?
> > > > > > ```lang=c++
> > > > > >   2680 // A value escapes in four possible cases:
> > > > > >   2681 // (1) We are binding to something that is not a memory 
> > > > > > region.
> > > > > >   2682 // (2) We are binding to a MemRegion that does not have 
> > > > > > stack storage.
> > > > > >   2683 // (3) We are binding to a top-level parameter region with a 
> > > > > > non-trivial
> > > > > >   2684 //     destructor. We won't see the destructor during 
> > > > > > analysis, but it's there.
> > > > > >   2685 // (4) We are binding to a MemRegion with stack storage that 
> > > > > > the store
> > > > > >   2686 //     does not understand.
> > > > > >   2687 ProgramStateRef
> > > > > >   2688 ExprEngine::processPointerEscapedOnBind(ProgramStateRef 
> > > > > > State, SVal Loc,
> > > > > >   2689                                         SVal Val, const 
> > > > > > LocationContext *LCtx) {
> > > > > > ```
> > > > > > Basically, locals are the only special case; writing into anything 
> > > > > > else should be an immediate escape.
> > > > > > 
> > > > > > We could easily update this procedure to additionally keep track of 
> > > > > > all escaped locals in the program state, and then escape all binds 
> > > > > > to previously escaped locals as well.
> > > > > > 
> > > > > > The checker would then have to follow the same rules.
> > > > > > 
> > > > > > In the worst case, manually.
> > > > > > 
> > > > > > But i think we should instead update the `checkPointerEscape()` 
> > > > > > callback to escape the values of out-parameters upon evaluating the 
> > > > > > call conservatively (if it doesn't already) and then teach the 
> > > > > > checker to mark escaped regions explicitly as escaped (as opposed 
> > > > > > to removing them from the program state), and then teach it to 
> > > > > > never transition from escaped to opened. That would be cleaner 
> > > > > > because that'll only require us to hardcode the escaping procedure 
> > > > > > once.
> > > > > > 
> > > > > > Or we could just make the "bool 
> > > > > > doesWriteIntoThisRegionCauseAnEscape?(Region, State)" function 
> > > > > > re-usable.
> > > > > Yeah.  I wonder if this procedure is the right place though. We do 
> > > > > not actually see a bind in the code above.
> > > > Hmm, in the stack example we do see the point of invalidation (which 
> > > > results in an escape). That make things easier, checkers could even 
> > > > work that problem around if they wanted to. In the original example, 
> > > > however, there is no point of invalidation, the region we get is 
> > > > already escaped without seeing any bind later on. And there is no store 
> > > > into the escaped region, since `zx_channel_create` is evaluated 
> > > > conservatively, we just attach the state to the conjured symbol 
> > > > retroactively.
> > > > 
> > > > So at this point I wonder if the checker should ever track any symbols 
> > > > that are is bound to a non-stack region (even if we do not see the bind 
> > > > itself). This might circumvent most of our problems?
> > > > 
> > > > And for the stack example we can just make the escaped state explicit 
> > > > again and never transition from an escaped state to a non-escaped one. 
> > > > 
> > > > WDYT?
> > > Actually, thinking a bit more, this is more of a workaround. It does not 
> > > matter if the region is on the stack or on the heap. What does matter if 
> > > we did see the inception of the region (i.e.: the allocation) or not. If 
> > > we store a handle on the heap but we did see the it from the very 
> > > beginning we should still be able to track it properly.
> > > Hmm, in the stack example we do see the point of invalidation (which 
> > > results in an escape). That make things easier, checkers could even work 
> > > that problem around if they wanted to. In the original example, however, 
> > > there is no point of invalidation, the region we get is already escaped 
> > > without seeing any bind later on.
> > 
> > When in doubt, think what would have happened during concrete execution. 
> > There *is* a point for escape-on-bind in both cases *during* the evaluation 
> > of `zx_channel_create()`, we're just not invoking `checkPointerEscape` on 
> > this point, but we should be.
> > 
> > Indeed, the mental model of behavior of any function that returns a value 
> > through an out-parameter would be:
> > 
> > 1. Come up with the value of the parameter.
> > 2. Write it into the provided region.
> > 
> > Step 2 is basically a store bind and so it should be triggering 
> > escape-on-bind - either because we're writing into an unknown memory space, 
> > or because we're writing into a "pre-escaped local" (a notion i just came 
> > up with in order to explain my example with the local). However, because 
> > the call is evaluated conservatively, these two steps are merged together 
> > and happen simultaneously. To make things worse, the first checker callback 
> > in which the value becomes available (`checkPostCall`) is invoked *after* 
> > the actual step 2. This leads to the paradox that we are already past step 
> > 2 when the checker is just trying to evaluate step 1.
> > 
> > So, our options are either to model step 2 manually in the checker, or to 
> > teach `ExprEngine` to trigger `checkPointerEscape` immediately after 
> > `checkPostCall` with all out-parameter values as roots.
> I think your proposal makes sense. My only concern is understandability. How 
> do we explain this to the users? Yeah, you just got your symbols and they are 
> already escaped even though no one touched them since their inception. But 
> otherwise triggering PointerEscape for out params/return values after a 
> conservative evaluation makes sense to me. It feels a bit like redundant work 
> and undoing what the checker just did :) But I cannot think of a better way 
> for now. 
> So, our options are either to model step 2 manually in the checker, or to 
> teach ExprEngine to trigger checkPointerEscape immediately after 
> checkPostCall with all out-parameter values as roots.

Hmm, coming back to your example:

```
void manage(void **x);
void free_managed();

void foo() {
  void *x;
  manage(&x);
  x = malloc(1);
  free_managed();
}
```

So now have an escaped symbol BEFORE the checker starts to track it. After 
that, we call malloc and end up with a new symbol. So even if we trigger 
`checkPointerEscape` after `manage`, the checker needs some special handling. 
Or do you mean also triggering it after `malloc` call as well when we do the 
store? So it looks like storing a set of escaped regions in the core is kind of 
inevitable? Also, the reason why I really like this example, since for some 
functions we do know that the returned value is not saved elsewhere. Malloc is 
a great example. Should we have a whitelist for those or should we just check 
if something is a system call? Or should we have some other heuristic?

But at least this way the checker might not need to store escaped state for 
untracked symbols (i.e. when a symbol is not stored in the GDM yet and escape 
is the first transition) because it will be renotified after each store. 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71041/new/

https://reviews.llvm.org/D71041



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D71041: [analyzer][discussion] Talk about escapes

Reply via email to