xazax.hun marked an inline comment as done. xazax.hun added inline comments.
================ Comment at: clang/test/Analysis/fuchsia_handle.cpp:210 + // Because of arrays, structs, the suggestion is to escape when whe no longer + // have any pointer to that symbolic region. + if (zx_channel_create(0, get_handle_address(), &sb)) ---------------- xazax.hun wrote: > NoQ wrote: > > xazax.hun wrote: > > > xazax.hun wrote: > > > > xazax.hun wrote: > > > > > NoQ wrote: > > > > > > xazax.hun wrote: > > > > > > > NoQ wrote: > > > > > > > > NoQ wrote: > > > > > > > > > This has nothing to do with symbolic regions. We can run into > > > > > > > > > this problem even if it's a local variable in the current > > > > > > > > > stack frame: > > > > > > > > > ```lang=c++ > > > > > > > > > void foo() { > > > > > > > > > zx_handle_t sa, sb; > > > > > > > > > escape(&sb); // Escape *before* create!! > > > > > > > > > > > > > > > > > > zx_channel_create(0, &sa, &sb); > > > > > > > > > zx_handle_close(sa); > > > > > > > > > close_escaped(); > > > > > > > > > } > > > > > > > > > ``` > > > > > > > > > > > > > > > > > > The solution that'll obviously work would be to keep track of > > > > > > > > > all regions that escaped at least once, and then not even > > > > > > > > > start tracking the handle if it's getting placed into a > > > > > > > > > region that causes an escape when written into or has itself > > > > > > > > > escaped before, but that sounds like a huge overkill. > > > > > > > > > > > > > > > > > > Lemme think. This sounds vaguely familiar but i can't > > > > > > > > > immediately recall what my thoughts were last time i thought > > > > > > > > > about it. > > > > > > > > `$ cat test.c` > > > > > > > > ```lang=c++ > > > > > > > > void manage(void **x); > > > > > > > > void free_managed(); > > > > > > > > > > > > > > > > void foo() { > > > > > > > > void *x; > > > > > > > > manage(&x); > > > > > > > > x = malloc(1); > > > > > > > > free_managed(); > > > > > > > > } > > > > > > > > ``` > > > > > > > > `$ clang --analyze test.c` > > > > > > > > ```lang=c++ > > > > > > > > test.c:8:3: warning: Potential leak of memory pointed to by 'x' > > > > > > > > free_managed(); > > > > > > > > ^~~~~~~~~~~~~~ > > > > > > > > 1 warning generated. > > > > > > > > ``` > > > > > > > > Sigh. > > > > > > > Oh, I see. Yeah this one will be fun to deal with > > > > > > The rules are pretty easy though, right? > > > > > > ```lang=c++ > > > > > > 2680 // A value escapes in four possible cases: > > > > > > 2681 // (1) We are binding to something that is not a memory > > > > > > region. > > > > > > 2682 // (2) We are binding to a MemRegion that does not have > > > > > > stack storage. > > > > > > 2683 // (3) We are binding to a top-level parameter region with a > > > > > > non-trivial > > > > > > 2684 // destructor. We won't see the destructor during > > > > > > analysis, but it's there. > > > > > > 2685 // (4) We are binding to a MemRegion with stack storage that > > > > > > the store > > > > > > 2686 // does not understand. > > > > > > 2687 ProgramStateRef > > > > > > 2688 ExprEngine::processPointerEscapedOnBind(ProgramStateRef > > > > > > State, SVal Loc, > > > > > > 2689 SVal Val, const > > > > > > LocationContext *LCtx) { > > > > > > ``` > > > > > > Basically, locals are the only special case; writing into anything > > > > > > else should be an immediate escape. > > > > > > > > > > > > We could easily update this procedure to additionally keep track of > > > > > > all escaped locals in the program state, and then escape all binds > > > > > > to previously escaped locals as well. > > > > > > > > > > > > The checker would then have to follow the same rules. > > > > > > > > > > > > In the worst case, manually. > > > > > > > > > > > > But i think we should instead update the `checkPointerEscape()` > > > > > > callback to escape the values of out-parameters upon evaluating the > > > > > > call conservatively (if it doesn't already) and then teach the > > > > > > checker to mark escaped regions explicitly as escaped (as opposed > > > > > > to removing them from the program state), and then teach it to > > > > > > never transition from escaped to opened. That would be cleaner > > > > > > because that'll only require us to hardcode the escaping procedure > > > > > > once. > > > > > > > > > > > > Or we could just make the "bool > > > > > > doesWriteIntoThisRegionCauseAnEscape?(Region, State)" function > > > > > > re-usable. > > > > > Yeah. I wonder if this procedure is the right place though. We do > > > > > not actually see a bind in the code above. > > > > Hmm, in the stack example we do see the point of invalidation (which > > > > results in an escape). That make things easier, checkers could even > > > > work that problem around if they wanted to. In the original example, > > > > however, there is no point of invalidation, the region we get is > > > > already escaped without seeing any bind later on. And there is no store > > > > into the escaped region, since `zx_channel_create` is evaluated > > > > conservatively, we just attach the state to the conjured symbol > > > > retroactively. > > > > > > > > So at this point I wonder if the checker should ever track any symbols > > > > that are is bound to a non-stack region (even if we do not see the bind > > > > itself). This might circumvent most of our problems? > > > > > > > > And for the stack example we can just make the escaped state explicit > > > > again and never transition from an escaped state to a non-escaped one. > > > > > > > > WDYT? > > > Actually, thinking a bit more, this is more of a workaround. It does not > > > matter if the region is on the stack or on the heap. What does matter if > > > we did see the inception of the region (i.e.: the allocation) or not. If > > > we store a handle on the heap but we did see the it from the very > > > beginning we should still be able to track it properly. > > > Hmm, in the stack example we do see the point of invalidation (which > > > results in an escape). That make things easier, checkers could even work > > > that problem around if they wanted to. In the original example, however, > > > there is no point of invalidation, the region we get is already escaped > > > without seeing any bind later on. > > > > When in doubt, think what would have happened during concrete execution. > > There *is* a point for escape-on-bind in both cases *during* the evaluation > > of `zx_channel_create()`, we're just not invoking `checkPointerEscape` on > > this point, but we should be. > > > > Indeed, the mental model of behavior of any function that returns a value > > through an out-parameter would be: > > > > 1. Come up with the value of the parameter. > > 2. Write it into the provided region. > > > > Step 2 is basically a store bind and so it should be triggering > > escape-on-bind - either because we're writing into an unknown memory space, > > or because we're writing into a "pre-escaped local" (a notion i just came > > up with in order to explain my example with the local). However, because > > the call is evaluated conservatively, these two steps are merged together > > and happen simultaneously. To make things worse, the first checker callback > > in which the value becomes available (`checkPostCall`) is invoked *after* > > the actual step 2. This leads to the paradox that we are already past step > > 2 when the checker is just trying to evaluate step 1. > > > > So, our options are either to model step 2 manually in the checker, or to > > teach `ExprEngine` to trigger `checkPointerEscape` immediately after > > `checkPostCall` with all out-parameter values as roots. > I think your proposal makes sense. My only concern is understandability. How > do we explain this to the users? Yeah, you just got your symbols and they are > already escaped even though no one touched them since their inception. But > otherwise triggering PointerEscape for out params/return values after a > conservative evaluation makes sense to me. It feels a bit like redundant work > and undoing what the checker just did :) But I cannot think of a better way > for now. > So, our options are either to model step 2 manually in the checker, or to > teach ExprEngine to trigger checkPointerEscape immediately after > checkPostCall with all out-parameter values as roots. Hmm, coming back to your example: ``` void manage(void **x); void free_managed(); void foo() { void *x; manage(&x); x = malloc(1); free_managed(); } ``` So now have an escaped symbol BEFORE the checker starts to track it. After that, we call malloc and end up with a new symbol. So even if we trigger `checkPointerEscape` after `manage`, the checker needs some special handling. Or do you mean also triggering it after `malloc` call as well when we do the store? So it looks like storing a set of escaped regions in the core is kind of inevitable? Also, the reason why I really like this example, since for some functions we do know that the returned value is not saved elsewhere. Malloc is a great example. Should we have a whitelist for those or should we just check if something is a system call? Or should we have some other heuristic? But at least this way the checker might not need to store escaped state for untracked symbols (i.e. when a symbol is not stored in the GDM yet and escape is the first transition) because it will be renotified after each store. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D71041/new/ https://reviews.llvm.org/D71041 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits