On Tue, May 14, 2019 at 4:05 PM Marc Glisse <marc.gli...@inria.fr> wrote: > > On Tue, 14 May 2019, Richard Biener wrote: > > >>> In princple PTA should know the aliasing cannot happen but possibly > >>> the info is either lost or the IL is too obfuscated at the point it gets > >>> computed. (hello C++...) > >> > >> We don't need much obfuscation for this, a simple C program > >> > >> int g; > >> int*f(int**p){ > >> int*q=*p; > >> int*r=__builtin_malloc(4); > >> *q=1; > >> *r=2; > >> g=*q; > >> return r; > >> } > >> > >> gives > >> > >> q_4 = *p_3(D); > >> r_6 = __builtin_malloc (4); > >> *q_4 = 1; > >> *r_6 = 2; > >> _1 = *q_4; > >> g = _1; > >> return r_6; > >> > >> where we clearly don't manage to prove that q and r don't alias. > > > > We can probably improve this one in general from inside PTA > > by treating escapes through return specially. I wasn't looking > > too closely at the C++ testcase but I simply assumed the > > addition to ESCAPED happens through storing the malloc > > result to memory that PTA cannot prove local. > > Yes, that is indeed the case. I only wrote the version with return to > simplify, but the vector testcase does store the malloc result in > non-local memory, so handling return as a special case wouldn't help it. > > >>> So at ldist time I see > >>> > >>> # PT = null { D.47989 } (escaped, escaped heap) > >>> # ALIGN = 8, MISALIGN = 0 > >>> # USE = nonlocal null { D.47989 } (escaped, escaped heap) > >>> # CLB = nonlocal null { D.47989 } (escaped, escaped heap) > >>> _27 = malloc (8192); > >>> > >>> good. The interesting loop is the following where the allocation PTA > >>> is good and _4 intersect because they include ESCAPED. This is > >>> because _27 itself escapes and PTA not being flow-sensitive. I don't > >>> see an easy way to improve things here but dislike the stmt walking > >>> very much. That is, the proper solution is to re-write PTA in some way. > >> > >> I am trying to imagine what that might look like. I imagine that instead > >> of having a single "escaped" set, we would have multiple escaped sets at > >> different points in the function (at most one per VDEF?). Then _4 would > >> only contain escaped3 while heap5 (*_27) only appears in escaped9 and > >> later? It may be tricky to keep a linear-ish complexity with anything more > >> powerful than the current PTA. I don't know what LLVM is doing, but they > >> seem to manage. > > > > IIRC LLVM uses Steensgard analysis which is flow-sensitive but generally > > less powerful. We are using Andersen constraint-based analysis because > > Steensgard was patented back in time (that patent has now expired). > > In https://llvm.org/docs/AliasAnalysis.html they say that Steensgaard’s > algorithm is flow-insensitive and context-insensitive, so it might not be > it. They run about 5 different alias analysis engines and combine their > results to disambiguate as much as they can. > > > One approach to make PTA "more" flow-sensitive is to partition the function > > into regions (basically represent the function as a series of function calls > > to its parts). > > > > For the issue above there's the long-standing issue of splitting escapes > > from function return from escapes to functions called to properly represent > > local variable lifetime. > > > > That said, your simpler patch still relies on up-to-date dominators, sth > > not guaranteed or required previously. > > Ah, dom_info_available_p does not mean that the available information is > up-to-date? :-(
Kind-of. A safer test is dom_info_state (cfun, CDI_DOMINATORS) == DOM_OK. I think the in_gimple_form test is superfluous. As you write the heuristic you could as well remove the malloc result points-to set from the others after points-to analysis is finished? One could even devise something like a "negative" live set (all-locals & ~live-locals), pruning the points-to sets of SSA names on the whole CFG considering that int *p; int y; void foo() { int x; *p = &y; if (p != &y) abort (); *p = &x; } computes p to point to both y and x at the point of the comparison. That is, this isn't special to malloc but to all variables that come into scope only after function entry. This is something I would like much more than these kinds of local tricks in the oracle itself. Richard. > > We might consider implementing a separate (region-based) analysis > > adjusting/pruning points-to information with flow-sensitive knowlege. > > This might even work when done from inside PTA as post-processing. > > -- > Marc Glisse