On Tue, May 14, 2019 at 4:05 PM Marc Glisse <marc.gli...@inria.fr> wrote:
>
> On Tue, 14 May 2019, Richard Biener wrote:
>
> >>> In princple PTA should know the aliasing cannot happen but possibly
> >>> the info is either lost or the IL is too obfuscated at the point it gets
> >>> computed.  (hello C++...)
> >>
> >> We don't need much obfuscation for this, a simple C program
> >>
> >> int g;
> >> int*f(int**p){
> >>    int*q=*p;
> >>    int*r=__builtin_malloc(4);
> >>    *q=1;
> >>    *r=2;
> >>    g=*q;
> >>    return r;
> >> }
> >>
> >> gives
> >>
> >>    q_4 = *p_3(D);
> >>    r_6 = __builtin_malloc (4);
> >>    *q_4 = 1;
> >>    *r_6 = 2;
> >>    _1 = *q_4;
> >>    g = _1;
> >>    return r_6;
> >>
> >> where we clearly don't manage to prove that q and r don't alias.
> >
> > We can probably improve this one in general from inside PTA
> > by treating escapes through return specially.  I wasn't looking
> > too closely at the C++ testcase but I simply assumed the
> > addition to ESCAPED happens through storing the malloc
> > result to memory that PTA cannot prove local.
>
> Yes, that is indeed the case. I only wrote the version with return to
> simplify, but the vector testcase does store the malloc result in
> non-local memory, so handling return as a special case wouldn't help it.
>
> >>> So at ldist time I see
> >>>
> >>>  # PT = null { D.47989 } (escaped, escaped heap)
> >>>  # ALIGN = 8, MISALIGN = 0
> >>>  # USE = nonlocal null { D.47989 } (escaped, escaped heap)
> >>>  # CLB = nonlocal null { D.47989 } (escaped, escaped heap)
> >>>  _27 = malloc (8192);
> >>>
> >>> good.  The interesting loop is the following where the allocation PTA
> >>> is good and _4 intersect because they include ESCAPED.  This is
> >>> because _27 itself escapes and PTA not being flow-sensitive.  I don't
> >>> see an easy way to improve things here but dislike the stmt walking
> >>> very much.  That is, the proper solution is to re-write PTA in some way.
> >>
> >> I am trying to imagine what that might look like. I imagine that instead
> >> of having a single "escaped" set, we would have multiple escaped sets at
> >> different points in the function (at most one per VDEF?). Then _4 would
> >> only contain escaped3 while heap5 (*_27) only appears in escaped9 and
> >> later? It may be tricky to keep a linear-ish complexity with anything more
> >> powerful than the current PTA. I don't know what LLVM is doing, but they
> >> seem to manage.
> >
> > IIRC LLVM uses Steensgard analysis which is flow-sensitive but generally
> > less powerful.  We are using Andersen constraint-based analysis because
> > Steensgard was patented back in time (that patent has now expired).
>
> In https://llvm.org/docs/AliasAnalysis.html they say that Steensgaard’s
> algorithm is flow-insensitive and context-insensitive, so it might not be
> it. They run about 5 different alias analysis engines and combine their
> results to disambiguate as much as they can.
>
> > One approach to make PTA "more" flow-sensitive is to partition the function
> > into regions (basically represent the function as a series of function calls
> > to its parts).
> >
> > For the issue above there's the long-standing issue of splitting escapes
> > from function return from escapes to functions called to properly represent
> > local variable lifetime.
> >
> > That said, your simpler patch still relies on up-to-date dominators, sth
> > not guaranteed or required previously.
>
> Ah, dom_info_available_p does not mean that the available information is
> up-to-date? :-(

Kind-of.  A safer test is dom_info_state (cfun, CDI_DOMINATORS) == DOM_OK.
I think the in_gimple_form test is superfluous.

As you write the heuristic you could as well remove the malloc result
points-to set from the others after points-to analysis is finished?

One could even devise something like a "negative" live set
(all-locals & ~live-locals), pruning the points-to sets of SSA names
on the whole CFG considering that

int *p;
int y;
void foo()
{
   int x;
   *p = &y;
   if (p != &y) abort ();
   *p = &x;
}

computes p to point to both y and x at the point of the comparison.

That is, this isn't special to malloc but to all variables that come into
scope only after function entry.

This is something I would like much more than these kinds of local
tricks in the oracle itself.

Richard.


> > We might consider implementing a separate (region-based) analysis
> > adjusting/pruning points-to information with flow-sensitive knowlege.
> > This might even work when done from inside PTA as post-processing.
>
> --
> Marc Glisse

Reply via email to