Re: [PATCH] doc: clarify the situation with pointer arithmetic

Richard Biener Tue, 28 Jan 2020 02:02:54 -0800

On Tue, Jan 28, 2020 at 8:20 AM Alexander Monakov <amona...@ispras.ru> wrote:
>
> On Tue, 28 Jan 2020, Uecker, Martin wrote:
>
> > > (*) this also shows the level of "obfuscation" needed to fool compilers
> > > to lose provenance knowledge is hard to predict.
> >
> > Well, this is exactly the problem we want to address by defining
> > a clear way to do this. Casting to an integer would be the way
> > to state: "consider the pointer as escaped and forget the
> > provenance"  and casting an integer to a  pointer would
> > mean "this pointer may point to all objects whose pointer has
> > escaped". This would give the programmer explicit control about
> > this aspect and make most existing code using pointer-to-integer
> > casts well-defined. At the same time, this should be simple
> > to add to existing points-to analysis.
>
> Can you explain why you make it required for the compiler to treat the
> points-to set unnecessarily broader than it could prove? In the Matlab
> example, there's a simple chain of computations that the compiler is
> following to prove that the pointer resulting from the final cast is
> derived from exactly one other pointer (no other pointers have
> participated in the computations).
>
> Or, in other words:
>
> is there an example where a programmer can distinguish between the
> requirement you explain above vs. the fine-grained interpretation
> that GIMPLE aims to implement (at least as I understand it), which is:
>
>   when the program creates a pointer by means of non-pointer computations
>   (casts, representation access, etc), the resulting pointer may point to:
>
>     * any object whose address could have participated in the computation
>       (which is at worst the entire set of "exposed" objects up to that
>        program point, but can be much narrower if the compiler can see
>        the entire chain of computations)
>
>     * any object which is not "exposed" but could have a known address, e.g.
>       because it is placed at a specific address during linking


Note that for the current PTA implementation there are almost no cases we
can handle conservatively enough.  Consider the simple example

 int a[4];
 int *p = &a[1];
 uintptr_t pi = (uintptr_t)p;
 pi += 4;
 int *q = (int *)pi;

Our PTA knows that p points to a (but not the exact offset), and the same
holds for pi (the cast doesn't change the value).  Now you add 4.  Since the
offset isn't tracked, this could lead you outside of 'a', so the points-to
set becomes 'a and anything'.
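
As a compilable sketch of the case above: the helper name advance_via_integer
is hypothetical, and sizeof(int) stands in for the literal 4 so the step is
exactly one element. It assumes the usual flat-address implementation, where
the integer-to-pointer mapping is the identity (the mapping itself is
implementation-defined).

```c
#include <stdint.h>

/* Hypothetical helper: step an int pointer forward by one element using
   integer arithmetic only, as in "pi += 4" above. */
int *advance_via_integer(int *p)
{
    uintptr_t pi = (uintptr_t)p;  /* value-preserving cast */
    pi += sizeof(int);            /* PTA no longer knows this stays inside the object */
    return (int *)pi;             /* points-to set: the object and anything */
}
```

Called with &a[1] for int a[4], the result is expected to compare equal to
&a[2] on such an implementation, even though the points-to set computed for
it is much wider.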

I'm also not sure what PNVI does to

 int a[4];
 int *p = &a[1];
 p += 10;
 uintptr_t pi = (uintptr_t)p;
 p = (int *)pi;

We assume that p points to a even after p += 10 (but it of course points
outside of the object - obvious here, but not necessarily in more
obfuscated cases).
Now, can we assume pi points to a?  The cast isn't value-changing.  Do we
have to assume (int *)pi points to anything?  So, is

 p = (int *)(uintptr_t)p;

something like "laundering" a pointer?  We don't preserve such simple
round-trip casts since they are value-preserving.  Are they
provenance-preserving?
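
For reference, a minimal sketch of the round trip in question, using an
in-bounds pointer so that the example itself has no undefined behavior. The
helper name round_trip is hypothetical, and the intermediate void * casts are
only there because the standard states the uintptr_t guarantee in terms of
void *.

```c
#include <stdint.h>

/* Hypothetical helper: the p = (int *)(uintptr_t)p round trip. */
int *round_trip(int *p)
{
    uintptr_t pi = (uintptr_t)(void *)p;  /* pointer -> integer, value-preserving */
    return (int *)(void *)pi;             /* integer -> pointer; provenance is the open question */
}
```

The result compares equal to the original pointer; whether a compiler may
fold the two casts away without changing what the pointer is allowed to
alias is exactly what the question above is about.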

Richard.

> Thanks.
> Alexander
