On Tue, Jan 28, 2020 at 8:20 AM Alexander Monakov <amona...@ispras.ru> wrote:
>
> On Tue, 28 Jan 2020, Uecker, Martin wrote:
>
> > > (*) this also shows the level of "obfuscation" needed to fool compilers
> > > to lose provenance knowledge is hard to predict.
> >
> > Well, this is exactly the problem we want to address by defining
> > a clear way to do this. Casting to an integer would be the way
> > to state: "consider the pointer as escaped and forget the
> > provenance" and casting an integer to a pointer would
> > mean "this pointer may point to all objects whose pointer has
> > escaped". This would give the programmer explicit control about
> > this aspect and make most existing code using pointer-to-integer
> > casts well-defined. At the same time, this should be simple
> > to add to existing points-to analysis.
>
> Can you explain why you make it required for the compiler to treat the
> points-to set unnecessarily broader than it could prove? In the Matlab
> example, there's a simple chain of computations that the compiler is
> following to prove that the pointer resulting from the final cast is
> derived from exactly one other pointer (no other pointers have
> participated in the computations).
>
> Or, in other words:
>
> is there an example where a programmer can distinguish between the
> requirement you explain above vs. the fine-grained interpretation
> that GIMPLE aims to implement (at least as I understand it), which is:
>
>   when the program creates a pointer by means of non-pointer computations
>   (casts, representation access, etc), the resulting pointer may point to:
>
>     * any object which address could have participated in the computation
>       (which is at worst the entire set of "exposed" objects up to that
>       program point, but can be much narrower if the compiler can see
>       the entire chain of computations)
>
>     * any objects which is not "exposed" but could have known address, e.g.
>       because it is placed at a specific address during linking

Note that for the current PTA implementation there are almost no cases we
can handle conservatively enough. Consider the simple

  int a[4];
  int *p = &a[1];
  uintptr_t pi = (uintptr_t)p;
  pi += 4;
  int *q = (int *)pi;

Our PTA knows that p points to a (but not the exact offset), and the same
holds for pi (the cast doesn't change the value). Now you add 4 - this
could lead you outside of 'a', so the points-to set becomes
'a and anything'.

I'm also not sure what PNVI does to

  int a[4];
  int *p = &a[1];
  p += 10;
  uintptr_t pi = (uintptr_t)p;
  p = (int *)pi;

We assume that p points to a even after p += 10 (but it of course points
outside of the object - obvious here, but not necessarily in more
obfuscated cases). Now, can we assume pi points to a? The cast isn't
value-changing. Do we have to assume (int *)pi points to anything?

So, is

  p = (int *)(uintptr_t)p;

something like "laundering" a pointer? We don't preserve such simple
round-trip casts since they are value-preserving. Are they
provenance-preserving?

Richard.

> Thanks.
> Alexander
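
For reference, the first snippet and the round-trip cast can be put into
compilable form. This is only a sketch: the names advance_via_integer,
round_trip and demo are made up for illustration, it assumes a flat
address space where integer-to-pointer casts simply reinterpret the
address, and it advances pi by sizeof(int) rather than a literal 4 so it
does not depend on the size of int. It only shows what the casts compute
at run time; it does not settle which provenance a points-to analysis has
to assume for q.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: step an int pointer forward by n elements using
 * integer arithmetic only, mirroring "pi += 4" in the first snippet. */
static int *advance_via_integer(int *p, size_t n)
{
    uintptr_t pi = (uintptr_t)p;  /* pointer-to-integer cast, value-preserving */
    pi += n * sizeof(int);        /* plain integer addition, no pointer operand */
    return (int *)pi;             /* integer-to-pointer cast: provenance is the open question */
}

/* Hypothetical helper: the round trip p = (int *)(uintptr_t)p. */
static int *round_trip(int *p)
{
    return (int *)(uintptr_t)p;
}

/* Returns 1 when, on this implementation, both casts yield the addresses
 * one would naively expect. Assumes a flat address space. */
int demo(void)
{
    int a[4] = {10, 11, 12, 13};
    int *p = &a[1];

    int *q = advance_via_integer(p, 1);  /* stays inside a: expected &a[2] */
    int *r = round_trip(p);              /* expected to compare equal to p */

    return q == &a[2] && *q == 12 && r == p;
}
```

The second snippet (p += 10 on a four-element array) is deliberately left
out: forming that pointer is already undefined behaviour in ISO C, so
there is nothing meaningful to execute.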