On Thursday, 18.04.2019, 11:56 +0200, Richard Biener wrote:
> On Thu, Apr 18, 2019 at 11:31 AM Richard Biener
> <richard.guent...@gmail.com> wrote:
> >
> > On Wed, Apr 17, 2019 at 4:12 PM Uecker, Martin
> > <martin.uec...@med.uni-goettingen.de> wrote:
> > >
> > > On Wednesday, 17.04.2019, 15:34 +0200, Richard Biener wrote:
> > > > On Wed, Apr 17, 2019 at 2:56 PM Uecker, Martin
> > > > <martin.uec...@med.uni-goettingen.de> wrote:
....
> > > Let's consider this example:
> > >
> > > int x;
> > > int y;
> > > uintptr_t pi = (uintptr_t)&x;
> > > uintptr_t pj = (uintptr_t)&y;
> > >
> > > if (pi + 4 == pj) {
> > >
> > >    int* p = (int*)pj; // can be one-after pointer of 'x'
> > >    p[-1] = 1;         // well defined?
> > > }
> > >
> > > If I understand correctly, a pointer obtained from
> > > pi + 4 would have an "anything" provenance (which is
> > > fine). But the pointer obtained from 'pj' would have the
> > > provenance of 'y' so the access to 'x' would not
> > > be allowed.
> >
> > Correct. This is the most difficult case for us to handle
> > exactly also because (also valid for the proposal?)
> >
> > int x;
> > int y;
> > uintptr_t pi = (uintptr_t)&x;
> > uintptr_t pj = (uintptr_t)&y;
> >
> > if (pi + 4 == pj) {
> >
> >    int* p = (int*)(pi + 4); // can be one-after pointer of 'x'
> >    p[-1] = 1;               // well defined?
> > }
> >
> > while well-handled by GCC in the written form (as you
> > say, pi + 4 yields "anything" provenance), GCC itself
> > may transform it into the first variant by noticing
> > the conditional equivalence and substituting pj for
> > pi + 4.

Integers are just integers in the proposal, so conditional
equivalence is not a problem for them. In my opinion this is a
strength of the proposal. Tracking provenance for integers
would mean that all computations would be affected by such
subtle semantic issues (where you cannot even replace an
integer by an equivalent one). In this proposal this is limited
to pointers, where it at least makes some sense.

> > > But according to the preferred version of
> > > our proposal, the pointer could also be used to
> > > access 'x' because it is also exposed.
> > >
> > > GCC could make pj have an "anything" provenance
> > > even though it is not modified. (This would break
> > > some optimization such as the one for Matlab.)
> > >
> > > Maybe one could also refine this optimization to check
> > > for additional conditions which rule out the case
> > > that there is another object the pointer could point
> > > to.
> >
> > The only feasible solution would be to not track
> > provenance through non-pointers and make
> > conversions of non-pointers to pointers have
> > "anything" provenance.

This would be one solution, yes. But you could reattach the
same provenance if you know that the pointer points into the
middle of an object (so it is not a first or one-after pointer)
or if you know that there is no exposed object directly
adjacent to this object, etc.

> > The additional issue that appears here though
> > is that we cannot even turn (int *)(uintptr_t)p
> > into p anymore since with the conditional
> > substitution we can then still arrive at
> > effectively (&y)[-1] = 1 which is of course
> > undefined behavior.
> >
> > That is, your proposal makes
> >
> >   ((int *)(uintptr_t)&y)[-1] = 1
> >
> > well-defined (if &y - 1 == &x) but keeps
> >
> >   (&y)[-1] = 1
> >
> > as undefined which strikes me as a little bit
> > inconsistent. If that's true it's IMHO worth
> > a defect report and second consideration.

This is true. But I would not call it inconsistent. It is just
unusual if you expect that casts to integers and back are no-ops.

In this proposal a round trip has the effect of stripping the
original provenance and attaching a new one (which could be the
same as the old one).

While in this specific scenario this might seem unreasonable,
there are other examples where you may want to be able to
get from one object to the others, and using casts to integers
would then be the blessed way to express this.

In my opinion, this is also intuitive: by casting to an integer
one gets simple discrete pointer semantics where one does
not have provenance.
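To make the round trip concrete, here is a minimal sketch of my own (not code from the proposal). It stays inside a single array, so the access is well-defined regardless of which provenance model one assumes. The function name roundtrip_within_array is hypothetical, and the integer comparison assumes a flat address space in which the uintptr_t values of adjacent elements differ by sizeof(int), which the C standard leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper for illustration only; not part of the proposal. */
int roundtrip_within_array(void)
{
    int a[2] = { 0, 0 };

    /* Casting to an integer exposes 'a'; from here on pi and pj
       are ordinary integers that carry no provenance. */
    uintptr_t pi = (uintptr_t)(void *)&a[0];
    uintptr_t pj = (uintptr_t)(void *)&a[1];

    /* Assumption: flat address space, so adjacent elements differ
       by sizeof(int) as integers. Bail out if that does not hold. */
    if (pi + sizeof(int) != pj)
        return -1;

    /* Casting back attaches a provenance again. Here the only
       candidate is 'a' itself, so it is the same as before. */
    int *p = (int *)(void *)pj;
    assert(p == &a[1]);

    /* p[-1] is a[0]: still inside the same object, hence valid. */
    p[-1] = 1;
    return a[0];
}
```

The contested cases in this thread differ only in that a[0] and a[1] are replaced by two separate objects 'x' and 'y' that happen to be adjacent. The integer comparison is then unchanged, and the question is solely which provenance the cast back to a pointer may attach.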
> Similarly that
>
> int x;
> int y;
> uintptr_t pj = (uintptr_t)&y;
>
> if (&x + 1 == &y) {
>
>    int* p = (int*)pj; // can be one-after pointer of 'x'
>    p[-1] = 1;         // well defined?
> }
>
> is undefined but when I add a no-op
>
>   (uintptr_t)&x;
>
> it is well-defined is undesirable. Can this no-op
> stmt appear in another function? Or even in
> another translation unit (if x and y are global variables)?
> And does such stmt have to be present (in another
> TU) to make the example valid in this case?

Without that statement, the example is not valid, as the address
of 'x' is not exposed. With the statement it becomes valid,
and it does not matter where the statement appears. Again,
I agree that the fact that such a statement has a side effect
is something one needs to get used to.

But address-taking already has a side effect which could be
surprising, doesn't it? If I understood your answer above
correctly, for GCC you get this side effect already without
the cast:

&x;

For the statement to appear elsewhere, the address must escape
first. I would expect a compiler to treat a cast to an integer
identically to an escaped address.

> To me all this makes requiring exposal through a cast
> to a non-pointer (or accessing its representation) not
> in any way more "useful" for an optimizing compiler than
> modeling exposal through address-taking.

There would be a difference for cases like this:

int x[3];
int y;

x[0] = 1;

uintptr_t pj = (uintptr_t)&y;

if (pi + 4 == pj) {

   int* p = (int*)(pi + 4);
   p[-1] = 1;
}

Here 'x' is not exposed in our proposal, so the assignment
via 'p' is invalid, but the address is taken implicitly.

Other examples are storage allocated via malloc/alloca,
where there is always a pointer involved but which is not
automatically exposed in our proposal.

Best,
Martin