Re: [PATCH] doc: clarify the situation with pointer arithmetic

Uecker, Martin Fri, 31 Jan 2020 04:05:36 -0800

On Friday, 31.01.2020 at 09:02 +0100, Richard Biener wrote:
> On Thu, Jan 30, 2020 at 6:09 PM Uecker, Martin
> <martin.uec...@med.uni-goettingen.de> wrote:
> > 
> > On Thursday, 30.01.2020 at 16:50 +0000, Michael Matz wrote:
> > > Hi,
> > > 
> > > On Thu, 30 Jan 2020, Uecker, Martin wrote:
> > > 
> > > > > guarantees face serious implementation difficulties I think
> > > > > so the only alternative to PNVI (which I think is implementable
> > > > > but at an optimization opportunity cost) is one that makes
> > > > > two pointers with the same value always have the same
> > > > > provenance (and otherwise make the behavior undefined).
> > > > 
> > > > This would need to come with precise rules about
> > > > when the occurrence of two such pointers is UB,
> > > > e.g. comparisons of such pointers, or that
> > > > two such pointers are cast to int in the same
> > > > execution.
> > > > 
> > > > The mere existence of such pointers should be
> > > > quite common and should not already be UB.
> > > > 
> > > > But I am uncomfortable with the idea that
> > > > comparison of pointers is always allowed except
> > > > for some special case which then is UB. This
> > > > might cause rare and very difficult to find bugs.
> > > 
> > > As Richi said, the comparison itself wouldn't be UB, all comparisons would
> > > be allowed.  But _if_ the pointers compare equal, they must have same (or
> > > overlapping) provenance (i.e. when they have not, then _that_ is UB).
> > 
> > Sorry, I still don't get it.  In the following example,
> > 
> > int a[1];
> > int b[1];
> > 
> > it is often the case that &a[1] and &b[0] compare equal
> > because they have the same address but the pointer
> > have different provenance (i.e. the pointers differ in provenance).
> > 
> > Or does there need to be an actual evaluation of a comparison
> > operation? In this case, I do not see the difference to what
> > I said.
> 
> I guess I wanted to say that if you do
> 
>   if (&a[1] == &b[0])
>     if (&a[1] != &b[0])
>       abort ();
> 
> then the abort might happen.  I'm using the term "undefined behavior"
> here.  So whenever you create a value based on two pointers with
> the same value and different provenance you invoke undefined behavior.


Yes, but it is tricky because one needs to define
"create a value based on two pointers with..."

Assuming one does not track provenance through integers,
the only ways to create expressions using two pointers
are comparisons, pointer subtraction, and the conditional
(ternary) operator.

The conditional operator seems unproblematic. For pointer
subtraction, the standard already requires same provenance.

For comparisons, one could consider making this case UB.
But I fear this could be the source of subtle bugs.

Then there is the question about what happens if a
program inspects the representation bytes of a
pointer directly...
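
To make the representation-byte question concrete, here is a
minimal sketch. The helper names are hypothetical and not part
of any proposal. It compares two pointers bytewise with memcmp
and never uses ==, so no rule about pointer comparison applies
to it. I assume an implementation without padding bits in
pointers; the demo only covers the uncontroversial case of a
pointer and its copy.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the object representations of
   the two pointers are bytewise identical, 0 otherwise. */
int same_representation(int *p, int *q)
{
    /* p and q have the same type, so sizeof p covers all of each. */
    return memcmp(&p, &q, sizeof p) == 0;
}

/* Hypothetical demo: a pointer and a copy of it. Assumes pointers
   have no padding bits, which holds on common implementations. */
int demo_copy_has_same_bytes(void)
{
    int x = 0;
    int *p = &x;
    int *copy = p;
    assert(p == copy);  /* same provenance, well-defined */
    return same_representation(p, copy);
}
```

Calling same_representation(&a[1], &b[0]) for adjacent arrays
is the open case: the bytes can match although the provenance
differs.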

> That allows the compiler to optimize
> 
> int *q, *r;
> if (q == r)
>   *r = 1;
> 
> into
> 
> if (q == r)
>   *q = 1;
> 
> which it is currently not allowed to do because of that dreaded
> one-after-the-object equality compare, not because of PNVI, but similar cases

Yes, but as provenance is tracked at compile-time, you could still
do the optimization if you assign the right provenance to the
replaced variable, i.e. you replace 'r' with 'q' but keep the
provenance of 'r'. So while this puts a burden on the compiler
writers, it seems feasible. Or am I missing something?
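
For reference, the two source-level forms under discussion can
be written as functions. This is only a sketch with hypothetical
function names. The forms differ only when q is a
one-after-the-object pointer such as &a[1] and r is &b[0] at the
same address: storing through r is fine, while storing through q
would write through a pointer that must not be dereferenced. The
demo sticks to the case where both pointers refer to the same
object, where the forms agree.

```c
#include <assert.h>

/* Form as written: the store goes through r. */
void store_through_r(int *q, int *r)
{
    if (q == r)
        *r = 1;
}

/* Form after replacing r by q inside the guarded branch. */
void store_through_q(int *q, int *r)
{
    if (q == r)
        *q = 1;
}

/* Hypothetical demo for the same-object case only. */
int demo_same_object(void)
{
    int x = 0;
    store_through_r(&x, &x);
    assert(x == 1);
    x = 0;
    store_through_q(&x, &x);
    return x;
}
```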

> obviously can be constructed with integers (and make our life difficult
> as we're tracking provenance through integers).

Since integers do not have provenance in PNVI, such an optimization
would always be valid for integers, as would all other natural algebraic
optimizations for integers. I consider this a major strength of
the proposal and I kind of hoped that compiler writers would agree.
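
As a sketch of the integer variant (hypothetical function names,
and assuming the implementation provides the optional type
uintptr_t): once both pointers are converted to integers, the
guarded branch may use either integer, because equal integers
are interchangeable when integers carry no provenance.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 and stores 1 if both pointers convert to the same
   integer value, otherwise returns 0 and stores nothing. */
int store_via_integer(int *q, int *r)
{
    uintptr_t iq = (uintptr_t)(void *)q;
    uintptr_t ir = (uintptr_t)(void *)r;
    if (iq == ir) {
        /* iq and ir are plain integers here, so writing iq in
           place of ir is an ordinary value substitution. */
        *(int *)(void *)iq = 1;
        return 1;
    }
    return 0;
}

/* Hypothetical demo: both arguments point to the same object. */
int demo_integer_round_trip(void)
{
    int x = 0;
    int stored = store_via_integer(&x, &x);
    assert(stored == 1);
    return x;
}
```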

> Compilers fundamentally work with value-equivalences, the above example
> shows we may not.  That's IMHO a defect in the standard.

I consider provenance to be part of the value. Think about
architectures with descriptors that actually trap if you use
the wrong pointer. This nicely corresponds to a concept
of abstract pointers which are not simply the address of a
memory location.

The problem we have is that we cannot (cheaply) track provenance
at runtime on modern CPUs, so only the address part of the pointer
is available at runtime. For the standard, this implies
that the rules must work both for abstract pointers with provenance
and for address-only pointers where information about provenance
is not available. Whenever there is a discrepancy between
these two models, we can either make it UB or use the semantics
of the address-only case.

The only really problematic case we have with PNVI is the comparison
of a one-after-the-object pointer with a pointer of different
provenance. The only choices we have are to make this UB or
to make the result well-defined and based on the address.
Both choices have disadvantages.
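
The problematic comparison, written out as a small sketch (the
function names are hypothetical). Forming &a[1] is valid and
only the comparison is evaluated; nothing is dereferenced.
Whether the result is 0 or 1 depends on how the implementation
lays out a and b and on how the compiler evaluates the
comparison, so the sketch makes no claim about which value
comes out.

```c
#include <assert.h>

int a[1];
int b[1];

/* Compares the one-after-the-object pointer of a with the start
   of b. The two pointers have different provenance. */
int end_of_a_equals_start_of_b(void)
{
    int *p = &a[1];  /* may be formed and compared, not dereferenced */
    int *q = &b[0];
    return p == q;
}

/* Hypothetical demo: == yields 0 or 1; which one is the open point. */
int demo_result_is_boolean(void)
{
    int r = end_of_a_equals_start_of_b();
    assert(r == 0 || r == 1);
    return r;
}
```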

If we track provenance through integers, there are many
other difficult problems. The reason is that you then
cannot work with value-equivalences anymore even for
integer expressions, which are much more complex.
The number of additional problems we create here
is the main reason we want to have PNVI and not
track provenance through integers.

Best,
Martin
