On Fri, Mar 21, 2025 at 12:27 AM Krister Walfridsson
<krister.walfrids...@gmail.com> wrote:
>
> On Thu, 20 Mar 2025, Richard Biener wrote:
>
> >> Pointer arithmetic -- POINTER_DIFF_EXPR
> >> ---------------------------------------
> >> Subtracting a pointer q from a pointer p is done using POINTER_DIFF_EXPR.
> >>   * It is UB if the difference does not fit in a signed integer with the
> >>     same precision as the pointers.
> >
> > Yep.
> >
> >> This implies that an object's size must be less than half the address
> >> space; otherwise, POINTER_DIFF_EXPR cannot be used to compute sizes in C.
> >> But there may be additional restrictions. For example, compiling the
> >> function:
> >>
> >>    void foo(int *p, int *q, int n)
> >>    {
> >>      for (int i = 0; i < n; i++)
> >>        p[i] = q[i] + 1;
> >>    }
> >>
> >> causes the vectorized code to perform overlap checks like:
> >>
> >>    _7 = q_11(D) + 4;
> >>    _25 = p_12(D) - _7;
> >>    _26 = (sizetype) _25;
> >>    _27 = _26 > 8;
> >>    _28 = _27;
> >>    if (_28 != 0)
> >>      goto <bb 11>;
> >>    else
> >>      goto <bb 12>;
> >>
> >> which takes the difference between two pointers that may point to
> >> different objects. This suggests that all objects must fit within half the
> >> address space.
> >
> > Interesting detail.  But yes, the above is either incorrect code (I didn't
> > double-check) or this condition must hold.  For a 64-bit virtual address
> > space it likely holds in practice.  For a 32-bit virtual address space it
> > might not.
> >
> >> Question: What are the restrictions on valid address ranges and object
> >> sizes?
> >
> > The restriction on object size is documented, as well as that objects
> > may not cross/include 'NULL'.  Can you open a bugreport for the
> > vectorizer overlap test issue above?  I think we want to track this and
> > at least document the restriction (we could possibly add a target hook
> > that can tell whether such a simplified check is OK).
>
> PR119399
>
>
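For reference, the overlap check above can be written out in plain C roughly
as below.  This is only my own sketch of what the GIMPLE computes (the name
no_overlap is made up; GCC does not emit this source).  The point is that the
single unsigned compare is only a valid non-overlap test if the signed byte
difference between any two objects cannot overflow, i.e. if every object fits
in half the address space:

   #include <stddef.h>

   /* Sketch of the runtime alias test from the dump above:
        _25 = p - (q + 4);  _26 = (sizetype) _25;  _27 = _26 > 8;
      The signed byte difference is reinterpreted as unsigned, so a negative
      difference (p below q + 4) wraps to a huge value and the single compare
      classifies it as "no conflict".  Note that, like the GIMPLE, this
      subtracts pointers that may point to different objects.  */
   static int no_overlap (int *p, int *q)
   {
     ptrdiff_t d = (char *) p - ((char *) q + 4);   /* POINTER_DIFF_EXPR */
     return (size_t) d > 8;                         /* (sizetype) cast + compare */
   }
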
> >> Issues
> >> ------
> >> The semantics I've described above result in many miscompilation findings
> >> that I haven't reported yet.
> >>
> >> As mentioned earlier, the vectorizer can use POINTER_PLUS_EXPR to generate
> >> pointers that extend up to a vector length past the object (and similarly,
> >> up to one vector length before the object in a backwards loop). This is
> >> invalid if the object is too close to address 0 or 0xffffffffffffffff.
> >>
> >> And this out-of-bounds distance can be made arbitrarily large, as can be
> >> seen by compiling the following function for x86_64 with -O3:
> >>
> >> #include <stdint.h>
> >>
> >> void foo(int *p, uint64_t s, int64_t n)
> >> {
> >>    for (int64_t i = 0; i < n; i++)
> >>      {
> >>        int *q = p + i * s;
> >>        for (int j = 0; j < 4; j++)
> >>         *q++ = j;
> >>      }
> >> }
> >>
> >> Here, the vectorized code adds s * 4 to the pointer ivtmp_8 at the end of
> >> each iteration:
> >>
> >>    <bb 5> :
> >>    bnd.8_38 = (unsigned long) n_10(D);
> >>    _21 = s_11(D) * 4;
> >>
> >>    <bb 3> :
> >>    # ivtmp_8 = PHI <ivtmp_7(6), p_12(D)(5)>
> >>    # ivtmp_26 = PHI <ivtmp_20(6), 0(5)>
> >>    MEM <vector(4) int> [(int *)ivtmp_8] = { 0, 1, 2, 3 };
> >>    ivtmp_7 = ivtmp_8 + _21;
> >>    ivtmp_20 = ivtmp_26 + 1;
> >>    if (ivtmp_20 < bnd.8_38)
> >>      goto <bb 6>;
> >>    else
> >>      goto <bb 7>;
> >>
> >>    <bb 6> :
> >>    goto <bb 3>;
> >>
> >> This means that calling foo with a sufficiently large s can guarantee
> >> wrapping or evaluating to 0, even though the original IR before optimization
> >> did not wrap or evaluate to 0. For example, when calling foo as:
> >>
> >>    foo(p, -((uintptr_t)p / 4), 1);
> >>
> >> I guess this is essentially the same issue as PR113590, and could be fixed
> >> by moving all induction variable updates to the latch.
> >
> > Yes.
> >
> >> But if that
> >> happens, the motivating examples for needing to handle invalid pointer
> >> values would no longer occur. Would that mean smtgcc should adopt a more
> >> restrictive semantics for pointer values?
> >
> > In principle yes, but PR113590 blocks this I guess.  Could smtgcc consider
> > an SSA def to be happening only at the latest possible point, that is,
> > "virtually" sink it to the latch?  (I realize that when you have two
> > dependent defs that can only be moved together, such "virtual" sinking
> > could be somewhat complicated)
>
> I think it feels a bit strange (and brittle) to define the semantics by
> virtually sinking everything as much as possible. But isn't this sinking
> just another way of saying that the operation is UB only if the value is
> used? That is, we can solve this essentially the same way LLVM does with
> its deferred UB.

Yeah, that's another way of thinking about it.  Though technically GCC doesn't
implement deferred UB - for the case above we're simply lucky that nothing
exploits the UB that is in principle there.
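
To spell out the wrap in the foo example for the call
foo(p, -((uintptr_t)p / 4), 1), here is a standalone sketch of my own
(assuming 64-bit pointers and relying on p being int-aligned), not something
taken from the dump:

   #include <stdint.h>
   #include <stdio.h>

   int buf[4];

   int main (void)
   {
     int *p = buf;                        /* int-aligned, so (uintptr_t) p % 4 == 0 */
     uint64_t s = -((uintptr_t) p / 4);   /* the s from the call above */
     uint64_t step = s * 4;               /* _21 = s_11(D) * 4 == -(uintptr_t) p */
     uint64_t iv = (uintptr_t) p + step;  /* ivtmp_7 = ivtmp_8 + _21, wraps to 0 */
     printf ("pointer IV after the first iteration: 0x%llx\n",
             (unsigned long long) iv);
     return 0;
   }

So the pointer IV becomes 0 after the first (and only) iteration, a value the
scalar loop never computes.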

> The idea is that POINTER_DIFF_EXPR, PLUS_EXPR, etc. are defined for all
> inputs, but the result is a poison value when it doesn't fit in the return
> type. Any use of a poison value is UB.
>
> This is trivial to implement in smtgcc and would solve PR113590.

I _think_ it's a reasonable thing for smtgcc to do.  For GCC itself it would
complicate what is UB and what is not quite a bit - what is the ultimate "use"
that triggers the UB?  I suppose POISON + 1 is simply POISON again.  Can
you store POISON to memory, or is that then UB at the point of the store?
Can you pass it to a call?  What if the call is inlined and just does
return arg + 1?  IIRC I read a paper about this deferred UB in LLVM
and wasn't convinced it solves any real problem.
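
For what it's worth, here is a rough sketch of one way a checker like smtgcc
could model that deferred-UB proposal (the names and the exact rules are my
own choices, loosely patterned on LLVM's poison, not anything GCC defines):
each value carries a poison flag, arithmetic merely propagates it, and
certain "uses" are where UB fires - with the store/call-argument cases being
exactly the open questions above:

   #include <stdbool.h>
   #include <stdint.h>
   #include <stdio.h>
   #include <stdlib.h>

   /* Every evaluated value carries a poison flag.  */
   struct val { uint64_t v; bool poison; };

   /* Overflowing PLUS_EXPR / POINTER_DIFF_EXPR etc. would yield poison
      instead of immediate UB; arithmetic propagates it, so POISON + 1
      is simply POISON again.  */
   static struct val add (struct val a, struct val b)
   {
     return (struct val) { a.v + b.v, a.poison || b.poison };
   }

   /* One "use" that turns poison into UB: branching on it.  Whether a
      store or a call argument should also count is a design choice.  */
   static bool branch_on (struct val cond)
   {
     if (cond.poison)
       {
         fprintf (stderr, "UB: branch on poison\n");
         abort ();
       }
     return cond.v != 0;
   }

   int main (void)
   {
     struct val x = { 0, true };                         /* a poisoned result */
     struct val y = add (x, (struct val) { 1, false });  /* still poison */
     return branch_on (y) ? 1 : 0;                       /* UB fires here in this model */
   }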

Richard.

>
>     /Krister
