On Fri, Mar 21, 2025 at 12:27 AM Krister Walfridsson <krister.walfrids...@gmail.com> wrote:
>
> On Thu, 20 Mar 2025, Richard Biener wrote:
>
> >> Pointer arithmetic -- POINTER_DIFF_EXPR
> >> ---------------------------------------
> >> Subtracting a pointer q from a pointer p is done using POINTER_DIFF_EXPR.
> >>  * It is UB if the difference does not fit in a signed integer with the
> >>    same precision as the pointers.
> >
> > Yep.
> >
> >> This implies that an object's size must be less than half the address
> >> space; otherwise, POINTER_DIFF_EXPR cannot be used to compute sizes in C.
> >> But there may be additional restrictions. For example, compiling the
> >> function:
> >>
> >>    void foo(int *p, int *q, int n)
> >>    {
> >>      for (int i = 0; i < n; i++)
> >>        p[i] = q[i] + 1;
> >>    }
> >>
> >> causes the vectorized code to perform overlap checks like:
> >>
> >>    _7 = q_11(D) + 4;
> >>    _25 = p_12(D) - _7;
> >>    _26 = (sizetype) _25;
> >>    _27 = _26 > 8;
> >>    _28 = _27;
> >>    if (_28 != 0)
> >>      goto <bb 11>;
> >>    else
> >>      goto <bb 12>;
> >>
> >> which takes the difference between two pointers that may point to
> >> different objects. This suggests that all objects must fit within half the
> >> address space.
> >
> > Interesting detail. But yes, the above is either incorrect code (I didn't
> > double-check) or this condition must hold. For 64bit virtual address-space
> > it likely holds in practice. For 32bit virtual address-space it might not.
> >
> >> Question: What are the restrictions on valid address ranges and object
> >> sizes?
> >
> > The restriction on object size is documented, as well as that objects
> > may not cross/include 'NULL'. Can you open a bugreport for the
> > vectorizer overlap test issue above? I think we want to track this and
> > at least document the restriction (we could possibly add a target hook
> > that can tell whether such a simplified check is OK).
>
> PR119399
>
> >
> >> Issues
> >> ------
> >> The semantics I've described above result in many reports of
> >> miscompilations that I haven't reported yet.
> >>
> >> As mentioned earlier, the vectorizer can use POINTER_PLUS_EXPR to generate
> >> pointers that extend up to a vector length past the object (and similarly,
> >> up to one vector length before the object in a backwards loop). This is
> >> invalid if the object is too close to address 0 or 0xffffffffffffffff.
> >>
> >> And this out-of-bounds distance can be made arbitrarily large, as can be
> >> seen by compiling the following function for x86_64 with -O3:
> >>
> >>    void foo(int *p, uint64_t s, int64_t n)
> >>    {
> >>      for (int64_t i = 0; i < n; i++)
> >>        {
> >>          int *q = p + i * s;
> >>          for (int j = 0; j < 4; j++)
> >>            *q++ = j;
> >>        }
> >>    }
> >>
> >> Here, the vectorized code adds s * 4 to the pointer ivtmp_8 at the end of
> >> each iteration:
> >>
> >>    <bb 5> :
> >>    bnd.8_38 = (unsigned long) n_10(D);
> >>    _21 = s_11(D) * 4;
> >>
> >>    <bb 3> :
> >>    # ivtmp_8 = PHI <ivtmp_7(6), p_12(D)(5)>
> >>    # ivtmp_26 = PHI <ivtmp_20(6), 0(5)>
> >>    MEM <vector(4) int> [(int *)ivtmp_8] = { 0, 1, 2, 3 };
> >>    ivtmp_7 = ivtmp_8 + _21;
> >>    ivtmp_20 = ivtmp_26 + 1;
> >>    if (ivtmp_20 < bnd.8_38)
> >>      goto <bb 6>;
> >>    else
> >>      goto <bb 7>;
> >>
> >>    <bb 6> :
> >>    goto <bb 3>;
> >>
> >> This means calling foo with a sufficiently large s can guarantee wrapping
> >> or evaluating to 0, even though the original IR before optimization did
> >> not wrap or evaluate to 0. For example, when calling foo as:
> >>
> >>    foo(p, -((uintptr_t)p / 4), 1);
> >>
> >> I guess this is essentially the same issue as PR113590, and could be fixed
> >> by moving all induction variable updates to the latch.
> >
> > Yes.
> >
> > But if that
> >> happens, the motivating examples for needing to handle invalid pointer
> >> values would no longer occur. Would that mean smtgcc should adopt a more
> >> restrictive semantics for pointer values?
> >
> > In principle yes, but PR113590 blocks this I guess. Could smtgcc consider
> > a SSA def to be happening only at the latest possible point, that is,
> > "virtually" sink it to the latch? (I realize when you have two dependent
> > defs that can be moved together only such "virtual" sinking could be
> > somewhat complicated)
>
> I think it feels a bit strange (and brittle) to define the semantics by
> virtually sinking everything as much as possible. But isn't this sinking
> just another way of saying that the operation is UB only if the value is
> used? That is, we can solve this essentially the same way LLVM does with
> its deferred UB.

Yeah, that's another way of thinking. Though technically GCC doesn't
implement deferred UB - for the case above we're simply lucky nothing
exploits the UB that is in principle there.

> The idea is that POINTER_DIFF_EXPR, PLUS_EXPR, etc. are defined for all
> inputs, but the result is a poison value when it doesn't fit in the return
> type. Any use of a poison value is UB.
>
> This is trivial to implement in smtgcc and would solve PR113590.

I _think_ it's a reasonable thing for smtgcc to do. For GCC itself it would
complicate what is UB and what is not quite a bit - what's the ultimate "use"
that triggers UB? I suppose POISON + 1 is simply POISON again. Can you
store POISON to memory or is that then UB at the point of the store?
Can you pass it to a call? What if the call is inlined and just does
return arg + 1?

IIRC I read some paper about this deferred UB in LLVM and wasn't
convinced they solve any real problem.

Richard.

>
>     /Krister
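
The poison idea discussed above can be sketched as a small C model. This is
only an illustration under stated assumptions, not how smtgcc, GCC or LLVM
implement it. It assumes a 32-bit address space so that exact results fit in
int64_t, and the names val32, mk, ptr_plus, ptr_diff and use are hypothetical.
The poison condition for ptr_plus (the result wraps or is 0) follows the
"wrapping or evaluating to 0" wording in the thread; ptr_diff is poison when
the difference does not fit in a signed 32-bit integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model value: 32 bits of data plus a poison flag. */
typedef struct {
    uint32_t bits;
    bool poison;
} val32;

/* Build a non-poison value. */
static val32 mk(uint32_t bits)
{
    val32 v = { bits, false };
    return v;
}

/* Model of POINTER_PLUS_EXPR: defined for all inputs.  The offset is
   interpreted as signed; the result is poison if it wraps or is 0
   (assumed invalid-pointer condition), or if an input is poison. */
static val32 ptr_plus(val32 p, val32 off)
{
    /* Reinterpret the offset as signed without relying on
       implementation-defined integer conversions. */
    int64_t soff = off.bits > 0x7fffffffu
                       ? (int64_t)off.bits - 4294967296LL
                       : (int64_t)off.bits;
    int64_t exact = (int64_t)p.bits + soff;
    val32 r;
    r.bits = (uint32_t)exact; /* reduction modulo 2^32 */
    r.poison = p.poison || off.poison
               || exact <= 0 || exact > (int64_t)UINT32_MAX;
    return r;
}

/* Model of POINTER_DIFF_EXPR: poison if the exact difference does not
   fit in a signed 32-bit integer, or if an input is poison. */
static val32 ptr_diff(val32 p, val32 q)
{
    int64_t exact = (int64_t)p.bits - (int64_t)q.bits;
    val32 r;
    r.bits = (uint32_t)exact; /* reduction modulo 2^32 */
    r.poison = p.poison || q.poison
               || exact < INT32_MIN || exact > INT32_MAX;
    return r;
}

/* Model of a "use": only here does poison turn into UB, which this
   sketch represents as a failed assertion. */
static uint32_t use(val32 v)
{
    assert(!v.poison && "use of a poison value is UB in this model");
    return v.bits;
}
```

Scaled down to this 32-bit model, the call foo(p, -((uintptr_t)p / 4), 1)
with p = 0x1000 gives _21 = s * 4 = 0xFFFFF000, so
ptr_plus(mk(0x1000u), mk(0xFFFFF000u)) evaluates to 0 and is poison, yet
nothing goes wrong unless that value reaches use(). Which operations count
as a "use" (a store, a call argument, a return) is exactly the open question
raised above; the sketch does not settle it.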