On September 9, 2019 4:17:20 PM GMT+02:00, Ulrich Weigand <uweig...@de.ibm.com> wrote:
>Richard Biener wrote:
>> On Fri, Sep 6, 2019 at 3:00 PM Ulrich Weigand <uweig...@de.ibm.com> wrote:
>> > But as far as I can see, for *atomic* operations at least, we do make
>> > that assumption.  The x86 back-end for example just assumes that any
>> > "int" or "long" object that is the target of an atomic operation is
>> > naturally aligned, or else the generated code would just crash.  So
>> > if you used your example with a packed struct and passed that pointer
>> > to an atomic, the back-end would still assume the alignment and the
>> > code would crash.  But I'd still consider this a perfectly reasonable
>> > and expected behavior in this case ...
>>
>> Would it crash?  I think it would be not atomic if it crossed a
>> cache-line boundary.
>
>Sorry, I misremembered the x86 operations; it does indeed work for
>unaligned 4- or 8-byte accesses.  However, for 16-byte accesses,
>CMPXCHG16B does require aligned memory; the manual says:
>
>  Note that CMPXCHG16B requires that the destination (memory)
>  operand be 16-byte aligned.
>  [...]
>  64-Bit Mode Exceptions
>  [...]
>  #GP(0)  If the memory address is in a non-canonical form.
>          If memory operand for CMPXCHG16B is not aligned on a
>          16-byte boundary.
>  [...]
>
>So this is basically the same situation as on s390, except that on x86
>the default TImode alignment is already 16 bytes.
>
>> > Of course if some part of the middle end gets the alignment wrong, we
>> > have a problem.  But I'm not sure how this could happen here.  I agree
>> > that it might be the case that a user-specified *under*-alignment might
>> > get lost somewhere (e.g. you have a packed int and somewhere in the
>> > optimizers this gets accidentally cast to a normal int with the default
>> > alignment).  But in *this* case, what would have to happen is that the
>> > middle-end accidentally casts something to a pointer with *higher*
>> > than the default alignment for the type, even though no such pointer
>> > cast is present in the source.  Can this really happen?
>>
>> If the cast to the lower-aligned type is lost and there is an earlier
>> cast to the aligned type.
>
>My point is that this "cast to the aligned type" must have come from the
>user in this case (since the aligned type is *more* aligned than any
>standard version of the type), and if the user casts the value to the
>aligned type, it is undefined behavior anyway if the value was in fact
>not aligned.
>
>> > This would actually
>> > be wrong on s390.  The problem is that all atomic operations on any
>> > one single object need to be consistent: they either all use the
>> > 16-byte atomic instruction, or else they all protect the access with
>> > a lock.  If you have parts of the code use the lock and other parts
>> > use the instruction, they're not actually protected against each other.
>>
>> But then the user has to be consistent in accessing the object
>> atomically.  If he accesses it once as (aligned_int128_t *)
>> and once as (int128_t *) it's his fault, no?
>
>I'm not sure why this should be a requirement.  E.g. if we have a
>set of subroutines that operate (correctly) on any int128_t *,
>aligned or not, and have one user of those routines that actually
>locally has an aligned_int128_t, then that user could no longer
>safely pass a pointer to its aligned variable to that
>subsystem if it also does atomic operations locally ...
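
To make that scenario concrete, here is a minimal sketch (aligned_int128_t is spelled out as a typedef matching the discussion; subsystem_inc and local_inc are invented names for illustration):

  typedef __int128 aligned_int128_t __attribute__ ((aligned (16)));

  /* Subsystem routine: works on any __int128 *, aligned or not.  The
     compiler cannot prove 16-byte alignment here, so on s390 it emits
     a libatomic call, whose _16 entry checks alignment at run time.  */
  void
  subsystem_inc (__int128 *p)
  {
    __atomic_fetch_add (p, 1, __ATOMIC_SEQ_CST);
  }

  /* Local user: alignment is provable, so the compiler may inline the
     16-byte instruction.  Passing &counter to subsystem_inc stays safe
     only because the library's run-time check picks the instruction
     path for objects that are in fact aligned.  */
  aligned_int128_t counter;

  void
  local_inc (void)
  {
    __atomic_fetch_add (&counter, 1, __ATOMIC_SEQ_CST);
  }
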
>
>> If we'd document that the user invokes undefined behavior
>> when performing __builtin_atomic () on objects that are not
>> sufficiently aligned according to target-specific needs then
>> we are of course fine and should simply assume the memory
>> is aligned accordingly (similar to your patch but probably with
>> some target hook specifying the atomic alignment requirement
>> when it differs from mode alignment).  But I don't read the
>> documentation of our atomic builtins that way.
>>
>> Does _Atomic __int128_t work properly on s390?
>
>Yes, it currently does work properly in all cases (just not in
>all cases as efficiently as it could be).
>
>The rule to perform atomic operations on __int128_t on s390 is:
> - If the object is *actually* 16-byte aligned at runtime, then
>   every atomic access must be performed using one of the atomic
>   instructions (CDSG, LPQ, STPQ).
> - If the object is actually *not* 16-byte aligned, then every
>   atomic access must be performed under protection of an
>   out-of-line lock.
>
>This rule is correctly implemented by:
> - The __builtin_*_16 family of libatomic library routines;
>   these all perform a run-time alignment check and use either
>   the instruction or the lock, as appropriate; and
> - Compiler-generated inline code;
>   this will always use the instruction, but the compiler will
>   generate it only if it can prove at compile time that the
>   object *must* be aligned at run time.
>
>[ However, this rule is *not* implemented by the _n family of
>libatomic library routines.  That is not a problem at the moment,
>since those will *never* get called on any object of size 16;
>but they would be if we implemented your proposal; that's why
>I pointed out that this would then *introduce* unsafe behavior. ]
>
>The original point of why I started this discussion has nothing to
>do with safety -- code is currently safe and would remain safe
>after this change -- but simply performance: we would get compiler-
>generated inline code more frequently than we do today.
>
>(This is not *just* performance but also convenience, since we
>can avoid having to explicitly link in libatomic in many cases
>as well.  This would help with portability of some Makefiles
>etc. to s390, where they don't link in libatomic since it is
>not needed on x86 ...)
>
>> > This is why the _16 version of the library routine does the runtime
>> > alignment check, so that all accesses to actually 16-byte aligned
>> > objects use the instruction, both in the library and in compiler-
>> > generated code.  The _n version doesn't do that.
>> >
>> > So I guess we'd have to convert the _n version back to the _16
>> > library call if the size is constant 16, or something?
>>
>> If it is aligned 16?  But not sure - see above.
>
>No, it would have to be done no matter whether it is aligned or not
>(i.e. whether that can be proven at compile time); the point is we
>must call the _16 version of the library routine so that it can do
>the run-time alignment check and perform the appropriate action.
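
That run-time check is what makes calling the _16 routine safe in either case; it amounts to roughly the following dispatch (an illustrative sketch, not the actual libatomic sources -- both helper names are invented):

  #include <stdint.h>

  /* Hypothetical helpers standing in for the two implementation paths.  */
  extern __int128 load_via_lpq (const volatile __int128 *p);
  extern __int128 load_under_lock (const volatile __int128 *p);

  __int128
  atomic_load_16_sketch (const volatile __int128 *p)
  {
    if (((uintptr_t) p & 15) == 0)
      return load_via_lpq (p);     /* actually aligned: use the instruction */
    return load_under_lock (p);    /* unaligned: take the out-of-line lock */
  }
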
>> > > So yes, I'd say try tackling the issue in the frontend which is the
>> > > last to know the alignment in the most optimistic way (according
>> > > to the C standard).  At RTL expansion time we have to be (very)
>> > > conservative.
>> > >
>> > > Btw, the alternative is to add an extra alignment parameter
>> > > to all of the atomic builtins where the FE could stick the alignment
>> > > for us to use during RTL expansion.  But given we have the
>> > > _{1,2,4,8,16,N} variants it sounds easier to use those...
>> > >
>> > > Note we only document __atomic_load and __atomic_load_n so
>> > > the _{1,2,4,8,16} variants seem to be an implementation detail.
>> >
>> > Adding the alignment parameter would work, I think.
>>
>> But then the user still needs to access the int128_t with
>> consistent alignment, no?  Otherwise it will again not protect
>> against each other?
>
>See above, it will still work *correctly* in either case.  The
>alignment info simply allows the compiler to safely choose the
>fast (inlined) variant.
>
>> Why not always use the slow & correct variant for __int128?
>
>As above, both slow & fast variants are correct; it's just that
>the fast variant is, well, faster and more convenient.
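
Concretely, the difference only shows up in what the compiler can prove; for instance (a sketch assuming s390's default 8-byte alignment for __int128, as described above; load_plain and load_fast are invented names):

  __int128 plain;                                /* default 8-byte alignment on s390 */
  __int128 fast __attribute__ ((aligned (16)));  /* provably 16-byte aligned */

  __int128
  load_plain (void)
  {
    /* Alignment not provable: expands to a libatomic call (slow, correct).  */
    return __atomic_load_n (&plain, __ATOMIC_SEQ_CST);
  }

  __int128
  load_fast (void)
  {
    /* Alignment provable: may expand to the inline instruction (fast, correct).  */
    return __atomic_load_n (&fast, __ATOMIC_SEQ_CST);
  }
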
So the compiler already does that correctly then via get_pointer_alignment.  Why not inline the runtime alignment check as well, plus the fast case and the library call fallback?

Richard.

>Bye,
>Ulrich
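
Inlining the check, the fast case, and the fallback would amount to emitting something like the following for a 16-byte atomic load whose alignment cannot be proven (illustrative C for the expanded code, not actual GCC output; atomic_load_lpq is an invented stand-in for the inline instruction path):

  #include <stdint.h>

  /* Hypothetical helper standing in for the inline LPQ instruction path.  */
  extern __int128 atomic_load_lpq (const volatile __int128 *p);

  __int128
  expanded_load (const volatile __int128 *p, int model)
  {
    if (((uintptr_t) p & 15) == 0)
      return atomic_load_lpq (p);        /* fast case, inlined */
    return __atomic_load_16 (p, model);  /* library fallback: rechecks
                                            alignment and takes the lock */
  }
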