On Tue, 31 Jul 2018, Martin Sebor wrote:

> On 07/31/2018 09:48 AM, Jakub Jelinek wrote:
> > On Tue, Jul 31, 2018 at 09:17:52AM -0600, Martin Sebor wrote:
> > > On 07/31/2018 12:38 AM, Jakub Jelinek wrote:
> > > > On Mon, Jul 30, 2018 at 09:45:49PM -0600, Martin Sebor wrote:
> > > > > Even without _FORTIFY_SOURCE GCC diagnoses (some) writes past
> > > > > the end of subobjects by string functions. With _FORTIFY_SOURCE=2
> > > > > it calls abort. This is the default on popular distributions,
> > > >
> > > > Note that _FORTIFY_SOURCE=2 is the mode that goes beyond what the
> > > > standard requires, imposes extra requirements. So from what this mode
> > > > accepts or rejects we shouldn't determine what is or isn't considered
> > > > valid.
> > >
> > > I'm not sure what the additional requirements are but the ones
> > > I am referring to are the enforcing of struct member boundaries.
> > > This is in line with the standard requirements of not accessing
> > > [sub]objects via pointers derived from other [sub]objects.
> >
> > In the middle-end the distinction between what was originally a reference
> > to subobjects and what was a reference to objects is quickly lost
> > (whether through SCCVN or other optimizations).
> > We've run into this many times with the __builtin_object_size already.
> > So, if e.g.
> >   struct S { char a[3]; char b[5]; } s = { "abc", "defg" };
> >   ...
> > strlen ((char *) &s) is well defined but
> > strlen (s.a) is not in C, for the middle-end you might not figure out which
> > one is which.
>
> Yes, I'm aware of the middle-end transformation to MEM_REF
> -- it's one of the reasons why detecting invalid accesses
> by the middle end warnings, including -Warray-bounds,
> -Wformat-overflow, -Wsprintf-overflow, and even -Wrestrict,
> is less than perfect.
>
> But is strlen(s.a) also meant to be well-defined in the middle
> end (with the semantics of computing the length of "abcdefg"?)
Yes.

> And if so, what makes it well defined?

The fact that strlen takes a char * argument, and thus inline-expansion
of a trivial implementation like

  int len = 0;
  for (; *p; ++p)
    ++len;

will have

  p = &s.a;

and the middle-end doesn't reconstruct s.a[..] from the pointer access.

>
> Certainly not every "strlen" has these semantics. For example,
> this open-coded one doesn't:
>
>   int len = 0;
>   for (int i = 0; s.a[i]; ++i)
>     ++len;
>
> It computes 2 (with no warning for the out-of-bounds access).

Yes.

> So if the standard doesn't guarantee it and different kinds
> of accesses behave differently, how do we explain what "works"
> and what doesn't without relying on GCC implementation details?

In the middle-end, accesses via pointers (accesses where the access
path is not visible in the access itself) are not constrained by
the "access" path through which the pointer was built.

> If we can't then the only language we have in common with users
> is the standard. (This, by the way, is what the C memory model
> group is trying to address -- the language or feature that's
> missing from the standard that says when, if ever, these things
> might be valid.)

Well, you simply have to not compare apples and oranges: a "strlen"
implementation that isn't really a strlen implementation, and strlen
itself.

Richard.
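The following is a minimal C sketch of the distinction discussed in this thread, not code from the thread itself. It assumes GCC lays out struct S with no padding between a and b, and the static_assert below checks that. The helpers trivial_strlen, whole_object_len and bounded_index_len are hypothetical names introduced only for illustration.

- trivial_strlen mirrors the pointer-walking loop Richard describes. Every access goes through p alone, so no access names s.a.
- whole_object_len applies both that loop and strlen to a pointer derived from the whole object s, which the thread describes as well defined.
- bounded_index_len is the open-coded s.a[i] loop with an explicit bound added so that it stays well defined.

The undefined strlen (s.a) case is deliberately never called.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct S { char a[3]; char b[5]; };

/* Assumption: no padding between a and b, so the bytes of s are
   'a' 'b' 'c' 'd' 'e' 'f' 'g' '\0' ("abc" fills a without a NUL). */
static_assert(offsetof(struct S, b) == 3,
              "example assumes no padding between a and b");

/* Hypothetical helper: the "trivial implementation" of strlen.  Every
   access goes through p, so the access itself does not name s.a. */
static size_t trivial_strlen(const char *p)
{
    size_t len = 0;
    for (; *p; ++p)
        ++len;
    return len;
}

/* Hypothetical helper: length through a pointer to the whole object s,
   the case described in the thread as well defined. */
size_t whole_object_len(void)
{
    struct S s = { "abc", "defg" };
    size_t n = strlen((char *) &s);
    /* The open-coded pointer walk agrees with the library strlen. */
    assert(trivial_strlen((char *) &s) == n);
    return n;
}

/* Hypothetical helper: the indexed loop over s.a, with an explicit bound
   added so it stays well defined.  Every access names s.a, so it never
   reads past a[2] even though a holds no terminating NUL. */
size_t bounded_index_len(void)
{
    struct S s = { "abc", "defg" };
    size_t len = 0;
    for (size_t i = 0; i < sizeof s.a && s.a[i]; ++i)
        ++len;
    return len;
}
```

The two helpers give different answers for the same bytes. whole_object_len() returns 7 (the length of "abcdefg"), while bounded_index_len() returns 3. The result depends on whether the access path names s.a or only a char pointer.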