On Tue, 31 Jul 2018, Martin Sebor wrote:

> On 07/31/2018 09:48 AM, Jakub Jelinek wrote:
> > On Tue, Jul 31, 2018 at 09:17:52AM -0600, Martin Sebor wrote:
> > > On 07/31/2018 12:38 AM, Jakub Jelinek wrote:
> > > > On Mon, Jul 30, 2018 at 09:45:49PM -0600, Martin Sebor wrote:
> > > > > Even without _FORTIFY_SOURCE GCC diagnoses (some) writes past
> > > > > the end of subobjects by string functions.  With _FORTIFY_SOURCE=2
> > > > > it calls abort.  This is the default on popular distributions,
> > > > 
> > > > Note that _FORTIFY_SOURCE=2 is the mode that goes beyond what the
> > > > standard requires and imposes extra requirements.  So from what this
> > > > mode accepts or rejects we shouldn't determine what is or isn't
> > > > considered valid.
> > > 
> > > I'm not sure what the additional requirements are but the ones
> > > I am referring to are the enforcing of struct member boundaries.
> > > This is in line with the standard requirements of not accessing
> > > [sub]objects via pointers derived from other [sub]objects.
> > 
> > In the middle-end the distinction between what was originally a reference
> > to subobjects and what was a reference to objects is quickly lost
> > (whether through SCCVN or other optimizations).
> > We've run into this many times with the __builtin_object_size already.
> > So, if e.g.
> > struct S { char a[3]; char b[5]; } s = { "abc", "defg" };
> > ...
> > strlen ((char *) &s) is well defined but
> > strlen (s.a) is not in C, for the middle-end you might not figure out which
> > one is which.
> 
> Yes, I'm aware of the middle-end transformation to MEM_REF
> -- it's one of the reasons why detecting invalid accesses
> by the middle end warnings, including -Warray-bounds,
> -Wformat-overflow, -Wsprintf-overflow, and even -Wrestrict,
> is less than perfect.
> 
> But is strlen(s.a) also meant to be well-defined in the middle
> end (with the semantics of computing the length of "abcdefg")?

Yes.

> And if so, what makes it well defined?

The fact that strlen takes a char * argument and thus inline-expansion
of a trivial implementation like

 int len = 0;
 for (; *p; ++p)
   ++len;

will have

 p = &s.a;

and the middle-end doesn't reconstruct s.a[..] from the pointer
access.
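
As a minimal sketch of the pointer-based form (the names ptr_strlen and
whole_object_length are hypothetical, chosen only for illustration), the
loop below sees nothing but a char *.  It is called here with
(const char *) &s, the case that is well defined in C; calling it with
s.a is the case discussed in this thread, which the middle-end treats
the same way but which the standard does not guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Struct from the example earlier in the thread.  "abc" fills a[]
   exactly, so a[] has no terminating NUL of its own (valid in C). */
struct S { char a[3]; char b[5]; };
struct S s = { "abc", "defg" };

/* Hypothetical helper: a trivial strlen that only ever sees a char *.
   Nothing in the loop records which subobject p was derived from. */
size_t ptr_strlen(const char *p)
{
    size_t len = 0;
    for (; *p; ++p)
        ++len;
    return len;
}

/* Well defined in C: the pointer is derived from the whole object, so
   the walk covers "abc" followed by "defg" up to the NUL in b[]. */
size_t whole_object_length(void)
{
    size_t n = ptr_strlen((const char *) &s);
    assert(n == sizeof s.a + 4);   /* 3 bytes of a[] + 4 of "defg" */
    return n;
}
```

Replacing (const char *) &s with s.a gives the contested variant; the
sketch deliberately does not assert anything about that call.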

> 
> Certainly not every "strlen" has these semantics.  For example,
> this open-coded one doesn't:
> 
>   int len = 0;
>   for (int i = 0; s.a[i]; ++i)
>     ++len;
> 
> It computes 2 (with no warning for the out-of-bounds access).

Yes.

> So if the standard doesn't guarantee it and different kinds
> of accesses behave differently, how do we explain what "works"
> and what doesn't without relying on GCC implementation details?

In the middle-end, accesses via pointers (that is, accesses where the
access path is not visible in the access itself) are not constrained
by the access path through which the pointer was built.

> If we can't then the only language we have in common with users
> is the standard.  (This, by the way, is what the C memory model
> group is trying to address -- the language or feature that's
> missing from the standard that says when, if ever, these things
> might be valid.)

Well, you simply must not compare apples and oranges: strlen on the
one hand, and a "strlen" implementation that isn't a strlen
implementation on the other.

Richard.
