On 08/04/18 22:52, Martin Sebor wrote:
> On 08/03/2018 01:43 AM, Jakub Jelinek wrote:
>> On Thu, Aug 02, 2018 at 09:59:13PM -0600, Martin Sebor wrote:
>>>> If I call this with foo (2, 1), do you still claim it is not valid C?
>>>
>>> String functions like strlen operate on character strings stored
>>> in character arrays.  Calling strlen (&s[1]) is invalid because
>>> &s[1] is not the address of a character array.  The fact that
>>> objects can be represented as arrays of bytes doesn't change
>>> that.  The standard may be somewhat loose with words on this
>>> distinction but the intent certainly isn't for strlen to traverse
>>> arbitrary sequences of bytes that cross subobject boundaries.
>>> (That is the intent behind the raw memory functions, but
>>> the current text doesn't make the distinction clear.)
>>
>> But the standard doesn't say that right now.
>
> It does, in the restriction on multi-dimensional array accesses.
> Given the array 'char a[2][2];' it's only valid to access a[0][0]
> and a[0][1], and a[1][0], and a[1][1].  It's not valid to access
> a[0][2] or a[0][3], even though they happen to be located at
> the same addresses as a[1][0] and a[1][1].
>
> There is no exception for distinct struct members.  So in
> a struct { char a[2], b[2]; }, even though a and b are laid
> out the same way as char[2][2] would be, it's not valid to
> treat a as such.  There is no distinction between array
> subscripting and pointer arithmetic, so it doesn't matter
> what form the access takes.
>
> Yes, the standard could be clearer.  There probably even are
> ambiguities and contradictions (the authors of the Object Model
> proposal believe there are and are trying to clarify/remove
> them).  But the intent is clearly there.  It's especially
> important for adjacent members of different types (say a char[8]
> followed by a function pointer).  We definitely don't want writes
> to the array to be allowed to change the function pointer.
>
>> Plus, at least from the middle-end POV, there is also the case of
>> placement new and stores changing the dynamic type of the object,
>> previously say a struct with two fields, then a placement new with a
>> single char array over it (the placement new will not survive in the
>> middle-end, so it will be just a memcpy or strcpy or some other byte
>> copy over the original object, and because pointer-to-pointer
>> conversions are useless in the middle-end, CSE/SCCVN etc. mean you
>> can see a pointer to the struct with two fields rather than a pointer
>> to the char array).
>
> There may be challenges in the middle-end, you would know much
> better than me.  All I'm saying is that it's not valid to access
> [sub]objects by dereferencing pointers to other subobjects.  All
> the examples in this discussion have been of that form.
>
These examples do not aim to be valid C; they just point out
limitations of the middle-end design.  A good deal of the problems
come from trying to do things that are not safe within the boundaries
given by the middle-end design.

Bernd.

>>
>> Consider e.g.
>> typedef __typeof__ (sizeof 0) size_t;
>> void *operator new (size_t, void *p) { return p; }
>> void *operator new[] (size_t, void *p) { return p; }
>> struct S { char a; char b[64]; };
>> void baz (char *);
>>
>> size_t
>> foo (S *p)
>> {
>>   baz (&p->a);
>>   char *q = new (p) char [16];
>>   baz (q);
>>   return __builtin_strlen (q);
>> }
>>
>> I don't think it is correct to say that strlen must be 0.  In this
>> testcase the pointer passed to strlen is still S *, though I think
>> with enough tweaking you could also have something where the argument
>> is &p->a.
>
> I think the problem here is changing the type of p->a.  I'm
> not up on the latest C++ changes here but I think it's a known
> problem with the specification.  A similar (known) problem also
> comes in the case of dynamically allocated objects:
>
>   char *p = (char*)operator new (2);
>   char *p1 = new (p) char ('a');
>   char *p2 = new (p + 1) char ('\0');
>   strlen (p1);
>
> Is the strlen(p1) call valid when there's no string or array
> at p1?  There is a singleton char object that just happens
> to be followed by another singleton char object.  It's not
> an array of two elements.  Each is [an array of] one char.
>
> This is a (specification) problem for sequence containers like
> vector where, strictly speaking, it's not valid to iterate over
> them because of the array restriction.
>
>>
>> I have no problem for strlen to return 0 if it sees a toplevel object
>> of size 1, but note that if it is extern, it already might be a
>> problem in some cases:
>> struct T { char a; char a2[]; } b;
>> extern struct T c;
>> void foo (int *p) { p[0] = strlen (b); p[1] = strlen (c); }
>> If c's definition is struct T c = { ' ', "abcde" };
>> then the object doesn't have length of 1.
>
> I'm assuming above you meant strlen(&b) and strlen(&c) (or
> equivalently, strlen(&b.a) and strlen(&c.a)).  If so, it's
> the same problem.  The strlen call is invalid unless b.a and
> c.a are nul.
>
> Martin
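
For readers following the struct { char a[2], b[2]; } case quoted
above, here is a minimal sketch of the distinction being argued: a
string access bounded to one member, versus a raw memory access over
the whole object.  The names pair, member_strlen and object_strnlen
are hypothetical and do not come from the thread or from GCC, and the
sketch assumes the struct has no padding, which the standard does not
guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout from the quoted discussion: two adjacent arrays. */
struct pair {
    char a[2];
    char b[2];
};

/* Hypothetical helper: length of the string held in a single member.
   memchr is given the member's own size, so the scan never crosses
   into the neighbouring subobject. */
static size_t member_strlen(const char *member, size_t member_size)
{
    assert(member != NULL);
    const char *nul = memchr(member, '\0', member_size);
    return nul ? (size_t)(nul - member) : member_size;
}

/* Hypothetical helper: scan the whole object as raw bytes.  The pointer
   is derived from the struct itself, not from one of its members, and a
   raw memory function is used rather than a string function. */
static size_t object_strnlen(const struct pair *p)
{
    assert(p != NULL);
    const char *bytes = (const char *)p;
    const char *nul = memchr(bytes, '\0', sizeof *p);
    return nul ? (size_t)(nul - bytes) : sizeof *p;
}
```

With struct pair p = { {'x', 'y'}, {'z', '\0'} }, the call
member_strlen (p.a, sizeof p.a) stops at the end of a, while
object_strnlen (&p) continues into b.  A plain strlen (p.a) is the
form the quoted text calls invalid, because a holds no terminating
nul.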