Re: [PATCH] Make strlen range computations more conservative

Richard Biener Sun, 05 Aug 2018 10:27:58 -0700

On August 4, 2018 10:52:02 PM GMT+02:00, Martin Sebor <mse...@gmail.com> wrote:
>On 08/03/2018 01:43 AM, Jakub Jelinek wrote:
>> On Thu, Aug 02, 2018 at 09:59:13PM -0600, Martin Sebor wrote:
>>>> If I call this with foo (2, 1), do you still claim it is not valid C?
>>>
>>> String functions like strlen operate on character strings stored
>>> in character arrays.  Calling strlen (&s[1]) is invalid because
>>> &s[1] is not the address of a character array.  The fact that
>>> objects can be represented as arrays of bytes doesn't change
>>> that.  The standard may be somewhat loose with words on this
>>> distinction but the intent certainly isn't for strlen to traverse
>>> arbitrary sequences of bytes that cross subobject boundaries.
>>> (That is the intent behind the raw memory functions, but
>>> the current text doesn't make the distinction clear.)
>>
>> But the standard doesn't say that right now.
>
>It does, in the restriction on multi-dimensional array accesses.
>Given the array 'char a[2][2];' it's only valid to access a[0][0]
>and a[0][1], and a[1][0], and a[1][1].  It's not valid to access
>a[2][0] or a[2][1], even though they happen to be located at
>the same addresses as a[1][0] and a[1][1].
>
>There is no exception for distinct struct members.  So in
>a struct { char a[2], b[2]; }, even though a and b are laid
>out the same way as char[2][2] would be, it's not valid to
>treat a as such.  There is no distinction between array
>subscripting and pointer arithmetic, so it doesn't matter
>what form the access takes.


What does the standard say about comparing &s.a[2] and &s.b[0], and what does
that mean when you consider converting those to uintptr_t and back and then
accessing the data pointed to?
Points-to analysis considers the first pointer to point to both subobjects
while the second only to the second.  (Just pointing out another possible
inconsistency within GIMPLE's own handling of subobjects in points-to analysis.)
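
To make the question concrete, here is a minimal sketch in C.  The struct S,
the struct probe and the function probe_members are hypothetical names chosen
for illustration, and uintptr_t is assumed to be available (it is an optional
type).  The sketch only forms and compares the two pointers; it deliberately
does not dereference the one-past-the-end pointer, since whether that access
is valid is exactly what is being debated.

```c
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* uintptr_t (optional type, assumed available) */

/* Hypothetical struct with the layout discussed in the quoted text. */
struct S { char a[2], b[2]; };

struct probe {
    int adjacent;   /* 1 if b starts exactly where a ends (no padding) */
    int ptr_equal;  /* result of comparing the two pointers with == */
    int int_equal;  /* same comparison after conversion to uintptr_t */
};

struct probe probe_members(struct S *s)
{
    struct probe r;

    /* One past the end of a: valid to form and compare, not to dereference. */
    char *end_a = &s->a[2];
    /* First element of b. */
    char *begin_b = &s->b[0];

    r.adjacent = offsetof(struct S, b) == sizeof s->a;
    r.ptr_equal = end_a == begin_b;
    r.int_equal = (uintptr_t)end_a == (uintptr_t)begin_b;
    return r;
}
```

On common ABIs I would expect all three fields to be 1, which is what makes
the provenance question interesting: the two pointers are indistinguishable
by value even though they were derived from different subobjects.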

Richard. 

>Yes, the standard could be clearer.  There probably even are
>ambiguities and contradictions (the authors of the Object Model
>proposal believe there are and are trying to clarify/remove
>them).  But the intent is clearly there.  It's especially
>important for adjacent members of different types (say a char[8]
>followed by a function pointer).  We definitely don't want writes
>to the array to be allowed to change the function pointer.
>
>> Plus, at least from the middle-end POV, there is also the case of
>> placement new and stores changing the dynamic type of the object,
>> previously say a struct with two fields, then a placement new with a
>> single char array over it (the placement new will not survive in the
>> middle-end, so it will be just a memcpy or strcpy or some other byte
>> copy over the original object, and because pointer to pointer
>> conversions are useless in the middle-end, CSE/SCCVN etc. mean you can
>> see a pointer to the struct with two fields rather than a pointer to
>> the char array).
>
>There may be challenges in the middle-end, you would know much
>better than me.  All I'm saying is that it's not valid to access
>[sub]objects by dereferencing pointers to other subobjects.  All
>the examples in this discussion have been of that form.
>
>>
>> Consider e.g.
>> typedef __typeof__ (sizeof 0) size_t;
>> void *operator new (size_t, void *p) { return p; }
>> void *operator new[] (size_t, void *p) { return p; }
>> struct S { char a; char b[64]; };
>> void baz (char *);
>>
>> size_t
>> foo (S *p)
>> {
>>   baz (&p->a);
>>   char *q = new (p) char [16];
>>   baz (q);
>>   return __builtin_strlen (q);
>> }
>>
>> I don't think it is correct to say that strlen must be 0.  In this
>> testcase the pointer passed to strlen is still S *, though I think
>> with enough tweaking you could also have something where the argument
>> is &p->a.
>
>I think the problem here is changing the type of p->a.  I'm
>not up on the latest C++ changes here but I think it's a known
>problem with the specification.  A similar (known) problem also
>comes in the case of dynamically allocated objects:
>
>   char *p = (char*)operator new (2);
>   char *p1 = new (p) char ('a');
>   char *p2 = new (p) char ('\0');
>   strlen (p1);
>
>Is the strlen(p1) call valid when there's no string or array
>at p?  There is a singleton char object that just happens
>to be followed by another singleton char object.  It's not
>an array of two elements.  Each is [an array of] one char.
>
>This is a (specification) problem for sequence containers like
>vector where strictly speaking, it's not valid to iterate over
>them because of the array restriction.
>
>>
>> I have no problem for strlen to return 0 if it sees a toplevel object
>> of size 1, but note that if it is extern, it already might be a
>> problem in some cases:
>> struct T { char a; char a2[]; } b;
>> extern struct T c;
>> void foo (int *p) { p[0] = strlen (b); p[1] = strlen (c); }
>> If c's definition is struct T c = { ' ', "abcde" };
>> then the object doesn't have length of 1.
>
>I'm assuming above you meant strlen(&b) and strlen(&c) (or
>equivalently, strlen(&b.a) and strlen(&c.a)).  If so, it's
>the same problem.  The strlen call is invalid unless b.a and
>c.a are nul.
>
>Martin
