Re: [PATCH] Make strlen range computations more conservative

Jeff Law Sun, 05 Aug 2018 08:49:55 -0700

On 08/05/2018 12:51 AM, Bernd Edlinger wrote:
> On 08/04/18 22:52, Martin Sebor wrote:
>> On 08/03/2018 01:43 AM, Jakub Jelinek wrote:
>>> On Thu, Aug 02, 2018 at 09:59:13PM -0600, Martin Sebor wrote:
>>>>> If I call this with foo (2, 1), do you still claim it is not valid C?
>>>>
>>>> String functions like strlen operate on character strings stored
>>>> in character arrays.  Calling strlen (&s[1]) is invalid because
>>>> &s[1] is not the address of a character array.  The fact that
>>>> objects can be represented as arrays of bytes doesn't change
>>>> that.  The standard may be somewhat loose with words on this
>>>> distinction but the intent certainly isn't for strlen to traverse
>>>> arbitrary sequences of bytes that cross subobject boundaries.
>>>> (That is the intent behind the raw memory functions, but
>>>> the current text doesn't make the distinction clear.)
>>>
>>> But the standard doesn't say that right now.
>>
>> It does, in the restriction on multi-dimensional array accesses.
>> Given the array 'char a[2][2];' it's only valid to access a[0][0]
>> and a[0][1], and a[1][0], and a[1][1].  It's not valid to access
>> a[2][0] or a[2][1], even though they happen to be located at
>> the same addresses as a[1][0] and a[1][1].
>>
>> There is no exception for distinct struct members.  So in
>> a struct { char a[2], b[2]; }, even though a and b are laid
>> out the same way as char[2][2] would be, it's not valid to
>> treat a as such.  There is no distinction between array
>> subscripting and pointer arithmetic, so it doesn't matter
>> what form the access takes.
>>
>> Yes, the standard could be clearer.  There probably even are
>> ambiguities and contradictions (the authors of the Object Model
>> proposal believe there are and are trying to clarify/remove
>> them).  But the intent is clearly there.  It's especially
>> important for adjacent members of different types (say a char[8]
>> followed by a function pointer.  We definitely don't want writes
>> to the array to be allowed to change the function pointer.)
>>
>>> Plus, at least from the middle-end POV, there is also the case of
>>> placement new and stores changing the dynamic type of the object,
>>> previously say a struct with two fields, then a placement new with a single
>>> char array over it (the placement new will not survive in the middle-end, so
>>> it will be just a memcpy or strcpy or some other byte copy over the original
>>> object, and because pointer-to-pointer conversions are useless in the
>>> middle-end, passes such as CSE/SCCVN can leave you with a pointer to the
>>> struct with two fields rather than a pointer to the char array).
>>
>> There may be challenges in the middle-end, you would know much
>> better than me.  All I'm saying is that it's not valid to access
>> [sub]objects by dereferencing pointers to other subobjects.  All
>> the examples in this discussion have been of that form.
>>
> 
> These examples do not aim to be valid C, they just point out limitations
> of the middle-end design, and a good deal of the problems are due
> to trying to do things that are not safe within the boundaries given
> by the middle-end design.

I really think this is important -- and as such I think we need to move
away from trying to describe scenarios in C because doing so keeps
bringing us back to the "C doesn't allow XYZ" kinds of arguments when
what we're really discussing are GIMPLE semantic issues.


So examples should be GIMPLE.  You might start with (possibly invalid) C
code to generate the GIMPLE, but the actual discussion needs to be
looking at GIMPLE.  We might include the C code in case someone wants to
look at things in a debugger, but bringing the focus to GIMPLE is really
important here.
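
A minimal sketch of that workflow, for illustration only. The type name
`pair` and the function name `len_of_a` are hypothetical and were not
posted in this thread. The struct mirrors the `char a[2], b[2]` layout
discussed above, and the C here is deliberately valid, so it only serves
as a starting point for producing GIMPLE to look at.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type: two adjacent char arrays, as in the struct
   discussed in the thread.  */
struct pair { char a[2], b[2]; };

/* Hypothetical function: strlen applied to a single member.
   Callers are assumed to pass a NUL-terminated p->a.  */
size_t
len_of_a (const struct pair *p)
{
  return strlen (p->a);
}
```

Compiling this with something like `gcc -O2 -fdump-tree-optimized -c`
writes a GIMPLE dump next to the object file. What the dump contains
depends on the GCC version and options, so treat it as input to the
discussion rather than as a statement of the semantics.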

jeff
