Re: [PATCH] Add a character size parameter to c_strlen/get_range_strlen

Bernd Edlinger Mon, 20 Aug 2018 23:43:56 -0700

On 08/21/18 01:23, Martin Sebor wrote:
> On 08/20/2018 04:17 PM, Jeff Law wrote:
>> On 08/18/2018 11:38 AM, Martin Sebor wrote:
>>> On 08/17/2018 09:32 PM, Jeff Law wrote:
>>>> On 08/17/2018 02:17 PM, Martin Sebor wrote:
>>>>> On 08/17/2018 12:44 PM, Bernd Edlinger wrote:
>>>>>> On 08/17/18 20:23, Martin Sebor wrote:
>>>>>>> On 08/17/2018 06:14 AM, Joseph Myers wrote:
>>>>>>>> On Fri, 17 Aug 2018, Jeff Law wrote:
>>>>>>>>
>>>>>>>>> On 08/16/2018 05:01 PM, Joseph Myers wrote:
>>>>>>>>>> On Thu, 16 Aug 2018, Jeff Law wrote:
>>>>>>>>>>
>>>>>>>>>>> restores previous behavior.  The sprintf bits want to count
>>>>>>>>>>> element
>>>>>>>>>>> sized chunks, which for wchars is 4 bytes (that count will then be
>>>>>>>>>>
>>>>>>>>>>>    /* Compute the range the argument's length can be in.  */
>>>>>>>>>>> -  fmtresult slen = get_string_length (arg);
>>>>>>>>>>> +  int count_by = dir.specifier == 'S' || dir.modifier ==
>>>>>>>>>>> FMT_LEN_l ? 4 : 1;
>>>>>>>>>>
>>>>>>>>>> I don't see how a hardcoded 4 is correct here.  Surely you need to
>>>>>>>>>> example
>>>>>>>>>> wchar_type_node to determine its actual size for this target.
>>>>>>>>> We did kick this around a little.  IIRC Martin didn't think that it
>>>>>>>>> was
>>>>>>>>> worth handling the 2 byte wchar case.
>>>>>>>
>>>>>>> Sorry, I think we may have miscommunicated -- I didn't think it
>>>>>>> was useful to pass a size of the character type to the function.
>>>>>>> I agree that passing in a hardcoded constant doesn't seem right
>>>>>>> (even if GCC's wchar_t were always 4 bytes wide).
>>>>>>>
>>>>>>> I'm still not sure I see the benefit of passing in the expected
>>>>>>> element size given that the patch causes c_strlen() fail when
>>>>>>> the actual element size doesn't match ELTSIZE.  If the caller
>>>>>>> asks for the number of bytes in a const wchar_t array it should
>>>>>>> get back the number bytes.  (I could see it fail if the caller
>>>>>>> asked for the number of words in a char array whose size was
>>>>>>> not evenly divisible by wordsize.)
>>>>>>>
>>>>>>
>>>>>> I think in this case c_strlen should use the type which the %S format
>>>>>> uses at runtime, otherwise it will not have anything to do with
>>>>>> the reality.
>>>>>
>>>>> %S is not what I'm talking about.
>>>>>
>>>>> Failing in the case I described (a caller asking for the size
>>>>> in bytes of a constant object whose type is larger) prevents
>>>>> callers that don't care about the type from detecting the actual
>>>>> size of a constant.
>>>>>
>>>>> Specifically for sprintf, it means that the buffer overflow
>>>>> below is no longer diagnosed:
>>>>>
>>>>>   struct S { char a[2]; void (*pf)(void); };
>>>>>
>>>>>   void f (struct S *p)
>>>>>   {
>>>>>     const char *q = (char*)L"\x41424344\x45464748";
>>>>>
>>>>>     sprintf (p->a, "%s", q);
>>>>>   }
>>>> I don't think this is in the testsuite, is it?  I verified that there
>>>> was no regressions when I installed Bernd's patch and when I installed
>>>> yours.
>>>
>>> No, there are very few tests that exercise these kinds of mixed
>>> argument types.  Code like that is most likely the result of
>>> a mistake, but it's not the kind of a bug I had even thought
>>> about until some of the codegen issues with mixed argument types
>>> were brought up (PR 86711/86714).
>> Phew.  I was worried I'd somehow missed the failure or tested the wrong
>> patch or who knows what.
>>
>> Can you add that test, xfailed for now to the testsuite?
> 
> Done in r263676.
>


gcc -S -O2 -Wall -Wformat-overflow -ftrack-macro-expansion=0 -fshort-wchar 
builtin-sprintf-warn-20.c
builtin-sprintf-warn-20.c: In function 'test':
builtin-sprintf-warn-20.c:19:39: warning: hex escape sequence out of range
19 |     ? (char*)L"\x4142\x4344" : (char*)L"\x41424344\x45464748";
    |                                       ^~~~~~~~~~~~~~~~~~~~~~~

Hmm, this test might create some noise on short-wchar targets.

I would prefer a warning here, about the wrong type of the parameter.
The buffer overflow is only a secondary thing.

For constant objects like those, the GIMPLE type is still guaranteed to be 
reliable,
right?


Bernd.

>>> I would just like the ability to get the length/size somehow
>>> so that the questionable code that started us down this path
>>> can be detected.  This is not just the sprintf(d, "%s", L"1")
>>> kind of a mismatch but also the missing nul detection in cases
>>> like:
>> [ ... ]
>> I know.  And ideally we'll be able to handle everything you want to.
>> But we also have to realize that sometimes we may have to punt.
>>
>> Also remember that incremental progress is (usually) good.
>>
>>
>>>
>>>   const wchar_t a[1] = L"\xffffffff";
>>>   strcpy(d, (char*)a);
>> This touches on both the representational issue (excess elements in the
>> initializer) and how to handle a non-terminated string.  Both are issues
>> we're debating right now.
>>
>>>
>>> I don't think this is necessarily an important use case to
>>> optimize for but it is one that GCC optimizes already nonetheless,
>>> and always has.  For example, this has always been folded into 4
>>> by GCC and still is even with the patch:
>>>
>>>   const __WCHAR_TYPE__ wc[] = L"\x12345678";
>>>
>>>   int f (void)
>>>   {
>>>     const char *p = (char*)wc;
>>>     return strlen (p);            // folded to 4
>>>   }
>>>
>>> It's optimized because fold_const_call() relies on c_getstr()
>>> which returns the underlying byte representation of the wide
>>> string, and despite c_strlen() now trying to prevent it.
>> And I think you're hitting on some of issues already raised in the thread.
>>
>> In this specific case though ISTM 4 is the right answer.  We casted to a
>> char *, so that's what we should be counting.  Or am I missing
>> something?  Also note that's the value we'd get from the strlen C
>> library call IIUC.
> 
> It is the right answer.  My point is that this is optimized
> but the change to c_strlen() prevents the same optimization
> in other similar cases.  For example, GCC 6 folds this into
> memcpy:
> 
>    __builtin_strcpy (d, (char*)L"\x12345678");
> 
> GCC 7 and 8 do too but get the byte count wrong (my bad).
> 
> Current trunk doesn't optimize it.  If restoring the original
> behavior is the intent (and not just fixing the counting bug)
> then c_strlen() should be fixed to fold this again.
> 
> (I'm not a fan of the strcpy to memcpy transformation because
> it makes other things difficult but that's beside the point.)
> 
>>> The missing nul variant of the same test case isn't folded (it
>>> ends up calling the library strlen) but the bug cannot be
>>> detected using the modified c_strlen():
>>>
>>>   const __WCHAR_TYPE__ wc[1] = L"\x12345678";   // no nul
>>>
>>>   int f (void)
>>>   {
>>>     const char *p = (char*)wc;
>>>     return strlen (p);            // not folded
>>>   }
>>>
>>> To detect it, I'd have to use some other function, like
>>> c_getstr().  I could certainly do that but I don't think
>>> I should need to.
>> And that's on the list of things we're going to try and address next.  I
>> don't think it needs to be conflated with change which indicated to
>> c_strlen how to count.
> 
> The two are one and the same: if c_strlen() doesn't count bytes
> in arrays of wide characters (wchar_t, char16_t, or char32_t)
> then the original GCC behavior is not restored and the missing
> nul detection won't work for these "mixed type" cases.
> 
>> I do hope you're getting all these testcases recorded.  I'd even support
>> installing them into the suite now, properly xfailed.  It's easier to
>> see how any patch under consideration affects the wider set of tests.
> 
> I haven't been recording any of them --  I have no idea where
> this is going.  As I've said, these patches prevent some of
> the work I have already submitted.  That's why I have been
> objecting to them, including adding the ELTSIZE argument to
> c_strlen().
> 
> Martin

Re: [PATCH] Add a character size parameter to c_strlen/get_range_strlen

Reply via email to