On 08/21/18 01:23, Martin Sebor wrote: > On 08/20/2018 04:17 PM, Jeff Law wrote: >> On 08/18/2018 11:38 AM, Martin Sebor wrote: >>> On 08/17/2018 09:32 PM, Jeff Law wrote: >>>> On 08/17/2018 02:17 PM, Martin Sebor wrote: >>>>> On 08/17/2018 12:44 PM, Bernd Edlinger wrote: >>>>>> On 08/17/18 20:23, Martin Sebor wrote: >>>>>>> On 08/17/2018 06:14 AM, Joseph Myers wrote: >>>>>>>> On Fri, 17 Aug 2018, Jeff Law wrote: >>>>>>>> >>>>>>>>> On 08/16/2018 05:01 PM, Joseph Myers wrote: >>>>>>>>>> On Thu, 16 Aug 2018, Jeff Law wrote: >>>>>>>>>> >>>>>>>>>>> restores previous behavior. The sprintf bits want to count >>>>>>>>>>> element >>>>>>>>>>> sized chunks, which for wchars is 4 bytes (that count will then be >>>>>>>>>> >>>>>>>>>>> /* Compute the range the argument's length can be in. */ >>>>>>>>>>> - fmtresult slen = get_string_length (arg); >>>>>>>>>>> + int count_by = dir.specifier == 'S' || dir.modifier == >>>>>>>>>>> FMT_LEN_l ? 4 : 1; >>>>>>>>>> >>>>>>>>>> I don't see how a hardcoded 4 is correct here. Surely you need to >>>>>>>>>> example >>>>>>>>>> wchar_type_node to determine its actual size for this target. >>>>>>>>> We did kick this around a little. IIRC Martin didn't think that it >>>>>>>>> was >>>>>>>>> worth handling the 2 byte wchar case. >>>>>>> >>>>>>> Sorry, I think we may have miscommunicated -- I didn't think it >>>>>>> was useful to pass a size of the character type to the function. >>>>>>> I agree that passing in a hardcoded constant doesn't seem right >>>>>>> (even if GCC's wchar_t were always 4 bytes wide). >>>>>>> >>>>>>> I'm still not sure I see the benefit of passing in the expected >>>>>>> element size given that the patch causes c_strlen() fail when >>>>>>> the actual element size doesn't match ELTSIZE. If the caller >>>>>>> asks for the number of bytes in a const wchar_t array it should >>>>>>> get back the number bytes. (I could see it fail if the caller >>>>>>> asked for the number of words in a char array whose size was >>>>>>> not evenly divisible by wordsize.) >>>>>>> >>>>>> >>>>>> I think in this case c_strlen should use the type which the %S format >>>>>> uses at runtime, otherwise it will not have anything to do with >>>>>> the reality. >>>>> >>>>> %S is not what I'm talking about. >>>>> >>>>> Failing in the case I described (a caller asking for the size >>>>> in bytes of a constant object whose type is larger) prevents >>>>> callers that don't care about the type from detecting the actual >>>>> size of a constant. >>>>> >>>>> Specifically for sprintf, it means that the buffer overflow >>>>> below is no longer diagnosed: >>>>> >>>>> struct S { char a[2]; void (*pf)(void); }; >>>>> >>>>> void f (struct S *p) >>>>> { >>>>> const char *q = (char*)L"\x41424344\x45464748"; >>>>> >>>>> sprintf (p->a, "%s", q); >>>>> } >>>> I don't think this is in the testsuite, is it? I verified that there >>>> was no regressions when I installed Bernd's patch and when I installed >>>> yours. >>> >>> No, there are very few tests that exercise these kinds of mixed >>> argument types. Code like that is most likely the result of >>> a mistake, but it's not the kind of a bug I had even thought >>> about until some of the codegen issues with mixed argument types >>> were brought up (PR 86711/86714). >> Phew. I was worried I'd somehow missed the failure or tested the wrong >> patch or who knows what. >> >> Can you add that test, xfailed for now to the testsuite? > > Done in r263676. >
gcc -S -O2 -Wall -Wformat-overflow -ftrack-macro-expansion=0 -fshort-wchar builtin-sprintf-warn-20.c builtin-sprintf-warn-20.c: In function 'test': builtin-sprintf-warn-20.c:19:39: warning: hex escape sequence out of range 19 | ? (char*)L"\x4142\x4344" : (char*)L"\x41424344\x45464748"; | ^~~~~~~~~~~~~~~~~~~~~~~ Hmm, this test might create some noise on short-wchar targets. I would prefer a warning here, about the wrong type of the parameter. The buffer overflow is only a secondary thing. For constant objects like those, the GIMPLE type is still guaranteed to be reliable, right? Bernd. >>> I would just like the ability to get the length/size somehow >>> so that the questionable code that started us down this path >>> can be detected. This is not just the sprintf(d, "%s", L"1") >>> kind of a mismatch but also the missing nul detection in cases >>> like: >> [ ... ] >> I know. And ideally we'll be able to handle everything you want to. >> But we also have to realize that sometimes we may have to punt. >> >> Also remember that incremental progress is (usually) good. >> >> >>> >>> const wchar_t a[1] = L"\xffffffff"; >>> strcpy(d, (char*)a); >> This touches on both the representational issue (excess elements in the >> initializer) and how to handle a non-terminated string. Both are issues >> we're debating right now. >> >>> >>> I don't think this is necessarily an important use case to >>> optimize for but it is one that GCC optimizes already nonetheless, >>> and always has. For example, this has always been folded into 4 >>> by GCC and still is even with the patch: >>> >>> const __WCHAR_TYPE__ wc[] = L"\x12345678"; >>> >>> int f (void) >>> { >>> const char *p = (char*)wc; >>> return strlen (p); // folded to 4 >>> } >>> >>> It's optimized because fold_const_call() relies on c_getstr() >>> which returns the underlying byte representation of the wide >>> string, and despite c_strlen() now trying to prevent it. >> And I think you're hitting on some of issues already raised in the thread. >> >> In this specific case though ISTM 4 is the right answer. We casted to a >> char *, so that's what we should be counting. Or am I missing >> something? Also note that's the value we'd get from the strlen C >> library call IIUC. > > It is the right answer. My point is that this is optimized > but the change to c_strlen() prevents the same optimization > in other similar cases. For example, GCC 6 folds this into > memcpy: > > __builtin_strcpy (d, (char*)L"\x12345678"); > > GCC 7 and 8 do too but get the byte count wrong (my bad). > > Current trunk doesn't optimize it. If restoring the original > behavior is the intent (and not just fixing the counting bug) > then c_strlen() should be fixed to fold this again. > > (I'm not a fan of the strcpy to memcpy transformation because > it makes other things difficult but that's beside the point.) > >>> The missing nul variant of the same test case isn't folded (it >>> ends up calling the library strlen) but the bug cannot be >>> detected using the modified c_strlen(): >>> >>> const __WCHAR_TYPE__ wc[1] = L"\x12345678"; // no nul >>> >>> int f (void) >>> { >>> const char *p = (char*)wc; >>> return strlen (p); // not folded >>> } >>> >>> To detect it, I'd have to use some other function, like >>> c_getstr(). I could certainly do that but I don't think >>> I should need to. >> And that's on the list of things we're going to try and address next. I >> don't think it needs to be conflated with change which indicated to >> c_strlen how to count. > > The two are one and the same: if c_strlen() doesn't count bytes > in arrays of wide characters (wchar_t, char16_t, or char32_t) > then the original GCC behavior is not restored and the missing > nul detection won't work for these "mixed type" cases. > >> I do hope you're getting all these testcases recorded. I'd even support >> installing them into the suite now, properly xfailed. It's easier to >> see how any patch under consideration affects the wider set of tests. > > I haven't been recording any of them -- I have no idea where > this is going. As I've said, these patches prevent some of > the work I have already submitted. That's why I have been > objecting to them, including adding the ELTSIZE argument to > c_strlen(). > > Martin