On 08/17/2018 09:32 PM, Jeff Law wrote:
On 08/17/2018 02:17 PM, Martin Sebor wrote:
On 08/17/2018 12:44 PM, Bernd Edlinger wrote:
On 08/17/18 20:23, Martin Sebor wrote:
On 08/17/2018 06:14 AM, Joseph Myers wrote:
On Fri, 17 Aug 2018, Jeff Law wrote:
On 08/16/2018 05:01 PM, Joseph Myers wrote:
On Thu, 16 Aug 2018, Jeff Law wrote:
restores previous behavior. The sprintf bits want to count element
sized chunks, which for wchars is 4 bytes (that count will then be
/* Compute the range the argument's length can be in. */
- fmtresult slen = get_string_length (arg);
+ int count_by = dir.specifier == 'S' || dir.modifier ==
FMT_LEN_l ? 4 : 1;
I don't see how a hardcoded 4 is correct here. Surely you need to
example
wchar_type_node to determine its actual size for this target.
We did kick this around a little. IIRC Martin didn't think that it
was
worth handling the 2 byte wchar case.
Sorry, I think we may have miscommunicated -- I didn't think it
was useful to pass a size of the character type to the function.
I agree that passing in a hardcoded constant doesn't seem right
(even if GCC's wchar_t were always 4 bytes wide).
I'm still not sure I see the benefit of passing in the expected
element size given that the patch causes c_strlen() fail when
the actual element size doesn't match ELTSIZE. If the caller
asks for the number of bytes in a const wchar_t array it should
get back the number bytes. (I could see it fail if the caller
asked for the number of words in a char array whose size was
not evenly divisible by wordsize.)
I think in this case c_strlen should use the type which the %S format
uses at runtime, otherwise it will not have anything to do with
the reality.
%S is not what I'm talking about.
Failing in the case I described (a caller asking for the size
in bytes of a constant object whose type is larger) prevents
callers that don't care about the type from detecting the actual
size of a constant.
Specifically for sprintf, it means that the buffer overflow
below is no longer diagnosed:
struct S { char a[2]; void (*pf)(void); };
void f (struct S *p)
{
const char *q = (char*)L"\x41424344\x45464748";
sprintf (p->a, "%s", q);
}
I don't think this is in the testsuite, is it? I verified that there
was no regressions when I installed Bernd's patch and when I installed
yours.
No, there are very few tests that exercise these kinds of mixed
argument types. Code like that is most likely the result of
a mistake, but it's not the kind of a bug I had even thought
about until some of the codegen issues with mixed argument types
were brought up (PR 86711/86714).
As we've discussed, the C/GIMPLE semantic issues mean we can not rely on
types. One could use that as an argument that the check is
fundamentally wrong, but in the case where things go screwy here we
always return NULL indicating we do not know the length. So it's the
safe thing to do.
On the other hand one could argue that the test is a canary that
something unexpected has happened, either in terms of the passed in
size, or the types of the argument. For the case above, we miss the
warning. Missing a warning is less worrisome than generating wrong code.
Another alternative (and I hate to suggest it) is to indicate via an
argument how the result of c_strlen is used. ie, is it used just for
warnings or is it used for opt/codegen. For the former we wouldn't use
the test. For the latter we would. That ultimately gets the warning
you want, but also keeps us safe for codegen/opt purposes.
I would just like the ability to get the length/size somehow
so that the questionable code that started us down this path
can be detected. This is not just the sprintf(d, "%s", L"1")
kind of a mismatch but also the missing nul detection in cases
like:
const wchar_t a[1] = L"\xffffffff";
strcpy(d, (char*)a);
I don't think this is necessarily an important use case to
optimize for but it is one that GCC optimizes already nonetheless,
and always has. For example, this has always been folded into 4
by GCC and still is even with the patch:
const __WCHAR_TYPE__ wc[] = L"\x12345678";
int f (void)
{
const char *p = (char*)wc;
return strlen (p); // folded to 4
}
It's optimized because fold_const_call() relies on c_getstr()
which returns the underlying byte representation of the wide
string, and despite c_strlen() now trying to prevent it.
The missing nul variant of the same test case isn't folded (it
ends up calling the library strlen) but the bug cannot be
detected using the modified c_strlen():
const __WCHAR_TYPE__ wc[1] = L"\x12345678"; // no nul
int f (void)
{
const char *p = (char*)wc;
return strlen (p); // not folded
}
To detect it, I'd have to use some other function, like
c_getstr(). I could certainly do that but I don't think
I should need to.
Martin