Re: [PATCH] Add a character size parameter to c_strlen/get_range_strlen

Martin Sebor Tue, 21 Aug 2018 09:10:01 -0700

On 08/21/2018 09:27 AM, Jeff Law wrote:

On 08/20/2018 11:17 PM, Bernd Edlinger wrote:

On 08/21/18 01:23, Martin Sebor wrote:

On 08/20/2018 04:17 PM, Jeff Law wrote:

On 08/18/2018 11:38 AM, Martin Sebor wrote:

On 08/17/2018 09:32 PM, Jeff Law wrote:

On 08/17/2018 02:17 PM, Martin Sebor wrote:

On 08/17/2018 12:44 PM, Bernd Edlinger wrote:

On 08/17/18 20:23, Martin Sebor wrote:

On 08/17/2018 06:14 AM, Joseph Myers wrote:

On Fri, 17 Aug 2018, Jeff Law wrote:

On 08/16/2018 05:01 PM, Joseph Myers wrote:

On Thu, 16 Aug 2018, Jeff Law wrote:

restores previous behavior.  The sprintf bits want to count
element
sized chunks, which for wchars is 4 bytes (that count will then be

   /* Compute the range the argument's length can be in.  */
-  fmtresult slen = get_string_length (arg);
+  int count_by = dir.specifier == 'S' || dir.modifier ==
FMT_LEN_l ? 4 : 1;


I don't see how a hardcoded 4 is correct here.  Surely you need to
example
wchar_type_node to determine its actual size for this target.

We did kick this around a little.  IIRC Martin didn't think that it
was
worth handling the 2 byte wchar case.


Sorry, I think we may have miscommunicated -- I didn't think it
was useful to pass a size of the character type to the function.
I agree that passing in a hardcoded constant doesn't seem right
(even if GCC's wchar_t were always 4 bytes wide).

I'm still not sure I see the benefit of passing in the expected
element size given that the patch causes c_strlen() fail when
the actual element size doesn't match ELTSIZE.  If the caller
asks for the number of bytes in a const wchar_t array it should
get back the number bytes.  (I could see it fail if the caller
asked for the number of words in a char array whose size was
not evenly divisible by wordsize.)


I think in this case c_strlen should use the type which the %S format
uses at runtime, otherwise it will not have anything to do with
the reality.


%S is not what I'm talking about.

Failing in the case I described (a caller asking for the size
in bytes of a constant object whose type is larger) prevents
callers that don't care about the type from detecting the actual
size of a constant.

Specifically for sprintf, it means that the buffer overflow
below is no longer diagnosed:

  struct S { char a[2]; void (*pf)(void); };

  void f (struct S *p)
  {
    const char *q = (char*)L"\x41424344\x45464748";

    sprintf (p->a, "%s", q);
  }

I don't think this is in the testsuite, is it?  I verified that there
was no regressions when I installed Bernd's patch and when I installed
yours.


No, there are very few tests that exercise these kinds of mixed
argument types.  Code like that is most likely the result of
a mistake, but it's not the kind of a bug I had even thought
about until some of the codegen issues with mixed argument types
were brought up (PR 86711/86714).

Phew.  I was worried I'd somehow missed the failure or tested the wrong
patch or who knows what.

Can you add that test, xfailed for now to the testsuite?


Done in r263676.

I would just like the ability to get the length/size somehow
so that the questionable code that started us down this path
can be detected.  This is not just the sprintf(d, "%s", L"1")
kind of a mismatch but also the missing nul detection in cases
like:

[ ... ]
I know.  And ideally we'll be able to handle everything you want to.
But we also have to realize that sometimes we may have to punt.

Also remember that incremental progress is (usually) good.


  const wchar_t a[1] = L"\xffffffff";
  strcpy(d, (char*)a);

This touches on both the representational issue (excess elements in the
initializer) and how to handle a non-terminated string.  Both are issues
we're debating right now.


I don't think this is necessarily an important use case to
optimize for but it is one that GCC optimizes already nonetheless,
and always has.  For example, this has always been folded into 4
by GCC and still is even with the patch:

  const __WCHAR_TYPE__ wc[] = L"\x12345678";

  int f (void)
  {
    const char *p = (char*)wc;
    return strlen (p);            // folded to 4
  }

It's optimized because fold_const_call() relies on c_getstr()
which returns the underlying byte representation of the wide
string, and despite c_strlen() now trying to prevent it.

And I think you're hitting on some of issues already raised in the thread.

In this specific case though ISTM 4 is the right answer.  We casted to a
char *, so that's what we should be counting.  Or am I missing
something?  Also note that's the value we'd get from the strlen C
library call IIUC.


It is the right answer.  My point is that this is optimized
but the change to c_strlen() prevents the same optimization
in other similar cases.  For example, GCC 6 folds this into
memcpy:

   __builtin_strcpy (d, (char*)L"\x12345678");

GCC 7 and 8 do too but get the byte count wrong (my bad).

Current trunk doesn't optimize it.  If restoring the original
behavior is the intent (and not just fixing the counting bug)
then c_strlen() should be fixed to fold this again.

(I'm not a fan of the strcpy to memcpy transformation because
it makes other things difficult but that's beside the point.)


Yes, this is a good example why it is not good to fold
stuff that is invalid:

  __builtin_strcpy(d, (char*)L"\x12345678");
  =>
  __builtin_memcpy(d, (char*)L"\x12345678", 5);

because it folds away the invalidness, makes invalid things
valid, and prevents later passes to spot the bug.

As a matter of common sense, we should not fold invalid stuff
to valid stuff, even if that is historic behavior.

Generally agreed.  I'd rather see an invalid call of this nature simply
turned into a trap rather than folded into something that is neither
safe nor correct.


I see this discussion as an opportunity to try to get consensus
on how to handle these cases in a consistent way -- they seem
to be coming up over and over lately (not just in this thread).

IMO, there are cases when I view "folding invalid stuff into
valid stuff" preferable to doing nothing (though transforming
strcpy into memcpy isn't what I have in mind).  Letting GCC
call a library function with an invalid argument when the call
can be folded into some safe value (with a warning) is one of
them.

Trapping is also better than letting the call take place,
especially without a warning.

Either way, if consensus on a single solution cannot be reached
then perhaps adding an option to control the behavior might be
the way to go.

Martin

Re: [PATCH] Add a character size parameter to c_strlen/get_range_strlen

Reply via email to