Re: [PATCH] Make strlen range computations more conservative

Martin Sebor Sat, 04 Aug 2018 13:52:34 -0700

On 08/03/2018 01:43 AM, Jakub Jelinek wrote:

On Thu, Aug 02, 2018 at 09:59:13PM -0600, Martin Sebor wrote:

If I call this with foo (2, 1), do you still claim it is not valid C?


String functions like strlen operate on character strings stored
in character arrays.  Calling strlen (&s[1]) is invalid because
&s[1] is not the address of a character array.  The fact that
objects can be represented as arrays of bytes doesn't change
that.  The standard may be somewhat loose with words on this
distinction but the intent certainly isn't for strlen to traverse
arbitrary sequences of bytes that cross subobject boundaries.
(That is the intent behind the raw memory functions, but
the current text doesn't make the distinction clear.)
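
To illustrate the distinction, here is a minimal sketch (not part of the patch) of a byte-wise search that uses a raw memory function over the whole object.  The struct pair type and the bytes_before_nul helper are made up for this example, and it assumes there is no padding between the two members.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical struct for illustration: two adjacent char arrays. */
struct pair {
    char a[4];
    char b[4];
};

/* Count the bytes before the first zero byte in the object representation
   of *p, or return sizeof *p if there is none.  memchr treats the whole
   object as an array of unsigned char, so the search may run from a
   into b. */
size_t bytes_before_nul(const struct pair *p)
{
    const unsigned char *first = (const unsigned char *)p;
    const unsigned char *nul = memchr(p, 0, sizeof *p);
    return nul ? (size_t)(nul - first) : sizeof *p;
}
```

The point of the sketch is only that the search is bounded by the size of the enclosing object and goes through memchr rather than strlen.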


But the standard doesn't say that right now.


It does, in the restriction on multi-dimensional array accesses.
Given the array 'char a[2][2];' it's only valid to access a[0][0],
a[0][1], a[1][0], and a[1][1].  It's not valid to access
a[0][2] or a[0][3], even though they happen to be located at
the same addresses as a[1][0] and a[1][1].
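
A small sketch of that restriction, with hypothetical helper names: every subscript stays within its own row, and the one-past-the-end pointer of row 0 is formed and compared but never dereferenced.

```c
#include <stddef.h>

/* Sum all elements using only in-bounds subscripts for each row. */
int sum_in_bounds(char a[2][2])
{
    int sum = 0;
    for (size_t i = 0; i < 2; i++)
        for (size_t j = 0; j < 2; j++)
            sum += a[i][j];
    return sum;
}

/* Forming a[0] + 2 (one past the end of row 0) is allowed, and it compares
   equal to the start of row 1 because the rows are contiguous.
   Dereferencing it is what the restriction rules out. */
int row_end_is_next_row_start(char a[2][2])
{
    return a[0] + 2 == a[1];
}
```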

There is no exception for distinct struct members.  So in
a struct { char a[2], b[2]; }, even though a and b are laid
out the same way as char[2][2] would be, it's not valid to
treat a as such.  There is no distinction between array
subscripting and pointer arithmetic, so it doesn't matter
what form the access takes.
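
As a rough sketch of what staying within one member looks like in practice, the hypothetical helper below computes the length of the string in a without ever reading into b.  The names struct ab and len_of_a are assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

struct ab {
    char a[2];
    char b[2];
};

/* Length of the string in s->a, never reading past the end of a.
   Returns sizeof s->a if a holds no terminating nul. */
size_t len_of_a(const struct ab *s)
{
    const char *nul = memchr(s->a, '\0', sizeof s->a);
    return nul ? (size_t)(nul - s->a) : sizeof s->a;
}
```

With a holding two non-nul characters, the helper reports 2 rather than continuing into b in search of a terminator.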

Yes, the standard could be clearer.  There probably even are
ambiguities and contradictions (the authors of the Object Model
proposal believe there are and are trying to clarify/remove
them).  But the intent is clearly there.  It's especially
important for adjacent members of different types (say a char[8]
followed by a function pointer).  We definitely don't want writes
to the array to be allowed to change the function pointer.
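
For a concrete picture of that layout, here is a minimal sketch with a bounded write into the array.  struct handler, noop, and set_name are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout from the discussion: a char[8] followed by a
   function pointer. */
struct handler {
    char name[8];
    void (*fn)(void);
};

/* Placeholder callback so the example has something to point at. */
void noop(void) {}

/* Copy at most sizeof h->name - 1 characters of src into h->name and
   nul-terminate it, so the write can never reach h->fn. */
void set_name(struct handler *h, const char *src)
{
    size_t n = strlen(src);
    if (n > sizeof h->name - 1)
        n = sizeof h->name - 1;
    memcpy(h->name, src, n);
    h->name[n] = '\0';
}
```

Because the copy is clamped to the size of name, an overlong source string is truncated instead of spilling into the adjacent member.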

Plus, at least from the middle-end POV, there is also the case of
placement new and stores changing the dynamic type of the object:
previously, say, a struct with two fields, then a placement new with a single
char array over it.  The placement new will not survive in the middle-end, so
it will be just a memcpy or strcpy or some other byte copy over the original
object, and because pointer-to-pointer conversions are useless in the
middle-end, after CSE/SCCVN etc. you can see a pointer to the struct
with two fields rather than a pointer to the char array.


There may be challenges in the middle-end, you would know much
better than me.  All I'm saying is that it's not valid to access
[sub]objects by dereferencing pointers to other subobjects.  All
the examples in this discussion have been of that form.


Consider e.g.
typedef __typeof__ (sizeof 0) size_t;
void *operator new (size_t, void *p) { return p; }
void *operator new[] (size_t, void *p) { return p; }
struct S { char a; char b[64]; };
void baz (char *);

size_t
foo (S *p)
{
  baz (&p->a);
  char *q = new (p) char [16];
  baz (q);
  return __builtin_strlen (q);
}

I don't think it is correct to say that strlen must be 0.  In this testcase
the pointer passed to strlen is still S *, though I think with enough
tweaking you could also have something where the argument is &p->a.


I think the problem here is changing the type of p->a.  I'm
not up on the latest C++ changes here but I think it's a known
problem with the specification.  A similar (known) problem also
comes in the case of dynamically allocated objects:

  char *p = (char*)operator new (2);
  char *p1 = new (p) char ('a');
  char *p2 = new (p + 1) char ('\0');
  strlen (p1);

Is the strlen(p1) call valid when there's no string or array
at p?  There is a singleton char object that just happens
to be followed by another singleton char object.  It's not
an array of two elements.  Each is [an array of] one char.

This is a (specification) problem for sequence containers like
vector where strictly speaking, it's not valid to iterate over
them because of the array restriction.


I have no problem with strlen returning 0 if it sees a toplevel object of
size 1, but note that if it is extern, it already might be a problem in some
cases:
struct T { char a; char a2[]; } b;
extern struct T c;
void foo (int *p) { p[0] = strlen (b); p[1] = strlen (c); }
If c's definition is struct T c = { ' ', "abcde" };
then the object doesn't have length of 1.


I'm assuming above you meant strlen(&b) and strlen(&c) (or
equivalently, strlen(&b.a) and strlen(&c.a)).  If so, it's
the same problem.  The strlen call is invalid unless b.a and
c.a are nul.
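
For contrast, a sketch of a call that does not have this problem: taking the length of the flexible array member itself.  The make_T and a2_length helpers are hypothetical, and the object is allocated dynamically here because statically initializing a flexible array member is an extension rather than standard C.

```c
#include <stdlib.h>
#include <string.h>

struct T {
    char a;
    char a2[];   /* flexible array member */
};

/* Allocate a struct T whose a2 holds a copy of str (hypothetical helper).
   Returns NULL on allocation failure; the caller frees the result. */
struct T *make_T(char first, const char *str)
{
    size_t n = strlen(str) + 1;
    struct T *t = malloc(sizeof *t + n);
    if (t == NULL)
        return NULL;
    t->a = first;
    memcpy(t->a2, str, n);
    return t;
}

/* Take the length of the array member itself rather than of &t->a. */
size_t a2_length(const struct T *t)
{
    return strlen(t->a2);
}
```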

Martin
