Re: [PATCH] Make strlen range computations more conservative

Martin Sebor Wed, 25 Jul 2018 12:37:41 -0700

BUT - for the string_constant and c_strlen functions we are,
in all cases where we return something interesting, able to look
at an initializer which then determines that type.  Hopefully.
I think the strlen() folding code when it sets SSA ranges
now looks at types ...?


Consider

struct X { int i; char c[4]; int j;};
struct Y { char c[16]; };

void foo (struct X *p, struct Y *q)
{
  memcpy (p, q, sizeof (struct Y));
  if (strlen ((char *)(struct Y *)p + 4) < 7)
    abort ();
}

here the GIMPLE IL looks like

  const char * _1;

  <bb 2> [local count: 1073741825]:
  _5 = MEM[(char * {ref-all})q_4(D)];
  MEM[(char * {ref-all})p_6(D)] = _5;
  _1 = p_6(D) + 4;
  _2 = __builtin_strlen (_1);

and I guess Martin would argue that since p is of type struct X
+ 4 gets you to c[4] and thus strlen of that cannot be larger
than 3.  But of course the middle-end doesn't work like that
and luckily we do not try to draw such conclusions or we
are somehow lucky that for the testcase as written above we do not
(I'm not sure whether Martin's changes in this area would derive
such conclusions in principle).


Only if the strlen argument were p->c.
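
A minimal sketch of that variant, assuming the same struct X as in
the test case above (the function name member_len is hypothetical).
Here the argument names the member array itself, so the bound of 3
follows from the array type char[4] rather than from pointer
arithmetic on p:

```c
#include <string.h>

struct X { int i; char c[4]; int j; };

/* Hypothetical variant of the test case: the strlen argument names
   the member array itself rather than a raw byte offset from p.  */
size_t member_len (struct X *p)
{
  /* p->c has type char[4], so a string that lies entirely within
     the member has length at most 3.  */
  return strlen (p->c);
}
```

Whether a given GCC version actually records that range for this
function is not something the sketch demonstrates.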

NOTE - we do not know the dynamic type here since we do not know
the dynamic type of the memory pointed-to by q!  We can only
derive that at q+4 there must be some object that we can
validly call strlen on (where Martin again thinks strlen
imposes constraints that memchr does not - sth I do not agree
with from a QOI perspective)


The dynamic type is a murky area.  As you said, above we don't
know whether *p is an allocated object or not.  Strictly speaking,
we would need to treat it as such.  It would basically mean
throwing out all type information and treating objects simply
as blobs of bytes.  But that's not what GCC or other compilers do
either.  For instance, in the modified foo below, GCC eliminates
the test because it assumes that *p and *q don't overlap.  It
does that because they are members of structs of unrelated types
access to which cannot alias.  I.e., not just the type of
the access matters (here int and char) but so does the type of
the enclosing object.  If it were otherwise and only the type
of the access mattered then eliminating the test below wouldn't
be valid (objects can have their stored value accessed by either
an lvalue of a compatible type or char).

  void foo (struct X *p, struct Y *q)
  {
    int j = p->j;
    q->c[__builtin_offsetof (struct X, j)] = 0;
    if (j != p->j)
      __builtin_abort ();
  }
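
A self-contained sketch of the case the optimization assumes: p and
q point to distinct objects, so the store through q cannot change
p->j and the test never fires whether or not the compiler removes
it.  The names foo_distinct and run_distinct are hypothetical, the
standard offsetof and abort stand in for the builtins, and the sketch
assumes an ABI where offsetof (struct X, j) is less than
sizeof (struct Y):

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct X { int i; char c[4]; int j; };
struct Y { char c[16]; };

/* Same body as the modified foo, with the standard offsetof and
   abort in place of the builtins.  */
void foo_distinct (struct X *p, struct Y *q)
{
  int j = p->j;
  q->c[offsetof (struct X, j)] = 0;
  if (j != p->j)
    abort ();
}

/* Hypothetical driver: p and q refer to separate objects.
   Returns 1 if x.j is unchanged and the byte in y was cleared.  */
int run_distinct (void)
{
  struct X x = { 1, "abc", 42 };
  struct Y y;
  memset (&y, 'a', sizeof y);
  foo_distinct (&x, &y);
  return x.j == 42 && y.c[offsetof (struct X, j)] == 0;
}
```

The contested case is the one this driver avoids: calling the
function with q obtained by casting p, so that both pointers refer
to the same memory.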

Clarifying (and adjusting if necessary) this area is among
the goals of the C object model proposal and the ongoing study
group.  We have been talking about some of these cases there
and trying to come up with ways to let code do what it needs
to do without compromising existing language rules, which was
the consensus position within WG14 when the study group was
formed: i.e., to clarify or reaffirm existing rules and, in
cases of ambiguity or where the standard is unintentionally
overly permissive, favor tighter rules over looser ones.

Martin
