Re: [PATCH] Make strlen range computations more conservative

Martin Sebor Tue, 07 Aug 2018 07:37:58 -0700

On 08/07/2018 02:51 AM, Richard Biener wrote:

On August 7, 2018 4:24:42 AM GMT+02:00, Martin Sebor <mse...@gmail.com> wrote:

On 07/26/2018 02:55 AM, Richard Biener wrote:

On Wed, 25 Jul 2018, Martin Sebor wrote:

BUT - for the string_constant and c_strlen functions we are,
in all cases we return something interesting, able to look
at an initializer which then determines that type.  Hopefully.
I think the strlen() folding code when it sets SSA ranges
now looks at types ...?

Consider

struct X { int i; char c[4]; int j;};
struct Y { char c[16]; };

void foo (struct X *p, struct Y *q)
{
  memcpy (p, q, sizeof (struct Y));
  if (strlen ((char *)(struct Y *)p + 4) < 7)
    abort ();
}

here the GIMPLE IL looks like

  const char * _1;

  <bb 2> [local count: 1073741825]:
  _5 = MEM[(char * {ref-all})q_4(D)];
  MEM[(char * {ref-all})p_6(D)] = _5;
  _1 = p_6(D) + 4;
  _2 = __builtin_strlen (_1);

and I guess Martin would argue that since p is of type struct X
+ 4 gets you to c[4] and thus strlen of that cannot be larger
than 3.  But of course the middle-end doesn't work like that
and luckily we do not try to draw such conclusions or we
are somehow lucky that for the testcase as written above we do not
(I'm not sure whether Martins changes in this area would derive
such conclusions in principle).


Only if the strlen argument were p->c.

NOTE - we do not know the dynamic type here since we do not know
the dynamic type of the memory pointed-to by q!  We can only
derive that at q+4 there must be some object that we can
validly call strlen on (where Martin again thinks strlen
imposes constrains that memchr does not - sth I do not agree
with from a QOI perspective)


The dynamic type is a murky area.


It's well-specified in the middle-end.  A store changes the
dynamic type of the stored-to object.  If that type is
compatible with the surrounding objects dynamic type that one
is not affected, if not then the surrounding objects dynamic
type becomes unspecified.  There is TYPE_TYPELESS_STORAGE
to somewhat control "compatibility" of subobjects.


I never responded to this.  Using a dynamic (effective) type as
you describe it would invalidate the aggressive loop optimization
in the following:

  void foo (struct X *p)
  {
      struct Y y = { "12345678" };
      memcpy (p, &y, sizeof (struct Y));

      // *p effective type is now struct Y

      int n = 0;
      while (p->c[n])
        ++n;

      if (n < 7)
        abort ();
  }

GCC unconditionally aborts, just as it does with strlen(p->c).
Why is that not wrong (in either case)?

Because the code is invalid either way, for two reasons:


No, because the storage has only 4 non-null characters starting at offset 4?


No, for the reasons below.  I made a mistake of making
the initializer string too short.  If we make it longer it
still aborts.  Say with this

      struct Y y = { "123456789012345" };

we end up with this DCE:

 struct Y y;

  <bb 2> :
  MEM[(char * {ref-all})p_6(D)] = 0x353433323130393837363534333231;
  __builtin_abort ();

With -fdump-tree-cddce1-details (and a patch to show the upper
bound) we see:

  Found loop 1 to be finite: upper bound found: 3.

With -fno-aggressive-loop-optimizations the abort becomes
conditional because the array bound isn't considered. I would
expect you to know this since you implemented the feature.

Martin

1) it accesses an object of (effective) type struct Y via
   an lvalue of type struct X (specifically via (*p).c)
2) it relies on p->c

The loop optimization relies on the exact same requirement
as the strlen one.  Either they are both valid or neither is.

Martin

Re: [PATCH] Make strlen range computations more conservative

Reply via email to