Re: [PATCH] Make strlen range computations more conservative

Martin Sebor Fri, 17 Aug 2018 08:49:23 -0700

On 08/17/2018 04:31 AM, Richard Biener wrote:

On Tue, 7 Aug 2018, Martin Sebor wrote:

On 08/07/2018 11:44 AM, Richard Biener wrote:

On August 7, 2018 4:37:00 PM GMT+02:00, Martin Sebor <mse...@gmail.com>
wrote:

On 08/07/2018 02:51 AM, Richard Biener wrote:

On August 7, 2018 4:24:42 AM GMT+02:00, Martin Sebor

<mse...@gmail.com> wrote:

On 07/26/2018 02:55 AM, Richard Biener wrote:

On Wed, 25 Jul 2018, Martin Sebor wrote:

BUT - for the string_constant and c_strlen functions we are,
in all cases we return something interesting, able to look
at an initializer which then determines that type.  Hopefully.
I think the strlen() folding code when it sets SSA ranges
now looks at types ...?

Consider

struct X { int i; char c[4]; int j;};
struct Y { char c[16]; };

void foo (struct X *p, struct Y *q)
{
  memcpy (p, q, sizeof (struct Y));
  if (strlen ((char *)(struct Y *)p + 4) < 7)
    abort ();
}

here the GIMPLE IL looks like

  const char * _1;

  <bb 2> [local count: 1073741825]:
  _5 = MEM[(char * {ref-all})q_4(D)];
  MEM[(char * {ref-all})p_6(D)] = _5;
  _1 = p_6(D) + 4;
  _2 = __builtin_strlen (_1);

and I guess Martin would argue that since p is of type struct X
+ 4 gets you to c[4] and thus strlen of that cannot be larger
than 3.  But of course the middle-end doesn't work like that
and luckily we do not try to draw such conclusions or we
are somehow lucky that for the testcase as written above we do

not

(I'm not sure whether Martins changes in this area would derive
such conclusions in principle).


Only if the strlen argument were p->c.

NOTE - we do not know the dynamic type here since we do not know
the dynamic type of the memory pointed-to by q!  We can only
derive that at q+4 there must be some object that we can
validly call strlen on (where Martin again thinks strlen
imposes constrains that memchr does not - sth I do not agree
with from a QOI perspective)


The dynamic type is a murky area.


It's well-specified in the middle-end.  A store changes the
dynamic type of the stored-to object.  If that type is
compatible with the surrounding objects dynamic type that one
is not affected, if not then the surrounding objects dynamic
type becomes unspecified.  There is TYPE_TYPELESS_STORAGE
to somewhat control "compatibility" of subobjects.


I never responded to this.  Using a dynamic (effective) type as
you describe it would invalidate the aggressive loop optimization
in the following:

  void foo (struct X *p)
  {
      struct Y y = { "12345678" };
      memcpy (p, &y, sizeof (struct Y));

      // *p effective type is now struct Y

      int n = 0;
      while (p->c[n])
        ++n;

      if (n < 7)
        abort ();
  }

GCC unconditionally aborts, just as it does with strlen(p->c).
Why is that not wrong (in either case)?

Because the code is invalid either way, for two reasons:


No, because the storage has only 4 non-null characters starting at

offset 4?

No, for the reasons below.  I made a mistake of making
the initializer string too short.  If we make it longer it
still aborts.  Say with this

      struct Y y = { "123456789012345" };

we end up with this DCE:

 struct Y y;

  <bb 2> :
  MEM[(char * {ref-all})p_6(D)] = 0x353433323130393837363534333231;
  __builtin_abort ();

With -fdump-tree-cddce1-details (and a patch to show the upper
bound) we see:

  Found loop 1 to be finite: upper bound found: 3.

With -fno-aggressive-loop-optimizations the abort becomes
conditional because the array bound isn't considered. I would
expect you to know this since you implemented the feature.


Honza added the array indexing part and it may very well be too aggressive.
I have to take a closer look after vacation to tell. Can you open a PR and
CC me there?


I opened bug 86884.


Now that I returned from vacation the testcase is simply bogus.  The
access p->c[n] requires an affective type of X.


I agree.  And so does strlen(p->c).  It was your example
(with strlen) to illustrate a valid use case.

Saying that the loop is invalid when p->c doesn't contain
zero:

  int n = 0;
  while (p->c[n])
    ++n;

means that the equivalent loop below is also invalid:

  int n = 0;
  for (const char *q = p->c; *q; ++q)
    ++n;

The three (the two loops and the strlen(p->c) call) are
all equivalent and semantically identical.

Since one of them is invalid, the other two must be as well.
It's not wrong to emit different code for equivalent invalid
constructs, but as a matter of a QoI, it makes no sense to
_guarantee_ different semantics for constructs that, at
the C/C++ language level, are semantically equivalent, just
because they use subtly different syntax.

Put another way, a programmer who uses strlen(p->c) in this
context and who is assured that it's valid code should be
free to replace that call with a loop and have it continue
to work without change.

Martin

Re: [PATCH] Make strlen range computations more conservative

Reply via email to