Re: Make tree-ssa-strlen.c handle partial unterminated strings

Richard Sandiford Wed, 10 May 2017 12:21:53 -0700

Jakub Jelinek <ja...@redhat.com> writes:
> Hi!
>
> Note the intent of the pass is to handle the most common cases, it is fine
> if some cases that aren't common aren't handled, it is all about the extra
> complexity vs. how much it helps on real-world code.


OK.

> On Sun, May 07, 2017 at 10:10:48AM +0100, Richard Sandiford wrote:
>> I've got most of the way through a version that uses min_length instead.
>> But one thing that the terminated flag allows that a constant min_length
>> doesn't is:
>> 
>>   size_t
>>   f1 (char *a1)
>>   {
>>     size_t x = strlen (a1);
>>     char *a3 = a1 + x;
>>     a3[0] = '1';  // a1 length x + 1, unterminated  (min length x + 1)
>>     a3[1] = '2';  // a1 length x + 2, unterminated  (min length x + 2)
>>     a3[2] = '3';  // a1 length x + 3, unterminated  (min length x + 3)
>>     a3[3] = 0;    // a1 length x + 3, terminated
>>     return strlen (a1);
>>   }
>> 
>> For the a3[3] = 0, we know a3's min_length is 3 and so it's obvious
>> that we can convert its min_length to a length.  But even if we allow
>> a1's min_length to be nonconstant, it seems a bit risky to assume that
>> we can convert its min_length to a length as well.  It would only work
>> if the min_lengths in related strinfos are kept in sync, whereas it
>> ought to be safe to say that the minimum length of something is 0.
>
> And we have code for that.  If verify_related_strinfos returns non-NULL,
> we can adjust all the related strinfos that need adjusting.
> See e.g. zero_length_string on how it uses that.  It is just that we should
> decide what is the acceptable complexity of the length/min_length
> expressions (whether INTEGER_CST or SSA_NAME is enough, then the above
> would not work, but is that really that important), or if we e.g. allow
> SSA_NAME + INTEGER_CST in addition to that, or sum of 2 SSA_NAMEs, etc.).
> I don't see how terminated vs. unterminated (which is misnamed anyway, it
> means that it isn't known to be terminated, it might be terminated or not)
> would help with that.

The example above works with the flag because we already allow
SSA_NAME + INTEGER_CST for the length field, thanks to:

      tree adj = NULL_TREE;
      if (oldlen == NULL_TREE)
        ;
      else if (integer_zerop (oldlen))
        adj = srclen;
      else if (TREE_CODE (oldlen) == INTEGER_CST
               || TREE_CODE (srclen) == INTEGER_CST)
        adj = fold_build2_loc (loc, MINUS_EXPR,
                               TREE_TYPE (srclen), srclen,
                               fold_convert_loc (loc, TREE_TYPE (srclen),
                                                 oldlen));
      if (adj != NULL_TREE)
        adjust_related_strinfos (loc, dsi, adj);

etc.  So with a constant min_length we lose out (compared to the flag)
by making min_length more restrictive.

Like you say later, min_length is the number of characters that are
known to be nonzero, and length is the number of characters that are
known to be nonzero and followed by a zero, so even if we do relax the
rules for min_length to match length, I think in almost all useful cases
the length will be equal to the min_length or will be null (i.e. it'll
act almost like a de facto flag).

If the name's a problem, how about "known_terminated_p" instead of
"terminated_p"?

>> So I think that gives four possiblities:
>> 
>>   (1) Keep the terminated flag, but extend the original patch to handle
>>       strings built up a character at a time.  This would handle f1 above.
>
> Only if you allow complex expressions like SSA_NAME + INTEGER_CST in length.
>
>>   (2) Replace the terminated flag with a constant minimum length, don't
>>       handle f1 above.
>
> Sure, if you only allow constants, it will be limited to constants.
>
>>   (3) Replace the terminated flag with an arbitrary minimum length and
>>       ensure that it's always valid to copy the minimum length to the
>>       length when we do so for the final strinfo in a chain.
>
> Even length doesn't allow arbitrary expressions, the more complex it is,
> the more expensive will it be to compute it when you e.g. replace
> strlen with that.
>
> I'd introduce min_length, start with INTEGER_CST, once it is handled
> everywhere in the pass properly, see if there is enough code in the wild
> that would justify allowing more than that.
>
> min_length is a simple guarantee that there are no zero bytes among the
> first min_length bytes, length is the same plus that there is a zero byte
> right after that, so it is easy to argue about what happens if you store
> non-zero somewhere into it, or store zero, etc.

I think that's true of the flag version too though.  If you store a zero
into X <= length, you set the length to X and set known_terminated_p.
If you store a nonzero into X < length, nothing changes.  If you store
an unknown value into X < length, you set the length to X and clear
the known_terminated_p flag.

Thanks,
Richard

Re: Make tree-ssa-strlen.c handle partial unterminated strings

Reply via email to