Jakub Jelinek <ja...@redhat.com> writes: > Hi! > > Note the intent of the pass is to handle the most common cases, it is fine > if some cases that aren't common aren't handled, it is all about the extra > complexity vs. how much it helps on real-world code.
OK. > On Sun, May 07, 2017 at 10:10:48AM +0100, Richard Sandiford wrote: >> I've got most of the way through a version that uses min_length instead. >> But one thing that the terminated flag allows that a constant min_length >> doesn't is: >> >> size_t >> f1 (char *a1) >> { >> size_t x = strlen (a1); >> char *a3 = a1 + x; >> a3[0] = '1'; // a1 length x + 1, unterminated (min length x + 1) >> a3[1] = '2'; // a1 length x + 2, unterminated (min length x + 2) >> a3[2] = '3'; // a1 length x + 3, unterminated (min length x + 3) >> a3[3] = 0; // a1 length x + 3, terminated >> return strlen (a1); >> } >> >> For the a3[3] = 0, we know a3's min_length is 3 and so it's obvious >> that we can convert its min_length to a length. But even if we allow >> a1's min_length to be nonconstant, it seems a bit risky to assume that >> we can convert its min_length to a length as well. It would only work >> if the min_lengths in related strinfos are kept in sync, whereas it >> ought to be safe to say that the minimum length of something is 0. > > And we have code for that. If verify_related_strinfos returns non-NULL, > we can adjust all the related strinfos that need adjusting. > See e.g. zero_length_string on how it uses that. It is just that we should > decide what is the acceptable complexity of the length/min_length > expressions (whether INTEGER_CST or SSA_NAME is enough, then the above > would not work, but is that really that important), or if we e.g. allow > SSA_NAME + INTEGER_CST in addition to that, or sum of 2 SSA_NAMEs, etc.). > I don't see how terminated vs. unterminated (which is misnamed anyway, it > means that it isn't known to be terminated, it might be terminated or not) > would help with that. The example above works with the flag because we already allow SSA_NAME + INTEGER_CST for the length field, thanks to: tree adj = NULL_TREE; if (oldlen == NULL_TREE) ; else if (integer_zerop (oldlen)) adj = srclen; else if (TREE_CODE (oldlen) == INTEGER_CST || TREE_CODE (srclen) == INTEGER_CST) adj = fold_build2_loc (loc, MINUS_EXPR, TREE_TYPE (srclen), srclen, fold_convert_loc (loc, TREE_TYPE (srclen), oldlen)); if (adj != NULL_TREE) adjust_related_strinfos (loc, dsi, adj); etc. So with a constant min_length we lose out (compared to the flag) by making min_length more restrictive. Like you say later, min_length is the number of characters that are known to be nonzero, and length is the number of characters that are known to be nonzero and followed by a zero, so even if we do relax the rules for min_length to match length, I think in almost all useful cases the length will be equal to the min_length or will be null (i.e. it'll act almost like a de facto flag). If the name's a problem, how about "known_terminated_p" instead of "terminated_p"? >> So I think that gives four possiblities: >> >> (1) Keep the terminated flag, but extend the original patch to handle >> strings built up a character at a time. This would handle f1 above. > > Only if you allow complex expressions like SSA_NAME + INTEGER_CST in length. > >> (2) Replace the terminated flag with a constant minimum length, don't >> handle f1 above. > > Sure, if you only allow constants, it will be limited to constants. > >> (3) Replace the terminated flag with an arbitrary minimum length and >> ensure that it's always valid to copy the minimum length to the >> length when we do so for the final strinfo in a chain. > > Even length doesn't allow arbitrary expressions, the more complex it is, > the more expensive will it be to compute it when you e.g. replace > strlen with that. > > I'd introduce min_length, start with INTEGER_CST, once it is handled > everywhere in the pass properly, see if there is enough code in the wild > that would justify allowing more than that. > > min_length is a simple guarantee that there are no zero bytes among the > first min_length bytes, length is the same plus that there is a zero byte > right after that, so it is easy to argue about what happens if you store > non-zero somewhere into it, or store zero, etc. I think that's true of the flag version too though. If you store a zero into X <= length, you set the length to X and set known_terminated_p. If you store a nonzero into X < length, nothing changes. If you store an unknown value into X < length, you set the length to X and clear the known_terminated_p flag. Thanks, Richard