On August 9, 2018 7:26:19 AM GMT+02:00, Jeff Law <l...@redhat.com> wrote:
>On 07/24/2018 05:18 PM, Bernd Edlinger wrote:
>> On 07/24/18 23:46, Jeff Law wrote:
>>> On 07/24/2018 01:59 AM, Bernd Edlinger wrote:
>>>> Hi!
>>>>
>>>> This patch makes strlen range computations more conservative.
>>>>
>>>> Firstly, if there is a visible type cast from type A to B before
>>>> passing the value to strlen, don't expect the type layout of B to
>>>> restrict the possible return value range of strlen.
>>> Why do you think this is the right thing to do?  ie, is there
>>> language in the standards that makes you think the code as it
>>> stands today is incorrect from a conformance standpoint?  Is there
>>> a significant body of code that is affected in an adverse way by
>>> the current code?  If so, what code?
>>>
>>>
>>
>> I think if you have an object of an effective type A, say char[100],
>> then you can cast the address of A to B, say typedef char (*B)[2]
>> for instance, and then to const char *, say for use in strlen.  I
>> may be wrong, but I think that we should at least try not to pick
>> up char[2] from B, but instead use A for strlen ranges, or leave
>> the range open.  Currently the range info for strlen is [0..1] in
>> this case, even if we see the type cast in the generic tree.
>Coming back to this...  I'd like to hope we can depend on the type of
>the strlen argument.  Obviously if it's a char *, we know nothing.
>But if it's an ARRAY_TYPE it'd be advantageous if we could use the
>array bounds to bound the length.
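For concreteness, the scenario Bernd describes would look roughly like
this (a minimal sketch; the names are made up, not from the patch):

  #include <string.h>

  typedef char (*B)[2];

  char a[100] = "hello world";      /* effective type A: char[100] */

  size_t f (void)
  {
    B b = (B) &a;                   /* cast A's address to B */
    const char *p = (const char *) b;
    return strlen (p);              /* actually 11, well outside the
                                       [0..1] range char[2] suggests */
  }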
But the FE made it char *, and it is only because pointer type
conversions are useless in GIMPLE that we may see something else.  So
we cannot use the type of the argument.

>
>*But* we have code which will turn pointer access into array indexing.
>tree-ssa-forwprop.c can do that, there may be others.  So if we
>originally had a char * argument to strlen, but forwprop changed it
>into a char[N] type, then have we just broken things?  I'd totally
>forgotten about this behavior from forwprop.  PR 46393 contains
>sample code where this happens.

If we really still have this, it must be very constrained, because we
used to have all sorts of wrong-code issues with this and data
dependence analysis.

>I also thought we had code to recover array indexing from pointer
>arithmetic in the C/C++ front-end, but I can't seem to find it
>tonight.  But it would raise similar concerns.

I've removed that.  What I did at some point is make the FEs avoid too
many array-to-pointer decays, to preserve array refs instead of trying
to reconstruct them late.

>
>
>
>>
>> One other example I have found in one of the test cases:
>>
>> char c;
>>
>> if (strlen(&c) != 0) abort();
>>
>> this is now completely elided, but why?  Is there a code base where
>> that is used?  I doubt it, but why do we care to eliminate something
>> stupid like that?  If we would emit a warning for that, I'm fine
>> with it, but if we silently remove code like that I don't think it
>> will improve anything.  So I ask, where is the code base which gets
>> an improvement from that optimization?
>I think it falls out naturally from trying to get accurate
>computations.  I don't think either Martin or I really care about
>optimizing strlen in this case.  In fact it's so clearly erroneous
>that it ought to generate a diagnostic on its own.  Knowing Martin it
>was probably included in the tests for completeness.
>
>However, there is a fair amount of code that passes addresses of
>characters to functions that want char * string arguments, and those
>functions promptly walk off the end of the single-character,
>unterminated string.  We actually just saw one of these in glibc that
>was detected by Martin's recent work.  So it's definitely useful to
>track how these kinds of values get used.
>
>>
>>> Ultimately we want highly accurate string lengths to help improve
>>> the quality of the warnings we generate for potentially dangerous
>>> code.  These changes seem to take us in the opposite direction.
>>>
>>
>> No, I don't think so; we have full control of the direction.  When
>> I do what Richi requested in his response, we will have one function
>> that the string length estimation is based upon, instead of several
>> open-coded tree walks.
>I don't think anyone objects to consolidating length computation.
>What I think we're hashing through is how the object model in GIMPLE
>affects the assumptions that can correctly be made about lengths of
>objects.  When I ACK'd Martin's patches I'd totally forgotten about
>these issues in GIMPLE and the impact they'd have if they were used
>in areas that affect code generation.  That is absolutely and totally
>my mistake.
>
>I suspect that we're ultimately going to have to refine the design a
>bit so that the lengths that impact code generation/optimization are
>distinct from those that are used for warnings.  I'm not keen on this
>concept, but I believe it's better than just reverting all the work
>on issuing diagnostics for fishy code.
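To make the "walks off the end" pattern mentioned above concrete, a
minimal made-up example (not the actual glibc case):

  #include <stddef.h>

  /* Expects a NUL-terminated string.  */
  static size_t count_nonspace (const char *s)
  {
    size_t n = 0;
    for (; *s != '\0'; s++)
      if (*s != ' ')
        n++;
    return n;
  }

  void g (void)
  {
    char c = 'x';
    (void) count_nonspace (&c);   /* reads past &c: the one-character
                                     "string" is unterminated */
  }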
>
>
>We're going to be kicking this around immediately -- there's a concern
>that some of this may have gotten into the gcc-7 and/or gcc-8
>codebase.  We need to get a sense of the scale of the damage as well
>as a sense of how to go about fixing the codegen issues while still
>keeping the benefits in the warning code.
>
>If we go quiet, it's not from a lack of caring about this issue.
>Quite the opposite: we want to make sure we address these issues
>correctly without just churning on the trunk.
>
>Jeff