On August 9, 2018 7:26:19 AM GMT+02:00, Jeff Law <l...@redhat.com> wrote:
>On 07/24/2018 05:18 PM, Bernd Edlinger wrote:
>> On 07/24/18 23:46, Jeff Law wrote:
>>> On 07/24/2018 01:59 AM, Bernd Edlinger wrote:
>>>> Hi!
>>>>
>>>> This patch makes strlen range computations more conservative.
>>>>
>>>> Firstly, if there is a visible type cast from type A to B before
>>>> passing the value to strlen, don't expect the type layout of B to
>>>> restrict the possible return value range of strlen.
>>> Why do you think this is the right thing to do?  ie, is there
>>> language in the standards that makes you think the code as it
>>> stands today is incorrect from a conformance standpoint?  Is there
>>> a significant body of code that is affected in an adverse way by
>>> the current code?  If so, what code?
>>>
>>>
>>
>> I think if you have an object of an effective type A, say char[100],
>> then you can cast the address of A to B, say typedef char (*B)[2]
>> for instance, and then to const char *, say for use in strlen.  I
>> may be wrong, but I think that we should at least try not to pick
>> up char[2] from B, but instead use A for strlen ranges, or leave
>> the range open.  Currently the range info for strlen is [0..1] in
>> this case, even if we see the type cast in the generic tree.
>Coming back to this...  I'd like to hope we can depend on the type of
>the strlen argument.  Obviously if it's a char *, we know nothing.
>But if it's an ARRAY_TYPE it'd be advantageous if we could use the
>array bounds to bound the length.
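For concreteness, the scenario Bernd describes would look roughly like
this (a minimal sketch; the names are made up, not from the patch):

  #include <string.h>

  typedef char (*B)[2];

  char a[100] = "hello world";      /* effective type A: char[100] */

  size_t f (void)
  {
    B b = (B) &a;                   /* cast A's address to B */
    const char *p = (const char *) b;
    return strlen (p);              /* actually 11, well outside the
                                       [0..1] range char[2] suggests */
  }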
But the FE made it char *, and it is only because pointer type
conversions are useless in GIMPLE that we may see something else.  So
we cannot use the type of the argument.

>
>*But* we have code which will turn pointer access into array indexing.
>tree-ssa-forwprop.c can do that, there may be others.  So if we
>originally had a char * argument to strlen, but forwprop changed it
>into a char[N] type, then have we just broken things?  I'd totally
>forgotten about this behavior from forwprop.  PR 46393 contains
>sample code where this happens.

If we really still have this, it must be very constrained, because we
used to have all sorts of wrong-code issues with this and data
dependence analysis.

>I also thought we had code to recover array indexing from pointer
>arithmetic in the C/C++ front-end, but I can't seem to find it
>tonight.  But it would raise similar concerns.

I've removed that.  What I did at some point is make the FEs avoid too
many array-to-pointer decays, to preserve array refs instead of trying
to reconstruct them late.

>
>
>
>>
>> One other example I have found in one of the test cases:
>>
>> char c;
>>
>> if (strlen(&c) != 0) abort();
>>
>> this is now completely elided, but why?  Is there a code base where
>> that is used?  I doubt it, but why do we care to eliminate something
>> stupid like that?  If we would emit a warning for that, I'm fine
>> with it, but if we silently remove code like that I don't think it
>> will improve anything.  So I ask, where is the code base which gets
>> an improvement from that optimization?
>I think it falls out naturally from trying to get accurate
>computations.  I don't think either Martin or I really care about
>optimizing strlen in this case.  In fact it's so clearly erroneous
>that it ought to generate a diagnostic on its own.  Knowing Martin it
>was probably included in the tests for completeness.
>
>However, there is a fair amount of code that passes addresses of
>characters to functions that want char * string arguments, and those
>functions promptly walk off the end of the single-character,
>unterminated string.  We actually just saw one of these in glibc that
>was detected by Martin's recent work.  So it's definitely useful to
>track how these kinds of values get used.
>
>>
>>> Ultimately we want highly accurate string lengths to help improve
>>> the quality of the warnings we generate for potentially dangerous
>>> code.  These changes seem to take us in the opposite direction.
>>>
>>
>> No, I don't think so; we have full control of the direction.  When
>> I do what Richi requested in his response, we will have one function
>> that the string length estimation is based upon, instead of several
>> open-coded tree walks.
>I don't think anyone objects to consolidating length computation.
>What I think we're hashing through is how the object model in GIMPLE
>affects the assumptions that can correctly be made about lengths of
>objects.  When I ACK'd Martin's patches I'd totally forgotten about
>these issues in GIMPLE and the impact they'd have if they were used
>in areas that affect code generation.  That is absolutely and totally
>my mistake.
>
>I suspect that we're ultimately going to have to refine the design a
>bit so that the lengths that impact code generation/optimization are
>distinct from those that are used for warnings.  I'm not keen on this
>concept, but I believe it's better than just reverting all the work
>on issuing diagnostics for fishy code.
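To make the "walks off the end" pattern mentioned above concrete, a
minimal made-up example (not the actual glibc case):

  #include <stddef.h>

  /* Expects a NUL-terminated string.  */
  static size_t count_nonspace (const char *s)
  {
    size_t n = 0;
    for (; *s != '\0'; s++)
      if (*s != ' ')
        n++;
    return n;
  }

  void g (void)
  {
    char c = 'x';
    (void) count_nonspace (&c);   /* reads past &c: the one-character
                                     "string" is unterminated */
  }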
>
>
>We're going to be kicking this around immediately -- there's a concern
>that some of this may have gotten into the gcc-7 and/or gcc-8
>codebase.  We need to get a sense of the scale of the damage as well
>as a sense of how to go about fixing the codegen issues while still
>keeping the benefits in the warning code.
>
>If we go quiet, it's not from a lack of caring about this issue.
>Quite the opposite: we want to make sure we address these issues
>correctly without just churning on the trunk.
>
>Jeff