On 08/04/2018 03:56 PM, Martin Sebor wrote:
> On 08/03/2018 01:00 AM, Jeff Law wrote:
>> On 07/24/2018 05:18 PM, Bernd Edlinger wrote:
>>> On 07/24/18 23:46, Jeff Law wrote:
>>>> On 07/24/2018 01:59 AM, Bernd Edlinger wrote:
>>>>> Hi!
>>>>>
>>>>> This patch makes strlen range computations more conservative.
>>>>>
>>>>> Firstly, if there is a visible type cast from type A to B before
>>>>> passing the value to strlen, don't expect the type layout of B to
>>>>> restrict the possible return value range of strlen.
>>>> Why do you think this is the right thing to do?  ie, is there language
>>>> in the standards that makes you think the code as it stands today is
>>>> incorrect from a conformance standpoint?  Is there a significant body
>>>> of code that is affected in an adverse way by the current code?  If
>>>> so, what code?
>>>
>>> I think if you have an object of an effective type A, say char[100],
>>> then you can cast the address of A to B, say typedef char (*B)[2] for
>>> instance, and then to const char *, say for use in strlen.  I may be
>>> wrong, but I think that we should at least try not to pick up char[2]
>>> from B, but instead use A for strlen ranges, or leave this range open.
>>> Currently the range info for strlen is [0..1] in this case, even if we
>>> see the type cast in the generic tree.
>> ISTM that you're essentially saying that the cast to const char *
>> destroys any type information we can exploit here.  But if that's the
>> case, then I don't think we can even derive a range of [0,99].  What's
>> to say that "A" didn't result from a similar cast of some object that
>> was char[200] that happened out of the scope of what we could see
>> during the strlen range computation?
>>
>> If that is what you're arguing, then I think there's a re-evaluation
>> that needs to happen WRT strlen range computation.
>>
>> And just to be clear, I do see this as a significant correctness
>> question.
>>
>> Martin, thoughts?
>
> The argument is that given:
>
>   struct S { char a[4], b; };
>
>   char a[8] = "1234567";
>
> this is valid and should pass:
>
>   __attribute__ ((noipa))
>   void f (struct S *p)
>   {
>     assert (7 == strlen (p->a));
>   }
>
>   int main (void)
>   {
>     f ((struct S*)a);
>   }
>
> (This is the basic premise behind pr86259.)
>
> This argument is wrong and the code is invalid.  For the access
> to p->a to be valid, p must point to an object of struct S (it
> doesn't) and the p->a array must hold a nul-terminated string
> (it also doesn't).
I agree with you for C/C++, but I think it's been shown elsewhere in
this thread that GIMPLE semantics do not respect the subobject
boundaries.  That's a sad reality.

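For concreteness, a minimal, self-contained sketch of the kind of code
Bernd describes above (the identifiers are my own, not from the patch or
its tests, and the [0..1] range is the one he reports, not something I
have re-verified here):

  #include <string.h>

  typedef char (*B)[2];

  /* The object's effective type is char[100].  */
  char a[100] = "much longer than the two bytes the cast below suggests";

  size_t f (void)
  {
    B b = (B) &a;                     /* view the same object as char[2] */
    const char *p = (const char *) b;
    return strlen (p);                /* per Bernd, the computed strlen
                                         range here is [0..1], taken from
                                         the char[2] view rather than the
                                         char[100] object                */
  }

Whether the intermediate cast is still visible by the time the strlen
pass runs is exactly the point under discussion.
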
[ ... ]
>
> I care less about the optimization than I do about the basic
> premise that it's essential to respect subobject boundaries(*).
I understand, but the semantics of GIMPLE do not respect them.  We can
argue about whether or not those should change and what it would take
to fix that.  But right now the existing semantics do not respect those
boundaries.

> It would make little sense to undo the strlen optimization
> without also undoing the optimization for the direct array
> access case.  Undoing either would raise the question about
> the validity of the _FORTIFY_SOURCE=2 behavior.  That would
> be a huge step backwards in terms of code security.  If we
> did some of these but not others, it would make the behavior
> inconsistent and surprising, all to accommodate one instance
> of invalid code.
In the direct array access case I think (and I'm sure Jakub, Richi and
others will correct me if I'm wrong) we can use the object's type,
because the dereferences are actually using the array's type.

>
> If we had a valid test case where the strlen optimization
> leads to invalid code, or even if there were a significant
> number of bug reports showing that it breaks an invalid
> but common idiom, I would certainly feel compelled to
> make it right somehow.  But there has been just one bug
> report with clearly invalid code that should be easily
> corrected.
Again, I think you're too narrowly focused on C/C++ semantics here.
What matters are the semantics in GIMPLE.

>
> Martin
>
> [*] I also care deeply about all the warnings that depend
> on it to avoid false positives: that's pretty much all those
> I have implemented in the middle-end: -Wformat-{overflow,
> truncation}, -Wstringop-{overflow,truncation}, and likely
> even -Wrestrict.
I know.  And I care deeply about your work to improve the precision
of the warnings and ultimately improve the overall quality of code
compiled with GCC.  I just think we can't currently rely on the
semantics you want to exploit to improve the precision of string
lengths.

Jeff
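
A minimal, hypothetical illustration of the distinction drawn above
between the direct array access case and the strlen case (my own
example, not taken from the thread or the testsuite):

  #include <string.h>

  struct S { char a[4], b; };

  int direct (struct S *p, int i)
  {
    return p->a[i];        /* the load is performed through the char[4]
                              member, so the subobject's type is part of
                              the access itself                          */
  }

  size_t indirect (struct S *p)
  {
    return strlen (p->a);  /* strlen only ever sees a const char *; the
                              char[4] bound is not part of any access
                              strlen itself performs                     */
  }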