On August 6, 2018 6:32:41 PM GMT+02:00, Martin Sebor <mse...@gmail.com> wrote:
>On 08/06/2018 09:12 AM, Jeff Law wrote:
>> On 08/04/2018 03:56 PM, Martin Sebor wrote:
>>> On 08/03/2018 01:00 AM, Jeff Law wrote:
>>>> On 07/24/2018 05:18 PM, Bernd Edlinger wrote:
>>>>> On 07/24/18 23:46, Jeff Law wrote:
>>>>>> On 07/24/2018 01:59 AM, Bernd Edlinger wrote:
>>>>>>> Hi!
>>>>>>>
>>>>>>> This patch makes strlen range computations more conservative.
>>>>>>>
>>>>>>> Firstly, if there is a visible type cast from type A to B before
>>>>>>> passing the value to strlen, don't expect the type layout of B to
>>>>>>> restrict the possible return value range of strlen.
>>>>>> Why do you think this is the right thing to do?  I.e., is there
>>>>>> language in the standards that makes you think the code as it
>>>>>> stands today is incorrect from a conformance standpoint?  Is there
>>>>>> a significant body of code that is affected in an adverse way by
>>>>>> the current code?  If so, what code?
>>>>>
>>>>> I think if you have an object of an effective type A, say char[100],
>>>>> then you can cast the address of A to B, say typedef char (*B)[2]
>>>>> for instance, and then to const char *, say for use in strlen.  I
>>>>> may be wrong, but I think that we should at least try not to pick
>>>>> up char[2] from B, but instead use A for strlen ranges, or leave
>>>>> this range open.  Currently the range info for strlen is [0..1] in
>>>>> this case, even if we see the type cast in the generic tree.
>>>> ISTM that you're essentially saying that the cast to const char *
>>>> destroys any type information we can exploit here.  But if that's
>>>> the case, then I don't think we can even derive a range of [0,99].
>>>> What's to say that "A" didn't result from a similar cast of some
>>>> object that was char[200] that happened out of the scope of what we
>>>> could see during the strlen range computation?
>>>>
>>>> If that is what you're arguing, then I think there's a re-evaluation
>>>> that needs to happen WRT strlen range computation.
>>>>
>>>> And just to be clear, I do see this as a significant correctness
>>>> question.
>>>>
>>>> Martin, thoughts?
>>>
>>> The argument is that given:
>>>
>>>   struct S { char a[4], b; };
>>>
>>>   char a[8] = "1234567";
>>>
>>> this is valid and should pass:
>>>
>>>   __attribute__ ((noipa))
>>>   void f (struct S *p)
>>>   {
>>>     assert (7 == strlen (p->a));
>>>   }
>>>
>>>   int main (void)
>>>   {
>>>     f ((struct S*)a);
>>>   }
>>>
>>> (This is the basic premise behind pr86259.)
>>>
>>> This argument is wrong and the code is invalid.  For the access
>>> to p->a to be valid, p must point to an object of struct S (it
>>> doesn't) and the p->a array must hold a nul-terminated string
>>> (it also doesn't).
>> I agree with you for C/C++, but I think it's been shown elsewhere in
>> this thread that GIMPLE semantics do not respect the subobject
>> boundaries.  That's a sad reality.
>>
>> [ ... ]
>>
>>> I care less about the optimization than I do about the basic
>>> premise that it's essential to respect subobject boundaries(*).
>> I understand, but the semantics of GIMPLE do not respect them.  We
>> can argue about whether or not those should change and what it would
>> take to fix that.  But right now the existing semantics do not
>> respect those boundaries.
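To make the cast chain Bernd describes above concrete, here is a minimal,
compilable sketch; the typedef is the one from his description, while the
variable and function names are made up for illustration.  The object's
effective type is char[100], but the pointer handed to strlen has passed
through a char (*)[2] intermediate, which is where the [0..1] range was
coming from.

  #include <string.h>

  typedef char (*B)[2];                    /* the narrower array-pointer type */

  static char big[100] = "hello, world";   /* effective type: char[100] */

  size_t
  len_via_cast (void)
  {
    B b = (B) &big;                     /* cast &char[100] to char (*)[2] */
    const char *p = (const char *) b;   /* then to const char * for strlen */
    /* The question in the thread: should the range of this strlen call be
       derived from char[2] (giving [0..1]), from the underlying char[100],
       or left open?  At run time the call returns 12.  */
    return strlen (p);
  }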
>
>They don't respect them in all cases (i.e., when MEM_REF loses
>information about the structure of an access) but in a good
>number of them GCC can still derive useful information from
>the access.  It's relied on to great effect by _FORTIFY_SOURCE.
>I think it would be a welcome enhancement if, besides out-of-bounds
>writes, _FORTIFY_SOURCE also prevented out-of-bounds reads.
>
>>> It would make little sense to undo the strlen optimization
>>> without also undoing the optimization for the direct array
>>> access case.  Undoing either would raise the question about
>>> the validity of the _FORTIFY_SOURCE=2 behavior.  That would
>>> be a huge step backwards in terms of code security.  If we
>>> did some of these but not others, it would make the behavior
>>> inconsistent and surprising, all to accommodate one instance
>>> of invalid code.
>> In the direct array access case I think (and I'm sure Jakub, Richi
>> and others will correct me if I'm wrong), we can use the object's
>> type because the dereferences are actually using the array's type.
>
>Subscripting and pointer access are identical both in C/C++
>and in GCC's _FORTIFY_SOURCE.  The absence of a distinction
>between the two is essential for preventing writes past
>the end by string functions like strcpy (_FORTIFY_SOURCE).
>
>>> If we had a valid test case where the strlen optimization
>>> leads to invalid code, or even if there were a significant
>>> number of bug reports showing that it breaks an invalid
>>> but common idiom, I would certainly feel compelled to
>>> make it right somehow.  But there has been just one bug
>>> report with clearly invalid code that should be easily
>>> corrected.
>> Again, I think you're too narrowly focused on C/C++ semantics here.
>> What matters are the semantics in GIMPLE.
>
>I don't get that.  GCC is a C/C++ compiler (besides other
>languages), but not a GIMPLE compiler.  The only reason this
>came up at all is a bug report with an invalid C test case that
>reads past the end.  The only reason in my mind to consider
>relaxing an assumption/restriction would be a valid test case
>(in any supported language) that the optimization invalidates.
>
>But as I said, far more essential than the optimization is
>the ability to detect these invalid accesses (both reads and
>writes), such as in:
The essential thing is to not introduce latent wrong-code issues by
exploiting C language constraints that are not preserved by GIMPLE
transforms, because they are not constraints in the GIMPLE IL _WHICH_
_IS_ _NOT_ _C_.

Richard.

>
>  struct S { char a[4], b[2], c; };
>
>  void f (struct S *p)
>  {
>    strcpy (p->a, "1234");        // certain buffer overflow
>
>    sprintf (p->b, "%s", p->a);   // potential buffer overflow
>
>    // ...but, to avoid false positives:
>    sprintf (p->a, "%s", p->b);   // no buffer overflow here
>  }
>
>You've recently made a comment elsewhere that you wish GCC would
>be more aggressive in detecting and preventing undefined behavior
>by inserting traps.  I agree, but I don't see how we can aim
>for both looser and stricter UB detection at the same time.
>
>In any event, it seems clear to me that I've lost the argument.
>If you undo the optimization then please retain the functionality
>for the warnings.  Otherwise you might as well remove those too.
>
>Martin
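For anyone who wants to try the behavior being discussed, below is a
build-ready sketch based on Martin's example.  The main harness, the
__builtin_object_size call and the suggested command line are additions
for illustration only; the exact outcome (compile-time warning versus a
run-time trap from the *_chk functions) depends on the GCC version, the
C library, and the options used.

  /* A sketch only; e.g. build with:
       gcc -O2 -Wall -D_FORTIFY_SOURCE=2 fortify-demo.c  */
  #include <stdio.h>
  #include <string.h>

  struct S { char a[4], b[2], c; };

  void
  f (struct S *p)
  {
    /* Mode 1 of __builtin_object_size reports the size of the closest
       enclosing subobject (p->a, 4 bytes), not the whole *p; this is the
       query _FORTIFY_SOURCE=2 builds on, and it does not distinguish
       subscripted from pointer-based accesses to the member.  */
    printf ("%zu\n", __builtin_object_size (p->a, 1));

    strcpy (p->a, "1234");        /* certain buffer overflow: needs 5 bytes */
    sprintf (p->b, "%s", p->a);   /* potential buffer overflow */
    sprintf (p->a, "%s", p->b);   /* no buffer overflow here */
  }

  int
  main (void)
  {
    struct S s;
    f (&s);
    return 0;
  }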