On 11/21/2017 12:07 PM, Martin Sebor wrote: > On 11/21/2017 09:55 AM, Jeff Law wrote: >> On 11/19/2017 04:28 PM, Martin Sebor wrote: >>> On 11/18/2017 12:53 AM, Jeff Law wrote: >>>> On 11/17/2017 12:36 PM, Martin Sebor wrote: >>>>> The attached patch enhances -Wstringop-overflow to detect more >>>>> instances of buffer overflow at compile time by handling non- >>>>> constant offsets into the destination object that are known to >>>>> be in some range. The solution could be improved by handling >>>>> even more cases (e.g., anti-ranges or offsets relative to >>>>> pointers beyond the beginning of an object) but it's a start. >>>>> >>>>> In addition to bootsrapping/regtesting GCC, also tested with >>>>> Binutils/GDB, Glibc, and the Linux kernel on x86_64 with no >>>>> regressions. >>>>> >>>>> Martin >>>>> >>>>> The top of GDB fails to compile at the moment so the validation >>>>> there was incomplete. >>>>> >>>>> gcc-77608.diff >>>>> >>>>> >>>>> PR middle-end/77608 - missing protection on trivially detectable >>>>> runtime buffer overflow >>>>> >>>>> gcc/ChangeLog: >>>>> >>>>> PR middle-end/77608 >>>>> * builtins.c (compute_objsize): Handle non-constant offsets. >>>>> >>>>> gcc/testsuite/ChangeLog: >>>>> >>>>> PR middle-end/77608 >>>>> * gcc.dg/Wstringop-overflow.c: New test. >>>> The recursive call into compute_objsize passing in the ostype avoids >>>> having to think about the whole object vs nearest containing object >>>> issues. Right? >>>> >>>> What's left to worry about is maximum or minimum remaining bytes in the >>>> object. At least that's my understanding of how ostype works here. >>>> >>>> So we get the amount remaining, ignoring the variable offset, from the >>>> recursive call (SIZE). The space left after we account for the >>>> variable >>>> offset is [SIZE - MAX, SIZE - MIN]. So ISTM for type 0/1 you have to >>>> return SIZE-MIN (which you do) and for type 2/3 you have to return >>>> SIZE-MAX which I think you get wrong (and you have to account for the >>>> possibility that MAX or MIN is greater than SIZE and thus there's >>>> nothing left). >>> >>> Subtracting the upper bound of the offset from the size instead >>> of the lower bound when the caller is asking for the minimum >>> object size would make the result essentially meaningless in >>> common cases where the offset is smaller than size_t, as in: >>> >>> char a[7]; >>> >>> void f (const char *s, unsigned i) >>> { >>> __builtin_strcpy (a + i, s); >>> } >>> >>> Here, i's range is [0, UINT_MAX]. >>> >>> IMO, it's only useful to use the lower bound here, otherwise >>> the result would only rarely be non-zero. >> But when we're asking for the minimum left, aren't we essentially asking >> for "how much space am I absolutely sure I can write"? And if that is >> the question, then the only conservatively correct answer is to subtract >> the high bound. > > I suppose you could look at it that way but IME with this work > (now, and also last year when I submitted a patch actually > changing the built-in), using the upper bound is just not that > useful because it's too often way too big. There's no way to > distinguish an out-of-range upper bound that's the result of > an inadequate attempt to constrain a value from an out-of-range > upper bound that is sufficiently constrained but in a way GCC > doesn't see. Understood.
So while it's reasonable to not warn in those cases where we just have crap range information (that's always going to be the case for some code regardless of how good my work or Andrew/Aldy's work is), we have to be very careful and make sure that nobody acts on this information for optimization purposes because what we're returning is not conservatively correct. > > There are no clients of this API that would be affected by > the decision one way or the other (unless the user specifies > a -Wstringop-overflow= argument greater than the default 2) > so I don't think what we do now matters much, if at all. Right, but what's to stop someone without knowledge of the implementation and its quirk of not returning the conservatively safe result from using the results in other ways. What would the impact be of simply not supporting those queries, essentially returning "I don't know" rather than returning something that isn't conservatively correct? If we have to support both queries and we're going to return something that is not conservatively correct, then it really needs to be documented to avoid folks using the code in ways that are not safe. > Perhaps in the future with some of the range improvements > that you, Aldy and Andrew have been working on. I think they'll help, but there's always going to be codes that can't be analyzed, it's just inherent in the languages we use. > > That said, if it helps us move forward with this enhancement > I'll use the upper bound -- let me know. In the future, when > it is actually used, we'll adjust it as necessary. Understood. Let's answer the questions above first. Namely, do we have to support the other mode? I thought I saw something that indicated it wasn't ever used that way by the current clients. So if we can just return "don't know" for those ostype queries that's a reasonable place to go. Else we should look at a clear comment about the limitations of the code to help prevent (ab)use of the analysis in contexts were it's not appropriate. Jeff