On 08/20/2018 09:15 AM, Bernd Edlinger wrote: > On 08/20/18 16:26, Jeff Law wrote: >> On 08/20/2018 04:23 AM, Bernd Edlinger wrote: >>> On 08/20/18 12:12, Richard Biener wrote: >>>> On Wed, Aug 15, 2018 at 6:39 AM Jeff Law <l...@redhat.com> wrote: >>>>> >>>>> On 08/10/2018 10:56 AM, Martin Sebor wrote: >>>>>> On 08/08/2018 11:36 PM, Jeff Law wrote: >>>>>>> On 08/02/2018 09:42 AM, Martin Sebor wrote: >>>>>>> >>>>>>>> The warning bits are definitely not okay by me. The purpose >>>>>>>> of the warnings (-W{format,sprintf}-{overflow,truncation} is >>>>>>>> to detect buffer overflows. When a warning doesn't have access >>>>>>>> to string length information for dynamically created strings >>>>>>>> (like the strlen pass does) it uses array sizes as a proxy. >>>>>>>> This is useful both to detect possible buffer overflows and >>>>>>>> to prevent false positives for overflows that cannot happen >>>>>>>> in correctly written programs. >>>>>>> So how much of this falling-back to array sizes as a proxy would become >>>>>>> unnecessary if sprintf had access to the strlen pass as an analysis >>>>>>> module? >>>>>>> >>>>>>> As you know we've been kicking that around and from my investigations >>>>>>> that doesn't really look hard to do. Encapsulate the data structures in >>>>>>> a class, break up the statement handling into analysis and optimization >>>>>>> and we should be good to go. I'm hoping to start prototyping this week. >>>>>>> >>>>>>> If we think that has a reasonable chance to eliminate the array-size >>>>>>> fallback, then that seems like the most promising path forward. >>>>>> >>>>>> We discussed this idea this morning so let me respond here and >>>>>> reiterate the answer. Using the strlen data will help detect >>>>>> buffer overflow where the array size isn't available but it >>>>>> cannot replace the array size heuristic. Here's a simple >>>>>> example: >>>>>> >>>>>> struct S { char a[8]; }; >>>>>> >>>>>> char d[8]; >>>>>> void f (struct S *s, int i) >>>>>> { >>>>>> sprintf (d, "%s-%i", s[i].a, i); >>>>>> } >>>>>> >>>>>> We don't know the length of s->a but we do know that it can >>>>>> be up to 7 bytes long (assuming it's nul-terminated of course) >>>>>> so we know the sprintf call can overflow. Conversely, if >>>>>> the size of the destination is increased to 20 the sprintf >>>>>> call cannot overflow so the diagnostic can be avoided. >>>>>> >>>>>> Removing the array size heuristic would force us to either give >>>>>> up on diagnosing the first case or issue false positives for >>>>>> the second case. I think the second alternative would make >>>>>> the warning too noisy to be useful. >>>>>> >>>>>> The strlen pass will help detect buffer overflows in cases >>>>>> where the array size isn't known (e.g., with dynamically >>>>>> allocated buffers) but where the length of the string store >>>>>> in the array is known. It will also help avoid false positives >>>>>> in cases where the string stored in an array of known size is >>>>>> known to be too short to cause an overflow. For instance here: >>>>>> >>>>>> struct S { char a[8]; }; >>>>>> >>>>>> char d[8]; >>>>>> void f (struct S *s, int i) >>>>>> { >>>>>> if (strlen (s->a) < 4 && i >= 0 && i < 100) >>>>>> sprintf (d, "%s-%i", s->a, i); >>>>>> } >>>>> ACK. Thanks for explaining things here too. I can't speak for others, >>>>> but seeing examples along with the explanation is easier for me to absorb. >>>>> >>>>> For Bernd and others -- after kicking things around a bit with Martin, >>>>> what we're recommending is that compute_string_length we compute the >>>>> bounds using both GIMPLE and C semantics and return both. >>>> >>>> But you can't do this because GIMPLE did transforms that are not valid in >>>> C, thus you can't interpret the GIMPLE IL as "C", you can only interpret >>>> it as GIMPLE. What you'd do is return GIMPLE semantics length >>>> and "foobar" semantics length which doesn't match the original source. >>>> >>> >>> If I understood that suggestion right, it means, we >>> live with some false positive or missing warnings due to those >>> transformations. >>> That means, get_range_strlen with the 2-parameter overload is used >>> for warnings only. And it returns most of the time a correct range info, >>> that is good enough for warnings. >> Correct. 99.9% of the time using the ranges implied by the array types >> is better for the warning code. So the idea is to return two ranges. >> One which uses GIMPLE semantics for code generation and optimization >> purposes, the other which uses ranges implied by the array types for >> warning purposes. >> >> Martin suggested that we always compute and return both rather than >> having a boolean argument to select between the behavior. >> >> > > Okay, but there is already the "strict" parameter: [ ... ] Sorry, I misspoke. There's 3 ranges to return. I believe Martin clarified this in a subsequent message. Sorry for any confusion.
jeff