On 08/10/2018 10:56 AM, Martin Sebor wrote: > On 08/08/2018 11:36 PM, Jeff Law wrote: >> On 08/02/2018 09:42 AM, Martin Sebor wrote: >> >>> The warning bits are definitely not okay by me. The purpose >>> of the warnings (-W{format,sprintf}-{overflow,truncation} is >>> to detect buffer overflows. When a warning doesn't have access >>> to string length information for dynamically created strings >>> (like the strlen pass does) it uses array sizes as a proxy. >>> This is useful both to detect possible buffer overflows and >>> to prevent false positives for overflows that cannot happen >>> in correctly written programs. >> So how much of this falling-back to array sizes as a proxy would become >> unnecessary if sprintf had access to the strlen pass as an analysis >> module? >> >> As you know we've been kicking that around and from my investigations >> that doesn't really look hard to do. Encapsulate the data structures in >> a class, break up the statement handling into analysis and optimization >> and we should be good to go. I'm hoping to start prototyping this week. >> >> If we think that has a reasonable chance to eliminate the array-size >> fallback, then that seems like the most promising path forward. > > We discussed this idea this morning so let me respond here and > reiterate the answer. Using the strlen data will help detect > buffer overflow where the array size isn't available but it > cannot replace the array size heuristic. Here's a simple > example: > > struct S { char a[8]; }; > > char d[8]; > void f (struct S *s, int i) > { > sprintf (d, "%s-%i", s[i].a, i); > } > > We don't know the length of s->a but we do know that it can > be up to 7 bytes long (assuming it's nul-terminated of course) > so we know the sprintf call can overflow. Conversely, if > the size of the destination is increased to 20 the sprintf > call cannot overflow so the diagnostic can be avoided. > > Removing the array size heuristic would force us to either give > up on diagnosing the first case or issue false positives for > the second case. I think the second alternative would make > the warning too noisy to be useful. > > The strlen pass will help detect buffer overflows in cases > where the array size isn't known (e.g., with dynamically > allocated buffers) but where the length of the string store > in the array is known. It will also help avoid false positives > in cases where the string stored in an array of known size is > known to be too short to cause an overflow. For instance here: > > struct S { char a[8]; }; > > char d[8]; > void f (struct S *s, int i) > { > if (strlen (s->a) < 4 && i >= 0 && i < 100) > sprintf (d, "%s-%i", s->a, i); > } ACK. Thanks for explaining things here too. I can't speak for others, but seeing examples along with the explanation is easier for me to absorb.
For Bernd and others -- after kicking things around a bit with Martin, what we're recommending is that compute_string_length we compute the bounds using both GIMPLE and C semantics and return both. Anything which influences code generation or optimization must use the GIMPLE semantics. Warnings may use the C semantics in an effort to improve preciseness. Martin has some other stuff to flush out of his queue, then he'll be focused on the changes to compute_string_length. Jeff