On 08/10/2018 10:56 AM, Martin Sebor wrote:
> On 08/08/2018 11:36 PM, Jeff Law wrote:
>> On 08/02/2018 09:42 AM, Martin Sebor wrote:
>>
>>> The warning bits are definitely not okay by me.  The purpose
>>> of the warnings (-W{format,sprintf}-{overflow,truncation} is
>>> to detect buffer overflows.  When a warning doesn't have access
>>> to string length information for dynamically created strings
>>> (like the strlen pass does) it uses array sizes as a proxy.
>>> This is useful both to detect possible buffer overflows and
>>> to prevent false positives for overflows that cannot happen
>>> in correctly written programs.
>> So how much of this falling-back to array sizes as a proxy would become
>> unnecessary if sprintf had access to the strlen pass as an analysis
>> module?
>>
>> As you know we've been kicking that around and from my investigations
>> that doesn't really look hard to do.  Encapsulate the data structures in
>> a class, break up the statement handling into analysis and optimization
>> and we should be good to go.  I'm hoping to start prototyping this week.
>>
>> If we think that has a reasonable chance to eliminate the array-size
>> fallback, then that seems like the most promising path forward.
> 
> We discussed this idea this morning so let me respond here and
> reiterate the answer.  Using the strlen data will help detect
> buffer overflow where the array size isn't available but it
> cannot replace the array size heuristic. Here's a simple
> example:
> 
>   struct S { char a[8]; };
> 
>   char d[8];
>   void f (struct S *s, int i)
>   {
>     sprintf (d, "%s-%i",  s[i].a, i);
>   }
> 
> We don't know the length of s->a but we do know that it can
> be up to 7 bytes long (assuming it's nul-terminated of course)
> so we know the sprintf call can overflow.  Conversely, if
> the size of the destination is increased to 20  the sprintf
> call cannot overflow so the diagnostic can be avoided.
> 
> Removing the array size heuristic would force us to either give
> up on diagnosing the first case or issue false positives for
> the second case.  I think the second alternative would make
> the warning too noisy to be useful.
> 
> The strlen pass will help detect buffer overflows in cases
> where the array size isn't known (e.g., with dynamically
> allocated buffers) but where the length of the string store
> in the array is known.  It will also help avoid false positives
> in cases where the string stored in an array of known size is
> known to be too short to cause an overflow.  For instance here:
> 
>   struct S { char a[8]; };
> 
>   char d[8];
>   void f (struct S *s, int i)
>   {
>     if (strlen (s->a) < 4 && i >= 0 && i < 100)
>       sprintf (d, "%s-%i",  s->a, i);
>   }
ACK.  Thanks for explaining things here too.  I can't speak for others,
but seeing examples along with the explanation is easier for me to absorb.

For Bernd and others -- after kicking things around a bit with Martin,
what we're recommending is that compute_string_length we compute the
bounds using both GIMPLE and C semantics and return both.

Anything which influences code generation or optimization must use the
GIMPLE semantics.  Warnings may use the C semantics in an effort to
improve preciseness.

Martin has some other stuff to flush out of his queue, then he'll be
focused on the changes to compute_string_length.
Jeff

Reply via email to