Re: [PATCH] Make strlen range computations more conservative

Jeff Law Tue, 21 Aug 2018 14:58:14 -0700

On 08/20/2018 09:15 AM, Bernd Edlinger wrote:
> On 08/20/18 16:26, Jeff Law wrote:
>> On 08/20/2018 04:23 AM, Bernd Edlinger wrote:
>>> On 08/20/18 12:12, Richard Biener wrote:
>>>> On Wed, Aug 15, 2018 at 6:39 AM Jeff Law <l...@redhat.com> wrote:
>>>>>
>>>>> On 08/10/2018 10:56 AM, Martin Sebor wrote:
>>>>>> On 08/08/2018 11:36 PM, Jeff Law wrote:
>>>>>>> On 08/02/2018 09:42 AM, Martin Sebor wrote:
>>>>>>>
>>>>>>>> The warning bits are definitely not okay by me.  The purpose
>>>>>>>> of the warnings (-W{format,sprintf}-{overflow,truncation} is
>>>>>>>> to detect buffer overflows.  When a warning doesn't have access
>>>>>>>> to string length information for dynamically created strings
>>>>>>>> (like the strlen pass does) it uses array sizes as a proxy.
>>>>>>>> This is useful both to detect possible buffer overflows and
>>>>>>>> to prevent false positives for overflows that cannot happen
>>>>>>>> in correctly written programs.
>>>>>>> So how much of this falling-back to array sizes as a proxy would become
>>>>>>> unnecessary if sprintf had access to the strlen pass as an analysis
>>>>>>> module?
>>>>>>>
>>>>>>> As you know we've been kicking that around and from my investigations
>>>>>>> that doesn't really look hard to do.  Encapsulate the data structures in
>>>>>>> a class, break up the statement handling into analysis and optimization
>>>>>>> and we should be good to go.  I'm hoping to start prototyping this week.
>>>>>>>
>>>>>>> If we think that has a reasonable chance to eliminate the array-size
>>>>>>> fallback, then that seems like the most promising path forward.
>>>>>>
>>>>>> We discussed this idea this morning so let me respond here and
>>>>>> reiterate the answer.  Using the strlen data will help detect
>>>>>> buffer overflow where the array size isn't available but it
>>>>>> cannot replace the array size heuristic. Here's a simple
>>>>>> example:
>>>>>>
>>>>>>     struct S { char a[8]; };
>>>>>>
>>>>>>     char d[8];
>>>>>>     void f (struct S *s, int i)
>>>>>>     {
>>>>>>       sprintf (d, "%s-%i",  s[i].a, i);
>>>>>>     }
>>>>>>
>>>>>> We don't know the length of s->a but we do know that it can
>>>>>> be up to 7 bytes long (assuming it's nul-terminated of course)
>>>>>> so we know the sprintf call can overflow.  Conversely, if
>>>>>> the size of the destination is increased to 20  the sprintf
>>>>>> call cannot overflow so the diagnostic can be avoided.
>>>>>>
>>>>>> Removing the array size heuristic would force us to either give
>>>>>> up on diagnosing the first case or issue false positives for
>>>>>> the second case.  I think the second alternative would make
>>>>>> the warning too noisy to be useful.
>>>>>>
>>>>>> The strlen pass will help detect buffer overflows in cases
>>>>>> where the array size isn't known (e.g., with dynamically
>>>>>> allocated buffers) but where the length of the string store
>>>>>> in the array is known.  It will also help avoid false positives
>>>>>> in cases where the string stored in an array of known size is
>>>>>> known to be too short to cause an overflow.  For instance here:
>>>>>>
>>>>>>     struct S { char a[8]; };
>>>>>>
>>>>>>     char d[8];
>>>>>>     void f (struct S *s, int i)
>>>>>>     {
>>>>>>       if (strlen (s->a) < 4 && i >= 0 && i < 100)
>>>>>>         sprintf (d, "%s-%i",  s->a, i);
>>>>>>     }
>>>>> ACK.  Thanks for explaining things here too.  I can't speak for others,
>>>>> but seeing examples along with the explanation is easier for me to absorb.
>>>>>
>>>>> For Bernd and others -- after kicking things around a bit with Martin,
>>>>> what we're recommending is that compute_string_length we compute the
>>>>> bounds using both GIMPLE and C semantics and return both.
>>>>
>>>> But you can't do this because GIMPLE did transforms that are not valid in
>>>> C, thus you can't interpret the GIMPLE IL as "C", you can only interpret
>>>> it as GIMPLE.  What you'd do is return GIMPLE semantics length
>>>> and "foobar" semantics length which doesn't match the original source.
>>>>
>>>
>>> If I understood that suggestion right, it means, we
>>> live with some false positive or missing warnings due to those 
>>> transformations.
>>> That means, get_range_strlen with the 2-parameter overload is used
>>> for warnings only.  And it returns most of the time a correct range info,
>>> that is good enough for warnings.
>> Correct.  99.9% of the time using the ranges implied by the array types
>> is better for the warning code.  So the idea is to return two ranges.
>> One which uses GIMPLE semantics for code generation and optimization
>> purposes, the other which uses ranges implied by the array types for
>> warning purposes.
>>
>> Martin suggested that we always compute and return both rather than
>> having a boolean argument to select between the behavior.
>>
>>
> 
> Okay, but there is already the "strict" parameter:
[ ... ]
Sorry, I misspoke.  There's 3 ranges to return.  I believe Martin
clarified this in a subsequent message.  Sorry for any confusion.


jeff

Re: [PATCH] Make strlen range computations more conservative

Reply via email to