On August 6, 2018 6:32:41 PM GMT+02:00, Martin Sebor <mse...@gmail.com> wrote:
>On 08/06/2018 09:12 AM, Jeff Law wrote:
>> On 08/04/2018 03:56 PM, Martin Sebor wrote:
>>> On 08/03/2018 01:00 AM, Jeff Law wrote:
>>>> On 07/24/2018 05:18 PM, Bernd Edlinger wrote:
>>>>> On 07/24/18 23:46, Jeff Law wrote:
>>>>>> On 07/24/2018 01:59 AM, Bernd Edlinger wrote:
>>>>>>> Hi!
>>>>>>>
>>>>>>> This patch makes strlen range computations more conservative.
>>>>>>>
>>>>>>> Firstly, if there is a visible type cast from type A to B before
>>>>>>> passing the value to strlen, don't expect the type layout of B to
>>>>>>> restrict the possible return value range of strlen.
>>>>>> Why do you think this is the right thing to do?  I.e., is there
>>>>>> language in the standards that makes you think the code as it
>>>>>> stands today is incorrect from a conformance standpoint?  Is there
>>>>>> a significant body of code that is affected in an adverse way by
>>>>>> the current code?  If so, what code?
>>>>>
>>>>> I think if you have an object of an effective type A, say char[100],
>>>>> then you can cast the address of A to B, say typedef char (*B)[2]
>>>>> for instance, and then to const char *, say for use in strlen.  I
>>>>> may be wrong, but I think that we should at least try not to pick
>>>>> up char[2] from B, but instead use A for strlen ranges, or leave
>>>>> this range open.  Currently the range info for strlen is [0..1] in
>>>>> this case, even if we see the type cast in the generic tree.
>>>> ISTM that you're essentially saying that the cast to const char *
>>>> destroys any type information we can exploit here.  But if that's
>>>> the case, then I don't think we can even derive a range of [0,99].
>>>> What's to say that "A" didn't result from a similar cast of some
>>>> object that was char[200] that happened out of the scope of what we
>>>> could see during the strlen range computation?
>>>>
>>>> If that is what you're arguing, then I think there's a re-evaluation
>>>> that needs to happen WRT strlen range computation.
>>>>
>>>> And just to be clear, I do see this as a significant correctness
>>>> question.
>>>>
>>>> Martin, thoughts?
>>>
>>> The argument is that given:
>>>
>>>   struct S { char a[4], b; };
>>>
>>>   char a[8] = "1234567";
>>>
>>> this is valid and should pass:
>>>
>>>   __attribute__ ((noipa))
>>>   void f (struct S *p)
>>>   {
>>>     assert (7 == strlen (p->a));
>>>   }
>>>
>>>   int main (void)
>>>   {
>>>     f ((struct S*)a);
>>>   }
>>>
>>> (This is the basic premise behind pr86259.)
>>>
>>> This argument is wrong and the code is invalid.  For the access
>>> to p->a to be valid, p must point to an object of struct S (it
>>> doesn't) and the p->a array must hold a nul-terminated string
>>> (it also doesn't).
>> I agree with you for C/C++, but I think it's been shown elsewhere in
>> this thread that GIMPLE semantics do not respect the subobject
>> boundaries.  That's a sad reality.
>>
>> [ ... ]
>>
>>> I care less about the optimization than I do about the basic
>>> premise that it's essential to respect subobject boundaries(*).
>> I understand, but the semantics of GIMPLE do not respect them.  We
>> can argue about whether or not those should change and what it would
>> take to fix that.  But right now the existing semantics do not
>> respect those boundaries.
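To make the cast chain Bernd describes above concrete, here is a minimal,
compilable sketch; the typedef is the one from his description, while the
variable and function names are made up for illustration.  The object's
effective type is char[100], but the pointer handed to strlen has passed
through a char (*)[2] intermediate, which is where the [0..1] range was
coming from.

  #include <string.h>

  typedef char (*B)[2];                    /* the narrower array-pointer type */

  static char big[100] = "hello, world";   /* effective type: char[100] */

  size_t
  len_via_cast (void)
  {
    B b = (B) &big;                     /* cast &char[100] to char (*)[2] */
    const char *p = (const char *) b;   /* then to const char * for strlen */
    /* The question in the thread: should the range of this strlen call be
       derived from char[2] (giving [0..1]), from the underlying char[100],
       or left open?  At run time the call returns 12.  */
    return strlen (p);
  }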
>
>They don't respect them in all cases (i.e., when MEM_REF loses
>information about the structure of an access) but in a good
>number of them GCC can still derive useful information from
>the access.  It's relied on to great effect by _FORTIFY_SOURCE.
>I think it would be a welcome enhancement if, besides out-of-bounds
>writes, _FORTIFY_SOURCE also prevented out-of-bounds reads.
>
>>> It would make little sense to undo the strlen optimization
>>> without also undoing the optimization for the direct array
>>> access case.  Undoing either would raise the question about
>>> the validity of the _FORTIFY_SOURCE=2 behavior.  That would
>>> be a huge step backwards in terms of code security.  If we
>>> did some of these but not others, it would make the behavior
>>> inconsistent and surprising, all to accommodate one instance
>>> of invalid code.
>> In the direct array access case I think (and I'm sure Jakub, Richi
>> and others will correct me if I'm wrong), we can use the object's
>> type because the dereferences are actually using the array's type.
>
>Subscripting and pointer access are identical both in C/C++
>and in GCC's _FORTIFY_SOURCE.  The absence of a distinction
>between the two is essential for preventing writes past
>the end by string functions like strcpy (_FORTIFY_SOURCE).
>
>>> If we had a valid test case where the strlen optimization
>>> leads to invalid code, or even if there were a significant
>>> number of bug reports showing that it breaks an invalid
>>> but common idiom, I would certainly feel compelled to
>>> make it right somehow.  But there has been just one bug
>>> report with clearly invalid code that should be easily
>>> corrected.
>> Again, I think you're too narrowly focused on C/C++ semantics here.
>> What matters are the semantics in GIMPLE.
>
>I don't get that.  GCC is a C/C++ compiler (besides other
>languages), but not a GIMPLE compiler.  The only reason this
>came up at all is a bug report with an invalid C test case that
>reads past the end.  The only reason in my mind to consider
>relaxing an assumption/restriction would be a valid test case
>(in any supported language) that the optimization invalidates.
>
>But as I said, far more essential than the optimization is
>the ability to detect these invalid accesses (both reads and
>writes), such as in:
The essential thing is to not introduce latent wrong-code issues by
exploiting C language constraints that are not preserved by GIMPLE
transforms, because they are not constraints in the GIMPLE IL _WHICH_
_IS_ _NOT_ _C_.

Richard.

>
>  struct S { char a[4], b[2], c; };
>
>  void f (struct S *p)
>  {
>    strcpy (p->a, "1234");        // certain buffer overflow
>
>    sprintf (p->b, "%s", p->a);   // potential buffer overflow
>
>    // ...but, to avoid false positives:
>    sprintf (p->a, "%s", p->b);   // no buffer overflow here
>  }
>
>You've recently made a comment elsewhere that you wish GCC would
>be more aggressive in detecting and preventing undefined behavior
>by inserting traps.  I agree, but I don't see how we can aim
>for both looser and stricter UB detection at the same time.
>
>In any event, it seems clear to me that I've lost the argument.
>If you undo the optimization then please retain the functionality
>for the warnings.  Otherwise you might as well remove those too.
>
>Martin
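For anyone who wants to try the behavior being discussed, below is a
build-ready sketch based on Martin's example.  The main harness, the
__builtin_object_size call and the suggested command line are additions
for illustration only; the exact outcome (compile-time warning versus a
run-time trap from the *_chk functions) depends on the GCC version, the
C library, and the options used.

  /* A sketch only; e.g. build with:
       gcc -O2 -Wall -D_FORTIFY_SOURCE=2 fortify-demo.c  */
  #include <stdio.h>
  #include <string.h>

  struct S { char a[4], b[2], c; };

  void
  f (struct S *p)
  {
    /* Mode 1 of __builtin_object_size reports the size of the closest
       enclosing subobject (p->a, 4 bytes), not the whole *p; this is the
       query _FORTIFY_SOURCE=2 builds on, and it does not distinguish
       subscripted from pointer-based accesses to the member.  */
    printf ("%zu\n", __builtin_object_size (p->a, 1));

    strcpy (p->a, "1234");        /* certain buffer overflow: needs 5 bytes */
    sprintf (p->b, "%s", p->a);   /* potential buffer overflow */
    sprintf (p->a, "%s", p->b);   /* no buffer overflow here */
  }

  int
  main (void)
  {
    struct S s;
    f (&s);
    return 0;
  }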