Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind

Yury Gribov Thu, 03 Aug 2017 08:52:37 -0700

On Thu, Aug 3, 2017 at 4:28 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
> On Thu, Aug 3, 2017 at 8:25 AM, Jeff Law <l...@redhat.com> wrote:
>> On 08/03/2017 08:17 AM, Yury Gribov wrote:
>>> On Thu, Aug 3, 2017 at 2:01 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
>>>> On Thu, Aug 3, 2017 at 5:52 AM, Yury Gribov <tetra2005.patc...@gmail.com> 
>>>> wrote:
>>>>> On Thu, Aug 3, 2017 at 1:22 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
>>>>>> On Wed, Aug 2, 2017 at 9:29 PM, Yury Gribov 
>>>>>> <tetra2005.patc...@gmail.com> wrote:
>>>>>>> On 02.08.2017 23:04, H.J. Lu wrote:
>>>>>>>>
>>>>>>>> On Wed, Aug 2, 2017 at 1:56 PM, Yury Gribov 
>>>>>>>> <tetra2005.patc...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> On 02.08.2017 21:48, H.J. Lu wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Aug 2, 2017 at 1:39 PM, Yury Gribov
>>>>>>>>>> <tetra2005.patc...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 02.08.2017 20:02, Jeff Law wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 08/02/2017 12:47 PM, Jakub Jelinek wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Aug 02, 2017 at 12:38:13PM -0600, Jeff Law wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 07/17/2017 01:23 AM, Yuri Gribov wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I've rebased the previous patch to trunk per Andrew's 
>>>>>>>>>>>>>>> suggestion.
>>>>>>>>>>>>>>> Original patch description/motivation/questions are in
>>>>>>>>>>>>>>> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01869.html
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is his stuff used for exception handling?  If so, doesn't that 
>>>>>>>>>>>>>> make
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> performance a significant concern (ie that msync call?)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> For the performance issue, I wonder if it wouldn't be better to 
>>>>>>>>>>>>> just
>>>>>>>>>>>>> compile the unwinder twice, basically include everything needed 
>>>>>>>>>>>>> for
>>>>>>>>>>>>> the
>>>>>>>>>>>>> verification backtrace in a single *.c file with special defines, 
>>>>>>>>>>>>> and
>>>>>>>>>>>>> not export anything from that *.o file except the single 
>>>>>>>>>>>>> entrypoint.
>>>>>>>>>>>>> That would also handle the ABI problems.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Yea.
>>>>>>>>>>>>
>>>>>>>>>>>> Given that the vast majority of the time the addresses are valid, I
>>>>>>>>>>>> wonder if we could take advantage of that property to keep the
>>>>>>>>>>>> overhead
>>>>>>>>>>>> lower in the most common cases.
>>>>>>>>>>>>
>>>>>>>>>>>>> For the concurrent calls of other threads unmapping stacks of 
>>>>>>>>>>>>> running
>>>>>>>>>>>>> threads (that is where most of the verification goes against), I'm
>>>>>>>>>>>>> afraid
>>>>>>>>>>>>> that is simply non-solveable, even if you don't cache anything, it
>>>>>>>>>>>>> will
>>>>>>>>>>>>> still be racy.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Absolutely -- I think solving this stuff 100% reliably without 
>>>>>>>>>>>> races
>>>>>>>>>>>> isn't possible.  I think the question is can we make this
>>>>>>>>>>>> significantly
>>>>>>>>>>>> better.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> All,
>>>>>>>>>>>
>>>>>>>>>>> First of all, thanks for the feedback.  This is indeed not a 100%
>>>>>>>>>>> robust
>>>>>>>>>>> solution, just something to allow tools like Asan to produce more 
>>>>>>>>>>> sane
>>>>>>>>>>> backtraces on crash (currently when stack is corrupted they would 
>>>>>>>>>>> crash
>>>>>>>>>>> in
>>>>>>>>>>> the unwinder without printing at least partial backtrace).
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> FWIW, glibc 2.26 is changed to abort as soon as possible when stack 
>>>>>>>>>> is
>>>>>>>>>> corrupted:
>>>>>>>>>>
>>>>>>>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=21752
>>>>>>>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=12189
>>>>>>>>>>
>>>>>>>>>> There is very little point to unwind stack when stack is corrupted.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I'd argue that situation with __stack_chk_fail is very special.  It's
>>>>>>>>> used
>>>>>>>>> in production code where you'd want to halt as soon as corruption is
>>>>>>>>> detected to prevent further damage so even printing message is an
>>>>>>>>> additional
>>>>>>>>> risk.  Debugging tools (e.g. sanitizers) are different and would 
>>>>>>>>> prefer
>>>>>>>>> to
>>>>>>>>> print as many survived frames as possible.
>>>>>>>>>
>>>>>>>>
>>>>>>>> But stack unwinder in libgcc is primarily for production use, not for
>>>>>>>> debugging.
>>>>>>>
>>>>>>>
>>>>>>> I can see it used in different projects to print backtraces on error 
>>>>>>> (bind9,
>>>>>>> SimGrid, sanitizers).  backtrace(3) also sits on top of libgcc unwinder 
>>>>>>> and
>>>>>>> seems to be meant primarily for debugging (at least many projects use 
>>>>>>> it for
>>>>>>> this purpose:
>>>>>>> https://codesearch.debian.net/search?q=backtrace_symbols&perpkg=1). 
>>>>>>> Same for
>>>>>>> libbacktrace which, as its README mentions, is meant to "print a 
>>>>>>> detailed
>>>>>>> backtrace when an error occurs or to gather detailed profiling 
>>>>>>> information".
>>>>>>
>>>>>> But it shouldn't be performed on a corrupted stack.  When a stack is
>>>>>> corrupted, there is just no way to reliably unwind it.
>>>>>
>>>>> Why not? Attached patch shows that it's possible (up to a point of
>>>>> corruption of course).
>>>>>
>>>>>> It isn't a problem
>>>>>> for a debugger which is designed to analyze corrupted program.
>>>>>
>>>>> Yes but forcing all applications that would like to print helpful
>>>>> message on stack corruption to employ gdb sounds less efficient that
>>>>> providing fail-safe APIs in standard library...
>>>>                 ^^^^^^^^^^^  There is no such a thing of fail-safe on 
>>>> corrupted
>>>> stack.   A bad, but valid address, can be put on corrupted stack.  When
>>>> you unwind it, the unwinder may not crash, but give you bogus stack
>>>> backtrace.
>>>
>>> Agreed but at least my users were fine with this.  Having at least
>>> partial stack is very helpful when you app fails early during OS
>>> startup with no chance to attach debugger.
>> Once it's corrupted, you should stop the backtrace, unless you are
>> absolutely certain that trying to interpret the data in the stack can't
>> cause your unwinder to perform undefined behavior.
>>
>
> That is what we did in glibc 2.26.


SSP is special - you know for sure that stack was corrupted and can
easily decide to abort.

In a typical C application user would only unwind stack when printing
an error message.  If one of the higher frames is corrupted, unwinder
will be fooled to accessing random memory address in 99% of cases.
But this is unknown at the time _Unwind_Backtrace (or backtrace(3)) is
called.

-Y

Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind

Reply via email to