On Thu, Aug 3, 2017 at 4:28 PM, H.J. Lu <hjl.to...@gmail.com> wrote: > On Thu, Aug 3, 2017 at 8:25 AM, Jeff Law <l...@redhat.com> wrote: >> On 08/03/2017 08:17 AM, Yury Gribov wrote: >>> On Thu, Aug 3, 2017 at 2:01 PM, H.J. Lu <hjl.to...@gmail.com> wrote: >>>> On Thu, Aug 3, 2017 at 5:52 AM, Yury Gribov <tetra2005.patc...@gmail.com> >>>> wrote: >>>>> On Thu, Aug 3, 2017 at 1:22 PM, H.J. Lu <hjl.to...@gmail.com> wrote: >>>>>> On Wed, Aug 2, 2017 at 9:29 PM, Yury Gribov >>>>>> <tetra2005.patc...@gmail.com> wrote: >>>>>>> On 02.08.2017 23:04, H.J. Lu wrote: >>>>>>>> >>>>>>>> On Wed, Aug 2, 2017 at 1:56 PM, Yury Gribov >>>>>>>> <tetra2005.patc...@gmail.com> >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> On 02.08.2017 21:48, H.J. Lu wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, Aug 2, 2017 at 1:39 PM, Yury Gribov >>>>>>>>>> <tetra2005.patc...@gmail.com> >>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 02.08.2017 20:02, Jeff Law wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 08/02/2017 12:47 PM, Jakub Jelinek wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Aug 02, 2017 at 12:38:13PM -0600, Jeff Law wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 07/17/2017 01:23 AM, Yuri Gribov wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I've rebased the previous patch to trunk per Andrew's >>>>>>>>>>>>>>> suggestion. >>>>>>>>>>>>>>> Original patch description/motivation/questions are in >>>>>>>>>>>>>>> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01869.html >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Is his stuff used for exception handling? If so, doesn't that >>>>>>>>>>>>>> make >>>>>>>>>>>>>> the >>>>>>>>>>>>>> performance a significant concern (ie that msync call?) >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> For the performance issue, I wonder if it wouldn't be better to >>>>>>>>>>>>> just >>>>>>>>>>>>> compile the unwinder twice, basically include everything needed >>>>>>>>>>>>> for >>>>>>>>>>>>> the >>>>>>>>>>>>> verification backtrace in a single *.c file with special defines, >>>>>>>>>>>>> and >>>>>>>>>>>>> not export anything from that *.o file except the single >>>>>>>>>>>>> entrypoint. >>>>>>>>>>>>> That would also handle the ABI problems. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Yea. >>>>>>>>>>>> >>>>>>>>>>>> Given that the vast majority of the time the addresses are valid, I >>>>>>>>>>>> wonder if we could take advantage of that property to keep the >>>>>>>>>>>> overhead >>>>>>>>>>>> lower in the most common cases. >>>>>>>>>>>> >>>>>>>>>>>>> For the concurrent calls of other threads unmapping stacks of >>>>>>>>>>>>> running >>>>>>>>>>>>> threads (that is where most of the verification goes against), I'm >>>>>>>>>>>>> afraid >>>>>>>>>>>>> that is simply non-solveable, even if you don't cache anything, it >>>>>>>>>>>>> will >>>>>>>>>>>>> still be racy. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Absolutely -- I think solving this stuff 100% reliably without >>>>>>>>>>>> races >>>>>>>>>>>> isn't possible. I think the question is can we make this >>>>>>>>>>>> significantly >>>>>>>>>>>> better. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> All, >>>>>>>>>>> >>>>>>>>>>> First of all, thanks for the feedback. This is indeed not a 100% >>>>>>>>>>> robust >>>>>>>>>>> solution, just something to allow tools like Asan to produce more >>>>>>>>>>> sane >>>>>>>>>>> backtraces on crash (currently when stack is corrupted they would >>>>>>>>>>> crash >>>>>>>>>>> in >>>>>>>>>>> the unwinder without printing at least partial backtrace). >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> FWIW, glibc 2.26 is changed to abort as soon as possible when stack >>>>>>>>>> is >>>>>>>>>> corrupted: >>>>>>>>>> >>>>>>>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=21752 >>>>>>>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=12189 >>>>>>>>>> >>>>>>>>>> There is very little point to unwind stack when stack is corrupted. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> I'd argue that situation with __stack_chk_fail is very special. It's >>>>>>>>> used >>>>>>>>> in production code where you'd want to halt as soon as corruption is >>>>>>>>> detected to prevent further damage so even printing message is an >>>>>>>>> additional >>>>>>>>> risk. Debugging tools (e.g. sanitizers) are different and would >>>>>>>>> prefer >>>>>>>>> to >>>>>>>>> print as many survived frames as possible. >>>>>>>>> >>>>>>>> >>>>>>>> But stack unwinder in libgcc is primarily for production use, not for >>>>>>>> debugging. >>>>>>> >>>>>>> >>>>>>> I can see it used in different projects to print backtraces on error >>>>>>> (bind9, >>>>>>> SimGrid, sanitizers). backtrace(3) also sits on top of libgcc unwinder >>>>>>> and >>>>>>> seems to be meant primarily for debugging (at least many projects use >>>>>>> it for >>>>>>> this purpose: >>>>>>> https://codesearch.debian.net/search?q=backtrace_symbols&perpkg=1). >>>>>>> Same for >>>>>>> libbacktrace which, as its README mentions, is meant to "print a >>>>>>> detailed >>>>>>> backtrace when an error occurs or to gather detailed profiling >>>>>>> information". >>>>>> >>>>>> But it shouldn't be performed on a corrupted stack. When a stack is >>>>>> corrupted, there is just no way to reliably unwind it. >>>>> >>>>> Why not? Attached patch shows that it's possible (up to a point of >>>>> corruption of course). >>>>> >>>>>> It isn't a problem >>>>>> for a debugger which is designed to analyze corrupted program. >>>>> >>>>> Yes but forcing all applications that would like to print helpful >>>>> message on stack corruption to employ gdb sounds less efficient that >>>>> providing fail-safe APIs in standard library... >>>> ^^^^^^^^^^^ There is no such a thing of fail-safe on >>>> corrupted >>>> stack. A bad, but valid address, can be put on corrupted stack. When >>>> you unwind it, the unwinder may not crash, but give you bogus stack >>>> backtrace. >>> >>> Agreed but at least my users were fine with this. Having at least >>> partial stack is very helpful when you app fails early during OS >>> startup with no chance to attach debugger. >> Once it's corrupted, you should stop the backtrace, unless you are >> absolutely certain that trying to interpret the data in the stack can't >> cause your unwinder to perform undefined behavior. >> > > That is what we did in glibc 2.26.
SSP is special - you know for sure that stack was corrupted and can easily decide to abort. In a typical C application user would only unwind stack when printing an error message. If one of the higher frames is corrupted, unwinder will be fooled to accessing random memory address in 99% of cases. But this is unknown at the time _Unwind_Backtrace (or backtrace(3)) is called. -Y