Hi all, (please CC me in replies, not a list member)
I have a large C++ app that throws exceptions to unwind anywhere from 5-20 stack frames when an error prevents the request from being served (which happens rather frequently). Works fine single-threaded, but performance is terrible for 24 threads on a 48-thread Ubuntu 10 machine. Profiling points to a global mutex acquire in __GI___dl_iterate_phdr as the culprit, with _Unwind_Find_FDE as its caller.
Tracing the attached test case with the attached gdb script shows that the -DDTOR case executes ~20k instructions during unwind and calls iterate_phdr 12 times. The -DTRY case executes ~33k instructions and calls iterate_phdr 18 times. The exception in this test case only affects three stack frames, with minimal cleanup required, and the trace is taken on the second call to the function that swallows the error, to warm up libgcc's internal caches [1].
The instruction counts aren't terribly surprising (I know unwinding is complex), but might it be possible to throw and catch a previously-seen exception through a previously-seen stack trace with fewer than 4-6 global mutex acquires per frame unwound? (That's 12 iterate_phdr calls over three frames in the -DDTOR case, 18 over three frames in the -DTRY case.) As it stands, the deeper the stack trace (= the more desirable it is to throw rather than return an error), the worse a scalability bottleneck unwinding becomes. My actual app would apparently suffer anywhere from 25 to 80 global mutex acquires for each exception thrown, which probably explains why the bottleneck arises.
I'm bringing the issue up here, rather than filing a bug, because I'm not sure whether this is an oversight, a known problem that's hard to fix, or a feature (e.g. somehow required for reliable unwinding). I suspect the first, because _Unwind_Find_FDE tries a call to _Unwind_Find_registered_FDE before falling back to dl_iterate_phdr, but that call never succeeds in my trace (iterate_phdr is always called).
FWIW, I've tested both gcc-4.6 and 4.8 but see no meaningful difference between them.
[1] The caches can be seen in libgcc/unwind-dw2-fde-dip.c, though they do little to prevent mutex bottlenecks because they're accessed from the iterate_phdr callback, behind the mutex acquire.
Thoughts? Ryan
// Test case: compile with -DTRY or -DDTOR.
#include <cstdio>

void ding() { fputs("Ding!\n", stderr); }

struct ding_unless {
    bool commit;
    ding_unless() : commit(false) { }
    ~ding_unless() { if (not commit) ding(); }
};

void __attribute__((noinline)) sentinel() { printf("Done\n"); }

int __attribute__((noinline)) foo() { throw 42; }

int __attribute__((noinline)) bar() {
#ifdef TRY
    try { return 1+foo(); } catch (...) { ding(); throw; }
#elif defined(DTOR)
    ding_unless x;
    int ans = 1+foo();
    x.commit = true;
    return ans;
#endif
}

int __attribute__((noinline)) baz() {
    int ans;
    try { ans = 1+bar(); } catch (...) { ans = -1; }
    sentinel();
    return ans;
}

int main() { baz(); baz(); return 0; }
# gdb script: single-step the second call to baz until sentinel is reached.
b baz
r
c
display/i $pc
si
while ($pc != sentinel)
  si
end
k
q