Hi,

I have a Go executable that uses a shared C library which spawns its own
threads. In case of a crash (in Go or C code), I want to dump the stack
traces of all C threads and goroutines into a crash dump file. Go does not
handle signals in non-Go threads executing non-Go code, so I have to install
a custom C signal handler to handle those cases. And since Go does not
invoke a preinstalled C handler in case of crashes in Go code, the C handler
has to be registered after the Go handler.
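A minimal sketch of the "register after Go and forward to Go" part, assuming Linux and that the hypothetical crash_install_handler() is called from a cgo call after the Go runtime has installed its own handlers. It saves whatever handler was installed before (Go's) and forwards to it by calling it directly with the original siginfo_t and ucontext. The os/signal documentation requires SA_ONSTACK for handlers installed by non-Go code in a Go program. The actual dump is left as a placeholder counter.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>

/* Handlers that were installed before ours (i.e. Go's), indexed by signal. */
static struct sigaction previous_handler[NSIG];

/* Placeholder for the real dump work; counts invocations. */
static volatile sig_atomic_t crash_handler_calls = 0;

/* Forwards a signal to the previously installed handler, passing the
 * original siginfo_t and ucontext through unchanged. */
static void crash_forward(int sig, siginfo_t *info, void *uc)
{
    const struct sigaction *prev = &previous_handler[sig];

    if (prev->sa_flags & SA_SIGINFO) {
        prev->sa_sigaction(sig, info, uc);   /* Go's handler takes this path */
    } else if (prev->sa_handler != SIG_DFL && prev->sa_handler != SIG_IGN) {
        prev->sa_handler(sig);
    }
    /* SIG_DFL / SIG_IGN: nothing to call in this sketch. */
}

static void crash_handler(int sig, siginfo_t *info, void *uc)
{
    /* Hypothetical: write the crash dump here, using only
     * async-signal-safe calls. */
    crash_handler_calls++;
    crash_forward(sig, info, uc);
}

/* Hypothetical installer; must run after the Go runtime has installed its
 * handlers so that they end up in previous_handler. Returns 0 or -1. */
static int crash_install_handler(int sig)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = crash_handler;
    /* SA_ONSTACK is required for non-Go handlers in a Go program. */
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
    return sigaction(sig, &sa, &previous_handler[sig]);
}
```

Calling the saved handler as a plain function, instead of re-raising the signal, is what lets Go see and modify the same ucontext_t as the crash.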

After some experiments (restricted to Linux amd64) I got it working
somehow (https://gist.github.com/trxa/302c5dbe9055ef287da9139e68d0a93e).
But it feels a bit hacky and has some drawbacks, and I wonder if somebody
can propose a better solution or improvements.


How it basically works:

When the C handler is installed, it saves the previously installed Go
handlers.
If invoked, for example by a SIGSEGV, the handler opens a file and writes
the stack trace of the current thread into that file.
Then it signals all other threads to dump their stacks into the file too.
After all threads are dumped, the IP of the failing instruction is saved,
and the saved Go handler is invoked by calling it directly, so that it
receives the ucontext of the crash.
After the Go handler has returned, the C handler checks whether Go has
changed the IP in uc_mcontext.
If it has changed, the IP points to runtime.sigpanic, which triggers a panic
and dumps the goroutine stacks to STDERR.
If it has not changed, the crash was in non-Go code on a non-Go thread and
Go does not handle it. In that case, the IP register in uc_mcontext is set
to the address of an exported cgo function which calls panic() to dump the
stack to STDERR.
Before returning from the C handler, the STDERR file descriptor is replaced
by the crash dump file descriptor, so that the Go panic output goes into the
file. (The Go handlers should probably be restored before returning, in case
Go still wants to backtrace the threads itself via SIGQUIT.)
After the C handler has returned, runtime.sigpanic or the cgo function is
executed, and neither returns.
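The IP bookkeeping in these steps could look roughly like the sketch below. It assumes Linux on amd64 (REG_RIP in uc_mcontext.gregs) or arm64 (uc_mcontext.pc); uc_get_ip, uc_set_ip and forward_and_redirect are hypothetical names, and fallback stands for the exported cgo function that calls panic(). The sketch does not push a return address onto the interrupted stack, so fallback must never return, and the stack is left exactly as it was at the faulting instruction.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <ucontext.h>

/* Reads the instruction pointer stored in a signal's ucontext (Linux). */
static uintptr_t uc_get_ip(const ucontext_t *uc)
{
#if defined(__x86_64__)
    return (uintptr_t)uc->uc_mcontext.gregs[REG_RIP];
#elif defined(__aarch64__)
    return (uintptr_t)uc->uc_mcontext.pc;
#else
#error "only linux/amd64 and linux/arm64 are sketched here"
#endif
}

/* Overwrites the instruction pointer; execution resumes there once the
 * signal handler returns. */
static void uc_set_ip(ucontext_t *uc, uintptr_t ip)
{
#if defined(__x86_64__)
    uc->uc_mcontext.gregs[REG_RIP] = (greg_t)ip;
#elif defined(__aarch64__)
    uc->uc_mcontext.pc = ip;
#endif
}

/* Calls the saved Go handler directly with the crash's ucontext, then
 * checks whether Go redirected execution. If it did not (non-Go code on a
 * non-Go thread), points the IP at `fallback` instead.
 * Returns 1 if Go changed the IP, 0 if the fallback was installed. */
static int forward_and_redirect(const struct sigaction *go_prev, int sig,
                                siginfo_t *info, ucontext_t *uc,
                                void (*fallback)(void))
{
    uintptr_t ip_before = uc_get_ip(uc);

    /* Go installs its handler with SA_SIGINFO; anything else is skipped. */
    if (go_prev->sa_flags & SA_SIGINFO)
        go_prev->sa_sigaction(sig, info, uc);

    if (uc_get_ip(uc) != ip_before)
        return 1;   /* Go redirected execution (to runtime.sigpanic) */

    /* No return address is pushed: fallback must never return. */
    uc_set_ip(uc, (uintptr_t)fallback);
    return 0;
}
```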


Here are the disadvantages and things to watch out for, which make the
solution a bit creepy:

1. signal.Notify has to be called for all signals you want to handle for C
crashes, even though they are not handled in Go. Otherwise, in the
non-Go-code/non-Go-thread case, the Go handler does not return but creates
a core dump.

2. Setting the IP to a cgo function that is executed when the handler
returns makes the program panic synchronously, as with runtime.sigpanic,
but this is probably not async-signal-safe, for example if the function has
to request more stack.
A workaround would be to panic in Go when the signal is received from the
notify channel. In addition, the C handler must not return, to avoid
re-executing the faulting instruction. This can be done by putting the
thread to sleep. This approach is probably even more platform-independent,
but that way a synchronous signal from C is handled as an asynchronous one,
and the information you receive in Go does not let you distinguish the two
(in case you only want to dump and continue for asynchronous signals).

3. Redirecting the STDERR file descriptor to the crash file also feels a bit
fragile compared to writing to the file directly. Another thread might
write to STDERR in the meantime. The fd cannot be closed (except maybe in a
global destructor), and the OS would have to flush the buffers correctly
(or I would have to use synchronous write mode, which slows down writing
the dump tremendously).

4. There are duplicate stack traces, and it's not always obvious how to
match a thread's stack trace to the goroutine running on it.

5. It would be desirable to have the stack trace of the failing instruction
in both the crash file and the log file, but with this solution that is
only possible for the C frames plus the first Go frame above them, at least
if you use a common unwinder library.

There might be more.
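For point 3, the redirection itself can be kept small. A minimal sketch, assuming Linux; crash_file_open and crash_redirect_stderr are hypothetical names. open(), dup2() and write() are on the POSIX list of async-signal-safe functions, so both steps may also run inside the handler. On the flushing concern: data handed to write() is held by the kernel and survives the process crashing; fsync() or O_SYNC only matter if the whole system might go down before the data is written back.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical global: crash dump file descriptor. */
static int crash_fd = -1;

/* O_APPEND: every write() goes to the current end of the file, so the
 * per-thread dumps and the Go panic output do not overwrite each other. */
static int crash_file_open(const char *path)
{
    crash_fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_APPEND | O_CLOEXEC,
                    0644);
    return crash_fd;
}

/* Points fd 2 at the crash file, so anything the Go runtime writes to
 * STDERR afterwards (the panic and goroutine traces) lands in the dump. */
static int crash_redirect_stderr(void)
{
    if (crash_fd < 0)
        return -1;
    return dup2(crash_fd, STDERR_FILENO) < 0 ? -1 : 0;
}
```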


From my point of view, a better solution would be for Go to have an option
(maybe via the GOTRACEBACK env var) to trace C threads as well, for example
by using the cgo traceback functions introduced in Go 1.7. It should also be
possible to set a file descriptor/handle as the target for a dump (maybe in
addition to the dump on STDERR). In addition to the cgo traceback functions,
there might be one or more functions for gathering additional information,
which would be printed in the crash dump. A use case for that would be a
list of loaded modules/libraries or environment variables.
I can imagine that it's easier said than done, but that's what I would 
prefer.
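Until such a hook exists, the loaded-libraries part can be gathered from the C crash handler itself on Linux. A minimal sketch, assuming Linux and that the handler already holds the crash file descriptor; dump_loaded_modules and write_all are hypothetical names. It copies /proc/self/maps, which lists every mapped file including all loaded shared libraries, using only open/read/write/close, which are async-signal-safe.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Writes all of buf to fd, retrying short writes. Returns 0 or -1. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t w = write(fd, buf, len);
        if (w < 0)
            return -1;
        buf += w;
        len -= (size_t)w;
    }
    return 0;
}

/* Copies /proc/self/maps into out_fd.
 * Returns the number of bytes copied, or -1 on error. */
static long dump_loaded_modules(int out_fd)
{
    char buf[4096];
    long total = 0;
    int in_fd = open("/proc/self/maps", O_RDONLY | O_CLOEXEC);

    if (in_fd < 0)
        return -1;
    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            break;                      /* end of file */
        if (n < 0 || write_all(out_fd, buf, (size_t)n) < 0) {
            total = -1;                 /* EINTR is not retried here */
            break;
        }
        total += n;
    }
    close(in_fd);
    return total;
}
```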


Thanks for your opinions!
Martin

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.