Peter Edwards <pea...@arista.com> added the comment:

Hi - we ran into what looks like exactly this issue sporadically on an x86_64 
machine, and tracked down the root cause.

When faulthandler.c uses sigaltstack(2), the alternate stack is set up with a 
buffer of size SIGSTKSZ. That is, sadly, only 8k.
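
For reference, the setup is roughly along these lines (a minimal sketch of the
pattern, not the actual faulthandler.c code):

    #include <signal.h>
    #include <stdlib.h>

    static stack_t stack;

    /* Install an alternate signal stack of only SIGSTKSZ bytes
       (traditionally 8k on Linux/x86_64). */
    static int setup_altstack(void)
    {
        stack.ss_flags = 0;
        stack.ss_size = SIGSTKSZ;
        stack.ss_sp = malloc(stack.ss_size);
        if (stack.ss_sp == NULL)
            return -1;
        return sigaltstack(&stack, NULL);
    }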

When a signal is raised, before the handler is called, the kernel stores the 
machine state on the user's (possibly "alternate") stack. The size of that 
state is very much variable, depending on the CPU.

When we chain signal handlers in the sigaction variant of the faulthandler 
code, we re-raise the signal while our own handler is still on the alternate 
stack, so the kernel saves a second copy of the CPU state there.
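
Roughly, the chaining looks like this (a hedged sketch of the pattern; names
are illustrative, not faulthandler's exact code):

    #include <signal.h>

    static struct sigaction previous;  /* handler installed before ours */

    static void chained_handler(int signum)
    {
        sigset_t set;

        /* ... our own work happens here; the kernel has already pushed one
           CPU-state save onto the alternate stack for this frame ... */

        /* Hand off to the previous handler: re-install it, unblock the
           signal and re-raise. The kernel pushes a second CPU-state save,
           still on the same alternate stack. */
        sigaction(signum, &previous, NULL);
        sigemptyset(&set);
        sigaddset(&set, signum);
        sigprocmask(SIG_UNBLOCK, &set, NULL);
        raise(signum);
    }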

Finally, when any part of that signal handler has to invoke a function that 
requires the dynamic linker's intervention to resolve, it will call some form 
of _dl_runtime_resolve* - likely _dl_runtime_resolve_xsave or 
_dl_runtime_resolve_xsavec.

These functions will also have to save machine state. So, how big is the 
machine state? Well, it depends on the CPU. 
On one machine I have access to, where /proc/cpuinfo shows "Intel(R) Xeon(R) 
CPU E5-2640 v4", I have:

> (gdb) p _rtld_local_ro._dl_x86_cpu_features.xsave_state_size
> $1 = 896

On another machine, reporting as "Intel(R) Xeon(R) Gold 5118 CPU", I have:

> (gdb) p _rtld_local_ro._dl_x86_cpu_features.xsave_state_size
> $1 = 2560

This means that the stack space required to hold 3 sets of CPU state is over 
7.5k (3 x 2560 = 7680 bytes). And, for the signal handlers, it's actually 
worse: more like 3.25k per frame, since the signal frame holds more than just 
the xsave area. A chained signal handler that needs to invoke dynamic linking 
will therefore consume more than the default stack space allocated in 
faulthandler.c, in machine-state saves alone. So, the failing test is failing 
because it's scribbling on random memory just below the allocated stack space.

My guess is that the architectures this previously manifested on have larger 
stack demands for signal handling than x86_64, but clearly newer x86_64 
processors are starting to be tickled by this too.

The fix is pretty simple - just allocate more stack space. The attached patch 
uses pthread_attr_getstacksize to find the system's default thread stack size, 
uses that for the alternate stack, and also defines an absolute minimum stack 
size of 1M. This fixes the issue on our machine with the big xsave state size. 
(I'm sure I'm getting the feature-test macros wrong for checking pthreads 
availability.)
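
The approach is along these lines (an illustrative sketch only - the real code
is in sigaltstack-stacksize.patch, and the names here are made up):

    #include <pthread.h>
    #include <stddef.h>

    /* Absolute minimum alternate-stack size: 1M. */
    #define ALTSTACK_MIN_SIZE (1024 * 1024)

    static size_t pick_altstack_size(void)
    {
        size_t size = ALTSTACK_MIN_SIZE;
    #ifdef _POSIX_THREADS
        pthread_attr_t attr;
        size_t default_size;
        /* Use the system's default thread stack size if it is larger. */
        if (pthread_attr_init(&attr) == 0) {
            if (pthread_attr_getstacksize(&attr, &default_size) == 0
                    && default_size > size)
                size = default_size;
            pthread_attr_destroy(&attr);
        }
    #endif
        return size;
    }

The result would then be used as ss_size when calling sigaltstack(), instead
of SIGSTKSZ.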

Also, in a threaded environment, using the altstack might not be the best 
choice: if multiple threads handle signals that run on that stack, they will 
wind up stomping on the same memory. Is there a strong reason to maintain this 
altstack behaviour?

----------
keywords: +patch
nosy: +peadar
Added file: https://bugs.python.org/file48353/sigaltstack-stacksize.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue21131>
_______________________________________