> On Nov 9, 2022, at 10:03 AM, Tomas Kalibera <tomas.kalib...@gmail.com> wrote:
> 
> 
> On 11/7/22 01:58, luke-tier...@uiowa.edu wrote:
>> On Sun, 6 Nov 2022, Simon Urbanek wrote:
>> 
>>> Carl,
>>> 
>>> first, setting such a low interval won't work anyway - the overhead is bigger 
>>> than the sampled time, so we should really not allow it to begin with (on 
>>> my machine the timer signals arrive before anything can be done so you have 
>>> to kill R and you get no output).
>>> 
>>> That said, it crashes in doprof() which is called on all threads - the main 
>>> R one is ok, but one of the other threads crashes in pthread_self(). At 
>>> that time R is trying to propagate the signal from all threads to the main 
>>> thread which seems odd to me (since the main thread already got the 
>>> signal). I'm CCing Luke in the hope that he has some ideas. This may fall in 
>>> the category of "don't do this" and the fix may be to set a lower bound on 
>>> the interval.
>> 
>> I can't reproduce this on Linux or macOS.
>> 
>> On Linux only one thread receives a signal sent to a process, but the
>> kernel picks which one if multiple threads have the signal unblocked,
>> so we make sure the signal gets relayed to the main thread. If macOS
>> behaves differently then someone who knows how signals and threads
>> interact there would have to adjust this code.
> 
> From my reading this is the same on macOS. The profiling signal is 
> asynchronous and sent to the process, so it will be served by one thread, 
> which is picked by the OS. POSIX doesn't say which thread is preferred.


Yes, I saw the same, with the extra detail that per-thread signal blocking 
doesn't necessarily seem to work on macOS.
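
For readers less familiar with the mechanism: per-thread blocking here means calling pthread_sigmask() in each thread that should not receive the signal. A minimal sketch follows. The helper names are made up for illustration, this is not R's code, and it says nothing about whether macOS honours the mask in the case discussed in this thread.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* Block `sig` for the calling thread only (hypothetical helper).
   Returns 0 on success, otherwise the error number from pthread_sigmask(). */
static int block_signal_in_this_thread(int sig)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, sig);
    return pthread_sigmask(SIG_BLOCK, &set, NULL);
}

/* Report whether `sig` is in the calling thread's signal mask (hypothetical
   helper): 1 if blocked, 0 if not, -1 if the mask could not be read. */
static int signal_is_blocked_here(int sig)
{
    sigset_t cur;
    /* With a NULL new set, the mask is left unchanged and only read back. */
    if (pthread_sigmask(SIG_BLOCK, NULL, &cur) != 0)
        return -1;
    return sigismember(&cur, sig);
}
```

A worker thread would call block_signal_in_this_thread(SIGPROF) at startup, so that a process-directed SIGPROF can only be delivered to a thread that left it unblocked.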


> While some OSes prefer the main thread (I read macOS and Linux do, but from 
> non-authoritative sources), R may also be embedded and not run on the main 
> thread.
> 
> We have to do something to ensure the R thread is not running while we sample 
> its R stack, anyway. On Windows we suspend the R thread for that. On Unix we 
> do the relaying.  We could in principle suspend the R thread on macOS as 
> well, but would have to use Mach calls directly.
> 
>> Disallowing such a low interval is reasonable, but if there is a real
>> issue on macOS then it would only mask the problem.
> 
> Yes. The key question is why pthread_self() crashed.


Yes, that is the main mystery. Looking at the xnu kernel sources, pthread_self() 
is equivalent to pthread_getspecific(0) [since it's just the first slot in TSD] 
plus a check of a magic value stored there. I suspect it's that check which 
segfaults for whatever reason. I wanted to see if just comparing the pointer 
from pthread_getspecific(0) instead of pthread_self() would work, since we don't 
care if the pthread_t is valid as we only compare it to the main thread value 
(not that I would propose that as a fix since it's very 
implementation-specific, just curious), but I didn't get that far (I cannot 
really reproduce it - the closest I get is a mach exception under lldb).
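
The comparison against the main thread value sits inside a relay pattern that can be sketched roughly as follows. This illustrates the general technique only and is not R's doprof(); main_thread, prof_handler, install_prof_handler, signal_self and demo_relay are hypothetical names, and the counters exist only so the demo can be checked.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <time.h>

static pthread_t main_thread;          /* thread whose stack would be sampled */
static volatile sig_atomic_t samples;  /* times the handler ran on main_thread */
static volatile sig_atomic_t relays;   /* times it ran elsewhere (demo only) */

/* If the signal arrived on another thread, pass it on to the main thread. */
static void prof_handler(int sig)
{
    if (!pthread_equal(pthread_self(), main_thread)) {
        relays++;
        pthread_kill(main_thread, sig);
        return;
    }
    samples++;                         /* stand-in for taking a sample */
}

/* Record the calling thread as the main thread and install the handler. */
static int install_prof_handler(int sig)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = prof_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    main_thread = pthread_self();
    return sigaction(sig, &sa, NULL);
}

/* Worker: signal itself so that the handler starts off the main thread. */
static void *signal_self(void *arg)
{
    pthread_kill(pthread_self(), *(int *)arg);
    return NULL;
}

/* Demo: returns the number of samples taken on the main thread, -1 on error. */
static int demo_relay(int sig)
{
    pthread_t worker;
    struct timespec nap = { 0, 1000000 };   /* 1 ms */
    int i;

    samples = 0;
    relays = 0;
    if (install_prof_handler(sig) != 0) return -1;
    if (pthread_create(&worker, NULL, signal_self, &sig) != 0) return -1;
    pthread_join(worker, NULL);
    /* Give the relayed signal a bounded amount of time to arrive. */
    for (i = 0; i < 1000 && samples == 0; i++)
        nanosleep(&nap, NULL);
    return (int)samples;
}
```

Calling demo_relay(SIGUSR1) from the main thread exercises one relay: the worker's handler forwards the signal and the main thread's handler counts it. Everything in the handler depends on pthread_self() returning normally, which is the call that crashes in the report.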



> Otherwise, from the stack trace, the behavior looks ok. The main thread (also 
> R thread) is serving the signal, hence the signal is blocked, but it is 
> received again, so another thread is picked to serve it, and it is relaying 
> it to the main thread. One more thread is picked to serve it, and it crashes 
> while calling pthread_self(). There is also one more thread not involved in 
> the signal handling.
> 
> POSIX states that pthread_self() is async-signal-safe. The macOS 12.6 manual 
> (sigaction), however, doesn't include any pthread function in its list of 
> async-signal-safe functions.
> 
> We could do some work-around (hiding the problem a bit more) like exiting from 
> the handler if the signal is being served by another thread. We could also 
> report such a situation to indicate that the interval is unreasonable. But it 
> would be good first to know for sure what caused the problem.
> 

How can you check anything if pthread functions fail? If a simple pthread_self() 
crashes then I don't see how you can do anything, since we don't even know which 
thread we are on and cannot use mutexes etc.

Cheers,
Simon

_______________________________________________
R-SIG-Mac mailing list
R-SIG-Mac@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-mac
