Re: [R-SIG-Mac] [External] Re: problem with Rprof

luke-tierney Wed, 09 Nov 2022 08:43:41 -0800

On Tue, 8 Nov 2022, Simon Urbanek wrote:

On Nov 9, 2022, at 10:03 AM, Tomas Kalibera <tomas.kalib...@gmail.com> wrote:


On 11/7/22 01:58, luke-tier...@uiowa.edu wrote:

On Sun, 6 Nov 2022, Simon Urbanek wrote:

Carl,

first, setting such a low interval won't work anyway - the overhead is bigger 
than the sampled time, so we should really not allow it to begin with (on my 
machine the timer signals arrive before anything can be done, so you have to 
kill R and you get no output).

That said, it crashes in doprof(), which is called on all threads - the main R 
one is ok, but one of the other threads crashes in pthread_self(). At that time 
R is trying to propagate the signal from all threads to the main thread, which 
seems odd to me (since the main thread already got the signal). I'm CCing Luke 
in the hope that he has some ideas. This may fall in the category of "don't do 
this" and the fix may be to set a lower bound on the interval.
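
For illustration only, a lower bound of this kind could be sketched in C as
below. The names check_interval and MIN_INTERVAL_US and the 1 ms floor are
assumptions made for the example; they are not taken from the R sources, and
the right floor would have to be chosen by measurement.

```c
/* Hypothetical floor: 1000 microseconds. The value is an assumption. */
#define MIN_INTERVAL_US 1000.0
/* Upper guard so the cast below stays in range even for a 32-bit long. */
#define MAX_INTERVAL_US 2e9

/* Convert a sampling interval in seconds to whole microseconds.
 * Returns 0 and stores the result on success, or -1 if the interval is
 * below the floor, above the guard, or NaN. */
static int check_interval(double seconds, long *usec_out)
{
    double us = seconds * 1e6;
    if (!(us >= MIN_INTERVAL_US) || us > MAX_INTERVAL_US)
        return -1;               /* the negated test also rejects NaN */
    *usec_out = (long)(us + 0.5); /* round to the nearest microsecond */
    return 0;
}
```

A caller would refuse to start profiling, with an error, when this returns -1,
instead of arming a timer that fires faster than the handler can run.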


I can't reproduce this on Linux or macOS.

On Linux only one thread receives a signal sent to a process, but the
kernel picks which one if multiple threads have the signal unblocked,
so we make sure the signal gets relayed to the main thread. If macOS
behaves differently then someone who knows how signals and threads
interact there would have to adjust this code.
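
As a rough sketch of the relaying described above (not the actual R code): the
handler checks which thread it is running on and, if that is not the main
thread, forwards the signal with pthread_kill(). The names main_thread,
samples_taken, prof_handler and install_prof_handler are made up for the
example. Note that it calls pthread_self() inside the handler, which is exactly
the call under discussion in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>

static pthread_t main_thread;               /* hypothetical: the thread running R */
static volatile sig_atomic_t samples_taken; /* stands in for the real sampling work */

/* The OS may pick any thread with the signal unblocked to run this. */
static void prof_handler(int sig)
{
    if (!pthread_equal(pthread_self(), main_thread)) {
        /* Not the main thread: forward the signal there and return. */
        pthread_kill(main_thread, sig);
        return;
    }
    samples_taken++;   /* the real code would record the R stack here */
}

/* Remember the calling thread as the main one and install the handler. */
static int install_prof_handler(int sig)
{
    struct sigaction sa;
    main_thread = pthread_self();
    sa.sa_handler = prof_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(sig, &sa, NULL);
}
```

Calling install_prof_handler(SIGUSR1) on the main thread and then
raise(SIGUSR1) there exercises the non-relaying branch; the relaying branch
only runs when another thread happens to be picked.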


From my reading this is the same on macOS. The profiling signal is 
asynchronous, sent to the process, it will be served by one thread which is 
picked by the OS. POSIX doesn't say which thread is preferred.



Yes, I saw the same, with the extra detail that thread signal blocking doesn't 
seem to necessarily work on macOS.

While some OSes prefer the main thread (I read macOS and Linux do, but from 
non-authoritative sources), R may also be embedded and not run on the main 
thread.

We have to do something to ensure the R thread is not running while we sample 
its R stack, anyway. On Windows we suspend the R thread for that. On Unix we do 
the relaying.  We could in principle suspend the R thread on macOS as well, but 
would have to use Mach calls directly.

Disallowing such a low interval is reasonable, but if there is a real
issue on macOS then it would only mask the problem.


Yes. The key question is why pthread_self() crashed.



Yes, that is the main mystery. Looking at the xnu kernel sources it is 
equivalent to pthread_getspecific(0) [since it's just the first slot in TSD] 
plus a check of a magic content in there. I suspect it's that check which 
segfaults for whatever reason. I wanted to see if just comparing the pointer 
from pthread_getspecific(0) instead of pthread_self() would work since we don't 
care if the pthread_t is valid as we only compare it to the main thread value 
(not that I would propose that as a fix since it's very 
implementation-specific, just curious), but I didn't get that far (I cannot 
really reproduce it - the closest I get is a mach exception under lldb).

Otherwise, from the stack trace, the behavior looks ok. The main thread (also R 
thread) is serving the signal, hence the signal is blocked, but it is received 
again, so another thread is picked to serve it, and it is relaying it to the 
main thread. One more thread is picked to serve it, and it crashes while 
calling pthread_self(). There is also one more thread not involved in the 
signal handling.

POSIX states that pthread_self() is async-signal-safe. The macOS 12.6 manual 
page for sigaction, however, doesn't include any pthread function in its list 
of async-signal-safe functions.

We could do some work-around (hiding the problem a bit more), such as exiting 
from the handler if the signal is already being served by another thread. We 
could also report such a situation to indicate that the interval is 
unreasonable. But it would be good to know for sure first what caused the 
problem.
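
One way such a work-around might look, purely as a sketch: a C11 atomic_flag
marks "a handler is already running", and a handler that finds the flag set
counts the event and returns without touching any pthread function. All names
here (in_handler, samples_taken, samples_skipped, guarded_handler) are
hypothetical, and the sketch assumes atomic_int is lock-free on the platform,
which is what makes it usable from a signal handler.

```c
#include <stdatomic.h>

static atomic_flag in_handler = ATOMIC_FLAG_INIT; /* set while a handler runs */
static atomic_int samples_taken;    /* stands in for the real sampling work */
static atomic_int samples_skipped;  /* signals dropped because one was in flight */

static void guarded_handler(int sig)
{
    (void)sig;
    /* test_and_set returns the previous value: true means another
     * invocation is still being served, so just count it and leave. */
    if (atomic_flag_test_and_set(&in_handler)) {
        atomic_fetch_add(&samples_skipped, 1);
        return;
    }
    atomic_fetch_add(&samples_taken, 1); /* the real code would sample here */
    atomic_flag_clear(&in_handler);
}
```

A non-zero samples_skipped after a run could then be reported as a hint that
the interval is too short. This does not explain the crash; it only avoids
overlapping handlers.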


How can you check anything if pthread functions fail? If a simple 
pthread_self() crashes then I don't see how you can do anything, since we don't 
even know what thread we are, cannot call mutexes etc.


Some random googling seems to suggest other systems have had issues
with pthread_self not being async-signal-safe on macOS. One example is
https://bugs.openjdk.org/browse/JDK-8235962. (Not directly about
pthread_self but looks like the same issue.) They suggest a possible
mach-specific work-around. We might be able to make use of something
along those lines with a modest amount of effort.

Another option is to look at using a timing thread similar to the
Windows approach. That could probably be designed to send the signal
directly to the main thread and avoid any pthread calls in the signal
handler. There is also a wishlist item from Lionel in bugzilla to
allow a realtime timer which might also be addressed at the same time.
On the other hand, this is all a fair amount of work and not really a
priority.
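
To make the timing-thread idea concrete, here is a minimal sketch under stated
assumptions; it is not a proposed patch. A helper thread sleeps for the
interval and then signals the target thread directly with pthread_kill(), so
the handler itself needs no pthread calls. The names (target_thread,
timer_running, ticks_seen, interval_ns, start_timer_thread, stop_timer_thread),
the use of SIGUSR1 and the 10 ms interval are all invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <time.h>

static pthread_t target_thread;             /* thread to be sampled */
static atomic_int timer_running;            /* cleared to stop the timer loop */
static volatile sig_atomic_t ticks_seen;    /* only written on the target thread */
static const long interval_ns = 10000000L;  /* 10 ms; an arbitrary choice */

/* Runs on the target thread only, so no thread check is needed. */
static void tick_handler(int sig) { (void)sig; ticks_seen++; }

/* Sleep for one interval, then signal the target thread directly. */
static void *timer_loop(void *arg)
{
    (void)arg;
    struct timespec ts;
    ts.tv_sec = interval_ns / 1000000000L;
    ts.tv_nsec = interval_ns % 1000000000L;
    while (atomic_load(&timer_running)) {
        nanosleep(&ts, NULL);
        pthread_kill(target_thread, SIGUSR1);
    }
    return NULL;
}

/* Call on the thread to be sampled. Returns 0 on success. */
static int start_timer_thread(pthread_t *tid)
{
    struct sigaction sa;
    target_thread = pthread_self();
    sa.sa_handler = tick_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;
    atomic_store(&timer_running, 1);
    return pthread_create(tid, NULL, timer_loop, NULL);
}

/* Ask the timer loop to finish and wait for it. */
static int stop_timer_thread(pthread_t tid)
{
    atomic_store(&timer_running, 0);
    return pthread_join(tid, NULL);
}
```

The sampled thread calls start_timer_thread() once and stop_timer_thread() when
profiling ends. Because nanosleep() measures elapsed time, this samples wall
clock time and not CPU time, which is a different trade-off from a CPU-time
interval timer.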

Best,

luke

Cheers,
Simon


--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu

_______________________________________________
R-SIG-Mac mailing list
R-SIG-Mac@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-mac
