On Thu, 10 Nov 2022, Tomas Kalibera wrote:
On 11/9/22 00:22, Simon Urbanek wrote:
On Nov 9, 2022, at 10:03 AM, Tomas Kalibera <tomas.kalib...@gmail.com>
wrote:
On 11/7/22 01:58, luke-tier...@uiowa.edu wrote:
On Sun, 6 Nov 2022, Simon Urbanek wrote:
Carl,
first, setting such a low interval won't work anyway - the overhead is
bigger than the sampled time, so we should really not allow it to begin
with (on my machine the timer signals arrive before anything can be done
so you have to kill R and you get no output).
That said, it crashes in doprof() which is called on all threads - the
main R one is ok, but one of the other threads crashes in
pthread_self(). At that time R is trying to propagate the signal from
all threads to the main thread which seems odd to me (since the main
thread already got the signal). I'm CCing Luke in the hope that he has
any ideas. This may fall in the category of "don't do this" and the fix
may be to set a lower bound on the interval.
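A minimal sketch in C of what such a lower bound could look like. The 1 ms floor, the 60 s ceiling, and the names MIN_INTERVAL_US, MAX_INTERVAL_US, clamp_interval_us and start_prof_timer are all assumptions made for illustration; none of them are taken from R's sources, and a sensible floor would depend on the machine.

```c
#define _XOPEN_SOURCE 700   /* for setitimer() under strict -std=c11 */
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Assumed limits, chosen only for illustration. */
#define MIN_INTERVAL_US 1000L       /* 1 ms floor */
#define MAX_INTERVAL_US 60000000L   /* 60 s ceiling, avoids overflow on cast */

/* Convert a requested interval in seconds to microseconds, clamped to
   [MIN_INTERVAL_US, MAX_INTERVAL_US]. */
long clamp_interval_us(double interval_sec)
{
    double us = interval_sec * 1e6;
    if (!(us >= (double)MIN_INTERVAL_US))   /* too small, negative, or NaN */
        return MIN_INTERVAL_US;
    if (us > (double)MAX_INTERVAL_US)
        return MAX_INTERVAL_US;
    return (long)(us + 0.5);                /* round to nearest microsecond */
}

/* Arm the profiling timer with the clamped interval. A SIGPROF handler
   must already be installed, since the default action terminates the
   process. */
int start_prof_timer(double interval_sec)
{
    long us = clamp_interval_us(interval_sec);
    struct itimerval itv;

    assert(us >= MIN_INTERVAL_US && us <= MAX_INTERVAL_US);
    itv.it_interval.tv_sec = us / 1000000L;
    itv.it_interval.tv_usec = us % 1000000L;
    itv.it_value = itv.it_interval;
    return setitimer(ITIMER_PROF, &itv, NULL);
}
```

With these assumed limits, a request of 1e-6 seconds would be raised to 1000 microseconds rather than rejected; reporting an error instead would be an equally reasonable design.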
I can't reproduce this on Linux or macOS.
On Linux only one thread receives a signal sent to a process, but the
kernel picks which one if multiple threads have the signal unblocked,
so we make sure the signal gets relayed to the main thread. If macOS
behaves differently then someone who knows how signals and threads
interact there would have to adjust this code.
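The relaying described above can be sketched roughly as follows. This is a simplified illustration, not R's actual handler: the names profiled_thread, samples_taken, prof_handler, install_prof_handler and demo_one_sample are hypothetical, and the counter merely stands in for the real stack sampling.

```c
#define _POSIX_C_SOURCE 200809L   /* for sigaction() under strict -std=c11 */
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>

static pthread_t profiled_thread;            /* thread whose stack is sampled */
static volatile sig_atomic_t samples_taken;  /* stand-in for real sampling */

static void prof_handler(int sig)
{
    /* The kernel may deliver a process-directed signal to any thread
       that has it unblocked. If we are not the profiled thread, forward
       the signal there and do nothing else. */
    if (!pthread_equal(pthread_self(), profiled_thread)) {
        pthread_kill(profiled_thread, sig);
        return;
    }
    samples_taken++;   /* a real profiler would record the stack here */
}

/* Install the handler; call this from the thread to be profiled. */
int install_prof_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = prof_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    profiled_thread = pthread_self();
    return sigaction(SIGPROF, &sa, NULL);
}

/* Usage sketch: send one SIGPROF to the calling thread and return the
   number of samples recorded so far. */
int demo_one_sample(void)
{
    int rc = install_prof_handler();

    assert(rc == 0);
    raise(SIGPROF);    /* handled before raise() returns */
    return (int)samples_taken;
}
```

Note that the first thing a non-profiled thread does in this scheme is call pthread_self(), which is exactly the call that crashed in the report.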
From my reading this is the same on macOS. The profiling signal is
asynchronous, sent to the process, it will be served by one thread which
is picked by the OS. POSIX doesn't say which thread is preferred.
Yes, I saw the same, with the extra detail that thread signal blocking
doesn't necessarily seem to work on macOS.
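For reference, per-thread blocking as POSIX describes it is normally done with pthread_sigmask(). A minimal sketch, with hypothetical helper names (block_sigprof_in_this_thread, sigprof_is_blocked); whether macOS honours the mask for process-directed signals in every case is the open question here, and this sketch does not settle it.

```c
#define _POSIX_C_SOURCE 200809L   /* for pthread_sigmask() under strict -std=c11 */
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* Block SIGPROF in the calling thread only. Threads created afterwards
   by this thread inherit the mask. Returns 0 on success. */
int block_sigprof_in_this_thread(void)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIGPROF);
    return pthread_sigmask(SIG_BLOCK, &set, NULL);
}

/* Return 1 if SIGPROF is currently blocked in the calling thread,
   0 otherwise. */
int sigprof_is_blocked(void)
{
    sigset_t cur;
    int rc = pthread_sigmask(SIG_BLOCK, NULL, &cur);  /* query only */

    assert(rc == 0);
    return sigismember(&cur, SIGPROF);
}
```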
While some OSes prefer the main thread (I read macOS and Linux do, but
from non-authoritative sources), R may also be embedded and not run on the
main thread.
We have to do something to ensure the R thread is not running while we
sample its R stack, anyway. On Windows we suspend the R thread for that.
On Unix we do the relaying. We could in principle suspend the R thread on
macOS as well, but would have to use Mach calls directly.
Disallowing such a low interval is reasonable, but if there is a real
issue on macOS then it would only mask the problem.
Yes. The key question is why pthread_self() crashed.
Yes, that is the main mystery. Looking at the xnu kernel sources it is
equivalent to pthread_getspecific(0) [since it's just the first slot in
TSD] plus a check of a magic content in there. I suspect it's that check
which segfaults for whatever reason. I wanted to see if just comparing the
pointer from pthread_getspecific(0) instead of pthread_self() would work
since we don't care if the pthread_t is valid as we only compare it to the
main thread value (not that I would propose that as a fix since it's very
implementation-specific, just curious), but I didn't get that far (I cannot
really reproduce it - the closest I get is a mach exception under lldb).
Yes, this is a mystery. The pthread_t validation could crash if the pthread_t
was corrupted, but it is not clear why it would be. Then there is the pointer
authentication check, though I wonder whether it does anything at all on
Intel, and the report was from an Intel machine.
What I also find puzzling is that the stack trace doesn't show much about the
crashed thread. The 1st frame on thread 0 is "start" as it is the main
thread. The other threads start with "thread_start/_pthread_start". But the
crashed thread 6 starts only with "_sigtramp" for the handler, with no previous
frames. Also, the crash is due to "no mapping for user data read", a page fault,
so probably some pointer on the stack points to the wrong place. As if the
stack was corrupted or the thread didn't get a chance to be initialized
properly before the signal has arrived (not sure if that is possible).
Again, I cannot reproduce this on my Intel Mac (R 4.2.1, macOS 11.6.8).
Carl has not told us how he is running R (from a terminal, the R GUI,
RStudio, ...)
When I use the Activity Monitor to look at an R process started from
the terminal then I see one thread.
With the R GUI I see a number of threads that seems to fluctuate
between 5 and 9 (without any user activity in the console, just sitting
there at the prompt). With RStudio I see 21-23, also fluctuating while
sitting at the prompt.
So it looks like in R GUI and RStudio threads are being created and
destroyed. It is possible that a signal arriving between mach thread
creation and setting up the pthread structure will see an invalid
structure. With a huge number of signals the chance of that happening
is higher, though you would still also need a lot of threads created
to see this reliably.
Best,
luke
Carl, is the problem repeatable on your machine? If yes, what are the steps
to repeat it on your machine?
I was trying on M1, but didn't find a way to provoke it.
Best
Tomas
Otherwise, from the stack trace, the behavior looks ok. The main thread
(also R thread) is serving the signal, hence the signal is blocked, but it
is received again, so another thread is picked to serve it, and it is
relaying it to the main thread. One more thread is picked to serve it, and
it crashes while calling pthread_self(). There is also one more thread not
involved in the signal handling.
POSIX states that pthread_self() is async-signal-safe. The macOS 12.6 manual
(sigaction), however, doesn't include any pthread function in its list of
async-signal-safe functions.
We could do some work-around (hiding the problem a bit more) like exit
from the handler if the signal is being served by another thread. We could
also report such situation to indicate that the interval is unreasonable.
But it would be good first to know for sure what caused the problem.
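One way such a work-around could be sketched without calling any pthread function is a lock-free atomic flag checked at the top of the handler. This is only an illustration under assumptions: the names in_handler, dropped_signals, samples and guarded_prof_handler are hypothetical, the counter stands in for the real sampling, and it is not known whether this would avoid the crash seen in the report.

```c
#include <assert.h>
#include <stdatomic.h>

/* Only lock-free atomics are safe to use from a signal handler.
   atomic_flag is always lock-free; check atomic_int at compile time. */
static_assert(ATOMIC_INT_LOCK_FREE == 2,
              "atomic_int must be lock-free for use in a signal handler");

static atomic_flag in_handler = ATOMIC_FLAG_INIT;
static atomic_int dropped_signals;   /* signals that arrived while busy */
static atomic_int samples;           /* stand-in for real sampling */

void guarded_prof_handler(int sig)
{
    (void)sig;

    /* If some thread is already serving the signal, count the overrun
       and leave immediately. */
    if (atomic_flag_test_and_set(&in_handler)) {
        atomic_fetch_add(&dropped_signals, 1);
        return;
    }

    atomic_fetch_add(&samples, 1);   /* a real handler would sample here */
    atomic_flag_clear(&in_handler);
}
```

A non-zero dropped_signals count after profiling could then be reported to the user as a sign that the interval is too short for the machine.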
How can you check anything if pthread functions fail? If a simple
pthread_self() crashes then I don't see how you can do anything since we
don't even know what thread we are, cannot call mutexes etc.
Cheers,
Simon
--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu
_______________________________________________
R-SIG-Mac mailing list
R-SIG-Mac@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-mac