Re: [R-SIG-Mac] [External] Re: problem with Rprof

Carl Witthoft Thu, 10 Nov 2022 05:01:50 -0800

Tomas,

Every time I set the time interval to a value of 1e-5 or smaller (I think! Maybe it was 1e-6 or smaller), R crashes on my machine.


On 11/10/22 4:53 AM, Tomas Kalibera wrote:

On 11/9/22 00:22, Simon Urbanek wrote:
On Nov 9, 2022, at 10:03 AM, Tomas Kalibera <tomas.kalib...@gmail.com> wrote:
On 11/7/22 01:58, luke-tier...@uiowa.edu wrote:
On Sun, 6 Nov 2022, Simon Urbanek wrote:
Carl,
first, setting such a low interval won't work anyway - the overhead is bigger than the sampled time, so we should really not allow it to begin with (on my machine the timer signals arrive before anything can be done so you have to kill R and you get no output).
That said, it crashes in doprof() which is called on all threads - the main R one is ok, but one of the other threads crashes in pthread_self(). At that time R is trying to propagate the signal from all threads to the main thread, which seems odd to me (since the main thread already got the signal). I'm CCing Luke in the hope that he has some ideas. This may fall in the category of "don't do this" and the fix may be to set a lower bound on the interval.
I can't reproduce this on Linux or macOS.

On Linux only one thread receives a signal sent to a process, but the
kernel picks which one if multiple threads have the signal unblocked,
so we make sure the signal gets relayed to the main thread. If macOS
behaves differently then someone who knows how signals and threads
interact there would have to adjust this code.
From my reading this is the same on macOS. The profiling signal is asynchronous and sent to the process; it will be served by one thread, which is picked by the OS. POSIX doesn't say which thread is preferred.
Yes, I saw the same, with the extra detail that thread signal blocking doesn't necessarily seem to work on macOS.
While some OSes prefer the main thread (I read macOS and Linux do, but from non-authoritative sources), R may also be embedded and not run on the main thread.
We have to do something to ensure the R thread is not running while we sample its R stack, anyway. On Windows we suspend the R thread for that. On Unix we do the relaying. We could in principle suspend the R thread on macOS as well, but would have to use Mach calls directly.
Disallowing such a low interval is reasonable, but if there is a real
issue on macOS then it would only mask the problem.
Yes. The key question is why pthread_self() crashed.
Yes, that is the main mystery. Looking at the xnu kernel sources it is equivalent to pthread_getspecific(0) [since it's just the first slot in TSD] plus a check of a magic value stored there. I suspect it's that check which segfaults for whatever reason. I wanted to see if just comparing the pointer from pthread_getspecific(0) instead of pthread_self() would work, since we don't care if the pthread_t is valid as we only compare it to the main thread value (not that I would propose that as a fix since it's very implementation-specific, just curious), but I didn't get that far (I cannot really reproduce it - the closest I get is a mach exception under lldb).
Yes, this is a mystery. The pthread_t validation could crash if the pthread_t was corrupted, but it is not clear why it would be. Then there is the pointer authentication check, though I wonder whether it does anything at all on Intel, and the report was from an Intel machine.
What I also find puzzling is that the stack trace doesn't show much about the crashed thread. The 1st frame on thread 0 is "start" as it is the main thread. The other threads start with "thread_start/_pthread_start". But the crashed thread 6 starts only with "_sigtramp" for the handler, with no previous frames. Also, the crash is due to "no mapping for user data read", a page fault, so probably some pointer on the stack points to the wrong place. It is as if the stack was corrupted or the thread didn't get a chance to be initialized properly before the signal arrived (not sure if that is possible).
Carl, is the problem repeatable on your machine? If so, what are the steps to reproduce it?
I was trying on M1, but didn't find a way to provoke it.

Best
Tomas
Otherwise, from the stack trace, the behavior looks ok. The main thread (also the R thread) is serving the signal, hence the signal is blocked, but it is received again, so another thread is picked to serve it, and it is relaying it to the main thread. One more thread is picked to serve it, and it crashes while calling pthread_self(). There is also one more thread not involved in the signal handling.
POSIX states that pthread_self() is async-signal-safe. The macOS 12.6 manual (sigaction), however, doesn't include any pthread function in the list of async-signal-safe functions.
We could do some work-around (hiding the problem a bit more) like exiting from the handler if the signal is being served by another thread. We could also report such a situation to indicate that the interval is unreasonable. But it would be good first to know for sure what caused the problem.
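
One way such a work-around might look, as a rough sketch only: a C11 atomic flag marks that a handler is already running, and overlapping signals are counted so they could be reported later. It assumes a C11 compiler and lock-free atomic_long on the target platform; in_handler, skipped_signals, handled_signals and guarded_prof_handler are hypothetical names, and the counter increment stands in for the real sampling work.

```c
#include <signal.h>
#include <stdatomic.h>

/* Hypothetical names for illustration; not taken from the R sources. */
static atomic_flag in_handler = ATOMIC_FLAG_INIT; /* set while a handler runs */
static atomic_long skipped_signals;  /* signals that arrived during handling */
static atomic_long handled_signals;  /* stand-in for samples actually taken */

/* Signal handler that backs off when another invocation (on any
   thread) is already serving the signal. */
void guarded_prof_handler(int sig)
{
    (void)sig;

    if (atomic_flag_test_and_set(&in_handler)) {
        /* Already being served elsewhere: count it and leave. */
        atomic_fetch_add(&skipped_signals, 1);
        return;
    }

    atomic_fetch_add(&handled_signals, 1);  /* real sampling would go here */

    atomic_flag_clear(&in_handler);
}
```

A nonzero skipped_signals after profiling would suggest that the interval is too short for the handler to keep up. The guard itself makes no pthread call, but it does nothing to explain why pthread_self() crashed.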
How can you check anything if pthread functions fail? If a simple pthread_self() crashes then I don't see how you can do anything, since we don't even know what thread we are on, cannot call mutexes etc.
Cheers,
Simon


--
Carl Witthoft
personal: c...@witthoft.com
The Witthoft Group, Consulting
https://witthoftgroup.weebly.com/

_______________________________________________
R-SIG-Mac mailing list
R-SIG-Mac@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-mac
