We're using the CpuProfiler in an Isolate that can have multiple threads 
running and taking turns in the Isolate via Unlocker. Frequently, we 
want to profile what's happening on a thread so we do a StartProfiling for 
a thread, let it run for a while, and then collect the results via 
StopProfiling. Works great.

However, I've noticed a difference between Windows and Linux in the 
profiles: Windows collects samples for a thread while it's inside an 
Unlocker bracket, whereas Linux does not. So if we have a native function 
sleep() that unlocks the Isolate, on Windows I see a high hit count for 
stacks that have functionName "sleep" and sourceType callback in the bottom 
stack entry. On Linux, there are no hits where I'm inside the sleep native 
function. For our purposes, the Windows behavior is much nicer, as we can 
easily see how much of the thread's time is being spent in sleep. On Linux, 
there is no way of knowing because the information is not captured.

Unfortunately, Windows' nicer (IMO) behavior comes at a cost. Specifically, 
if indeed other threads do run in the Isolate while the profiled thread is 
sampled when unlocked, there is a tendency for core dumps while getting the 
stack trace. This is unsurprising because the stack trace needs to look in 
the Isolate's heap to get things like function names and that's not really 
a good idea when a thread doesn't have the Isolate lock.

The following code samples are all from src/libsampler/sampler.cc.

Under Linux the profiler protects itself in SamplerManager::DoSample 
(called by the SIGPROF handler) with:

  for (Sampler* sampler : samplers) {
    ...
    if (v8::Locker::IsActive() && !Locker::IsLocked(isolate)) continue;
    sampler->SampleStack(state);
  }
  
That Locker::IsLocked is a bit of a misnomer: it really means 
Locker::IsLockedByCurrentThread. Because our SIGPROF handler runs on the 
profiled thread, if that thread is in an Unlocker bracket in a native 
function, IsLocked returns false and we don't collect a sample.

In Windows, the sampler does not run on the profiled thread and, instead, 
does a SuspendThread of the profiled thread from another thread and then 
uses GetThreadContext:

void Sampler::DoSample() {
  HANDLE profiled_thread = platform_data()->profiled_thread();
  if (profiled_thread == nullptr) return;

  const DWORD kSuspendFailed = static_cast<DWORD>(-1);
  if (SuspendThread(profiled_thread) == kSuspendFailed) return;

  // Context used for sampling the register state of the profiled thread.
  CONTEXT context;
  memset(&context, 0, sizeof(context));
  context.ContextFlags = CONTEXT_FULL;
  if (GetThreadContext(profiled_thread, &context) != 0) {
    ...
    SampleStack(state);
  }
  ResumeThread(profiled_thread);
}

There is no check that the Isolate is actually locked by the profiled 
thread, so we get a sample for it but also, unfortunately, seg faults and 
the like if another thread is running in the Isolate.

So it seems that there really should be a check in Sampler::DoSample for 
Windows to check if the Isolate is locked by the profiled thread. This 
would require maybe adding another method to Locker like 
isLockedByThread(ThreadId id) or something like that and then checking that 
in Sampler::DoSample. When PlatformData is instantiated for Windows it 
could stash the ThreadId for the current thread so it could be retrieved in 
Sampler::DoSample. This doesn't seem all that daunting, and I'm willing to 
put together a PR if this seems reasonable and no one more conversant with 
V8 internals picks it up.
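As a rough sketch of the idea, the following toy model in standard C++ shows a sampler thread asking whether a different thread holds the lock. ToyLocker, SamplerWouldSample and DemoCrossThreadCheck are hypothetical names, and IsLockedByThread here is only an analogue of the method proposed above; none of this exists in V8, and std::thread::id stands in for V8's ThreadId.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <utility>

// Toy stand-in for the Isolate lock (hypothetical; NOT V8's Locker).
class ToyLocker {
 public:
  void Lock() {
    mutex_.lock();
    owner_.store(std::this_thread::get_id());
  }
  void Unlock() {
    assert(owner_.load() == std::this_thread::get_id());
    owner_.store(std::thread::id());
    mutex_.unlock();
  }
  // Analogue of the proposed IsLockedByThread(ThreadId): callable from any
  // thread, it asks about a specific thread rather than the caller.
  bool IsLockedByThread(std::thread::id id) const {
    return owner_.load() == id;
  }

 private:
  std::mutex mutex_;
  std::atomic<std::thread::id> owner_{};
};

// Stand-in for a Windows-style DoSample that runs on a separate sampler
// thread. Returns whether a sample would be taken for `profiled`.
bool SamplerWouldSample(const ToyLocker& locker, std::thread::id profiled) {
  bool sampled = false;
  std::thread sampler([&] { sampled = locker.IsLockedByThread(profiled); });
  sampler.join();
  return sampled;
}

// Returns {decision while locked, decision while unlocked}.
std::pair<bool, bool> DemoCrossThreadCheck() {
  ToyLocker locker;
  // Stashed up front, as PlatformData might do at construction time.
  const std::thread::id profiled = std::this_thread::get_id();

  locker.Lock();
  const bool while_locked = SamplerWouldSample(locker, profiled);
  locker.Unlock();  // the native function's unlocked bracket starts here
  const bool while_unlocked = SamplerWouldSample(locker, profiled);
  return {while_locked, while_unlocked};
}
```

In the real sampler the check would presumably sit after SuspendThread, so the profiled thread cannot change lock ownership between the check and SampleStack. That ordering is an assumption of this sketch.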

However, while this fix would eliminate Windows seg faults when doing 
profiling in a multi-threaded environment, it would also make me sad 
because now we would have no way of getting time spent unlocked in my 
native function. To provide this functionality, it seems like it would be 
nice to be able to create a provisional TickSample before the Isolate is 
unlocked in a native function and use that in DoSample if the Isolate is 
not locked by the profiled thread. Obviously, this would/should only be 
done if the thread is being profiled.

There are a lot of ways this could work but a straw man is that there could 
be CpuProfiler methods called SaveProvisionalThreadSample and 
ClearProvisionalThreadSample. The former, at least, would have to be called 
while one holds the Isolate lock so one would presumably do such a call 
right before Unlocker instantiation and then do a 
ClearProvisionalThreadSample call right after the end of the Unlocker 
context. Then, if DoSample decides that Isolate is not locked by the 
sampled thread, it could look for a saved provisional sample for the thread 
and, if available, use that for its sample. Note that a provisional 
TickSample could be used many times as a thread might be unlocked over many 
samples.
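The straw man above could be sketched roughly as follows, again as a toy model in standard C++. ToySample, ToyProfiler, ProvisionalSampleScope and DemoInnermostFrames are hypothetical names for illustration, and a real implementation would need a store that is safe to read from a signal handler or from a sampler thread with the profiled thread suspended, which the mutex here is not.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for a TickSample: just function names, innermost first.
struct ToySample {
  std::vector<std::string> frames;
};
using ToyThreadId = int;  // hypothetical thread identifier

class ToyProfiler {
 public:
  // Must be called while the thread still holds the Isolate lock.
  void SaveProvisionalThreadSample(ToyThreadId id, ToySample sample) {
    assert(!sample.frames.empty());  // a provisional sample carries a stack
    std::lock_guard<std::mutex> guard(mutex_);
    provisional_[id] = std::move(sample);
  }
  void ClearProvisionalThreadSample(ToyThreadId id) {
    std::lock_guard<std::mutex> guard(mutex_);
    provisional_.erase(id);
  }
  // Uses the live stack when the thread holds the lock; otherwise falls back
  // to the saved provisional sample, or drops the tick if there is none.
  void DoSample(ToyThreadId id, bool locked_by_thread, const ToySample& live) {
    std::lock_guard<std::mutex> guard(mutex_);
    if (locked_by_thread) {
      recorded_.push_back(live);
      return;
    }
    auto it = provisional_.find(id);
    if (it != provisional_.end()) recorded_.push_back(it->second);
  }
  std::vector<std::string> InnermostFrames() const {
    std::lock_guard<std::mutex> guard(mutex_);
    std::vector<std::string> out;
    for (const ToySample& s : recorded_) {
      out.push_back(s.frames.empty() ? std::string() : s.frames.front());
    }
    return out;
  }

 private:
  mutable std::mutex mutex_;
  std::map<ToyThreadId, ToySample> provisional_;
  std::vector<ToySample> recorded_;
};

// RAII helper pairing Save with Clear around the unlocked bracket.
class ProvisionalSampleScope {
 public:
  ProvisionalSampleScope(ToyProfiler& profiler, ToyThreadId id, ToySample s)
      : profiler_(profiler), id_(id) {
    profiler_.SaveProvisionalThreadSample(id_, std::move(s));
  }
  ~ProvisionalSampleScope() { profiler_.ClearProvisionalThreadSample(id_); }
  ProvisionalSampleScope(const ProvisionalSampleScope&) = delete;
  ProvisionalSampleScope& operator=(const ProvisionalSampleScope&) = delete;

 private:
  ToyProfiler& profiler_;
  ToyThreadId id_;
};

std::vector<std::string> DemoInnermostFrames() {
  ToyProfiler profiler;
  const ToyThreadId tid = 1;
  profiler.DoSample(tid, true, ToySample{{"main"}});  // locked: live stack
  {
    ProvisionalSampleScope scope(profiler, tid, ToySample{{"sleep", "main"}});
    // An Unlocker would be instantiated here in the native function.
    profiler.DoSample(tid, false, ToySample{});  // reuses provisional sample
    profiler.DoSample(tid, false, ToySample{});  // reused again
  }
  profiler.DoSample(tid, false, ToySample{});  // nothing saved: tick dropped
  return profiler.InnermostFrames();
}
```

The scope object is one possible way to guarantee that the Clear call happens even on early return from the native function; the two explicit calls described above would work equally well.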

Again, assuming no one else picks this up, happy to take a swing at a PR, 
myself. While more complicated than the Windows fix, it doesn't seem that 
daunting.

Opinions?
 
Sorry if this would have been more appropriately posted on v8-dev.

Thanks
 

-- 
-- 
v8-users mailing list
v8-users@googlegroups.com
http://groups.google.com/group/v8-users