On Fri, Mar 23, 2012 at 12:35:17AM -0700, Sushanth Rai wrote:
> What I mean by inconsistent is that, a process with lots of threads
> takes a while before all threads are suspended. When I look at the
> resulting core file, the state of some of the shared data is not
> exactly what I was expecting when I issued the gcore command. It is
> quite possible that state might have changed even before ptrace() had
> a chance to issue SIGSTOP. But I am looking at any improvement that
> can be reasonably done in kernel.
>
> As you described, suspensions are checked at safe points and only when
> threads reach those, they get suspended. I understand and agree with
> the reasons behind asynchronous stopping. But the net effect is that
> threads can potentially run for a short duration before they suspend
> themselves. So, I am trying to figure out ways to reduce this duration
> as much as possible.
>
> One thing I noticed is that in sig_suspend_threads(), we check if
> the threads are sleeping interruptibly. If so, they get suspended
> immediately. Otherwise, we set TDF_ASTPENDING and, if the thread is
> running on a CPU, we send IPI_AST to that CPU. What about the target
> process's threads that are on the runq? It looks like such a thread
> will only notice the flag when it is at the user->kernel boundary. Can
> we safely remove them from the runq?
No, since a thread on the runq must be considered the same as a thread
actually executing on a CPU. It is unsafe to suspend a thread in this
state, because it may own a kernel resource.

If a thread that is on the runq but not on a CPU is set up to return to
usermode 'immediately' after being put back on a CPU, then the normal
AST check will cause it to suspend.
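
To make the three cases above concrete, here is a toy user-space model
of the decision as described in this thread. It is only a sketch and
not the kernel code: the toy_* enums and toy_suspend_decision() are
names made up for illustration, and the real sig_suspend_threads()
works on struct thread under the proper locks.

```c
/*
 * Toy model only. All toy_* names are hypothetical and exist just for
 * this illustration; they are not kernel definitions.
 */
enum toy_state {
    TOY_SLEEPING_INTR,  /* sleeping interruptibly */
    TOY_ON_RUNQ,        /* runnable, waiting on the run queue */
    TOY_RUNNING         /* currently executing on a CPU */
};

enum toy_action {
    TOY_SUSPEND_NOW,      /* can be suspended immediately */
    TOY_MARK_AST,         /* only flag it (TDF_ASTPENDING in the kernel) */
    TOY_MARK_AST_AND_IPI  /* flag it and poke its CPU (IPI_AST) */
};

/*
 * Returns what the suspending side does for a thread in state st,
 * following the description in the mail: interruptible sleepers are
 * suspended at once, everything else is flagged, and only a thread
 * that is on a CPU additionally gets an IPI.
 */
enum toy_action
toy_suspend_decision(enum toy_state st)
{
    switch (st) {
    case TOY_SLEEPING_INTR:
        return (TOY_SUSPEND_NOW);
    case TOY_RUNNING:
        return (TOY_MARK_AST_AND_IPI);
    case TOY_ON_RUNQ:
    default:
        /* Treated like a running thread: never pulled off the runq. */
        return (TOY_MARK_AST);
    }
}
```

In this model a thread in TOY_ON_RUNQ is never suspended directly. It
is only flagged, and it stops when it next passes through the AST
check on its way back to usermode.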
>
> With respect to PT_SUSPEND, as part of PT_ATTACH request I was
> thinking of explicitly suspending all the threads by setting
> TDF_DBSUSPEND instead of posting SIGSTOP. As each thread in the target
> process calls thread_suspend_check(), it would notice this flag and
> suspend itself. PT_ATTACH command would then wait until all threads
> are suspended before returning to the caller. This is the general
> approach and of course it is missing details at this point. The idea
> again is to suspend all threads as quickly as possible.
I do not see how this would make any significant difference compared
with SIGSTOP delivery. The points where signals are checked and the
points where suspension can be applied are essentially the same.

>
> I'm running on 7.2. Cursory look at trunk version didn't show major
> changes in this area.

Apart from bug fixes, there were no big changes that I can remember.
