On Wed, Jun 03, 2015 at 07:05:15AM +0200, Thierry Goubier wrote:
> Hi Dave,
> 
> On 03/06/2015 03:15, David T. Lewis wrote:
> >Hi Thierry and Jose,
> >
> >I am reading this thread with interest and will help if I can.
> >
> >I do have one idea that we have not tried before. I have a theory that
> >this may be an intermittent problem caused by SIGCHLD signals (from the
> >external OS process when it exits) being missed by the
> >UnixOSProcessAccessor>>grimReaperProcess that handles them.
> >
> >If this is happening, then I may be able to change grimReaperProcess to
> >work around the problem.
> >
> >When you see the OS deadlock condition, are you able to tell if your Pharo
> >VM process has subprocesses in the zombie state (indicating that
> >grimReaperProcess did not clean them up)? The unix command "ps -axf | less"
> >will let you look at the process tree and that may give us a clue if this
> >is happening.
> 
> I found it very easy to reproduce, and I do have a zombie child
> process under the pharo process.

Jose confirms this also (thanks).

Can you try filing in the attached UnixOSProcessAccessor>>grimReaperProcess
and see if it helps? I do not know if it will make a difference, but the
idea is to put a timeout on the semaphore that is waiting for signals from
SIGCHLD. I am hoping that if these signals are sometimes being missed, then
the timeout will allow the process to recover from the problem.
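
As a minimal sketch of the idea (separate from the attachment, and using a
plain Semaphore as a stand-in for the SIGCHLD semaphore), something like the
following can be evaluated in a workspace. The names sem, wakeups and watcher
are made up for illustration, and it assumes an image in which Semaphore
responds to #waitTimeoutMSecs:.

```smalltalk
"Hypothetical workspace sketch: a watcher loop that wakes up on a signal
 OR after a timeout. Nothing here touches OSProcess or real SIGCHLD handling."
| sem wakeups watcher |
sem := Semaphore new.    "stand-in for the SIGCHLD semaphore; never signaled here"
wakeups := 0.
watcher := [[sem waitTimeoutMSecs: 1000.    "returns after a signal or after about 1 second"
        wakeups := wakeups + 1    "the real loop would check child status at this point"
        ] repeat] newProcess.
watcher resume.
(Delay forSeconds: 3) wait.    "simulate a missed signal: nobody signals sem"
watcher terminate.
wakeups    "print it: should be greater than zero, since the loop ran without any signal"
```

With a plain wait in place of waitTimeoutMSecs:, the watcher in this sketch
would block until terminated and wakeups would stay at zero, which is the kind
of stall I suspect when a SIGCHLD is missed.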


> 
> Interestingly enough, the lock-up happens in a very specific place, a call
> to git branch, which is a very short command returning just a few
> characters (where all other commands have longer outputs). Reducing the
> frequency of the calls to git branch by a bit of caching reduces the
> chances of a lock-up.
>

This is a good clue, and it may indicate a different kind of problem (so
maybe I am looking in the wrong place). Ben's suggestion of adding a delay
to the external process sounds like a good idea to help troubleshoot it.

Dave

 
'From Squeak4.5 of 30 May 2015 [latest update: #15039] on 2 June 2015 at 9:35:21 pm'!

!UnixOSProcessAccessor methodsFor: 'initialize - release' stamp: 'dtl 6/2/2015 20:54'!
grimReaperProcess
        "This is a process which waits for the death of a child OSProcess, and
        informs any dependents of the change. Use SIGCHLD events if possible,
        otherwise a Delay to poll for exiting child processes."

        | eventWaiter processSynchronizationDelay |
        ^ self canAccessSystem
                ifTrue:
                        [eventWaiter := (self canAccessSystem and: [self canForwardExternalSignals])
                                ifTrue: [self sigChldSemaphore "semaphore signaled by SIGCHLD" ]
                                ifFalse: [Delay forMilliseconds: 200 "simple polling loop" ].
                        processSynchronizationDelay := Delay forMilliseconds: 20.
                        grimReaper ifNil:
                                [grimReaper :=
                                        [[(eventWaiter respondsTo: #waitTimeoutMSecs: )
                                                ifTrue: [eventWaiter waitTimeoutMSecs: 1000 "semaphore with timeout"]
                                                ifFalse: [eventWaiter wait].
                                        processSynchronizationDelay wait. "Avoids lost signals in heavy process switching"
                                        self changed: #childProcessStatus] repeat] newProcess.
                                        grimReaper resume.
                                        "name selected to look reasonable in the process browser"
                                        grimReaper name: ((ReadStream on: grimReaper hash asString) next: 5)
                                                        , ': the child OSProcess watcher']]
                ifFalse:
                        [nil]
! !
