On Wed, Jun 03, 2015 at 05:03:18PM +0200, Jose San Leandro wrote:
> Unfortunately it doesn't fix it, or at least I get the same symptoms.
Thanks for trying it. Sorry it did not help :-/

Dave

>
> Sending SIGUSR1 prints this:
>
> SIGUSR1 Wed Jun 3 16:53:50 2015
>
>
> /home/chous/toolbox/pharo-4.0/pharo-vm/pharo
> pharo VM version: 3.9-7 #1 Thu Apr 2 00:51:45 CEST 2015 gcc 4.6.3 [Production ITHB VM]
> Built from: NBCoInterpreter NativeBoost-CogPlugin-EstebanLorenzano.21 uuid: 4d9b9bdf-2dfa-4c0b-99eb-5b110dadc697 Apr 2 2015
> With: NBCogit NativeBoost-CogPlugin-EstebanLorenzano.21 uuid: 4d9b9bdf-2dfa-4c0b-99eb-5b110dadc697 Apr 2 2015
> Revision: https://github.com/pharo-project/pharo-vm.git Commit: 32d18ba0f2db9bee7f3bdbf16bdb24fe4801cfc5 Date: 2015-03-24 11:08:14 +0100 By: Esteban Lorenzano <esteba...@gmail.com> Jenkins build #14904
> Build host: Linux pharo-linux 3.2.0-31-generic-pae #50-Ubuntu SMP Fri Sep 7 16:39:45 UTC 2012 i686 i686 i386 GNU/Linux
> plugin path: /home/chous/toolbox/pharo-4.0/pharo-vm/ [default: /home/chous/toolbox/pharo-4.0/pharo-vm/]
>
>
> C stack backtrace & registers:
> eax 0xff981e94 ebx 0xff981db0 ecx 0xff981e48 edx 0xff981dfc
> edi 0xff981c80 esi 0xff981c80 ebp 0xff981d18 esp 0xff981d64
> eip 0xff981f78
> *[0xff981f78]
> /home/chous/toolbox/pharo/pharo-vm/pharo[0x80a33a2]
> /home/chous/toolbox/pharo/pharo-vm/pharo[0x80a3649]
> linux-gate.so.1(__kernel_rt_sigreturn+0x0)[0xf773acc0]
> /home/chous/toolbox/pharo/pharo-vm/pharo(signalSemaphoreWithIndex+0x28)[0x809d8c8]
> /home/chous/toolbox/pharo/pharo-vm/pharo[0x810868c]
> linux-gate.so.1(__kernel_sigreturn+0x0)[0xf773acb0]
> /home/chous/toolbox/pharo/pharo-vm/pharo(signalSemaphoreWithIndex+0x5e)[0x809d8fe]
> /home/chous/toolbox/pharo/pharo-vm/pharo(aioPoll+0x22f)[0x809f0af]
> /home/chous/toolbox/pharo-4.0/pharo-vm/vm-display-X11(+0xe671)[0xf772a671]
> /home/chous/toolbox/pharo/pharo-vm/pharo(ioRelinquishProcessorForMicroseconds+0x17)[0x80a1887]
> /home/chous/toolbox/pharo/pharo-vm/pharo[0x80767fa]
> [0xb4a2fe0c]
> [0xb4a2d700]
> [0xb53b9382]
> [0xb4a2d648]
> [0x5b]
>
>
> All Smalltalk process stacks (active first):
> Process 0xb6d930c4 priority 10
> 0xff9ad450 M ProcessorScheduler class>idleProcess 0xb4d935c0: a(n) ProcessorScheduler class
> 0xff9ad470 I [] in ProcessorScheduler class>startUp 0xb4d935c0: a(n) ProcessorScheduler class
> 0xff9ad490 I [] in BlockClosure>newProcess 0xb6d92fe8: a(n) BlockClosure
>
> suspended processes
> Process 0xb68e1984 priority 50
> 0xff9a6490 M WeakArray class>finalizationProcess 0xb4d93790: a(n) WeakArray class
> 0xb69beb68 s [] in WeakArray class>restartFinalizationProcess
> 0xb68e1924 s [] in BlockClosure>newProcess
>
> Process 0xb5ced038 priority 80
> 0xff9af490 M DelayMicrosecondScheduler>runTimerEventLoop 0xb5bb6f9c: a(n) DelayMicrosecondScheduler
> 0xb6098314 s [] in DelayMicrosecondScheduler>startTimerEventLoop
> 0xb5cecfd8 s [] in BlockClosure>newProcess
>
> Process 0xb68ec880 priority 40
> 0xff9b2478 M [] in UnixOSProcessAccessor>(nil) 0xb60dc6d0: a(n) UnixOSProcessAccessor
> 0xff9b2490 M BlockClosure>repeat 0xb68ef2d4: a(n) BlockClosure
> 0xb68ef278 s [] in UnixOSProcessAccessor>(nil)
> 0xb68ec820 s [] in BlockClosure>newProcess
>
> Process 0xb6d92d78 priority 60
> 0xff98742c M InputEventFetcher>waitForInput 0xb5a09718: a(n) InputEventFetcher
> 0xff987450 M InputEventFetcher>eventLoop 0xb5a09718: a(n) InputEventFetcher
> 0xff987470 I [] in InputEventFetcher>installEventLoop 0xb5a09718: a(n) InputEventFetcher
> 0xff987490 I [] in BlockClosure>newProcess 0xb6d92c9c: a(n) BlockClosure
>
> Process 0xb6f25f94 priority 60
> 0xb6f25fcc s SmalltalkImage>lowSpaceWatcher
> 0xb71523e4 s [] in SmalltalkImage>installLowSpaceWatcher
> 0xb6f25f34 s [] in BlockClosure>newProcess
>
> Process 0xb73a4e7c priority 30
> 0xff99b470 M [] in AioEventHandler>handleExceptions:readEvents:writeEvents: 0xb73a49e4: a(n) AioEventHandler
> 0xff99b490 I [] in BlockClosure>newProcess 0xb73a4d90: a(n) BlockClosure
> Process 0xb6686c88 priority 40
> 0xffa073d0 M [] in Delay>wait 0xb73a63fc: a(n) Delay
> 0xffa073f0 M BlockClosure>ifCurtailed: 0xb73a6614: a(n) BlockClosure
> 0xffa0740c M Delay>wait 0xb73a63fc: a(n) Delay
> 0xffa07428 M PipeableOSProcess(PipeJunction)>outputOn: 0xb73a0d34: a(n) PipeableOSProcess
> 0xffa07444 M PipeableOSProcess(PipeJunction)>output 0xb73a0d34: a(n) PipeableOSProcess
> 0xffa0746c M [] in MCFileTreeGitRepository class>runOSProcessGitCommand:in: 0xb611fa88: a(n) MCFileTreeGitRepository class
> 0xffa0748c M BlockClosure>ensure: 0xb739d9dc: a(n) BlockClosure
> 0xff9e538c M MCFileTreeGitRepository class>runOSProcessGitCommand:in: 0xb611fa88: a(n) MCFileTreeGitRepository class
> 0xff9e53ac M MCFileTreeGitRepository class>runGitCommand:in: 0xb611fa88: a(n) MCFileTreeGitRepository class
> 0xff9e53cc M MCFileTreeGitRepository>gitCommand:in: 0xb612926c: a(n) MCFileTreeGitRepository
> 0xff9e53f4 M MCFileTreeGitRepository>gitVersionsForPackage: 0xb612926c: a(n) MCFileTreeGitRepository
> 0xff9e543c M [] in MCFileTreeGitRepository>loadAllFileNames 0xb612926c: a(n) MCFileTreeGitRepository
> 0xff9e5458 M FileSystemDirectoryEntry(Object)>in: 0xb71b3fe8: a(n) FileSystemDirectoryEntry
> 0xff9e548c M [] in MCFileTreeGitRepository>loadAllFileNames 0xb612926c: a(n) MCFileTreeGitRepository
> 0xffa04310 M BlockClosure>cull: 0xb71b4894: a(n) BlockClosure
> 0xffa04338 I [] in Job>run 0xb71b48b4: a(n) Job
> 0xffa04350 M BlockClosure>on:do: 0xb71b56b8: a(n) BlockClosure
> 0xffa0437c I [] in Job>run 0xb71b48b4: a(n) Job
> 0xffa0439c M BlockClosure>ensure: 0xb71b4980: a(n) BlockClosure
> 0xffa043c4 I Job>run 0xb71b48b4: a(n) Job
> 0xffa043e4 I MorphicUIManager(UIManager)>displayProgress:from:to:during: 0xb50a8790: a(n) MorphicUIManager
> 0xffa04414 I ByteString(String)>displayProgressFrom:to:during: 0xb61238d8: a(n) ByteString
> 0xffa04444 M MCFileTreeGitRepository>loadAllFileNames 0xb612926c: a(n) MCFileTreeGitRepository
> 0xffa04464 I MCFileTreeGitRepository>allFileNames 0xb612926c: a(n) MCFileTreeGitRepository
> 0xffa0448c M MCFileTreeGitRepository>goferVersionFrom: 0xb612926c: a(n) MCFileTreeGitRepository
> 0xff9e238c I MetacelloCachingGoferResolvedReference(GoferResolvedReference)>version 0xb71b3134: a(n) MetacelloCachingGoferResolvedReference
> 0xff9e23a4 M MetacelloCachingGoferResolvedReference>version 0xb71b3134: a(n) MetacelloCachingGoferResolvedReference
> 0xff9e23bc M [] in MetacelloFetchingMCSpecLoader>resolveDependencies:nearest:into: 0xb706d83c: a(n) MetacelloFetchingMCSpecLoader
> 0xff9e23e0 M OrderedCollection>do: 0xb71b3234: a(n) OrderedCollection
> 0xff9e240c M [] in MetacelloFetchingMCSpecLoader>resolveDependencies:nearest:into: 0xb706d83c: a(n) MetacelloFetchingMCSpecLoader
> 0xff9e2424 M BlockClosure>on:do: 0xb71b3334: a(n) BlockClosure
> 0xff9e244c M MetacelloFetchingMCSpecLoader>resolveDependencies:nearest:into: 0xb706d83c: a(n) MetacelloFetchingMCSpecLoader
> 0xff9e2490 M [] in MetacelloFetchingMCSpecLoader>linearLoadPackageSpec:gofer: 0xb706d83c: a(n) MetacelloFetchingMCSpecLoader
> 0xff9ae318 M MetacelloPharo30Platform(MetacelloPlatform)>do:displaying: 0xb50e8b94: a(n) MetacelloPharo30Platform
> 0xff9ae338 M MetacelloFetchingMCSpecLoader>linearLoadPackageSpec:gofer: 0xb706d83c: a(n) MetacelloFetchingMCSpecLoader
> 0xff9ae358 M MetacelloPackageSpec>loadUsing:gofer: 0xb706be54: a(n) MetacelloPackageSpec
> 0xff9ae37c M [] in MetacelloFetchingMCSpecLoader(MetacelloCommonMCSpecLoader)>linearLoadPackageSpecs:repositories: 0xb706d83c: a(n) MetacelloFetchingMCSpecLoader
> 0xff9ae3a0 M OrderedCollection>do: 0xb70c807c: a(n) OrderedCollection
> 0xff9ae3c0 M MetacelloFetchingMCSpecLoader(MetacelloCommonMCSpecLoader)>linearLoadPackageSpecs:repositories: 0xb706d83c: a(n) MetacelloFetchingMCSpecLoader
> 0xff9ae3f0 I [] in MetacelloFetchingMCSpecLoader>linearLoadPackageSpecs:repositories: 0xb706d83c: a(n) MetacelloFetchingMCSpecLoader
> 0xff9ae410 M BlockClosure>ensure: 0xb70c813c: a(n) BlockClosure
> 0xff9ae438 I MetacelloLoaderPolicy>pushLoadDirective:during: 0xb706cb7c: a(n) MetacelloLoaderPolicy
> 0xff9ae460 I MetacelloLoaderPolicy>pushLinearLoadDirectivesDuring:for: 0xb706cb7c: a(n) MetacelloLoaderPolicy
> 0xff9ae488 I MetacelloFetchingMCSpecLoader>linearLoadPackageSpecs:repositories: 0xb706d83c: a(n) MetacelloFetchingMCSpecLoader
> 0xb70c33c0 s MetacelloFetchingMCSpecLoader(MetacelloCommonMCSpecLoader)>load
> 0xb706d898 s MetacelloMCVersionSpecLoader>load
> 0xb71948d0 s MetacelloMCVersion>executeLoadFromArray:
> 0xb719492c s [] in MetacelloMCVersion>fetchRequiredFromArray:
> 0xb7194988 s [] in MetacelloPharo30Platform(MetacelloPlatform)>useStackCacheDuring:defaultDictionary:
> 0xb706d96c s BlockClosure>on:do:
> 0xb706d2d8 s MetacelloPharo30Platform(MetacelloPlatform)>useStackCacheDuring:defaultDictionary:
> 0xb706d258 s [] in MetacelloMCVersion>fetchRequiredFromArray:
> 0xb71949e4 s BlockClosure>ensure:
> 0xb706d15c s [] in MetacelloMCVersion>fetchRequiredFromArray:
> 0xb706d1e4 s MetacelloPharo30Platform(MetacelloPlatform)>do:displaying:
> 0xb706d0e4 s MetacelloMCVersion>fetchRequiredFromArray:
> 0xb706ccc0 s [] in MetacelloMCVersion>doLoadRequiredFromArray:
> 0xb715327c s BlockClosure>ensure:
> 0xb706cc34 s MetacelloMCVersion>doLoadRequiredFromArray:
> 0xb71532d8 s MetacelloMCVersion>load
> 0xb7153334 s UndefinedObject>(nil)
> 0xb7153390 s OpalCompiler>evaluate
> 0xb706ab30 s RubSmalltalkEditor>evaluate:andDo:
> 0xb706a7f4 s RubSmalltalkEditor>highlightEvaluateAndDo:
> 0xb7152edc s [] in GLMMorphicPharoPlaygroundRenderer(GLMMorphicPharoCodeRenderer)>actOnHighlightAndEvaluate:
> 0xb7152f38 s RubEditingArea(RubAbstractTextArea)>handleEdit:
> 0xb706a784 s [] in GLMMorphicPharoPlaygroundRenderer(GLMMorphicPharoCodeRenderer)>actOnHighlightAndEvaluate:
> 0xb7152f94 s WorldState>runStepMethodsIn:
> 0xb7152ff0 s WorldMorph>runStepMethods
> 0xb706a1cc s WorldState>doOneCycleNowFor:
> 0xb715304c s WorldState>doOneCycleFor:
> 0xb71530a8 s WorldMorph>doOneCycle
> 0xb6686f8c s [] in MorphicUIManager>spawnNewProcess
> 0xb6686c28 s [] in BlockClosure>newProcess
>
> Most recent primitives
> primCreatePipe
> new:
> at:put:
> at:put:
> basicNew
> basicNew:
> basicNew
> basicNew:
> primSQFileSetBlocking:
> basicNew:
> basicAt:put:
> basicNew:
> basicAt:put:
> at:put:
> basicNew
> primSigPipeNumber
> basicNew
> wait
> at:put:
> signal
> primForwardSignal:toSemaphore:
> wait
> at:put:
> signal
> primCreatePipe
> new:
> at:put:
> at:put:
> basicNew
> basicNew:
> basicNew
> basicNew:
> primSQFileSetNonBlocking:
> basicNew:
> basicAt:put:
> basicNew:
> basicAt:put:
> at:put:
> basicNew
> signal
> basicNew:
> basicAt:put:
> basicNew:
> basicAt:put:
> at:put:
> new:
> basicNew
> new:
> replaceFrom:to:with:startingAt:
> basicNew
> basicNew:
> primSQFileSetNonBlocking:
> basicNew
> stringHash:initialHash:
> primOSFileHandle:
> basicNew
> wait
> at:put:
> signal
> primAioEnable:forSemaphore:externalObject:
> basicNew
> objectAt:
> basicNew:
> stackp:
> basicNew
> primitiveResume
> wait
> wait
> signal
> wait
> signal
> primAioHandle:exceptionEvents:readEvents:writeEvents:
> signal
> basicNew:
> basicAt:put:
> primSQFileSetNonBlocking:
> basicNew:
> basicAt:put:
> basicNew:
> basicAt:put:
> at:put:
> basicNew
> basicNew
> wait
> signal
> primUTCMicrosecondsClock
> +
> >=
> +
> <
> primSignal:atUTCMicroseconds:
> wait
> signal
> wait
> wait
> relinquishProcessorForMicroseconds:
> relinquishProcessorForMicroseconds:
> primUTCMicrosecondsClock
> >=
> signal
> +
> primSignal:atUTCMicroseconds:
> wait
> basicNew
> basicNew
> basicNew
> basicNew
> signal
> basicNew
> signal
> basicNew
> new:
> wait
> new:
> at:put:
> at:put:
> at:put:
> basicNew:
> at:put:
> basicNew:
> replaceFrom:to:with:startingAt:
> replaceFrom:to:with:startingAt:
> basicNew
> new:
> at:put:
> new:
> basicNew:
> replaceFrom:to:with:startingAt:
> replaceFrom:to:with:startingAt:
> at:put:
> basicNew:
> replaceFrom:to:with:startingAt:
> replaceFrom:to:with:startingAt:
> at:put:
> at:put:
> at:put:
> new:
> replaceFrom:to:with:startingAt:
> primSizeOfPointer
> new:
> at:put:
> at:put:
> at:put:
> primSizeOfPointer
> basicNew:
> basicNew
> at:put:
> at:put:
> at:put:
> at:put:
> at:put:
> at:put:
> at:put:
> at:put:
> at:put:
> at:put:
> at:put:
> at:put:
> at:put:
> at:put:
> at:put:
> at:put:
> replaceFrom:to:with:startingAt:
> replaceFrom:to:with:startingAt:
> replaceFrom:to:with:startingAt:
> new:
> basicNew
> new:
> at:put:
> at:put:
> at:put:
> at:put:
> at:put:
> at:put:
> new:
> replaceFrom:to:with:startingAt:
> new:
> at:put:
> at:put:
> primGetCurrentWorkingDirectory
> basicNew:
> replaceFrom:to:with:startingAt:
> replaceFrom:to:with:startingAt:
> primForkExec:stdIn:stdOut:stdErr:argBuf:argOffsets:envBuf:envOffsets:workingDir:
> primGetPid
> primGetPid
> primGetPid
> basicNew
> basicNew
> wait
> at:put:
> signal
> wait
> shallowCopy
> new:
> replaceFrom:to:with:startingAt:
> signal
> wait
> replaceFrom:to:with:startingAt:
> at:put:
> signal
> primCloseNoError:
> primCloseNoError:
> primCloseNoError:
> signal
> basicNew:
> basicNew
> basicNew
> basicNew
> wait
> signal
> primUTCMicrosecondsClock
> +
> >=
> +
> <
> primSignal:atUTCMicroseconds:
> wait
> signal
> wait
> relinquishProcessorForMicroseconds:
> relinquishProcessorForMicroseconds:
> relinquishProcessorForMicroseconds:
> relinquishProcessorForMicroseconds:
> relinquishProcessorForMicroseconds:
> relinquishProcessorForMicroseconds:
> relinquishProcessorForMicroseconds:
> basicNew:
> primRead:into:startingAt:count:
> basicNew
> signal
> wait
> basicNew:
> basicNew
> basicNew:
> replaceFrom:to:with:startingAt:
> replaceFrom:to:with:startingAt:
> signal
> basicNew
> signal
> basicNew
> new:
> wait
> signal
> wait
> signal
> primAioHandle:exceptionEvents:readEvents:writeEvents:
> signal
> wait
> relinquishProcessorForMicroseconds:
> relinquishProcessorForMicroseconds:
> relinquishProcessorForMicroseconds:
> relinquishProcessorForMicroseconds:
> relinquishProcessorForMicroseconds:
> relinquishProcessorForMicroseconds:
> relinquishProcessorForMicroseconds:
>
> stack page bytes 4096 available headroom 3300 minimum unused headroom 2152
>
> (SIGUSR1)
>
>
> 2015-06-03 14:15 GMT+02:00 David T. Lewis <le...@mail.msen.com>:
>
> > On Wed, Jun 03, 2015 at 07:05:15AM +0200, Thierry Goubier wrote:
> > > Hi Dave,
> > >
> > > On 03/06/2015 03:15, David T. Lewis wrote:
> > > >Hi Thierry and Jose,
> > > >
> > > >I am reading this thread with interest and will help if I can.
> > > >
> > > >I do have one idea that we have not tried before. I have a theory
> > > >that this may be an intermittent problem caused by SIGCHLD signals
> > > >(from the external OS process when it exits) being missed by the
> > > >UnixOSProcessAccessor>>grimReaperProcess that handles them.
> > > >
> > > >If this is happening, then I may be able to change grimReaperProcess
> > > >to work around the problem.
> > > >
> > > >When you see the OS deadlock condition, are you able to tell if your
> > > >Pharo VM process has subprocesses in the zombie state (indicating
> > > >that grimReaperProcess did not clean them up)? The Unix command
> > > >"ps -axf | less" will let you look at the process tree, and that may
> > > >give us a clue if this is happening.
> > >
> > > I found it very easy to reproduce, and I do have a zombie child
> > > process of the pharo process.
> >
> > Jose confirms this also (thanks).
> >
> > Can you try filing in the attached
> > UnixOSProcessAccessor>>grimReaperProcess and see if it helps? I do not
> > know if it will make a difference, but the idea is to put a timeout on
> > the semaphore that is waiting for signals from SIGCHLD. I am hoping
> > that if these signals are sometimes being missed, then the timeout
> > will allow the process to recover from the problem.
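A rough analogue of the timeout idea, outside Pharo: the Python sketch below is not the attached grimReaperProcess, and reap_exited_children and reaper_loop are hypothetical names. It relies on one detail that makes missed SIGCHLD notifications plausible: on Linux, as POSIX permits, ordinary (non-realtime) signals are not queued, so several children exiting close together can produce a single SIGCHLD. The sketch therefore waits for SIGCHLD with a timeout and, on every wakeup, reaps all exited children with a non-blocking waitpid() loop instead of assuming one child per signal. It assumes a platform that provides signal.sigtimedwait, such as Linux.

```python
import os
import signal
import time


def reap_exited_children():
    """Reap every child that has already exited, without ever blocking.

    Returns a dict mapping pid -> exit code (-1 if killed by a signal).
    """
    reaped = {}
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children left at all
        if pid == 0:
            break  # children exist, but none of them has exited yet
        reaped[pid] = os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
    return reaped


def reaper_loop(expected, poll_timeout=0.5, deadline=10.0):
    """Reap `expected` children, waking on SIGCHLD or every `poll_timeout` s.

    SIGCHLD must already be blocked in this thread so that it stays
    pending until sigtimedwait() consumes it.
    """
    results = {}
    give_up_at = time.monotonic() + deadline
    while len(results) < expected and time.monotonic() < give_up_at:
        # Returns None on timeout. Either way we fall through and reap, so a
        # missed or coalesced SIGCHLD delays cleanup by roughly poll_timeout.
        signal.sigtimedwait([signal.SIGCHLD], poll_timeout)
        results.update(reap_exited_children())
    return results


# Block SIGCHLD before forking so that no notification slips by unseen.
signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGCHLD})

# Simulate several short-lived commands (think git branch) exiting together.
children = []
for code in (0, 1, 2):
    pid = os.fork()
    if pid == 0:
        os._exit(code)  # child: exit at once with a known status
    children.append(pid)

collected = reaper_loop(expected=len(children))
print(sorted(collected.values()))
```

If a reaper fell behind like this, the unreaped children would show up as <defunct> entries under the VM in ps -axf, which matches the zombie processes Thierry and Jose report.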
> >
> > >
> > > Interestingly enough, the lock-up happens in a very specific place, a
> > > call to git branch, which is a very short command returning just a
> > > few characters (whereas all other commands have longer outputs).
> > > Reducing the frequency of the calls to git branch with a bit of
> > > caching reduces the chances of a lock-up.
> > >
> >
> > This is a good clue, and it may indicate a different kind of problem
> > (so maybe I am looking in the wrong place). Ben's suggestion of adding
> > a delay to the external process sounds like a good idea to help
> > troubleshoot it.
> >
> > Dave
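Thierry's caching workaround can be sketched as a small time-based cache in front of the command. The Python below is a hypothetical illustration, not his Smalltalk change: CachedCommand, git_branch, the five-second ttl and the injectable clock are all assumptions. It only caps how often a short-lived git branch process is spawned, which, per his observation, lowers the odds of the lock-up without addressing its cause.

```python
import subprocess
import time


class CachedCommand:
    """Call `runner` at most once per `ttl` seconds and reuse its result."""

    def __init__(self, runner, ttl=5.0, clock=time.monotonic):
        self._runner = runner  # zero-argument callable producing the output
        self._ttl = ttl
        self._clock = clock    # injectable so the cache can be tested
        self._value = None
        self._fetched_at = None
        self.calls = 0         # number of times `runner` actually ran

    def get(self):
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            self._value = self._runner()
            self._fetched_at = now
            self.calls += 1
        return self._value


def git_branch(repo_dir):
    """Run `git branch` in repo_dir and return its stdout (needs git)."""
    result = subprocess.run(
        ["git", "branch"], cwd=repo_dir, capture_output=True, text=True, check=True
    )
    return result.stdout
```

For a hypothetical checkout path repo_dir, CachedCommand(lambda: git_branch(repo_dir)) followed by repeated get() calls would start at most one git process every five seconds.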