On Tue, Jun 02, 2015 at 02:34:49PM +0200, Thierry Goubier wrote:
> 2015-06-02 12:14 GMT+02:00 Jose San Leandro <jose.sanlean...@osoco.es>:
> 
> > Hi Thierry,
> >
> > ConfigurationOfOSProcess-ThierryGoubier.38.mcz, which corresponds to
> > version 4.6.2.
> >
> 
> Ok, then this is the latest.
> 
> 
> >
> > Another workaround that would work for me is to be able to "resume" a
> > previous load attempt of a Metacello project. Or a custom "hook" in
> > Metacello to save the image after every dependency is successfully loaded.
> >
> 
> Yes, this would work. I'll ask Dave again if he has any idea; the bug is
> hard to reproduce.


Hi Thierry and Jose,

I am reading this thread with interest and will help if I can.

I do have one idea that we have not tried before. I have a theory that this may
be an intermittent problem caused by SIGCHLD signals (from the external OS
process when it exits) being missed by the UnixOSProcessAccessor>>grimReaperProcess
that handles them.

If this is happening, then I may be able to change grimReaperProcess to
work around the problem.
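
To make the theory concrete: standard POSIX signals are not queued, so if
several children exit while one SIGCHLD is already pending, the handler may
run fewer times than there are exited children. A reaper that collects one
child per signal can then leave children unreaped. The sketch below is plain
Python, not OSProcess code, and it is only an illustration of one possible
shape for a workaround: reap in a loop with WNOHANG, and do so on a timer as
well as on the signal. The names reap_exited_children and run_children are
made up for this example.

```python
import os
import signal
import time


def reap_exited_children():
    """Collect every child that has already exited, without blocking."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children left
        if pid == 0:
            break  # children remain, but none has exited yet
        reaped.append(pid)
    return reaped


def run_children(count=5, poll_interval=0.05, timeout=5.0):
    """Fork `count` short-lived children and reap them all (hypothetical demo)."""
    state = {"sigchld_seen": 0}

    def on_sigchld(signum, frame):
        # Only count the wakeup; the reaping itself happens in the main loop.
        state["sigchld_seen"] += 1

    previous = signal.signal(signal.SIGCHLD, on_sigchld)
    try:
        spawned = []
        for _ in range(count):
            pid = os.fork()
            if pid == 0:
                os._exit(0)  # child: exit immediately
            spawned.append(pid)

        reaped = []
        deadline = time.monotonic() + timeout
        while len(reaped) < count and time.monotonic() < deadline:
            # Reap on every pass, whether or not a signal was seen, so a
            # coalesced or missed SIGCHLD cannot leave a child behind.
            reaped.extend(reap_exited_children())
            time.sleep(poll_interval)
    finally:
        # Restore whatever handler was installed before the demo.
        signal.signal(
            signal.SIGCHLD,
            previous if previous is not None else signal.SIG_DFL,
        )
    return spawned, reaped, state["sigchld_seen"]
```

Calling run_children() returns the spawned pids, the reaped pids, and the
number of times the handler ran. The last number can be smaller than the
number of children, which is the effect the theory relies on.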

When you see the OS deadlock condition, are you able to tell if your Pharo VM
process has subprocesses in the zombie state (indicating that grimReaperProcess
did not clean them up)? The unix command "ps -axf | less" will let you look
at the process tree and that may give us a clue if this is happening.
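
If scanning the ps output by eye is awkward, the same check can be scripted.
The sketch below assumes a Linux /proc filesystem and reads /proc/<pid>/stat
for every process, reporting those whose parent is the given pid and whose
state is "Z" (zombie). The function names parse_stat and zombie_children are
invented for this example, and the proc_root parameter exists only so the
function can be pointed at a test directory.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state, ppid) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lparen = text.index("(")
    rparen = text.rindex(")")
    pid = int(text[:lparen])
    comm = text[lparen + 1:rparen]
    fields = text[rparen + 1:].split()
    return pid, comm, fields[0], int(fields[1])


def zombie_children(parent_pid, proc_root="/proc"):
    """List (pid, comm) for zombie processes whose parent is parent_pid."""
    zombies = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state, ppid = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process vanished or its stat file was unreadable
        if ppid == parent_pid and state == "Z":
            zombies.append((pid, comm))
    return sorted(zombies)
```

Run zombie_children(vm_pid) with the pid of the frozen Pharo VM process; a
non-empty result would suggest that exited children were not cleaned up.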

Thanks!

Dave



> 
> Would you mind telling me the Linux kernel / libc version of your Gentoo box?
> 
> Thierry
> 
> 
> >
> >
> > 2015-06-02 11:25 GMT+02:00 Thierry Goubier <thierry.goub...@gmail.com>:
> >
> >> Hi Jose,
> >>
> >> yes, I've noticed that as well. It was, at one point, drastic (i.e. almost
> >> always a lock-up) on my work development laptop; it now happens far less
> >> often (but it does happen to me from time to time).
> >>
> >> Dave Lewis, the author of OSProcess, fixed one issue which solved most of
> >> the lockups I had, but not all of them. The lockup is in the interaction
> >> between OSProcess inside Pharo and the external shell command (i.e. it
> >> concerns anything which uses OSProcess), and looks like a missed signal.
> >> It is also machine and Linux version dependent (Ubuntu 14.10 was horrible,
> >> 14.04 and 15.04 on the same hardware are far less sensitive), and seems to
> >> also depend on the load of the machine itself.
> >>
> >> By the way, which version of OSProcess are you using?
> >>
> >> Thierry
> >>
> >>
> >> 2015-06-02 11:10 GMT+02:00 Jose San Leandro <jose.sanlean...@osoco.es>:
> >>
> >>> Hi,
> >>>
> >>> In one of our projects we are using Pharo4. The image gets built by
> >>> gradle, which loads the Metacello project. Sometimes, we see the build
> >>> process hang. It just doesn't progress.
> >>>
> >>> When adding local gitfiletree:// dependencies manually through
> >>> Monticello, after a while Pharo freezes. It's not always the same
> >>> repository, and it's not always the same number of repositories before
> >>> it hangs.
> >>>
> >>> I launched the image with strace, and attached gdb to the frozen process.
> >>> It turns out it's waiting for a lock that never gets released.
> >>>
> >>> The environment is a 64-bit Gentoo Linux with enough of everything
> >>> (multiple monitors, multiple cores, enough RAM).
> >>>
> >>> I hope somebody can point me to how to dig deeper into this.
> >>>
> >>> # gdb
> >>> (gdb) attach [pid]
> >>> [..]
> >>> Reading symbols from /usr/lib32/libbz2.so.1...(no debugging symbols
> >>> found)...done.
> >>> Loaded symbols for /usr/lib32/libbz2.so.1
> >>> 0x0809d8bb in signalSemaphoreWithIndex ()
> >>> (gdb) backtrace
> >>> #0  0x0809d8bb in signalSemaphoreWithIndex ()
> >>> #1  0x0810868c in handleSignal ()
> >>> #2  <signal handler called>
> >>> #3  0x0809d8c8 in signalSemaphoreWithIndex ()
> >>> #4  0x0809f0af in aioPoll ()
> >>> #5  0xf76f9671 in display_ioRelinquishProcessorForMicroseconds () from
> >>> /home/chous/realhome/toolbox/pharo-5.0/pharo-vm/vm-display-X11
> >>> #6  0x080a1887 in ioRelinquishProcessorForMicroseconds ()
> >>> #7  0x080767fa in primitiveRelinquishProcessor ()
> >>> #8  0xb6fc838c in ?? ()
> >>> #9  0xb6fc3700 in ?? ()
> >>> #10 0xb7952882 in ?? ()
> >>> #11 0xb6fc3648 in ?? ()
> >>> (gdb) disassemble
> >>> Dump of assembler code for function handleSignal:
> >>>    0x081085e0 <+0>:     sub    $0x9c,%esp
> >>>    0x081085e6 <+6>:     mov    %ebx,0x90(%esp)
> >>>    0x081085ed <+13>:    mov    0xa0(%esp),%ebx
> >>>    0x081085f4 <+20>:    mov    %esi,0x94(%esp)
> >>>    0x081085fb <+27>:    mov    %edi,0x98(%esp)
> >>>    0x08108602 <+34>:    movzbl 0x8168420(%ebx),%esi
> >>>    0x08108609 <+41>:    mov    %ebx,%eax
> >>>    0x0810860b <+43>:    mov    %esi,%edx
> >>>    0x0810860d <+45>:    call   0x81070d0 <forwardSignaltoSemaphoreAt>
> >>>    0x08108612 <+50>:    call   0x805aae0 <pthread_self@plt>
> >>>    0x08108617 <+55>:    mov    0x8168598,%edi
> >>>    0x0810861d <+61>:    cmp    %edi,%eax
> >>>    0x0810861f <+63>:    je     0x8108680 <handleSignal+160>
> >>>    0x08108621 <+65>:    lea    0x10(%esp),%esi
> >>>    0x08108625 <+69>:    mov    %esi,(%esp)
> >>>    0x08108628 <+72>:    call   0x805b330 <sigemptyset@plt>
> >>>    0x0810862d <+77>:    mov    %ebx,0x4(%esp)
> >>>    0x08108631 <+81>:    mov    %esi,(%esp)
> >>>    0x08108634 <+84>:    call   0x805b0c0 <sigaddset@plt>
> >>>    0x08108639 <+89>:    movl   $0x0,0x8(%esp)
> >>>    0x08108641 <+97>:    mov    %esi,0x4(%esp)
> >>>    0x08108645 <+101>:   movl   $0x0,(%esp)
> >>>    0x0810864c <+108>:   call   0x805ada0 <pthread_sigmask@plt>
> >>>    0x08108651 <+113>:   mov    %ebx,0x4(%esp)
> >>>    0x08108655 <+117>:   mov    %edi,(%esp)
> >>>    0x08108658 <+120>:   call   0x805b240 <pthread_kill@plt>
> >>>    0x0810865d <+125>:   mov    0x90(%esp),%ebx
> >>>    0x08108664 <+132>:   mov    0x94(%esp),%esi
> >>>    0x0810866b <+139>:   mov    0x98(%esp),%edi
> >>>    0x08108672 <+146>:   add    $0x9c,%esp
> >>>    0x08108678 <+152>:   ret
> >>>    0x08108679 <+153>:   lea    0x0(%esi,%eiz,1),%esi
> >>>    0x08108680 <+160>:   test   %esi,%esi
> >>>    0x08108682 <+162>:   je     0x810865d <handleSignal+125>
> >>>    0x08108684 <+164>:   mov    %esi,(%esp)
> >>>    0x08108687 <+167>:   call   0x809d8a0 <signalSemaphoreWithIndex>
> >>> => 0x0810868c <+172>:   jmp    0x810865d <handleSignal+125>
> >>> End of assembler dump.
> >>> (gdb) up 3
> >>> (gdb) disassemble
> >>> Dump of assembler code for function signalSemaphoreWithIndex:
> >>>    0x0809d8a0 <+0>:     push   %esi
> >>>    0x0809d8a1 <+1>:     xor    %eax,%eax
> >>>    0x0809d8a3 <+3>:     push   %ebx
> >>>    0x0809d8a4 <+4>:     sub    $0x24,%esp
> >>>    0x0809d8a7 <+7>:     mov    0x30(%esp),%esi
> >>>    0x0809d8ab <+11>:    test   %esi,%esi
> >>>    0x0809d8ad <+13>:    jle    0x809d918 <signalSemaphoreWithIndex+120>
> >>>    0x0809d8af <+15>:    mov    $0x1,%edx
> >>>    0x0809d8b4 <+20>:    lea    0x0(%esi,%eiz,1),%esi
> >>>    0x0809d8b8 <+24>:    mfence
> >>>    0x0809d8bb <+27>:    mov    $0x0,%eax
> >>>    0x0809d8c0 <+32>:    lock cmpxchg %edx,0x8152d80
> >>> => 0x0809d8c8 <+40>:    mov    %eax,0x1c(%esp)
> >>>    0x0809d8cc <+44>:    mov    0x1c(%esp),%eax
> >>>    0x0809d8d0 <+48>:    test   %eax,%eax
> >>>    0x0809d8d2 <+50>:    jne    0x809d8b8 <signalSemaphoreWithIndex+24>
> >>>    0x0809d8d4 <+52>:    mov    0x8152d84,%edx
> >>>    0x0809d8da <+58>:    cmp    $0x1ff,%edx
> >>>    0x0809d8e0 <+64>:    lea    0x1(%edx),%ebx
> >>>    0x0809d8e3 <+67>:    cmove  %eax,%ebx
> >>>    0x0809d8e6 <+70>:    mov    0x8152d88,%eax
> >>>    0x0809d8eb <+75>:    cmp    %ebx,%eax
> >>>    0x0809d8ed <+77>:    je     0x809d920 <signalSemaphoreWithIndex+128>
> >>>    0x0809d8ef <+79>:    mov    0x8152d84,%eax
> >>>    0x0809d8f4 <+84>:    mov    %esi,0x8152da0(,%eax,4)
> >>>    0x0809d8fb <+91>:    mfence
> >>>    0x0809d8fe <+94>:    mov    %ebx,0x8152d84
> >>>    0x0809d904 <+100>:   movl   $0x0,0x8152d80
> >>>    0x0809d90e <+110>:   call   0x807c2c0 <forceInterruptCheck>
> >>>    0x0809d913 <+115>:   mov    $0x1,%eax
> >>>    0x0809d918 <+120>:   add    $0x24,%esp
> >>>    0x0809d91b <+123>:   pop    %ebx
> >>>    0x0809d91c <+124>:   pop    %esi
> >>>    0x0809d91d <+125>:   ret
> >>>    0x0809d91e <+126>:   xchg   %ax,%ax
> >>>    0x0809d920 <+128>:   movl   $0x810c888,(%esp)
> >>>    0x0809d927 <+135>:   movl   $0x0,0x8152d80
> >>>    0x0809d931 <+145>:   call   0x80a3720 <error>
> >>>    0x0809d936 <+150>:   jmp    0x809d8ef <signalSemaphoreWithIndex+79>
> >>> End of assembler dump.
> >>>
> >>> Meanwhile, strace gets frozen showing this:
> >>> [..]
> >>> clone(child_stack=0,
> >>> flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
> >>> child_tidptr=0x7f63665cd9d0) = 3736
> >>> rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> >>> rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
> >>> rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> >>> rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
> >>> rt_sigaction(SIGINT, {0x42a8a0, [], SA_RESTORER, 0x7f6365ba3ad0},
> >>> {SIG_DFL, [], SA_RESTORER, 0x7f6365ba3ad0}, 8) = 0
> >>> wait4(-1, 0x7ffc4ef7f7e8, 0, NULL)      = ? ERESTARTSYS (To be restarted
> >>> if SA_RESTART is set)
> >>> --- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
> >>> wait4(-1,
> >>>
> >>
> >>
> >
