2015-06-02 15:03 GMT+02:00 Jose San Leandro <jose.sanlean...@osoco.es>:
> No problem, of course.
>
> It's a dual-core machine running a custom 4.0.4-hardened-r2 kernel,
> hardened/linux/amd64/selinux profile (but in permissive mode), glibc
> version 2.20-r2, with the multilib and selinux USE flags active.
>
> I can provide more information if that helps, of course. I could even give
> you ssh access to a Docker container running on it, but I fear it won't
> support X.
>

When the Pharo process gets locked, can you do a kill -SIGUSR1 on the Pharo
process and look at the output? It will give you the status inside the VM.

Thierry

>
> Thanks!
>
> 2015-06-02 14:34 GMT+02:00 Thierry Goubier <thierry.goub...@gmail.com>:
>
>> 2015-06-02 12:14 GMT+02:00 Jose San Leandro <jose.sanlean...@osoco.es>:
>>
>>> Hi Thierry,
>>>
>>> ConfigurationOfOSProcess-ThierryGoubier.38.mcz, which corresponds to
>>> version 4.6.2.
>>>
>>
>> Ok, then this is the latest.
>>
>>>
>>> Another workaround that would work for me would be the ability to "resume"
>>> a previous load attempt of a Metacello project, or a custom "hook" in
>>> Metacello that saves the image after every dependency is successfully
>>> loaded.
>>>
>>
>> Yes, this would work. I'll ask Dave again whether he has any idea; the bug
>> is hard to reproduce.
>>
>> Would you mind telling me the Linux kernel / libc version of your Gentoo
>> box?
>>
>> Thierry
>>
>>>
>>> 2015-06-02 11:25 GMT+02:00 Thierry Goubier <thierry.goub...@gmail.com>:
>>>
>>>> Hi Jose,
>>>>
>>>> yes, I've noticed that as well. At one point it was drastic (i.e.
>>>> almost always a lock-up) on my work development laptop; it now happens
>>>> far less often (but it does still happen to me from time to time).
>>>>
>>>> Dave Lewis, the author of OSProcess, fixed one issue which solved most
>>>> of the lockups I had, but not all of them. The lockup is in the interaction
>>>> between OSProcess inside Pharo and the external shell command (i.e. it
>>>> concerns anything which uses OSProcess), and seems to involve a missed
>>>> signal.
>>>> It is also machine- and Linux-version-dependent (Ubuntu 14.10 was
>>>> horrible; 14.04 and 15.04 on the same hardware are far less sensitive),
>>>> and it also seems to depend on the load of the machine itself.
>>>>
>>>> By the way, which version of OSProcess are you using?
>>>>
>>>> Thierry
>>>>
>>>> 2015-06-02 11:10 GMT+02:00 Jose San Leandro <jose.sanlean...@osoco.es>:
>>>>
>>>>> Hi,
>>>>>
>>>>> In one of our projects we are using Pharo4. The image gets built by
>>>>> gradle, which loads the Metacello project. Sometimes the build
>>>>> process hangs. It just doesn't progress.
>>>>>
>>>>> When adding local gitfiletree:// dependencies manually through
>>>>> Monticello, Pharo freezes after a while. It's not always the same
>>>>> repository, and it's not always the same number of repositories before
>>>>> it hangs.
>>>>>
>>>>> I launched the image with strace, and attached gdb to the frozen
>>>>> process. It turns out it's waiting for a lock that never gets released.
>>>>>
>>>>> The environment is a 64-bit Gentoo Linux with enough of everything
>>>>> (multiple monitors, multiple cores, enough RAM).
>>>>>
>>>>> I hope someone can point me to how to dig deeper into this.
>>>>>
>>>>> # gdb
>>>>> (gdb) attach [pid]
>>>>> [..]
>>>>> Reading symbols from /usr/lib32/libbz2.so.1...(no debugging symbols found)...done.
>>>>> Loaded symbols for /usr/lib32/libbz2.so.1
>>>>> 0x0809d8bb in signalSemaphoreWithIndex ()
>>>>> (gdb) backtrace
>>>>> #0 0x0809d8bb in signalSemaphoreWithIndex ()
>>>>> #1 0x0810868c in handleSignal ()
>>>>> #2 <signal handler called>
>>>>> #3 0x0809d8c8 in signalSemaphoreWithIndex ()
>>>>> #4 0x0809f0af in aioPoll ()
>>>>> #5 0xf76f9671 in display_ioRelinquishProcessorForMicroseconds () from /home/chous/realhome/toolbox/pharo-5.0/pharo-vm/vm-display-X11
>>>>> #6 0x080a1887 in ioRelinquishProcessorForMicroseconds ()
>>>>> #7 0x080767fa in primitiveRelinquishProcessor ()
>>>>> #8 0xb6fc838c in ?? ()
>>>>> #9 0xb6fc3700 in ?? ()
>>>>> #10 0xb7952882 in ?? ()
>>>>> #11 0xb6fc3648 in ?? ()
>>>>> (gdb) disassemble
>>>>> Dump of assembler code for function handleSignal:
>>>>>    0x081085e0 <+0>: sub $0x9c,%esp
>>>>>    0x081085e6 <+6>: mov %ebx,0x90(%esp)
>>>>>    0x081085ed <+13>: mov 0xa0(%esp),%ebx
>>>>>    0x081085f4 <+20>: mov %esi,0x94(%esp)
>>>>>    0x081085fb <+27>: mov %edi,0x98(%esp)
>>>>>    0x08108602 <+34>: movzbl 0x8168420(%ebx),%esi
>>>>>    0x08108609 <+41>: mov %ebx,%eax
>>>>>    0x0810860b <+43>: mov %esi,%edx
>>>>>    0x0810860d <+45>: call 0x81070d0 <forwardSignaltoSemaphoreAt>
>>>>>    0x08108612 <+50>: call 0x805aae0 <pthread_self@plt>
>>>>>    0x08108617 <+55>: mov 0x8168598,%edi
>>>>>    0x0810861d <+61>: cmp %edi,%eax
>>>>>    0x0810861f <+63>: je 0x8108680 <handleSignal+160>
>>>>>    0x08108621 <+65>: lea 0x10(%esp),%esi
>>>>>    0x08108625 <+69>: mov %esi,(%esp)
>>>>>    0x08108628 <+72>: call 0x805b330 <sigemptyset@plt>
>>>>>    0x0810862d <+77>: mov %ebx,0x4(%esp)
>>>>>    0x08108631 <+81>: mov %esi,(%esp)
>>>>>    0x08108634 <+84>: call 0x805b0c0 <sigaddset@plt>
>>>>>    0x08108639 <+89>: movl $0x0,0x8(%esp)
>>>>>    0x08108641 <+97>: mov %esi,0x4(%esp)
>>>>>    0x08108645 <+101>: movl $0x0,(%esp)
>>>>>    0x0810864c <+108>: call 0x805ada0 <pthread_sigmask@plt>
>>>>>    0x08108651 <+113>: mov %ebx,0x4(%esp)
>>>>>    0x08108655 <+117>: mov %edi,(%esp)
>>>>>    0x08108658 <+120>: call 0x805b240 <pthread_kill@plt>
>>>>>    0x0810865d <+125>: mov 0x90(%esp),%ebx
>>>>>    0x08108664 <+132>: mov 0x94(%esp),%esi
>>>>>    0x0810866b <+139>: mov 0x98(%esp),%edi
>>>>>    0x08108672 <+146>: add $0x9c,%esp
>>>>>    0x08108678 <+152>: ret
>>>>>    0x08108679 <+153>: lea 0x0(%esi,%eiz,1),%esi
>>>>>    0x08108680 <+160>: test %esi,%esi
>>>>>    0x08108682 <+162>: je 0x810865d <handleSignal+125>
>>>>>    0x08108684 <+164>: mov %esi,(%esp)
>>>>>    0x08108687 <+167>: call 0x809d8a0 <signalSemaphoreWithIndex>
>>>>> => 0x0810868c <+172>: jmp 0x810865d <handleSignal+125>
>>>>> End of assembler dump.
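Read from the handleSignal disassembly above, the handler does the following:

1. It looks up a per-signal semaphore index in a byte table at 0x8168420 and calls forwardSignaltoSemaphoreAt.
2. It compares pthread_self() with a saved thread id at 0x8168598.
3. On any other thread, it blocks the signal for itself (pthread_sigmask with how = 0, which is SIG_BLOCK on Linux) and re-sends the signal to the saved thread with pthread_kill.
4. On the saved thread, presumably the VM thread, it calls signalSemaphoreWithIndex directly from inside the signal handler. That call is frame #1 → frame #0 in the backtrace.

The C sketch below reconstructs that control flow. It is not the VM source, and it rests on these assumptions:

- vm_thread, semaphore_index_for_signal, record_semaphore_signal and install_handler are hypothetical names.
- forwardSignaltoSemaphoreAt is omitted.
- The queue call is replaced by a recorder so the sketch runs on its own.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>

#define MAX_SIGNALS 65  /* assumption: large enough for Linux signal numbers 1..64 */

/* Hypothetical stand-ins for the globals the disassembly reads. */
static pthread_t vm_thread;                                    /* cf. 0x8168598 */
static unsigned char semaphore_index_for_signal[MAX_SIGNALS]; /* cf. byte table at 0x8168420 */

/* Hypothetical stand-in for signalSemaphoreWithIndex: only records the index. */
static volatile sig_atomic_t last_signalled_index = 0;
static void record_semaphore_signal(int index) { last_signalled_index = index; }

static void handle_signal(int sig) {
    int index = semaphore_index_for_signal[sig];
    /* The real handler also calls forwardSignaltoSemaphoreAt here; omitted. */
    if (!pthread_equal(pthread_self(), vm_thread)) {
        /* Not the VM thread: block sig in this thread, then re-send it to the VM thread. */
        sigset_t set;
        sigemptyset(&set);
        sigaddset(&set, sig);
        pthread_sigmask(SIG_BLOCK, &set, NULL);
        pthread_kill(vm_thread, sig);
        return;
    }
    /* VM thread: signal the semaphore directly from inside the handler. If this
       handler interrupted code that already holds the signal-queue lock, this is
       the re-entrant call that shows up as frame #0 in the backtrace. */
    if (index != 0)
        record_semaphore_signal(index);
}

/* Hypothetical helper: map sig to a semaphore index and make the caller the VM thread. */
static int install_handler(int sig, unsigned char index) {
    assert(sig > 0 && sig < MAX_SIGNALS);
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handle_signal;
    sigemptyset(&sa.sa_mask);
    semaphore_index_for_signal[sig] = index;
    vm_thread = pthread_self();
    return sigaction(sig, &sa, NULL);
}
```

For example, calling install_handler(SIGUSR1, 7) and then raise(SIGUSR1) on the same thread takes the direct branch, so last_signalled_index becomes 7. This is the branch where a real queue lock could be re-entered.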
>>>>> (gdb) up 3
>>>>> (gdb) disassemble
>>>>> Dump of assembler code for function signalSemaphoreWithIndex:
>>>>>    0x0809d8a0 <+0>: push %esi
>>>>>    0x0809d8a1 <+1>: xor %eax,%eax
>>>>>    0x0809d8a3 <+3>: push %ebx
>>>>>    0x0809d8a4 <+4>: sub $0x24,%esp
>>>>>    0x0809d8a7 <+7>: mov 0x30(%esp),%esi
>>>>>    0x0809d8ab <+11>: test %esi,%esi
>>>>>    0x0809d8ad <+13>: jle 0x809d918 <signalSemaphoreWithIndex+120>
>>>>>    0x0809d8af <+15>: mov $0x1,%edx
>>>>>    0x0809d8b4 <+20>: lea 0x0(%esi,%eiz,1),%esi
>>>>>    0x0809d8b8 <+24>: mfence
>>>>>    0x0809d8bb <+27>: mov $0x0,%eax
>>>>>    0x0809d8c0 <+32>: lock cmpxchg %edx,0x8152d80
>>>>> => 0x0809d8c8 <+40>: mov %eax,0x1c(%esp)
>>>>>    0x0809d8cc <+44>: mov 0x1c(%esp),%eax
>>>>>    0x0809d8d0 <+48>: test %eax,%eax
>>>>>    0x0809d8d2 <+50>: jne 0x809d8b8 <signalSemaphoreWithIndex+24>
>>>>>    0x0809d8d4 <+52>: mov 0x8152d84,%edx
>>>>>    0x0809d8da <+58>: cmp $0x1ff,%edx
>>>>>    0x0809d8e0 <+64>: lea 0x1(%edx),%ebx
>>>>>    0x0809d8e3 <+67>: cmove %eax,%ebx
>>>>>    0x0809d8e6 <+70>: mov 0x8152d88,%eax
>>>>>    0x0809d8eb <+75>: cmp %ebx,%eax
>>>>>    0x0809d8ed <+77>: je 0x809d920 <signalSemaphoreWithIndex+128>
>>>>>    0x0809d8ef <+79>: mov 0x8152d84,%eax
>>>>>    0x0809d8f4 <+84>: mov %esi,0x8152da0(,%eax,4)
>>>>>    0x0809d8fb <+91>: mfence
>>>>>    0x0809d8fe <+94>: mov %ebx,0x8152d84
>>>>>    0x0809d904 <+100>: movl $0x0,0x8152d80
>>>>>    0x0809d90e <+110>: call 0x807c2c0 <forceInterruptCheck>
>>>>>    0x0809d913 <+115>: mov $0x1,%eax
>>>>>    0x0809d918 <+120>: add $0x24,%esp
>>>>>    0x0809d91b <+123>: pop %ebx
>>>>>    0x0809d91c <+124>: pop %esi
>>>>>    0x0809d91d <+125>: ret
>>>>>    0x0809d91e <+126>: xchg %ax,%ax
>>>>>    0x0809d920 <+128>: movl $0x810c888,(%esp)
>>>>>    0x0809d927 <+135>: movl $0x0,0x8152d80
>>>>>    0x0809d931 <+145>: call 0x80a3720 <error>
>>>>>    0x0809d936 <+150>: jmp 0x809d8ef <signalSemaphoreWithIndex+79>
>>>>> End of assembler dump.
>>>>>
>>>>> Meanwhile, strace freezes, showing this:
>>>>> [..]
>>>>> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f63665cd9d0) = 3736
>>>>> rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
>>>>> rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
>>>>> rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
>>>>> rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
>>>>> rt_sigaction(SIGINT, {0x42a8a0, [], SA_RESTORER, 0x7f6365ba3ad0}, {SIG_DFL, [], SA_RESTORER, 0x7f6365ba3ad0}, 8) = 0
>>>>> wait4(-1, 0x7ffc4ef7f7e8, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
>>>>> --- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
>>>>> wait4(-1,
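The signalSemaphoreWithIndex disassembly above suggests a small queue guarded by a spin lock:

- The loop at +24..+50 retries `lock cmpxchg` on the word at 0x8152d80 until it swaps 0 to 1.
- The index is stored in a 512-entry array at 0x8152da0; the tail at 0x8152d84 wraps after 0x1ff.
- The queue counts as full when the next tail equals the head at 0x8152d88.
- The lock is released with a plain store of 0 before forceInterruptCheck is called.

One possible reading of the backtrace follows from this. Frame #3 was interrupted at +40, just after its cmpxchg. If that cmpxchg had taken the lock, the nested call in frame #0 (entered from handleSignal on the same thread) would spin forever at +27 waiting for a release that can never happen. That would match "a lock that never gets released".

The C11 sketch below reconstructs the pattern. All names are hypothetical. The binary releases the lock, calls error() and then stores anyway when the queue is full; the sketch instead returns 0 in that case.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical names for the globals at 0x8152d80 (lock), 0x8152d84 (tail),
   0x8152d88 (head) and 0x8152da0 (slots) in the disassembly. */
#define QUEUE_SIZE 512              /* tail wraps to 0 after 0x1ff */

static atomic_int queue_lock = 0;   /* 0 = free, 1 = held */
static int queue_tail = 0;          /* next slot to write */
static int queue_head = 0;          /* next slot the consumer would read */
static int queue_slots[QUEUE_SIZE];

/* Spin until the lock word is swapped from 0 to 1 (cf. the cmpxchg loop at +24..+50).
   There is no yield or timeout: if the holder never releases, this never returns. */
static void lock_queue(void) {
    int expected;
    do {
        expected = 0;
    } while (!atomic_compare_exchange_strong(&queue_lock, &expected, 1));
}

static void unlock_queue(void) {
    atomic_store(&queue_lock, 0);
}

/* Single attempt to take the lock; returns 1 on success, 0 if it is already held. */
static int try_lock_queue(void) {
    int expected = 0;
    return atomic_compare_exchange_strong(&queue_lock, &expected, 1);
}

/* Queue a semaphore index for the VM to signal later.
   Returns 1 if queued, 0 if index <= 0 or the queue is full. */
int signal_semaphore_with_index(int index) {
    if (index <= 0)
        return 0;
    lock_queue();
    assert(queue_tail >= 0 && queue_tail < QUEUE_SIZE);
    int next = (queue_tail == QUEUE_SIZE - 1) ? 0 : queue_tail + 1;
    if (next == queue_head) {       /* full: one slot is always left empty */
        unlock_queue();
        return 0;
    }
    queue_slots[queue_tail] = index;
    atomic_thread_fence(memory_order_seq_cst);  /* cf. the mfence at +91 */
    queue_tail = next;
    unlock_queue();
    /* the VM would call forceInterruptCheck() here */
    return 1;
}
```

On a fresh queue, 511 calls with positive indices succeed and the 512th returns 0, because one slot is kept empty to tell "full" from "empty". While the lock is held, try_lock_queue fails. In a single-threaded process, a signal handler that called signal_semaphore_with_index at that moment would keep failing in lock_queue the same way, spinning as frames #0 and #3 appear to.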