On Thu, Aug 27, 2020 at 10:01 AM Uday Kiran Jonnala
<judayki...@gmail.com> wrote:
>
> I have a zombie-parent situation with a Go program.
>
> A process (in this case, replicator) has many goroutines internally.
>
> We hit a panic() and I see that the replicator process is in the zombie state.
>
> <<>>>:~$ ps -ef | grep replicator
>
> root      87548  87507  0 Aug23 ?        00:00:00 [replicator] <defunct>
>
>
>
> The main goroutine (or its supporting P) exited, but the panic left another P 
> thread still executing (the main P could be 87548, and the supporting P 
> thread 87561 is still there) in a blocked state.
>
> bash-4.2# ls -Fl /proc/87548/task/87561/fd | grep 606649
> l-wx------. 1 root root 64 Aug 25 10:59 1 -> pipe:[606649]
> l-wx------. 1 root root 64 Aug 25 10:59 2 -> pipe:[606649]
>
> Stack trace
>
> bash-4.2# cat /proc/87548/task/87561/stack
> [<ffffffffbb114714>] futex_wait_queue_me+0xc4/0x120
> [<ffffffffbb11520a>] futex_wait+0x10a/0x250
> [<ffffffffbb1182ce>] do_futex+0x35e/0x5b0
> [<ffffffffbb11865b>] SyS_futex+0x13b/0x180
> [<ffffffffbb003c09>] do_syscall_64+0x79/0x1b0
> [<ffffffffbba00081>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> [<ffffffffffffffff>] 0xffffffffffffffff
>
>
>
> We have a panic internally from the main goroutine:
>
> fatal error: concurrent map writes
>
> goroutine 666359 [running]:
> runtime.throw(0x101d6ae, 0x15)
> /home/ll/ntnx/toolchain-builds/78ae837ba07c8ef8f0ea782407d8d4626815552b.x86_64/go/src/runtime/panic.go:608 +0x72 fp=0xc00374b6f0 sp=0xc00374b6c0 pc=0x42da62
> runtime.mapassign_faststr(0xdb71c0, 0xc00023f5f0, 0xc000aca990, 0x83, 0xc0009d03c8)
> /home/ll/ntnx/toolchain-builds/78ae837ba07c8ef8f0ea782407d8d4626815552b.x86_64/go/src/runtime/map_faststr.go:275 +0x3bf fp=0xc00374b758 sp=0xc00374b6f0 pc=0x41527f
> github.eng.nutanix.com/xyz/abc/metadata.UpdateRecvInProgressFlag(0xc000aca990, 0x83, 0x0)
>
> .......
>
> goroutine 665516 [chan receive, 2 minutes]:
> zeus.(*Leadership).LeaderValue.func1(0xc003d5c120, 0x0, 0xc002e906c0, 0x52, 0xc00302ec60, 0x29)
> /home/ll/ntnx/main/build/.go/src/zeus/leadership.go:244 +0x34
> created by zeus.(*Leadership).LeaderValue
> /home/ll/ntnx/main/build/.go/src/zeus/leadership.go:243 +0x277
> 2020-08-03 00:35:04 rolled over log file
> ERROR: logging before flag.Parse: I0803 00:35:04.426906 196123 dataset.go:26] initialize zfs linking
> ERROR: logging before flag.Parse: I0803 00:35:04.433296 196123 dataset.go:34] completed zfs linking successfully
> I0803 00:35:04.433447 196123 main.go:86] Gflags passed NodeUuid: c238e584-0eeb-48bd-b299-2a25b13602f1, External Ip: 10.15.96.163
> I0803 00:35:04.433460 196123 main.go:99] Component name using for this process : abc-c238e584-0eeb-48bd-b299-2a25b13602f1
> I0803 00:35:04.433467 196123 main.go:120] Trying to initialize DB
>
> If there is a panic() from the main P thread, as I understand it, we exit() 
> and clean up all P threads of the process.
>
> Are we hitting the following scenario? I did not look into the M-P-G 
> implementation in detail.
>
>  Example:
>
> #include <stdio.h>
> #include <pthread.h>
> #include <unistd.h>
> #include <stdlib.h>
>
> /* Runs in the detached thread; it outlives the main thread. */
> void *thread_function(void *args)
> {
>     /* The message says 20 seconds, but the thread sleeps for 100. */
>     printf("The is new thread! Sleep 20 seconds...\n");
>     sleep(100);
>     printf("Exit from thread\n");
>     pthread_exit(0);
> }
>
> int main(int argc, char **argv)
> {
>     pthread_t thrd;
>     pthread_attr_t attr;
>     int res = 0;
>
>     /* Return values are stored in res but never checked. */
>     res = pthread_attr_init(&attr);
>     res = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
>     res = pthread_create(&thrd, &attr, thread_function, NULL);
>     res = pthread_attr_destroy(&attr);
>     printf("Main thread. Sleep 5 seconds\n");
>     sleep(5);
>     printf("Exit from main process\n");
>     /* pthread_exit() ends only the main thread; the process keeps
>      * running until the detached thread finishes. */
>     pthread_exit(0);
> }
>
> kkk@ ~/mycode/go () $ ./a.out &
> [1] 108418
> Main thread. Sleep 5 seconds
> The is new thread! Sleep 20 seconds...
> kkk@ ~/mycode/go () $
> Exit from main process
> PID TTY          TIME CMD
> 49313 pts/26   00:00:01 bash
> 108418 pts/26   00:00:00 [a.out] <defunct>
> 108449 pts/26   00:00:00 ps
>
> Note that the main process is <defunct> while the child thread is still 
> hanging around.
>
> kkk@ ~/mycode/go () $ sudo cat /proc/108418/task/108420/stack
> [<ffffffff810b4c1d>] hrtimer_nanosleep+0xbd/0x1d0
> [<ffffffff810b4dae>] SyS_nanosleep+0x7e/0x90
> [<ffffffff816a63c9>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff
> ujonnala@ ~/mycode/go () $ Exit from thread
>
>  Any help in this regard is appreciated.


I think you are misreading something somewhere.  Zombie status is a
feature of a process, not a thread.  It means that the child process
has exited but that the parent process, the one which started the
child process via the fork system call (or, on GNU/Linux, the clone
system call), has not called the wait (or waitpid or wait3 or wait4)
system call to collect its status.
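
To make the wait step concrete, here is a minimal Go sketch, not taken
from the program under discussion. It assumes Linux with /proc mounted
and a `true` binary on the PATH; the helper name procState is made up
for illustration. The parent starts a child that exits at once, looks
at the child's state before calling Wait, and then reaps it.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
	"time"
)

// procState (hypothetical helper) extracts the one-letter state field from
// the contents of /proc/<pid>/stat; "Z" means zombie. The command name sits
// in parentheses and may contain spaces, so parsing starts after the last ')'.
func procState(stat string) string {
	i := strings.LastIndex(stat, ")")
	if i < 0 {
		return ""
	}
	fields := strings.Fields(stat[i+1:])
	if len(fields) == 0 {
		return ""
	}
	return fields[0]
}

func main() {
	// Assumption: "true" exists on this system and exits immediately.
	cmd := exec.Command("true")
	if err := cmd.Start(); err != nil {
		fmt.Println("start failed:", err)
		return
	}
	statPath := fmt.Sprintf("/proc/%d/stat", cmd.Process.Pid)

	// Give the child time to exit. Until the parent waits for it, the
	// child should stay in the process table as a zombie.
	time.Sleep(200 * time.Millisecond)
	if data, err := os.ReadFile(statPath); err == nil {
		fmt.Println("state before Wait:", procState(string(data)))
	}

	// Wait collects the child's exit status, which lets the kernel
	// remove the process table entry.
	if err := cmd.Wait(); err != nil {
		fmt.Println("wait:", err)
	}
	if _, err := os.ReadFile(statPath); err != nil {
		fmt.Println("after Wait the /proc entry is gone")
	}
}
```

A parent that never reaches the Wait call leaves the child showing as
<defunct> in ps for as long as the parent itself keeps running.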

So don't look at threads or P's.  Look at the parent process that
started the process that became a zombie.
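
One way to find that parent, sketched here in Go under the assumption
of a Linux /proc filesystem: read the PPid line from
/proc/<pid>/status. The helper name parentPID and the command-line
handling are made up for illustration.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parentPID (hypothetical helper) returns the PPid value found in the
// contents of /proc/<pid>/status.
func parentPID(status string) (int, error) {
	for _, line := range strings.Split(status, "\n") {
		if strings.HasPrefix(line, "PPid:") {
			return strconv.Atoi(strings.TrimSpace(strings.TrimPrefix(line, "PPid:")))
		}
	}
	return 0, fmt.Errorf("no PPid line found")
}

func main() {
	// With no argument, inspect this process itself.
	pid := "self"
	if len(os.Args) > 1 {
		pid = os.Args[1]
	}
	data, err := os.ReadFile("/proc/" + pid + "/status")
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	ppid, err := parentPID(string(data))
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	fmt.Printf("parent of %s is %d\n", pid, ppid)
}
```

The same number is in the third column of the ps -ef output quoted
above: the parent of 87548 is 87507, so that is the process to
inspect.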

Ian
