The following reply was made to PR kern/166340; it has been noted by GNATS.
From: Christian Esken <christian.es...@trivago.com> To: bug-follo...@freebsd.org Cc: Konstantin Belousov <kostik...@gmail.com>, a...@freebsd.org Subject: Re: misc/166340: Process under FreeBSD 9.0 hangs in uninterruptable sleep with apparently no syscall (empty wchan) Date: Tue, 27 Mar 2012 17:30:48 +0200 --=-3sl93MaMYlu/dkvvUUBe Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit Konstantin Belousov wrote: > Thank you for the data. Semi-obviously, the callout_stop() call in > sleepq_check_timeout() have to return 0, otherwise we would not call > mi_switch() there. But I do not see how this can happen, because > the callout state, printed from kgdb, still indicates that callout > is pending. Callout cannot be reset while in sleepq code. > > So there are two possible routes to go forward: preferrable is for > you to extract the self-contained C program that would illustrate > the issue and send this sample to me. Second is to recompile your > kernel with INVARIANTS/WITNESS and possibly KTR and see what happen. I repeated the test with INVARIANTS/WITNESS and KTR compiled in (actually WITNESS was already included during the last test). I ran KTR with nothing filtered out, and formatted the dump with "ktrdump -cftH -i ktr.out". The whole log is excessive (1GB), so I have extrated two short sections (see attachment). The first section shows the last action of the application, namely a succselful sendto() to a TCP socket, and then waiting for an answer via recvfrom(). The second section illustrates the lock/unlock sequence of the sleep mutex for the recfrom(). It goes like LOCK, LOCK, UNLOCK. This time the signal status is different. We have a pending signal: USER PID PPID PENDING CAUGHT IGNORED BLOCKED STAT WCHAN nobody 9163 1 4000 80005006 79f88010 0 D - Looks like SIGPROF (27). Just wondering where it comes from. By the way: I evaluated the possibility to implement a standalone test case. It would be extremely complicated, as the issue is while writing to the socket, and thus it would require extracting the socket code from the Thrift procect (http://thrift.apache.org/ ). Christian --=-3sl93MaMYlu/dkvvUUBe Content-Disposition: attachment; filename="wait_recvfrom.txt" Content-Type: text/plain; name="wait_recvfrom.txt"; charset="UTF-8" Content-Transfer-Encoding: 7bit Last actions of pid 9163 (serelog): sendto() succesful recvfrom() waits for data, uisng sleep mutex 7463551 5 5644560314159 /usr/src/sys/kern/kern_sx.c:352 0xfffffe01972b2480 XUNLOCK (sx) so_snd_sx 0xfffffe0344b07490 r = 0 at /usr/src/sys/kern/uipc_sockbuf.c:160 7463552 1 5644560316280 /usr/src/sys/kern/kern_mutex.c:205 0xfffffe01bcf4d000 LOCK (sleep mutex) kqueue 0xfffffe03eef5eb00 r = 0 at /usr/src/sys/kern/kern_event.c:1779 7463553 5 5644560319107 /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:149 0xfffffe01972b2480 syscall: p=0xfffffe01bc1b0910 error=0 return 0x77d 0x77d 7463557 5 5644560329931 /usr/src/sys/kern/subr_trap.c:101 0xfffffe01972b2480 userret: thread 0xfffffe01972b2480 (pid 9163, serelog) 7463559 4 5644560336733 /usr/src/sys/vm/uma_core.c:1975 0xfffffe0008364480 uma_zalloc_arg thread 8364480 zone mbuf flags 1 7463561 6 5644560344432 /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:84 0xfffffe01972ae900 syscall: td=0xfffffe01972ae900 pid 9767 gettimeofday (0x7fffffffac40, 0, 0x43fd18) 7463562 1 5644560347528 /usr/src/sys/kern/kern_mutex.c:222 0xfffffe01bcf4d000 UNLOCK (sleep mutex) kqueue 0xfffffe03eef5eb00 r = 0 at /usr/src/sys/kern/kern_event.c:1796 7463563 6 5644560348788 /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:149 0xfffffe01972ae900 syscall: p=0xfffffe01bc1ad000 error=0 return 0 0x43fd18 7463564 5 5644560351047 /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:187 0xfffffe01972b2480 syscall sendto exit thread 0xfffffe01972b2480 pid 9163 proc serelog 7463565 1 5644560354848 /usr/src/sys/kern/kern_mutex.c:205 0xfffffe01bcf4d000 LOCK (sleep mutex) kqueue 0xfffffe02f7525700 r = 0 at /usr/src/sys/kern/kern_event.c:1779 7463567 5 5644560360499 /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:84 0xfffffe01972b2480 syscall: td=0xfffffe01972b2480 pid 9163 gettimeofday (0x7fffffffc600, 0, 0xffffffff ) 7463568 2 5644560364617 /usr/src/sys/kern/kern_mutex.c:356 0xfffffe004e384000 _mtx_lock_sleep: taskqueue contested (lock=0xfffffe0008375000) at /usr/src/sys/kern/kern_mutex.c :147 7463569 5 5644560366559 /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:149 0xfffffe01972b2480 syscall: p=0xfffffe01bc1b0910 error=0 return 0 0xffffffff 7463570 3 5644560369374 /usr/src/sys/kern/kern_mutex.c:222 0xfffffe0008375000 UNLOCK (sleep mutex) taskqueue 0xfffffe004e2604b0 r = 0 at /usr/src/sys/kern/subr_taskqueue.c:21 6 7463572 6 5644560378372 /usr/src/sys/kern/subr_trap.c:101 0xfffffe01972ae900 userret: thread 0xfffffe01972ae900 (pid 9767, httpd) 7463573 3 5644560380378 /usr/src/sys/kern/kern_mutex.c:205 0xfffffe0008375000 LOCK (sleep mutex) bio queue 0xffffffff81146ad0 r = 0 at /usr/src/sys/geom/geom_io.c:77 7463574 4 5644560385085 /usr/src/sys/kern/kern_mutex.c:205 0xfffffe0008364480 LOCK (sleep mutex) mbuf 0xfffffe043ffe5e10 r = 0 at /usr/src/sys/vm/uma_core.c:2013 7463575 1 5644560388724 /usr/src/sys/kern/kern_mutex.c:222 0xfffffe01bcf4d000 UNLOCK (sleep mutex) kqueue 0xfffffe02f7525700 r = 0 at /usr/src/sys/kern/kern_event.c:1796 7463576 6 5644560391648 /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:187 0xfffffe01972ae900 syscall gettimeofday exit thread 0xfffffe01972ae900 pid 9767 proc httpd 7463577 3 5644560394202 /usr/src/sys/kern/kern_mutex.c:222 0xfffffe0008375000 UNLOCK (sleep mutex) bio queue 0xffffffff81146ad0 r = 0 at /usr/src/sys/geom/geom_io.c:84 7463578 1 5644560396480 /usr/src/sys/kern/kern_mutex.c:205 0xfffffe01bcf4d000 LOCK (sleep mutex) kqueue 0xfffffe01e96bd000 r = 0 at /usr/src/sys/kern/kern_event.c:1779 7463580 5 5644560404315 /usr/src/sys/kern/subr_trap.c:101 0xfffffe01972b2480 userret: thread 0xfffffe01972b2480 (pid 9163, serelog) 7463581 2 5644560408289 /usr/src/sys/kern/kern_mutex.c:374 0xfffffe004e384000 _mtx_lock_sleep: spinning on 0xfffffe004e2604b0 held by 0xfffffe0008375000 7463582 5 5644560410639 /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:187 0xfffffe01972b2480 syscall gettimeofday exit thread 0xfffffe01972b2480 pid 9163 proc serelog 7463583 2 5644560414981 /usr/src/sys/kern/kern_mutex.c:205 0xfffffe004e384000 LOCK (sleep mutex) taskqueue 0xfffffe004e2604b0 r = 0 at /usr/src/sys/kern/kern_mutex.c:147 7463584 5 5644560419167 /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:84 0xfffffe01972b2480 syscall: td=0xfffffe01972b2480 pid 9163 recvfrom (0x6, 0x7fffffffc734, 0x4) 7463585 1 5644560421688 /usr/src/sys/kern/kern_mutex.c:222 0xfffffe01bcf4d000 UNLOCK (sleep mutex) kqueue 0xfffffe01e96bd000 r = 0 at /usr/src/sys/kern/kern_event.c:1796 7463586 5 5644560423707 /usr/src/sys/kern/kern_sx.c:291 0xfffffe01972b2480 XLOCK (sx) so_rcv_sx 0xfffffe0344b073a0 r = 0 at /usr/src/sys/kern/uipc_sockbuf.c:148 7463587 3 5644560426330 /usr/src/sys/geom/geom_io.c:678 0xfffffe0008375000 g_up biodone bp 0xfffffe004e8a0740 provider da0s1 off 134525138944 len 131072 7463588 5 5644560428231 /usr/src/sys/kern/kern_mutex.c:205 0xfffffe01972b2480 LOCK (sleep mutex) so_rcv 0xfffffe0344b07380 r = 0 at /usr/src/sys/kern/uipc_socket.c:1488 This is the last place where I see 0xfffffe01972b2480 or "pid 9163". The value behind so_rcv os 0xfffffe0344b07380, and that one comes twice again: 7464953 4 5644563693233 /usr/src/sys/kern/kern_mutex.c:205 0xfffffe0008364480 LOCK (sleep mutex) so_rcv 0xfffffe0344b07380 r = 0 at /usr/src/sys/netinet/tcp_input.c:2834 7464954 0 5644563696896 /usr/src/sys/kern/kern_mutex.c:205 0xfffffe004e75a900 LOCK (sleep mutex) KNOTE 0xfffffe004e104010 r = 0 at /usr/src/sys/vm/uma_core.c:2013 7464955 6 5644563699004 /usr/src/sys/kern/subr_trap.c:101 0xfffffe01972ae900 userret: thread 0xfffffe01972ae900 (pid 9767, httpd) 7464956 1 5644563701340 /usr/src/sys/kern/kern_mutex.c:222 0xfffffe01bcf4d000 UNLOCK (sleep mutex) kqueue 0xfffffe01f3fb8900 r = 0 at /usr/src/sys/kern/kern_event.c:1796 7464957 6 5644563704636 /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:187 0xfffffe01972ae900 syscall gettimeofday exit thread 0xfffffe01972ae900 pid 9767 proc httpd 7464958 3 5644563706986 /usr/src/sys/geom/geom_io.c:165 0xfffffe0008375000 #2 0xffffffff807c5a45 at g_io_schedule_up+0x175 7464959 4 5644563708981 /usr/src/sys/kern/kern_mutex.c:222 0xfffffe0008364480 UNLOCK (sleep mutex) so_rcv 0xfffffe0344b07380 r = 0 at /usr/src/sys/kern/uipc_sockbuf.c:201 What I find notewothy is, that the lock/unlock sequence with so_rcv 0xfffffe0344b07380 looks odd: LOCK, LOCK, UNLOCK Natively one would expect to see one more UNLOCK like this: LOCK, UNLOCK, LOCK, UNLOCK --=-3sl93MaMYlu/dkvvUUBe-- _______________________________________________ freebsd-bugs@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-bugs To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"