Wednesday, 28 October 2015, 12:00 UTC, from freebsd-stable-requ...@freebsd.org:
Today's Topics:
1. Re: Stuck processes in unkillable (STOP) state, listen queue overflow (Zara Kanaeva)
2. Re: Stuck processes in unkillable (STOP) state, listen queue overflow (Nagy, Attila)
----------------------------------------------------------------------
Message: 1
Date: Tue, 27 Oct 2015 14:42:42 +0100
From: Zara Kanaeva <zara.kana...@ggi.uni-tuebingen.de>
To: freebsd-stable@freebsd.org
Subject: Re: Stuck processes in unkillable (STOP) state, listen queue
overflow
Message-ID: <20151027144242.horde.3xc1_rqzavmaz12x6opx...@webmail.uni-tuebingen.de>
Content-Type: text/plain; charset=utf-8; format=flowed; DelSp=Yes
Hello,
I have the same experience with Apache and MapServer. It happens on a
physical machine and ends with a spontaneous reboot. This machine was
updated from FreeBSD 9.0-RELEASE to FreeBSD 10.2-PRERELEASE. Perhaps
this machine doesn't have enough RAM (only 8 GB), but I don't think
that should be a reason for a spontaneous reboot.
I saw no such behavior with the same machine and FreeBSD 9.0-RELEASE
on it (I am not 100% sure; so far I have had no opportunity to test it).
Regards, Z. Kanaeva.
Quoting "Nagy, Attila" <b...@fsn.hu>:
Hi,
Recently I've started to see a lot of cases where the log is full of
"listen queue overflow" messages and the process behind the network
socket is unresponsive.
When I open a TCP connection to it, the connection is established, but
nothing happens (for example, I get no SMTP banner from postfix, nor
do I get a log entry about the new connection).
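To illustrate (the host name here is made up): the kernel completes the
TCP handshake on behalf of the listening socket, so the connect succeeds
even though the frozen process never accept()s it, and no data ever arrives:
$ nc -vw 5 mx.example.net 25
Connection to mx.example.net 25 port [tcp/smtp] succeeded!
(no "220" banner follows; nc gives up after the 5 second idle timeout)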
I've seen this with Java programs, postfix, and redis, basically with
everything that opens a listening TCP socket on the machine.
For example, I have a redis process which listens on port 6381. When I
telnet to it, the TCP connection opens, but the program doesn't respond.
When I kill it, nothing happens. Even kill -9 yields only this state:
  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
  776 redis       2  20    0 24112K  2256K STOP   3  16:56  0.00% redis-server
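The state can be cross-checked with ps (PID from above; a "T" in the
state column means stopped). kill -9 only queues the signal; it cannot
force the process out of this state:
$ ps -o pid,state,wchan,command -p 776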
When I tcpdrop the connections of the process, tcpdrop reports success
the first time and failure (No such process) the second time, but the
connections remain:
# sockstat -4 | grep 776
redis redis-serv 776 6 tcp4 *:6381 *:*
redis redis-serv 776 9 tcp4 *:16381 *:*
redis redis-serv 776 10 tcp4 127.0.0.1:16381 127.0.0.1:10460
redis redis-serv 776 11 tcp4 127.0.0.1:16381 127.0.0.1:35795
redis redis-serv 776 13 tcp4 127.0.0.1:30027 127.0.0.1:16379
redis redis-serv 776 14 tcp4 127.0.0.1:58802 127.0.0.1:16384
redis redis-serv 776 17 tcp4 127.0.0.1:16381 127.0.0.1:24354
redis redis-serv 776 18 tcp4 127.0.0.1:16381 127.0.0.1:56999
redis redis-serv 776 19 tcp4 127.0.0.1:16381 127.0.0.1:39488
redis redis-serv 776 20 tcp4 127.0.0.1:6381 127.0.0.1:39491
# sockstat -4 | grep 776 | awk '{print "tcpdrop "$6" "$7}' | /bin/sh
tcpdrop: getaddrinfo: * port 6381: hostname nor servname provided, or not known
tcpdrop: getaddrinfo: * port 16381: hostname nor servname provided, or not known
tcpdrop: 127.0.0.1 16381 127.0.0.1 10460: No such process
tcpdrop: 127.0.0.1 16381 127.0.0.1 35795: No such process
tcpdrop: 127.0.0.1 30027 127.0.0.1 16379: No such process
tcpdrop: 127.0.0.1 58802 127.0.0.1 16384: No such process
tcpdrop: 127.0.0.1 16381 127.0.0.1 24354: No such process
tcpdrop: 127.0.0.1 16381 127.0.0.1 56999: No such process
tcpdrop: 127.0.0.1 16381 127.0.0.1 39488: No such process
tcpdrop: 127.0.0.1 6381 127.0.0.1 39491: No such process
# sockstat -4 | grep 776
redis redis-serv 776 6 tcp4 *:6381 *:*
redis redis-serv 776 9 tcp4 *:16381 *:*
redis redis-serv 776 10 tcp4 127.0.0.1:16381 127.0.0.1:10460
redis redis-serv 776 11 tcp4 127.0.0.1:16381 127.0.0.1:35795
redis redis-serv 776 13 tcp4 127.0.0.1:30027 127.0.0.1:16379
redis redis-serv 776 14 tcp4 127.0.0.1:58802 127.0.0.1:16384
redis redis-serv 776 17 tcp4 127.0.0.1:16381 127.0.0.1:24354
redis redis-serv 776 18 tcp4 127.0.0.1:16381 127.0.0.1:56999
redis redis-serv 776 19 tcp4 127.0.0.1:16381 127.0.0.1:39488
redis redis-serv 776 20 tcp4 127.0.0.1:6381 127.0.0.1:39491
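For what it's worth, the two getaddrinfo errors come from the wildcard
listeners (*:6381 and *:16381), which tcpdrop(8) cannot drop anyway; it
only works on established connections. Filtering them out silences that
noise, though it is no workaround, as the remaining drops still fail
with "No such process":
# sockstat -4 | grep 776 | grep -v '\*' | awk '{print "tcpdrop "$6" "$7}' | /bin/sh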
$ procstat -k 776
  PID    TID COMM         TDNAME  KSTACK
  776 100725 redis-server -       mi_switch sleepq_timedwait_sig _sleep kern_kevent sys_kevent amd64_syscall Xfast_syscall
  776 100744 redis-server -       mi_switch thread_suspend_switch thread_single exit1 sigexit postsig ast doreti_ast
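If I read these stacks right: thread 100744 has taken the fatal signal
(sigexit -> exit1) and is waiting in thread_single() for its sibling to
suspend, while thread 100725 sleeps in kern_kevent() and apparently
never wakes up, so the exit can never finish and the process stays in
STOP. Repeating the -k flag prints the stacks with symbol offsets, which
may be useful for a bug report:
$ procstat -kk 776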
Nothing I do gets it out of this state; only a reboot helps.
The OS is stable/10@r289313, but I could observe this behaviour on
earlier releases too.
The dmesg is full of lines like these:
sonewconn: pcb 0xfffff8004dc54498: Listen queue overflow: 193 already in queue awaiting acceptance (3142 occurrences)
sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193 already in queue awaiting acceptance (3068 occurrences)
sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193 already in queue awaiting acceptance (3057 occurrences)
sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193 already in queue awaiting acceptance (3037 occurrences)
sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193 already in queue awaiting acceptance (3015 occurrences)
sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193 already in queue awaiting acceptance (3035 occurrences)
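Incidentally, the constant 193 matches sonewconn()'s overflow threshold:
if I read sys/kern/uipc_socket.c on stable/10 right, it rejects new
connections once the accept queue exceeds 1.5 times the listen()
backlog, and with the default kern.ipc.somaxconn cap of 128 that is
3 * 128 / 2 = 192, so the 193rd pending connection is the first one
logged. Assuming the default is in effect, it can be checked with:
$ sysctl kern.ipc.somaxconn
kern.ipc.somaxconn: 128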
I guess this is the effect of the process freeze, not the cause (the
listen queue fills up because the app can't handle the incoming
connections).
I'm not sure whether it matters, but some of the machines (including
the one above) run on an ESX hypervisor. As far as I can remember I
have seen this on physical machines too, but I'm not sure about that.
Also, so far I have only seen this where some "exotic" stuff ran, like
a Java- or Erlang-based server (OpenDJ, Elasticsearch and RabbitMQ).
I'm also not sure what triggers this. I've never seen it after just a
few hours of uptime; at least several days or a week must pass before
things get stuck like the above.
Any ideas about this?
Thanks,
--
Dipl.-Inf. Zara Kanaeva
Heidelberger Akademie der Wissenschaften
Research centre "The role of culture in early expansions of humans"
at the Universität Tübingen
Geographisches Institut
Universität Tübingen
Ruemelinstr. 19-23
72070 Tuebingen
Tel.: +49-(0)7071-2972132
e-mail: zara.kana...@geographie.uni-tuebingen.de
-------
- Theory is when you know something but it doesn't work.
- Practice is when something works but you don't know why.
- Usually we combine theory and practice:
Nothing works and we don't know why.
------------------------------
Message: 2
Date: Tue, 27 Oct 2015 17:25:01 +0100
From: "Nagy, Attila" < b...@fsn.hu >
To: Zara Kanaeva < zara.kana...@ggi.uni-tuebingen.de >,
freebsd-stable@freebsd.org
Subject: Re: Stuck processes in unkillable (STOP) state, listen queue
overflow
Message-ID: <562fa55d.6050...@fsn.hu>
Content-Type: text/plain; charset=utf-8; format=flowed
Hi,
(top-posting follows)
I have seen this with 16 and 32 GiB of RAM, but anyway, it shouldn't
matter.
Do you use ZFS? Although it doesn't seem to be stuck on I/O...
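If ZFS is in use, one quick thing to check (just a sanity check, not a
diagnosis) is how much RAM the ARC currently holds:
$ sysctl kstat.zfs.misc.arcstats.size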
On 10/27/15 14:42, Zara Kanaeva wrote:
Hello,
I have the same experience with Apache and MapServer. It happens on a
physical machine and ends with a spontaneous reboot. This machine was
updated from FreeBSD 9.0-RELEASE to FreeBSD 10.2-PRERELEASE. Perhaps
this machine doesn't have enough RAM (only 8 GB), but I don't think
that should be a reason for a spontaneous reboot.
I saw no such behavior with the same machine and FreeBSD 9.0-RELEASE
on it (I am not 100% sure; so far I have had no opportunity to test it).
Regards, Z. Kanaeva.