Re: Stuck processes in unkillable (STOP) state, listen queue overflow

Zara Kanaeva Thu, 29 Oct 2015 08:48:55 -0700

Hello Дмитрий,

thank you very much for your message.

First of all: I like FreeBSD (the installation logic, the gooddocumentation etc.), this is why I use FreeBSD as Server OS. But in mycase I must desagree your strong theoretical probabilityconsideration. In my case I have one machine (7 years old), that had1-2 spontaneous rebootes in a year. In my case I got a lot of "alreadyin queue awaiting acceptance"-Errors and the machine rebootesimmediately after this.

I will get soon a new replacement for this old machine with at least32 GB RAM and (of course) new power supply. So I will see if myproblem (perhaps it is only my problem) still persist.


Greetings, Z. Kanaeva.

Zitat von Дмитрий Долбнин <bad_...@list.ru>:

Good day everyone !

From my point of view it seems like you're experiencing the"downgraded" hardware performance which causes you the problems youmeet.

Try to switch for the "new-one" power supply at least.

Why I think so ? Because the bad power supplies are met much moreoften than the bad source code for FreeBSD. Of course I can't tellyou you're completely wrong.

Best regards, Dimitry.

Среда, 28 октября 2015, 12:00 UTC от freebsd-stable-requ...@freebsd.org:

Send freebsd-stable mailing list submissions to
freebsd-stable@freebsd.org

To subscribe or unsubscribe via the World Wide Web, visit
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
or, via email, send a message with subject or body 'help' to
freebsd-stable-requ...@freebsd.org

You can reach the person managing the list at
freebsd-stable-ow...@freebsd.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of freebsd-stable digest..."


Today's Topics:

   1. Re: Stuck processes in unkillable (STOP) state, listen queue
      overflow (Zara Kanaeva)
   2. Re: Stuck processes in unkillable (STOP) state, listen queue
      overflow (Nagy, Attila)


----------------------------------------------------------------------

Message: 1
Date: Tue, 27 Oct 2015 14:42:42 +0100
From: Zara Kanaeva < zara.kana...@ggi.uni-tuebingen.de >
To:  freebsd-stable@freebsd.org
Subject: Re: Stuck processes in unkillable (STOP) state, listen queue
overflow
Message-ID:
< 20151027144242.horde.3xc1_rqzavmaz12x6opx...@webmail.uni-tuebingen.de >

Content-Type: text/plain; charset=utf-8; format=flowed; DelSp=Yes

Hello,

I have the same experience with apache and mapserver. It happens on
physical machine and ends with spontaneous reboot. This machine is
updated from FREEBSD 9.0 RELEASE to FREEBSD 10.2-PRERELEASE. Perhaps
this machine doesn't have enough RAM (only 8GB), but I think that must
not be a reason for a spontaneous reboot.

I had no such behavior with the same machine and FREEBSD 9.0 RELEASE
on it (I am not 100% sure, I have yet no possibility to test it).

Regards, Z. Kanaeva.

Zitat von "Nagy, Attila" < b...@fsn.hu >:

Hi,

Recently I've started to see a lot of cases, where the log is full
with "listen queue overflow" messages and the process behind the
network socket is unavailable.
When I open a TCP to it, it opens but nothing happens (for example I
get no SMTP banner from postfix, nor I get a log entry about the new
connection).

I've seen this with Java programs, postfix and redis, basically
everything which opens a TCP and listens on the machine.

For example, I have a redis process, which listens on 6381. When I
telnet into it, the TCP opens, but the program doesn't respond.
When I kill it, nothing happens. Even with kill -9 yields only this state:

PID USERNAME THR PRI NICE SIZE RES STATE C TIMEWCPU COMMAN

  776 redis            2  20    0 24112K  2256K STOP    3 16:56
0.00% redis-

When I tcpdrop the connections of the process, tcpdrop reports
success for the first time and failure for the second (No such
process), but the connections remain:
# sockstat -4 | grep 776
redis    redis-serv 776   6  tcp4   *:6381 *:*
redis    redis-serv 776   9  tcp4   *:16381 *:*
redis    redis-serv 776   10 tcp4   127.0.0.1:16381 127.0.0.1:10460
redis    redis-serv 776   11 tcp4   127.0.0.1:16381 127.0.0.1:35795
redis    redis-serv 776   13 tcp4   127.0.0.1:30027 127.0.0.1:16379
redis    redis-serv 776   14 tcp4   127.0.0.1:58802 127.0.0.1:16384
redis    redis-serv 776   17 tcp4   127.0.0.1:16381 127.0.0.1:24354
redis    redis-serv 776   18 tcp4   127.0.0.1:16381 127.0.0.1:56999
redis    redis-serv 776   19 tcp4   127.0.0.1:16381 127.0.0.1:39488
redis    redis-serv 776   20 tcp4   127.0.0.1:6381 127.0.0.1:39491
# sockstat -4 | grep 776 | awk '{print "tcpdrop "$6" "$7}' | /bin/sh
tcpdrop: getaddrinfo: * port 6381: hostname nor servname provided,
or not known
tcpdrop: getaddrinfo: * port 16381: hostname nor servname provided,
or not known
tcpdrop: 127.0.0.1 16381 127.0.0.1 10460: No such process
tcpdrop: 127.0.0.1 16381 127.0.0.1 35795: No such process
tcpdrop: 127.0.0.1 30027 127.0.0.1 16379: No such process
tcpdrop: 127.0.0.1 58802 127.0.0.1 16384: No such process
tcpdrop: 127.0.0.1 16381 127.0.0.1 24354: No such process
tcpdrop: 127.0.0.1 16381 127.0.0.1 56999: No such process
tcpdrop: 127.0.0.1 16381 127.0.0.1 39488: No such process
tcpdrop: 127.0.0.1 6381 127.0.0.1 39491: No such process
# sockstat -4 | grep 776
redis    redis-serv 776   6  tcp4   *:6381 *:*
redis    redis-serv 776   9  tcp4   *:16381 *:*
redis    redis-serv 776   10 tcp4   127.0.0.1:16381 127.0.0.1:10460
redis    redis-serv 776   11 tcp4   127.0.0.1:16381 127.0.0.1:35795
redis    redis-serv 776   13 tcp4   127.0.0.1:30027 127.0.0.1:16379
redis    redis-serv 776   14 tcp4   127.0.0.1:58802 127.0.0.1:16384
redis    redis-serv 776   17 tcp4   127.0.0.1:16381 127.0.0.1:24354
redis    redis-serv 776   18 tcp4   127.0.0.1:16381 127.0.0.1:56999
redis    redis-serv 776   19 tcp4   127.0.0.1:16381 127.0.0.1:39488
redis    redis-serv 776   20 tcp4   127.0.0.1:6381 127.0.0.1:39491

$ procstat -k 776
  PID    TID COMM             TDNAME KSTACK
  776 100725 redis-server     -                mi_switch
sleepq_timedwait_sig _sleep kern_kevent sys_kevent amd64_syscall
Xfast_syscall
  776 100744 redis-server     -                mi_switch
thread_suspend_switch thread_single exit1 sigexit postsig ast
doreti_ast

I can do nothing to get out from this state, only reboot helps.

The OS is stable/10@r289313, but I could observe this behaviour with
earlier releases too.

The dmesg is full with lines like these:
sonewconn: pcb 0xfffff8004dc54498: Listen queue overflow: 193
already in queue awaiting acceptance (3142 occurrences)
sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193
already in queue awaiting acceptance (3068 occurrences)
sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193
already in queue awaiting acceptance (3057 occurrences)
sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193
already in queue awaiting acceptance (3037 occurrences)
sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193
already in queue awaiting acceptance (3015 occurrences)
sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193
already in queue awaiting acceptance (3035 occurrences)

I guess this is the effect of the process freeze, not the cause (the
listen queue fills up because the app can't handle the incoming
connections).

I'm not sure it matters, but some of the machines (and the above)
runs on an ESX hypervisor (but as far as I can remember, I could see
this on physical machines too, but I'm not sure about that).
Also -so far- I could only see this where some "exotic" stuff ran,
like a java or erlang based server (opendj, elasticsearch and
rabbitmq).

Also not sure about which triggers this. I've never seen this after
some hours of uptime, at least some days or a week must've been
passed to get stuck like the above.

Any ideas about this?

Thanks,
_______________________________________________
 freebsd-stable@freebsd.org mailing list
 https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to " freebsd-stable-unsubscr...@freebsd.org "




--
Dipl.-Inf. Zara Kanaeva
Heidelberger Akademie der Wissenschaften
Forschungsstelle "The role of culture in early expansions of humans"
an der Universit?t T?bingen
Geographisches Institut
Universit?t T?bingen
Ruemelinstr. 19-23
72070 Tuebingen

Tel.: +49-(0)7071-2972132
e-mail:  zara.kana...@geographie.uni-tuebingen.de
-------
- Theory is when you know something but it doesn't work.
- Practice is when something works but you don't know why.
- Usually we combine theory and practice:
         Nothing works and we don't know why.



------------------------------

Message: 2
Date: Tue, 27 Oct 2015 17:25:01 +0100
From: "Nagy, Attila" < b...@fsn.hu >
To: Zara Kanaeva < zara.kana...@ggi.uni-tuebingen.de >,
freebsd-stable@freebsd.org
Subject: Re: Stuck processes in unkillable (STOP) state, listen queue
overflow
Message-ID: < 562fa55d.6050...@fsn.hu >
Content-Type: text/plain; charset=utf-8; format=flowed

Hi,

(following topposting)
I have seen this with 16 and 32 GiB of RAM, but anyways, it shouldn't
matter.
Do you use zfs? Although it doesn't seem to be stuck on IO...

On 10/27/15 14:42, Zara Kanaeva wrote:

Hello,

I have the same experience with apache and mapserver. It happens on
physical machine and ends with spontaneous reboot. This machine is
updated from FREEBSD 9.0 RELEASE to FREEBSD 10.2-PRERELEASE. Perhaps
this machine doesn't have enough RAM (only 8GB), but I think that must
not be a reason for a spontaneous reboot.

I had no such behavior with the same machine and FREEBSD 9.0 RELEASE
on it (I am not 100% sure, I have yet no possibility to test it).

Regards, Z. Kanaeva.


_______________________________________________
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"




--
Dipl.-Inf. Zara Kanaeva
Heidelberger Akademie der Wissenschaften
Forschungsstelle "The role of culture in early expansions of humans"
an der Universität Tübingen
Geographisches Institut
Universität Tübingen
Ruemelinstr. 19-23
72070 Tuebingen

Tel.: +49-(0)7071-2972132
e-mail: zara.kana...@geographie.uni-tuebingen.de
-------
- Theory is when you know something but it doesn't work.
- Practice is when something works but you don't know why.
- Usually we combine theory and practice:
        Nothing works and we don't know why.

_______________________________________________
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: Stuck processes in unkillable (STOP) state, listen queue overflow

Reply via email to