On 2013-11-14 17:04, Mark Andrews wrote:
In message <fd9b2cb2b33e394fae3b7466954760571d666...@dfwx10hmptc01.amer.dell.coM>,
vinny_abe...@dell.com writes:
Hi Everyone,
I recently had a recursive server running BIND 9.9.4 on FreeBSD 9.2
appear to wedge and stop responding to clients. I had a flurry of these
errors on the console:
sonewconn: pcb 0xfffffe007211d930: Listen queue overflow: 16 already in
queue awaiting acceptance
I couldn't trace that directly back to the named process by the time I
looked at it, but I suspect that's what it was since it's really the only
thing this machine is used for and it stopped working. It seems to have
oddly become unstuck when I logged into the machine and started looking
around. I never restarted named. Everything else on the server was
running normally from what I could tell and no other errors existed that
I could find. Unfortunately my logs rolled over too fast to check if
named had logged anything else interesting.
From what I've found in googling, this is an OS-level error stating that the
process isn't accepting new TCP connections, and that it's an application
fault. I've only ever seen this on this particular machine, and just this
once. My other recursive servers are running older versions of FreeBSD.
Or it's just a plain DoS attack. For any service, it is possible to
send TCP connection requests faster than the service can handle them.
Has anyone come across this before and know how to prevent or correct
this properly?
You can tune tcp-listen-queue in named.conf. The current default is 10.
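A sketch of what that tuning might look like in named.conf. The value 128 is
only an illustrative assumption, not a recommendation from this thread, and
the kernel may cap the effective backlog (on FreeBSD of this era, via the
kern.ipc.somaxconn sysctl):

```
options {
        // Depth of the listen queue for named's TCP sockets.
        // 128 is an assumed example value; size it for your own load.
        tcp-listen-queue 128;
};
```

After a change like this, named would need to be reconfigured or restarted
before the listening sockets pick up the new queue depth.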
Thanks!
-Vinny
My logs have been filling up with
sonewconn: pcb 0xfffffe02bb7187a8: Listen queue overflow: 10 already in queue
awaiting acceptance
which seems to have started after upgrading to FreeBSD 9.2. There have been
other changes, but on the email front, so looking at BIND hadn't crossed my
mind at all until I spotted this thread. It's only on one server, so I had
been hunting around trying to figure out where it's coming from.
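For anyone on a similar hunt, here is a minimal Python sketch that tallies
these kernel messages by pcb address and reported queue count, which may help
show whether one listener or several are overflowing. The function and
variable names are hypothetical, and the regular expression assumes the exact
message wording quoted in this thread:

```python
import re
from collections import Counter

# Matches the kernel message quoted in this thread, after whitespace has
# been collapsed so that wrapped lines still match.
OVERFLOW_RE = re.compile(
    r"sonewconn: pcb (0x[0-9a-f]+): Listen queue overflow: "
    r"(\d+) already in queue awaiting acceptance"
)

def overflow_counts(log_text):
    """Count overflow messages per (pcb address, reported queue count)."""
    flat = " ".join(log_text.split())
    return Counter((pcb, int(n)) for pcb, n in OVERFLOW_RE.findall(flat))

# The two messages quoted in this thread, wrapped as they appear in the mail.
SAMPLE_LOG = """\
sonewconn: pcb 0xfffffe007211d930: Listen queue overflow: 16 already in
queue awaiting acceptance
sonewconn: pcb 0xfffffe02bb7187a8: Listen queue overflow: 10 already in queue
awaiting acceptance
"""

for (pcb, queued), hits in overflow_counts(SAMPLE_LOG).most_common():
    print(pcb, queued, hits)
```

The sketch only groups messages. Whether the reported count equals the
socket's QLIM is an assumption it does not check.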
The hex number doesn't correspond to any socket that shows up with lsof,
though the socket addresses that lsof shows bear some resemblance to it.
Doing "lsof -i -T fqs" and looking at QLIM=, I had thought sendmail was the
culprit since its default listen queue is 10. But bumping it to 128 didn't
stop the messages, and I couldn't find any other sockets with QLIM=10 this
way.
As for the sockets associated with named, the TCP domain sockets have QLIM=3
and the rndc socket has QLIM=128. These systems are all running the system
BIND (9.8.4-P2).
named 1276 bind 20u IPv4 0xfffffe00a73697a0 0t0 TCP zen:domain
  (LISTEN QR=0 QS=0
  SO=ACCEPTCONN,NOSIGPIPE,PQLEN=0,QLEN=0,QLIM=3,RCVBUF=524288,REUSEADDR,SNDBUF=524288
  SS=NBIO TF=MSS=536,REQ_SCALE,REQ_TSTMP,SACK_PERMIT)
named 1276 bind 21u IPv4 0xfffffe00a73693d0 0t0 TCP zen2:domain
  (LISTEN QR=0 QS=0
  SO=ACCEPTCONN,NOSIGPIPE,PQLEN=0,QLEN=0,QLIM=3,RCVBUF=524288,REUSEADDR,SNDBUF=524288
  SS=NBIO TF=MSS=536,REQ_SCALE,REQ_TSTMP,SACK_PERMIT)
named 1276 bind 22u IPv4 0xfffffe00a738b3d0 0t0 TCP localhost:domain
  (LISTEN QR=0 QS=0
  SO=ACCEPTCONN,NOSIGPIPE,PQLEN=0,QLEN=0,QLIM=3,RCVBUF=524288,REUSEADDR,SNDBUF=524288
  SS=NBIO TF=MSS=536,REQ_SCALE,REQ_TSTMP,SACK_PERMIT)
named 1276 bind 23u IPv4 0xfffffe00a75223d0 0t0 TCP localhost:rndc
  (LISTEN QR=0 QS=0
  SO=ACCEPTCONN,NOSIGPIPE,PQLEN=0,QLEN=0,QLIM=128,RCVBUF=524288,REUSEADDR,SNDBUF=524288
  SS=NBIO TF=MSS=536,REQ_SCALE,REQ_TSTMP,SACK_PERMIT)
FWIW, the only socket with QLIM=16 on my system is upsd (nut).
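The QLIM search described above can be sketched in Python roughly as follows.
This is a minimal sketch, assuming the output of "lsof -i -T fqs" has been
saved as text in the same layout as the listing in this message. The function
names are hypothetical, and the sample reuses two entries from that listing:

```python
import re

# One match per listening socket: the local address after "TCP", then the
# QLIM value inside the parenthesised state block that follows it.
LISTEN_RE = re.compile(r"TCP\s+(\S+)\s+\(LISTEN\b[^)]*?\bQLIM=(\d+)")

def listen_limits(lsof_text):
    """Return [(local_address, qlim), ...] for every LISTEN socket found."""
    return [(addr, int(qlim)) for addr, qlim in LISTEN_RE.findall(lsof_text)]

def sockets_with_limit(lsof_text, limit):
    """Return the local addresses whose listen queue limit equals `limit`."""
    return [addr for addr, qlim in listen_limits(lsof_text) if qlim == limit]

# Two entries copied from the listing in this message, line wraps included.
SAMPLE_LSOF = """\
named 1276 bind 20u IPv4 0xfffffe00a73697a0 0t0 TCP zen:domain
(LISTEN QR=0 QS=0
SO=ACCEPTCONN,NOSIGPIPE,PQLEN=0,QLEN=0,QLIM=3,RCVBUF=524288,REUSEADDR,SNDBUF=524288
SS=NBIO TF=MSS=536,REQ_SCALE,REQ_TSTMP,SACK_PERMIT)
named 1276 bind 23u IPv4 0xfffffe00a75223d0 0t0 TCP
localhost:rndc (LISTEN QR=0 QS=0
SO=ACCEPTCONN,NOSIGPIPE,PQLEN=0,QLEN=0,QLIM=128,RCVBUF=524288,REUSEADDR,SNDBUF=524288
SS=NBIO TF=MSS=536,REQ_SCALE,REQ_TSTMP,SACK_PERMIT)
"""

print(listen_limits(SAMPLE_LSOF))
print(sockets_with_limit(SAMPLE_LSOF, 10))
```

Because the pattern tolerates line breaks between fields, it should not
matter where the mail client or terminal wrapped each lsof record.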
--
Who: Lawrence K. Chen, P.Eng. - W0LKC - Sr. Unix Systems Administrator
For: Enterprise Server Technologies (EST) -- & SafeZone Ally