Hello, apologies in advance for the silly question.
We are having some stability issues with our squid farms after a recent upgrade
from CentOS/Squid 3.5.x to Ubuntu/Squid 5.7/6.9. I wonder if anyone here has
seen something similar, and might have a suggestion about what we are
obviously missing?
In short, after running for a certain period the servers run out of file
descriptors. We see a slowly growing number of TCP or TCPv6 socket handles,
which eventually hits the configured maximum. The handles do not get released
until squid is restarted (-k restart).
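For anyone trying to reproduce the observation, a simple way to watch the count grow is something like this (the pgrep pattern, user name and interval are just examples, adjust for your setup):

# assumes the kid processes run as user "proxy"; picks the oldest matching PID
PID=$(pgrep -u proxy -o squid)
while true; do date; ls /proc/$PID/fd | wc -l; sleep 300; done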
It is somewhat similar to what is reported at
https://access.redhat.com/solutions/3362211 . They state that:
- If an application fails to close() its socket descriptors and continues
to allocate new sockets, it can use up all the system memory on TCP(v6)
slab objects.
- Note that some of these sockets will not show up in /proc/net/sockstat(6).
Sockets that still have a file descriptor but are in the TCP_CLOSE state will
consume a slab object, but will not be accounted for in /proc/net/sockstat(6),
"ss" or "netstat".
- Whether this is an application socket leak can be determined by stopping
the application processes that are consuming sockets. If the slab objects in
/proc/slabinfo are then freed, the application is responsible, as that means
the destructor routines found open file descriptors to sockets in the process
(see the sketch below).
"This is most likely to be a case of the application not handling error
conditions correctly and not calling close() to free the FD and socket."
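A minimal sketch of that stop-and-check, assuming squid is managed by systemd and the unit is called "squid":

# snapshot the TCP slab caches, stop squid, snapshot again
grep -E '^(TCP|TCPv6) ' /proc/slabinfo
systemctl stop squid
sleep 30
grep -E '^(TCP|TCPv6) ' /proc/slabinfo
# if the active object counts collapse after the stop, the sockets were being
# held open by the application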
For example, on a server with squid 5.7, unmodified package:

List of open files:
lsof | wc -l
56963
of which ~35K are TCPv6:
lsof | grep proxy | grep TCPv6 | wc -l
35301
Under /proc I see fewer entries:
cat /proc/net/tcp6 | wc -l
3095
but the number of objects in the slabs is high:

cat /proc/slabinfo | grep TCPv6
MPTCPv6                0      0   2048   16    8 : tunables 0 0 0 : slabdata      0      0      0
tw_sock_TCPv6       1155   1155    248   33    2 : tunables 0 0 0 : slabdata     35     35      0
request_sock_TCPv6     0      0    304   26    2 : tunables 0 0 0 : slabdata      0      0      0
TCPv6              38519  38519   2432   13    8 : tunables 0 0 0 : slabdata   2963   2963      0
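If I understand lsof correctly, the gap between the two counts is expected: sockets that squid still holds an FD for but that are already closed no longer appear in /proc/net/tcp6, so lsof shows them only as bare "sock" entries with "protocol: TCPv6". A rough way to compare the two views (the user name is just our setup):

# sockets squid still holds an FD for, according to lsof
lsof -u proxy | grep -c 'protocol: TCPv6'
# sockets the kernel still tracks in the v6 table (minus the header line)
awk 'NR>1' /proc/net/tcp6 | wc -l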
I have 35K of lines like this:

lsof | grep proxy | grep TCPv6 | more
squid      1049  proxy   13u  sock   0,8   0t0   5428173  protocol: TCPv6
squid      1049  proxy   14u  sock   0,8   0t0  27941608  protocol: TCPv6
squid      1049  proxy   24u  sock   0,8   0t0  45124047  protocol: TCPv6
squid      1049  proxy   25u  sock   0,8   0t0  50689821  protocol: TCPv6
...
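In case it helps, the per-FD breakdown for that worker can also be pulled straight from /proc (the PID is just the one from the lsof output above, run as root):

# count FDs by what they point at (socket, pipe, file path, ...)
ls -l /proc/1049/fd | tail -n +2 | awk '{print $NF}' | cut -d: -f1 | sort | uniq -c | sort -rn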
We thought maybe this was a weird IPv6 thing, as we only route IPv4, so we
compiled a more recent version of squid with no v6 support. The problem just
moved to TCP4:
lsof | wc -l
120313
cat /proc/slabinfo | grep TCP
MPTCPv6                0      0   2048   16    8 : tunables 0 0 0 : slabdata      0      0      0
tw_sock_TCPv6          0      0    248   33    2 : tunables 0 0 0 : slabdata      0      0      0
request_sock_TCPv6     0      0    304   26    2 : tunables 0 0 0 : slabdata      0      0      0
TCPv6                208    208   2432   13    8 : tunables 0 0 0 : slabdata     16     16      0
MPTCP                  0      0   1856   17    8 : tunables 0 0 0 : slabdata      0      0      0
tw_sock_TCP         5577   5577    248   33    2 : tunables 0 0 0 : slabdata    169    169      0
request_sock_TCP    1898   2002    304   26    2 : tunables 0 0 0 : slabdata     77     77      0
TCP               102452 113274   2240   14    8 : tunables 0 0 0 : slabdata   8091   8091      0
cat /proc/net/tcp | wc -l
255
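To see whether the slab keeps growing while traffic flows (rather than only comparing before and after a restart), something like this can be left running; the 60-second interval is arbitrary:

# watch the relevant slab caches over time
watch -n 60 "grep -E '^(TCP|TCPv6|tw_sock_TCP|request_sock_TCP) ' /proc/slabinfo"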
After restarting squid the slab objects are released and the open file
descriptors drop to a reasonable value. This further suggests it is squid
hanging on to these FDs.
lsof | grep proxy | wc -l
1221
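It might also be worth comparing squid's own idea of its FD usage with what lsof reports; assuming squidclient is installed and the cache manager is reachable on the default port:

# high-level FD counters as squid sees them
squidclient mgr:info | grep -i 'file desc'
# per-descriptor detail from squid's point of view
squidclient mgr:filedescriptors | head -40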
Any suggestions? I guess it's something blatantly obvious, but we have been
looking at this for a couple of days and we're not getting anywhere...
Thanks again