Hi Stuart, great questions, here you go:
2013/8/14 Stuart Henderson <[email protected]>

> On 2013/08/13 21:16, Kārlis Miķelsons wrote:
> > Now let me explain "hanging up". When it "hangs up", it responds to
> > ICMP echo requests but none of TCP services respond (it is running
> > sshd, Apache httpd, OpenBSD spamd, Postfix).
>
> "none of TCP services respond" - please expand on this: if you try and
> connect to a listening port, does it totally fail to respond, i.e.:
>
> $ telnet $somehost 25
> Trying $somehost...
> << big pause >>
> telnet: connect to address $somehost: Connection timed out
>
> Or, does it connect but you get no connection banner / response, i.e.
>

At first, Connection reset by peer and (then, as far as I remember) Connection refused. Then after ~2 hours, they just started blocking i.e. connect() took forever and got no error message.

> $ telnet $somehost 25
> Trying $somehost...
> Connected to $somehost.
> Escape character is '^]'.
> << just sits there >>
>

This does not happen - no connections go through, as in, accept() never returns in the daemon processes.

> (Look at a couple of different ports and see if there's any difference -
> some daemons fork a new process to answer a request, some don't).
>

I tried daemons that do not fork in order to answer a request - I know this as I tried with a minimal example C-based TCP server just to double-check that it was accept() that did not go through.

> Also: is the console responsive

Yes.

> (can you login or run processes) or
> is that also hanging?

Yes.

> Is there any indication in logs (usual logs in
> /var/log, and also /var/cron/log) as to whether processes keep running
> and whether new processes get started during this time or not?
>

"ps aux" showed all processes that ordinarily run to be intact. dmesg had no notice. I will check logs more thoroughly the next time however I did not see anything in logs.
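To make the distinction Stuart asks about easier to check across several ports, here is a rough Python sketch of a client-side probe. It is not the C test server mentioned above; the function name `probe`, the result labels, and the default port list are made up for illustration. It separates "connection refused", "connection reset", "connect() timed out" and "connected but nothing came back". Note that a listening socket whose daemon never calls accept() can still complete the handshake in the kernel while the backlog has room, so that case would show up here as "silent" rather than as a timeout.

```python
import errno
import socket
import sys


def probe(host, port, timeout=5.0):
    """Classify what happens when connecting to host:port.

    Returns "refused", "reset", "timeout", "silent", "banner",
    "closed", or "error: <errno name>". Labels are illustrative only.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"      # RST in reply to our SYN
    except ConnectionResetError:
        return "reset"        # connection torn down during setup
    except socket.timeout:
        return "timeout"      # no answer to the SYN at all
    except OSError as exc:
        return "error: " + errno.errorcode.get(exc.errno, str(exc))

    with sock:
        sock.settimeout(timeout)
        try:
            data = sock.recv(256)   # wait for a greeting banner
        except socket.timeout:
            return "silent"         # connected, but "just sits there"
        except ConnectionResetError:
            return "reset"
        return "banner" if data else "closed"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python probe.py somehost 22 25 80
    target = sys.argv[1]
    ports = [int(p) for p in sys.argv[2:]] or [22, 25, 80]
    for p in ports:
        print(p, probe(target, p))
```

Protocols where the client speaks first (HTTP, for example) will report "silent" even when healthy, so the banner check is only meaningful for services such as SSH or SMTP that greet the client.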
This server was indeed being flooded with TCP connections, roughly 10-20 new ones every second, so Claudio's statement on what happens when mclusters run out may be the explanation. I will go into detail on this in a separate response to Claudio's message.

Best regards,
Mikael
