On Thu, 23 Jun 2005 02:50:06 -0700, Winston Williams wrote:

>This is a continuation of my 'sshd suddenly not responding' message from
>Tuesday.
>
>I still haven't resolved the problems on this machine. I had to have
>someone at the data center reboot the machine so that I could get back
>in over ssh. After they rebooted the machine, I was able to work for
>about 20 minutes before the ssh session (and sshd) died again. I
>put /sbin/reboot in the crontab and tested it, and the machine rebooted.
>I left that in the crontab to run hourly, and I also put in another
>entry to kill and restart sshd every 30 minutes. I also let that run
>and it worked. I stopped qmail and I disabled pf but I left apache
>running.
>
>After that 20 minutes or so, my ssh session died unexpectedly, and when
>I went to reconnect, the socket opens on that port but then it just sits
>forever. It never shows the OpenSSH banner and nothing further happens.
>Apache is still running and working fine. Here is where it gets really
>strange... The crontab for reboot does not run now, and neither does the
>crontab to restart ssh. I know it is not rebooting because I run hping
>and it never has an interruption. I now suspect that the machine is
>unable to fork new processes.
>
>Here are the results of some tests that I have run:
>
>1-When I connect via SSH, the socket connects but then just sits before
>any data is sent. I suspect that the main process listens and accepts
>the connection, but then tries to fork a new process and fails.
>
>2-named is still running and seems to be working fine
>
>3-Nothing on cron seems to run at this point. I tested the entries in
>cron by letting them run while the system was operating normally, and
>they did work when the system was operating normally, like after a fresh
>reboot for that 20 minute or so window. After that, the reboot never
>happens and I don't think it is killing and restarting sshd either
>
>4-Apache can still do its thing. I am assuming this is because it
>automatically starts a number of processes right away. It has enough
>processes already running so that it does not need to fork when a new
>connection comes in.
>
>5-One other interesting thing to note is that /var/log/authlog was
>around 21,000 lines when I checked it. The OS install is only about 5
>days old. I moved ssh to a non-standard port to try to help reduce the
>random break-in attempts.
>
>I would really like to use OpenBSD on this machine. If I can't figure
>it out in the next day or two, I will have to switch to another
>operating system.
>
>Do any of you have any ideas for what I could try to either test out
>this fork failure theory, or other suggestions for what might be causing
>my problem?
>
>--
>Winston Williams <[EMAIL PROTECTED]>
>
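If you want to test the fork theory directly, one option is a small probe that you start while the machine is still healthy, so it does not depend on cron or sshd being able to spawn anything later. The following is only a minimal sketch under some assumptions: Python is installed from packages (it is not in the base system), and the names probe_fork and run_probe and the log path are made up for illustration. It calls fork() at a fixed interval and appends the result, including the errno name on failure, to a log file.

```python
import errno
import os
import time


def probe_fork():
    """Attempt a single fork() and return (ok, error_name)."""
    try:
        pid = os.fork()
    except OSError as exc:
        # EAGAIN usually points at a process limit, ENOMEM at memory exhaustion.
        return False, errno.errorcode.get(exc.errno, str(exc.errno))
    if pid == 0:
        os._exit(0)  # child: exit at once, skipping cleanup handlers
    os.waitpid(pid, 0)  # parent: reap the child so no zombie is left behind
    return True, None


def run_probe(log_path, interval=60, count=None):
    """Append one timestamped probe result per interval to log_path.

    count=None means run until killed (hypothetical default for this sketch).
    """
    done = 0
    while count is None or done < count:
        ok, err = probe_fork()
        status = "fork ok" if ok else "fork FAILED: " + err
        with open(log_path, "a") as log:
            log.write("%s %s\n" % (time.strftime("%Y-%m-%d %H:%M:%S"), status))
        done += 1
        if count is None or done < count:
            time.sleep(interval)
```

Start it right after a reboot, for example with run_probe("/tmp/forkprobe.log") from a script launched under nohup, and read the log after the next reboot. If the entries switch from "fork ok" to "fork FAILED: EAGAIN" around the time ssh dies, that would support the theory and point at a process limit. If they stay "fork ok" throughout, or simply stop, the cause is more likely somewhere else.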
It is not the operating system.

I cannot reproduce your problem and I have many machines running in the
field on various old and new hardware with 3.5, 3.6 & 3.7 and a labrat
here that usually runs current (for some definition of current) for days
at a time.

sshd is as reliable as can be, i.e. zero problems.

Look elsewhere. Threatening the OS with replacement is not likely to
shock it into behaving any differently.

From the land "down under": Australia.
Do we look <umop apisdn> from up over?

Do NOT CC me - I am subscribed to the list.
Replies to the sender address will fail except from the list-server.