On Tue, Jun 27, 2006 at 05:24:02PM -0400, Carl Fink wrote:
> You're out of memory, just like the messages say. Presumably some process
> on that server has used it all, including all your swap. Eventually the
> process should be killed automatically or the program might segfault. If
> you can get on as root and stay on long enough to type some commands, you
> could do:
>
> dd if=/dev/zero of=/var/spool/swapfile bs=1024 count=262144
>
> swapon /var/spool/swapfile
Realistically, this isn't likely to help... He's already used up 5GB of virtual memory: 2GB of RAM and 3GB of swap space. At that point, the problem is that the system is thrashing the swap disk; that is, it is constantly pulling pages back in from swap as the kernel switches context between all the runnable processes.

People still advocate having swap that's anywhere from 1.5 to 3 times your physical RAM... That made sense on ancient hardware with 8MB of RAM, when memory was relatively a lot slower and way more expensive, but I think on modern hardware that idea is totally brain-dead. Part of the problem is that memory speeds have not kept up with CPU improvements (so context switches kill you), but mostly I think it's that memory is way, way faster than disk (especially compared to 20 years ago), so virtual memory doesn't buy you as much as it used to on paleolithic hardware.

If you're actively using 3GB of swap, there's no way your disks can keep up with the CPU's context switches, and your system is dead in the water. (Note the emphasis on ACTIVELY: if you have a 3GB process swapped to disk but it's just sitting around doing nothing, it's not going to kill your system... at least not until someone decides they need to use it again.) The only real solution is to buy more RAM, particularly if this problem keeps recurring.

That said, someone suggested a memory leak, and there's a real possibility that one (or more) of the processes does have one. That's where getting output from top while the system is thrashing would be useful. It's difficult to get, given the state of the system, but it's absolutely necessary to figure out what's really going on.

Steps that might help:

1. Log in on the machine's console. There's less work for the system to do than when you log in over the network, so logging in locally should be easier.

2. Boost the priority of your shell (you must be root).
This command will do it (including the $$):

    # renice -20 $$

If the system is at all capable of being responsive, this should make your shell usable. The $$ is an automatic shell variable which expands to the process ID of your shell. Here's an example. First, let's show what the process ID of my shell is:

    # echo $$
    13357

Now, notice the NI value for that PID in the output of ps -elf below:

    # ps -elf |egrep "$$|PPID"
    F S UID        PID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD
    4 S root     13357 13355  0  76   0 -  1389 wait   22:44 pts/5    00:00:00 -bash
    4 R root     13439 13357  0  76   0 -  1415 -      22:47 pts/5    00:00:00 ps -elf

It's 0, which is the normal nice value for any process. This means the process has the default priority, the same as every other normal process on the system. But by reducing the nice value, we increase the priority. Not exactly intuitive, I know... but just remember that by reducing the nice value, we are making our process "less nice" than before. :)

    # renice -20 $$
    13357: old priority 0, new priority -20

Now, notice the new nice value in the output of ps:

    # ps -elf |egrep "$$|PPID"
    F S UID        PID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD
    4 S root     13357 13355  0  60 -20 -  1389 wait   22:44 pts/5    00:00:00 -bash
    4 R root     13460 13357  0  60 -20 -  1415 -      22:51 pts/5    00:00:00 ps -elf
    0 R root     13461 13357  0  60 -20 -  1235 -      22:51 pts/5    00:00:00 egrep 13357|PPID

We've changed the nice value to -20, as low as it can go; i.e., it's the "least nice" we can make our process. You must be root to reduce the nice value... Regular users can only increase it. The idea is to let users make their long-running background jobs nicer to everyone else...

So, once you log in, make sure "renice -20 $$" is the first thing you do. After that, the system may respond better for you... but also realize that all the other processes will run worse for everyone else.
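Once the reniced shell responds, the next job is the top output mentioned above. What follows is a minimal sketch, not a tested recipe: it assumes a Linux box with the procps tools (ps, top, free, vmstat), and the function names show_nice and snapshot_memory, as well as the default output file, are hypothetical. vmstat isn't part of the steps above; it's included as an assumption because its si and so columns show how much memory is being swapped in and out per second, which is exactly the "actively using swap" case. The ps -p form also selects just one PID, whereas egrep "$$|PPID" matches every line containing that number, which is why the shell's children show up in the listings above.

```shell
#!/bin/sh
# Hypothetical helpers for a thrashing box; run them from the reniced root shell.
# Assumes Linux with the procps tools (ps, top, free, vmstat) installed.

# Print the PID, nice value and command name of one process (default: this shell).
# Empty column headers (the trailing "=") suppress the header line, and -p
# selects exactly that PID, so child processes are not listed as well.
show_nice() {
    ps -o pid=,ni=,comm= -p "${1:-$$}"
}

# Save one snapshot of memory pressure to a file (hypothetical default path
# in $HOME, i.e. /root for root), so the evidence survives a reboot.
snapshot_memory() {
    out=${1:-$HOME/memsnap.txt}
    {
        date
        # si/so: memory swapped in from / out to disk per second. Sustained
        # nonzero values mean swap is in active use, i.e. thrashing.
        vmstat 1 3
        free -m
        # One non-interactive iteration of top (batch mode)
        top -b -n 1 | head -n 25
        # Largest resident-memory users first: the likely leak suspects
        ps -eo pid,user,ni,rss,vsz,comm --sort=-rss | head -n 15
    } > "$out" 2>&1
}

show_nice          # NI column should read -20 after "renice -20 $$"
snapshot_memory    # then read the file, e.g. with: less ~/memsnap.txt
```

Writing to a file rather than just the screen is a deliberate choice: if the console locks up or you end up rebooting anyway, the snapshot is still there to look at afterwards.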
If your system is thrashing like this, about the only solution is to stop and restart processes (or just reboot)... but the above is meant to give you a way to see WHY the system is falling over, so that hopefully you can do something to prevent it after you finally do reboot. ;-)

-- 
Derek D. Martin
http://www.pizzashack.org/
GPG Key ID: 0x81CFE75D