> There were a couple of threads on lkml recently that may be relevant,
> but I have to run so I can't find the URL:s atm (todo later tonight).

Ok, I cannot figure out how to find the "first" message in a thread in
any of the lkml archives, but these two threads may be of interest,
especially if you can find their beginnings:

http://lkml.indiana.edu/hypermail/linux/kernel/1011.3/00030.html

And to a lesser extent (I started that before knowing about the above
one):

http://lkml.indiana.edu/hypermail/linux/kernel/1011.3/00252.html

They don't really talk about the same symptoms, but there are some
good tips on monitoring what's going on there and some of the things
(numactl interleaving, avoiding higher order allocations) might
conceivably be useful in this case too. At least on the theory that
some kind of eviction or looking-for-free-space loop is what's spinning
(and yes, this is an assumption based on very little evidence...).

Also, you're virtualized (given %steal), right? I wonder to what extent
that impacts the vm subsystem in the guest kernel (I don't really know
to what extent there is guest<->host co-op nowadays on ec2 etc).

--
/ Peter Schuller