Hi,

On Mon, Jul 25, 2011 at 10:32:03AM +0200, Thomas Schwinge wrote:

> Building a certain GCC configuration on a freshly booted system: 11 h.
> Remove build tree, build it again (2nd): 12 h 50 min.  Huh.  Remove
> build tree, reboot, build it again (1st): back to 11 h.  Remove build
> tree, build it again (2nd): 12 h 40 min.  Remove build tree, build it
> again (3rd): 15 h.

I first observed this a long time ago: opening my large NFS-mounted
mailbox is considerably quicker after a fresh boot than after running
for a while.

As I recently acquired the (bad) habit of running my system 24/7,
almost never voluntarily rebooting, I was able to make further
observations.

Over a period of a couple of weeks (mostly light load), many things
become increasingly sluggish, until finally the system dies with
paging errors. Swap usage also rises constantly over this time,
generally reaching somewhat above 500 MiB before the crash -- which
more or less matches my physical RAM size. (But there is still plenty
of swap free.)

One of the things becoming sluggish is opening a new interactive bash
instance: it goes up from a few seconds to about 20 seconds before the
crash. At a guess, most of the slowness is related to reading my
oversized command history... But I don't know whether it's actually
the file read getting slower, or allocating memory for the stuff being
read in. (Same for the mail thing.) Recently I also observed that
writing the mailbox to the local /tmp seems to become slow as well --
haven't tried confirming this though.

Another test case is opening an 8 megapixel photo in ImageMagick's
"display" (which is very inefficient on memory in general): while on a
freshly booted system it's quite OK, it becomes really slow over time.
(My last paging crash was actually triggered while doing this...)

Regarding the used swap approaching RAM size, it might be related to
the fact that once memory is full, things start getting swapped out,
and remain there even when read back in, as long as the pages remain
clean.
However, this wouldn't explain the crash coinciding with the used swap
size approaching physical RAM size... unless it's really mere
coincidence.

But why does the memory usage grow in the first place? Some of it
might be explained by the fact that I tend to accumulate more and more
open shells (and other processes) over time -- but I don't believe it
accounts for the bulk of it. Also, the sum of the physical memory used
by all processes according to ps is considerably below the total RAM
used as reported by vmstat -- where does this memory go?

The slowdown might be a result of the growing memory usage, causing a
need to swap all the time. There is no audible thrashing though while
the system behaves sluggishly... And also, it wouldn't explain the
crash while there is still plenty of free swap.

Another possible explanation would be some kind of fragmentation,
making VM operations increasingly costly, and finally causing some
kind of resource exhaustion. (The disappearing memory however wouldn't
be explained by this either, I guess...)

Either way, I have no idea how to go about narrowing down the
problem :-(

-antrik-
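
One possible way to start narrowing down the "file read or memory
allocation?" question raised above is to time the two separately. The
following is only a minimal sketch, assuming Python is available on the
affected system. The function names, the 64 KiB chunk size and the
4096-byte page size are illustrative assumptions, not anything taken
from the system in question. Note that a repeated read may be served
from cache, so the numbers are only comparable between runs made under
similar conditions (e.g. right after boot versus after weeks of
uptime).

```python
import sys
import time


def time_read(path, chunk_size=1 << 16):
    """Read the file in fixed-size chunks and discard the data.

    Only one chunk is alive at a time, so this is dominated by I/O
    rather than by allocating memory for the whole file.
    Returns (bytes_read, seconds).
    """
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


def time_alloc(nbytes, page_size=4096):
    """Allocate nbytes of anonymous memory and write to every page.

    No file is involved, so this exercises only the VM side.
    page_size is an assumption; adjust it for the actual system.
    Returns seconds.
    """
    start = time.perf_counter()
    buf = bytearray(nbytes)
    for offset in range(0, nbytes, page_size):
        buf[offset] = 1
    return time.perf_counter() - start


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: readvsalloc.py FILE")
    else:
        size, read_s = time_read(sys.argv[1])
        alloc_s = time_alloc(size)
        print(f"{size} bytes: read {read_s:.3f} s, "
              f"alloc+touch {alloc_s:.3f} s")
```

Running this against the mailbox or the bash history file once after a
fresh boot and again when the system has become sluggish should show
which of the two timings grows, if either does.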