Re: vm_pageout_scan badness
Long ago, it was written here on 25 Oct 2000 by Matt Dillon:

> :Consider that a file with a huge number of pages outstanding
> :should probably be stealing pages from its own LRU list, and
> :not the system, to satisfy new requests.  This is particularly
> :true of files that are demanding resources on a resource-bound
> :system.
> :...
> :				Terry Lambert
> :				[EMAIL PROTECTED]
>
>     This isn't exactly what I was talking about.  The issue in regards to
>     the filesystem syncer is that it fsync()'s an entire file.  If
>     you have a big file (e.g. a USENET news history file) the
>     filesystem syncer can come along and exclusively lock it for
>     *seconds* while it is fsync()ing it, stalling all activity on
>     the file every 30 seconds.
[...]
>     One of the reasons why Yahoo uses MAP_NOSYNC so much (causing the problem
>     that Alfred has been talking about) is because the filesystem
>     syncer is 'broken' in regards to generating unnecessarily long stalls.
>
>     Personally speaking, I would much rather use MAP_NOSYNC anyway, even with
>     a fixed filesystem syncer.  MAP_NOSYNC pages are not restricted by
>     the size of the filesystem buffer cache, so you can have a whole
>     lot more dirty pages in the system than you would normally be able to
>     have.  This 'feature' has had the unfortunate side effect of screwing
>     up *THWACK*

Yeah, no kidding -- here's what I see it screwing up.

First, some background: I've built three news machines, two transit boxen and one reader box, with recent INN k0dez and 4.2-STABLE of a few days ago (having tested NetBSD, more on that later), plus a brief detour into 5-current. The two transit boxes have somewhere on the order of ~400MB memory or less; the amount I've put in the reader box has increased up to a gig as I try to figure out what's happening.
I'm using MAP_NOSYNC on the history database files on all of them, to try to get the NetBSD-style performance of not hitting the history disk, and I've made a couple of other minor tweaks to use mmap where the INN history code probably should, but doesn't.

Everything starts out well: the history disk is beaten at startup, but as time passes the time taken to do lookups and writes drops down to near-zero levels, and the disk gets quiet. And the transit machines actually stay that way, while the reader machine gives me problems after some time.

What I notice is that the amount of memory used keeps increasing until it's all used, and the Free amount shown by `top' drops to a meg or so. Cache and Buf get a bit, but most of it is Active -- far more than is accounted for by the processes.

Now, what happens on the reader machine is that after some time of the Active memory increasing, it runs out and starts to swap out processes, and the timestamps on the history database files (.index and .hash; this is the md5-based history) get updated, rather than remaining at the time INN was started. Then the previously rapid history times skyrocket until history takes more than 1/4 of the total time. I don't see this on the transit boxen even after days of operation.

When I stop INN and everything news-related, some memory is freed up, but there can still be, say, 400MB reported as Active -- more when I had a full gig in this machine to try to keep it from swapping, all of which got used... Then, when I reboot the machine, it prints the kernel messages about syncing disks; done, and then suddenly the history drive light goes on and it starts grinding for five minutes or so before the actual reboot happens. No history activity happens when I shut down INN normally, which should free the MAP_NOSYNC'ed pages and make them available to be written to disk before rebooting, maybe.
I'm also running BerkeleyDB for the reader overview on this machine, and I just discovered that while I had applied MAP_NOSYNC to an earlier release, the library actually linked in didn't have it. I've just fixed that and am running that way now (and see a noticeable improvement), so when I next reboot I may see both the overview database disk and the history disk get some pre-reboot activity, if what I think is happening really is happening.

What I think is happening, based on these observations, is that the data from the history hash files (less than 100MB) gets read into memory, but the updates to it are not written over the data they replace -- memory is simply appended to, up to the limit of what's available. When this limit is reached on the transit machines, things stabilize and old pages get recycled (but still, more memory overall is used than the size of the actual file). I'm guessing that additional activity on the reader machine causes jumps in memory usage not seen on the transit machines, enough to force some of the unwritten dirty pages to be written to the history file as a few megs of swap get used, which is why it does not sta
Re: vm_pageout_scan badness
> :> Personally speaking, I would much rather use MAP_NOSYNC anyway,
> even with
> :...
> :Everything starts out well, where the history disk is beaten at startup
> :but as time passes, the time taken to do lookups and writes drops down
> :to near-zero levels, and the disk gets quiet. And actually, the transit
> :...
> :What I notice is that the amount of memory used keeps increasing, until
> :it's all used, and the Free amount shown by `top' drops to a meg or so.
> :Cache and Buf get a bit, but most of it is Active. Far more than is
> :accounted for by the processes.
>
>     This is to be expected, because the dirty MAP_NOSYNC pages will not
>     be written out until they are forced out, or by msync().

I just discovered the user command `fsync', which has revealed a few things to me and cleared up some mysteries. Also, I've watched more closely the pattern of what happens to the available memory following a fresh boot...

At the moment, this (reader) machine has been up for half a day, with performance barely able to keep up with a full feed (and starting to slip as the overnight burst of binaries begins). At last look, history lookups and writes are accounting for more than half (!) of the INN news process time, with available idle time being essentially zero. So...

> :Now, what happens on the reader machine is that after some time of the
> :Active memory increasing, it runs out and starts to swap out processes,
> :and the timestamps on the history database files (.index and .hash, this
> :is the md5-based history) get updated, rather than remaining at the
> :time INN is started. Then the rapid history times skyrocket until it
> :takes more than 1/4 of the time. I don't see this on the transit boxen
> :even after days of operation.
>
>     Hmm. That doesn't sound right. Free memory should drop to near zero,
>     but then what should happen is the pageout daemon should come along
>     and deactivate a big chunk of the 'active' pages... so you should
>     see a situation where you have, say, 200MB worth of active pages
>     and 200MB worth of inactive pages. After that the pageout daemon
>     should start paging out the inactive pages and increasing the 'cache'.
>     The number of 'free' pages will always be near zero, which is to be
>     expected. But it should not be swapping out any process.

Here is what I noticed while watching the `top' values for Active, Inactive, and Free following this last boot (I didn't pay enough attention to the other fields to notice any wild fluctuations there; next time, maybe), on this machine with 512MB of RAM, if it reveals anything:

Following the boot, things start out with plenty of memory Free and something like 4MB Active, which seems reasonable to me. Then I start things. As is to be expected, INN grows as it does history lookups and updates, and the amount of memory shown as Active tracks this, more or less. But look at what's happening to the Free value! It's going down at as much as 4MB per `top' interval. Or should I say, look at what's happening to the Inactive value -- it's constantly increasing, and I observe a rapid migration of all the Free memory to Inactive, until the value of Inactive peaks at the moment Free drops to about 996k, beyond which it changes little. None of the swap space has been touched yet.

As soon as the value for Free hits bottom and that of Inactive has reached its max, the migration happens from Inactive to Active. Until this point, the value of Active has been roughly what I would expect to see, given the size of the history hash/index files and the BerkeleyDB file, which I'm now also using MAP_NOSYNC on, for a definite improvement in overview access times. I don't remember exactly what values I was seeing for Free and Inactive or Active, since I was just watching for general trends, but I seem to recall Active being ~100MB, and Inactive somewhat more.

(Are you saying above that this Inactive value should be migrating to Cache, which I'm not seeing, rather than to Active, which I do see? If so, then hmmm.)

Now memory is drifting at a fairly rapid pace from Inactive (the meaning of which I'm not exactly clear about, although there's some explanation in the `top' man page that hasn't quite clicked into understanding yet) over to the Active field, at something like 2MB or so per `top' interval. Free remains close to 1MB, but Active is constantly growing, although no processes are clearly taking up any of this, apart from INN, which only accounts for around 100MB at this time and isn't increasing at the rate that Active memory is.

The Active field continues to increase as Inactive decreases until finally Inactive bottoms out, down from several hundred MB to a one- or two-digit MB value (I don't remember exactly), while Active has increased to almost 400MB. This is something like 20 minutes after the reboot, and now the first bit of swap gets hit. However, the value of A
Re: vm_pageout_scan badness
> :but at last look, history lookups and writes are accounting for more
> :than half (!) of the INN news process time, with available idle time
> :being essentially zero. So...
>
>     No idle time? That doesn't sound like blocked I/O to me, it sounds
>     like the machine has run out of cpu.

Um, I knew I'd be unclear somehow. The machine itself (with 2 CPUs) has plenty of idle time -- `top' reports typically 70-80% idle, and INN takes from 20-40% of CPU. (Being SMP, a process like `perl' locked to one CPU will appear around 98%, unlike a certain other OS that shows this percentage for the system total rather than for a particular CPU.) What I mean is that the INN process timer -- basically Joe Greco's timer, which wraps key functions with start/stop timer calls to show where INN spends its time -- is showing little to no idle time, meaning INN couldn't take in more articles no matter how hard I push them.

Let me show you the timer stats from the time I started things not long ago on this reader machine, where it's taking in backlogs. All times are in milliseconds; `ME time' is the elapsed time (~5 minutes) and `idle' is the idle time. The numbers in parentheses are numbers of calls, significant mainly for calls like artwrite (how many articles were actually written to the spool), hiswrite (how many unique articles were received over the period), and hishave (how many history lookups were done).

Dec 3 04:33:47 crotchety innd: ME time 300449 idle 376(4577)
artwrite 52601(6077) artlink 0(0) hiswrite 40200(7035) hissync 11(14)
sitesend 647(12154) artctrl 2297(308) artcncl 2288(308) hishave 38857(26474)
hisgrep 70(111) artclean 12264(6930) perl 13819(6838) overv 112176(6077)
python 0(0) ncread 13818(21287) ncproc 284413(21287)

(That's 53 seconds writing articles, 40 seconds updating history, and 39 seconds doing history lookups.)

Dec 3 04:38:48 crotchety innd: ME time 301584 idle 406(5926)
artwrite 55774(6402) artlink 0(0) hiswrite 25483(7474) hissync 15(15)
sitesend 733(12805) artctrl 1257(322) artcncl 1245(321) hishave 22114(28196)
hisgrep 90(38) artclean 12757(7295) perl 14696(7191) overv 136855(6402)
python 0(0) ncread 14446(23235) ncproc 284767(23235)

(As time passes and more of the MAP_NOSYNC file is in memory, the time needed for history writes/lookups drops.)

[...]

Dec 3 04:58:49 crotchety innd: ME time 300047 idle 566(6272)
artwrite 59850(6071) artlink 0(0) hiswrite 11630(6894) hissync 33(14)
sitesend 692(12142) artctrl 324(244) artcncl 320(244) hishave 13614(24312)
hisgrep 0(77) artclean 13232(6800) perl 14531(6727) overv 156723(6071)
python 0(0) ncread 15116(23838) ncproc 281745(23838)

Dec 3 05:03:49 crotchety innd: ME time 300018 idle 366(5936)
artwrite 56956(6620) artlink 0(0) hiswrite 8850(7749) hissync 7(15)
sitesend 760(13240) artctrl 255(160) artcncl 255(160) hishave 9944(25198)
hisgrep 0(31) artclean 13441(7753) perl 15605(7620) overv 164223(6620)
python 0(0) ncread 14783(24123) ncproc 282791(24123)

Most of the time is spent on the BerkeleyDB overview now. This is probably because some reader is giving repeated commands pounding the overview database. That reader's IP now has a different gateway address, and won't be bothering me for a while.

Now, for reference, here are the timings on a transit-only machine with no readers, after it's been running for a while:

Dec 3 05:22:09 news-feed69 innd: ME time 30 idle 91045(91733)
artwrite 48083(2096) artlink 0(0) hiswrite 1639(2096) hissync 33(11)
sitesend 4291(12510) artctrl 0(0) artcncl 0(0) hishave 1600(30129)
hisgrep 0(0) artclean 25591(2121) perl 79(2096) overv 0(0) python 0(0)
ncread 69798(147925) ncproc 108624(147919)

A reasonable amount of idle time, and a total of just over 3 seconds out of every 300 spent on history activity. That's reflected by the timestamps on the NOSYNC'ed history database (index/hash) files you see here:

-rw-rw-r--  1 news  news  436206889 Dec  3 05:22 history
-rw-rw-r--  1 news  news         67 Dec  3 05:22 history.dir
-rw-rw-r--  1 news  news       8100 Dec  1 01:55 history.hash
-rw-rw-r--  1 news  news       5400 Nov 30 22:49 history.index

However, the timings shown by `top' here show from 10 to 20% idle CPU time, even though INN itself has capacity to do more work.

The problem is that I'm not seeing this on the reader box. Or if I do see it, it doesn't last long. There, the timestamps on the equivalent files are pretty much current, in spite of the files being NOSYNC'ed.

> :As is to be expected, INN increases in size as it does history lookups
> :and updates, and the amount of memory shown as Active tracks this,
> :more or less. But what's happening to the Free value! It's going
> :down at as much as 4MB per `top' interval. Or should I say, what is
> :happening to the Inactive value -- it's constan
Re: vm_pageout_scan badness
> ok, since I got about 6 requests in four hours to be Cc'd, I'm
> throwing this back onto the list. Sorry for the double-response that
> some people are going to get!

Ah, good, since I've been deliberately avoiding reading mail in an attempt to get something useful done in my last days in the country, and probably wouldn't get around to reading it until I'm without Net access in a couple of weeks... (Also, your mailer seems to be ignoring the `Reply-To:' header I've been using, but I'd get a copy through the Cc: list -- in case you puzzled over why your previous messages bounced.)

> I am going to include some additional thoughts in the front, then break
> to my originally private email response.

I'll mention that I've discovered the miracle of man pages, and found the interesting `madvise' capability `MADV_WILLNEED' that, from the description, looks very promising. Pity the results I'm seeing still don't match my expectations. Also, in case the amount of system memory on this machine might be insufficient for what I want to do with the size of the history.hash/.index files, I've just gotten an upgrade to a full gig. Unfortunately, performance is now worse than it had been, so it looks like I'll be butchering the k0deZ to see if I can get my way.

Now, for `madvise' -- this is already used in the INN source in lib/dbz.c (where one would add MAP_NOSYNC to the MAP__FLAGS) as MADV_RANDOM, which matches the random access pattern of the history hash table. Supposedly, MADV_WILLNEED will tell the system to avoid freeing these pages, which looks to be my holy grail of the week, plus the immediate mapping-in certainly can't hurt. There's only a single madvise call in the INN source, but I see that the Diablo code makes two calls to it (although with WILLNEED and, unlike INN, SEQUENTIAL access -- this could be part of the cause of the apparent misunderstanding of the INN history file that I see below).
Since it looks to my non-programmer eyes like I can't combine the behaviours in a single call, I followed Diablo's example of two calls, specifying both the RANDOM and the WILLNEED that I thought would improve things.

The machine is, of course, as you can see from the timings, not optimized at all, since I've just thrown something together as a proof of concept after running into a brick wall with the codes under test with Slowaris. And because a departmental edict has come down that I must migrate all services off Free/NetBSD and onto Slowaris, I can't expect to get the hardware needed to beef up the system -- even though the MAP_NOSYNC option on the transit machine enabled it to whup the pants off a far more expensive chunk of Sun hardware. So I'm trying to be able to say `Look, see? See what you can do with FreeBSD' as I'm shown out the door.

>     I ran a couple of tests with MAP_NOSYNC to make sure that the
>     fragmentation issue is real. It definitely is. If you create a
>     file by ftruncate()ing it to a large size, then mmap() it SHARED +
>     NOSYNC, then modify the file via the mmap, massive fragmentation occurs

I've heard it confirmed that even the newer INN does not mmap() the newly-created files for makehistory or expire. As reported to the INN-workers mailing list:

: From: [EMAIL PROTECTED] (Richard Todd)
: Newsgroups: mailing.unix.inn-workers
: Subject: Re: expire/makehistory and mmap/madvise'd dbz filez
: Date: 4 Dec 2000 06:30:47 +0800
: Message-ID: <90ehin$1ndk$[EMAIL PROTECTED]>
:
: In servalan.mailinglist.inn-workers you write:
:
: >Moin moin
:
: >I'm engaged in a discussion on one of the FreeBSD developer lists
: >and I thought I'd verify the present source against my memory of how
: >INN 1.5 runs, to see if I might be having problems...
:
: >Anyway, the Makefile in the 1.5 expire directory has the following bit,
: >that seems to be absent in present source, and I didn't see any
: >obvious indication in the makedbz source as to how it's initializing
: >the new files, which, if done wrong, could trigger some bugs, at least
: >when `expire' is run.
:
: ># Build our own version of dbz.o for expire and makehistory, to avoid
: ># any -DMMAP in DBZCFLAGS - using mmap() for dbz in expire can slow it
: ># down really bad, and has no benefits as it pertains to the *new* .pag.
: >dbz.o: ../lib/dbz.c
: >	$(CC) $(CFLAGS) -c ../lib/dbz.c
:
: >Is this functionality in the newest expire, or do I need to go a hackin'?
:
: Whether dbz uses mmap or not on a given invocation is controlled by the
: dbzsetoptions() call; look for that call and the setting of the INCORE_MEM
: option in expire/expire.c and expire/makedbz.c. Neither expire nor
: makedbz mmaps the new dbz indices it creates.

The remaining condition I'm not positive about is the case of an overflow, which ideally would not be a case to consider, and is not the case on this machine now.

>     on the file. This is easily demonstrated by issuing a sequential read
>     on the file and noting that the syste
Re: vm_pageout_scan badness
Howdy. I'm going to breach all sorts of ethics in the worst way by following up to my own message, just to throw out some new info... 'kay?

Matt wrote, and I quote --

: > However, I noticed something interesting!

Of course I clipped away the interesting Thing, but note the following that I saw...

: INN after adding the memory, I did a `cp -p' on both the history.hash
: and history.index files, just to start fresh and clean. It didn't seem
[...]
: > There is an easy way to test file fragmentation. Kill off everything
: > and do a 'dd if=history of=/dev/null bs=32k'. Do the same for
: > history.hash and history.index. Look at the iostat on the history
: > drive. Specifically, do an 'iostat 1' and look at the KB/t (kilobytes
: > per transfer). You should see 32-64KB/t. If you see 8K/t the file
: > is severely fragmented. Go through the entire history file(s) w/ dd...
:
: Okay, I'm doing this: The two hash-type files give me between 9 and
: 10K/t; the history text file gives me more like 60KB/t. Hmmm.

Now, remember what Matt wrote: that partially-cached data plays havoc with read-ahead. That is apparently what I was seeing here -- pulling some bit of data off the disk proper, then pulling a chunk of data that was cached, and so on. I figured that out as I attempted to copy one of the files to create an unfragmented copy to test the transfer size, and saw the expected 64K (well, DUH, that was the write size), and then attempted to `dd' these to /dev/null and saw... no disk activity. The file was in cache. Bummer.

Oh well, I had to reboot anyway for some reason, and did so. Immediately after the reboot I `dd'ed the two database files and got the expected 64K/t of an unfragmented file. I also made copies of them just to push their contents into memory, because...

: The actual history lookups and updates that matter are all done within
: the memory taken up by the .index and .hash files. So, by keeping
: them in memory, one doesn't need to do any disk activity at all for
: lookups, and updates, well, so long as you commit them to the disk at
: shutdown, all should be okay. That's what I'm attempting to achieve.
: These lookups and updates are bleedin' expensive when disk activity
: rears its ugly head.
:
: Not to worry, I'm going to keep plugging to see if there is a way for
: me to lock these two files into memory so that they *stay* there, just
: to prove whether or not that's a significant performance improvement.
: I may have to break something, but hey...

I b0rked something. I `fixed' the mlock operation to allow a lowly user such as myself to use it, just as a proof of concept. (I still need to do a bit of tuning, I can see, but hey, I got results.) So I pass all the madvise suggestions I can for both the history.index and .hash files, and then I attempt to mlock both of them. I don't get a failure, although the history.hash file (108MB) doesn't quite achieve the desired results -- I do see Good Things with the smaller history.index (72MB, and don't remind me that 1MB really isn't 100 bytes).
Anyway, the number of `Wired' megs in `top' is up from 71MB to 200+, and after some hours of operation, look at the timestamps of the two database files (the .n.* files are the copies I made after reboot, and serve as a nice reference for when I started things):

-rw-rw-r--  1 news  news  755280213 Dec  5 19:05 history
-rw-rw-r--  1 news  news         57 Dec  5 19:05 history.dir
-rw-rw-r--  1 news  news      10800 Dec  5 19:05 history.hash
-rw-rw-r--  1 news  news       7200 Dec  5 08:44 history.index
-rw-rw-r--  1 news  news      10800 Dec  5 08:43 history.n.hash
-rw-rw-r--  1 news  news       7200 Dec  5 08:44 history.n.index

So, okay, history.hash still sees disk activity, but look at a handful of INN timer stats following the boot. The last two stats with the default vm k0deZ before the restart:

Dec 5 08:30:40 crotchety innd: ME time 301532 idle 28002(120753)
artwrite 70033(2853) artlink 0(0) hiswrite 49396(3097) hissync 28(6)
sitesend 460(5706) artctrl 296(25) artcncl 295(25) hishave 32016(8923)
hisgrep 45(10) artclean 20816(3150) perl 12536(3082) overv 29927(2853)
python 0(0) ncread 33729(152735) ncproc 227796(152735)

That's 80 seconds of 300 spent on history activity (hiswrite plus hishave)... urk... on a steady-state system with a few readers that had been running for some hours.

Dec 5 08:35:37 crotchety innd: ME time 300052 idle 16425(136209)
artwrite 77811(2726) artlink 0(0) hiswrite 35676(2941) hissync 28(6)
sitesend 571(5450) artctrl 454(41) artcncl 451(41) hishave 33311(7392)
hisgrep 55(14) artclean 22778(3000) perl 14137(2914) overv 28516(2726)
python 0(0) ncread 38832(172145) ncproc 226513(172145)

[REB00T]

Dec 5 08:59:32 crotchety innd: ME time 300059 idle 62840(189385)
artwrite 68361(5580) artlink 0(0) hiswrite 8782(6567) hissync 104(12
Re: vm_pageout_scan badness
> :The mlock man page refers to some system limit on wired pages; I get no
> :error when mlock()'ing the hash file, and I'm reasonably sure I tweaked
> :the INN source to treat both files identically (and on the other machines
> :I have running, the timestamps of both files remain pretty much unchanged).
> :I'm not sure why I'm not seeing the desired results here with both files
>
>     I think you are on to something here. It's got to be mlock(). Run
>     'limit' from csh/tcsh and you will see a 'memorylocked' resource.
>     Whatever this resource is as of when innd is run -- presumably however
>     it is initialized for the 'news' user (see /etc/login.conf) is going

Yep, `unlimited'... same as the bash `ulimit -a'. OH NO, I HAVE IT SET TO `infinity' IN LOGIN DOT CONF, no wonder it is all b0rken-like. The weird thing is that mlock() does return success, the amount of wired memory matches the two files, and I've seen nothing obvious in the source code as to why the two files behave differently, but I'll keep plugging away at it.

>     History files are notorious for random I/O... the problem is due
>     to the hash table being, well, a hash table. The hash table
>     lookups are bad enough but this will also result in random-like
>     lookups on the main history file. You get a little better
>     locality of reference on the main history file (meaning the system

Ah, but... this is how the recent history format (based on MD5 hashes), introduced as dbz v6 around the time you were busy with Diablo and your history mechanism there, differs from the one you remember. (And speaking of your 64-bit CRC history mechanism, whatever happened to the links that would get you there from the backplane homepage?) In this format, you don't do the random-like lookups to verify message-ID presence in the text file at all: everything is done in the data in the two hash tables. At least for transit.

I'm not sure whether reader requests require a hit on the main file -- it'd be worth pointing a Diablo frontend at such a box to see how it does there, even when the overview performance for traditional readership is, uh, suboptimal. I think they do, but that's a trivial seek to one specific known offset. I'm sure this is applicable to other databases somehow, for those who aren't doing news and are bored stiff by this.

>     At the moment madvise() MADV_WILLNEED does nothing more than activate
>     the pages in question and force them into the process's mmap.
>     You have to call it every so often to keep the pages 'fresh'... calling
>     it once isn't going to do anything.

Well, it definitely does do a Good Thing when I call it once, as you can see from the initial timer numbers that approach the long-running values I'm used to. (I previously tried to simulate this by doing lookups on a small fraction of the history entries in hope of activating a majority of the needed pages -- not perfect, but a decent hack.)
You can see from the timestamps of the debugging output here that while it slows down the startup somewhat, the work of reading in the data happens quickly and is a definite positive tradeoff:

Dec 6 07:32:14 crotchety innd: dbz openhashtable /news/db/history.index
Dec 6 07:32:14 crotchety innd: dbz madvise WILLNEED ok
Dec 6 07:32:14 crotchety innd: dbz madvise RANDOM ok
Dec 6 07:32:14 crotchety innd: dbz madvise NOSYNC ok
Dec 6 07:32:27 crotchety innd: dbz mlock ok
Dec 6 07:32:27 crotchety innd: dbz openhashtable /news/db/history.hash
Dec 6 07:32:27 crotchety innd: dbz madvise WILLNEED ok
Dec 6 07:32:27 crotchety innd: dbz madvise RANDOM ok
Dec 6 07:32:27 crotchety innd: dbz madvise NOSYNC ok
Dec 6 07:32:38 crotchety innd: dbz mlock ok

This happens quickly when the data is still in cache, which leads me to believe it's something else affecting the .hash file (I added the madvise() MADV_NOSYNC call just in case it somehow wasn't happening in the mmap() for some reason):

Dec 6 09:29:34 crotchety innd: dbz openhashtable /news/db/history.index
Dec 6 09:29:34 crotchety innd: dbz madvise WILLNEED ok
Dec 6 09:29:34 crotchety innd: dbz madvise RANDOM ok
Dec 6 09:29:34 crotchety innd: dbz madvise NOSYNC ok
Dec 6 09:29:34 crotchety innd: dbz mlock ok
Dec 6 09:29:34 crotchety innd: dbz openhashtable /news/db/history.hash
Dec 6 09:29:34 crotchety innd: dbz madvise WILLNEED ok
Dec 6 09:29:34 crotchety innd: dbz madvise RANDOM ok
Dec 6 09:29:34 crotchety innd: dbz madvise NOSYNC ok
Dec 6 09:29:34 crotchety innd: dbz mlock ok

>     You may be able to achieve an effect very similar to mlock(), but
>     runnable by the 'news' user without hacking the kernel, by

Yeah, sounds like a hack, but I figured out what was going on earlier with my mlock() hack -- INN and the reader daemon now use a dynamically linked library, so the nnrpd processes were also trying to mlock() the files. Hmmm. Either I can statically compile INN (which I chose to do) or I can further butcher the source by attempting to