In an earlier posting I mentioned that running db_verify on
bayes_toks frequently yields errors of the form

  db_verify: Page 2289: hash page has bad prev_pgno
  db_verify: Page 2110: hash page has bad prev_pgno

and I asked whether I could just ignore them, since bayes seemed to
be working OK and the errors disappear for a while after an expiry.
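
For anyone who wants to track which complaints db_verify makes and how
often, here is a minimal sketch in Python. It is not part of
SpamAssassin or Berkeley DB; the function name summarize_db_verify is
made up for illustration, and it assumes the "db_verify: Page N:
message" line format quoted in this post. The db_verify output has to
be captured separately and passed in as text.

```python
import re
from collections import Counter

# Assumed line format, taken from the output quoted in this post:
#   "db_verify: Page 2289: hash page has bad prev_pgno"
LINE_RE = re.compile(r"^\s*db_verify: Page (\d+): (.+?)\s*$")


def summarize_db_verify(output):
    """Count per-page complaints by message and collect the page numbers."""
    counts = Counter()
    pages = {}
    for line in output.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # ignore anything that is not a per-page complaint
        page, message = int(match.group(1)), match.group(2)
        counts[message] += 1
        pages.setdefault(message, []).append(page)
    return counts, pages


# Sample lines copied from this post.
sample = """\
  db_verify: Page 2289: hash page has bad prev_pgno
  db_verify: Page 2110: hash page has bad prev_pgno
  db_verify: Page 3: unreferenced page
"""
counts, pages = summarize_db_verify(sample)
for message, n in counts.most_common():
    print(n, message, pages[message])
```

Comparing the summary before and after an expiry would show whether
the same pages keep coming back or the damage moves around.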

Now I've got a much bigger problem. The expiry is starting to take
more than 10 minutes, and as a result the journal grows to its maximum
size and an opportunistic rebuild kills the lock file and wrecks the
expiry operation. Here is what I see in the bayes directory:

-rw-------   1 defang   defang         32 Jan 26 12:19 bayes.lock
-rw-------   1 defang   defang    2750039 Jan 26 12:27 bayes_journal
-rw-------   1 defang   defang   20897792 Jan 26 12:19 bayes_seen
-rw-------   1 defang   defang   21733376 Jan 26 12:22 bayes_toks
-rw-------   1 defang   defang    9437184 Jan 26 12:22 bayes_toks.expire16781
-rw-------   1 defang   defang   11173888 Jan 26 11:35 bayes_toks.expire23012
-rw-------   1 defang   defang    5341184 Jan 26 10:54 bayes_toks.expire27549
-rw-------   1 defang   defang   11182080 Jan 26 11:59 bayes_toks.expire27570
-rw-------   1 defang   defang   11403264 Jan 26 10:44 bayes_toks.expire4752
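
A minimal Python sketch for spotting bayes_toks.expireNNNN files like
the ones in this listing, reporting how long each has been idle and
whether the process named in the suffix still exists. The helper names
(pid_alive, find_expire_files) are hypothetical, the PID check is
POSIX-only, and a PID can be reused by an unrelated process, so treat
"alive" as a hint rather than proof.

```python
import os
import re
import time

# Matches the temporary files in the listing, e.g. bayes_toks.expire16781.
EXPIRE_RE = re.compile(r"^bayes_toks\.expire(\d+)$")


def pid_alive(pid):
    """Return True if some process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def find_expire_files(directory, now=None):
    """Return (name, pid, idle_seconds, alive) for each expire file."""
    now = time.time() if now is None else now
    found = []
    for name in sorted(os.listdir(directory)):
        match = EXPIRE_RE.match(name)
        if not match:
            continue
        pid = int(match.group(1))
        mtime = os.stat(os.path.join(directory, name)).st_mtime
        found.append((name, pid, now - mtime, pid_alive(pid)))
    return found
```

Calling find_expire_files() on the bayes directory and printing the
rows whose process is gone, or whose idle time is several minutes,
gives a quick list of expiry runs that did not finish.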

The interesting thing is that the last expiry process, in this case
process 16781, has not updated its rewrite of bayes_toks since 12:22,
and as you can see from the date on the journal file the current time
is 12:29, so it appears the expiry is stalled. The system load is
also quite low at the moment, which points the same way. Running
db_verify on the file bayes_toks.expire16781 yields

  db_verify: Page 3: unreferenced page
  db_verify: Page 4: unreferenced page
  db_verify: Page 5: unreferenced page
  db_verify: Page 6: unreferenced page
  db_verify: Page 7: unreferenced page
  db_verify: Page 8: unreferenced page
  .
  .

and on and on. BTW, I had been running an expiry every hour because I
was worried that an expiry might take more than 10 minutes and get
killed by a journaling operation, but I decided to return to the
default mode of operation and let the expiries happen automatically.
In this case "sa-learn --dump magic" indicates that the last expiry
happened at "Sun Jan 25 22:33:43 2004", and it looks like my problem
started about 12 hours later, so I guess I should go back to running
an expiry every hour. Anyone care to speculate?
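
If the expiry goes back into an hourly job, it may help to record how
long each run takes. Below is a minimal Python sketch of a timing
wrapper. The name timed_run is made up, and the 600-second limit is
only my reading of the 10-minute window described above, not a value
taken from SpamAssassin. The command to run is whatever the hourly job
already uses; the sa-learn invocation in the comment is just an
example and assumes your version accepts that option.

```python
import subprocess
import time

# Assumption: 600 s stands in for the 10-minute window mentioned above.
LIMIT_SECONDS = 600


def timed_run(argv, limit=LIMIT_SECONDS):
    """Run a command; return (exit_code, elapsed_seconds, over_limit)."""
    start = time.monotonic()
    result = subprocess.run(argv)
    elapsed = time.monotonic() - start
    return result.returncode, elapsed, elapsed > limit


# Example use (command line is an assumption, adjust to your setup):
#   code, elapsed, over = timed_run(["sa-learn", "--force-expire"])
#   print("exit=%d elapsed=%.1fs over_limit=%s" % (code, elapsed, over))
```

Logging these three values from each run would show whether the expiry
time creeps up gradually or jumps past the limit all at once.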

As I mentioned in my previous post, I'm running
SpamAssassin 2.63 in conjunction with MIMEDefang 2.39 on a SPARC Sun
Solaris 8 system with 2GB of memory, of which 800MB is used as a tmpfs
filesystem for the MIMEDefang spool area, which is where the bayes
database files are kept. Bayes is running in auto-learn mode and
it's been running for a couple of weeks without any major problems.

Does anyone have any advice on where to look to solve this problem?

- rick

