On Wed, 26 Nov 2003, alan premselaar wrote:

> I've recently noticed something I think is a little strange but I'd
> like to confirm it with the list.
>
> My bayes database seems excessively large at 967M:
>
> -rw-rw-rw-    1 defang   defang        61k Nov 26 16:34 bayes_journal
> -rw-rw-rw-    1 defang   defang       624k Nov 26 15:58 bayes_seen
> -rw-rw-rw-    1 defang   defang       967M Nov 26 15:58 bayes_toks
>
> sa-learn --dump magic
> 0.000          0          2          0  non-token data: bayes db version
> 0.000          0       3236          0  non-token data: nspam
> 0.000          0       2628          0  non-token data: nham
> 0.000          0     121176          0  non-token data: ntokens
> 0.000          0 1066969971          0  non-token data: oldest atime
> 0.000          0 1069829904          0  non-token data: newest atime
> 0.000          0 1069829905          0  non-token data: last journal sync atime
> 0.000          0 1069735390          0  non-token data: last expiry atime
> 0.000          0    2764800          0  non-token data: last expire atime delta
> 0.000          0      38065          0  non-token data: last expire reduction count
>
>
> is this really larger than it should be? or am i delusional?
>
> i'm running redhat 7.3 , sendmail 8.12.10 , mimedefang 2.37 and
> spamassassin 2.60
>
> any ideas are welcome

Yes, that size is way out of line. bayes_toks should use about 30 to 50
bytes per token, assuming typical token sizes. With the 121176 tokens
reported by 'non-token data: ntokens', the file should be about 4 to 6
Mbytes, unless something is badly wrong or you have some -very- large
tokens in there.
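
To put numbers on that estimate, here is a rough sketch in Python. The
30 to 50 bytes per token is only the rule of thumb above, not a measured
value, and it assumes the 967M shown by ls means binary megabytes.

```python
# Rough size estimate for bayes_toks, using the figures from this thread.
NTOKENS = 121176                 # "ntokens" from the --dump magic output
BYTES_PER_TOKEN = (30, 50)       # rule-of-thumb range, not a measured value
ACTUAL_BYTES = 967 * 1024 ** 2   # 967M from ls, assumed to be binary megabytes


def estimate_bytes(ntokens, bytes_per_token):
    """Return the expected token storage in bytes."""
    return ntokens * bytes_per_token


low = estimate_bytes(NTOKENS, BYTES_PER_TOKEN[0])
high = estimate_bytes(NTOKENS, BYTES_PER_TOKEN[1])

print(f"expected: {low / 1e6:.1f} to {high / 1e6:.1f} MB")
print(f"actual / high estimate: {ACTUAL_BYTES / high:.0f}x")
print(f"actual bytes per token: {ACTUAL_BYTES / NTOKENS:.0f}")
```

Under those assumptions the file is more than a hundred times larger
than even the high estimate, so typical tokens alone cannot explain it.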

One possibility is that "--dump magic" is looking at a different set of
files. To double-check, run "sa-learn -D --dump magic" and see which
set of files it is looking at.
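
If more than one Bayes database might exist on the box, a small helper
like the one below can list every bayes_* file with its size, for
comparison against what the debug output reports. This is only a
sketch: find_bayes_files is a made-up name, and the two directories are
guesses at likely locations, so substitute the paths used on your
system.

```python
import os


def find_bayes_files(directories):
    """Return (path, size_in_bytes) for every bayes_* file found."""
    found = []
    for directory in directories:
        if not os.path.isdir(directory):
            continue  # skip candidates that do not exist on this machine
        for name in sorted(os.listdir(directory)):
            path = os.path.join(directory, name)
            if name.startswith("bayes_") and os.path.isfile(path):
                found.append((path, os.path.getsize(path)))
    return found


if __name__ == "__main__":
    # Hypothetical candidate locations; replace with the real ones.
    candidates = [
        os.path.expanduser("~/.spamassassin"),
        "/var/spool/MIMEDefang",
    ]
    for path, size in find_bayes_files(candidates):
        print(f"{size:>12}  {path}")
```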

Dave

-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{



_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
