On Wed, 26 Nov 2003, alan premselaar wrote:

> I've recently noticed something I think is a little strange but I'd
> like to confirm it with the list.
>
> My bayes database seems excessively large at 967M:
>
> -rw-rw-rw- 1 defang defang  61k Nov 26 16:34 bayes_journal
> -rw-rw-rw- 1 defang defang 624k Nov 26 15:58 bayes_seen
> -rw-rw-rw- 1 defang defang 967M Nov 26 15:58 bayes_toks
>
> sa-learn --dump magic
> 0.000          0          2          0  non-token data: bayes db version
> 0.000          0       3236          0  non-token data: nspam
> 0.000          0       2628          0  non-token data: nham
> 0.000          0     121176          0  non-token data: ntokens
> 0.000          0 1066969971          0  non-token data: oldest atime
> 0.000          0 1069829904          0  non-token data: newest atime
> 0.000          0 1069829905          0  non-token data: last journal sync atime
> 0.000          0 1069735390          0  non-token data: last expiry atime
> 0.000          0    2764800          0  non-token data: last expire atime delta
> 0.000          0      38065          0  non-token data: last expire reduction count
>
> is this really larger than it should be? or am i delusional?
>
> i'm running redhat 7.3 , sendmail 8.12.10 , mimedefang 2.37 and
> spamassassin 2.60
>
> any ideas are welcome

Yes, that size seems way out of line. It should be using about 30~50
bytes per token, assuming typical token size. Going by your
'non-token data: ntokens' value, that bayes_toks file should be using
roughly 4~6 Mbytes (121176 tokens x 30~50 bytes), unless something is
whacko or you have some -very- large tokens in there.

One possibility: "--dump magic" may be looking at a different set of
files than the ones you listed. Just to double-check, run
"sa-learn -D --dump magic" and see which set of files it reports.

Dave

-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{
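
As a rough check of the size estimate in the reply, here is a minimal
Python sketch of the arithmetic. The token count comes from the
"--dump magic" output quoted above and the 30~50 bytes per token is the
rule of thumb from the reply. The function name and the reading of the
"967M" from ls as 967 * 2^20 bytes are assumptions made for illustration.

```python
# Back-of-the-envelope size check for bayes_toks.
# NTOKENS is the ntokens value from the quoted "--dump magic" output.
# BYTES_PER_TOKEN is the 30~50 byte rule of thumb from the reply,
# not a figure taken from SpamAssassin itself.
NTOKENS = 121176
BYTES_PER_TOKEN = (30, 50)

# Assumption: the "967M" shown by ls means 967 * 2**20 bytes.
ACTUAL_BYTES = 967 * 1024 * 1024


def expected_size_bytes(ntokens, per_token_range=BYTES_PER_TOKEN):
    """Return the (low, high) expected file size in bytes."""
    low, high = per_token_range
    return ntokens * low, ntokens * high


low, high = expected_size_bytes(NTOKENS)
print(f"expected size: {low / 1e6:.1f} to {high / 1e6:.1f} MB")
print(f"actual size:   {ACTUAL_BYTES / 1e6:.0f} MB")
print(f"actual is about {ACTUAL_BYTES / high:.0f}x the high estimate")
```

Even against the high end of the estimate, the file on disk is more
than a hundred times larger than the token count would suggest, which
is why it is worth confirming which files sa-learn is really reading.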