Alexander Litvinov wrote:
>
> Today I have dumped my bayes db and calculated some statistics.
>
> 742753 - total number of words in it
> 515654 - total number of words which have been seen only once
> 80485 - ... twice
> 35325 - ... 3 times
>
> These statistics show that most of the db is not used, just eating my hard
> drive (44 MB total size). Is this a normal situation?
>

My dump magic output is similar -- I didn't go through all of it, but there
are a lot of tokens with few occurrences.

44 MB seems large for 743k tokens. The docs say there should be about
5 MB per 100k tokens. You might look at the expiry settings in your
configuration if you want a smaller db.

How did you calculate those statistics -- does sa-learn do this, or did you
code up something to process sa-learn's output?

Bryan

>
> sa-learn --dump magic
> 0.000 0 2 0 non-token data: bayes db version
> 0.000 0 7711 0 non-token data: nspam
> 0.000 0 35161 0 non-token data: nham
> 0.000 0 742761 0 non-token data: ntokens
> 0.000 0 1062407317 0 non-token data: oldest atime
> 0.000 0 1071338468 0 non-token data: newest atime
> 0.000 0 1071338011 0 non-token data: last journal sync atime
> 0.000 0 1071310234 0 non-token data: last expiry atime
> 0.000 0 0 0 non-token data: last expire atime delta
> 0.000 0 0 0 non-token data: last expire reduction count
>
> SA 2.61
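
For anyone wanting to reproduce numbers like the ones quoted above, here is a
minimal sketch in Python. It is not part of sa-learn, and it rests on several
assumptions: that each token line of the dump has the same five-field layout
as the "magic" lines shown in this thread (probability, spam count, ham
count, atime, token), and that "seen N times" means spam count plus ham
count. The names count_histogram and expected_size_mb are made up for this
example, and the token lines in SAMPLE (apart from the first, which is copied
from the dump above) are invented purely for illustration.

```python
from collections import Counter

MB_PER_100K_TOKENS = 5.0  # rule of thumb from the docs, as quoted in this thread

# First line is copied from the thread; the token lines are invented examples.
SAMPLE = """\
0.000 0 2 0 non-token data: bayes db version
0.958 1 0 1071338011 viagra
0.016 0 1 1071338011 perl
0.500 1 1 1071338011 linux
0.987 3 0 1071338011 mortgage
"""


def count_histogram(lines):
    """Map total occurrences (spam + ham) to the number of tokens seen that often."""
    hist = Counter()
    for line in lines:
        # Assumed layout: prob, spam count, ham count, atime, token.
        fields = line.split(None, 4)
        if len(fields) < 5 or fields[4].startswith("non-token data:"):
            continue  # skip blank, malformed, and "magic" lines
        try:
            spam, ham = int(fields[1]), int(fields[2])
        except ValueError:
            continue  # counts were not integers, so not a token line
        hist[spam + ham] += 1
    return hist


def expected_size_mb(ntokens):
    """Database size predicted by the 5 MB per 100k tokens rule of thumb."""
    return ntokens / 100_000 * MB_PER_100K_TOKENS


hist = count_histogram(SAMPLE.splitlines())
total = sum(hist.values())
for n in (1, 2, 3):
    share = 100.0 * hist[n] / total if total else 0.0
    print(f"{hist[n]:>8} tokens seen {n} time(s) ({share:.1f}%)")
print(f"{total:>8} tokens total, ~{expected_size_mb(total):.1f} MB expected")
```

To run it against a real database, one could replace SAMPLE.splitlines() with
sys.stdin and pipe the dump output into the script. By the same rule of
thumb, the 742753 tokens reported above would come to roughly 37 MB, so
44 MB is larger than expected but of the same order.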

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk