Alexander Litvinov wrote:
> 
> Today I dumped my bayes db and calculated some statistics.
> 
>  742753 - total number of words in it
>  515654 - total number of words which have been seen only once
>   80485 - ... twice
>   35325 - ... 3 times
> 
> These statistics show that most of the db is not used and is just eating
> my hard drive (44 MB total size). Is this normal?
> 

My dump magic output is similar -- I haven't gone through all of it, but
there are a lot of tokens with only a few occurrences.

44 MB seems large for 743k tokens.  The docs say to expect about 5 MB per
100k tokens.  If you want a smaller db, you might look at the Bayes expiry
settings in your configuration.
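
To put that rule of thumb into numbers, here is a quick sketch.  It
assumes the 5 MB per 100k tokens figure scales linearly, which the docs
may not promise; the token count is the ntokens value from the dump
quoted below, and the function name is made up for illustration.  By
this estimate the db would be roughly 37 MB, so 44 MB is somewhat over.

```python
# Rule of thumb quoted above: about 5 MB of db per 100k tokens.
# Assumption: the estimate scales linearly with the token count.
MB_PER_100K_TOKENS = 5


def expected_db_size_mb(ntokens: int) -> float:
    """Estimated bayes db size in MB for a given number of tokens."""
    return ntokens / 100_000 * MB_PER_100K_TOKENS


ntokens = 742_761    # "ntokens" from the dump magic output in this thread
actual_mb = 44       # db size reported in the original message

expected_mb = expected_db_size_mb(ntokens)
print(f"expected about {expected_mb:.1f} MB, actual {actual_mb} MB")
print(f"actual / expected = {actual_mb / expected_mb:.2f}")
```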

How did you calculate those statistics -- does sa-learn do this, or did
you code up something for sa-learn's output?
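
For reference, one way such figures could be produced is sketched
below.  This is a guess, not necessarily how the numbers above were
obtained.  It assumes the token dump prints one token per line in the
same five-column layout as the magic lines quoted below (probability,
spam count, ham count, atime, token), and that "seen N times" means
spam count plus ham count.  The sample lines and tokens are made up.

```python
from collections import Counter


def occurrence_histogram(lines):
    """Map total occurrences (spam + ham) to the number of tokens seen that often."""
    hist = Counter()
    for line in lines:
        # Assumed layout: prob, spam count, ham count, atime, token
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        _prob, nspam, nham, _atime, token = fields
        if token.startswith("non-token data:"):
            continue  # skip the "magic" bookkeeping rows
        try:
            hist[int(nspam) + int(nham)] += 1
        except ValueError:
            continue  # skip lines whose counts are not integers
    return hist


# Made-up sample input in the assumed layout, for illustration only.
sample = [
    "0.000          0          2          0  non-token data: bayes db version",
    "0.958          1          0 1071338011  viagra",
    "0.016          0          1 1071338011  meeting",
    "0.500          1          1 1071338011  hello",
    "0.987          3          0 1071338011  free",
]

hist = occurrence_histogram(sample)
print(f"{sum(hist.values()):8d} - total number of tokens")
for n in (1, 2, 3):
    print(f"{hist[n]:8d} - seen {n} time(s)")
```

To run it on a real db, the same function could be fed sys.stdin
instead of the sample list, with the output of sa-learn's data dump
piped in.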

Bryan

> > sa-learn --dump magic
> 0.000          0          2          0  non-token data: bayes db version
> 0.000          0       7711          0  non-token data: nspam
> 0.000          0      35161          0  non-token data: nham
> 0.000          0     742761          0  non-token data: ntokens
> 0.000          0 1062407317          0  non-token data: oldest atime
> 0.000          0 1071338468          0  non-token data: newest atime
> 0.000          0 1071338011          0  non-token data: last journal sync atime
> 0.000          0 1071310234          0  non-token data: last expiry atime
> 0.000          0          0          0  non-token data: last expire atime delta
> 0.000          0          0          0  non-token data: last expire reduction count
> 
> SA 2.61
> 

-- 
Nothing in the world has more potential for beauty than woman.  Nothing
has more potential to destroy it, than the world. - (Anonymous)

http://www.wecs.com/content.htm

This signature file is generated by Pick-a-Tag !
Written by Jeroen van Vaarsel
http://www.google.com/search?hl=en&ie=ISO-8859-1&q=pick-a-tag



_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
