On 2003/11/26, at 19:22, David B Funk wrote: ...snip...
Well, I'm sure there are a decent number of 2-byte tokens, as I receive a fair amount of mail in Japanese. Even so, I thought the database was a bit large.
Yes, that size seems way out of line. It should be using about 30~50 bytes per token, assuming typical token size. According to your 'non-token data: ntokens' that bayes_toks file should be using about 5~6 Mbytes; unless something is whacko, or you have some -very- large tokens in there.
One possibility, the "--dump magic" may be looking at a different set of files. Just to double-check do a "sa-learn -D --dump magic" to see which set of files it is looking at.
Dave
it appears to be looking at the right database...
sa-learn -D --dump magic
debug: Score set 0 chosen.
debug: running in taint mode? yes
debug: Running in taint mode, removing unsafe env vars, and resetting PATH
debug: PATH included '/usr/local/sbin', keeping.
debug: PATH included '/usr/sbin', keeping.
debug: PATH included '/sbin', keeping.
debug: PATH included '/usr/bin', keeping.
debug: PATH included '/bin', keeping.
debug: PATH included '/usr/sbin', keeping.
debug: PATH included '/sbin', keeping.
debug: PATH included '/usr/X11R6/bin', keeping.
debug: PATH included '/root/bin', which doesn't exist, dropping.
debug: Final PATH set to: /usr/local/sbin:/usr/sbin:/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/X11R6/bin
debug: using "/usr/share/spamassassin" for default rules dir
debug: using "/etc/mail/spamassassin" for site rules dir
debug: using "/root/.spamassassin/user_prefs" for user prefs file
debug: bayes: 17512 tie-ing to DB file R/O /etc/mail/spamassassin/bayes_toks
debug: bayes: 17512 tie-ing to DB file R/O /etc/mail/spamassassin/bayes_seen
debug: bayes: found bayes db version 2
debug: Score set 2 chosen.
debug: Initialising learner
0.000 0 2 0 non-token data: bayes db version
0.000 0 3254 0 non-token data: nspam
0.000 0 2638 0 non-token data: nham
0.000 0 121896 0 non-token data: ntokens
0.000 0 1066969971 0 non-token data: oldest atime
0.000 0 1069839375 0 non-token data: newest atime
0.000 0 1069839376 0 non-token data: last journal sync atime
0.000 0 1069735390 0 non-token data: last expiry atime
0.000 0 2764800 0 non-token data: last expire atime delta
0.000 0 38065 0 non-token data: last expire reduction count
debug: bayes: 17512 untie-ing
debug: bayes: 17512 untie-ing db_toks
debug: bayes: 17512 untie-ing db_seen
debug: bayes: 17512 untie-ing
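
For reference, the size estimate can be rechecked against the ntokens value in the dump above. This is only a rough sketch in Python: the 30~50 bytes per token range is the figure quoted earlier in the thread, and the helper names (parse_ntokens, estimate_db_bytes) are made up for illustration and are not part of SpamAssassin.

```python
# Rough size estimate for bayes_toks from "sa-learn --dump magic" output.
# Assumption: 30-50 bytes per token, as quoted in the thread. The real
# on-disk size also depends on the DB backend's own overhead.

def parse_ntokens(dump_text):
    """Pull the ntokens value out of the magic-token dump (hypothetical helper)."""
    for line in dump_text.splitlines():
        if line.rstrip().endswith("non-token data: ntokens"):
            # Fields: probability, spam count, value, atime, label...
            return int(line.split()[2])
    raise ValueError("no ntokens line found")

def estimate_db_bytes(ntokens, low_per_token=30, high_per_token=50):
    """Return (low, high) estimated size in bytes (hypothetical helper)."""
    return ntokens * low_per_token, ntokens * high_per_token

# Three lines copied from the dump above.
DUMP = """\
0.000 0 3254 0 non-token data: nspam
0.000 0 2638 0 non-token data: nham
0.000 0 121896 0 non-token data: ntokens
"""

ntokens = parse_ntokens(DUMP)
low, high = estimate_db_bytes(ntokens)
print(f"{ntokens} tokens: roughly {low / 1e6:.1f} to {high / 1e6:.1f} MB")
```

With the numbers above this comes out to somewhere between about 3.7 and 6.1 MB, so a bayes_toks file much larger than that would point to something other than the raw token count.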