My source files have:
Djs-ham  - 9000 records
Djs-spam - 22000 records
Newspam  - 9000 records
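To put these numbers in proportion, here is a minimal Python sketch that totals the records, gives the spam:ham ratio, and estimates the average size of a message in each file. It assumes each "record" is one message, and it takes the byte sizes from the ls -l listing below. The names `RECORDS`, `SIZES` and `summarize` are illustrative only. An average cannot reveal a handful of oversized messages; finding those would need a per-message pass over each mailbox.

```python
# Record counts per source file (assumption: one record == one message).
RECORDS = {"djs-ham": 9000, "djs-spam": 22000, "newspam": 9000}
# Byte sizes copied from the ls -l listing of the source files.
SIZES = {"djs-ham": 109323317, "djs-spam": 109557101, "newspam": 58835401}


def summarize(records, sizes, ham_files=("djs-ham",)):
    """Return (total records, spam:ham ratio, average KiB per record by file)."""
    ham = sum(records[name] for name in ham_files)
    spam = sum(count for name, count in records.items() if name not in ham_files)
    avg_kib = {name: sizes[name] / records[name] / 1024 for name in records}
    return ham + spam, spam / ham, avg_kib


if __name__ == "__main__":
    total, ratio, avg_kib = summarize(RECORDS, SIZES)
    print(f"total records: {total}")
    print(f"spam:ham ratio: {ratio:.2f}:1")
    for name, kib in avg_kib.items():
        print(f"{name}: ~{kib:.1f} KiB per record")
```

With these inputs it reports 40,000 records in total and a spam:ham ratio of about 3.44:1. The average sizes come to roughly 11.9 KiB for djs-ham, 4.9 KiB for djs-spam and 6.4 KiB for newspam.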
Here are the sizes of the source. Yes, they are big, but I wouldn't think 30,000 messages would be too much for Bayes. Is this not correct?

-rw-r--r--  1 filter filter 109323317 Sep 26 12:05 djs-ham
-rw-r--r--  1 filter filter 109557101 Sep 26 12:05 djs-spam
-rw-r--r--  1 filter filter  58835401 Oct  2 16:26 newspam

1. Do I need to feed it less spam to initialize?
2. Should I make sure there are no really big files in the ham?
3. Any other limitations you can think of?

<<Dan>>

| -----Original Message-----
| From: Theo Van Dinter [mailto:[EMAIL PROTECTED]
| Sent: Friday, October 03, 2003 8:39 AM
| To: Smart,Dan
| Cc: [EMAIL PROTECTED]
| Subject: Re: [SAtalk] 2.60 Upgrade - SpamD not using trained bayes database
|
| On Fri, Oct 03, 2003 at 08:33:06AM -0500, Smart,Dan wrote:
| > Could the database size have caused all my learned spam/ham to be
| > eliminated? I noticed that the database looks like:
| > -rw-rw-rw- 1 filter filter  4718592 Oct  3 08:21 .spamassassin_seen
| > -rw-rw-rw- 1 filter filter 58859520 Oct  3 08:21 .spamassassin_toks
| > Which is close to the 6MB mentioned in the sa-learn man page.
|
| There is no size limit on the DB from SA's end of things, but
| you're off by an order of magnitude. That says that seen is
| ~4.7MB and toks is ~58.9MB.
| Which means either you have a huge number of tokens, or an
| expiry hasn't run in a while.
|
| For comparison, my DBs are:
|
| -rw-------  1 felicity fame 21041152 Oct  3 09:35 bayes_seen
| -rw-------  1 felicity fame 10813440 Oct  3 09:35 bayes_toks
|
| 0.000          0          2          0  non-token data: bayes db version
| 0.000          0     205700          0  non-token data: nspam
| 0.000          0      41402          0  non-token data: nham
| 0.000          0     337763          0  non-token data: ntokens
| 0.000          0 1064570332          0  non-token data: oldest atime
| 0.000          0 1065188270          0  non-token data: newest atime
| 0.000          0 1065188071          0  non-token data: last journal sync atime
| 0.000          0 1065142091          0  non-token data: last expiry atime
| 0.000          0     571739          0  non-token data: last expire atime delta
| 0.000          0     247958          0  non-token data: last expire reduction count
|
| --
| Randomly Generated Tagline:
| Look, just gimme some inner peace, or I'll mop the floor with ya!
|               -- Homer Simpson
|                  El Viaje Misterioso de Nuestro Homer
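The atime fields in the dump are Unix epoch seconds, which are hard to read at a glance. Below is a minimal Python sketch that decodes the values quoted above and converts both sets of DB file sizes to decimal megabytes. The names `MAGIC`, `DB_SIZES`, `utc` and `days` are illustrative only. Reading "last expire atime delta" as the token-age cutoff used by the most recent expiry run is an assumption, not something stated in the thread.

```python
from datetime import datetime, timezone

# Selected values copied from Theo's quoted `sa-learn --dump` output.
MAGIC = {
    "nspam": 205700,
    "nham": 41402,
    "oldest atime": 1064570332,
    "newest atime": 1065188270,
    "last expiry atime": 1065142091,
    # Assumption: the age cutoff (in seconds) applied by the last expiry.
    "last expire atime delta": 571739,
}

# DB file sizes in bytes, from the two ls -l listings in this thread.
DB_SIZES = {
    "Dan .spamassassin_seen": 4718592,
    "Dan .spamassassin_toks": 58859520,
    "Theo bayes_seen": 21041152,
    "Theo bayes_toks": 10813440,
}


def utc(ts: int) -> str:
    """Render a Unix epoch value (seconds) as a UTC timestamp string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")


def days(seconds: int) -> float:
    """Convert a duration in seconds to days."""
    return seconds / 86400


if __name__ == "__main__":
    for key in ("oldest atime", "newest atime", "last expiry atime"):
        print(f"{key}: {utc(MAGIC[key])}")
    span = MAGIC["newest atime"] - MAGIC["oldest atime"]
    print(f"atime span in DB: {days(span):.2f} days")
    print(f"last expire atime delta: {days(MAGIC['last expire atime delta']):.2f} days")
    print(f"nspam:nham ratio: {MAGIC['nspam'] / MAGIC['nham']:.2f}:1")
    for name, size in DB_SIZES.items():
        print(f"{name}: {size / 1e6:.1f} MB")
```

The script reports an oldest atime of 2003-09-26 09:58:52 UTC and a newest atime of 2003-10-03 13:37:50 UTC. Every token in Theo's database was therefore last seen within about 7.15 days. The last expiry ran at 2003-10-03 00:48:11 UTC, and his trained corpus is about 4.97 spam per ham.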