My source files have:
Djs-ham - 9000 records
Djs-spam - 22000 records
Newspam - 9000 records

Here are the sizes of the source files.  Yes, they are big, but I wouldn't think
30,000 messages would be too huge for bayes.  Is this not correct?

-rw-r--r--    1 filter   filter   109323317 Sep 26 12:05 djs-ham
-rw-r--r--    1 filter   filter   109557101 Sep 26 12:05 djs-spam
-rw-r--r--    1 filter   filter   58835401 Oct  2 16:26 newspam

1. Do I need to feed it less spam to initialize?
2. Should I make sure there are no really big files in the ham?
3. Any other limitations you can think of?

<<Dan>> 

| -----Original Message-----
| From: Theo Van Dinter [mailto:[EMAIL PROTECTED] 
| Sent: Friday, October 03, 2003 8:39 AM
| To: Smart,Dan
| Cc: [EMAIL PROTECTED]
| Subject: Re: [SAtalk] 2.60 Upgrade - SpamD not using trained bayes database
| 
| On Fri, Oct 03, 2003 at 08:33:06AM -0500, Smart,Dan wrote:
| > Could the database size have caused all my learned spam/ham to be 
| > eliminated?  I noticed that the database looks like:
| > -rw-rw-rw-    1 filter   filter    4718592 Oct  3 08:21 .spamassassin_seen
| > -rw-rw-rw-    1 filter   filter   58859520 Oct  3 08:21 .spamassassin_toks
| > Which is close to the 6mb mentioned in the sa-learn man file.
| 
| There is no size limit on the DB from SA's end of things, but 
| you're off an order of magnitude.  That says that seen is 
| ~4.7MB and toks is ~58.9MB.
| Which means either you have a huge number of tokens, or an 
| expiry hasn't run in a while.  For comparison, my DBs are:
| 
| -rw-------    1 felicity fame     21041152 Oct  3 09:35 bayes_seen
| -rw-------    1 felicity fame     10813440 Oct  3 09:35 bayes_toks
| 
| 0.000          0          2          0  non-token data: bayes db version
| 0.000          0     205700          0  non-token data: nspam
| 0.000          0      41402          0  non-token data: nham
| 0.000          0     337763          0  non-token data: ntokens
| 0.000          0 1064570332          0  non-token data: oldest atime
| 0.000          0 1065188270          0  non-token data: newest atime
| 0.000          0 1065188071          0  non-token data: last journal sync atime
| 0.000          0 1065142091          0  non-token data: last expiry atime
| 0.000          0     571739          0  non-token data: last expire atime delta
| 0.000          0     247958          0  non-token data: last expire reduction count
| 
| --
| Randomly Generated Tagline:
| Look, just gimme some inner peace, or I'll mop the floor with ya!
|  
|               -- Homer Simpson
|                  El Viaje Misterioso de Nuestro Homer
| 
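To make the numbers in the quoted message easier to check, here is a rough Python sketch that converts the byte counts from the listings into megabytes and decodes the atime values. It assumes decimal megabytes (1 MB = 1,000,000 bytes) and that the atime fields are Unix epoch seconds. The helper names `to_mb` and `atime_to_utc` are made up for this illustration and are not part of SpamAssassin.

```python
from datetime import datetime, timezone


def to_mb(num_bytes: int) -> float:
    """Convert a byte count to decimal megabytes (assumes 1 MB = 1,000,000 bytes)."""
    return num_bytes / 1_000_000


def atime_to_utc(epoch_seconds: int) -> str:
    """Format an atime value as UTC, assuming it is Unix epoch seconds."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


# Byte counts copied from the two directory listings in the quoted message.
db_sizes = {
    ".spamassassin_seen": 4718592,
    ".spamassassin_toks": 58859520,
    "bayes_seen": 21041152,
    "bayes_toks": 10813440,
}

for name, size in db_sizes.items():
    print(f"{name}: {to_mb(size):.1f} MB")

# atime values copied from the non-token data lines of the dump output.
print("oldest atime:", atime_to_utc(1064570332))
print("newest atime:", atime_to_utc(1065188270))

# The expire atime delta is a duration in seconds; show it in days.
print(f"last expire atime delta: {571739 / 86400:.1f} days")
```

Under those assumptions the toks file works out to roughly 58.9 MB, about ten times the 6 MB figure, and the oldest and newest atimes land on 2003-09-26 and 2003-10-03 (UTC).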

