On Tue, Aug 26, 2003 at 11:21:46AM +0100, Martin Radford wrote:
> From my own collections:
>
>          with FQDN       with hostname only
> ham:     2331 (85.6%)    391 (14.4%)
> spam:    1925 (76%)      608 (24%)
>
> While I'm not very good with statistics, this rule doesn't look very
> good for distinguishing ham from spam.
But it does! SpamAssassin deals in statistics, and going by your numbers this rule multiplies the odds that a matching mail is spam by roughly 1.7 (24% of spam has only a hostname, against 14.4% of ham).

Until the spammers adjust themselves, of course. This kind of test will become useless in the future. Worse - perhaps they have already adjusted.

What we'd need is a tool that draws a graph with, on the vertical axis, the percentage of ham/spam that matches the rule and, on the horizontal axis, the date at which the mail was Received:.

I have kept *all* the spam that I received for as long as I can remember (many years). Unfortunately, I didn't keep all my normal mail :/ The bulk of the normal mail that I kept is of the type 'need to look at it later'. I.e., ham like "Hi! I wish you a merry X-mas" will not be found in my collection, and that skews the body statistics. It won't influence the header statistics though, I think.

One person might have a collection of, say, 6000 mails, but for this graph thing to work I think we'd need something like 60000 mails (5 years at 1000 mails per month). That should be doable: we only need to find 10 people with large collections covering the past 5 years. Getting large collections of spam shouldn't be too hard; you probably have them already, don't you? Typical mailing-list mail is not hard to get either, if you are only concerned with the body content.

-- 
Carlo Wood <[EMAIL PROTECTED]>

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
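
For what it's worth, here is a rough Python sketch of the arithmetic on the quoted counts and of the per-month tally that such a graph would need. The function names, the (raw_message, is_spam) input shape and the `rule` callable are all made up for illustration; none of this is SpamAssassin's own API, and it simply takes the date stamp from the topmost Received: header.

```python
from collections import defaultdict
from email import message_from_string
from email.utils import parsedate_to_datetime


def likelihood_ratio(spam_hits, spam_total, ham_hits, ham_total):
    """How many times more often the rule fires on spam than on ham."""
    return (spam_hits / spam_total) / (ham_hits / ham_total)


def received_month(raw_message):
    """Return 'YYYY-MM' from the topmost Received: header, or None."""
    msg = message_from_string(raw_message)
    received = msg.get("Received")  # first (topmost) Received: header
    if not received or ";" not in received:
        return None
    # The date stamp follows the last ';' in a Received: header.
    stamp = received.rsplit(";", 1)[1].strip()
    try:
        when = parsedate_to_datetime(stamp)
    except (TypeError, ValueError):
        return None  # unparsable date: skip this mail
    return when.strftime("%Y-%m")


def monthly_hit_rates(messages, rule):
    """Percentage of ham and of spam matching `rule`, per month.

    messages: iterable of (raw_message, is_spam) pairs (assumed shape).
    rule:     hypothetical callable, raw_message -> bool.
    Returns {month: {"ham": pct or None, "spam": pct or None}}.
    """
    # Per month and per class, keep [hits, total].
    counts = defaultdict(lambda: {"ham": [0, 0], "spam": [0, 0]})
    for raw, is_spam in messages:
        month = received_month(raw)
        if month is None:
            continue
        cell = counts[month]["spam" if is_spam else "ham"]
        cell[0] += bool(rule(raw))
        cell[1] += 1
    return {
        month: {
            kind: (100.0 * hits / total if total else None)
            for kind, (hits, total) in kinds.items()
        }
        for month, kinds in sorted(counts.items())
    }
```

On the quoted counts, likelihood_ratio(608, 2533, 391, 2722) comes out at about 1.67; the totals 2533 and 2722 are just the sums of the two columns. The per-month percentages from monthly_hit_rates could then be fed to whatever plotting tool is at hand, one line for ham and one for spam.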