On Tue, Aug 26, 2003 at 11:21:46AM +0100, Martin Radford wrote:
> From my own collections:
>   
>            with FQDN            with hostname only
> ham:      2331 (85.6%)             391 (14.4%)
> spam:     1925 (76%)               608 (24%)
> 
> While I'm not very good with statistics, this rule doesn't look very
> good for distinguishing ham from spam.

But it does!
spamassassin deals with statistics, and going by these numbers
a hostname-only mail is roughly 1.7 times as likely to turn up
among spam as among ham (24% versus 14.4%).
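
For what it's worth, the size of the shift can be checked straight
from the quoted counts.  A minimal Python sketch, assuming the four
numbers in the table are the whole corpus; the function name
rule_stats is made up for illustration:

```python
def rule_stats(ham_hit, ham_total, spam_hit, spam_total):
    """Return (hit rate in ham, hit rate in spam, likelihood ratio).

    A "hit" is a mail that matches the rule, here: hostname only.
    """
    p_ham = ham_hit / ham_total
    p_spam = spam_hit / spam_total
    return p_ham, p_spam, p_spam / p_ham


# Counts from the quoted table: 391 of 2331+391 ham, 608 of 1925+608 spam.
p_ham, p_spam, ratio = rule_stats(391, 2331 + 391, 608, 1925 + 608)
print(f"ham: {p_ham:.1%}  spam: {p_spam:.1%}  ratio: {ratio:.2f}")
```

A ratio above 1 means the rule points towards spam; how much that is
worth as a score is a separate question.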

Until the spammers adjust themselves, of course.

This kind of test will become useless in the future.
Worse - perhaps they have already adjusted themselves.

What we'd need is a tool that makes a graph with the percentage
of ham/spam matching the rule on the vertical axis and, on the
horizontal axis, the date at which the mail was Received:.
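
Something along these lines could produce the numbers for such a
graph.  This is only a sketch: it assumes each mail has already been
reduced to a (date string, is-spam flag, rule-hit flag) triple, and
monthly_hit_rate is a made-up name, not anything that ships with
spamassassin.  The sample data is invented, and plotting is left to
whatever tool you prefer.

```python
from collections import defaultdict
from email.utils import parsedate_to_datetime


def monthly_hit_rate(mails):
    """Map (year, month) -> {"ham": pct, "spam": pct} of mails hitting the rule.

    `mails` is an iterable of (date_string, is_spam, hit) tuples, where
    date_string is an RFC 2822 date such as the one at the end of a
    Received: header.  A percentage is None when a month has no mail
    of that kind.
    """
    # counts[(year, month)][kind] = [hits, total]
    counts = defaultdict(lambda: {"ham": [0, 0], "spam": [0, 0]})
    for date_string, is_spam, hit in mails:
        when = parsedate_to_datetime(date_string)
        cell = counts[(when.year, when.month)]["spam" if is_spam else "ham"]
        cell[0] += 1 if hit else 0
        cell[1] += 1
    result = {}
    for month in sorted(counts):
        result[month] = {
            kind: (100.0 * hits / total if total else None)
            for kind, (hits, total) in counts[month].items()
        }
    return result


# Tiny made-up sample, only to show the shape of the input and output.
sample = [
    ("Tue, 26 Aug 2003 11:21:46 +0100", True, True),
    ("Wed, 27 Aug 2003 09:00:00 +0100", True, False),
    ("Mon, 01 Sep 2003 08:30:00 +0100", False, False),
]
for month, pct in monthly_hit_rate(sample).items():
    print(month, pct)
```

If the spam line drops towards the ham line over the months, the
spammers have adjusted and the rule is no longer earning its score.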

I kept *all* the spam that I received since... as long as I
can remember (many years).  Unfortunately, I didn't keep
all normal mails :/  The bulk of the normal mails that I
kept are of the type 'Need to look at it later'.  I.e.,
ham like "Hi!  I wish you a merry X-mas" will not be found
in my collection, and that skews the body statistics.
It won't influence the header statistics though, I think.

I think that one person might have a collection of, say,
6000 mails - but for this graph thing to work I think
we'd need something like 60000 mails (that covers 5 years
at 1000 mails per month).  That should be doable: we only
need to find 10 people with large collections from the
past 5 years.

Getting large collections of spam shouldn't be too hard,
you probably have them already - don't you?

Typical mailing list mails are not hard to get either, if
you are only concerned with the body content.

-- 
Carlo Wood <[EMAIL PROTECTED]>


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
