Re: [SAtalk] Mail arrival time may be a criteria

Justin Mason Sun, 10 Aug 2003 00:38:50 -0700

Basically, you:

  1. extract the time-of-day from the Received header.

  2. normalize it to the nearest 2-hour mark.

  3. add that to the list of tokens in the Bayes message tokenizer.
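The three steps above can be sketched roughly as follows. This is an
illustration in Python, not SpamAssassin code: the function name
time_of_day_token and the "time-of-day:" token prefix are made up
here. It assumes the timestamp is whatever follows the last ';' in
the Received header, that the hour is used as written (no time zone
conversion), and that the 2-hour marks fall on odd hours, as the 0900
and 2300 rows in the table further down suggest.

```python
from email.utils import parsedate_to_datetime


def time_of_day_token(received_header):
    """Turn one Received header into a time-of-day token, or None.

    Hypothetical helper: the name and the token format are
    assumptions made for this sketch.
    """
    # Step 1: the timestamp is the part after the last ';'.
    _, sep, date_part = received_header.rpartition(";")
    if not sep:
        return None
    try:
        dt = parsedate_to_datetime(date_part.strip())
    except (TypeError, ValueError):
        # Unparseable date: emit no token rather than guess.
        return None

    # Step 2: map each 2-hour window (00:00-01:59, 02:00-03:59, ...)
    # to the odd hour at its centre (0100, 0300, ..., 2300).
    bucket = (dt.hour // 2) * 2 + 1

    # Step 3: the string the tokenizer would append to its token list.
    return "time-of-day:%02d00" % bucket


header = ("from mail.example.org by mx.example.net with ESMTP "
          "id abc123; Sun, 10 Aug 2003 00:38:50 -0700")
print(time_of_day_token(header))
```

Which Received header to use (normally the one added by your own
server) and whether to convert to a common time zone are left open.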

The Bayesian learner then lets you "train" on your local mail
collection, which you've classified into ham and spam piles.
Internally, it builds a database of counts like this:

        time-of-day     ham-count       spam-count
        0900            435             122
        ....
        2300            10              943

From those numbers it then produces a probability that a mail is
spam.  That probability is combined with the other Bayes token
probabilities and affects the overall result.
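As a rough illustration of how counts like those become a
probability, here is a sketch in Python. It uses a Graham-style
frequency ratio with Robinson's smoothing and a naive product to
combine tokens; that is one common scheme, not necessarily what
SpamAssassin does internally. The function names, the defaults s and
x, and the corpus totals of 5000 ham and 5000 spam are all
assumptions made for the example.

```python
def token_spam_prob(ham_count, spam_count, total_ham, total_spam,
                    s=1.0, x=0.5):
    """Estimate P(spam | token) from training counts.

    total_ham and total_spam are the numbers of trained messages and
    must be positive.  s (strength) and x (prior for unseen tokens)
    are Robinson's smoothing parameters; the defaults are assumptions.
    """
    ham_freq = ham_count / total_ham
    spam_freq = spam_count / total_spam
    if ham_freq + spam_freq == 0:
        return x  # token never seen: fall back to the prior
    p = spam_freq / (ham_freq + spam_freq)
    n = ham_count + spam_count
    # Pull p toward x when there are only a few observations.
    return (s * x + n * p) / (s + n)


def combine(probs):
    """Naive combination of per-token probabilities.

    Assumes every value lies strictly between 0 and 1.
    """
    prod_spam = 1.0
    prod_ham = 1.0
    for p in probs:
        prod_spam *= p
        prod_ham *= 1.0 - p
    return prod_spam / (prod_spam + prod_ham)


# The two rows from the table, with hypothetical corpus totals.
p_0900 = token_spam_prob(435, 122, total_ham=5000, total_spam=5000)
p_2300 = token_spam_prob(10, 943, total_ham=5000, total_spam=5000)
print(p_0900, p_2300)
```

With equal-sized corpora the 0900 token leans towards ham and the
2300 token strongly towards spam, which is what the raw counts
suggest.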

That's all it takes to get the time of day included in the results.

Or at least to get it tested, with a 10-fold cross-validation run.
Note that the SpamBayes folks tried it, with dubious results:

  http://mail.python.org/pipermail/spambayes/2002-October/001334.html
  http://mail.python.org/pipermail/spambayes/2002-October/001344.html
  http://mail.python.org/pipermail/spambayes/2002-October/001348.html
  http://mail.python.org/pipermail/spambayes/2002-October/001369.html

(unfortunately the graphs got scrubbed.)
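For reference, the fold splitting behind a 10-fold cross-validation
run can be sketched like this in Python. The helper k_fold_splits is
hypothetical; the idea would be to train on each "train" part and
score the matching "test" part, once with the time-of-day token and
once without, then compare error rates across the ten folds.

```python
def k_fold_splits(items, k=10):
    """Yield (train, test) pairs for k-fold cross-validation.

    Every item lands in exactly one test fold.  Shuffle the list
    first if the corpus is sorted by date or by class.
    """
    # Deal the items round-robin into k folds.
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [m for j, fold in enumerate(folds) if j != i
                 for m in fold]
        yield train, test


messages = ["msg%02d" % n for n in range(25)]  # stand-in corpus
for train, test in k_fold_splits(messages):
    print(len(train), len(test))
```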

--j.


