Ross Vandegrift said the following on 19/11/02 14:17:
They are not being accurate when they call it Bayesian. It is, at best, naive Bayesian.
On Tue, Nov 19, 2002 at 09:39:13AM +0000, Matt Sergeant wrote:
The spammers have. An even better way they've found is to include a snippet from a legit mailing list, but put it in a white-text-on-white-background box. This was discussed on the spambayes mailing list.
Now, I am not a statistician but I am a mathematician. If my understanding of Bayesian statistics is correct (and if people are actually being accurate when they call this method Bayesian), this shouldn't matter at all - that's the beauty of the process.
See now I did two years of a Maths degree, and you've already gone way over my head :-)
If the Bayesian analysis actually takes into account the joint and conditional densities of word frequency, and it has a reasonable way to assign an expectation to them (i.e., if the corpus is seeded with real non-spam and real spam), the fact that a spam has been seeded with real words should show up in the joint and conditional frequency analysis. This would allow the filter to assign a spam score, though perhaps with lower confidence.
What does "joint and conditional frequency analysis" mean?
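For anyone following the thread, here is a rough sketch of why the "naive" part matters. It is not SpamAssassin or spambayes code; the corpus, the function names (`train`, `spam_probability`) and the smoothing constants are all invented for illustration. A naive Bayes filter treats each word as independent evidence and multiplies the per-word ratios together, so it never looks at which words occur together (the joint frequencies). That is why padding a spam with legitimate text can drag the score down:

```python
import math
from collections import Counter

def train(messages):
    """Count, per token, how many messages contain it (toy model)."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts, len(messages)

def spam_probability(text, spam_model, ham_model):
    """Naive Bayes estimate of P(spam | tokens present in text).

    Assumes tokens are independent given the class, so each token
    contributes its own likelihood ratio and co-occurrence is ignored.
    """
    spam_counts, n_spam = spam_model
    ham_counts, n_ham = ham_model
    log_odds = math.log(n_spam / n_ham)  # prior odds from corpus sizes
    for token in set(text.lower().split()):
        # Add-one smoothing (an assumption) so unseen tokens do not
        # force the product to zero.
        p_tok_spam = (spam_counts[token] + 1) / (n_spam + 2)
        p_tok_ham = (ham_counts[token] + 1) / (n_ham + 2)
        log_odds += math.log(p_tok_spam / p_tok_ham)
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical toy corpus, made up for this example.
spam_model = train([
    "buy cheap pills now",
    "cheap pills free offer",
    "free offer click now",
])
ham_model = train([
    "patch attached for the parser bug",
    "the parser test fails on perl",
    "please review the attached patch",
])

plain = "cheap pills free offer"
padded = plain + " please review the attached parser patch"

plain_score = spam_probability(plain, spam_model, ham_model)
padded_score = spam_probability(padded, spam_model, ham_model)
print(f"plain:  {plain_score:.3f}")
print(f"padded: {padded_score:.3f}")
```

On this toy data the padded message scores well below the plain one, because every legitimate word counts as separate evidence for non-spam. A model that looked at joint frequencies could in principle notice that "pills" and "parser patch" rarely appear in the same message, which seems to be the point Ross is making.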
_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk