Bayes duplicate message detection algorithm?

David B Funk Fri, 13 May 2016 10:45:07 -0700

What algorithm does Bayes use to detect that it has already 'seen' a given message?

When I receive a bolus (say 40~60) of 'phish' messages from a compromised
Hotmail/Gmail/Yahoo account which are mostly the same (same body, many of the
same headers; only the recipients, Message-ID, Date, and a few Received
headers differ) and I feed all of them to Bayes, it learns only about 10% of
them; the other 90% are ignored as 'already seen'.


So how does Bayes decide that it has 'already seen' a given message when
it actually hasn't (it has only seen one that is -almost- identical)?

--
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{

