When I receive a bolus (say 40 to 60) of 'phish' messages from a compromised Hotmail/Gmail/Yahoo account that are mostly the same (same body, many identical headers;
only the recipients, Message-ID, Date, and a few Received headers differ) and I feed all of them to Bayes, it learns only about 10% of them. The other 90% are ignored as 'already seen'.
So how does Bayes decide that it has 'already seen' a given message when it actually hasn't (it has only seen one that is -almost- identical)?

-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{