Re: I have some bad news

Marc Perkel Wed, 17 Aug 2016 11:03:06 -0700

For what it's worth I have noticed that people who are familiar with Bayesian filtering seem to have a mental block when it comes to understanding this. People who know nothing about Bayesian get it instantly. Here's the actual formula.


card(Test_message intersect Spam diff Ham) minus card(Test_message intersect Ham diff Spam)
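
A minimal sketch of the formula above using Python sets. It assumes each of Test_message, Spam and Ham is a set of tokens (fingerprints); the function name, the sample tokens, and the reading of a positive score as "leans spam" are illustrative assumptions, not taken from the actual filter.

```python
def fingerprint_score(message_tokens, spam_tokens, ham_tokens):
    """card(T intersect Spam diff Ham) minus card(T intersect Ham diff Spam)."""
    spam_only = spam_tokens - ham_tokens   # seen in spam, never in ham
    ham_only = ham_tokens - spam_tokens    # seen in ham, never in spam
    return len(message_tokens & spam_only) - len(message_tokens & ham_only)


# Hypothetical corpora and test message, for illustration only.
spam = {"viagra", "click", "link", "pictures"}
ham = {"lunch", "meeting", "link"}
message = {"click", "link", "lunch", "today"}

# "click" is spam-only (+1), "lunch" is ham-only (-1),
# "link" is in both sets and "today" is in neither, so both are ignored.
print(fingerprint_score(message, spam, ham))
```

Note that tokens present on both sides drop out entirely, which is the "NOT matching on the other side" part.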




On 08/17/16 09:16, Shawn Bakhtiar wrote:

On Aug 17, 2016, at 3:43 AM, Matus UHLAR - fantomas <uh...@fantomas.sk> wrote:
On 16.08.16 20:06, Marc Perkel wrote:
What I'm doing is looking for fingerprints in email that intersect HAM and not in SPAM - which would be a HAM result.
If it matches SPAM and does NOT match HAM - then it's SPAM.

The magic is in the NOT matching on the other side.
so, if mail matches both hammy and spammy tokens (or token sets), you don't classify at all?
I guess what is confusing me (and I imagine others, as alluded to by Matus) is the fact that you are describing a special condition of Bayes' probability theorem. You are testing two variables (match SPAM and match HAM) (not matching is simply the negation of matching) thus giving you four conditions:
1) SPAM && HAM
2) SPAM && ~HAM
3) ~SPAM && HAM
4) ~SPAM && ~HAM

Here is a great diagram to show the four probable conditions:
https://en.wikipedia.org/wiki/Bayes%27_theorem#/media/File:Bayes%27_Theorem_2D.svg
So (if I am correct) Matus is asking what if condition 1 is true? How are you classifying an email then? That is often the state of most emails, which is why Naive Bayes spam filtering is used: it generates a probability based on Bayes' probability theorem and is the conventional methodology to date. A Rose by any other name....
Condition 4 is obvious: it's nothing you have ever seen, so classifying it as anything other than HAM would be a huge mistake (IMHO), and it is fully covered by the aforementioned theorem, as the probability of SPAM would (should) be 0. Same with Condition 3: obviously it never hits SPAM, so whether it matches HAM or not you're going to treat it as HAM anyway, same as condition 4.
That leaves condition 2. Which (if I'm not mistaken) is "... it matches SPAM and does NOT match HAM - then it's SPAM.". Which brings us back to Matus's question: what if the email contains a single HAM token? Two HAM tokens? This is exactly what Bayes' probability theorem is designed for. All you are doing is defining a special condition in which the HAM probability is ZERO.
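
To make the four conditions concrete, here is a rough sketch of the decision table as I read it. The function name and the "UNDECIDED" label are placeholders of mine; what actually happens in condition 1 is exactly the open question, and treating condition 4 as HAM is my own opinion from above, not something confirmed for the filter being described.

```python
def classify(matches_spam: bool, matches_ham: bool) -> str:
    """Map the two match flags onto the four conditions listed above."""
    if matches_spam and matches_ham:
        # Condition 1: hits on both sides. Outcome unknown; placeholder label.
        return "UNDECIDED"
    if matches_spam and not matches_ham:
        # Condition 2: matches SPAM and does NOT match HAM.
        return "SPAM"
    # Conditions 3 and 4: no SPAM match at all, so treat as HAM
    # (assumption for condition 4, where nothing matched).
    return "HAM"


for spam_hit in (True, False):
    for ham_hit in (True, False):
        print(spam_hit, ham_hit, classify(spam_hit, ham_hit))
```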
I think that's where I need to understand a bit more about what HAM means in this solution: does getting a hit on HAM somehow negate it being SPAM completely? In other words, if the email contains some set of tokens that are SPAM, yet only one HAM token, does that single HAM token make it not SPAM? If so, you have a long way to go in convincing me that this is a good solution.
So if I say to you, "Let's get some lunch" that's ham because spammers never say that, but normal people do. So the way to test what "spammers never say" is to store what they do say and see if it's NOT in the list. (Thus the infinite set)
Actually I get SPAM with that very set of tokens in it. If somehow the HAM rating of it overrides the SPAM, I don't believe it would have a desirable effect.
I get plenty of:

"
Hay Shawn,
Hope you have time to do some lunch, click on this link and check out my new pictures!
Wannabe Phisher
"
Based on your example there are plenty of HAM and SPAM tokens in there. "Click on this link" has a high probability of SPAM-e-ness; would it get HAMed based on "hope you have time to do lunch"? Or am I missing something?
Similarly, there's only so many ways to misspell viagra, and good email wouldn't have it spelled wrong.
Does that make sense?
Again, what you are saying makes sense in that it is a special condition of the probability theory. What does not make sense is why you would not simply use the probability theory, which already encompasses that condition?
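
As a rough illustration of the "special condition" point, here is a toy, unsmoothed Naive Bayes calculation with equal priors. The counts, corpus sizes and function name are made up for the example; real filters smooth their probabilities and differ in many details.

```python
def spam_probability(tokens, spam_counts, ham_counts, n_spam, n_ham):
    """Unsmoothed naive Bayes estimate of P(spam | tokens), equal priors."""
    p_spam, p_ham = 0.5, 0.5
    for token in tokens:
        if token not in spam_counts and token not in ham_counts:
            continue  # never seen anywhere: carries no evidence
        p_spam *= spam_counts.get(token, 0) / n_spam
        p_ham *= ham_counts.get(token, 0) / n_ham
    total = p_spam + p_ham
    # Both sides zero means spam-only and ham-only tokens were mixed;
    # without smoothing the estimate is undefined.
    return None if total == 0 else p_spam / total


# Hypothetical training counts: messages containing each token.
SPAM_COUNTS = {"click": 8, "link": 6}
HAM_COUNTS = {"lunch": 5, "link": 2}
N_SPAM, N_HAM = 10, 10

print(spam_probability(["click"], SPAM_COUNTS, HAM_COUNTS, N_SPAM, N_HAM))  # spam-only token
print(spam_probability(["lunch"], SPAM_COUNTS, HAM_COUNTS, N_SPAM, N_HAM))  # ham-only token
print(spam_probability(["link"], SPAM_COUNTS, HAM_COUNTS, N_SPAM, N_HAM))   # token seen in both
print(spam_probability(["click", "lunch"], SPAM_COUNTS, HAM_COUNTS, N_SPAM, N_HAM))  # mixed
```

A token seen only on one side pins the estimate to 1.0 or 0.0, which is the zero-HAM-probability case; a token seen on both sides gives something in between, and the mixed case is where an unsmoothed model has no answer.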
--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Linux - It's now safe to turn on your computer.
Linux - Teraz mozete pocitac bez obav zapnut.


--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400
