On Wed, 2016-09-28 at 13:29 +0000, Nicola Piazzi wrote:
> a plugin that check similar words in oldest messages (for example 3 of
> 4 words match)
> 
> Then plugin check if sender domain is different and recipient is
> different

<snip>

>         Detection routine
>
>         A mail arrive
>         
>         Subject is : FedEx Shipment 702193383647 Notification
>         
>         I search in maillog table for a regex that MATCH FedEx
>         Shipment 702193383647 Notification ALSO IN FedEx Shipment
>         722566383641 Notification AND IN FedEx Shipment 734563383644
>         Notification
>         
>         If it match I verify that FROM DOMAIN IS DIFFERENT
>         And then I verify that TO ADDRESS IS DIFFERENT
>
>         Now I need a regex sintax to put all extracted words of PHRASE
>         FedEx Shipment 734563383644 Notification and match if it found
>         at least 3 of 4 words


I'm also not clear on exactly what you intend, but this sounds
reminiscent of Marc Perkel's "evolution filter" (which I'm not sure
anyone fully understands).  What I've gathered from the discussion is
that it is token-based like Bayes, using multi-word (and
partial-word/string?) tokens, and adds other data and metadata as
tokens (data from headers, e.g. your From: and To: domains).  It tosses
out results that aren't confident (that is, not nearly 100% ham or
spam), and it uses Redis sets for the set logic/operations.  If you are
creating a plugin for these phishing emails, it may be an avenue to
pursue; it sounds like it works quite well (when trained with a large
ham/spam corpus).
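
As for the "at least 3 of 4 words" test itself, a single regex may not
be the easiest tool; comparing word sets is probably simpler.  Below is
a minimal sketch in Python, only to illustrate the logic (an actual
SpamAssassin plugin would be written in Perl).  All function names, the
0.75 threshold, and the dict layout standing in for your maillog rows
are assumptions of mine, not anything from your setup or from Marc's
filter.

```python
import re


def tokens(subject):
    """Return the lower-cased word tokens of a subject line."""
    return re.findall(r"[a-z0-9]+", subject.lower())


def similar_subject(new, old, min_ratio=0.75):
    """True if at least min_ratio of the distinct words in `new` occur in `old`.

    With a 4-word subject and min_ratio=0.75 this is the "3 of 4 words" rule.
    """
    new_words = set(tokens(new))
    if not new_words:
        return False
    shared = new_words & set(tokens(old))
    return len(shared) / len(new_words) >= min_ratio


def domain(addr):
    """Return the lower-cased domain part of an email address."""
    return addr.rsplit("@", 1)[-1].lower()


def looks_like_campaign(msg, history, min_ratio=0.75):
    """Check a new message against older ones.

    `msg` and each entry of `history` are dicts with 'subject', 'from'
    and 'to' keys (a hypothetical stand-in for rows of the maillog table).
    A hit requires a similar subject, a different sender domain and a
    different recipient address.
    """
    for old in history:
        if (similar_subject(msg["subject"], old["subject"], min_ratio)
                and domain(msg["from"]) != domain(old["from"])
                and msg["to"].lower() != old["to"].lower()):
            return True
    return False
```

With your example, "FedEx Shipment 702193383647 Notification" shares
three of its four words with "FedEx Shipment 722566383641 Notification",
so similar_subject() returns True at the 0.75 threshold.  In practice
you would likely want the candidate rows pre-filtered in SQL before
doing this comparison, since scanning the whole maillog table per
message would be slow.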


-- 
Jesse Norell
Kentec Communications, Inc.
970-522-8107  -  www.kci.net
