Hi,

On Sun, Apr 3, 2016 at 2:56 PM, Martin Gregorie <mar...@gregorie.org> wrote:
> OK, I've analysed this a bit further. I did some searching, using the 4
> word phrase which, unless I'm totally confused, is the one I've picked
> out of your piece of spam and added into my fake invoice detection
> ruleset. Just now I ran some searches on my ham and spam collections.
> Here are my stats for its usage:
>
> - The phrase occurs only once in my collection of 1016 spams
> - It occurs in 7 of the 173394 messages in my mail archive
>
> Of the archive content:
> - Only two of the 7 hits have any attachments.
> - One has a single MS Word .DOC attachment
> - The other has a number of attachments. One is an .HTML
>   version of the plain text content and the others are a mix
>   of .JPG and .GIF images, i.e. it's a typical business message
>   to a customer.
>
> None of these file extensions appear in my dangerous attachments rule.
> Maybe .DOC should be included, but it isn't and I simply don't remember
> if MSWord supported macros back then (2004).
>
> Obviously this approach needs some care when you're choosing phrases to
> include in rules, but if the phrase matching rules are combined with
> something not directly connected by using an AND meta then the FP rate
> can be gratifyingly low.

Okay, I can appreciate that. However, I still feel like it's just another body text rule, and with the smallest modification to the wording it becomes ineffective. It's also always chasing something after the fact.

I also wouldn't expect that exact phrase to hit very many times in your archive, because there are just so many possible variations. I only said it was common language, not that it's frequent.

I was hoping to consider the broader issue of PDF spam, particularly messages that do not necessarily contain a virus that can be detected by clamav+sanesecurity. The idea of only a link in the document was new to me.

Fake invoices, with or without viruses, are a real issue for us. Do you have any rules for your fake invoice detection (perhaps pseudocode?) that you'd like to share?

I'm hesitant to emphasize the value in his comments, but Bill's advice has thus far been the best. Surprised that I missed that in the first place.

Thanks,
Alex
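
P.S. To make the question concrete, here is roughly how I read the "phrase rule combined with something unrelated through an AND meta" idea. This is only a Python sketch of the logic, not SpamAssassin rule syntax and not your actual ruleset. The phrase is a placeholder (I'm not quoting the real one), and the extension list and function names are assumptions I made up for illustration.

```python
import re

# Placeholder phrase: the real four-word phrase is not quoted in this thread.
PHRASE_RE = re.compile(r"\bplaceholder\s+four\s+word\s+phrase\b", re.IGNORECASE)

# Hypothetical "dangerous attachment" extensions, not the actual list.
DANGEROUS_EXTENSIONS = (".exe", ".js", ".scr", ".zip")


def has_phrase(body):
    """Sub-rule 1: the body contains the suspicious phrase."""
    return PHRASE_RE.search(body) is not None


def has_dangerous_attachment(filenames):
    """Sub-rule 2: at least one attachment has a listed extension."""
    return any(name.lower().endswith(DANGEROUS_EXTENSIONS) for name in filenames)


def fake_invoice_meta(body, filenames):
    """AND meta: fire only when both independent sub-rules hit."""
    return has_phrase(body) and has_dangerous_attachment(filenames)
```

With this shape, a message like the two archive hits you describe (phrase present, but only .DOC, .HTML, .JPG or .GIF attached) would not fire, which I take to be the point about the low FP rate. It still misses any rewording of the phrase, which is my concern above.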