Re: More text/plain questions

Amir 'CG' Caspi Wed, 23 Jul 2014 10:47:08 -0700

On 2014-07-02 15:04, Amir Caspi wrote:

For what it's worth, I just received a spam that basically is the same
as what Philip complained about.  I've posted a spample here:


http://pastebin.com/Y2YGwL49

[...]

I'm wondering if we shouldn't write a rule looking for lots of
&#x0[0-9]{3}; patterns... say, 500 of them in one email.  Or, would we
expect legitimate emails to have these?

So, to follow up on this... over the past couple of weeks I've been getting a lot more FNs than normal, and almost every single one of these is an "encoded character" spam like the example above. Bayes training does appear to work, in that many of these FNs are already at BAYES_999... but there aren't enough other rules hit to cause the FNs to cross the 5.0 threshold. (Other, similar spams do cross the threshold, usually due to RAZOR and/or PYZOR hits.)

Since these are basically Unicode character encodings, is there a move to translate all charsets to UTF-8 (or some other fixed standard) before applying body and/or URI rules? That would, presumably, help with trying to catch these.
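As a rough illustration of that idea (written in Python, not taken from SpamAssassin's own Perl code), decoding the numeric character references before matching lets an ordinary body pattern see the underlying text. `BODY_RULE`, `normalize_body` and `body_rule_hits` are hypothetical names; `html.unescape` is the Python standard-library decoder. This only helps where the decoded text is what the rules are looking for; decoding to lookalike characters would still need rules written for those characters.

```python
import html
import re

# Hypothetical body rule: an ordinary phrase match that would never
# fire against the raw, entity-encoded text.
BODY_RULE = re.compile(r"\bfree\s+offer\b", re.IGNORECASE)

def normalize_body(raw: str) -> str:
    """Decode numeric character references such as &#x0066; into the
    Unicode characters they name, so rules see the actual text."""
    return html.unescape(raw)

def body_rule_hits(raw: str) -> bool:
    """Apply the (hypothetical) body rule to the normalized text."""
    return BODY_RULE.search(normalize_body(raw)) is not None

if __name__ == "__main__":
    encoded = "&#x0066;&#x0072;&#x0065;&#x0065; offer"
    print(normalize_body(encoded))
    print(body_rule_hits(encoded))
```

Matching against the raw string misses the phrase entirely, which is essentially the situation described above for rules applied before any such normalization.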

I'm definitely considering writing a rule to catch &#x0[0-9]{3}; patterns. I'm worried it could cause FPs, though: are there common circumstances where legitimate emails would include dozens to hundreds of these? (The latest FNs only include a few dozen, not the hundreds seen in the spample above.)
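A minimal sketch of the counting logic such a rule would need, in Python for illustration. The regex is the one proposed above; `count_encoded_chars`, `looks_encoded` and the default threshold of 500 (the figure floated in the earlier message) are assumptions to be tuned against ham. Note that the pattern only matches references whose last three characters are decimal digits, so something like `&#x044f;` is not counted. In SpamAssassin itself this would presumably become a body or rawbody rule using `tflags multiple` to count repeated hits; check the Mail::SpamAssassin::Conf documentation for the exact syntax.

```python
import re

# Pattern proposed in the thread: "&#x0" followed by three decimal
# digits and a semicolon. Hex letters a-f are deliberately not matched.
ENCODED_CHAR = re.compile(r"&#x0[0-9]{3};")

def count_encoded_chars(body: str) -> int:
    """Count non-overlapping encoded-character references in a body."""
    return len(ENCODED_CHAR.findall(body))

def looks_encoded(body: str, threshold: int = 500) -> bool:
    """Flag a body containing at least `threshold` references.
    The default of 500 is an assumption, not a tested value."""
    return count_encoded_chars(body) >= threshold

if __name__ == "__main__":
    sample = "&#x0431;" * 40 + " hello"
    print(count_encoded_chars(sample))
    print(looks_encoded(sample))
    print(looks_encoded(sample, threshold=30))
```

With a few dozen references per message, as in the latest FNs, a threshold of 500 would not fire, so the cutoff is the main trade-off between catching these and risking FPs.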

Otherwise, I'm not sure what "template" rule I could write to catch these things, and they're increasing in frequency (with more and more being missed as FNs).


Thanks.

-- Amir
