On 9/19/2013 3:09 PM, Jay Sekora wrote:
> On 09/16/2013 10:12 AM, Kevin A. McGrail wrote:
>> Anyone have some examples of rules designed to catch words by content in
>> UTF-8 encoded messages? I'm doing some work on improving this.
> Are you trying to match UTF-8 encoded messages as a stream of bytes,
> or are you using normalize_charset? (And if the latter, how is it
> working for you? I asked on this list a while back whether the advice
> I'd seen that normalize_charset is dangerous resource-wise was still
> valid, and didn't get any replies.)
> I guess I don't have anything to offer other than that I really want
> to see what you come up with, too. :-)
Right now, I'm just having trouble putting the final nail in the coffin
of spam that uses UTF-8 encoded From and Subject headers.
For example:
From: "=?utf-8?B?RNGWcmVjdCDOknV5?=" <wholes...@wholesalefirst-munged.co>
Subject: =?utf-8?B?VG9wIM6ScmFuZHMgQXQgV2hvbGVzYWxlIM6hctGWY9GWbmc=?=
As of yet, I'm not using normalize_charset; I'm still researching what
hits these best. Most of them still look REALLY spammy from a pathway
analysis, though.
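
For anyone following along, here is a minimal Python sketch (not a
SpamAssassin rule) that decodes the two example headers above and lists
which scripts their letters come from. The helper names
decode_mime_header and scripts_used are made up for illustration, and
taking the first word of the Unicode character name as the "script" is a
rough heuristic, not a real script lookup.

```python
import unicodedata
from email.header import decode_header


def decode_mime_header(raw):
    """Decode RFC 2047 encoded words in a header value to a plain str."""
    parts = []
    for chunk, charset in decode_header(raw):
        if isinstance(chunk, bytes):
            # Unencoded chunks come back with charset None; treat them as ASCII.
            parts.append(chunk.decode(charset or "ascii", errors="replace"))
        else:
            parts.append(chunk)
    return "".join(parts)


def scripts_used(text):
    """Heuristic: first word of each letter's Unicode name (LATIN, GREEK, ...)."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if ch.isalpha()
    }


if __name__ == "__main__":
    headers = [
        '"=?utf-8?B?RNGWcmVjdCDOknV5?=" <wholes...@wholesalefirst-munged.co>',
        "=?utf-8?B?VG9wIM6ScmFuZHMgQXQgV2hvbGVzYWxlIM6hctGWY9GWbmc=?=",
    ]
    for raw in headers:
        decoded = decode_mime_header(raw)
        # More than one script in a short header is a hint of lookalike letters.
        print(sorted(scripts_used(decoded)), ascii(decoded))
```

On the sample Subject this reports CYRILLIC, GREEK and LATIN: the
apparent "B", "P" and "i" are really Greek Beta (U+0392), Greek Rho
(U+03A1) and Cyrillic i (U+0456). Legitimate mail can mix scripts too,
so this is one signal to weigh, not a verdict.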