On 9/19/2013 3:09 PM, Jay Sekora wrote:
On 09/16/2013 10:12 AM, Kevin A. McGrail wrote:
Anyone have some examples of rules designed to catch words by content in
UTF-8 encoded messages?  I'm doing some work on improving this.

Are you trying to match UTF-8 encoded messages as a stream of bytes, or are you using normalize_charset? (And if the latter, how is it working for you? I asked on this list a while back whether the advice I'd seen that normalize_charset is dangerous resource-wise was still valid, and didn't get any replies.)

I guess I don't have anything to offer other than that I really want to see what you come up with, too. :-)

Right now, I'm just having trouble putting a nail in the coffin of spam that uses UTF-8 encoded From and Subject headers.

For example:

From: "=?utf-8?B?RNGWcmVjdCDOknV5?=" <wholes...@wholesalefirst-munged.co>
Subject: =?utf-8?B?VG9wIM6ScmFuZHMgQXQgV2hvbGVzYWxlIM6hctGWY9GWbmc=?=
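To see what those two encoded words actually contain, here is a minimal Python sketch (not a SpamAssassin rule) that decodes them with the standard library and lists the words mixing more than one alphabet. The helper names `decode_mime_words`, `scripts_in`, and `mixed_script_words` are made up for illustration, and taking the first word of each character's Unicode name as its "script" is a rough assumption rather than a real script lookup.

```python
import unicodedata
from email.header import decode_header, make_header


def decode_mime_words(raw: str) -> str:
    """Decode RFC 2047 encoded words (=?charset?B?...?=) into a str."""
    return str(make_header(decode_header(raw)))


def scripts_in(word: str) -> set:
    """Rough guess at the scripts used by the letters of a word.

    Assumption: the first token of the Unicode character name
    (LATIN, GREEK, CYRILLIC, ...) stands in for the script.
    """
    return {unicodedata.name(ch).split()[0] for ch in word if ch.isalpha()}


def mixed_script_words(text: str) -> list:
    """Return the whitespace-separated words that mix two or more scripts."""
    return [w for w in text.split() if len(scripts_in(w)) > 1]


# The encoded words from the example headers above.
display_name = decode_mime_words("=?utf-8?B?RNGWcmVjdCDOknV5?=")
subject = decode_mime_words(
    "=?utf-8?B?VG9wIM6ScmFuZHMgQXQgV2hvbGVzYWxlIM6hctGWY9GWbmc=?="
)

for label, text in (("From name", display_name), ("Subject", subject)):
    print(label, "->", text)
    for word in mixed_script_words(text):
        # Show which scripts each suspicious word combines.
        print("   mixed:", word, sorted(scripts_in(word)))
```

The decoded text reads like "Direct Buy" and "Top Brands At Wholesale Pricing", but the "i" is Cyrillic U+0456, the "B" is Greek U+0392, and the "P" is Greek U+03A1, so a plain ASCII match on "Brands" or "Pricing" never fires.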

I'm not using normalize_charset yet; I'm still researching what hits these best. Most of them still look REALLY spammy from a pathway analysis, though.
