On 9/25/2013 11:15 PM, Karsten Bräckelmann wrote:
On Fri, 2013-09-20 at 14:20 -0400, Kevin A. McGrail wrote:
Anyone have some examples of rules designed to catch words by content in
UTF-8 encoded messages? I'm doing some work on improving this.
Right now, I'm just having problems with really p
On 9/20/2013 2:30 PM, David F. Skoll wrote:
You won't like my answer, but...
You really *have* to normalize everything to Unicode (possibly using UTF-8
as the canonical on-disk format) before trying to apply rules or extract
Bayes tokens. Then you can do nice things like blocking CJK spams
with
On Fri, 2013-09-20 at 14:20 -0400, Kevin A. McGrail wrote:
> > > Anyone have some examples of rules designed to catch words by content in
> > > UTF-8 encoded messages? I'm doing some work on improving this.
> Right now, I'm just having problems with really putting a nail in the
> coffin of spams
On 9/19/2013 3:09 PM, Jay Sekora wrote:
On 09/16/2013 10:12 AM, Kevin A. McGrail wrote:
Anyone have some examples of rules designed to catch words by content in
UTF-8 encoded messages? I'm doing some work on improving this.
Are you trying to match UTF-8 encoded messages as a stream of bytes,
On Fri, 20 Sep 2013 14:20:58 -0400
"Kevin A. McGrail" wrote:
> As of yet, I'm not using normalize_charset and researching what hits
> things the best.
You won't like my answer, but...
You really *have* to normalize everything to Unicode (possibly using UTF-8
as the canonical on-disk format) be
On 09/16/2013 10:12 AM, Kevin A. McGrail wrote:
Anyone have some examples of rules designed to catch words by content in
UTF-8 encoded messages? I'm doing some work on improving this.
Are you trying to match UTF-8 encoded messages as a stream of bytes, or
are you using normalize_charset? (An
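
To make the byte-stream versus normalized-text distinction raised in this thread concrete, here is a minimal sketch in Python. It is not SpamAssassin code and does not reproduce what normalize_charset does internally; the names normalize_text, WORD_RULE, BYTE_RULE, and sample, and the choice of NFKC, are assumptions made purely for illustration.

```python
import re
import unicodedata


def normalize_text(raw: bytes, charset: str = "utf-8") -> str:
    """Decode raw message bytes to Unicode, then apply NFKC normalization.

    Hypothetical helper: NFKC is an illustrative choice, not something
    prescribed in the thread.
    """
    text = raw.decode(charset, errors="replace")
    return unicodedata.normalize("NFKC", text)


# Hypothetical character-level rule: the word "café", case-insensitive.
WORD_RULE = re.compile(r"\bcafé\b", re.IGNORECASE)

# The same word written as a byte-level rule over precomposed UTF-8.
BYTE_RULE = re.compile("café".encode("utf-8"))

# Sample body that spells the word in upper case with a combining accent
# (E followed by U+0301) instead of the precomposed character.
sample = "Visit our CAFE\u0301 today".encode("utf-8")

byte_hit = BYTE_RULE.search(sample) is not None
text_hit = WORD_RULE.search(normalize_text(sample)) is not None
print("byte rule hit:", byte_hit)
print("normalized rule hit:", text_hit)
```

In this sketch the byte-level rule misses the sample because the accented letter is encoded as a different byte sequence, while the rule applied after decoding and normalization matches it. That is the practical argument for normalizing before applying rules or extracting tokens.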