Re: spam in foreign characters

2012-08-21 Thread Niamh Holding
Hello Darxus, Tuesday, August 21, 2012, 8:42:33 PM, you wrote:
dcc> match all Chinese email if that's what you want
mimeheader NH_CHINESE Content-Type =~ /charset="?gb2312/i
score NH_CHINESE 2.5
describe NH_CHINESE Chinese character s
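To see what the pattern in the NH_CHINESE rule would and would not catch, here is a minimal Python sketch that applies the same regular expression to the Content-Type header of each MIME part. It only illustrates the regex, not how SpamAssassin evaluates `mimeheader` rules internally; the function name `declares_gb2312` and both sample messages are made up for the example.

```python
import re
from email import message_from_string

# Same pattern as in the quoted NH_CHINESE rule, case-insensitive.
GB2312_CHARSET = re.compile(r'charset="?gb2312', re.IGNORECASE)


def declares_gb2312(raw_message: str) -> bool:
    """Return True if any MIME part's Content-Type header declares gb2312.

    Hypothetical helper for illustration only.
    """
    msg = message_from_string(raw_message)
    # walk() yields the message itself and then every nested MIME part.
    for part in msg.walk():
        content_type = part.get("Content-Type", "")
        if GB2312_CHARSET.search(content_type):
            return True
    return False


# Invented sample messages (not taken from the thread).
SAMPLE_GB2312 = (
    "From: sender@example.com\n"
    "Subject: test\n"
    'Content-Type: text/plain; charset="GB2312"\n'
    "\n"
    "body\n"
)
SAMPLE_UTF8 = (
    "From: sender@example.com\n"
    "Subject: test\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "body\n"
)

if __name__ == "__main__":
    print(declares_gb2312(SAMPLE_GB2312))
    print(declares_gb2312(SAMPLE_UTF8))
```

Note that the pattern keys on the declared charset only: a message containing Chinese text but declared as UTF-8 would not match it.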

Re: Very spammy messages yield BAYES_00 (-1.9)

2012-08-21 Thread Axb
On 08/21/2012 11:51 PM, Ben Johnson wrote: On 8/21/2012 5:19 PM, John Hardin wrote: On Tue, 21 Aug 2012, Ben Johnson wrote: Aug 21 13:08:33.729 [23714] dbg: bayes: tie-ing to DB file R/O /var/lib/amavis/.spamassassin/bayes_toks ---8<-- # sa-learn --username=amavis --dump magic Run

Re: Very spammy messages yield BAYES_00 (-1.9)

2012-08-21 Thread Ben Johnson
On 8/21/2012 5:19 PM, John Hardin wrote: > On Tue, 21 Aug 2012, Ben Johnson wrote: > >> Aug 21 13:08:33.729 [23714] dbg: bayes: tie-ing to DB file R/O >> /var/lib/amavis/.spamassassin/bayes_toks >> >> ---8<-- >> # sa-learn --username=amavis --dump magic > > Run that with --debug and ver

Re: Very spammy messages yield BAYES_00 (-1.9)

2012-08-21 Thread Jonas Eckerman
On 2012-08-15 20:56, Ben Johnson wrote: On 8/15/2012 2:24 PM, John Hardin wrote: You may also want to set up some mechanism for users to submit misclassified messages for training. That sounds like a good idea. [...] this server runs Ubuntu 10.04 with Dovecot Since you're using Dovecot you

Re: Very spammy messages yield BAYES_00 (-1.9)

2012-08-21 Thread Axb
On 08/21/2012 11:19 PM, John Hardin wrote: On Tue, 21 Aug 2012, Ben Johnson wrote: Aug 21 13:08:33.729 [23714] dbg: bayes: tie-ing to DB file R/O /var/lib/amavis/.spamassassin/bayes_toks ---8<-- # sa-learn --username=amavis --dump magic Run that with --debug and verify that the filen

Re: Very spammy messages yield BAYES_00 (-1.9)

2012-08-21 Thread John Hardin
On Tue, 21 Aug 2012, Ben Johnson wrote: Aug 21 13:08:33.729 [23714] dbg: bayes: tie-ing to DB file R/O /var/lib/amavis/.spamassassin/bayes_toks ---8<-- # sa-learn --username=amavis --dump magic Run that with --debug and verify that the filenames match. -- John Hardin KA7OHZ
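One way to do the filename comparison suggested here is to pull the database path out of the `tie-ing to DB file` debug line from each run and compare the two. The sketch below assumes the debug line format shown in the quoted log; accepting `R/W` in addition to `R/O` is an assumption, and the second path in the example is hypothetical.

```python
import re
from typing import Optional

# Matches the debug line quoted in the thread. Only the "R/O" form appears
# there; accepting "R/W" as well is an assumption.
TIE_LINE = re.compile(r"dbg: bayes: tie-ing to DB file R/[OW] (\S+)")


def bayes_db_path(debug_output: str) -> Optional[str]:
    """Return the first Bayes DB path found in debug output, or None."""
    for line in debug_output.splitlines():
        match = TIE_LINE.search(line)
        if match:
            return match.group(1)
    return None


# Line taken from the thread.
SCANNER_LOG = (
    "Aug 21 13:08:33.729 [23714] dbg: bayes: tie-ing to DB file R/O "
    "/var/lib/amavis/.spamassassin/bayes_toks"
)
# Hypothetical line standing in for the output of a debug run of sa-learn.
LEARNER_LOG = (
    "Aug 21 13:10:00.000 [23999] dbg: bayes: tie-ing to DB file R/O "
    "/root/.spamassassin/bayes_toks"
)

if __name__ == "__main__":
    scanner = bayes_db_path(SCANNER_LOG)
    learner = bayes_db_path(LEARNER_LOG)
    if scanner == learner:
        print("same Bayes DB:", scanner)
    else:
        print("different Bayes DBs:", scanner, "vs", learner)
```

If the two paths differ, training and scanning are working against different databases.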

Re: spam in foreign characters

2012-08-21 Thread John Hardin
On Tue, 21 Aug 2012, Adam Moffett wrote: One of our users definitely emails with Chinese vendors. I'm sure they correspond in English, but I'm guessing the Chinese folks might have Chinese characters in their signature line or some such. Consider Bayes. I have trained my Bayes with Chinese-

Re: Very spammy messages yield BAYES_00 (-1.9)

2012-08-21 Thread Ben Johnson
On 8/20/2012 2:47 PM, Ben Johnson wrote: > I was able to resolve the issue by adding the --username switch to the > 'sa-learn' executable: > > # sa-learn --username=amavis --spam > /var/vmail/example.com/trainer/Maildir/.INBOX.Spam/cur > > Thanks for all of the hints, folks! So, I've been traini

Re: spam in foreign characters

2012-08-21 Thread Adam Moffett
I think I'd have to read Chinese to tackle that accurately. So, you should probably try using ok_locales, and if it doesn't work, create your own rules to match these spams, if you can find good common patterns that don't seem likely to match non-spams (or match all Chinese email if that's what

Re: spam in foreign characters

2012-08-21 Thread Adam Moffett
Awesome, thanks for the tip. Any guess how this affects messages with mixed character sets? One of our users definitely emails with Chinese vendors. I'm sure they correspond in English, but I'm guessing the Chinese folks might have Chinese characters in their signature line or some such. T

Re: spam in foreign characters

2012-08-21 Thread darxus
SpamAssassin has an ok_locales setting that lets you specify which languages you want to accept. But it has problems: https://issues.apache.org/SpamAssassin/show_bug.cgi?id=4078 I don't believe anybody has created rules to match these kinds of spams. A big part of the problem is lacking ex

spam in foreign characters

2012-08-21 Thread Adam Moffett
I have a user who seems to get 4-5 messages per day with Chinese characters in the subject and body. They come from a variety of domains and IPs, so I guess she somehow got onto a list used to spam Chinese-speaking people. If I paste them into Google Translate they seem to be roughly the sam