On Sat, 2008-11-01 at 22:54 +0000, Martin Gregorie wrote:
> On Sat, 2008-11-01 at 23:19 +0100, Karsten Bräckelmann wrote:
>
> > Yes, there is. Your MUA, Evolution, features pre-formatted paragraphs in
> > the Composer. But I don't feel like repeating myself today.
>
> [...] I must remember to use it selectively to prevent line wrapping.

It's most handy for code snippets, config and logs slightly exceeding
the default line-wrapping width. But I digress...


> > > describe MG_CASINO Casino gambling
> > > body __MG_CAS1 /(csnaio|casino)/i
> > > header __MG_CAS2 Subject =~ /casino/i
> > > header __MG_CAS3 From =~ /casino/i
> > > body __MG_CAS4 /(\$[0-9]+|[0-9]+ *euro|gold|real deal|invite.*play)/i
> > > meta MG_CASINO ((__MG_CAS1||__MG_CAS2||__MG_CAS3)&&__MG_CAS4)
> > > score MG_CASINO 2.0
> >
> > Hmm, it might be worth for local rules, to score at least a few of
> > them on sight with a low score, yet keeping them in the meta. (Yes,
> > single word rules are generally bad, but scoring a From header that
> > contains specific words might help catch these.) I'd enforce word
> > breaks, though.
>
> ...and reduce the meta score to compensate?

Well, that's up to you. ;)  The score is rather arbitrary, so you can
use whatever you feel comfortable with. Reducing the meta score to
compensate indeed might be good.

My thought was to partially split up the score, in case the meta doesn't
match. I guess the word "casino" in either the Subject or (even
stronger) From header might be worth at least 0.2 or something on its
own.

One note I missed earlier, regarding the quantifiers: Using unbounded
quantifiers can and will be expensive. Wherever possible you should use
bounds. So, rather than /.*/, using /.{0,20}/ with a suitable upper
bound will prevent the RE from backtracking across an entire mail. The
same goes for any occurrence of the + quantifier, of course.


> Has the Perl regex syntax changed since Perl4? If it has I think I need
> to get another Perl book before venturing away from the simple subset
> I'm comfortable with.

Yes, it did change -- not positive about Perl 4, but I guess it's mostly
additions to the RE syntax. In particular, a "simple subset" likely
should still be valid.
You can find more info than you ever want here:
  http://perldoc.perl.org/perlre.html

Assuming this was due to recommending word boundaries (see Regular
Expressions / Assertions in perlre), here's a rewritten From matching
rule:

  header __MG_CAS3 From =~ /\bcasino\b/i


> > This one would have been flagged as spam when using the default
> > required_score spam threshold of 5.0.
>
> I'm thinking about reducing that back to the default. I initially set it
> higher while finding out how to use SA.

I see. Something to keep in mind when pondering if it's actually worth
the effort of writing custom rules -- it might not be, if you're going
to use the default anyway.


> > Also, I notice you're apparently
> > not using Bayes, which likely could raise the score above your 6.0
> > threshold, when trained on these.
>
> Not entirely. Its enabled but I'm only using auto-learn with default
> thresholds. However its probably not doing much at present because I
> recently reset it by deleting the bayes database.

Ah, so that's why it didn't show up -- since you dropped your Bayes DB,
SA hasn't learned enough ham and spam mail yet (200 each by default).
You should bootstrap and do some initial learning with existing ham and
spam respectively.

Also, as you can see in this example, you specifically should train
low-scoring and missed spam after the initial training. SA did not
auto-learn this one, because it scored way below the auto-learn
threshold(s).


> > On my check the sample also scored 0.8 for SPF_HELO_SOFTFAIL. Plus
> > Pyzor, which is not enabled by default unless you install Pyzor.
>
> Noted.

Pyzor is more complicated to set up and heavy-weight. The missing
SPF_HELO_SOFTFAIL though likely is simply because you don't have the
Perl Mail::SPF module installed. If you install it, the rule should
start working out-of-the-box.


> > Oh, and then I got a custom rule worth 0.5 for any single Relay, direct
> > client to MX mail.
>
> Nope, I'm not seeing that one.

That's because it is a custom rule on my setup.
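
To pull the suggestions in this mail together, here is an untested
sketch of how the rule set might look with word breaks, bounded
quantifiers and part of the score split off to the header rules. The
names MG_CAS_SUBJ and MG_CAS_FROM, the upper bounds and the 0.2 / 1.6
scores are made up for illustration, so adjust them to taste:

```
# Unscored sub-rules (the leading __ keeps them from scoring on their own)
body     __MG_CAS1    /\b(?:csnaio|casino)\b/i
body     __MG_CAS4    /(?:\$[0-9]{1,9}|\b[0-9]{1,9} {0,3}euro|\bgold\b|\breal deal\b|\binvite\b.{0,40}\bplay\b)/i

# Header hits, scored on sight with a low score (illustrative values)
header   MG_CAS_SUBJ  Subject =~ /\bcasino\b/i
describe MG_CAS_SUBJ  Subject mentions casino
score    MG_CAS_SUBJ  0.2

header   MG_CAS_FROM  From =~ /\bcasino\b/i
describe MG_CAS_FROM  From header mentions casino
score    MG_CAS_FROM  0.2

# The meta still combines everything; its score is reduced to compensate
meta     MG_CASINO    ((__MG_CAS1 || MG_CAS_SUBJ || MG_CAS_FROM) && __MG_CAS4)
describe MG_CASINO    Casino gambling
score    MG_CASINO    1.6
```

Note that the word breaks change what matches (for example "golden" no
longer hits the "gold" alternative), so check the result against your
own samples. Running "spamassassin --lint" after editing will catch
syntax errors.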
:)

  guenther


-- 
char *t="[EMAIL PROTECTED]";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}