One thing to try, for your particular situation.  
This rule could match in some strange base-64
encoded files, but it's extremely unlikely -- I ran it through
my spam corpus, and it hit 7 lines out of 260 megabytes, so
you should be OK:

body     GENETICS_DATA                 /([ACGT]{3,}[CGT][ACGT]?\s*){3,}/
describe GENETICS_DATA                 A, C, T, G, who do we appreciate?
score    GENETICS_DATA -5
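If you want to see what the pattern catches before dropping it into your config, here's a quick sketch in Python. It assumes Python's re module treats this simple pattern the way SpamAssassin's Perl regexes do, and that you test the text one line at a time. That's only an approximation of the corpus check above, not how SpamAssassin itself runs body rules. count_matching_lines and the sample lines are made up for illustration:

```python
import re

# Body pattern from the GENETICS_DATA rule, copied verbatim.
# Assumption: Python's re handles this simple pattern the same way Perl does.
GENETICS_DATA = re.compile(r"([ACGT]{3,}[CGT][ACGT]?\s*){3,}")


def count_matching_lines(lines):
    """Hypothetical helper: count lines on which the pattern matches anywhere."""
    return sum(1 for line in lines if GENETICS_DATA.search(line))


# Made-up sample lines, not real mail.
samples = [
    "GATTACAGGCTTAGCCATGCAT",            # unbroken sequence data
    "ACGTTGCA GGTCCATG TTGACCGA",        # sequence split up by spaces
    "Hello, this is an ordinary line.",  # normal prose
    "SGVsbG8sIHdvcmxkIQ==",              # typical base-64 text
]

for line in samples:
    print(bool(GENETICS_DATA.search(line)), line)

print(count_matching_lines(samples), "of", len(samples), "lines matched")
```

Pointed at a real mailbox, count_matching_lines(open(path, errors="replace")) would give a rough count of matching lines, where path is whatever file you want to check.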

The rule, unfortunately, will
match a long line of C, G, or T -- but will not match a line of all As.  It
should be possible to craft it a bit better, but I believe doing so
would make the regexp really slow.
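Here's a small check of that quirk, again in Python and assuming it matches the Perl behaviour. Each repeated chunk of the pattern needs a C, G, or T after at least three bases. So twelve or more Cs, Gs, or Ts in a row will trigger it, but a run made only of A never can. One side effect is that base-64 for a run of zero bytes, which comes out as all As, won't trigger it either:

```python
import base64
import re

# Same GENETICS_DATA body pattern as in the rule.
GENETICS_DATA = re.compile(r"([ACGT]{3,}[CGT][ACGT]?\s*){3,}")

# Every chunk must contain a C, G or T after at least three bases,
# so three chunks need at least 12 characters of C, G or T runs.
cases = {
    "run of C": "C" * 20,
    "run of T": "T" * 20,
    "run of A": "A" * 20,
    # 30 zero bytes encode to 40 'A' characters in base-64.
    "base-64 zeros": base64.b64encode(bytes(30)).decode("ascii"),
}

for label, text in cases.items():
    print(f"{label}: {bool(GENETICS_DATA.search(text))}")
```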

I wouldn't recommend this rule for general consumption, obviously, but
if you're in the habit of getting genetics data...

  -Dave

Geoff Gibbs just mooed:
> David G. Andersen wrote:
> 
> > > > anyone else seeing false-positives more often with 2.11?
> > > 
> > > Yes, I have had to roll back to 2.01.
> > 
> > A bit of a suggestion, since you're seeing false positives in a highly
> > specific domain.  I've been creating word-frequency-based whitelists
> > from various mailing lists I'm on (alas, little genetics talk).
> > But I've had great success matching networking-geek-specific
> > terms, and would think the same approach would prove quite fruitful
> > for genetics-specific terms.  Spammers, happily, don't often say
> > adenosine. :-)
> 
> That is an interesting suggestion, although most of the false positives
> were not related to genetics-specific terms. Solid blocks of ACGT do
> trigger the whole line of shouting, but an empty Subject should
> not trigger Subject is all in capitals. An e-mail with a base-64
> attachment should not count as spam with no other trigger.
> I also had one e-mail that triggered the ascii form and whole line
> of shouting, in which I cannot see a whole line of shouting. I have
> not yet had time to work out what triggered the form; it is
> not obvious to a beginner (me).

-- 
work: [EMAIL PROTECTED]                          me:  [EMAIL PROTECTED]
      MIT Laboratory for Computer Science           http://www.angio.net/

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
