Ahh, heck. Here's a better one for all of the geneticists on the list (one of them? :-):
/\b([ACGT]{1,}\s*[CGT]\s*[ACGT]{1,}\s*){3,}\b/

The addition of the word boundary test also avoids all of the false
matches from my corpus. It requires the sequence to be at least 9 bases
long and to contain at least 3 non-A bases.

-Dave

David G. Andersen just mooed:
> One thing to try, for your particular situation.
> This rule could match in some strange base-64
> encoded files, but it's extremely unlikely -- I ran it through
> my spam corpus, and it hit 7 lines out of 260 megabytes, so
> you should be OK:
>
> body GENETICS_DATA /([ACGT]{3,}[CGT][ACGT]?\s*){3,}/
> describe GENETICS_DATA A, C, T, G, who do we appreciate?
> score GENETICS_DATA -5
>
> The rule, unfortunately, will
> match a long line of C,G, or T -- but will not match all As. It
> should be possible to craft it a bit better, but to do so, I believe,
> would make the regexp really slow.
>
> I wouldn't recommend this rule for general consumption, obviously, but
> if you're in the habit of getting genetics data...
>
> -Dave
>
> Geoff Gibbs just mooed:
> > David G. Andersen wrote:
> >
> > > > > anyone else seeing false-positives more often with 2.11?
> > > >
> > > > Yes, I have had to roll back to 2.01.
> > >
> > > A bit of a suggestion, since you're seeing false positives in a highly
> > > specific domain. I've been creating word-frequency-based whitelists
> > > from various mailing lists I'm on (alas, little genetics talk).
> > > But I've found great success on matching networking-geek specific
> > > terms, and would think the same approach would prove quite fruitful
> > > for genetics specific terms. Spammers, happily, don't often say
> > > adenosine. :-)
> >
> > That is an interesting suggestion, although most of the false positives
> > were not related to genetic specific terms. Solid blocks of ACGT do
> > trigger the whole line of shouting, but an empty Subject should
> > not trigger Subject is all in capitals. An e-mail with a base-64
> > attachment should not count as spam with no other trigger.
> >
> > I also had one e-mail that triggered the ascii form and whole line
> > of shouting, where I cannot see a whole line of shouting and I have
> > not yet had time to work out what triggered the form, but it is
> > not obvious to the beginner (me).
>
> --
> work: [EMAIL PROTECTED] me: [EMAIL PROTECTED]
> MIT Laboratory for Computer Science http://www.angio.net/
>
> _______________________________________________
> Spamassassin-talk mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
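For anyone who wants to check the improved regex before swapping it into the body GENETICS_DATA rule in place of the earlier pattern, here is a minimal Python sketch. It assumes Python's re module treats \b, \s* and {n,} the same way Perl does for plain ASCII text like this. The names OLD_RULE, NEW_RULE and matches() are made up for illustration. SpamAssassin itself runs the Perl regex, not this code.

```python
import re

# Earlier rule from the quoted message (body GENETICS_DATA).
OLD_RULE = re.compile(r"([ACGT]{3,}[CGT][ACGT]?\s*){3,}")

# Improved rule. Each group is [ACGT]+ [CGT] [ACGT]+, i.e. at least
# 3 bases with a C, G or T somewhere in the middle, so {3,} groups
# mean at least 9 bases and at least 3 non-A bases. The \b anchors
# stop it from matching inside longer "words" such as base-64 text.
NEW_RULE = re.compile(r"\b([ACGT]{1,}\s*[CGT]\s*[ACGT]{1,}\s*){3,}\b")


def matches(rule: re.Pattern, line: str) -> bool:
    """Return True if the rule matches anywhere in the line."""
    return rule.search(line) is not None


if __name__ == "__main__":
    samples = [
        "GATTACA GATTACA GATTACA",  # spaced-out sequence
        "ACAAAAACA",                # 9 bases, but only 2 non-A
        "xxCCCCCCCCCCCCxx",         # run of Cs buried inside a longer word
        "AAAAAAAAAAAA",             # all As
    ]
    for s in samples:
        print(f"{s!r}: old={matches(OLD_RULE, s)} new={matches(NEW_RULE, s)}")
```

A run of Cs buried in a longer word, such as a chunk of base-64, still trips the old rule but not the new one. The reason is that \b cannot match between two word characters.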