David G. Andersen wrote:

> > body     GENETICS_DATA                 /([ACGT]{3,}[CGT][ACGT]?\s*){3,}/
> > describe GENETICS_DATA                 A, C, T, G, who do we appreciate?
> > score    GENETICS_DATA -5

> Ahh, heck.  Here's a better one for all of the geneticists
> on the list (one of them? :-):
> 
> /\b([ACGT]{1,}\s*[CGT]\s*[ACGT]{1,}\s*){3,}\b/
> 
> The addition of the word boundary test also avoids all of the 
> false matches from my corpus.  Requires that the sequence
> be at least 9 bps, and have at least 3 non-A components.

Yes, thank you, that seems to be happy with a number of formats
(but you probably knew that anyway ;-)
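
For anyone who wants to try the revised pattern on their own data before
putting it in a rule, here is a minimal sketch in Python. It assumes the
Perl pattern behaves the same under Python's re module (it uses only
character classes, \s, \b and counted repeats), and the helper name
looks_like_sequence is made up for illustration; it is not part of
SpamAssassin.

```python
import re

# Python port of the revised pattern quoted above (an assumption: SpamAssassin
# itself evaluates body rules with Perl's regex engine, not Python's).
DNA_RUN = re.compile(r"\b([ACGT]{1,}\s*[CGT]\s*[ACGT]{1,}\s*){3,}\b")


def looks_like_sequence(text: str) -> bool:
    """Return True if the text contains a run the pattern accepts.

    Each repetition of the group needs at least three bases with a non-A
    base in the middle, and the group must repeat at least three times,
    which gives the 9 bp / 3 non-A minimum described in the quoted message.
    """
    return DNA_RUN.search(text) is not None


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from any real corpus.
    samples = [
        "ACGTACGTACGT",            # plain run of bases
        "ACGTACGTAC GTACGTACGT",   # blocks separated by whitespace
        "AAAAAAAAAAAA",            # no non-A base at all
        "ACGTAC",                  # shorter than 9 bases
        "The CAT sat on the mat",  # ordinary prose
        "XACGTACGTACGTX",          # no word boundary around the run
    ]
    for s in samples:
        print(f"{s!r}: {looks_like_sequence(s)}")
```

Note the match is case-sensitive as written, so lower-case sequence
dumps would need a separate pattern or a case-insensitive flag.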

All the best

Geoff

UK-Human Genome Mapping Project-Resource Centre,
Hinxton, Cambridge, CB10 1SB, UK
Tel: +44 1223 494530  Fax: +44 1223 494512  E-mail: [EMAIL PROTECTED]


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk