David G. Andersen wrote:
> > body GENETICS_DATA /([ACGT]{3,}[CGT][ACGT]?\s*){3,}/
> > describe GENETICS_DATA A, C, T, G, who do we appreciate?
> > score GENETICS_DATA -5
>
> Ahh, heck. Here's a better one for all of the geneticists
> on the list (one of them? :-):
>
> /\b([ACGT]{1,}\s*[CGT]\s*[ACGT]{1,}\s*){3,}\b/
>
> The addition of the word boundary test also avoids all of the
> false matches from my corpus. Requires that the sequence
> be at least 9 bps, and have at least 3 non-A components.

Yes, thank you, that seems to be happy with a number of formats
(but you probably knew that anyway ;-)

All the best

Geoff

UK-Human Genome Mapping Project-Resource Centre,
Hinxton, Cambridge, CB10 1SB, UK
Tel: +44 1223 494530  Fax: +44 1223 494512
E-mail: [EMAIL PROTECTED]

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
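To see roughly what the word-boundary version of the rule changes, here is a minimal Python sketch that runs both GENETICS_DATA patterns from this thread over a few made-up strings. It assumes Python's `re` module handles `\b`, `\s` and `{n,}` the same way Perl does for these plain ASCII inputs, and it ignores whatever preprocessing SpamAssassin applies to message bodies before body rules run. The helper `fires` and the sample strings are hypothetical illustrations, not part of SpamAssassin.

```python
import re

# The two GENETICS_DATA patterns from this thread, transcribed into Python's
# re syntax. Assumption: these constructs behave as they do in Perl.
FIRST_RULE = re.compile(r"([ACGT]{3,}[CGT][ACGT]?\s*){3,}")
REVISED_RULE = re.compile(r"\b([ACGT]{1,}\s*[CGT]\s*[ACGT]{1,}\s*){3,}\b")


def fires(rule: re.Pattern, text: str) -> bool:
    """Hypothetical helper: True if the pattern matches anywhere in text."""
    return rule.search(text) is not None


# Hypothetical sample strings, not taken from a real mail corpus.
samples = [
    "GATTACAGATTACA",             # 14 bases in a single word
    "GATTACA GATTACA",            # same bases split by whitespace
    "ACGTACG",                    # only 7 bases, below the 9-base minimum
    "AAAAAAAAAAAA",               # no C, G or T at all
    "refGATTACAGATTACAGATTACA9",  # bases glued inside a longer token
]

for text in samples:
    print(f"{text!r:30} first={fires(FIRST_RULE, text)!s:5} "
          f"revised={fires(REVISED_RULE, text)}")
```

On these inputs the revised rule should fire on the plain and the whitespace-split sequences and stay quiet on the 7-base string, the all-A run, and the token with other letters attached at both ends. The first rule, which has no `\b`, still matches that last token, which is the kind of false match the word boundary test is meant to avoid.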