It sounds, then, like what you essentially want to match is:
four "word" characters
followed by
one or more "S"
followed by
four "word" characters
which could be represented like so:
@peptides = $sequence =~ /(\w{4}S+\w{4})/g;
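As a quick check, here is a minimal sketch of what that pattern returns for the sample sequence from the original message. The variable declarations and the print loop are only for illustration:

```perl
use strict;
use warnings;

# Sample sequence taken from the original message.
my $sequence = 'ADFGTREDSWQACVDFRSSSGHYT';

# S+ treats a run of adjacent S's as one unit, so each run yields one match.
my @peptides = $sequence =~ /(\w{4}S+\w{4})/g;

print "$_\n" for @peptides;
# prints:
# TREDSWQAC
# VDFRSSSGHYT
```

Note that the run SSS comes back as a single, longer window rather than as one nine-letter window per S.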
The one flaw in this approach is that if you have a string like so:
ADSFSREDSS (which may or may not happen, I don't know anything about
peptides :))
Then only the second S will be matched as the central S, because the first
one does not have four characters before it, and the four characters after
the second one will swallow the next S, leaving only a single trailing S
that cannot match on its own.
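If you need one window per S, including overlapping ones, one possible approach is to put the capture inside a zero-width lookahead. This is a different technique from the pattern above, and I have only tried it against the sample sequence, so treat it as a sketch rather than a finished solution:

```perl
use strict;
use warnings;

# Sample sequence taken from the original message.
my $sequence = 'ADFGTREDSWQACVDFRSSSGHYT';

# The lookahead (?=...) consumes no characters, so the next attempt starts
# one character further on and windows may overlap. The inner parentheses
# still capture the nine-letter window.
my @peptides = $sequence =~ /(?=(\w{4}S\w{4}))/g;

print "$_\n" for @peptides;
# prints:
# TREDSWQAC
# VDFRSSSGH
# DFRSSSGHY
# FRSSSGHYT
```

An S with fewer than four letters on either side still yields no window with this pattern, so sequences that start or end close to an S would need separate handling.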
-----Original Message-----
From: Richard Adams [mailto:[EMAIL PROTECTED]]
Sent: Monday, June 24, 2002 10:00 AM
To: [EMAIL PROTECTED]
Subject: <no subject>
Hi,
I have a long sequence of letters (an amino acid sequence). I want to
extract 4 letters either side of each S and get them into an array.
e.g.,
ADFGTREDSWQACVDFRSSSGHYT
would get
TREDSWQAC
VDFRSSSGH
DFRSSSGHY etc.
I have worked out how to do this by using substr() but wondered if there
was a more elegant way using regexps. I tried:
@peptides = $sequence =~ /(\w{4}S\w{4})/g;
This works up to a point, but if there are 2 adjacent 'S' the 2nd one is
not extracted, I guess because the regexp engine continues after the end
of the previous match, i.e., it doesn't extract DFRSSSGHY above.
Is it possible to try the next match from within the previous match to
remedy this?
Thanks for any tips or flashes of inspiration,
Richard
--
Dr Richard Adams
University of Edinburgh
Kings Buildings,
Mayfield Rd,
Edinburgh
Email [EMAIL PROTECTED]
--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]