It sounds, then, like what you essentially want to match is four "word"
characters, followed by one or more "S", followed by four "word"
characters, which could be represented like so:

    @peptides = $sequence =~ /(\w{4}S+\w{4})/g;

The one flaw in this approach shows up with a string like ADSFSREDSS
(which may or may not happen; I don't know anything about peptides :)).
There, only the second S will be matched. The first S does not have four
characters before it. The four characters matched after the second S
also consume the first S of the trailing "SS", so that pair is never
matched.

-----Original Message-----
From: Richard Adams [mailto:[EMAIL PROTECTED]]
Sent: Monday, June 24, 2002 10:00 AM
To: [EMAIL PROTECTED]
Subject: <no subject>

Hi,

I have a long sequence of letters (an amino acid sequence). I want to
extract 4 letters either side of each S and get them into an array.
For example, ADFGTREDSWQACVDFRSSSGHYT should yield
TREDSWQAC
VDFRSSSGH
DFRSSSGHY
etc.

I have worked out how to do this by using substr(), but wondered if
there was a more elegant way using regexps. I tried:

    @peptides = $sequence =~ /(\w{4}S\w{4})/g;

This works up to a point. If there are 2 adjacent 'S's, however, the
2nd one is not extracted, i.e., it doesn't extract DFRSSSGHY above. I
guess this is because the regexp engine continues after the end of the
previous match.

Is it possible to try the next match from within the previous match to
remedy this?

Thanks for any tips or flashes of inspiration,
Richard

--
Dr Richard Adams
University of Edinburgh
Kings Buildings, Mayfield Rd, Edinburgh
Email [EMAIL PROTECTED]

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
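The following small Perl script compares the approaches discussed in this thread, using Richard's sample sequence. It is a sketch and does not come from either message. Variants 1 and 2 are the two patterns quoted in the thread. Variants 3 and 4 are alternatives that neither poster proposed: a zero-width lookahead, and rewinding pos() after each match. Both are ways to "try the next match from within the previous match".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample sequence from Richard's message.
my $sequence = 'ADFGTREDSWQACVDFRSSSGHYT';

# 1. Original attempt: each match resumes after the end of the previous
#    one, so windows that overlap a previous match are skipped.
my @plain = $sequence =~ /(\w{4}S\w{4})/g;

# 2. Pattern from the reply: a run of S's stays inside one match,
#    so each run yields a single (possibly longer) string.
my @runs = $sequence =~ /(\w{4}S+\w{4})/g;

# The reply's edge case: a single match, built around the S at offset 4.
my @edge = 'ADSFSREDSS' =~ /(\w{4}S+\w{4})/g;

# 3. Zero-width lookahead (not from either message): the match consumes
#    no characters, so the engine moves on one position at a time and
#    captures a window for every S with four word characters each side.
my @lookahead = $sequence =~ /(?=(\w{4}S\w{4}))/g;

# 4. Explicit restart (not from either message): after each match, set
#    pos() to one past that match's start offset, held in $-[0].
my @rewound;
while ($sequence =~ /(\w{4}S\w{4})/g) {
    push @rewound, $1;
    pos($sequence) = $-[0] + 1;
}

print "plain:     @plain\n";
print "runs:      @runs\n";
print "edge:      @edge\n";
print "lookahead: @lookahead\n";
print "rewound:   @rewound\n";
```

If the script is run as-is, the original pattern should give TREDSWQAC VDFRSSSGH. The S+ pattern should give TREDSWQAC VDFRSSSGHYT, which is one string per run of S's rather than one per S. The lookahead and pos() variants should both give TREDSWQAC VDFRSSSGH DFRSSSGHY FRSSSGHYT, which is one 9-letter window per S. The lookahead form is shorter; the pos() loop makes the restart point explicit.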