RE:

Timothy Johnson Mon, 24 Jun 2002 09:50:39 -0700


It sounds, then, like what you essentially want to match is:


  four "word" characters
    followed by
  one or more "S"
    followed by
  four "word" characters

which could be represented like so:

  @peptides = $sequence =~ /(\w{4}S+\w{4})/g;
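To see what this returns, here is a minimal sketch that runs the regex on the sample sequence from Richard's message below. The variable names are only for illustration. Notice that a run of adjacent S characters comes back as one longer match, not one peptide per S:

```perl
use strict;
use warnings;

# Sample sequence taken from the original question.
my $sequence = 'ADFGTREDSWQACVDFRSSSGHYT';

# Four word characters, one or more S, four word characters.
# In list context, m//g returns every captured substring.
my @peptides = $sequence =~ /(\w{4}S+\w{4})/g;

print "$_\n" for @peptides;
# TREDSWQAC
# VDFRSSSGHYT
```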

The one flaw in this approach is that if you have a string like so:

  ADSFSREDSS (which may or may not happen; I don't know anything about
peptides :))

then only the second S will be matched. The first one does not have four
characters before it, and the four characters after the second S will consume
the next S, so that S cannot take part in a later match.
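The example string above can be checked with a short Perl sketch (the variable names are illustrative only). Only a single match comes back:

```perl
use strict;
use warnings;

# The example string from the paragraph above.
my $string = 'ADSFSREDSS';

my @matches = $string =~ /(\w{4}S+\w{4})/g;

# One match, centred on the S at index 4 (the second S).
# The S at index 2 has only two characters before it, and the
# trailing \w{4} consumes "REDS", including the S at index 8.
print "$_\n" for @matches;
# ADSFSREDS
```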

-----Original Message-----
From: Richard Adams [mailto:[EMAIL PROTECTED]]
Sent: Monday, June 24, 2002 10:00 AM
To: [EMAIL PROTECTED]
Subject: <no subject>


Hi,

I have a long sequence of letters (an amino acid sequence). I want to
extract the 4 letters on either side of each S and get them into an array,
e.g.,

  ADFGTREDSWQACVDFRSSSGHYT

would get

  TREDSWQAC
  VDFRSSSGH
  DFRSSSGHY etc.

I have worked out how to do this using substr(), but wondered if there was a
more elegant way using regexps. I tried:

  @peptides = $sequence =~ /(\w{4}S\w{4})/g;

This works up to a point, but if there are 2 adjacent S's the 2nd one is not
extracted (e.g., it doesn't extract DFRSSSGHY above), I guess because the
regexp engine continues after the end of the previous match. Is it possible
to start the next match from within the previous match to remedy this?
Thanks for any tips or flashes of inspiration,

Richard
-- 
Dr Richard Adams

University of Edinburgh
Kings Buildings,
Mayfield Rd,
Edinburgh


Email [EMAIL PROTECTED]
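Richard's question (whether the next match can start inside the previous one) has a standard Perl answer that Timothy's reply does not use: put the capture group inside a zero-width lookahead, `(?=...)`. The sketch below compares his original regex with a lookahead version on his sample sequence. It is an illustration, not code from the thread:

```perl
use strict;
use warnings;

my $sequence = 'ADFGTREDSWQACVDFRSSSGHYT';

# Original attempt: each match consumes nine characters, so the
# search resumes after the end of the previous match.
my @consuming = $sequence =~ /(\w{4}S\w{4})/g;

# Lookahead version: (?=...) matches at a position without consuming
# any characters, while the capture group inside it still records the
# nine-letter window. Perl will not accept a second zero-length match
# at the same position, so the search moves forward one character at
# a time and overlapping windows are all found.
my @overlapping = $sequence =~ /(?=(\w{4}S\w{4}))/g;

print "consuming:   @consuming\n";
print "overlapping: @overlapping\n";
# consuming:   TREDSWQAC VDFRSSSGH
# overlapping: TREDSWQAC VDFRSSSGH DFRSSSGHY FRSSSGHYT
```

Unlike Timothy's `S+` pattern, this keeps one window per S, which matches the output Richard asked for.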


-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
