On Oct 4, Cedric said:

>cgttgctagctgctatcgatgtgctagtcgatgctagtgcatgcgtagtgcagtcatatgctaggcat
>
>I want to extract all the substrings beginning with tag and finishing with
>tag including substrings with same start point but different length like :
>
>tagctgctatcgatgtgctag
>tagctgctatcgatgtgctagtcgatgctag
>tagctgctatcgatgtgctagtcgatgctagtgcatgcgtag

One way to match your sequences is a regex with embedded code blocks,
(?{ ... }), which run Perl code at that point in the match.

Here is my solution:

  my $dna = ...;
  my @matches;

  $dna =~ m{
    (?=
      tag
      (?:
        .*? tag
        # the substr(...) is there to avoid using $&
        (?{ push @matches, substr($dna, $-[0], $+[0] - $-[0]) })
      )+
    )
    (?!)
  }x;

Now @matches holds every tag...tag substring, grouped by starting tag and
ordered from shortest to longest.  I'll explain the regex if people would
like. ;)
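
If embedded code blocks feel like too much magic, the same list can be
built with plain index() and substr().  This is only a rough sketch of a
different technique, not the regex above: the sub name tag_spans is made
up for illustration, the DNA string is copied from Cedric's message, and
it assumes the tag cannot overlap itself (true for "tag").

```perl
use strict;
use warnings;

# Hypothetical helper: return every substring of $dna that starts at one
# occurrence of $tag and ends at a later one, grouped by start position,
# shortest first.  Assumes $tag cannot overlap itself.
sub tag_spans {
    my ($dna, $tag) = @_;

    # Record the offset of every occurrence of the tag.
    my @starts;
    my $pos = 0;
    while ((my $i = index($dna, $tag, $pos)) >= 0) {
        push @starts, $i;
        $pos = $i + 1;
    }

    # Pair each occurrence with every later occurrence.
    my @spans;
    for my $s (0 .. $#starts) {
        for my $e ($s + 1 .. $#starts) {
            my $len = $starts[$e] + length($tag) - $starts[$s];
            push @spans, substr($dna, $starts[$s], $len);
        }
    }
    return @spans;
}

# Sequence taken from the original question.
my $dna = 'cgttgctagctgctatcgatgtgctagtcgatgctagtgcatgcgtagtgcagtcatatgctaggcat';
my @matches = tag_spans($dna, 'tag');
print "$_\n" for @matches;
```

It makes one pass to find the tags and then pairs them up, so the work
grows with the square of the number of tags, which is unavoidable when
every pair is wanted.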

-- 
Jeff "japhy" Pinyan      [EMAIL PROTECTED]      http://www.pobox.com/~japhy/
RPI Acacia brother #734   http://www.perlmonks.org/   http://www.cpan.org/
** Look for "Regular Expressions in Perl" published by Manning, in 2002 **
<stu> what does y/// stand for?  <tenderpuss> why, yansliterate of course.
[  I'm looking for programming work.  If you like my work, let me know.  ]

