Thanks. I took a look at your site and book and found the chapter on
look-aheads. I realized how much I was underutilizing them; they could have
saved me a lot of headaches!

> -----Original Message-----
> From: Jeff 'japhy' Pinyan [mailto:[EMAIL PROTECTED]]
> Sent: Friday, October 04, 2002 11:20 AM
> To: Kipp, James
> Cc: [EMAIL PROTECTED]
> Subject: RE: Reg Exp
> 
> 
> >>   $dna =~ m{
> >>     (?=
> >>       tag
> >>       (?:
> >>         .*? tag
> >>         # the substr(...) is there to avoid using $&
> >>         (?{ push @matches, substr($dna, $-[0], $+[0] - $-[0]) })
> >>       )+
> >>     )
> >>     (?!)
> >>   }x;
> 
> First of all, I haven't benchmarked this, and I had thought of using the
> index() and substr() approach that J. Krahn demonstrated.
> 
> The regex uses (?= ... ) to look ahead, so it can match stuff without
> consuming it.  Here's an example of what I mean:  if I have a string
> "ABCADEFA", and I want all chunks of "A...A", if the regex actually
> CONSUMES the "ABCADEFA", then it will have to start after the last A,
> meaning I've missed the embedded "ADEFA" chunk.  By using a look-ahead,
> I can match text while staying where I am in the string.  Compare:
> 
>   print "japhy" =~ /(..)/g;
> 
> with
> 
>   print "japhy" =~ /(?=(..))/g;
> 
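To make the comparison concrete, here is a minimal sketch of the two one-liners above with the results captured in arrays. The array names are only for illustration and are not from the original post.

```perl
use strict;
use warnings;

# Consuming match: each match eats two characters, so the next
# attempt starts after them.
my @consuming = "japhy" =~ /(..)/g;

# Look-ahead match: the capture sits inside (?=...), so nothing is
# consumed and the engine steps forward one character at a time.
my @lookahead = "japhy" =~ /(?=(..))/g;

print "@consuming\n";   # ja ph
print "@lookahead\n";   # ja ap ph hy
```
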
> Next, to get all the "tag...tag" chunks of varying lengths, I use
> 
>   /tag(?:.*?tag)+/
> 
> which matches "tagAtag", "tagAtagBtag", "tagAtagBtagCtag", and so on.
> 
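As a quick check of why that pattern is not enough by itself, the sketch below runs it with /g over the sample string used later in this message. The capture group and variable names are assumptions added for the demonstration.

```perl
use strict;
use warnings;

# Sample string borrowed from the walkthrough later in the message.
my $dna = "tagTHIStagTHATtagTHOSEtag";

# The + is greedy, so the group repeats as often as it can, and the
# match consumes everything up to the last "tag".
my @chunks = $dna =~ /(tag(?:.*?tag)+)/g;

print scalar(@chunks), "\n";   # 1
print "$chunks[0]\n";          # tagTHIStagTHATtagTHOSEtag
```

Only the longest chunk from the first starting point comes back, which is why the shorter and embedded chunks need the code block and the forced failure.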
> The real magic is the code block (?{ ... }) that does the dirty work.
> First of all, substr($dna, $-[0], $+[0] - $-[0]) is just a way of
> accessing $& without incurring the penalties associated with it.  So
> let's just use $& for now.  The code (push @matches, $&) is executed
> every time the regex has matched up to an occurrence of "tag", so in
> 
>   tagTHIStagTHATtagTHOSEtag
> 
> it'll happen at:
> 
>   tagTHIStag X
>   tagTHIStagTHATtag X
>   tagTHIStagTHATtagTHOSEtag X
>          tagTHATtag X
>          tagTHATtagTHOSEtag X
>                 tagTHOSEtag X
> 
> those six locations.  The last thing in the regex is the (?!), which is
> a negative look-ahead for nothing, which ALWAYS fails.  This forces the
> regex to backtrack, so I get all the matches.
> 
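Putting the pieces together, here is a self-contained sketch that runs the quoted regex against the sample string from the walkthrough. It assumes a reasonably recent perl (lexical variables inside (?{ ... }) blocks were unreliable in older releases); the sample data and the final print loop are additions for the demonstration. If the explanation above holds, @matches should end up with the six chunks listed in the walkthrough.

```perl
use strict;
use warnings;

my $dna = "tagTHIStagTHATtagTHOSEtag";   # sample input from the walkthrough
my @matches;

$dna =~ m{
  (?=
    tag
    (?:
      .*? tag
      # record the text from the start of this attempt to the current position
      (?{ push @matches, substr($dna, $-[0], $+[0] - $-[0]) })
    )+
  )
  (?!)   # always fails, so the engine moves on to the next start position
}x;

print "$_\n" for @matches;
```

The overall match always fails by design, so the return value of the match is not useful here; the results live entirely in @matches.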
> -- 
> Jeff "japhy" Pinyan      [EMAIL PROTECTED]      
> http://www.pobox.com/~japhy/
> RPI Acacia brother #734   http://www.perlmonks.org/   http://www.cpan.org/
> ** Look for "Regular Expressions in Perl" published by Manning, in 2002 **
> <stu> what does y/// stand for?  <tenderpuss> why, yansliterate of course.
> [  I'm looking for programming work.  If you like my work, let me know.  ]


