Re: Multiple matching of a group of characters

Jim Gibson Mon, 01 Oct 2012 15:51:43 -0700

On Oct 1, 2012, at 3:15 PM, Florian Huber wrote:

> Dear all,
> 
> I'm trying to extract a DNA sequence out of a larger string, i.e. the string 
> is of the following structure:
> 
> $string = "/NOTNEEDED/*ACGACGGGTTCAAGGCAG*/NOTNEEDED/"
> 
> But when I do
> 
> $string =~ /[ACGT]/;
> 
> it matches only the last letter, i.e. "G". Why doesn't it start at the 
> beginning?


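How do you know that it only matches the last letter? That pattern should match
the first letter that is either A, C, G, or T. In the value shown for $string,
that is actually the 'T' in the first "NOTNEEDED"; if your real placeholder text
contains no A, C, G, or T, it would be the leading 'A' of the sequence. Either
way, it should never be the final 'G'. Can you show us a minimal program that
demonstrates matching the 'G' at the end of the string? We need to see the whole
program to explain something so inexplicable.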
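A minimal program along these lines (my own sketch, not the original poster's
code) shows which character the pattern actually finds and where. It assumes
the string is exactly as quoted; since the placeholder "NOTNEEDED" itself
contains a T, the first match is that 'T', not anything at the end:

```perl
use strict;
use warnings;

# Assumption: the data is exactly the string quoted in the question.
my $string = "/NOTNEEDED/*ACGACGGGTTCAAGGCAG*/NOTNEEDED/";

# Capture the matched character so we can see what the pattern found.
my ($first) = $string =~ /([ACGT])/;

# @- holds match start offsets; $-[0] is where the whole match began.
my $offset = $-[0];

print "matched '$first' at offset $offset\n";
```

This prints the 'T' at offset 3 (the "T" in "/NOT..."). Printing the match and
its position like this is usually the quickest way to settle "what did my regex
really match?"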

> 
> But it gets even better, I figured that adding the greedy * should help:
> 
> $string =~ /[ACGT]*/;
> 
> and now it doesn't match anything. Shouldn't it try to match as many times as 
> possible?

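Not necessarily. You are asking for the FIRST position where the pattern
matches. Since an asterisk means "zero or more", the regular expression
/[ACGT]*/ can match anywhere in any string, even where there are no matching
letters at all. The engine tries position 0 first; the character there is '/',
so /[ACGT]*/ succeeds by matching zero characters, declares a match of the
empty string, and stops. Greediness only decides how many characters are taken
at the position where the match starts; it does not make the engine keep
looking further along for a longer match.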

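If you want to match "one or more" characters, then use /[ACGT]+/. Note that
with the string exactly as shown, that still finds the 'T' in "NOTNEEDED"
first, for the same leftmost-match reason.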
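A quick sketch comparing the two quantifiers on the quoted string (again
assuming the string is exactly as shown) makes the difference visible: *
succeeds with an empty match at offset 0, while + moves on to the first real
letter:

```perl
use strict;
use warnings;

# Assumption: the data is exactly the string quoted in the question.
my $string = "/NOTNEEDED/*ACGACGGGTTCAAGGCAG*/NOTNEEDED/";

# "Zero or more": succeeds immediately at offset 0 with an empty match.
my ($star) = $string =~ /([ACGT]*)/;
my $star_offset = $-[0];

# "One or more": needs at least one letter, so it skips ahead to the
# first A/C/G/T, which here is the T in "NOTNEEDED".
my ($plus) = $string =~ /([ACGT]+)/;
my $plus_offset = $-[0];

printf "star: '%s' at %d\n", $star, $star_offset;
printf "plus: '%s' at %d\n", $plus, $plus_offset;
```

It prints an empty string at offset 0 for * and 'T' at offset 3 for +. In both
cases the match is greedy; it is just greedy at the leftmost position where the
pattern can succeed at all.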

> 
> My confusion was complete when I tried
> 
> $string =~ /[ACGT]{5}/;
> 
> now it matches 5 letters, but this time from the beginning, i.e.: ACGAC.

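The regular expression engine tries each starting position from left to right
and reports the first position at which the whole pattern matches. /[ACGT]{5}/
cannot succeed until it reaches five A/C/G/T letters in a row, and the first
such run is the start of your sequence, so you get ACGAC. You can use greedy or
non-greedy quantifiers or anchors, e.g. \A and \z, to modify this behavior.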
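To pull out just the sequence, the pattern needs something that only matches
there. The sketch below assumes the asterisks around the sequence are literal
characters in the data (in a mail client they might only be bold markup) and
uses them as delimiters. It also shows \A and \z anchoring a pattern to the
whole string:

```perl
use strict;
use warnings;

# Assumption: the data is exactly the string quoted in the question,
# including the literal '*' characters around the sequence.
my $string = "/NOTNEEDED/*ACGACGGGTTCAAGGCAG*/NOTNEEDED/";

# {5} needs five letters in a row; the earliest such run is the sequence start.
my ($five) = $string =~ /([ACGT]{5})/;

# Use the '*' delimiters so the match can only begin at the sequence.
my ($seq) = $string =~ /\*([ACGT]+)\*/;

# \A and \z anchor to the whole string: does it consist only of bases?
my $whole_is_dna = $string =~ /\A[ACGT]+\z/ ? 1 : 0;
my $seq_is_dna   = $seq    =~ /\A[ACGT]+\z/ ? 1 : 0;

print "five: $five\n";
print "seq: $seq\n";
print "whole string is DNA: $whole_is_dna; extracted is DNA: $seq_is_dna\n";
```

Expect ACGAC for {5}, the full ACGACGGGTTCAAGGCAG for the delimited capture,
and 0 and 1 for the two anchored checks. If the asterisks are not really in the
data, you will need some other context that only occurs around the sequence.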

> I fail to understand that behaviour. I checked the Perl documentation a bit 
> and I sort of understand why /[ACGT]/ only matches one letter (but not
> why it starts at the end). However, I'm simply puzzled at the other things.
> 



--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/
