On Oct 1, 2012, at 3:15 PM, Florian Huber wrote:

> Dear all,
>
> I'm trying to extract a DNA sequence out of a larger string, i.e. the string
> is of the following structure:
>
> $string = "/NOTNEEDED/*ACGACGGGTTCAAGGCAG*/NOTNEEDED/"
>
> But when I do
>
> $string =~ /[ACGT]/;
>
> it matches only the last letter, i.e. "G". Why doesn't it start at the
> beginning?

How do you know that it only matches the last letter? That pattern should match the first letter that is either A, C, G, or T. In the value shown for $string, taken literally, that is the 'T' in the first NOTNEEDED; if NOTNEEDED stands for text containing none of those letters, it is the 'A' that starts the sequence. Can you show us a minimal program that demonstrates matching the 'G' at the end of the string? We need to see the whole program to explain something so inexplicable.

> But it gets even better, I figured that adding the greedy * should help:
>
> $string =~ /[ACGT]*/;
>
> and now it doesn't match anything. Shouldn't it try to match as many times as
> possible?

Not necessarily. You are asking for the FIRST position where the pattern matches. Since an asterisk means "zero or more", the regular expression /[ACGT]*/ can match anywhere in any string. Here it matches zero characters at the very start of the string (before the leading '/'), so the engine declares a match there and stops. The match succeeds, but it is empty. If you want to match "one or more" characters, then use /[ACGT]+/.

> My confusion was complete when I tried
>
> $string =~ /[ACGT]{5}/;
>
> now it matches 5 letters, but this time from the beginning, i.e.: ACGAC.

The regular expression engine starts at the beginning of the string and tries each position from left to right until the pattern matches or the string is exhausted. You can use greedy or non-greedy quantifiers or anchors, e.g. \A and \z, to modify this behavior.

> I fail to understand that behaviour. I checked the Perl documentation a bit
> and I sort of understand why /[ACGT]/ only matches one letter only (but not
> why it starts at the end). However, I'm simply puzzled at the other things.
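
For anyone who wants to check these points, here is a minimal sketch that runs each pattern against the string exactly as posted and prints what matched and where. It assumes the asterisks and slashes are literally part of the string. show_match is a hypothetical helper written only for this illustration, and the capturing pattern at the end is just one possible way to pull out the sequence, not necessarily what the original program should use. Note that, taken literally, NOTNEEDED contains a 'T', so a single [ACGT] finds that first.

```perl
use strict;
use warnings;

# The string from the question, taken literally (asterisks included).
my $string = "/NOTNEEDED/*ACGACGGGTTCAAGGCAG*/NOTNEEDED/";

# Hypothetical helper: report the matched text and its starting offset.
sub show_match {
    my ($label, $re) = @_;
    if ($string =~ $re) {
        # $-[0] and $+[0] are the start and end offsets of the whole match.
        my $text = substr($string, $-[0], $+[0] - $-[0]);
        printf "%-10s matched '%s' at offset %d\n", $label, $text, $-[0];
        return ($text, $-[0]);
    }
    print "$label did not match\n";
    return;
}

my @one  = show_match('[ACGT]',    qr/[ACGT]/);     # 'T' at offset 3 (inside NOTNEEDED)
my @star = show_match('[ACGT]*',   qr/[ACGT]*/);    # '' at offset 0 (empty match)
my @plus = show_match('[ACGT]+',   qr/[ACGT]+/);    # 'T' at offset 3
my @five = show_match('[ACGT]{5}', qr/[ACGT]{5}/);  # 'ACGAC' at offset 12

# Assumption: the sequence is delimited by literal asterisks, so capture
# the run of bases between them.
my ($dna) = $string =~ /\*([ACGT]+)\*/;
print "sequence: $dna\n";                           # ACGACGGGTTCAAGGCAG
```

The empty match at offset 0 is why /[ACGT]*/ appears to match nothing: the match succeeds, but with zero characters. Printing the offsets this way is also a quick check on whether a pattern really matched at the end of the string.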