Vyacheslav Karamov wrote:
> Hi All!
>
> I need to capture cite numbers, but I have an extra values. I need to
> capture cites, not figures, chapters and so on.
> For example in "[see 9; Figure 7]", only "9" i.e. citation number must
> be captured.
>
>
> my $regex = qr
> {
>     (?i)
>     # Case-insensitive
>     [\p{IsAlpha}\.\s]*
>     # Any number of letters, dots and/or spaces (greedy)
>     (
>         [\x{2022}\*]*
>         # Any number of bullet or asterisk characters
>         [1-9]+
>         # One or more digits 1-9
>         \s*
>         # Any number of spaces
>         (?:
>             \-|\,|through|and
>             # Any number of dash, comma or "through" or "and"
>         )*?
>         \s*
>         # Any number of spaces
>     )+
>     [\,\;]?
>     # One or more comma or semicolon
>     \s*
>     # Any number of spaces
>     (?:
>         (?:
>             figure | fig[s]?[\.]?? | table | box |
>             chapter | diagram | scheme | chart | plate | appendix | part | section |
>             footnote | [p]{1,2}\.?? | page
>         )
>         \s*
>         # Any number of spaces
>         [1-9]+
>         # One or more digits 1-9
>     )*?
> }msx;
> my @vancouverCites =
> (
>     "[5, Figure 3]",
>     "[8, Chapter 60]",
>     "[9 through 15, pp. 35 - 46]",
>     "[11, pp. 37 Through 47]",
>     "[see 1, 4]",
>     "[e.g. 2, 5]",
>     "[e.g. •2, ••5]",
>     "[e.g. *2, **5]",
>     "[for example 1,17]",
>     "[2, 9]",
> );
>
> foreach my $cite (@vancouverCites)
> {
>     my @matches = $cite =~ /$regex/g;
>     foreach my $arr (@matches)
>     {
>         print "$arr\n" if defined $arr;
>     }
> }
>
>
> Script output:
>
> 5
> 3 - wrong
> 8
> 6 - wrong. I don't understand why 6 instead of 60 was captured.
> Actually, only 8 is correct
> 9
> 15
> 35 - wrong
> 46 - wrong
> 11
> 37 - wrong
> 47 - wrong
> 1
> 4
> 2
> 5
> •2
> ••5
> *2
> **5
> 1
> 17
> 2
> 9

Your immediate problem is that you have used the character class [1-9]
twice where you need [0-9]. Since [1-9]+ cannot match a zero, it stops
after the 6 in "60".

However, I think there may well be more problems with your regex than
that. Is your list a complete set of the citations that you expect to
match? If so, why don't you try to match the square brackets as well?

HTH,

Rob
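
As a rough illustration of the bracket-matching idea, and not necessarily what Rob had in mind, here is a minimal sketch. It assumes that every citation group sits inside square brackets, that its parts are separated by commas or semicolons, and that any part beginning with a locator word (figure, chapter, pp. and so on) followed by a number is not a citation. The names `cite_numbers` and `$locator` are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical pattern: a part that starts with a locator word followed by
# a number (e.g. "Figure 7", "pp. 35 - 46") is not a citation.
my $locator = qr{
    ^\s*
    (?: figure | figs?\.? | table | box | chapter | diagram | scheme
      | chart | plate | appendix | part | section | footnote
      | pp?\.? | page )
    \s* [0-9]
}xi;

# Return every citation number found inside [...] groups in $text.
sub cite_numbers {
    my ($text) = @_;
    my @numbers;
    while ($text =~ /\[([^\]]*)\]/g) {
        my $inside = $1;                          # text between the brackets
        for my $part (split /[,;]/, $inside) {
            next if $part =~ $locator;            # skip figures, pages, ...
            push @numbers, $part =~ /([0-9]+)/g;  # keep every run of digits
        }
    }
    return @numbers;
}

for my $cite ("[see 9; Figure 7]", "[8, Chapter 60]",
              "[9 through 15, pp. 35 - 46]", "[e.g. *2, **5]") {
    print "$cite => ", join(' ', cite_numbers($cite)), "\n";
}
```

Splitting first keeps each test simple: a part is either dropped whole or scanned for digits, so "60" can never be cut short and "pp. 35 - 46" contributes nothing. One known weakness of this sketch is a locator that itself contains a comma, such as "pp. 35, 40": the "40" would be treated as a citation.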