Hi All!

I need to capture citation numbers, but my regex also captures extra values. I want only the citations, not the numbers of figures, chapters, pages and so on. For example, in "[see 9; Figure 7]", only "9" (the citation number) should be captured.


my $regex = qr
{
    (?i)                        # Case-insensitive
    [\p{IsAlpha}\.\s]*          # Any number of letters, dots and/or spaces (greedy)
    (
        [\x{2022}\*]*           # Any number of bullet or asterisk characters
        [1-9]+                  # One or more digits 1-9
        \s*                     # Any number of spaces
        (?:
            \-|\,|through|and   # Any number of dash, comma or "through" or "and"
        )*?
        \s*                     # Any number of spaces
    )+
    [\,\;]?                     # One or more comma or semicolon
    \s*                         # Any number of spaces
    (?:
        (?:
            figure | fig[s]?[\.]?? | table | box | chapter | diagram | scheme | chart | plate | appendix | part | section | footnote | [p]{1,2}\.?? | page
        )
        \s*                     # Any number of spaces
        [1-9]+                  # One or more digits 1-9
    )*?
}msx;

my @vancouverCites =
(
"[5, Figure 3]",
"[8, Chapter 60]",
"[9 through 15, pp. 35 - 46]",
"[11, pp. 37 Through 47]",
"[see 1, 4]",
"[e.g. 2, 5]",
"[e.g. •2, ••5]",
"[e.g. *2, **5]",
"[for example 1,17]",
"[2, 9]",
);

foreach my $cite (@vancouverCites)
{
   my @matches = $cite =~ /$regex/g;
   foreach my $arr (@matches)
   {
       print "$arr\n" if defined $arr;
   }
}


Script output:

5
3 - wrong
8
6 - wrong. I don't understand why 6 was captured instead of 60. Actually, only 8 should be captured here.
9
15
35  - wrong
46  - wrong
11
37  - wrong
47  - wrong
1
4
2
5
•2
••5
*2
**5
1
17
2
9
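
A note on the "6" above: the class [1-9] does not include 0, so [1-9]+ stops in front of the zero in "60"; \d+ would take the whole number. As for the unwanted figure and page numbers, one possible direction is to do the job in two steps instead of one big regex: first delete every locator together with its number (or number range), then collect whatever digits are left. The following is only a minimal sketch under assumptions, not a tested solution for real data: the sub name extract_cite_numbers is made up for illustration, the keyword list is copied from the alternation in the regex above (plus the plural "figures"), and it returns bare numbers, so leading bullets or asterisks are dropped.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): returns the
# citation numbers found in one bracketed cite, skipping locators such as
# "Figure 3" or "pp. 35 - 46".
sub extract_cite_numbers {
    my ($cite) = @_;

    # Assumed keyword list, based on the alternation in the regex above.
    my $locator = qr/(?:figures?|figs?\.?|table|box|chapter|diagram|scheme
                      |chart|plate|appendix|part|section|footnote|pp?\.?|page)/xi;

    # Step 1: remove each locator with its number or number range,
    # e.g. "Figure 3", "pp. 35 - 46", "pp. 37 Through 47".
    (my $text = $cite) =~ s/\b$locator\s*\d+(?:\s*(?:-|through)\s*\d+)?//gi;

    # Step 2: the remaining digit runs are taken as citation numbers.
    # \d+ (unlike [1-9]+) also matches the digit 0.
    return $text =~ /\d+/g;
}

print join(", ", extract_cite_numbers("[9 through 15, pp. 35 - 46]")), "\n";  # prints "9, 15"
print join(", ", extract_cite_numbers("[8, Chapter 60]")), "\n";              # prints "8"
```

The sub is meant to be called in list context, once per element of @vancouverCites. Splitting the work this way keeps each pattern small, which should make it easier to see which part is responsible when a wrong number shows up.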
