Hi All!
I need to capture citation numbers, but my regex also captures extra
values. I need to capture only citations, not figures, chapters and so on.
For example, in "[see 9; Figure 7]", only "9", i.e. the citation number,
must be captured.
my $regex = qr
{
    (?i)
    # Case-insensitive
    [\p{IsAlpha}\.\s]*
    # Any number of letters, dots and/or spaces (greedy)
    (
        [\x{2022}\*]*
        # Any number of bullet or asterisk characters
        [1-9]+
        # One or more digits 1-9
        \s*
        # Any number of spaces
        (?:
            \-|\,|through|and
            # Any number of dashes, commas, "through" or "and" (lazy)
        )*?
        \s*
        # Any number of spaces
    )+
    [\,\;]?
    # Optional comma or semicolon
    \s*
    # Any number of spaces
    (?:
        (?:
            figure | fig[s]?[\.]?? | table | box |
            chapter | diagram | scheme | chart | plate | appendix | part | section |
            footnote | [p]{1,2}\.?? | page
        )
        \s*
        # Any number of spaces
        [1-9]+
        # One or more digits 1-9
    )*?
}msx;
my @vancouverCites =
(
    "[5, Figure 3]",
    "[8, Chapter 60]",
    "[9 through 15, pp. 35 - 46]",
    "[11, pp. 37 Through 47]",
    "[see 1, 4]",
    "[e.g. 2, 5]",
    "[e.g. •2, ••5]",
    "[e.g. *2, **5]",
    "[for example 1,17]",
    "[2, 9]",
);
foreach my $cite (@vancouverCites)
{
    my @matches = $cite =~ /$regex/g;
    foreach my $arr (@matches)
    {
        print "$arr\n" if defined $arr;
    }
}
Script output:
5
3 - wrong
8
6 - wrong. I don't understand why 6 was captured instead of 60.
Actually, only 8 is correct here.
9
15
35 - wrong
46 - wrong
11
37 - wrong
47 - wrong
1
4
2
5
•2
••5
*2
**5
1
17
2
9
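
One direction that might be worth trying, as a rough sketch rather than a
definitive fix: instead of one large pattern, split the bracket contents on
commas and semicolons, skip every segment that starts with a label such as
"Figure" or "pp.", and take the numbers from whatever is left. The names
extract_cites and $label are hypothetical and not part of the script above.
The sketch assumes that a label applies to its whole segment and that bullets
and asterisks do not need to be kept in the result. It also uses [0-9]+
rather than [1-9]+, since a class without 0 stops matching at the first zero,
which would turn "60" into "6".

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original script: returns the
# citation numbers found in one bracketed Vancouver-style cite.
sub extract_cites {
    my ($cite) = @_;

    # Words that mark a number as something other than a citation.
    # The list mirrors the labels used in the original regex.
    my $label = qr/(?:figures?|figs?\.?|tables?|box|chapters?|diagram|scheme
                     |chart|plate|appendix|part|section|footnote
                     |pages?|pp?\.?)/ix;

    my @numbers;

    # Work on a copy with the surrounding brackets removed.
    (my $body = $cite) =~ s/^\s*\[|\]\s*$//g;

    # Assumption: comma and semicolon separate the segments.
    for my $segment (split /[,;]/, $body) {
        # Skip "Figure 3", "pp. 35 - 46" and similar segments entirely.
        next if $segment =~ /^\s*$label\s*\d/;

        # [0-9]+ includes zero, unlike [1-9]+.
        push @numbers, $segment =~ /([0-9]+)/g;
    }
    return @numbers;
}

# Small demonstration on three of the sample cites.
for my $cite ("[5, Figure 3]", "[8, Chapter 60]", "[9 through 15, pp. 35 - 46]") {
    print join(" ", extract_cites($cite)), "\n";
}
```

A range such as "9 through 15" is returned as its two end points and is not
expanded. A cite where a label segment contains its own comma would need
extra handling.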