On Mon, May 9, 2011 at 11:44 PM, Tiago Hori <[email protected]> wrote:
> I am trying to write a small script to parse bibliographic references like
> this:
>
> Morgan, M.J., Wilson, C.E., Crim, L.W., 1999. The effect of stress on
> reproduction in Atlantic cod. J. Fish Biol. 54, 477-488.
>
> What I want to be able to do eventually is parse each name separately and
> associate that with the title. I am not sure how yet, but I haven't even got
> there.

I took a stab at this. It is probably not perfect and will not catch
every possible variation. In any case, unless the entries follow fixed
formatting rules, it is very difficult to catch them all.

=========================================================
#!/usr/bin/perl
#

use strict;
use warnings;

my $text = <<END;
Morgan, M.J., Wilson, C.E., Crim, L.W., 1999. The effect of stress on
reproduction in Atlantic cod. J. Fish Biol. 54, 477-488.
END

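my @authors = ();

# Extract authors
# Assuming each author is composed of one or more matches of:
#   <SPACE>* WORD, <SPACE>* (LETTER PERIOD)+
# The pattern has two capture groups, so in list context each author
# contributes two elements: the full "Surname, I.N." text and the last initial.
if (my @matches = $text =~ m/(\s*\w+,\s*(\w\.)+),/gs) {
    while(@matches) {
        my $match = shift @matches;
        # Split into surname and initials, trimming surrounding whitespace
        my @comps = map {s/^\s+//;s/\s+$//;$_} (split ",", $match);
        # Reorder as "Initials Surname"
        push @authors, join " ",@comps[1,0];
        # Discard the inner (\w\.) capture that follows each full match
        shift @matches;
    }
}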

# Extract title
# Everything from the first period followed by a space up to the next period.
# For this to work, every period in the author list must be followed by a
# letter or a comma, never a space.
if ($text =~ m/\. (.*?)\./s) {
    my $title = $1;
    $title =~ s/\n/ /g;
    foreach(@authors) {
        print "$title: $_\n";
    }
}
=====================================================================

$ ./match_2.pl
The effect of stress on reproduction in Atlantic cod: M.J. Morgan
The effect of stress on reproduction in Atlantic cod: C.E. Wilson
The effect of stress on reproduction in Atlantic cod: L.W. Crim

All, please let me know if there is a way to combine the two regexes
into one. I had a brain coredump before I gave up.
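
One possible direction, offered only as a sketch: a single pattern can
capture the author list, the year, and the title together. A repeated
capture group keeps only its last iteration, though, so the individual
names still need a second pass over the captured author list. This
assumes Perl 5.10 or later (for named captures) and a four-digit year
right after the authors. The names parse_reference and $entry are
hypothetical and are not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = <<'END';
Morgan, M.J., Wilson, C.E., Crim, L.W., 1999. The effect of stress on
reproduction in Atlantic cod. J. Fish Biol. 54, 477-488.
END

# One pattern for the whole entry (assumed layout: authors, year, title).
my $entry = qr/
    ^\s*
    (?<authors> (?: \w+ , \s* (?: \w \. )+ , \s* )+ )  # "Surname, I.N., " repeated
    (?<year>    \d{4} ) \. \s+                         # four-digit year, period
    (?<title>   .*? ) \.                               # title up to the next period
/xs;

# Hypothetical helper: returns a hash ref, or nothing if the entry does not match.
sub parse_reference {
    my ($ref) = @_;
    return unless $ref =~ $entry;

    # Copy the named captures before running another match, which resets %+.
    my ($list, $year, $title) = ($+{authors}, $+{year}, $+{title});
    $title =~ s/\s+/ /g;    # join a title that wraps across lines

    # Second pass: pull each "Surname, I.N." pair out of the author list.
    my @authors;
    while ($list =~ m/(\w+),\s*((?:\w\.)+),/g) {
        push @authors, "$2 $1";
    }
    return { authors => \@authors, year => $year, title => $title };
}

if (my $parsed = parse_reference($text)) {
    print "$parsed->{title}: $_\n" for @{ $parsed->{authors} };
}
```

For the sample entry this should print the same three lines as the
script above. Returning a hash ref keeps the year available too, which
may help later when associating each name with a title.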

Thanks,
  Sandip
