On Mon, Jun 9, 2008 at 5:39 PM, Christopher Morgan <[EMAIL PROTECTED]> wrote:
> Jonathan,
>
> Many thanks. I get no errors on the command line or in the error log when I
> run the script. The file just executes with no output. If you have the time
> to run it, I've included the script below, and have attached the name
> authority record it tries to process:

The problem is that the SAX handler matches on each element's Name (the
qualified name, which includes the mx: prefix) instead of its LocalName.
I've attached a patch that tests both LocalName and NamespaceURI. If you
could apply it to your copy of MARC/File/SAX.pm and test it, I'll commit
it to the CVS repo once you confirm it works for you.

--miker

>
> #! /usr/bin/perl
> use strict;
>
> use MARC::Record;
> use MARC::Batch;
> use MARC::File::XML;
> use constant MAX => 20;
>
> MARC::File::XML->default_record_format('UNIMARCAUTH');
> my $batch = MARC::Batch->new( 'XML', 'name_authority_file');
> while (my $record = $batch->next()) {
>     for my $field ($record->field("100")){
>         my $name= $field->subfield('a');
>         print "$name", "\n";
>     }
> }
>
> I think you're right about the LOC files -- they probably got the extra
> spaces by accident. That's easy enough to fix.
>
> As far as the name authorities go, if I can't get MARC::File::XML to process
> them, I can always use XML::TokeParser. Not as elegant, but it would get the
> job done.
>
> - Chris
>
> -----Original Message-----
> From: Jonathan Gorman [mailto:[EMAIL PROTECTED]
> Sent: Monday, June 09, 2008 4:43 PM
> To: Christopher Morgan; perl4lib@perl.org
> Subject: Re: Can't parse MARC Authority XML files with mx: prefixes in their
> tags
>
>
>
>>However, I'm having trouble parsing the name authority records online
>>at http://alcme.oclc.org/eprintsUK/index.html
>
> [snipped code examples]
>>
>>There are "mx:" prefixes in all the tags. What format is this? Is there
>>any way I can get MARC::File::XML to parse these files?
>
> The prefixes are the namespace. The parser should be able to handle this,
> but I don't honestly know if it does it correctly. What also might be the
> problem is the second namespace in there. It might help us if you included
> some information about what is not working (what error are you getting etc).
> I don't have the time right now to run my own test, but actual error
> messages might provide some clue.
>
>>A related question: When I first tried to process the subject authority
>>files from the LOC (in my first example, above), the program complained
>>that the "Leader must be 24 bytes long".
>
> Right, that comes from the MARC specification, there are 24 bytes.
>
>>XML files are five years old. I wonder if the XML spec has changed
>>since then?)
>
> Doubt it, again it doesn't have anything really to do with the XML spec but
> the underlying xml record. More likely it is some error in creating the
> files. Can't give any more info though, sorry.
>
> Jon Gorman
>

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone: 1-877-OPEN-ILS (673-6457)
 | email: [EMAIL PROTECTED]
 | web: http://www.esilibrary.com

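To show the Name versus LocalName distinction from the reply above in isolation, here is a minimal sketch. It assumes the element hash layout used by Perl SAX2 (Name, Prefix, LocalName, NamespaceURI). The hashes are hand-built rather than produced by a real parser, and marc_element_kind and the example.org namespace are hypothetical names used only for illustration; they are not part of MARC::File::SAX.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# MARC21 slim namespace URI, the same string the patch compares against.
use constant MARC_NS => 'http://www.loc.gov/MARC21/slim';

# Hypothetical helper (not part of MARC::File::SAX): returns the local
# element name when the element is in the MARC21 slim namespace, and
# undef (in scalar context) otherwise.
sub marc_element_kind {
    my ($element) = @_;
    my $ns = $element->{NamespaceURI};
    return unless defined $ns && $ns eq MARC_NS;
    return $element->{LocalName};
}

# Hand-built stand-ins for what a SAX2 parser would pass to start_element.
# <mx:datafield> with xmlns:mx bound to the MARC21 slim namespace:
my $prefixed = {
    Name         => 'mx:datafield',
    Prefix       => 'mx',
    LocalName    => 'datafield',
    NamespaceURI => MARC_NS,
};

# <datafield> with the MARC21 slim namespace as the default namespace:
my $default_ns = {
    Name         => 'datafield',
    Prefix       => '',
    LocalName    => 'datafield',
    NamespaceURI => MARC_NS,
};

# A same-named element from some other (made-up) namespace:
my $foreign = {
    Name         => 'x:datafield',
    Prefix       => 'x',
    LocalName    => 'datafield',
    NamespaceURI => 'http://example.org/other',
};

for my $el ( $prefixed, $default_ns, $foreign ) {
    my $kind = marc_element_kind($el);
    printf "%-14s Name match: %-3s  LocalName+NamespaceURI match: %s\n",
        $el->{Name},
        ( $el->{Name} eq 'datafield' ? 'yes' : 'no' ),
        ( defined $kind ? $kind : 'none' );
}
```

Comparing on Name alone only recognizes the unprefixed spelling, which is why prefixed records produce no output. Comparing on LocalName together with NamespaceURI accepts both spellings and still rejects a same-named element from a different namespace.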
Index: MARC/File/SAX.pm
===================================================================
RCS file: /cvsroot/marcpm/marc-xml/lib/MARC/File/SAX.pm,v
retrieving revision 1.6
diff -p -u -r1.6 SAX.pm
--- MARC/File/SAX.pm    27 Nov 2007 20:28:18 -0000      1.6
+++ MARC/File/SAX.pm    10 Jun 2008 15:54:47 -0000
@@ -17,16 +17,17 @@ use MARC::Charset qw(utf8_to_marc8);
 
 sub start_element {
     my ( $self, $element ) = @_;
-    my $name = $element->{ Name };
-    if ( $name eq 'leader' ) {
+    my $name = $element->{ LocalName };
+    my $ns = $element->{ NamespaceURI };
+    if ( $name eq 'leader' and $ns eq 'http://www.loc.gov/MARC21/slim' ) {
         $self->{ tag } = 'LDR';
-    } elsif ( $name eq 'controlfield' ) {
+    } elsif ( $name eq 'controlfield' and $ns eq 'http://www.loc.gov/MARC21/slim' ) {
         $self->{ tag } = $element->{ Attributes }{ '{}tag' }{ Value };
-    } elsif ( $name eq 'datafield' ) {
+    } elsif ( $name eq 'datafield' and $ns eq 'http://www.loc.gov/MARC21/slim' ) {
         $self->{ tag } = $element->{ Attributes }{ '{}tag' }{ Value };
         $self->{ i1 } = $element->{ Attributes }{ '{}ind1' }{ Value };
         $self->{ i2 } = $element->{ Attributes }{ '{}ind2' }{ Value };
-    } elsif ( $name eq 'subfield' ) {
+    } elsif ( $name eq 'subfield' and $ns eq 'http://www.loc.gov/MARC21/slim' ) {
         $self->{ subcode } = $element->{ Attributes }{ '{}code' }{ Value };
     }
 }
@@ -34,7 +35,8 @@ sub start_element {
 sub end_element {
     my ( $self, $element ) = @_;
     my $name = $element->{ Name };
-    if ( $name eq 'subfield' ) {
+    my $ns = $element->{ NamespaceURI };
+    if ( $name eq 'subfield' and $ns eq 'http://www.loc.gov/MARC21/slim' ) {
         push @{ $self->{ subfields } }, $self->{ subcode };
 
         if ($self->{ transcode }) {
@@ -45,13 +47,13 @@ sub end_element {
 
         $self->{ chars } = '';
         $self->{ subcode } = '';
-    } elsif ( $name eq 'controlfield' ) {
+    } elsif ( $name eq 'controlfield' and $ns eq 'http://www.loc.gov/MARC21/slim' ) {
         $self->{ record }->append_fields(
             MARC::Field->new( $self->{ tag }, $self->{ chars } )
         );
         $self->{ chars } = '';
         $self->{ tag } = '';
-    } elsif ( $name eq 'datafield' ) {
+    } elsif ( $name eq 'datafield' and $ns eq 'http://www.loc.gov/MARC21/slim' ) {
         $self->{ record }->append_fields(
             MARC::Field->new(
                 $self->{ tag },
@@ -65,7 +67,7 @@ sub end_element {
         $self->{ i2 } = '';
         $self->{ subfields } = [];
         $self->{ chars } = '';
-    } elsif ( $name eq 'leader' ) {
+    } elsif ( $name eq 'leader' and $ns eq 'http://www.loc.gov/MARC21/slim' ) {
         my $ldr = $self->{ chars };
         $self->{ transcode }++ if (substr($ldr,9,1) eq 'a' and $self->{toMARC8});