I have been successfully using MARC::File::XML to process MARC subject
authority files from the LOC, such as this sample record:

 

<?xml version="1.0" encoding="UTF-8" ?>
<collection xmlns="http://www.loc.gov/MARC21"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.loc.gov/MARC21
http://www.loc.gov/standards/marcxml/schema/MARC21.xsd">

<record type="Bibliographic">
<leader>00495cz   2200169n  4500</leader>
<controlfield tag="001">sh 00000014     </controlfield>
<controlfield tag="003">DLC  </controlfield>
<controlfield tag="005">20000508151507.0    </controlfield>
<controlfield tag="008">000321i| anannbabn          |a ana
</controlfield>
<datafield tag="010" ind1="" ind2="">
<subfield code="a">sh 00000014     </subfield>
</datafield>
<datafield tag="040" ind1="" ind2="">
<subfield code="a">DLC</subfield>
<subfield code="b">eng</subfield>
<subfield code="c">DLC     </subfield>
</datafield>
<datafield tag="150" ind1="" ind2="">
<subfield code="a">Tacos   </subfield>
</datafield>
</record>

 

The following script prints subfield "a" of tag 150:

 

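use strict;
use warnings;
use MARC::Batch;
use MARC::File::XML;

MARC::File::XML->default_record_format('UNIMARCAUTH');

my $batch = MARC::Batch->new( 'XML', '../filename' );

# Print subfield "a" of every 150 field in each record.
while ( my $record = $batch->next() ) {
    for my $field ( $record->field('150') ) {
        my $name = $field->subfield('a');
        print "$name\n";
    }
}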

 

However, I'm having trouble parsing the name authority records online at
http://alcme.oclc.org/eprintsUK/index.html 

 

Here is part of one of these records (from
http://errol.oclc.org/laf/n50-7168.MarcXML):

  

  <?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
  <mx:record xmlns:mx="http://www.loc.gov/MARC21/slim"
      xmlns="http://www.w3.org/TR/xhtml1/strict"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

  <mx:leader>00000cz 2200000n 0000</mx:leader>
  <mx:controlfield tag="001">oca00042708</mx:controlfield>
  . . . . .
  . . . . .
  etc.

 

There are "mx:" prefixes in all the tags. What format is this? Is there any
way I can get MARC::File::XML to parse these files? 
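
In the sample above, "mx:" is an XML namespace prefix that the record binds to http://www.loc.gov/MARC21/slim through its xmlns:mx attribute. A possible preprocessing workaround, sketched here under the untested assumption that the parser only recognises unprefixed element names, is to rewrite the document so that this namespace becomes the default one. The helper name strip_mx_prefix is hypothetical and not part of MARC::File::XML, and regular expressions are a fragile way to edit XML, so treat this as a rough sketch only.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MARC::File::XML): rewrite a MARCXML
# document so that its "mx:" prefixed elements become unprefixed.
sub strip_mx_prefix {
    my ($xml) = @_;

    # Drop any existing default namespace declaration so that it does not
    # clash with the one created in the next step.
    $xml =~ s{\s+xmlns=(["'])[^"']*\1}{}g;

    # Make the namespace bound to "mx" the default namespace instead.
    $xml =~ s{\bxmlns:mx=}{xmlns=}g;

    # Remove the prefix from opening and closing tags.
    $xml =~ s{<(/?)mx:}{<$1}g;

    return $xml;
}

# Example using a shortened version of the record shown above.
my $sample = '<mx:record xmlns:mx="http://www.loc.gov/MARC21/slim"'
           . ' xmlns="http://www.w3.org/TR/xhtml1/strict">'
           . '<mx:leader>00000cz 2200000n 0000</mx:leader>'
           . '</mx:record>';

print strip_mx_prefix($sample), "\n";
```

The rewritten text could then be saved to a file and handed to MARC::Batch in the same way as the subject authority file.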

 

A related question: When I first tried to process the subject authority
files from the LOC (my first example, above), the program complained that
the "Leader must be 24 bytes long". Every leader element in the authority
files I got from the LOC has five trailing spaces. I manually removed the
spaces to get the test files to work. I can always preprocess the files to
take out the trailing spaces, but I wonder if there's a way around this
within MARC::File::XML. (These LOC subject authority XML files are five
years old. I wonder if the XML spec has changed since then?)
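
The preprocessing step mentioned above could look roughly like the following sketch. It assumes the leader elements are unprefixed, as in the first sample, and that the padding is plain whitespace immediately before the closing tag. The helper name trim_leader_padding is hypothetical and not part of MARC::File::XML.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MARC::File::XML): remove whitespace
# that trails the leader text, just before the closing tag. Spaces inside
# the leader are left alone because leader positions are significant.
sub trim_leader_padding {
    my ($xml) = @_;
    $xml =~ s{(<leader>[^<]*?)\s+(</leader>)}{$1$2}g;
    return $xml;
}

# Example: the sample leader padded with five trailing spaces.
my $padded  = '<leader>00495cz   2200169n  4500     </leader>';
my $trimmed = trim_leader_padding($padded);

print "$trimmed\n";
```

The filtered text could be written to a temporary file and passed to MARC::Batch->new as in the script above. Only trailing whitespace is touched, so a leader that is already 24 characters long comes through unchanged.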

      

Many thanks for any help!

 

- Chris Morgan
