I have been successfully using MARC::File::XML to process MARC subject authority files from the LOC, such as this sample record:
    <?xml version="1.0" encoding="UTF-8" ?>
    <collection xmlns="http://www.loc.gov/MARC21"
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xsi:schemaLocation="http://www.loc.gov/MARC21 http://www.loc.gov/standards/marcxml/schema/MARC21.xsd">
      <record type="Bibliographic">
        <leader>00495cz 2200169n 4500</leader>
        <controlfield tag="001">sh 00000014 </controlfield>
        <controlfield tag="003">DLC </controlfield>
        <controlfield tag="005">20000508151507.0 </controlfield>
        <controlfield tag="008">000321i| anannbabn |a ana </controlfield>
        <datafield tag="010" ind1="" ind2="">
          <subfield code="a">sh 00000014 </subfield>
        </datafield>
        <datafield tag="040" ind1="" ind2="">
          <subfield code="a">DLC</subfield>
          <subfield code="b">eng</subfield>
          <subfield code="c">DLC </subfield>
        </datafield>
        <datafield tag="150" ind1="" ind2="">
          <subfield code="a">Tacos </subfield>
        </datafield>
      </record>

The following script prints subfield "a" of tag 150:

    use strict;
    use warnings;
    use MARC::Batch;
    use MARC::File::XML;

    MARC::File::XML->default_record_format('UNIMARCAUTH');
    my $batch = MARC::Batch->new('XML', '../filename');
    while (my $record = $batch->next()) {
        for my $field ($record->field('150')) {
            my $name = $field->subfield('a');
            print "$name\n";
        }
    }

However, I'm having trouble parsing the name authority records available online at http://alcme.oclc.org/eprintsUK/index.html. Here is part of one of these records (from http://errol.oclc.org/laf/n50-7168.MarcXML):

    <?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
    <mx:record xmlns:mx="http://www.loc.gov/MARC21/slim"
               xmlns="http://www.w3.org/TR/xhtml1/strict"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <mx:leader>00000cz 2200000n 0000</mx:leader>
      <mx:controlfield tag="001">oca00042708</mx:controlfield>
      . . . etc.

There are "mx:" prefixes on all the tags. What format is this? Is there any way I can get MARC::File::XML to parse these files?
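One possible preprocessing workaround, offered only as a sketch under stated assumptions: in the OCLC sample the root element declares `xmlns:mx="http://www.loc.gov/MARC21/slim"`, so a filter could drop the stray default (XHTML) namespace declaration, make the mx URI the default namespace, and strip the `mx:` prefix from every tag, giving unprefixed elements like those in the LOC sample. The helper `strip_mx_prefix` is hypothetical (not part of MARC::File::XML), assumes the prefix is literally `mx`, and uses regular expressions rather than a real XML namespace processor.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of MARC::File::XML: rewrites a MARCXML
# document whose elements carry an "mx:" prefix so that they become
# unprefixed, like the LOC sample. Assumes the prefix is literally "mx"
# and uses plain regular expressions, so it only suits simple,
# machine-generated files like the one shown (not a general XML tool).
sub strip_mx_prefix {
    my ($xml) = @_;

    # 1. Drop any existing default namespace declaration (here the XHTML
    #    one), quoted or not, so it cannot claim the unprefixed elements.
    $xml =~ s/\s+xmlns\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)//g;

    # 2. Turn the mx namespace declaration into the default namespace.
    $xml =~ s/\bxmlns:mx\s*=/xmlns=/g;

    # 3. Remove the prefix from opening and closing tags.
    $xml =~ s{<(/?)mx:}{<$1}g;

    return $xml;
}

# Demonstration on a fragment modeled on the OCLC record above.
my $sample = <<'XML';
<mx:record xmlns:mx="http://www.loc.gov/MARC21/slim" xmlns="http://www.w3.org/TR/xhtml1/strict">
  <mx:controlfield tag="001">oca00042708</mx:controlfield>
</mx:record>
XML

print strip_mx_prefix($sample);
```

The result is still namespaced, to http://www.loc.gov/MARC21/slim rather than the http://www.loc.gov/MARC21 URI used in the LOC sample, so whether MARC::File::XML accepts it is untested.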
A related question: when I first tried to process the LOC subject authority files (my first example above), the program complained that the "Leader must be 24 bytes long". Every leader element in the authority files I got from the LOC ends with five blank spaces. I removed the spaces by hand to get the test files to work. I can always preprocess the files to strip the trailing spaces, but is there a way around this within MARC::File::XML? (These LOC subject authority XML files are five years old. Has the MARCXML specification changed since then?)

Many thanks for any help!

- Chris Morgan
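For reference, a minimal sketch of the preprocessing step mentioned above, in Perl like the script earlier in this message. `trim_leader` is a hypothetical helper (not a MARC::File::XML function) that uses a regular expression to drop whitespace immediately before the closing leader tag while leaving the blanks inside the leader alone. It assumes the leader value itself never ends in a blank, which holds for both samples shown (they end in 4500 and 0000).

```perl
use strict;
use warnings;

# Hypothetical helper: removes blanks that trail the leader value, e.g.
# "<leader>...4500     </leader>" becomes "<leader>...4500</leader>".
# Blanks inside the leader are kept. A prefixed element such as
# <mx:leader> is handled too. Assumes the leader value never ends in a
# blank.
sub trim_leader {
    my ($xml) = @_;

    # $1 = opening tag plus the leader text, $2 = element name (with an
    # optional prefix), $3 = the matching closing tag. The trailing
    # whitespace matched by \s+ between them is dropped.
    $xml =~ s{(<((?:\w+:)?leader)>[^<]*?)\s+(</\2>)}{$1$3}g;

    return $xml;
}

# Demonstration: a synthetic 24-character leader followed by five blanks.
my $xml = '<leader>' . ('x' x 24) . (' ' x 5) . '</leader>';
my ($inside) = trim_leader($xml) =~ m{<leader>([^<]*)</leader>};
print length($inside), "\n";    # prints 24
```

To apply it to a whole file, read the file in one go (for example with `local $/;`), print the result to a new file, and hand that file to MARC::Batch.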