So, in the course of dumping a pile of MARC bib records to MARCXML, I ran into a funky legacy record. It was 8-bit encoded (probably MARC-8), but it matched ANSI (ISO-8859-1), which is a valid encoding for XML, well enough that I decided to just dump it with that encoding.
"But how?" one might ask. Well, here's a patch that allows it. The extra 3 lines of Perl, plus some perldoc to explain them, let you set the encoding declared in the output XML to anything you like. The default stays UTF-8 so as not to change the current functionality. Feedback welcome and encouraged. :) -- miker
--- /usr/lib/perl5/site_perl/5.8.5/MARC/File/XML.pm 2004-05-19 22:21:03.000000000 -0400
+++ XML.pm 2004-09-22 22:51:27.816511864 -0400
@@ -201,11 +201,16 @@ different portions.
 
 Returns a string of XML to use as the header to your XML file.
 
+This method takes an optional $encoding parameter to set the output encoding
+to something other than 'UTF-8'. This is meant mainly to support slightly
+broken records that are in ISO-8859-1 (ANSI) format with 8-bit characters.
+
 =cut
 
 sub header {
+    my $encoding = shift || 'UTF-8';
     return( <<MARC_XML_HEADER );
-<?xml version="1.0" encoding="UTF-8"?>
+<?xml version="1.0" encoding="$encoding"?>
 <collection xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd" xmlns="http://www.loc.gov/MARC21/slim">
 MARC_XML_HEADER
 }
@@ -325,19 +330,23 @@ sub decode {
 
 }
 
-=head2 encode()
+=head2 encode([$encoding])
 
 You probably want to use the as_marc() method on your MARC::Record object
 instead of calling this directly. But if you want to you just need to
 pass in the MARC::Record object you wish to encode as XML, and you will be
 returned the XML as a scalar.
 
+This method takes an optional $encoding parameter to set the output encoding
+to something other than 'UTF-8'. This is meant mainly to support slightly
+broken records that are in ISO-8859-1 (ANSI) format with 8-bit characters.
+
 =cut
 
 sub encode {
     my $record = shift;
     my @xml = ();
-    push( @xml, header() );
+    push( @xml, header(shift) );
     push( @xml, record( $record ) );
     push( @xml, footer() );
     return( join( "\n", @xml ) );
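To see what the patched lines do without installing anything, here is a minimal standalone sketch. It is not the real module: header() copies the patched logic but abbreviates the collection element, and encode() is a hypothetical stand-in that takes an already-built record string instead of a MARC::Record object (the footer string is assumed, too).

```perl
use strict;
use warnings;

# Stand-in for the patched header(): same defaulting logic as the patch,
# with the <collection> attributes abbreviated for brevity.
sub header {
    my $encoding = shift || 'UTF-8';    # undef or missing falls back to UTF-8
    return <<"MARC_XML_HEADER";
<?xml version="1.0" encoding="$encoding"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
MARC_XML_HEADER
}

# Hypothetical stand-in for encode(): the first argument is a pre-built
# record string here, not a MARC::Record object as in the real module.
sub encode {
    my $record_xml = shift;
    my @xml = ();
    push( @xml, header(shift) );        # second argument, if any, is the encoding
    push( @xml, $record_xml );
    push( @xml, '</collection>' );      # assumed footer
    return( join( "\n", @xml ) );
}

# Without an encoding argument the declaration stays UTF-8.
print encode('<record/>'), "\n\n";

# With one, the declaration carries whatever name was passed in.
print encode( '<record/>', 'ISO-8859-1' ), "\n";
```

With the patch applied to the real module, the equivalent call would presumably be MARC::File::XML::encode( $record, 'ISO-8859-1' ). Note that only the XML declaration changes. The record bytes are written as they are, which is why this is only suitable for records that already match the declared encoding.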