So, in the course of dumping a pile of MARC bib records to MARCXML, I ran into 
a funky legacy record.  It was 8-bit encoded (probably MARC-8), but it 
matched ISO-8859-1 ("ANSI"), which is a valid encoding for XML, closely 
enough that I decided to just dump it with that encoding.

"But how?" one might ask.  Well, here's a patch that allows it.  The extra 3 
lines of Perl, plus some perldoc to explain them, let you set the encoding 
declared in the output XML to anything you like.  It defaults to UTF-8, so 
the current functionality is unchanged.
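
For anyone who wants to see the behavior without applying the patch, here is a minimal stand-alone sketch.  The header() below is a local copy of the patched logic with a shortened <collection> tag, not the module itself, and the variable names are only for illustration.

```perl
use strict;
use warnings;

# Local, simplified copy of the patched header() for illustration only;
# the real sub lives in MARC::File::XML and emits the full <collection> tag.
sub header {
    my $encoding = shift || 'UTF-8';    # fall back to UTF-8 when no argument is given
    return <<"MARC_XML_HEADER";
<?xml version="1.0" encoding="$encoding"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
MARC_XML_HEADER
}

my $default = header();                 # no argument: UTF-8 declaration
my $latin1  = header('ISO-8859-1');     # explicit override

print $default, $latin1;
```

With the patch applied, the equivalent call on the module would be MARC::File::XML::encode( $record, 'ISO-8859-1' ).  Note that this only changes the encoding named in the XML declaration; the record's bytes are not converted.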

Feedback welcome and encouraged. :)

-- 
miker


--- /usr/lib/perl5/site_perl/5.8.5/MARC/File/XML.pm	2004-05-19 22:21:03.000000000 -0400
+++ XML.pm	2004-09-22 22:51:27.816511864 -0400
@@ -201,11 +201,16 @@ different portions.  
 
 Returns a string of XML to use as the header to your XML file.
 
+This method takes an optional $encoding parameter to set the output encoding
+to something other than 'UTF-8'.  This is meant mainly to support slightly
+broken records that are in ISO-8859-1 (ANSI) format with 8-bit characters.
+
 =cut 
 
 sub header {
+    my $encoding = shift || 'UTF-8';
     return( <<MARC_XML_HEADER );
-<?xml version="1.0" encoding="UTF-8"?>
+<?xml version="1.0" encoding="$encoding"?>
 <collection xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd" xmlns="http://www.loc.gov/MARC21/slim">
 MARC_XML_HEADER
 }
@@ -325,19 +330,23 @@ sub decode { 
     
 }
 
-=head2 encode()
+=head2 encode([$encoding])
 
 You probably want to use the as_marc() method on your MARC::Record object
 instead of calling this directly. But if you want to you just need to 
 pass in the MARC::Record object you wish to encode as XML, and you will be
 returned the XML as a scalar.
 
+This method takes an optional $encoding parameter to set the output encoding
+to something other than 'UTF-8'.  This is meant mainly to support slightly
+broken records that are in ISO-8859-1 (ANSI) format with 8-bit characters.
+
 =cut
 
 sub encode {
     my $record = shift;
     my @xml = ();
-    push( @xml, header() );
+    push( @xml, header(shift) );
     push( @xml, record( $record ) );
     push( @xml, footer() );
     return( join( "\n", @xml ) );
