Hi all,

I ran across some gnarly MARC data today which contained, among other
things, subfield codes of "<".  I realized that MARC::File::XML outputs the
MARC tags, subfield codes, and indicators without escaping them.  In my
case, this results in invalid XML like:

<subfield code="<">France</subfield>

It seems reasonable that, regardless of the (horrible) content of the MARC,
MARC::File::XML should produce valid XML.

Attached is a patch to explicitly escape the values before inserting them
into the XML document under construction.  I'm not sure if it's the best
approach, but it got me up and running again.
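
To illustrate the idea outside the module, here is a minimal standalone
sketch of escaping a value before interpolating it into an attribute.
Note that escape_attr() is a hypothetical helper written for this example;
it is not the escape() routine in XML.pm, whose exact behavior may differ
(for instance in how it treats quotes).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MARC::File::XML): escape a string so it
# is safe inside a double-quoted XML attribute value.
sub escape_attr {
    my ($value) = @_;
    $value =~ s/&/&amp;/g;     # ampersand first, so later entities are not re-escaped
    $value =~ s/</&lt;/g;
    $value =~ s/>/&gt;/g;
    $value =~ s/"/&quot;/g;    # the attribute is delimited by double quotes
    return $value;
}

my $code = escape_attr('<');
print qq(<subfield code="$code">France</subfield>\n);
# prints: <subfield code="&lt;">France</subfield>
```

With the bad subfield code from my data, this yields an attribute that an
XML parser will accept.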

Thanks,

-b
--- XML.pm.orig	2008-10-29 16:29:47.000000000 -0400
+++ XML.pm	2008-10-29 16:33:13.000000000 -0400
@@ -346,17 +346,17 @@
     push( @xml, "  <leader>" . escape( $record->leader ) . "</leader>" );
 
     foreach my $field ( $record->fields() ) {
-        my $tag = $field->tag();
+        my ($tag) = escape( $field->tag() );
         if ( $field->is_control_field() ) { 
             my $data = $field->data;
             push( @xml, qq(  <controlfield tag="$tag">) .
                     escape( ($_transcode ? marc8_to_utf8($data) : $data) ). qq(</controlfield>) );
         } else {
-            my $i1 = $field->indicator( 1 );
-            my $i2 = $field->indicator( 2 );
+            my ($i1) = escape( $field->indicator( 1 ) );
+            my ($i2) = escape( $field->indicator( 2 ) );
             push( @xml, qq(  <datafield tag="$tag" ind1="$i1" ind2="$i2">) );
             foreach my $subfield ( $field->subfields() ) { 
-                my ( $code, $data ) = @$subfield;
+                my ( $code, $data ) = ( escape( $$subfield[0] ), $$subfield[1] );
                 push( @xml, qq(    <subfield code="$code">).
                         escape( ($_transcode ? marc8_to_utf8($data) : $data) ).qq(</subfield>) );
             }
