Hi Michele:

Yes, I see a UTF-8 encoding error in that file when I try to check it
with xmllint (from the libxml2 package):

e...@curry:~/Downloads$ xmllint marc.xml
marc.xml:1: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xE0 0x20 0x3A 0x3C
ld code="b">le infrastrutture, l' organizzazione, i contratti e le responsabilit

This causes MARC::Record->new_from_xml to blow up too, with a somewhat
unhelpful error:

not well-formed (invalid token) at line 1, column 1533, byte 1533 at
/usr/lib/perl5/XML/Parser.pm line 187

It looks like your XML file might be in ISO-8859-1 (at least that is what
the Unix file command told me):

e...@curry:~/Projects/marc-xml$ file marc.xml
marc.xml: ISO-8859 text, with very long lines, with no line terminators

So you could try to convert your XML string with Encode before handing
it off to MARC::Record->new_from_xml:

  use Encode qw(from_to);
  from_to($xml, 'iso-8859-1', 'utf-8');
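Here is a minimal sketch of that conversion on its own, using only the core
Encode module. The sample string is made up for illustration (it just
contains the same 0xE0 byte xmllint complained about), and the commented-out
call at the end assumes MARC::File::XML is installed and loaded. Note that
from_to has to be imported explicitly and called as a plain function; it
converts the string in place.

```perl
use strict;
use warnings;
use Encode qw(from_to);

# Hypothetical stand-in for the real record: "\xE0" is the ISO-8859-1
# byte for "a with grave accent", which is not valid UTF-8 on its own.
my $xml = "<subfield code=\"b\">le responsabilit\xE0 : </subfield>";

# Convert the bytes in place. from_to returns the length of the
# converted string in bytes, or undef if the conversion failed.
my $length = from_to($xml, 'iso-8859-1', 'utf-8');
die "conversion from ISO-8859-1 failed\n" unless defined $length;

# The single byte 0xE0 should now be the two-byte sequence 0xC3 0xA0.
print $xml =~ /\xC3\xA0/ ? "converted\n" : "unchanged\n";

# With MARC::File::XML loaded, the converted string could then be
# handed off like this (assumption: the module is installed):
# my $record = MARC::Record->new_from_xml($xml, 'UTF-8');
```

If the file is not actually ISO-8859-1, the conversion will still run but
produce the wrong characters, so it is worth checking a few accented words
in the result.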

I attached the full script, which seems to work OK. Note that if you are on
Ubuntu, their libmarc-xml-perl package looks to be a few versions behind
(v0.88) the latest on CPAN (v0.92), and v0.88 doesn't handle namespaces
properly.

//Ed
