Hi,
I am having some problems with the MARC::File::XML parser.

Take the two XML records below.
I agree that some subfields contain odd characters.
Still, since those characters are valid UTF-8, I wonder why
MARC::File::XML drops them when parsing.
Is there a reason why MARC::File::XML accepts only a very strict
subset of UTF-8 as valid (for instance no line breaks, no ...)?

Couldn't it say "OK, this is an XML record encoded in UTF-8, so I take it
as is, no matter whether it contains 'odd' characters"?
This could be really big trouble for kanji or Indic scripts, IMHO.
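
To narrow this down, it may help to check whether the "odd" characters are ones that XML 1.0 itself forbids (mostly control characters such as the MARC delimiters 0x1D, 0x1E and 0x1F), since any conforming XML parser would reject those regardless of the UTF-8 encoding. Line breaks, accented letters and kanji are all legal XML characters. Below is a rough sketch in Python rather than Perl. It is not how MARC::File::XML works internally, and the function name find_illegal_xml_chars and the sample string are made up for illustration.

```python
import re

# Characters allowed by the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# The negated class below matches everything outside that set.
_XML10_ILLEGAL = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_illegal_xml_chars(text):
    """Return (offset, code point) pairs for characters XML 1.0 forbids.

    Hypothetical helper, for diagnosis only.
    """
    return [
        (m.start(), "U+%04X" % ord(m.group()))
        for m in _XML10_ILLEGAL.finditer(text)
    ]


# Made-up sample: accented Latin text, a MARC subfield delimiter (0x1F),
# a line break and two kanji. Only the delimiter should be reported.
sample = "J\u00e9r\u00f4me\x1fTraducteur\n\u6f22\u5b57"
print(find_illegal_xml_chars(sample))  # [(6, 'U+001F')]
```

If this kind of check reports nothing for the subfields in question, the characters are legal XML and the dropping would have to come from somewhere else in the processing chain.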


<?xml version="1.0" encoding="UTF-8"?>
<record
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
 xmlns="http://www.loc.gov/MARC21/slim">

 <leader>00150nx  a2200073   4500 </leader>
 <datafield tag="200" ind1=" " ind2="1">
   <subfield code="a">Nicolas</subfield>
   <subfield code="b">Jérôme</subfield>
   <subfield code="4">Traducteur</subfield>
 </datafield>
 <datafield tag="100" ind1=" " ind2=" ">
   <subfield code="a">19980124afrey50      ba0</subfield>
 </datafield>
 <controlfield tag="001">3568</controlfield>
 <datafield tag="152" ind1=" " ind2=" ">
   <subfield code="b">NP</subfield>
 </datafield>
</record>
<?xml version="1.0" encoding="UTF-8"?>
<record
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
 xmlns="http://www.loc.gov/MARC21/slim">

 <leader>00151nx  a2200073   4500 </leader>
 <datafield tag="200" ind1=" " ind2="1">
   <subfield code="a">Guynemer</subfield>
   <subfield code="b">Georges</subfield>
   <subfield code="f">(1894-1917)</subfield>
 </datafield>
 <datafield tag="100" ind1=" " ind2=" ">
   <subfield code="a">19980129afrey50      ba0</subfield>
 </datafield>
 <controlfield tag="001">4642</controlfield>
 <datafield tag="152" ind1=" " ind2=" ">
   <subfield code="b">NP</subfield>
 </datafield>
</record>

-- 
Henri Damien LAURENT et Paul POULAIN
Consultants indépendants
en logiciels libres et bibliothéconomie (http://www.koha-fr.org)

