Hi, I have some problems with the MARC::File::XML parser. Take the two XML records below. I agree that there are odd characters in some subfields, but since those characters are UTF-8, I am wondering why MARC::File::XML drops them when parsing. Is there a reason why MARC::File::XML considers only a very strict subset of UTF-8 as valid (for instance no line breaks, no ...)?
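For reference, XML 1.0 itself only excludes a small set of code points: most C0 control characters, the surrogate range, and U+FFFE/U+FFFF. Tabs, line feeds, carriage returns, kanji and Devanagari are all legal. Below is a minimal Python sketch, assuming the subfield text is available as an already decoded string, that lists the characters XML 1.0 would reject. The function name and the sample inputs are hypothetical and are not part of MARC::File::XML; whether the records below contain such characters is not established here.

```python
def invalid_xml10_chars(text):
    """Return (index, code point) pairs for characters outside the XML 1.0 Char range."""
    def is_valid(cp):
        # XML 1.0 "Char" production: tab, LF, CR, then three ranges.
        return (
            cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF
        )

    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if not is_valid(ord(ch))
    ]


# Hypothetical sample inputs, for illustration only.
print(invalid_xml10_chars("Jérôme"))        # accented Latin letters -> []
print(invalid_xml10_chars("漢字 हिन्दी\n"))  # kanji, Devanagari, line feed -> []
print(invalid_xml10_chars("Nicolas\x1f"))   # control character -> [(7, 'U+001F')]
```

If a check like this reports nothing for a subfield, the characters are legal XML 1.0 and a conforming parser has no reason of its own to drop them.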
Couldn't it say "OK, this is an XML record encoded in UTF-8, I take it for granted, no matter whether there are 'odd' characters"? This could be really BIG trouble for kanji or Hindi, IMHO.

<?xml version="1.0" encoding="UTF-8"?>
<record xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
        xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00150nx a2200073 4500 </leader>
  <datafield tag="200" ind1=" " ind2="1">
    <subfield code="a">Nicolas</subfield>
    <subfield code="b">Jérôme</subfield>
    <subfield code="4">Traducteur</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="a">19980124afrey50 ba0</subfield>
  </datafield>
  <controlfield tag="001">3568</controlfield>
  <datafield tag="152" ind1=" " ind2=" ">
    <subfield code="b">NP</subfield>
  </datafield>
</record>

<?xml version="1.0" encoding="UTF-8"?>
<record xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
        xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00151nx a2200073 4500 </leader>
  <datafield tag="200" ind1=" " ind2="1">
    <subfield code="a">Guynemer</subfield>
    <subfield code="b">Georges</subfield>
    <subfield code="f">(1894-1917)</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="a">19980129afrey50 ba0</subfield>
  </datafield>
  <controlfield tag="001">4642</controlfield>
  <datafield tag="152" ind1=" " ind2=" ">
    <subfield code="b">NP</subfield>
  </datafield>
</record>

--
Henri Damien LAURENT and Paul POULAIN
Independent consultants in free software and library science
(http://www.koha-fr.org)