I've been beating the MARC::File::XML code into usable shape lately, so I'm going to take a stab at fixing this. There will be some limitations (at first) on which encodings we'll accept for UNIMARC records, but I'll cover the cases that I know about (and understand).

Here's the plan:

I will add a use flag to set the script-wide default record format:

    use MARC::File::XML ( RecordFormat => 'UNIMARC' );

It will default to MARC21. There will also be a class method to set this flag:

    MARC::File::XML->default_record_format( 'UNIMARC' );

and, finally, a flag on both as_xml and new_from_xml to tell MARC::File::XML the format of individual records.

I don't think, at this point, we should autodetect based on the existence of a 200 tag, as I'd like to stay away from heuristics if they can be avoided. If others disagree, please make the case!

When processing a UNIMARC record, I'll look in 100$a for the encoding and proceed if it's either 01 (ISO 646, nominally compatible with ISO 8859, though it requires interpretation) or 50 (UNICODE, which will always mean UTF-8 in XML produced by MARC::File::XML). If it's anything else, an error will be thrown. We can add support for other encodings as the direct need arises.

For UNIMARC/UNICODE, the XML is obviously going to be UTF-8 encoded. For UNIMARC/ISO646, the XML will be marked as ISO-8859-1. Yes, it's a bit of a fib, but most XML parsers don't support ISO 646, and most do support Latin-1 (8859-1), so the bytes won't get mangled by the parser in that case.

Comments?

On 3/16/06, Zeno Tajoli <[EMAIL PROTECTED]> wrote:
> Hi,
>
> >PROBLEM :
> >* in MARC21, the encoding is defined by position 9 of the leader.
> >'a' means UTF-8
> >* in UNIMARC, this is an empty position ! the encoding is in
> >positions 26-27 and 28-29 of 100$a (<200 are all fixed coded fields
> >in unimarc : http://bibliotheque.bgp-fr.com/Unimarc_abrege.pdf, page
> >8 for 100$a)
> >
> >BIG PROBLEM :
> >MARC::File::XML only checks for position 9, thinking the XML is
> >necessarily a marc21 file.
> >
> >I think (& joshua agrees) we will have to hack MARC::File::XML to
> >solve this problem.
> >We have 2 solutions :
> >* add a test to define whether we are UNIMARC or MARC21. In UNIMARC,
> >title is in 200, while 200 is empty in MARC21.
> >* add a parameter to ->new_as_xml($xml,'UTF-8','UNIMARC') to specify
> >we are sending the parser an unimarc file.
>
> as a person that has written a Unimarc -> MARC21 converter, I prefer
> the second solution.
>
> Thanks for all
> Bye
>
> Zeno Tajoli
> CILEA - Segrate (MI)
> tajoliAT_SPAM_no_prendiATcilea.it
> (Address masked against spam; replace what is between the ATs with @)
>

--
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org
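
To make the encoding check from the plan above concrete, here is a rough sketch of how the 100$a lookup might work. It is only an illustration under stated assumptions: the helper name xml_encoding_for_unimarc and the %XML_ENCODING_FOR table are hypothetical and not part of MARC::File::XML, and the sketch assumes the character set code sits at positions 26-27 of 100$a, counted from zero.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration; not part of MARC::File::XML.
# Maps the UNIMARC character set code found in 100$a to the encoding
# name that would be declared in the generated XML.
my %XML_ENCODING_FOR = (
    '01' => 'ISO-8859-1',    # ISO 646, declared as Latin-1 (the "fib")
    '50' => 'UTF-8',         # UNICODE
);

sub xml_encoding_for_unimarc {
    my ($subfield_100a) = @_;

    # Assumption: positions 26-27 (counting from 0) hold the code,
    # so the subfield must be at least 28 characters long.
    die "100\$a is missing or too short to hold a character set code\n"
        unless defined $subfield_100a && length($subfield_100a) >= 28;

    my $code = substr( $subfield_100a, 26, 2 );

    # Anything other than 01 or 50 is rejected, as in the plan.
    die "Unsupported UNIMARC character set code '$code'\n"
        unless exists $XML_ENCODING_FOR{$code};

    return $XML_ENCODING_FOR{$code};
}

# Usage: a made-up 36-character 100$a with '50' at positions 26-27.
my $sample = ( ' ' x 26 ) . '50' . ( ' ' x 8 );
print xml_encoding_for_unimarc($sample), "\n";    # prints UTF-8
```

Keeping the mapping in a single table means support for further UNIMARC codes could be added as the need arises without touching the lookup itself.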