I've updated CVS for MARC::File::XML with what I described below, with one caveat: because as_xml() is generated by MARC::Record, I can't give it new parameters. To test exporting to XML you'll need to set the record format for export, either in the use line for the module or with the default_record_format() class method. Just call that with 'UNIMARC' as the parameter and then export your record as normal using as_xml() on the MARC::Record object.
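A minimal sketch of the export path described above, assuming MARC::Record and the updated MARC::File::XML are installed. The field contents are made up for illustration. The 100$a value is a hypothetical placeholder, padded so that positions 26-27 hold '50'; whether your installed version insists on that field is an assumption to check against its documentation.

```perl
use strict;
use warnings;

use MARC::Record;
use MARC::Field;
use MARC::File::XML;    # adds as_xml() and new_from_xml() to MARC::Record

# Set the script-wide default record format for export.
# Alternatively: use MARC::File::XML ( RecordFormat => 'UNIMARC' );
MARC::File::XML->default_record_format('UNIMARC');

my $record = MARC::Record->new();

# Hypothetical 100$a: 36 characters with '50' (Unicode) at positions 26-27.
my $coded = ' ' x 36;
substr( $coded, 26, 2 ) = '50';

$record->append_fields(
    MARC::Field->new( '100', ' ', ' ', a => $coded ),
    MARC::Field->new( '200', '1', ' ', a => 'Sample title' ),
);

# Export as usual; no format argument is passed to as_xml().
my $xml = $record->as_xml();
print $xml;
```

The format is set once for the whole script here, which is the workaround for as_xml() not taking a format parameter.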
(new_from_xml() does not suffer from this limitation, as that method is
defined in MARC/File/XML.pm, so it takes both an encoding parameter and
a format parameter, as explained in the documentation.)

Will some brave soul please test this with some UNIMARC records and let
me know how it goes?

-----------------------------------
CVS checkout instructions

cvs -d:pserver:[EMAIL PROTECTED]:/cvsroot/marcpm login
cvs -z3 -d:pserver:[EMAIL PROTECTED]:/cvsroot/marcpm co -P marc-xml

Then,

cd marc-xml
perl Makefile.PL
make
make test

And assuming 'make test' succeeds ...

make install
-------------------------------

Thanks in advance,

--miker

On 3/16/06, Mike Rylander <[EMAIL PROTECTED]> wrote:
> I've been attempting to beat the MARC::File::XML stuff into a usable
> shape as of late, so I'm going to take a stab at fixing this.  There
> will be some limitations (at first) as to what encodings we'll accept
> for UNIMARC records, but I'll cover the cases that I know about (and
> understand).
>
> Here's the plan:
>
> I will add a use flag to set the script-wide default for record format
>
>   use MARC::File::XML ( RecordFormat => 'UNIMARC' );
>
> that will default to MARC21.  There will also be a class method to set
> this flag
>
>   MARC::File::XML->default_record_format( 'UNIMARC' );
>
> and, finally, a flag to both as_xml and new_from_xml to tell
> MARC::File::XML about individual records.  I don't think, at this
> point, we should autodetect based on the existence of a 200 tag, as
> I'd like to stay away from heuristics if it can be avoided.  If others
> disagree, please make the case!
>
> When processing a UNIMARC record, I'll look in 100$a for the encoding,
> and proceed if it's either 01 (iso646 -- nominally compatible with
> iso8859, though it requires interpretation) or 50 (UNICODE, which will
> always mean UTF8 in XML produced by MARC::File::XML).  If it's
> anything else an error will be thrown.  We can add support for other
> encodings as the direct need arises.
>
> For UNIMARC/UNICODE, the XML is obviously going to be UTF-8 encoded.
> For UNIMARC/ISO646, the XML will be marked as ISO-8859-1.  Yes, it's a
> bit of a fib, but most XML parsers don't support ISO646, and most do
> support LATIN1 (8859-1), and the bytes won't get mangled by the parser
> in that case.
>
> Comments?
>
> On 3/16/06, Zeno Tajoli <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > >PROBLEM :
> > >* in MARC21, the encoding is defined by position 9 of the leader.
> > >'a' means UTF-8
> > >* in UNIMARC, this is an empty position ! the encoding is in
> > >positions 26-27 and 28-29 of 100$a (<200 are all fixed coded fields
> > >in unimarc : http://bibliotheque.bgp-fr.com/Unimarc_abrege.pdf, page
> > >8 for 100$a)
> > >
> > >BIG PROBLEM :
> > >MARC::File::XML only checks for position 9, thinking the XML is
> > >necessarily a marc21 file.
> > >
> > >I think (& joshua agrees) we will have to hack MARC::File::XML to
> > >solve this problem.
> > >We have 2 solutions :
> > >* add a test to define whether we are UNIMARC or MARC21. In UNIMARC,
> > >title is in 200, while 200 is empty in MARC21.
> > >* add a parameter to ->new_as_xml($xml,'UTF-8','UNIMARC') to specify
> > >we are sending the parser an unimarc file.
> >
> > as a person that has written a Unimarc -> MARC21 converter, I prefer
> > the second solution.
> >
> > Thanks for all
> > Bye
> >
> > Zeno Tajoli
> > CILEA - Segrate (MI)
> > tajoliAT_SPAM_no_prendiATcilea.it
> > (Address masked against spam; replace everything between the ATs with @)
> >
>
> --
> Mike Rylander
> [EMAIL PROTECTED]
> GPLS -- PINES Development
> Database Developer
> http://open-ils.org
>

--
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org
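
The encoding rule in the quoted plan (read the character-set code from 100$a, accept only 01 or 50, fail on anything else) can be sketched roughly as follows. This is an illustration only, not the code in MARC::File::XML: the names unimarc_xml_encoding and sample_100a are hypothetical, and it assumes the code sits at positions 26-27 of 100$a, as stated in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: map the UNIMARC character-set code found at
# positions 26-27 of 100$a to the encoding declared in the XML output.
sub unimarc_xml_encoding {
    my ($field_100a) = @_;

    die "100\$a is too short to hold a character-set code\n"
        if !defined $field_100a || length($field_100a) < 28;

    my $code = substr( $field_100a, 26, 2 );

    # 01 = ISO 646, declared as ISO-8859-1 because most XML parsers
    # do not support ISO 646; 50 = Unicode, always written as UTF-8.
    my %xml_encoding = (
        '01' => 'ISO-8859-1',
        '50' => 'UTF-8',
    );

    my $encoding = $xml_encoding{$code};
    die "Unsupported UNIMARC character-set code '$code'\n"
        unless defined $encoding;

    return $encoding;
}

# Hypothetical helper: build a 36-character 100$a value with the given
# code at positions 26-27 and blanks everywhere else.
sub sample_100a {
    my ($code) = @_;
    my $value = ' ' x 36;
    substr( $value, 26, 2 ) = $code;
    return $value;
}

print unimarc_xml_encoding( sample_100a('50') ), "\n";    # prints UTF-8
print unimarc_xml_encoding( sample_100a('01') ), "\n";    # prints ISO-8859-1
```

Failing on unknown codes, rather than guessing, matches the stated preference in the thread for avoiding heuristics.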