I've been attempting to beat the MARC::File::XML stuff into a usable
shape as of late, so I'm going to take a stab at fixing this.  There
will be some limitations (at first) as to what encodings we'll accept
for UNIMARC records, but I'll cover the cases that I know about (and
understand).

Here's the plan:

I will add a flag to the use line that sets the script-wide default record format:

  use MARC::File::XML ( RecordFormat => 'UNIMARC' );

The default will be MARC21.  There will also be a class method to set this flag:

  MARC::File::XML->default_record_format( 'UNIMARC' );

and, finally, a flag on both as_xml and new_from_xml to tell
MARC::File::XML the format of an individual record.  I don't think, at
this point, that we should autodetect based on the existence of a 200
tag, as I'd like to stay away from heuristics where they can be
avoided.  If others disagree, please make the case!

When processing a UNIMARC record, I'll look in 100$a for the encoding
and proceed if it's either 01 (ISO 646 -- nominally compatible with
ISO 8859, though it requires interpretation) or 50 (UNICODE, which will
always mean UTF-8 in XML produced by MARC::File::XML).  If it's
anything else, an error will be thrown.  We can add support for other
encodings as the direct need arises.
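
To make the check concrete, here is a rough sketch of how it might
look.  It is not code from MARC::File::XML: the helper name
unimarc_charset_code and the lookup table are made up for
illustration, and it assumes the character set code sits at zero-based
positions 26-27 of 100$a, as described in the quoted message below.

```perl
use strict;
use warnings;

# Character set codes this sketch accepts, per the plan above.
my %SUPPORTED_CHARSET = (
    '01' => 'ISO 646',
    '50' => 'UNICODE (UTF-8)',
);

# Hypothetical helper, not an existing MARC::File::XML function.
# Takes the raw 100$a string and returns its two-character character
# set code, or dies if the code is missing or unsupported.
sub unimarc_charset_code {
    my ($field_100a) = @_;

    die "100\$a is missing or too short to carry a character set code\n"
        unless defined $field_100a && length($field_100a) >= 28;

    # Assumption: positions are zero-based, so 26-27 is substr offset 26.
    my $code = substr( $field_100a, 26, 2 );

    die "Unsupported UNIMARC character set code '$code'\n"
        unless exists $SUPPORTED_CHARSET{$code};

    return $code;
}

# Placeholder 100$a: 26 filler characters, then the code, then padding.
my $sample = ( 'x' x 26 ) . '50' . ( ' ' x 8 );
print unimarc_charset_code($sample), "\n";
```

Dying on anything other than 01 or 50 keeps the behaviour explicit;
supporting another encoding later would only mean adding a row to the
table.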

For UNIMARC/UNICODE, the XML is obviously going to be UTF-8 encoded.
For UNIMARC/ISO646, the XML will be marked as ISO-8859-1.  Yes, it's a
bit of a fib, but most XML parsers don't support ISO 646, most do
support Latin-1 (ISO-8859-1), and the bytes won't get mangled by the
parser in that case.
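
As an illustration only, the mapping from UNIMARC character set code
to XML declaration could be sketched like this.  The names
%XML_ENCODING_FOR and xml_declaration_for are hypothetical and not
part of MARC::File::XML; the real module may build its declaration
differently.

```perl
use strict;
use warnings;

# Hypothetical table: UNIMARC character set code => encoding label
# written into the XML declaration.
my %XML_ENCODING_FOR = (
    '50' => 'UTF-8',         # UNIMARC/UNICODE
    '01' => 'ISO-8859-1',    # UNIMARC/ISO646, labelled as Latin-1
);

# Hypothetical helper: returns the XML declaration for a given code,
# or dies if the code has no known XML encoding.
sub xml_declaration_for {
    my ($charset_code) = @_;

    die "No character set code given\n" unless defined $charset_code;

    my $encoding = $XML_ENCODING_FOR{$charset_code};
    die "No XML encoding known for character set code '$charset_code'\n"
        unless defined $encoding;

    return qq{<?xml version="1.0" encoding="$encoding"?>};
}

print xml_declaration_for('01'), "\n";
```

Keeping the ISO 646 to ISO-8859-1 relabelling in a single table makes
the "fib" easy to find and to change if parsers ever need a different
label.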

Comments?

On 3/16/06, Zeno Tajoli <[EMAIL PROTECTED]> wrote:
> Hi,
>
> >PROBLEM :
> >* in MARC21, the encoding is defined by position 9 of the leader.
> >'a' means UTF-8
> >* in UNIMARC, this is an empty position ! the encoding is in
> >positions 26-27 and 28-29 of 100$a (<200 are all fixed coded fields
> >in unimarc : http://bibliotheque.bgp-fr.com/Unimarc_abrege.pdf, page
> >8 for 100$a)
> >
> >BIG PROBLEM :
> >MARC::File::XML only checks for position 9, thinking the XML is
> >necessarily a MARC21 file.
> >
> >I think (& joshua agrees) we will have to hack MARC::File::XML to
> >solve this problem.
> >We have 2 solutions :
> >* add a test to define whether we are UNIMARC or MARC21. In UNIMARC,
> >title is in 200, while 200 is empty in MARC21.
> >* add a parameter to ->new_as_xml($xml,'UTF-8','UNIMARC') to specify
> >we are sending the parser an unimarc file.
>
> as a person who has written a Unimarc -> MARC21 converter, I prefer
> the second solution.
>
> Thanks for all
> Bye
>
> Zeno Tajoli
> CILEA - Segrate (MI)
> tajoliAT_SPAM_no_prendiATcilea.it
> (Address masked against spam; replace the part from AT to AT with @)
>
>


--
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org
