Hi Henri-Damien,
> And any LOWERCASE DIGRAPH AE or UPPERCASE DIGRAPH AE or
> LOWERCASE DIGRAPH OE is not well encoded. Encoding is
> **assumed** to be latin1 translated into utf-8 in the
> catalogue I am working on but appears respectively µ, ¥,¶
> in biblios.
     hex   MARC-8                ISO-8859-1 (Latin-1)
-    ----  --------------------  --------------------
µ    0xB5  LOWERCASE DIGRAPH AE  MICRO SIGN
¥    0xA5  UPPERCASE DIGRAPH AE  YEN SIGN
¶    0xB6  LOWERCASE DIGRAPH OE  PILCROW SIGN
> Is there a way to fix things up ?
If the underlying byte values in your MARC records for the digraphs in
question are hex 0xB5, 0xA5, and 0xB6, then the character set is not Latin-1; it
is MARC-8. The µ, ¥, and ¶ you see are simply those MARC-8 bytes displayed as if
they were Latin-1. If that is the case, I don't believe that anything needs to be
fixed in the records themselves; if you use MARC::Charset to convert them from
MARC-8 to UTF-8, it should work.
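To make the table above concrete, here is a minimal Python sketch of the effect. It is not how MARC::Charset works internally. The three-entry mapping, the function names, and the sample string are all made up for illustration, and a real conversion should go through MARC::Charset or a complete MARC-8 table. It only shows that the same three bytes give µ, ¥, ¶ when read as Latin-1 and the digraphs when read as MARC-8.

```python
# Illustration only: a hypothetical three-entry excerpt of the MARC-8
# extended Latin set, covering just the digraphs discussed in this thread.
MARC8_DIGRAPHS = {
    0xB5: "\u00e6",  # LOWERCASE DIGRAPH AE
    0xA5: "\u00c6",  # UPPERCASE DIGRAPH AE
    0xB6: "\u0153",  # LOWERCASE DIGRAPH OE
}


def misread_as_latin1(raw: bytes) -> str:
    """Decode the bytes as ISO-8859-1, which yields the wrong glyphs."""
    return raw.decode("latin-1")


def decode_marc8_digraphs(raw: bytes) -> str:
    """Decode ASCII plus the three digraph bytes above.

    Any other high byte becomes U+FFFD, because this sketch does not
    implement the rest of MARC-8 (combining characters, escape sequences).
    """
    out = []
    for b in raw:
        if b < 0x80:
            out.append(chr(b))
        else:
            out.append(MARC8_DIGRAPHS.get(b, "\ufffd"))
    return "".join(out)


# Made-up sample bytes, not taken from any record in this thread.
sample = b"Vit\xb5 \xa5neas c\xb6lum"
print(misread_as_latin1(sample))
print(decode_marc8_digraphs(sample))
```

If the catalogue has already been converted from "Latin-1" to UTF-8, the stored characters are now U+00B5, U+00A5, and U+00B6, and blindly mapping them back would also change any genuine micro, yen, or pilcrow signs, so it would be safer to re-convert from the original MARC-8 records if they are still available.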
However, it may also be that I am misunderstanding the issue. It would help if
you could provide the pertinent Perl code you are using for the character set
translation and a couple of the MARC records with digraphs that are failing.
> ... but appears respectively µ, ¥,¶ in biblios.
Please excuse my ignorance, but what is 'biblios' in the context of this
discussion?
-- Michael
# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# [EMAIL PROTECTED]
# http://rocky.uta.edu/doran/
> -----Original Message-----
> From: Henri-Damien LAURENT [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, March 14, 2007 4:18 AM
> To: Doran, Michael D; perl4lib
> Subject: Re: MARC::Charset
>
> Doran, Michael D wrote:
> > Hi Henri,
> >
> > Although in my email client, the character in question
> > appears as a MICRO SIGN ("µ"), I am assuming that it is
> > actually meant to be a LOWERCASE DIGRAPH AE ("æ") since that
> > is consistent with the Latin vernacular text in your record.
> > In MARC-8, the LOWERCASE DIGRAPH AE character is a
> > precomposed character represented by 0xB5 in hex [1]. You
> > mention that you are using MARC::File::XML which in turn uses
> > MARC::Charset. I'm wondering if there is some confusion as
> > to the expected encoding of the MARC records being
> > processed/converted? If MARC::Charset is expecting MARC21
> > Unicode/UCS encoded records, but is actually getting MARC-8
> > encoded records, then in that context it likely wouldn't know
> > what to do with the 0xB5 octet and that might be the cause of
> > the error you are seeing.
> >
> > -- Michael
> >
> > [1] Your MARC records appear to be encoded in MARC-8 as
> > evidenced by "ergáo" in which the combining accent character
> > comes before the character to be modified. I.e. the byte
> > string that displays as "ergáo" in your email would display
> > as "ergò" (with a Latin small letter o with grave) in a
> > MARC-8 aware client.
> >
> >
> Thanks for your answer.
> Well, this could be a precious hint.
> Indeed, in that catalogue I want to process, some books are
> ancient books and were catalogued from OCLC or SUDOC.
> And any LOWERCASE DIGRAPH AE or UPPERCASE DIGRAPH AE or
> LOWERCASE DIGRAPH OE is not well encoded. Encoding is
> **assumed** to be latin1 translated into utf-8 in the
> catalogue I am working on but appears respectively µ, ¥,¶ in biblios.
>
> Is there a way to fix things up ?
>
> --
> Henri Damien LAURENT et Paul POULAIN
> Independent consultants
> in free software and library science (http://www.koha-fr.org)
>