Doran, Michael D wrote:
> Hi Henri,
>   
> Although in my email client, the character in question appears as a MICRO 
> SIGN ("µ"), I am assuming that it is actually meant to be a LOWERCASE DIGRAPH 
> AE ("æ") since that is consistent with the Latin vernacular text in your 
> record.  In MARC-8, the LOWERCASE DIGRAPH AE character is a precomposed 
> character represented by 0xB5 in hex [1].  You mention that you are using 
> MARC::File::XML which in turn uses MARC::Charset.  I'm wondering if there is 
> some confusion as to the expected encoding of the MARC records being 
> processed/converted?  If MARC::Charset is expecting MARC21 Unicode/UCS 
> encoded records, but is actually getting MARC-8 encoded records, then in that 
> context it likely wouldn't know what to do with the 0xB5 octet and that might 
> be the cause of the error you are seeing.
>
> -- Michael
>
> [1] Your MARC records appear to be encoded in MARC-8 as evidenced by "ergáo" 
> in which the combining accent character comes before the character to be 
> modified.  I.e. the byte string that displays as "ergáo" in your email would 
> display as "ergò" (with a Latin small letter o with grave) in a MARC-8 aware 
> client.
>   
>   
Thanks for your answer.
This could be a valuable hint.
Indeed, some of the books in the catalogue I want to process are old
books whose records were imported from OCLC or SUDOC.
In those records, every LOWERCASE DIGRAPH AE, UPPERCASE DIGRAPH AE and
LOWERCASE DIGRAPH OE is wrongly encoded. The encoding is **assumed** to be
Latin-1 converted to UTF-8 in the catalogue I am working on, but these
characters appear as µ, ¥ and ¶ respectively in the biblios.

Is there a way to fix this?
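
To make the suspected problem concrete, here is a minimal sketch in Python of what a repair could look like, assuming the stored text really is MARC-8 octets that were read as Latin-1 and then converted to UTF-8. The function name `repair` and the tiny lookup tables are hypothetical and cover only the characters mentioned in this thread (0xA5, 0xB5, 0xB6 and the combining grave 0xE1); a real conversion needs the complete MARC-8 tables, for example through `marc8_to_utf8` from MARC::Charset.

```python
import unicodedata

# Partial MARC-8 tables: only the octets discussed in this thread.
MARC8_SPACING = {
    0xA5: "\u00C6",  # UPPERCASE DIGRAPH AE
    0xB5: "\u00E6",  # LOWERCASE DIGRAPH AE
    0xB6: "\u0153",  # LOWERCASE DIGRAPH OE
}
MARC8_COMBINING = {
    0xE1: "\u0300",  # grave accent; MARC-8 puts it BEFORE the base letter
}


def repair(text):
    """Undo a Latin-1 misreading of MARC-8 octets (partial table only)."""
    # Encoding back to Latin-1 recovers the original octets one for one.
    raw = text.encode("latin-1")
    out = []
    pending = []  # combining marks waiting for their base character
    for byte in raw:
        if byte in MARC8_COMBINING:
            pending.append(MARC8_COMBINING[byte])
            continue
        if byte < 0x80:
            ch = chr(byte)
        elif byte in MARC8_SPACING:
            ch = MARC8_SPACING[byte]
        else:
            raise ValueError("octet 0x%02X is not in the partial table" % byte)
        out.append(ch)
        out.extend(pending)  # Unicode puts combining marks AFTER the base
        pending.clear()
    out.extend(pending)  # keep any trailing mark that has no base letter
    return unicodedata.normalize("NFC", "".join(out))


print(repair("\u00b5"))      # the character shown as a micro sign
print(repair("erg\u00e1o"))  # the word quoted in the footnote above
```

This only helps for fields that were never genuine Latin-1: a record that legitimately contains a micro sign or an accented Latin-1 letter cannot be told apart from a misread MARC-8 record by looking at the octets alone, so the records to repair would have to be selected by their origin (here, the OCLC and SUDOC imports).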

-- 
Henri Damien LAURENT and Paul POULAIN
Independent consultants
in free software and library science (http://www.koha-fr.org)
