Doran, Michael D wrote:
> Hi Henri,
>
> Although in my email client, the character in question appears as a MICRO
> SIGN ("µ"), I am assuming that it is actually meant to be a LOWERCASE DIGRAPH
> AE ("æ") since that is consistent with the Latin vernacular text in your
> record. In MARC-8,
Your MARC records appear to be encoded in MARC-8, as evidenced by "ergáo", in
which the combining accent character comes before the character to be
modified. I.e., the byte string that displays as "ergáo" in your email would
display as "ergò" (with a Latin small letter o with grave) in a MARC-8
a
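
To make the ordering point concrete, here is a minimal sketch in Python. It assumes a tiny hand-picked subset of MARC-8 (ASCII plus the grave and acute combining marks at 0xE1 and 0xE2); the function name and the table are hypothetical, and a real converter would need the full MARC-8 tables. It moves each combining mark from before its base character to after it, as Unicode expects, and then composes the pair.

```python
import unicodedata

# Assumption: only two MARC-8 (ANSEL) combining marks are mapped here.
# This is an illustrative subset, not a complete conversion table.
MARC8_COMBINING = {
    0xE1: "\u0300",  # combining grave accent
    0xE2: "\u0301",  # combining acute accent
}


def marc8_subset_to_unicode(data):
    """Decode ASCII plus the marks above, placing each mark after its base."""
    out = []
    pending = []  # combining marks seen so far, waiting for a base character
    for byte in data:
        if byte in MARC8_COMBINING:
            pending.append(MARC8_COMBINING[byte])
        elif byte < 0x80:
            out.append(chr(byte))
            out.extend(pending)  # Unicode wants the marks after the base
            pending.clear()
        else:
            raise ValueError("byte 0x%02X is outside this sketch's subset" % byte)
    if pending:
        raise ValueError("combining mark with no base character")
    # NFC composes base + mark into one precomposed character where one exists.
    return unicodedata.normalize("NFC", "".join(out))
```

With this sketch, the bytes b"erg\xe1o" come out as "ergò", whereas decoding the same bytes as Latin-1 gives "ergáo", which is the display described above.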
Hi Henri-Damien,
> And any LOWERCASE DIGRAPH AE, UPPERCASE DIGRAPH AE, or
> LOWERCASE DIGRAPH OE is not encoded correctly. The encoding is
> **assumed** to be Latin-1 converted to UTF-8 in the
> catalogue I am working on, but these characters appear as
> µ, ¥, and ¶ respectively in biblios.
hex MARC-8
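
The symptom quoted above is what you would expect if MARC-8 bytes were read as Latin-1. A small sketch in Python, assuming the three MARC-8 code points for these characters (0xB5, 0xA5, 0xB6); the function and table names are made up for illustration:

```python
# The three MARC-8 (ANSEL) byte values under discussion and the characters
# they are meant to represent.
MARC8_LIGATURES = {
    0xB5: "\u00E6",  # lowercase digraph ae
    0xA5: "\u00C6",  # uppercase digraph AE
    0xB6: "\u0153",  # lowercase digraph oe
}


def misread_as_latin1(byte):
    """Return the character a Latin-1 decoder produces for a single byte."""
    return bytes([byte]).decode("latin-1")


def describe_misreadings():
    """Build one line per byte: hex value, intended character, Latin-1 reading."""
    lines = []
    for byte, intended in MARC8_LIGATURES.items():
        lines.append("0x%02X: intended %s, Latin-1 shows %s"
                     % (byte, intended, misread_as_latin1(byte)))
    return lines
```

Here misread_as_latin1(0xB5) gives the micro sign, 0xA5 the yen sign and 0xB6 the pilcrow, matching the three characters reported in the catalogue.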
Hi Ashley,
> I think &#x3039; is now legal in MARC-8 to indicate a
> Unicode character that isn't in the MARC-8 repertoire.
Yes, that's also my understanding [1,2], though I've not personally come across
any records yet that use that method. (Although not being a cataloger, I don't
routinely exa
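
For anyone who does run into such records, here is a rough sketch in Python of expanding those references after the rest of the record has been converted. It assumes the references are hexadecimal numeric character references of the form &#xXXXX; with four to six hex digits; the pattern and the function name are my own, so check them against the specification before relying on them.

```python
import re

# Assumption: references look like &#xXXXX; with 4 to 6 hexadecimal digits.
NCR_PATTERN = re.compile(r"&#x([0-9A-Fa-f]{4,6});")


def expand_ncrs(text):
    """Replace each hexadecimal character reference with its character."""
    def replace(match):
        code_point = int(match.group(1), 16)
        # Leave out-of-range values and surrogates untouched rather than guess.
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)
        return chr(code_point)
    return NCR_PATTERN.sub(replace, text)
```

For example, expand_ncrs("&#x3039;") returns the single character U+3039, and text without any reference passes through unchanged.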
Michael,
So, basically, you either need prior knowledge about the
actual character encoding used, or you have to test. Testing
for UTF-8 is fairly straightforward...
How are you testing for UTF-8?
There's a handy perl regexp on the W3C web site at:
http://www.w3.org/International/questi
Hi Ashley,
Thanks for the info! Trying to keep up with i18n and/or character set stuff is
almost a full-time job.
> > How are you testing for UTF-8?
>
> There's a handy perl regexp on the W3C web site at:
>
> http://www.w3.org/International/questions/qa-forms-utf-8
>
> You'll need to cha
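
As an alternative to the regexp approach (this is not the W3C pattern, just a different way to ask the same question), a strict decode does the job in Python. A minimal sketch, with a function name of my own choosing:

```python
def looks_like_utf8(data):
    """Return True if the bytes are well-formed UTF-8, False otherwise."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True
```

Note that pure ASCII data also passes, since ASCII is valid UTF-8, so a True result only rules out encodings that would produce ill-formed sequences. MARC-8 bytes such as b"erg\xe1o" fail the test, because 0xE1 is not followed by UTF-8 continuation bytes.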