Your MARC records appear to be encoded in MARC-8, as evidenced by "ergáo", in
which the combining accent character comes before the character it modifies.
I.e. the byte string that displays as "ergáo" in your email would display as
"ergò" (with a Latin small letter o with grave) in a MARC-8-aware client.
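A minimal Python sketch of why the same bytes look different. It decodes them once as Latin-1 and once as a tiny subset of MARC-8. The diacritic table covers only a handful of MARC-8 combining marks, and the helper name decode_marc8_basic is hypothetical; a real converter needs the full MARC-8 tables and escape-sequence handling.

```python
import unicodedata

# Partial, illustrative map of MARC-8 (ANSEL) combining diacritic bytes to
# Unicode combining characters. Not a complete MARC-8 table.
MARC8_COMBINING = {
    0xE1: "\u0300",  # grave
    0xE2: "\u0301",  # acute
    0xE3: "\u0302",  # circumflex
    0xE4: "\u0303",  # tilde
    0xE8: "\u0308",  # diaeresis (umlaut)
}

def decode_marc8_basic(data: bytes) -> str:
    """Decode ASCII plus a few MARC-8 combining diacritics (hypothetical helper).

    MARC-8 puts a combining mark *before* the letter it modifies, whereas
    Unicode puts it *after*, so marks are held until the next base character.
    """
    out = []
    pending = []
    for b in data:
        if b in MARC8_COMBINING:
            pending.append(MARC8_COMBINING[b])
        elif b < 0x80:
            out.append(chr(b))      # base character first...
            out.extend(pending)     # ...then any marks that preceded it
            pending.clear()
        else:
            raise ValueError(f"byte 0x{b:02X} not handled by this sketch")
    out.extend(pending)  # keep any dangling marks rather than dropping them
    # Compose base + combining mark into precomposed characters where possible.
    return unicodedata.normalize("NFC", "".join(out))

raw = b"erg\xe1o"
print(raw.decode("latin-1"))    # ergáo  (a MARC-8-unaware client)
print(decode_marc8_basic(raw))  # ergò
```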

I'd just like to relate my recent experience of retrieving MARC21 records from
various libraries' Z39.50 servers. Put simply, you cannot trust character
position 9 of the MARC leader (Leader/09: blank for MARC-8, "a" for Unicode)
to correctly indicate the character set used.
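Reading what the leader claims is trivial; the point is that the claim may be wrong. A sketch, assuming the record is available as raw bytes starting with its 24-byte leader (the function name declared_encoding is made up for illustration):

```python
def declared_encoding(record: bytes) -> str:
    """Return the character coding scheme *claimed* by Leader/09.

    Blank means MARC-8 and "a" means UCS/Unicode. As noted above, the
    claim does not always match the bytes that follow.
    """
    if len(record) < 24:
        raise ValueError("record is shorter than the 24-byte MARC leader")
    flag = record[9:10]
    if flag == b" ":
        return "MARC-8"
    if flag == b"a":
        return "UCS/Unicode"
    return "unknown"
```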

From libraries that have set the leader to indicate the records are in the MARC-8 character
set, I have retrieved records encoded as Latin-1, UTF-8 and MARC-8.

From libraries that set the leader to indicate Unicode, I get records in MARC-8
and UTF-8.

You also find escapes such as \1EF6 in MARC-8 records, used to indicate a
Unicode character. I think a numeric character reference such as &#x3039; is
now legal in MARC-8 to indicate a Unicode character that isn't in the MARC-8
repertoire.
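As I understand it, the MARC 21 lossless conversion technique writes a character with no MARC-8 equivalent as a hexadecimal numeric character reference: "&#x", hex digits, then ";". Expanding these back to Unicode is a simple substitution. A sketch, assuming only that hexadecimal form; backslash escapes like \1EF6 are left untouched, and the name expand_ncrs is hypothetical:

```python
import re

# Matches hexadecimal numeric character references such as "&#x1EF6;".
NCR = re.compile(r"&#x([0-9A-Fa-f]{1,6});")

def expand_ncrs(text: str) -> str:
    """Replace &#xhhhh; references with the Unicode characters they name."""
    def repl(m: re.Match) -> str:
        cp = int(m.group(1), 16)
        # Leave references outside the Unicode code space as they were.
        return chr(cp) if cp <= 0x10FFFF else m.group(0)
    return NCR.sub(repl, text)
```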

So, basically, you either need prior knowledge of the character encoding
actually used, or you have to test for it. Testing for UTF-8 is fairly
straightforward, and a long string of text (which admittedly you don't tend to
get in MARC records) that tests as UTF-8 is very unlikely to be anything else.
Distinguishing Latin-1 from MARC-8 is more like guesswork. As a test for
MARC-8, I look for the common combining diacritics followed by a vowel.
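A rough sketch of that order of tests: strict UTF-8 first, then a MARC-8 "diacritic before vowel" check, falling back to Latin-1. The byte range 0xE1-0xE9 for the combining marks and the name guess_encoding are assumptions for illustration; like any such heuristic it can misfire, e.g. on Latin-1 text where an accented letter happens to precede a vowel.

```python
import re

# Assumed heuristic: a MARC-8 combining diacritic byte (0xE1-0xE9 covers the
# common ones in this sketch) immediately followed by an ASCII vowel.
MARC8_DIACRITIC_THEN_VOWEL = re.compile(rb"[\xE1-\xE9][AEIOUaeiou]")

def guess_encoding(data: bytes) -> str:
    """Guess between UTF-8, MARC-8 and Latin-1 for a MARC field's bytes."""
    try:
        data.decode("utf-8")  # strict: any malformed sequence raises
    except UnicodeDecodeError:
        pass
    else:
        # Pure ASCII is valid in all three encodings, so report it separately.
        return "ascii" if data.isascii() else "utf-8"
    if MARC8_DIACRITIC_THEN_VOWEL.search(data):
        return "marc-8"
    return "latin-1"
```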

Regards,

Ashley.
--
Ashley Sanders
Copac http://copac.ac.uk A MIMAS Service funded by JISC
