Your MARC records appear to be encoded in MARC-8, as evidenced by "ergáo",
in which the combining accent character comes before the character it
modifies. That is, the byte string that displays as "ergáo" in your email
would display as "ergò" (with a Latin small letter o with grave) in a
MARC-8 aware client.
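
To make the byte-level point concrete, here is a minimal Python sketch. It
assumes the bytes in question are 65 72 67 E1 6F and uses a hand-written
table covering only a few MARC-8 (ANSEL) combining diacritics. The helper
decode_marc8_basic is hypothetical and is not a complete MARC-8 decoder (no
escape sequences, no other special characters).

```python
import unicodedata

# Partial table: MARC-8 (ANSEL) combining diacritic byte -> Unicode combining mark.
# Only a few common marks are listed; a real decoder needs the full code tables.
MARC8_COMBINING = {
    0xE1: "\u0300",  # grave
    0xE2: "\u0301",  # acute
    0xE3: "\u0302",  # circumflex
    0xE4: "\u0303",  # tilde
    0xE8: "\u0308",  # umlaut / diaeresis
    0xF0: "\u0327",  # cedilla
}

def decode_marc8_basic(data: bytes) -> str:
    """Decode ASCII plus the diacritics above (hypothetical helper, illustration only)."""
    out = []
    pending = []  # combining marks waiting for their base character
    for byte in data:
        if byte in MARC8_COMBINING:
            pending.append(MARC8_COMBINING[byte])
        elif byte < 0x80:
            # MARC-8 puts the mark before the base; Unicode puts it after.
            out.append(chr(byte) + "".join(pending))
            pending = []
        else:
            raise ValueError(f"byte 0x{byte:02X} is outside this sketch's table")
    out.extend(pending)  # keep any marks left without a base character
    return unicodedata.normalize("NFC", "".join(out))

raw = b"erg\xe1o"
as_latin1 = raw.decode("latin-1")   # 0xE1 read as a precomposed a-acute
as_marc8 = decode_marc8_basic(raw)  # 0xE1 read as a grave applied to the "o"
```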

I'd just like to relate my recent experience of retrieving MARC21 records
through various library Z39.50 servers. Put simply, you cannot trust
character position 9 of the MARC leader to correctly indicate the
character set used.
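
For reference, a small sketch of reading what the leader declares. In
MARC 21, leader position 09 is blank for MARC-8 and "a" for UCS/Unicode.
The function name and the sample leaders are made up for illustration, and
the result is only what the record claims, not what its bytes actually
contain.

```python
def declared_encoding(record: bytes) -> str:
    """Return what leader position 09 claims (hypothetical helper)."""
    leader = record[:24]  # the MARC leader is the first 24 bytes of a record
    if len(leader) < 24:
        raise ValueError("record is shorter than a MARC leader")
    flag = leader[9:10]
    if flag == b" ":
        return "marc-8"
    if flag == b"a":
        return "unicode"
    return "unknown"

# Invented leaders that differ only at position 09.
claims_unicode = declared_encoding(b"00714cam a2200205 a 4500")
claims_marc8 = declared_encoding(b"00714cam  2200205 a 4500")
```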

From libraries that have set the leader to indicate the records are in the
MARC-8 character set, I have retrieved records encoded as Latin-1, UTF-8
and MARC-8.

From libraries that set the leader to indicate Unicode, I get records in
MARC-8 and UTF-8.

You also get escapes in MARC-8 records like \1EF6 to indicate a Unicode
character. I think a numeric character reference such as &#12345; is now
legal in MARC-8 to indicate a Unicode character that isn't in the MARC-8
repertoire.
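
Escapes for Unicode characters inside MARC-8 data can be expanded after the
main decode. A minimal sketch, assuming they take the form of numeric
character references, either hexadecimal (&#x1EF6;) or decimal. The name
expand_ncrs is hypothetical, and the backslash form is left alone because
its exact syntax is not specified here.

```python
import re

# Matches &#xHHHH; (hexadecimal) or &#DDDD; (decimal) numeric character references.
NCR = re.compile(r"&#(?:[xX]([0-9A-Fa-f]{1,6})|([0-9]{1,7}));")

def expand_ncrs(text: str) -> str:
    """Replace numeric character references with the characters they name."""
    def repl(match):
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Leave anything that is not a valid Unicode scalar value untouched.
        if code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
            return match.group(0)
        return chr(code)
    return NCR.sub(repl, text)
```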

So, basically, you either need prior knowledge of the actual character
encoding used, or you have to test. Testing for UTF-8 is fairly
straightforward, and a long string of text (which admittedly you don't tend
to get in MARC records) that tests as UTF-8 is very unlikely to be anything
else. Distinguishing Latin-1 from MARC-8 is more like guesswork. As a test
for MARC-8 I look for the common combining diacritics followed by a vowel.
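
A rough Python sketch of that kind of test, under stated assumptions: the
"common" diacritics are taken to be MARC-8 bytes E1 to E8 (grave through
umlaut), vowels are plain ASCII vowels, and guess_encoding is a
hypothetical name. It is only a heuristic. Latin-1 text such as the
Portuguese "não" (E3 followed by "o") would be misread as MARC-8, which is
exactly the guesswork involved.

```python
MARC8_DIACRITICS = set(range(0xE1, 0xE9))  # grave .. umlaut; an assumed "common" set
VOWELS = set(b"aeiouAEIOU")

def guess_encoding(data: bytes) -> str:
    """Heuristic guess: 'ascii', 'utf-8', 'marc-8' or 'latin-1'."""
    if all(b < 0x80 for b in data):
        return "ascii"  # no high bytes, so nothing to distinguish
    try:
        data.decode("utf-8", errors="strict")
        return "utf-8"  # high bytes that form valid UTF-8 are rarely anything else
    except UnicodeDecodeError:
        pass
    # MARC-8 places the combining diacritic byte before its base letter.
    if any(a in MARC8_DIACRITICS and b in VOWELS for a, b in zip(data, data[1:])):
        return "marc-8"
    return "latin-1"  # fallback when nothing else matches
```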
Regards,
Ashley.
--
Ashley Sanders [EMAIL PROTECTED]
Copac http://copac.ac.uk A MIMAS Service funded by JISC