On 2019-03-17 15:01:40 +0000, Warner, Gary, Jr wrote:
> Many of us have faced character encoding issues because we are not in control
> of our input sources and made the common assumption that UTF-8 covers
> everything.

UTF-8 covers "everything" in the sense that there is a round-trip from
each character in every commonly-used charset/encoding to Unicode and
back.

The actual code may of course be different. For example, the € sign is
0xA4 in iso-8859-15, but U+20AC in Unicode. So you need an
encoding/decoding step.
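
A minimal sketch of that step in Python, using only the standard codecs
(the variable names are just for illustration). It decodes the byte 0xA4
as iso-8859-15, checks that the result is U+20AC, and round-trips it
through UTF-8 back to the original byte:

```python
# The euro sign as a single byte in iso-8859-15.
raw = b"\xa4"

# Decoding step: interpret the byte according to its real charset.
text = raw.decode("iso-8859-15")
assert text == "\u20ac"          # U+20AC EURO SIGN

# Encoding step: the same character in UTF-8 is three bytes.
utf8 = text.encode("utf-8")
assert utf8 == b"\xe2\x82\xac"

# Round trip: UTF-8 -> Unicode -> iso-8859-15 yields the original byte.
back = utf8.decode("utf-8").encode("iso-8859-15")
assert back == raw

# Guessing the wrong source charset gives a different character:
# in iso-8859-1 the same byte is U+00A4 CURRENCY SIGN.
wrong = raw.decode("iso-8859-1")
assert wrong == "\u00a4"
```

The last assertion is the usual failure mode: the bytes are fine, but the
decoding step assumed the wrong charset.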

And "commonly-used" means just that. Unicode covers a lot of character
sets, but it can't cover every character set ever invented (I invented
my own character sets when I was sixteen. Nobody except me ever used
them and they have long succumbed to bit rot).

> In my lab, as an example, some of our social media posts have included ZawGyi
> Burmese character sets rather than Unicode Burmese.  (Because Myanmar
> developed technology in a closed to the world environment, they made up
> their own
> non-standard character set which is very common still in Mobile phones.).

I'd be surprised if there was a character set which is "very common in
Mobile phones", even in a relatively poor country like Myanmar. Does
ZawGyi actually include characters which aren't in Unicode, or are they
just encoded differently?

        hp

-- 
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | h...@hjp.at         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
