Dan Sugalski wrote:

At 6:03 PM -0600 4/21/04, kj wrote:

Hello folks,

This will be of interest to only a few people, but it will be good to have it in the archives for when we need it.

Here is a list of Korean character sets that represent hangul (Korean symbols) and hanja (Sino-Korean):

- EUC-KR (KSC 5601, renamed to KS X 1001) or Microsoft's superset UHC
- ISO-2022 comes in both -JP and -KR versions.
- johab is a legacy 16-bit encoding: the leading bit is 1, followed by three 5-bit fields for the leading consonant, the vowel, and the optional consonant(s) at the end
http://trade.chonbuk.ac.kr/~leesl/code/johap.gif
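
To make the johab bit layout concrete, here is a rough Python sketch that splits a 16-bit value into the leading bit and the three 5-bit fields. The function name split_johab is made up for illustration. The sketch does not map field values to actual consonants or vowels (that needs the table at the URL above), and as far as I know only the hangul range of johab follows this layout. Python's built-in "johab" codec is used only to obtain a sample value.

```python
def split_johab(code: int) -> tuple:
    """Split a 16-bit johab hangul code into (flag, lead, vowel, tail).

    flag  : the leading bit (expected to be 1 for hangul)
    lead  : 5-bit field for the leading consonant
    vowel : 5-bit field for the vowel
    tail  : 5-bit field for the optional final consonant(s)
    """
    if not 0 <= code <= 0xFFFF:
        raise ValueError("expected a 16-bit value")
    flag = (code >> 15) & 0x1    # bit 15
    lead = (code >> 10) & 0x1F   # bits 14..10
    vowel = (code >> 5) & 0x1F   # bits 9..5
    tail = code & 0x1F           # bits 4..0
    return flag, lead, vowel, tail


if __name__ == "__main__":
    # Encode one hangul syllable with Python's johab codec and show its fields.
    raw = "가".encode("johab")
    code = int.from_bytes(raw, "big")
    print(hex(code), split_johab(code))
```

The field values that come out are raw indices, not Unicode code points; turning them into jamo still requires the lookup table.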


Ah, cool. Looks like that stuff's in the O'Reilly CJKV book (which I desperately want a second edition of) but that book's a bit slanted towards Chinese and Japanese.

The URL above goes to a useful table for working with johab. I do know it is a legacy charset, but I don't know how much it is still used. Technically, ASCII is legacy, too. :)


Ah, at this point Unicode's legacy too. Besides, as long as RAD-50 lives, nobody's got much standing to call a character set "Legacy" :)

Do we have any local experts on Japanese charsets? If not, I can do a little bit of research there, too.


There, at least, I can get access to folks who've done work, and I can get by enough myself that I'm not too worried.

I don't agree with the Unicode legacy comment... :-(


But if you want to see another source of mapping tables, you can try this one: http://oss.software.ibm.com/icu/charset/index.html

I'm sure Dan and others are aware of ICU's charset repository. It contains mapping tables that I have been able to collect from various platforms. Others may find it useful too.

Unicode can also represent the hangul and hanja characters.
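
As a quick check of that, here is a small Python sketch. It encodes a sample string containing hangul and hanja with Python's built-in codecs for the character sets listed earlier in the thread, then decodes it back to Unicode. The codec names (euc_kr, cp949 for UHC, iso2022_kr, johab) are Python's own; the helper name round_trips and the sample string are just for illustration, and this says nothing about how ICU's tables handle the same data.

```python
from typing import Dict, Sequence

# Python codec names for the Korean charsets mentioned in this thread.
KOREAN_CODECS = ("euc_kr", "cp949", "iso2022_kr", "johab")


def round_trips(text: str, codecs: Sequence[str] = KOREAN_CODECS) -> Dict[str, bool]:
    """Report, per codec, whether text survives encode followed by decode."""
    results = {}
    for name in codecs:
        try:
            encoded = text.encode(name)
            results[name] = encoded.decode(name) == text
        except UnicodeError:
            # The codec cannot represent (or read back) some character.
            results[name] = False
    return results


if __name__ == "__main__":
    sample = "한글 韓"  # two hangul syllables, a space, one hanja
    for codec, ok in round_trips(sample).items():
        print(codec, "ok" if ok else "failed")
```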

George
