> > Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> > >> I'm confused.  If this is exactly the same as EUC_JP, why do we need
> > >> any new code at all?
> >
> > > I said the "encoding scheme" is the same, not that the contents
> > > (character set) are the same. In other words, the characters included
> > > in EUC_JP are not the same as those in EUC_JIS_2004.
> >
> > I'm still confused.  If the set of characters is different, then surely
> > we need at least a different UTF8<->EUC_JIS_2004 conversion function?
>
> Yes, exactly. I will come up with new conversions later.

I have committed changes to add JIS X 0213 along with conversions.

New encodings:

  EUC_JIS_2004:   JIS X 0213 encoded in EUC
  SHIFT_JIS_2004: JIS X 0213 encoded in Shift JIS (client-only encoding)

These encodings support the following character sets:

  ASCII, JIS X 0201 (single-byte "katakana"), JIS X 0213 planes 1 and 2

New conversions:

  EUC_JIS_2004 --> UTF8:            euc_jis_2004_to_utf8
  UTF8 --> EUC_JIS_2004:            utf8_to_euc_jis_2004
  SHIFT_JIS_2004 --> UTF8:          shift_jis_2004_to_utf8
  UTF8 --> SHIFT_JIS_2004:          utf8_to_shift_jis_2004
  EUC_JIS_2004 --> SHIFT_JIS_2004:  euc_jis_2004_to_shift_jis_2004
  SHIFT_JIS_2004 --> EUC_JIS_2004:  shift_jis_2004_to_euc_jis_2004

To generate the conversion maps, I have created two Perl scripts,
UCS_to_SHIFT_JIS_2004.pl and UCS_to_EUC_JIS_2004.pl, which use
sjis-0213-2004-std.txt and euc-jis-2004-std.txt as the source of the
conversion specification. Both files are freely available from
http://x0213.org.

Conversions to UTF-8 from EUC_JIS_2004 and SHIFT_JIS_2004 require
support for UTF-8 "combined characters", i.e. cases where one logical
character consists of two UTF-8 characters. To implement this, I have
modified LocalToUtf() and UtfToLocal() by adding a new parameter, the
"combined character map".

Documentation and regression test changes have been committed too.

Note that I have updated the catalog version, so please run initdb.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
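
As a rough illustration of the "combined character" case described above,
the sketch below uses Python's built-in euc_jis_2004 and shift_jis_2004
codecs. This is an assumption made purely for demonstration: these codecs
are an independent implementation, not the PostgreSQL conversion functions
or LocalToUtf()/UtfToLocal(). The sample character (hiragana KA with a
semi-voiced sound mark) is chosen here as an example and does not come from
the commit itself.

```python
# Illustration only: Python's own JIS X 0213 codecs, not PostgreSQL code.

# JIS X 0213 defines "ka with semi-voiced mark" as ONE character, but
# Unicode has no single code point for it. It is written as two code
# points: U+304B (HIRAGANA LETTER KA) + U+309A (COMBINING SEMI-VOICED MARK).
combined = "\u304b\u309a"

utf8 = combined.encode("utf-8")           # two UTF-8 characters
euc = combined.encode("euc_jis_2004")     # one EUC_JIS_2004 character
sjis = combined.encode("shift_jis_2004")  # one SHIFT_JIS_2004 character

print("code points:", len(combined))
print("UTF-8 bytes:", utf8.hex())
print("EUC_JIS_2004 bytes:", euc.hex())
print("SHIFT_JIS_2004 bytes:", sjis.hex())

# Going back to Unicode, one local character expands to two code points.
# This is the situation that a plain one-to-one lookup table cannot
# express and that needs a separate map of combined sequences.
assert euc.decode("euc_jis_2004") == combined
assert sjis.decode("shift_jis_2004") == combined
```

A plain one-to-one mapping table has no way to return two code points for
one local character, or to consume two code points to produce one, which
is presumably why a separate map for such sequences is needed.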