At Wed, 17 Feb 1999 23:10:16 +0100,
<[EMAIL PROTECTED]> wrote:

> P.D. is there any substantial difference between the various characters
> (Auto Detect, Shift-JIS, and EUCP-JP)?

For Japanese characters, there are three major encoding methods (in the
Internet world):

 - ISO-2022-JP (also called JIS), a variant of ISO-2022 that uses ESC
   sequences to switch character sets. This encoding is used for Japanese
   e-mail and NetNews articles.
 - EUC-JP, also a variant of ISO-2022, but it uses the GR plane (bytes with
   the high bit set) for the Japanese character sets, so it needs all
   8 bits of each byte. This encoding is usually used for Japanese text
   on UNIX boxes.
 - Shift_JIS, a version with shifted code points. This encoding is usually
   used for Japanese text on DOS/Windows/Mac boxes.

Some people use ISO-2022-JP for HTML, others use EUC-JP, and others use
Shift_JIS.

Auto Detect detects which encoding is used automatically. However, some
byte patterns are valid in both EUC-JP and Shift_JIS, so for those it
cannot tell which of the two encodings is in use.

Regards,
Fumitoshi UKAI / Debian JP Project
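
P.S. To make the ambiguity concrete, here is a minimal Python sketch. It is
only an illustration, not how any browser's Auto Detect actually works. The
codec names (iso2022_jp, euc_jp, shift_jis) are Python's standard ones; the
helper plausible_encodings() and the sample text are hypothetical and made up
for this example.

```python
# Sketch only: shows how the three encodings differ at the byte level and
# why some byte strings cannot be attributed to EUC-JP or Shift_JIS.

SAMPLE = "漢字"  # hypothetical sample text ("kanji")

# Python codec names for the three encodings discussed above.
CODECS = ("iso2022_jp", "euc_jp", "shift_jis")


def plausible_encodings(data: bytes) -> list[str]:
    """Return the codecs under which `data` decodes without error.

    Hypothetical helper: a real detector would also weigh how likely
    the decoded text is, not just whether decoding succeeds.
    """
    result = []
    for name in CODECS:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        result.append(name)
    return result


if __name__ == "__main__":
    # The same text yields different bytes in each encoding.
    # ISO-2022-JP stays 7-bit and switches character sets with ESC (0x1b).
    for name in CODECS:
        print(name, SAMPLE.encode(name))

    # These two bytes are one kanji in EUC-JP, but in Shift_JIS they are
    # two single-byte half-width katakana. Both decodings succeed.
    ambiguous = b"\xb4\xc1"
    print(plausible_encodings(ambiguous))
    print(ambiguous.decode("euc_jp"), ambiguous.decode("shift_jis"))
```

When more than one codec is returned, validity alone cannot decide between
them; a detector has to guess from context or statistics, which is why
Auto Detect sometimes picks the wrong encoding.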