Re: [HACKERS] invalidly encoded strings

Martijn van Oosterhout Mon, 10 Sep 2007 09:09:30 -0700

On Tue, Sep 11, 2007 at 12:30:51AM +0900, Tatsuo Ishii wrote:
> Why do you think that employing the Unicode code point as the chr()
> argument could avoid endianness issues? Are you going to represent
> Unicode code point as UCS-4? Then you have to specify the endianness
> anyway.  (see the UCS-4 standard for more details)


Because the argument to chr() is an integer, which has no endian-ness.
You only get into endian-ness if you look at how you store the
resulting string.

> Also I'd like to point out all encodings has its own code point
> systems as far as I know. For example, EUC-JP has its corresponding
> code point systems, ASCII, JIS X 0208 and JIS X 0212. So I don't see
> we can't use "code point" as chr()'s argument for othe encodings(of
> course we need optional parameter specifying which character set is
> supposed).

Oh, the last discussion on this didn't answer this question. Is there a
standard somewhere that maps integers to characters in EUC-JP. If so,
how can I find out what character 512 is?

Have a nice day,
-- 
Martijn van Oosterhout   <[EMAIL PROTECTED]>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.

signature.asc
Description: Digital signature

Re: [HACKERS] invalidly encoded strings

Reply via email to