Hi,

Yes, the value 0xFFFF can be stored as either a 3-byte UTF-8 sequence or a 2-byte UCS-2 value, but the Unicode standard specifically says that the values 0xFFFF, 0xFFFE and 0xFEFF are NOT valid codepoints and should never appear in a Unicode string. 0xFFFF is reserved for out-of-band signaling (such as the -1 returned by getc()), and 0xFFFE and 0xFEFF are specifically reserved for out-of-band marking of a UCS-2 file as either big-endian or little-endian; they are specifically not considered part of the data. chr() is currently defined to mean "convert an int value to a Unicode codepoint". That's why I said that chr(65535) should raise an exception: it's an argument error similar to sqrt(-1).

Thanks, I didn't know about that. I thought they just could not appear in UTF-8 encoded strings, but you're right. I would also recommend that it raise an exception.
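
For what it's worth, here is a minimal sketch of that behaviour, written in Python purely for illustration. The names strict_chr and REJECTED are hypothetical, and the set of rejected values is copied from the message above rather than checked against the standard.

```python
# Hypothetical set of values that chr() should refuse, taken from the
# message above (an assumption, not verified against the Unicode standard).
REJECTED = {0xFFFF, 0xFFFE, 0xFEFF}


def strict_chr(value: int) -> str:
    """Hypothetical stricter chr(): raise instead of returning a reserved value."""
    if value in REJECTED:
        # Treated as an argument error, in the same spirit as sqrt(-1).
        raise ValueError(f"0x{value:04X} is reserved and is not a valid character")
    # The built-in chr() already raises ValueError outside 0..0x10FFFF.
    return chr(value)


if __name__ == "__main__":
    print(strict_chr(65))
    try:
        strict_chr(65535)
    except ValueError as err:
        print("rejected:", err)
```

The check happens before the conversion, so the caller gets an argument error rather than a string that silently contains the reserved value.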


Bye,
  Andras
