On Tue, Feb 12, 2002, Shachar Shemesh wrote about "Re: Linux filenames with definite 
encoding (Was: FTP server with intl support)":
>...
> >UTF-8 is designed to be 100% backwards compatible with ASCII -- the
> >encoding of an ASCII string in UTF-8 is exactly the same. Series of 
> >two or more non-ASCII characters (over 127) stand for higher Unicode. 
> >So if you take a UTF-8 string of a Latin language text and try to 
> >display it as if it were Latin-1, or alternatively pass it through
> >a 7-bit-only system and try to display it as ASCII, most of it will
> >come out intact (including the "/") and only the special characters
> >will have noise in their place.
> >
> Again, totally irrelevant. If the "aleph" character happened to have a 
> "/" as one of the bytes of the encoding, non-UTF parsers would not allow 
> you to have a filename with Aleph. I am well aware that this doesn't 
> happen, and am only bringing that as a clarifying example for my previous 
> claim that encoding-free parsing is not possible.

If you are "well aware that this doesn't happen", what are you arguing about?
Please read again what he said, and perhaps the UTF-8 manual, if you don't
know exactly how UTF-8 works (I'm not saying you don't - maybe you're just
playing the devil's advocate ;)).

In UTF-8, a multibyte character (i.e., any accented, Japanese, Hebrew, or
other non-ASCII character) is always composed *ONLY* of non-ASCII bytes
(byte values of 128 and above). Incidentally, these bytes are further
restricted so that you can always recognize the first byte of a UTF-8
multibyte character.
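
To make the "recognize the first byte" point concrete, here is a minimal
Python sketch. The classify() helper is my own illustrative name, not part of
any library; it only looks at the high bits of a byte and does not check that
the stream as a whole is valid UTF-8.

```python
def classify(byte):
    """Classify one byte of a UTF-8 stream by its high bits (no validation)."""
    if byte < 0x80:
        return "ascii"         # 0xxxxxxx: a complete one-byte character
    if byte < 0xC0:
        return "continuation"  # 10xxxxxx: second or later byte of a character
    return "lead"              # 11xxxxxx: first byte of a multibyte character


aleph = "\u05d0".encode("utf-8")  # Hebrew letter Aleph
kinds = [classify(b) for b in aleph]
print([hex(b) for b in aleph], kinds)
```

Aleph encodes as the two bytes 0xd7 0x90: one lead byte followed by one
continuation byte, and neither falls in the ASCII range.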

So "/", null, space, or any other ASCII character CANNOT happen to be one
byte of Aleph, or of any other non-ASCII Unicode character, because these are
all composed only of non-ASCII bytes.
This was an explicit design decision of UTF-8, not some "lucky accident".
Other encodings, such as UCS-2 (each character is two bytes), indeed do not
have this property and hence are quite useless in practice on Unix-like systems
(except as an internal representation).
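
The difference is easy to check. The sketch below assumes Python's "utf-8"
and "utf-16-be" codecs, with big-endian UTF-16 standing in for a fixed
two-byte encoding; ascii_bytes() is just an illustrative helper name.

```python
def ascii_bytes(text, encoding):
    """Return the encoded bytes of text that fall in the ASCII range (< 128)."""
    return [b for b in text.encode(encoding) if b < 0x80]


aleph = "\u05d0"        # Hebrew letter Aleph
i_ogonek = "\u012f"     # Latin small letter i with ogonek; low byte is 0x2F

print(ascii_bytes(aleph, "utf-8"))       # no ASCII-range bytes at all
print(ascii_bytes(aleph, "utf-16-be"))   # the high byte 0x05 is in ASCII range

# A two-byte encoding can contain "/" or a null byte inside other characters.
print(b"/" in i_ogonek.encode("utf-16-be"))   # True
print(b"/" in i_ogonek.encode("utf-8"))       # False
print(b"\x00" in "a".encode("utf-16-be"))     # True
```

A byte-oriented parser that splits paths on "/" or stops at a null byte would
therefore mangle the two-byte form, while the UTF-8 form passes through
untouched.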

-- 
Nadav Har'El                        |     Tuesday, Feb 12 2002, 30 Shevat 5762
[EMAIL PROTECTED]             |-----------------------------------------
Phone: +972-53-245868, ICQ 13349191 |There are 2 ways to do it - my way and
http://nadav.harel.org.il           |the right way
