On Tue, Feb 12, 2002, Shachar Shemesh wrote about "Re: Linux filenames with definite encoding (Was: FTP server with intl support)":
>...
> >UTF-8 is designed to be 100% backwards compatible with ASCII -- the
> >encoding of an ASCII string in UTF-8 is exactly the same. Series of
> >two or more non-ASCII characters (over 127) stand for higher Unicode.
> >So if you take a UTF-8 string of a Latin language text and try to
> >display it as if it were Latin-1, or alternatively pass it through
> >a 7-bit-only system and try to display it as ASCII, most of it will
> >come out intact (including the "/") and only the special characters
> >will have noise in their place.
> >
> Again, totally irrelevant. If the "aleph" character happened to have a
> "/" as one of the bytes of the encoding, non-UTF parsers would not allow
> you to have a filename with Aleph. I am well aware that this doesn't
> happen, and am only bringing that as a clarifying example for my previous
> claim that encoding-free parsing is not possible.
If you are "well aware that this doesn't happen", what are you arguing about?? Please read again what he said, and perhaps the utf-8 manual page, if you don't know exactly how UTF-8 works (I'm not saying you don't - maybe you're just playing the devil's advocate ;)).

In UTF-8, a multibyte character (i.e., any accented, Japanese, Hebrew, or other non-ASCII character) is always composed *ONLY* of non-ASCII bytes (each byte >= 128). Incidentally, these bytes are further restricted so that you can always recognize the first byte of a UTF-8 multibyte character: lead bytes and continuation bytes fall in different ranges.

So "/", null, space, or any other ASCII character CANNOT happen to be one of the bytes of Aleph, or of any other Unicode character, because these are all composed only of non-ASCII bytes. This was an explicit design decision of UTF-8, and not some "lucky accident".

Other encodings - such as UCS-2 (each character is two bytes) - indeed do not have this property, and hence are quite useless in practice on Unix-like systems (except as an internal representation).

-- 
Nadav Har'El                        | Tuesday, Feb 12 2002, 30 Shevat 5762
[EMAIL PROTECTED]                   |-----------------------------------------
Phone: +972-53-245868, ICQ 13349191 |There are 2 ways to do it - my way and
http://nadav.harel.org.il           |the right way
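
P.S. For anyone who wants to check the byte-level claim rather than take it on faith, here is a minimal sketch in Python. The helper names (has_ascii_byte, split_path) are made up for illustration; the only library calls used are the standard str.encode and bytes.split. The 16-bit counter-example uses U+012F, whose big-endian two-byte encoding happens to contain the byte 0x2F, i.e. "/".

```python
# Hypothetical helper names, for illustration only.

def has_ascii_byte(encoded: bytes) -> bool:
    """Return True if any byte of the encoded text is in the ASCII range."""
    return any(b < 0x80 for b in encoded)


def split_path(raw: bytes):
    """Split a raw path on the '/' byte, as an encoding-unaware parser would."""
    return raw.split(b"/")


ALEPH = "\u05d0"                     # Hebrew letter Alef
aleph_utf8 = ALEPH.encode("utf-8")   # b'\xd7\x90', both bytes >= 0x80

# A path with a Hebrew file name still splits into the right components,
# because no byte of Aleph can be mistaken for "/".
path = ("/home/" + ALEPH + ".txt").encode("utf-8")
components = split_path(path)

# Counter-example in a two-byte encoding: U+012F becomes bytes 0x01 0x2F,
# and 0x2F is the ASCII code for "/".
ogonek_utf16 = "\u012f".encode("utf-16-be")

print(has_ascii_byte(aleph_utf8))    # False
print(components)                    # [b'', b'home', b'\xd7\x90.txt']
print(b"/" in ogonek_utf16)          # True
```

A byte-oriented parser that only looks for "/" and null therefore handles the UTF-8 path correctly without knowing anything about the encoding, while the same parser would cut the two-byte-encoded character in half.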