Re: Latin-1-characters

2004-03-16 Thread Larry Wall
On Tue, Mar 16, 2004 at 10:17:57PM +0100, Karl Brodowsky wrote:
: With FFFE and FEFF this seems obvious. In case of #! it would not be clear
: to me if this defaults to ISO-8859-1 (latin-1) or to utf-8. See HTML
: vs. XHTML as an example where the default has been changed.

Perl 6 would certainly
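To make the FFFE/FEFF point concrete, here is a minimal Python sketch of byte-order-mark sniffing. It is only an illustration, not anything Perl 6 specifies: the function name `sniff_bom` and the choice to return `None` when no mark is present are assumptions made for this example. A file starting with `#!` has no mark, so the sketch reports no answer, which is exactly the case where a default (Latin-1 or UTF-8) would have to be chosen.

```python
import codecs

# Longer signatures are checked first: the UTF-32 LE mark (FF FE 00 00)
# begins with the same two bytes as the UTF-16 LE mark (FF FE).
# This is a heuristic; FF FE 00 00 could also be UTF-16 LE followed by NUL.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # bytes FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # bytes FF FE
]


def sniff_bom(prefix):
    """Return the encoding implied by a leading byte order mark, or None.

    `sniff_bom` is a hypothetical helper name used only for this sketch.
    """
    for bom, name in _BOMS:
        if prefix.startswith(bom):
            return name
    return None  # no mark: the caller has to fall back on some default


print(sniff_bom(b"\xfe\xff\x00#"))       # file starting with FE FF
print(sniff_bom(b"#!/usr/bin/perl\n"))   # plain "#!" line, no mark
```

The second call is the ambiguous case raised above: nothing in the first bytes says whether the rest of the file is Latin-1 or UTF-8.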

Re: Latin-1-characters

2004-03-16 Thread Karl Brodowsky
Dear All, from what has been written by others, there are enough useful encodings other than utf-8, utf-16/UCS-2 and UCS-4 that support efficient storage even for Unicode files whose contents are Greek, Cyrillic, etc. Sorry for the confusion caused by the fact that I was not aware of these. utf-

Re: Latin-1-characters

2004-03-16 Thread James Mastros
Karl Brodowsky wrote: Mark J. Reed wrote: The UTF-8 encoding is not so attractive in locales that make heavy use of characters which require several bytes to encode therein, or relatively little use of characters in the ASCII range; utf-8 is fine for languages like German, Polish, Norwegian, Spanis

Re: Latin-1-characters

2004-03-16 Thread Mark J. Reed
On 2004-03-16 at 00:28:32, Karl Brodowsky wrote:
> Mark J. Reed wrote:
>
> >Unicode per se doesn't do anything to file sizes; it's all in how you
> >encode it.
>
> Yes. And basically there are common ways to encode this: utf-8 and utf-16
> (or similar variants requiring >= 2 bytes per character)

Re: Latin-1-characters

2004-03-16 Thread mark . a . biggar
Another possibility is to use a UTF-8 extended system where you use values over 0x10 to encode temporary code block swaps in the encoding. I.e., some magic value means the one byte UTF-8 codes now mean the Greek block instead of the ASCII block. But you would need broad agreement for that t
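As a rough illustration of the block-swap idea, here is a toy Python sketch. It is not the scheme proposed above and not any standard encoding: the marker byte `0xFF`, the 128-code-point block size, and the names `toy_encode` and `toy_decode` are all assumptions invented for this example. A marker byte followed by a block number changes what the following one-byte codes mean.

```python
SWITCH = 0xFF  # assumed marker byte; it never occurs in well-formed UTF-8


def toy_encode(text):
    """Encode text as 7-bit offsets into a switchable 128-code-point block."""
    out = bytearray()
    block = 0  # start in the ASCII block, U+0000..U+007F
    for ch in text:
        cp = ord(ch)
        wanted = cp >> 7  # which 128-code-point block this character is in
        if wanted > 0xFF:
            raise ValueError("toy scheme only covers U+0000..U+7FFF")
        if wanted != block:
            out += bytes([SWITCH, wanted])  # swap the active block
            block = wanted
        out.append(cp & 0x7F)  # one byte per character inside the block
    return bytes(out)


def toy_decode(data):
    """Invert toy_encode."""
    chars = []
    block = 0
    i = 0
    while i < len(data):
        if data[i] == SWITCH:
            block = data[i + 1]  # next byte names the new block
            i += 2
        else:
            chars.append(chr((block << 7) | data[i]))
            i += 1
    return "".join(chars)


greek = "\u03b1\u03b2\u03b3"  # three lowercase Greek letters
print(len(toy_encode(greek)), len(greek.encode("utf-8")))
```

Three consecutive Greek letters cost five bytes here against six in UTF-8, but every space between Greek words forces a swap to the ASCII block and back, so mixed text can come out larger. Any real scheme of this kind would also need the broad agreement mentioned above.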

Re: Latin-1-characters

2004-03-15 Thread Dan Sugalski
At 11:36 PM + 3/15/04, [EMAIL PROTECTED] wrote:
> Another possibility is to use a UTF-8 extended system where you use values over 0x10 to encode temporary code block swaps in the encoding. I.e., some magic value means the one-byte UTF-8 codes now mean the Greek block instead of the ASCII b

Re: Latin-1-characters

2004-03-15 Thread Dan Sugalski
At 12:28 AM +0100 3/16/04, Karl Brodowsky wrote:
> Anyway, it will be necessary to specify the encoding of Unicode in some way, which could possibly even allow specifying some non-Unicode charsets.

While I'll skip diving deeper into the swamp that is character sets and encoding (I'm already up

Re: Latin-1-characters

2004-03-15 Thread Karl Brodowsky
Mark J. Reed wrote:
> Unicode per se doesn't do anything to file sizes; it's all in how you encode it.

Yes. And basically there are common ways to encode this: utf-8 and utf-16 (or similar variants requiring >= 2 bytes per character)

> The UTF-8 encoding is not so attractive in locales that make heav

Re: Latin-1-characters

2004-03-15 Thread Mark J. Reed
On 2004-03-13 at 09:02:50, Karl Brodowsky wrote:
> For these guys Unicode is not so attractive, because it kind of doubles the
> size of their files,

Unicode per se doesn't do anything to file sizes; it's all in how you encode it. The UTF-8 encoding is not so attractive in locales that make heav
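The claim that file size depends on the encoding, not on Unicode itself, can be checked with a few lines of Python. This is a minimal sketch: the helper name `encoded_sizes` and the sample strings are made up for the example, and the `-le` codec variants are used so that no byte order mark is counted.

```python
def encoded_sizes(text, encodings=("utf-8", "utf-16-le", "utf-32-le")):
    """Map each encoding name to the number of bytes needed for text."""
    return {enc: len(text.encode(enc)) for enc in encodings}


samples = {
    "ASCII": "hello",
    "Greek": "αβγδε",  # five lowercase Greek letters
}

for label, text in samples.items():
    print(label, len(text), "chars:", encoded_sizes(text))
```

For the ASCII sample, UTF-8 needs half the bytes of UTF-16. For the Greek sample both need two bytes per letter, so the "doubling" relative to a one-byte legacy charset comes from the choice of encoding, and UTF-8 loses its size advantage once most characters fall outside the ASCII range.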