Re: Unicode BOM marks

2005-03-13 Thread Steve Horsley
Martin v. Löwis wrote: Steve Horsley wrote: It is my understanding that the BOM (U+feff) is actually the Unicode character "Non-breaking zero-width space". My understanding is that this used to be the case. According to http://www.unicode.org/faq/utf_bom.html#38 the application should now speci

Re: Unicode BOM marks

2005-03-09 Thread "Martin v. Löwis"
Steve Horsley wrote: It is my understanding that the BOM (U+feff) is actually the Unicode character "Non-breaking zero-width space". My understanding is that this used to be the case. According to http://www.unicode.org/faq/utf_bom.html#38 the application should now specify specific processing,
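As a small illustration of the point under discussion, the sketch below asks Python's standard library what U+FEFF is called and how it serializes. It assumes a modern Python 3 interpreter, which postdates this thread; the variable names are illustrative only.

```python
import codecs
import unicodedata

BOM_CHAR = "\ufeff"  # the code point being discussed

# The character database still lists U+FEFF under its original name.
name = unicodedata.name(BOM_CHAR)

# Serialized in each byte order, the same code point yields the two
# familiar UTF-16 byte order marks, and three bytes in UTF-8.
utf16_be = BOM_CHAR.encode("utf-16-be")
utf16_le = BOM_CHAR.encode("utf-16-le")
utf8 = BOM_CHAR.encode("utf-8")

print(name)
print(utf16_be == codecs.BOM_UTF16_BE, utf16_le == codecs.BOM_UTF16_LE)
print(utf8 == codecs.BOM_UTF8)
```

Whether a U+FEFF found in the middle of a text should be treated as a space or ignored is the application-level decision the linked FAQ entry addresses; the code above only shows the character's identity and byte forms.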

Re: Unicode BOM marks

2005-03-09 Thread Steve Horsley
Francis Girard wrote: On Monday 7 March 2005 21:54, "Martin v. Löwis" wrote: Hi, Thank you for your very informative answer. Some interspersed remarks follow. I personally would write my applications so that they put the signature into files that cannot be concatenated meaningfully (since the si

Re: Unicode BOM marks

2005-03-08 Thread John Roth
"Martin v. Löwis" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] Francis Girard wrote: Well, no text files can't be concatenated ! Sooner or later, someone will use "cat" on the text files your application did generate. That will be a lot of fun for the new unicode aware "super-ca

Re: Unicode BOM marks

2005-03-08 Thread Francis Girard
Hi, Thank you for your answer. That confirms what Martin v. Löwis says. You can choose between UCS-2 or UCS-4 for internal unicode representation. Francis Girard On Tuesday 8 March 2005 00:44, Jeff Epler wrote: > On Mon, Mar 07, 2005 at 11:56:57PM +0100, Francis Girard wrote: > > BTW, the pytho

Re: Unicode BOM marks

2005-03-08 Thread Francis Girard
Hi, > Well, no. For example, Python source code is not typically concatenated, > nor is source code in any other language. We did it with C++ files in order to have only one compilation unit to accelerate compilation time over the network. Also, all the languages with some "include" directive will

Re: Unicode BOM marks

2005-03-07 Thread "Martin v. Löwis"
Francis Girard wrote: Well, no text files can't be concatenated ! Sooner or later, someone will use "cat" on the text files your application did generate. That will be a lot of fun for the new unicode aware "super-cat". Well, no. For example, Python source code is not typically concatenated, nor
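The concatenation concern raised here can be reproduced in a few lines. This is only a sketch, assuming modern Python 3 and its `utf-8-sig` codec; `strip_utf8_signature` and `concat_utf8` are hypothetical helpers written for this illustration, not functions from any library or from the thread.

```python
import codecs

def strip_utf8_signature(data: bytes) -> bytes:
    """Return data without a leading UTF-8 signature, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

def concat_utf8(chunks, keep_signature=False):
    """Join UTF-8 byte strings, dropping each chunk's own signature."""
    body = b"".join(strip_utf8_signature(chunk) for chunk in chunks)
    return codecs.BOM_UTF8 + body if keep_signature else body

first = "first\n".encode("utf-8-sig")    # signature + text
second = "second\n".encode("utf-8-sig")  # signature + text

# A byte-level "cat" leaves the second signature in the middle of the
# text, where a decoder sees it as an ordinary U+FEFF character.
naive = (first + second).decode("utf-8-sig")

# A signature-aware join removes it.
clean = concat_utf8([first, second]).decode("utf-8")
```

Here `naive` contains a stray U+FEFF between the two lines, while `clean` does not.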

Re: Unicode BOM marks

2005-03-07 Thread Jeff Epler
On Mon, Mar 07, 2005 at 11:56:57PM +0100, Francis Girard wrote: > BTW, the python "unicode" built-in function documentation says it returns a > "unicode" string which scarcely means something. What is the python > "internal" unicode encoding ? The language reference says fairly little about unic
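One way to check which internal representation a given interpreter uses is `sys.maxunicode`. The sketch below is an illustration only; `describe_unicode_build` is a hypothetical helper name. Note the assumption about versions: interpreters of this thread's era were built either narrow (UCS-2) or wide (UCS-4), whereas Python 3.3 and later always report 0x10FFFF.

```python
import sys

def describe_unicode_build(maxunicode: int = sys.maxunicode) -> str:
    """Classify an interpreter by the largest code point it can store."""
    if maxunicode == 0xFFFF:
        # Narrow build: code points above U+FFFF are stored as surrogate pairs.
        return "narrow (UCS-2)"
    # Wide build, or any Python 3.3+ interpreter.
    return "wide (UCS-4 range)"

print(hex(sys.maxunicode), describe_unicode_build())
```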

Re: Unicode BOM marks

2005-03-07 Thread Francis Girard
On Monday 7 March 2005 21:54, "Martin v. Löwis" wrote: Hi, Thank you for your very informative answer. Some interspersed remarks follow. > > I personally would write my applications so that they put the signature > into files that cannot be concatenated meaningfully (since the > signature simp

Re: Unicode BOM marks

2005-03-07 Thread "Martin v. Löwis"
Francis Girard wrote: If I understand well, into the UTF-8 unicode binary representation, some systems add at the beginning of the file a BOM mark (Windows?), some don't. (Linux?). Therefore, the exact same text encoded in the same UTF-8 will result in two different binary files, and of a slightl

Unicode BOM marks

2005-03-07 Thread Francis Girard
Hi, For the first time in my programmer life, I have to take care of character encoding. I have a question about the BOM marks. If I understand well, into the UTF-8 unicode binary representation, some systems add at the beginning of the file a BOM mark (Windows?), some don't. (Linux?). Therefo
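The situation described in the question, where the same text yields two different UTF-8 files, can be reproduced as follows. This is a minimal sketch assuming modern Python 3, whose `utf-8-sig` codec writes and strips the signature; that codec was not available when this thread was written, and the sample string is arbitrary.

```python
import codecs

text = "h\u00e9llo"  # arbitrary sample text with one non-ASCII character

plain = text.encode("utf-8")       # no signature
signed = text.encode("utf-8-sig")  # three-byte signature prepended

# The two encodings differ only by the leading signature bytes.
same_apart_from_bom = signed == codecs.BOM_UTF8 + plain
size_difference = len(signed) - len(plain)

# "utf-8-sig" accepts input with or without the signature when decoding,
# while plain "utf-8" keeps the signature as a leading U+FEFF character.
tolerant_signed = signed.decode("utf-8-sig")
tolerant_plain = plain.decode("utf-8-sig")
strict_signed = signed.decode("utf-8")
```

Reading with `utf-8-sig` is one way for an application to handle both kinds of file uniformly; whether to write the signature remains the application's choice.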