UTF-16 has big-endian and little-endian byte orderings; UTF-8 is a plain byte sequence and has only one.
If there is no byte-order mark, it becomes important to agree on a particular byte ordering (either little-endian or big-endian).
If there is a BOM, then it can be inspected and the text can be interpreted in either byte order.
Even so, I think that it would be best to settle upon a particular byte ordering.
Windows does it backward from the rest of the world.


Chris Little wrote:



Troy A. Griffitts wrote:

My guess about the characters which keep the .conf file from being recognized... try adding a few newlines to the beginning of the file. I would guess that XXX[Section Name] at the beginning is just causing our .conf reader to not recognize the "Section Name".


The three characters are the Unicode byte-order mark (BOM). See http://www.unicode.org/faq/utf_bom.html#BOM for full details. But, basically, it's the codepoint U+FEFF, encoded at the beginning of a file. From this character, you can tell whether you have UTF-16 big-endian, UTF-16 little-endian, or UTF-8.
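As a rough illustration of that check, here is a minimal sketch in Python (chosen only for brevity). The `detect_bom` helper is a hypothetical name, not part of SWORD; the BOM constants come from Python's standard `codecs` module. It covers only the three encodings mentioned above.

```python
import codecs

def detect_bom(data: bytes):
    """Return (encoding_name, bom_length), or (None, 0) if no BOM is found.

    Hypothetical helper for illustration. Only UTF-8 and UTF-16 are
    checked; UTF-32 is deliberately left out of this sketch.
    """
    candidates = (
        (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
        (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
        (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    )
    for bom, name in candidates:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

The three-byte UTF-8 case is the one that shows up as "XXX" in front of `[Section Name]`.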

I would recommend we go ahead and support it (to the extent that we check for it and throw it away), since it's not something that only Notepad adds to files. (No need to fix before the trip, though, I think.)
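"Check for it and throw it away" could look something like the following sketch, again in Python for illustration only. The `strip_utf8_bom` function and the sample `.conf` contents are assumptions made up for this example; the real .conf reader is not shown in this thread.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM if present; otherwise return data unchanged.

    Hypothetical helper, not SWORD's actual reader code.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

# Made-up file contents: a BOM followed by a section header.
raw = codecs.BOM_UTF8 + b"[Section Name]\nKey=Value\n"
first_line = strip_utf8_bom(raw).splitlines()[0]
print(first_line.decode("utf-8"))  # prints "[Section Name]"
```

With the BOM removed, the first line starts with `[` again, so a reader that expects the section header at the very beginning of the file would see it. In Python specifically, decoding with the `utf-8-sig` codec has the same effect.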

--Chris

_______________________________________________
sword-devel mailing list
sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
