On Sun, 4 Sep 2005, Werner LEMBERG wrote:

> You are mixing up Unicode with one of its possible representations,
> UTF-8.  A Unicode character is a number between 0x0 and 0x10FFFF;
> UTF-8 represents such code points as multi-byte sequences of varying
> length, where the range 0x00-0x7F is identical to ASCII.
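
To make the distinction in the quoted text concrete, here is a minimal Python sketch (Python is an assumption; the thread names no language). It prints the code point of a few characters next to their UTF-8 bytes. The helper name `describe` and the sample characters are illustrative only.

```python
# Sketch: a Unicode character is a number (code point); an encoding such
# as UTF-8 or Latin-1 turns that number into one or more bytes.

def describe(ch):
    """Return (code point, UTF-8 bytes) for a single character."""
    return ord(ch), ch.encode("utf-8")

# A, e-acute, euro sign, musical G clef (a code point above 0xFFFF)
samples = ["A", "\u00e9", "\u20ac", "\U0001D11E"]

for ch in samples:
    cp, utf8 = describe(ch)
    print(f"U+{cp:04X} -> {len(utf8)} byte(s) in UTF-8: {utf8.hex()}")

# The range 0x00-0x7F is byte-for-byte identical to ASCII.
assert "A".encode("utf-8") == "A".encode("ascii")

# Latin-1 stores e-acute as the single byte 0xE9; UTF-8 uses two bytes.
assert "\u00e9".encode("latin-1") == b"\xe9"
assert "\u00e9".encode("utf-8") == b"\xc3\xa9"

# 0x10FFFF is the last valid code point; anything larger is rejected.
assert len(chr(0x10FFFF).encode("utf-8")) == 4
```

The number of bytes grows with the code point: one byte for ASCII, two for e-acute, three for the euro sign, and four for anything above 0xFFFF.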

Thank you.  I didn't know Unicode was broader than UTF-8.  The 3-byte
value 10FFFF (rather than FFFFFF) seems like a rather strange upper
limit, but that only points up the fact that I'm going to have to learn
about Unicode once I get through my current arranging binge.

> Today, Windows uses Unicode exclusively -- even in North America.  You
> won't have big success with latin1 files.

I routinely switch files between Latin1 text and MS-Word docs with no
problem whatsoever.  When one saves a file in Word selecting the type
Text or Text With Line Breaks, one gets a Latin1 file -- and I have
verified these text files (put out by Word) directly with a hex editor:
e-acute, a-grave, etc. are all represented by a single byte, and it is
the standard Latin1 byte.

As far back as Word 97, Microsoft claimed that Word and its Visual
Basic ("VBA") used Unicode "internally".  But if one looks at a Word
.doc file with a hex editor, one sees that, in the file, all the French
accented characters are stored as single-byte standard Latin1 codes.
Microsoft's Unicode claims are a marketing ploy; Latin1 still rules.

> Well, it is straightforward to use a converter like `iconv' within a
> script which automatically transforms your latin1 file into UTF-8.

Yet another converter.  Well, it's good to know that.  But for the
moment I encounter accented letters only in song titles (I use no
lyrics), so typing in the UTF-8 double-byte for the rare accented
character here and there takes about 3 seconds, which is easy.

Thank you for taking the trouble to send me the information on
Unicode & UTF-8.

-- Tom

_______________________________________________
lilypond-user mailing list
lilypond-user@gnu.org
http://lists.gnu.org/mailman/listinfo/lilypond-user