Werner LEMBERG <w...@gnu.org> writes:

>>> If we get an invalid UTF-8 sequence, I'm all for it.  But it is not
>>> too difficult to not get invalid sequences but still have wrong
>>> output.
>>
>> Theoretically.  But it is impossible to write just a single
>> non-ASCII byte without hitting an invalid sequence since all
>> non-ASCII bytes must be part of multi-byte sequences.  Only
>> combinations of non-ASCII bytes can form valid utf-8 sequences, and
>> the probability of several of them being "just right" is not all
>> that high.
>
> For single-byte encodings, you are correct.  However, the probability
> is *much* higher if you consider legacy two-byte encodings for CJK
> scripts.
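For concreteness, a small Python sketch of the distinction (the byte values are chosen purely for illustration): a lone non-ASCII byte can never pass UTF-8 validation, whereas a pair of high bytes from a legacy two-byte encoding can occasionally line up with a valid sequence and slip through as wrong output.

def decodes_as_utf8(raw: bytes) -> bool:
    """Return True if the byte string is well-formed UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# A single Latin-1 byte (0xE9, "é") cannot stand alone in UTF-8:
# every non-ASCII byte must belong to a multi-byte sequence.
print(decodes_as_utf8(b"\xe9"))      # False -> invalid, easy to report

# Two high bytes, as a legacy two-byte encoding might emit, can happen
# to form a valid UTF-8 sequence (this one decodes to U+0131, LATIN
# SMALL LETTER DOTLESS I) and slips through without any error.
print(decodes_as_utf8(b"\xc4\xb1"))  # True -> silently wrong output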
The probability of people accidentally writing two-byte encodings for CJK scripts in an ASCII-based programming language and being totally surprised by encoding issues is not all that high.  I also consider it much more likely that somebody unused to encoding problems is trying to get just a composer's name right in a Latin script than in Chinese characters.  It is much easier to make your computer produce a Latin letter with a diacritic that is foreign to you (for example, with a Compose key) than to produce a Chinese character.

So I don't really see the point in giving up before trying.

--
David Kastrup