David Kastrup wrote:
> \lyricmode does not mean "Paste arbitrary text here".

How is this relevant to anything I wrote?

> LilyPond intentionally uses exclusively the ASCII character range for
> syntactic purposes.

...except it doesn't, as stated. Lilypond source files aren't encoded in ASCII, and anything in a source file is (potentially) syntactic, at least as I understand that word. Lilypond source files can contain (AFAIK) any UTF-8 character. And that fact, I believe, means that Lilypond has to have at least a modicum of awareness of the properties of those characters as guaranteed by the Unicode standard.

Consider:

1. Lilypond already recognizes multiple word-break characters: space (U+0020), newline, tab, and so on.
2. U+3000 IDEOGRAPHIC SPACE has essentially the same semantics as U+0020 SPACE (the differences are presentational, and the two characters are separate in Unicode largely due to historical accident).
3. Given 1. and 2., I think that it's silly to treat U+3000 semantically differently from U+0020 just because it happens not to match a certain 7-bit legacy encoding. :)

> Everything else can be part of identifiers or
> words.

Any character can be part of a word, including {, }, \, space, and all the rest. That's why we have quoting constructs: "this is a syllable with { } in it". If the user wants a syllable with a space in it -- ideographic or otherwise -- I think that he *should* be forced to quote it.

As for identifiers...are you saying that U+3000 IDEOGRAPHIC SPACE can be part of an identifier? If so...just...wow. I think this is a bad state of affairs. I don't think *any* (breaking) space character should be legal in an identifier (at least with Lilypond's syntax generally allowing spaces as delimiters).

> That makes LilyPond documents robust against changes in Unicode.

No, *Unicode's own stability policies* make Lilypond documents (and everything else) robust against changes in Unicode.

For background, please see the Unicode Standard, especially v.
10 §3.5 ( http://www.unicode.org/versions/Unicode10.0.0/ch03.pdf#page=28 ), as well as UAX #44 ( http://unicode.org/reports/tr44 ) and the various stability policies on http://www.unicode.org/policies/stability_policy.htm (especially the Property Stability Policy and the Property Value Stability Policies). For issues of word segmentation and identifier syntax specifically, please see UAX #29 ( http://www.unicode.org/reports/tr29 ) and UAX #31 ( http://www.unicode.org/reports/tr31 ) respectively.

Basically, Unicode defines properties Pattern_White_Space and Pattern_Syntax (and some others) for identifier syntax, and White_Space for general purposes as well. In particular, the Pattern_* properties are *immutable*; that is, once defined for a character in a given version of the Unicode Standard, they are guaranteed to be the same for that character in every future version. There is a larger guarantee too, namely that a string legal as an identifier under one version of the Standard will stay legal as an identifier under every future version.

In reality, you probably don't need to worry about the exact minutiae of these properties in most cases: every decent programming language these days has at least one Unicode string library that has already implemented logic based on them. But the conclusion here, I think, is that changes in Unicode are not something that we really need to worry about in this respect.

Having established that, we can move on to what behavior will surprise the experienced Lilypond user least. For myself, I was *extremely* surprised that U+3000 doesn't behave like every other space, and so I don't think this is desirable behavior at all.

Best,
--
Marnen Laibow-Koser
mar...@marnen.org
http://www.marnen.org

_______________________________________________
bug-lilypond mailing list
bug-lilypond@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-lilypond
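
To make the point about U+3000 concrete, here is a minimal Python sketch (not Lilypond code). It compares a splitter that recognizes only ASCII delimiters with one that relies on the language's built-in Unicode-aware whitespace handling. The function names split_ascii_only and split_unicode_whitespace are hypothetical, and the ASCII-only splitter is just an assumed model of the behavior under discussion, not Lilypond's actual lexer. Note also that Python's notion of whitespace is close to, but not identical to, the Unicode White_Space property.

```python
import re
import unicodedata

IDEOGRAPHIC_SPACE = "\u3000"


def split_ascii_only(text):
    """Split on ASCII space, tab, CR and LF only.

    Hypothetical model of a tokenizer that treats nothing outside
    ASCII as a delimiter.
    """
    return [word for word in re.split(r"[ \t\r\n]+", text) if word]


def split_unicode_whitespace(text):
    """Split on every character Python's str.split() treats as whitespace."""
    return text.split()


# Three syllables: the first gap is U+3000, the second is U+0020.
sample = "la" + IDEOGRAPHIC_SPACE + "la la"

print(unicodedata.name(IDEOGRAPHIC_SPACE))      # IDEOGRAPHIC SPACE
print(unicodedata.category(IDEOGRAPHIC_SPACE))  # Zs (space separator)
print(len(split_ascii_only(sample)))            # 2: U+3000 stays inside a "word"
print(len(split_unicode_whitespace(sample)))    # 3: U+3000 acts as a delimiter
```

The second splitter needs no table of code points of its own; it simply reuses what the standard library already implements, which is the approach suggested above for most cases.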