On 2009-10-22, David Kastrup wrote:
> Patrick McCarty <pnor...@gmail.com> writes:
>
> > On 2009-10-18, David Kastrup wrote:
> >>
> >> GNU LilyPond 2.13.4
> >> Processing `bad.ly'
> >> Parsing...
> >> bad.ly:4:16: error: syntax error, unexpected MUSIC_IDENTIFIER
> >> MÃÃÃ A\342\231
> >> \257 Bâ \break
> >> error: failed files: "bad.ly"
> >>
> >> Apparently, the error column is being tracked by counting characters,
> >> but is displayed by counting bytes. The indicator appears too early
> >> because of that (which caused me to look for the wrong bug in an input
> >> file of mine).
> >
> > This patch seems to correct the issue, but I don't know if it's the
> > correct fix (or if there are any side effects I'm unaware of).
>
> The code before states:
>
>   while (left > 0)
>     {
>       /*
>         FIXME, this is apparently locale dependent.
>       */
> #if HAVE_MBRTOWC
>       wchar_t multibyte[2];
>       size_t thislen = mbrtowc (multibyte, line_chars, left, &state);
> #else
>       size_t thislen = 1;
> #endif /* !HAVE_MBRTOWC */
>
> The question is what we do about locales. I think that in this case
> behavior is arguably correct since we are talking about column numbers
> on the terminal/locale, and even when Lilypond is using utf-8, those
> will correspond with the interpretation of the locale.

Sorry about the delay.

The output looks okay to me when invoking xterm with various locales.
Also, the point-and-click functionality still seems to work correctly,
so this *might* fix the problem Harmath reported a few weeks ago:

http://lists.gnu.org/archive/html/bug-lilypond/2009-10/msg00001.html

> By the way: when I switch into POSIX locale, the error message will
> occur before the first Umlaut which is then no longer considered text
> apparently. So we already have some built-in locale dependencies
> elsewhere.

Yes, I'm pretty sure this is coming from glibc. After stepping through
Source_file::get_counts() when LC_ALL=POSIX, I noticed that mbrtowc()
returned -1 (type size_t) when it processed the ä. As a result, this
condition prevents the consideration of more characters:

      /* Stop converting at invalid character;
         this can mean we have read just the first
         part of a valid character. */
      if (thislen == (size_t) -1)
        break;

It seems that non-ASCII characters are not valid characters when the
locale is POSIX. The glibc docs aren't very clear on this point, and
only mention the fact that mbrtowc() is locale-dependent.

BTW, as the comment states, it would be nice to use a function that is
not locale-dependent, since the only information we need is the size
(in bytes) of the current UTF-8 character.

> My vote is on getting it merged, but it probably would do no harm if
> somebody checked this on Windows where the old version purportedly
> worked.

I'll apply it and make a note to check the next devel release on
Windows.

Thanks,
Patrick
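
P.S. To make the "locale-independent" idea above concrete, here is a
rough sketch of what such a function could look like. It is only an
illustration, not the applied patch: utf8_char_len() and
utf8_count_chars() are hypothetical names that do not exist in
LilyPond, and the sketch assumes the input is meant to be UTF-8. It
derives the sequence length from the lead byte and checks the
continuation bytes, so it never consults the locale.

```c
#include <stddef.h>

/* Hypothetical helper, not part of LilyPond: return the length in
   bytes of the UTF-8 sequence starting at S, looking at no more than
   LEFT bytes.  Return 0 for an invalid lead byte, a truncated
   sequence, or a bad continuation byte.  Overlong forms and
   surrogates are not rejected; this only measures length.  */
static size_t
utf8_char_len (const char *s, size_t left)
{
  if (left == 0)
    return 0;

  unsigned char lead = (unsigned char) s[0];
  size_t len;
  if (lead < 0x80)
    len = 1;                    /* plain ASCII */
  else if ((lead & 0xE0) == 0xC0)
    len = 2;
  else if ((lead & 0xF0) == 0xE0)
    len = 3;
  else if ((lead & 0xF8) == 0xF0)
    len = 4;
  else
    return 0;                   /* stray continuation or invalid lead */

  if (len > left)
    return 0;                   /* only the first part of a character */

  for (size_t i = 1; i < len; i++)
    if (((unsigned char) s[i] & 0xC0) != 0x80)
      return 0;                 /* not a continuation byte */

  return len;
}

/* Hypothetical helper: count the characters in the first LEN bytes of
   LINE, stopping at the first invalid sequence, in the same spirit as
   the (size_t) -1 check in the existing loop.  */
static size_t
utf8_count_chars (const char *line, size_t len)
{
  size_t chars = 0;
  while (len > 0)
    {
      size_t thislen = utf8_char_len (line, len);
      if (thislen == 0)
        break;
      line += thislen;
      len -= thislen;
      chars++;
    }
  return chars;
}
```

With something along these lines, the seven bytes of
"B\xc3\xa4 \xe2\x99\xaf" would count as four characters whether
LC_ALL is POSIX or a UTF-8 locale. Whether that is actually desirable
depends on David's point above: if the column is supposed to match
what the terminal displays under the user's locale, the
locale-dependent mbrtowc() behavior may be the better choice.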