Re: Lilypond's error column printer confuses bytes and characters

Patrick McCarty Mon, 26 Oct 2009 09:52:25 -0700

On 2009-10-22, David Kastrup wrote:
> Patrick McCarty <pnor...@gmail.com> writes:
> 
> > On 2009-10-18, David Kastrup wrote:
> >> 
> >> GNU LilyPond 2.13.4
> >> Processing `bad.ly'
> >> Parsing...
> >> bad.ly:4:16: error: syntax error, unexpected MUSIC_IDENTIFIER
> >>      MÃÃÃ A\342\231
> >>                 \257 Bâ \break
> >> error: failed files: "bad.ly"
> >> 
> >> Apparently, the error column is being tracked by counting characters,
> >> but is displayed by counting bytes.  The indicator appears too early
> >> because of that (which caused me to look for the wrong bug in an input
> >> file of mine).
> >
> > This patch seems to correct the issue, but I don't know if it's the
> > correct fix (or if there are any side effects I'm unaware of).
> 
> The code before states:
> 
>   while (left > 0)
>     {
>       /*
>       FIXME, this is apparently locale dependent.
>       */
> #if HAVE_MBRTOWC
>       wchar_t multibyte[2];
>       size_t thislen = mbrtowc (multibyte, line_chars, left, &state);
> #else
>       size_t thislen = 1;
> #endif /* !HAVE_MBRTOWC */
> 
> The question is what we do about locales.  I think that in this case
> behavior is arguably correct since we are talking about column numbers
> on the terminal/locale, and even when Lilypond is using utf-8, those
> will correspond with the interpretation of the locale.


Sorry about the delay.  With the patch applied, the output looks
correct to me when running xterm under various locales.

Also, the point-and-click functionality still seems to work correctly,
so this *might* fix the problem Harmath reported a few weeks ago:

http://lists.gnu.org/archive/html/bug-lilypond/2009-10/msg00001.html

> By the way: when I switch into POSIX locale, the error message will
> occur before the first Umlaut which is then no longer considered text
> apparently.  So we already have some built-in locale dependencies
> elsewhere.

Yes, I'm pretty sure this is coming from glibc.

After stepping through Source_file::get_counts() with LC_ALL=POSIX, I
noticed that mbrtowc() returned (size_t) -1 when it reached the ä.  As
a result, this condition stops the loop before any further characters
are considered:

      /* Stop converting at invalid character;
         this can mean we have read just the first part
         of a valid character.  */
      if (thislen == (size_t) -1)
        break;


It seems that glibc treats non-ASCII bytes as invalid characters when
the locale is POSIX.  The glibc docs aren't very clear on this point;
they only mention that mbrtowc() is locale-dependent.

BTW, as the FIXME comment suggests, it would be nice to use a function
that is not locale-dependent, since the only information we need is the
size (in bytes) of the current UTF-8 character.
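
To illustrate what I mean, here is a rough sketch of how the length
could be derived from the lead byte alone, without consulting the
locale.  The names utf8_sequence_length() and utf8_char_count() are
hypothetical and not part of the LilyPond sources, and I'm assuming
the input is supposed to be UTF-8.  Like the existing loop, it stops
at the first invalid or truncated sequence:

```cpp
#include <cstddef>

// Hypothetical helper (not in LilyPond): number of bytes in the UTF-8
// sequence introduced by LEAD, or 0 if LEAD cannot start a sequence
// (a continuation byte, an overlong lead 0xC0/0xC1, or 0xF5 and above).
std::size_t
utf8_sequence_length (unsigned char lead)
{
  if (lead < 0x80)
    return 1;
  if (lead >= 0xC2 && lead <= 0xDF)
    return 2;
  if (lead >= 0xE0 && lead <= 0xEF)
    return 3;
  if (lead >= 0xF0 && lead <= 0xF4)
    return 4;
  return 0;
}

// Hypothetical helper: count the complete UTF-8 characters in the
// first BYTES bytes of LINE.  Stops at the first invalid or truncated
// sequence.  This only checks the lead byte and the 10xxxxxx pattern
// of the continuation bytes; it is not a full UTF-8 validator.
std::size_t
utf8_char_count (const char *line, std::size_t bytes)
{
  std::size_t chars = 0;
  std::size_t i = 0;
  while (i < bytes)
    {
      std::size_t len = utf8_sequence_length ((unsigned char) line[i]);
      if (len == 0 || len > bytes - i)
        break;

      bool valid = true;
      for (std::size_t j = 1; j < len; j++)
        if (((unsigned char) line[i + j] & 0xC0) != 0x80)
          valid = false;
      if (!valid)
        break;

      i += len;
      chars++;
    }
  return chars;
}
```

Something along these lines would give the same character count no
matter what LC_ALL is set to, but whether that is actually what we
want for terminal output is the open question raised above.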

> My vote is on getting it merged, but it probably would do no harm if
> somebody checked this on Windows where the old version purportedly
> worked.

I'll apply it and make a note to check the next devel release on
Windows.


Thanks,
Patrick


