On Thu, 17 Nov 2016 22:24:27 +0100 David Kastrup <d...@gnu.org> wrote:
> Antonio Ospite <a...@ao2.it> writes:
>
> > ---------------------------------------------------------------------
> > (process:18706): Pango-WARNING **: Invalid UTF-8 string passed to
> > pango_layout_set_text()
> > ---------------------------------------------------------------------
> >
> > and in the final files only a part of the "büüh" string was rendered,
> > however the "ü" was rendered correctly.
> >
> > So I added a printout to see what was going on:
> >
> > ---------------------------------------------------------------------
> > diff --git a/lily/lily-guile.cc b/lily/lily-guile.cc
> > index 2c519ec..9c0c10c 100644
> > --- a/lily/lily-guile.cc
> > +++ b/lily/lily-guile.cc
> > @@ -132,6 +132,7 @@ ly_scm2string (SCM str)
> >        result.resize (len);
> >        scm_to_locale_stringbuf (str, &result.at (0), len);
> >      }
> > +  fprintf(stderr, "%s: len: %d result: '%s'\n", __func__, len, result.c_str());
> >    return result;
> >  }
> > ---------------------------------------------------------------------
> >
> > with guile-1.8:
> > ---------------------------------------------------------------------
> > ly_scm2string: len: 6 result: 'büüh'
> > ---------------------------------------------------------------------
> >
> > with guile-2.0:
> > ---------------------------------------------------------------------
> > ly_scm2string: len: 4 result: 'bü�'
> >
> > (process:18706): Pango-WARNING **: Invalid UTF-8 string passed to
> > pango_layout_set_text()
> > ---------------------------------------------------------------------
> >
> > In ly_scm2string() I see that scm_c_string_length() is used, by looking
> > at the documentation
> > (https://www.gnu.org/software/guile/manual/html_node/String-Selection.html#String-Selection)
> > I read:
> >
> >   Return the number of characters in string.
> >
> > So 4 characters looks correct to me, even if they take 6 bytes.
> >
> > IMHO it can be safer not to mix scm_c_string_length() and
> > scm_to_locale_stringbuf().
>
> I've just done a git grep of ly_scm2string and even if you fix that bug,
> most uses of it should _not_ use the current locale.  So obviously
> ly_scm2string needs to get split into several different functions.  The
> current locale should only be used for writing to the _console_.
> Possibly also for writing to the log file.  For everything else,
> LilyPond is likely utf-8 (or Latin-1 for efficiency reasons when
> LilyPond _knows_ that only the common ASCII subset of utf-8 and Latin-1
> is being used).
>

That makes sense of course; having the current locale affect how the
input and output files are treated seemed a little weird to me.

I don't think I can put time into it, though.

However, if you confirm that my patch above is valid (even if not
complete), I'll start by submitting that, since it already improves the
situation with guile-2.0.

A self-contained patch is here:
https://ao2.it/tmp/lilypond-guile2/patches_2016-11-19/0010-Fix-converting-SCM-strings-with-wide-characters-to-s.patch

Thanks,
   Antonio

-- 
Antonio Ospite
https://ao2.it
https://twitter.com/ao2it

A: Because it messes up the order in which people normally read text.
   See http://en.wikipedia.org/wiki/Posting_style
Q: Why is top-posting such a bad thing?

_______________________________________________
lilypond-devel mailing list
lilypond-devel@gnu.org
https://lists.gnu.org/mailman/listinfo/lilypond-devel