Davide Liessi <davide.lie...@gmail.com> writes:

> On Sun, 20 May 2018 at 18:35, Davide Liessi
> <davide.lie...@gmail.com> wrote:
>> The file
>>
>> \version "2.19.81"
>> \header { title = "č" }
>> { b1 }
>>
>> results in a PDF with correct printed title (lowercase c with caron)
>> but wrong title field in metadata (Ċ, i.e. uppercase c with dot
>> above).
>
> On Sun, 20 May 2018 20:52:58 +0200 David Kastrup wrote:
>> Ghostscript bug when converting PostScript output to PDF.  The
>> PostScript reads (pasted from less' display)
>>
>> mark /Creator (LilyPond 2.21.0)
>> /Title (<FE><FF>^A^M)
>> /DOCINFO pdfmark
>>
>> which is the correct UTF16-LE string with BOM.  GhostScript however
>> converts the ^M (0x0d) into ^J (0x0a), basically converting an ASCII CR
>> to an ASCII LF.  Unfortunately, we are not in the middle of ASCII here.
>
> Actually, it turns out that the behaviour of GhostScript is not wrong
> and this is probably a bug in how LilyPond produces the PostScript
> file.
>
> PostScript strings must either properly escape non-ASCII or ASCII
> non-printable bytes, e.g., as \ddd with ddd the octal representation,
> or they must be defined as a hexadecimal string (see [1], pages
> 29–31).

Uh WHAT?  To quote:

    The \ddd form may be used to include any 8-bit character constant
    in a string.  One, two, or three octal digits may be specified,
    with high-order overflow ignored.  This notation is preferred for
    specifying a character outside the recommended ASCII character set
    for the PostScript language, since the notation itself stays
    within the standard set and thereby avoids possible difficulties
    in transmitting or storing the text of the program.  It is
    recommended that three octal digits always be used, with leading
    zeros as needed, to prevent ambiguity.  The string (\0053), for
    example, contains two characters—an ASCII 5 (Control-E) followed
    by the digit 3—whereas the strings (\53) and (\053) contain one
    character, the ASCII character whose code is octal 53 (plus sign).

Recommended/preferred is not at all equivalent to "must".

However, one problem indeed is that strings as such have no notion of
encoding and CR, LF, CRLF are all equivalent.  So at least those bytes,
when they occur as part of UTF-16, would warrant escaping.

-- 
David Kastrup

_______________________________________________
bug-lilypond mailing list
bug-lilypond@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-lilypond
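
For illustration, here is a minimal Python sketch of the two escaping
options discussed above: writing every non-printable byte as \ddd, or
emitting a hexadecimal string. It is not LilyPond's actual code; the
function names ps_octal_string and ps_hex_string are hypothetical. It
assumes the title is encoded as UTF-16 with the FE FF byte order mark
shown in the PostScript excerpt, which denotes big-endian byte order.

```python
def utf16_with_bom(text: str) -> bytes:
    """Encode text as big-endian UTF-16 preceded by the FE FF byte order mark."""
    return b"\xfe\xff" + text.encode("utf-16-be")


def ps_octal_string(text: str) -> str:
    """Return a PostScript literal string, (…), for text.

    Hypothetical helper: printable ASCII is kept as-is, parentheses and
    backslash are backslash-escaped, and every other byte (including
    0x0d and 0x0a) is written as a three-digit octal escape.
    """
    out = []
    for byte in utf16_with_bom(text):
        ch = chr(byte)
        if ch in "()\\":
            out.append("\\" + ch)
        elif 0x20 <= byte <= 0x7E:
            out.append(ch)
        else:
            out.append(f"\\{byte:03o}")
    return "(" + "".join(out) + ")"


def ps_hex_string(text: str) -> str:
    """Return a PostScript hexadecimal string, <…>, for text."""
    return "<" + utf16_with_bom(text).hex().upper() + ">"


if __name__ == "__main__":
    # U+010D (c with caron) is the bytes 01 0D in big-endian UTF-16,
    # so the second byte is the one that collides with ASCII CR.
    print(ps_octal_string("č"))  # (\376\377\001\015)
    print(ps_hex_string("č"))    # <FEFF010D>
```

Either form keeps a raw 0x0d byte out of the PostScript source, so
there is no end-of-line sequence inside the string for an interpreter
to normalise.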