Mark H Weaver <m...@netris.org> writes:

> David Kastrup <d...@gnu.org> writes:
>
>> In 2.0.9, the following patch/code for getting what amounts to a binary
>> string port worked.
>>
>> commit 7f7a124d3470b0d566f796e88f4e2ad5aa043f16
>> Author: David Kastrup <d...@gnu.org>
>> Date:   Sun Sep 21 18:40:06 2014 +0200
>>
>>     Source_file::init_port: Keep GUILEv2 from redecoding string input
>>
>> diff --git a/lily/source-file.cc b/lily/source-file.cc
>> index 1118b9d..75ed0d9 100644
>> --- a/lily/source-file.cc
>> +++ b/lily/source-file.cc
>> @@ -152,7 +152,11 @@ Source_file::init_port ()
>>    // we do our own utf8 encoding and verification in the parser, so we
>>    // use the no-conversion equivalent of latin1
>>    SCM str = scm_from_latin1_string (c_str ());
>> -  str_port_ = scm_mkstrport (SCM_INUM0, str, SCM_OPN | SCM_RDNG, __FUNCTION__);
>> +  scm_dynwind_begin ((scm_t_dynwind_flags)0);
>> +  // Why doesn't scm_set_port_encoding_x work here?
>> +  scm_dynwind_fluid (ly_lily_module_constant ("%default-port-encoding"), SCM_BOOL_F);
>> +  str_port_ = scm_open_input_string (str);
>> +  scm_dynwind_end ();
>>    scm_set_port_filename_x (str_port_, ly_string2scm (name_));
>>  }
>
> This hack of giving Guile a buffer containing UTF-8, but claiming that
> it is Latin-1, is not good.  It will cause Guile to see non-ASCII
> characters as garbage.

For one thing we are talking about an external file here that is mainly
parsed by LilyPond.  LilyPond provides sensible pinpointing of UTF-8
encoding errors, something which GUILE cannot do with its UTF-8
representation since it has no transparent or reproducible
representation of bad bytes.  Emacs uses overlong encodings for 0-127 to
represent badly encoded bytes (which includes any overlong sequences) in
the range 128-255, making 128-255 encode as patterns 0xc0 0x80 to 0xc1
0xbf.  Since this leads to a reproducible encoding, one always has the
information required for resynchronization even in the case of encoding
errors.

For another, synchronization of GUILE and LilyPond parsers requires that
both can make use of byte offsets for positioning.  GUILE's mandatory
recoding on opening the port does not provide that.

> However, if you insist on doing this, I would
> suggest using a bytevector input port instead, like this: (untested)
>
>   char *buf = c_str ();
>   SCM bv = scm_c_make_bytevector (strlen (buf) + 1);
>   strcpy (SCM_BYTEVECTOR_CONTENTS (bv), buf);
>   str_port_ = scm_open_bytevector_input_port (bv, SCM_UNDEFINED);

dak@lola:/usr/local/tmp/guile$ git grep scm_open_byte_vector_input_port v2.0.11
dak@lola:/usr/local/tmp/guile$ git grep scm_open_byte_vector_input_port origin/stable-2.0
dak@lola:/usr/local/tmp/guile$

The idea would seem nice, but we are still talking about GUILE 2.0.11
here.  "It is not good" for a facility that, unpretty as it may seem,
was changed _within_ a stable version series without functionally
equivalent replacement is not helpful.  The whole point of a stable
release series is to provide dependable functionality.  Any changes
based on the "we don't want people to use that since it is not nice"
rationale should happen between stable release series.

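To make the Emacs-style representation of bad bytes mentioned above
concrete, here is a minimal C++ sketch.  The names encode_raw_byte and
decode_raw_byte are hypothetical and taken from neither Emacs nor
LilyPond; only the byte patterns 0xc0 0x80 to 0xc1 0xbf come from the
description above.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: map a raw byte in the range 0x80..0xFF to the
// two-byte overlong pattern 0xC0 0x80 .. 0xC1 0xBF.
// Bit 6 of the byte goes into the low bit of the lead byte, the low six
// bits go into the continuation byte.  Bit 7 is implied (always set).
std::array<std::uint8_t, 2> encode_raw_byte (std::uint8_t b)
{
  return {{ static_cast<std::uint8_t> (0xC0 | ((b >> 6) & 0x01)),
            static_cast<std::uint8_t> (0x80 | (b & 0x3F)) }};
}

// Hypothetical inverse: recover the original raw byte from the pair.
// Assumes lead is 0xC0 or 0xC1 and trail is in 0x80..0xBF.
std::uint8_t decode_raw_byte (std::uint8_t lead, std::uint8_t trail)
{
  return static_cast<std::uint8_t> (0x80 | ((lead & 0x01) << 6)
                                    | (trail & 0x3F));
}
```

Since the mapping is one-to-one, a bad byte survives a round trip
unchanged, which is what makes byte-exact error reporting and
resynchronization possible.
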
The way it looks, we'll have to use one mechanism for version 2.0.5 to
2.0.9, have to find out whether to reject 2.0.10, have to reject 2.0.11
and pray for 2.0.12 to provide scm_open_byte_vector_input_port.  And
depending on whether the dynamic library versions have been bumped, we
might have to do this at runtime.

-- 
David Kastrup