bug#20109: Incompatible API change in 2.0 series for string port encoding

David Kastrup Tue, 17 Mar 2015 01:40:26 -0700

Mark H Weaver <[email protected]> writes:

> David Kastrup <[email protected]> writes:
>
>> In 2.0.9, the following patch/code for getting what amounts to a binary
>> string port worked.
>>
>> commit 7f7a124d3470b0d566f796e88f4e2ad5aa043f16
>> Author: David Kastrup <[email protected]>
>> Date:   Sun Sep 21 18:40:06 2014 +0200
>>
>>     Source_file::init_port: Keep GUILEv2 from redecoding string input
>>
>> diff --git a/lily/source-file.cc b/lily/source-file.cc
>> index 1118b9d..75ed0d9 100644
>> --- a/lily/source-file.cc
>> +++ b/lily/source-file.cc
>> @@ -152,7 +152,11 @@ Source_file::init_port ()
>>    // we do our own utf8 encoding and verification in the parser, so we
>>    // use the no-conversion equivalent of latin1
>>    SCM str = scm_from_latin1_string (c_str ());
>> -  str_port_ = scm_mkstrport (SCM_INUM0, str, SCM_OPN | SCM_RDNG, __FUNCTION__);
>> +  scm_dynwind_begin ((scm_t_dynwind_flags)0);
>> +  // Why doesn't scm_set_port_encoding_x work here?
>> +  scm_dynwind_fluid (ly_lily_module_constant ("%default-port-encoding"), SCM_BOOL_F);
>> +  str_port_ = scm_open_input_string (str);
>> +  scm_dynwind_end ();
>>    scm_set_port_filename_x (str_port_, ly_string2scm (name_));
>>  }
>
> This hack of giving Guile a buffer containing UTF-8, but claiming that
> it is Latin-1, is not good.  It will cause Guile to see non-ASCII
> characters as garbage.


For one thing we are talking about an external file here that is mainly
parsed by LilyPond.  LilyPond provides sensible pinpointing of UTF-8
encoding errors, something which GUILE cannot do with its UTF-8
representation since it has no transparent or reproducible
representation of bad bytes.  Emacs uses overlong encodings for 0-127 to
represent badly encoded bytes (which includes any overlong sequences) in
the range 128-255, making 128-255 encode as patterns 0xc0 0x80 to 0xc1
0xbf.  Since this leads to a reproducible encoding, one always has the
information required for resynchronization even in the case of encoding
errors.
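
To make the byte mapping concrete, here is a minimal sketch of the scheme as
described above: a stray byte b in 128-255 is stored as the overlong two-byte
encoding of b-128.  The function names are hypothetical and this is not code
taken from Emacs, just an illustration of the arithmetic.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: map a badly encoded byte in 128..255 to the
// two-byte overlong pattern 0xC0 0x80 .. 0xC1 0xBF.
std::string encode_raw_byte (std::uint8_t b)
{
  assert (b >= 0x80);  // bytes 0..127 are plain ASCII and stay as they are
  std::uint8_t v = static_cast<std::uint8_t> (b - 0x80);  // 0..127
  std::string out;
  out += static_cast<char> (0xC0 | (v >> 6));    // lead byte: 0xC0 or 0xC1
  out += static_cast<char> (0x80 | (v & 0x3F));  // continuation byte
  return out;
}

// Inverse mapping: recover the original byte from such a pair.
// Returns -1 if the pair is outside 0xC0 0x80 .. 0xC1 0xBF.
int decode_raw_byte (std::uint8_t lead, std::uint8_t trail)
{
  if ((lead != 0xC0 && lead != 0xC1) || (trail & 0xC0) != 0x80)
    return -1;
  return 0x80 + (((lead & 0x01) << 6) | (trail & 0x3F));
}
```

Since valid UTF-8 never uses the lead bytes 0xC0 and 0xC1, the two-byte
patterns cannot collide with properly encoded text, which is what makes the
round trip reproducible.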

For another, synchronization of GUILE and LilyPond parsers requires that
both can make use of byte offsets for positioning.  GUILE's mandatory
recoding on opening the port does not provide that.
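
A small sketch of why the offsets drift apart, assuming well-formed UTF-8
input (the function names are made up for illustration): once a port decodes
UTF-8, its character position no longer equals the byte position in the
buffer, whereas a one-byte-per-character reading such as Latin-1 keeps the
two identical.

```cpp
#include <cstddef>
#include <string>

// Number of characters a UTF-8 decoder has produced before byte offset
// `off`: every byte that is not a 10xxxxxx continuation byte starts one.
std::size_t utf8_char_index (const std::string &s, std::size_t off)
{
  std::size_t n = 0;
  for (std::size_t i = 0; i < off && i < s.size (); ++i)
    if ((static_cast<unsigned char> (s[i]) & 0xC0) != 0x80)
      ++n;
  return n;
}

// Under a one-byte-per-character reading the character index simply is
// the byte offset (clamped to the buffer length).
std::size_t latin1_char_index (const std::string &s, std::size_t off)
{
  return off < s.size () ? off : s.size ();
}
```

For the four-byte buffer "a", 0xC3, 0xA9, "b", byte offset 3 is character 2
after UTF-8 decoding but still position 3 under the Latin-1 reading.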

> However, if you insist on doing this, I would
> suggest using a bytevector input port instead, like this: (untested)
>
>   char *buf = c_str ();
>   SCM bv = scm_c_make_bytevector (strlen (buf) + 1);
>   strcpy (SCM_BYTEVECTOR_CONTENTS (bv), buf);
>   str_port_ = scm_open_bytevector_input_port (bv, SCM_UNDEFINED);

dak@lola:/usr/local/tmp/guile$ git grep scm_open_byte_vector_input_port v2.0.11
dak@lola:/usr/local/tmp/guile$ git grep scm_open_byte_vector_input_port origin/stable-2.0
dak@lola:/usr/local/tmp/guile$ 

The idea would seem nice, but we are still talking about GUILE 2.0.11
here.  Saying "it is not good" about a facility that, unpretty as it may
seem, was changed _within_ a stable version series without a functionally
equivalent replacement is not helpful.

The whole point of a stable release series is to provide dependable
functionality.  Any changes based on the "we don't want people to use
that since it is not nice" rationale should happen between stable
release series.

The way it looks, we'll have to use one mechanism for version 2.0.5 to
2.0.9, have to find out whether to reject 2.0.10, have to reject 2.0.11
and pray for 2.0.12 to provide scm_open_byte_vector_input_port.

And depending on whether the dynamic library versions have been bumped,
we might have to do this at runtime.
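
Roughly, such a runtime check could look like the sketch below.  The enum and
function names are hypothetical, and the version string is assumed to come
from GUILE at run time (for instance from scm_version ()); the sketch only
encodes the cases listed above and treats everything else as undecided.

```cpp
#include <cstdio>
#include <string>

// Hypothetical classification of how to open the string port.
enum class Port_strategy
{
  fluid_hack,  // 2.0.5 to 2.0.9: the %default-port-encoding approach
  reject,      // 2.0.11: no known working mechanism
  unknown      // 2.0.10, 2.0.12 and anything else: still to be determined
};

Port_strategy choose_strategy (const std::string &version)
{
  int major = 0, minor = 0, micro = 0;
  if (std::sscanf (version.c_str (), "%d.%d.%d", &major, &minor, &micro) != 3)
    return Port_strategy::unknown;
  if (major != 2 || minor != 0)
    return Port_strategy::unknown;
  if (micro >= 5 && micro <= 9)
    return Port_strategy::fluid_hack;
  if (micro == 11)
    return Port_strategy::reject;
  return Port_strategy::unknown;
}
```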

-- 
David Kastrup
