On Wed, 2009-09-09 at 01:00 +0200, Ludovic Courtès wrote:
> Hello!
>
> "Michael Gran" <spk...@yahoo.com> writes:
>
> > http://git.savannah.gnu.org/cgit/guile.git/commit/?id=0d05ae7c4b1eddf6257f99f44eaf5cb7b11191be
>
> [...]
>
> > -  return scm_getc (input_port);
> > +  return scm_get_byte_or_eof (input_port);
>
> This is actually an earlier change, but the prototype of scm_getc is now
> different from that in 1.8.  Presumably, this means that it’s not
> source-compatible with 1.8, e.g., on platforms where
> sizeof (int) < sizeof (scm_t_wchar), right?

The readline library can't handle UCS-4 codepoints, but it is capable of
dealing with locale-encoded text.  So, it needs to have the raw bytes of
the locale-encoded characters, and scm_get_byte_or_eof returns the raw
bytes instead of doing the processing necessary to make codepoints.

> > --- a/libguile/strings.h
> > +++ b/libguile/strings.h
> > @@ -111,7 +111,7 @@ SCM_API SCM scm_substring_shared (SCM str, SCM start, SCM end);
> >  SCM_API SCM scm_substring_copy (SCM str, SCM start, SCM end);
> >  SCM_API SCM scm_string_append (SCM args);
> >
> > -SCM_INTERNAL SCM scm_i_from_stringn (const char *str, size_t len,
> > +SCM_API SCM scm_i_from_stringn (const char *str, size_t len,
> >                                       const char *encoding,
> >                                       scm_t_string_failed_conversion_handler handler);
> > @@ -157,7 +157,7 @@ SCM_INTERNAL const scm_t_wchar *scm_i_string_wide_chars (SCM str);
> >  SCM_INTERNAL SCM scm_i_string_start_writing (SCM str);
> >  SCM_INTERNAL void scm_i_string_stop_writing (void);
> >  SCM_INTERNAL int scm_i_is_narrow_string (SCM str);
> > -SCM_INTERNAL scm_t_wchar scm_i_string_ref (SCM str, size_t x);
> > +SCM_API scm_t_wchar scm_i_string_ref (SCM str, size_t x);
>
> Were these changes intended?

Well, one of the two of them was intended. :)

> > +  (with-locale "en_US.iso88591"
> > +    (pass-if-exception "no args" exception:wrong-num-args
> > +      (regexp-quote))
>
> Is the locale part of the API?  That is, should programs that use
> regexps explicitly ask for a locale with 8-bit encoding?

Basically yes.  The libc regex is 8-bit, and it uses
scm_to/from_locale_string to convert regex's input and output.  Until
libunistring comes with Unicode regex, I think this is the best we can do.

Thanks,

Mike
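
To make the bytes-versus-codepoints distinction above concrete, here is a minimal sketch in plain C. It does not use libguile at all: decode_utf8 and count_codepoints are hypothetical helpers written for this illustration, and it assumes the locale encoding is UTF-8, which the thread does not state. A byte-oriented reader hands back each byte of a multibyte character separately, while a codepoint-oriented reader has to decode the whole sequence first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libguile.  Decodes one UTF-8
   sequence starting at s (at most len bytes available), stores the
   codepoint in *cp, and returns the number of bytes consumed, or 0 if
   the input is malformed or truncated.  Kept minimal: it does not
   reject overlong forms or surrogate codepoints. */
static size_t
decode_utf8 (const unsigned char *s, size_t len, uint32_t *cp)
{
  size_t n, i;
  uint32_t c;

  if (len == 0)
    return 0;

  if (s[0] < 0x80)
    { n = 1; c = s[0]; }                /* plain ASCII */
  else if ((s[0] & 0xE0) == 0xC0)
    { n = 2; c = s[0] & 0x1F; }         /* 2-byte lead */
  else if ((s[0] & 0xF0) == 0xE0)
    { n = 3; c = s[0] & 0x0F; }         /* 3-byte lead */
  else if ((s[0] & 0xF8) == 0xF0)
    { n = 4; c = s[0] & 0x07; }         /* 4-byte lead */
  else
    return 0;                           /* stray continuation or invalid byte */

  if (n > len)
    return 0;                           /* sequence cut short */

  for (i = 1; i < n; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return 0;                       /* not a continuation byte */
      c = (c << 6) | (uint32_t) (s[i] & 0x3F);
    }

  *cp = c;
  return n;
}

/* Hypothetical helper: counts codepoints in a UTF-8 buffer, stopping
   at the first malformed sequence. */
static size_t
count_codepoints (const unsigned char *s, size_t len)
{
  size_t i = 0, count = 0;

  while (i < len)
    {
      uint32_t cp;
      size_t n = decode_utf8 (s + i, len - i, &cp);
      if (n == 0)
        break;
      i += n;
      count++;
    }
  return count;
}
```

Under the UTF-8 assumption, "é" is the two bytes 0xC3 0xA9. A byte-at-a-time reader would deliver those two values one after the other, which is the form a byte-oriented consumer such as readline can work with, whereas decode_utf8 consumes both bytes and yields the single codepoint U+00E9.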