Re: [Guile-commits] GNU Guile branch, master, updated. release_1-9-2-164-g0d05ae7

Mike Gran Tue, 08 Sep 2009 21:17:17 -0700

On Wed, 2009-09-09 at 01:00 +0200, Ludovic Courtès wrote:
> Hello!
> 
> "Michael Gran" <spk...@yahoo.com> writes:
> 
> > http://git.savannah.gnu.org/cgit/guile.git/commit/?id=0d05ae7c4b1eddf6257f99f44eaf5cb7b11191be
> 
> [...]
> 
> > -  return scm_getc (input_port);
> > +  return scm_get_byte_or_eof (input_port);
> 
> This is actually an earlier change, but the prototype of scm_getc is now
> different from that in 1.8.  Presumably, this means that it’s not
> source-compatible with 1.8, e.g., on platforms where
> sizeof (int) < sizeof (scm_t_wchar), right?

The readline library can't handle UCS-4 codepoints, but it can handle
locale-encoded text.  So it needs the raw bytes of the locale-encoded
characters, and scm_get_byte_or_eof returns those raw bytes instead of
doing the processing necessary to make codepoints.
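
To make the byte/codepoint distinction concrete, here is a minimal
sketch in plain C.  It assumes the locale encoding is UTF-8, and
decode_utf8 is a hypothetical helper written for this illustration,
not a libguile function.  The bytes in the input buffer are what a
byte-level read hands to readline; the decoded value is what a
character-level read would produce instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only (not part of libguile).
   Decodes one UTF-8 sequence starting at s, with len bytes available.
   Stores the codepoint in *cp and returns the number of bytes consumed,
   or 0 if the sequence is truncated or malformed.  This sketch does not
   reject overlong encodings or surrogates.  */
static size_t
decode_utf8 (const unsigned char *s, size_t len, uint32_t *cp)
{
  size_t need, i;
  uint32_t value;

  assert (s != NULL && cp != NULL);
  if (len == 0)
    return 0;

  /* The lead byte says how many bytes the sequence occupies.  */
  if (s[0] < 0x80)
    { need = 1; value = s[0]; }
  else if ((s[0] & 0xE0) == 0xC0)
    { need = 2; value = s[0] & 0x1F; }
  else if ((s[0] & 0xF0) == 0xE0)
    { need = 3; value = s[0] & 0x0F; }
  else if ((s[0] & 0xF8) == 0xF0)
    { need = 4; value = s[0] & 0x07; }
  else
    return 0;

  if (len < need)
    return 0;

  /* Each continuation byte contributes six more bits.  */
  for (i = 1; i < need; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return 0;
      value = (value << 6) | (s[i] & 0x3F);
    }

  *cp = value;
  return need;
}
```

For example, the two bytes 0xC3 0xA9 decode to the single codepoint
U+00E9; readline wants the two bytes, not the codepoint.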

> 
> > --- a/libguile/strings.h
> > +++ b/libguile/strings.h
> > @@ -111,7 +111,7 @@ SCM_API SCM scm_substring_shared (SCM str, SCM start, SCM end);
> >  SCM_API SCM scm_substring_copy (SCM str, SCM start, SCM end);
> >  SCM_API SCM scm_string_append (SCM args);
> >  
> > -SCM_INTERNAL SCM scm_i_from_stringn (const char *str, size_t len, 
> > +SCM_API SCM scm_i_from_stringn (const char *str, size_t len, 
> >                                       const char *encoding,
> >                                       scm_t_string_failed_conversion_handler 
> >                                       handler);
> > @@ -157,7 +157,7 @@ SCM_INTERNAL const scm_t_wchar *scm_i_string_wide_chars (SCM str);
> >  SCM_INTERNAL SCM scm_i_string_start_writing (SCM str);
> >  SCM_INTERNAL void scm_i_string_stop_writing (void);
> >  SCM_INTERNAL int scm_i_is_narrow_string (SCM str);
> > -SCM_INTERNAL scm_t_wchar scm_i_string_ref (SCM str, size_t x);
> > +SCM_API scm_t_wchar scm_i_string_ref (SCM str, size_t x);
> 
> Were these changes intended?

Well, one of the two of them was intended.  :)

> 
> > +  (with-locale "en_US.iso88591"
> > +    (pass-if-exception "no args" exception:wrong-num-args
> > +      (regexp-quote))
> 
> Is the locale part of the API?  That is, should programs that use
> regexps explicitly ask for a locale with 8-bit encoding?

Basically yes.  The libc regex is 8-bit, and Guile's regex code uses
scm_to_locale_string and scm_from_locale_string to convert the regex's
input and output.

Until libunistring comes with Unicode regex, I think this is the best we
can do.
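
As a rough illustration of what "8-bit" means here, the sketch below
calls the POSIX regex API directly.  The first_match helper is a
hypothetical name used only for this example, not anything in
libguile.  The point is that regcomp and regexec take plain char
strings and report match positions as byte offsets, so whatever is
handed to them has to be in the locale's encoding already.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only (not part of libguile).
   Searches text for the first match of a POSIX extended regex.
   Returns 1 and stores the match's byte offsets in *start and *end,
   0 if nothing matched (or the matcher reported an error), and -1 if
   the pattern does not compile.  */
static int
first_match (const char *pattern, const char *text,
             size_t *start, size_t *end)
{
  regex_t re;
  regmatch_t m[1];
  int rc;

  assert (pattern != NULL && text != NULL);
  assert (start != NULL && end != NULL);

  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;

  /* regexec works on a NUL-terminated char string.  */
  rc = regexec (&re, text, 1, m, 0);
  regfree (&re);
  if (rc != 0)
    return 0;

  /* rm_so and rm_eo are byte offsets into text.  */
  *start = (size_t) m[0].rm_so;
  *end = (size_t) m[0].rm_eo;
  return 1;
}
```

Calling first_match ("a+b", "xaab", &s, &e) reports the match as
bytes 1 through 4 of the input.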

Thanks,

Mike
