Hi! Thank you for working on this!
Mike Gran <spk...@yahoo.com> writes: > For now, I think a good strategy is to make strings into a pseudo-class > where the internals are opaque to most of Guile and strings are accessed > through accessors and other methods. Sounds reasonable. > This was the strategy already begun with scm_to_locale_string but the > code isn't fully committed to the idea. The function scm_i_string_chars > exposes the internal representation of the string, and it is used > throughout the code. Yes, that's going to be difficult, but I can't think of a better solution. > - SCM_WTA_DISPATCH_1 (*SCM_SUBR_GENERIC (proc), arg1, > - SCM_ARG1, scm_i_symbol_chars (SCM_SNAME (proc))); > + { > + char *str = scm_to_locale_string (scm_symbol_to_string (SCM_SNAME > (proc))); > + SCM_WTA_DISPATCH_1 (*SCM_SUBR_GENERIC (proc), arg1, SCM_ARG1, str); > + free (str); > + } This is the kind of thing we can't afford in most cases. Here STR is only needed because `SCM_WTA_DISPATCH_1 ()' calls `scm_wrong_type_arg ()', which operates on C strings. One solution would be to change `scm_wrong_type_arg ()' to operate on opaque strings (e.g., take an `SCM' instead of `const char *'). The same applies to all the functions in "error.h", and probably many others. > - if (len > 0 && (s[0] == '/' || s[0] == '\\')) > + if (len > 0 && (scm_i_string_ref_eq_char (filename, 0, '/') > + || scm_i_string_ref_eq_char (filename, 0, '\\'))) I think procedures like `scm_i_string_ref_eq_char ()' are a good idea because it fulfills the goal of having an opaque string type *and* the goal of being able to handle them easily in C. > - scm_ungets (scm_i_string_chars (str), scm_i_string_length (str), port); > + buf = scm_to_locale_stringn (str, &len); > + scm_ungets (buf, len, port); > + free (buf); Eventually, we might need to change `scm_ungets ()', or provide a variant that takes an opaque string. > @@ -1549,12 +1552,16 @@ SCM_DEFINE (scm_recvfrom, "recvfrom!", 2, 3, 0, > > /* recvfrom will not necessarily return an address. usually nothing > is returned for stream sockets. */ > - buf = scm_i_string_writable_chars (str); > + buf = scm_malloc (cend - offset); All the POSIX interface needs fast access to ASCII strings. How about something like: const char *layout = scm_i_ascii_symbol_chars (SCM_PACK (slayout)); where `scm_i_ascii_symbol_chars ()' throws an exception if its argument is a non-ASCII symbol? This would mean special-casing ASCII stringbufs so that we can treat them as C strings. > +static const char * > +make_failsafe_ascii_sz (const char *str, size_t len) > +{ > + static char buf[SCM_FAILSAFE_STRING_LEN]; Ouch, that would be bug-prone. > +SCM_INTERNAL int scm_i_string_ref_eq_char (SCM str, size_t x, char c); > +SCM_INTERNAL int scm_i_symbol_ref_eq_char (SCM str, size_t x, char c); > +SCM_INTERNAL int scm_i_string_ref_neq_char (SCM str, size_t x, char c); > +SCM_INTERNAL int scm_i_symbol_ref_neq_char (SCM str, size_t x, char c); I'd remove the `neq' variants. > +SCM_INTERNAL int scm_i_string_ref_eq_int (SCM str, size_t x, int c); Does it assume sizeof (int) >= 32 ? > +SCM_INTERNAL size_t scm_i_string_contains_char (SCM str, char ch); Since it really returns a boolean, I'd use `int' as the return type. > +SCM_INTERNAL char *scm_i_string_to_write_sz (SCM str); > +SCM_INTERNAL scm_t_uint8 *scm_i_string_to_u8sz (SCM str); > +SCM_INTERNAL SCM scm_i_string_from_u8sz (const scm_t_uint8 *str); > +SCM_INTERNAL const char *scm_i_string_to_failsafe_ascii_sz (SCM str); > +SCM_INTERNAL const char *scm_i_symbol_to_failsafe_ascii_sz (SCM str); What does "sz" mean? > +/* For ASCII strings, SUB can be used to represent an invalid > + character. */ > +#define SCM_SUB ('\x1A') Why SUB? How about `SCM_I_SUB_CHAR', `SCM_I_INVALID_ASCII_CHAR' or similar? Thanks, Ludo'.