Re: Wide string strategies

Ludovic Courtès Thu, 09 Apr 2009 13:26:38 -0700

Hi!

Thank you for working on this!

Mike Gran <spk...@yahoo.com> writes:

> For now, I think a good strategy is to make strings into a pseudo-class
> where the internals are opaque to most of Guile and strings are accessed
> through accessors and other methods.  

Sounds reasonable.

> This was the strategy already begun with scm_to_locale_string but the
> code isn't fully committed to the idea.  The function scm_i_string_chars
> exposes the internal representation of the string, and it is used
> throughout the code.

Yes, that's going to be difficult, but I can't think of a better
solution.

> -  SCM_WTA_DISPATCH_1 (*SCM_SUBR_GENERIC (proc), arg1,
> -                   SCM_ARG1, scm_i_symbol_chars (SCM_SNAME (proc)));
> +  {
> +    char *str = scm_to_locale_string (scm_symbol_to_string (SCM_SNAME 
> (proc)));
> +    SCM_WTA_DISPATCH_1 (*SCM_SUBR_GENERIC (proc), arg1, SCM_ARG1, str);
> +    free (str);
> +  }

This is the kind of thing we can't afford in most cases.

Here STR is only needed because `SCM_WTA_DISPATCH_1 ()' calls
`scm_wrong_type_arg ()', which operates on C strings.

One solution would be to change `scm_wrong_type_arg ()' to operate on
opaque strings (e.g., take an `SCM' instead of `const char *').  The
same applies to all the functions in "error.h", and probably many
others.

> -      if (len > 0 && (s[0] == '/' || s[0] == '\\'))
> +      if (len > 0 && (scm_i_string_ref_eq_char (filename, 0, '/') 
> +                   || scm_i_string_ref_eq_char (filename, 0, '\\')))

I think procedures like `scm_i_string_ref_eq_char ()' are a good idea
because it fulfills the goal of having an opaque string type *and* the
goal of being able to handle them easily in C.

> -  scm_ungets (scm_i_string_chars (str), scm_i_string_length (str), port);
> +  buf = scm_to_locale_stringn (str, &len);
> +  scm_ungets (buf, len, port);
> +  free (buf);

Eventually, we might need to change `scm_ungets ()', or provide a
variant that takes an opaque string.

> @@ -1549,12 +1552,16 @@ SCM_DEFINE (scm_recvfrom, "recvfrom!", 2, 3, 0,
>  
>    /* recvfrom will not necessarily return an address.  usually nothing
>       is returned for stream sockets.  */
> -  buf = scm_i_string_writable_chars (str);
> +  buf = scm_malloc (cend - offset);

All the POSIX interface needs fast access to ASCII strings.  How about
something like:

  const char *layout = scm_i_ascii_symbol_chars (SCM_PACK (slayout));

where `scm_i_ascii_symbol_chars ()' throws an exception if its argument
is a non-ASCII symbol?

This would mean special-casing ASCII stringbufs so that we can treat
them as C strings.

> +static const char *
> +make_failsafe_ascii_sz (const char *str, size_t len)
> +{
> +  static char buf[SCM_FAILSAFE_STRING_LEN];

Ouch, that would be bug-prone.

> +SCM_INTERNAL int scm_i_string_ref_eq_char (SCM str, size_t x, char c);
> +SCM_INTERNAL int scm_i_symbol_ref_eq_char (SCM str, size_t x, char c);
> +SCM_INTERNAL int scm_i_string_ref_neq_char (SCM str, size_t x, char c);
> +SCM_INTERNAL int scm_i_symbol_ref_neq_char (SCM str, size_t x, char c);

I'd remove the `neq' variants.

> +SCM_INTERNAL int scm_i_string_ref_eq_int (SCM str, size_t x, int c);

Does it assume sizeof (int) >= 32 ?

> +SCM_INTERNAL size_t scm_i_string_contains_char (SCM str, char ch);

Since it really returns a boolean, I'd use `int' as the return type.

> +SCM_INTERNAL char *scm_i_string_to_write_sz (SCM str);
> +SCM_INTERNAL scm_t_uint8 *scm_i_string_to_u8sz (SCM str);
> +SCM_INTERNAL SCM scm_i_string_from_u8sz (const scm_t_uint8 *str);
> +SCM_INTERNAL const char *scm_i_string_to_failsafe_ascii_sz (SCM str);
> +SCM_INTERNAL const char *scm_i_symbol_to_failsafe_ascii_sz (SCM str);

What does "sz" mean?

> +/* For ASCII strings, SUB can be used to represent an invalid
> +   character.  */
> +#define SCM_SUB ('\x1A')

Why SUB?  How about `SCM_I_SUB_CHAR', `SCM_I_INVALID_ASCII_CHAR' or
similar?

Thanks,
Ludo'.

Re: Wide string strategies

Reply via email to