> From: Neil Jerram n...@ossau.uklinux.net
>
> I'm afraid I don't understand the problem, on two counts.
>
> 1. The doc (in the manual) says that scm_to_locale_stringbuf doesn't
>    add a terminating \0. So presumably any \0s present must be padding.
>
> 2. The doc also says that if scm_to_locale_stringbuf's return value
>    is > max_len (as it would be in your case), the caller should call it
>    again with a larger buffer.

Right now, the internal coding of strings is an unspecified 8-bit encoding, and it is assumed to be compatible with the locale in which Guile is being run. So if I have a Guile string with some 8-bit character that is between 128 and 255, it just gets passed through. If I request the contents of that string from C with scm_to_locale_string, it just returns the bytes of the Scheme string's buffer unchanged.

But, in future, scm_to_locale_string or scm_to_locale_stringbuf should actually do the proper conversion to the current locale so that wide characters are printed properly. So, if we move the internal representation of strings away from unspecified 8-bit data and toward something concrete, like ISO-8859-1 or UCS-4, and if a program is running in an environment where the locale has a multibyte encoding like UTF-8, then the created locale string could contain multibyte characters.

Consider a Scheme string that is internally the single character "LATIN SMALL LETTER A WITH ACUTE", which is U+00E1. If the locale were some sort of UTF-8, like en_US.utf-8, this letter should become the two bytes 0xC3 and 0xA1 when converted to the locale.

So what should happen in this case if I call scm_to_locale_stringbuf (str, buf, 1)? Note that here BUF can only contain 1 byte. Should the one byte 0xC3 be copied into it, which creates an illegal string? Or should nothing be copied into it? In either case, there should be some mechanism in the API to report that an incomplete last character has occurred, because outputting just the one byte 0xC3 would cause problems somewhere down the road.

So what I was saying was that in this case maybe the best thing to do would be to pad the output buffer with '\0' instead of putting in half of a multibyte character, and then signal that there is some padding at the end of the string.
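
The padding idea above can be sketched in plain C. This is only an illustration and not Guile code: the names utf8_seq_len and copy_whole_chars are hypothetical, the input is assumed to be already converted to UTF-8, and the return value is assumed to mimic scm_to_locale_stringbuf by reporting the number of bytes the whole string needs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length in bytes of the UTF-8 sequence that starts with LEAD.
   A byte that is not a valid lead byte is treated as a 1-byte unit. */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;            /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 1;                             /* stray continuation or invalid byte */
}

/* Hypothetical helper, not part of Guile.  Copies as many whole UTF-8
   characters from SRC as fit into BUF, pads the rest of BUF with '\0',
   stores the number of meaningful bytes in *LEN_USED, and returns the
   number of bytes the complete string needs. */
static size_t copy_whole_chars(const char *src, size_t src_len,
                               char *buf, size_t max_len, size_t *len_used)
{
    size_t used = 0;

    assert(buf != NULL || max_len == 0);
    assert(len_used != NULL);

    while (used < src_len) {
        size_t n = utf8_seq_len((unsigned char) src[used]);
        if (n > src_len - used)
            n = src_len - used;   /* source ends mid-sequence: take what exists */
        if (n > max_len - used)
            break;                /* the next character would be split */
        used += n;
    }

    if (used > 0)
        memcpy(buf, src, used);
    if (max_len > used)
        memset(buf + used, 0, max_len - used);  /* pad instead of half a character */

    *len_used = used;
    return src_len;
}
```

With the two bytes 0xC3 0xA1 as input and a max_len of 1, this sketch copies nothing, zeroes the single byte of the buffer, sets *len_used to 0, and returns 2, so the caller can tell both that the buffer was too small and that no partial character was written.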

For instance, one could have a function

    scm_to_locale_stringbufn (SCM str, char *buf, size_t max_len, size_t *len_used)

where LEN_USED is the size of the part of the buffer that was actually used.

Sorry for the book-length explanation,

Mike Gran