On Sun, 2009-08-09 at 20:22 +0200, Ludovic Courtès wrote: > Hi Mike!
> I would feel more confident if the number of lines of tests added was > proportional to the number of new C lines of code. Do you think some > more tests could be added? Or maybe this will come at a later stage? I should probably add some tests now. They would have to use hex escapes, since the character encoding/decoding functionality is not ready. That's not a problem, but, it doesn't look as cool as using actual non-latin glyphs in the tests. > > > + else if (c == '\b') > > + { > > + SCM_DECCOL (port); > > + } > > Style! ;-) OK. The SCM_DECCOL et al macros are missing enclosing do/while statements, making them conflict with the if statement, so do/while statements will have to be added. > > > +SCM_API void scm_charprint (scm_t_uint32 c, SCM port); > > This ought to be internal, no? Could be. A couple of the types are given their own print functions: scm_intprint and an scm_uintprint. Most types don't have their own print functions. Are int and uint given special treatment because of their radix term? > > > #define STRINGBUF_F_SHARED 0x100 > > #define STRINGBUF_F_INLINE 0x200 > > +#define STRINGBUF_F_WIDE 0x400 > > Although other flags miss this, can you add a comment to the right > saying that this means UCS-4 encoding? OK. > > > +#if SCM_DEBUG > > + if (len < 1000) > > + lenhist[len]++; > > + else > > + lenhist[1000]++; > > +#endif > > I would use "#ifdef SCM_STRING_LENGTH_HISTOGRAM" for that. Conversely, > I'd make `%string-dump' and `%symbol-dump' always available (with > docstring and possibly manual entry). I like that idea. > > > + (scm_t_wchar) (unsigned char) STRINGBUF_INLINE_CHARS (buf)[i]; > > Is the double cast needed? Sort of. Unsigned char will successfully be implicitly cast to scm_t_wchar, so the scm_t_wchar term is just for clarity. The unsigned char term is definitely needed. Negative 8-bit chars are the upper half of the 8-bit charset (128 - 255). Casting them directly to scm_t_wchar may return 0xFFFFFF80 - 0xFFFFFFFF instead of 128-255. I don't have any problem removing the scm_t_wchar cast. Would you prefer that? > > > + if (scm_i_is_narrow_string (name)) > > "Narrow strings" are Latin-1, right? Right. > > > +SCM_DEFINE (scm_sys_string_dump, "%string-dump", 1, 0, 0, (SCM str), "") > > How about returning an alist with all this information instead of using > printf? OK > > +SCM_DEFINE (scm_string_width, "string-width", 1, 0, 0, > > + (SCM string), > > + "Return the bytes used to represent a character in > > @var{string}." > > + "This will return 1 or 4.") > > I was wondering whether we should expose as much of the internal > representation, but I think it's a good debugging aid and it can't hurt. > > I find the name slightly misleading, but I can't think of a better one. I put it in because that information needs to be available in the bytecode compiler. A slightly clearer name would probably be string-bytes-per-character, I suppose. > > +SCM_INTERNAL char *scm_to_stringn (SCM str, size_t *lenp, > > + const char *encoding, > > + enum iconv_ilseq_handler handler); > > I suppose this would eventually become public. What do you think? > Should we use a different type for HANDLER before that happens? The simplest thing would be to make some constants like scm_c_define ("STRING_ESCAPE", scm_from_int(iconveh_escape_sequence)) Something similar is done in the scm_seek function's constants, such as SEEK_CUR. > > > +SCM_API const scm_t_wchar *scm_i_string_wide_chars (SCM str); > > Should be SCM_INTERNAL. OK > > Thank you! Thanks for taking the time to check it out. > > Ludo'. Mike