Re: [PATCH] Enable utf8->string to take a range

2022-03-09, Vijay Marupudi
Maxime Devos writes:
> Never mind, seems like I misinterpreted a comment and #vu8(97 0 98) is
> valid UTF-8 after all, it's just not possible to encode it as a
> zero-terminated string.
Thanks for the catch on the typo in the docstrings. I've attached the updated versions of the patches that fix…

Re: [PATCH] Enable utf8->string to take a range

2022-03-09, Maxime Devos
Maxime Devos wrote on Wed 09-03-2022 at 14:27 [+0100]:
> That's not quite correct, seems like Guile uses another encoding, but
> still.
Never mind, seems like I misinterpreted a comment and #vu8(97 0 98) is valid UTF-8 after all, it's just not possible to encode it as a zero-terminated string.
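To make the #vu8(97 0 98) point concrete, here is a minimal standalone C sketch (plain C, not libguile; the names `example_bytes`, `zero_terminated_length` and `counted_length` are hypothetical). The three bytes are valid UTF-8, but any interface that relies on a terminating NUL sees only the first one, while a length-counted buffer such as a bytevector keeps all three.

```c
#include <stddef.h>
#include <string.h>

/* The bytevector #vu8(97 0 98): 'a', U+0000, 'b'.  Every byte is below
   0x80, so this is a valid three-character UTF-8 sequence. */
static const unsigned char example_bytes[] = { 97, 0, 98 };

/* Length seen by code that treats the data as a zero-terminated string:
   strlen stops at the embedded NUL, so only "a" survives. */
size_t zero_terminated_length (void)
{
  return strlen ((const char *) example_bytes);
}

/* Length when the byte count travels with the data, as it does for a
   bytevector: all three bytes are kept. */
size_t counted_length (void)
{
  return sizeof example_bytes;
}
```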

Re: [PATCH] Enable utf8->string to take a range

2022-03-09, Maxime Devos
Maxime Devos wrote on Wed 09-03-2022 at 14:24 [+0100]:
> This is incorrect, since the nul character is encoded even though
> UTF-8 proper does not allow encoding the nul character -- UTF-8 with an
> encoding of the nul character is sometimes called ‘modified UTF-8’.
That's not quite correct, see…
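The distinction between standard and ‘modified’ UTF-8 can be shown with a small encoder sketch (the function `encode_bmp` is hypothetical and deliberately limited to the Basic Multilingual Plane, so it sidesteps how modified UTF-8 handles supplementary characters). Standard UTF-8 writes U+0000 as the single byte 00; modified UTF-8 uses the overlong pair C0 80 so that the output never contains a zero byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode a code point from the Basic Multilingual Plane into buf and
   return the number of bytes written, or 0 for values this sketch does
   not handle (surrogates and anything above U+FFFF).  When modified is
   nonzero, U+0000 becomes the two-byte sequence C0 80, as in "modified
   UTF-8", so the output never contains a zero byte.  When modified is
   zero, U+0000 is the single byte 00, as standard UTF-8 requires. */
size_t encode_bmp (uint32_t cp, int modified, unsigned char buf[3])
{
  if (cp > 0xFFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    return 0;

  if (cp == 0 && modified)
    {
      buf[0] = 0xC0;
      buf[1] = 0x80;
      return 2;
    }
  if (cp < 0x80)
    {
      buf[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      buf[0] = (unsigned char) (0xC0 | (cp >> 6));
      buf[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  buf[0] = (unsigned char) (0xE0 | (cp >> 12));
  buf[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
  buf[2] = (unsigned char) (0x80 | (cp & 0x3F));
  return 3;
}
```

For example, U+00E9 ("é") encodes to the bytes 195 169 in either mode, which is the #vu8(195 169) used in the test suggestions in this thread.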

Re: [PATCH] Enable utf8->string to take a range

2022-03-09, Maxime Devos
Vijay Marupudi wrote on Fri 21-01-2022 at 20:21 [-0500]:
> +SCM_DEFINE (scm_utf8_range_to_string, "utf8->string",
> +    1, 2, 0,
> +    (SCM utf, SCM start, SCM end),
> +    "Return a newly allocate string that contains from the UTF-8-"
> +    "encoded contents o…

Re: [PATCH] Enable utf8->string to take a range

2022-03-09, Maxime Devos
Vijay Marupudi wrote on Fri 21-01-2022 at 20:21 [-0500]:
> +SCM_DEFINE (scm_utf8_range_to_string, "utf8->string",
> +    1, 2, 0,
> +    (SCM utf, SCM start, SCM end),
> +    "Return a newly allocate string that contains from the UTF-8-"
> +    "encoded contents o…

Re: [PATCH] Enable utf8->string to take a range

2022-03-09, Maxime Devos
Vijay Marupudi wrote on Fri 21-01-2022 at 20:21 [-0500]:
> +SCM_DEFINE (scm_utf16_range_to_string, "utf16->string",
> +    1, 3, 0,
> +    (SCM utf, SCM endianness, SCM start, SCM end),
> +    "Return a newly allocate string that contains from the UTF-8-"
> +    "…

Re: [PATCH] Enable utf8->string to take a range

2022-01-21, Vijay Marupudi
> It would be nice to check multibyte characters as well,
> to verify that byte indices and not character indices are used.
>
> E.g., (utf8->string #vu8(195 169) 0 2) should return "é".
>
> Another nice test: (utf8->string #vu8(195 169) 0 1) should raise
> a 'decoding-error', even though #vu8(195 1…
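A standalone C sketch of why byte indices matter for the #vu8(195 169) example. The helper `utf8_range_char_count` is hypothetical and is not libguile's decoder; it validates only the bytes in the half-open range [start, end), so (0, 2) decodes one character while (0, 1) and (1, 2) cut the two-byte sequence and are rejected. In the thread, the proposal is that such ranges raise a 'decoding-error' from the Scheme procedure.

```c
#include <stddef.h>

/* Count the characters in bytes[start, end) if that byte range is
   well-formed UTF-8, or return -1 if it is not.  start and end are byte
   offsets, so a range that splits a multibyte sequence is an error
   rather than a shorter string.  The caller is assumed to have checked
   start <= end <= length already. */
long utf8_range_char_count (const unsigned char *bytes, size_t start,
                            size_t end)
{
  long count = 0;
  size_t i = start;

  while (i < end)
    {
      unsigned char b = bytes[i];
      size_t len;
      unsigned long cp;

      if (b < 0x80)
        { len = 1; cp = b; }
      else if (b >= 0xC2 && b <= 0xDF)
        { len = 2; cp = b & 0x1F; }
      else if (b >= 0xE0 && b <= 0xEF)
        { len = 3; cp = b & 0x0F; }
      else if (b >= 0xF0 && b <= 0xF4)
        { len = 4; cp = b & 0x07; }
      else
        return -1;              /* stray continuation byte or invalid lead byte */

      if (end - i < len)
        return -1;              /* sequence cut off by the end of the range */

      for (size_t k = 1; k < len; k++)
        {
          unsigned char c = bytes[i + k];
          if ((c & 0xC0) != 0x80)
            return -1;          /* expected a continuation byte */
          cp = (cp << 6) | (c & 0x3F);
        }

      /* Reject overlong forms, UTF-16 surrogates and values above U+10FFFF. */
      if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000)
          || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return -1;

      i += len;
      count++;
    }
  return count;
}
```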

Re: [PATCH] Enable utf8->string to take a range

2022-01-21, Maxime Devos
Vijay Marupudi wrote on Fri 21-01-2022 at 15:20 [-0500]:
+  (pass-if-exception "utf8->string range: end < start"
+    exception:out-of-range
+    (let* ((utf8 (string->utf8 "gnu guile")))
+      (utf8->string utf8 1 0)))
+
[other tests]
It would be nice to check multibyte characters as wel…

Re: [PATCH] Enable utf8->string to take a range

2022-01-21, Vijay Marupudi
> There seems to be an inconsistency here. Can (c_start >= c_len) be
> relaxed to c_start > c_len?
Done. `substring' was a useful reference.
> It would be nice to document if it's an open, closed or
> half-open/closed range. E.g. see the documentation of 'substring':
Done.
> It seems a bit…
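A standalone C sketch of the relaxed check, assuming the range follows the same 0 <= start <= end <= length rule as `substring' (the helper `range_ok` is hypothetical, not the code from the patch). With the stricter `c_start >= c_len` test, a call such as (utf8->string #vu8() 0) would be rejected even though it names a valid empty range; the `end < start` case matches the "utf8->string range: end < start" test quoted in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Check a half-open byte range [start, end) against a buffer of len
   bytes: start may equal len (an empty range at the end), end may equal
   len, and end may not precede start.  Returns true if the range is
   acceptable. */
bool range_ok (size_t start, size_t end, size_t len)
{
  if (start > len)              /* relaxed from start >= len */
    return false;
  if (end > len)
    return false;
  if (end < start)
    return false;
  return true;
}
```

Using "gnu guile" (9 bytes) as in the quoted test, `range_ok (1, 0, 9)` is false and `range_ok (9, 9, 9)` is true.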

Re: [PATCH] Enable utf8->string to take a range

2022-01-21, Maxime Devos
Vijay Marupudi wrote on Thu 20-01-2022 at 22:23 [-0500]:
> +  c_start = scm_to_size_t (start);
This seems suboptimal because if start > SIZE_MAX, then this will throw an 'out-of-range' exception without attributing it to 'utf8->string' (untested).
Greetings,
Maxime.
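Plain C has no bignums, so in this standalone sketch a `uintmax_t` stands in for the Scheme integer; `index_from_uintmax` and its stderr message are hypothetical. The idea it illustrates is comparing against the bound before narrowing to `size_t`, and naming the calling procedure in the error. Inside libguile the equivalent report would go through `scm_out_of_range (FUNC_NAME, start)`, as the patch already does for its own range check.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Convert an index that arrives as a wide unsigned integer into a size_t
   bounded by len.  The comparison happens in the wide type before any
   narrowing; since len itself fits in size_t, any value too large for
   size_t is rejected by the same test.  On failure the message names the
   calling procedure (who) rather than a generic conversion routine.
   Returns 0 on success, -1 on failure. */
int index_from_uintmax (const char *who, uintmax_t value, size_t len,
                        size_t *out)
{
  if (value > (uintmax_t) len)
    {
      fprintf (stderr, "%s: index out of range: %ju\n", who, value);
      return -1;
    }
  *out = (size_t) value;
  return 0;
}
```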

Re: [PATCH] Enable utf8->string to take a range

2022-01-21, Maxime Devos
Vijay Marupudi wrote on Thu 20-01-2022 at 22:23 [-0500]:
> +@deffn {Scheme Procedure} utf8->string utf [start [end]]
>  @deffnx {Scheme Procedure} utf16->string utf [endianness]
>  @deffnx {Scheme Procedure} utf32->string utf [endianness]
>  @deffnx {C Function} scm_utf8_to_string (utf)
> +@deffnx…

Re: [PATCH] Enable utf8->string to take a range

2022-01-21, Maxime Devos
Vijay Marupudi wrote on Thu 20-01-2022 at 22:23 [-0500]:
> +  c_start = scm_to_size_t (start);
> +  if (SCM_UNLIKELY (c_start >= c_len))
> +    {
> +      scm_out_of_range (FUNC_NAME, start);
> +    }
> +
> +  if (!scm_is_eq (end, SCM_UNDEFINED))
> +    {
> +      c_end =…
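The `scm_is_eq (end, SCM_UNDEFINED)` test in the quoted patch is how an omitted optional argument shows up in libguile. A standalone C sketch of just the defaulting step, where the hypothetical `struct opt_index` flag plays the role of SCM_UNDEFINED and the defaults (start 0, end the bytevector length) are assumed to follow `substring'. Range validation would be a separate step after this.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an optional Scheme argument: in libguile an
   omitted optional argument arrives as SCM_UNDEFINED; here the given
   flag plays that role. */
struct opt_index
{
  bool given;
  size_t value;
};

/* Resolve the optional [start [end]] arguments against a bytevector of
   len bytes: start defaults to 0 and end defaults to len. */
void resolve_range (struct opt_index start, struct opt_index end,
                    size_t len, size_t *c_start, size_t *c_end)
{
  *c_start = start.given ? start.value : 0;
  *c_end = end.given ? end.value : len;
}
```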

Re: [PATCH] Enable utf8->string to take a range

2022-01-21, Maxime Devos
Vijay Marupudi wrote on Thu 20-01-2022 at 22:23 [-0500]:
> --- a/libguile/bytevectors.c
> +++ b/libguile/bytevectors.c
> [...]
Boundary conditions can be tricky; I would recommend writing some tests.
Greetings,
Maxime.