Maxime Devos writes:
> Never mind, it seems I misinterpreted a comment and #vu8(97 0 98) is
> valid UTF-8 after all; it's just not possible to encode it as a
> zero-terminated string.
Thanks for the catch on the typo in the docstrings. I've attached the
updated versions of the patches that fix
Maxime Devos wrote on Wed 09-03-2022 at 14:27 [+0100]:
> That's not quite correct, seems like Guile uses another encoding, but
> still.
Never mind, it seems I misinterpreted a comment and #vu8(97 0 98) is
valid UTF-8 after all; it's just not possible to encode it as a
zero-terminated string.
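
To make the point about zero-terminated strings concrete, here is a minimal C sketch. It is not code from the patch, and the names `example_bytes`, `length_as_c_string` and `length_as_bytevector` are made up for illustration. The three bytes of #vu8(97 0 98) form a valid UTF-8 sequence of length 3, but any API that treats them as a C string stops at the embedded NUL.

```c
#include <stddef.h>
#include <string.h>

/* The bytes of #vu8(97 0 98): 'a', NUL, 'b' (hypothetical example data). */
static const char example_bytes[3] = { 97, 0, 98 };

/* Length as seen by C string functions: counting stops at the first NUL. */
static size_t
length_as_c_string (void)
{
  return strlen (example_bytes);
}

/* Length as seen by a bytevector, which carries an explicit size. */
static size_t
length_as_bytevector (void)
{
  return sizeof example_bytes;
}
```

The C-string view loses the trailing `b`, which is why a length-carrying representation is needed to round-trip such data.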
Maxime Devos wrote on Wed 09-03-2022 at 14:24 [+0100]:
> This is incorrect, since the nul character is encoded even though
> UTF-8 proper does not allow encoding the nul character -- UTF-8 with an
> encoding of the nul character is sometimes called ‘modified UTF-8’.
That's not quite correct, see
Vijay Marupudi wrote on Fri 21-01-2022 at 20:21 [-0500]:
> +SCM_DEFINE (scm_utf8_range_to_string, "utf8->string",
> + 1, 2, 0,
> + (SCM utf, SCM start, SCM end),
> + "Return a newly allocate string that contains from the UTF-8-"
> + "encoded contents o
Vijay Marupudi wrote on Fri 21-01-2022 at 20:21 [-0500]:
> +SCM_DEFINE (scm_utf16_range_to_string, "utf16->string",
> + 1, 3, 0,
> + (SCM utf, SCM endianness, SCM start, SCM end),
> + "Return a newly allocate string that contains from the UTF-8-"
> + "
> It would be nice to check multibyte characters as well,
> to verify that byte indices and not character indices are used.
>
> E.g., (utf8->string #vu8(195 169) 0 2) should return "é".
>
> Another nice test: (utf8->string #vu8(195 169) 0 1) should raise
> a 'decoding-error', even though #vu8(195 1
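
The suggested multibyte tests rely on the range being counted in bytes, not characters. The following C sketch illustrates why the range 0..1 of #vu8(195 169) cannot be decoded while 0..2 can. It is a simplified, hypothetical checker (`utf8_range_is_complete` is not a Guile function) that only verifies lead bytes and continuation bytes; it does not reject overlong forms or surrogates, so it should not be read as a full UTF-8 validator.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if bytes[start, end) consists only of complete UTF-8
   sequences.  Simplified sketch: overlong encodings and surrogates
   are not rejected.  */
static bool
utf8_range_is_complete (const unsigned char *bytes, size_t start, size_t end)
{
  size_t i = start;

  while (i < end)
    {
      unsigned char b = bytes[i];
      size_t need;

      if (b < 0x80)
        need = 1;                       /* ASCII */
      else if ((b & 0xE0) == 0xC0)
        need = 2;                       /* two-byte sequence */
      else if ((b & 0xF0) == 0xE0)
        need = 3;                       /* three-byte sequence */
      else if ((b & 0xF8) == 0xF0)
        need = 4;                       /* four-byte sequence */
      else
        return false;                   /* stray continuation or bad lead byte */

      if (need > end - i)
        return false;                   /* sequence is cut off by the range end */

      for (size_t k = 1; k < need; k++)
        if ((bytes[i + k] & 0xC0) != 0x80)
          return false;                 /* expected a continuation byte */

      i += need;
    }

  return true;
}
```

With the bytes 195 169 (the encoding of "é"), the byte range [0, 2) is complete, whereas [0, 1) ends in the middle of a two-byte sequence, which corresponds to the case where a 'decoding-error' is expected.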
Vijay Marupudi wrote on Fri 21-01-2022 at 15:20 [-0500]:
+ (pass-if-exception "utf8->string range: end < start"
+ exception:out-of-range
+ (let* ((utf8 (string->utf8 "gnu guile")))
+ (utf8->string utf8 1 0)))
+ [other tests]
It would be nice to check multibyte characters as wel
> There seems to be an inconsistency here. Can (c_start >= c_len) be
> relaxed to c_start > c_len?
Done. `substring' was a useful reference.
> It would be nice to document if it's an open, closed or half-
> open/closed range. E.g. see the documentation of 'substring':
Done.
> It seems a bit
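
As an illustration of the boundary condition discussed above, here is a small C sketch of a half-open range check in the style of `substring`. It assumes the range is [start, end) over `len` bytes; `range_is_valid` is a hypothetical helper, not the code in the patch. The relaxed comparison means that start equal to the length is accepted and denotes an empty range, while start greater than the length is rejected.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: check that [start, end) is a valid half-open
   range over a buffer of LEN bytes.  START == LEN is allowed and
   describes an empty range at the very end of the buffer.  */
static bool
range_is_valid (size_t len, size_t start, size_t end)
{
  return start <= end && end <= len;
}
```

For a 9-byte buffer such as the encoding of "gnu guile", the range (9, 9) is valid and empty, (1, 0) is rejected because end < start, and (10, 10) is rejected because it lies past the end.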
Vijay Marupudi wrote on Thu 20-01-2022 at 22:23 [-0500]:
> + c_start = scm_to_size_t (start);
This seems suboptimal because if start > SIZE_MAX,
then this will throw an 'out-of-range' exception without attributing
it to 'utf8->string' (untested).
Greetings,
Maxime.
Vijay Marupudi wrote on Thu 20-01-2022 at 22:23 [-0500]:
> +@deffn {Scheme Procedure} utf8->string utf [start [end]]
> @deffnx {Scheme Procedure} utf16->string utf [endianness]
> @deffnx {Scheme Procedure} utf32->string utf [endianness]
> @deffnx {C Function} scm_utf8_to_string (utf)
> +@deffnx
Vijay Marupudi wrote on Thu 20-01-2022 at 22:23 [-0500]:
> + c_start = scm_to_size_t (start);
> + if (SCM_UNLIKELY (c_start >= c_len))
> + {
> + scm_out_of_range (FUNC_NAME, start);
> + }
> +
> + if (!scm_is_eq (end, SCM_UNDEFINED))
> + {
> + c_end =
Vijay Marupudi wrote on Thu 20-01-2022 at 22:23 [-0500]:
> --- a/libguile/bytevectors.c
> +++ b/libguile/bytevectors.c
> [...]
Boundary conditions can be tricky; I would recommend writing some
tests.
Greetings,
Maxime.
signature.asc
Description: This is a digitally signed message part