On Thu, 28 May 2020 01:28:49 +0200
Mattias Andrée <[email protected]> wrote:

Dear Mattias,

> I missed something, that I will fix later, but there are three
> options of what to do:
> 
> grapheme_len() assumes cp_decode() returns 0 at the end of the
> string, whereas this change will return 1 (it is counter-intuitive
> that a UTF-8 decoder would say that the NUL character is 0 bytes
> long, as it is indeed a character, and one which you may want to
> support (I did when I rewrote the decoder for another project)).
> 
> grapheme_len.3 is sparse on details, but in the example it checks for
> the NUL byte before calling grapheme_len() rather than checking if
> grapheme_len() returns 0 (started at the NUL byte).
> 
> So option 1 is to make cp_decode return 0 on NUL. I really don't like
> this option as it eliminates the support for the NUL character, and
> you may want to return NUL characters without any special handling.
> And it doesn't simplify anything for the user even if he wanted to
> end at NUL.
> 
> Option 2 is as option 1, but also change the man page to check if
> grapheme_len() returns 0. This is a little better as it would
> document this feature.
> 
> Option 3 is to change grapheme_len() to support the NUL character. I
> think this is the best option as it would add support for the NUL
> character without the user doing anything special, and it barely
> complicates things. The change needed in grapheme_len() would be to
> compare `cp0` instead of `ret` against 0, after running `len += ret`,
> in the first call to `cp_decode`. This handling of NUL in
> grapheme_len() would still be needed to ensure that it does not read
> outside the string.

thanks a lot for putting in your time to improve the decoder. I
actually had this as a big TODO for years, and it was a reason why I
have not tagged a release for libgrapheme yet.

I pushed two commits regarding the decoder and went exactly with what
you suggested: handling NUL bytes in grapheme_len() and shaping the
interface of the grapheme_cp_decode() function such that it can be
used to "pick up" the NUL byte when it appears.
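
As a rough illustration of the idea (a sketch under simplifying
assumptions, not the committed code): the names sketch_cp_decode() and
sketch_len() below are hypothetical, the decoder does not reject
overlong forms or surrogates, and a "cluster" is reduced to one base
code point plus any following code points in U+0300..U+036F. The point
is only that the first decode result is compared via `cp0`, not `ret`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the decoder, not libgrapheme's actual code.
 * Decodes one code point from s and returns the number of bytes used.
 * A NUL byte decodes to code point 0 with length 1, not length 0. */
static size_t
sketch_cp_decode(const uint8_t *s, uint32_t *cp)
{
	size_t i, n;

	if (s[0] < 0x80) {
		*cp = s[0];
		return 1;
	} else if ((s[0] & 0xE0) == 0xC0) {
		*cp = s[0] & 0x1F;
		n = 2;
	} else if ((s[0] & 0xF0) == 0xE0) {
		*cp = s[0] & 0x0F;
		n = 3;
	} else if ((s[0] & 0xF8) == 0xF0) {
		*cp = s[0] & 0x07;
		n = 4;
	} else {
		*cp = 0xFFFD; /* invalid lead byte */
		return 1;
	}

	for (i = 1; i < n; i++) {
		/* a NUL is not a continuation byte, so we stop at the terminator */
		if ((s[i] & 0xC0) != 0x80) {
			*cp = 0xFFFD;
			return i;
		}
		*cp = (*cp << 6) | (s[i] & 0x3F);
	}

	return n;
}

/* Simplified stand-in for a grapheme-length routine (assumption: base
 * code point followed by code points in U+0300..U+036F only). */
static size_t
sketch_len(const char *str)
{
	const uint8_t *s = (const uint8_t *)str;
	uint32_t cp0, cp1;
	size_t ret, len = 0;

	ret = sketch_cp_decode(s, &cp0);
	len += ret;
	if (cp0 == 0) {
		/* compare the code point, not ret: NUL is a 1-byte
		 * character, and we must not decode past it */
		return len;
	}

	for (;;) {
		ret = sketch_cp_decode(s + len, &cp1);
		if (cp1 < 0x0300 || cp1 > 0x036F)
			break; /* also taken when cp1 is NUL */
		len += ret;
	}

	return len;
}
```

In this sketch a caller would still test for the NUL byte before
calling sketch_len(), as the man page example does, since the function
reports a length of 1 when started at the terminator.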

This obviously needs better documentation, but I'll hold off on that
until all interfaces have been improved based on your previous
feedback, as we discussed earlier.

With best regards

Laslo
